We may have had a Y2K failure in a PLC.

greenspun.com : LUSENET : Electric Utilities and Y2K : One Thread

While operating our spillway gates (these are the gates that allow water to spill through the dam and bypass the turbines), one out of the four gates stopped operating and brought up an impossible alarm condition for the type of operation it was performing.

All four gates were commanded to raise, and three of them did so successfully. But gate #2 started to move, then stopped and remained in position, and after a short time brought up a "Control Failed" alarm. This alarm told us nothing we didn't already know, and although a further command was sent, it had the same result. A technician was called in to fix the problem, and of course he wanted to see for himself just what was happening. Another command was sent to the gate while he was there, and the gate performed perfectly with no sign of any failure. He didn't believe there had ever been a fault, but he did download the last few events from the PLC to see what had happened.

To our surprise, the control had failed because of an excessive "gate drift" condition. Gate drift detection is a way of telling whether the gate moves when it isn't meant to. When I questioned the technician further about how the software is set up, he explained that the gate position is polled every 600 ms, and if there is any movement in the position, the PLC checks whether there should have been any movement and whether the movement is in the correct direction. I then pressed him on how the 600 ms timer works, and discovered that it gets its time from an RTC (real-time clock) in the PLC. This technician is also the one who tested these controllers for Y2K compliance, so I asked him how the 600 ms time is calculated, and whether this could have been a Y2K issue. He looked startled for a second, then quickly said that it couldn't be Y2K, as only the millisecond counter was used, and no date had ever been set in the RTC since it was installed in 1991.
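The drift check the technician described can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual PLC program: the `Command` enum, the function name `is_drift`, and the convention that raising the gate increases the position reading are all hypothetical.

```python
from enum import Enum


class Command(Enum):
    # Hypothetical gate commands; the value is the expected sign of movement.
    STOP = 0
    RAISE = 1   # assumption: raising increases the position reading
    LOWER = -1


POLL_INTERVAL_MS = 600  # polling period quoted in the thread


def is_drift(prev_pos: int, pos: int, command: Command) -> bool:
    """Return True if movement between two polls counts as 'gate drift'."""
    delta = pos - prev_pos
    if delta == 0:
        return False          # no movement, nothing to check
    if command is Command.STOP:
        return True           # gate moved when it was not meant to
    # Gate moved, but opposite to the commanded direction.
    return delta * command.value < 0
```

For example, `is_drift(100, 105, Command.RAISE)` is False, while `is_drift(100, 95, Command.RAISE)` is True.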

However, having looked into it, I am of the opinion that during operation the RTC rolled over, and for a short period the time calculation was negative, which would make the controller think that the gate was moving in the wrong direction. Within a couple of minutes the whole calculation would have been operating in year 00, so there would be no further problems.
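The negative-time theory can be made concrete with a free-running millisecond counter that wraps. Whether this PLC actually derives its 600 ms poll this way is not established in the thread; the 16-bit counter width and the function names below are hypothetical. Subtracting raw counter values across the wrap gives a negative interval, which flips the sign of any speed computed from it, whereas modular subtraction gives the correct 600 ms.

```python
COUNTER_MODULUS = 1 << 16  # hypothetical 16-bit ms counter: wraps every 65,536 ms


def elapsed_naive(prev_ms: int, now_ms: int) -> int:
    # Plain subtraction: goes negative when the counter wraps between polls.
    return now_ms - prev_ms


def elapsed_wrap_safe(prev_ms: int, now_ms: int) -> int:
    # Modular subtraction: correct as long as polls are less than one wrap apart.
    return (now_ms - prev_ms) % COUNTER_MODULUS


def speed(delta_pos: int, dt_ms: int) -> float:
    # Position counts per millisecond; the sign gives the apparent direction.
    return delta_pos / dt_ms


prev, now = 65_400, 464  # 600 ms apart, with the counter wrapping in between
print(elapsed_naive(prev, now))                # -64936
print(elapsed_wrap_safe(prev, now))            # 600
print(speed(5, elapsed_naive(prev, now)) < 0)  # True: gate appears to move backwards
```

A gate that is genuinely rising by 5 counts would, with the naive interval, appear to be lowering, which is exactly the kind of reading a direction-based drift check would flag.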

This may have been a Y2K issue, or it may have been a self correcting random failure of a type that does happen from time to time. Maybe we'll never know.

Malcolm

-- Anonymous, December 16, 1999

Answers

I should have mentioned that I intend to cross-post this message in TB2000, as it may have implications outside the electricity industry.

Malcolm

-- Anonymous, December 16, 1999


Malcolm - thanks - please keep us posted. As you note, with any instrumentation, sometimes you'll never know what caused the problem. The best we can do is conduct a good failure analysis and make an educated guess.

But then, that's what Y2k has always been about, to some extent.

-- Anonymous, December 16, 1999


Malcolm,

The presence of an RTC does not necessarily mean that the 600 ms timer accesses it. We use ms timers in our few PLC applications, and the PLC does not even have the RTC option. It is only needed for date/calendar applications.

Will you try to reproduce the failure? Did your Y2K tests include power-off rollover tests, and reboot tests after the rollover? Good luck.

-- Anonymous, December 16, 1999


Bingo. Typically there are at least three low level access routines for retrieving time values in embedded systems. Keeping the discussion simple, the three timer types are roughly: TIMER_RELATIVE, TIMER_ABSOLUTE, and TIMER_TIMEOFDAY.

TIMER_RELATIVE is a relative timer/counter that simply increments a counter every 10 ms. TIMER_ABSOLUTE is similar to TIMER_RELATIVE but is usually anchored to a well-defined epoch. In embedded Unix/POSIX-like systems this epoch is midnight UTC, January 1st, 1970. If this timer is used it shouldn't cause any problems either, since it counts the seconds and nanoseconds elapsed since midnight 01/01/1970, which at the stroke of midnight 01/01/2000 UTC will be 946684800.000000000 seconds. Problems arise when software engineers use TIMER_TIMEOFDAY, which is a structured version of TIMER_ABSOLUTE. It takes the absolute time since 01/01/1970 and breaks it down into year, month, day, hour, minute, and second. The tm_year structure member on all Unix/POSIX systems is the number of years since 1900, i.e. at midnight 01/01/2000 it rolls over to 100, not 00. Incidentally, most PLCs that provide date/time registers also provide only the last two digits of the year (but they roll over from 99 to 00).
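The epoch arithmetic and the tm_year convention above can be checked in a few lines. This sketch uses Python's `calendar.timegm` and `time.gmtime`; note that Python's `struct_time.tm_year` holds the full year, so the C-style `struct tm` value is recovered by subtracting 1900. The helper names are illustrative only.

```python
import calendar
import time

# Seconds from 1970-01-01 00:00:00 UTC to 2000-01-01 00:00:00 UTC.
Y2K = calendar.timegm((2000, 1, 1, 0, 0, 0))


def c_tm_year(epoch_seconds: int) -> int:
    # C's struct tm counts years from 1900; Python's struct_time does not.
    return time.gmtime(epoch_seconds).tm_year - 1900


def two_digit_year(epoch_seconds: int) -> int:
    # What a register holding only the last two digits would show.
    return time.gmtime(epoch_seconds).tm_year % 100


print(Y2K)                                           # 946684800
print(c_tm_year(Y2K - 1), c_tm_year(Y2K))            # 99 100
print(two_digit_year(Y2K - 1), two_digit_year(Y2K))  # 99 0
print("19" + str(c_tm_year(Y2K)))                    # 19100, the classic display bug
```

The last line shows that tm_year itself is not wrong; the bug appears when code glues a literal "19" onto it.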

Most experienced engineers will use the counter type of timers for computing results, and these results should usually be immune from Y2K issues. However, it is entirely possible for code in firmware, or in other levels below the actual ladder logic and application code, to be doing something funky that results in a Y2K bug.

Two weeks to go!

-- Anonymous, December 17, 1999


A. J. Edgar:

This is the kind of information that I really want to hear. It is the same problem JavaScript has on the net: the getYear function will return 100 for the year on 1/1/2000. It seems to me that they are using the same type of function. Do you have any idea how widespread the use of this kind of system is?

-- Anonymous, December 17, 1999



There ya go. Almost certainly the JavaScript routine getYear is simply returning the value of the Unix tm_year structure member as seen here. However, it seems that the bug isn't manifested the same way on different platforms. In this article Microsoft states that getYear returns 00-99 for the years 1900 to 1999 but will return a full four-digit year starting in 2000.

So there you have it, a simple example of the same language manifesting a Y2K bug in a different manner depending on whether it's running on a Unix system or a Windows system. It's a particularly ironic example since JavaScript has only been around since December 1995. You would have thought they would have got it right.
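The two getYear behaviours described above can be modelled side by side. This is a Python simulation, not JavaScript: `years_since_1900` follows the tm_year-style convention (which is also how the ECMAScript specification defines `getYear`), while `jscript_style` follows Microsoft's description as quoted above. The `normalise` helper is one common defensive workaround, shown as an illustration rather than the only fix; in JavaScript itself, `getFullYear` avoids the ambiguity.

```python
def years_since_1900(full_year: int) -> int:
    # tm_year-style convention: 1999 -> 99, 2000 -> 100.
    return full_year - 1900


def jscript_style(full_year: int) -> int:
    # As Microsoft describes it: two digits for 1900-1999, four digits otherwise.
    if 1900 <= full_year <= 1999:
        return full_year - 1900
    return full_year


def normalise(year_value: int) -> int:
    # Defensive fix: anything under 1000 is treated as an offset from 1900.
    return year_value + 1900 if year_value < 1000 else year_value


for y in (1999, 2000):
    a, b = years_since_1900(y), jscript_style(y)
    print(y, a, b, 1900 + a, 1900 + b, normalise(a), normalise(b))
# 1999 99 99 1999 1999 1999 1999
# 2000 100 2000 2000 3900 2000 2000
```

Page code written as `1900 + getYear()` gives 2000 under one convention and 3900 under the other, while code written as `"19" + getYear()` gives "19100" under the first.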

One language, two platforms, how many more permutations are there?

As far as installed base goes, with this example it appears that just about every web site on the planet is affected. Almost all major e-commerce sites run on either Unix (Sun Solaris) or Windows NT. You can find a nice chart here (almost all of the Apache servers run on Unix), and everybody uses JavaScript.

But frankly, I am personally not too concerned with the web servers and e-commerce (perhaps the glitches will help bring their stock valuations back down to earth ;). I am really mostly concerned about deeply embedded controls in things like: utilities, traffic lights, shipping and transportation, process control and factory automation, and medical equipment. These systems, if they have problems, aren't easy to fix overnight, or even in a fortnight.

-- Anonymous, December 17, 1999


OK A.J. and Reporter...

For the more technically-challenged out here, can you explain what impact this last issue will have on us "little people"? What is an application of the problem to which you're referring, and how will it impact us?

In addition (to keep it on topic so Rick won't kill it immediately!) how will it impact electricity?

Thanks!

Bob

-- Anonymous, December 17, 1999


I must say I agree with the Year 2000 committee's attempts to establish baseline failure incidence rates. Though perhaps of no practical value, if a particular type and number of failures is statistically anomalous during a time window for posited Y2K failures, it might allow one to narrow remediation efforts and lessen potential downtime. It sounds like this is an isolated data point, so a deductive, rather than inferential, case for why the failure might have occurred would be the best (only) approach. Thankfully this was only a nuisance failure. I was alarmed, though, Malcolm, by your statement that this "brought up an impossible alarm condition". Can you elaborate at all? Also (and I apologize if this is posted twice; the thought just occurred), was/is there a manual workaround to remedy/mitigate this sort of failure? Here's hoping the rest of your holidays are uneventful.

-- Anonymous, December 17, 1999

Whoa, Malcolm, I'm gonna have to call you on this one. I believe that you have just a bit of "Y2k vision" here, Y2k having been your primary focus for so long ;)

To make such claims and then say "Maybe we'll never know" is a bit irresponsible, in my opinion. You COULD know if you really wanted to!

If you have no display of the PLC RTC date, and have no immediate way of observing whether the date rolled over to y2k as you speculate, then I can tell you how to find out for sure, if you REALLY want to know....

My first suggestion is to get another technician on this, better yet, an engineer who knows PLCs, since I have some doubts about the first. Then connect to the PLC with a laptop and software provided by the manufacturer, and interrogate the PLC and get the RTC date/time.

If this PLC is using only the timer functions, it may very well NOT be using the RTC. PLCs can have their own timer functions (milliseconds, seconds, minutes, etc.) that run off the system clock (microprocessor pulses) and have no date function whatsoever.
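To illustrate the point, here is a minimal sketch of an on-delay style timer that accumulates elapsed milliseconds from a tick source and never consults a calendar. The class name, the `scan` method, and the 600 ms preset are illustrative assumptions, not any manufacturer's instruction set.

```python
class OnDelayTimer:
    """Hypothetical on-delay timer driven only by elapsed ticks; no date is used."""

    def __init__(self, preset_ms: int):
        self.preset_ms = preset_ms
        self.accum_ms = 0

    def scan(self, enable: bool, elapsed_ms: int) -> bool:
        """Call once per program scan; returns the 'done' bit."""
        if enable:
            # Clamp at the preset so the accumulator cannot grow without bound.
            self.accum_ms = min(self.accum_ms + elapsed_ms, self.preset_ms)
        else:
            self.accum_ms = 0  # dropping the enable resets the timer
        return self.accum_ms >= self.preset_ms


t = OnDelayTimer(preset_ms=600)
done = [t.scan(True, 10) for _ in range(60)]
print(done[58], done[59])  # False True
```

Nothing in this logic depends on a year, month, or day, so a calendar rollover cannot affect it directly.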

If this PLC is using the RTC and has a Y2k problem that caused the fault you saw, it should be very easy to reproduce by setting the RTC and watching it roll over.

Also, could there be some problem with the gate positioner, coupled with the PLC programming logic that could have resulted in this alarm?

Basically, when I see claims of PLC failures on y2k, I am skeptical. It may be possible, but I have yet to see decent evidence. This post is no exception. Please have the RTC date checked, then get back with us.

Regards,

-- Anonymous, December 17, 1999


AJ, thanks for the info above, but in what context do you use it with regard to PLCs? The manufacturer's programming of the operating system? It's certainly not applicable to PLC application programs, which typically use ladder logic and timer functions that don't use the code you cite.

Regards,

-- Anonymous, December 17, 1999



Factfinder, it looks as though you are right and I did jump the gun on this one. I think we are now fairly certain that this failure was NOT a Y2K issue. The PLC concerned is an Allen-Bradley MicroLogix 1500, which Rockwell lists as being ready. However, the reason we can now be certain that it was not Y2K related is that the same controller failed again last night with exactly the same symptoms. We can find no reason for it to measure a gate drift, which requires either negative movement (possible, but it wasn't happening at the time) or negative time (an impossible situation except during a Y2K rollover).

We have now passed this issue on to our performance engineer, who is usually quite good at troubleshooting, and we'll see what he finds.

Malcolm

-- Anonymous, December 18, 1999


Factfinder, to which particular post are you referring? To the initial post about low level TIMER_TYPES, or to the second post about JavaScript and the getYear routine?

I assume you are referring to the first post, since it is patently obvious that I'm not talking about the use of JavaScript in a PLC (however, I wouldn't put it past a newbie engineer to monitor and control a mission-critical PLC using Windows NT and a Web browser ;-}).

So, with regard to the first post, please explain exactly what you mean. I have tried before to find some common ground with you without much success, so let's try again. If you can answer these simple yes-or-no questions, then I think we will have a starting point for further discussion.

Do some PLCs have firmware in them?

Do some PLCs have lower-level code being executed below the ladder logic and without any direct access from the ladder logic?

Do some PLCs have hardware time-of-day / date-of-year clocks in them?

Do some PLCs have software time-of-day / date-of-year clocks in them?

Is it true that most PLCs with hardware time-of-day / date-of-year clocks in them only use two 8-bit BCD registers to represent the year?

Is it true that most PLCs with software time-of-day / date-of-year clocks in them only use two 8-bit BCD registers to represent the year?

Is it true that a PLC with a software time-of-day / date-of-year clock in it is using low-level firmware to simulate that clock and present the time/date registers to the higher-level ladder logic?

Is it true that many new high-end PLCs actually have full-blown embedded IBM-compatible PCs in them?

Is it true that Texaco found many PLCs in one of their refineries that presented time/date information to a downstream data acquisition unit and that said PLCs presented the date after year 2000 roll-over as 01/01/@@, and that said PLCs themselves did not fail but the downstream control unit shutdown the control cell because of corrupted log data?
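For readers unfamiliar with BCD year registers, here is a sketch of a two-digit packed-BCD year byte (one decimal digit per 4-bit nibble). It shows the correct rollover from 0x99 to 0x00, and what a naive nibble-to-character conversion prints if the byte is instead incremented as plain binary. The function names are hypothetical, and this only illustrates how invalid BCD can surface as non-digit characters; the thread does not say what mechanism produced the "@@" in the Texaco case.

```python
def bcd_to_int(b: int) -> int:
    # Packed BCD: high nibble is the tens digit, low nibble the units digit.
    return (b >> 4) * 10 + (b & 0x0F)


def int_to_bcd(n: int) -> int:
    return ((n // 10) << 4) | (n % 10)


def bcd_year_increment(b: int) -> int:
    # Correct two-digit BCD increment: 0x99 rolls over to 0x00.
    return int_to_bcd((bcd_to_int(b) + 1) % 100)


def bcd_to_ascii(b: int) -> str:
    # Naive nibble-to-character conversion with no range check.
    return chr(ord("0") + (b >> 4)) + chr(ord("0") + (b & 0x0F))


print(hex(bcd_year_increment(0x99)))                             # 0x0
print(bcd_to_ascii(0x99), bcd_to_ascii(bcd_year_increment(0x99)))  # 99 00
print(bcd_to_ascii((0x99 + 1) & 0xFF))                           # 9:  (binary increment gives 0x9A)
```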

Regards, --aj

-- Anonymous, December 20, 1999


Well, it appears that Factfinder has decided to once again ignore my attempt to start an open discussion on what kinds of problems can be caused by PLCs. Oh well.

For those of you who are interested, the answer to all the above questions is "yes".

Regards,

-- Anonymous, December 23, 1999

