Embedded Systems Failures That Can Occur More Than a Week After a Trigger Date or a Restart


A few days ago a very interesting thread was posted on embedded systems failures. The thread described failures that do not show up immediately. These failures involve buffers that overflow. The thread had some notable submissions and is well worth reading. See http://hv.greenspun.com/bboard/q-and-a-fetch-msg.tcl?msg_id=002EOu
Here are some related comments sent to me on 1/10/2000 by an engineer who remediates and tests embedded systems for a nationally known company. This individual also took part in the November 9, 1999 meeting of embedded systems experts convened by the Federal government in Washington. These comments seem to confirm what contributors to the above thread had been saying:
This engineer writes:
"... First, embedded systems do not have a standardization program in place. In essence, programs can be written in several ways. Now to the Y2K question. There are buffers in most embedded system programs. These buffers can vary in size. When a command is registered with an embedded system and it is improper or not accepted, the system can place the command in the buffer. It could be doing this by the hour, day, week, etc.
Now if I have a program that does not recognize year 2000, it will search for the date (Year) for a period of time. If it does not find it, then a loop is created and placed in the buffer. Now comes the problem. When the buffer is full, the system can shut down, be degraded, or begin to act up. The concern is that it could take hours, days, even weeks before the buffer is full.
Simple Example: A fire alarm panel each day at midnight registers the date in this format: Month Day Year. Now on Dec 31, 1999 at midnight it does not recognize year 2000. So it attempts to complete the command "store date".
When this fails over a period of time, a loop is created in the buffer. This continues for two weeks. At the end of two weeks the buffer is full. There is no place to send the command, so the system shuts down, becomes degraded, or begins to send out erroneous commands.
Concerns
Shutting down the system and restarting it could clear the buffer, and then the process starts again.
Degraded systems could fail when needed the most.
Programs running on erroneous commands could send out wrong instructions, e.g., open a valve to 30 percent rather than 10 percent at a temperature of 70 degrees.
Hope this helps explain things that can go bump in the night."
[End of quoted material]
These comments help explain why some embedded systems failures may not become apparent for a week or more after a trigger date or restart.
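For readers who want to see the arithmetic behind the engineer's scenario, here is a minimal sketch in Python. It is not code from any real fire alarm panel. The buffer capacity of 14 entries, the rate of one failed command per day, and the name `days_until_full` are all assumptions chosen only to match the two-week example quoted above.

```python
def days_until_full(capacity: int, failed_per_day: int) -> int:
    """Return how many days pass before a never-drained buffer fills up.

    Hypothetical model: every failed command is parked in a fixed-size
    buffer and nothing ever removes it.
    """
    if capacity <= 0 or failed_per_day <= 0:
        raise ValueError("capacity and failed_per_day must be positive")
    buffer = []
    days = 0
    while len(buffer) < capacity:
        days += 1
        # Each day the rejected "store date" command is parked again.
        buffer.extend(["store date"] * failed_per_day)
    return days


# Assumed figures: room for 14 entries, one failed command per midnight.
print(days_until_full(capacity=14, failed_per_day=1))  # 14
```

Under the same assumptions, a device that logged a failure every hour would fill the same buffer within the first day, which is why the delay in this model could be hours, days, or weeks depending on the device.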
I would be pleased to pass e-mail along to the engineer who offered these observations if anyone wishes to get in touch with him. (He did not wish his name and contact information posted.)
I have shared the views of this engineer in recent days with officials at the National Institute of Standards and Technology (NIST), the General Accounting Office (GAO), and the President's Council on Year 2000 Conversion, among others. Officials at NIST and GAO have indicated that they would be getting in touch with the engineer.
I will start a second thread about problems that 7-day clocks may have as early as the first holiday weekend in January. That second thread includes comments from the same engineer.

-- Paula Gordon (pgordon@erols.com), January 11, 2000

Answers

Thank you Ms. Gordon for your post and ongoing efforts.

-- Pa Engineer (PA Engineer@longtimelurker.com), January 11, 2000.

Thanks Paula

-- Ishkabibble (ishman@home.com), January 11, 2000.

HOOORRAAAYYY!! Paula Gordon is BAAAAAACK! Commissioner Gordon, I was afeared we were not to hear any more of you after the smashing publicity coup of January 1, 2000: concerned lest your gumption, courage and stick-to-it-iveness might not match your class, intellect and fortitude. For better or worse, splendidly refreshing and restorative to know that you have been here all along, and are posting again, and might continue. This means that TB2000 WILL NOT DIE ..... not soon at least. Go for it.
>"<

-- Squirrel Hunter (nuts@upa. tree), January 11, 2000.

Pa Engineer, Ishkabibble, and Squirrel Hunter,
Thanks!

-- Paula Gordon (pgordon@erols.com), January 11, 2000.

Ms. Gordon,
Thank you.
Will send an email request in re the Simplex fire alarm panel, for which I accepted vendor statements but did not test. No reason for concern, except there can always be variables that were missed even by the most diligent OEMs.
I won't attempt to defend my decision not to test; this panel cannot easily be recovered if it crashes, as past experience has proven.
It's generous of you to be offering this help.
Regards,

-- Tom Beckner (becknert@erols.com), January 11, 2000.

:-) Paula will never let a possible danger with embeddeds pass into the night without a call for scrutiny and remediating action. Accolades to a true warrior!

-- Ashton & Leska in Cascadia (allaha@earthlink.net), January 11, 2000.

"some embedded systems failures may not become apparent for a week or more after a trigger date or restart"
Paula,
You're obviously still concerned about embeddeds. Don't you think a majority or a predominance of these problems would have occurred on or near the 'trigger date'? Since Jan 1 was uneventful (in terms of embeddeds), can't we also assume that the prospects for deferred problems have also diminished considerably? You and RC have contended that in testing some probs didn't occur until as much as 30 days later. What percent? What percent occurred on the trigger date?
In the oil industry, American Petroleum Institute stats are released every Tuesday late afternoon. This will be our first glimpse of refinery operating rates, production probs, implied demand and stock levels during and post rollover. I'll post a thread with the stats and assumptions, although one shouldn't put too much in one weekly report. There were a lot of refining problems, but I just don't think they're related to this issue.

-- Downstreamer (downstream@bigfoot.com), January 11, 2000.

Thanks for your informative post Paula. I can just imagine how many people were holding their breath for the rollover moment. Time indeed will tell.....

-- kevin (innxxs@yahoo.com), January 11, 2000.

---stoopid laymans question. On some embeddeds, does the accumulated "not online or plugged in" time count against the "trigger date" of when /if some glitch might appear? If so, seems that different systems will be approaching this time frame at different times, you'd have to go back and see what the accumulated total of downtime was, and project into the future when that point should be theoretically reached.
have no idea if this idea has ANY merit WHATSOEVER. I have ZERO expertise in this matter, because...

-- itain'twhytookay (notanengineer@experts.bah.humbug), January 11, 2000.

Thanks, Paula. Life sure is interesting. Watching the embeddeds--and software.
By the way, regarding later failures, my programmer friend says he is definitely expecting some in his applications, which look at historical data. His dates are not yet for this year, therefore, no problems yet.

-- Mara (MaraWayne@aol.com), January 11, 2000.

I have been trying to get this failure scenario into some people's heads for over a year. The idea of being ready to do "wall plug resets" to keep certain systems running a week at a time has been a hard sell to people who deny any Y2K problems exist.
But I've already seen one possible buffer overflow that cleared with a "wall plug reset". I'm watching that device to see if there's a repeat failure after a similar period of time. Truly, time will tell.
WW

-- Wildweasel (vtmldm@epix.net), January 11, 2000.

1999=11111001111
2000=11111010000
Why would that cause an overflow?
Besides, a buffer is a "storage" device, it will not "break" physically.
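The two binary values above can be checked with a few lines of Python. This is only a sketch confirming the arithmetic: both years fit in the same 11 bits, so going from 1999 to 2000 does not by itself need a wider field when the year is held as a plain binary integer. How any particular device actually stores its date is not something this shows.

```python
# Check the binary forms of the two years quoted above.
for year in (1999, 2000):
    bits = format(year, "b")          # binary digits, no "0b" prefix
    print(year, bits, year.bit_length())

# Both values need the same number of bits (11).
same_width = (1999).bit_length() == (2000).bit_length()
print(same_width)
```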
Once again, Paula does not know what she is talking about, and the source sounds rather uneducated in digital devices. If you people are sucking up this crap and believing it, you have not learned by experience and are going to be made fools of again.
"I have shared the views of this engineer in recent days with officials at the National Institute of Standards and Technology (NIST), the General Accounting Office (GAO), and the President's Council on Year 2000 Conversion, among others. Officials at NIST and GAO have indicated that they would be getting in touch with the engineer."
Why would these officials be looking into this? Because they also have absolutely no idea of how these things work. Few people do. Even an "embedded chip" or "embedded system" programmer does not usually know how the digital electronics work. They may know a little hex, but boolean algebra and the physical make-up ~ forget it.
If they cannot "program" in machine language, they do not know this stuff. And this stuff is pure BULLSHIT.
"Now if I have a program that does not recognize year 2000 it will search for the date (Year) for a period of time."
Search? What does he mean by "search"? If a date has been given an address, the "date", even if wrong, will reside at that address. If it is stored in the buffer, then the program will look for it in the buffer, and even if it is wrong, or 0, or anything, it will be found there and used for whatever it is trying to be used for.
A buffer is a temporary storage area. The date will not get lost.
This person sounds like he does not understand what he is talking about and is "reaching" for some description of what he "thinks" a buffer is.
"If it does not find it then a loop is created and placed in the buffer."
Loop is created? What does he mean by that? A small program is created that loops? Then placed in the buffer? The buffer is not a "working" program, although instructions can be placed there. But the buffer is like people standing in line: when one moves up, the rest move up one. If something is to be added together, the data is taken from the buffer, arithmetically manipulated in an arithmetic "program", and the result is placed back in the buffer. The buffer itself does not do the math. So the buffer may have x amounts of "storage" that moves up in line, but it is just ones and zeros.
An example that may help you understand is printing jobs. If you initiate 5 printing jobs, as one is done the rest move up in line until it is their turn to be printed.
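The print-job picture can be sketched in Python with `collections.deque` from the standard library. The job names are made up for illustration; the point is only that items leave in the order they arrived and the rest move up.

```python
from collections import deque

# Five hypothetical print jobs waiting in line, oldest first.
print_queue = deque(["job1", "job2", "job3", "job4", "job5"])

printed = []
while print_queue:
    # The job at the front is taken; the others move up one place.
    printed.append(print_queue.popleft())

print(printed)            # jobs come out in arrival order
print(len(print_queue))   # 0, the queue is empty again
```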
"Now comes the problem. When the buffer is full the system can shutdown, be degraded or begin to act up."
This proves that he does not understand what he is talking about. If he did, he would know exactly what happens when the buffer is full. He is guessing, and saying it will "act up" is putting human attributes on electronics. Digital computing, with the exception of software, is an exact science. There is a cause for each thing that happens.
"The concern is that it could take hours, days, even weeks before the buffer is full."
Computers run in nanoseconds, not days or weeks...geeze...
"Simple Example: A fire alarm panel each day at midnight registers the date in this format- Month Day Year- Now on Dec 31, 1999 at midnight it does not recognize year 2000. So it attempts to complete the command 'store date'."
What it is doing when it gets the command to store the date is to go to the memory address, or the buffer, and take the data out to put it somewhere, to store it. The program is merely picking up what is there, correct or not, and moving it to "somewhere" that it has been designated to store it.
You have the instruction (to store at a designated address) and the data, which at this time is nothing to the software but data. The instruction does not care what it is because it is just being moved, not computed.
"When this fails over a period of time a loop is created in the buffer."
Once again, a buffer does not work this way. The instruction may be stored in the buffer ~ so what? And what exactly fails anyway? The program has gone somewhere to get data (the date) and move it somewhere else. Even if it is all zeros or the wrong date, whatever it has picked up there will be moved to where it is to be put. Now once the data is put in its place and something else wanting the date accesses it there, the wrong data, or date, can be used and cause an error, but that is a Y2K programming error, not an "embedded" error, and certainly not a buffer problem.
"This continues for two weeks. At the end of two weeks the buffer is full. No place to send the command"
Buffers are used constantly, by many different instructions. A buffer will not sit there being used for this purpose only and eventually "fill up" from it. As a matter of fact, after the buffer has been used, or before it gets used again for another function, it is cleared. If it is not cleared, then that is a programming error that has nothing to do with Y2K or dates, and it would have shown up when the computer first started running and used the buffer, not waited until now.
"so the system shutdown, becomes degraded, or begins to send out erroneous commands."
Wrong. Depending on the computer and how it has been programmed, when the buffer has become full, the buffer will no longer accept data, or will step it forward and drop the first thing in line into the bit bucket (it just does not exist anymore), or perform some other predesignated function that it was designed to do.
So, the entire "theory" is based on nothing more than an unknowledgeable person making guesses while not understanding the actual gut-level functions of computing.
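The two full-buffer behaviors described above (refuse new data, or drop the oldest item) can be sketched in Python as follows. The class name `BoundedBuffer` and the policy names are assumptions for illustration only; real firmware would do this in whatever way its designer chose.

```python
from collections import deque


class BoundedBuffer:
    """Fixed-size FIFO with a chosen behavior when full (illustrative only)."""

    def __init__(self, capacity: int, policy: str = "reject"):
        if policy not in ("reject", "drop_oldest"):
            raise ValueError("policy must be 'reject' or 'drop_oldest'")
        self.capacity = capacity
        self.policy = policy
        self.items = deque()

    def put(self, item) -> bool:
        """Try to store item; return True if it was accepted."""
        if len(self.items) < self.capacity:
            self.items.append(item)
            return True
        if self.policy == "drop_oldest":
            self.items.popleft()      # oldest entry goes to the bit bucket
            self.items.append(item)
            return True
        return False                  # "reject": buffer accepts no more data


rejecting = BoundedBuffer(3, "reject")
dropping = BoundedBuffer(3, "drop_oldest")
for n in range(5):
    rejecting.put(n)
    dropping.put(n)

print(list(rejecting.items))  # [0, 1, 2]
print(list(dropping.items))   # [2, 3, 4]
```

In both cases the outcome is fixed by how the buffer was programmed, which is the behavior the post describes for a full buffer.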

-- Cherri (sams@brigadoon.com), January 19, 2000.
