IEE's embedded chip problems -scary-

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

All these problems are listed here: http://www.iee.org.uk/2000risk/Casebook/eg_index.htm

Here are some of the scarier cases. Note the repeated use of the words 'catastrophic', 'critical', 'shutdown', 'serious', 'chaos'. Also look at the places where these systems are used.

============
EXAMPLE NO EG-84

Equipment Type: Stand alone instrument
Industry Sector: Manufacturing
PC or Computer based: Yes
System Age: 10
Application: Level and flow monitoring of waste acid treatment plant
Description of the Problem: Problem experienced with some versions of firmware. If the unit rolls over any year (i.e. not Y2K specific) with the power supply off, then on power up the display is blank and the keyboard locked, so that the device will not operate.
How was it Identified: During off line testing in the workshop.
What was the Solution: A known compliant version of the firmware has been installed. Long term, the unit will be replaced.

Consequences for the SYSTEM: System Stops
Consequences of failure to the BUSINESS: Inability to treat acid, resulting in shutdown of plant.
Other:

==================
EXAMPLE NO EG-14

Equipment Type: HVAC
Industry Sector: ALL
PC or Computer based: No
System Age: 6
Application: Package Boiler Control System, local and remote
Description of the Problem: Hardware and software
How was it Identified: Z180 microprocessors found during physical examination, and 2 digit date found when examining code (assembler?)
What was the Solution: Solution not yet known, as manufacturer is no longer in business
Consequences for the SYSTEM: System Stops
Consequences of failure to the BUSINESS: Failure would result in no bulk oil supplies to a major works, as steam is used to preheat heavy oil for distribution (5 x 10,000 tonne tanks pumped at 50 t/hour)

Users of these instruments would need to test compliancy of all versions of firmware. In this case, it was the latest version that exhibited the problem.

=================
EXAMPLE NO EG-15

Equipment Type: HVAC
Industry Sector: ALL
PC or Computer based: No

Application: Air Conditioning/Heating Controls
Description of the Problem: Loss of control of HVAC system. Critical date 01/01/2000.
How was it Identified: Manufacturer aware, confirmed through testing.
What was the Solution: Upgrade software. Manufacturer supplying free upgrade.
Consequences for the SYSTEM: Erroneous Result
Consequences of failure to the BUSINESS: Potentially catastrophic.

=========================
EXAMPLE NO EG-04

Equipment Type: Complex Process
Industry Sector: Communications
PC or Computer based: No

Application: A multi-site organisation has a multi-service bandwidth manager with a management system which is non-compliant.
Description of the Problem: The vendor advised the client not to allow the system to roll into the next century as 'unstable or unpredictable results could occur'. The cause is that the management software application has not been designed to take account of four digit dates or the year '00'.
How was it Identified: It was recommended that the client contact the vendors of their systems. As a result of this contact, the vendor advised the client that there was a major problem with the management software system.
What was the Solution: The management system is completely non-compliant. The solution is to replace the management system with a system which is compliant.
Consequences for the SYSTEM: System Stops
Consequences of failure to the BUSINESS: Critical

===========================
EXAMPLE NO EG-28

Equipment Type: OTHER
Industry Sector: Rail Transport
PC or Computer based: No

Application: System used for voice and data communications between train drivers and signallers.
Description of the Problem: Before updating the time, the management processor sets all of its internal registers to zero, and monitors the status of them afterwards. If the status of one or more registers is still zero, this is interpreted as message not received. The processor will await the arrival of a valid signal before updating the time and date. So, effectively, it will cease to function for one year, then resume normal operation on 01/01/2001.
How was it Identified: Discussions with the users and then structured interview with the equipment manufacturer. The manufacturer was unable to answer all questions satisfactorily and during follow-up work discovered the error.
What was the Solution: The equipment manufacturer must provide a software upgrade.
Consequences for the SYSTEM: System Stops
Consequences of failure to the BUSINESS: If information gets out of sequence, chaos will ensue. Train delays will occur, and there will be increased risk of rail accidents. The cost of this could be considerable.

===================================
EXAMPLE NO EG-76

Equipment Type: OTHER
Industry Sector: Oil & Gas
PC or Computer based: Yes

Application: A hand held vibration data logger and PC based software package.
Description of the Problem: During functional testing and with the date set to 1st January 2000, vibration was monitored using the device and the data downloaded into the PC. The trend was then displayed. The trend normally displayed the last few weeks' data ending with the latest piece of information. As a result of the test, the display became totally nonsensical. When the date was returned to 22/11/97 and the test repeated, the system still failed.
How was it Identified: The manufacturer claimed that the system was compliant. The problem was discovered during functional testing.
What was the Solution: The package will be upgraded to a version running under Windows NT, which is claimed compliant.

Consequences for the SYSTEM: Erroneous Result
Consequences of failure to the BUSINESS: Any planned monitoring and maintenance programme would be severely disrupted by such an error. There will also be regulatory problems as, in the event of an emergency, logs and sequencing information are needed for post-incident enquiries.

===========================
EXAMPLE NO EG-07

Equipment Type: DCS
Industry Sector: Oil & Gas
PC or Computer based: No
System Age: 6 years
Application: DCS control system for petrochemical plant
Description of the Problem: Online rollover to Year 2000
How was it Identified: During testing. Offsite testing on a testbed was performed with satisfactory results. Upon testing of stations on site, control was no longer possible after the system had rolled over to Year 2000. It was not until this problem was evident on three of the four operating stations that testing was aborted.
What was the Solution: No known workaround. Plant had to be operated from one station until problem could be rectified.
Consequences for the SYSTEM: System Stops
Consequences of failure to the BUSINESS: Near catastrophic. Limited reliability and operability of plant. Reduced production.

======================
EXAMPLE NO EG-60

Equipment Type: DCS
Industry Sector: Manufacturing
PC or Computer based: Yes
System Age: 10
Application: Works Energy monitoring system
Description of the Problem: Statement from manufacturer
How was it Identified: Statement from manufacturer
What was the Solution: Upgrade to hardware and software and subsequent testing
Consequences for the SYSTEM: System Stops
Consequences of failure to the BUSINESS: Serious consequences on works fuel distribution and control

===================
EXAMPLE NO EG-80

Equipment Type: SCADA
Industry Sector: Manufacturing
PC or Computer based: Yes
System Age: 4
Application: Operation & control of gas flare stack
Description of the Problem:
How was it Identified: Audit by supplier identified that version of UNIX was not compliant
What was the Solution: This system is to be relocated/modified in Year 2000. Decision made to roll date back 8 years (non compliance date associated) and not spend money to fix. 8 years was to keep leap year in line if system was not immediately replaced.
Consequences for the SYSTEM: System Stops
Consequences of failure to the BUSINESS: Failure to open the flare stack could cause over pressurisation of gas distribution system, including gas holder, if gas manufacturing units could not be reduced, especially if consumers were to fail due to Y2K problems.

(Over pressurisation, as in go boom!?)

============================
EXAMPLE NO EG-27

Equipment Type: OTHER
Industry Sector: Rail Transport
PC or Computer based: No
Application: Vibration monitoring on rail network. If train wheels have a flat spot, or axles are damaged, the rail will vibrate due to the uneven load distribution. These vibrations are detected by monitors on the rails, allowing faults to be identified.
Description of the Problem: There are two models of this system: Mark 1 will fail to operate completely after 09/09/99 due to the fact that 999 was used as an end of file marker; Mark 2 will operate until the end of 1999, but its internal clock will fail to rollover into the next century.
How was it Identified: Supplier information was referred to initially, rollover testing was then carried out.
What was the Solution: Both systems were rolled back to determine which, if any, of the previous leap years it would be possible to use. The Mark 2 systems cannot be rolled back to a date prior to system installation, for example 1996.

Mark 1 systems can be rolled back to any date, but will fail again once their internal clock reaches 9/99.
Consequences for the SYSTEM: Erroneous Result
Consequences of failure to the BUSINESS: Catastrophic if a problem, which subsequently leads to an accident, cannot be identified.

=====================
EXAMPLE NO EG-82

Equipment Type: SCADA
Industry Sector: Manufacturing
PC or Computer based: Yes
System Age: 5 years
Application: Windows based SCADA system for chemical handling plant
Description of the Problem: The first test was a BIOS rollover test with (unfortunately) the SCADA package running. A failure occurred, causing a 'lockup' with a complete lack of response. A manual trip was initiated, however not before an acid spill occurred, resulting in a minor environmental problem and a major safety incident. Unfortunately, the company's personnel failed to realise the implications of conducting such a test on an operational plant. Another test was performed some weeks later on a later version, this time with properly prepared test plans and plant personnel awareness. One function of the tests was for 09-09-99, 09-09-1999 and 99-99-9999. These 'dates' resulted in the software failing to execute. Note that 99 is often used as end of file indicator.
How was it Identified: See above
What was the Solution: Replacement
Consequences for the SYSTEM: System Stops
Consequences of failure to the BUSINESS: Dangerous chemical spill
Other: Testing live systems requires careful forethought and preparation. Always back up data first.

======================
EXAMPLE NO EG-56

Equipment Type: Complex Process
Industry Sector: Manufacturing
PC or Computer based: Yes
Application: Component Placement Machine in Surface Mount Assembly of electronic components
Description of the Problem: User believes that the machine may stop working but the vendor is going to provide a fix.
How was it Identified: Vendor reports that the machine is not compliant.
What was the Solution: The vendor is going to provide a complete new PC to replace the original that is actually embedded into the machine.
Consequences for the SYSTEM: System Stops
Consequences of failure to the BUSINESS: Impact would be serious if the machine stopped. Production capacity would be reduced.

====================
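A side note on the clock rollback workaround mentioned in EG-80 and EG-27 above: the casebook only says the dates were rolled back to "keep leap year in line". The short Python sketch below is not from the casebook. It is just one way to check which earlier years share the leap-year status of a target year, and which of those also start on the same weekday (so that day-of-week logic would still line up). The function name `rollback_years` and the 40-year search window are assumptions made for illustration.

```python
import calendar
from datetime import date


def rollback_years(target_year, max_back=40):
    """Hypothetical helper: find earlier years a clock could be rolled back to.

    Returns two lists (nearest year first):
      same_leap     - years with the same leap-year status as target_year
      same_calendar - the subset that also starts on the same weekday,
                      i.e. every date falls on the same day of the week
    """
    same_leap = []
    same_calendar = []
    for back in range(1, max_back + 1):
        year = target_year - back
        # Skip years whose February length differs from the target year
        if calendar.isleap(year) != calendar.isleap(target_year):
            continue
        same_leap.append(year)
        # Same leap status plus same 1 January weekday means identical calendars
        if date(year, 1, 1).weekday() == date(target_year, 1, 1).weekday():
            same_calendar.append(year)
    return same_leap, same_calendar


same_leap, same_calendar = rollback_years(2000)
print("Same leap-year status:", same_leap)
print("Same leap-year status and weekdays:", same_calendar)
```

For the year 2000, an 8-year rollback lands on 1992, which is in the first list but not the second, so weekdays would be off. Within this window only 1972 (28 years back) matches on both counts.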

-- John Ainsworth (ainsje@cstone.net), June 25, 1999

Answers

link

-- at work (why@not.com), June 25, 1999.

why the double post/waste of bandwidth?

-- zoobie (zoobiezoob@yahoo.com), June 25, 1999.

zoobie- See the other post -- my fingers slipped. Sorry.

-- John Ainsworth (ainsje@cstone.net), June 25, 1999.

over at Rick Cowles' site there was a post on this and I could not believe that some engineers/programmers were saying that what the IEEE said in testimony was not really accurate/important, whatever.

unfathomable imho.

so, these guys are lying? the things they talked about are only of minimal importance???

-- walterskold (wskold@lazrus.org), June 25, 1999.


walterskold:

You need to carefully distinguish between what the IEEE said and what they didn't say.

They said that remediation efforts today are resource-intensive (very true). They said we'd never fix everything in time (very true). They said there will be a whole lot of firefighting later when the bugs we missed start biting us (very true).

Beyond that, they were addressing legal ramifications, and made two key points: that current efforts to create legal protections are interfering with remediation in some ways, and that major efforts at legal defense later (combined with adverse judgments) will interfere with the firefighting efforts as well.

What IEEE did *not* say was what sort of breakdowns we could expect, nor for how long, nor how serious, nor what economic or other macro-level changes will result from any y2k bugs. In other words, IEEE emphasized the reality of y2k issues (and yup, they're real OK), but made no technical effort to *quantify* those issues. They were concerned with getting any government assistance they could to help address y2k bugs as efficiently as possible.

Recognizing that disease exists is quite different from announcing a pandemic. IEEE recognized that y2k bugs exist, but didn't say what would happen as a result.

-- Flint (flintc@mindspring.com), June 25, 1999.



At least not as YOU were able to ascertain, Flint. You never won at playing 'Clue', did you? Colonel Mustard, in the library, with a candlestick. Flint asks, "What is the model number on that candlestick?" "Is anybody REALLY able to read Colonel Mustard's mind?" "Where was this game board purchased anyway? This could have serious implications about the underlying 'motives' behind the storekeeper's attitude regarding the various outcomes."

-- Will continue (farming@home.com), June 25, 1999.

Well, Flint, sometimes you don't get someone to explain all this stuff to you, maybe they think anybody ought to be able to THINK THROUGH the cause-effect. Like, say 4 million gallons of raw sewage showed up at your doorstep. Would you just marvel at it, saying "Oh, fascinating. Gee, this could be a problem. Or maybe not. I mean, I really have to have more information. This could be a problem if it led to infectious diseases. On the other hand, my lawn needed to be fertilized anyway. And so this could actually save me money and work. And keep my mother-in-law away. And...."

-- King of Spain (madrid@aol.com), June 25, 1999.
