Where Y2K can occur in a system - the various levels of system design

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

This is a simplified description of the various 'levels' of hardware and software that make up a complete system. The levels are arbitrary, and meant for the convenience of the people working with the hardware/software; after all, the system must have all these things in operating condition or it cannot run! But generally, someone designing microcode does not have to worry about the requirements of a compiler designer, the operating system designer does not worry about the condition of individual logic gates, and so forth.

(Yes, there are exceptions to every rule, and MS makes a bunch of them all by itself - as when they recently got Intel to modify the PIII to make certain common Win 2000 OS instructions more efficient. But we are talking about the usual case here.)

The lowest level is the simple PN junction. The characteristics of the junction are determined solely by the laws of physics - no sort of problem related to Y2K is possible on this level.

The next level is the transistor level. Stacked PN junctions make up the simple bipolar transistor (PNP or NPN). There are other transistors - FETs, MOSFETs and so forth - but none can have any sort of problem with Y2K.

Next we combine transistors (with other components) to create simple devices. We can now build both analog and digital basic devices - oscillators, amplifiers and simple logic gates. And here we begin to see engineers using abstraction to remove themselves from the complexity of the underlying system - block diagrams and logic layouts start to appear. None of these devices have any sensitivity to Y2K. Analog chips and some SSI chips may be designed at this level.

On the next level we combine logic gates to create simple computer components. (I am dropping the Analog circuits now - they are not pertinent to this discussion.) We can now build adders, half adders, simple memory circuits and so forth. These are usually considered to be the lowest building blocks of a computer. If you have ever seen any small scale integration chips, this is what they usually contain - examining any IC catalog will provide you with a wealth of choices from such devices.

Now in theory, you could combine these devices to build a system that would do date arithmetic at this level, but it would be like building a house with tweezers, toothpicks and glue - the tools of a hobbyist rather than a carpenter. No one is going to do this - the work involved is not worth the reward. Devices at the logic gate level are not Y2K sensitive.

The next level is the level of the microprocessor and the support chips. Large Scale Integration and Very Large Scale Integration enter the picture. And this is the first level where you can actually discuss programming - in microcode.

Microcode is the program/interpreter that actually executes the machine code - the string of 0s and 1s that the computer actually deals with. A 1 is a high voltage; a 0 is a low voltage - in a typical system, though it is possible to build systems that reverse this. Microcode is not necessary - you can build a system that executes the instructions by pure hardware instead, but the microcode design is commonly used as it cuts development time and is easier to modify during the testing and debugging stages of designing a new chip. CPUs, memory, RTC chips - these devices are capable of reporting dates, but without combining them you still have no Y2K sensitivity.

The next level is combining parts to make a computer. You now require machine instructions in ROM, PROM or EEPROM to bootstrap the machine, and RTC chips if you want to keep track of dates inside the computer; video chips and external devices must be connected - and the simple computer is finally born!

AND you can finally discuss realistic occurrences of Y2K problems at this level - always remembering that hardware must boot into a known or default state. You may have a problem in the program in ROM or PROM as regards dates. This is not actually a problem for this device - just as the 2-digit BCD date field from a standard RTC chip is not a problem for that chip. It may, however, cause problems at a higher level of abstraction/sophistication. Some Y2K problems do exist here, and these may require replacing or updating a program in ROM, but you are extremely unlikely to find problems that will freeze up or lock a device at this level.

On the next level we now have the Operating System or OS. This level provides hooks into the hardware and basic services such as disk operations. Few Y2K problems are found in the pure OS; most such are actually limited to the OS SHELL.

The shell is the program you actually interact with after your computer boots - the desktop in Windows, the command line in MS-DOS or Linux. And shells may indeed have Y2K problems and require patches - check with your OS vendor.

On the next level we can finally talk about programs. This is the level of the assembly language programmer (he also makes his appearance in the bootstrap program mentioned two levels down, as well as much of the OS - told you these things were pretty arbitrary). Peter Norton makes his living on this level. Assembly programs generally are written to perform low-level tasks that interface directly with the hardware. It is certainly possible to do date arithmetic on this level, just not common. Y2K problems on this level will require fixing or replacing programs, many of which will be burnt into chips.

Next we reach the compiler level. At the compiler level, we have reached a pretty high level of abstraction from the actual hardware. Here is the common place for date arithmetic, forecasting and such, and this is the place where most Y2K problems actually exist. Y2K problems on this level require remediation or replacement. This is the realm of the COBOL, BASIC and FORTRAN programmer. And this is the level where odd or unexpected results can easily lock up a system.

Some will put another level or two above these - Fourth Generation Languages and Artificial Intelligence programs capable of basic reasoning or actions on their own hook. But Y2K problems at these levels require the same sort of fixes as the compiler level - so I'll skip them.

And now you know why CE and CS people go ballistic when you tell us about a Y2K 'chip' problem. Tell us the chip output causes a problem in a higher level program and we might buy it - but a chip that has a real problem in gate logic that relates directly to Y2K? That has to be as rare as flying ostriches.

And speaking in such general terms means the above applies to almost any system you could possibly encounter - whether PLC, embedded controller, Real Time Operating System or whatever.

Paul Davis



-- Paul Davis (davisp1953@yahoo.com), April 15, 1999

Answers

You're talking about the design of commercial systems (PCs, mainframes), or alternatively indulging in pedantry. In fact, there are a lot of chips out there that are Y2K buggy in the embedded systems realm, because there are a series of downward steps there which you don't describe.

For example, a programmer compiles a program using a PC, but isn't generating binary code for a PC. He routes the binary code to an EEPROM programmer, takes that chip to a development board, and proceeds to debug an embedded board. Every time it doesn't work right, he alters the code, erases and reprograms the EEPROM and tries again.

Once he is satisfied that the code is bug-free, it may go back to an electronics engineer with a brief to turn the prototype embedded board into a system capable of mass-production at the lowest possible cost. If the production run will be large, he'll send the binary code off (as a near-meaningless string of binary 1s and 0s) to a chipmaker to be cut into a ROM, or into the ROM component of a single chip which also contains a microprocessor, memory and maybe IO ports compatible with those on the prototype board. The chips which come back are soldered into production systems and can't be reprogrammed, only replaced (if that). What was "soft" is now "hard".

And then a process of attrition starts. The original design engineer leaves to work on something else. His source code gets lost, and nobody cares: they've sold a million things with this binary code embedded in them and no-one is complaining. Or the design ceases to be manufactured, and the company making them goes out of business. Etc. Etc.

Then, years down the line, it is discovered that the binary code blown into these chips will fail when the realtime clock or other time source that it relies on reaches 1/1/2000. Oops! And there's no list of end-users. Quite possibly not even a list of every customer who bought some of these preprogrammed chips or boards.

The embedded Y2K problem is quite definitely a chip problem! No amount of wishful thinking about what was once malleable code in a developer's system can alter the fact that the only way to fix these things is surgery to hardware or total replacement of the system.

I don't want anyone to jump to doom-laden conclusions based on this, any more than I feel that the originally posted item gave any cause for optimism. But the assertion that it's purely a software problem is just plain wrong in every sense except an excessively pedantic one.

-- Nigel Arnot (nra@maxwell.ph.kcl.ac.uk), April 15, 1999.


Wrong, Nigel - I pointed out very carefully that Y2K problems could exist in ROM, BUT that those problems generally were not a problem for that device, usually causing trouble for higher-level programs rather than causing device lockup. And you make my main point for me anyhow - programs are the source of Y2K problems, not logic hardware. Whether in ROM or PROM or RAM - a program is a program is a series of code instructions executed by the machine that are not part of the logic hardware.

-- Paul Davis (davisp1953@yahoo.com), April 15, 1999.

Bottom line.

There are gazillions of these chips worldwide.

Last word - remember this - worldwide.

So-called experts disagree vehemently on the extent of the problem, the seriousness, the implications and the method of fixes.

Remember - worldwide - the rest of the world is going to be an almighty mess, chip wise and code wise.

I'm gonna start using my sig line again because this whole scenario does not give me a good feeling.

Andy

Two digits. One mechanism. The smallest mistake.

"The conveniences and comforts of humanity in general will be linked up by one mechanism, which will produce comforts and conveniences beyond human imagination. But the smallest mistake will bring the whole mechanism to a certain collapse. In this way the end of the world will be brought about."

Pir-o-Murshid Inayat Khan, 1922 (Sufi Prophet)

-- Andy (2000EOD@prodigy.net), April 15, 1999.


Paul,

I don't understand your obsession with the "chip level" terminology. What counts in the end is the system function. Nigel clearly outlined the problem of code made into firmware and then scattered through the technological infrastructure. Though it's inaccurate, most people refer to this as the "chip" problem. Granted, only a small percentage of these devices will have a problem. But the sum total of defective devices could be huge. Let's say that only a total of 100 million such devices reside in "critical" devices in the USA (out of several billion total). A 1% failure rate yields 1 million failures. Out of these 1 million, let's say that 1% are in truly life-threatening places (refineries, airplanes, medical equipment etc). That yields 10,000 serious emergencies here. Or do you take the position that there is NO Y2K-defective firmware in existence?

-- RD. ->H (drherr@erols.com), April 15, 1999.


Paul,

Well done and well said. Chips don't cause Y2K problems; programs do.

There's plenty of good news about embedded systems. The Gartner Group just testified to Congress that only about .001% of all embedded systems are at risk of Y2K bugs, for example. That is a statistically insignificant number. More systems will fail on any given day from the usual causes -- everything from lightning damage to operator stupidity to simple wear and tear.

And yet, some people are slow to get the news -- as witness a few articles I read today elsewhere.

But this makes one wonder where the original 5% ("1 in 20") figure came from -- and how it could have been so wrong.

First, there's a lack of understanding on the part of programmers about how these chips initialize. As you correctly point out, they default to some innocuous state -- usually all zeroes. The most common case is that the _program_ must enable and setup the RTC before it does anything useful.

Here's the key: if the embedded program ignores the RTC, the RTC has no effect - just as the clock or calendar on your kitchen wall has no effect unless you read it and act on the information. An RTC is nothing but a timekeeper.

(Someone will doubtless pull up an exceptional example, but that's all it'll be -- an exception. I'm speaking of the GENERAL RULE here.)

Second, there's no meaningful definition of "compliance." From my point of view, as long as you can demonstrate that my system will continue working after 2000/01/01, I'm satisfied. How we EVER let ourselves fall into this "percentage" and "letter grade" trap is a mystery to me, because it's virtually meaningless.

The EBS/EAS machines in my radio station are a perfect example: they need the date SOLELY for printing out records of events. In January, they'll start printing "mm/dd/00" on the forms. Big deal.

And yet, had someone picked up on this, you can imagine the articles: "Over 50% of all EBS/EAS machines are non-compliant! This critical public-safety and information system is being ignored by the FCC ..."

The statistics that I've seen at Doom and Gloom sites are usually based on a very strict definition of "compliance" -- that the machine should have no Y2K problems whatsoever. The people who actually use these machines as process controllers in industry are somewhat more sanguine about the thing. :)

Again: well done and well said.

-- Stephen
http://www.wwjd.net/smpoole

-- Stephen M. Poole, CET (smpoole7@bellsouth.net), April 15, 1999.



Let's get real for a minute, folks. The average person couldn't care less about hardware vs. software. All they want to know is: will the damn device work, or not? Yes or no. They don't care about the details. If my brother-in-law's thermostat fails next January 1, and it's 40 degrees below zero, do you think he's going to spend any time debating whether it's a hardware or a software problem? All he knows is that he has to do something, fast.

-- jocelyne slough (jonslough@tln.net), April 15, 1999.

Jocelyn (-1 sp) - ...that's IF he can do something real fast. Else, he has to have an alternative to what failed.

The real problem - we can't specifically predict what will fail, where it will fail, what the effect of its failure will be, and how long it will take to recover.

If ANY of these were known, we would know exactly what to prepare for.

-- Robert A Cook, PE (Kennesaw, GA) (Cook.R@csaatl.com), April 15, 1999.


jocelyne:

Remove the thermostat from the wall. Hook the wires together. When it gets too hot, unhook them until it's too cold. Repeat. Not convenient, but beats freezing.

-- Flint (flintc@mindspring.com), April 15, 1999.


Paul: one of the interesting things I am finding in my Y2K remediation is that 90% of the problems we encounter are not Y2K proper, but complications from incompatibilities in new firmware, OS and application software. In a way, the problem is being made worse by mandates that the systems be Y2K-ized, when in many cases a simple contingency plan would have sufficed.

-- a (a@a.a), April 15, 1999.


Jocelyne,

You've nailed the problem. No one can _guarantee_ with 100% certainty that it will keep working, but that's primarily because those "smart" thermostats are notorious for failing NOW. What would he do in THAT case?

But having said that, if there's no way to enter a date on that thermostat, it won't cause a problem.

(THAT's really how this debate started; Beach apparently thinks it could, and he's wrong. He's frightening a lot of non-technical people like you.)

If there is a way to enter a date, tell him to simply set the date/time ahead and watch what happens. The worst case will be that the heat won't go off; in that event, simply remove the battery and/or unhook the thermostat for a moment, then hook it back up (by the way, you should see what Paul said -- it'll default to some innocuous date).

THEN you go scream at your Heat/Air vendor.

-- Stephen (you have my permission - scream LOUD)
http://www.wwjd.net/smpoole

-- Stephen M. Poole, CET (smpoole7@bellsouth.net), April 16, 1999.




Mr. Cook,

The real problem - we can't specifically predict what will fail, where it will fail, what the effect of its failure will be, and how long it will take to recover.

In spite of my respect for the "PE" after your name (I know what it takes to get one of those rascals), I respectfully disagree with your statement.

Design engineers are actually very good at that sort of thing. I've worked with engineers from Sony and JVC, and they can predict -- with astonishing accuracy -- the number of units which will fail in a given month. (How do you think they calculate their cost for the warranty?)

When you're talking about complete large systems, people like me are paid to implement them so that, if (correction: when!) a failure occurs, it will be quickly contained and limited to the faulting area.

The embedded systems thing has been blown all out of proportion. Doom and Gloom articles have taken a few exceptional (some catastrophic, I'll admit it) failures and tried to present them as the norm, and that's ludicrous.

(Beach goes one step further into plain bad science; but Paul's dealing with him handily enough.)

Does this mean there have been no embedded problems? No, of course there were. But these gadgets have _never_ had the ability to cause the systemic failures that have been attributed to them, because people like me don't trust expensive equipment to them -- because they fail all the time NOW!

In fact, that's a great question when someone points out an embedded system that could fail due to a Y2K bug. Ask them: "OK, what would they do NOW if it failed? Or are you assuming that these things have never failed before?"

I guarantee that, in the vast majority of cases, the answer won't be, "uhhh ... the whole thing will blow up and we'll die." :)

-- Stephen
http://www.wwjd.net/smpoole

-- Stephen M. Poole, CET (smpoole7@bellsouth.net), April 16, 1999.


a - yes, there is a very real chance of remediation efforts causing more trouble and expense than simply identifying the problem areas and figuring out reasonable work-arounds.

We hear constant yelping about businesses that might fail due to Y2K - but what about businesses that fail due to badly planned remediation efforts, or over-remediation that puts them at a disadvantage for future growth? Those are real problems too.

I personally know of a form-filling program that was just replaced for Y2K compatibility last week. The old forms printed out 1/1/0 instead of 1/1/00. Anyone could still read and sign the old forms, and they made sense. I am sure it cost a bundle for the new program - thousands of users. Do YOU think that money was well spent?

-- Paul Davis (davisp1953@yahoo.com), April 16, 1999.

