What's wrong with this picture? - another look at Peach Bottom nuclear Y2K prob

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

I know that we already have a couple of Peach Bottom Incident threads going, but there is a "side issue", perhaps only remotely dealing with Y2K per se, that I think is really important to consider. It was reported in pretty much all the accounts that I have seen (Washington Post, Rick Cowles' EUY2K Newsroom, Dick Mills' article; see the other threads for the links), and so far no one has apparently given much thought to it. Mills' account is as good as any, so let me quote:
The tester entered a post 2000 date. Nothing at all appeared to
happen. He assumed he entered the date wrong, so he did it a second
time. This time the process computer functions failed, and as a
consequence, the SPDS (Safety Parameter Display System) went
blank. It took several hours, and reportedly several attempts, to
restore things to normal.

The error was that when the tester entered the first date into the
backup computer, it halted. Because the primary computer took over,
it appeared nothing happened. The second command did not repeat
to the backup computer it went to the primary computer, which also
halted.
(emphasis mine)

Doesn't this strike anyone else as pretty strange? I mean, if you have a secondary system that is supposed to take over from the primary system in the event that the primary system fails, shouldn't there be some sort of indication that the primary system has failed and that you are relying on your backup system? Especially when the secondary system kicks in so seamlessly that it is otherwise apparently impossible to tell the difference? Especially when the systems are doing important stuff like monitoring nuclear reactors?? (You know, something like "WARNING: PRIMARY SYSTEM FAILURE -- SECONDARY SYSTEM NOW IN CONTROL".)
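To make the point concrete, here is a minimal sketch in Python of a redundant pair that announces every failover instead of switching over silently. Everything in it is hypothetical and for illustration only: the `Computer` and `RedundantPair` classes, the rule that any year past 1999 halts a machine, and the alarm wording are assumptions, not a description of the actual Peach Bottom software. Which machine happens to halt first does not matter for the point being made.

```python
from dataclasses import dataclass, field


@dataclass
class Computer:
    """Hypothetical stand-in for one process computer in a redundant pair."""
    name: str
    halted: bool = False

    def set_date(self, year: int) -> None:
        # Assumption for illustration: any year past 1999 halts the machine,
        # loosely mirroring the behavior described in the quoted account.
        if year > 1999:
            self.halted = True


@dataclass
class RedundantPair:
    """Routes commands to the active machine and records an alarm on failover."""
    active: Computer
    standby: Computer
    alarms: list = field(default_factory=list)

    def set_date(self, year: int) -> None:
        self.active.set_date(year)
        if self.active.halted:
            self._fail_over()

    def _fail_over(self) -> None:
        failed = self.active
        if self.standby.halted:
            # Nothing left to switch to: say so loudly.
            self.alarms.append(f"ALARM: {failed.name} FAILED -- NO STANDBY AVAILABLE")
            return
        # Swap roles and tell the operator which machine is now in control.
        self.active, self.standby = self.standby, failed
        self.alarms.append(
            f"WARNING: {failed.name} FAILURE -- {self.active.name} NOW IN CONTROL"
        )


pair = RedundantPair(Computer("PRIMARY"), Computer("SECONDARY"))
pair.set_date(2000)   # first attempt: one machine halts, the other takes over
for line in pair.alarms:
    print(line)
```

With a warning like this after the first attempt, a tester would have no reason to assume the date had simply been typed wrong, and so no reason to enter it a second time.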

I realize that this gets into the entire How To Design Reliable Software arena, where Ed Yourdon is as much a heavyweight as any other expert, and is obviously independent of Y2K per se. But once again, the Y2K aspect is that software that was obviously not written the way it should have been, and thus is in a somewhat "fragile" state anyhow, will now have Y2K to kick it around, with what may very well be very grave consequences....

-- Jack (jsprat@eld.net), March 08, 1999

Answers

Wellllll, I'm no "computer expert"; I just test the damn things - like the CAD release today that should have fixed four problems - and failed to even recognize the d**m data file. (The previous version loaded and successfully edited and saved the same data files last night at 11:00, so why won't this version? Who knows... it's back in the developer's hands.)
Are they in "fragile" states? Like a rack of inverted crystal glasses hanging over the bar - they are perfectly safe as long as the user doesn't touch them or throw something or shake the bar in an earthquake. Once in your hand, all bets are off - the user can get cut and spill things.
So - what is the problem? It is in the controller program, and in the test user's response of trying again - a series of goofs that shut down the system, as usual.

-- Robert A. Cook, P.E. (Kennesaw, GA) (cook.r@csaatl.com), March 08, 1999.

Well, when I worked on a UNIVAC 90/60 (1982), the only way to tell which computer was on-line was to watch the "busy lights" on the front of the computer itself.
Basically, we had two identical mainframes sitting right next to each other with a bunch of white flashing lights. By looking at the lights, you could see which one was busy, and which was idle.
Perhaps Peach has an ancient system, with the hardware in another room?
I will say that it does seem to me to be insane in 1999 not to have better indicators, though. Maybe the operator was an idiot, too?
Jolly

-- Jollyprez (Jolly@prez.com), March 08, 1999.

I think this is a great question, Jack. Why not ask it over at Rick Cowles' EUY2K forum? They've got some pretty sharp people hanging out there! <:)=

-- Sysman (y2kboard@yahoo.com), March 08, 1999.

Well, since I wear the "software engineer" hat among others, I can say this: Reliable software is almost a contradiction in terms. All a competent designer can do is ensure it (the "it" being the package he/she is developing) is as crash-tolerant and bug-resistant as he/she can. When it enters the real world of interdependent systems and variations in system configuration (not to mention other software running on the system) all you can do is cross your fingers and hope your code is smartly designed enough to recognize problems and deal with them as well as possible.
I do not envy the remediation programmers' tasks, since they not only have to work with outdated languages, but often have to do it on or alongside live systems working with live data in real-use situations. And with programs written by who-knows-how-many other coders with who-knows-what-kind of error handling (or often the lack thereof).
I fortunately write apps for 32-bit Windows platforms. I say "fortunately" because the OS developer (MicroSloth, of course) wants to own the world and figures (quite correctly as it turns out) that if they encourage developers and support them as best they can they'll have more products tailored to their OS products.
I'd not even WANT to try my hand on a mainframe, etc. even though I could easily learn how. (I can code in assembly for microcontrollers and some processors.)
There's a lot going on. I for one am far from convinced that any of the major remediation efforts from any decent-sized company will be pulled off with even a modicum of success... Rough times are coming, folks!
OddOne, who can also make a mean webpage and don't get him started about the stungun he built!

-- OddOne (mocklamer@geocities.com), March 08, 1999.

Jack; Good catch. Robert and Jolly; Appreciate your input. OddOne; Your concluding comment is chilling. It stimulates more preparation for self-reliance. Thanks folks.

-- Watchful (seethesea@msn.com), March 08, 1999.

If architects built buildings the way software engineers write software....the first woodpecker to come along would destroy civilization.
LM

-- LM (latemarch@usa.net), March 08, 1999.
