Y2k Metrics and Error Rates


At one time or another, most of us "Pollys" bring up the fact that current systems have bugs and errors, which are fixed in the course of normal business, and that, while Y2k may strain the current systems, we don't see any evidence that the failures will cause a systems collapse.

This opinion is based primarily on experience, and it has been challenged at various times for lack of any real facts to back it up.

But recently, I ran across this set of metrics, which can provide some measurements. These metrics are provided by Howard Rubin, a member of Ed Yourdon's own Cutter Consortium.

The metrics are published at More Millenium Metrics, and are a compilation of various case studies and surveys.

First, a baseline. From the reference, the current defect density in code in the US ranges from 2 to 4 per 1000 lines of code. That is, current systems have between 2 and 4 bugs for every 1000 lines of code. These bugs are what we refer to when talking of errors fixed today.

But what of Y2k? Well, the reference indicates that 3% of the current code base contains date references. Meaning 30 of every 1000 lines of code contain date references.

Of that 30, 25%, or 7.5 of every 1000 lines of code, require date expansion, and 15%, or 4.5 of every 1000 lines of code, involve date calculations. It isn't clear whether the 15% involving date calculations is a subset of the 25%; personally, my guess is it is, but to be conservative, let's assume the two groups are mutually exclusive.

What this means is that the potential for system failures and bugs in completely unremediated code due to Y2k is 12 in every 1000 lines of code, or roughly 3 to 6 times the normal error rate.

But obviously, companies haven't spent all of this money and time doing nothing. Assume that only 75% of the potential errors are caught. This drops the error rate for Y2k down to 3 in every 1000 lines of code, or basically the same as the current error rate of IT systems.

Personally, I see no evidence that effectively doubling the current error rate will collapse the system. In fact, the reference also states that while the benchmark error rate in the US is 2.3 per 1000 lines of code, the benchmark rate in Canada is 5.13 defects per 1000 lines of code. But we can take it one step further. The argument has always been in regard to massive numbers of simultaneous errors. And virtually everyone concedes that the largest spike in errors will occur around the actual rollover. Gartner Group estimates that only 8% of Y2k errors will occur during rollover. Whether you agree with their figures or not, it is pretty evident that only some fraction of errors will occur then. Assume even 20%. Going back to our errors in lines of code, that leads to an increase at rollover of 0.6 errors per 1000 lines of code.
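For anyone who wants to play with these assumptions, here's a quick sketch of the arithmetic in Python. The rates are the ones quoted above from Rubin and Gartner; the 75% catch rate and the 20% rollover share are the same assumptions I used, so treat the script as illustrative, not as a measurement.

# Back-of-envelope Y2k error-rate arithmetic, using the figures above.
# All rates are per 1000 lines of code (KLOC).

baseline_low, baseline_high = 2.0, 4.0    # current defects per KLOC (Rubin)

date_refs = 0.03 * 1000                   # 3% of code has date references -> 30 per KLOC
needs_expansion = 0.25 * date_refs        # 25% need date expansion -> 7.5 per KLOC
date_calcs = 0.15 * date_refs             # 15% involve date calculations -> 4.5 per KLOC

# Conservative assumption: the two groups do not overlap.
potential_y2k = needs_expansion + date_calcs        # 12 per KLOC, unremediated

catch_rate = 0.75                         # assume remediation catches 75% of potential errors
residual_y2k = potential_y2k * (1 - catch_rate)     # 3 per KLOC

rollover_share = 0.20                     # share hitting at rollover (Gartner says 8%)
at_rollover = residual_y2k * rollover_share         # 0.6 per KLOC

print("Potential (unremediated): %.1f per KLOC (%.0fx to %.0fx baseline)"
      % (potential_y2k, potential_y2k / baseline_high, potential_y2k / baseline_low))
print("Residual after remediation: %.1f per KLOC" % residual_y2k)
print("Expected at rollover: %.1f per KLOC" % at_rollover)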

Will it be a "bump"? Probably not. But these numbers tend to demonstrate why I and others feel that the problems will be fixed, just as they always have been.

-- Hoffmeister (hoff_meister@my-dejanews.com), May 17, 1999

Answers

Many good points have been made on this thread, but one point seems to have been overlooked: the overall volume of code that has to be remediated. Citibank, if I remember the figures correctly, has 400 million lines of code; indeed, many of the Fortune 500 companies have portfolios exceeding 100 million lines of code. If we round off your statistics to 1 bug per 1,000 lines of code, that means 100,000 bugs in an organization that has a portfolio of 100 million LOC. If we assume that 80% of these bugs are "trivial", 19% are "moderate", and only 1% are "show-stoppers" (to use a Microsoft phrase), that still leaves 1,000 show-stopper bugs.

Maybe this is unduly pessimistic, by an order of magnitude; maybe even by two orders of magnitude. What if it's only 10 show-stoppers? And instead of 19,000 "moderate" bugs, what if it's only a mere 190 moderate bugs that the organization has to cope with? Q: How many show-stoppers do you need before the show really stops? A: nobody really knows. Among other things, it will depend on how comprehensive and effective the contingency plans are.
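Here's a rough sketch of that sensitivity in Python; the portfolio size and defect rate come from the paragraphs above, and the severity splits are the same assumed percentages, scaled down by one and two orders of magnitude. Purely illustrative.

# Show-stopper counts for a large portfolio under the assumed severity splits,
# then the same splits reduced by one and two orders of magnitude.

portfolio_loc = 100 * 10**6          # 100 million lines of code
defects_per_kloc = 1.0               # rounded-off rate from the thread
total_bugs = portfolio_loc / 1000 * defects_per_kloc    # 100,000 bugs

for showstopper_share in (0.01, 0.001, 0.0001):         # 1%, then 0.1%, then 0.01%
    moderate_share = 0.19 * (showstopper_share / 0.01)  # scale the 19% "moderate" share too
    print("show-stoppers: %6.0f   moderate: %7.0f"
          % (total_bugs * showstopper_share, total_bugs * moderate_share))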

Oh, one other thing ... what about show-stoppers in the organization's mission-critical vendors and suppliers?

To paraphrase your final remark: these numbers tend to demonstrate why I, and at least a few others, are concerned that these problems will cause some major-league disruptions.

-- Ed Yourdon (ed@yourdon.com), May 17, 1999.


An analogy occurred to me, in response to Hoffmeister's argument: if 90% of auto accidents are mere fender benders, does that mean I shouldn't bother wearing seat belts? Does it really justify someone saying "there really isn't a problem with auto accidents; if something goes wrong, it's just a localized problem"?

As someone pointed out on an earlier thread, evaluating the risks (by estimating defect density) is only half of the job. The other half is evaluating the stakes -- how much do we lose if we're wrong?

-- Ed Yourdon (ed@yourdon.com), May 17, 1999.


So, I guess the Italians have it right. Ignore Y2K except to throw a really big party. Silly us. We don't need no stinkin' new air traffic control system!

-- Doug (douglasjohnson@prodigy.net), May 17, 1999.

I think you are correct as far as you have taken your analysis, BUT what if some of the errors cause physical damage to plant or material?

A simple example: To add energy to a child's swing you must time your push so that you add energy exactly at the top of the arc of the swing when all forces are neutral. If you try to push on the swing at the bottom of the stroke while the swing is moving toward you, you could break your wrist.

A power plant example: To energize the power grid, a utility requires precise timing achieved using SCADA, satellites and telecommunications. To add energy to the grid, power must be in exact phase with the energy already in the grid. A mismatch in timing results in power being out of phase, which quickly overloads and trips circuits. Trying to energize the grid with something less than complete control could very likely cause power outages that last longer than the 8-hour to 72-hour disruption projected for fixing a SOFTWARE ONLY power outage.

The same is true for other "Systems of Systems".

Peace,

-- Bill P (porterwn@one.net), May 17, 1999.


As for the power grid and synchronization, see:

Another Myth, We Need Computers to Synchronize

-- Hoffmeister (hoff_meister@my-dejanews.com), May 17, 1999.



Bill,

To energize the power grid, a utility requires precise timing achieved using SCADA, satellites and telecommunications ... power must be in exact phase with the energy already in the grid.

The entire point of the April drill was for utilities to test their ability to do this WITHOUT their SCADA systems. Lane Core and other Doomers have called that test a mere "PR stunt;" in fact, it was a significant demonstration of their ability to control the grid "by the seat of their pants," as it were, and was a success.

See the discussion of power in general at my Web site; Mark Kinsler, for example, and a great letter from a utilities guy on my email page.

-- Stephen M. Poole, CET (smpoole7@bellsouth.net), May 17, 1999.


How to put it? How about: some errors are more equal than others? :-)

A system on which I work has numerous bugs (code defects) that have been around for years, and may well continue to exist for years, because their adverse effects are so trivial that they have not been viewed as worth the effort to fix. On the other hand, date calculation errors in any one of hundreds of places in the system would make it worse than useless until fixed.

Not all bugs have comparable consequences. If, for example, we had the option of magically replacing our old bugs with new date calculation bugs at a ratio of 25 to 1, it would be a losing proposition, probably even at a ratio of 100 to 1.

Jerry

-- Jerry B (skeptic76@erols.com), May 17, 1999.


Cinergy, power provider for SW Ohio, advised that Cinergy's Contingency Plans will have a staff of 500 distributed at all points of the system (generation, transmission and distribution) starting on 12/30/1999. He said every sub-station, compressor station (Cinergy is also the local natural gas provider), etc. will be staffed for 24-hour coverage. All generating stations will be spinning at full load - 11,000 megawatts. He says the local December 31, 1999 demand was 3,000 megawatts for the greater Cincinnati area. He advises that Cinergy will try to energize the grid to help other utilities that suffer an outage BUT this would be difficult if there is a significant disruption in telecommunications, SCADA, transportation deliveries, etc.

Cinergy felt the big rub could come in July 2000 when peak demand occurs. In January 2000, he said, some rolling brownouts would be possible IF city A lost power and cities B, C and D had to make up the shortfall.

-- Bill P (porterwn@one.net), May 17, 1999.


Correction to my post above:

The December 31, 1998 demand was 3,000 megawatts.

-- Bill P (porterwn@one.net), May 17, 1999.


Hoff -- gotta run, want desperately to dig into your numbers and assumptions, very interesting post (Poole I'm still forced to ignore because he doesn't know how to say ANYTHING interesting).

Like everything any of us post, there are some tricky assumptions, but the core of it demands serious attention: is it likely that reasonably extrapolated Y2K "enhanced" error rates (and you've given some real ones, instead of vague-ities) will collapse systems? Hope when I come back that our best technical forum minds have advanced the discussion.

-- BigDog (BigDog@duffer.com), May 17, 1999.



Jerry B has nailed it: not all bugs are created equal. Which is one of the key reasons why Y2K is statistically unforecastable.

Bill P is right too in that it may be possible that more problems could occur in the summer of 2000 than at the rollover. Dick Mills has written that more than once, as I recall.

-- Drew Parkhill/CBN News (y2k@cbn.org), May 17, 1999.


Hoff:

These metrics always give me a creepy feeling. They are often like saying if you have one foot in the oven and one foot in the icebox, then on the average you're comfortable. Show me an organization that relies on, say, a million LOC, and I'll make two bets. I bet that

1) I could introduce 10,000 errors into that code and nobody would ever care about them.

2) I could introduce one, single error that would put them right out of business before they could find it.

So what's crucial here is testing. No, testing is not going to find every error, as shown by the error rates being reported by IV&V evaluation of already-tested code. But it doesn't take a whole lot of testing to weed the howlers out of the main processing logic sequences. And it doesn't take a whole lot more to find more subtle data-corruption bugs that don't abend, especially with so many eagle eyes watching those data so carefully.

I suspect the reason Gartner is saying that 70% of residual bugs will be found and fixed in three days is because most y2k errors are howlers, and cause crashes or produce clearly hilarious results. They almost stand up and shout *here I am* at you. So collateral damage (like those damn smelting plants, or zorched databases) might be worse than the code bugs themselves. Those testing live systems are asking for trouble, and maybe some of the glitch reports we're seeing are reflections of those troubles (although most of the time we can't know this for sure).

-- Flint (flintc@mindspring.com), May 17, 1999.


Hoff. I think you/we are forgetting 4 variables in the Y2K equation, the least (?) of which may be deliberately introduced bugs by unhappy programmers, WHILE TESTING! Also, what of the one in every 500-1000 lines of code that are unintentional? Then, there could be a few infiltrators under foreign pay WORKING IN the "iron triangle" who introduce their bugs on or before Dec. 31st, to appear on Jan. 1st, 2nd or 4th; these are meant to ADD to the expected ones, but are killers! Finally, there are the uncontrollable, unknown factors of cosmic storms from the sun. I was in the U.S. Navy in 1946, when sun spots caused us to lose total communication with Washington, D.C. for nine days while we were at Thule, Greenland with CBs, building an airstrip there. We could get NOTHING coming in but some radio station in the south playing records. Mother Nature WILL have her way when SHE WANTS TO! Eagle

-- Hal Walker (e999eagle@freewwweb.com), May 17, 1999.

Hoff's "statistical analysis" ignores two things: one has already been pointed out, which is that not all Y2K problems are created equally, and trying to account for power outages, food shortages, contaminated water, etc., kind of makes it tough to apply the metrics.

The other thing that is ignored, which pollys seem to always do, is TIME. Computer bugs have heretofore occurred with some statistical frequency distribution or other, but a huge number of Y2K bugs are expected to occur ALL AT ONCE, thereby causing MULTIPLE, SIMULTANEOUS problems in PARALLEL. (Note, I do not actually expect Hoff or any other polly to understand what I have just written. They never have, they never will. This is for the non-pollys who are really trying to understand what makes Y2K so different.)

Actually, this itself is just a special case of the argument that since "complex systems adapt" -- to include how people manage to endure all kinds of hardship such as wars, famine, depression, etc. -- then we all make it through Y2K just fine, thank you, because we are so doggone adaptable. This of course ignores the important point that complex systems need TIME to adapt. (Infomagic did a great treatment of this, somewhere in Cory Hamasaki's D.C. Weather Reports of old. Sorry I can't ref the specific issue....)

-- King of Spain (madrid@aol.com), May 17, 1999.

If it will be so trivial - then go test it.

Pick a city, a county, and a state - hell, get a whole country involved - New Zealand, Iceland or Scotland might volunteer (and most cities in the US apparently have not finished remediation anyway) - push their dates ahead, then find and fix what fails.

Evidently, the problems will be trivial, and so nobody will mind whatever failures occur. The failures that are found will be instructive, to say the least.

You're trying to out-logic failures and errors - you can't. The programs out there are already debugged and are deeply entrenched - they are not subject to this kind of logic, since the failures subject to year 2000 are not the kind (as noted above) that are equal to "leftovers" from existing earlier debugging efforts.

-- Robert A. Cook, PE (Kennesaw, GA) (cook.r@csaatl.com), May 17, 1999.



First, a general response.

No, not all bugs are created equal. But Y2k errors tend to run the gamut from the trivial (reports not displaying dates correctly) to the catastrophic, as do normal bugs. In fact, especially in date calculations, Y2k errors do not tend to be subtle errors, but extreme errors that are readily observed.

To Ed Yourdon

The point was not the volume of errors, but relative error rates. Using your numbers, corporations with 100 million lines of code already deal with, on average, 200,000 to 400,000 bugs, disregarding Y2k. Applying your argument, it would appear we have no chance of surviving today. Yet we do, quite well, actually. The same argument applies down through suppliers and vendors. How do you explain the currently functioning systems?

To King

It's not my analysis, but Howard Rubin's.

Yes, the concern has always been with basic infrastructure, which is why I spend so much time with utility reports. But even the most pessimistic observers are starting to move away from widespread power failures, etc.

To Robert

This post addressed IT systems, not embedded systems, etc. I'm not trying to "outlogic" failures, just trying to quantify them. And no, nothing thus far from an IT perspective has made me think Y2k errors are somehow "extraordinary". In fact, the majority of Y2k errors are downright trivial. Finding them is the problem.

-- Hoffmeister (hoff_meister@my-dejanews.com), May 17, 1999.


Hoff -

ref your comments "To Ed Yourdon

The point was not the volume of errors, but relative error rates. Using your numbers, corporations with 100 million lines of code already deal with, on average, 200,000 to 400,000 bugs, disregarding Y2k. Applying your argument, it would appear we have no chance of surviving today. Yet we do, quite well, actually. The same argument applies down through suppliers and vendors. How do you explain the currently functioning systems?"

My reply would be that they have eliminated these errors already - in the systems running their jobs in the Fortune 500, these have already been eliminated from all *routine* tasks over the past 25-30 years.

Y2K throws these stable legacy systems - and all they affect - back into simultaneous turmoil - which can be very hard to recover from. The errors "buried" before - the ones NOT causing problems now (dead code, no exit loops, no logic, bad logic, etc.) that have never been triggered before - may come back up suddenly as end conditions change.

Routine operations done every month for 20 years "won't run" all of a sudden in Jan, Feb, Mar next year. If they really are present, the 200,000 hidden errors offer 200,000 chances for sudden failure.

-- Robert A. Cook, PE (Kennesaw, GA) (cook.r@csaatl.com), May 17, 1999.


Rubin's analysis reeks of techno-elitism. That's what I call it when the uppity-ups in software management start waving their hands and belittling the severity of a problem or the level of effort required for a new task. These folks see complexity of systems increasing in a linear fashion: 1,000,000 LOC last year; 1,100,000 this year; 1,200,000 next year. The grunts in the trenches, those that have been with the system for a while, understand the real picture: complexity is increasing exponentially. And not Rubin, nor Hoff, nor any other pollyanna can change the fact that when these systems start malfunctioning in three, four, five times as many ways as normal... Murphy will be the law of the land.

-- a (a@a.a), May 17, 1999.

Hoff: Are you aware of what MONTH, DAY and YEAR we are having this discussion??? You -- and all the other pollys -- act like the year is 1995 or something. There is nobody out there of any size or significance that is saying "We are ready for the year 2000" (with the obvious qualification that everyone they depend on must also be ready).

The problem is really, and always has been, TIME -- the immovable, fixed deadline that makes Y2K different from any other problem ever encountered. Errors that will suddenly occur at about the same time, throughout all industries. This is hardly the "business as usual" bug-fixing that you envision.

-- King of Spain (madrid@aol.com), May 17, 1999.

Sorry, Robert, but there is absolutely no reason to assume that Y2k will somehow suddenly activate latent bugs.

Again, existing errors run the gamut from trivial, to those that are worked around, to those that may be intermediate, and finally those that cause system failure. What will Y2k do to enhance these existing errors?

-- Hoffmeister (hoff_meister@my-dejanews.com), May 17, 1999.


Geez, 'a', sorry.

I found these metrics while researching some other info. Thought that, since Rubin was a member of Yourdon's Cutter Consortium, and also the author of the latest survey being thrown around from the NY Times, we could at least agree the numbers were at minimum unbiased, if not pessimistic.

If you have something better, why don't you post it?

-- Hoffmeister (hoff_meister@mt-dejanews.com), May 17, 1999.


"No, not all bugs are created equal. But Y2k errors tend to run then gamut from the trivial (reports not displaying dates correctly) to the catatstrophic, as do normal bugs. In fact, especially in date calculations, Y2k errors do not tend to be subtle errors, but extreme errors that are readily observed."

Huge misinterpretation, or confusion, of errors and their appearances, i.e. their symptoms. Even when their symptoms are obvious, the underlying errors, the bugs that need to be found and fixed, will often be quite obscure.

In ordinary, day to day, practice, when a strange symptom appears, one looks at what code has recently been changed. In most, but not all, cases, this will lead to the culprit. But in the case of Y2K bugs, this SOP will be of little avail. On the one hand, large numbers of code changes have been made, and on the other, the culprit may be among the huge numbers of LOC that have not been changed.

Jerry

-- Jerry B (skeptic76@erols.com), May 17, 1999.


Jerry, quite right, SOP for errors is to examine recently modified code.

But I'd submit, while not quite equal, certainly analogous is that errors that occur on rollover will be almost exclusively related to dates. Which points the debugger in the right direction, as does recently modified code.

-- Hoffmeister (hoff_meister@my-dejanews.com), May 17, 1999.


I strongly disagree, Sir Hoff - most errors I've found have been when legacy code gets "touched" to make a change or improvement - many times to functions that have operated correctly for years.

Starting over is much, much easier than starting in the middle.

-- Robert A. Cook, PE (Kennesaw, GA) (cook.r@csaatl.com), May 17, 1999.


Robert, you are referencing what I refer to as "residual errors", or errors introduced through modifications.

But, the vast majority of these will not be encountered at rollover, but when the remediated systems are placed back into production. And these can and are being addressed over an extended time frame.

-- Hoffmeister (hoff_meister@my-dejanews.com), May 17, 1999.


"Again, existing errors run the gamut from trivial, to those that are worked around, to those that may be intermediate, and finally those that cause system failure. What will Y2k do to enhance these existing errors?"

Hoffmeister,

Tell us you didn't write that; tell us your evil twin brother wrote that; even tell us the devil made you write that, but don't tell us to take that query seriously.

But I will answer it nevertheless:

Y2K will introduce a large quantity of errors in a short span of time.

Surprise! (Sorry, I just couldn't resist.)

Jerry

-- Jerry B (skeptic76@erols.com), May 17, 1999.


Granted - to be cleared also if legacy testing is adequately done.

So then, why do you believe 200+ thousand errors exist already - in systems running continuously to support ops - the supposed critical errors have been eliminated, right? Else they would not be running.

So you cannot logically extend the maintenance situation now - no critical errors in operation - to the emergency repair situation probable in un-remediated Y2K-affected systems. In emergency repair, many simultaneous problems will hide each other, and be hidden by minor and cosmetic problems. But my experience (in troubleshooting) tells me that the symptom of the critical failure will be hidden in the cosmetic failure - and the cosmetic failure will be more evident than the critical failure "below it".

Either way, it's not routine, not comparable to any routine current computer problem - in unremediated systems.

In repaired systems - if services remain up in the area - the Y2K problems may be no more complex than those occurring after a simultaneous new install of the operating system, the hardware, the program itself, and the data.

Easy.

-- Robert A. Cook, PE (Kennesaw, GA) (cook.r@csaatl.com), May 17, 1999.


Jerry, yes, Y2k will introduce new errors. The original post was an attempt to quantify them.

But again, how does Y2k increase or enhance or activate existing errors?

-- Hoffmeister (hoff_meister@my-dejanews.com), May 17, 1999.


"But I'd submit, while not quite equal, certainly analogous is that errors that occur on rollover will be almost exclusively related to dates. Which points the debugger in the right direction, as does recently modified code."

That direction would be narrow, or relatively specific, only in systems with very few date calculations. Many systems are riddled with date calculations, and in those, the direction would, in 2D terms, be 360 degrees.

Jerry

-- Jerry B (skeptic76@erols.com), May 17, 1999.


Perhaps I haven't understood something here, but I don't see how you can jump to Y2k outcome conclusions based only on 'defect density' in remediated code. What about those who started too late to complete remediation? I would think that the so-called 'defect density' would have to include remaining unremediated code to predict outcome. Numbers anyone?

-- David Binder (dbinder@sympatico.ca), May 17, 1999.

Yes, Jerry, but the actual erroneous result can in all likelihood be traced to relatively few date calculations.

-- Hoffmeister (hoff_meister@my-dejanews.com), May 17, 1999.

It looks like we are getting into an infinite loop here, and it's past time for me to be knocking out Zs.

Good night all!

Jerry

-- Jerry B (skeptic76@erols.com), May 17, 1999.


No Jerry - we've got two DNA-style infinite loops going at the same time, with both conversations twisting around each other in reverse order, answered by the wrong person to the other question being responded to by the other wrong person out of order to the original reply from the original wrong person.

Now, isn't that more clear?

-- Robert A. Cook, PE (Kennesaw, GA) (cook.r@csaatl.com), May 17, 1999.


Echo Jerry. Will check back in AM.

But to David, the original post dealt with unremediated systems. Remember, as well, that many systems have been installed that require no remediation. (Can't go a whole thread without throwing a sop to SAP, can I?)

-- Hoffmeister (hoff_meister@my-dejanews.com), May 17, 1999.


Robert,

Absolutely perspicuous! :-)

Hoffmeister,

Today I'll be busy prepping for a trip tomorrow. I shouldn't even be taking the time to post this note. If I get ahead today, I will get back here tonight, else tomorrow night.

Jerry

-- Jerry B (skeptic76@erols.com), May 18, 1999.


Meetings all day in the real world. Have printed out thread. Also hope to join in tonight or tomorrow before heading off to DC/South Carolina. Appreciate the give-and-take, this is almost as important to me as the NWO :-)

-- BigDog (BigDog@duffer.com), May 18, 1999.

Flint,

Some time ago you posted in another thread on this forum an excellently clear and concise description of some standard debugging practices and why they will not work as usual for Y2K bugs after 12-31-1999. I cannot now find it. If you recognize that description to which I so inadequately refer, and if you remember where to find it, please post it in this thread. I think it may be very helpful.

Thanks in advance.

Jerry

-- Jerry B (skeptic76@erols.com), May 18, 1999.


Robert, you said:

So then, why do you believe 200+ thousand errors exist already - in systems running continuously to support ops - the supposed critical errors have been eliminated, right? Else they would not be running.

Critical errors are continually introduced into systems, usually but not always through system modifications. New system implementations also contain a disproportionately higher rate of critical errors.

It is simply wrong to assume that the current defects contain no critical errors.

-- Hoffmeister (hoff_meister@my-dejanews.com), May 18, 1999.


I want to call out the points that struck me, so it will be clear where I'm coming from with my own comments subsequently.

Act One:

Hoff's original question seemed to be, "assuming a doubling of the current error rate, will that collapse the system?" Because Hoff finds that level in some systems broadly (e.g., Canada) and does not believe system reintroductions are fundamentally simultaneous, he concludes that Y2K will not be a bump but will not be a disaster either. It isn't all fixed yet, but it is fixable, both now and after rollover as need be (and, hey, there is always that wee little SAP stuff you can bring in, ya know).

Act Two:

Jerry introduced the issue of "different bugs, different consequences".

Ed's point is that the sheer volume of remediated code would seem to indicate a large number (not relatively but absolutely) of show-stoppers (up to 1,000 in a large enterprise), not to mention show-stoppers among that enterprise's own critical vendors.

The King of Spain introduced "time" and asserted the general simultaneity of errors.

Cook argued that Y2K-related errors will be appearing throughout legacy systems that are already debugged and are not of the same "kind" as those long-squished critters.

Act Three:

Hoff conceded the principle that bugs are of different impact types. Generally, he sees Y2K bugs themselves as overwhelmingly trivial in nature with the difficulty in finding them the real crux.

He asked, if Y2K will have so many show-stoppers per enterprise, why don't enterprises grind to a halt today from the many show-stoppers one would expect using the same reasoning that Yourdon used (but clearly the systems are functional enough)?

Act Four:

Cook retorted that the showstoppers have been gradually eliminated in the legacy systems over 25 years or more. Not only will remediated code have its own show-stoppers showing up more-or-less at once (compared to 25 years), but will expose problems (other potential show-stoppers) that were dormant until remediation coding unintentionally re-exposed them.

"a" argued that addition of noise to an already overloaded support system is not a linear problem but an exponential complexity burden. By implication, Hoff's original concession of a "twice as normal" noise level would therefore constitute a three, four or five times greater complexity hit to the existing systems.

Act Five:

Hoff insists that there is no "reason" to assume that Y2K will activate latent bugs.

Act Six:

Jerry argues that while Y2K symptoms may be obvious, as Hoff claimed (ie, very weird bad results), this doesn't mean that the underlying bugs are necessarily easy to find and fix.

Cook states that, in his experience, disturbances to legacy systems (with respect to patches or outright enhancements) provoke many more errors than would be predicted in the abstract, so much so that starting over is often preferable. "The symptom of the critical failure will be hidden in the cosmetic failure."

My response:

Granting, as Hoff does, that the Y2K-induced noise level will be 2X greater at least, the crux of the matter seems to be:

- will the found Y2K bugs expose dormant legacy problems and/or fail to reveal critical bugs beneath or around the ones found?

- how many show-stoppers are likely and are they any different in type, number or simultaneity than what we currently slop around with?

- is the added 2X noise level linear and comparable to other known system levels or are we adding near-unbearable complexity to systems that are already loaded-for-bear when it comes to complexity (ie, they are maxed out with respect to brittleness)?

How about, "we can't be sure of the answers to any of those three issues"?

But, lest I weasel, I think the first and third, while quite possible, are more speculative than the second. It seems undeniable that there WILL BE show-stoppers from the huge volume of remedial work, no matter how trivial the errors (and Ed tried to come up with a very low relative percentage of serious errors).

Hoff seemed to respond by saying that we already deal with show-stoppers successfully. Indeed. Here is where the simultaneity and dormant or hard-to-find legacy breakage speculation comes back into play. Even granting that a fair bit of remediated code is being put back into production throughout 1999, there is still a whole world of stuff that remains to be released more or less "at once", certainly when compared to three decades of gradual release of fixes.

Historically, we squeeze show-stoppers to the absolute minimum before we release production code, whether new or primarily maintenance. Y2K doesn't afford us that luxury. Show-stoppers AS DEFINED ON THIS THREAD so far will be FOF, which is always our worst IT nightmare in the best of times, and in my experience, not typical. Unexpected breakage, yes. Endless need for work-arounds. Sure. Show-stoppers? More than one or two show-stoppers and, uh, the show actually does stop.

As Flint said way up above, one show-stopper can stop a system in its tracks, let alone a dozen, a hundred or a thousand.

It would seem that the best counter-argument would be one of arguing that there simply won't BE any show-stoppers or their number will be vanishingly small and to show why this is plausible.

Now for something entirely different, or is it? Without meaning to reintroduce sterile debates about interface corruption in financial systems, let's broaden the issue to data interfaces broadly.

Arguably, Y2K's technical uniqueness touches more closely on the requirement to test - AND the impossibility of substantially testing - interfaces between systems, and within and across entire industry sectors, to ensure data integrity. (I'm thinking primarily here of enterprises communicating with one another, not systems within a single enterprise, though it is doubtful how much testing is being conducted even at that more-or-less micro level).

Yes, it may (without testing, we can't know) turn out that "bad guy data" is turned back at the system door. That's the best case. Even so, rejection of data introduces not only the requirement to fix the interfaces and, now, test them but also to keep the business functional without the needed data itself. Hideous FOF. Worst case is indeed major database corruption: ultimate impact probably varying from trivial to catastrophic.

-- BigDog (BigDog@duffer.com), May 19, 1999.


Preliminary assumptions:

1. Y2K code errors will run the gamut from trivial consequences through very serious consequences, including some database corruption.

2. Y2K code errors will run the gamut from easy to diagnose (i.e. trace back from the symptom(s) to the cause) to very difficult to diagnose.

Here's a summary of how I expect Y2K to affect the mean time to repair of applications software.

1. Unlike other code errors, the activation of Y2K bugs will cluster in the months near 1-1-2000, so the FOF bug pruning process to which other bugs have been subjected for years has barely started.

2. Therefore, the distribution mentioned in assumption 1 is different for Y2K bugs than for other bugs in old code, since the serious other bugs in old code are more likely to have been FOFd long ago. This is not to say that all serious non-Y2K bugs have been fixed; it is saying that the seriousness distribution curve of old non-Y2K bugs has been shortened and flattened over the years.

3. Due to the clustering of the activation of Y2K bugs, multiple symptoms of multiple bugs will confuse and mislead diagnostic efforts.

4. Some standard diagnostic techniques will be of little help after 12-31-1999: (Flint posted a good description of this aspect in another thread, but I can't find it, so I'll give my not so good description.)

4a. Running old versions of software to see if the problem "goes away" will not be an option.

4b. Checking the most recent code changes is problematic because:

4b1. A large amount of code has been changed recently.

4b2. The bug may be in code that has not changed in many years.

5. The above will increase the mean time to repair of most individual bugs, and coupled with the clustering, will greatly increase the mean time to repair of applications containing multiple bugs, of which there are likely to be many.
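To make 4b2 concrete, here is a small, entirely made-up sketch (Python, just for readability) of the kind of bug that sits in code untouched for years and only misbehaves at rollover; the routine and its fields are hypothetical.

# Hypothetical example of a latent two-digit-year bug in long-unchanged code.
# Nothing here was modified recently, so "check the latest changes" won't find it.

def days_overdue(invoice_yy, invoice_doy, today_yy, today_doy):
    # Age of an invoice in days, written years ago against two-digit years.
    return (today_yy - invoice_yy) * 365 + (today_doy - invoice_doy)

print(days_overdue(98, 300, 99, 40))   # 105 days: worked fine for decades
print(days_overdue(99, 350, 0, 10))    # -36475 days: 00 - 99 = -99 years at rollover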

Jerry

-- Jerry B (skeptic76@erols.com), May 19, 1999.


BigDog,

That real world does have its way of distracting us from interesting stuff. :-)

Good summary of the thread!

Jerry

-- Jerry B (skeptic76@erols.com), May 19, 1999.


BigDog, pretty decent summary. I won't nitpick on a couple of items.

My response:

Granting, as Hoff does, that the Y2K-induced noise level will be 2X greater at least, the crux of the matter seems to be:

- will the found Y2K bugs expose dormant legacy problems and/or fail to reveal critical bugs beneath or around the ones found?

- how many show-stoppers are likely and are they any different in type, number or simultaneity than what we currently slop around with?

- is the added 2X noise level linear and comparable to other known system levels or are we adding near-unbearable complexity to systems that are already loaded-for-bear when it comes to complexity (ie, they are maxed out with respect to brittleness)?

How about, "we can't be sure of the answers to any of those three issues"?

But, lest I weasel, I think the first and third, while quite possible, are more speculative than the second. It seems undeniable that there WILL BE show-stoppers from the huge volume of remedial work, no matter how trivial the errors (and Ed tried to come up with a very low relative percentage of serious errors).

Agreed, to some extent. I see no reason that Y2k errors will somehow expose dormant bugs. Dormant bugs exist because the logic paths are not used, and relate to business processes, not whether a date is 1999 or 2000.

Hoff seemed to respond by saying that we already deal with show-stoppers successfully. Indeed. Here is where the simultaneity and dormant or hard-to-find legacy breakage speculation comes back into play. Even granting that a fair bit of remediated code is being put back into production throughout 1999, there is still a whole world of stuff that remains to be released more or less "at once", certainly when compared to three decades of gradual release of fixes.

Historically, we squeeze show-stoppers to the absolute minimum before we release production code, whether new or primarily maintenance. Y2K doesn't afford us that luxury. Show-stoppers AS DEFINED ON THIS THREAD so far will be FOF, which is always our worst IT nightmare in the best of times, and in my experience, not typical. Unexpected breakage, yes. Endless need for work-arounds. Sure. Show-stoppers? More than one or two show-stoppers and, uh, the show actually does stop.

As Flint said way up above, one show-stopper can stop a system in its tracks, let alone a dozen, a hundred or a thousand.

It would seem that the best counter-argument would be one of arguing that there simply won't BE any show-stoppers or their number will be vanishingly small and to show why this is plausible.

True show-stoppers, while a major pain in the butt, tend to be much easier to diagnose and fix. The problem isn't hidden, but tends to jump out and bite you, whether you're looking or not.

Systems remediated for Y2k tend to undergo more testing than normal maintenance. (Probably not enough testing, as I don't think I've ever done enough testing). As you indicated, we tend to minimize these during normal times, and for systems remediated and tested, I would truly expect fewer still. For them to dramatically increase, either the system must be "missed" as critical, or not completed. This seriously reduces the actual exposure.

Also, the 2X number is for Y2k errors in total. The number of potential simultaneous errors may be open to debate, for example the 8% predicted by the Gartner Group. My original post was pretty conservative, using a 20% figure. That leaves the 0.6 bugs per 1000 LOC as the starting point, before applying percentages of "show-stoppers".

And once again, using the "volume" of code is misleading. The same analysis can be applied to existing bugs, which again would leave the impression that functioning systems today are impossible.

Now for something entirely different, or is it? Without meaning to reintroduce sterile debates about interface corruption in financial systems, let's broaden the issue to data interfaces broadly.

Arguably, Y2K's technical uniqueness touches more closely on the requirement to test - AND the impossibility of substantially testing - interfaces between systems, and within and across entire industry sectors, to ensure data integrity. (I'm thinking primarily here of enterprises communicating with one another, not systems within a single enterprise, though it is doubtful how much testing is being conducted even at that more-or-less micro level).

Yes, it may (without testing, we can't know) turn out that "bad guy data" is turned back at the system door. That's the best case. Even so, rejection of data introduces not only the requirement to fix the interfaces and, now, test them but also to keep the business functional without the needed data itself. Hideous FOF. Worst case is indeed major database corruption: ultimate impact probably varying from trivial to catastrophic.

Yes, interface testing is important. But interfaces are one of the primary reasons for using "windowing". If you leave the file definition alone, you are in essence "Fixing in Place"; the results can be compared to previous results, with the exception of dates.
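For anyone not familiar with the technique, here is a minimal sketch of windowing in Python; the pivot of 30 is an arbitrary illustrative choice, and real remediation projects pick their own.

# Minimal sketch of a windowing routine: the stored two-digit year is untouched,
# so file definitions and interfaces keep their existing layout ("fix in place").

PIVOT = 30   # illustrative pivot: 00-29 -> 2000s, 30-99 -> 1900s

def expand_year(yy):
    # Interpret a stored two-digit year against the pivot window.
    return 2000 + yy if yy < PIVOT else 1900 + yy

assert expand_year(99) == 1999
assert expand_year(0) == 2000
assert expand_year(29) == 2029
assert expand_year(30) == 1930

The point is that the data on the file never changes shape; only the interpretation inside the program does, which is why remediated and unremediated partners can keep exchanging the same records.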

I can envision no circumstance where file definitions are changed without testing with the external partner.

Finally, to Jerry on diagnosis. Yes, the normal method of checking recently modified code is not there. But again, errors occurring on rollover are the main topic, and will almost exclusively be caused by invalid date handling. And again, these types of errors do not tend to be subtle, and my guess is they will fairly easily be found.

-- Hoffmeister (hoff_meister@my-dejanews.com), May 19, 1999.


From above:

Hoff. "See no circumstances where file formats would not be changed without testing with external partners."

.....for "Intentionally" changed data formats, that is certainly true - it is the unintentional changes (from an external partner or internal business "partner", or from the IT department sending data out that is unintentionally re-formatted or changed) that could disrupt things considerably. Again, and as usual, adequate testing will find and allow a department to eliminate these "unintended" results.

BUT - the increasing rate of delays in Y2K remediation (now 92% are late), and the far larger number of big and small businesses falling behind even their pessimistic schedules (20+ percent could fail to meet the Jan deadline), will mean complete and thorough testing (both ends) will become increasingly rare.

"...Agreed, to some extent. I see no reason that Y2k errors will somehow expose dormant bugs. Dormant bugs exist because the logic paths are not used, and relate to business processes, not whether a date is 1999 or 2000. ..."

Maybe yes - maybe no. The radical nature of the change in the date itself, and the many permutations in the change in date as different things get changed at different times and in different revisions (99 to 00, 1999 to 00, 99 to 2000, 20 to 19, 20 to 99, 1999 to 2000, etc.), as well as simple errors in translation of data (first two digits of the zip code become last two digits of the SSN - and other stupid errors), will expose dozens, hundreds, thousands? of "new" loops and dead-end coding decision trees that simply have never been exercised before.

Many simple routines previously running fine - and still capable of running fine if "perfect data" were supplied - will fail in weird ways. So a troubleshooter, who "knows" the simple routine didn't use dates, won't look there - and thus extend debugging time until the real problem is found.
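A small made-up illustration of that last point, in Python: a fixed-width routine with no date logic at all, broken the moment an upstream two-to-four-digit year expansion shifts the layout. The record layout is hypothetical.

# Hypothetical fixed-width record: a routine with no date handling still breaks
# when an upstream year expansion shifts every field that follows the year.

def parse_amount(record):
    # Old layout: cols 0-5 account, 6-7 two-digit year, 8-13 amount in cents.
    return int(record[8:14])

old_record = "ACCT01" + "99" + "001250"       # amount = 1250 cents
new_record = "ACCT01" + "1999" + "001250"     # upstream expanded the year field

print(parse_amount(old_record))   # 1250, as always
print(parse_amount(new_record))   # 990012 -- garbage, yet no date is "handled" here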

Don't ignore hardware problems either as a delaying impact.

-- Robert A. Cook, PE (Kennesaw, GA) (cook.r@csaatl.com), May 19, 1999.


Hi, Hoff. Normally, I'd never stick my nerve-knot in an intense thread such as this, but I have a coupla questions for you since system installation/conversion is your bag.

I was involved in a Y2K conversion, not repair, for a 100m subsidiary of a huge elec. manufacturer which you may have provided SAP consulting to in PA. Went from a package called GrowthPower to one called BPCS. I did the analysis and programming to output datafiles that could be shoved into DB2.

Two things that were particularly vexing: the item-number field went from X(18)(legacy) to X(15)(new), creating lots of duplicate keys that had to be renamed via (bleechh) hard-coding. Also, the PO# field went from X(12) to numeric 6, requiring a renaming (renumbering) of all the POs. Also, most A/R document numbers (invoices, credits) had to be renumbered. You can imagine, right?
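For anyone who hasn't hit this, a small sketch in Python of the duplicate-key problem that truncating an X(18) item number into an X(15) field creates; the item numbers themselves are made up.

# Made-up illustration of key collisions when a legacy 18-character item number
# is truncated into a new 15-character field during conversion.

legacy_items = [
    "WIDGET199800017A01",   # 18 characters
    "WIDGET199800017A02",   # differs only in the last two characters
    "GASKET199700441B07",
]

collisions = {}
for item in legacy_items:
    collisions.setdefault(item[:15], []).append(item)   # truncate to the new length

for new_key, originals in collisions.items():
    if len(originals) > 1:
        print("duplicate key %r from %s" % (new_key, originals))   # must be renamed by hand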

Anyway, my question to you is this: with all EDI file structures having to be completely re-worked, with both customers and vendors, do you suppose this could be why we're not seeing more readiness claims? Are those who are about to go live on new systems still trying to test EDI processing, you think?

Is this what's holding up the show? From the field, what is your observation? Where I work, we have vendors who are migrating/converting to new systems any day now who are requiring complete EDI re-works. Many re-write projects not started yet, with 7 months left to go.

Any word on status of the Big Three regarding vendor EDI interfaces?

-- Lisa (lisa@work.now), May 19, 1999.


To Robert:

Hoff. "See no circumstances where file formats would not be changed without testing with external partners."

.....for "Intentionally" changed data formats, that is certainly true - it is the unintentional changes (from an external partner or internal business "partner", or from the IT department sending data out that is unintentionally re-formatted or changed) that could disrupt things considerably. Again, and as usual, adequate testing will find and allow a department to eliminate these "unintended" results.

Sorry, Robert, just don't buy it. My main area in IT is interfaces, and it is virtually impossible to somehow "unintentionally" change an interface file definition. Theoretically, I suppose the use of a Data Dictionary could cause this to occur, but external interfaces are well-defined and known.

Even if it did occur, the error would be caught as soon as re-implemented into production, and not occur on rollover.

BUT - the increasing rate of delays in Y2K remediation (now 92% are late), and the far larger number of big and small businesses falling behind even their pessimistic schedules (20+ percent could fail to meet the Jan deadline), will mean complete and thorough testing (both ends) will become increasingly rare.

Testing of interfaces is a normal part of the process, and does not require "full integration" testing. As above, this is again a primary reason for windowing; allows interfaces to be tested even with systems not fully remediated.

"...Agreed, to some extent. I see no reason that Y2k errors will somehow expose dormant bugs. Dormant bugs exist because the logic paths are not used, and relate to business processes, not whether a date is 1999 or 2000. ..."

Maybe yes - maybe no. The radical nature of the change in the date itself, and the many permutations in the change in date as different things get changed at different times and in different revisions (99 to 00, 1999 to 00, 99 to 2000, 20 to 19, 20 to 99, 1999 to 2000, etc.), as well as simple errors in translation of data (first two digits of the zip code become last two digits of the SSN - and other stupid errors), will expose dozens, hundreds, thousands? of "new" loops and dead-end coding decision trees that simply have never been exercised before.

Many simple routines previously running fine - and still capable of running fine if "perfect data" were supplied - will fail in weird ways. So a troubleshooter, who "knows" the simple routine didn't use dates, won't look there - and thus extend debugging time until the real problem is found.

Yes, error routines are an area that may be exposed. But again, being error routines, the processing would probably stop anyway. Fixing the bug is obviously still required.

-- Hoffmeister (hoff_meister@my-dejanews.com), May 19, 1999.


Lisa, I'd hate to attribute current status to any one factor. But yes, I've seen quite a few companies issuing edicts that all partners will use, for example, the 4010 standard by such and such a date.

I have some mixed feelings on whether or not this is a "good" thing. Obviously, long-term, it is the way to go. Short term, it is causing an extra load to be delivered.

-- Hoffmeister (hoff_meister@my-dejanews.com), May 19, 1999.


Hoffmeister,

"Yes, the normal method of checking recently modified code is not there."

So far, so good.

"But again, errors occurring on rollover are the main topic, and will almost exclusively be caused by invalid date handling."

Oddly redundant, as if replying to some other post discussing some other topic, or, to mention a skeptic's possible view, perhaps it might be a suggestion to casual readers that the post to which it replies was not addressing the subject at hand.

"And again, these types of errors do not tend to be subtle, and my guess is they will fairly easily be found."

Statements like that might lead someone to imagine that you might be something of a "polly". :-) (Now that was my evil twin brother writing.)

As you might imagine, such opinions as "these types of errors do not tend to be subtle" and "my guess is they will fairly easily be found" are clearly regarded as implausible by some people hereabouts, but also by some other people whose actions might tell us more than words.

Certainly, my disagreement with such opinions would be insignificant to someone who does not personally know me and know about my experience.

Disagreement with such opinions by anyone on this forum might be discounted as having been unduly influenced by pessimists on this forum or on other fora on which Y2K concerns may commonly be expressed.

But, and I hope this may be evident even to an optimist, the actions of the managements of companies and governments that are spending very large sums of money to remedy Y2K bugs in advance are clear signs of rejection of such opinions by people who: 1. are not likely much to be influenced by fora such as this, 2. would rather spend that money elsewhere, and 3. did not get to their current management positions by being pessimists.

But perhaps even more clearly, such opinions are clearly regarded as implausible by the managements of companies which are preparing: 1. to change vendors based on such managements' perceptions of such vendors' Y2K readiness as of mid 1999, and/or 2. to stockpile inventory this year.

Question your assumptions.

Jerry

-- Jerry B (skeptic76@erols.com), May 19, 1999.


Well, Jerry, I do believe I see those individual strands of straw being carefully placed. Why yes, I think I see the shape of a man. Gotta admit, you're more subtle than most. But let me save you the trouble.

No, Jerry, I'm not of the opinion that Y2k should have been ignored. Had nothing been done, I would most definitely be on the doomer side of this discussion.

But that was not the point of this post. The point behind Y2k remediation is to either remove potential errors, or at the least get the potential errors down to a manageable number. That, Jerry, was the point of this post. Is the number of errors that can reasonably be expected, following remediation efforts, manageable?

Whether or not Y2k errors are blatant and easily found when they occur is really beside the point, if the sheer volume overwhelms an organization.

-- Hoffmeister (hoff_meister@my-dejanews.com), May 20, 1999.


Ah - to answer that question - in general - yes.

Once remediated, the number of errors "left" in place should definitely be manageable.

If not remediated, the number of errors "left" in place will probably be the same as if no action were taken (obviously!) - hence a company could need as much time as was spent (on average) for remediation and testing (14-16 months, perhaps more.)

Actually, assuming the un-remediated company could survive the first 2 months, it would probably only need 3-4 additional months to get somewhere back to steady state: though at much reduced efficiency (assuming all the original programs were important - not at all a given!), and assuming that production and analysis capability to predict the next 4-6 months were adequate.

Also, expect many of the un-remediated companies to have failed, or to have closed non-compliant divisions, so the competition is reduced, and the workload on IS and office management is reduced.

-- Robert A. Cook, PE (Kennesaw, GA) (cook.r@csaatl.com), May 20, 1999.


Lisa, to your question:

From http://biz.yahoo.com/prnews/990520/oh_sterlin_1.html

Thursday May 20, 10:02 am Eastern Time

Company Press Release

SOURCE: Sterling Commerce, Inc.

Wal-Mart is Using Sterling Commerce For Year 2000 Testing and Certification Program

COMMERCE Y2K Service Will Help Ensure Compliance for Wal-Mart Suppliers

COLUMBUS, Ohio, May 20 /PRNewswire/ -- Sterling Commerce (NYSE: SE - news) today announced that Wal-Mart Stores, Inc. (NYSE: WMT - news) is using COMMERCE Y2K, Sterling's comprehensive Year 2000 testing and compliance solution for electronic commerce-based business communities.

COMMERCE Y2K has been providing testing and compliance services to help assure that suppliers meet Wal-Mart's Year 2000-related requirements for the electronic submission of purchase orders.

Wal-Mart requires that its suppliers use ANSI ASC X12 version 4010 (which supports the four-digit year field "2000") for all electronic data interchange (EDI) transactions. Sterling Commerce is working with suppliers on Wal-Mart's behalf to assess their Year 2000 readiness and offer data transformation to 4010 from earlier versions to suppliers unable to meet the requirement on time.

"An uninterrupted exchange of information with our suppliers is key to a smooth Year 2000 transition," said Charlie McMurtry, vice president of applications development for Wal-Mart Stores, Inc. "We selected Sterling Commerce for this important program because of their business process knowledge and track record in commerce community management."

Other major Sterling Commerce customers using COMMERCE Y2K include JCPenney, a major retailer, Toys 'R' Us, a leading toy retailer, Cummins Engine Co., Inc., a global manufacturer of diesel engines and power systems, Office Depot, the world's largest seller of office products, and Fred Meyer Stores, a leading food and drug retailer.

Wal-Mart has been using Sterling Commerce for E-commerce community management services internationally, and for secure, reliable data movement between Wal-Mart stores and suppliers worldwide.

"We are helping customers and their suppliers understand and resolve a host of E-commerce-related Year 2000 issues," said Paul L.H. Olson, executive vice president, Sterling Commerce. "We're pleased that Wal-Mart, the world's largest retailer, has asked us to assist them with this critical activity."

About Sterling Commerce

Sterling Commerce is recognized as the worldwide leader in providing Internet E-commerce solutions for the Global 5000 and their commerce communities. The company is one of 40 companies included in the Dow Jones Internet Services Index. The company is focused on providing E-commerce solutions through its COMMERCE, CONNECT and GENTRAN product families to address Extranet Management, Business Process Integration, Community Management, Infrastructure and Outsourcing. Sterling Commerce has been providing E-commerce solutions for 25 years, and has 2,500 employees, 37 office locations and more than 40 distributors worldwide. With more than 45,000 customers worldwide, the company had 1998 revenues of more than $490 million.

For more information, visit the Sterling Commerce Web site at www.sterlingcommerce.com.

Wal-Mart Stores, Inc. operates more than 2,400 stores and 450 Sam's Clubs in the United States. Internationally, the company operates more than 720 units. Wal-Mart employs more than 815,000 associates in the United States and 135,000 internationally. In 1998, the company raised and donated more than $127 million for charitable organizations.

All products or company names mentioned are used for identification purposes only, and may be trademarks of their respective owners.

-- Hoffmeister (hoff_meister@my-dejanews.com), May 20, 1999.


Thanks, Hoff, although I'm surprised that EDI would be outsourced.

I mean, forcing, er, encouraging vendors to hit a layout correctly is one thing - understanding the (possibly new) systems that generate the data - possibly a "re-engineered" business - is another.

Actually, have to retract that I'm surprised. An organization (even as big as Wal-Mart) could never corral the thousands of EDI vendors and make the ANSI conversion work.

I'm going to look at the rest of Sterling's customer roster, if it's listed. Thanks again.

-- Lisa (lisa@work.now), May 20, 1999.


I should add: isn't this a rather late date to be awarding the contract?

-- Lisa (lisa@work.now), May 20, 1999.

Here's their posted customer roster, which includes Social Security, DOT, and a few other impressive names.

-- Lisa (back@on.the_air_again), May 20, 1999.

Hoffmeister,

I can appreciate your grasping at straws, but perhaps you are simply beginning to see what has been in clear view to some others: the strands of your position unraveling. :-)

In one post you wrote: "But Y2k errors tend to run the gamut from the trivial (reports not displaying dates correctly) to the catastrophic, as do normal bugs." But then you continued: "In fact, especially in date calculations, Y2k errors do not tend to be subtle errors, but extreme errors that are readily observed."

I pointed out that: Even when their symptoms are obvious, the underlying errors, the bugs that need to be found and fixed, will often be quite obscure.

To which you replied: "But I'd submit, while not quite equal, certainly analogous is that errors that occur on rollover will be almost exclusively related to dates. Which points the debugger in the right direction, as does recently modified code."

However, as I replied: Many systems are riddled with date calculations, and in those, the direction would, in 2D terms, be 360 degrees.

I then posted a point by point summary of reasons to expect extended mean times to repair of Y2K bugs as their activation rates peak.

Your reply to that post simply ignored the serious issues and culminated with: "But again, errors occurring on rollover are the main topic, and will almost exclusively be caused by invalid date handling. And again, these types of errors do not tend to be subtle, and my guess is they will fairly easily be found."

I then very directly questioned your "my guess is they will fairly easily be found" and the key line of your reply is: "Whether or not Y2k errors are blatant and easily found when they occur is really beside the point, if the sheer volume overwhelms an organization."

In review:

1. You start with an unsubstantiated assumption that Y2K bugs, i.e. "invalid date handling errors", will "fairly easily be found".

2. I (and others) point out reasons to expect otherwise.

3. Your reply ignores those points and simply repeats your assumption.

4. I emphasize the implausibility of that assumption.

5. You reply that it "is really beside the point"!

Well, no. It is among the main points in combination with the volume.

Also among the main points is that you confuse obviousness of symptoms with ease of diagnosis and remedy of causative errors. (Try explaining that to the farmers whose grain rotted at the depots while UPRR tried to diagnose and remedy its very blatant symptoms, among several instances so gross as to gain widespread public notice.)

There is nothing subtle about this. It is simply a matter of methodically checking the premises, checking the logic, checking the data, doing reiterative plausibility checks as the picture unfolds.

Until you can address, point by point, the items in my summary of reasons to expect extended mean times to repair of Y2K bugs as their activation rates peak, you have not begun to make a serious case for the optimism that you exhibit.

What should I now expect? A serious reply, or just another superficial one?

Jerry

-- Jerry B (skeptic76@erols.com), May 20, 1999.


"Top Ten Tricky Year 2000 Problems"

http://www.esj.com/fullarticle.asp?ID=1059835345PM

-- Kevin (mixesmusic@worldnet.att.net), May 21, 1999.


Yes, Jerry, you attempted to resort to one of the older "straw man" arguments in Y2k, namely: "If it's no problem, why are they spending all that money?"

OK, then, point by point:

In one post you wrote: "But Y2k errors tend to run the gamut from the trivial (reports not displaying dates correctly) to the catastrophic, as do normal bugs." But then you continued: "In fact, especially in date calculations, Y2k errors do not tend to be subtle errors, but extreme errors that are readily observed."

My use of "trivial" addressed the seriousness of the problem, not how obvious it is. An invalid date on a report can be very obvious, and at the same time trivial. Do you need more explanation?

I pointed out that: Even when their symptoms are obvious, the underlying errors, the bugs that need to be found and fixed, will often be quite obscure.

And here we disagree. Some may be, but "often" is too strong a term. Obviously, that is just my opinion. But personally, I doubt I'd have very serious problems finding the cause of an error, given the result and the fact that it happened at rollover.

To which you replied: "But I'd submit, while not quite equal, certainly analogous is that errors that occur on rollover will be almost exclusively related to dates. Which points the debugger in the right direction, as does recently modified code."

No, that post of mine was in direct response to your post about it being SOP to immediately check recently modified code. If you're going to attempt to summarize, get it right.

However, as I replied: Many systems are riddled with date calculations, and in those, the direction would, in 2D terms, be 360 degrees.

The "riddling" of systems with date calculations was the original subject. And for a given erroneous result, the logic path will be quite finite, and in all probability involve very few date calculations, not matter how many in total a program may have.

I then posted a point by point summary of reasons to expect extended mean times to repair of Y2K bugs as their activation rates peak.

Your reply to that post simply ignored the serious issues and culminated with: "But again, errors occurring on rollover are the main topic, and will almost exclusively be caused by invalid date handling. And again, these types of errors do not tend to be subtle, and my guess is they will fairly easily be found."

Once again, quite wrong. My response above was to your latest post, not your "point by point" summary. To be honest, I didn't respond to your summary, but BigDog's. Sorry to ignore you.

But here were your points:

1. Unlike other code errors, the activation of Y2K bugs will cluster in the months near 1-1-2000, so the FOF (fix-on-failure) bug pruning process to which other bugs have been subjected for years has barely started.

Wrong on multiple counts. While some clustering may occur, Gartner Group, among others, expects a distribution of errors over time. More importantly, critical errors today are almost always clustered around the implementation of new software. So the situation is not entirely unique.

2. Therefore, the distribution mentioned in assumption 1 is different for Y2K bugs than for other bugs in old code, since the serious other bugs in old code are more likely to have been FOFd long ago. This is not to say that all serious non-Y2K bugs have been fixed; it is saying that the seriousness distribution curve of old non-Y2K bugs has been shortened and flattened over the years.

I'll agree, that we'll be faced with more serious bugs than normal. That was never in question. This will not be "just another day", and I never said it would be. Again, the whole point here was to get some handle on the relative quantity.

3. Due to the clustering of the activation of Y2K bugs, multiple symptoms of multiple bugs will confuse and mislead diagnostic efforts.

Again, this tends to be SOP following software implementation, and is not an unknown situation. You give too little credit to IT personnel, I feel; my experience is that we are very good at triage, and at finding the root cause of problems.

4. Some standard diagnostic techniques will be of little help after 12-31-1999: (Flint posted a good description of this aspect in another thread, but I can't find it, so I'll give my not so good description.)

4a. Running old versions of software to see if the problem "goes away" will not be an option.

4b. Checking the most recent code changes is problematic because:

4b1. A large amount of code has been changed recently.

4b2. The bug may be in code that has not changed in many years.

And again, yes, checking recently modified code is not really an option. I addressed this above.

5. The above will increase the mean time to repair of most individual bugs, and coupled with the clustering, will greatly increase the mean time to repair of applications containing multiple bugs, of which there are likely to be many.

Using words like "greatly" is misleading, at best. While applications may have multiple bugs, the question is how many will manifest themselves simultaneously, which goes back to the Gartner Group estimates, among others.

Another point made previously is that systems that have undergone remediation and testing will have received relatively more testing than they would under normal maintenance, and will have proportionately fewer critical errors introduced.

1. You start with an unsubstantiated assumption that Y2K bugs, i.e. "invalid date handling errors", will "fairly easily be found".

An opinion, based on experience. Your opinion may differ. But at least I have attempted to back up my posts with some facts; from where I stand, each of your posts is merely assumption and opinion.

2. I (and others) point out reasons to expect otherwise.

Again, your opinions.

3. Your reply ignores those points and simply repeats your assumption.

Sorry, Jer, but reading back through this thread, I've tried to address as many points as I could. Sincere apologies for ignoring your one post.

4. I emphasize the implausibility of that assumption.

No, you haven't. You did state that the underlying errors may be obscure, but provided no backup for that statement. To borrow your line, merely repeating the statement doesn't emphasize it, or provide backup.

5. You reply that it "is really beside the point"!

And once again, you misrepresent my post, and what it was replying to. Once can be overlooked, but three times makes me think it is deliberate.

Well, no. It is among the main points in combination with the volume.

Then provide some backup for your opinion. My experience with Y2k errors indicates they are indeed not subtle; subtracting 99 from 00 yields fairly obvious errors.
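
To make the "subtracting 99 from 00" point concrete, here is a minimal sketch of the classic two-digit-year interval bug. The function names and the account-age scenario are my own illustrative assumptions, but they show why the symptom tends to be glaring rather than subtle:

    # Sketch only: the same interval computed with two-digit and four-digit years.

    def years_elapsed_2digit(start_yy, end_yy):
        # Buggy: the century is lost when years are stored as two digits.
        return end_yy - start_yy

    def years_elapsed_4digit(start_yyyy, end_yyyy):
        # Remediated: four-digit years give the intended answer.
        return end_yyyy - start_yyyy

    # An account opened in 1999, evaluated at the rollover in 2000:
    print(years_elapsed_2digit(99, 0))       # -99  (blatantly wrong)
    print(years_elapsed_4digit(1999, 2000))  #   1  (intended)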

Also among the main points is that you confuse obviousness of symptoms with ease of diagnosis and remedy of causative errors. (Try explaining that to the farmers whose grain rotted at the depots while UPRR tried to diagnose and remedy its very blatant symptoms, among several instances so gross as to gain widespread public notice.)

And your point is that some system errors are hard to find? Granted. But normal errors can literally occur in any line of code. My point is that an error specifically happening on rollover can be traced to relatively few date calculations.

There is nothing subtle about this. It is simply a matter of methodically checking the premises, checking the logic, checking the data, doing reiterative plausibility checks as the picture unfolds.

Until you can address, point by point, the items in my summary of reasons to expect extended mean times to repair of Y2K bugs as their activation rates peak, you have not begun to make a serious case for the optimism that you exhibit.

Once again, I don't always have the time to respond point by point to every post. Sorry if this upsets you. I'll take your opinion of how serious a case I make under advisement. What should I now expect? A serious reply, or just another superficial one?

-- Hoffmeister (hoff_meister@my-dejanews.com), May 21, 1999.


I think the effect of any given error depends on the type of error - and the time available to fix it, and the time when the error is found.

Something found now, during remediation, when the program is "open" and deliberately being edited and reviewed, has the least impact, if you want to consider any impact at all. Thus, changing a date during remediation is effectively "free" within the overall remediation budget. (As if $400 billion to remediate is "free"!)

Anyway, an error found during first-level checks (recompiling and programmer checks) is the next most expensive: the programmer has to re-open the saved file, find the goof based on symptoms, re-fix it, re-compile, re-check-in the executable, and so on, and finally re-test.

That's more expensive, in time and money, but in the "big picture" of the company and the economy at large it is still invisible within remediation, except as it affects the company's schedule and the company's exchange verifications and tests with other clients.

For example, if this test caused a company to fail to meet the automaker's self-declared compliance deadline, then the company could lose a contract or suffer penalties. (It's not likely that one failed test would sink a whole company, but several hundred failures certainly could screw up a company's contract by combining into unacceptably long delays.)

Next higher is a failure at the system or operating-system level: here the computer must be replaced, or the program can't compile, or similar "hard process" failures occur that can't be worked around easily.

More later - supper calls.

-- Robert A. Cook, PE (Kennesaw, GA) (cook.r@csaatl.com), May 21, 1999.


I think we're ready to cut to the chase here.

Remediation won't find all errors. Testing will find many of the errors remediation missed, but not all of them. The remainder will show up at the worst possible time - in production.

Of production errors, errors in noncritical systems are by definition noncritical. Errors in critical systems are either critical or not (there can certainly be cosmetic errors in critical systems).

Of critical errors in critical systems, there are two issues: speed of fix, and extent of collateral damage (the database got blitzed or the refinery blew up). Gartner estimates that 70% of critical errors in critical systems can be repaired within 3 days (yes, yes, yes, assuming power and other elements of infrastructure are intact). Fixing collateral damage is contingent on the exact nature of that damage. My OPINION is that there will be some real nightmares here and there (some Bhopal or Chernobyl type events perhaps, and some databases damaged beyond repair).

SO: How widespread the collateral damage, and how numerous the critical errors in critical systems that cannot be quickly repaired? These are the important questions for ALL organizations. And they just can't be answered until they happen. We can guess, and hope, and wonder. That's about it.
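
As a rough illustration of what that Gartner figure implies, here is a back-of-envelope sketch; the starting count of 1,000 critical errors in critical systems is purely hypothetical, chosen only to make the arithmetic visible:

    # Sketch only: Gartner's estimate says roughly 70% of critical errors in
    # critical systems are repaired within 3 days; the rest remain open longer.
    critical_errors = 1000            # hypothetical count for one organization
    repaired_in_3_days = int(critical_errors * 0.70)          # 700
    open_after_3_days = critical_errors - repaired_in_3_days  # 300
    print(repaired_in_3_days, open_after_3_days)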

-- Flint (flintc@mindspring.com), May 21, 1999.


Hoffmeister,

Thank you for your extensive response. It was quite a bit more than what I suggested, but I am not, of course, complaining. I do appreciate that you have other demands on your time.

I expect to post tomorrow a discussion of what I consider the substantive aspects of your post. This post is simply to address some, shall I say, administrative aspects of your post.

I refer to three comments:

1. "No, that post of mine was in direct response to your post about it being SOP to immediately check recently modified code. If you're going to attempt to summarize, get it right."

2. "Once again, quite wrong. My response above was to your latest post, not your "point by point" summary. To be honest, I didn't respond to your summary, but BigDog's. Sorry to ignore you."

3. "And once again, you misrepresent my post, and what it was replying to. Once can be overlooked, but three times makes me think it is deliverate."

Let me be brief: keep in mind that the entire thread is intact and available to all forum visitors (including yourself). It is quite clear that in each of my attributions of your replies to my posts, either

a. the attribution correctly identified the specific post,

or

b. each of your replies in question was designed to mislead regarding which post it was replying to,

which even a cynic would find implausible.

Perhaps you might prefer to state that your replies were to different portions of my posts than the portions that I mentioned in my May 20 post, except that that could not possibly work for comment 2. Again, the entirety of all posts in this thread remains available for all to see.

As for the comment that included an assertion that I misrepresented one of your posts: any quotation short of an entire post might be subjected to such an assertion of misrepresentation. I will let the ready accessibility within the thread of the entire post in question, to all observers, be my response to this assertion.

Jerry

-- Jerry B (skeptic76@erols.com), May 21, 1999.


As you say, Jerry, I'm also willing to let anyone review this thread and come to their own conclusion about your characterization of my responses and what they applied to. I'll leave it at that.

I think this thread has branched to another aspect, one which deserves attention, but is secondary to the main post. So I started a new thread, Y2k Metrics and Error Rates II. Hope to see you there.

-- Hoffmeister (hoff_meister@my-dejanews.com), May 22, 1999.


Nice risk management. :-)

I would expect to be visiting part II. Still working on that serious reply while dealing with a variety of distractions.

Later.

Jerry

-- Jerry B (skeptic76@erols.com), May 23, 1999.

