Actual percentages of faulty code


Just curious... we hear of companies and agencies reporting some level of code repair. How many of these repairs, if left unmade, would have resulted in an actual system failure? That is, the programmers who have "repaired" great blocks of code must surely have some indication of how severe the problem would have been had the code gone uncorrected. To my knowledge, no published data deals with this actual, real-life situation. I mean, these guys repairing code must surely have an inkling of how serious the problem is, and should be able to quantify the prevalence of fatal systemic Y2K errors. It seems to me that verifiable, hard statistics could then be used to extrapolate a valid estimate of how many lines of code in any particular organization's software are potentially lethal to that organization. This would be powerful persuasion. For example, armed with such data, a consultant could approach Company X and announce that, statistically, Company X could expect (for its size) to have 1462 lines of erroneous code, any one of which, if uncorrected, dooms the company. Does anyone have any actual numbers?
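
To illustrate the kind of extrapolation being asked for, here is a rough back-of-envelope sketch in Python. The defect density, fatal fraction, and portfolio size are made-up placeholders, not published statistics; the point is only the shape of the calculation.

import math

def expected_fatal_defects(lines_of_code,
                           date_defects_per_kloc=0.5,  # hypothetical: Y2K defects per 1000 lines
                           fatal_fraction=0.02):       # hypothetical: share that would sink a system
    """Expected number of uncorrected defects that would be fatal to the organization."""
    return lines_of_code / 1000.0 * date_defects_per_kloc * fatal_fraction

def probability_of_at_least_one(expected):
    """Poisson approximation: chance that at least one fatal defect exists."""
    return 1.0 - math.exp(-expected)

loc = 20000000  # hypothetical portfolio size for "Company X"
exp_fatal = expected_fatal_defects(loc)
print("Expected fatal defects: %.0f" % exp_fatal)
print("P(at least one): %.3f" % probability_of_at_least_one(exp_fatal))

Of course, the whole exercise depends on having verified numbers to plug in, which is exactly what nobody seems to publish.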

-- Sam Rowe (rowes@gate.net), August 18, 1998

Answers

Working out what would have been the consequences of any particular bug is frequently orders of magnitude harder than just squashing it. After that one hopes that the unsquashed ones get caught during a test rather than in production...

Anyone with programming experience knows that occasionally a bug will have consequences that nobody could reasonably have predicted, and/or that it will be sufficiently well hidden that it escapes detection during rigorous testing and bites later. The very worst are in both classes at once; they are inevitable but (I hope) rare. I've even known cases where *fixing* a bug *caused* a problem, because someone or something else was relying on the *incorrect* outputs.

Much more useful, and possibly obtainable, might be a "triage" breakdown: just what percentage of a typical organisation's code is really, truly mission-critical? I suspect far, far less than is commonly assumed. Nobody is likely to admit that the application he spends his entire day using is not mission-critical, because if it goes, he goes! Nevertheless, it's quite probable that neither the application nor the people using it will turn out to be needed much, if and when it breaks or gets triaged away. (Thought: this applies to a lot of government and other admin, doesn't it :)

Also many mission-critical applications are critical in the sense that "if we didn't have them the competition would eat us". If the competition are in the same situation, this doesn't apply. Unfortunately, the concept of industry-wide triage is probably impossible to realize.

-- Nigel Arnot (nra@maxwell.ph.kcl.ac.uk), August 19, 1998.


Severity of the bug isn't always the issue. Sometimes bugs that would have been minor if left alone get turned into total failures when other, more major bugs are fixed.

Suppose a company is expanding all the date fields in a particular database from two-digit years to four. Every program accessing that data needs to be adjusted to cope with the new field sizes, even programs whose functionality might otherwise have been only marginally impaired by Y2K problems. A particular program's Y2K bugs will turn from minor to severe if the data changes and the program doesn't.
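
To make the ripple concrete, here is a toy sketch in Python (the systems in question would mostly be COBOL, but the effect is the same). The record layout and field positions are invented for the example.

# A record layout where the date field has been widened from
# YYMMDD (6 characters) to YYYYMMDD (8 characters).
OLD_RECORD = "ACME  991231000123"    # name(6) + date YYMMDD(6) + amount(6)
NEW_RECORD = "ACME  19991231000123"  # name(6) + date YYYYMMDD(8) + amount(6)

def parse_old_layout(record):
    """A program still written against the old 6-character date field."""
    name = record[0:6].strip()
    date = record[6:12]            # assumes YYMMDD
    amount = int(record[12:18])    # assumes the amount starts right after it
    return name, date, amount

print(parse_old_layout(OLD_RECORD))  # ('ACME', '991231', 123): works
print(parse_old_layout(NEW_RECORD))  # date comes back as '199912' and the
                                     # amount as 310001: silently wrong

Note that the unchanged program doesn't even crash here; it quietly reads garbage, which is exactly how a "minor" bug becomes a severe one.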

This would be an example of the ripple effect Ed describes in the Appendix of "Time Bomb 2000", applied to Y2K fixes rather than to the results of the bugs themselves.

-- Paul Neuhardt (neuhardt@ultranet.com), August 19, 1998.


Here is a good example of a Y2K bug. A major British insurance company uses IBM mainframe COBOL/ADABAS/NATURAL for its general and life insurance systems. The code dates back to 1977, with new systems, enhancements and additions added over the years. There is a general-purpose way of holding miscellaneous data that did not 'belong' to a specific database: it is held in an ADABAS database with data areas designed for general-purpose use. It was quick and easy to set up and maintain data without having to design a new database, and many hundreds of programs use it.

The trouble is that the data is designed to be accessed on a from/to date basis (the dates are part of the composite key). These dates are DDMMYY, and all data expires on 31st Dec 1999 (i.e. 311299) or earlier, so the whole process will fail on 31st Dec. We had to come up with a solution that did not affect current programs, did not mean changing many hundreds of programs at the same time, and would work after 31/12/99. It was possible after some thought, and it took some time to implement; fortunately it was done in 1994, because it could not have been done as a production fix in 2000. This is only one graphic example of a Y2K bug. In all, it took 120 man-years to convert all of the programs in just one part of the business (8,000 programs).
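
As a toy illustration of why such a from/to lookup breaks once dates in 2000 appear, here is a sketch in Python. The real system was COBOL/ADABAS/NATURAL; the records and helper below are invented, assuming the usual interpretation of every two-digit year as 19YY.

def ddmmyy_to_ordinal(d):
    """Naive conversion of a DDMMYY key, assuming every two-digit year is 19YY."""
    day, month, yy = int(d[0:2]), int(d[2:4]), int(d[4:6])
    year = 1900 + yy              # '00' becomes 1900, not 2000
    return year * 10000 + month * 100 + day

# (from_date, to_date, value): every range ends on or before 31/12/99
table = [
    ("010190", "311295", "old rate"),
    ("010196", "311299", "current rate"),
]

def lookup(as_of_ddmmyy):
    as_of = ddmmyy_to_ordinal(as_of_ddmmyy)
    for frm, to, value in table:
        if ddmmyy_to_ordinal(frm) <= as_of <= ddmmyy_to_ordinal(to):
            return value
    return None

print(lookup("150698"))  # 'current rate'
print(lookup("020100"))  # None: 02/01/2000 is treated as 1900 and falls before every range

Any real fix has to change how those key dates are stored or interpreted without breaking the hundreds of programs already reading them, which is why it could not have been left as a quick production fix in 2000.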

-- Richard Dale (rdale@figroup.co.uk), August 20, 1998.

I'm a card-carrying geek, and proud of it. The big problem with 'hard numbers' is that they can be misrepresented so easily. ("There are lies, damn lies, and statistics.") You can have a system with 100 programs containing no date logic and, voila, with a few hours' work have thousands of lines of code 'compliant'. You can find your 'likely suspects' of bad code (order entry, forecasting, etc.) and prove the consequences of doing nothing. I've worked the code, and I know how serious the problem is. Hard numbers? No. Hard scenarios? Yes.

-- Keith J Kafka (mooski@bigfoot.com), August 24, 1998.
