Feds/SSA remediation halt

I found this surprising. I thought this might happen about mid-'99. For SSA to publicly say NOW that they are contemplating a halt to code changes, months before 1/1/2000, is amazing. To me, this means they have given up. However, it would give them some shot at a contingency plan. Look for an announcement saying they will "average" payouts in 4Q '99. Now, what's the DoD going to do? http://www.y2ktoday.com/modules/home/default.asp?id=462

-- R. D..Herring (drherr@erols.com), October 25, 1998

Answers

Yeah, but:

>Although most systems will likely be ready, a few glitches could stall some programs, Curtis said. Halting code work would give agencies a chance to set contingency plans for doing their jobs, he said.<

-- sucked down (he@dincommode.com), October 25, 1998.


Why would the SSA halt work on its code with '14 months to go'?

I believe the most likely reason is that the SSA may be one of the few organizations being wholly realistic about this issue. Large coding projects are virtually impossible to test COMPLETELY.

Testing is always a balance/trade-off. Most people who are not directly involved with software/systems development don't understand that it is simply not possible to test every single possible combination of paths, parameters, and states in a large system.

For the purposes of this discussion, I define a 'system' as a large collection of both hardware and software that performs a needed service. Keep in mind that not all components of such a system may be in the same geographical location -- i.e., the system may be 'distributed'. The collection of all people, computers, and software that perform 'the SSA service' is a very large 'distributed' system. A complete test of the system is not feasible.

A good analogy here would be if I asked you to 'test the highway system between New York City and Los Angeles'. That might seem a reasonable request and almost certainly you could find several routes that would work and a few that, for one reason or another, would not.

But what if I said you must thoroughly test the system by including EVERY POSSIBLE COMBINATION OF ROUTES leading from New York to Los Angeles, including Interstates, primary, secondary, and gravel roads? Next, I say you must repeat this test for every vehicle made and sold today (semis/lorries, passenger cars, motorcycles, etc.). The task, while theoretically possible, is simply not practical.

This is also the case for large, complex software/hardware systems. It is simply not feasible to test every single possible combination of paths and values. So do systems engineers simply give up when faced with such a large task? Obviously not.
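
To put some rough numbers on that -- a toy illustration with assumed figures, nothing to do with SSA's actual code -- here is a quick sketch of how fast the number of distinct paths explodes with simple yes/no decision points:

    # Toy illustration with assumed figures (not SSA's): a module with n
    # independent yes/no decision points has up to 2**n distinct execution paths.

    def path_count(decision_points):
        """Upper bound on distinct paths when every branch can go either way."""
        return 2 ** decision_points

    for n in (10, 30, 60):
        print(f"{n} branch points -> up to {path_count(n):,} paths")

    # 10 branch points -> up to 1,024 paths
    # 30 branch points -> up to 1,073,741,824 paths
    # 60 branch points -> up to 1,152,921,504,606,846,976 paths
    # Even at a million test cases per second, the last figure works out to
    # tens of thousands of years of machine time, before multiplying by data values.

Real systems pile loops, data values, and timing on top of the raw branch count, which is exactly why engineered test plans select representative paths rather than chase all of them.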

While not 'complete' in the sense above, very good, very revealing testing can be (and frequently IS) performed - and at several levels. Indeed, much work and research has gone into developing testing methodologies for such systems over the past few decades. Today, systems testing is a specialty unto itself.

But good testing takes an enormous amount of time. And in the end, it is always a trade-off between the time required to engineer and implement such testing, the 'degree of confidence' such testing provides, and the PERCEIVED RISK of a failure/bug not caught before a product goes out the door. More often than not, the determining factor is the budget allocated. I'm not saying this is bad; it is simply a reality of the way business is conducted.

But I have personally seen far too many companies that perform virtually no engineered testing (or highly inadequate testing) of their software before it goes out the door. They basically rely on their customers to test their products and let them know when something goes wrong.

Now, letting your customers test your code for you is not, in and of itself, bad or wrong. In fact, if such a test (often referred to as a 'beta test') is conducted with the customer's knowledge and consent and is JUST ONE PART of a well-engineered test process, it is extremely useful, even vital. Beta testing is used extensively throughout the industry. Often there is NO SINGLE BETTER TEST than putting your product in the hands of the people who will be using it. But as the sole substitute for a well-designed test plan, it is ineffective and often counter-productive.

There are always pressures contending for the schedule. As any software engineer will tell you, there is almost always a push to get the product out the door in order to meet shipping/expected revenue goals. Management's logic here is simple and also quite practical: if there is no revenue, there is no business. But the devil is in the details.

Some, but not all, upper-level managers view the testing process as 'over-engineering'. Indeed, at one large company I worked for that provided telecommunications equipment and software to the military, there was a well-known saying: "Shoot the engineer and ship the radio."

The SSA needs the time remaining to test, as best they can, what they've done thus far. This does not necessarily equate to 'thorough testing', but it does tell me they're not attempting to kid themselves. The more 'knowns' and the fewer 'unknowns', the better. The fact that they are 'freezing' the code is encouraging. It tells me that there are experienced and practical people at several levels there who are taking this very seriously.

If only every business and organization had gotten the same early wake-up call that SSA did and treated it with the same seriousness, we might be able to realistically expect 'a bump in the road'.

Unfortunately, Y2K is not limited to the SSA.

Arnie

-- Arnie Rimmer (arnie_rimmer@usa.net), October 25, 1998.


Arnie, An excellent comment on testing. However, let me quibble a bit. From the info available via GAO and private sources, I think SSA is roughly at 50-60% remediation right now. But that's with VERY little enterprise-level/time-machine testing. I've heard that they have lost a lot of programmers lately to private industry.

In any remediation effort, there is a certain minimum threshold for the system to work in a valid and acceptable fashion. 100% remediation (contrary to GN) is seldom necessary for this, BUT acceptable remediation is usually in the 90% range. Now we could argue a bit about what "acceptable" means. If SSA is willing to crank out checks with, say, a 75% accuracy rate, then perhaps freezing now (or soon) is okay. But that's presuming the system will run at all with half the date handling code doing one thing and half another. (What a FUBAR!! See the sketch at the end of this post.) However, that still means millions of missing or incorrect checks. What they need as part of the contingency plan is some kind of backup for those who get screwed.

Personally, I think this approach is doomed because they are still at least nine months from a minimum politically acceptable threshold. This smacks more of a CYA effort on the part of the SSA administrators. It could mean that the PHM/horn hairs were grabbed by the throat on a dark and stormy night by the Codehead Ghost of Systems Past. ("Listen you idiots, YOU HAVE RUN OUT OF TIME AND CODEHEADS -- SO QUIT -- SAVE WHAT YOU CAN." -- chains rattling, maniacal laughter, MJ Thriller in the background) However, I'm getting more cynical by the day! Expect more Fed surprises/date re-adjustments as we move into '99.
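
Here's the sketch I mentioned, just to make concrete the kind of mismatch I mean (the pivot year below is made up, not SSA's actual scheme): picture one remediated routine that 'windows' two-digit years sitting next to one that was never touched.

    # Hypothetical sketch of a half-remediated system; the pivot year is made up.

    def remediated_year(yy, pivot=30):
        """Windowed fix: two-digit years below the pivot are read as 20xx."""
        return 2000 + yy if yy < pivot else 1900 + yy

    def unremediated_year(yy):
        """Untouched legacy logic: two-digit years are always read as 19xx."""
        return 1900 + yy

    yy = 0  # a record stamped '00', meaning the year 2000
    print(remediated_year(yy))    # 2000
    print(unremediated_year(yy))  # 1900
    # Any benefit or eligibility calculation fed by both routines is off by a
    # century -- which is how you end up with missing or incorrect checks.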

-- R. D..Herring (drherr@erols.com), October 25, 1998.

That's certainly an interesting counterargument ('quibble'). I don't have any evidence to suggest your analysis is incorrect. But if what you have suggested is true, then we are in even worse trouble than I had believed.

The toughest part of this nut to crack remains the difficulty in getting accurate and independently verified information with which to make some 'educated guesses'.

I'm not willing to simply trust what any company or organization is saying at this point. The risk is too great. The motivation for saying 'everything is OK' is particularly strong.

This environment makes choosing well much more difficult, especially for folks who will only listen to the 5-minute executive overview of Y2K.

-- Arnie Rimmer (arnie_rimmer@usa.net), October 25, 1998.

