How and Why the Internet Broke


By Michelle Delio

9:35 a.m. Oct. 4, 2002 PDT

The Internet was very confused on Thursday.

But cyberspace hasn't gone senile. Those massive e-mail delays, slow Internet connections and downed e-businesses were all caused by a software upgrade that went horribly wrong at WorldCom's UUNet division, a large provider of network communications.

The problem affected roughly 20 percent of UUNet's U.S. customers -- which translates to millions of users across the United States and around the world -- for most of Thursday, according to WorldCom spokeswoman Jennifer Baker.

The problem began around 8 a.m. EDT. Baker said in a statement that the company had fully restored service by 5:15 p.m. Thursday. Preliminary investigation by UUNet indicates the problems were caused by "a route table issue."

Sounds simple, but imagine an airport that's having an air traffic controller issue, and you'll have an idea of what happened at UUNet.

Route tables direct data from one major network to another or from one area of a network to another area.
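
A rough way to picture a route table is as a lookup from destination prefixes to next hops, where the most specific matching prefix wins. The sketch below is a minimal illustration only: the prefixes, the hop names and the `next_hop` helper are all made up for this example and are not taken from UUNet's equipment or configuration.

```python
import ipaddress

# Hypothetical route table: destination prefix -> next hop name.
ROUTE_TABLE = {
    "0.0.0.0/0": "upstream-default",
    "10.0.0.0/8": "core-router-a",
    "10.20.0.0/16": "edge-router-b",
}


def next_hop(destination, table):
    """Return the next hop for the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    best_net, best_hop = None, None
    for prefix, hop in table.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching entry with the longest prefix seen so far.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, hop
    return best_hop


print(next_hop("10.20.5.1", ROUTE_TABLE))  # matches all three entries; /16 wins
print(next_hop("10.99.0.1", ROUTE_TABLE))  # matches /8 and the default
print(next_hop("192.0.2.7", ROUTE_TABLE))  # only the default route matches
```

If one entry in such a table is wrong, lookups still return an answer, so nothing looks broken even though traffic is being sent the wrong way.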

UUNet is a vast, high-speed network. About half of the world's Internet traffic -- including about 70 percent of all e-mails sent within the United States and half of all e-mails sent in the world -- passes through UUNet. The backbone of the Internet is built from these large networks.

The Internet was designed to be fault tolerant, to route information around downed or clogged networks. But when the route tables that direct the data aren't accurate, "bedlam reigns on the network," according to Mike Sweeney, owner of the network consulting firm Packetattack.com.

According to networking experts, a "soft error" -- like a badly configured routing table -- is far worse than physical damage to equipment, because things appear to be working fine, at least for a while.

Luckily, in many cases a soft error is relatively easy to fix, since normally only one or two routers are upgraded at a time.

"But in the case of UUNet, they changed the software on a lot of routers all at once, so any fault tolerance they had fell by the wayside as each router broke due to the bad software load or incorrect configuration," Sweeney said.

As the affected routers dropped offline yesterday, UUNet's response time got slower and slower to the point of failure.

"Other UUNet routers might have tried to pick up the load, but they would have quickly been overwhelmed by the volume of data, and they too would have slowed down," Sweeney said.

"It would be like a 10-lane freeway being blocked in both directions and yet all the traffic still trying to get from here to there using the side streets. It works for a short period, and then you end up with gridlock and nobody getting through."

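Sweeney's freeway picture can be sketched as a toy simulation, assuming the load is spread evenly over whichever routers are still up and that a router drops out once its share exceeds its capacity. The router names, the capacities and the `cascade` function are invented for illustration; this is not a model of UUNet's actual network.

```python
def cascade(capacities, total_load, failed):
    """Spread total_load evenly over surviving routers, drop any that are
    overloaded, and repeat. Returns the list of survivors after each round."""
    alive = {name for name in capacities if name not in failed}
    history = [sorted(alive)]
    while alive:
        share = total_load / len(alive)
        overloaded = {name for name in alive if capacities[name] < share}
        if not overloaded:
            break  # the remaining routers can carry the load
        alive -= overloaded
        history.append(sorted(alive))
    return history


# Hypothetical capacities, in arbitrary traffic units.
capacities = {"a": 25, "b": 25, "c": 30, "d": 45, "e": 60}

# One router down: the other four absorb the load.
print(cascade(capacities, 100, {"a"}))

# Two routers down: each round overloads another router until none are left.
print(cascade(capacities, 100, {"a", "b"}))
```

In this toy setup, losing one router is absorbed by the rest, while losing two tips the remaining routers over one after another.
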
Network experts were troubled by UUNet's choice to deploy a wide-scale upgrade without testing and retesting the configurations first.

"You have to test, test, test before you change configs," Mark Denham, a Toronto networking consultant, said. "And you really don't want to upgrade an entire huge system like UUNet all at once, if you can avoid doing so. It's insanely difficult to track down an error that could be hiding anywhere on a gigantic system."

"And you should always have an escape route handy in case everything goes to hell," Sweeney added. "A spare device, saved configurations, anything to get the network back and working quickly if the upgrade goes badly."

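The advice from Denham and Sweeney amounts to upgrading in small batches, checking each batch, and keeping saved configurations to fall back on. A minimal sketch of that idea follows, assuming routers can be modeled as a dictionary of configurations; `staged_upgrade`, the `health_check` callback and the sample data are hypothetical and do not describe any real router software.

```python
import copy


def staged_upgrade(routers, new_config, health_check, batch_size=1):
    """Apply new_config one batch at a time; restore saved configs on failure.

    routers: dict mapping router name -> config dict (hypothetical model).
    health_check: callable(name, config) -> bool, supplied by the caller.
    Returns True if every router was upgraded, False if changes were rolled back.
    """
    saved = copy.deepcopy(routers)  # the "escape route": saved configurations
    names = list(routers)
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        for name in batch:
            routers[name] = copy.deepcopy(new_config)
        if not all(health_check(name, routers[name]) for name in batch):
            # Restore every router touched so far, including this batch.
            for name in names[:start + batch_size]:
                routers[name] = saved[name]
            return False
    return True


# Example: the upgrade fails its check on r2, so r1 and r2 are restored
# and r3 is never touched.
routers = {"r1": {"version": 1}, "r2": {"version": 1}, "r3": {"version": 1}}
ok = staged_upgrade(routers, {"version": 2}, lambda name, cfg: name != "r2")
print(ok, routers)
```

Because only one batch changes at a time, a bad configuration shows up while most of the network is still running the old one.
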
Some UUNet users said that their problems on Thursday went far deeper than slow e-mail and sluggish Internet connections.

Any Internet-based business hosted by WorldCom's service was hit hard. Not only were users unable to access the Internet, but at times their customers would have been unable to purchase goods, book travel and rental car reservations, or carry out other normal business activities.

WorldCom, which filed for Chapter 11 bankruptcy protection after a major corporate accounting scandal, claims that 60 percent of Fortune 1000 businesses use its UUNet network services.

-- Anonymous, October 05, 2002

Answers

Evidently using the same principles they have for accounting...

-- Anonymous, October 07, 2002
