Go Daddy’s CIO, Augustine Goldman, explained in an official blog post the events that led to the site’s breakdown. He confirmed that a perfect storm of network failures was responsible for the outage, and that the site was not hacked.
In the blog post, Goldman noted that the company’s DNS servers answer about 10 billion DNS queries every day across more than 41 million DNS zones.
The September 10 event, says Goldman, “pushed many of our routers beyond their capabilities.”
“There was not a single issue that caused the service disruption,” writes Goldman. “Rather, it was the combination of multiple factors. The combined factors that contributed to the service disruption were: Router memory exhaustion; Router hardware failure modes; Containment.”
The investigation report said that Go Daddy’s routing hardware failed to transfer a very large routing table to the Forwarding Information Base (FIB). The routing hardware fell back to software switching mode, and the routers’ CPUs failed to forward the packets.
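The failure mode described above can be illustrated with a toy model. This is only a rough sketch, not Go Daddy’s configuration or any vendor’s actual behavior: the `Router` class, the `fib_capacity` value and the route counts are all hypothetical, chosen to show how an oversized routing table could push a router from hardware forwarding into CPU-bound software switching.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy router with a fixed-size hardware FIB (hypothetical model)."""
    fib_capacity: int                      # assumed hardware limit, not a real figure
    routes: set = field(default_factory=set)

    def install(self, new_routes):
        """Add routes learned from the routing protocol to the routing table."""
        self.routes.update(new_routes)

    @property
    def forwarding_mode(self):
        # If the routing table no longer fits in the hardware FIB,
        # the router falls back to slower, CPU-based software switching.
        if len(self.routes) <= self.fib_capacity:
            return "hardware"
        return "software"


def demo():
    router = Router(fib_capacity=1000)
    # Normal operation: the table fits comfortably in hardware.
    router.install(f"10.0.{i // 256}.{i % 256}/32" for i in range(800))
    before = router.forwarding_mode
    # A burst of unexpected routes pushes the table past the FIB limit.
    router.install(f"172.16.{i // 256}.{i % 256}/32" for i in range(5000))
    after = router.forwarding_mode
    return before, after
```

In this sketch, `demo()` reports the forwarding mode before and after the burst of extra routes; once the table exceeds the assumed capacity, every packet has to go through the CPU.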
“Within minutes of the beginning of the event, a recovery procedure was executed and the errant routes were removed from the routing protocol of all of our routers,” writes Goldman. “The procedure relied on a standard response from the routers’ software – remove the routes from the FIB and begin forwarding in hardware again. This coupled with normal tiered DNS caching should have minimized any service disruption that could possibly have been caused by the change. This timeout mechanism did not execute.”
Go Daddy says it has filtered routing information from the network and restored the routing table. Services returned as it brought the pods back online.