On Tuesday, a mysterious problem at Amazon took out large patches of the internet, annoying almost everybody and uniting us all in a First World problem. Amazon fixed the issue, got the internet back up, and went hunting for the cause. And that cause turned out to be really embarrassing.
The Verge reports that Amazon was taking a few servers offline to tweak its billing system when a slip of the finger set off what amounts to a chain of digital dominoes that knocked over the whole system:
At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems.
In other words, some employee, who will probably be known forever as “Butterfingers” if they haven’t already been fired, mistyped a command and goodbye, internet. Fortunately, Amazon figured out what the problem was, but this is a good reminder to back up your stuff, because the internet is a fragile thing. So we should treasure it, even if it usually is terrible.
(Via The Verge)