Amazon Web Services, the world’s biggest and best-known cloud computing provider, once again had availability problems in its US-EAST region.
The outage stemmed from a network issue and lasted from 12:51 p.m. Pacific Time on Sunday until 1:42 p.m., according to the AWS status page. As of 3:23 p.m., AWS was reporting that most affected Elastic Compute Cloud instances were back up and running and that it was “continuing to work on a small number of instances and volumes that require additional maintenance before they return to normal performance.”
It’s hard to say how many sites were affected by the brief outage, but it appears Instagram was, at least minimally. One of the company’s engineers tweeted about the problems pretty early on.
Major shit going down in us-east-1 right now.
— Rick Branson (@rbranson) August 25, 2013
Numerous reports on Twitter noted that Flipboard and Vine were down, as well.
Instagram, Flipboard, and Vine are all down but the AWS dashboard says everything is ok. Something big has broken.
— Peter M. O’Donnell (@pmod) August 25, 2013
Airbnb acknowledged some problems, as well.
Apologies – Airbnb is among several sites & apps that are temporarily down due issues w/ Amazon servers. Investigating now & will update.
— Airbnb (@Airbnb) August 25, 2013
2013 has actually been a pretty good year thus far for AWS, with this being the first outage (at least that I recall; please correct me if I’m wrong) and a brief one at that. Last year, however, ended on a sour note with a Christmas Eve outage that even took down Netflix, a guiding light for all developers trying to build resilient applications atop Amazon’s seven-year-old but still-evolving cloud platform.
Last week, the Amazon.com front page went down for about 45 minutes and cost the company an estimated $5 million.
The US-EAST region is AWS’s most popular (and least expensive) region, which means that outages there often affect a large number of web sites and services. Best practices are beginning to emerge (from places other than Netflix even) about how to architect applications that survive an outage in one Availability Zone, but not everyone heeds this advice, and sometimes even the best-laid plans aren’t good enough.
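For readers wondering what surviving an outage in one Availability Zone can look like in practice, here is a minimal sketch in Python. It assumes an application already runs a replica in each of several zones and only has to decide which one to send traffic to. The `Endpoint` type, the `pick_endpoint` function, the zone labels, and the example.com URLs are all hypothetical, made up for illustration; this is not Netflix’s approach or an AWS API, and a real deployment would more likely lean on a load balancer with health checks.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Endpoint:
    zone: str  # illustrative zone label, e.g. "us-east-1a"
    url: str   # hypothetical address of the replica in that zone


def pick_endpoint(endpoints: Iterable[Endpoint],
                  is_healthy: Callable[[Endpoint], bool]) -> Optional[Endpoint]:
    """Return the first endpoint whose health check passes, or None."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A health check that raises is treated as an unhealthy zone.
            continue
    return None


# Hypothetical replicas, one per zone (assumed, not from the AWS report).
REPLICAS = [
    Endpoint("us-east-1a", "https://a.example.com"),
    Endpoint("us-east-1b", "https://b.example.com"),
    Endpoint("us-east-1c", "https://c.example.com"),
]

# Simulate an outage confined to a single zone.
down_zones = {"us-east-1a"}
chosen = pick_endpoint(REPLICAS, lambda e: e.zone not in down_zones)
print(chosen.zone if chosen else "no healthy zone")  # prints "us-east-1b"
```

The health check is passed in as a function so the simulated outage needs no network access; in a real service it would probably be an HTTP probe with a short timeout. Note that the sketch only helps when the failure is limited to one zone. If every replica fails its check, `pick_endpoint` returns `None`, which is the "best-laid plans" case.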
Here is the full AWS report on Sunday’s incident: