On June 30th of this year, a nasty storm caused an ill-placed lightning strike to wreak havoc on a data center. For the millions of Americans engaged in their regular weekend routines of photo sharing, website pinning and action film viewing, things weren’t just disrupted – they were shut off for several hours. That bolt of errant electricity did its deed: it knocked out part of Amazon’s AWS and pulled customers Pinterest, Netflix and Instagram, along with others, offline for longer than anyone expected.
And Amazon hasn’t been the only Internet giant to get knocked around by cloud problems. Google Talk also suffered a serious outage in July, underscoring a fact that is gradually becoming more and more apparent to businesses – geographically centralized cloud services aren’t a good idea.
Don’t Put All Your Data in One Basket
As Google Fellow Urs Hölzle has previously observed, “At scale, everything breaks.” While this is a well-known truth, the recent AWS failure was a rather hard yank on the chain for many IT managers, many of whom were relying a bit too heavily on the cloud. However hyped it has been in the past, the cloud reliability kool-aid is starting to wear off.
Frankly, if a CTO has any concern about uptime, then they’re scrambling. At the heart of the matter is the need to avoid depending on any single data center (or in Amazon lingo, “Availability Zone”). If a disaster occurring in one location takes down the whole cloud service, then a business had better be thinking hard about a different deployment strategy.
Business intelligence advisor Jorn Bettin doesn’t have much sympathy for the companies affected by the AWS outage. He argues that Netflix, Instagram and Pinterest should have been creating what he terms “geographically redundant links.” Quoted in a ZDNet article, he states, “They could operate at a higher level of redundancy, so that these sort of outages would only have a minimal impact on them.”
The moral of the story? Ryan Shriver says it well in his post on the Virtualization Practice: “The simplest and most cost effective way is to deploy your applications across multiple, geographically disperse regions…a little extra cost and complexity is certainly preferable to an outage.”
Why Gridblaze Has Gone Global
Because of these concerns, we at Gridblaze have put some safety mechanisms in place to minimize the risk of localized failures. For one, we’re using an independent data center in another country. All data stored with us is mirrored at this second facility. If one storage center isn’t working, you’ll still have access through the mirror site.
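To make the mirroring idea concrete, here’s a minimal sketch of how writes and reads against a primary facility plus an independent mirror can work. The class and method names are purely illustrative – this is not Gridblaze’s actual API, just the general pattern of storing every object twice and falling back to the mirror on failure.

```python
class MirroredStore:
    """Illustrative sketch: every object is written to a primary
    facility and an independent mirror; reads fall back to the
    mirror when the primary is unavailable."""

    def __init__(self, primary, mirror):
        # Either backend can be any mapping-like storage client.
        self.primary = primary
        self.mirror = mirror

    def put(self, key, data):
        # A write only counts once both copies are stored.
        self.primary[key] = data
        self.mirror[key] = data

    def get(self, key):
        try:
            return self.primary[key]
        except (KeyError, ConnectionError):
            # Primary is down or missing the object:
            # serve the copy from the mirror site instead.
            return self.mirror[key]
```

Even this toy version shows the key property: losing the primary data center doesn’t lose the data, because every `put` already landed in the second facility.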
Second, we’re not relying on just one service provider. For that matter, we don’t even think two is enough. In order to get maximum resilience for our clients’ data storage and access, we’ve employed five isolated and independent providers.
The choice to use multiple providers isn’t necessarily the cheapest or the easiest, but it does have one very important thing going for it: it’s the safest. While there’s been a whole lot of talk about risk management and the cloud, most of the discussion has been focused on security and compliance issues. If nothing else, the AWS failure is bringing outage issues back into the conversation.
Reliable service doesn’t happen by accident though. Greg Arnette, who was also affected by Amazon’s outage, chastises his fellow cloud denizens, saying, “…what’s been missing is expert supervision and serious conversations about risk management…If you’re serious about your application’s SLA to your customers, you need to invest time and money.”
We Do Our Part, You Do Your Part
Minimizing cloud service downtime is a team effort. As the storage company, we’re making every effort to be certain that your files and data are there when you need them. But the second part of the picture is the Application layer. This also needs to be designed with geographical redundancy in mind.
We recommend that you make use of multiple service providers for maximum reliability.
Yeah, it’s going to be a bit of a headache, and you’re going to have to do some research to ensure data portability and reasonable data transfer costs. But in the end, that extra ounce of prevention is worth a pound of trying to “cure” a single-vendor outage, à la Google.
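At the application layer, “use multiple providers” mostly means failover logic: if one provider’s region is dark, try the next. The sketch below shows one simple way to do that. The provider callables and error handling here are hypothetical placeholders, not a specific vendor’s SDK.

```python
import random


class MultiProviderClient:
    """Illustrative failover client: try each storage provider in
    turn, so a single provider's outage doesn't take the app down."""

    def __init__(self, providers):
        # Each provider is a callable: key -> data,
        # raising ConnectionError when unreachable.
        self.providers = list(providers)

    def fetch(self, key):
        # Randomize the order so no single provider becomes
        # a permanent hot spot.
        order = random.sample(self.providers, len(self.providers))
        last_error = None
        for provider in order:
            try:
                return provider(key)
            except ConnectionError as err:
                last_error = err  # this provider is down; try the next
        raise RuntimeError("all providers unavailable") from last_error
```

The trade-off noted above applies here too: each extra provider adds integration work and transfer costs, but the request only fails when every provider is down at once.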
Now for the sake of complete honesty, we’re going to be upfront. At the end of the day, there’s no guaranteed, 100% failsafe way to prevent outages. That’s just the hard reality of doing business in the cloud world. Bugs happen. It’s impossible to test or plan for every contingency, and Murphy’s Law applies just as much to the cloud as to any other environment.
But all that being said, we can prepare for every known potential issue. And that’s what we’ve done at Gridblaze. By using multiple vendors for your data storage, even if one continent goes off the grid, you’ll still have access to everything you need.