Tuesday, August 21, 2012

Amazon's Announcement of Glacier

Today's announcement from Amazon of Glacier vindicates the point I made here, that the pricing model of cloud storage services such as S3 is unsuitable for long-term storage. In order to have a competitive product in the long-term storage market Amazon had to develop a new one, with a different pricing model; S3 wasn't competitive. Details are below the fold.

Glacier has some excellent features. In return for accepting a significant access latency (Amazon says typically 3.5 to 4.5 hours, but does not commit to a maximum latency), you get one low price of $0.01/GB/mo. You can access 5% of your data each month free, which provides for data integrity checks at 20-month intervals at no cost other than the EC2 compute needed to perform them. Beyond that allowance, each access costs about as much as a month's storage. There is also a $0.05 fee per 1,000 requests, making it important that archived objects be large.
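As a rough sketch of how these numbers interact, here is the arithmetic from the rates quoted above. The function names are illustrative, not any AWS API:

```python
# Back-of-envelope Glacier economics, using the rates quoted above:
# $0.01/GB/mo storage, $0.05 per 1,000 requests, 5%/mo free retrieval.
# Illustrative helpers only, not an AWS API.

STORAGE_PER_GB_MONTH = 0.01     # $/GB/month
REQUEST_FEE = 0.05 / 1000       # $ per retrieval request
FREE_FRACTION_PER_MONTH = 0.05  # free retrieval allowance

def monthly_storage_cost(total_gb):
    return total_gb * STORAGE_PER_GB_MONTH

def integrity_check_interval_months():
    # Reading 5% of the archive free each month means one full pass
    # over the data every 1 / 0.05 = 20 months.
    return 1 / FREE_FRACTION_PER_MONTH

def request_cost(num_objects):
    # At $0.05 per 1,000 requests, a petabyte stored as a few large
    # archives is far cheaper to touch than the same petabyte stored
    # as billions of small objects.
    return num_objects * REQUEST_FEE

# A petabyte (10^6 GB) costs $10,000/month to store. Retrieving it as
# 1,000 one-TB archives incurs $0.05 in request fees; as 10^9 one-MB
# objects, $50,000.
```

The request fee, not the per-byte price, is what punishes small objects.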

Amazon makes a pitch for Glacier in the digital preservation market:
Digital preservationists in organizations such as libraries, historical societies, non-profit organizations and governments are increasing their efforts to preserve valuable but aging digital content such as websites, software source code, video games, user-generated content and other digital artifacts that are no longer readily available. These archive volumes may start small, but can grow to petabytes over time. Amazon Glacier makes highly durable, cost-effective storage accessible for data volumes of any size. This contrasts with traditional data archiving solutions that require large scale and accurate capacity planning in order to be cost-effective.
They are right that Glacier's advantage in this market is that it avoids the need for capacity planning. The announcement does not, however, address the point I made here, which is that the long-term cost-competitiveness of cloud storage services such as Glacier depends not on their initial pricing, but on how closely their pricing tracks the Kryder's Law decrease in storage media costs. It is anyone's guess how quickly Amazon will drop Glacier's prices as the underlying storage media costs drop. But, based on the history of Amazon's pricing, my guess is that it will follow the pattern of S3's pricing. That is:
  • The initial price will be set low, even at a loss, so that competitors are deterred and Amazon captures the vast bulk of the market.
  • Subsequent media price drops will not be used to reduce prices all round, but will be used to create tiered prices, so that the part of the value of lower media prices not captured by Amazon itself goes to their largest customers.
Thus customers, especially those with short planning horizons, will get locked in to Glacier and will in the long term pay far more than they would have by doing it themselves. The lock-in is strengthened by the fact that getting bulk data out of Glacier quickly is expensive, equivalent to between 5 months and a year of storage. But it will be difficult for other vendors to compete with Amazon in this space; Amazon is trading on the short-termism identified in this paper from the Bank of England (PDF).

There is more detail on Werner Vogels' blog, and on the AWS developer blog. But note that none of these is specific about the technology underlying Glacier. The unspecified but hours-long access latency would allow Amazon to use tape robots, but The Register believes Glacier uses the same underlying disk storage as S3. If it does, one can see why Amazon wouldn't be specific about the technology: it would reveal that suspicions about the enormous margins Amazon enjoys on S3 are correct.


David. said...

Ian Adams points me to this article in Wired discussing the obscure details of the pricing structure for retrieving content from Glacier if you exceed the 5%/mo free allowance. As the article points out, the 5%/mo free allowance is also slightly misleading. It is more accurately described as a 0.17%/day allowance for requests for data to be retrieved. The message is that exceeding the limit will be expensive, but not as exorbitant as the wording might suggest.

Note that, from Amazon's point of view, this is good business model design, an effective lock-in. It is free to put stuff in, cheap to leave stuff there, free to access small amounts, but if you ever need to get at large amounts of data quickly, it will cost an arm and a leg. There are two reasons why you might want to get a lot of data quickly from a backup service. One is that you want to move to a competitor. Obviously Amazon wants to make that as expensive as possible. The other is that you just lost a lot of data and want it back from the backups ASAP. In that case you won't care how much it costs; Amazon would have you over a barrel.

Potential customers should think carefully before putting their leg into this trap.

Elliot Metsger said...

I'd be interested to know if there are any data or studies on the pricing/cost model of long-term storage for traditional storage systems and how they compare to cloud-based solutions. For example, what would the TCO of an IBM or Sun solution be compared to Amazon Glacier over 5, 10, 15 or 20 years?

David. said...

Elliot's comment misses the point I have been making, for example here. What matters when computing the endowment needed to store data for the long term (a more accurate concept than TCO) is not so much what the cost would be in the first year, but how rapidly the cost will drop over time.

We have more than 6 years of history showing that S3's prices have dropped at between 1/3 and 1/10 the rate of the underlying media. Why would you think that Glacier would be different?

Local storage, whether you pay through the nose to EMC or NetApp, or build it yourself like Backblaze, drops in price at close to the rate of the underlying media. Which one is going to win in the long run?
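The effect of the price-decline rate compounds over the years. A toy endowment-style calculation makes the point; the decline rates here are illustrative assumptions for the sketch, not measured S3 or Glacier figures:

```python
# Toy illustration of why the rate at which prices drop dominates
# long-term storage cost. Each year costs (1 - decline) times the
# year before; total cost is the sum of the geometric series.

def total_cost(first_year_cost, annual_decline, years):
    cost = 0.0
    yearly = first_year_cost
    for _ in range(years):
        cost += yearly
        yearly *= (1 - annual_decline)
    return cost

# Same $10,000 first-year price, 20 years of storage.
# A media-tracking decline (assume 30%/yr) versus a service dropping
# prices at a tenth of that rate (3%/yr):
fast = total_cost(10_000, 0.30, 20)   # ~$33,000
slow = total_cost(10_000, 0.03, 20)   # ~$152,000
```

Under these assumed rates the slow-declining service costs more than four times as much over two decades, despite identical first-year pricing.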

David. said...

Quantum matches Amazon's price, at least for larger customers.

David. said...

Henry Newman points me to an unofficial Glacier cost calculator. I believe, based on the Glacier FAQ, that this calculator is incorrect and will greatly underestimate the actual cost in practice. The FAQs say:

"If, during a given month, you do exceed your daily allowance, we calculate your fee based upon the peak hourly usage from the days in which you exceeded your allowance."

As I understand it, if you exceed the request rate you are charged for the entire month as if you had sustained your peak rate for the entire month. Unless your accesses to data in Glacier are highly predictable, and scheduled at a slow enough rate to stay under your allowance, retrievals will quickly get very expensive.
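On that reading of the FAQ, the retrieval fee looks something like the sketch below. This is my interpretation only, and the $0.01/GB retrieval rate is an assumption for illustration; the exact formula is Amazon's to define:

```python
# Sketch of the retrieval-fee interpretation described above: you are
# billed as if your single worst (peak) hour's retrieval rate, above
# the free allowance, had been sustained for the whole month.
# The $0.01/GB retrieval rate is an assumption for illustration.

HOURS_PER_MONTH = 720
RETRIEVAL_PER_GB = 0.01  # assumed $/GB retrieval rate

def retrieval_fee(peak_gb_per_hour, free_gb_per_hour):
    billable = max(0.0, peak_gb_per_hour - free_gb_per_hour)
    return billable * RETRIEVAL_PER_GB * HOURS_PER_MONTH

# One burst hour at 100 GB/hr against a 10 GB/hr allowance is billed
# as if it ran all month: 90 GB/hr * $0.01 * 720 hrs = $648.
```

This is why the comment above stresses scheduling retrievals at a slow, predictable rate: a single fast burst sets the bill for the whole month.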

euanc said...

Hi David,

Not sure if you are aware but Google have also announced a similar offering: http://techcrunch.com/2012/11/26/google-drops-pricing-on-cloud-storage-20-adds-new-features-in-advance-of-rival-amazons-first-big-cloud-summit

"Durable Reduced Availability Storage".



David. said...

Thanks, Euan.

Google's prices for their regular storage offering are closer to competitive with S3.

But Google is not serious about competing with Glacier. The DRA per-byte prices are between 5 and 7 times Glacier's. I need to study their access charges, but it's hard to believe they are cheap enough to make up the difference.