Tuesday, September 11, 2012

More on Glacier Pricing

I used our prototype long-term economic model to investigate Amazon's recently announced Glacier archival storage service. The results are less dramatic than the hype. In practice, and unlike S3, Glacier appears able to achieve long-term costs roughly the same as local storage, subject to some caveats. Follow me below the fold for the details.

Glacier's pricing model is complex, so I had to make some fairly heroic assumptions:
  • No accesses to the content other than for integrity checks.
  • Accesses to the content for integrity checking are generated at a precisely uniform rate. This is important because Glacier's data access charges for each month are based on the peak hourly rate during that month.
  • Each request is for 1GB of content. This is important because Glacier charges for each request in addition to the charge for the amount of data requested.
  • In each month no more than 5% of the content may be accessed without an access charge, but the requests to do so are charged the normal request fee.
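To make these assumptions concrete, here is a minimal Python sketch of the monthly integrity-check traffic they imply. It is an illustration, not the economic model itself, and it contains no prices. The function name, the 720-hour (30-day) month and the reading of 135TB as 135,000GB are my assumptions for the example.

```python
def access_profile(total_gb, check_interval_months, request_gb=1.0,
                   free_fraction=0.05, hours_per_month=720):
    """Summarize one month of integrity-check traffic.

    Assumes every object is checked exactly once per interval, accesses are
    spread perfectly uniformly, and each request fetches request_gb.
    hours_per_month=720 is an assumed 30-day month.
    """
    gb_per_month = total_gb / check_interval_months
    fraction = 1.0 / check_interval_months
    return {
        "gb_per_month": gb_per_month,
        "requests_per_month": gb_per_month / request_gb,
        # With a uniform rate, the peak hourly rate equals the average.
        "gb_per_hour": gb_per_month / hours_per_month,
        "fraction_of_content": fraction,
        "within_free_allowance": fraction <= free_fraction,
    }


# The 135TB example, taken as 135,000GB (an assumption), at two intervals.
for months in (20, 4):
    print(months, access_profile(135_000, months))
```

At a 20-month interval the monthly traffic is exactly 5% of the content, so it fits the free allowance. At a 4-month interval it is 25%, so most of it is chargeable.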
The graph shows five simulations, the two from my earlier post comparing S3 and local storage, and:
  • Glacier with no integrity checks.
  • Glacier with an integrity check of each object every 20 months.
  • Glacier with an integrity check of each object every 4 months.
Although the graph suggests that Glacier with 4-monthly integrity checks is competitive with local storage at all Kryder rates, this assumes that they both experience the same Kryder rate. If Glacier experiences Amazon's historic 3% rate and local storage the industry's projection of 20%, Glacier is nearly 2.5 times more expensive.
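To see why the Kryder rate matters so much, consider a deliberately crude sketch. It treats the Kryder rate as the annual percentage drop in the cost of storing a fixed amount of data, and adds up the undiscounted spend over a fixed horizon. This is not the economic model used for the graph: it ignores interest rates, the different starting prices of Glacier and local storage, and access charges, so it will not reproduce the 2.5 times figure. The function name and the 100-year horizon are assumptions for illustration.

```python
def cumulative_cost(initial_annual_cost, kryder_rate, years):
    """Total undiscounted spend when the annual cost falls by kryder_rate each year."""
    return sum(initial_annual_cost * (1 - kryder_rate) ** t for t in range(years))


# Same starting cost (1 unit per year), different rates of price decline.
slow = cumulative_cost(1.0, 0.03, 100)   # 3% per year
fast = cumulative_cost(1.0, 0.20, 100)   # 20% per year
print(f"3% rate: {slow:.1f} units, 20% rate: {fast:.1f} units")
```

Even from the same starting price, the slowly declining cost stream totals several times the rapidly declining one. That is why comparing the two services at a common Kryder rate flatters the one whose prices actually fall more slowly.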

If my surmise that Glacier's pricing will follow S3's is correct, then the only way to make Glacier competitive with local storage is to extend the interval between integrity checks enough that all accesses to data are covered by the 5% monthly free allowance.

The shortest interval that can possibly achieve this is 20 months, although in practice some margin for error would be needed and thus a more practical interval would be 24 months. The 20 months line in the graph suggests that this makes Glacier at a 3% Kryder rate somewhat cheaper than local storage at a 20% rate, but even if the assumptions above were to be true, this is not an apples-to-apples comparison.
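The 20-month figure follows directly from the 5% allowance: checking every object once takes at least 1/0.05 = 20 months if no month may exceed the allowance. A minimal Python sketch of that arithmetic, with hypothetical function names, using exact fractions to avoid floating-point rounding at the boundary:

```python
import math
from fractions import Fraction


def min_check_interval_months(allowance_percent):
    """Shortest whole-month interval at which checking every object once
    never exceeds the monthly free allowance (given as a whole percent)."""
    return math.ceil(Fraction(100, allowance_percent))


def monthly_fraction(interval_months):
    """Fraction of the content read each month at a uniform rate."""
    return Fraction(1, interval_months)


allowance = Fraction(5, 100)
print(min_check_interval_months(5))          # shortest feasible interval
print(float(monthly_fraction(24)))           # share read per month at 24 months
print(float(allowance - monthly_fraction(24)))  # headroom left at 24 months
```

At a 24-month interval about 4.2% of the content is read each month, leaving a little under one percentage point of headroom below the 5% allowance.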

The Blue Ribbon Task Force and other investigations of the sustainability of digital preservation emphasize that preservation cannot be justified as an end in itself, only as a way to provide access to the preserved content. The local disk case provides practical access; the Glacier case does not. The long latency between requesting access and obtaining it, together with the severe economic penalties for unpredictable or high-rate accesses, means that Glacier alone cannot be a practical digital preservation system. At least one copy of the content must be in a system that is capable of:
  • Providing low-latency access for users of the content. Otherwise the preservation of the content cannot be justified.
  • Being a source for bulk transfer of the content, for example to a Glacier competitor. Getting bulk data out of Glacier quickly is expensive, equivalent to between 5 months and a year of storage, which provides a powerful lock-in effect.
As an example of the costs of a practical system using Glacier but providing access and guarding against lock-in, suppose we maintain one copy of our 135TB example in Glacier with 20-month integrity checks experiencing a 3% Kryder rate, and one copy in local storage experiencing a 20% Kryder rate (instead of the three in our earlier local storage examples). The endowment needed would be $517K. The endowment needed for three copies in local storage at a 20% Kryder rate would be $486K, so the Glacier-based system costs about 6% more. Given the preliminary state of our economic model, this is not a significant difference. Replacing two copies in local storage with one copy in Glacier would not significantly reduce costs; it might even increase them slightly. Its effect on robustness would be mixed, with 4 versus 3 total copies (effectively triplicated in Glacier, plus local storage) and greater system diversity, but at the cost of less frequent integrity checks.

The cost penalties for peak access to storage and for small requests are large (see the difference between the 4-month and 20-month lines). If Glacier is not to be significantly more expensive than local storage in the long term, preservation systems that use it will need to be carefully designed to rate-limit accesses and to request data in large chunks.
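As a sketch of what such rate-limiting might look like, the Python below plans a month of integrity-check requests so that no hour carries more than one request above any other, which keeps the peak hourly rate as close as possible to the average. It only produces a plan and does not call any Glacier API. The function names, the 720-hour month and the chunk sizes are assumptions for illustration.

```python
import math


def requests_needed(gb_to_check, chunk_gb):
    """Number of requests needed to read gb_to_check in chunks of chunk_gb."""
    return math.ceil(gb_to_check / chunk_gb)


def hourly_schedule(num_requests, hours=720):
    """Spread num_requests over `hours` slots as evenly as possible.

    Returns a list of per-hour request counts whose maximum and minimum
    differ by at most one. hours=720 is an assumed 30-day month.
    """
    base, extra = divmod(num_requests, hours)
    schedule = []
    acc = 0
    for _ in range(hours):
        # Accumulator that spaces the `extra` requests evenly over the month.
        acc += extra
        if acc >= hours:
            acc -= hours
            schedule.append(base + 1)
        else:
            schedule.append(base)
    return schedule


# One month's share of a 135,000GB collection at a 20-month interval.
monthly_gb = 135_000 / 20
for chunk_gb in (1, 10):
    plan = hourly_schedule(requests_needed(monthly_gb, chunk_gb))
    print(chunk_gb, sum(plan), min(plan), max(plan))
```

Larger chunks cut the number of requests, and hence the per-request fees, while the even spread keeps the peak hourly rate that determines the month's data access charge close to its minimum.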
