Friday, September 14, 2012

Correction, please! Thank you!

I'm a critic of the way research communication works much of the time, so it is nice to draw attention to an instance where it worked well.

The UK's JISC recently announced the availability of the final version of a report the Digital Curation Centre put together for a workshop last March. "Digital Curation and the Cloud" cited my work on cloud storage costs thus:
David Rosenthal has presented a case to suggest that cost savings associated with the cloud are likely to be negligible, although local costs extend far beyond those associated just with procuring storage media and hardware.
Apparently this was not the authors' intention, but the clear implication is that my model ignores all costs for local storage except those "associated just with procuring storage media and hardware". Clearly, there are many other costs involved in storing data locally. But in the cited blog post I say, describing the simulations leading to my conclusion:
A real LOCKSS box has both capital and running costs, whereas a virtual LOCKSS box in the cloud has only running costs. For an apples-to-apples comparison, I need to compare cash flows through time.
Further, I didn't "suggest that cost savings associated with the cloud are likely to be negligible". In the cited blog post I showed that for digital preservation, not merely were there no "cost savings" from cloud storage but rather:
Unless things change, cloud storage is simply too expensive for long-term use.
Last weekend I wrote to the primary author asking for a correction. Here it is Friday and the report now reads:
David Rosenthal has presented a case to suggest that cloud storage is currently "too expensive for long term use" in comparison with the capital and running costs associated with local storage.
Kudos to all involved for the swift and satisfactory resolution of this issue. But, looking back at my various blog posts, I haven't been as clear as I should have been in describing the ways in which my model of local storage costs leans over backwards to be fair to cloud storage. Follow me below the fold for the details.

The model of local storage that generated this graph comparing local storage with S3 is based on published costs from Backblaze to build a 135TB storage pod, but then:

I would be interested to hear of real cost numbers that suggest that this model is unrealistic. It reflects total costs of ownership much higher than those claimed by Backblaze. It clearly does not reflect only costs "associated just with procuring storage media and hardware". The result of applying it to a comparison with S3 does not "suggest that cost savings associated with the cloud are likely to be negligible". It clearly shows that, given the historic rates of price drop both of cloud storage services such as S3 and of disks, cloud storage services are completely uncompetitive with local storage for the long term. The model suggests that, with these rates of price drop, S3 is at least 5 times as expensive as local storage.

This difference is despite my assumption that the only costs of cloud storage are the Amazon prices for storage and for a reserved AWS instance to perform integrity checks, with no other costs of any kind. This is unrealistically kind to cloud storage. For example, the model assumes that the data stays with Amazon for the entire 100 years of the simulation, whereas the local storage hardware changes every few years, incurring moving costs. Given S3's bandwidth charges, moving to a cheaper competitor after a few years would cause a significant increase in cost.
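The shape of this comparison can be sketched in a few lines of Python. This is a much simplified illustration, not the model used for the simulations. It assumes cloud storage is a pure running cost whose price falls at a fixed annual rate, and local storage is a capital cost repeated at each hardware replacement plus a flat running cost. All function names, parameters, and the numbers in the example run are hypothetical placeholders, not figures from Amazon, Backblaze, or the model.

```python
def cloud_cash_flows(years, price_per_year, annual_price_drop):
    """Yearly rental payments; the price falls by a fixed fraction each year."""
    flows = []
    price = price_per_year
    for _ in range(years):
        flows.append(price)
        price *= 1.0 - annual_price_drop
    return flows


def local_cash_flows(years, capital, running_per_year,
                     annual_price_drop, replace_every):
    """Capital cost at each hardware replacement plus a flat running cost.

    The capital cost of replacement hardware falls by a fixed fraction
    each year. The flat running cost is a simplifying assumption.
    """
    flows = []
    for year in range(years):
        flow = running_per_year
        if year % replace_every == 0:
            flow += capital * (1.0 - annual_price_drop) ** year
        flows.append(flow)
    return flows


def present_value(flows, discount_rate):
    """Discount each year's payment back to year 0 and sum."""
    return sum(f / (1.0 + discount_rate) ** t for t, f in enumerate(flows))


if __name__ == "__main__":
    # Placeholder inputs chosen only to exercise the code.
    cloud = cloud_cash_flows(years=100, price_per_year=100.0,
                             annual_price_drop=0.03)
    local = local_cash_flows(years=100, capital=100.0, running_per_year=10.0,
                             annual_price_drop=0.20, replace_every=4)
    ratio = present_value(cloud, 0.02) / present_value(local, 0.02)
    print(f"cloud / local present-value ratio: {ratio:.2f}")
```

The point of the sketch is structural: the outcome depends on how fast each price drops, so a cloud price that falls more slowly than disk prices pushes the ratio up over a long horizon.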

As I pointed out here, Amazon's recent announcement of Glacier, a product specifically aimed at long-term storage, and priced a factor of more than 5 below S3, shows that they understand that S3 and its competitors are not economic for long-term storage.

