The UK's JISC recently announced the availability of the final version of a report the Digital Curation Centre put together for a workshop last March. "Digital Curation and the Cloud" cited my work on cloud storage costs thus:
David Rosenthal has presented a case to suggest that cost savings associated with the cloud are likely to be negligible, although local costs extend far beyond those associated just with procuring storage media and hardware.
Apparently this was not the authors' intention, but the clear implication is that my model ignores all costs of local storage except those "associated just with procuring storage media and hardware". Clearly, there are many other costs involved in storing data locally. But in the cited blog post, describing the simulations leading to my conclusion, I say:
A real LOCKSS box has both capital and running costs, whereas a virtual LOCKSS box in the cloud has only running costs. For an apples-to-apples comparison, I need to compare cash flows through time.
Further, I didn't "suggest that cost savings associated with the cloud are likely to be negligible". In the cited blog post I showed that, for digital preservation, cloud storage offered not merely no "cost savings"; rather:
Unless things change, cloud storage is simply too expensive for long-term use.
Last weekend I wrote to the primary author asking for a correction. Here it is Friday, and the report now reads:
David Rosenthal has presented a case to suggest that cloud storage is currently "too expensive for long term use" in comparison with the capital and running costs associated with local storage.
Kudos to all involved for the swift and satisfactory resolution of this issue. But, looking back at my various blog posts, I haven't been as clear as I should have been in describing the ways in which my model of local storage costs leans over backwards to be fair to cloud storage. Follow me below the fold for the details.
The model of local storage that generated this graph comparing local storage with S3 is based on published costs from Backblaze to build a 135TB storage pod, but then:
- The disk drive costs are increased by 60% to account for the effects of the Thai floods.
- The total hardware cost for a 135TB pod is multiplied by three to reflect three geographically separated copies.
- Running costs are added, amounting to twice the purchase cost of the three pods over three years, reflecting published cost figures from San Diego Supercomputer Center and Google. Note that Backblaze's running costs are much lower than this.
- When new hardware is purchased, move-in costs of 20% of the purchase cost are added.
- The hardware is replaced at least every 4 years, and sooner if it makes economic sense; each replacement adds move-out costs of 20% of the hardware's initial purchase cost.
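The adjustments listed above can be sketched as a function that turns one generation of pod hardware into a stream of yearly cash flows. This is a minimal illustration, not the code behind the graph: the function names, the split of one pod's price into disk and non-disk parts, the discount rate and the dollar figures are hypothetical, and running costs are assumed to accrue evenly each year at two-thirds of the purchase cost. Deciding when an earlier replacement "makes economic sense" is left to the caller.

```python
from typing import List

# Multipliers from the adjustments listed above.
DISK_MARKUP = 1.6          # disk drive prices raised 60% for the Thai floods
COPIES = 3                 # three geographically separated pods
RUNNING_MULTIPLE = 2.0     # running costs equal twice the purchase cost...
RUNNING_PERIOD_YEARS = 3   # ...spread over three years
MOVE_IN = 0.2              # move-in cost as a fraction of purchase cost
MOVE_OUT = 0.2             # move-out cost as a fraction of purchase cost
MAX_LIFE_YEARS = 4         # hardware is replaced at least every 4 years


def pod_generation_cash_flows(disk_cost: float, other_cost: float,
                              life_years: int = MAX_LIFE_YEARS) -> List[float]:
    """Hypothetical helper: yearly outlays for one generation of three pods.

    disk_cost and other_cost are the pre-flood disk and non-disk parts of one
    pod's price. Year 0 carries the purchase and move-in; the final year
    carries the move-out.
    """
    if not 1 <= life_years <= MAX_LIFE_YEARS:
        raise ValueError("life_years must be between 1 and 4")
    purchase = COPIES * (disk_cost * DISK_MARKUP + other_cost)
    running = purchase * RUNNING_MULTIPLE / RUNNING_PERIOD_YEARS
    flows = [running] * life_years
    flows[0] += purchase * (1 + MOVE_IN)  # buy the pods and move data in
    flows[-1] += purchase * MOVE_OUT      # move data out before retirement
    return flows


def present_value(flows: List[float], discount_rate: float) -> float:
    """Discount yearly cash flows back to year 0 so they can be compared."""
    return sum(f / (1 + discount_rate) ** t for t, f in enumerate(flows))
```

For example, `present_value(pod_generation_cash_flows(5000.0, 2500.0), 0.03)` gives the discounted cost of one four-year generation under these made-up prices. Chaining generations through time and discounting the cloud's yearly charges at the same rate yields the kind of apples-to-apples comparison of cash flows described earlier.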
The cost difference shown in the graph arises despite my assumption that the only costs of cloud storage are Amazon's prices for storage and for a reserved AWS instance to perform integrity checks, with no other costs of any kind. This is unrealistically kind to cloud storage. For example, the model assumes that the data stays with Amazon for the entire 100 years of the simulation, whereas the local storage hardware changes every few years, incurring moving costs. Given S3's bandwidth charges, moving to a cheaper competitor after a few years would significantly increase the cost of cloud storage.
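To make the generosity of this assumption concrete, here is a minimal sketch of the cloud side's yearly cash flows: storage plus one reserved instance, and nothing else. The function name, its parameters and the figures in the usage lines are hypothetical placeholders, not Amazon's published prices. The optional `switch_year` shows how a one-off bandwidth charge for leaving S3 would add to the total, a charge the simulation never levies.

```python
from typing import List, Optional


def cloud_cash_flows(tb_stored: float, price_per_tb_year: float,
                     instance_per_year: float, years: int,
                     switch_year: Optional[int] = None,
                     egress_per_tb: float = 0.0) -> List[float]:
    """Hypothetical helper: yearly cloud outlays under generous assumptions.

    Each year pays only for storage plus one reserved instance that runs the
    integrity checks. If switch_year is given, a one-off bandwidth charge for
    moving all the data to another provider is added in that year.
    """
    flows = [tb_stored * price_per_tb_year + instance_per_year] * years
    if switch_year is not None:
        if not 0 <= switch_year < years:
            raise ValueError("switch_year must fall within the simulated years")
        flows[switch_year] += tb_stored * egress_per_tb
    return flows


# Placeholder figures, not Amazon's actual prices.
stay = cloud_cash_flows(135.0, 100.0, 1000.0, years=10)
move = cloud_cash_flows(135.0, 100.0, 1000.0, years=10,
                        switch_year=3, egress_per_tb=50.0)
print(sum(move) - sum(stay))  # extra cost of leaving after year 3: 6750.0
```

Because the sketch keeps the same per-TB price after the switch, it isolates the bandwidth charge; a cheaper competitor would also lower the later years, which is the trade-off the switch would have to justify.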
As I pointed out here, Amazon's recent announcement of Glacier, a product specifically aimed at long-term storage, and priced a factor of more than 5 below S3, shows that they understand that S3 and its competitors are not economic for long-term storage.