Thursday, November 14, 2013

Estimating Storage Costs

Ethan Miller points me to a paper on the cost of storage, How Much Does Storage Really Cost? Towards a Full Cost Accounting Model for Data Storage by Amit Kumar Dutta and Ragib Hasan (DH) of the University of Alabama, Birmingham. Unfortunately, the conference at which it was presented, GECON 2013, is one of those whose proceedings are published in Springer's awful Lecture Notes in Computer Science series, so no link. Below the fold, discussion of the relationship between DH and our on-going work on the economics of long-term storage.

DH correctly cite our paper for the UNESCO Memory of the World conference:
Rosenthal et al. discussed the economics of long term digital storage with respect to Kryder’s law, various storage business models, and the value of cloud for digital preservation [31]. They encouraged [sic] to develop an accounting model to properly recognize the long-term cost of ownership of preserved data, ... However, their work also does not include hidden and indirect environmental costs of data storage and disposal costs. Our work complements the limitations of these models by considering both direct and indirect determinants of storage cost.
They correctly identify some of the differences between their work and ours, and it is true that their cost estimates are much more detailed than my deliberately crude models. But they acknowledge only one of the conclusions of the paper, and mis-characterize it:
and utilized current low interest rates to invest on [sic] solid state technologies which despite of [sic] their higher capital cost, are likely to have a lower total cost than disk. At the same time, solid state technologies retain its [sic] fast rapid access.
What we wrote was:
If organizations can change their accounting methods to properly recognize the long-term cost of ownership of preserved data, current low interest rates provide an opportunity to invest in solid state technologies which, despite their higher capital cost, are for this decade likely to provide lower total cost than disk, while retaining its rapid access.
Earlier in the paper we pointed out how unlikely the premise of this conclusion was:
Unfortunately, for an organization to justify investing in solid state storage on this basis requires that it have both a long enough planning horizon and an accounting policy that distinguishes between capital and operating costs. Many organizations lack both; for example most University libraries run on annual budget cycles, are not allowed to carry reserves from year to year, and cannot borrow to finance equipment purchases. Thus, even if solid state storage could offer lower total cost of ownership over say 5 years, they would be unable to invest to capture these savings. This is an example of the problem of short-termism identified by Haldane and Davies.
The most important difference is that DH are concerned solely with estimating the current cost of storage and completely ignore the question of the long-term cost, which is our focus. The reason is probably that they blithely accept the continuation of Kryder's Law:
As storage cost is continuing to drop by roughly 50% every 18 months [1], we can observe two effects: storage appears to be free or very cheap, and there is an illusion of infinite storage.
Their reference [1] is to an Intel marketing website touting Moore's Law, which would be relevant only if all the world's storage were solid state, which it will not be in the foreseeable future. A less marketing-oriented view of the status of Moore's Law is far less sanguine. Our UNESCO paper, which they cite, lays out in detail the prospects for Kryder's Law's historic rate of cost decrease to continue. They are so dim that not even the traditionally optimistic trade press believes it will happen.
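To make concrete how much work DH's assumption is doing, here is a minimal Python sketch (mine, not from either paper). It shows what a constant halving period implies for the annual rate of cost decrease and for cost after a decade. The 18-month halving period is DH's. The 3- and 5-year periods are hypothetical slower rates picked only for comparison, not forecasts.

```python
def cost_multiplier(years: float, halving_years: float) -> float:
    """Fraction of today's cost per byte remaining after `years` years,
    if cost halves every `halving_years` years."""
    return 0.5 ** (years / halving_years)


def annual_drop(halving_years: float) -> float:
    """Equivalent constant fractional decrease in cost per byte per year."""
    return 1.0 - 0.5 ** (1.0 / halving_years)


if __name__ == "__main__":
    # 1.5 years is DH's assumption; 3 and 5 years are hypothetical.
    for halving in (1.5, 3.0, 5.0):
        print(f"halving every {halving} yr: "
              f"{annual_drop(halving):.1%} drop/yr, "
              f"after 10 yr cost is {cost_multiplier(10, halving):.3f} of today's")
```

An 18-month halving period means cost per byte drops about 37% a year and, after ten years, to under 1% of today's. A halving period of three years leaves cost at about 10% of today's after a decade, an order of magnitude more.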

Because DH believe that storage will continue to halve in cost every 18 months, they do not believe that long-term cost is a significant issue. Thus they do not see that our main goal is completely different from theirs. We aim to build a simulation that allows the investigation of different scenarios for Kryder's Law and other parameters, and that places the results of each scenario on a comparable basis by computing the endowment needed.
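The endowment idea can be illustrated with a deliberately crude Python sketch, far simpler than our simulation. It rests on several assumptions. The first year's storage cost is known. Cost per byte falls at a constant annual Kryder rate. The endowment earns a constant real interest rate. Costs are paid once a year over a fixed horizon. These simplifications are for illustration only, and the parameter values in the usage lines are hypothetical.

```python
def endowment(first_year_cost: float, kryder_rate: float,
              interest_rate: float, years: int) -> float:
    """Money needed today to pay for storing data for `years` years.

    Simplifying assumptions (illustrative only, not our simulation):
      first_year_cost -- cost of storing the data for year 0
      kryder_rate     -- constant fractional drop in cost per year
      interest_rate   -- constant real return earned by the endowment
    Costs are paid at the start of each year and discounted to today.
    """
    total = 0.0
    for year in range(years):
        cost = first_year_cost * (1.0 - kryder_rate) ** year
        total += cost / (1.0 + interest_rate) ** year
    return total


if __name__ == "__main__":
    # Hypothetical parameters: 100-year horizon, 2% real interest.
    # 0.37/yr approximates DH's "halving every 18 months"; 0.10/yr is a
    # hypothetical slower rate for comparison.
    for rate in (0.37, 0.10):
        e = endowment(1.0, rate, 0.02, 100)
        print(f"Kryder rate {rate:.0%}: endowment = {e:.2f} x first-year cost")
```

Take DH's implied 37% annual drop and a hypothetical 2% interest rate. The endowment for a century is then only about 2.6 times the first year's cost, which is why long-term cost looks insignificant from their viewpoint. At a hypothetical 10% annual drop it is about 8.5 times, more than three times as much.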

Even as regards the current cost of storage, DH's analysis is suspect:
  • They do not consider bandwidth and per-request costs, so they are in effect costing archival storage.
  • Their baseline hardware cost is from 2011 at about $426/TB of raw capacity. Compare that with Backblaze's current cost of about $60/TB of raw capacity. Paying seven times the capital cost will definitely skew the estimates.
  • Their baseline ignores RAID overhead, assuming that the full raw capacity is available for storage, even though their hardware includes RAID controllers. This means that their per-byte costs are an under-estimate.
  • They use the University of Alabama's costs for single-copy storage and claim that these are lower than S3's, but they do not allow for the facts that S3 (a) stores three copies, not one, and (b) is not intended for archival storage.
  • A more realistic comparison would have been with S3's Reduced Redundancy Storage (RRS). RRS costs 70.74×10³ picocent/byte/yr vs. their (under-) estimate of the University of Alabama's costs of 71.51×10³ picocent/byte/yr.
  • If they really meant to ignore bandwidth and per-request costs, they should have compared three times the University of Alabama's single-copy costs (214.53×10³ picocent/byte/yr) with Amazon's Glacier (12×10³ picocent/byte/yr). Glacier is intended for archival storage, stores three copies, and is about 18 times cheaper.
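The arithmetic in the list above is easy to check. Here is a small Python sketch using only the figures quoted there. It also converts picocent/byte/yr into the more familiar $/TB/yr, assuming decimal terabytes, and shows how RAID overhead raises the cost per usable byte. The 15-drive RAID-6 group with two parity drives is a hypothetical geometry chosen for illustration, since DH's RAID configuration is not given here.

```python
# Figures quoted above. Storage prices are in picocent/byte/yr.
UAB_SINGLE_COPY = 71.51e3   # DH's (under-)estimate, one copy
S3_RRS = 70.74e3            # S3 Reduced Redundancy Storage
GLACIER = 12e3              # Amazon Glacier

# Raw hardware capital cost in $/TB.
DH_HW_PER_TB = 426.0        # DH's 2011 baseline
BACKBLAZE_PER_TB = 60.0     # Backblaze's current cost


def picocent_to_dollars_per_tb_yr(p: float) -> float:
    """Convert picocent/byte/yr to $/TB/yr.

    1 picocent = 1e-12 cent = 1e-14 dollar; 1 TB taken as 1e12 bytes.
    """
    return p * 1e12 * 1e-14


def cost_per_usable_tb(raw_cost_per_tb: float, drives: int, parity: int) -> float:
    """Cost per usable TB when `parity` of `drives` drives hold redundancy."""
    return raw_cost_per_tb * drives / (drives - parity)


if __name__ == "__main__":
    print(f"capital cost ratio: {DH_HW_PER_TB / BACKBLAZE_PER_TB:.1f}x")
    three_copies = 3 * UAB_SINGLE_COPY
    print(f"UAB x3: {three_copies:.0f} picocent/byte/yr, "
          f"{three_copies / GLACIER:.1f}x Glacier")
    for name, p in [("UAB single copy", UAB_SINGLE_COPY),
                    ("S3 RRS", S3_RRS), ("Glacier", GLACIER)]:
        print(f"{name}: ${picocent_to_dollars_per_tb_yr(p):.2f}/TB/yr")
    # Hypothetical RAID-6 group of 15 drives, 2 of them parity.
    print(f"DH baseline per usable TB: "
          f"${cost_per_usable_tb(DH_HW_PER_TB, 15, 2):.2f}")
```

For example, Glacier's 12×10³ picocent/byte/yr works out to $120/TB/yr, or one cent per GB per month.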
These more realistic comparisons would have shown that their cost estimates were vastly too high even as regards non-archival storage, and even more out of line as regards archival storage. DH might argue that this shows that Amazon and competing cloud suppliers are not covering the full costs of their product. Clearly, these suppliers are imposing some externalities on the public, from tax subsidies from state and local governments to energy subsidies. But I do not believe that these externalities amount to two or more times the price that Amazon charges, which they would have to if DH's estimates were to be credible.
