Money turns out to be the major problem facing the future of our digital heritage. Paper survives benign neglect very well, but bits are very vulnerable to interruptions in the money supply. No-one has enough money to preserve even a fraction of the content worthy of preservation. Broadly speaking, the extensive research on the cost history of preservation concludes that about half the money has been spent ingesting an object, about a third storing it and about a sixth disseminating it. If storage has been only a third of the cost, why are we building a model of it?
The reason is Kryder's Law, the analog of Moore's Law for disk. There is a 30-year history of disk prices dropping about 40% per year. Figures from the San Diego Supercomputer Center show that media are about 1/3 of the total storage cost, the rest being power, cooling, space, staff and so on. But these costs are almost completely per-drive, not per-byte, so the total per-byte cost dropped in line with media costs, and customers got roughly double the capacity for the same price every two years. Thus the cost of storing a given digital object rapidly became negligible. The perception was that the difference between storing an object for a few years and storing it forever was too small to worry about. Kryder's Law has held for three decades; surely it is good for another decade or two?
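To see why "forever" seemed almost free, here is a simplified calculation. It assumes, hypothetically, that keeping an object costs 1 unit in the first year and that this annual cost falls by a constant Kryder rate k every year, ignoring interest. Storing the object forever is then an infinite geometric series summing to 1/k.

```python
from typing import Optional


def storage_cost(kryder_rate: float, years: Optional[int] = None) -> float:
    """Total cost of storing an object, in units of its first-year cost.

    Assumes the annual cost falls by the constant fraction `kryder_rate`
    each year and ignores interest. years=None means "forever".
    """
    if not 0 < kryder_rate < 1:
        raise ValueError("kryder_rate must be strictly between 0 and 1")
    remaining = 1 - kryder_rate  # this year's cost as a fraction of last year's
    if years is None:
        # Infinite geometric series: 1 + q + q^2 + ... = 1 / (1 - q) = 1 / k
        return 1 / kryder_rate
    # Finite geometric series with `years` terms
    return (1 - remaining ** years) / kryder_rate


for k in (0.40, 0.20):
    few = storage_cost(k, years=3)
    forever = storage_cost(k)
    print(f"k={k:.0%}: 3 years={few:.2f}, forever={forever:.2f}, "
          f"ratio={forever / few:.2f}")
```

Under these assumptions, at 40%/yr storing forever costs only about 1.28 times as much as storing for three years; at 20%/yr it costs about twice as much.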
Here is XKCD's explanation. It is always tempting to think that exponential curves will continue, but in the real world they are always just the steep part of an S-curve.
Note how Dave's graph shows Perpendicular Magnetic Recording (PMR) being replaced by Heat Assisted Magnetic Recording (HAMR) starting in 2009. No-one has yet shipped HAMR drives. If we had stayed on the Kryder's Law curve we should have had 4TB 3.5" SATA drives in 2010. Instead, in late 2012 the very first 4TB drives are just hitting the market.
It was clear by mid-2011 that the industry had fallen off the Kryder curve. That was before the floods in Thailand destroyed 40% of the world's disk manufacturing capacity and doubled disk prices almost overnight. Prices are still about 60% more than they were before the floods and they are not expected to return to pre-flood levels until 2014. By then they should have been 50% lower. The latest industry projections are for no more than 20% per year improvement in bit density over the next 5 years. In our paper you will find a long list of reasons why, even if this is correct, it may not result in a 20%/yr drop in price. These include industry consolidation and the shift from a 3.5" to a 2.5" form factor.
Bill McKibben's Rolling Stone article "Global Warming's Terrifying New Math" uses three numbers to illustrate the looming climate crisis. Here are three numbers that illustrate the looming crisis in the cost of long-term storage:
- According to IDC, demand for storage grows about 60% per year.
- According to IHS iSuppli, the bit density on the platters of disk drives will grow no more than 20%/year for the next 5 years.
- According to computereconomics.com, IT budgets in recent years have grown between 0%/year and 2%/year.
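A back-of-the-envelope sketch of what these three numbers imply together. The simplifying assumption is ours, not the sources': cost per byte falls exactly in step with bit density (as noted above, it may not). Each year the spend needed then grows by 1.6 / 1.2 while the budget grows by at most 1.02.

```python
def storage_share_of_budget(years: int,
                            demand_growth: float = 0.60,
                            density_growth: float = 0.20,
                            budget_growth: float = 0.02) -> list[float]:
    """Storage spend as a fraction of the IT budget, relative to today (1.0).

    Assumes cost per byte is divided by (1 + density_growth) each year.
    """
    yearly_factor = ((1 + demand_growth) / (1 + density_growth)
                     / (1 + budget_growth))
    shares = [1.0]
    for _ in range(years):
        shares.append(shares[-1] * yearly_factor)
    return shares


for year, share in enumerate(storage_share_of_budget(5)):
    print(f"year {year}: {share:.2f} times today's share")
```

With these numbers the factor is about 1.31 per year, so after five years storage would consume roughly 3.8 times its current share of the budget.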
Although about 70% of all bytes of storage produced each year are disk, both tape and solid state are alternatives for preservation. Tape's recording technology lags about 8 years behind disk, so it is unlikely to run into the problems plaguing disk for some years. We can expect its relative cost advantage over disk to grow in the medium term.
Flash memory's advantages, including low power, physical robustness and low access latency, have overcome its higher cost per byte in many markets, such as tablets and servers. Properly exploited, they could reduce running costs enough to justify its use for long-term storage too. But analysis by Mark Kryder and Chang Soo Kim at Carnegie Mellon is not encouraging about the prospects for flash and the range of alternative solid state technologies beyond the end of the decade.
Based on recent history and projections of future trends, we can be fairly confident that the period when storage costs dropped rapidly is over, at least for the medium term. This has two effects on the cost of preservation. First, the proportion of the total cost attributable to storage will rise. Second, the total cost of preservation will be higher than projected by current models, which assume Kryder's Law continues as it did in the past.
Thus, as a component of overall models of the cost of preservation, we need a more sophisticated model of storage costs: one that doesn't simply assume Kryder's Law continues at 40%/yr, but allows us to investigate the effects of rates that vary through time. I'm going to describe some results from one of the preliminary models we have built; others are in the paper.
There are three different business models for long-term storage:
- It can be rented, as for example with Amazon's S3 which charges an amount per GB per month.
- It can be monetized, as with Google's Gmail, which sells ads against your accesses to your e-mail.
- Or it can be endowed, as with Princeton's DataSpace, which requires data to be deposited together with a capital sum thought to be enough to fund its storage "for ever".
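Recent research has cast doubt on both the theoretical and practical basis of discounted cash flow (DCF), the conventional technique for computing the present value of future costs, such as those an endowment must cover. Haldane and Davies of the Bank of England showed that investors using DCF systematically applied discount rates that were too high, raising unjustified barriers to future investments.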
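A minimal sketch of how discounted cash flow (DCF) sizes an endowment, assuming a constant discount rate r, a first-year storage cost of 1 unit, and a constant Kryder rate k over a finite horizon. The function name and parameter values are illustrative only.

```python
def dcf_endowment(first_year_cost: float, kryder_rate: float,
                  discount_rate: float, years: int) -> float:
    """Present value of `years` annual storage payments.

    The payment in year t (t = 0, 1, ...) is first_year_cost * (1 - k)^t,
    discounted by (1 + r)^t at a constant rate r.
    """
    return sum(
        first_year_cost * (1 - kryder_rate) ** t / (1 + discount_rate) ** t
        for t in range(years)
    )


for r in (0.01, 0.04, 0.07):
    e = dcf_endowment(1.0, kryder_rate=0.20, discount_rate=r, years=100)
    print(f"discount rate {r:.0%}: endowment = {e:.2f}")
```

A higher discount rate makes the future costs, and so the endowment, look smaller. If the rate used is systematically too high, an endowment sized this way risks running out.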
Farmer and Geanakoplos showed that the use of a constant discount rate, which averages out the effects of periods of very high or (as now) very low interest rates, produced invalid results in the long term.
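A toy illustration of the kind of effect involved, using purely hypothetical numbers rather than anything from Farmer and Geanakoplos: suppose the interest rate over the next century is equally likely to average 1% or 7% a year. Discounting at the average rate, 4%, gives a very different answer from averaging the discount factors of the two scenarios, because the low-rate scenario dominates in the long run.

```python
def discount_factor(rate: float, years: int) -> float:
    """Value today of 1 unit paid `years` from now at a constant rate."""
    return (1 + rate) ** -years


scenarios = [0.01, 0.07]  # hypothetical, equally likely long-run rates
years = 100

mean_rate = sum(scenarios) / len(scenarios)
factor_at_mean_rate = discount_factor(mean_rate, years)
mean_of_factors = sum(discount_factor(r, years) for r in scenarios) / len(scenarios)

# Constant rate that would reproduce the averaged discount factor
effective_rate = mean_of_factors ** (-1 / years) - 1

print(f"discount factor at mean rate {mean_rate:.1%}: {factor_at_mean_rate:.4f}")
print(f"mean of scenario discount factors: {mean_of_factors:.4f}")
print(f"equivalent constant rate: {effective_rate:.2%}")
```

Here the averaged discount factor is roughly nine times the one computed at the mean rate, and corresponds to a constant rate of only about 1.7%. A constant average rate badly undervalues costs that fall due far in the future.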
We built two prototype models. The second includes storage media, which are replaced when their service life is over or when newer media are cheap enough to justify migrating the data out of the old media into them. The media have running costs, and costs for moving data in and out. The model uses interest rates based on the 20-year history of inflation-protected US Treasury bonds. An initial endowment earns interest and pays for purchase, running and media migration costs.
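A much-simplified sketch of this kind of model. Everything in it (the cost structure, the parameter values and the repeating interest-rate sequence) is a hypothetical assumption, not the paper's model or data. In particular, it replaces media only at the end of their service life and omits the early-migration rule. It does show the key feature: the Kryder rate is a per-year list, so rates that vary through time can be explored.

```python
from itertools import cycle, islice


def can_fund(endowment: float, years: int, data_tb: float,
             price_per_tb: float, kryder_rates: list[float],
             interest_rates: list[float], service_life: int = 4,
             running_fraction: float = 0.5,
             migration_fraction: float = 0.2) -> bool:
    """Return True if `endowment` pays every cost for `years` years.

    Hypothetical cost structure, for illustration only:
    - media holding all the data cost data_tb * price_per_tb when bought,
      and are replaced when they reach `service_life` years;
    - the price per TB falls by kryder_rates[t] during year t;
    - running costs each year are `running_fraction` of the purchase cost
      of the media in service (they scale per drive, not per byte);
    - moving data into new media costs `migration_fraction` of their price.
    """
    balance = endowment
    price = price_per_tb
    media_cost = 0.0
    age = service_life  # forces a purchase in year 0
    for t in range(years):
        if age >= service_life:
            media_cost = data_tb * price
            balance -= media_cost * (1 + migration_fraction)
            age = 0
        balance -= media_cost * running_fraction
        if balance < 0:
            return False
        balance *= 1 + interest_rates[t]  # remaining funds earn interest
        price *= 1 - kryder_rates[t]      # media are cheaper next year
        age += 1
    return True


def minimum_endowment(years: int, **params) -> float:
    """Binary-search the smallest endowment that never runs out."""
    low, high = 0.0, 1.0
    while not can_fund(high, years, **params):
        high *= 2
    for _ in range(60):
        mid = (low + high) / 2
        if can_fund(mid, years, **params):
            high = mid
        else:
            low = mid
    return high


years = 100
# Hypothetical real interest rates, repeated; NOT the Treasury bond history.
rates = list(islice(cycle([0.02, 0.01, -0.005, 0.03]), years))
scenarios = {
    "40%/yr throughout": [0.40] * years,
    "20%/yr throughout": [0.20] * years,
    "40%/yr for 5 years, then 10%/yr": [0.40] * 5 + [0.10] * (years - 5),
}
for name, kryder in scenarios.items():
    e = minimum_endowment(years, data_tb=100, price_per_tb=50.0,
                          kryder_rates=kryder, interest_rates=rates)
    print(f"{name}: endowment about ${e:,.0f}")
```

Because more money at the start leaves a larger balance in every later year, whether the endowment suffices is monotonic in its size, which is what makes the binary search valid.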
Here is a history of the prices charged by some major cloud storage services. As you can see, they have hardly dropped at all.
- Amazon's S3 launched March '06 at $0.15/GB/mo and is now $0.125/GB/mo, a 3%/yr drop.
- Rackspace launched May '08 at $0.15/GB/mo and reduced prices to $0.10/GB/mo on 1st June 2012, about a 9%/yr drop.
- Azure launched November '09 at $0.15/GB/mo and is now $0.14/GB/mo, a 3%/yr drop.
- Google launched October '11 at $0.13/GB/mo and has not changed.
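The per-year drops quoted above are compound annual rates. Here is a minimal sketch of the calculation for Rackspace, the one service where both dates are given; the exact launch day within May 2008 is our assumption.

```python
from datetime import date


def annual_price_drop(start_price: float, start: date,
                      end_price: float, end: date) -> float:
    """Compound annual rate at which a price fell between two dates."""
    years = (end - start).days / 365.25
    return 1 - (end_price / start_price) ** (1 / years)


# Rackspace: $0.15/GB/mo at launch (May 2008; day assumed),
# $0.10/GB/mo from 1 June 2012.
drop = annual_price_drop(0.15, date(2008, 5, 1), 0.10, date(2012, 6, 1))
print(f"Rackspace: {drop:.1%}/yr")
```

This gives a little over 9%/yr, consistent with the figure above.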
We compared the endowment needed for S3 with that needed for local storage, using the cost figures published by the Backblaze PC backup service. To make the comparison fair, we assume that three geographically separate copies are maintained in Backblaze hardware and, based on the San Diego Supercomputer Center study, that over 3 years non-hardware costs are double the hardware costs.
The model suggests that S3 is not competitive with local storage at any Kryder rate, even if both experience the same one. And in practice they don't have the same Kryder rates. If S3 continues its historic 3%/yr drop while Backblaze experiences the industry projection of a 20%/yr drop, the endowment needed in S3 is more than 5 times larger.
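The local-storage assumptions can be written out as a small calculation. The hardware cost per GB used here is a hypothetical placeholder, not Backblaze's published figure; the sketch only shows how three copies and non-hardware costs of twice the hardware cost over three years combine.

```python
def local_cost_per_gb_month(hardware_cost_per_gb: float,
                            copies: int = 3,
                            non_hardware_multiple: float = 2.0,
                            period_months: int = 36) -> float:
    """Monthly cost per GB of replicated local storage.

    Each copy costs its hardware plus non-hardware costs (power, space,
    staff, ...) equal to `non_hardware_multiple` times the hardware cost
    over `period_months`.
    """
    per_copy = hardware_cost_per_gb * (1 + non_hardware_multiple)
    return copies * per_copy / period_months


# Hypothetical hardware cost of $0.05/GB, for illustration only
print(f"${local_cost_per_gb_month(0.05):.4f}/GB/mo")
```

Under these assumptions each stored GB costs 3 × 3 / 36, or a quarter of its hardware price, per month.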
Why is cloud storage so expensive? For the majority of customers, it isn't. Amazon prices S3 against the value it delivers to the majority of customers, not against its cost. That value is largely the flexibility to cope with spikes in demand. But digital preservation is the canonical example of an application with a stable, predictable demand for storage. S3's pricing model is inappropriate for this, as Amazon has acknowledged by announcing Glacier, a separate service with a different pricing model aimed at the digital preservation market. Its headline prices are 5 to 12 times lower than S3's.
Why isn't cloud storage getting cheaper? Two reasons:
- Amazon has the vast majority of the market and is under no competitive pressure to reduce prices. Note that S3's competitors charge more than S3 does.
- Bandwidth charges and the hassles of getting large amounts of data out of S3 in order to move to a competitor provide a very effective customer lock-in.