Tuesday, August 8, 2017

Approaching The Physical Limits

As storage media technology gets closer and closer to the physical limits, progress on reducing the $/GB number slows down. Below the fold, a recap of some of these issues for both disk and flash.

Thursday, August 3, 2017

Preservation Is Not A Technical Problem

As I've always said, preserving the Web and other digital content for posterity is an economic problem. With an unlimited budget collection and preservation isn't a problem. The reason we're collecting and preserving less than half the classic Web of quasi-static linked documents, and much less of "Web 2.0", is that no-one has the money to do much better.

The budgets of libraries and archives, the institutions tasked with acting as society's memory, have been under sustained attack for a long time. I'm working on a talk and I needed an example. So I drew this graph of the British Library's annual income in real terms (year 2000 pounds). It shows that the Library's income has declined by almost 45% in the last decade.

Memory institutions that can purchase only half what they could 10 years ago aren't likely to greatly increase funding for acquiring new stuff; it's going to be hard for them just to keep the stuff (and the staff) they already have.

Below the fold, the data for the graph and links to the sources.

Tuesday, August 1, 2017

Disk media market update

Its time for an update on the disk media market., based on reporting from The Register's Chris Mellor here and here and here.

Thursday, July 27, 2017

Decentralized Long-Term Preservation

Lambert Heller is correct to point out that:
name allocation using IPFS or a blockchain is not necessarily linked to the guarantee of permanent availability, the latter must be offered as a separate service.
Storage isn't free, and thus the "separate services" need to have a viable business model. I have demonstrated that increasing returns to scale mean that the "separate service" market will end up being dominated by a few large providers just as, for example, the Bitcoin mining market is. People who don't like this conclusion often argue that, at least for long-term preservation of scholarly resources, the service will be provided by a consortium of libraries, museums and archives. Below the fold I look into how this might work.

Tuesday, July 25, 2017

Initial Coin Offerings

The FT's Alphaville blog has started a new series, called ICOmedy looking at the insanity surrounding Initial Coin Offerings (ICOs). The blockchain hype has created an even bigger opportunity to separate the fools from their money than the dot-com era did. To motivate you to follow the series, below the fold there are some extracts and related links.

Thursday, July 20, 2017

Patting Myself On The Back

Cost vs. Kryder rate
I started working on economic models of long-term storage six years ago, and quickly discovered the effect shown in this graph. It plots the endowment, the money which, deposited with the data and invested at interest, pays for the data to be stored "forever", as a function of the Kryder rate, the rate at which $/GB drops with time. As the rate slows below about 20%,  the endowment needed rises rapidly. Back in early 2011 it was widely believed that 30-40% Kryder rates were a law of nature, they had been that way for 30 years. Thus, if you could afford to store data for the next few years you could afford to store it forever

2014 cost/byte projection
As it turned out, 2011 was a good time to work on this issue. That October floods in Thailand destroyed 40% of the world's disk manufacturing capacity, and disk prices spiked. Preeti Gupta at UC Santa Cruz reviewed disk pricing in 2014 and we produced this graph. I wrote at the time:
The red lines are projections at the industry roadmap's 20% and a less optimistic 10%. [The graph] shows three things:
  • The slowing started in 2010, before the floods hit Thailand.
  • Disk storage costs in 2014, two and a half years after the floods, were more than 7 times higher than they would have been had Kryder's Law continued at its usual pace from 2010, as shown by the green line.
  • If the industry projections pan out, as shown by the red lines, by 2020 disk costs per byte will be between 130 and 300 times higher than they would have been had Kryder's Law continued.
Backblaze average $/GB
Thanks to Backblaze's admirable transparency, we have 3 years more data. Their blog reports on their view of disk pricing as a bulk purchaser over many years. It is far more detailed than the data Preeti was able to work with. Eyeballing the graph, we see a 2013 price around 5c/GB and a 2017 price around half that. A 10% Kryder rate would have meant a 2017 price of 3.2c/GB, and a 20% rate would have meant 2c/GB, so the out-turn lies between the two red lines on our graph. It is difficult to make predictions, especially about the future. But Preeti and I nailed this one.

This is a big deal. As I've said many times:
Storage will be
Much less free
Than it used to be
The real cost of a commitment to store data for the long term is much greater than most people believe, and there is no realistic prospect of a technological discontinuity that would change this.

Tuesday, July 11, 2017

Is Decentralized Storage Sustainable?

There are many reasons to dislike centralized storage services. They include business risk, as we see in le petit musée des projets Google abandonnés, monoculture vulnerability and rent extraction. There is thus naturally a lot of enthusiasm for decentralized storage systems, such as MaidSafe, DAT and IPFS. In 2013 I wrote about one of their advantages in Moving vs. Copying. Among the enthusiasts is Lambert Heller. Since I posted Blockchain as the Infrastructure for Science, Heller and I have been talking past each other. Heller is talking technology; I have some problems with the technology but they aren't that important. My main problem is an economic one that applies to decentralized storage irrespective of the details of the technology.

Below the fold is an attempt to clarify my argument. It is a re-statement of part of the argument in my 2014 post Economies of Scale in Peer-to-Peer Networks, specifically in the context of decentralized storage networks.