Tuesday, September 20, 2016

Brief Talk at the Storage Architecture Meeting

I was asked to give a brief summary of the discussions at the "Future of Storage" workshop to the Library of Congress' Storage Architecture meeting. Below the fold, the text of the talk with links to the sources.

As always, I'm grateful to the Library of Congress for inviting me. I was asked to give a brief report of what happened at the DARPA workshop on "The Future of Storage" that took place at Columbia last May. There has yet to be a public report on the proceedings, so I can't be specific about who (other than me) said what.

Three broad areas were discussed. First, I and others looked at the prospects for bulk storage over the medium term. How long is the medium term? Hard disk has been shipping for 60 years. Flash as a storage medium is nearly 30 years old (Eli Harari filed the key enabling patent in 1988), and it has yet to make an impact on bulk storage. It is pretty safe to say that these two media will dominate the bulk storage market for the next 10-15 years.

[Chart: WD unit shipments]
The debate is about how quickly flash will displace hard disk in this space. Flash is rapidly displacing hard disk in every market except bulk storage; its high performance, low power, high density and robustness overwhelm its higher price per byte.

[Chart: WD revenues]
In unit volume terms, we have hit peak disk. Since disk manufacture is a volume business, these reduced unit volumes are causing financial difficulties for both major manufacturers, resulting in layoffs and manufacturing capacity reductions at each.

[Chart: Seagate revenues]
These financial difficulties make it harder to fund the investments needed to further increase densities, specifically HAMR. As this much-delayed technology's schedule continues to slip in real time, the rate at which $/GB decreases slows, making hard disk less competitive with flash. It is worth noting, though, that shingled drives are now available; we're starting to use Seagate's very affordable 8TB archive drives.

[Chart: Exabytes shipped]
Despite these difficulties, hard disk completely dominates the bytes of storage shipped. What would it take for flash to displace hard disk in the bulk storage market?

The world's capacity to manufacture flash, measured in bytes, would have to increase dramatically. There are two possible (synergistic) ways to do this:
  • Building a lot of new flash fabs. This is extremely expensive, but flash advocates point to current low interest rates and strategic investment by Far East governments as a basis for optimism. [Chart: Flash vs HDD capex]

    But even if the money is available, bringing new fabs into production takes time. Over the medium term the fabs will likely come on-line and accelerate the displacement of hard disk, but this won't happen quickly.
  • Increasing the bytes of storage on each wafer from existing fabs. Two technologies can do this: 3D flash, which is in volume production, and quad-level cell (QLC, 4 bits/cell), which is in development. Although both are expensive to manufacture, the investment is a lot less than for a whole new fab, and the impact is quicker.

    [Table: write endurance by flash cell size and bits per cell]
    As the table shows, the smaller the cell and the more bits it holds, the lower the write endurance (and the lower the reliability). But QLC at a larger cell size is competitive with TLC at a smaller one. QLC isn't likely to be used for write-intensive workloads, but archival uses fit its characteristics well. Whether enough of the bulk storage market has write loads low enough to use QLC economically is an open question; the sketch below makes the arithmetic concrete.
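
To make the QLC trade-off concrete, here is a minimal Python sketch of the two calculations involved. The cell geometries, layer counts and endurance figures in it are illustrative assumptions for the sake of the example, not vendor specifications:

    # Illustrative arithmetic only; the geometries and endurance figures
    # below are assumptions, not vendor data.

    def wafer_capacity_multiplier(bits_new, bits_old, layers_new=1, layers_old=1):
        """Relative bytes per wafer from more bits/cell and/or more 3D layers."""
        return (bits_new * layers_new) / (bits_old * layers_old)

    # Hypothetical move from planar TLC (3 bits/cell) to 48-layer 3D QLC:
    print(wafer_capacity_multiplier(4, 3, layers_new=48))  # 64.0x

    def archival_lifetime_years(pe_cycles, capacity_tb, tb_written_per_day):
        """Years until the rated program/erase cycles are exhausted,
        assuming perfect wear levelling."""
        return (pe_cycles * capacity_tb) / tb_written_per_day / 365

    # A hypothetical QLC device rated for 1,000 P/E cycles, 8TB capacity,
    # rewritten in full once a month - far more churn than most archives see:
    print(archival_lifetime_years(1000, 8, 8 / 30))  # ~82 years

Even with QLC's low endurance, a write-once-read-maybe archival workload comes nowhere near exhausting the rated cycles; the open question is how much of the bulk market looks like that.
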
Second, there was discussion of potential alternate storage media, including DNA. Nature has just amplified the hype about DNA storage with "How DNA could store all the world's data", based on recent research from Microsoft involving 151KB of data. I believe that DNA will be an important archival medium decades from now, but to get there will require solving huge problems:
  • Writing data to DNA needs to get 6-8 orders of magnitude cheaper. The goal of the recently announced HGP-W project is to reduce it by only 3 orders of magnitude in a decade, and DNA synthesis has been getting cheaper more slowly than hard disk or flash (see the arithmetic after this list).
  • Reading the data may be cheap but is always going to be very slow, so the idea that "DNA can store all the world's data" is misleading. At best it could store all the world's backups; there needs to be another copy on some faster medium.
  • The use of long-lived media whose writing cost is vastly greater than their reading cost is extremely difficult to justify. It is essentially a huge bet against technological progress: the expensive write pays off only if, decades hence, storing the data has not become much cheaper anyway.
  • As we see with HAMR, there is a very long way between lab demos of working storage media and market penetration. We are many years from working DNA storage media.
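
The size of that write-cost gap is worth quantifying. Below is a short Python sketch using only the figures above (3 orders of magnitude in a decade as the HGP-W target, 6-8 orders needed overall); everything else is derived from them:

    import math

    def annual_improvement_factor(orders_of_magnitude, years):
        """Constant yearly cost-reduction factor achieving the total drop."""
        return 10 ** (orders_of_magnitude / years)

    # The HGP-W target of 3 orders of magnitude in a decade implies:
    rate = annual_improvement_factor(3, 10)
    print(f"required improvement: {rate:.2f}x cheaper per year")  # ~2.00x

    def years_to_close(orders_of_magnitude, yearly_factor):
        """Years to close a cost gap at a constant yearly improvement factor."""
        return orders_of_magnitude * math.log(10) / math.log(yearly_factor)

    # Closing the full 6-8 order-of-magnitude gap at that (ambitious) rate:
    for gap in (6, 8):
        print(f"{gap} orders: {years_to_close(gap, rate):.0f} years")  # 20, 27

Even if synthesis costs halve every year, which is faster than they have actually been falling, closing the gap takes two to three decades. That is why "decades from now" is the right timescale.
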
Third, there was discussion of aggressive compression techniques. DARPA's customers have vast amounts of surveillance video on which they do face recognition and other feature extraction, storing the extracted features, which are much smaller than the video. But for forensic purposes, for example after an attack, they would like to be able to reconstruct the small fraction of the total that turns out to be relevant. This is becoming possible: by storing a small amount of additional data alongside the extracted features, and devoting an immense amount of computation to the task, the original video can be recovered. This provides an amazing level of compression, but it probably isn't suitable for most archival content. A toy sketch of the idea follows.
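
The sketch below is a minimal, hypothetical illustration of the storage split only: compact "features" plus a small residual, recombined at read time. The real systems discussed use learned models and far heavier computation at reconstruction; block means and quantized residuals here merely stand in for those:

    import numpy as np

    rng = np.random.default_rng(0)
    signal = rng.normal(size=1024).cumsum()  # stand-in for raw footage
    block = 32

    # "Feature extraction": one mean per block, 32x smaller than the input.
    features = signal.reshape(-1, block).mean(axis=1)

    # "Additional data": residuals coarsely quantized to 0.5-unit steps,
    # which compress far better than the raw samples would.
    approx = np.repeat(features, block)
    residual_q = np.round((signal - approx) * 2) / 2

    # "Reconstruction": recombine features and residuals at read time.
    recovered = approx + residual_q
    print(f"max reconstruction error: {np.abs(signal - recovered).max():.3f}")
    # <= 0.25 by construction of the quantizer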

Thanks are due to Brian Berg and Tom Coughlin for input to this talk, which drew on the reporting of Chris Mellor at The Register, but these opinions are mine alone.
