Friday, February 25, 2011

Paying for Long-Term Storage

I was part of a panel on Economics at the Personal Digital Archiving 2011 conference at the Internet Archive. I talked about the idea of endowing data with a sum sufficient to pay for its indefinite storage. I first blogged about this in 2007. Below the fold is an edited and expanded version of my talk with links to sources.

Friday, February 18, 2011

FAST'11

I attended USENIX's File And Storage Technologies conference. Here's a brief list of the things that caught my attention:
  • The first paper, and one of the Best Paper awardees, was "A Study of Practical Deduplication" (PDF), an excellent overview of deduplication applied to file systems. It makes available much valuable new data. In their environment whole-file deduplication achieves about 3/4 of the total savings from aggressive block-level deduplication.
  • In fact, deduplication and flash memory dominated the conference. "Reliably Erasing Data From Flash-Based Solid State Drives", from a team at UCSD, revealed that, because flash memories effectively require copy-on-write techniques, they contain many logically inaccessible copies of a file. These copies are easily accessible by de-soldering the chips and thus gaining a physical view of the storage. Since existing "secure delete" techniques can't go around the controller, and most controllers either don't implement the "sanitization" commands or implement them incorrectly, it is essential to use encrypted file systems on flash devices if they are to store confidential information.
  • Even worse, Michael Wei's presentation of this paper revealed that at least one flash controller was doing block deduplication "under the covers". This is very tempting, in that it can speed up writes and extend the device lifetime considerably. But it can play havoc with the techniques file systems use to improve robustness.
  • "AONT-RS: Blending Security and Performance in Dispersed Storage Systems" was an impressive overview of how all-or-nothing transforms can provide security in Cleversafe's k-of-n dispersed storage system, without requiring complex key management schemes. I will write more on this in subsequent posts.
  • "Exploiting Memory Device Wear-Out Dynamics to Improve NAND Flash Memory System Performance" from RPI provides much useful background on the challenges flash technology faces in maintaining reliability as densities increase.
  • Although it is early days, it was interesting that several papers and posters addressed the impacts that non-volatile RAM technologies such as Phase Change Memory and memristors will have.
  • "Repairing Erasure Codes" was an important Work In Progress talk from a team at USC, showing how to reduce the cost of one of the more expensive functions of k-of-n dispersed storage systems: organizing a replacement when one of the n slices fails. Previously, this required bringing together at least k slices, but they showed that it was possible to manage it with many fewer slices for at least some erasure codes, though so far none of the widely used ones. The talk mentioned this useful Wiki of papers about storage coding.
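
To make the whole-file versus block-level comparison in the first item concrete, here is a minimal sketch. It assumes fixed-size 4096-byte blocks and SHA-256 fingerprints, which are simplifications chosen for illustration and not necessarily what the paper measured. The three-file corpus is a made-up toy, so its ratio says nothing about the paper's 3/4 figure.

```python
import hashlib

def whole_file_stored_bytes(files):
    """Bytes kept if byte-identical files are stored only once."""
    unique = {}
    for data in files:
        unique[hashlib.sha256(data).digest()] = len(data)
    return sum(unique.values())

def block_stored_bytes(files, block_size=4096):
    """Bytes kept if identical fixed-size blocks are stored only once."""
    unique = {}
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())

def blk(i):
    """Hypothetical helper: a 4096-byte block filled with byte value i."""
    return bytes([i]) * 4096

# Toy corpus (an assumption): one exact duplicate, one partial overlap.
files = [blk(1) + blk(2), blk(1) + blk(2), blk(1) + blk(3)]

raw_bytes = sum(len(f) for f in files)
whole_file_savings = raw_bytes - whole_file_stored_bytes(files)
block_savings = raw_bytes - block_stored_bytes(files)

print("whole-file savings:", whole_file_savings)
print("block-level savings:", block_savings)
print("fraction captured by whole-file:", whole_file_savings / block_savings)
```

Whole-file deduplication only catches the exact duplicate, while block-level deduplication also catches the shared first block of the third file. The paper's finding is that, in their environment, the simpler technique recovers most of the benefit.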

Tuesday, February 15, 2011

Disk growth

The Register interprets a recent analyst briefing by Seagate as predicting that this year could see the long-awaited 4TB 3.5" drive introduction. This is based on Seagate's claim of a 6th generation of Perpendicular Magnetic Recording (PMR) technology, and The Register's guess that this would provide a 30% increase in areal density. Thomas Coughlin makes similar projections but with only an 18% increase in areal density. These projections can be viewed optimistically, as continuing the somewhat slower growth in capacity of recent years, or pessimistically, as the industry being forced to stretch PMR technology because the transition to newer technologies (HAMR and BPM) offering much higher densities is proving much more difficult and expensive than anticipated.

On a related note, Storage Newsletter reports on Trend Focus's estimate that the industry shipped 88 Exabytes of disk capacity in the last quarter of 2010, made up of 29.3 Exabytes of mobile drives, 48.4 Exabytes of desktop drives, 2.6 Exabytes of enterprise drives, and 8 Exabytes of drives for consumer equipment (primarily DVRs). There were 73 million mobile and 64 million desktop drives, confirming that the market is moving strongly to the (lower capacity) 2.5" form factor.
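
The "lower capacity" point can be checked by dividing shipped capacity by unit count. This is a rough sketch using only the figures quoted above, and it assumes decimal units (1 Exabyte = 10^18 bytes, 1 GB = 10^9 bytes). The variable names are mine.

```python
EB = 10**18  # bytes per Exabyte (decimal units, an assumption)
GB = 10**9   # bytes per Gigabyte (decimal units, an assumption)

# (Exabytes shipped, drives shipped) for Q4 2010, as quoted above.
segments = {
    "mobile": (29.3, 73e6),
    "desktop": (48.4, 64e6),
}

def average_gb(exabytes, units):
    """Average capacity per drive in GB."""
    return exabytes * EB / units / GB

avg = {name: average_gb(eb, units) for name, (eb, units) in segments.items()}

# Sum of the four quoted segments, to compare with the 88 Exabyte total.
total_eb = 29.3 + 48.4 + 2.6 + 8.0

for name, gb in avg.items():
    print(f"{name}: about {gb:.0f} GB per drive")
print(f"sum of segments: {total_eb:.1f} Exabytes")
```

On these numbers the average mobile drive is a little over half the capacity of the average desktop drive, and the four segments sum to slightly more than the rounded 88 Exabyte total.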

Cisco estimates that global IP traffic was 15 Exabytes/month at the start of 2010, growing at 45%/year. If they were right, the rate at the end of 2010 would be 66 Exabytes per quarter. At 88 Exabytes per quarter, disk shipments are still capable of storing all the IP traffic in the world. But because unit shipments of disks are growing slowly, and the capacity of each unit is growing at less than 45%/year, disk shipments will shortly become unable to keep up.
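
The arithmetic behind this comparison can be sketched as follows. The traffic and shipment figures are the ones quoted above. Simple annual compounding is an assumption, and it gives about 65 Exabytes per quarter, close to the 66 quoted. The 30%/year growth in shipped disk capacity is purely a hypothetical placeholder for "less than 45%/year", not a figure from any source.

```python
import math

traffic_start_eb_per_month = 15.0     # Cisco estimate, start of 2010
traffic_growth = 0.45                 # Cisco growth rate per year
disk_eb_per_quarter_end_2010 = 88.0   # Trend Focus estimate, Q4 2010
disk_growth = 0.30                    # HYPOTHETICAL shipped-capacity growth per year

def traffic_eb_per_quarter(years_after_start_2010):
    """Quarterly IP traffic, assuming simple annual compounding."""
    monthly = traffic_start_eb_per_month * (1 + traffic_growth) ** years_after_start_2010
    return 3 * monthly

traffic_end_2010 = traffic_eb_per_quarter(1.0)

# Solve traffic_end_2010 * (1 + g_t)^t = disk * (1 + g_d)^t for t,
# the number of years after the end of 2010 at which traffic overtakes shipments.
years_to_crossover = (
    math.log(disk_eb_per_quarter_end_2010 / traffic_end_2010)
    / math.log((1 + traffic_growth) / (1 + disk_growth))
)

print(f"traffic at end of 2010: {traffic_end_2010:.2f} Exabytes per quarter")
print(f"years until traffic exceeds shipments: {years_to_crossover:.1f}")
```

Under these assumptions the crossover comes within a few years. A slower disk growth rate brings it closer, and any rate at or above 45%/year means it never happens.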

Tuesday, February 8, 2011

Are We Facing a "Digital Dark Age?"

Last October I gave a talk to the Alumni of Humboldt University in Berlin as part of the celebrations of their 200th anniversary. It was entitled "Are We Facing A 'Digital Dark Age?'". Below the fold is an edited text of this talk, which was aimed at a non-technical audience.