Thursday, June 19, 2014

More on long-lived media

I've already written skeptically about the concept of quasi-immortal media as a solution to the problem of digital preservation. But the misplaced enthusiasm continues. The latest wave surrounds Facebook's prototype Petabyte Blu-Ray jukebox; one of its touted features was that the media had a 50-year life. The prototype is extraordinarily interesting, and I hope to write more about it soon. But I doubt Facebook or anyone else expects that the hardware will still be in use in 10 years, let alone 50. After all, you can search any large-scale data center in vain for 10-year-old hardware. So why is a 50-year media life interesting in this application? Follow me below the fold for yet another dose of skepticism.

How does Facebook know the media have a 50-year life? There are no 50-year-old Blu-Ray disks around to test. The media life that the manufacturers claim is the result of the kind of experiment we did on hard disks. You load batches of media into ovens and subject them, for a few thousand hours, to temperatures and humidity levels much higher than they would ever encounter in real life. You then see how much data degradation has occurred, and use models such as Arrhenius' and Eyring's to project how much slower this rate of degradation would be in normal conditions. In other words, the 50-year number is a projection based on a model of the mechanisms of data degradation. As with disks, we can expect that in the real world degradation will be much faster, because the models will not capture all the bad things that the real world does to bits on the media.
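To make the nature of such a projection concrete, here is a minimal sketch of an Arrhenius-style extrapolation. It is not the manufacturers' actual procedure: the activation energies, oven temperature, and hours-to-failure below are hypothetical values chosen only for illustration, and the function names are my own.

```python
import math

# Boltzmann constant in electron-volts per kelvin.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration(ea_ev, use_temp_c, stress_temp_c):
    """Arrhenius acceleration factor between oven and normal-use temperature.

    ea_ev is the assumed activation energy (eV) of the degradation
    mechanism; the model says degradation runs this many times faster
    at stress_temp_c than at use_temp_c.
    """
    t_use = use_temp_c + 273.15
    t_stress = stress_temp_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_stress)
    return math.exp(exponent)


def projected_life_years(oven_hours_to_failure, ea_ev, use_temp_c, stress_temp_c):
    """Scale the life observed in the oven up to a projected life in normal use."""
    factor = arrhenius_acceleration(ea_ev, use_temp_c, stress_temp_c)
    return oven_hours_to_failure * factor / (24 * 365.25)


# Hypothetical experiment: media fail after 2000 hours at 80 C,
# and we project to storage at 25 C for several assumed activation energies.
for ea in (0.7, 0.8, 0.9, 1.0):
    years = projected_life_years(2000, ea, use_temp_c=25, stress_temp_c=80)
    print(f"Ea = {ea:.1f} eV -> projected life {years:,.0f} years")
```

Running the loop shows how strongly the projected life depends on the assumed activation energy, a parameter of the model rather than something observed over 50 years. That sensitivity is one reason to treat the headline number as a projection, not a measurement.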

The reason people believe the manufacturers' claimed 50-year media life is that it sounds like a solution to the problem of preserving data. It suggests that you can write your data to the medium, store it for 50 years, and read your data back undamaged. And that in turn suggests that you can design a long-term storage system without having to worry about detecting and recovering from media failures.

Getting seduced by this superficially attractive idea has two problems. First, media failures are only one of many, many threats to stored data, but they are the only threat long-lived media even claim to address.

Second, in the real world, you can't ignore the possibility of media failures. That a data storage medium has a long life does not mean that it is inherently more reliable than a medium with a shorter life; it means that the reliability degrades more slowly. Media are specified with an Unrecoverable Bit Error Rate (UBER). The Blu-Ray manufacturers don't appear to have announced their UBER, but it will probably be around 10^-14, the same as a SATA disk. This means that if you read a Petabyte (8*10^15 bits) from Blu-Ray immediately after writing it, you can expect to read up to 80 bad bits. The number of bad bits will increase with time, just more slowly than on a medium such as disk with the same UBER but a shorter life. This fact alone shows that long-lived media are not, in themselves, a solution.
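The arithmetic behind this kind of estimate can be sketched as follows. The UBER of 10^-14 is the guess made above, not a published Blu-Ray specification; a Petabyte is taken as 10^15 bytes; and the function names are illustrative only. The second function additionally assumes bit errors are independent, which real media do not guarantee.

```python
import math


def expected_bad_bits(num_bytes, uber):
    """Expected number of unrecoverable bit errors when reading num_bytes.

    uber is the Unrecoverable Bit Error Rate: the specified upper bound
    on the fraction of bits read that come back bad.
    """
    return num_bytes * 8 * uber


def prob_at_least_one_error(num_bytes, uber):
    """Chance of seeing one or more bad bits, assuming independent errors.

    Uses the Poisson approximation 1 - exp(-expected), which is accurate
    when uber is tiny and the number of bits is large.
    """
    return -math.expm1(-expected_bad_bits(num_bytes, uber))


ASSUMED_UBER = 1e-14   # assumption: same as a typical SATA disk spec
TERABYTE = 10**12      # bytes
PETABYTE = 10**15      # bytes

for label, size in (("Terabyte", TERABYTE), ("Petabyte", PETABYTE)):
    bad = expected_bad_bits(size, ASSUMED_UBER)
    p = prob_at_least_one_error(size, ASSUMED_UBER)
    print(f"{label}: up to {bad:g} expected bad bits, P(at least one) = {p:.4f}")
```

At Petabyte scale the chance of reading everything back clean is effectively zero even on fresh media, so a system at that scale has to detect and repair errors however long the media last.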

4 comments:

Ian Adams said...

To say nothing of the reliability of the readers, or the likelihood of their surviving or being supported far into the future.

David. said...

Chris Mellor at The Register reports on more misplaced enthusiasm for quasi-immortal media at Hitachi's federal system division.

David. said...

Not content with their 1000-year disk, Hitachi claims that data written into fused silica glass is good for 300 million years.

Note the complexity of the reading process. But not to worry. The inevitability of technological progress means that readers 300 million years in the future will easily be able to cope.

David. said...

The latest entrant in the competition to hype new quasi-immortal media is ETH Zurich with DNA encapsulated in silicon spheres. The analogy with fossils is actually quite convincing.