Wednesday, July 24, 2013

Talk at Digital Preservation 2013

I was on a panel at the Library of Congress' Digital Preservation 2013 meeting entitled Green Bytes: Sustainable Approaches to Digital Stewardship. Below the fold is the text of my brief presentation, with links to the sources.

However much we would all like to be environmentally responsible in our digital preservation activities, it is an unfortunate fact that reducing our energy demand isn't the biggest problem we face. As the Blue Ribbon Task Force on Sustainable Digital Preservation and Access pointed out 3 years ago, to no-one's surprise, the big problem is economic. No-one has the money to preserve even the stuff they regard as high priority. The only way funds for preservation can be justified is by a commitment to provide access.

The research into the historical costs of digital preservation can be summarized by the following rule of thumb. Ingest takes about half, preservation (mainly storage) takes about one-third, and access about one-sixth of the total. How much of the storage cost does power represent?

Let's take a Seagate 4TB SATA drive, retail price $170, which has a typical operating power draw of 7.5W, and the current Palo Alto Utilities small business rate, which averages $0.13353/kWh. Assuming it was operating for the whole of a 4-year service life, the disk's power would cost $35.09. Backblaze uses 45 similar but more expensive Hitachi drives to build its Storage Pod 3.0, which has dual redundant 760W power supplies. Assuming the pod can survive a power supply failure with all drives operating, the drives would take 337.5W and the rest of the system 422.5W, for a total of about 2.25 times the disks alone. The drive's share of the total system build cost is $213.17. With its share of the system power, the drive would use $78.95 worth of power over 4 years, or 27% of the total 4-year cost. As you can see, power is a significant cost of preservation. But even if disk is the only medium, power is probably only about 10% of the total cost of preservation, and is about one-third as important as the cost of the disk media.
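The arithmetic in the paragraph above can be reproduced with a short Python sketch. The prices, wattages and utility rate are the figures quoted in the text; the 365-day year and the rounding of the intermediate values to two decimal places are assumptions chosen to match the quoted results, and the names are purely illustrative.

```python
# Figures quoted in the text.
SERVICE_YEARS = 4
RATE_USD_PER_KWH = 0.13353        # Palo Alto Utilities small business rate
DRIVE_WATTS = 7.5                 # typical operating draw of one drive
DRIVES_PER_POD = 45
POD_WATTS = 760.0                 # capacity of one of the two redundant supplies
DRIVE_SHARE_OF_BUILD_USD = 213.17

# Assumption: a 365-day year, running continuously.
HOURS_PER_YEAR = 24 * 365


def energy_cost(watts, years=SERVICE_YEARS, rate=RATE_USD_PER_KWH):
    """Cost in dollars of drawing `watts` continuously for `years`."""
    kwh = watts * HOURS_PER_YEAR * years / 1000.0
    return kwh * rate


# Power cost of the bare drive over its service life.
drive_only = round(energy_cost(DRIVE_WATTS), 2)

# Ratio of whole-pod power to the power of the drives alone,
# rounded to two places as in the text ("about 2.25 times").
system_multiplier = round(POD_WATTS / (DRIVES_PER_POD * DRIVE_WATTS), 2)

# The drive's share of the whole pod's power cost.
share_power = drive_only * system_multiplier

# Power as a fraction of the drive's total 4-year cost (build plus power).
power_fraction = share_power / (share_power + DRIVE_SHARE_OF_BUILD_USD)

print(f"drive alone:       ${drive_only:.2f}")
print(f"with system share: ${share_power:.2f}")
print(f"power fraction:    {power_fraction:.0%}")
```

Without the intermediate rounding the drive's share of system power comes to roughly $79.02 rather than $78.95; the 27% figure is unaffected.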

Eventually, power will become a priority. Kryder's Law, the exponential increase in disk density, used to mean that you could grow your collection 40%/yr at roughly constant power. About 3 years ago it became clear that Kryder's Law was slowing; now even optimistic projections are for no more than 20%/yr. Many archives grow faster than 20%/yr; the faster they grow, the more their power costs will increase.
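A minimal sketch of why this matters, under simplifying assumptions that are not in the talk: power is taken to be proportional to the number of drives, every drive draws the same power regardless of capacity, and the whole collection always sits on drives of the current density. The function name is hypothetical.

```python
def relative_power(collection_growth, density_growth, years):
    """Power demand after `years`, relative to today (today = 1.0).

    Assumes power scales with drive count, and drive count scales with
    collection size divided by per-drive capacity.
    """
    return ((1 + collection_growth) / (1 + density_growth)) ** years


# Collection and density both growing 40%/yr: power stays constant.
steady = relative_power(0.40, 0.40, 10)

# Collection still growing 40%/yr but density only 20%/yr.
slowed = relative_power(0.40, 0.20, 10)

print(f"40%/yr density growth, after 10 years: {steady:.2f}x")
print(f"20%/yr density growth, after 10 years: {slowed:.2f}x")
```

Under these assumptions, a collection growing 40%/yr against 20%/yr density growth needs between four and five times today's power after a decade. A real archive holds drives of mixed vintage, so the actual figure would differ.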

Although demands for computation during ingest can be high, they are over quickly and thus don't contribute much to the energy demand of long-term preservation. We have a very low-energy way to store data - write it to durable off-line media and put them in a salt mine. But this doesn't provide usable access, and it raises awkward preservation issues about media obsolescence and integrity verification.

We can get a little bit of access by holding the off-line media in robots, but the robot infrastructure uses energy all the time, and the exponential increase in storage media density means that we don't keep the media for their theoretical life, but migrate (using energy) to newer media when they are no longer dense enough to justify their slot in the robot.

But increasingly the access scholars want is keyword search and other forms of data-mining. Robots full of off-line media cannot support this. A recent paper on Characteristics of low-carbon data centres shows that the key to reducing power consumption is to use the servers efficiently, assigning and migrating tasks to keep the powered-up servers fully utilized and keep as many as possible powered-down. While it is easy to migrate tasks among servers to keep only a fraction of them powered-up, this isn't practical for storage. Even before the demand for search and data-mining, research showed that there were few hot-spots in the access patterns to preserved data. Search and data-mining will spread access fairly evenly across the entire collection. This is likely to raise both the proportion of cost attributable to access, and the proportion attributable to power.

Thus, to satisfy the demands of scholars, at least one copy of your preserved data has to be on disk or some other medium with equivalent access latency and bandwidth. Can we design a storage medium that provides rapid yet energy-efficient access together with very low energy usage through time?

In 2009, a team from CMU showed that a fabric of a large number of very low-power CPUs each with a fairly small amount of flash memory could answer key-value queries at the same speed as conventional servers, using two orders of magnitude less power per query. They called their architecture FAWN (Fast Array of Wimpy Nodes). It worked well because the key-value problem parallelizes well, and because the I/O performance of the wimpy CPU and the flash memory was much better than disk.

In 2011 Ian Adams, Ethan Miller and I showed that, if the life-cycle costs were properly accounted for, a similar approach could be cost-competitive with disk for long-term storage despite the much higher initial cost of flash. It would provide rapid access with a much lower energy demand than conventional disk storage. We called our architecture DAWN (Durable Array of Wimpy Nodes). It worked well because of a series of synergistic effects that greatly reduced power consumption and led to much longer media service life.

Unfortunately, there is an important caveat, namely "if life-cycle costs were properly accounted for". The much higher capital costs of DAWN are balanced by much lower running costs over a much longer media service life than a disk system like Backblaze's. I don't know any institution operating a digital archive that has a planning horizon long enough to make that tradeoff. Most operate on an annual budget cycle. Large savings in, say, years 4 through 10 are ignored. Amazon, a company that is notorious for not worrying about making a profit, does have a long enough planning horizon. It is one of the reasons they dominate the market for Web services, especially storage.
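The effect of the planning horizon can be illustrated with a toy model. Every number below is hypothetical, chosen only to show the shape of the tradeoff; none comes from the DAWN paper or from Backblaze. The model also ignores discounting and the fall in price per byte of replacement media, both of which matter in practice.

```python
def total_cost(capital, annual_running, service_life_years, horizon_years):
    """Cumulative cost over a planning horizon, re-buying the system
    each time its service life expires (toy model, no discounting)."""
    purchases = -(-horizon_years // service_life_years)  # ceiling division
    return capital * purchases + annual_running * horizon_years


# Hypothetical systems, in arbitrary cost units.
DISK = dict(capital=100, annual_running=30, service_life_years=4)
WIMPY = dict(capital=300, annual_running=5, service_life_years=10)

for horizon in (1, 4, 10):
    disk = total_cost(horizon_years=horizon, **DISK)
    wimpy = total_cost(horizon_years=horizon, **WIMPY)
    cheaper = "disk" if disk < wimpy else "wimpy"
    print(f"horizon {horizon:2d} yr: disk={disk} wimpy={wimpy} -> {cheaper}")
```

With these made-up figures the disk system looks cheaper at a 1-year or 4-year horizon, and the system with higher capital cost and lower running cost only wins when the horizon stretches to 10 years.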

Because your short-termism means you aren't going to buy the initially more expensive but in the longer term cheaper DAWN systems, vendors aren't going to make them. They have their own version of short-termism. They are very happy to sell you a product with a limited service life. Doing so is called planned obsolescence, and it has a long history. In the storage world it is driven by Kryder's Law. In 2009 I blogged about Dave Anderson's description of:
... Seagate's investigation of the idea of a disk drive specifically for archival use. Technologically, it was easy. They could build a very reliable, long-lived drive. But there was no way to make money building it. One reason was that customers wouldn't buy it; the economics for them of replacing older drives with newer ones that were identical in all respects except for greater capacity are irresistible. ... The other reason was that even if customers did want to buy these drives, they would be a niche product sold in small volumes. So they would cost a lot more per byte than the consumer drives. Customers are, with good reason, skeptical of manufacturer's claims for reliability. Thus even if the special archival disks actually did repay the additional cost in greater reliability, it would likely not be possible to persuade customers of this.
Of course, as Kryder's Law slows down, it makes sense to keep the drives for longer. But it slows down slowly, never providing a big enough motivation to invest the extra up-front to get the lower running costs out to the planning horizon.

So, the bulk of preserved data is going to be on hard disk, burning more power than it should, for a good long time. Economics means that dramatic technological change, even if it can reduce power consumption by orders of magnitude, isn't viable in the marketplace. This is both because power isn't (yet) a big part of the total cost of preservation, and because institutions systematically discount the impact of future running costs.

Nevertheless, there are things you can do while we wait for the dramatic technological change that can reduce your power consumption. They won't make a big difference to the overall cost, but every little helps. The following speakers, Kris Carpenter Negulescu and Krishna Kant, will address them.


  1. My partners on the panel also gave valuable insights; I will link to their materials as soon as I can find them on-line.

    Kris Carpenter recounted the remarkable success the Internet Archive has had using their data center's waste heat to heat their church, which has no air-conditioning, and by scheduling activities to flatten out peaks in usage. Being on the foggy side of San Francisco is a great advantage; there are only a few days a year when it gets hot enough to force them to slow or stop activities. Much of this is driven by a home-built $500 on-line, real-time, whole data-center power monitor.

    Krishna Kant made two important points. First, what matters is not the power usage but the carbon footprint. Running on renewable power reduces the carbon footprint even if the power consumed is the same. But also, the carbon footprint of the disk media, the servers and the buildings needs to be figured in. Second, the carbon footprint of data you don't store is zero. Techniques for selecting, de-duplicating and compressing data can have a huge impact.

    In this context, Kris pointed to the Internet Archive's use of a team of interns to look in detail at the 20,000 or so most storage-consuming sites in the Wayback Machine to see whether all their content was really important for the mission of collecting and preserving a representative sample of the Web for the future.

  2. The Rocky Mountain Institute has an interesting blog post on the idea of Distributed Green Data Centers:

    "But there is a major paradigm shift at work: the PODs are designed to shift “load” by migrating server workloads to other PODs via fiber optic connections when local, inherently variable renewables aren’t producing enough power, using these resources much more effectively."

    Again, the idea here is to exploit the agility of services to concentrate them where power can be used most effectively. Unfortunately for preservation, storage can't be rapidly re-located in this way. If, as for example with Amazon's S3, replicated storage is distributed among data centers then services can be re-located among the replicas so that replicas with expensive power could be taken off-line. But spin-up/spin-down times for large storage arrays would still limit the granularity of re-location and thus how effective this technique was.