Thursday, March 5, 2015

Archiving Storage Tiers

Tom Coughlin uses Hetzler's touch-rate metric to argue for tiered storage for archives in a two-part series. Although there's good stuff there, I have two problems with Tom's argument. Below the fold, I discuss them.

First, Tom's just wrong about Facebook's optical storage when he writes:
"Finally let’s look at why a company like Facebook is interested in optical archives. The figure below shows the touch rate vs. response time for an optical storage system with a goal of <60 seconds response time, which can be met at a range of block sizes with 12 optical drives per 1 PB rack in an optical disc robotic library."
The reason Facebook gets very low cost by using optical technology is, as I wrote here, that they carefully schedule the activities of the storage system to place a hard cap on the maximum power draw, and to provide maximum write bandwidth. They don't have a goal of <60s random read latency. Their goals are minimum cost and maximum write bandwidth. The design of their system assumes that reads almost never happen, because they disrupt the write bandwidth. As I understand it, reads have to wait while a set of 12 discs is completely written. Then all 12 discs of the relevant group are loaded and read, and the data is staged back to the hard disk layers above the optical storage. Then a fresh set of 12 discs is loaded and writing resumes.

Facebook's optical read latency is vastly longer than 60s. The system Tom is analysing is a hypothetical system that wouldn't work nearly as well as Facebook's given their design goals. And the economics of such a system would be much worse than Facebook's.
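
To make the arithmetic concrete, here is a minimal sketch of the read path as I understand it: a read waits for the current set of discs to finish writing, then for a swap, then for the whole relevant set to be read and staged. Every number in it is a placeholder assumption, not a figure from Facebook or from Tom's articles, and the names OpticalRackModel, read_latency and mean_read_latency are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OpticalRackModel:
    # All defaults are illustrative assumptions, not measured values.
    write_seconds_per_set: float = 4 * 3600.0  # time to completely write one set of 12 discs
    swap_seconds: float = 120.0                # robot unloads the written set, loads the read set
    read_seconds_per_set: float = 2 * 3600.0   # time to read the set and stage it to hard disk


def read_latency(model: OpticalRackModel, seconds_into_write: float) -> float:
    """Seconds until the data is staged, for a read request that arrives
    seconds_into_write after the current set of discs started writing."""
    if not 0.0 <= seconds_into_write <= model.write_seconds_per_set:
        raise ValueError("request must arrive during the current write pass")
    # The read cannot pre-empt the write: it waits for the set to finish.
    wait_for_write = model.write_seconds_per_set - seconds_into_write
    return wait_for_write + model.swap_seconds + model.read_seconds_per_set


def mean_read_latency(model: OpticalRackModel) -> float:
    """Mean latency if requests arrive uniformly at random during a write pass,
    so the average wait for the write to finish is half the pass."""
    return (model.write_seconds_per_set / 2.0
            + model.swap_seconds
            + model.read_seconds_per_set)


if __name__ == "__main__":
    model = OpticalRackModel()
    best_case = read_latency(model, model.write_seconds_per_set)
    print(f"best case: {best_case:.0f} s, mean: {mean_read_latency(model):.0f} s")
```

With these placeholder values even the best case, a request arriving just as a set finishes writing, takes about two hours. The exact numbers don't matter; under this scheduling the latency is set by the time to write and read whole sets of discs, not by the time for a robot to fetch one disc.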

Second, it is true that Facebook gains massive advantages from their multi-tiered long-term storage architecture, which has a hot layer, a warm layer, a hard-disk cold layer and a really cold optical layer. But you have to look at why they get these advantages before arguing that archives in general can benefit from tiering. Coughlin writes:
"Archiving can have tiers. ... In tiering content stays on storage technologies that trade off the needs (and opportunities) for higher performance with the lower costs for higher latency and lower data rate storage. The highest value and most frequently accessed content is kept on higher performance and more expensive storage and the least valuable or less frequently accessed content is kept on lower performance and less expensive storage."
Facebook stores vast amounts of data, but a very limited set of different types of data, and their users (who are not archival users) read those limited types of data in highly predictable ways. Facebook can therefore move specific types of data rapidly to lower-performing tiers without imposing significant user-visible access latency.

More typical archives, and especially those with real archival users, do not have such highly predictable access patterns and will therefore gain much less benefit from tiering. Access patterns more representative of archival data can be found in the paper at the recent FAST conference describing the two-tier (disk+tape) archive at the European Centre for Medium-Range Weather Forecasts. Note that these patterns come from before the enthusiasm for "big data" drove a need to data-mine archived information, which will reduce the benefit from tiering even further.

Fundamentally, tiering, like most storage architectures, suffers from the idea that in order to do anything with data you need to move it from the storage medium to some compute engine. The result is an obsession with I/O bandwidth rather than with what the application really wants, which is query processing rate. By moving computation to the data on the storage medium, rather than moving data to the computation, architectures like DAWN and Seagate's and WD's Ethernet-connected hard disks show how to avoid the need to tier, and thus the need to be right in your predictions about how users will access the data.
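
A back-of-the-envelope sketch of the difference, assuming a query that has to scan the whole dataset and return a small fraction of it. The function names and all the numbers are hypothetical illustrations, not measurements of DAWN or of any Ethernet-connected drive.

```python
def scan_seconds_central(dataset_bytes: float, link_bytes_per_s: float) -> float:
    """Move-data-to-compute: every byte crosses the shared link to the compute engine."""
    return dataset_bytes / link_bytes_per_s


def scan_seconds_in_storage(dataset_bytes: float,
                            n_drives: int,
                            drive_scan_bytes_per_s: float,
                            selectivity: float,
                            link_bytes_per_s: float) -> float:
    """Move-compute-to-data: each drive scans its own share in parallel,
    and only the matching fraction (selectivity) crosses the shared link."""
    local_scan = (dataset_bytes / n_drives) / drive_scan_bytes_per_s
    result_transfer = (dataset_bytes * selectivity) / link_bytes_per_s
    return local_scan + result_transfer


if __name__ == "__main__":
    # Assumed values: 1 PB dataset, 10 Gb/s shared link, 250 drives that can
    # each scan 100 MB/s locally, one byte in a million matching the query.
    central = scan_seconds_central(1e15, 1.25e9)
    in_storage = scan_seconds_in_storage(1e15, 250, 1e8, 1e-6, 1.25e9)
    print(f"central: {central:.0f} s, in storage: {in_storage:.0f} s")
```

Under these assumptions the query rate of the central design is capped by the shared link, while the in-storage design scales with the number of drives. Neither number depends on having predicted which data would be hot.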

1 comment:

David. said...

Facebook's Blu-ray cold storage technology is making its way into the open market. Panasonic has announced "freeze-ray", a product based on it:

"Panasonic worked with Facebook to design the freeze ray systems, Enokido said. But Panasonic won't have the market to itself. Rival Sony recently bought Optical Archive, a Facebook spin-off company that's working on similar technology. Also, Facebook planned to release its cold storage designs through the Open Compute Project, meaning other manufacturers can build similar products."