Friday, June 19, 2015

EE380 talk on eBay storage

Russ McElroy & Farid Yavari gave a talk to Stanford's EE380 course describing how eBay's approach to storage (YouTube) is driven by their Total Cost of Ownership (TCO) model. As shown in this screengrab, by taking into account all the cost elements, they can justify the higher capital cost of flash media. This is much the same argument that Ian Adams, Ethan Miller and I made in our 2011 paper Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes, but with much more realistic data and across a broader span of applications.

We were inspired by a 2009 paper, FAWN: A Fast Array of Wimpy Nodes, in which David Andersen and his co-authors from C-MU showed that a network of large numbers of small CPUs coupled with modest amounts of flash memory could process key-value queries at the same speed as the networks of beefy servers used by, for example, Google, but using 2 orders of magnitude less power.

As this McElroy slide shows, power cost is important and it varies over a 3x range (a problem for Kaminska's thesis about the importance of 21 Inc's bitcoin mining hardware). He specifically mentions the need to get the computation close to the data, with ARM processors in the storage fabric. In this way the amount of data to be moved can be significantly reduced. That in turn reduces the capital cost since, as he reports, the network hardware accounts for 25% of the cost of the rack and burns a lot of power.
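To make the interaction between capital cost and power cost concrete, here is a minimal sketch of a two-element TCO calculation. It is not eBay's model, which takes many more cost elements into account. The device figures (`FLASH`, `DISK`), the electricity prices and the five-year service life are all hypothetical values chosen for illustration; only the 3x spread in power price comes from the talk.

```python
HOURS_PER_YEAR = 365 * 24

def tco(capex, watts, price_per_kwh, years):
    """Capital cost plus electricity cost over the service life."""
    energy_kwh = watts / 1000 * HOURS_PER_YEAR * years
    return capex + energy_kwh * price_per_kwh

# Hypothetical devices of equal capacity: flash costs more up front
# but draws less power. These numbers are assumptions, not data.
FLASH = {"capex": 300.0, "watts": 2.0}
DISK = {"capex": 100.0, "watts": 8.0}

def flash_premium(price_per_kwh, years=5):
    """How much more flash costs than disk once power is included."""
    flash = tco(FLASH["capex"], FLASH["watts"], price_per_kwh, years)
    disk = tco(DISK["capex"], DISK["watts"], price_per_kwh, years)
    return flash - disk

if __name__ == "__main__":
    # Electricity prices spanning a 3x range (hypothetical $/kWh).
    for price in (0.05, 0.10, 0.15):
        print(f"${price:.2f}/kWh: flash premium ${flash_premium(price):.2f}")
```

With only these two cost elements the premium for flash shrinks as power gets more expensive but does not vanish. Adding further elements, such as cooling, space and the network hardware mentioned above, is what a full TCO model would do.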

At present, eBay relies on tiering, moving data to less expensive storage such as consumer hard drives when it hasn't been accessed in some time. As I wrote last year:
Fundamentally, tiering like most storage architectures suffers from the idea that in order to do anything with data you need to move it from the storage medium to some compute engine. Thus an obsession with I/O bandwidth rather than what the application really wants, which is query processing rate. By moving computation to the data on the storage medium, rather than moving data to the computation, architectures like DAWN and Seagate's and WD's Ethernet-connected hard disks show how to avoid the need to tier and thus the need to be right in your predictions about how users will access the data.
That post was in part about Facebook's use of tiering, which works well because Facebook has highly predictable data access patterns. McElroy's talk suggests that eBay's data accesses are somewhat predictable, but much less so than Facebook's. This makes his implication that tiering isn't a good long-term approach plausible.
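The point in the quoted passage about moving computation to the data can be illustrated with a toy sketch. The two functions below are hypothetical and do not model DAWN or any real Ethernet-connected drive. Each "node" is just a list of records, and the only thing measured is how many bytes would cross the network for the same query.

```python
def move_data_to_compute(nodes, predicate):
    """Ship every record to a central compute engine, then filter there."""
    moved = 0
    matches = []
    for records in nodes:
        for record in records:
            moved += len(record)  # every record crosses the network
            if predicate(record):
                matches.append(record)
    return matches, moved

def move_compute_to_data(nodes, predicate):
    """Run the filter on each storage node and ship only the matches."""
    moved = 0
    matches = []
    for records in nodes:
        local = [r for r in records if predicate(r)]  # evaluated on the node
        moved += sum(len(r) for r in local)  # only matches cross the network
        matches.extend(local)
    return matches, moved

if __name__ == "__main__":
    nodes = [[b"red-1", b"blue-2"], [b"red-3", b"green-4"]]
    is_red = lambda r: r.startswith(b"red")
    print(move_data_to_compute(nodes, is_red))
    print(move_compute_to_data(nodes, is_red))
```

Both functions return the same matches. The difference is the number of bytes moved, which grows with the total data in the first case and with the size of the answer in the second.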

1 comment:

  1. Patrick Thibodeau at Computerworld pointed me to a paper by Jonathan Koomey of Stanford and Jon Taylor of the Anthesis Group which reports that 30% of data center servers are zombies, powered up but doing no useful work. This figure is the same as was reported in 2008 by McKinsey, and in 2014 by the Uptime Institute.

    The NRDC believes that the problem is concentrated in smaller data centers, and not in the large cloud centers, which burn only about 5% of the total.
