Tuesday, March 22, 2016

The Dawn of DAWN?

At the 2009 SOSP David Andersen and co-authors from CMU presented FAWN, the Fast Array of Wimpy Nodes. It inspired me to suggest, in my 2010 JCDL keynote, that the cost savings FAWN realized without performance penalty by distributing computation across a very large number of very low-power nodes might also apply to storage.

The following year Ian Adams and Ethan Miller of UC Santa Cruz's Storage Systems Research Center and I looked at this possibility more closely in a Technical Report entitled Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes. We showed that it was indeed plausible that, even at then-current flash prices, the long-term total cost of ownership of a storage system built from very low-power system-on-chip technology and flash memory would be competitive with disk, while providing high performance and enabling self-healing.
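
For intuition, here is a minimal back-of-the-envelope version of that kind of comparison, written in Python. The model and every number in it are illustrative assumptions of mine, not figures from the report.

    # Back-of-the-envelope long-term cost comparison of a disk archive and a
    # DAWN-style flash archive. Every number below is an illustrative
    # assumption, not a figure from the technical report.
    def cost_per_tb(media_per_tb, watts_per_tb, service_life_yrs,
                    horizon_yrs=12, kwh_price=0.10, provision_per_watt=10.0):
        """Media purchases, power provisioning and electricity over the
        horizon, with media replaced every service_life_yrs (no discounting)."""
        media = media_per_tb * (horizon_yrs / service_life_yrs)
        provisioning = watts_per_tb * provision_per_watt
        electricity = watts_per_tb * 24 * 365 * horizon_yrs / 1000 * kwh_price
        return media + provisioning + electricity

    # Illustrative inputs: flash media at ~6x the cost of disk per TB, but
    # with lower power draw and a longer service life.
    disk = cost_per_tb(media_per_tb=30, watts_per_tb=10, service_life_yrs=4)
    dawn = cost_per_tb(media_per_tb=180, watts_per_tb=1, service_life_yrs=8)
    print(f"disk ~${disk:.0f}/TB, DAWN-style flash ~${dawn:.0f}/TB over 12 years")

With these made-up inputs the two come out roughly even; the point is only that lower power and a longer service life can offset a higher purchase price.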

Although flash remains more expensive than hard disk, since 2011 the gap has narrowed from a factor of about 12 to about 6. Pure Storage recently announced FlashBlade, an object storage fabric composed of large numbers of blades, each equipped with:
  • Compute – an 8-core Xeon system-on-a-chip, plus an Elastic Fabric Connector for external, off-blade, 40GbitE networking,
  • Storage – NAND storage with 8TB or 52TB of raw capacity, on-board NV-RAM with a super-capacitor-backed write buffer, plus a pair of ARM CPU cores and an FPGA,
  • On-blade networking – PCIe card to link compute and storage cards via a proprietary protocol.
Chris Mellor at The Register has details and two commentaries.

FlashBlade clearly isn't DAWN. Each blade is much bigger, much more powerful and much more expensive than a DAWN node. No-one could call a node with an 8-core Xeon, 2 ARMs, and 52TB of flash "wimpy", and it'll clearly be too expensive for long-term bulk storage. But it is a big step in the direction of the DAWN architecture.

DAWN exploits two separate sets of synergies:
  • Like FlashBlade, it moves the computation to where the data is, rather than moving the data to where the computation is, reducing both latency and power consumption. The further data moves on wires from the storage medium, the more power and time it takes. This is why Berkeley's ASPIRE project's architecture is based on optical interconnect technology, which, when it becomes mainstream, will be both faster and lower-power than wires. In the meantime, we have to use wires.
  • Unlike FlashBlade, it divides the object storage fabric into a much larger number of much smaller nodes, implemented using the very low-power ARM chips used in cellphones. Because the power a CPU needs tends to grow faster than linearly with performance, the additional parallelism provides comparable performance at lower power, as the sketch after this list illustrates.
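
A toy model of the second synergy: if, as simple voltage/frequency scaling suggests, a core's power grows roughly as the cube of its clock rate, then spreading the same aggregate throughput across more, slower nodes cuts power substantially. The cubic exponent and the node counts below are illustrative assumptions, not measurements.

    # Toy model of the power/performance trade-off: assume (purely for
    # illustration) that per-core power grows roughly as the cube of clock
    # frequency, as simple voltage/frequency scaling suggests.
    def cluster_power(nodes, freq_ghz, watts_per_node_at_1ghz=1.0):
        """Total power for `nodes` cores each clocked at freq_ghz."""
        return nodes * watts_per_node_at_1ghz * freq_ghz ** 3

    # The same aggregate throughput (nodes x frequency = 30) two ways:
    brawny = cluster_power(nodes=10, freq_ghz=3.0)  # a few fast nodes
    wimpy = cluster_power(nodes=30, freq_ghz=1.0)   # many wimpy nodes
    print(f"brawny: {brawny:.0f}W  wimpy: {wimpy:.0f}W")  # 270W vs 30W

Real nodes add fixed overheads for DRAM, networking and packaging, so the gap is smaller in practice, but the direction of the effect is what makes wimpy-node designs attractive.
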
So FlashBlade currently exploits only one of the two sets of synergies. But once Pure Storage has deployed this architecture in its current relatively high-cost and high-power technology, re-implementing it in lower-cost, lower-power technology should be easy and non-disruptive. They have done the harder of the two parts.

2 comments:

David. said...

Chris Mellor at The Register reports that Andy Warfield, Coho's CTO, thinks FlashBlade is the wrong way to go. But even if his arguments apply to FlashBlade they don't apply to DAWN.

Ian Adams said...

A bit of shameless self-promotion, but we had a paper at HotStorage this year looking at a small part of the computational storage problem space, specifically getting computations down to block storage without requiring that the block storage target itself become a full-fledged server or file system.

The basic idea is to make a block storage device temporarily aware of which blocks compose a file or object. Then computations become relatively straightforward, it doesn't upset the ecosystem above it, and you can continue to use whatever transport is most convenient (iSCSI, NVMe, etc.). It also turned out to be pretty easy to adapt applications, as they just needed to hand off a file path and op-code and our library took care of the rest.
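
In rough Python, the flow is something like the sketch below; the names and the transport callback are made up for illustration rather than taken from the paper.

    # Hypothetical sketch of the kind of library described above: resolve a
    # file to the blocks that back it, describe them to the block target as a
    # temporary virtual object, then ask the target to run an op-code on it.
    import os

    def file_extents(path):
        """Placeholder extent lookup; a real implementation would use
        something like Linux's FIEMAP ioctl to get (offset, length) pairs."""
        return [(0, os.path.getsize(path))]

    def offload(path, opcode, send):
        """Run `opcode` (e.g. a checksum or search) on the blocks backing
        `path`. `send` is whatever transport is convenient (iSCSI, NVMe, ...)."""
        object_id = send("CREATE_VIRTUAL_OBJECT", file_extents(path))
        return send("EXECUTE", object_id, opcode)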


https://www.usenix.org/conference/hotstorage19/presentation/adams