Thursday, August 21, 2014

Is This The Dawn of DAWN?

More than three years ago, Ian Adams, Ethan Miller and I were inspired by a 2009 paper, "FAWN: A Fast Array of Wimpy Nodes", from David Andersen et al. at C-MU. They showed how a fabric of nodes, each with a small amount of flash memory and a very low-power processor, could process key-value queries as fast as a network of beefy servers while using two orders of magnitude less power.

We put forward a storage architecture called DAWN: Durable Array of Wimpy Nodes, similar hardware but optimized for long-term storage. Its advantages were small form factor, durability, and very low running costs. We argued that these would outweigh the price premium for flash over disk. Recent developments are starting to make us look prophetic - details below the fold.

A year and a half ago Micron announced a very small TLC (Triple-Level Cell) flash memory chip, 128Gb in a 20nm chip 12mm on a side. It was very low cost but slow, with very limited write endurance. Facebook talked about using this TLC flash for cold data, where high speed and high write endurance aren't needed. Chris Mellor at The Register writes:
There's a recognition that TLC flash is cheap as chips and much faster to access than disk or even, wash your mouth out, tape. Frankovsky, speaking to Ars Technica, said you could devise a controller algorithm that tracked cell status and maintained, in effect, a bad cell list like a disk drive's bad block list. Dead TLC flash cells would just be ignored. By knowing which cells were good and which were bad you could build a cold storage flash resource that would be cheaper than disk, he reckons, because you wouldn't need techies swarming all over the data centre replacing broken disk drives from the tens of thousands that would be needed.
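The bad cell list idea in that quote can be illustrated with a toy model. This is only a sketch under stated assumptions, not Facebook's or any vendor's controller design: the class name `ColdFlashStore`, the per-block write counter, and the fixed `endurance` limit are all hypothetical, and a real controller would detect failing cells from ECC results rather than a simple count.

```python
class ColdFlashStore:
    """Toy model: retire worn-out flash blocks instead of replacing hardware."""

    def __init__(self, num_blocks, endurance):
        self.endurance = endurance        # assumed number of writes a block survives
        self.writes = [0] * num_blocks    # per-block write counters
        self.bad = set()                  # the "bad cell list": blocks to ignore
        self.data = {}                    # block index -> stored payload
        self.index = {}                   # key -> block index

    def _free_block(self):
        # Return the first block that is neither retired nor in use.
        for b in range(len(self.writes)):
            if b not in self.bad and b not in self.data:
                return b
        raise RuntimeError("no good free blocks left")

    def put(self, key, payload):
        # Release the block holding any previous version of this key.
        old = self.index.pop(key, None)
        if old is not None:
            del self.data[old]
        while True:
            b = self._free_block()
            self.writes[b] += 1
            if self.writes[b] > self.endurance:
                self.bad.add(b)           # worn out: skip it from now on
                continue
            self.data[b] = payload
            self.index[key] = b
            return b

    def get(self, key):
        return self.data[self.index[key]]

    def usable_capacity(self):
        # Capacity shrinks gradually as blocks die; nothing is swapped out.
        return len(self.writes) - len(self.bad)


store = ColdFlashStore(num_blocks=4, endurance=1)
store.put("photo", b"v1")
store.put("photo", b"v2")   # block 0 is worn out, so the write lands elsewhere
```

The point of the sketch is that a dead block only reduces usable capacity. Nobody has to visit the data centre to replace anything, which is the operational saving the quote describes.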
Last month, at the Flash Memory Summit, a startup called NxGnData, founded two years after we proposed DAWN, announced a flash memory controller that would pretty much implement it. They claim:
  • Architectural features to deliver consistent performance and low power draw,
  • Best-in-class variable code rate LDPC-based ECC (Low-Density Parity-Check Error Correcting Codes) to extract the maximum endurance from flash devices,
  • Advanced signal processing capability enabling the use of MLC and TLC down to 1z-nm geometries,
  • Be the first in the industry with in-storage computation capability: In-Situ Processing,
  • Software-defined media channel architecture enabling Flash-agnostic SSD solutions,
  • Enablement of ultra-high capacity, low-cost TLC solutions for cold storage SSD.
In-situ processing? “The company believes that by moving computational tasks closer to where the data resides, its intelligent storage products will considerably improve overall energy efficiency and performance by eliminating storage bandwidth bottlenecks and the high costs associated with data movement.”
They are obviously thinking along the same lines that we were, as is Facebook about cold storage in TLC flash. Others in the storage industry are too, as evidenced by Seagate and Western Digital announcing Ethernet-connected disk drives. WD's drives even run Linux.