Wednesday, January 1, 2014

Implementing DAWN?

In a 2009 paper, "FAWN: A Fast Array of Wimpy Nodes", David Andersen and his co-authors from CMU showed that a network of large numbers of small CPUs coupled with modest amounts of flash memory could process key-value queries at the same speed as the networks of beefy servers used by, for example, Google, but using two orders of magnitude less power. In 2011, Ian Adams, Ethan Miller and I proposed extending this concept to long-term storage in a paper called “Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes”. DAWN was just a concept; we never built a system.

Now, in a fascinating talk at the Chaos Communication Congress called "On Hacking MicroSD Cards", the amazing Bunnie Huang and his colleague xobs revealed that much of the hardware for a DAWN system may already be on the shelves at your local computer store. Below the fold, details of the double-edged sword that is extremely low-cost hardware, to encourage you to read the whole post and watch the video of their talk.

Bunnie points out that the algorithms needed to manage flash memory are becoming more complex, especially at the low end, where the vendors need to cover up the miserable quality of the actual chips:
Flash memory is really cheap. So cheap, in fact, that it’s too good to be true. In reality, all flash memory is riddled with defects — without exception. The illusion of a contiguous, reliable storage media is crafted through sophisticated error correction and bad block management functions. This is the result of a constant arms race between the engineers and mother nature; with every fabrication process shrink, memory becomes cheaper but more unreliable. Likewise, with every generation, the engineers come up with more sophisticated and complicated algorithms to compensate for mother nature’s propensity for entropy and randomness at the atomic scale.
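To make the "illusion of a contiguous, reliable storage media" concrete, here is a minimal sketch of bad block management in Python. It is a toy, not how any real SD card controller works: the class name `RemappedFlash` and its methods are hypothetical, and real firmware also has to handle error correction, wear leveling and blocks that go bad after manufacture.

```python
class RemappedFlash:
    """Toy bad-block manager (hypothetical, for illustration only).

    Presents a contiguous range of logical block numbers to the host
    while skipping physical blocks that are known to be defective.
    """

    def __init__(self, physical_blocks, bad_blocks):
        bad = set(bad_blocks)
        # Logical block i is backed by the i-th good physical block.
        self.mapping = [p for p in range(physical_blocks) if p not in bad]
        self.store = {}  # physical block number -> data

    @property
    def capacity(self):
        # The host sees fewer blocks than physically exist.
        return len(self.mapping)

    def _physical(self, logical):
        if not 0 <= logical < self.capacity:
            raise IndexError("logical block out of range")
        return self.mapping[logical]

    def write(self, logical, data):
        self.store[self._physical(logical)] = data

    def read(self, logical):
        return self.store.get(self._physical(logical))


# Eight physical blocks, two of them defective.
flash = RemappedFlash(physical_blocks=8, bad_blocks=[2, 5])
flash.write(2, b"hello")  # lands on physical block 3, not the bad block 2
```

The host only ever sees logical blocks 0 to 5; the defects at physical blocks 2 and 5 are invisible to it, which is the point of the illusion.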
The low-cost way to implement the "sophisticated and complicated algorithms" turns out to be a micro-controller chip. So your SD card is actually some flash memory plus:
The embedded microcontroller is typically a heavily modified 8051 or ARM CPU. In modern implementations, the microcontroller will approach 100 MHz performance levels, and also have several hardware accelerators on-die. 
But:
The inevitable firmware bugs are now a reality of the flash memory business, and as a result it’s not feasible, particularly for third party controllers, to indelibly burn a static body of code into on-chip ROM. The crux is that a firmware loading and update mechanism is virtually mandatory, especially for third-party controllers.
xobs and Bunnie, who among his other amazing achievements reverse-engineered the encryption for the original Xbox, worked out the firmware loading protocol for one particular micro-controller and were able to insert code that ran on the only two SD cards they found that used it. (Later, they were able to do the same for a more modern micro-controller.) Especially in the light of other talks at the conference, the obvious implication is that:
code execution on the memory card enables a class of MITM (man-in-the-middle) attacks, where the card seems to be behaving one way, but in fact it does something else.
On the other hand, the SD card implements everything needed for the DAWN concept except a network interface. One can easily envisage a box with something like a Raspberry Pi interfacing between a lot of SD cards and a network and power. As Bunnie points out in the talk while handing out cards, the real difficulty would be actually sourcing SD cards with a known micro-controller.
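As a rough illustration of what the front end of such a box might do, here is a minimal Python sketch that spreads key-value pairs across a set of cards by hashing the key. Everything in it is an assumption made for illustration: the names `card_for_key` and `DawnBox` are hypothetical, each card is simulated by an in-memory dictionary, and neither the FAWN nor the DAWN paper is the source of this particular placement scheme.

```python
import hashlib


def card_for_key(key: str, num_cards: int) -> int:
    """Pick a card for a key by hashing it (hypothetical placement rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_cards


class DawnBox:
    """Toy front end: one dict per simulated SD card."""

    def __init__(self, num_cards: int):
        self.cards = [dict() for _ in range(num_cards)]

    def put(self, key: str, value: bytes) -> None:
        self.cards[card_for_key(key, len(self.cards))][key] = value

    def get(self, key: str):
        # Returns None if the key has never been stored.
        return self.cards[card_for_key(key, len(self.cards))].get(key)


box = DawnBox(num_cards=16)
box.put("archive/2014/01/01", b"Implementing DAWN?")
```

A real system would need replication and a way to cope with cards being added or failing, neither of which simple modulo placement handles.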

PS - check out bunnie and Jie Qi's Circuit Stickers!

1 comment:

Doug Gibbs said...

Speaking as someone who has implemented a complete SD Card stack and multiple host side controller drivers, I must ask, "are you nuts?"

The SD/MMC/eMMC standard four- or eight-wire interface to the cards is a mess. The command and control messages to identify and access cards are not at all clean, or nicely scalable.

Basically you would need one processor per one or two cards, unless the plan is to hack the cards, in which case you are in uncharted territory.

If you are going fully custom, there may be more useful work in using the NAND flash parts directly and putting the flash directly on PCIe. It would be cheaper, have more bandwidth and offer a much cleaner interface, not to mention being faster.
http://en.wikipedia.org/wiki/NVM_Express