Wednesday, February 22, 2012
We gave a work-in-progress paper (PDF) and a well-received poster (PDF) on our economic modeling work at the 2012 FAST conference. As usual, the technical sessions featured some very interesting papers, although this year it was hard to find any relevant to long-term storage. Below the fold are notes on the papers that caught my eye.
My favorite, and one of the Best Paper awardees, was Revisiting Storage for Smartphones. Hyojun Kim, Nitin Agrawal, and Cristian Ungureanu of NEC Labs demonstrated, convincingly if counter-intuitively, that the performance bottleneck for a wide range of common apps on smartphones, such as browsers, is not the phone network but the random write performance of the phone's NAND flash memory. The reason is that the apps, for good reasons, are using SQLite databases in flash to store their state. For example, the browser uses SQLite to store the URL-to-filename map of its cache of web pages. SQLite, like most databases, does a lot of random I/O.
Flash memory is fast, so why is this a problem? It turns out that, measured on an actual phone, the sequential write performance of common SD cards is typically 5-10MB/s. But the random write performance is typically 3 orders of magnitude worse, around 10KB/s. Moving the SQLite database into (simulated) PCM memory, which does random writes quickly, improved a web browsing benchmark from ~500s to ~200s.
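The gap is easy to see for yourself. Here is a minimal sketch (my own microbenchmark, not the paper's methodology) that times synced 4KB writes to a file, first in sequential order and then in shuffled order; on flash media with a weak translation layer the shuffled pass is dramatically slower:

```python
import os
import random
import tempfile
import time

BLOCK = 4096   # 4KB, a typical SQLite page size
COUNT = 256    # 1MB total, kept small so the sketch runs quickly

def timed_writes(path, offsets):
    """Write one block at each offset, syncing after every write
    so each write actually reaches the medium."""
    start = time.time()
    with open(path, "r+b") as f:
        for off in offsets:
            f.seek(off * BLOCK)
            f.write(b"\0" * BLOCK)
            f.flush()
            os.fsync(f.fileno())
    return time.time() - start

# Pre-allocate the file so both passes overwrite existing blocks.
path = os.path.join(tempfile.mkdtemp(), "bench.dat")
with open(path, "wb") as f:
    f.write(b"\0" * BLOCK * COUNT)

sequential = timed_writes(path, range(COUNT))
shuffled = list(range(COUNT))
random.shuffle(shuffled)
scattered = timed_writes(path, shuffled)

print("sequential: %.3fs  random: %.3fs" % (sequential, scattered))
```

On a desktop with a disk cache the two numbers will be close; run the same thing against a cheap SD card and the random pass exposes the orders-of-magnitude penalty the paper measured.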
SFS: Random Write Considered Harmful in Solid State Drives by Changwoo Min and colleagues from Sungkyunkwan University and Samsung focused on the same problem, describing a log-structured file system optimized for flash that achieved both good random write performance and good flash lifetime by minimizing block erases.
Optimizing NAND Flash-Based SSDs via Retention Relaxation by Ren-Shuo Liu and colleagues from National Taiwan University and Intel suffered from a slightly misleading presentation, which led many in the audience to think that the authors had misunderstood the file system traces they were using to evaluate their technique. But in the end it was clear that they hadn't, and that the technique was another interesting approach to the problem of NAND flash write performance.
In conventional NAND flash memory, writes are slow because the device needs many small incremental voltage pulses to get each cell into a state that will remain within the voltage range representing the programmed bits long enough to achieve (with the help of ECC) the specified data retention time. Liu and colleagues speed up writes by using a smaller number of larger voltage increments. This results in data that won't last as long. But much of the data written to mass storage gets overwritten before it gets old. They avoid losing data by having the flash controller sweep through in the background and copy data that has survived for a few weeks without being overwritten. For the copy they use traditional slow, precise writes so that it lasts the usual long time.
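The controller's bookkeeping can be sketched as a toy model (the class and threshold names here are my own inventions, not the paper's): host writes default to fast, short-retention programming, and a background sweep re-programs any data that survives past an age threshold using slow, precise writes.

```python
RETENTION_SWEEP_AGE = 14 * 24 * 3600  # two weeks, in seconds (assumed threshold)

class Block:
    def __init__(self, data, written_at, relaxed):
        self.data = data
        self.written_at = written_at
        self.relaxed = relaxed  # True: fast write with relaxed retention

class Controller:
    """Toy model of retention relaxation; not the paper's actual design."""
    def __init__(self):
        self.blocks = {}

    def write(self, lba, data, now):
        # Host writes use fast, imprecise programming by default.
        self.blocks[lba] = Block(data, now, relaxed=True)

    def sweep(self, now):
        # Background task: data that survived past the threshold without
        # being overwritten is rewritten with slow, precise programming
        # so it achieves the full specified retention time.
        for blk in self.blocks.values():
            if blk.relaxed and now - blk.written_at > RETENTION_SWEEP_AGE:
                blk.written_at = now
                blk.relaxed = False

ctl = Controller()
ctl.write(0, b"hot", now=0)
ctl.write(1, b"cold", now=0)
ctl.write(0, b"hot2", now=RETENTION_SWEEP_AGE * 2)  # overwritten before aging out
ctl.sweep(now=RETENTION_SWEEP_AGE * 2)
print(ctl.blocks[0].relaxed, ctl.blocks[1].relaxed)  # True False
```

The point of the model: the frequently overwritten block never pays the cost of a precise write, which is where the speedup comes from; only the cold block gets the slow rewrite.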
The other well-deserved best paper award winner was Recon: Verifying File System Consistency at Runtime by Daniel Fryer and colleagues at the University of Toronto. Their goal is to avoid the need for off-line file system consistency checks, such as fsck. The basic idea is to ensure that every update to file system metadata results in a consistent state of the persistent storage, and fail any updates that would not.
They claim that since file systems are structured to maintain consistency efficiently, it should be possible to check consistency efficiently, at least at the boundaries of file system transactions. They use fast, local consistency invariants derived from the global consistency constraints of the file system to do this.
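To give the flavor of a local invariant (this is my own illustrative example, not one of Recon's actual rules): the global constraint "a bitmap bit is set iff some inode references that block" has a local form checkable at commit time, namely "every bit flipped in this transaction must be matched by a pointer added or removed in the same transaction".

```python
def check_commit(old_bitmap, new_bitmap, old_ptrs, new_ptrs):
    """Return True if the commit preserves the local invariant:
    bitmap flips must exactly match pointer additions/removals."""
    flipped = {i for i, (a, b) in enumerate(zip(old_bitmap, new_bitmap)) if a != b}
    added = new_ptrs - old_ptrs
    removed = old_ptrs - new_ptrs
    return flipped == added | removed

# A consistent commit: block 5 is allocated and a pointer to it appears.
ok = check_commit([0] * 8, [0] * 5 + [1] + [0] * 2, set(), {5})
# An inconsistent commit: bit 5 is set but nothing references block 5.
bad = check_commit([0] * 8, [0] * 5 + [1] + [0] * 2, set(), set())
print(ok, bad)  # True False
```

The check only touches the blocks changed by the transaction, which is why it can run at commit time without the full-scan cost of fsck.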
Recon has to live in the block layer, below the file system, and intercept the I/O on its way from the file system to the disk. When it sees a transaction commit, it needs to compare the proposed new state with the old state to see what changed. This needs file-system-specific code to interpret the metadata. In most cases Recon's overhead is within the margin of error of the system without it, and in the worst case it is only about 8% slower, while catching all the problems fsck does. The ability to ensure that the file system remains in a consistent state eliminates the need to run fsck, and thus saves a lot of time on reboots after crashes.