Thursday, March 31, 2016

The Amazon Tax

Ben Thompson at Stratechery has an insightful post entitled The Amazon Tax on the 10th anniversary of the rollout of Amazon S3:
Until then Amazon Web Services had primarily been about providing developers with a way to tap into the Amazon retail store; S3, though, had nothing at all to do with retail, at least not directly.
Below the fold, some comments.

Tuesday, March 29, 2016

Following Up On The Emulation Report

A meeting was held at the Mellon Foundation to follow up on my report Emulation and Virtualization as Preservation Strategies. I was asked to provide a brief introduction to get discussion going. The discussions were confidential, but below the fold is an edited text of my introduction with links to the sources.

Thursday, March 24, 2016

Long Tien Nguyen & Alan Kay's "Cuneiform" System

Jason Scott points me to Long Tien Nguyen and Alan Kay's paper from last October entitled The Cuneiform Tablets of 2015. It describes what is in effect a better implementation of Raymond Lorie's Universal Virtual Computer. They attribute the failure of the UVC to its complexity:
They tried to make the most general virtual machine they could think of, one that could easily emulate all known real computer architectures. The resulting design has a segmented memory model, bit-addressable memory, and an unlimited number of registers of unlimited bit length. This Universal Virtual Computer requires several dozen pages to be completely specified and explained, and requires far more than an afternoon (probably several weeks) to be completely implemented.
They are correct that the UVC was too complicated, but the reasons why it was a failure are far more fundamental and, alas, apply equally to Chifir, the much simpler virtual machine they describe. Below the fold, I set out these reasons.
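
To make the contrast concrete, here is a toy word-addressed interpreter in Python, of roughly the size and shape a Chifir-class machine demands. The handful of opcodes is invented for this illustration and is not Chifir's actual instruction set, which the paper specifies; the point is only scale, a page of code against the UVC's several dozen pages of specification.

    # A toy word-addressed virtual machine, showing why a fixed, minimal design
    # can be re-implemented in an afternoon. The opcodes are invented for this
    # illustration; they are NOT Chifir's actual instruction set.

    MEM_WORDS = 1 << 20          # fixed-size, word-addressed memory

    def run(program, max_steps=1_000_000):
        mem = list(program) + [0] * (MEM_WORDS - len(program))
        pc = 0
        for _ in range(max_steps):
            op, a, b, c = mem[pc], mem[pc + 1], mem[pc + 2], mem[pc + 3]
            pc += 4
            if op == 0:                      # HALT
                return mem
            elif op == 1:                    # ADD: mem[a] = mem[b] + mem[c]
                mem[a] = mem[b] + mem[c]
            elif op == 2:                    # LOAD-IMMEDIATE: mem[a] = b
                mem[a] = b
            elif op == 3:                    # JUMP-IF-ZERO: if mem[a] == 0 goto b
                if mem[a] == 0:
                    pc = b
            else:
                raise ValueError(f"unknown opcode {op}")
        raise RuntimeError("step budget exhausted")

    # Store 2 + 3 into cell 100, then halt.
    prog = [2, 101, 2, 0,        # mem[101] = 2
            2, 102, 3, 0,        # mem[102] = 3
            1, 100, 101, 102,    # mem[100] = mem[101] + mem[102]
            0, 0, 0, 0]          # HALT
    assert run(prog)[100] == 5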

Tuesday, March 22, 2016

The Dawn of DAWN?

At the 2009 SOSP David Andersen and co-authors from CMU presented FAWN, the Fast Array of Wimpy Nodes. It inspired me to suggest, in my 2010 JCDL keynote, that the cost savings FAWN realized without performance penalty by distributing computation across a very large number of very low-power nodes might also apply to storage.

The following year Ian Adams and Ethan Miller of UC Santa Cruz's Storage Systems Research Center and I looked at this possibility more closely in a Technical Report entitled Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes. We showed that it was indeed plausible that, even at then-current flash prices, the total cost of ownership over the long term of a storage system built from very low-power system-on-chip technology and flash memory would be competitive with disk while providing high performance and enabling self-healing.
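
As a sketch of the kind of long-term comparison the report makes, the skeleton below adds up media replacements, power (with cooling folded in via PUE) and running overheads over a fixed horizon. Every number is an invented placeholder chosen only to show the shape of the argument; the report derives its own estimates.

    # A back-of-the-envelope skeleton of a long-term cost-of-ownership comparison.
    # All parameter values below are invented placeholders, not figures from the
    # Technical Report; the report's model also covers factors omitted here.

    def tco_per_tb(price_per_tb, watts_per_tb, life_years, overhead_per_tb_year,
                   horizon_years=12, cents_per_kwh=10, pue=1.5):
        """Cost per TB over the horizon: media replacements, power (and cooling,
        via PUE), plus per-TB running overheads (racks, servers, operations)."""
        replacements = -(-horizon_years // life_years)   # ceiling division
        media = replacements * price_per_tb
        power = watts_per_tb * pue * 24 * 365 * horizon_years / 1000 * cents_per_kwh / 100
        return media + power + overhead_per_tb_year * horizon_years

    disk = tco_per_tb(price_per_tb=30, watts_per_tb=1.5, life_years=4,
                      overhead_per_tb_year=15)
    dawn = tco_per_tb(price_per_tb=180, watts_per_tb=0.1, life_years=12,
                      overhead_per_tb_year=4)
    print(f"12-year cost per TB -- disk: ${disk:.0f}, DAWN-style flash: ${dawn:.0f}")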

Although flash remains more expensive than hard disk, since 2011 the gap has narrowed from a factor of about 12 to about 6. Pure Storage recently announced FlashBlade, an object storage fabric composed of large numbers of blades, each equipped with:
  • Compute – 8-core Xeon system-on-a-chip – and Elastic Fabric Connector for external, off-blade, 40GbitE networking,
  • Storage – NAND storage with 8TB or 52TB of raw capacity and on-board NV-RAM with a super-capacitor-backed write buffer plus a pair of ARM CPU cores and an FPGA,
  • On-blade networking – PCIe card to link compute and storage cards via a proprietary protocol.
Chris Mellor at The Register has details and two commentaries.

FlashBlade clearly isn't DAWN. Each blade is much bigger, much more powerful and much more expensive than a DAWN node. No-one could call a node with an 8-core Xeon, 2 ARMs, and 52TB of flash "wimpy", and it'll clearly be too expensive for long-term bulk storage. But it is a big step in the direction of the DAWN architecture.

DAWN exploits two separate sets of synergies:
  • Like FlashBlade, it moves the computation to where the data is, rather than moving the data to where the computation is, reducing both latency and power consumption. The further data moves on wires from the storage medium, the more power and time it takes. This is why Berkeley's Aspire project's architecture is based on optical interconnect technology, which when it becomes mainstream will be both faster and lower-power than wires. In the meantime, we have to use wires.
  • Unlike FlashBlade, it divides the object storage fabric into a much larger number of much smaller nodes, implemented using the very low-power ARM chips used in cellphones. Because the power a CPU needs tends to grow faster than linearly with performance, the additional parallelism provides comparable performance at lower power; the toy calculation below makes the arithmetic concrete.
So FlashBlade currently exploits only one of the two sets of synergies. But once Pure Storage has deployed this architecture in its current relatively high-cost and high-power technology, re-implementing it in lower-cost, lower-power technology should be easy and non-disruptive. They have done the harder of the two parts.
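
To see why the second set of synergies matters, here is the toy calculation. It assumes per-node power grows as performance raised to an exponent greater than one; the exponent, throughput and node counts are illustrative assumptions, not measurements.

    # Illustrative arithmetic for the wimpy-node argument: if per-node power is
    # k * perf**alpha with alpha > 1, spreading the same aggregate throughput
    # over more, slower nodes takes less total power. alpha, k and the node
    # counts below are assumptions for illustration, not measured values.

    def total_power(aggregate_perf, nodes, alpha=1.7, k=1.0):
        """Total power of `nodes` identical nodes that together deliver
        `aggregate_perf`, each drawing k * (per-node performance)**alpha."""
        per_node_perf = aggregate_perf / nodes
        return nodes * k * per_node_perf ** alpha

    AGG = 1000.0                         # aggregate throughput, arbitrary units
    for n in (10, 100, 1000):
        print(f"{n:5d} nodes -> total power {total_power(AGG, n):9.0f} (arbitrary units)")

    # With alpha = 1.7 the thousand wimpy nodes draw roughly 25x less total power
    # than ten brawny ones for the same throughput. Per-node fixed overheads
    # (DRAM, packaging, network ports), ignored here, limit how wimpy a node can
    # usefully be.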

Thursday, March 17, 2016

Dr. Pangloss loves technology roadmaps

It's nearly three years since we last saw the renowned Dr. Pangloss chuckling with glee at the storage industry's roadmaps. But last week he was browsing Slashdot and found something much to his taste. Below the fold, an explanation of what the good Doctor enjoyed so much.

Tuesday, March 15, 2016

Elsevier and the Streisand Effect

Nearly a year ago I wrote The Maginot Paywall about the rise of research into the peer-to-peer sharing of academic papers via mechanisms including Library Genesis, Sci-Hub and #icanhazpdf. Although these mechanisms had been in place for some time they hadn't received a lot of attention. Below the fold, a look at how and why this has recently changed.

Friday, March 11, 2016

Talk on Evolving the LOCKSS Technology at PASIG

At the PASIG meeting in Prague I gave a brief update on the ongoing evolution of the LOCKSS technology. Below the fold, an edited text of the talk with links to the sources.

Thursday, March 10, 2016

Talk on Private LOCKSS Networks at PASIG

I stood in for Vicky Reich to give an overview of Private LOCKSS Networks to the PASIG meeting. Below the fold, an edited text of the talk with links to the sources.

Thursday, March 3, 2016

Death of the "free internet"?

I've linked before to the excellent work of Izabella Kaminska at the FT's Alphaville blog. She's recently started a new series of posts she's calling Web Perestroika:
an occasional series lamenting the hypothetical eventuality of a world without a free internet* and the extraordinary implications this could have for markets and companies. A tragedy of the web commons if you will.

It is inspired both by India’s ruling to bar Facebook from subsidising internet availability with Free Basics packages (see Kadhim’s series of posts for more on that) but also Balaji Srinivasan (he of 21 Inc toaster fame), and his attempts — including a Stanford Bitcoin course — to convince the world the web should in fact be a paid-for luxury product of scarcity.
And yes, the asterisk means she does understand that The Internet is not free:
*when we say “internet” we mean it in the popular sense of the word.
She means a world without free Web content. Below the fold, some thoughts on the first two posts in the series, both from Feb 10th.

Tuesday, March 1, 2016

The Cloudy Future of Disk Drives

For many years, following Dave Anderson of Seagate, I've been pointing out that the constraints of manufacturing capacity mean that the only medium available on which to store the world's bulk data is hard disk. Eric Brewer's fascinating FAST 2016 keynote, entitled Spinning Disks and their Cloudy Future, and Google's associated white paper, start from this premise:
The rise of portable devices and services in the Cloud has the consequence that (spinning) hard disks will be deployed primarily as part of large storage services housed in data centers. Such services are already the fastest growing market for disks and will be the majority market in the near future.
Eric's argument is that, since cloud storage will shortly be the majority of the market and other segments are declining, the design of hard drives no longer needs to be a compromise suitable for a broad range of uses, but should be optimized for the Cloud. Below the fold, I look into some details of the optimizations and provide some supporting evidence.