Friday, May 23, 2014

Bezos' Law

Greg O'Connor had a piece at Gigaom entitled Moore's Law Gives Way To Bezos' Law, sparked by the recent price cuts by Google and Amazon, in which he claimed that:
The latest cuts make it clear there’s a new business model driving cloud that is every bit as exponential in growth — with order of magnitude improvements to pricing — as Moore’s Law has been to computing.
If you need a refresher, Moore’s Law is “the observation that, over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years.” I propose my own version, Bezos’s law. Named for Amazon CEO Jeff Bezos, I define it as the observation that, over the history of cloud, a unit of computing power price is reduced by 50 percent approximately every three years.
Both Moore's and Kryder's laws held for multiple decades. Below the fold I ask whether a putative Bezos' law could be equally long-lived.
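To make the arithmetic in O'Connor's definition concrete, here is a minimal sketch in Python. It assumes that "reduced by 50 percent approximately every three years" means a constant exponential rate, and that Moore's Law means a doubling every two years, as quoted above. The function names are illustrative and not taken from O'Connor's piece.

```python
def annual_price_decline(halving_years: float) -> float:
    """Fraction by which the price falls each year, assuming it halves
    every `halving_years` years at a constant exponential rate."""
    return 1.0 - 0.5 ** (1.0 / halving_years)


def relative_price(years: float, halving_years: float) -> float:
    """Price after `years`, as a fraction of the starting price."""
    return 0.5 ** (years / halving_years)


def transistor_multiple(years: float, doubling_years: float = 2.0) -> float:
    """Growth factor in transistor count after `years`, assuming a
    doubling every `doubling_years` years."""
    return 2.0 ** (years / doubling_years)


if __name__ == "__main__":
    # Bezos' law as O'Connor defines it: price halves every three years.
    print(f"Implied annual price decline: {annual_price_decline(3.0):.1%}")
    # What the two laws would imply if each held for three decades.
    print(f"Relative price after 30 years: {relative_price(30.0, 3.0):.6f}")
    print(f"Transistor multiple after 30 years: {transistor_multiple(30.0):.0f}")
```

Under these assumptions, halving every three years works out to a price decline of about 21% per year, and sustaining it for three decades would cut the price of a unit of computing by a factor of roughly a thousand.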

Wednesday, May 21, 2014

DAWN is breaking

I posted last October on Seagate's announcement of Kinetic, their object storage architecture for Ethernet-connected hard drives (and ultimately other forms of storage). This is a conservative approach to up-levelling the interface to storage media, providing an object storage architecture with a fixed but generally useful set of operations. In that way it is similar to, but less ambitious than, our proposed DAWN architecture.

The other half of the disk drive industry has now responded with a much more radical approach. Western Digital's HGST unit has announced Ethernet-connected drives that run Linux. This approach has some significant advantages:
  • It sounds great as a marketing pitch.
  • It gets computing as close as possible to the data, which is the architecturally correct direction to be moving. This is something that DAWN does but Kinetic doesn't.
  • It will be easy to make HGST's drives compatible with Seagate's by running an implementation of the Kinetic protocol on them.
  • It provides a great deal of scope for researching and developing suitable protocols for communicating with storage media over IP.
But it is also very risky:
  • In many cases manufacturers find disks returned under warranty work fine; the cause of the failure was an unrepeatable bug in the disk firmware. Running Linux on the drive will provide a vastly increased scope for such failures, and make diagnosing them much harder for the manufacturer.
  • If the interface between Linux and the drive hardware emulates the existing SATA or other interface, the benefits of the architecture will be limited. On the other hand, the more of the hardware the interface exposes, the greater the risk that applications will screw up the hardware.
  • Kinetic's approach takes security of the communication with the drives seriously. HGST's "anything goes" approach leaves this up to the application.
On balance I think that HGST's acceptance that up-levelling the interface to media is important is a very positive development.

Thursday, May 15, 2014

Stored safe in the Cloud

Steve Kolowich at The Chronicle of Higher Education reports on a major outage and data loss on May 6 at Dedoose:
Dedoose, a cloud-based application for managing research data, suffered a “devastating” technical failure last week that caused academics across the country to lose large amounts of research work, some of which may be gone for good.
...
The crash nonetheless has dealt frustrating setbacks to a number of researchers, highlighting the risks of entrusting data to third-party stewards.
Below the fold, I look at what has been reported and discuss some of these risks.

Wednesday, May 14, 2014

Talk at Seagate

I gave a talk at Seagate entitled:
Storage Will Be Much Less Free Than It Used To Be
Below the fold is an edited text with links to the sources.

Tuesday, May 13, 2014

Named Data Networking gets major grant

Gigaom reports some great news from yesterday for the future of the Internet. The Named Data Networking project is one of three projects, all originally funded under the NSF's Future Internet Architecture program, that will share a $15M grant to support trial deployments of their new architectures.


Named Data Networking is inspired by Van Jacobson's work on Content-Centric Networking (CCN), which continues. They just announced a further code release. I explained the importance of CCN for digital preservation in a long blog post early last year.

Tuesday, May 6, 2014

On the Economics of Throwing Stuff Away

I've been arguing for some time that storing bits will be a lot less free than it used to be. The Big Data zealots who say:
Save it all—you never know when it might come in handy for a future data-mining expedition.
will have to adapt to this new reality. Below the fold I look at possible adaptations.