Tuesday, November 26, 2013

In-browser emulation

Jeff Rothenberg's ground-breaking 1995 article Ensuring the Longevity of Digital Documents described and compared two techniques to combat format obsolescence, format migration and emulation, and concluded that emulation was the preferred approach. As time went by and successive digital preservation systems went into production, it became clear that almost all of them rejected Jeff's conclusion, planning to use format migration as their preferred response to format obsolescence. Follow me below the fold for a discussion of why this happened and whether it still makes sense.

Wednesday, November 20, 2013

Patio Perspectives at ANADP II: Preserving the Other Half

Vicky Reich and I moderated a session at ANADP II entitled Patio Perspectives Session 2: New Models of Collaborative Preservation. The abstract for the session said:
This session will explore how well current preservation models keep our evolving scholarly communication products accessible for the short and long term. Library and publisher practices are changing in response to scholars' needs and market constraints. Where are the holes in our current approaches, and how can they be filled? Or are completely new models required?
I gave a brief introductory talk; an edited text with links to the sources is below the fold.

Thursday, November 14, 2013

Estimating Storage Costs

Ethan Miller points me to a paper on the cost of storage, How Much Does Storage Really Cost? Towards a Full Cost Accounting Model for Data Storage by Amit Kumar Dutta and Ragib Hasan (DH) of the University of Alabama, Birmingham. Unfortunately, the conference at which it was presented, GECON 2013, is one of those whose proceedings are published in Springer's awful Lecture Notes in Computer Science series, so no link. Below the fold, discussion of the relationship between DH and our on-going work on the economics of long-term storage.

Tuesday, November 12, 2013

The Bitcoin vulnerability

Last month I wrote a ten-year retrospective of some of the ideas underlying the LOCKSS anti-entropy protocol in our SOSP paper, relating them to recent work on securing SSL communications. This month Ittay Eyal and Emin Gun Sirer (ES) published an important paper describing a vulnerability in Bitcoin. There are two similarities between this attack and the stealth modification attack we examined in that paper:
  • The attack involves a conspiracy in which the members strategically switch between good and bad behavior. The defense involves randomizing the behavior of the peers. The general lesson is that predictable behavior by honest peers is often easy to exploit.
  • The attack involves deploying an army of Sybil peers that appear legitimate but are actually under the control of the conspiracy. The defense involves making peer operations expensive using a proof-of-work technique. The general lesson is that peer reputations cheaply acquired are worth what they cost.
Follow me below the fold for the details.
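The proof-of-work defense in the second point works because of an asymmetry: producing a proof is expensive, but checking one is cheap. Here is a minimal sketch of a generic hashcash-style proof of work, assuming SHA-256 and a difficulty expressed as a required number of leading zero bits. It is not the effort-proof scheme from our SOSP paper, nor Bitcoin's exact rule (Bitcoin compares a double SHA-256 hash against a numeric target). The names solve, verify and leading_zero_bits are hypothetical, chosen for illustration.

```python
import hashlib
import os

# Hypothetical hashcash-style proof of work (illustrative sketch only).


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() is the position of the highest set bit;
        # the bits above it in this byte are zeros.
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty: int) -> int:
    """Brute-force a nonce; expected cost is about 2**difficulty hashes."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Checking a proof costs a single hash."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    # A fresh random challenge prevents a Sybil from precomputing proofs.
    challenge = os.urandom(16)
    nonce = solve(challenge, difficulty=16)
    print(nonce, verify(challenge, nonce, 16))
```

Each extra bit of difficulty doubles the expected work for a peer seeking to participate, while the cost of verification stays constant. That is what makes an army of Sybil peers expensive to field.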

Wednesday, November 6, 2013

Fire at Internet Archive

A side building at the Internet Archive used for book-scanning was consumed by fire last night. The people, the data and the library are safe but the Internet Archive is asking for donations to help them rebuild. If you can afford to, please help; I just did.

Update: they need to replace an estimated $600K in scanning equipment, plus rebuild the building.

Tuesday, November 5, 2013

Cloud lock-in

Back in June I used the demise of Google Reader to list a number of business issues with using third-party cloud storage services for long-term digital preservation. Scott Gilbertson was one of the users who were left high and dry. He has an interesting piece at The Register about the process of recovering from the loss of Reader. He starts from the well-known but very apt quote:
If you're not paying for something, you're not the customer; you're the product being sold.
then points out that:
Just because you are paying companies like Google, Apple or Microsoft you might feel they are, somehow, beholden to you. The companies are actually beholden only to their stockholders whose interests may or may not be aligned with your own, so will change services accordingly.
and, after pointing out how easy it is these days for users to run cloud-like services for themselves, ends up concluding:
If you aren't hosting your data, it's not your data.
Also, Joe McKendrick at ZDNet pointed me to the Open Group's interesting Cloud Computing Portability and Interoperability. Joe introduces it by saying:
Along with security, one of the most difficult issues with cloud platforms is the risk of vendor lock-in. By assigning business processes and data to cloud service providers, it may get really messy and expensive to attempt to dislodge from the arrangement if it's time to make a change.
The guide, compiled by a team led by Kapil Bakshi and Mark Skilton, provides key pointers for enterprises seeking to develop independently functioning clouds, as well as recommendations to the industry on standards that need to be adopted or extended.
It is mainly about avoiding getting locked in to a vendor of cloud computing services rather than cloud storage services, so its focus is on open, standard interfaces to such services. But the main message of both pieces is that any time you are using cloud services, you need an up-to-date, fully costed exit strategy. Trying to come up with an exit strategy when you're given 13 days' notice that you need one is guaranteed to be an expensive disaster.