DSHR's Blog: November 2014

Tuesday, November 25, 2014

Dutch vs. Elsevier

The discussions between libraries and major publishers about subscriptions have only rarely been actual negotiations. In almost all cases the libraries have been unwilling to walk away and the publishers have known this. This may be starting to change; Dutch libraries have walked away from the table with Elsevier. Below the fold, the details.

Friday, November 21, 2014

Steve Hetzler's "Touch Rate" Metric

Steve Hetzler of IBM gave a talk at the recent Storage Valley Supper Club on a new, scale-free metric for evaluating storage performance that he calls "Touch Rate". He defines this as the proportion of the store's total content that can be accessed per unit time. This leads to some very illuminating graphs that I discuss below the fold.
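
As a rough illustration of the definition, touch rate can be approximated as sustained bandwidth divided by capacity, scaled to a chosen time unit. The sketch below assumes that simplification. The function name and the example device figures are hypothetical and are not taken from Hetzler's talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def touch_rate_per_year(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Return the fraction of a store's total content that can be accessed per year.

    Assumes the store can sustain `bandwidth_bytes_per_s` continuously,
    which is a simplification made for illustration.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return bandwidth_bytes_per_s * SECONDS_PER_YEAR / capacity_bytes


# Hypothetical device: 4 TB capacity, 100 MB/s sustained bandwidth.
small = touch_rate_per_year(4e12, 100e6)

# Scaling capacity and bandwidth together leaves the metric unchanged,
# which is the sense in which it is scale-free.
large = touch_rate_per_year(4e15, 100e9)
```

Under these assumptions a value above 1 means the whole store could be read more than once a year, and a value below 1 means only part of it could be.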

Tuesday, November 18, 2014

Talk "Costs: Why Do We Care?"

Investing in Opportunity: Policy Practice and Planning for a Sustainable Digital Future sponsored by the 4C project and the Digital Preservation Coalition featured a keynote talk each day. The first, by Fran Berman, is here.

Mine was the second, entitled Costs: Why Do We Care? It was an update and revision of The Half-Empty Archive, stressing the importance of collecting, curating and analyzing cost data. Below the fold, an edited text with links to the sources.

Monday, November 17, 2014

Andrew Odlyzko Strikes Again

Last year I blogged about Andrew Odlyzko's perceptive analysis of the business of scholarly publishing. Now he's back with an invaluable, must-read analysis of the economics of the communication industry entitled Will smart pricing finally take off? Below the fold, a taste of the paper and a validation of one of his earlier predictions from the Google Scholar team.

Friday, November 14, 2014

Talk at Storage Valley Supper Club

I gave a very short talk to the Storage Valley Supper Club's 8th meeting. Below the fold, an edited text with links to the sources.

Wednesday, November 12, 2014

Five Minutes Of Fame

On Monday, Chris Mellor at The Register had a piece with a somewhat misleading title that provides a good summary of the argument we've been making since at least early 2011 that the Kryder rate, the rate of annual decrease in the cost per byte of storage, had slowed dramatically. As we have shown, this slowing has huge implications for the cost of long-term storage.
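
To see why the Kryder rate matters so much for long-term storage, consider a deliberately simplistic sketch in which storage is paid for once a year and the price falls by a fixed fraction annually. This is not the economic model referred to above. The function name, the starting cost and the two rates compared are assumptions chosen only for illustration.

```python
def total_storage_cost(first_year_cost: float, kryder_rate: float, years: int) -> float:
    """Sum yearly storage payments when the price drops by `kryder_rate` each year.

    Simplifying assumptions: a fixed amount of data, one payment per year,
    a constant Kryder rate, and no discounting or other running costs.
    """
    if not 0.0 <= kryder_rate < 1.0:
        raise ValueError("kryder_rate must be in [0, 1)")
    return sum(first_year_cost * (1.0 - kryder_rate) ** year for year in range(years))


# Hypothetical comparison over 100 years, first-year cost normalized to 1.
fast = total_storage_cost(1.0, 0.40, 100)  # prices falling 40% per year
slow = total_storage_cost(1.0, 0.10, 100)  # prices falling 10% per year
ratio = slow / fast
```

In this toy model the long-run total approaches the first-year cost divided by the Kryder rate, so a drop from 40% to 10% per year roughly quadruples the cost of keeping the same data.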

Today, Chris is back with a similar summary of Preeti Gupta et al's MASCOTS paper, An Economic Perspective of Disk vs. Flash Media in Archival Storage. This paper reports on some more sophisticated economic modelling that supports the argument of DAWN: a Durable Array of Wimpy Nodes. This 2011 technical report showed that, using a similar fabric to Carnegie-Mellon's 2009 FAWN: a Fast Array of Wimpy Nodes for long-term storage instead of computation, the running costs would be low enough to overcome the much higher cost of the flash media as compared to disk.

Monday, November 10, 2014

Gossip protocols: a clarification

Butch Lazorchak blogged on the Library of Congress' Digital Preservation blog about one of his take-aways from the Library's Designing Storage Architectures workshop: the importance of anti-entropy protocols for preservation. He talks about these as "a subtype of “gossip” protocols" and cites LOCKSS as an example, saying:

Not coincidentally, LOCKSS “consists of a large number of independent, low-cost, persistent Web caches that cooperate to detect and repair damage to their content by voting in ‘opinion polls’” (PDF). In other words, gossip and anti-entropy.

The main use for gossip protocols is to disseminate information in a robust, randomized way, by having each peer forward information it receives from other peers to a random selection of other peers. As the function of LOCKSS boxes is to act as custodians of copyright information, this would be a very bad thing for them to do.
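
The forwarding behaviour described above can be sketched as a small simulation of generic gossip dissemination. It is an illustration only and is not the LOCKSS protocol. The function name, the round-based model and the parameters are assumptions.

```python
import random


def gossip_rounds(num_peers: int, fanout: int, seed: int = 0, max_rounds: int = 1000) -> int:
    """Count rounds until every peer has received an item first held by peer 0.

    In each round, every informed peer forwards the item to `fanout` other
    peers chosen at random. Stops early after `max_rounds` as a safety cap.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    informed = {0}
    rounds = 0
    while len(informed) < num_peers and rounds < max_rounds:
        newly_informed = set()
        for peer in informed:
            others = [p for p in range(num_peers) if p != peer]
            newly_informed.update(rng.sample(others, min(fanout, len(others))))
        informed |= newly_informed
        rounds += 1
    return rounds


# With a fanout covering every other peer, one round informs everyone.
one_round = gossip_rounds(num_peers=10, fanout=9)
```

The sketch makes the problem for a custodian of content plain: whatever one peer holds ends up at every other peer after a few rounds.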

It is true that LOCKSS peers communicate via an anti-entropy protocol, and it is even true that the first such protocol they used, the one I implemented for the LOCKSS prototype, was a gossip protocol in the sense that peers forwarded hashes of content to each other. Alas, that protocol was very insecure. Some of the ways in which it was insecure related directly to its being a gossip protocol.

An intensive multi-year research effort in cooperation with Stanford's CS department to create a more secure anti-entropy protocol led to the current protocol, which won "Best Paper" at the 2003 Symposium on Operating Systems Principles. It is not a gossip protocol in any meaningful sense (see below the fold for details). Peers never forward information they receive from other peers; all interactions are strictly pair-wise and private.

For the TRAC audit of the CLOCKSS Archive we provided an overview of the operation of the LOCKSS anti-entropy protocol; if you are interested in the details of the protocol this, rather than the long and very detailed paper in ACM Transactions on Computer Systems (PDF), is the place to start.

Monday, November 3, 2014

First US web page

Stanford's Web Archiving team of Nicholas Taylor and Ahmed AlSum have brought up SWAP, the Stanford Web Archive Portal, using the Open Wayback code developed under IIPC auspices from the Internet Archive's original. And, thanks to the Stanford staff's extraordinary ability to recover data from old backups, it features the very first US web page, brought up by Paul Kunz at SLAC around 6th Dec. 1991.