Monday, July 28, 2014

TRAC Certification of the CLOCKSS Archive

The CLOCKSS Archive is a dark archive of e-journal and e-book content, jointly managed by publishers and libraries, implemented using the LOCKSS technology and operated on behalf of the CLOCKSS not-for-profit by the LOCKSS team at the Stanford Library. For well over a year the LOCKSS team and CLOCKSS management have been preparing for and undergoing the Trustworthy Repositories Audit and Certification (TRAC) process for the CLOCKSS Archive with the Center for Research Libraries (CRL).

CRL just released the Certification Report on the CLOCKSS Archive. I'm happy to report that our work was rewarded with an overall score that equals the previous best, and the first ever perfect score in the "Technologies, Technical Infrastructure, Security" category. We are grateful for this wonderful endorsement of the LOCKSS technology.

In the interests of transparency the LOCKSS team have released all the non-confidential documentation submitted during the audit process. As you will see, there is a lot of it. What you see at the link is not exactly what we submitted. It has been edited to correct errors and obscurities we found during the audit, and to add material from the confidential part of the submission that we decided was not really confidential. These documents will continue to be edited as the underlying reality changes, to keep them up-to-date and satisfy one of the ongoing requirements of the certification.

This is just a news item. In the near future I will follow up with posts describing the process of being audited, what we did to make the process work, and the lessons we learned that may be useful for future audits.

Friday, July 25, 2014

Coronal Mass Ejections

In my talk What Could Possibly Go Wrong last April I referred to a paper on the 2012 Coronal Mass Ejection (CME) that missed Earth by only nine days:
Most of the information needed to recover from such an event exists only in digital form on magnetic media. These days, most of it probably exists only in "the cloud", which is this happy place immune from the electromagnetic effects of coronal mass ejections and very easy to access after the power grid goes down.
NASA has a post discussing recent research into CMEs which is required reading:
Analysts believe that a direct hit by an extreme CME such as the one that missed Earth in July 2012 could cause widespread power blackouts, disabling everything that plugs into a wall socket.  Most people wouldn't even be able to flush their toilet because urban water supplies largely rely on electric pumps.
An extreme CME called the "Carrington Event" actually did hit the Earth in September 1859:
Intense geomagnetic storms ignited Northern Lights as far south as Cuba and caused global telegraph lines to spark, setting fire to some telegraph offices and thus disabling the "Victorian Internet."
A similar storm today could have a catastrophic effect. According to a study by the National Academy of Sciences, the total economic impact could exceed $2 trillion or 20 times greater than the costs of a Hurricane Katrina. 
Not to worry, because:
In February 2014, physicist Pete Riley of Predictive Science Inc. published a paper in Space Weather entitled "On the probability of occurrence of extreme space weather events."  In it, he analyzed records of solar storms going back 50+ years.  By extrapolating the frequency of ordinary storms to the extreme, he calculated the odds that a Carrington-class storm would hit Earth in the next ten years.

The answer: 12%.
Only 12%. I'd say that CMEs need to be part of the threat model of digital preservation systems.
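To see how an extrapolated event frequency turns into a decadal probability, here is a minimal sketch. It assumes storm arrivals behave like a Poisson process, which is my simplifying assumption for illustration, not necessarily Riley's exact method:

```python
import math

def decade_probability(annual_rate, years=10):
    """P(at least one event in `years` years) for a Poisson process."""
    return 1 - math.exp(-annual_rate * years)

def annual_rate_from(prob, years=10):
    """Invert: the annual rate implied by a given probability over `years`."""
    return -math.log(1 - prob) / years

# A 12% chance per decade corresponds to roughly 1.3 events per century.
rate = annual_rate_from(0.12)
print(round(rate, 4))                      # ~0.0128 events/year
print(round(decade_probability(rate), 2))  # recovers 0.12
```

Under this model a 12% decadal probability implies an extreme storm roughly every 78 years on average, which is consistent with one Carrington-class hit and one near miss in the last 155 years.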

Tuesday, July 1, 2014

Discounting the far future

In 2011 Andrew Haldane and Richard Davies of the Bank of England (HD) presented research showing that, when making investment decisions, investors applied discount rates much higher than the prevailing interest rates, and that this gap was increasing through time. One way of looking at their results was as an increase in short-termism; investors were increasingly reluctant to make investments with a long-term payoff. This reluctance clearly has many implications, including making dealing with climate change even more difficult. Their work has influenced our efforts to build an economic model of long-term storage, another area where the benefits accrue over a long period of time.
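The mechanics behind short-termism are simple compound discounting: a payoff T years out is worth payoff / (1 + r)^T today, so small increases in the discount rate r crush the present value of long-term benefits. A quick sketch (the rates here are illustrative, not HD's estimates):

```python
def present_value(payoff, rate, years):
    """Value today of `payoff` received `years` from now, discounted at `rate`."""
    return payoff / (1 + rate) ** years

# $100 received in 50 years:
for rate in (0.03, 0.10):
    print(f"{rate:.0%}: ${present_value(100, rate, 50):.2f}")
# At 3% the payoff retains meaningful value (~$23); at 10% it is
# nearly worthless (~$0.85) - so investors applying rates well above
# prevailing interest rates will rationally ignore long-term payoffs.
```

This is why a gap between applied discount rates and prevailing interest rates matters so much for long-lived projects such as digital storage.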

Now, Stefano Giglio of the Booth School and Matteo Maggiori and Johannes Stroebel of the Stern School (GMS) have a post entitled Discounting the very distant future announcing a paper entitled Very Long-Run Discount Rates. Their work, at first glance, seems to contradict HD. Below the fold, I look into this apparent disagreement.

Tuesday, June 24, 2014

Permacoin

For a long time there have been a number of possible "holy grails" for digital preservation, ideas that, if implemented, would transform the problem. One of them is an Internet-scale peer-to-peer network that would use excess disk storage on everyone's computers, in the same way that networks like Folding@Home use excess CPU, to deliver a robust, attack-resistant, decentralized storage infrastructure. Intermemory, from NEC's Princeton lab in 1998, was one of the first, but the concept is so attractive that there have been many others, such as Berkeley's OceanStore. None has succeeded in attracting the mass participation of projects such as Folding@Home, and none has become a widely-used infrastructure for digital preservation, because without mass participation none provides the needed robustness or capacity.

By far the most successful peer-to-peer network in attracting participation has been Bitcoin, because the reward for participation is monetary. Now, it seems to me that Andrew Miller and his co-authors from the University of Maryland and Microsoft Research have taken a giant step towards this "holy grail" with their paper Permacoin: Repurposing Bitcoin Work for Data Preservation (hereafter MJSPK). This is despite the fact that, as I predicted in a comment last April, the current Bitcoin implementation has now definitively failed in its goal of establishing a decentralized currency because GHash has, for extended periods, controlled an absolute majority of the mining power. Follow me below the fold for my analysis of Permacoin and how this failure affects it.
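To see why a sustained majority of mining power is fatal, recall the race analysis in Nakamoto's Bitcoin paper: an attacker with hash-power fraction q, facing an honest chain z blocks ahead, eventually catches up with probability (q/p)^z where p = 1 - q, which becomes certainty the moment q reaches one half. A sketch of that calculation (the formula is Nakamoto's; the numbers are illustrative):

```python
def catch_up_probability(q, z):
    """Probability that an attacker controlling fraction q of the hash
    power overtakes an honest chain z blocks ahead (Nakamoto 2008)."""
    p = 1 - q
    if q >= p:           # a majority attacker always catches up
        return 1.0
    return (q / p) ** z  # minority attacker: exponentially unlikely

print(catch_up_probability(0.30, 6))  # minority: well under 1%
print(catch_up_probability(0.51, 6))  # majority: 1.0, certainty
```

The security argument thus collapses discontinuously at 50%, which is why GHash's extended periods above that threshold undermine the decentralization on which both Bitcoin and Permacoin depend.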

Friday, June 20, 2014

X Window System turns 30

Yesterday was the 30th anniversary of Bob Scheifler's announcement of the first release of the X Window System. Congratulations to Bob and the other pioneers! That doesn't include me - I started work at Sun on X a couple of years later by doing the first port (of Version 10) to non-DEC hardware.

Thursday, June 19, 2014

More on long-lived media

I've already written skeptically about the concept of quasi-immortal media as a solution to the problem of digital preservation. But the misplaced enthusiasm continues. The latest wave surrounds Facebook's prototype Petabyte Blu-Ray jukebox; one of its touted features was that the media had a 50-year life. The prototype is extraordinarily interesting, and I hope to write more about it soon. But I doubt Facebook or anyone expects that the hardware will still be in use in 10 years, let alone 50. After all, you can search any large-scale data center in vain for 10-year-old hardware. So why is a 50-year media life interesting in this application? Follow me below the fold for yet another dose of skepticism.

Tuesday, June 17, 2014

Digital New York Times

More than four years ago Marc Andreessen gave a talk at Stanford's Business School in which, among many other interesting topics, he talked about the problems the New York Times had dealing with digital media. The recently leaked NYT Innovation Report 2014, the result of a six-month review headed by the Times' heir apparent, shows how prescient Andreessen was. Below the fold, some evidence.