Thursday, October 20, 2016

A Cost-Effective Large LOCKSS Box

Back in August I wrote A Cost-Effective DIY LOCKSS Box, describing a small, 8-slot LOCKSS box capable of providing about 48TB of raw RAID-6 storage at about $64/TB. Now, the existing boxes in the CLOCKSS Archive's 12-node network are nearing the end of their useful life. We are starting a rolling program to upgrade them with much larger boxes to accommodate future growth in the archive.

Last week the first of the upgraded LOCKSS boxes came on-line. They are 4U systems with 45 slots for 3.5" drives from, the same boxes Backblaze uses. We are leaving 9 slots empty for future upgrades and populating the rest with 36 8TB WD Gold drives, giving about 224TB of raw RAID-6 storage, say a bit over 200TB after file system overhead. etc. We are specifying 64GB of RAM and dual CPUs. This configuration on the 45drives website is about $28K before tax and shipping. Using the cheaper WD Purple drives it would be about $19K.

45drives has recently introduced a cost-reduced version. Configuring this with 45 8TB Purple drives and 32GB RAM would get 280TB for $17K, or about $61/TB. It would be even cheaper with the Seagate 8TB archive drives we are using in the 8-slot box.

Tuesday, October 18, 2016

Why Did Institutional Repositories Fail?

Richard Poynder has a blogpost introducing a PDF containing a lengthy introduction that expands on the blog post and a Q&A with Cliff Lynch on the history and future of Institutional Repositories (IRs). Richard and Cliff agree that IRs have failed to achieve the hopes that were placed in them at their inception in a 1999 meeting at Santa Fe, NM. But they disagree about what those hopes were. Below the fold, some commentary.

Thursday, October 13, 2016

More Is Not Better

Quite a few of my recent posts have been about how the mainstream media is catching on to the corruption of science caused by the bad incentives all parties operate under, from science journalists to publishers to institutions to researchers. Below the fold I look at some recent evidence that this meme has legs.

Tuesday, October 11, 2016

Software Art and Emulation

Apart from a short paper describing a heroic effort of Web archaeology, recreating Amsterdam's De Digitale Stadt, the whole second morning of iPRES2016 was devoted to the preservation of software and Internet-based art. It featured a keynote by Sabine Himmelsbach of the House of Electronic Arts (HeK) in Basel, and three papers using the bwFLA emulation technology to present preserved software art (proceedings in one PDF):
  • A Case Study on Emulation-based Preservation in the Museum: Flusser Hypertext, Padberg et al.
  • Towards a Risk Model for Emulation-based Preservation Strategies: A Case Study from the Software-based Art Domain, Rechert et al.
  • Exhibiting Digital Art via Emulation – Boot-to-Emulator with the EMiL Kiosk System, Espenschied et al.
Preserving software art is an important edge case of software preservation. Each art piece is likely to have many more dependencies on specific hardware components, software environments and network services than mainstream software. Focus on techniques for addressing these dependencies in an emulated environment is useful in highlighting them. But it may be somewhat misleading, by giving an exaggerated idea of how hard emulating more representative software would be. Below the fold, I discuss these issues.

Thursday, October 6, 2016

Software Heritage Foundation

Back in 2009 I wrote:
who is to say that the corpus of open source is a less important cultural and historical artifact than, say, romance novels.
Back in 2013 I wrote:
Software, and in particular open source software is just as much a cultural production as books, music, movies, plays, TV, newspapers, maps and everything else that research libraries, and in particular the Library of Congress, collect and preserve so that future scholars can understand our society.
There are no legal obstacles to collecting and preserving open source code. Technically, doing so is much easier than general Web archiving. It seemed to me like a no-brainer, especially because almost all other digital preservation efforts depended upon the open source code no-one was preserving! I urged many national libraries to take this work on. They all thought someone else should do it, but none of the someones agreed.

Finally, a team under Roberto di Cosmo with initial support from INRIA has stepped into the breach. As you can see at their website they are already collecting a vast amount of code from open source repositories around the Internet. statistics 06Oct16
They are in the process of setting up a foundation to support this work. Everyone should support this important essential work.

Wednesday, October 5, 2016

Another Vint Cerf Column

Vint Cerf has another column on the problem of digital preservation. He concludes:
These thoughts immediately raise the question of financial support for such work. In the past, there were patrons and the religious orders of the Catholic Church as well as the centers of Islamic science and learning that underwrote the cost of such preservation. It seems inescapable that our society will need to find its own formula for underwriting the cost of preserving knowledge in media that will have some permanence. That many of the digital objects to be preserved will require executable software for their rendering is also inescapable. Unless we face this challenge in a direct way, the truly impressive knowledge we have collectively produced in the past 100 years or so may simply evaporate with time.
Vint is right about the fundamental problem but wrong about how to solve it. He is right that the problem isn't not knowing how to make digital information persistent, it is not knowing how to pay to make digital information persistent. Yearning for quasi-immortal media makes the problem of paying for it worse not better, because quasi-immortal media such as DNA are both more expensive and their more expensive cost is front-loaded. Copyability is inherent in on-line information, that's how you know it is on-line. Work with this grain of the medium, don't fight it.

Tuesday, October 4, 2016


LOCKSS is eighteen years old! I told the story of its birth three years ago.

There's a list of the publications in that time, and talks in the last decade, on the LOCKSS web site.

Thanks again to the NSF, Sun Microsystems, and the Andrew W. Mellon Foundation for the funding that allowed us to develop the system, and to the steadfast support of the libraries of the LOCKSS Alliance, and the libraries and publishers of the CLOCKSS Archive that has sustained it in production.