In the aftermath of the Dyn DDoS attack, too much is happening to fit into a comment on Tuesday's post. Below the fold, a roundup of the last two days' news from the IoT war zone.
I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation.
Thursday, October 27, 2016
Tuesday, October 25, 2016
You Were Warned
Four weeks ago yesterday I posted The Things Are Winning about the IoT-based botnet attack on Krebs On Security. I wrote:
And don't think that knocking out important individual Web sites like KrebsOnSecurity is the limit of the bad guys capabilities. Everyone seems to believe that the current probing of the root servers' defenses is the work of China but, as the Moon Worm showed, careful preparation isn't necessarily a sign of a state actor. There are many bad guys out there who could take the Internet down; the only reason they don't is not to kill the goose that lays the golden eggs.
Last Friday's similar attack on Dyn, a major US DNS provider, caused many of its major customer websites to be inaccessible, including Twitter, Amazon, Tumblr, Reddit, Spotify, Netflix, PayPal and github. Dyn's DNS infrastructure was so overloaded that requests for name-to-IP-address translations were dropped or timed out. The LOCKSS team uses github, so we were affected.
It is important to note that these attacks are far from the largest we can expect, and that it is extraordinarily difficult to obtain reliable evidence as to who is responsible. Attackers will be able to produce effects far more disruptive than a temporary inability to tweet with impunity. Below the fold some commentary and useful links.
Thursday, October 20, 2016
A Cost-Effective Large LOCKSS Box
Back in August I wrote A Cost-Effective DIY LOCKSS Box, describing a small, 8-slot LOCKSS box capable of providing about 48TB of raw RAID-6 storage at about $64/TB. Now, the existing boxes in the CLOCKSS Archive's 12-node network are nearing the end of their useful life. We are starting a rolling program to upgrade them with much larger boxes to accommodate future growth in the archive.
Last week the first of the upgraded LOCKSS boxes came on-line. They are 4U systems with 45 slots for 3.5" drives from 45drives.com, the same boxes Backblaze uses. We are leaving 9 slots empty for future upgrades and populating the rest with 36 8TB WD Gold drives, giving about 224TB of raw RAID-6 storage, say a bit over 200TB after file system overhead, etc. We are specifying 64GB of RAM and dual CPUs. This configuration on the 45drives website is about $28K before tax and shipping. Using the cheaper WD Purple drives it would be about $19K.
45drives has recently introduced a cost-reduced version. Configuring this with 45 8TB Purple drives and 32GB RAM would get 280TB for $17K, or about $61/TB. It would be even cheaper with the Seagate 8TB archive drives we are using in the 8-slot box.
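The cost-per-terabyte comparison can be reproduced with a few lines of arithmetic. This is a rough sketch using only the approximate prices and capacities quoted above, before tax and shipping; the function and variable names are illustrative, and the $/TB figures for the two 36-drive configurations are derived here rather than quoted from the vendor.

```python
def cost_per_tb(price_usd: float, capacity_tb: float) -> float:
    """Return the hardware cost in dollars per terabyte of capacity."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return price_usd / capacity_tb


# (label, approximate price in USD, capacity in TB), taken from the text above.
# The labels are illustrative, not vendor product names.
CONFIGS = [
    ("36 x 8TB WD Gold", 28_000, 224),
    ("36 x 8TB WD Purple", 19_000, 224),
    ("cost-reduced, 45 x 8TB WD Purple", 17_000, 280),
]


def summarize(configs):
    """Return (label, dollars per TB rounded to the nearest dollar) pairs."""
    return [(label, round(cost_per_tb(price, tb))) for label, price, tb in configs]


if __name__ == "__main__":
    for label, dollars in summarize(CONFIGS):
        print(f"{label}: about ${dollars}/TB")
```

The last entry matches the roughly $61/TB quoted for the cost-reduced box; the other two are simply the quoted prices divided by the quoted capacity.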
Tuesday, October 18, 2016
Why Did Institutional Repositories Fail?
Richard Poynder has a blogpost introducing a PDF containing a lengthy introduction that expands on the blog post and a Q&A with Cliff Lynch on the history and future of Institutional Repositories (IRs). Richard and Cliff agree that IRs have failed to achieve the hopes that were placed in them at their inception in a 1999 meeting at Santa Fe, NM. But they disagree about what those hopes were. Below the fold, some commentary.
Thursday, October 13, 2016
More Is Not Better
Quite a few of my recent posts have been about how the mainstream media is catching on to the corruption of science caused by the bad incentives all parties operate under, from science journalists to publishers to institutions to researchers. Below the fold I look at some recent evidence that this meme has legs.
Tuesday, October 11, 2016
Software Art and Emulation
Apart from a short paper describing a heroic effort of Web archaeology, recreating Amsterdam's De Digitale Stadt, the whole second morning of iPRES2016 was devoted to the preservation of software and Internet-based art. It featured a keynote by Sabine Himmelsbach of the House of Electronic Arts (HeK) in Basel, and three papers using the bwFLA emulation technology to present preserved software art (proceedings in one PDF):
- A Case Study on Emulation-based Preservation in the Museum: Flusser Hypertext, Padberg et al.
- Towards a Risk Model for Emulation-based Preservation Strategies: A Case Study from the Software-based Art Domain, Rechert et al.
- Exhibiting Digital Art via Emulation – Boot-to-Emulator with the EMiL Kiosk System, Espenschied et al.
Thursday, October 6, 2016
Software Heritage Foundation
Back in 2009 I wrote:
who is to say that the corpus of open source is a less important cultural and historical artifact than, say, romance novels.
Back in 2013 I wrote:
Software, and in particular open source software is just as much a cultural production as books, music, movies, plays, TV, newspapers, maps and everything else that research libraries, and in particular the Library of Congress, collect and preserve so that future scholars can understand our society.
There are no legal obstacles to collecting and preserving open source code. Technically, doing so is much easier than general Web archiving. It seemed to me like a no-brainer, especially because almost all other digital preservation efforts depended upon the open source code no-one was preserving! I urged many national libraries to take this work on. They all thought someone else should do it, but none of the someones agreed.
Finally, a team under Roberto di Cosmo with initial support from INRIA has stepped into the breach. As you can see at their website they are already collecting a vast amount of code from open source repositories around the Internet.
They are in the process of setting up a foundation to support this work. Everyone should support this important, essential work.
softwareheritage.org statistics 06Oct16
Wednesday, October 5, 2016
Another Vint Cerf Column
Vint Cerf has another column on the problem of digital preservation. He concludes:
These thoughts immediately raise the question of financial support for such work. In the past, there were patrons and the religious orders of the Catholic Church as well as the centers of Islamic science and learning that underwrote the cost of such preservation. It seems inescapable that our society will need to find its own formula for underwriting the cost of preserving knowledge in media that will have some permanence. That many of the digital objects to be preserved will require executable software for their rendering is also inescapable. Unless we face this challenge in a direct way, the truly impressive knowledge we have collectively produced in the past 100 years or so may simply evaporate with time.
Vint is right about the fundamental problem but wrong about how to solve it. He is right that the problem isn't not knowing how to make digital information persistent, it is not knowing how to pay to make digital information persistent. Yearning for quasi-immortal media makes the problem of paying for it worse not better, because quasi-immortal media such as DNA are both more expensive and have their higher cost front-loaded. Copyability is inherent in on-line information, that's how you know it is on-line. Work with this grain of the medium, don't fight it.
Tuesday, October 4, 2016
RU18?
LOCKSS is eighteen years old! I told the story of its birth three years ago.
There's a list of the publications in that time, and talks in the last decade, on the LOCKSS web site.
Thanks again to the NSF, Sun Microsystems, and the Andrew W. Mellon Foundation for the funding that allowed us to develop the system, and to the steadfast support of the libraries of the LOCKSS Alliance, and the libraries and publishers of the CLOCKSS Archive that has sustained it in production.
Panel on Software Preservation at iPRES
I was one of five members of a panel on Software Preservation at iPRES 2016, moderated by Maureen Pennock. We each had three minutes to answer the question "what have you contributed towards software preservation in the past year?" Follow me below the fold for my answer.