Tuesday, January 31, 2017

Preservable emulations

This post is an edited extract from my talk at last year's IIPC meeting. This part was the message I was trying to get across, but I buried the lede at the tail end. So I'm repeating it here to try and make the message clear.

Emulation technology will evolve through time. The way we expose emulations on the Web right now means that this evolution will break them. We're supposed to be preserving stuff, but the way we're doing it isn't preservable. We need to expose emulations to the Web in a future-proof way, a way whereby they can be collected, preserved and reanimated using future emulation technologies. Below the fold, I explain what is needed using the analogy of PDFs.

Wednesday, January 25, 2017

Rick Whitt on Digital Preservation

Google's Rick Whitt has published "Through A Glass, Darkly" Technical, Policy, and Financial Actions to Avert the Coming Digital Dark Ages (PDF), a very valuable 114-page review of digital preservation aimed at legal and policy audiences. Below the fold, some encomia and some quibbles (but much less than 114 pages of them).

Thursday, January 19, 2017

The long tail of non-English science

Ben Panko's English Is the Language of Science. That Isn't Always a Good Thing is based on Languages Are Still a Major Barrier to Global Science, a paper in PLOS Biology by Tatsuya Amano, Juan P. González-Varo and William J. Sutherland. Panko writes:
For the new study, Amano's team looked at the entire body of research available on Google Scholar about biodiversity and conservation, starting in the year 2014. Searching with keywords in 16 languages, the researchers found a total of more than 75,000 scientific papers. Of those papers, more than 35 percent were in languages other than English, with Spanish, Portuguese and Chinese topping the list.

Even for people who try not to ignore research published in non-English languages, Amano says, difficulties exist. More than half of the non-English papers observed in this study had no English title, abstract or keywords, making them all but invisible to most scientists doing database searches in English.
Below the fold, how this problem relates to work by the LOCKSS team.

Tuesday, January 10, 2017

Gresham's Law

Jeffrey Beall, who has done invaluable work identifying predatory publishers and garnered legal threats for his pains, reports that:
Hyderabad, India-based open-access publisher OMICS International is on a buying spree, snatching up legitimate scholarly journals and publishers, incorporating them into its mega-fleet of bogus, exploitative, and low-quality publications. ... OMICS International is on a mission to take over all of scholarly publishing. It is purchasing journals and publishers and incorporating them into its evil empire. Its strategy is to saturate scholarly publishing with its low-quality and poorly-managed journals, aiming to squeeze out and acquire legitimate publishers.
Below the fold, a look at how OMICS demonstrates the application of Gresham's Law to academic publishing.

Friday, January 6, 2017

Star Wars Storage Media

At Motherboard, Sarah Jeong's From Tape Drives to Memory Orbs, the Data Formats of Star Wars Suck is a must-read compendium of the ridiculous data storage technologies of the Empire and its enemies.

It's a shame that she uses "formats" when she means "media". But apart from serious questions like:
Why must the Death Star plans be stored on a data tape the size of four iPads stacked on top each other? Obi-Wan can carry a map of the entire galaxy in a glowing marble, and at the end of Episode II, Count Dooku absconds with a thumb drive or something that contains the Death Star plans.
absolutely the best thing about it is that it inspired Cory Doctorow to write Why are the data-formats in Star Wars such an awful mess? Because filmmakers make movies about filmmaking. Doctorow understands that attitudes to persistent data storage are largely hang-overs from the era of floppy disks and ZIP drives:
But we have a persistent myth of the fragility of data-formats: think of the oft-repeated saw that books are more reliable than computers because old floppy disks and Zip cartridges are crumbling and no one can find a drive to read them with anymore. It's true that media goes corrupt and also true that old hardware is hard to find and hard to rehabilitate, but the problem of old floppies and Zips is one of the awkward adolescence of storage: a moment at which hard-drives and the systems that managed them were growing more slowly than the rate at which we were acquiring data.
So:
the destiny of our data will be to move from live, self-healing media to live, self-healing media, without any time at rest in near-line or offline storage, the home of bitrot. 
Just go read the whole of both pieces.

Thursday, January 5, 2017

Transition (personal)

After eighteen and a quarter years I'm now officially retired from Stanford and the LOCKSS Program. It's been a privilege to work with the LOCKSS team all this time, and especially with Tom Lipkis, whose engineering skills were essential to the program's success.

I'm grateful to Michael Keller, Stanford's Librarian, who has consistently supported the program, to the National Science Foundation, Sun Microsystems, and the Andrew W. Mellon Foundation (especially to Don Waters) for funding the development of the system, and to the member institutions of the LOCKSS Alliance and the CLOCKSS Archive for supporting the system's operations.

I'm still helping with a couple of on-going projects, so I still have a stanford.edu e-mail address. And I have the generous Stanford retiree benefits. Apart from my duties as a grandparent, and long-delayed tasks such as dealing with the mess in the garage, I expect also to be doing what I can to help the Internet Archive, and continuing to write for my blog.

Wednesday, January 4, 2017

Error 400: Blogger is Bloggered

If you tried to post a comment and got the scary message:
Bad Request
Error 400
please read below the fold for an explanation and a work-around.

Tuesday, January 3, 2017

Travels with a Chromebook

Two years ago I wrote A Note Of Thanks as I switched my disposable travel laptop from an Asus Seashell to an Acer C720 Chromebook running Linux. Two years later I'm still traveling with a C720. Below the fold, an update on my experiences.