Tuesday, March 3, 2015


I wasn't able to attend IDCC2015 two weeks ago in London, but I've been catching up with the presentations on the Web. Below the fold, my thoughts on a few of them.

Tony Hey's opening keynote is an 84-slide tour through the last decade of e-science, which punctuates the normal optimistic gee-whizzery of e-science talks with some cautionary observations. Many of them are in the form of well-chosen quotes. Three that are particularly relevant are:
  • Michael Kurtz (ADS): "The problem with curation is that the funding is almost entirely local but in the digital world the use is mainly global. Leads to tragedy of the commons where no one will assume long-term obligation to curate and manage data which is mainly not from local sources."
  • James Frew (UCSB): "Frew’s first law: scientists don’t write metadata. Frew’s second law: any scientist can be forced to write bad metadata."
  • Michael Lesk: "Most of the cost of archiving is spent at the start, before we know whether the articles will be read or the data used. With data, with no emotional investment in peer review, it might be easier to do a simpler form of deposit, where as much as possible is postponed till the data are called for. There is of course some risk that a just-in-time system will leave us, some years down the road, with a data set which we wish we had curated while the creator was still alive. However, the longer the data has gone unused, the more likely it is to never be used."
My favorite presentations were from the British Library's Web archiving team. Helen Hockx-Yu's closing keynote was an overview of the first ten years of the program, including the start of non-print digital legal deposit. I've always liked the way the BL's repository strategy leveraged the distributed nature of the UK's print legal deposit system to implement Lots Of Copies (well, four, but that's way more than most).

Some of the BL's Web archive, the part for which they have the website owners' permission, is freely available. The major part, including the 2013 and 2014 UK domain crawls, is available only on-site. Both feature faceted full-text search.
Andy Jackson's brief talk explained that although the BL is restricted by copyright from making most of its Web collections freely available, they can and do (as they have always done) make their metadata freely available as Open Data. He showed this example of the link data from 1996, and Helen's slides are full of many other interesting examples of the way archived web data can be analysed and used by scholars.

Ł. Bolikowski, A. Nowiński, and W. Sylwestrzak from the University of Warsaw presented another potential use for blockchain technology, to mint persistent identifiers. Although their proposal is technically feasible, their presentation does not address any of the many reasons I'm skeptical of the idea that blockchains are the Solution to Everything.

Matthew Addis gave a great marketing pitch for the use of the Arkivum service for research data management. Arkivum is the supplier until 2023 for the UK's Janet Data Archiving Framework. The service is interesting, and unusual, in that they accept liability for the data they preserve. I hope to find time to blog about this aspect of their service soon.

 [Update: corrected links to BL talks - sorry!]
