Tuesday, December 30, 2008

Persistence of Poor Peer Reviewing

Another thing I've been doing during the hiatus is serving as a judge for Elsevier's Grand Challenge. Anita de Waard and her colleagues at Elsevier's research labs set up this competition, with a substantial prize for the best demonstration of what could be done to improve science and scientific communication given unfettered access to Elsevier's vast database of publications. I think the reason I'm on the panel of judges is that, after Anita's talk at the Spring CNI, she and I had an interesting discussion. The talk described her team's work to extract information from full-text articles to help authors and readers. I asked who was building tools to help reviewers and make their reviews better. This is a bête noire of mine, both because I find doing reviews really hard work, and because I think the quality of reviews (including mine) is poor. For an ironic example of the problem, follow me below the fold.

Sunday, December 28, 2008

Foot, meet bullet

The gap in posting since March was caused by a bad bout of RSI in my hands. I believe it was triggered by the truly terrible ergonomics of the mouse buttons on the first-generation Asus EEE (which otherwise fully justifies its reputation as a game-changing product). It took a long time to recover, and even longer to catch up on all the work that accumulated while I couldn't type for more than a few minutes at a time.

One achievement during this enforced hiatus was to turn my series of posts on A Petabyte for a Century into a paper entitled Bit Preservation: A Solved Problem? (190KB PDF) and present it at the iPRES 2008 conference last September at the British Library.

I also attended the 4th International Digital Curation Conference in Edinburgh. As usual these days, for obvious reasons, sustainability was at the top of the agenda. Brian Lavoie of OCLC talked (461KB .ppt) about the work of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access which he co-chairs with Fran Berman of the San Diego Supercomputer Center. NSF, the Andrew W. Mellon Foundation and others are sponsoring this effort; the LOCKSS team have presented to the Task Force. Their interim report has just been released.

Listening to Brian talk about the need to persuade funding organizations of the value of digital preservation efforts, I came to understand the extent to which the tendency to present simply preserving the bits as a trivial, solved problem has caused the field to shoot itself in the foot.

The activities that funders are told they need to support are curation-focused, such as generating metadata to prepare for possible format obsolescence, and helping future readers find the content. The problem is that, as a result, the funders see a view of the future in which the bits will survive even if they do nothing. There might be problems in the distant future if formats become obsolete, but there might not. There might be problems finding content, but there might not. After all, funders might think, if the bits survive and Google can index them, how much worse than the current state could things be? Why pour money into activities intended to enhance the data? The future can figure out what to do with the bits when it needs them; they'll be there whatever happens.

A more realistic view of the world, as I showed in my iPRES paper, is that there are huge volumes of data that need to be preserved, that simply storing a few copies of all of it costs more than we can currently cope with, and that even if we spend enough to use the best available technology we can't be sure the bits will be safe. If this were the view presented to the funders, that unless they provide funds now, important information will gradually be lost, they might be scared into actually doing something.
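To give a feel for why even keeping the bits is daunting at this scale, here is a back-of-the-envelope sketch of the kind of argument the paper makes. The specific numbers and the assumption that bits fail independently are mine, chosen purely for illustration, not taken from the paper:

import math

# Back-of-the-envelope sketch of the "Petabyte for a Century" argument.
# Illustrative assumptions: bits fail independently, and we want a 50%
# chance that a full petabyte survives a century with no bit lost.

PETABYTE_BITS = 8 * 10**15   # one petabyte, in bits
YEARS = 100                  # target preservation period
TARGET_SURVIVAL = 0.5        # desired probability that no bit is lost

# If each bit survives the century with probability p, all N bits survive
# with probability p**N. Solving p**N = TARGET_SURVIVAL for the per-bit
# loss probability 1 - p (expm1 avoids rounding the tiny result to zero):
per_bit_loss = -math.expm1(math.log(TARGET_SURVIVAL) / PETABYTE_BITS)

# Equivalent "bit half-life" H: one bit's survival over t years is
# 0.5**(t / H), so all N bits survive with probability 0.5**(N * t / H).
# Setting that equal to 0.5 gives H = N * t.
bit_half_life = PETABYTE_BITS * YEARS   # in years

AGE_OF_UNIVERSE = 1.4e10                # years, roughly

print(f"tolerable per-bit loss probability per century: {per_bit_loss:.2e}")
print(f"required bit half-life: {bit_half_life:.1e} years")
print(f"  = {bit_half_life / AGE_OF_UNIVERSE:.1e} times the age of the universe")

Under these toy assumptions the tolerable per-bit loss probability comes out around 8.7e-17 per century, and the required bit half-life around 8e17 years, tens of millions of times the age of the universe. No storage system's reliability has been measured at anything like that level, which is the sense in which we can't be sure the bits will be safe.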