Tuesday, December 30, 2008

Persistence of Poor Peer Reviewing

Another thing I've been doing during the hiatus is serving as a judge for Elsevier's Grand Challenge. Anita de Waard and her colleagues at Elsevier's research labs set up this competition with a substantial prize for the best demonstration of what could be done to improve science and scientific communication given unfettered access to Elsevier's vast database of publications. I think the reason I'm on the panel of judges is that after Anita's talk at the Spring CNI she and I had an interesting discussion. The talk described her team's work to extract information from full-text articles to help authors and readers. I asked who was building tools to help reviewers and make their reviews better. This is a bête noire of mine, both because I find doing reviews really hard work, and because I think the quality of reviews (including mine) is really poor. For an ironic example of the problem, follow me below the fold.

I like to cite an example of really bad reviewing that appeared in AAAS Science in 2003. It was Dellavalle RP, Hester EJ, Heilig LF, Drake AL, Kuntzman JW, Schilling LM: Going, Going, Gone: Lost Internet References. Science 2003, 302:787, a paper about the decay of Internet links. The authors failed to acknowledge that the paper repeated, with smaller samples and somewhat worse techniques, two earlier studies that had been published in Communications of the ACM 9 months before, and in IEEE Computer 32 months before. Neither of these is an obscure journal. It is particularly striking that neither the reviewers nor the editors bothered to feed the keywords from the article abstract into Google; had they done so they would have found both of these earlier papers at the top of the search results.

The Grand Challenge is now at the semi-final stage and the quality of the submissions is very encouraging. Many of them are available on the Web, linked from Elsevier's web site for the competition.

The irony is that one of them cites the paper in Science that set me off!

Here is the note I submitted to Science's e-letter feature about the paper. The AAAS declined to publish it:

The decay of Internet links as described here has been the subject of concern for some time. The authors should perhaps have referenced earlier work by Spinellis[1], and even earlier work by Lawrence et al.[2], both describing similar results using similar techniques on somewhat larger samples from the computer science and engineering literature.

With the invaluable participation of Science and the AAAS, and funding from the Andrew W. Mellon Foundation, NSF and Sun Microsystems Labs, efforts to ameliorate this problem have been underway since 1999. The LOCKSS (Lots Of Copies Keep Stuff Safe)[3] program is providing librarians with tools they can use to preserve their community's access to important web-published material via the URL at which it was originally published, whether or not it remains available there from the publisher.

The references missed by the authors and reviewers of this paper illustrate an even more serious problem. The balkanization of science and scientific journals prevents scholars from searching the whole literature, especially journals outside their immediate field. The success of Google shows that search is a more effective mode of access to web content than explicit links. Phelps and Wilensky[4] show that searching for a small number of well-chosen words can be an effective substitute for explicit links. The authors here show that search is less transient than links; broken links were recovered using Google and the Internet Archive.

In this case, a Google search for "decay web references" returns the Spinellis paper, and a search for "persistence web references" returns the Lawrence et al. paper, both as the first result. The lesson for authors and journals is to ensure that their content continues to be indexed by Internet search engines.


1. Spinellis, D. "The Decay and Failures of Web References", Communications of the ACM, Vol. 46, No. 1, Jan. 2003, pp. 71-77.
2. Lawrence, S. et al. "The Persistence of Web References in Scientific Research", IEEE Computer, Vol. 34, No. 2, 2001, pp. 26-31.
3. Maniatis, P. et al. "Preserving Peer Replicas By Rate-Limited Sampled Voting", Proc. 19th ACM Symp. on Operating Systems Principles, Bolton Landing, NY, Oct. 2003, pp. 44-59.
4. Phelps, T. and Wilensky, R. "Robust Hyperlinks Cost Just Five Words Each", UC Berkeley Computer Science Tech. Rept. UCB//CSD-00-1091, Berkeley, CA, 2000.
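The Phelps and Wilensky idea referenced in the note can be sketched in a few lines. This is a minimal illustration of the "lexical signature" concept, not code from their paper; the five-word limit reflects their title, but the example terms and the Google search URL used as the fallback target are my assumptions:

```python
from urllib.parse import quote_plus

def robust_link(url, signature_terms):
    """Pair a URL with a fallback search query built from a short
    lexical signature of the target document, in the spirit of
    Phelps and Wilensky's 'robust hyperlinks'.

    Illustrative sketch only: the search-engine URL format is an
    assumption, not part of their specification.
    """
    query = quote_plus(" ".join(signature_terms[:5]))  # at most five words
    return url, f"https://www.google.com/search?q={query}"

# If the original link rots, a reader (or a tool) can fall back to
# the search URL, which will often surface the moved document.
url, fallback = robust_link(
    "http://example.org/decay.pdf",            # hypothetical dead link
    ["decay", "failures", "web", "references"],
)
print(fallback)
```

Running the example prints a search URL carrying the signature words, which is exactly the kind of query that, as noted above, returns the Spinellis and Lawrence et al. papers as the first result.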

Google Scholar currently shows that the paper in Science (impact factor 26.372) has been cited 74 times, the paper in Communications of the ACM (impact factor 1.593) has been cited 62 times, and the paper in IEEE Computer (impact factor 1.15) has been cited 78 times. This is interesting; I was expecting that the higher profile and impact factor of Science would have caused it to garner the bulk of the references despite being the inferior paper.
