Tuesday, March 20, 2018

Pre-publication Peer Review Subtracts Value

Pre-publication peer review is intended to perform two functions: to prevent bad science from being published (gatekeeping), and to improve the science that is published (enhancement). Over the years I've written quite often about how the system is no longer "fit for purpose". It's time for another episode, drawing attention to two not-so-recent contributions.
Below the fold, the details.


Klein et al:
investigated the publishers' value proposition by conducting a comparative study of pre-print papers from two distinct science, technology, and medicine corpora and their final published counterparts. This comparison had two working assumptions: (1) If the publishers' argument is valid, the text of a pre-print paper should vary measurably from its corresponding final published version, and (2) by applying standard similarity measures, we should be able to detect and quantify such differences. Our analysis revealed that the text contents of the scientific papers generally changed very little from their pre-print to final published versions.
This is a quantitative validation of much earlier work suggesting that journals add very little value to the articles they accept. But adding very little value isn't the whole problem:
  • Pre-publication review adds significant delays to the communication of scholarship. On average, the value of academic articles decays rapidly with time, so these delays subtract value.
  • The process of choosing among submitted articles selects for "sexy" results that are more likely subsequently to be retracted, and thus subtracts value from the literature.
  • The pre-publication review process is rife with corruption, which subtracts value.
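The kind of "standard similarity measures" Klein et al applied can be illustrated with a minimal sketch: comparing a pre-print's text to its published version and reporting a similarity ratio. This is an illustration of the general technique only, using Python's standard-library `difflib`, not the authors' actual pipeline or corpus.

```python
# Minimal sketch of a "standard similarity measure": compare two versions
# of an article's text and report how similar they are. Illustrative only;
# not Klein et al's actual method. The sample strings below are invented.
from difflib import SequenceMatcher

def similarity(preprint_text: str, published_text: str) -> float:
    """Return a similarity ratio in [0, 1]; 1.0 means identical text."""
    return SequenceMatcher(None, preprint_text, published_text).ratio()

preprint = "We measured the decay of article value over time."
published = "We measured the decay of article value over time. (Typo fixed.)"

print(f"similarity: {similarity(preprint, published):.2f}")
```

A ratio close to 1.0 for most article pairs is exactly the pattern Klein et al report: the published version differs from the pre-print only cosmetically.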
It turns out that the story of the publication of this work is more complicated than Moody (or I) understood. The work reported at CNI covered only articles from arXiv.org. It was published as Comparing Published Scientific Journal Articles to Their Pre-print Versions in the Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries. The authors continued the work, covering in addition articles from bioRxiv.org, and published Comparing Published Scientific Journal Articles to Their Pre-print Versions in Springer's International Journal on Digital Libraries. The authors kindly supplied me with a pre-print of this expanded version, which they have yet to upload to arXiv.org (contradicting one of their findings). There is no question of this transaction violating copyright; the first thing to notice about the "official publication" is the copyright notice:
This is a U.S. government work and its text is not subject to copyright protection in the United States; however, its text may be subject to foreign copyright protection 2018
I'm in the US (and a taxpayer) but Springer won't let me get to the full text of the "official" version, despite the fact that neither they nor anyone else holds its copyright!
USD 39.95
  • Unlimited access to the full article
  • Instant download
  • Include local sales tax if applicable
So I can't check the differences between the "official" version and the pre-print to see what added value I would be buying for the low, low price of only $39.95! The publishers have a strong interest in obscuring the fact, as shown by the pre-print's data:
that the difference between the earliest possible pre-print version and the final published one seems insignificant, given the similarity measures we applied to our corpus.
Fortunately, I still retain enough affiliation with Stanford to eventually access the "official" version and observe that, like the vast majority of the articles they examined, its differences from the pre-print are purely cosmetic. Definitely not worth $39.95!

Their data support the argument that pre-publication peer review imposes significant delays on the research process. At least 95% of their arXiv.org sample, and 91% of their bioRxiv.org sample, appeared as pre-prints before their official publication. Figure 10 shows that the most likely delay between the appearance of the article at arXiv.org and its publication was between 91 and 180 days. Articles at bioRxiv.org were most likely to suffer delays between 91 and 270 days. Again, it is important to note that the result of the delay to the vast majority of the articles was barely detectable changes to their text. The delay was overwhelmingly value-subtracting.


Gowers contrasts the history of Andrew Wakefield's notorious Lancet article linking the MMR vaccine and autism, which went through pre-publication peer review, with that of two mathematical contributions that were never formally published, but were posted to arXiv.org:
  • Grigori Perelman's proof of Thurston's geometrization conjecture, which after intensive review was accepted, and led to offers of a Fields Medal and a $1M prize from the Clay Mathematics Institute (both of which were declined).
  • Norbert Blum's claimed proof of P≠NP, another of the Clay Mathematics Institute's prize problems, which was rapidly found to harbor an obscure flaw.
Because Wakefield's article had undergone formal peer review, discrediting it took twelve years, boosting the anti-vaccination movement and probably leading to many preventable deaths. In contrast, the soundness or otherwise of the two proofs was established rapidly by informal post-publication review. Gowers:
My aim here is to question whether we need formal peer review. It goes without saying that peer review in some form is essential, but it is much less obvious that it needs to be organized in the way it usually is today, or even that it needs to be organized at all.
He describes how in mathematics, the adoption of arXiv.org has sped up communication and made journals close to irrelevant:
These days, the arXiv is how we disseminate our work, and the arXiv is how we establish priority. A typical pattern is to post a preprint to the arXiv, wait for feedback from other mathematicians who might be interested, post a revised version of the preprint, and send the revised version to a journal. The time between submitting a paper to a journal and its appearing is often a year or two, so by the time it appears in print, it has already been thoroughly assimilated. Furthermore, looking a paper up on the arXiv is much simpler than grappling with most journal websites, so even after publication it is often the arXiv preprint that is read and not the journal’s formatted version. Thus, in mathematics at least, journals have become almost irrelevant: their main purpose is to provide a stamp of approval, and even then one that gives only an imprecise and unreliable indication of how good a paper actually is.
He notes that many fields outside mathematics, physics and computer science are reluctant to post pre-prints:
Journals in the biomedical sciences, for example, often do not allow authors to post versions of their articles that have been revised in response to comments from referees. These sometimes differ in important ways from the versions originally submitted: for example, it is not uncommon for referees to go as far as to require authors to carry out further experiments.
This common perception seems to be contradicted by Klein et al's data, but their sample may be skewed by the selection of journals that allow posting of pre-prints.

In What Is Wrong With Science? I quoted from The 7 biggest problems facing science, according to 270 scientists by Julia Belluz, Brad Plumer, and Brian Resnick:
numerous studies and systematic reviews have shown that peer review doesn't reliably prevent poor-quality science from being published.
Gowers agrees, citing a number of instances, including:
The current publication system exacerbates this problem, since scientists report only on their positive results. It would be very helpful if they also revealed all their negative results, since that would make it much easier for others to do proper statistical analyses, but there are no rewards for doing so. The result is that in some disciplines, the social sciences being particularly notorious, it has been discovered that a large percentage of the experiments described in the literature, even in the best journals, do not yield the claimed results when repeated.
Six years ago I wrote What's Wrong With Research Communication, much of which focused on peer review. It was indebted to the excellent report of the UK House of Commons' Science & Technology Committee entitled Peer review in scientific publications. I wrote:
The "Big Deal" deprived librarians of their economic ability to reward high-quality journals and punish low-quality journals:
'Libraries find the majority of their budgets are taken up by a few large publishers,' says David Hoole, director of brand marketing and institutional relations at [Nature Publishing Group]. 'There is [therefore] little opportunity [for libraries] to make collection decisions on a title-by-title basis, taking into account value-for-money and usage.'
The inevitable result of stretching the "peer-reviewed" brand in this way has been to devalue it. Almost anything, even commercial or ideological messages, can be published under the brand.
Since then, the explosion of predatory publishers has further corrupted the value of the pre-publication peer-reviewed brand. Gowers concludes:
Defenders of formal peer review usually admit that it is flawed, but go on to say, as though it were obvious, that any other system would be worse. But it is not obvious at all. If academics put their writings directly online and systems were developed for commenting on them, one immediate advantage would be a huge amount of money saved. Another would be that we would actually get to find out what other people thought about a paper, rather than merely knowing that somebody had judged it to be above a certain not very precise threshold (or not knowing anything at all if it had been rejected). We would be pooling our efforts in useful ways: for instance, if a paper had an error that could be corrected, this would not have to be rediscovered by every single reader.
Unfortunately, everyone in a position to drive this change has bad incentives.


David. said...

You don't need to pay Springer $39.95, because the final version of Klein et al is now up on arXiv.org.

David. said...

"In 1942, the US Book Republication Program permitted American publishers to reprint "exact reproductions" of Germany's scientific texts without payment; seventy-five years later, the fate of this scientific knowledge forms the basis of a "natural experiment" analysed by Barbara Biasi and Petra Moser for The Center for Economic and Policy Research, who compare the fate of these texts to their contemporaries who didn't have this semi-public-domain existence.

Here's the headline finding: "This artificial removal of copyright barriers led to a 25% decline in prices, and a 67% increase in citations. These results suggest that restrictive copyright policies slow down the progress of science considerably."

Cory Doctorow points us to Barbara Biasi and Petra Moser's Effects of copyrights on science.

David. said...

"In fact, Sci-Hub has become such a commonly used tool for some scientists that they include Sci-Hub URLs in the references sections of their published papers. Ironically, there are even links to Sci-Hub in papers published by Elsevier, showing how dangerously useful it is." reports Ernesto at TorrentFreak.