Thursday, May 5, 2016

Signal or Noise?

I've been blogging critically about the state of scientific publishing since my very first post 9 years ago. In particular, I've been pointing out that the several billion dollars a year that go to the publishers' bottom lines, plus the several billion dollars a year in unpaid work by the reviewers, are extremely poor value for money. The claim is that the peer-review process guarantees the quality of published science. But the reality is that it doesn't; it cannot even detect most fraud or major errors.

The fundamental problem is that all participants have bad incentives. Follow me below the fold for some recent examples that illustrate their corrupting effects.

Publishers tend to choose reviewers who are prominent and in the mainstream of their subject area. This hands them a powerful mechanism for warding off threats to the subject's conventional wisdom. Ian Leslie's The Sugar Conspiracy is a long and detailed examination of how prominent nutritionists used this and other mechanisms to suppress for four decades the evidence that sugar, not fat, was the cause of obesity. The result was illustrious careers for the senior scientists, wrecked lives for the dissidents, and most importantly a massive, world-wide toll of disease, disability and death. I'm not quoting any of Leslie's article because you have to read the whole of it to understand the disaster that occurred.

At Science Translational Medicine Derek Lowe's From the Far Corner of the Basement has more on this story, with a link to the paper in BMJ that re-evaluated the data from the original, never fully published study:
It’s impossible to know for sure, but it seems likely that Franz and Keys may have ended up regarding this as a failed study, a great deal of time and effort more or less wasted. After all, the results it produced were so screwy: inverse correlation with low cholesterol and mortality? No benefit with vegetable oils? No, there must have been something wrong.
Dahlia Lithwick's Pseudoscience in the Witness Box, based on a Washington Post story, describes another long-running disaster based on bogus science. The bad incentives in this case were that the FBI's forensic scientists were motivated to convict rather than exonerate defendants:
This study was launched after the Post reported that flawed forensic hair matches might have led to possibly hundreds of wrongful convictions for rape, murder, and other violent crimes, dating back at least to the 1970s. In 90 percent of the cases reviewed so far, forensic examiners evidently made statements beyond the bounds of proper science. There were no scientifically accepted standards for forensic testing, yet FBI experts routinely and almost unvaryingly testified, according to the Post, “to the near-certainty of ‘matches’ of crime-scene hairs to defendants, backing their claims by citing incomplete or misleading statistics drawn from their case work.”
The death toll is much smaller:
"the cases include those of 32 defendants sentenced to death.” Of these defendants, 14 have already been executed or died in prison.
Via Dave Farber's IP list and Pascal-Emmanuel Gobry at The Week I find William A. Wilson's Scientific Regress. Wilson starts from the now well-known fact that many published results are neither replicated nor possible to replicate, because the incentives to publish in a form that can be replicated, and to replicate published results, are lacking:
suppose that three groups of researchers are studying a phenomenon, and when all the data are analyzed, one group announces that it has discovered a connection, but the other two find nothing of note. Assuming that all the tests involved have a high statistical power, the lone positive finding is almost certainly the spurious one. However, when it comes time to report these findings, what happens? The teams that found a negative result may not even bother to write up their non-discovery. After all, a report that a fanciful connection probably isn’t true is not the stuff of which scientific prizes, grant money, and tenure decisions are made.
And even if they did write it up, it probably wouldn’t be accepted for publication. Journals are in competition with one another for attention and “impact factor,” and are always more eager to report a new, exciting finding than a killjoy failure to find an association. In fact, both of these effects can be quantified. Since the majority of all investigated hypotheses are false, if positive and negative evidence were written up and accepted for publication in equal proportions, then the majority of articles in scientific journals should report no findings. When tallies are actually made, though, the precise opposite turns out to be true: Nearly every published scientific article reports the presence of an association. There must be massive bias at work. 
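The arithmetic behind Wilson's argument can be made concrete. This is only a minimal sketch: the prior (10% of tested hypotheses true), the statistical power (0.8) and the significance threshold (0.05) are illustrative assumptions of mine, not figures from Wilson's article, and the function names are hypothetical.

```python
def share_of_positive_results(prior_true: float, power: float, alpha: float) -> float:
    """Fraction of all studies expected to report an association,
    assuming every study were written up and published."""
    true_positives = prior_true * power            # real effects that are detected
    false_positives = (1.0 - prior_true) * alpha   # null effects that pass by chance
    return true_positives + false_positives


def share_of_positives_that_are_true(prior_true: float, power: float, alpha: float) -> float:
    """Fraction of reported associations that reflect a real effect."""
    true_positives = prior_true * power
    return true_positives / share_of_positive_results(prior_true, power, alpha)


if __name__ == "__main__":
    # Assumed, illustrative values only.
    prior, power, alpha = 0.10, 0.80, 0.05
    print(f"studies reporting an association: {share_of_positive_results(prior, power, alpha):.1%}")
    print(f"of those, real effects: {share_of_positives_that_are_true(prior, power, alpha):.1%}")
```

Under these assumed numbers only about one study in eight would report an association if everything were published, which is the gap between expectation and the published record that the quote describes.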
He points out the ramifications of this problem:
If peer review is good at anything, it appears to be keeping unpopular ideas from being published. Consider the finding of another (yes, another) of these replicability studies, this time from a group of cancer researchers. In addition to reaching the now unsurprising conclusion that only a dismal 11 percent of the preclinical cancer research they examined could be validated after the fact, the authors identified another horrifying pattern: The “bad” papers that failed to replicate were, on average, cited far more often than the papers that did! As the authors put it, “some non-reproducible preclinical papers had spawned an entire field, with hundreds of secondary publications that expanded on elements of the original observation, but did not actually seek to confirm or falsify its fundamental basis.”
And, as illustrated by The Sugar Conspiracy, this is a self-perpetuating process:
What they do not mention is that once an entire field has been created—with careers, funding, appointments, and prestige all premised upon an experimental result which was utterly false due either to fraud or to plain bad luck—pointing this fact out is not likely to be very popular. Peer review switches from merely useless to actively harmful. It may be ineffective at keeping papers with analytic or methodological flaws from being published, but it can be deadly effective at suppressing criticism of a dominant research paradigm. Even if a critic is able to get his work published, pointing out that the house you’ve built together is situated over a chasm will not endear him to his colleagues or, more importantly, to his mentors and patrons. 
Science is supposed to provide a self-correcting mechanism to handle this problem, and The Sugar Conspiracy actually shows that in the end it works, but
even if self-correction does occur and theories move strictly along a lifecycle from less to more accurate, what if the unremitting flood of new, mostly false, results pours in faster? Too fast for the sclerotic, compromised truth-discerning mechanisms of science to operate? The result could be a growing body of true theories completely overwhelmed by an ever-larger thicket of baseless theories, such that the proportion of true scientific beliefs shrinks even while the absolute number of them continues to rise.
The four-decade reign of the fat hypothesis shows this problem.

In The Prevalence of Inappropriate Image Duplication in Biomedical Research Publications, Elisabeth M Bik, Arturo Casadevall and Ferric C Fang report on a study of the images in biomedical publications. From their abstract:
This study attempted to determine the percentage of published papers containing inappropriate image duplication, a specific type of inaccurate data. The images from a total of 20,621 papers in 40 scientific journals from 1995-2014 were visually screened. Overall, 3.8% of published papers contained problematic figures, with at least half exhibiting features suggestive of deliberate manipulation. The prevalence of papers with problematic images rose markedly during the past decade. Additional papers written by authors of papers with problematic images had an increased likelihood of containing problematic images as well. As this analysis focused only on one type of data, it is likely that the actual prevalence of inaccurate data in the published literature is higher. The marked variation in the frequency of problematic images among journals suggest that journal practices, such as pre-publication image screening, influence the quality of the scientific literature.
At least this is one instance in which some journals are adding value. But let's look at the set of journal value-adds Marcia McNutt, the editor-in-chief of Science, cites in her editorial attacking Sci-Hub (quotes in italics):
  • [Journals] help ensure accuracy, consistency, and clarity in scientific communication. If only. Many years ago, the peer-reviewed research on peer-review showed conclusively that only the most selective journals (such as McNutt's Science) add any detectable value to their articles. And that is before adjusting for the value their higher retraction rate subtracts.
  • editors are paid professionals who carefully curate the journal content to bring readers an important and exciting array of discoveries. This is in fact a negative. The drive to publish and hype eye-catching, "sexy" results ahead of the competition is the reason why top journals have a higher rate of retraction. This drive to compete in the bogus "impact factor" metric, which can be easily gamed, leads to many abuses. But more fundamentally, any ranking of journals as opposed to the papers they publish is harmful.
  • They make sure that papers are complete and conform to standards of quality, transparency, openness, and integrity. Clearly, if the result is a higher rate of retraction the claim that they conform to these standards is bogus.
  • There are layers of effort by copyeditors and proofreaders to check for adherence to standards in scientific usage of terms to prevent confusion. This is a task that can easily be automated; we don't need to pay layers of humans to do it.
  • Illustrators create original illustrations, diagrams, and charts to help convey complex messages. Great, the world is paying the publishers many billions of dollars a year for pretty pictures?
  • Scientific communicators spread the word to top media outlets so that authors get excellent coverage and readers do not miss important discoveries. And the communicators aren't telling the top media outlets that the "important discoveries" are likely to get retracted in a few years.
  • Our news reporters are constantly searching the globe for issues and events of interest to the research and nonscience communities. So these journals are just insanely expensive versions of the New York Times?
  • Our agile Internet technology department continually evolves the website, so that authors can submit their manuscripts and readers can access the journals more conveniently. Even if we accept the ease of submission argument, the ease of access argument is demolished by, among others, Justin Peters and John Dupuis. It's obviously bogus; the whole reason people use Sci-Hub is that it provides more convenient access! Also, let's not forget that the "Internet technology department" is spending most of their efforts in the way the other Web media do, monetizing their readers, and contributing to the Web obesity crisis. Eric Hellman's study 16 of the top 20 Research Journals Let Ad Networks Spy on Their Readers gave Science a D because:
    10 Trackers. Multiple advertising networks.
    To be fair, Eric also points out that Sci-Hub uses trackers and Library Genesis sells Google ads too.
McNutt's incentives are clearly not aligned with the interests of researchers. Note the reference above to Science Translational Medicine. In Stretching the "peer reviewed" brand until it snaps, I wrote:
a trend publishers themselves started many years ago of stretching the "peer reviewed" brand by proliferating journals. If your role is to act as a gatekeeper for the literature database, you better be good at being a gatekeeper. Opening the gate so wide that anything can get published somewhere is not being a good gatekeeper.
The wonderful thing about Elsevier's triggering of the Streisand Effect is that it has compelled even Science to advertise Sci-Hub, and to expose the flimsy justification for the exorbitant profits of the major publishers.


David. said...

John Oliver covers this problem far better than I could. It's a must-watch.

Dragan Espenschied said...

The University of Montréal has canceled 2,116 journal subscriptions:

David. said...

In a different media market, but revealing nonetheless. Cory Doctorow at BoingBoing reports on a German court judgement that, in effect, publishers have been stealing from authors 30-50% of the fees on blank media. They owe authors €100M, and:

"are claiming that this is their death-knell, without acknowledging the hardship they imposed on authors by misappropriating their funds. ... if publishers can't survive without these funds, that means the industry was only viable in the first place because it was stealing from writers"

David. said...

Bjorn Brembs chimes in.

David. said...

Roheeni Saxena at Ars Technica discusses two reports. One from AAAS describes implicit bias in article reviewing:

"journal editors presented evidence of a US-centric bias in scientific publication. Countries with fewer resources tend to be poorly represented among reviewers, and therefore may receive less attention from publishers."

The other, from the GAO, shows bias against women and minorities in grant reviewing:

"for the National Science Foundation (NSF), ... only a quarter of all applications for funding come from women ... But that's great compared to minorities. Black scientists in the US submit only two percent of all NSF grants, and only eighteen percent of those applications are successful. The NIH, another major source of federal funding, reports that black researchers receive awards at half the rate of whites, so racial disparities persist across funding agencies."

David. said...

The Finnish Ministry of Education and Culture is publishing a database of journal subscription payments:

"This dataset includes academic publisher costs paid by Finnish research organizations to publishers and suppliers during the years 2010–2015. The dataset includes total costs of license contracts made with individual publishers or suppliers. The dataset also includes information on the different material packages the contracts included. Also included is the information on how the materials were acquired."

The dire economic situation in Finland, which led the University of Helsinki to lay off nearly 1000 people, is clearly a factor:

"[Rector Jukka] Kola says that staff cuts are unavoidable because of the current government's drastic funding cuts to education. According to the university's calculations, the need for cost-cutting will amount to 106 million euros annually by the end of Prime Minister Juha Sipilä’s government term in 2019-20."

David. said...

The Economist reports on the GRIM test, an amazingly simple statistical test:

"When Mr Brown and Dr Heathers test-drove their method on 71 suitable papers published in three leading psychology journals over the past five years, what they found justified the pessimistic sounding label they gave it. Just over half the papers they looked at failed the test. Of those, 16 contained more than one error. The two researchers got in touch with the authors of these, and also of five others where the lone errors looked particularly egregious, and asked them for their data—the availability of which was a precondition of publication in two of the journals. Only nine groups complied, but in these nine cases examination of the data showed that there were, indeed, errors.

The mistakes picked up looked accidental. Most were typos or the inclusion of the wrong spreadsheet cells in a calculation. Nevertheless, in three cases they were serious enough to change the main conclusion of the paper concerned.

That, plus the failure of 12 groups to make their data available at all, is alarming. But if knowledge that the GRIM test might be applied to their work makes future researchers less careless and more open, then Mr Brown’s and Dr Heathers’s maths will have paid dividends."
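For readers wondering how simple the GRIM test is: when the underlying data are whole numbers (for example, Likert-scale answers), a mean of n values must equal some integer total divided by n, so many reported means are arithmetically impossible for the stated sample size. Here is a minimal sketch of that check. The function name is hypothetical, it assumes one integer score per participant, and it accepts either direction of rounding at exact halves; it is not Brown and Heathers's own code.

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_HALF_DOWN, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Return True if some integer total of n whole-number scores
    could round to the reported mean at its stated precision."""
    if n <= 0:
        raise ValueError("n must be positive")
    mean = Decimal(reported_mean)
    # Number of decimal places the paper reported, e.g. 2 for "5.19".
    places = max(0, -mean.as_tuple().exponent)
    quantum = Decimal(1).scaleb(-places)
    # The only integer totals that can work are the two nearest to mean * n.
    floor_total = int((mean * n).to_integral_value(rounding=ROUND_FLOOR))
    for total in (floor_total, floor_total + 1):
        exact = Decimal(total) / Decimal(n)
        # Be lenient about how the authors rounded exact halves.
        for mode in (ROUND_HALF_UP, ROUND_HALF_DOWN):
            if exact.quantize(quantum, rounding=mode) == mean:
                return True
    return False


if __name__ == "__main__":
    # With 28 participants, 145/28 rounds to 5.18 and 146/28 to 5.21,
    # so a reported mean of 5.19 cannot arise from integer data.
    print(grim_consistent("5.19", 28))
    print(grim_consistent("5.18", 28))
```

The mean is passed as a string so that trailing zeros ("3.50" versus "3.5") preserve the precision the paper actually reported.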

David. said...

For the enforcement of conventional wisdom by senior researchers in the field of economics, see The Superiority of Economists by Marion Fourcade, Etienne Ollion, and Yann Algan.

David. said...

Another damning report on the corruption of the peer-review and publishing process by pharma marketing is here.

David. said...

Steven Poole's Why bad ideas refuse to die is more general, but includes:

"Nearly every academic inquirer I talked to while researching this subject says that the interface of research with publishing is seriously flawed. Partly because the incentives are all wrong – a “publish or perish” culture rewards academics for quantity of published research over quality. And partly because of the issue of “publication bias”: the studies that get published are the ones that have yielded hoped-for results. Studies that fail to show what they hoped for end up languishing in desk drawers."

David. said...

As I've mentioned here for a long time, among the bad ideas that refuse to die is journal impact factor.

XXX et al have published A simple proposal for the publication of journal citation distributions. John Bohannon at Science writes:

"The 11 journals taking part in today's data release are Science, eLife, The EMBO Journal, the Journal of Informetrics, the Proceedings of the Royal Society B, three journals published by the Public Library of Science, and Nature along with two of its sister journals. In 2013 and 2014, those journals published more than 366,000 research articles and 13,000 review articles. The team then combed through the Thomson Reuters database to count all citations to those articles in 2015. Vincent Larivière, an expert on journal citations at the University of Montreal in Canada led the analysis of the data.

The results give more ammunition to JIF critics. The citation distributions are so skewed that up to 75% of the articles in any given journal had lower citation counts than the journal's average number. So trying to use a journal’s JIF to forecast the impact of any particular paper is close to guesswork. The analysis also revealed a large number of flaws in the Thomson Reuters database, with citations unmatchable to known articles."
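The reason a skewed distribution makes the journal average so uninformative is easy to see with a toy example. The citation counts below are made up purely for illustration (they are not from the Larivière data), and the function name is hypothetical.

```python
from statistics import mean


def share_below_mean(citations: list[int]) -> float:
    """Fraction of articles whose citation count is below the journal average."""
    average = mean(citations)
    return sum(1 for c in citations if c < average) / len(citations)


if __name__ == "__main__":
    # Hypothetical journal: a couple of heavily cited papers and a long tail.
    example = [0, 1, 1, 2, 3, 5, 12, 40]
    print(f"average citations: {mean(example)}")
    print(f"articles below average: {share_below_mean(example):.0%}")
```

In this made-up journal two papers out of eight pull the average up to 8, so three quarters of the articles sit below the number that is supposed to represent them.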

David. said...

Jon Tennant's Why I will never publish with Wiley again is an account of the "value" a commercial publisher adds to a paper in return for, in this case, $3.6K in author processing charge:

"As Wiley make around a 40% profit margin, about $1440 of this fee went straight to their lucky shareholders, which must explain that nice, warm feeling you get when paying."

David. said...

Anahad O'Connor's How the Sugar Industry Shifted Blame to Fat reports on JAMA's publication of:

"internal sugar industry documents, recently discovered by a researcher at the University of California, San Francisco ... suggest that five decades of research into the role of nutrition and heart disease, including many of today’s dietary recommendations, may have been largely shaped by the sugar industry.

“They were able to derail the discussion about sugar for decades,” said Stanton Glantz, a professor of medicine at U.C.S.F. and an author of the JAMA paper."

The sugar industry paid off scientists:

"the Sugar Research Foundation ... paid three Harvard scientists the equivalent of about $50,000 in today’s dollars to publish a 1967 review of research on sugar, fat and heart disease. The studies used in the review were handpicked by the sugar group, and the article, which was published in the prestigious New England Journal of Medicine, minimized the link between sugar and heart health and cast aspersions on the role of saturated fat."