Thursday, October 13, 2016

More Is Not Better

Quite a few of my recent posts have been about how the mainstream media is catching on to the corruption of science caused by the bad incentives all parties operate under, from science journalists to publishers to institutions to researchers. Below the fold I look at some recent evidence that this meme has legs.

Donald S. Kornfeld and Sandra L. Titus have a comment in Nature entitled Stop ignoring misconduct arguing that the bad incentives for researchers inevitably produce misconduct, but that this is routinely swept under the carpet:
In other words, irreproducibility is the product of two factors: faulty research practices and fraud. Yet, in our view, current initiatives to improve science dismiss the second factor. For example, leaders at the US National Institutes of Health (NIH) stated in 2014: “With rare exceptions, we have no evidence to suggest that irreproducibility is caused by scientific misconduct”. In 2015, a symposium of several UK science-funding agencies convened to address reproducibility, and decided to exclude discussion of deliberate fraud.
The scientific powers-that-be are ignoring the science:
Only 10–12 individuals are found guilty by the US Office of Research Integrity (ORI) each year. That number, which the NIH used to dismiss the role of research misconduct, is misleadingly low, as numerous studies show. For instance, a review of 2,047 life-science papers retracted from 1973 to 2012 found that around 43% were attributed to fraud or suspected fraud. A compilation of anonymous surveys suggests that 2% of scientists and trainees admit that they have fabricated, falsified or modified data. And a 1996 study of more than 1,000 postdocs found that more than one-quarter would select or omit data to improve their chances of receiving grant funding.
Linked from this piece are several other Nature articles about misconduct:
  • Misconduct: Lessons from researcher rehab by James M. DuBois, John T. Chibnall, Raymond Tait and Jillon Vander Wal is an interesting report on a program to which researchers are referred after misconduct has been detected. It identifies the pressure for "more not better" as a factor:
    By the metrics that institutions use to reward success, our programme participants were highly successful researchers; they had received many grants and published many papers. Yet, becoming overextended was a common reason why they failed to adequately oversee research. It may also have led them to make compliance a low priority. ... Scientists become overextended in part because their institutions value large numbers of projects.
  • Robust research: Institutions must do their part for reproducibility by C. Glenn Begley, Alastair M. Buchan and Ulrich Dirnagl argues for compliance processes for general research analogous to those governing animal research. They too flag "more not better":
    Institutions must support and reward researchers who do solid — not just flashy — science and hold to account those whose methods are questionable. ... Although researchers want to produce work of long-term value, multiple pressures and prejudices discourage good scientific practices. In many laboratories, the incentives to be first can be stronger than the incentives to be right.
  • Workplace climate: Metrics for ethics by Monya Baker reports on institutions using a survey of researchers' workplace climate in areas such as "integrity norms (such as giving due credit to others' ideas), integrity inhibitors (such as inadequate access to material resources) and adviser–advisee relations".
Also in Nature is Corie Lok's Science’s 1%: How income inequality is getting worse in research, which starts:
For a portrait of income inequality in science, look no further than the labs of the University of California. Twenty-nine medical researchers there earned more than US$1 million in 2015 and at least ten non-clinical researchers took home more than $400,000 each. Meanwhile, thousands of postdocs at those universities received less than $50,000. Young professors did better, but many still collected less than one-quarter of the earnings of top researchers.
The work of Richard Wilkinson, Kate Pickett and others shows that increasing inequality is correlated with misconduct, among other social ills. The finance industry is the poster-child of inequality; its machinations in the 2008 financial crisis, such as robosigning and synthetic CDOs, should be evidence enough. Which way round the chain of causation runs is not clear.

Even if there is no actual misconduct, the bad incentives will still cause bad science to proliferate via natural selection, or the scientific equivalent of Gresham's Law that "bad money drives out good". The Economist's Incentive Malus, subtitled Poor scientific methods may be hereditary, is based on The natural selection of bad science by Paul E. Smaldino and Richard McElreath, which starts:
Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing—no deliberate cheating nor loafing—by scientists, only that publication is a principal factor for career advancement.
The Economist writes that Smaldino and McElreath:
decided to apply the methods of science to the question of why this was the case, by modelling the way scientific institutions and practices reproduce and spread, to see if they could nail down what is going on.

They focused in particular on incentives within science that might lead even honest researchers to produce poor work unintentionally. To this end, they built an evolutionary computer model in which 100 laboratories competed for “pay-offs” representing prestige or funding that result from publications. ... Labs that garnered more pay-offs were more likely to pass on their methods to other, newer labs (their “progeny”).

Some labs were better able to spot new results (and thus garner pay-offs) than others. Yet these labs also tended to produce more false positives—their methods were good at detecting signals in noisy data but also, as Cohen suggested, often mistook noise for a signal. More thorough labs took time to rule these false positives out, but that slowed down the rate at which they could test new hypotheses. This, in turn, meant they published fewer papers.

In each cycle of “reproduction”, all the laboratories in the model performed and published their experiments. Then one—the oldest of a randomly selected subset—“died” and was removed from the model. Next, the lab with the highest pay-off score from another randomly selected group was allowed to reproduce, creating a new lab with a similar aptitude for creating real or bogus science. ... they found that labs which expended the least effort to eliminate junk science prospered and spread their methods throughout the virtual scientific community.
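The selection loop described above can be sketched in a few lines. This is an illustrative toy, not Smaldino and McElreath's actual model: the effort-versus-output trade-off, the subset size of ten, and the mutation noise are all my own assumptions, chosen only to show how pay-off-biased "reproduction" can spread low-effort methods with no deliberate cheating anywhere:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

N_LABS = 100     # labs competing for pay-offs (prestige/funding)
ROUNDS = 2000    # death/birth cycles
SUBSET = 10      # size of the randomly chosen subsets (assumed)
MUTATION = 0.05  # noise when a new lab copies its parent's methods

class Lab:
    def __init__(self, effort):
        self.effort = effort  # 0..1: rigour spent ruling out false positives
        self.payoff = 0.0
        self.age = 0

    def do_science(self):
        # Assumed trade-off: more rigour means fewer papers per cycle,
        # hence less pay-off -- the "more not better" incentive.
        self.payoff += 1.0 - 0.5 * self.effort
        self.age += 1

labs = [Lab(random.random()) for _ in range(N_LABS)]
initial_mean = sum(l.effort for l in labs) / N_LABS

for _ in range(ROUNDS):
    for lab in labs:
        lab.do_science()
    # Death: the oldest lab in a random subset is removed.
    labs.remove(max(random.sample(labs, SUBSET), key=lambda l: l.age))
    # Birth: the highest-pay-off lab in another random subset spawns a
    # new lab with similar (slightly mutated) methods.
    parent = max(random.sample(labs, SUBSET), key=lambda l: l.payoff)
    child_effort = min(1.0, max(0.0, parent.effort + random.gauss(0, MUTATION)))
    labs.append(Lab(child_effort))

mean_effort = sum(l.effort for l in labs) / N_LABS
print(f"mean effort: {initial_mean:.2f} -> {mean_effort:.2f}")
```

In runs of this toy, mean effort tends to drift downward over the cycles even though no lab ever "cheats": selection on publication pay-off alone does the work.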
Worse, they found that replication did not suppress this selection process:
Replication has recently become all the rage in psychology. In 2015, for example, over 200 researchers in the field repeated 100 published studies to see if the results of these could be reproduced (only 36% could). Dr Smaldino and Dr McElreath therefore modified their model to simulate the effects of replication, by randomly selecting experiments from the “published” literature to be repeated.

A successful replication would boost the reputation of the lab that published the original result. Failure to replicate would result in a penalty. Worryingly, poor methods still won—albeit more slowly. This was true in even the most punitive version of the model, in which labs received a penalty 100 times the value of the original “pay-off” for a result that failed to replicate, and replication rates were high (half of all results were subject to replication efforts).
The Economist reports that Smaldino and McElreath's conclusion is bleak:
that when the ability to publish copiously in journals determines a lab’s success, then “top-performing laboratories will always be those who are able to cut corners”—and that is regardless of the supposedly corrective process of replication.

Ultimately, therefore, the way to end the proliferation of bad science is not to nag people to behave better, or even to encourage replication, but for universities and funding agencies to stop rewarding researchers who publish copiously over those who publish fewer, but perhaps higher-quality papers.
Alas, the people in a position to make this change reached this exalted state by publishing copiously, so The Economist's suggestion is utopian. In Bad incentives in peer-reviewed science I wrote:
Fixing these problems of science is a collective action problem; it requires all actors to take actions that are against their immediate interests roughly simultaneously. So nothing happens, and the long-term result is, as Arthur Caplan (of the Division of Medical Ethics at NYU's Langone Medical Center) pointed out, a total loss of science's credibility:
The time for a serious, sustained international effort to halt publication pollution is now. Otherwise scientists and physicians will not have to argue about any issue—no one will believe them anyway.
(see also John Michael Greer).
This loss of credibility is the subject of Andrea Saltelli's Science in crisis: from the sugar scam to Brexit, our faith in experts is fading which starts:
Worldwide, we are facing a joint crisis in science and expertise. This has led some observers to speak of a post-factual democracy – with Brexit and the rise of Donald Trump the results.

Today, the scientific enterprise produces somewhere in the order of 2m papers a year, published in roughly 30,000 different journals. A blunt assessment has been made that perhaps half or more of all this production “will not stand the test of time”.

Meanwhile, science has been challenged as an authoritative source of knowledge for both policy and everyday life, with noted major misdiagnoses in fields as disparate as forensics, preclinical and clinical medicine, chemistry, psychology and economics.
As I did above, Saltelli uses the finance analogy to point out the deleterious effect of simplistic metrics; you get what you reward:
One can see in the present critique of finance – as something having outgrown its original function into a self-serving entity – the same ingredients of the social critique of science.

Thus the ethos of “little science” reminds us of the local banker of old times. Scientists in a given field knew one another, just as local bankers had lunch and played golf with their most important customers. The ethos of techno-science or mega-science is similar to that of the modern Lehman bankers, where the key actors know one another only through performance metrics.
But I think in this case the analogy is misleading. The balkanization of science into many sub-fields leads to cliques and the kind of group-think illustrated in William A. Wilson's Scientific Regress:
once an entire field has been created—with careers, funding, appointments, and prestige all premised upon an experimental result which was utterly false due either to fraud or to plain bad luck—pointing this fact out is not likely to be very popular. Peer review switches from merely useless to actively harmful. It may be ineffective at keeping papers with analytic or methodological flaws from being published, but it can be deadly effective at suppressing criticism of a dominant research paradigm.
Charles Seife's How the FDA Manipulates the Media shows how defensive scientific institutions are becoming in the face of these problems. They are so desperate to control how the press reports science and science-based policy that they are using "close-hold embargos":
The deal was this: NPR, along with a select group of media outlets, would get a briefing about an upcoming announcement by the U.S. Food and Drug Administration a day before anyone else. But in exchange for the scoop, NPR would have to abandon its reportorial independence. The FDA would dictate whom NPR's reporter could and couldn't interview.
The FDA isn't the only institution doing this:
This January the California Institute of Technology was sitting on a great story: researchers there had evidence of a new giant planet—Planet Nine—in the outer reaches of our solar system. The Caltech press office decided to give only a dozen reporters, including Scientific American's Michael Lemonick, early access to the scientists and their study. When the news broke, the rest of the scientific journalism community was left scrambling. “Apart from the chosen 12, those working to news deadlines were denied the opportunity to speak to the researchers, obtain independent viewpoints or have time to properly digest the published research paper,” complained BBC reporter Pallab Ghosh about Caltech's “inappropriate” favoritism in an open letter to the World Federation of Science Journalists.
But it may be the only one doing it in violation of its stated policy:
in June 2011, the FDA's new media policy officially killed the close-hold embargo: “A journalist may share embargoed material provided by the FDA with nonjournalists or third parties to obtain quotes or opinions prior to an embargo lift provided that the reporter secures agreement from the third party to uphold the embargo.”
The downside of the close-hold embargo is obvious from this example:
in 2014 the Harvard-Smithsonian Center for Astrophysics (CfA) used a close-hold embargo when it announced to a dozen reporters that researchers had discovered subtle signals of gravitational waves from the early universe. “You could only talk to other scientists who had seen the papers already; we didn't want them shared unduly,” says Christine Pulliam, the media relations manager for CfA. Unfortunately, the list of approved scientists provided by CfA listed only theoreticians, not experimentalists—and only an experimentalist was likely to see the flaw that doomed the study. (The team was seeing the signature of cosmic dust, not gravitational waves.)
Defensiveness is rampant. Cory Doctorow's Psychology's reproducibility crisis: why statisticians are publicly calling out social scientists reports on a response by Andrew Gelman, a professor of statistics and political science, and director of Columbia's Applied Statistics Center, to a screed by Princeton University psychology professor Susan Fiske, a past president of the Association for Psychological Science. Fiske is unhappy that "self-appointed data police" are using blogs and other social media to criticize published research via "methodological terrorism", instead of using the properly peer-reviewed and "monitored channels". Gelman's long and detailed blog post starts:
Fiske doesn’t like when people use social media to publish negative comments on published research. She’s implicitly following what I’ve sometimes called the research incumbency rule: that, once an article is published in some approved venue, it should be taken as truth. I’ve written elsewhere on my problems with this attitude—in short, (a) many published papers are clearly in error, which can often be seen just by internal examination of the claims and which becomes even clearer following unsuccessful replication, and (b) publication itself is such a crapshoot that it’s a statistical error to draw a bright line between published and unpublished work.
Gelman's post connects to the work of Smaldino and McElreath:
Fiske expresses concerns for the careers of her friends, careers that may have been damaged by public airing of their research mistakes. Just remember that, for each of these people, there may well be three other young researchers who were doing careful, serious work but then didn’t get picked for a plum job or promotion because it was too hard to compete with other candidates who did sloppy but flashy work that got published in Psych Science or PPNAS. It goes both ways.