The current research communication system is dysfunctional in many ways. One way to see this is from the perspective of each of the participants in the system:
- The General Public
- Researchers
- Libraries, Archives & Repositories
- Publishers
- Software & Infrastructure Developers
The General Public
The general public needs to be able to extract reliable information from the deluge of mostly ill-informed, self-serving or commercial messages that forms their information environment. They have been educated to believe that content branded "peer-reviewed" is a gold standard on which they can rely. It would be in the public interest if it were reliable, but high-profile examples show this isn't always the case. For example, it took 12 years before the notorious Wakefield paper linking the MMR vaccine to autism was retracted.
The additional quality denoted by the "peer-reviewed" brand has been decreasing:
“False positives and exaggerated results in peer-reviewed scientific studies have reached epidemic proportions in recent years.”
One major cause has been that the advent of the Internet, by reducing the cost of distribution, encouraged publishers to switch libraries from subscribing to individual journals to the "big deal", in which they paid a single subscription to access all of a publisher's content. In the world of the big deal, many publishers discovered the effectiveness of this Microsoft-like "bundling" strategy: by proliferating cheap, low-quality journals, thus inflating the perceived value of their deal to librarians, they could grab more of the market. This intuitive conclusion is supported by detailed economic analysis of the "big deal":
"Economists are familiar with the idea that a monopoly seller can increase its profits by bundling. This possibility was discussed by W.J. Adams and Janet Yellen and by Richard Schmalensee. Hal Varian noted that academic journals are well suited for this kind of bundling. Mark Armstrong and Yannis Bakos and Erik Brynjolfsson demonstrated that bundling large collections of information goods such as scholarly articles will not only increase a monopolist's profits, but will also decrease net benefits to consumers."
Researchers cooperated with the proliferation of journals. They were seduced by the extra opportunities to publish and the extra editorial board slots; they did not see the costs, which were paid by their librarians or funding agencies. The big deal deprived librarians of their economic ability to reward high-quality journals and punish low-quality ones:
“Libraries find the majority of their budgets are taken up by a few large publishers,” says David Hoole, director of brand marketing and institutional relations at [Nature Publishing Group]. “There is [therefore] little opportunity [for libraries] to make collection decisions on a title-by-title basis, taking into account value-for-money and usage.”
The inevitable result of stretching the "peer-reviewed" brand in this way has been to devalue it. Almost anything, even commercial or ideological messages, can be published under the brand:
“BIO-Complexity is a peer-reviewed scientific journal with a unique goal. It aims to be the leading forum for testing the scientific merit of the claim that intelligent design (ID) is a credible explanation for life.”
Authors submit papers repeatedly, descending the quality hierarchy until they find a channel with reviewing lax enough to accept them. PLoS ONE publishes every submission that meets its criteria for technical soundness; despite this, 40% of the submissions it rejects as unsound are eventually published elsewhere. Even nonsense can be published if the page charges are paid.
Researchers
Researchers play many roles in the flow of research communication: as authors, reviewers, readers, reproducers and re-users. The evaluations that determine their career success are, in most cases, based on their role as authors of papers (for scientists) or books and monographs (for humanists), to the exclusion of their other roles. In the sciences, credit is typically based on the number of papers published and the "impact factor" of the journals in which they appeared. Journal impact factor is generally agreed to be a seriously flawed measure of the quality of the research described by a paper. Although impact factors are based on citation counts for a journal's articles, they do not predict the citation counts of individual articles, which are in any case easily manipulated. For example, a citation pointing out that an article had been retracted acts to improve the impact factor of the journal that retracted it. A further problem in some areas of science is that experiments require large teams, and thus long author lists, making it hard to assign credit to individuals on the basis of their partial authorship.
Peer review depends on reviewers, who are only very indirectly rewarded for their essential efforts. The anonymity of reviews makes it impossible to build a public reputation as a high-quality reviewer. If articles had single authors and averaged three reviewers, each author would need to do an average of three reviews per submission. Multiple authorship reduces this load; if it were evenly distributed it would be manageable. In practice, the distribution is heavily skewed, loading some reviewers enough to interfere with their research:
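The arithmetic behind that load estimate can be made explicit. A minimal sketch (the function name and the figures of three reviews and four co-authors are illustrative assumptions, not data from the text beyond the three-reviewer example):

```python
def reviews_owed_per_author(reviews_per_submission: float,
                            avg_authors_per_paper: float) -> float:
    """Average number of reviews each author must perform per paper they
    co-author, if the reviewing load were spread perfectly evenly."""
    return reviews_per_submission / avg_authors_per_paper

# Single-author papers needing three reviews: each author owes three.
print(reviews_owed_per_author(3, 1))
# With four co-authors per paper, the even-split load drops to 0.75.
print(reviews_owed_per_author(3, 4))
```

The point of the sketch is that the even-split load is modest; the problem described above is that the real distribution is nothing like even.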
“Academic stars are unlikely to be available for reviewing; hearsay suggests that sometimes professors ask their assistants or PhD students to do reviews which they sign! Academics low down in the pecking order may not be asked to review. Most reviews are done by academics in the middle range of reputation and specifically by those known to editors and who have a record of punctuality and rigour in their reviews: the willing and conscientious horses are asked over and over again by overworked and—sometimes desperate—editors.”
The cost is significant:
“In 2008, a Research Information Network report estimated that the unpaid non-cash costs of peer review, undertaken in the main by academics, is £1.9 billion globally each year.”
Reviewers rarely have access to the raw data, or to enough information on methods and procedures to be able to reproduce the results, even if they had adequate time and resources to do so. The lack of credit for thorough reviews means there is little motivation to try. Reviewers are thus in a poor position to detect falsification or fabrication. Experimental evidence suggests that they aren't even in a good position to detect significant errors:
“Indeed, an abundance of data from a range of journals suggests peer review does little to improve papers. In one 1998 experiment designed to test what peer review uncovers, researchers intentionally introduced eight errors into a research paper. More than 200 reviewers identified an average of only two errors. That same year, a paper in the Annals of Emergency Medicine showed that reviewers couldn't spot two-thirds of the major errors in a fake manuscript. In July 2005, an article in JAMA showed that among recent clinical research articles published in major journals, 16% of the reports showing an intervention was effective were contradicted by later findings, suggesting reviewers may have missed major flaws.”
Peer review is often said to be the gold standard of science, but it is not. The gold standard in experimental science is reproducibility: ensuring that anyone repeating the experiment gets the same result. When even a New York Times op-ed points out that, in practice, scientists almost never reproduce published experiments, it is clear that there is a serious problem. Articles in high-impact journals are regularly retracted; there is even a blog tracking retractions. Lower-impact journals retract articles less frequently, but this probably reflects the lesser scrutiny their articles receive rather than a lower rate of error. These retractions are rarely based on attempts to reproduce the experiments in question. Researchers are not rewarded for reproducing previous experiments, causing a retraction does not normally count as a publication, and it can be impossible to publish refutations:
“Three teams of scientists promptly tried to replicate his results. All three teams failed. One of the teams wrote up its results and submitted them to [the original journal]. The team's submission was rejected — but not because the results were flawed. As the journal’s editor [explained], the journal has a longstanding policy of not publishing replication studies. “This policy is not new and is not unique to this journal,” he said. As a result, the original study stands.”
The lack of recognition for reproducing experiments is the least of the barriers to reproducibility. Publications only rarely contain all the information an independent researcher would need to reproduce the experiment in question:
“The article summarises the experiment ... - the data are often missing or so emasculated as to be useless. It is the film review without access to the film.”
Quite apart from the difficulty of reproducing the experiment, this frequently prevents other researchers from re-using the techniques in future, related experiments. Isaac Newton famously "stood on the shoulders of giants"; it is becoming harder and harder for today's researchers to stand on their predecessors' shoulders.
Re-using the data that forms the basis of a research communication is harder than it should be. Even in the rare cases when the data is part of the research communication, it typically forms "supplementary material", whose format and preservation are inadequate. In other cases the data are in a separate data repository, tenuously linked to the research communication. Data submissions are only patchily starting to be citable via DOIs.
Scholars have been complaining of information overload for more than a century. Online access provides much better discovery and aggregation tools, but these tools struggle against the fragmentation of research communication caused by the rapid proliferation of increasingly specialized and overlapping journals with decreasing quality of reviewing.
Libraries, Archives & Repositories
Libraries used to play an essential role in research communication. They purchased and maintained local collections of journals, monographs and books, reducing the latency and cost of access to research communications for researchers in the short term. As a side effect of doing so, they safeguarded access for scholars in the long term. A large number of identical copies in independently managed collections provided a robust preservation infrastructure for the scholarly record.
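The robustness of that replicated model is easy to quantify. As a rough sketch (assuming, unrealistically but illustratively, that each copy is lost independently with the same probability), the chance of losing every copy falls exponentially with the number of independently managed copies:

```python
def prob_all_lost(p_single_loss: float, n_copies: int) -> float:
    """Probability that every one of n independently managed copies is
    lost, if each copy is lost with probability p_single_loss.
    Independence is the key (and idealized) assumption."""
    return p_single_loss ** n_copies

# If any one library loses its copy with 10% probability, five
# independent copies reduce the chance of total loss to one in 100,000.
print(prob_all_lost(0.1, 5))
```

Real failures are correlated (shared software, shared funding climates), so the true benefit is smaller, but the qualitative point stands: many independent copies made the paper-era scholarly record very hard to lose.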
The transition to the Web as the medium for scholarly communication has ended the role for local library collections in the access path to the flow of research communication in the short term. In many countries, such as the US, libraries (sometimes in consortia) retain their role as the paying customers of the publishers. In other countries, such as the UK, negotiations as to the terms of access and payment for it are now undertaken at a national level. But neither provides librarians much ability to be discriminating customers of individual journals, because both are subject to the "big deal". Libraries bought into the big deal despite warnings from a few perceptive librarians who saw the threat:
"Academic library directors should not sign on to the Big Deal or any comprehensive licensing agreement with commercial publishers ... the Big Deal serves only the Big Publishers ... increasing our dependence on publishers who have already shown their determination to monopolize the marketplace"
Libraries and archives have been forced to switch from purchasing a copy of the research communications of interest to their readers to leasing access to the publisher's copy. Librarians did not find publishers' promises of "perpetual access" to the subscribed materials a convincing replacement for libraries' role as long-term stewards of the record. Two approaches to this problem of long-term access have emerged:
- A single third-party subscription archive called Portico. Portico collects and preserves a copy of published material. Libraries subscribe to Portico and, as long as their subscription continues, retain access to material to which they used to subscribe but no longer do. Portico has been quite widely adopted, despite not actually implementing a solution to the problem of post-cancellation access (logically, it is a second instance of the same problem), but has yet to achieve economic sustainability.
- A distributed network of local library collections called LOCKSS (Lots Of Copies Keep Stuff Safe), modeled on the way libraries work in the paper world. Publishers grant permission for LOCKSS boxes at subscribing libraries to collect and preserve a copy of the content to which they subscribe. Fewer libraries are using the LOCKSS system to build collections than subscribe to Portico for post-cancellation access. Despite this, the LOCKSS program has been financially sustainable since 2007.
In many fields the volumes of data to be published, and thus the costs of doing so, are formidable:
“Adequate and sustained funding for long-lived data collections ... remains a vexing problem ... the widely decentralized and nonstandard mechanisms for generating data ... make this problem an order of magnitude more difficult than our experiences to date ...”
In many cases these vast collections of data are the output of scholars at many institutions, so the motivation for any individual institution to expend the resources needed for publishing is weak. The business models for subject repositories are fragile; the UK's Arts and Humanities Data Service failed when its central funding was withdrawn, and arXiv.org's finances are shaky at best. A “Blue Ribbon Task Force” recently addressed the issue of sustainable funding for long-term access to data; its conclusions were not encouraging.
Publishers
Academic publishing is a multi-billion dollar business. For at least some of the large publishers, both for-profit and not-for-profit, it is currently extraordinarily lucrative:
- Reed Elsevier's academic and medical division's 2010 results show revenues of $3160M and pre-tax profits of $1130M, which is a gross margin of 36%. The parent company's tax rate was 24%. Assuming that this applies uniformly across divisions, the Elsevier division made $868M net profit. In other words, 27.5 cents of every dollar in subscription payments went directly to Reed Elsevier's shareholders.
- Wiley is smaller but just as lucrative. Their 2010 results show their academic publishing division had revenues of $987M and pre-tax profit of $405M, a gross margin of 41%. The parent company's tax rate is 31%. On the same assumption the net profit is $280M; 28 cents of every dollar of subscription goes directly to Wiley's shareholders (See note at end).
- Springer's 2008 results (the most recent available) are harder to interpret but by my computation the corresponding numbers are revenue $949M, pre-tax profit $361M, gross margin 38%, tax rate 11%, net profit $328M. About 34 cents of their subscription dollar flows to shareholders.
- The American Chemical Society is not for profit, but so lucrative that working chemists are annoyed. It had 2009 revenues of $460M, and paid its executives lavishly, at least compared to the salaries of chemists.
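The arithmetic behind these figures is simple enough to spell out. A minimal sketch (the function names are mine; the uniform-tax-rate assumption is the same simplification used above, checked here against the Wiley numbers):

```python
def net_profit(pre_tax_profit: float, tax_rate: float) -> float:
    """Net profit, assuming the parent company's tax rate applies
    uniformly to this division (a simplifying assumption)."""
    return pre_tax_profit * (1 - tax_rate)

def shareholder_cents_per_dollar(revenue: float, net: float) -> float:
    """Cents of each subscription dollar flowing to shareholders."""
    return 100 * net / revenue

# Wiley's 2010 figures: $987M revenue, $405M pre-tax profit, 31% tax.
wiley_net = net_profit(405, 0.31)            # roughly $280M
share = shareholder_cents_per_dollar(987, wiley_net)  # roughly 28 cents
print(wiley_net, share)
```

The same two-line computation, applied to each publisher's reported revenue, pre-tax profit and tax rate, yields the per-dollar figures quoted in the list above.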
“At the beginning of 2011, researchers in Bangladesh, one of the world’s poorest countries, received a letter announcing that four big publishers would no longer be allowing free access to their 2500 journals through the Health InterNetwork for Access to Research Initiative (HINARI) system. It emerged later that other countries are also affected.”
The world's research and education budgets pay these three companies about $3.2B/yr for management, editorial and distribution services. Over and above that, the world's research and education budgets pay the shareholders of these three companies almost $1.5B/yr for the privilege of reading the results of research (and writing and reviewing) that these budgets already paid for.
The three billion dollars a year might be justified if the big publishers' journals were of higher quality than those of competing not-for-profit publishers, but:
"Surveys of [individual] journal [subscription] pricing ... show that the average price per page charged by commercial publishers is several times higher than that which is charged by professional societies and university presses. These price differences do not reflect differences in quality. If we use citation counts as a measure of journal quality ... we see that the prices charged per citation differ by an even greater margin."
It is hard to justify the one and a half billion dollars a year on any basis. These numbers demonstrate that the three big publishers have effective monopoly power in their market:
"... despite the absence of obvious legal barriers to entry by new competing journals. Bergstrom argues that journals achieve monopoly power as the outcome of a “coordination game” in which the most capable authors and referees are attracted to journals with established reputations. This market power is sustained by copyright law, which restricts competitors from selling “perfect substitutes” for existing journals by publishing exactly the same articles. In contrast, sellers of shoes or houses are not restrained from producing nearly identical copies of their competitors' products."
Publishers' major customers, libraries, are facing massive budget cuts and thus are unlikely to be a major source of additional revenue:
"The Elsevier science and medical business ... saw modest growth reflecting a constrained customer budget environment."
The bundling model of the big publishers means that, in response to these cuts, libraries typically cancel their subscriptions to smaller and not-for-profit publishers, so that tough times increase the market dominance of the big publishers.
The business of academic publishing has been slower to encounter, but is not immune from, the disruption the Internet has wrought on other content industries. The combination of cash-strapped customers, publishers desperate for more revenue, and the Internet's effect of greatly reducing the costs of publishing means that disruption is inevitable. Fighting tooth and nail against this disruption, as the music business did, would be even more counter-productive in this case. The people the publishers would sue are the very people who create and review the content whose monetization the publishers would be defending.
To sum up, the advent of the Internet has greatly reduced the monetary value that can be extracted from academic content. Publishers who have depended on extracting this value face a crisis. The crisis is being delayed only by universities and research funders. They have the power to insist on alternative models for access to the results of research, such as self-archiving, but have in most cases been reluctant to use it.
Software & Infrastructure Developers
A large and active movement is developing tools and network services intended to improve the effectiveness of research communication, and thus the productivity of researchers. These efforts are to a great extent hamstrung by two related problems, access to the flow of research communication, and the formats in which research is communicated.
Both problems can be illustrated by the example of mouse genetics. Researchers in the field need a database allowing them to search for experiments that have been performed on specific genes, and their results. The value of this service is such that it has been developed. However, because the format in which these experiments are reported is the traditional journal paper, this and similar databases are maintained by a whole class of scientists, generally post-Ph.D. biologists funded by the NIH, who curate genetic and genomic information from published papers into the databases. These expensive people are spending time on tasks that could in principle be automated, time they should be spending on research.
Automating this process would require providing software with access to the journal papers, replacing the access the curators get via their institution's journal subscriptions. Unfortunately, these papers are copyrighted, and the copyright is fragmented among a large number of publishers. The developers of such an automated system would have to negotiate individually with each publisher. If even a single publisher refused to permit access, the value of the automation would be greatly reduced, and volunteers would still be needed.
Equally, because the mechanism that enforces compliance with the current system of research communication attaches value only to publications in traditional formats, vast human and machine efforts are required to extract the factual content of a communication from that format. Were researchers to publish their content in formats better adapted to information technology, these costs could be avoided.
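To illustrate what "formats better adapted to information technology" might buy (the field names, gene records and DOIs below are entirely hypothetical placeholders, not real curated data), a structured, machine-readable report turns curation into a trivial query rather than a human reading task:

```python
import json

# Hypothetical machine-readable reports of gene experiments, of the kind
# a curator currently reconstructs by reading a paper's prose.
records_json = """
[
  {"gene": "Pax6", "organism": "mouse", "assay": "knockout",
   "result": "small-eye phenotype", "doi": "10.xxxx/example-1"},
  {"gene": "Shh",  "organism": "mouse", "assay": "knockout",
   "result": "neural tube defects", "doi": "10.xxxx/example-2"}
]
"""

records = json.loads(records_json)

def experiments_for_gene(records, gene):
    """Return all reported experiments on the given gene."""
    return [r for r in records if r["gene"] == gene]

for r in experiments_for_gene(records, "Pax6"):
    print(r["assay"], "->", r["result"])
```

The point is not this particular schema, but that a query over structured records replaces the expensive human curation the text describes, while the DOI field keeps each fact linked to its source publication.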
Generalizing, we can say that improving the current system requires:
- more information to be published,
- in formats more suited to information technology,
- less encumbered with intellectual property restrictions,
- more cheaply,
- with better discovery and aggregation mechanisms,
- better quality metrics,
- better mechanisms for improving quality,
- and sustainably preserved for future scholars.