Tuesday, July 26, 2016

The Citation Graph

An important point raised during the discussions at the recent JISC-CNI meeting is also raised by Larivière et al's A simple proposal for the publication of journal citation distributions:
However, the raw citation data used here are not publicly available but remain the property of Thomson Reuters. A logical step to facilitate scrutiny by independent researchers would therefore be for publishers to make the reference lists of their articles publicly available. Most publishers already provide these lists as part of the metadata they submit to the Crossref metadata database and can easily permit Crossref to make them public, though relatively few have opted to do so. If all Publisher and Society members of Crossref (over 5,300 organisations) were to grant this permission, it would enable more open research into citations in particular and into scholarly communication in general.
In other words, despite the importance of the citation graph for understanding and measuring the output of science, the data are in private hands, and are analyzed by opaque algorithms to produce a metric (journal impact factor) that is easily gamed and is corrupting the entire research ecosystem.

Simply by asking Crossref to flip a bit, publishers that already deposit their reference lists there can make them public, but only a few have done so.
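To illustrate what public reference lists would make possible, here is a minimal sketch that turns work records into citation-graph edges. It assumes each record is a dict with a "DOI" key and an optional "reference" list whose entries may carry their own "DOI", roughly the shape of the work metadata Crossref serves. The function name and the sample DOIs are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def citation_edges(works: Iterable[dict]) -> Iterator[Tuple[str, str]]:
    """Yield (citing DOI, cited DOI) pairs from work records.

    Assumption: a record has a "DOI" key and, when its publisher has made
    the reference list public, a "reference" list of dicts. References
    that carry no DOI cannot be matched and are skipped.
    """
    for work in works:
        citing = work.get("DOI")
        if not citing:
            continue
        for ref in work.get("reference", []):
            cited = ref.get("DOI")
            if cited:
                # DOIs are case-insensitive, so normalize before comparing.
                yield citing.lower(), cited.lower()


# Hypothetical records: the second publisher has not opened its references.
sample_works = [
    {"DOI": "10.5555/A1", "reference": [{"DOI": "10.5555/b2"}, {"unstructured": "no DOI"}]},
    {"DOI": "10.5555/b2"},
]
edges = list(citation_edges(sample_works))
```

With the sample above, only one edge can be recovered; the article whose publisher keeps its references closed contributes nothing to the graph.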

Larivière et al's painstaking research shows that journal publishers and others with access to these private databases (Web of Science and Scopus) can use them to graph the distribution of citations to the articles they publish. Doing so reveals that:
the shape of the distribution is highly skewed to the left, being dominated by papers with lower numbers of citations. Typically, 65-75% of the articles have fewer citations than indicated by the JIF. The distributions are also characterized by long rightward tails; for the set of journals analyzed here, only 15-25% of the articles account for 50% of the citations
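The two statistics in that quote are easy to compute once per-article citation counts are available. The following is a minimal sketch, not the authors' method: it uses the plain mean of the counts as a stand-in for the JIF (the real JIF has its own citation window and counting rules), and the sample counts are invented for illustration.

```python
from typing import Sequence


def citation_skew_summary(citations: Sequence[int]) -> dict:
    """Summarize how skewed a set of per-article citation counts is.

    Assumption: the mean of the counts stands in for the JIF.
    """
    if not citations:
        raise ValueError("need at least one article")
    n = len(citations)
    total = sum(citations)
    mean = total / n

    # Share of articles cited less often than the journal-level average.
    share_below_mean = sum(1 for c in citations if c < mean) / n

    # Smallest number of top-cited articles covering half of all citations.
    running = 0
    top_needed = 0
    for c in sorted(citations, reverse=True):
        if running * 2 >= total:
            break
        running += c
        top_needed += 1

    return {
        "mean": mean,
        "share_below_mean": share_below_mean,
        "share_of_articles_for_half": top_needed / n,
    }


# Invented counts for ten articles, one of them very highly cited.
sample = [0, 0, 1, 1, 2, 2, 3, 4, 7, 30]
summary = citation_skew_summary(sample)
```

For these invented counts the mean is 5.0, yet 80% of the articles fall below it and a single article (10% of the total) supplies more than half of the citations, which is the pattern the quote describes.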
Thus, as has been shown many times before, the impact factor of a journal conveys no useful information about the quality of any individual paper it contains. Further, the data on which it is based are themselves suspect:
On a technical point, the many unmatched citations ... that were discovered in the data for eLife, Nature Communications, Proceedings of the Royal Society: Biology Sciences and Scientific Reports raises concerns about the general quality of the data provided by Thomson Reuters. Searches for citations to eLife papers, for example, have revealed that the data in the Web of Science™ are incomplete owing to technical problems that Thomson Reuters is currently working to resolve. ...
Because the citation graph data are not public, audits such as Larivière et al's are difficult and rare. Were the data to be public, both publishers and authors would be able to, and motivated to, improve them. It is perhaps a straw in the wind that Larivière's co-authors include senior figures from PLoS, AAAS, eLife, EMBO, Nature and the Royal Society.
