Wednesday, April 12, 2017

Identifiers: A Double-Edged Sword

This is the last of my posts from CNI's Spring 2017 Membership Meeting. Predecessors are Researcher Privacy, Research Access for the 21st Century, and The Orphans of Scholarship.

Geoff Bilder's Open Persistent Identifier Infrastructures: The Key to Scaling Mandate Auditing and Assessment Exercises was ostensibly a report on the need for and progress in bringing together the many disparate identifier systems for organizations in order to facilitate auditing and assessment processes. It was actually an insightful rant about how these processes were corrupting the research ecosystem. Below the fold, I summarize Geoff's argument (I hope Geoff will correct me if I misrepresent him) and rant back.

The non-rant part of Geoff's talk started from the premise that researchers and their institutions are increasingly subject by funders and governments to assessments, such as the UK's Research Excellence Framework, and mandates, such as the Wellcome Trust's open access mandate. Compliance with the mandates has been generally poor.

Assessing how poor, and assessing the excellence of research, both require an ample supply of high-quality metadata, which in principle Crossref is in a good position to supply. To assess research productivity, three main types of identifier are needed: content, contributor, and organization. Geoff used a three-legged stool image to illustrate this.

The rant part was not about what identifiers are, but about what they are used for. It took off from Geoff's question as to whether the audience thought that the pressure-cooker of all these assessments was likely to lead to greater creativity.

I have a great counter-example. The physicist G. I. Taylor (my great-uncle) started in 1909 with the classic experiment that showed interference fringes were still observed at such low intensity that only a single photon at a time was in flight. The following year, at age 23, he was elected a Fellow of Trinity College, and apart from a few years teaching, he was able to pursue research undisturbed by any assessment for the next six decades. Despite this absence of pressure, he was one of the 20th century's most productive scientists, with four huge volumes of collected papers over a 60-year career.

Papers/year (linear)
Since the assessments are all based on counting the number of peer-reviewed publications meeting certain criteria, one result has been gradually accelerating exponential growth in the number of peer-reviewed publications. But it is clear that more is not better, as I argued in More Is Not Better, where I wrote:
The Economist's Incentive Malus, ... is based on The natural selection of bad science by Paul E. Smaldino and Richard McElreath, which starts:
Poor research design and data analysis encourage false-positive findings. Such poor methods persist despite perennial calls for improvement, suggesting that they result from something more than just misunderstanding. The persistence of poor methods results partly from incentives that favour them, leading to the natural selection of bad science. This dynamic requires no conscious strategizing—no deliberate cheating nor loafing—by scientists, only that publication is a principal factor for career advancement.
The Economist reports that Smaldino and McElreath's conclusion is bleak:
that when the ability to publish copiously in journals determines a lab’s success, then “top-performing laboratories will always be those who are able to cut corners”—and that is regardless of the supposedly corrective process of replication.
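The dynamic Smaldino and McElreath describe can be illustrated with a toy simulation. This is not their actual model, just a minimal sketch under assumed parameters: labs differ only in "rigor", lower rigor buys more papers per year, and each generation the least-published lab adopts the methods of the most-published one. No lab deliberately cheats, yet mean rigor drifts downward:

```python
import random

# Toy sketch (not Smaldino & McElreath's actual model). Assumptions:
# rigor is in [0, 1], and cutting corners (low rigor) raises output.

random.seed(42)

def papers_per_year(rigor):
    # Assumed trade-off: higher rigor means fewer papers, plus noise.
    return random.gauss(10 * (1.5 - rigor), 1)

def generation(rigors):
    ranked = sorted(((papers_per_year(r), r) for r in rigors), reverse=True)
    best_rigor = ranked[0][1]               # methods of the top publisher...
    rigors = rigors.copy()
    rigors[rigors.index(ranked[-1][1])] = best_rigor  # ...copied by the bottom lab
    return rigors

labs = [random.uniform(0.2, 1.0) for _ in range(50)]
start = sum(labs) / len(labs)
for _ in range(200):
    labs = generation(labs)
print(f"mean rigor: {start:.2f} -> {sum(labs) / len(labs):.2f}")
```

Selection acts only on publication counts, yet because output is anti-correlated with rigor, rigor is what gets selected against; replication never enters the loop, matching the quoted conclusion.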
Papers/year (log-linear)
Only two things have interrupted this explosion of publishing: wars and depressions. Geoff and I are both concerned that recent political events in several of the leading research countries will lead to significant cuts in public funding for research, and thus increase the pressure in the cooker. Research suggests that this will lead to higher retraction rates and more misconduct, further eroding the waning credibility of science. As Arthur Caplan (of the Division of Medical Ethics at NYU's Langone Medical Center) put it:
The time for a serious, sustained international effort to halt publication pollution is now. Otherwise scientists and physicians will not have to argue about any issue—no one will believe them anyway.
(see also John Michael Greer).

Post-PhD science career tracks
In 2010 the Royal Society issued a report on the state of research containing much valuable information. Alas, it is more relevant today than it was then, because the trends it identified have continued unabated. Geoff took from this report a graphic that illustrates how insanely competitive academia is as a career path. It shows that over half the newly minted Ph.D.s leave science immediately. Only one in five make research a career, and fewer than one in two hundred make professor. Geoff is concerned that Ph.D. programs are too focused on the one and not enough on the other one hundred and ninety-nine, and I agree. My friend Doug Kalish has built a business in retirement addressing this issue.

My Ph.D. was in Mechanical Engineering, so I'm not a scientist in the sense the Royal Society uses. I was a post-doc (at Edinburgh University) and then research staff (at Carnegie-Mellon University) before moving to industry (initially at Sun Microsystems) and eventually back to academia (at Stanford). I've published quite a bit both from academia and from industry but I was never in the publish or perish rat-race. I was always assessed on the usefulness of the stuff I built; the pressures in engineering are different.

Research funding flows
My take on the graph above is a bit different from Geoff's. I start from another graphic from the Royal Society report, showing the flow of funds in UK research and development, which includes much engineering (or its equivalent in the life sciences). Note that "private and industrial research" consumes nearly twice the funding of "university" and "public research institutions" combined. So one would expect about 2/3 of the Ph.D.s to be outside the universities and public research institutions. The split in the graphic Geoff used is 4/5, but one would expect that including engineering would lead to more retention of Ph.D.s, because it is easier to fund engineering research in universities than science: it has more immediate industrial application.

Besides, Ph.D.s leaving academia for industry is a good thing. Most of the "engineers" I worked with at my three successful Silicon Valley startups had Ph.D.s in physics, mathematics and computer science, not engineering; my Mech. Eng. Ph.D. was an outlier. Silicon Valley would not exist but for Ph.D.s leaving research to create products in industry.

1 comment:

David. said...

Rebecca Hill at The Register notes that the Wellcome Trust has updated its open access policy to include, among other things, software:

"The existing data management and sharing policy, introduced in 2007, requires that grant holders make data available "in a timely and responsible manner, with as few restrictions as possible".

The new policy extends this to original software and research materials such as cell lines, reagents and antibodies."

This is important. Much of the time the data isn't really useful without the software. And it will give emulation increased importance.