Thursday, June 8, 2017

Public Resource Audits Scholarly Literature

I (from personal experience) and others have previously commented on how journals paywall articles based on spurious claims that they own the copyright, even when there is clear evidence that they know these claims are false. This is copyfraud, but:
While falsely claiming copyright is technically a criminal offense under the Act, prosecutions are extremely rare. These circumstances have produced fraud on an untold scale, with millions of works in the public domain deemed copyrighted, and countless dollars paid out every year in licensing fees to make copies that could be made for free.
The clearest case of journal copyfraud is when journals claim copyright on articles authored by US federal employees:
Work by officers and employees of the government as part of their official duties is "a work of the United States government" and, as such, is not entitled to domestic copyright protection under U.S. law. So, inside the US there is no copyright to transfer, and outside the US the copyright is owned by the US government, not by the employee. It is easy to find papers that apparently violate this, such as James Hansen et al.'s Global Temperature Change. It carries the statement "© 2006 by The National Academy of Sciences of the USA" and states Hansen's affiliation as "National Aeronautics and Space Administration Goddard Institute for Space Studies".
Perhaps the most compelling instance is the AMA falsely claiming to own the copyright on United States Health Care Reform: Progress to Date and Next Steps by one Barack Obama.

Now, Carl Malamud tweets:
Public Resource has been conducting an intensive audit of the scholarly literature. We have focused on works of the U.S. government. Our audit has determined that 1,264,429 journal articles authored by federal employees or officers are potentially void of copyright.
They extracted metadata from Sci-Hub and found:
Of the 1,264,429 government journal articles I have metadata for, I am now able to access 1,141,505 files (90.2%) for potential release.
This is already extremely valuable work. But in addition:
2,031,359 of the articles in my possession are dated 1923 or earlier. These 2 categories represent 4.92% of scihub. Additional categories to examine include lapsed copyright registrations, open access that is not, and author-retained copyrights.
It is long past time for action against the rampant copyfraud by academic journals.

Tip of the hat to James R. Jacobs.


Thomas Munro said...

This is magnificent. Bravo to Malamud! His mention of lapsed copyrights is also promising: that is, anything published in the US before 1964 on which copyright was not renewed. This is a vast untapped resource. Surprisingly, renewal was uncommon: even top journals failed to do it. For instance, 'Nature' and 'Science' never renewed their copyrights, the 'Journal of the American Medical Association' didn't before 1960, and the 'New England Journal of Medicine' didn't before 1957. Only a handful of journals renewed before 1940. I would guess that the vast majority of scientific articles published before 1964 are in the public domain in the US. All that remains is for them to be shared.
There is a thorough list of renewals here:

David. said...

At some point the AMA backed off from their copyright claim. Barack Obama's article now carries this:

"Disclaimer: The journal’s copyright notice applies to the distinctive display of this JAMA article, and not the President’s work or words."