Thursday, June 22, 2017

WAC2017: Security Issues for Web Archives

Jack Cushman and Ilya Kreymer's Web Archiving Conference talk Thinking like a hacker: Security Considerations for High-Fidelity Web Archives is very important. They discuss 7 different security threats specific to Web archives:
  1. Archiving local server files
  2. Hacking the headless browser
  3. Stealing user secrets during capture
  4. Cross site scripting to steal archive logins
  5. Live web leakage on playback
  6. Show different page contents when archived
  7. Banner spoofing
Below the fold, a brief summary of each to encourage you to do two things:
  1. First, view the slides.
  2. Second, visit http://warc.games., which is a sandbox with
    a local version of Webrecorder that has not been patched to fix known exploits, and a number of challenges for you learn how they might apply to web archives in general.

Tuesday, June 20, 2017

Analysis of Sci-Hub Downloads

Bastian Greshake has a post at the LSE's Impact of Social Sciences blog based on his F1000Research paper Looking into Pandora's Box. In them he reports on an analysis combining two datasets released by Alexandra Elbakyan:
  • A 2016 dataset of 28M downloads from Sci-Hub between September 2015 and February 2016.
  • A 2017 dataset of 62M DOIs to whose content Sci-Hub claims to be able to provide access.
Below the fold, some extracts and commentary.

Thursday, June 15, 2017

Emulation: Windows10 on ARM

At last December's WinHEC conference, Qualcomm and Microsoft made an announcement to which I should have paid more attention:
Qualcomm ... announced that they are collaborating with Microsoft Corp. to enable Windows 10 on mobile computing devices powered by next-generation Qualcomm® Snapdragon™ processors, enabling mobile, power efficient, always-connected cellular PC devices. Supporting full compatibility with the Windows 10 ecosystem, the Snapdragon processor is designed to enable Windows hardware developers to create next generation device form factors, providing mobility to cloud computing.
The part I didn't think about was:
New Windows 10 PCs powered by Snapdragon can be designed to support x86 Win32 and universal Windows apps, including Adobe Photoshop, Microsoft Office and Windows 10 gaming titles.
How do they do that? The answer is obvious: emulation! Below the fold, some thoughts.

Tuesday, June 13, 2017

Crowd-sourced Peer Review

At Ars Technica, Chris Lee's Journal tries crowdsourcing peer reviews, sees excellent results takes off from a column at Nature by a journal editor, Benjamin List, entitled Crowd-based peer review can be good and fast. List and his assistant Denis Höfler have come up with a pre-publication peer-review process that, while retaining what they see as its advantages, has some of the attributes of post-publication review as practiced, for example, by Faculty of 1000. See also here. Below the fold, some commentary.

Thursday, June 8, 2017

Public Resource Audits Scholarly Literature

I (from personal experience), and others, have commented previously on the way journals paywall articles based on spurious claims that they own the copyright, even when there is clear evidence that they know that these claims are false. This is copyfraud, but:
While falsely claiming copyright is technically a criminal offense under the Act, prosecutions are extremely rare. These circumstances have produced fraud on an untold scale, with millions of works in the public domain deemed copyrighted, and countless dollars paid out every year in licensing fees to make copies that could be made for free.
The clearest case of journal copyfraud is when journals claim copyright on articles authored by US federal employees:
Work by officers and employees of the government as part of their official duties is "a work of the United States government" and, as such, is not entitled to domestic copyright protection under U.S. law. So, inside the US there is no copyright to transfer, and outside the US the copyright is owned by the US government, not by the employee. It is easy to find papers that apparently violate this, such as James Hansen et al's Global Temperature Change. It carries the statement "© 2006 by The National Academy of Sciences of the USA" and states Hansen's affiliation as "National Aeronautics and Space Administration Goddard Institute for Space Studies".
Perhaps the most compelling instance is the AMA falsely claiming to own the copyright on United States Health Care Reform: Progress to Date and Next Steps by one Barack Obama.

Now, Carl Malamud tweets:
Public Resource has been conducting an intensive audit of the scholarly literature. We have focused on works of the U.S. government. Our audit has determined that 1,264,429 journal articles authored by federal employees or officers are potentially void of copyright.
They extracted metadata from Sci-Hub and found:
Of the 1,264,429 government journal articles I have metadata for, I am now able to access 1,141,505 files (90.2%) for potential release.
This is already extremely valuable work. But in addition:
2,031,359 of the articles in my possession are dated 1923 or earlier. These 2 categories represent 4.92% of scihub. Additional categories to examine include lapsed copyright registrations, open access that is not, and author-retained copyrights.
It is long past time for action against the rampant copyfraud by academic journals.

Tip of the hat to James R. Jacobs.