Thursday, December 5, 2019

Meta: Blog On Hiatus

I'm not going to be able to blog for a short while, probably a couple of weeks.

Tuesday, November 26, 2019

737 MAX: The Case Against Boeing

The title of Alec MacGillis' The Case Against Boeing is misleading. Samya Stumo, one of the victims of the second 737 MAX crash, was the daughter of a niece of Ralph Nader:
They were the first American family to sue Boeing, accusing the company of gross negligence and recklessness.
MacGillis certainly does discuss some of the ways the culture of McDonnell-Douglas led to Boeing's malfeasance, including blaming the pilots:
Boeing seemed to believe that pilot error had caused the crash. In its response to an initial Indonesian government report, it highlighted the contrasting reactions of the crew on the doomed flight and the crew the day before, saying that the pilots on the second day had not followed the standard “runaway trim” procedures.
But that's not really what the article is about. Follow me below the fold as I try to tease out the real story MacGillis tells, and then add more news on the topic.

Tuesday, November 19, 2019

Seeds Or Code?

I'd like to congratulate Microsoft on a truly excellent PR stunt, drawing attention to two important topics about which I've been writing for a long time: the cultural significance of open source software and the need for digital preservation. Ashlee Vance provides the channel to publicize the stunt in Open Source Code Will Survive the Apocalypse in an Arctic Cave. In summary, near Longyearbyen on Spitsbergen is:
the Svalbard Global Seed Vault, where seeds for a wide range of plants, including the crops most valuable to humans, are preserved in case of some famine-inducing pandemic or nuclear apocalypse.
Nearby, in a different worked-out coal mine, is the Arctic World Archive:
The AWA is a joint initiative between Norwegian state-owned mining company Store Norske Spitsbergen Kulkompani (SNSK) and very-long-term digital preservation provider Piql AS. AWA is devoted to archival storage in perpetuity. The film reels will be stored in a steel-walled container inside a sealed chamber within a decommissioned coal mine on the remote archipelago of Svalbard. The AWA already preserves historical and cultural data from Italy, Brazil, Norway, the Vatican, and many others.
GitHub, the recently acquired Microsoft subsidiary, will deposit there:
The 02/02/2020 snapshot archived in the GitHub Arctic Code Vault will sweep up every active public GitHub repository, in addition to significant dormant repos as determined by stars, dependencies, and an advisory panel. The snapshot will consist of the HEAD of the default branch of each repository, minus any binaries larger than 100KB in size. Each repository will be packaged as a single TAR file. For greater data density and integrity, most of the data will be stored QR-encoded. A human-readable index and guide will itemize the location of each repository and explain how to recover the data.
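The packaging step described in that quote can be sketched in a few lines of Python. This is a minimal sketch, not GitHub's actual pipeline: it assumes the HEAD of the default branch is already checked out in a directory, treats a file as binary if it contains a NUL byte in its first 8,000 bytes (a heuristic similar in spirit to Git's), and reads "100KB" as 100 × 1024 bytes. The names `package_repo`, `looks_binary` and `MAX_BINARY_BYTES` are hypothetical, and the QR-encoding step is left out.

```python
import os
import tarfile

# Assumption: "100KB" read as 100 * 1024 bytes.
MAX_BINARY_BYTES = 100 * 1024


def looks_binary(path: str, probe: int = 8000) -> bool:
    """Heuristic (assumed): a NUL byte near the start of a file marks it as binary."""
    with open(path, "rb") as f:
        return b"\0" in f.read(probe)


def package_repo(root: str, out_path: str) -> list:
    """Write the checked-out tree under root into one TAR file.

    Binary files larger than MAX_BINARY_BYTES are left out; their paths
    (relative to root) are returned so an index could record them.
    """
    skipped = []
    with tarfile.open(out_path, "w") as tar:
        for dirpath, dirnames, filenames in os.walk(root):
            # Skip Git's own metadata; only the working tree is archived.
            if ".git" in dirnames:
                dirnames.remove(".git")
            dirnames.sort()  # walk subdirectories in a deterministic order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                rel = os.path.relpath(path, root)
                if os.path.getsize(path) > MAX_BINARY_BYTES and looks_binary(path):
                    skipped.append(rel)
                    continue
                tar.add(path, arcname=rel, recursive=False)
    return skipped
```

Note that large text files are kept; only files that are both large and binary are dropped, which is one plausible reading of "binaries larger than 100KB".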
Follow me below the fold for an explanation of why I call this admirable effort a PR stunt, albeit a well-justified one.

Thursday, November 14, 2019

Auditing The Integrity Of Multiple Replicas

The fundamental problem in the design of the LOCKSS system was to audit the integrity of multiple replicas of content stored in unreliable, mutually untrusting systems without downloading the entire content:
  • Multiple replicas, in our case lots of them, resulted from our way of dealing with the fact that the academic journals the system was designed to preserve were copyright, and the copyright was owned by rich, litigious members of the academic publishing oligopoly. We defused this issue by insisting that each library keep its own copy of the content to which it subscribed.
  • Unreliable, mutually untrusting systems was a consequence. Each library's system had to be as cheap to own, administer and operate as possible, to keep the aggregate cost of the system manageable, and to keep the individual cost to a library below the level that would attract management attention. So neither the hardware nor the system administration would be especially reliable.
  • Without downloading was another consequence, for two reasons. Downloading the content from lots of nodes on every audit would have been both slow and expensive. But worse, it would likely have been a copyright violation, subjecting us to criminal liability under the DMCA.
Our approach, published now more than 16 years ago, was to have each node in the network compare its content with that of the consensus among a randomized subset of the other nodes holding the same content. They did so via a peer-to-peer protocol based on proof-of-work, in some respects one of the many precursors of Satoshi Nakamoto's Bitcoin protocol.
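The core idea of auditing by consensus without downloading can be illustrated with a greatly simplified Python sketch. It is not the LOCKSS protocol: it omits proof-of-work, per-block hashing, the repair mechanism and every defense against malicious peers, and the `peers` dictionary merely simulates replicas that in reality each peer would hash on its own machine. Only a short digest per voter needs to cross the network. The function names and the 70% "landslide" threshold are assumptions chosen for illustration.

```python
import hashlib
import os
import random
from collections import Counter
from typing import Dict, Optional


def vote(nonce: bytes, content: bytes) -> bytes:
    """A voter's answer: SHA-256 over a fresh nonce followed by its replica.

    The nonce is new for every poll and comes first, so a peer cannot reuse
    an old digest or precompute hash state over the content; it has to
    still hold the content to answer.
    """
    return hashlib.sha256(nonce + content).digest()


def poll(my_content: bytes, peers: Dict[str, bytes], sample_size: int,
         landslide: float = 0.7, rng: Optional[random.Random] = None) -> str:
    """Compare my replica against a randomized subset of peers' replicas.

    Returns "ok" on landslide agreement, "repair" on landslide
    disagreement (my copy is probably damaged), otherwise "alarm".
    """
    rng = rng or random.Random()
    voters = rng.sample(sorted(peers), sample_size)  # randomized subset
    nonce = os.urandom(32)
    mine = vote(nonce, my_content)
    # In a real network each vote would be computed remotely by the peer.
    tally = Counter(vote(nonce, peers[v]) == mine for v in voters)
    agree = tally[True] / sample_size
    if agree >= landslide:
        return "ok"
    if 1 - agree >= landslide:
        return "repair"
    return "alarm"
```

The design choice worth noticing is that a node never needs anyone else's copy to learn whether its own copy is sound; it only needs many independent peers to agree with it.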

Lots of replicas are essential to the working of the LOCKSS protocol, but more conventional systems don't have that many, for obvious economic reasons. Back then, integrity audit systems that didn't need an excess of replicas were developed, including work by Mehul Shah et al. and by Jaja and Song. But, primarily because the implicit threat models of most archival systems in production assumed trustworthy infrastructure, these systems were not widely used. Outside the archival space, there wasn't a requirement for them.
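For contrast, here is a minimal Python sketch of one simple way to audit a single replica held by an untrusted store without the auditor keeping a copy: before handing the content over, the auditor precomputes a batch of (nonce, keyed-hash) pairs, then spends one pair per audit. This is an illustrative assumption, not a description of the Shah et al. or Jaja and Song schemes, and the function names are hypothetical. Its obvious limitation is that the number of audits is fixed in advance.

```python
import hashlib
import hmac
import os
from typing import Callable, List, Tuple

Challenge = Tuple[bytes, bytes]  # (nonce, expected response)


def precompute_challenges(content: bytes, count: int) -> List[Challenge]:
    """Run by the auditor while it still has the content; afterwards it
    need keep only these short pairs, not the content itself."""
    pairs = []
    for _ in range(count):
        nonce = os.urandom(32)
        pairs.append((nonce, hmac.new(nonce, content, hashlib.sha256).digest()))
    return pairs


def respond(nonce: bytes, stored: bytes) -> bytes:
    """Run by the storage service over whatever it actually stores."""
    return hmac.new(nonce, stored, hashlib.sha256).digest()


def audit(pairs: List[Challenge], service: Callable[[bytes], bytes]) -> bool:
    """Spend one unused challenge, so no nonce is ever revealed twice."""
    nonce, expected = pairs.pop()
    return hmac.compare_digest(service(nonce), expected)
```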

A decade and a half later, the rise of cloud storage, and its risks, have sparked renewed interest in this problem. Yangfei Lin et al.'s Multiple‐replica integrity auditing schemes for cloud data storage provides a useful review of the current state of the art. Below the fold, a discussion of their work and some related work.
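A problem specific to multiple replicas in the cloud is that a provider paid to keep n copies might store just one and answer every challenge from it. One remedy discussed in this line of work is to make each replica a different byte string that only the data owner can produce. The Python sketch below assumes a secret key held by the owner and masks replica i with an HMAC-SHA256 counter-mode keystream; the names and the masking construction are hypothetical illustrations, not the scheme of any particular paper. In this sketch the owner recomputes the expected answer from its own copy, and a real design would also need efficient block-level auditing.

```python
import hashlib
import hmac
import os


def _mask(key: bytes, replica: int, length: int) -> bytes:
    """Keystream for one replica: HMAC-SHA256(key, replica || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = replica.to_bytes(4, "big") + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def make_replica(key: bytes, content: bytes, replica: int) -> bytes:
    """XOR the content with a per-replica mask, so every replica differs.

    Without the key the provider cannot turn one replica into another,
    so it has to store each one to answer challenges about it.
    """
    mask = _mask(key, replica, len(content))
    return bytes(a ^ b for a, b in zip(content, mask))


def recover(key: bytes, stored: bytes, replica: int) -> bytes:
    """XOR with the same mask undoes it, giving back the original content."""
    return make_replica(key, stored, replica)


def challenge(nonce: bytes, stored: bytes) -> bytes:
    """Provider's response over the bytes it stores for one replica."""
    return hmac.new(nonce, stored, hashlib.sha256).digest()
```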

Tuesday, November 12, 2019

Academic Publishers As Parasites

This is just a quick post to draw attention to From symbiont to parasite: the evolution of for-profit science publishing by UCSF's Peter Walter and Dyche Mullins in Molecular Biology of the Cell. It is a comprehensive overview of the way the oligopoly publishers obtained and maintain their rent-extraction from the academic community:
"Scientific journals still disseminate our work, but in the Internet-connected world of the 21st century, this is no longer their critical function. Journals remain relevant almost entirely because they provide a playing field for scientific and professional competition: to claim credit for a discovery, we publish it in a peer-reviewed journal; to get a job in academia or money to run a lab, we present these published papers to universities and funding agencies. Publishing is so embedded in the practice of science that whoever controls the journals controls access to the entire profession."
My only criticisms are that they lack cynicism about the perks publishers distribute:
  • They pay no attention to the role of librarians, who after all actually "negotiate" with the publishers and sign the checks.
  • They write:
    we work for them for free in producing the work, reviewing it, and serving on their editorial boards
    We have spoken with someone who used to manage top journals for a major publisher. His internal margins were north of 90%, and the single biggest expense was the care and feeding of the editorial board.
And they are insufficiently skeptical of claims as to the value that journals add. See my Journals Considered Harmful from 2013.

Despite these quibbles, you should definitely go read the whole paper.

Thursday, October 31, 2019

Aviation's Groundhog Day

Searching for 40-year old lessons for Boeing in the grounding of the DC-10 by Jon Ostrower is subtitled An eerily similar crash in Chicago 40-years ago holds lessons for Boeing and the 737 Max that reverberate through history. Ostrower writes that it is:
The first in a series on the historical parallels and lessons that unite the groundings of the DC-10 and 737 Max.
I hope he's right about the series, because this first part is a must-read account of the truly disturbing parallels between the dysfunction at McDonnell-Douglas and the FAA that led to the May 25th, 1979 Chicago crash of a DC-10, and the dysfunction at Boeing (whose management is mostly the result of the merger with McDonnell-Douglas) and the FAA that led to the two 737 MAX crashes. Ostrower writes:
The grounding of the DC-10 ignited a debate over system redundancy, crew alerting, requirements for certification, and insufficient oversight and expertise of an under-resourced regulator — all familiar topics that are today at the center of the 737 Max grounding. To revisit the events of 40 years ago is to revisit a safety crisis that, swapping a few specific details, presents striking similarities four decades later, all the way down to the verbiage.
Below the fold, some commentary with links to other reporting.

Thursday, October 24, 2019

Future of Open Access

The Future of OA: A large-scale analysis projecting Open Access publication and readership by Heather Piwowar, Jason Priem and Richard Orr is an important study of the availability and use of Open Access papers:
This study analyses the number of papers available as OA over time. The models includes both OA embargo data and the relative growth rates of different OA types over time, based on the OA status of 70 million journal articles published between 1950 and 2019.

The study also looks at article usage data, analyzing the proportion of views to OA articles vs views to articles which are closed access. Signal processing techniques are used to model how these viewership patterns change over time. Viewership data is based on 2.8 million uses of the Unpaywall browser extension in July 2019.
They conclude:
One interesting realization from the modeling we’ve done is that when the proportion of papers that are OA increases, or when the OA lag decreases, the total number of views increase -- the scholarly literature becomes more heavily viewed and thus more valuable to society.
This clearly demonstrates one part of the value that open access adds. Below the fold, some details and commentary.