The fundamental problem in the design of the
LOCKSS system was to audit the integrity of multiple replicas of content stored in unreliable, mutually untrusting systems without downloading the entire content:
- Multiple replicas, in our case lots of them, resulted from our way of dealing with the fact that the academic journals the system was designed to preserve were copyrighted, and the copyrights were owned by rich, litigious members of the academic publishing oligopoly. We defused this issue by insisting that each library keep its own copy of the content to which it subscribed.
- Unreliable, mutually untrusting systems were a consequence. Each library's system had to be as cheap as possible to own, administer and operate, both to keep the aggregate cost of the system manageable and to keep the individual cost to a library below the level that would attract management attention. So neither the hardware nor the system administration would be especially reliable.
- Without downloading was another consequence, for two reasons. Downloading the content from lots of nodes on every audit would be both slow and expensive. Worse, it would likely have been a copyright violation, and would have subjected us to criminal liability under the DMCA. Instead, audits compare hashes of the content, as the sketch after this list illustrates.
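To make "without downloading" concrete, here is a minimal sketch of the challenge-response idea involved (the function names are mine, and this omits the voting and proof-of-work machinery discussed below): the auditor sends a fresh random nonce, and the node being audited returns a hash over the nonce followed by its copy of the content. Only the small hash crosses the network, and the fresh nonce prevents the node from replaying a precomputed answer.

```python
import hashlib
import os

def make_challenge() -> bytes:
    # The auditor picks a fresh random nonce for each audit round.
    return os.urandom(32)

def prove_possession(nonce: bytes, content: bytes) -> bytes:
    # The audited node hashes the nonce followed by its copy of the
    # content. A fresh nonce means it cannot reuse a cached hash; it
    # must actually read the bytes it claims to hold.
    return hashlib.sha256(nonce + content).digest()

def audit(nonce: bytes, response: bytes, reference: bytes) -> bool:
    # Compare the node's response against one computed over a
    # reference copy. LOCKSS has no trusted reference, so it instead
    # compares the responses of many peers against each other.
    return response == prove_possession(nonce, reference)
```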
Our approach, published now more than 16 years ago, was to have each node in the network compare its content with the consensus among a randomized subset of the other nodes holding the same content. The nodes did so via a peer-to-peer protocol based on proof-of-work, in some respects one of the many precursors of Satoshi Nakamoto's Bitcoin protocol.
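The following toy poll, with simplifying assumptions of my own (peers simulated in-process, no proof-of-work effort, no peer reputation or rate limiting, made-up names and thresholds), shows the shape of the protocol: a node challenges a random sample of peers with a fresh nonce, tallies their hashed votes, and compares the consensus against its own copy.

```python
import hashlib
import os
import random
from collections import Counter

def vote(nonce: bytes, replica: bytes) -> str:
    # Each peer hashes the nonce plus its own replica, so a vote
    # cannot be precomputed or copied from another peer's answer.
    return hashlib.sha256(nonce + replica).hexdigest()

def run_poll(my_replica: bytes, peers: list, sample_size: int = 5,
             landslide: float = 0.7) -> str:
    # Challenge a random subset of the peers holding this content.
    nonce = os.urandom(32)
    sample = random.sample(peers, min(sample_size, len(peers)))
    tally = Counter(vote(nonce, replica) for replica in sample)
    consensus, count = tally.most_common(1)[0]
    if count / len(sample) < landslide:
        return "alarm"    # no landslide either way: possible attack
    if vote(nonce, my_replica) == consensus:
        return "agree"    # my copy matches the consensus
    return "repair"       # my copy is damaged: fetch a repair
```

Roughly speaking, in the published protocol the outcomes matter: landslide agreement means the node's copy is good, landslide disagreement triggers a repair, and anything in between raises an alarm, since random storage failures rarely produce close votes.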
Lots of replicas are essential to the working of the LOCKSS protocol, but more normal systems don't have that many, for obvious economic reasons. Integrity audit systems that didn't need so many replicas were developed back then, including work by Mehul Shah et al. and by Jaja and Song. But, primarily because the implicit threat models of most archival systems in production assumed trustworthy infrastructure, these systems were not widely used. Outside the archival space there was no requirement for them.
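To give the flavor of auditing without lots of replicas (this is my own illustration of the general precomputed-challenge idea, not the actual scheme of Shah et al. or Jaja and Song): before outsourcing the content, the owner precomputes a finite list of nonce/hash pairs, and later spends one pair per audit without needing a replica of its own.

```python
import hashlib
import os

def precompute_challenges(content: bytes, rounds: int) -> list:
    # Before handing the content to the storage service, the owner
    # precomputes one (nonce, expected hash) pair per future audit.
    # After this, the owner need not keep a replica of the content.
    challenges = []
    for _ in range(rounds):
        nonce = os.urandom(32)
        challenges.append((nonce, hashlib.sha256(nonce + content).digest()))
    return challenges

def audit_round(challenges: list, respond) -> bool:
    # Spend one unused challenge. `respond` stands in for the
    # service's prover: it takes a nonce, returns a hash over its copy.
    nonce, expected = challenges.pop()
    return respond(nonce) == expected
```

The obvious limitation is that the challenge list is finite; once it is exhausted the owner must retrieve the content to generate more, which is part of what later schemes based on homomorphic tags were designed to avoid.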
A decade and a half later, the rise of, and risks of, cloud storage have sparked renewed interest in this problem. Yangfei Lin et al.'s Multiple-replica integrity auditing schemes for cloud data storage provides a useful review of the current state of the art. Below the fold, a discussion of their work, and of some related work.