Thursday, October 31, 2019

Aviation's Groundhog Day

Searching for 40-year old lessons for Boeing in the grounding of the DC-10 by Jon Ostrower is subtitled An eerily similar crash in Chicago 40 years ago holds lessons for Boeing and the 737 Max that reverberate through history. Ostrower writes that it is:
The first in a series on the historical parallels and lessons that unite the groundings of the DC-10 and 737 Max.
I hope he's right about the series, because this first part is a must-read account of the truly disturbing parallels between the dysfunction at McDonnell Douglas and the FAA that led to the May 25th, 1979 Chicago crash of a DC-10, and the dysfunction at Boeing (whose management is mostly the result of the merger with McDonnell Douglas) and the FAA that led to the two 737 MAX crashes. Ostrower writes:
The grounding of the DC-10 ignited a debate over system redundancy, crew alerting, requirements for certification, and insufficient oversight and expertise of an under-resourced regulator — all familiar topics that are today at the center of the 737 Max grounding. To revisit the events of 40 years ago is to revisit a safety crisis that, swapping a few specific details, presents striking similarities four decades later, all the way down to the verbiage.
Below the fold, some commentary with links to other reporting.

Thursday, October 24, 2019

Future of Open Access

The Future of OA: A large-scale analysis projecting Open Access publication and readership by Heather Piwowar, Jason Priem and Richard Orr is an important study of the availability and use of Open Access papers:
This study analyses the number of papers available as OA over time. The models include both OA embargo data and the relative growth rates of different OA types over time, based on the OA status of 70 million journal articles published between 1950 and 2019.

The study also looks at article usage data, analyzing the proportion of views to OA articles vs views to articles which are closed access. Signal processing techniques are used to model how these viewership patterns change over time. Viewership data is based on 2.8 million uses of the Unpaywall browser extension in July 2019.
They conclude:
One interesting realization from the modeling we’ve done is that when the proportion of papers that are OA increases, or when the OA lag decreases, the total number of views increase -- the scholarly literature becomes more heavily viewed and thus more valuable to society.
This clearly demonstrates one part of the value that open access adds. Below the fold, some details and commentary.

Tuesday, October 22, 2019

MementoMap

Since at least 2011 I've been writing about how important Memento is for Web archiving, and how its success depends upon the effectiveness of Memento Aggregators:
In a recent post I described how Memento allows readers to access preserved web content, and how, just as accessing current Web content frequently requires the Web-wide indexes from keywords to URLs maintained by search engines such as Google, access to preserved content will require Web-wide indexes from original URL plus time of collection to preserved URL. These will be maintained by search-engine-like services that Memento calls Aggregators.
Memento Aggregators turned out to be both useful and a hard engineering problem. Below the fold, a discussion of MementoMap Framework for Flexible and Adaptive Web Archive Profiling by Sawood Alam et al. from Old Dominion University and Arquivo.pt, which both reviews the history of finding out how hard it is, and reports on fairly encouraging progress in attacking it.
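The Aggregator's core job, mapping an original URL plus a requested time to the closest preserved copy, can be sketched as follows. This is an illustrative model under my own assumptions, not a real Aggregator API; the `closestMemento` name and the index structure are hypothetical.

```javascript
// Hypothetical sketch of an Aggregator's index: a map from original
// URL to a list of captures, each a { datetime, memento } pair.
// Given a requested time, return the capture closest to it, which is
// roughly what a Memento TimeGate does for a single archive.
function closestMemento(index, originalUrl, when) {
  const captures = index[originalUrl] || [];
  let best = null;
  let bestDiff = Infinity;
  for (const capture of captures) {
    // Date subtraction yields milliseconds; pick the smallest gap.
    const diff = Math.abs(capture.datetime - when);
    if (diff < bestDiff) {
      best = capture;
      bestDiff = diff;
    }
  }
  return best; // null when no archive holds a copy of the URL
}
```

The hard engineering problem is that no single service holds this index: an Aggregator must federate TimeMaps from many archives, and profiling which archives are even worth querying for a given URL is the question MementoMap addresses.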

Thursday, October 17, 2019

Be Careful What You Measure

"Be careful what you measure, because that's what you'll get" is a management platitude dating back at least to V. F. Ridgway's 1956 Dysfunctional Consequences of Performance Measurements:
Quantitative measures of performance are tools, and are undoubtedly useful. But research indicates that indiscriminate use and undue confidence and reliance in them result from insufficient knowledge of the full effects and consequences. ... It seems worth while to review the current scattered knowledge of the dysfunctional consequences resulting from the imposition of a system of performance measurements.
Back in 2013 I wrote Journals Considered Harmful, based on Deep Impact: Unintended consequences of journal rank by Björn Brembs and Marcus Munafò, which documented that the use of Impact Factor to rank journals had caused publishers to game the system, with negative impacts on the integrity of scientific research. Below the fold I look at a recent study showing similar negative impacts on research integrity.

Tuesday, October 15, 2019

Nanopore Technology For DNA Storage

DNA assembly for nanopore data storage readout by Randolph Lopez et al. from the UW/Microsoft team continues their steady progress in developing technologies for data storage in DNA.

Below the fold, some details and a little discussion.

Thursday, October 10, 2019

Real-Time Gross Settlement

Cryptocurrency advocates appear to believe that the magic of cryptography reduces the need for trust to zero, but they're wrong. Follow me below the fold for an example that shows this.

Tuesday, October 8, 2019

The Data Isn't Yours (updated)

Most discussions of Internet privacy, for example Jaron Lanier Fixes the Internet, systematically elide the distinction between "my data" and "data about me". In doing so they exaggerate the value of "my data".

The typical interaction that generates data about an Internet user involves two parties, a client and a server. Both parties know what happened (a link was clicked, a purchase was made, ...). This isn't "my data"; it is data shared between the client ("me") and the server. The difference is that the server can aggregate the data from many interactions and, by doing so, create something sufficiently valuable that others will pay for it. The client ("my data") cannot.

Below the fold, an update.

Thursday, October 3, 2019

Guest post: Ilya Kreymer's Client-Side Replay Technology

Ilya Kreymer gave a brief description of his recent development of client-side replay for WARC-based Web archives in this comment on my post Michael Nelson's CNI Keynote: Part 3. It uses Service Workers, which Matt Gaunt describes in Google's Web Fundamentals thus:
A service worker is a script that your browser runs in the background, separate from a web page, opening the door to features that don't need a web page or user interaction. Today, they already include features like push notifications and background sync. In the future, service workers might support other things like periodic sync or geofencing. The core feature discussed in this tutorial is the ability to intercept and handle network requests, including programmatically managing a cache of responses.
Client-side replay was clearly an important advance, so I asked him for a guest post with the details. Below the fold, here it is.
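To make the quoted description concrete, here is a minimal sketch of the kind of cache-first interception a replay service worker might perform. This is my own illustration, not Kreymer's actual implementation; the `cacheFirst` helper and its arguments are hypothetical.

```javascript
// Hypothetical sketch: a cache-first strategy of the sort a service
// worker could apply when intercepting network requests. Written as a
// plain async function so the logic is visible: answer from the cache
// when possible, otherwise fall back to the supplied fetch function
// and remember the response for next time.
async function cacheFirst(cache, url, fetchFn) {
  if (cache.has(url)) {
    return cache.get(url); // replay a previously stored response
  }
  const response = await fetchFn(url); // cache miss: go to the network
  cache.set(url, response);
  return response;
}

// Inside a real service worker this would hook the fetch event, e.g.:
//   self.addEventListener("fetch", (event) => {
//     event.respondWith(cacheFirst(replayCache, event.request.url, fetch));
//   });
```

In the client-side replay setting, the "cache" would presumably be populated from records parsed out of a WARC file rather than from the live web, so every intercepted request is answered from the archive inside the browser itself.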