Thursday, December 28, 2017

Why Decentralize?

In Blockchain: Hype or Hope? (paywalled until June '18) Radia Perlman asks what exactly you get in return for the decentralization that blockchain technologies buy at such an enormous resource cost. Her answer is:
a ledger agreed upon by consensus of thousands of anonymous entities, none of which can be held responsible or be shut down by some malevolent government ... [but] most applications would not require or even want this property.
Two important essays published last February by pioneers in the field provide different answers to Perlman's question:
Below the fold I try to apply our experience with the decentralized LOCKSS technology to ask whether their arguments hold up. I'm working on a follow-up post based on Chelsea Barabas, Neha Narula and Ethan Zuckerman's Defending Internet Freedom through Decentralization from last August, which asks the question specifically about the decentralized Web and thus the idea of decentralized storage.

Tuesday, December 26, 2017

Updating Flash vs. Hard Disk

Chris Mellor at The Register has a useful update on the evolution of the storage market based on analysis from Aaron Rakers. Below the fold, I have some comments on it. In order to understand them you will need to have read my post The Medium-Term Prospects for Long-Term Storage Systems from a year ago.

Thursday, December 21, 2017

Science Friday's "File Not Found"

Science Friday's Lauren Young has a three-part series on digital preservation:
  1. Ghosts In The Reels is about magnetic tape.
  2. The Librarians Saving The Internet is about Web archiving.
  3. Data Reawakening is about the search for a quasi-immortal medium.
Clearly, increasing public attention to the problem of preserving digital information is a good thing, but I have reservations about these posts. Below the fold, I lay them out.

Tuesday, December 19, 2017

Bad Identifiers

This post on persistent identifiers (PIDs) has been sitting in my queue in note form for far too long. Its re-animation was sparked by an excellent post at PLOS Biologue by Julie McMurry, Lilly Winfree and Melissa Haendel entitled Bad Identifiers are the Potholes of the Information Superhighway: Take-Home Lessons for Researchers, which draws attention to a paper, Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data, of which they are three of the many authors. In addition, there were two papers at this year's iPRES on the topic;
Below the fold, some thoughts on PIDs.

Thursday, December 7, 2017

Cliff Lynch's Stewardship in the "Age of Algorithms"

Cliff Lynch has just published a long and very important article at First Monday entitled Stewardship in the "Age of Algorithms". It is a much broader look than my series The Amnesiac Civilization at the issues around providing the future with a memory of today's society.

Cliff accurately describes the practical impossibility of archiving systems such as Facebook, which today form the major part of most people's information environment, and asks:
If we abandon the ideas of archiving in the traditional preservation of an artifact sense, it’s helpful to recall the stewardship goal here to guide us: to capture the multiplicity of ways in which a given system behaves over the range of actual or potential users. ... Who are these “users” (and how many of them are there)? How do we characterize them, and how do we characterize system behavior?
Then, with a tip of the hat to Don Waters, he notes that this problem is familiar in other fields:
they are deeply rooted in historical methods of anthropology, sociology, political science, ethnography and related humanistic and social science disciplines that seek to document behaviors that are essentially not captured in artifacts, and indeed to create such documentary artifacts
Unable to archive the system they are observing, these fields try to record and annotate the experience of those encountering the system; to record the performance from the audience's point of view. Cliff notes, and discusses the many problems with, the two possible kinds of audience for "algorithms":
  • Programs, which he calls robotic witnesses, and others call sock puppets. Chief among the problems here is that "algorithms" need robust defenses against programs posing as humans (see, for example, spam, or fake news).
  • Humans, which he calls New Nielsen Families. Chief among the problems here is the detailed knowledge "algorithms" use to personalize their behaviors, which means that vast numbers of humans would be needed to observe even somewhat representative behavior.
Cliff concludes:
From a stewardship point of view (seeking to preserve a reasonably accurate sense of the present for the future, as I would define it), there’s a largely unaddressed crisis developing as the dominant archival paradigms that have, up to now, dominated stewardship in the digital world become increasingly inadequate. ... the existing models and conceptual frameworks of preserving some kind of “canonical” digital artifacts ... are increasingly inapplicable in a world of pervasive, unique, personalized, non-repeatable performances. As stewards and stewardship organizations, we cannot continue to simply complain about the intractability of the problems or speak idealistically of fundamentally impossible “solutions.”
If we are to successfully cope with the new “Age of Algorithms,” our thinking about a good deal of the digital world must shift from artifacts requiring mediation and curation, to experiences. Specifically, it must focus on making pragmatic sense of an incredibly vast number of unique, personalized performances (including interaction with the participant) that can potentially be recorded or otherwise documented, or at least do the best we can with this.
I agree that society is facing a crisis in its ability to remember the past. Cliff has provided a must-read overview of the context in which the crisis has developed, and some pointers to pragmatic if unsatisfactory ways to address it. What I would like to see is an even broader view, describing this crisis as one among many caused by the way increasing returns to scale are squeezing out the redundancy essential to a resilient civilization.

Tuesday, December 5, 2017

International Digital Preservation Day

The Digital Preservation Coalition's International Digital Preservation Day was marked by a wide-ranging collection of blog posts. Below the fold, some links to, and comments on, a few of them.