Thursday, December 7, 2017

Cliff Lynch's Stewardship in the "Age of Algorithms"

Cliff Lynch has just published a long and very important article at First Monday entitled Stewardship in the "Age of Algorithms". It is a much broader look than my series The Amnesiac Civilization at the issues around providing the future with a memory of today's society.

Cliff accurately describes the practical impossibility of archiving systems such as Facebook that today form the major part of most people's information environment, and asks:
If we abandon the ideas of archiving in the traditional preservation of an artifact sense, it’s helpful to recall the stewardship goal here to guide us: to capture the multiplicity of ways in which a given system behaves over the range of actual or potential users. ... Who are these “users” (and how many of them are there)? How do we characterize them, and how do we characterize system behavior?
Then, with a tip of the hat to Don Waters, he notes that this problem is familiar in other fields:
they are deeply rooted in historical methods of anthropology, sociology, political science, ethnography and related humanistic and social science disciplines that seek to document behaviors that are essentially not captured in artifacts, and indeed to create such documentary artifacts
Unable to archive the system they are observing, these fields try to record and annotate the experience of those encountering the system; to record the performance from the audience's point of view. Cliff notes, and discusses the many problems with, the two possible kinds of audience for "algorithms":
  • Programs, which he calls robotic witnesses, and others call sock puppets. Chief among the problems here is that "algorithms" need robust defenses against programs posing as humans (see, for example, spam, or fake news).
  • Humans, which he calls New Nielsen Families. Chief among the problems here is the detailed knowledge "algorithms" use to personalize their behaviors, leading to a requirement for vast numbers of humans to observe even somewhat representative behavior.
Cliff concludes:
From a stewardship point of view (seeking to preserve a reasonably accurate sense of the present for the future, as I would define it), there’s a largely unaddressed crisis developing as the dominant archival paradigms that have, up to now, dominated stewardship in the digital world become increasingly inadequate. ... the existing models and conceptual frameworks of preserving some kind of “canonical” digital artifacts ... are increasingly inapplicable in a world of pervasive, unique, personalized, non-repeatable performances. As stewards and stewardship organizations, we cannot continue to simply complain about the intractability of the problems or speak idealistically of fundamentally impossible “solutions.”
If we are to successfully cope with the new “Age of Algorithms,” our thinking about a good deal of the digital world must shift from artifacts requiring mediation and curation, to experiences. Specifically, it must focus on making pragmatic sense of an incredibly vast number of unique, personalized performances (including interaction with the participant) that can potentially be recorded or otherwise documented, or at least do the best we can with this.
I agree that society is facing a crisis in its ability to remember the past. Cliff has provided a must-read overview of the context in which the crisis has developed, and some pointers to pragmatic if unsatisfactory ways to address it. What I would like to see is an even broader view, describing this crisis as one among many caused by the way increasing returns to scale are squeezing out the redundancy essential to a resilient civilization.

Tuesday, December 5, 2017

International Digital Preservation Day

The Digital Preservation Coalition's International Digital Preservation Day was marked by a wide-ranging collection of blog posts. Below the fold, some links to, and comments on, a few of them.

Tuesday, November 28, 2017

Intel's "Management Engine"

Back in May, Erica Portnoy and Peter Eckersley, writing for the EFF's Deeplinks blog, summed up the situation in a paragraph:
Since 2008, most of Intel’s chipsets have contained a tiny homunculus computer called the “Management Engine” (ME). The ME is a largely undocumented master controller for your CPU: it works with system firmware during boot and has direct access to system memory, the screen, keyboard, and network. All of the code inside the ME is secret, signed, and tightly controlled by Intel. ... there is presently no way to disable or limit the Management Engine in general. Intel urgently needs to provide one.
Recent events have pulled back the curtain somewhat and revealed that things are worse than we knew in May. Below the fold, some details.

Tuesday, November 21, 2017

Has Web Advertising Jumped The Shark?

The Web runs on advertising. Has Web advertising jumped the shark? The relevant Wikipedia article says:
The usage of "jump the shark" has subsequently broadened beyond television, indicating the moment when a brand, design, franchise, or creative effort's evolution declines, or when it changes notably in style into something unwelcome.
There are four big problems with Web advertising as it currently exists:
  1. Bad guys love it.
  2. Readers hate it.
  3. Webmasters hate it.
  4. Advertisers find it wastes money.
#4 just might have something to do with #3, #2 and #1. It seems that there's a case to be made. Below the fold I try to make it.

Thursday, November 16, 2017

Techno-hype part 2

Don't, don't, don't, don't believe the hype!
Public Enemy

Enough about the hype around self-driving cars; now on to the hype around cryptocurrencies.

Sysadmins like David Gerard tend to have a realistic view of new technologies; after all, they get called at midnight when the technology goes belly-up. Sensible companies pay a lot of attention to their sysadmins' input when it comes to deploying new technologies.

Gerard's Attack of the 50 Foot Blockchain: Bitcoin, Blockchain, Ethereum & Smart Contracts is a must-read, massively sourced corrective to the hype surrounding cryptocurrencies and blockchain technology. Below the fold, some tidbits and commentary. Quotes not preceded by links are from the book, and I have replaced some links to endnotes with direct links.

Tuesday, November 14, 2017

Techno-hype part 1

Don't, don't, don't, don't believe the hype!
Public Enemy

New technologies are routinely over-hyped because people under-estimate the gap between a technology that works and a technology that is in everyday use by normal people.

You have probably figured out that I'm skeptical of the hype surrounding blockchain technology. Despite incident-free years spent routinely driving in company with Waymo's self-driving cars, I'm also skeptical of the self-driving car hype. Below the fold, an explanation.

Monday, November 6, 2017

Keynote at Pacific Neighborhood Consortium

I was invited to deliver a keynote at the 2017 Pacific Neighborhood Consortium in Tainan, Taiwan. My talk, entitled The Amnesiac Civilization, was based on the series of posts earlier this year with the same title. The conference theme was "Data Informed Society", and my abstract was:
What is the data that informs a society? It is easy to think that it is just numbers, timely statistical information of the kind that drives Google Maps real-time traffic display. But the rise of text-mining and machine learning means that we must cast our net much wider. Historic and textual data is equally important. It forms the knowledge base on which civilization operates.

For nearly a thousand years this knowledge base has been stored on paper, an affordable, durable, write-once and somewhat tamper-evident medium. For more than five hundred years it has been practical to print on paper, making Lots Of Copies to Keep Stuff Safe. LOCKSS is the name of the program at the Stanford Libraries that Vicky Reich and I started in 1998. We took a distributed approach; providing libraries with tools they could use to preserve knowledge in the Web world. They could work the way they were used to doing in the paper world, by collecting copies of published works, making them available to readers, and cooperating via inter-library loan. Two years earlier, Brewster Kahle had founded the Internet Archive, taking a centralized approach to the same problem.

Why are these programs needed? What have we learned in the last two decades about their effectiveness? How does the evolution of Web technologies place their future at risk?
Below the fold, the text of my talk.