Thursday, March 15, 2018

Ethics and Archiving the Web

I wanted to draw attention to what looks like a very interesting conference, Rhizome's National Forum on Ethics and Archiving the Web, March 22-24 at the New Museum in New York:
The dramatic rise in the public’s use of the web and social media to document events presents tremendous opportunities to transform the practice of social memory.

Web archives can serve as witness to crimes, corruption, and abuse; they are powerful advocacy tools; they support community memory around moments of political change, cultural expression, or tragedy. At the same time, they can cause harm and facilitate surveillance and oppression.

As new kinds of archives emerge, there is a pressing need for dialogue about the ethical risks and opportunities that they present to both those documenting and those documented. This conversation becomes particularly important as new tools, such as Rhizome’s Webrecorder software, are developed to meet the changing needs of the web archiving field.

Tuesday, March 13, 2018

The "Grand Challenges" of Curation and Preservation

I'm preparing for a meeting next week at the MIT Library on the "Grand Challenges" of digital curation and preservation. MIT, and in particular their library and press, have a commendable tradition of openness, so I've decided to post my input rather than submit it privately. My version of the challenges is below the fold.

Tuesday, March 6, 2018

Techno-hype part 2.5

Last November I wrote Techno-hype part 2 on cryptocurrencies and blockchains, reviewing David Gerard's excellent book Attack of the 50 Foot Blockchain: Bitcoin, Blockchain, Ethereum & Smart Contracts. A lot has happened since, so it's time for an update. Below the fold, I look at three examples of how far these technologies are from being "ready for prime time":
  • The Lightning Network, which is supposed to allow Bitcoin to scale to billions of transactions.
  • IOTA, which is supposed to be a blockchain capable of supporting the Internet of Things.
  • Ethereum, which is supposed to be the infrastructure for "smart contracts".

Thursday, March 1, 2018

Archival Media: Not a Good Business

Thinking more about DNA's Niche in the Storage Market led me to focus on some problems with the market for archival media in general, not just DNA. The details are below the fold.

Tuesday, February 27, 2018

"Nobody cared about security"

There's a common meme that ascribes the parlous state of security on the Internet to the fact that in the ARPAnet days "nobody cared about security". It is true that in the early days of the ARPAnet security wasn't an important issue; everybody involved knew everybody else face-to-face. But it isn't true that the decisions taken in those early days hampered the deployment of security as the Internet took the shape we know today in the late 80s and early 90s. In fact the design decisions taken in the ARPAnet days made the deployment of security easier. The main reason for today's security nightmares is quite different.

I know because I was there, and to a small extent involved. Follow me below the fold for the explanation.

Thursday, February 22, 2018

Brief Talk at Video Game Preservation Workshop

I was asked to give a brief talk to the Video Game Preservation Workshop: Setting the Stage for Multi-Partner Projects at the Stanford Library, discussing the technical and legal aspects of cooperation on preserving software via emulation. Below the fold is an edited text of the talk with links to the sources.

Tuesday, February 20, 2018

Notes from FAST18

I attended the technical sessions of Usenix's File and Storage Technologies (FAST) conference this week. Below the fold, notes on the papers that caught my attention.

Thursday, February 15, 2018

Do You Need A Blockchain?

David Gerard's Do you need a Blockchain? Probably less than Wüst and Gervais think you do reviews an interesting paper, Do you need a Blockchain? by Karl Wüst and Arthur Gervais of ETH Zurich. Their abstract says:
In this article we critically analyze whether a blockchain is indeed the appropriate technical solution for a particular application scenario. We differentiate between permissionless (e.g., Bitcoin/Ethereum) and permissioned (e.g. Hyperledger/Corda) blockchains and contrast their properties to those of a centrally managed database.
Gerard is, for him, pretty enthusiastic about the paper:
This paper is worth your time. They explain the jargon at length, and discuss many commonly-advocated blockchain use cases — it’s a useful survey of the area — even as the authors are huge Bitcoin and blockchain advocates, and somewhat more optimistic for applying blockchains than is really warranted.
Below the fold, I look at both the paper and Gerard's review.

Wednesday, February 14, 2018

Tuesday, February 13, 2018

Correlated Cryptojacking

On February 11 at least 4,275 Web sites were found to have been simultaneously cryptojacked:
they include The City University of New York, Uncle Sam's court information portal, Lund University, the UK's Student Loans Company, privacy watchdog The Information Commissioner's Office and the Financial Ombudsman Service, plus a shedload of other sites, UK NHS services, and other organizations across the globe. The list goes on.
They were all running Coinhive's Monero miner in visitors' browsers. How and why did this happen and what should these sites have been doing to prevent it? Follow me below the fold.

Monday, February 12, 2018

Lessons From

I'd like to draw your attention to Daniel Gomes' excellent video entitled Improving the robustness of the web archive. The Portuguese Web Archive got started in 2007, and in 2010 became one of the early archives to support full-text search. In 2013 it suffered a hardware malfunction that took the service down and lost 17% of its content. This led to a complete re-think of the system architecture, implementation, and operations. Daniel describes this process and the encouraging results in detail. It is well worth the 20 minutes to watch it.

Daniel divides the re-think into 5 major sections:
  1. Hardware and software architecture shifted to shared-nothing
  2. Reinforced replication policies
  3. Monitor the service
  4. Quality assurance for software development
  5. Document and test procedures
I'd agree with all these points. Many of the details correspond to things the LOCKSS Program focused on during preparation for the TRAC audit of the CLOCKSS Archive in 2014. This is especially the case for the last of Daniel's sections; the audit forced us to document our processes, which forced us to think about whether they were actually achieving their goals, which led to the discovery that in a number of cases they weren't.

Thursday, February 8, 2018

Meta: Blog Switched To HTTPS (Updated)

Because From July, Chrome will name and shame insecure HTTP websites, I followed the instructions Hamad Ansari provides in Blogger Released Free SSL (HTTPS) For Custom Domains and enabled both "connections over HTTPS" and "HTTPS redirect", so that requests for the blog's HTTP address get redirected to the HTTPS one.
Everything I've tried so far works. Please comment on this post if you find things that don't work.

Update: Scott Helme points out that I'm just part of an encouraging trend. The graph shows the top million sites from Alexa in groups of 4,000. For each group, it shows the number of sites that are HTTPS (only, I believe). It shows that the pace of sites going HTTPS-only is increasing. The effect of Chrome's naming and shaming will presumably increase the rate of adoption further in July.
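
The grouping behind a graph like that is easy to sketch. This is only an illustration of the idea, not Scott Helme's actual method: the `sites` list of (rank, is_https) pairs is made-up toy data, and `https_counts_by_group` is a hypothetical helper name.

```python
from collections import Counter

GROUP_SIZE = 4000  # width of each rank group, as in the graph described above


def https_counts_by_group(sites, group_size=GROUP_SIZE):
    """Count HTTPS sites in each consecutive group of ranks.

    `sites` is an iterable of (rank, is_https) pairs, with rank starting at 1.
    Returns a dict mapping group index (0 = ranks 1..group_size) to a count.
    Groups containing no HTTPS sites are simply absent from the result.
    """
    counts = Counter()
    for rank, is_https in sites:
        if is_https:
            counts[(rank - 1) // group_size] += 1
    return dict(counts)


# Toy data, invented purely for illustration.
sites = [(1, True), (2, False), (4000, True), (4001, True), (8001, False)]
print(https_counts_by_group(sites))
```

Plotting the count for each group against the group index would give a graph of the kind described, assuming one had the ranking and a per-site HTTPS check to feed it.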

Tuesday, February 6, 2018

DNA's Niche in the Storage Market

I've been writing about storing data in DNA for the last five years, both enthusiastically about DNA's long-term prospects as a technology for storage, and pessimistically about its medium-term prospects. This time, I'd like to look at DNA storage systems as a product, and ask where their attributes might provide a fit in the storage marketplace.

As far as I know no-one has ever built a storage system using DNA as a medium, let alone sold one. Indeed, the only work I know on what such a system would actually look like is by the team from Microsoft Research and the University of Washington. Everything below the fold is somewhat informed speculation. If I've got something wrong, I hope the experts will correct me.

Thursday, January 25, 2018

Magical Thinking At The New York Times

Steven Johnson's Beyond The Bitcoin Bubble in the New York Times Magazine is a 9000-word explanation of how the blockchain can decentralize the Internet that appeared 5 days after my It Isn't About The Technology. Which is a good thing, because otherwise my post would have had to be much longer to address his tome. Follow me below the fold for the part I would have had to add to it.

Tuesday, January 23, 2018

Herbert Van de Sompel's Paul Evan Peters Award Lecture

In It Isn't About The Technology, I wrote about my friend Herbert Van de Sompel's richly-deserved Paul Evan Peters award lecture entitled Scholarly Communication: Deconstruct and Decentralize?, but only in the context of the push to "decentralize the Web". I believe Herbert's goal for this lecture was to spark discussion. In that spirit, below the fold, I have some questions about Herbert's vision of a future decentralized system for scholarly communications built on existing Web protocols. They aren't about the technology but about how it would actually operate.

Thursday, January 18, 2018

Tuesday, January 16, 2018

Not Really Decentralized After All

Here are two more examples of the phenomenon that I've been writing about ever since Economies of Scale in Peer-to-Peer Networks more than three years ago, centralized systems built on decentralized infrastructure in ways that nullify the advantages of decentralization:

Monday, January 15, 2018

The Internet Society Takes On Digital Preservation

Another worthwhile initiative comes from The Internet Society, through its New York chapter. They are starting an effort to draw attention to the issues around digital preservation. Shuli Hallack has an introductory blog post entitled Preserving Our Future, One Bit at a Time. They kicked off with a meeting at Google's DC office labeled as being about "The Policy Perspective". It was keynoted by Vint Cerf with respondents Kate Zwaard and Michelle Wu. I watched the livestream. Overall, I thought that the speakers did a good job despite wandering a long way from policies, mostly in response to audience questions.

Vint will also keynote the next event, at Google's NYC office February 5th, 2018, 5:30PM – 7:30PM. It is labeled as being about "Business Models and Financial Motives" and, if that's what it ends up being about, it should be very interesting and potentially useful. I hope to catch the livestream.

Thursday, January 11, 2018

It Isn't About The Technology

A year and a half ago I attended Brewster Kahle's Decentralized Web Summit and wrote:
I am working on a post about my reactions to the first two days (I couldn't attend the third) but it requires a good deal of thought, so it'll take a while.
As I recall, I came away from the Summit frustrated. I posted the TL;DR version of the reason half a year ago in Why Is The Web "Centralized"? :
What is the centralization that decentralized Web advocates are reacting against? Clearly, it is the domination of the Web by the FANG (Facebook, Amazon, Netflix, Google) and a few other large companies such as the cable oligopoly.

These companies came to dominate the Web for economic not technological reasons.
Yet the decentralized Web advocates persist in believing that the answer is new technologies, which suffer from the same economic problems as the existing decentralized technologies underlying the "centralized" Web we have. A decentralized technology infrastructure is necessary for a decentralized Web but it isn't sufficient. Absent an understanding of how the rest of the solution is going to work, designing the infrastructure is an academic exercise.

It is finally time for the long-delayed long-form post. I should first reiterate that I'm greatly in favor of the idea of a decentralized Web based on decentralized storage. It would be a much better world if it happened. I'm happy to dream along with my friend Herbert Van de Sompel, whose richly-deserved Paul Evan Peters award lecture was entitled Scholarly Communication: Deconstruct and Decentralize?. He describes a potential future decentralized system of scholarly communication built on existing Web protocols. But even he prefaces the dream with a caveat that the future he describes "will most likely never exist".

I agree with Herbert about the desirability of his vision, but I also agree that it is unlikely. Below the fold I summarize Herbert's vision, then go through a long explanation of why I think he's right about the low likelihood of its coming into existence.

Monday, January 8, 2018

The $2B Joke

Everything you need to know about cryptocurrency is in Timothy B. Lee's Remember Dogecoin? The joke currency soared to $2 billion this weekend:
Nobody was supposed to take Dogecoin seriously. Back in 2013, a couple of guys created a new cryptocurrency inspired by the "doge" meme, which features a Shiba Inu dog making excited but ungrammatical declarations. ... At the start of 2017, the value of all Dogecoins in circulation was around $20 million. ... Then on Saturday the value hit $2 billion. ... "It says a lot about the state of the cryptocurrency space in general that a currency with a dog on it which hasn't released a software update in over 2 years has a $1B+ market cap," [cofounder] Palmer told Coindesk last week.
So blockchain, such bubble. Up 100x in a year. Are you HODL-ing or getting your money out?
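
For anyone who wants to check the 100x, here is the arithmetic as a tiny sketch. The two figures are the round numbers from the quote above; the variable names are mine.

```python
# Round figures taken from the quoted article.
value_start_2017 = 20_000_000   # "around $20 million" at the start of 2017
value_peak = 2_000_000_000      # "$2 billion" on the Saturday in question

# Ratio of the peak value to the value a year or so earlier.
multiple = value_peak / value_start_2017
print(f"Up {multiple:.0f}x")
```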

Digital Preservation Declaration of Shared Values

I'd like to draw your attention to the effort underway by a number of organizations active in digital preservation to agree on a Digital Preservation Declaration of Shared Values:
The digital preservation landscape is one of a multitude of choices that vary widely in terms of purpose, scale, cost, and complexity. Over the past year a group of collaborating organizations united in the commitment to digital preservation have come together to explore how we can better communicate with each other and assist members of the wider community as they negotiate this complicated landscape.

As an initial effort, the group drafted a Digital Preservation Declaration of Shared Values that is now being released for community comment. The document is available here:

The comment period will be open until March 1st, 2018. In addition, we welcome suggestions from the community for next steps that would be beneficial as we work together.
The list of shared values (Collaboration, Affordability, Availability, Inclusiveness, Diversity, Portability/Interoperability, Transparency/information sharing, Accountability, Stewardship Continuity, Advocacy, Empowerment) includes several to which adherence in the past hasn't been great.

There are already good comments on the draft. Having more input, and input from a broader range of institutions, would help this potentially important initiative.

Friday, January 5, 2018

Meltdown & Spectre

This hasn't been a good few months for Intel. I wrote in November about the vulnerabilities in their Management Engine. Now they, and other CPU manufacturers, are facing Meltdown and Spectre, three major vulnerabilities caused by side-effects of speculative execution. The release of these vulnerabilities was rushed and the initial reaction less than adequate.

The three vulnerabilities are very serious but mitigations are in place and appear to be less costly than reports focused on the worst case would lead you to believe. Below the fold, I look at the reaction, explain what speculative execution means, and point to the best explanation I've found of where the vulnerabilities come from and what the mitigations do.

Tuesday, January 2, 2018

The Box Conspiracy

Growing up in London left me with a life-long interest in the theatre (note the spelling).  Although I greatly appreciate polished productions of classics, such as the Royal National Theatre's 2014 King Lear, my particular interests are:
I've been writing recently about Web advertising, reading Tim Wu's book The Attention Merchants: The Epic Scramble to Get Inside Our Heads, and especially watching Dude, You Broke The Future, Charlie Stross' keynote for the 34th Chaos Communication Congress. As I do so, I can't help remembering a show I saw nearly a quarter of a century ago that fit the last of those categories. Below the fold I pay tribute to the prophetic vision of an under-appreciated show and its author.