Thursday, April 15, 2021

NFTs and Web Archiving

One of the earliest observations of the behavior of the Web at scale was "link rot". There were a lot of 404s, broken links. Research showed that the half-life of Web pages was alarmingly short. Even in 1996 this problem was obvious enough for Brewster Kahle to found the Internet Archive to address it. From the Wikipedia entry for Link Rot:
A 2003 study found that on the Web, about one link out of every 200 broke each week,[1] suggesting a half-life of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in Yahoo! Directory (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.[2]
One might have thought that academic journals were a relatively stable part of the Web, but research showed that their references decayed too, just somewhat less rapidly. A 2013 study found a half-life of 9.3 years. See my 2015 post The Evanescent Web.
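The 138-week figure quoted above follows from the weekly breakage rate if one assumes that every surviving link breaks independently with the same probability each week. That constant-rate model is an assumption made here for illustration, not something the quoted studies state, and the function names below are hypothetical. A minimal sketch of the arithmetic:

```python
import math


def half_life_weeks(weekly_break_fraction: float) -> float:
    """Weeks until half the links are broken, assuming exponential decay.

    Assumption: each surviving link breaks independently with the same
    probability every week.
    """
    return math.log(2) / -math.log(1.0 - weekly_break_fraction)


def surviving_fraction(weeks: float, half_life: float) -> float:
    """Fraction of links still working after `weeks`, under the same model."""
    return 0.5 ** (weeks / half_life)


# One link in 200 breaking each week, as in the 2003 study quoted above.
hl = half_life_weeks(1 / 200)
print(f"half-life: {hl:.0f} weeks")  # prints "half-life: 138 weeks"
```

Under the same assumption, `surviving_fraction(104, hl)` estimates the share of links still working after two years.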

I expect you have noticed the latest outbreak of blockchain-enabled insanity, Non-Fungible Tokens (NFTs). Someone "paying $69M for a JPEG" or $560K for a New York Times column attracted a lot of attention. Follow me below the fold for the connection between NFTs, "link rot" and Web archiving.

Tuesday, April 13, 2021

Cryptocurrency's Carbon Footprint

China’s bitcoin mines could derail carbon neutrality goals, study says and Bitcoin mining emissions in China will hit 130 million tonnes by 2024: the headlines say it all. Excusing this climate-destroying externality of Proof-of-Work blockchains requires a continuous flow of new misleading arguments. Below the fold I discuss one of the more recent novelties.

Tuesday, April 6, 2021

Elon Musk: Threat or Menace?

Although both Tesla and SpaceX are major engineering achievements, Elon Musk seems completely unable to understand the concept of externalities, unaccounted-for costs that society bears as a result of these achievements.

First, in Tesla: carbon offsetting, but in reverse, Jaime Powell reacted to Tesla taking $1.6B in carbon offsets, which provided the only profit Tesla ever made, and putting them into Bitcoin:
Looked at differently, a single Bitcoin purchase at a price of ~$50,000 has a carbon footprint of 270 tons, the equivalent of 60 ICE cars.

Tesla’s average selling price in the fourth quarter of 2020? $49,333.

We’re not sure about you, but FT Alphaville is struggling to square the circle of “buy a Tesla with a bitcoin and create the carbon output of 60 internal combustion engine cars” with its legendary environmental ambitions.

Unless, of course, that was never the point in the first place.
Below the fold, more externalities Musk is ignoring.

Thursday, March 25, 2021

Internet Archive Storage

The Internet Archive is a remarkable institution, which has become increasingly important during the pandemic. It has been for many years in the world's top 300 Web sites and is currently ranked #209, sustaining almost 60Gb/s outbound bandwidth from its collection of almost half a trillion archived Web pages and much other content. It does this on a budget of under $20M/yr, yet maintains 99.98% availability.
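To put the availability figure in more tangible terms, it can be converted into allowable downtime per year. This is a rough sketch that assumes a 365-day year and treats availability as a simple fraction of wall-clock time; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service may be down at the given availability."""
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR


hours = downtime_hours_per_year(99.98)
print(f"{hours:.2f} hours/year")  # prints "1.75 hours/year"
```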

Jonah Edwards, who runs the Core Infrastructure team, gave a presentation on the Internet Archive's storage infrastructure to the Archive's staff. Below the fold, some details and commentary.

Tuesday, March 16, 2021

Correlated Failures

The invaluable statistics published by Backblaze show that, despite being built from technologies close to the physical limits (Heat-Assisted Magnetic Recording, 3D NAND Flash), modern digital storage media are extraordinarily reliable. However, I have long believed that the models that attempt to project the reliability of digital storage systems from the statistics of media reliability are wildly optimistic. They ignore foreseeable causes of data loss such as Coronal Mass Ejections and ransomware attacks, which cause correlated failures among the media in the system. No matter how many replicas there are, if all of them are destroyed or corrupted the data is irrecoverable.

Modelling these "black swan" events is clearly extremely difficult, but much less dramatic causes are in practice important too. It has been known at least since Talagala's 1999 Ph.D. thesis that media failures in storage systems are significantly correlated, and at least since Jiang et al's 2008 Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics that only about half the failures in storage systems are traceable to media failures. The rest happen in the pipeline from the media to the CPU. Because this typically aggregates data from many media components, it naturally causes correlations.

As I wrote in 2015's Disk reliability, discussing Backblaze's experience of a 40% Annual Failure Rate (AFR) in over 1,100 Seagate 3TB drives:
Alas, there is a long history of high failure rates among particular batches of drives. An experience similar to Backblaze's at Facebook is related here, with an AFR over 60%. My first experience of this was nearly 30 years ago in the early days of Sun Microsystems. Manufacturing defects, software bugs, mishandling by distributors, vibration resonance, there are many causes for these correlated failures.
Despite plenty of anecdotes, there is little useful data on which to base models of correlated failures in storage systems. Below the fold I summarize and comment on an important paper by a team from the Chinese University of Hong Kong and Alibaba that helps remedy this.

Thursday, March 4, 2021

History Of Window Systems

Alan Kay's Should web browsers have stuck to being document viewers? makes important points about the architecture of the infrastructure for user interfaces, but also sparked comments and an email exchange that clarified the early history of window systems. This is something I've written about previously, so below the fold I go into considerable detail.

Thursday, February 25, 2021

Principles For The Decentralized Web

A week ago yesterday the Internet Archive launched both a portal for the Decentralized Web (DWeb) at https://getdweb.net/, designed by a team led by Iryna Nezhynska of Jolocom, and a set of principles for the Decentralized Web, developed with much community input by a team led by Mai Ishikawa Sutton and John Ryan.

Nezhynska led a tour of the new website and the thinking behind its design, including its accessibility features. It looks very polished; how well it functions as a hub for the DWeb community only time will tell.

Brewster Kahle introduced the meeting by stressing that, as I have written many times, if the DWeb is successful it will be attacked by those who have profited massively from the centralized Web. The community needs to prepare for technical, financial and PR attacks.

Below the fold I look at how the principles might defend against some of these attacks.

Thursday, February 18, 2021

Blast Radius

Last December Simon Sharwood reported on an "Infrastructure Keynote" by Amazon's Peter DeSantis in AWS is fed up with tech that wasn’t built for clouds because it has a big 'blast radius' when things go awry:
Among the nuggets he revealed was that AWS has designed its own uninterruptible power supplies (UPS) and that there’s now one in each of its racks. AWS decided on that approach because the UPS systems it needed were so big they required a dedicated room to handle the sheer quantity of lead-acid batteries required to keep its kit alive. The need to maintain that facility created more risk and made for a larger “blast radius” - the extent of an incident's impact - in the event of failure or disaster.

AWS is all about small blast radii, DeSantis explained, and in the past the company therefore wrote its own UPS firmware for third-party products.

“Software you don’t own in your infrastructure is a risk,” DeSantis said, outlining a scenario in which notifying a vendor of a firmware problem in a device commences a process of attempting to replicate the issue, followed by developing a fix and then deployment.

“It can take a year to fix an issue,” he said. And that’s many months too slow for AWS given a bug can mean downtime for customers.
This is a remarkable argument for infrastructure based on open source software, but that isn't what this post is about. Below the fold is a meditation on the concept of "blast radius", the architectural dilemma it poses, and its relevance to recent outages and compromises.

Thursday, February 11, 2021

More On Archiving Twitter

Himarsha Jayanetti from Michael Nelson's group at Old Dominion follows up on the work I discussed in Michael Nelson's Group On Archiving Twitter with Twitter rewrites your URLs, but assumes you’ll never rewrite theirs: more problems replaying archived Twitter:
URLs shared on Twitter are automatically shortened to t.co links. Twitter does this to track its engagements and also protect its users from sites with malicious content. Twitter replaces these t.co URLs with HTML that suggests the original URL so that the end-user does not see the t.co URLs while browsing. When these t.co URLs are replayed through web archives, they are rewritten to an archived URL (URI-M) and should be rendered in the web archives as in the live web, without displaying these t.co URI-Ms to the end-user.
But, as the screen-grab from the Wayback Machine shows, they may not be. Below the fold, a look at Jayanetti's explanation.

Friday, February 5, 2021

Talk At Berkeley's Information Access Seminar

Once again Cliff Lynch invited me to give a talk to the Information Access Seminar at UC Berkeley's iSchool. Preparation time was limited because these days I'm a full-time grandparent, so the talk, entitled Securing The Digital Supply Chain, summarizes and updates two long posts from two years ago.
The abstract was:
The Internet is suffering an epidemic of supply chain attacks, in which a trusted supplier of content is compromised and delivers malware to some or all of their clients. The recent SolarWinds compromise is just one glaring example. This talk reviews efforts to defend digital supply chains.
Below the fold, the text of the talk with links to the sources.

Thursday, February 4, 2021

Chromebook Linux Update

My three Acer C720 Chromebooks running Linux are still giving yeoman service, although for obvious reasons I'm not travelling these days. But it is time for an update to 2017's Travels with a Chromebook. Below the fold, an account of some adventures in sysadmin.

Thursday, January 28, 2021

Effort Balancing And Rate Limits

Catalin Cimpanu reports on yet another crime wave using Bitcoin in As Bitcoin price surges, DDoS extortion gangs return in force:
In a security alert sent to its customers and shared with ZDNet this week, Radware said that during the last week of 2020 and the first week of 2021, its customers received a new wave of DDoS extortion emails.

Extortionists threatened companies with crippling DDoS attacks unless they got paid between 5 and 10 bitcoins ($150,000 to $300,000)
...
The security firm believes that the rise in the Bitcoin-to-USD price has led to some groups returning to or re-prioritizing DDoS extortion schemes.
And Dan Goodin reports on the latest technique the DDOS-ers are using in DDoSers are abusing Microsoft RDP to make attacks more powerful:
As is typical with many authenticated systems, RDP responds to login requests with a much longer sequence of bits that establish a connection between the two parties. So-called booter/stresser services, which for a fee will bombard Internet addresses with enough data to take them offline, have recently embraced RDP as a means to amplify their attacks, security firm Netscout said.

The amplification allows attackers with only modest resources to strengthen the size of the data they direct at targets. The technique works by bouncing a relatively small amount of data at the amplifying service, which in turn reflects a much larger amount of data at the final target. With an amplification factor of 85.9 to 1, 10 gigabytes-per-second of requests directed at an RDP server will deliver roughly 860Gbps to the target.
I don't know why it took me so long to figure it out, but reading Goodin's post I suddenly realized that techniques we described in Impeding attrition attacks in p2p systems, a 2004 follow-up to our award-winning 2003 SOSP paper on the architecture of the LOCKSS system, can be applied to preventing systems from being abused by DDOS-ers. Below the fold, brief details.
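The amplification arithmetic in Goodin's quote can be checked with a few lines. This sketch assumes the request rate and the reflected rate are expressed in the same unit (Gbps here), and the function name is hypothetical.

```python
def reflected_gbps(request_gbps: float, amplification_factor: float) -> float:
    """Traffic arriving at the target when requests are bounced off an amplifier.

    Assumption: every request elicits a response `amplification_factor`
    times its own size, all of which reaches the target.
    """
    return request_gbps * amplification_factor


# The figures from the quote: 10 units of requests, amplified 85.9 to 1.
attack = reflected_gbps(10, 85.9)
print(f"{attack:.0f} Gbps")  # prints "859 Gbps"
```

The result, 859, matches the "roughly 860Gbps" in the quote.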

Tuesday, January 26, 2021

ISP Monopolies

For at least the last three years (It Isn't About The Technology) I've been blogging about the malign effects of the way the FAANGs dominate the Web and the need for anti-trust action to mitigate them. Finally, with the recent lawsuits against Facebook and Google, some action may be in prospect. I'm planning a post on this topic. But when it comes to malign effects of monopoly I've been ignoring the other monopolists of the Internet, the telcos.

An insightful recent post by John Gilmore to Dave Farber's IP list sparked a response from Thomas Leavitt and some interesting follow-up e-mail. Gilmore was involved in pioneering consumer ISPs, and Leavitt in pioneering Web hosting. Both attribute the current sorry state of Internet connectivity in the US to the lack of effective competition. They and I differ somewhat on how the problem could be fixed. Below the fold I go into the details.

Thursday, January 14, 2021

The Bitcoin "Price"

Jemima Kelly writes No, bitcoin is not “the ninth-most-valuable asset in the world” and it's a must-read. Below the fold, some commentary.

Thursday, January 7, 2021

Two Million Page Views!

Woohoo! This blog just passed two million all-time page views since April 21st 2007.

Tuesday, January 5, 2021

The New Oldweb.today

Two days before Christmas Ilya Kreymer posted Announcing the New OldWeb.today. The old oldweb.today was released five years ago, and Ilya described the details in a guest post here. It was an important step forward in replaying preserved Web content because users could view the old Web content as it would have been rendered at the time it was published, not as rendered in a modern browser. I showed an example of the difference this made in The Internet is for Cats.

Below the fold, I look at why the new oldweb.today is an improvement on the old version, which is still available at classic.oldweb.today.