Tuesday, August 30, 2016

Fighting the Web Flab

Source: Frederic Filloux
At Monday Note, Frederic Filloux's Bloated HTML, the best and the worse starts where I've started several times, with the incredibly low density of actual content in today's Web:
When reading this 800 words Guardian story — about half of page of text long — your web browser loads the equivalent of 55 pages of HTML code, almost half a million characters. To be precise: an article of 757 words (4667 characters and spaces), requires 485,527 characters of code ... “useful” text (the human-readable article) weighs less than one percent (0.96%) of the underlying browser code. The rest consists of links (more than 600) and scripts of all types (120 references), related to trackers, advertising objects, analytics, etc.
But he ends on a somewhat less despairing note. Follow me below the fold for a faint ray of hope.

Thursday, August 25, 2016

Evanescent Web Archives

Below the fold, discussion of two articles from last week about archived Web content that vanished.

Tuesday, August 23, 2016

Content negotiation and Memento

Back in March Ilya Kreymer summarized discussions he and I had had about a problem he'd encountered building oldweb.today thus:
a key problem with Memento is that, in its current form, an archive can return an arbitrarily transformed object and there is no way to determine what that transformation is. In practice, this makes interoperability quite difficult.
What Ilya was referring to was that, for a given Web page, some archives have preserved the HTML, the images, the CSS and so on, whereas some have preserved a PNG image of the page (transforming it by taking a screenshot). Herbert van de Sompel, Michael Nelson and others have come up with a creative solution. Details below the fold.

Thursday, August 18, 2016

The 120K BTC Heist

Based on my experience of P2P systems in the LOCKSS Program, I've been writing skeptically about Bitcoin and the application of blockchain technology to other applications for nearly three years. In that time there have been a number of major incidents warning that skepticism is essential, including:
Despite these warnings, enthusiasm for the future of blockchain technology is still rampant. Below the fold, the latest hype and some recent responses from less credulous sources.

Tuesday, August 16, 2016

OK, I'm really amazed

Ever since I read Maciej Cegłowski's What Happens Next Will Amaze You (it's a must-read) I've been noticing how unpleasant the experience of browsing the Web has become. Ever since I read Georgis Kontaxis and Monica Chew's Tracking Protection in Firefox for Privacy and Performance I've been noticing how slow browsing the Web has become.

Because I work at Stanford I have a discounted subscription to the New York Times, so I'm that rarity on the Web, a paying customer. You would think they would try to make my Web browsing experience pleasant and hassle-free. So here I am, using my hotel's WiFi and Chrome on my totally up-to-date Google Nexus 9 tablet with no ad-blocker. I'm scrolling down the front page of the New York Times and I notice a story that looks interesting. My finger touches the link. And what happens next amazes me. In fact, it tips me over the edge into full-on rant mode, which starts below the fold. You have been warned, and I apologize for two rants in a row.

Tuesday, August 9, 2016

Correlated Distraction

It is 11:44AM Pacific and I'm driving, making a left onto Central Expressway in Mountain View, CA, and trying to avoid another vehicle whose driver isn't paying attention, when an ear-splitting siren goes off in my car. After a moment of panic I see "Connected" on the infotainment system display. It's the emergency alert system. When it is finally safe to stop and check, I see this message:
Emergency Alert: Dust Storm Warning in this area until 12:00PM MST. Avoid travel. Check local media - NWS.
WTF? Where to even begin with this stupidity? Well, here goes:
  • "this area" - what area? In the Bay Area we have earthquakes, wildfires, flash floods, but we don't yet have dust storms. Why does the idiot who composed the message think they know where everyone who will read it is?
  • It's 11:44AM Pacific, or 18:44UTC. That's 12:44PM Mountain. Except we're both on daylight saving time. So did the message mean 12:00PM MDT, or 18:00UTC, in which case the message was already 44 minutes too late? Or did the message mean 12:00PM MST, or 19:00UTC, in which case it had 16 minutes to run? Why send a warning 44 minutes late or use the wrong time zone?
  • A dust storm can be dangerous, so giving people 16 minutes (but not -44 minutes) of warning could save some lives. Equally, distracting everyone in "this area" who is driving, operating machinery, performing surgery, etc. could cost some lives. Did anyone balance the upsides and downsides of issuing this warning, even assuming it only reached people in "this area"?
  • I've written before about the importance and difficulty of modelling correlated failures. Now that essentially every driver is carrying (but hopefully not talking on) a cellphone, the emergency alert system is a way to cause correlated distraction of every driver across the entire nation. Correlated distraction caused by rubbernecking at accidents is a well-known cause of additional accidents. But at least that is localized in space. Who thought that building a system to cause correlated distraction of every driver in the nation was a good idea?
  • Who has authority to trigger the distraction? Who did trigger the distraction? Can we get that person fired?
  • This is actually the third time the siren has gone off while I'm driving. The previous two were Amber alerts. Don't get me wrong. I think getting drivers to look out for cars that have abducted children is a good idea, and I'm glad to see the overhead signs on freeways used for that purpose. But it isn't a good enough idea to justify the ear-splitting siren and consequent distraction. So I had already followed instructions to disable Amber alerts. I've now also disabled Emergency alerts.
So, once again, because no-one thought What Could Possibly Go Wrong?, a potentially useful system has crashed and burned.
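For anyone who wants to check the time zone arithmetic in the second bullet, here is a minimal sketch. It assumes the alert arrived on the date of this post (August 9, 2016) and uses fixed UTC offsets rather than a time zone database; the names `received` and `minutes_remaining` are mine, purely for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets (an assumption of this sketch, to avoid needing tzdata).
PDT = timezone(timedelta(hours=-7), "PDT")  # Pacific Daylight Time
MDT = timezone(timedelta(hours=-6), "MDT")  # Mountain Daylight Time
MST = timezone(timedelta(hours=-7), "MST")  # Mountain Standard Time

# When the siren went off: 11:44AM Pacific, on daylight saving time.
received = datetime(2016, 8, 9, 11, 44, tzinfo=PDT)


def minutes_remaining(expiry_zone: timezone) -> int:
    """Minutes between receipt and a 12:00PM expiry read in expiry_zone."""
    expiry = datetime(2016, 8, 9, 12, 0, tzinfo=expiry_zone)
    return int((expiry - received).total_seconds() // 60)


remaining_if_mdt = minutes_remaining(MDT)  # expiry read as 12:00PM MDT
remaining_if_mst = minutes_remaining(MST)  # expiry read as 12:00PM MST
print(remaining_if_mdt, remaining_if_mst)
```

Read as MDT the warning had already expired 44 minutes earlier; read literally as MST it had 16 minutes left.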

Thursday, August 4, 2016

A Cost-Effective DIY LOCKSS Box

Several LOCKSS Alliance members have asked us about cost-effective, high-capacity LOCKSS box hardware. We recently assembled and are testing our answer to these questions, a LOCKSS box built into the U-NAS NSC800 chassis. It supports eight 3.5" drives so, for example, using the recently available 8TB drives and RAID-6 it would provide about 48TB of storage after RAID-6 parity but before file system overhead. Below the fold, a detailed parts list, links to build instructions, and comments on the build process.
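The 48TB figure follows from RAID-6 setting aside two drives' worth of capacity for parity. A minimal sketch of the arithmetic, with a hypothetical helper name and ignoring file system overhead and the difference between decimal and binary terabytes:

```python
def raid6_usable_tb(drives: int, drive_tb: float) -> float:
    """Capacity left after RAID-6 parity, before file system overhead."""
    if drives < 4:
        # RAID-6 needs at least four drives: two for data plus two parity.
        raise ValueError("RAID-6 requires at least 4 drives")
    return (drives - 2) * drive_tb


total_tb = 8 * 8                     # eight 8TB drives in the chassis
usable_tb = raid6_usable_tb(8, 8)    # two drives' worth goes to parity
print(total_tb, usable_tb)
```

So the eight drives hold 64TB in total, of which 48TB is available for content.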

Tuesday, August 2, 2016

Cameron Neylon's "Squaring Circles"

Cameron Neylon's Squaring Circles: The economics and governance of scholarly infrastructures is an expanded version of his excellent talk at the JISC-CNI workshop. Below the fold, some extracts and comments, but you should read the whole thing.