Thursday, November 15, 2018

Cryptocurrencies' Seven Deadly Paradoxes

John Lewis of the Bank of England pens a must-read, well-linked summary of the problems of cryptocurrencies in The seven deadly paradoxes of cryptocurrency. Below the fold, a few comments on each of the seven.

Tuesday, November 13, 2018

Kids Today Have No Idea

One of the downsides of getting old is that every so often something triggers the Grumpy Grandpa. You kids have no idea what it was like back in the day! You need to watch Rob Pike's video to learn where the hardware and software you take for granted came from!


I'm eight years to the day older than Rob, so I got to work with even earlier technology than he did. As far as I know, Rob never encountered the IBM 1401, the PDP-7 with its 340 display, the Titan and its time-sharing system, 7-hole paper tape and Flexowriters, or the horrible Data General Nova minicomputer. I never used an IBM System/360, but we did both work with CDC machines and punch cards.

I think Rob and I started on PDP-11s at about the same time in 1975, me on RSX-11M at Imperial and Rob on Unix at Toronto. Rob was always much closer to the center of the Unix universe than I was in the UK, but the Unix history he recounts was mine too, from Version 6 on. Rob's talk is a must-watch video.

Thursday, November 8, 2018

What's Happening To Storage?

My only post about storage since May was October's Betteridge's Law Violation, another critique of IDC's Digital Universe and its constant pushing of the idea that the demand for storage is insatiable. So it's time for an update on what is happening in the real world of storage media, as opposed to IDC's Universe. Below the fold, some quick takes.

Tuesday, November 6, 2018

Making PIEs Is Hard

In The Four Most Expensive Words In The English Language I wrote:
Since the key property of a cryptocurrency-based storage service is a lack of trust in the storage providers, Proofs of Space and Time are required. As Bram Cohen has pointed out, this is an extraordinarily difficult problem at the very frontier of research.
The post argues that the economics of decentralized storage services aren't viable, so the difficulty of Proofs of Space and Time isn't that important. All the same, this area of research is fascinating. Now, in One File for the Price of Three: Catching Cheating Servers in Decentralized Storage Networks Ethan Cecchetti, Ian Miers, and Ari Juels have pushed the frontier further out by inventing PIEs. Below the fold, some details.

Thursday, November 1, 2018

Ithaka's Perspective on Digital Preservation

Oya Rieger of Ithaka S+R has published a report entitled The State of Digital Preservation in 2018: A Snapshot of Challenges and Gaps. In June and July Rieger:
talked with 21 experts and thought leaders to hear their perspectives on the state of digital preservation. The purpose of this report is to share a number of common themes that permeated through the conversations and provide an opportunity for broader community reaction and engagement, which will over time contribute to the development of an Ithaka S+R research agenda in these areas.
Below the fold, a critique.

Tuesday, October 30, 2018

Controlled Digital Lending

Three years ago in Emulation and Virtualization as Preservation Strategies I wrote about Controlled Digital Lending (CDL):
One idea that might be worth exploring as a way to mitigate the legal issues is lending. The Internet Archive has successfully implemented a lending system for their collection of digitized books; readers can check a book out for a limited period, and each book can be checked out to at most one reader at a time. This has not encountered much opposition from copyright holders.

A similar system for emulation would be feasible; readers would check out an emulation for a limited period, and each emulation could be checked out to at most one reader at a time. One issue would be dependencies. An archive might have, say, 10,000 emulations based on Windows 3.1. If checking out one blocked access to all 10,000 that might be too restrictive to be useful.
Now, Controlled Digital Lending by Libraries offers libraries the opportunity to:
  • better understand the legal framework underpinning CDL,
  • communicate their support for CDL, and
  • build a community of expertise around the practice of CDL.
Below the fold, some details.
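The lending rule described in the quote (a limited loan period, and at most one reader per item at a time) is simple enough to state as code. A minimal sketch; `LendingRegister`, its method names and the 14-day default loan period are hypothetical choices for illustration, not the Internet Archive's implementation.

```python
from datetime import datetime, timedelta


class LendingRegister:
    """Hypothetical one-copy-one-loan register in the spirit of CDL.

    Each item can be on loan to at most one reader at a time, and every
    loan expires automatically at the end of a fixed lending period.
    """

    def __init__(self, loan_period: timedelta = timedelta(days=14)):
        self.loan_period = loan_period  # assumed default, not from the source
        # item id -> (reader id, due date)
        self.loans: dict[str, tuple[str, datetime]] = {}

    def _expire(self, now: datetime) -> None:
        # Drop every loan whose due date has passed.
        self.loans = {item: loan for item, loan in self.loans.items() if loan[1] > now}

    def checkout(self, item: str, reader: str, now: datetime) -> bool:
        self._expire(now)
        if item in self.loans:
            return False  # already on loan to someone else
        self.loans[item] = (reader, now + self.loan_period)
        return True

    def give_back(self, item: str, reader: str) -> None:
        # Only the current borrower can return the item early.
        loan = self.loans.get(item)
        if loan is not None and loan[0] == reader:
            del self.loans[item]
```

The sketch keys loans on individual items only; it doesn't model shared dependencies such as the Windows 3.1 example in the quote.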

Thursday, October 25, 2018

Syndicating Journal Publisher Content

There's a lot of good information in Roger Schonfeld's Will Publishers Syndicate Their Content?. It starts:
The scholarly publishing sector has struggled to address the problems that users face in their discovery-to-access workflow and thereby stave off skyrocketing piracy. The top-line impact of these struggles is becoming clearer, starting with Elsevier’s absence from Germany. This makes the efforts to establish seamless single-platform access to all scholarly publications — equal in extent as Sci-Hub but legitimate, and which I term a Supercontinent of Scholarly Publishing — all the more urgent. The technical solutions are challenging, and at the STM meeting in Frankfurt last week it became clear that, although progress is being made, policy, governance, and competition issues may complicate the drive to consensus.
Schonfeld asserts that providing a seamless, uniform view of the publisher's content, whether paywalled or open access, requires two services:
First, it requires an ability to authorize appropriate access in a decentralized distribution environment. A Shared Entitlements System, as it is sometimes called, would be a kind of common authorization service for all publishers. As I will discuss below, there are at least two options for how Entitlements can be addressed. Second, it requires Distributed Usage Logging, which is to say the ability for all usage, wherever it takes place, to be “counted” in measuring the value of articles on behalf of authors and licenses on behalf of publishers.
Below the fold, a rather long explanation of why I think Schonfeld's analysis doesn't go far enough.

Tuesday, October 23, 2018

Gini Coefficients Of Cryptocurrencies

The Gini coefficient expresses a system's degree of inequality or, in the blockchain context, centralization. It therefore factors into arguments, like mine, that claims of blockchains' decentralization are bogus.

In his testimony to the US Senate Committee on Banking, Housing and Community Affairs' hearing on “Exploring the Cryptocurrency and Blockchain Ecosystem" entitled Crypto is the Mother of All Scams and (Now Busted) Bubbles While Blockchain Is The Most Over-Hyped Technology Ever, No Better than a Spreadsheet/Database, Nouriel Roubini wrote:
wealth in crypto-land is more concentrated than in North Korea where the inequality Gini coefficient is 0.86 (it is 0.41 in the quite unequal US): the Gini coefficient for Bitcoin is an astonishing 0.88.
The link is to Joe Weisenthal's How Bitcoin Is Like North Korea from nearly five years ago, which was based upon a Stack Exchange post, which in turn was based upon a post by the owner of the Bitcoinica exchange from 2011! Which didn't look at all holdings of Bitcoin, let alone the whole of crypto-land, but only at Bitcoinica's customers!

Follow me below the fold as I search for more up-to-date and comprehensive information. I'm not even questioning how Roubini knows the Gini coefficient of North Korea to two decimal places.
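For readers who want to compute the number themselves, here is a minimal sketch using the standard rank-weighted formula for the Gini coefficient over a list of holdings. The function name `gini` is just an illustrative choice.

```python
def gini(holdings: list[float]) -> float:
    """Gini coefficient of a list of non-negative holdings.

    0.0 means everyone holds the same amount; values approach 1.0 as
    holdings concentrate in fewer hands (the maximum for n holders is (n-1)/n).
    """
    xs = sorted(holdings)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2*sum(i*x_i) / (n*sum(x)) - (n+1)/n,
    # with i running from 1 to n over the ascending-sorted values.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

The result depends entirely on which population of "holders" you feed it, which is why a figure derived from one exchange's customers says little about Bitcoin as a whole.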

Thursday, October 18, 2018

Betteridge's Law Violation

Erez Zadok points me to Wasim Ahmed Bhat's Is a Data-Capacity Gap Inevitable in Big Data Storage? in IEEE Computer. It is a violation of Betteridge's Law of Headlines because the answer isn't no. But what, exactly, is this gap? Follow me below the fold.

Tuesday, October 16, 2018

Software Heritage Foundation Update

I first wrote about the Software Heritage Foundation two years ago. It is four months since their Archive officially went live. Now Roberto di Cosmo and his collaborators have an article, and a video, entitled Building the Universal Archive of Source Code in Communications of the ACM describing their three challenges, of collection, preservation and sharing, and setting out their current status:
Software Heritage is an active project that has already assembled the largest existing collection of software source code. At the time of writing the Software Heritage Archive contains more than four billion unique source code files and one billion individual commits, gathered from more than 80 million publicly available source code repositories (including a full and up-to-date mirror of GitHub) and packages (including a full and up-to-date mirror of Debian). Three copies are currently maintained, including one on a public cloud.

As a graph, the Merkle DAG underpinning the archive consists of 10 billion nodes and 100 billion edges; in terms of resources, the compressed and fully de-duplicated archive requires some 200TB of storage space. These figures grow constantly, as the archive is kept up to date by periodically crawling major code hosting sites and software distributions, adding new software artifacts, but never removing anything. The contents of the archive can already be browsed online, or navigated via a REST API.
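The "fully de-duplicated" figure follows from content addressing in a Merkle DAG: each object is named by a hash of its content, so identical files collapse into one object, and a directory's name depends on the names of its children. A minimal sketch, assuming Git-style blob hashing (Software Heritage's identifiers for file contents are, as I understand it, designed to be Git-compatible). The directory encoding in `dir_id` is a hypothetical simplification, not Git's real tree format, and `ContentStore` is an illustrative name, not part of Software Heritage's API.

```python
import hashlib


def blob_id(data: bytes) -> str:
    # Git-style blob hash: SHA-1 over a "blob <length>\0" header plus the content.
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def dir_id(entries: dict[str, str]) -> str:
    # Hypothetical, simplified directory node: SHA-1 over the sorted
    # (name, child id) pairs. NOT Git's tree encoding; it only shows that
    # changing any child changes the parent's id (the Merkle property).
    h = hashlib.sha1()
    for name in sorted(entries):
        h.update(name.encode() + b"\0" + entries[name].encode() + b"\n")
    return h.hexdigest()


class ContentStore:
    """Content-addressed store: identical content is kept exactly once."""

    def __init__(self):
        self.objects: dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        oid = blob_id(data)
        self.objects.setdefault(oid, data)  # de-duplication happens here
        return oid
```

Adding the same file from a thousand repositories stores it once; only the directory and commit nodes that point at it multiply.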
I have always believed, as I wrote in 2013:
Software, and in particular open source software is just as much a cultural production as books, music, movies, plays, TV, newspapers, maps and everything else that research libraries, and in particular the Library of Congress, collect and preserve so that future scholars can understand our society.
I'm very disappointed that national libraries haven't accepted this argument, let alone the argument that preservation and access to their other digital collections largely depend on preserving and providing access to open source software. Since they have failed in this task, it is up to the Software Heritage Foundation to step into the breach.

You can find out more at their Web site, and support this important work by donating.

Thursday, October 11, 2018

I'm Shocked, Shocked To Find Collusion Going On

The security of a permissionless peer-to-peer system generally depends upon the assumption of uncoordinated choice, the idea that each peer acts independently upon its own view of the system's state. Vitalik Buterin, a co-founder of Ethereum, wrote in The Meaning of Decentralization:
In the case of blockchain protocols, the mathematical and economic reasoning behind the safety of the consensus often relies crucially on the uncoordinated choice model, or the assumption that the game consists of many small actors that make decisions independently.
Another way of saying this is that the system isn't secure if enough peers collude with each other. Below the fold, I look at why this is a big problem.

Tuesday, October 9, 2018

Click On The Llama

There was lots of great stuff at the Internet Archive's Annual Bash. But for those of us who can remember the days before PCs played music, the highlight was right at the end of the presentations when the awesome Jason Scott introduced the port of 1997's WinAmp to the Web. Two years earlier:
WinPlay3 was the first real-time MP3 audio player for PCs running Windows, both 16-bit (Windows 3.1) and 32-bit (Windows 95). Prior to this, audio compressed with MP3 had to be decompressed prior to listening.
WinPlay3 was the first, but it was bare-bones. It was WinAmp that really got people to realize that the PC was a media device. But the best part was that WinAmp was mod-able. It unleashed a wave of creativity (Debbie does WinAmp, anyone?), now preserved in the Archive's collection of over 5,000 WinAmp skins!

Jason has the details in his blog post Don't Click on the Llama:
Thanks to Jordan Eldredge and the Webamp programming community for this new and strange periscope into the 1990s internet past.
When I first clicked on the llama on The Swiss Family Robinson on my Ubuntu desktop, the sound stopped. It turns out that the codec selection mechanism differs between the regular player and WinAmp, and WinAmp needed a codec I didn't have installed. The fix was:
sudo apt install ubuntu-restricted-extras
I should also note that the Archive's amazing collection of emulations now includes the Commodore 64 (Jason's introduction is here), and 1,100 additional arcade machines.

Thursday, October 4, 2018

I Don't Really Want To Stop The Show

But I thought you might like to know,
...
It was twenty years ago today that Vicky Reich and I walked into Mike Keller's office in the Stanford Library and got the go-ahead to start the LOCKSS Program. I told the story of its birth five years ago.

Over the last couple of years, as we retired, the program has migrated from being an independent operation under the umbrella of the Stanford Library, to being one of the programs run by the Library's main IT operation, Tom Cramer's DLSS. The transition will shortly be symbolized by a redesigned website (its predecessor looked like this).

Now that we are retired, my blog has lists of Vicky's and my publications from 1981 on (the LOCKSS ones start in 2000), and of talks from 2006 on.

Thanks again to the NSF, Sun Microsystems, and the Andrew W. Mellon Foundation for the funding that allowed us to develop the system. Many thanks for the steadfast support of the libraries of the LOCKSS Alliance, and of the libraries and publishers of the CLOCKSS Archive, which has sustained it in production. Special thanks to Don Waters for facilitating the program's evolution off grant funding, and to Margaret Kim for the original tortoise logo.

PS - Google is just one week older.  Vicky was the librarian on the Stanford Digital Library Project with Larry Page and Sergey Brin that led to Google.

Wednesday, October 3, 2018

Brief Talk At Internet Archive Event

Vicky Reich gave a brief talk at the Building A Better Web: The Internet Archive’s Annual Bash. She followed Jefferson Bailey's talk, which reported that the Internet Archive's efforts to preserve the journals have already accumulated full text and metadata of nearly 8.7M articles, of which nearly 1.5M are from "at-risk" small journals. This is around 10% of the entire academic literature.

Below the fold, an edited text of Vicky's talk with links to the sources.

Tuesday, October 2, 2018

Bitcoin's Academic Pedigree

Bitcoin's Academic Pedigree (also here) by Arvind Narayanan and Jeremy Clark starts:
If you've read about bitcoin in the press and have some familiarity with academic research in the field of cryptography, you might reasonably come away with the following impression: Several decades' worth of research on digital cash, beginning with David Chaum, did not lead to commercial success because it required a centralized, banklike server controlling the system, and no banks wanted to sign on. Along came bitcoin, a radically different proposal for a decentralized cryptocurrency that didn't need the banks, and digital cash finally succeeded. Its inventor, the mysterious Satoshi Nakamoto, was an academic outsider, and bitcoin bears no resemblance to earlier academic proposals.
They comprehensively debunk this view, showing that each of the techniques Nakamoto used had been developed over the preceding three decades of academic research, and that Nakamoto's brilliant contribution was:
the specific, complex way in which the underlying components are put together.
Below the fold, details on the specific techniques.

Tuesday, September 25, 2018

Web Archives As Evidence

In Blockchain Solves Preservation! I critiqued John Collomosse et al's ARCHANGEL: Trusted Archives of Digital Public Documents. They argue that integrity validation via hashes is needed because:
Document integrity is fundamental to public trust in archives. Yet currently that trust is built upon institutional reputation — trust at face value in a centralised authority, like a national government archive or University.
But they also write that:
acceptance of content evidence might eventually become similar to acceptance of DNA evidence in court, but that establishing that level of confidence would require strong public engaged to explain Blockchain in an accessible manner particularly explaining why one could trust the cryptographic assurances inherent in a DLT solution.
At least as far as courts are concerned, they're wrong about both "face value" and how trust is established. Below the fold, an explanation.

Tuesday, September 18, 2018

Vint Cerf on Traceability

Vint Cerf's Traceability addresses a significant problem:
how to preserve the freedom and openness of the Internet while protecting against the harmful behaviors that have emerged in this global medium. That this is a significant challenge cannot be overstated. The bad behaviors range from social network bullying and misinformation to email spam, distributed denial of service attacks, direct cyberattacks against infrastructure, malware propagation, identity theft, and a host of other ills
Cerf's proposed solution is:
differential traceability. The ability to trace bad actors to bring them to justice seems to me an important goal in a civilized society. The tension with privacy protection leads to the idea that only under appropriate conditions can privacy be violated. By way of example, consider license plates on cars. They are usually arbitrary identifiers and special authority is needed to match them with the car owners ... This is an example of differential traceability; the police department has the authority to demand ownership information from the Department of Motor Vehicles that issues the license plates. Ordinary citizens do not have this authority.
Below the fold I examine this proposal and one of the responses.

Thursday, September 13, 2018

Blockchain Solves Preservation!

We're in a period when blockchain or "Distributed Ledger Technology" is the Solution to Everything™, so it is inevitable that it will be proposed as the solution to the problems of digital preservation. John Collomosse et al's abstract for ARCHANGEL: Trusted Archives of Digital Public Documents states:
We present ARCHANGEL; a de-centralised platform for ensuring the long-term integrity of digital documents stored within public archives. Document integrity is fundamental to public trust in archives. Yet currently that trust is built upon institutional reputation --- trust at face value in a centralised authority, like a national government archive or University. ARCHANGEL proposes a shift to a technological underscoring of that trust, using distributed ledger technology (DLT) to cryptographically guarantee the provenance, immutability and so the integrity of archived documents. We describe the ARCHANGEL architecture, and report on a prototype of that architecture build over the Ethereum infrastructure. We report early evaluation and feedback of ARCHANGEL from stakeholders in the research data archives space.
This is a wonderful example of the way people blithely assume that the claimed properties of blockchain systems are actually delivered in the real world. Below the fold I ask whether Collomosse et al have applied appropriate skepticism to blockchain's claims, and whether they have considered the sustainability of their proposal.

Tuesday, September 11, 2018

What Does Data "Durability" Mean

In What Does 11 Nines of Durability Really Mean? David Friend writes:
No amount of nines can prevent data loss.

There is one very important and inconvenient truth about reliability: Two-thirds of all data loss has nothing to do with hardware failure.

The real culprits are a combination of human error, viruses, bugs in application software, and malicious employees or intruders. Almost everyone has accidentally erased or overwritten a file. Even if your cloud storage had one million nines of durability, it can’t protect you from human error.
Friend may be right that these are the top 5 causes of data loss, but over the timescale of preservation as opposed to storage they are far from the only ones. In Requirements for Digital Preservation Systems: A Bottom-Up Approach we listed 13 of them. Below the fold, some discussion of the meaning and usefulness of durability claims.
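Arithmetically, a durability claim is a per-object annual loss probability. A minimal sketch of what "N nines" implies, assuming (as such claims implicitly do) that objects fail independently of each other; that is exactly the assumption human error, malware, software bugs and insiders violate, since they destroy many objects at once. The function names are hypothetical.

```python
def expected_annual_losses(num_objects: int, nines: int) -> float:
    """Expected objects lost per year if each object independently survives
    a year with probability 1 - 10**-nines (nines=11 is 99.999999999%)."""
    annual_loss_probability = 10.0 ** -nines
    return num_objects * annual_loss_probability


def mean_years_between_losses(num_objects: int, nines: int) -> float:
    # Under the same independence assumption, the average wait between losses.
    return 1.0 / expected_annual_losses(num_objects, nines)
```

With 10 million objects and 11 nines, that works out to 0.0001 objects a year, or one object every 10,000 years on average: a statement about hardware and media, not about the causes Friend lists.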

Tuesday, September 4, 2018

Chia Network

Back in March I wrote Proofs of Space, analyzing Bram Cohen's fascinating EE380 talk. I've now learned more about Chia Network, the company that is implementing a network using his methods. Below the fold I look into their prospects.

Thursday, August 30, 2018

What Does The Decentralized Web Need?

In, among others, It Isn't About The Technology, Decentralized Web Summit 2018: Quick Takes and Special Report on Decentralizing the Internet, I've been skeptical at considerable length about the prospect of a decentralized Web. I would really like the decentralized Web to succeed, so I'm not hostile, just pessimistic.

I was asked to summarize what, apart from working technology (which we pretty much have), would be needed for success. My answer was four things:
  • A sustainable business model
  • Anti-trust enforcement
  • The killer app
  • A way to remove content
Below the fold, I try to explain each of them at more readable length.

Tuesday, August 28, 2018

Lending Emulations?

In my report Emulation and Virtualization as Preservation Strategies I discussed the legal issues around emulating obsolete software, the basis for the burgeoning retro-gaming industry. These issues have attracted attention recently, as Kyle Orland reports:
In the wake of Nintendo's recent lawsuits against other ROM distribution sites, major ROM repository EmuParadise has announced it will preemptively cease providing downloadable versions of copyrighted classic games.
Below the fold, some comments on this threat to our cultural history.

Friday, August 24, 2018

Triumph Of Greed Over Arithmetic

I discussed FileCoin's ICO in The Four Most Expensive Words in the English Language and worked out that:
Filecoin needs to generate $25.7M/yr over and above what it pays the providers. But it can't charge the customers more than S3, or $0.276/GB/yr. If it didn't pay the providers anything it would need to be storing over 93PB right away to generate a 10% return. That's a lot of storage to expect providers to donate to the system.
On my bike ride this morning I thought of another way of looking at FileCoin's optimistic economics.

FileCoin won't be able, as S3 does, to claim 11 nines of durability and triple redundancy across data centers. So the real competition is S3's Reduced Redundancy Storage, which currently costs $23K/PB/month. Assuming that Amazon continues its historic 15%/year Kryder rate, storing a Petabyte in RRS for a decade is $1.48M. So, if you believe cryptocurrency "prices", FileCoin's "investors" pre-paid $257M for data storage at some undefined time in the future. They could instead have, starting now, stored 174PB in S3's RRS for 10 years. So FileCoin needs to store at least 174PB for 10 years before breaking even.

It gets worse. S3 is by no means the low-cost provider in the storage market. If we assume that the competition is Backblaze's B2 service at $0.06/GB/yr and that their Kryder rate is zero, FileCoin would need to store 428PB for 10 years before breaking even. Nearly half an Exabyte for a decade!
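The arithmetic in this post can be checked in a few lines. A sketch assuming, as the post does, a 10-year horizon with prices paid yearly, the first year at today's price, and a constant annual price decline (the Kryder rate); `decade_cost_per_pb` and the constant names are introduced here for illustration.

```python
def decade_cost_per_pb(monthly_cost_per_pb: float, kryder_rate: float, years: int = 10) -> float:
    """Total cost of keeping 1PB for `years`, with the price falling by
    `kryder_rate` each year (0.15 means 15% cheaper every year)."""
    yearly = monthly_cost_per_pb * 12
    return sum(yearly * (1 - kryder_rate) ** year for year in range(years))


RAISED = 257e6           # dollars pre-paid by FileCoin's "investors"
REQUIRED_RETURN = 0.10   # a 10% annual return on that money

# Capacity needed at S3's $0.276/GB/yr ($276K/PB/yr) just to earn 10%/yr.
required_revenue = RAISED * REQUIRED_RETURN     # $25.7M/yr
pb_for_return = required_revenue / 276e3        # about 93PB

# Break-even capacity against S3 RRS: $23K/PB/month, 15%/yr Kryder rate.
rrs_decade = decade_cost_per_pb(23e3, 0.15)     # about $1.48M per PB
pb_vs_rrs = RAISED / rrs_decade                 # about 174PB

# Break-even against Backblaze B2: $0.06/GB/yr = $5K/PB/month, no price drop.
b2_decade = decade_cost_per_pb(5e3, 0.0)        # $600K per PB
pb_vs_b2 = RAISED / b2_decade                   # about 428PB
```

The sensitivity is obvious from the code: any positive Kryder rate for B2 would lower its decade cost and push FileCoin's break-even capacity even higher.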

Tuesday, August 21, 2018

Optical media durability

At last I have started clearing out the garage laundry room cupboards, which is where, amongst much other stuff, the optical media backups I take every week have been accumulating for many years. They have been stored in a fairly warm, shirt-sleeve environment with no special precautions. So, to get some idea of the durability of writable optical media, I've been pulling somewhat random groups of backups out of the stacks and re-verifying their MD5 checksums, all of which were verified immediately after writing.

TL;DR: Surprisingly, I'm getting good data from CD-Rs more than 14 years old, and from DVD-Rs nearly 12 years old. Your mileage may vary. Below the fold, my results.
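Re-verification amounts to recomputing each file's MD5 and comparing it with the checksum recorded when the disc was written. A minimal sketch, assuming the checksums were saved in the `md5sum` tool's manifest format (the post doesn't say how they were recorded); `md5_of` and `verify_manifest` are hypothetical names. MD5 is adequate for detecting accidental corruption like media decay, though not deliberate tampering.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Hash in chunks so large backup files don't have to fit in memory.
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Path) -> dict[str, bool]:
    """Re-check every entry in an md5sum-style manifest ("<hex>  <name>").

    File names are resolved relative to the manifest's directory. Returns
    {name: passed}; a missing or unreadable file counts as a failure.
    """
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        try:
            results[name] = md5_of(manifest.parent / name) == expected.lower()
        except OSError:
            results[name] = False
    return results
```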

Tuesday, August 14, 2018

The Internet of Torts

Rebecca Crootof at Balkinization has two interesting posts:
  • Introducing the Internet of Torts, in which she describes "how IoT devices empower companies at the expense of consumers and how extant law shields industry from liability."
  • Accountability for the Internet of Torts, in which she discusses "how new products liability law and fiduciary duties could be used to rectify this new power imbalance and ensure that IoT companies are held accountable for the harms they foreseeably cause."
Below the fold, some commentary on both.

Thursday, August 9, 2018

The Blockchain Trilemma

The blockchain trilemma
In The economics of blockchains Markus K Brunnermeier and Joseph Abadi (BA) write:
much of the innovation in blockchain technology has been aimed at wresting power from centralised authorities or monopolies. Unfortunately, the blockchain community’s utopian vision of a decentralised world is not without substantial costs. In recent research, we point out a ‘blockchain trilemma’ – it is impossible for any ledger to fully satisfy the three properties shown in Figure 1 simultaneously (Abadi and Brunnermeier 2018). In particular, decentralisation has three main costs: waste of resources, scalability problems, and network externality inefficiencies.
Below the fold, some commentary.

Tuesday, August 7, 2018

Decentralized Web Summit 2018: Quick Takes

Last week I attended the main two days of the 2018 Decentralized Web Summit put on by the Internet Archive at the San Francisco Mint. I had many good conversations with interesting people, but it didn't change the overall view I've written about in the past. There were a lot of parallel sessions, so I only got a partial view, and the acoustics of the Mint are TERRIBLE for someone my age, so I may have missed parts even of the sessions I was in. Below the fold, some initial reactions.

Thursday, August 2, 2018

Shitcoin And The Lightning Network

The Lightning Network is an overlay on the Bitcoin network, intended to remedy the fact that Bitcoin is unusable for actual transactions. Andreas Brekken, of shitcoin.com, tried installing, running and using a node. He describes his experience in four blog posts:
  1. Can I compile and run a node?
  2. We must first become the Lightning Network
  3. Paying for goods and services
  4. What happens when you close half of the Lightning Network?
Brekken's final TL;DR was “Operating the largest node on the Bitcoin Lightning Network has been educational, frustrating, fun, and at times terrifying. I look forward to trying it again once the technology matures.” Below the fold I look into some of the details.

Tuesday, July 31, 2018

Amazon's Margins Again

AMZN operating margins
For six years I've been pointing out that economies of scale allow for the astonishing margins Amazon enjoys on S3 and the rest of AWS. Now, This is the Amazon everyone should have feared — and it has nothing to do with its retail business by Jason Del Rey and Rani Molla documents AWS' margins in this table.
Amazon’s $52.9 billion of revenue in the second quarter of the year came in a tad below what Wall Street analysts expected — and that doesn’t matter whatsoever.

That’s because the massive online retailer once again posted its largest quarterly profit in history — $2.5 billion for the quarter — on the back of two businesses that were afterthoughts just a few years ago: Amazon Web Services, its cloud computing unit, as well as its fast-growing advertising business.
Below the fold, I discuss one of the implications of these amazing margins.

Tuesday, July 17, 2018

DINO and IINO

One of the things that I, as an observer of the blockchain scene, find fascinating is how the various heists illuminate the deficiencies of actual, as opposed to the Platonic ideal, blockchain-based systems.

I've been writing for more than 4 years that, at scale, blockchains are DINO (Decentralized In Name Only) because irresistible economies of scale drive centralization. Now, a heist illuminates that, in practice, "smart contracts" such as those on the Ethereum blockchain (which is DINO) are also IINO (Immutable In Name Only). Follow me below the fold for the explanation.

Monday, July 9, 2018

School's out (meta)

Grandkids are sick, or didn't get into the camp their parents wanted, so blogging will be close to non-existent for a while. Sorry about that!

Tuesday, July 3, 2018

Special Report on Decentralizing the Internet (Updated)

The Economist's June 30th issue features a special report from Ludwig Siegele entitled How to fix what has gone wrong with the internet consisting of the following articles:
I really like the way The Economist occasionally allows its writers to address a topic at length. Siegele provides a good overview of what has gone wrong and the competing views of how to fix it. Below the fold, my overall critique, and commentary on some of the articles.

Monday, July 2, 2018

Josh Marshall on Facebook

Last September in Josh Marshall on Google, I wrote:
a quick note to direct you to Josh Marshall's must-read A Serf on Google's Farm. It is a deep dive into the details of the relationship between Talking Points Memo, a fairly successful independent news publisher, and Google. It is essential reading for anyone trying to understand the business of publishing on the Web.
Marshall wasn't happy with TPM's deep relationship with Google. In Has Web Advertising Jumped The Shark? I quoted him:
We could see this coming a few years ago. And we made a decisive and longterm push to restructure our business around subscriptions. So I'm confident we will be fine. But journalism is not fine right now. And journalism is only one industry the platform monopolies affect. Monopolies are bad for all the reasons people used to think they were bad. They raise costs. They stifle innovation. They lower wages. And they have perverse political effects too. Huge and entrenched concentrations of wealth create entrenched and dangerous locuses of political power.
Have things changed? Follow me below the fold.

Friday, June 29, 2018

Cryptocurrencies Have Limits

The Economic Limits Of Bitcoin And The Blockchain by Eric Budish is an important analysis of the economics of two kinds of "51% attack" on Bitcoin and other cryptocurrencies, such as those becoming endemic on Bitcoin Gold and other alt-coins:
  • A "double spend" attack, in which an attacker spends cryptocurrency to obtain goods, then makes the spend disappear in order to spend the cryptocurrency again.
  • A "sabotage" attack, in which short-sellers discredit the cryptocurrency to reduce its value.
Below the fold, some commentary on Budish's paper.

Thursday, June 28, 2018

Rate limits

Andrew Marantz writes in Reddit and the Struggle to Detoxify the Internet:
[On 2017's] April Fools’, instead of a parody announcement, Reddit unveiled a genuine social experiment. It was called r/Place, and it was a blank square, a thousand pixels by a thousand pixels. In the beginning, all million pixels were white. Once the experiment started, anyone could change a single pixel, anywhere on the grid, to one of sixteen colors. The only restriction was speed: the algorithm allowed each redditor to alter just one pixel every five minutes. “That way, no one person can take over—it’s too slow,” Josh Wardle, the Reddit product manager in charge of Place, explained. “In order to do anything at scale, they’re gonna have to coöperate."
The r/Place experiment successfully forced coöperation, for example with r/AmericanFlagInPlace drawing a Stars and Stripes, or r/BlackVoid trying to rub out everything:
Toward the end, the square was a dense, colorful tapestry, chaotic and strangely captivating. It was a collage of hundreds of incongruous images: logos of colleges, sports teams, bands, and video-game companies; a transcribed monologue from “Star Wars”; likenesses of He-Man, David Bowie, the “Mona Lisa,” and a former Prime Minister of Finland. In the final hours, shortly before the experiment ended and the image was frozen for posterity, BlackVoid launched a surprise attack on the American flag. A dark fissure tore at the bottom of the flag, then overtook the whole thing. For a few minutes, the center was engulfed in darkness. Then a broad coalition rallied to beat back the Void; the stars and stripes regained their form, and, in the end, the flag was still there.
What is important about the r/Place experiment? Follow me below the fold for an explanation.
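The rule Wardle describes is a per-user rate limit. Here is a minimal Python sketch of one. It assumes a 1000 by 1000 canvas stored as one byte per pixel, with color 0 standing for white. The class and method names are hypothetical, and this is not Reddit's implementation:

```python
import time
from typing import Callable, Dict, Optional


class PixelRateLimiter:
    """Let each user make at most one change per `interval` seconds."""

    def __init__(self, interval: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval          # five minutes, per the quote
        self.clock = clock                # injectable so tests need not wait
        self.last_change: Dict[str, float] = {}

    def try_change(self, user: str) -> bool:
        now = self.clock()
        last = self.last_change.get(user)
        if last is not None and now - last < self.interval:
            return False                  # too soon; the timer is not reset
        self.last_change[user] = now
        return True


class Canvas:
    """A hypothetical r/Place-style grid: size x size pixels, `colors` colors."""

    def __init__(self, size: int = 1000, colors: int = 16,
                 limiter: Optional[PixelRateLimiter] = None) -> None:
        self.size = size
        self.colors = colors
        self.pixels = bytearray(size * size)   # assumption: color 0 is white
        self.limiter = limiter if limiter is not None else PixelRateLimiter()

    def place(self, user: str, x: int, y: int, color: int) -> bool:
        """Set one pixel if the user's rate limit allows; report success."""
        if not (0 <= x < self.size and 0 <= y < self.size
                and 0 <= color < self.colors):
            raise ValueError("pixel or color out of range")
        if not self.limiter.try_change(user):
            return False
        self.pixels[y * self.size + x] = color
        return True
```

The limit applies per user, not per pixel. Any single redditor is therefore very slow, but a thousand redditors working together are a thousand times faster. That is what pushes large projects toward coöperation.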

Thursday, June 21, 2018

Software Heritage Archive Goes Live

June 7th was a big day for software preservation; it was the formal opening of Software Heritage's archive. Congratulations to Roberto Di Cosmo and the team! There's a post on the Software Heritage blog with an overview:
Today, June 7th 2018, we are proud to be back at Unesco headquarters to unveil a major milestone in our roadmap: the grand opening of the doors of the Software Heritage archive to the public (the slides of the presentation are online). You can now look at what we archived, exploring the largest collection of software source code in the world: you can explore the archive right away, via your web browser. If you want to know more, an upcoming post will guide you through all the features that are provided and the internals backing them.
Morane Gruenpeter's Software Preservation: A Stepping Stone for Software Citation is an excellent explanation of the role that Software Heritage's archive plays in enabling researchers to cite software:
In recent years software has become a legitimate product of research gaining more attention from the scholarly ecosystem than ever before, and researchers feel increasingly the need to cite the software they use or produce. Unfortunately, there is no well established best practice for doing this, and in the citations one sees used quite often ephemeral URLs or other identifiers that offer little or no guarantee that the cited software can be found later on.

But for software to be findable, it must have been preserved in the first place: hence software preservation is actually a prerequisite of software citation.
The importance of preserving software, and in particular open source software, is something I've been writing about for nearly a decade. My initial post about the Software Heritage Foundation started:
Back in 2009 I wrote:
who is to say that the corpus of open source is a less important cultural and historical artifact than, say, romance novels.
Back in 2013 I wrote:
Software, and in particular open source software is just as much a cultural production as books, music, movies, plays, TV, newspapers, maps and everything else that research libraries, and in particular the Library of Congress, collect and preserve so that future scholars can understand our society.
Please support this important work by donating to the Software Heritage Foundation.

Tuesday, June 19, 2018

The Four Most Expensive Words in the English Language

There are currently a number of attempts to deploy a cryptocurrency-based decentralized storage network, including MaidSafe, FileCoin, Sia and others. Distributed storage networks have a long history, and decentralized, peer-to-peer storage networks a somewhat shorter one. None have succeeded; Amazon's S3 and all other successful network storage systems are centralized.

Despite this history, initial coin offerings for these nascent systems have raised incredible amounts of "money", if you believe the heavily manipulated "markets". According to Sir John Templeton the four words are "this time is different". Below the fold I summarize the history, then ask what is different this time, and how expensive is it likely to be?

Tuesday, June 12, 2018

No-one could have predicted ...

... the threats posed by information technology to civil liberties. But my friend Robert G. Kennedy III came close. In April 1989 he wrote Technological Threats To Civil Liberties. From almost 30 years later it is an amazingly perceptive piece. Here are two samples to encourage you to read the whole thing:
An alarming synergy could occur when debit card data is accessed by connectionist machines (neural networks) for business applications. There are patterns to our behavior (economic and otherwise) of which we ourselves might be unaware; these can be extracted by neural nets without the need for formal rules, models, or a priori knowledge. A net is very, very good at pattern inference and recognition. ... One can see the potential for some truly subtle forms of embezzlement, irresistable invasive advertising keyed to surreptitiously acquired psychological profiles, or consumer fraud on a grand scale, among other things.
and:
An executive I know has told me of an office surveillance/attendance system being installed at his company, along the same lines as home security systems. Commercial versions have been on the market for over a year. It uses interactive badges and scanners, sort of transponders-in-an-ID, to track the location, time, and identity of personnel in a building: sort of an electronic leash. (He confided that it is silly to treat employees as bar-coded merchandise; for my part, I was polite enough not to mention the phrase, "Big Brother".)
As you read, remember that it was written two-and-a-half years before the first US Web page went up (which was around 6th Dec. 1991).

Thursday, June 7, 2018

The Island of Misfit Toys

The Berkman Center's Jonathan Zittrain has a New York Times editorial entitled From Westworld to Best World for the Internet of Things that starts:
Last month the F.B.I. issued an urgent warning: Everyone with home internet routers should reboot them to shed them of malware from “foreign cyberactors.”
Below the fold, some details and a critique of Zittrain’s proposals for improving the IoT.

Tuesday, June 5, 2018

Cryptographers on Blockchains: Part 2 (updated)

Back in April I wrote Cryptographers on Blockchains; they weren't enthusiastic. It is time for some more of the same, so follow me below the fold.

Thursday, May 31, 2018

Recreational Bugs

At the San Diego Usenix in January 1989 I presented Visualizing X11 Clients, a paper written by David Lemke and myself. In email conversation about his Pie Menus: A 30 Year Retrospective, Don Hopkins unearthed the script for the talk I gave, which I posted to the "xpert@athena.mit.edu" mail list. To record the script for posterity, a slightly edited version is below the fold.

Don also unearthed A Window Manager for Bitmapped Displays and Unix, the paper James Gosling and I wrote describing the Andrew window manager for the Alvey Workshop at Cosener's House, Abingdon (29th April to 1st May 1985) (DOI). The entire workshop proceedings were subsequently published as Methodology of Window Management, and are online here. The Andrew window manager tiled the screen with windows because, as the quote at the head of the paper said:
You will get a better Gorilla effect if you use as big a piece of paper as possible. Kunihiko Kasahara, Creative Origami.
In retrospect, this wasn't a great idea.

Tuesday, May 29, 2018

Pie Menus

Don's NeWS Pie Menu
IIRC it is 1988, and James Gosling and I are in the Sun Microsystems booth at SIGGRAPH demo-ing the NeWS window system. Don Hopkins walks up with a tape cartridge in his hand and says "load this". Knowing Don, we do, and all of a sudden all the menus in the system are transformed from the conventional pull-right rectangles to circles divided into pie-slices. And Don, at that time the most caffeinated person I'd ever met, is blazing through the menus faster than we've ever seen before.

Why am I writing this thirty years later? Follow me below the fold.

Thursday, May 24, 2018

How Far Is Far Enough?

When collecting an individual web site for preservation by crawling it is necessary to decide where its edges are, which links encountered are "part of the site" and which are links off-site. The crawlers use "crawl rules" to make these decisions. A simple rule would say:
Collect all URLs starting https://www.nytimes.com/
NoScript on http://nytimes.com
If a complex "site" is to be properly preserved the rules need to be a lot more complex. The image shows the start of the list of DNS names from which the New York Times home page embeds resources. Preserving this single page, let alone the "whole site", would need resources from at least 17 DNS names. Rules are needed for each of these names. How are all these more complex rules generated? Follow me below the fold for the answer, and news of an encouraging recent development.
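One simple way to express such rules is an ordered list of include/exclude patterns. The first matching rule decides, and any URL with no matching rule is treated as off-site. The Python sketch below works under those assumptions. The rule format and the two extra host names are hypothetical placeholders, not what any particular crawler or the New York Times actually uses. The `hosts_without_rules` helper illustrates the chore the text describes: listing the DNS names a page embeds resources from that no rule covers yet.

```python
import re
from urllib.parse import urlsplit

# Ordered (action, pattern) rules; the first match wins.
# The first rule mirrors the simple rule in the text; the others use
# hypothetical hosts under the reserved .example TLD.
RULES = [
    ("include", re.compile(r"^https://www\.nytimes\.com/")),
    ("include", re.compile(r"^https://static\.cdn\.example/")),   # hypothetical CDN
    ("exclude", re.compile(r"^https://ads\.tracker\.example/")),  # hypothetical ad host
]


def matching_action(url, rules=RULES):
    """Return 'include' or 'exclude' for the first matching rule, else None."""
    for action, pattern in rules:
        if pattern.match(url):
            return action
    return None


def in_scope(url, rules=RULES):
    """Collect only URLs that a rule explicitly includes."""
    return matching_action(url, rules) == "include"


def hosts_without_rules(embedded_urls, rules=RULES):
    """DNS names of embedded resources that no rule yet covers either way."""
    return sorted({urlsplit(u).hostname for u in embedded_urls
                   if matching_action(u, rules) is None})
```

Feeding `hosts_without_rules` the resource URLs a page actually loads shows which names still need a decision. Every new name requires someone to choose include or exclude.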

Tuesday, May 22, 2018

ASICs and Mining Centralization

Three and a half years ago, as part of my explanation of why peer-to-peer networks that were successful would become centralized, I wrote in Economies of Scale in Peer-to-Peer Networks:
When new, more efficient technology is introduced, thus reducing the cost per unit contribution to a P2P network, it does not become instantly available to all participants. As manufacturing ramps up, the limited supply preferentially goes to the manufacturers best customers, who would be the largest contributors to the P2P network. By the time supply has increased so that smaller contributors can enjoy the lower cost per unit contribution, the most valuable part of the technology's useful life is over.
I'm not a blockchain insider. But now, in a blockbuster post, a real insider, David Vorick, the lead developer of Sia, a blockchain-based cloud storage platform, makes it clear that the effect I described has long dominated Bitcoin and other blockchains, and that it has led to centralization in the market for mining hardware:
The biggest takeaway from all of this is that mining is for big players. The more money you spend, the more of an advantage you have, and there’s not an easy way to change that equation. At least with traditional Nakamoto style consensus, a large entity that produces and controls most of the hashrate seems to be more or less the outcome, and at the very best you get into a situation where there are 2 or 3 major players that are all on similar footing. But I don’t think at any point in the next few decades will we see a situation where many manufacturing companies are all producing relatively competitive miners. Manufacturing just inherently leads to centralization, and it happens across many different vectors.
Below the fold, the details.

Wednesday, May 16, 2018

Shorter talk at MSST2018

I was invited to give both a longer and a shorter talk at the 34th International Conference on Massive Storage Systems and Technology at Santa Clara University. Below the fold is the text, with links to the sources, of the shorter talk, which was updated from, and shares its title with, DNA's Niche in the Storage Market.

Longer talk at MSST2018

I was invited to give both a longer and a shorter talk at the 34th International Conference on Massive Storage Systems and Technology at Santa Clara University. Below the fold is the text, with links to the sources, of the longer talk, which was updated from, and shares its title with, The Medium-Term Prospects for Long-Term Storage Systems.

Monday, May 14, 2018

Blockchain for Peer Review

An initiative has started in the UK called Blockchain for Peer Review. It claims:
The project will develop a protocol where information about peer review activities (submitted by publishers) are stored on a blockchain. This will allow the review process to be independently validated, and data to be fed to relevant vehicles to ensure recognition and validation for reviewers.  By sharing peer review information, while adhering to laws on privacy, data protection and confidentiality, we will foster innovation and increase interoperability.
Everything about this makes sense and could be implemented with a database run by a trusted party, as for example CrossRef does for DOI resolution. Implementing it with a blockchain is effectively impossible. Follow me below the fold for the explanation.

Tuesday, May 8, 2018

Prof. James Morris: "One Last Lecture"

The most important opportunity in my career was when Prof. Bob Sproull, then at Xerox PARC, suggested that I should join the Andrew Project (paper) then just starting at Carnegie-Mellon and run by Prof. James (Jim) Morris. The two years I spent working with Jim and the incredibly talented team he assembled (James Gosling, Mahadev Satyanarayanan, Nathaniel Borenstein, ...) changed my life.

Jim's final lecture at CMU is full of his trademark insights and humor, covering the five mostly CMU computing pioneers who influenced his career. You should watch the whole hour-long video, but below the fold I have transcribed a few tastes:

Monday, May 7, 2018

Might Need Some Work

"I Agree" - Source
Cory Doctorow writes:
"I Agree" is Dima Yarovinsky's art installation for Visualizing Knowledge 2018, with printouts of the terms of service for common apps on scrolls of colored paper, creating a bar chart of the fine print that neither you, nor anyone else in the history of the world, has ever read.
Earlier, Doctorow explained that the GDPR requires that:
Under the new directive, every time a European's personal data is captured or shared, they have to give meaningful consent, after being informed about the purpose of the use with enough clarity that they can predict what will happen to it.

Wednesday, May 2, 2018

"Privacy Is No Longer A Social Norm"

It is widely believed that in 2010 Mark Zuckerberg said "Privacy is no longer a social norm" but apparently that wasn't exactly what he said. Below the fold, I take off from this and other misquotes to look at our home-town's major industry, surveillance. Facebook (now headquartered in Menlo Park) has been getting all the attention recently, but they probably know less about you than Palantir Technologies, still headquartered in Palo Alto.

Monday, April 30, 2018

Michael Nelson's Fifteen Minutes Of Fame

We interrupt our regularly scheduled blogging for this special announcement. Go read Michael Nelson's post Why we need multiple web archives: the case of blog.reidreport.com right now! It's a detailed account, in several updates, of the forensic analysis of Joy-Ann Reid's claim that either her blog or the Internet Archive was hacked. Michael's work landed him a spot on CNN at 0930 on April 29th. He did an excellent job of explanation. Half an hour later Reid walked back her claims.

Michael is right about the importance of multiple independent Web archives; once again the Lots Of Copies Keep Stuff Safe principle. But the economics of this multiplicity are problematic.

Thursday, April 26, 2018

Cryptographers On Blockchains

David Gerard's April 21st blog post is a real linkfest. Below the fold, commentary on four of the links.

Tuesday, April 24, 2018

All Your Tweets Are Belong To Kannada

Gerd Badur  CC BY-SA 3.0, Source
Sawood Alam and Plinio Vargas have a fascinating blog post documenting their investigation into why:
47% of mementos of Barack Obama's Twitter page were in non-English languages, almost half of which were in Kannada alone. While language diversity in web archives is generally a good thing, in this case though, it is disconcerting and counter-intuitive.
Kannada is an Indian language spoken by only about 38 million people. Below the fold, some commentary.

Thursday, April 12, 2018

Your Tax Dollars At Work

When I was writing Pre-publication Peer Review Subtracts Value, Springer wanted to charge me $39.95 for access to Comparing Published Scientific Journal Articles to Their Pre-print Versions by Martin Klein et al. This despite the fact that the copyright notice said:
This is a U.S. government work and its text is not subject to copyright protection in the United States
Fortunately, you can now follow the link to the final version at arXiv.org. I'm not the only one annoyed by the publishers charging for access to papers not subject to copyright. Below the fold, some more on this scam.

Tuesday, April 10, 2018

Natural Redundancy

Most uncompressed files contain significant redundancy, which is why they can be made smaller. Compression algorithms work by removing redundancy, and the better the algorithm, the less redundancy is left in the output. If the files are then stored for the long term, they need to be protected, for example by erasure coding, which adds some redundancy back. In Exploiting Source Redundancy to Improve the Rate of Polar Codes, Ying Wang, Krishna R. Narayanan and Anxiao (Andrew) Jiang of Texas A&M explore using the original redundancy to reduce the amount of protection redundancy needed for a given level of reliability. Below the fold, some commentary.
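Both halves of that trade-off are easy to see in miniature. The Python sketch below first uses zlib to show that repetitive input shrinks dramatically while random bytes don't. It then uses the simplest erasure code, a single XOR parity block as in RAID-5, to add redundancy back so that any one lost block can be rebuilt. It only illustrates the trade-off; it is not the polar-code technique from the paper, and the function names are my own.

```python
import os
import zlib
from functools import reduce


def compressed_ratio(data: bytes) -> float:
    """Compressed size over original size; lower means more redundancy removed."""
    return len(zlib.compress(data, 9)) / len(data)


def xor_parity(blocks):
    """One parity block: the byte-wise XOR of equal-length data blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover(blocks_with_gap, parity):
    """Rebuild the single missing block (passed as None) from the rest plus parity."""
    present = [b for b in blocks_with_gap if b is not None]
    return xor_parity(present + [parity])


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog. " * 200
    print("repetitive text:", compressed_ratio(text))
    print("random bytes:   ", compressed_ratio(os.urandom(len(text))))
```

A single parity block costs one extra block per group and survives the loss of any one block. Stronger codes add more redundancy to survive more losses, and that is the overhead the paper tries to reduce by exploiting what compression left behind.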

Monday, April 9, 2018

John Perry Barlow RIP

By Mohamed Nanabhay
from Qatar CC BY 2.0
Vicky Reich and I were both acquainted with John Perry Barlow in the 90s; we met at one of the parties he threw at the DNA Lounge. He was perhaps the most charismatic person I've ever encountered. So we were anxious to attend the symposium the EFF and the Internet Archive organized last Saturday to honor one aspect of his life, his writing and activism around civil liberties in cyberspace.

The Economist, The Guardian and the New York Times had good obituaries, but they mentioned only his Declaration of the Independence of Cyberspace among his writings. It was undoubtedly an important rallying-cry at the time, but it should not be allowed to overshadow his other cyberspace-related writings, thankfully collected by the EFF in the John Perry Barlow Library. Below the fold, the one I would have chosen.

Thursday, April 5, 2018

Emulating Stephen Hawking's Voice

Jason Fagone at the San Francisco Chronicle has a fascinating story of heroic, successful (and timely) emulation in The Silicon Valley quest to preserve Stephen Hawking’s voice. It's the story of a small team which started work in 2009 trying to replace Hawking's voice synthesizer with more modern technology. Below the fold, some details to get you to read the whole article.

Tuesday, April 3, 2018

Falling Research Productivity

Are Ideas Getting Harder to Find? by Nicholas Bloom et al looks at the history of investment in R&D and its effect on the product across several industries. Their main example is Moore's Law, and they show that [page 19]:
research effort has risen by a factor of 18 since 1971. This increase occurs while the growth rate of chip density is more or less stable: the constant exponential growth implied by Moore’s Law has been achieved only by a massive increase in the amount of resources devoted to pushing the frontier forward.

Assuming a constant growth rate for Moore’s Law, the implication is that research productivity has fallen by this same factor of 18, an average rate of 6.8 percent per year.

If the null hypothesis of constant research productivity were correct, the growth rate underlying Moore’s Law should have increased by a factor of 18 as well. Instead, it was remarkably stable. Put differently, because of declining research productivity, it is around 18 times harder today to generate the exponential growth behind Moore’s Law than it was in 1971.
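Taking the quoted figures at face value, the 6.8 percent is just the factor of 18 spread evenly over the period as a continuous exponential rate of decline g. The span T that this implies is my own arithmetic, not a number quoted from the paper:

```latex
\frac{P_{1971}}{P_{1971+T}} = e^{gT} = 18
\quad\Longrightarrow\quad
T = \frac{\ln 18}{g} = \frac{2.890}{0.068} \approx 42.5\ \text{years}
```

Here P is research productivity. Assuming annual rather than continuous compounding, the same figures give T = ln 18 / (-ln 0.932), or about 41 years. Either way the numbers are consistent with a period running from 1971 into the 2010s.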
Below the fold, some commentary on this and other relevant research.

Thursday, March 29, 2018

Flash vs. Disk (Again)

Gartner's graph
Chris Mellor's NAND chips are going to stay too pricey for flash to slit disk's throat... is based on analysis from Gartner. It continues the theme that I've been stressing for quite some time; flash will not displace hard disk from the bulk storage layer of the hierarchy in the medium term. Follow me below the fold for some commentary, and more graphs.

Wednesday, March 28, 2018

Bitcoin: The Future World Currency?

I had a lot of fun applying arithmetic to DNA's prospects as a storage medium. Jamie Powell must have had just as much fun applying arithmetic to the prospect of Bitcoin becoming the world's currency in Sorry Jack, Bitcoin will not become the global currency, which is part of FT Alphaville's excellent new Someone is wrong on the Internet series. Below the fold, some of the entertainment.

Tuesday, March 27, 2018

Bad Blockchain Content

A Quantitative Analysis of the Impact of Arbitrary Blockchain Content on Bitcoin by Roman Matzutt et al examines the stuff in the Bitcoin blockchain that isn't a monetary transaction. They:
provide the first systematic analysis of the benefits and threats of arbitrary blockchain content. Our analysis shows that certain content, e.g., illegal pornography, can render the mere possession of a blockchain illegal. Based on these insights, we conduct a thorough quantitative and qualitative analysis of unintended content on Bitcoin's blockchain. Although most data originates from benign extensions to Bitcoin's protocol, our analysis reveals more than 1600 files on the blockchain, over 99% of which are texts or images.
Below the fold, some details.

Thursday, March 22, 2018

Proofs of Space

Bram Cohen, the creator of BitTorrent, gave an EE380 talk entitled Stopping grinding attacks in proofs of space. Two aspects were really interesting:
  • A detailed critique of both the Proof of Work system used by most cryptocurrencies and blockchains, and schemes such as Proof of Stake that have been proposed to replace it.
  • An alternate scheme for securing blockchains based on combining Proof of Space with Verifiable Delay Functions.
But there was another aspect that concerned me. Follow me below the fold for details.
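For readers who haven't seen it spelled out, here is a toy Proof of Work in Python: find a nonce that makes the hash of a block header start with a given number of zero bits. It is a simplified sketch under my own assumptions (single SHA-256, an 8-byte nonce, difficulty counted in leading zero bits). Bitcoin's real rule uses double SHA-256 and a numeric target. What the sketch does show is the asymmetry at the heart of Proof of Work: mining takes about 2^d hash attempts on average, while verifying takes one.

```python
import hashlib
from typing import Optional


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def pow_hash(header: bytes, nonce: int) -> bytes:
    """Hash the header with an 8-byte big-endian nonce appended."""
    return hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()


def mine(header: bytes, difficulty_bits: int,
         max_nonce: int = 2**32) -> Optional[int]:
    """Brute-force the first nonce meeting the difficulty (expensive)."""
    for nonce in range(max_nonce):
        if leading_zero_bits(pow_hash(header, nonce)) >= difficulty_bits:
            return nonce
    return None


def verify(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Check a claimed nonce with a single hash (cheap)."""
    return leading_zero_bits(pow_hash(header, nonce)) >= difficulty_bits
```

Each extra bit of difficulty doubles the expected mining work but leaves verification unchanged. This is why security under Proof of Work is bought with a continuous expenditure of computation.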

Tuesday, March 20, 2018

Pre-publication Peer Review Subtracts Value

Pre-publication peer review is intended to perform two functions: to prevent bad science being published (gatekeeping), and to improve the science that is published (enhancement). Over the years I've written quite often about how the system is no longer "fit for purpose". It's time for another episode, drawing attention to two not-so-recent contributions.
Below the fold, the details.

Thursday, March 15, 2018

Ethics and Archiving the Web

I wanted to draw attention to what looks like a very interesting conference, Rhizome's National Forum on Ethics and Archiving the Web, March 22-24 at the New Museum in New York:
The dramatic rise in the public’s use of the web and social media to document events presents tremendous opportunities to transform the practice of social memory.

Web archives can serve as witness to crimes, corruption, and abuse; they are powerful advocacy tools; they support community memory around moments of political change, cultural expression, or tragedy. At the same time, they can cause harm and facilitate surveillance and oppression.

As new kinds of archives emerge, there is a pressing need for dialogue about the ethical risks and opportunities that they present to both those documenting and those documented. This conversation becomes particularly important as new tools, such as Rhizome’s Webrecorder software, are developed to meet the changing needs of the web archiving field.

Tuesday, March 13, 2018

The "Grand Challenges" of Curation and Preservation

I'm preparing for a meeting next week at the MIT Library on the "Grand Challenges" of digital curation and preservation. MIT, and in particular their library and press, have a commendable tradition of openness, so I've decided to post my input rather than submit it privately. My version of the challenges is below the fold.

Tuesday, March 6, 2018

Techno-hype part 2.5

Last November I wrote Techno-hype part 2 on cryptocurrencies and blockchains, reviewing David Gerard's excellent book Attack of the 50 Foot Blockchain: Bitcoin, Blockchain, Ethereum & Smart Contracts. A lot has happened since, so it's time for an update. Below the fold, I look at three examples of how far these technologies are from being "ready for prime time":
  • The Lightning Network, which is supposed to allow Bitcoin to scale to billions of transactions.
  • IOTA, which is supposed to be a blockchain capable of supporting the Internet of Things.
  • Ethereum, which is supposed to be the infrastructure for "smart contracts".

Thursday, March 1, 2018

Archival Media: Not a Good Business

Thinking more about DNA's Niche in the Storage Market led me to focus on some problems with the market for archival media in general, not just DNA. The details are below the fold.

Tuesday, February 27, 2018

"Nobody cared about security"

There's a common meme that ascribes the parlous state of security on the Internet to the fact that in the ARPAnet days "nobody cared about security". It is true that in the early days of the ARPAnet security wasn't an important issue; everybody involved knew everybody else face-to-face. But it isn't true that the decisions taken in those early days hampered the deployment of security as the Internet took the shape we know today in the late 80s and early 90s. In fact the design decisions taken in the ARPAnet days made the deployment of security easier. The main reason for today's security nightmares is quite different.

I know because I was there, and to a small extent involved. Follow me below the fold for the explanation.

Thursday, February 22, 2018

Brief Talk at Video Game Preservation Workshop

I was asked to give a brief talk to the Video Game Preservation Workshop: Setting the Stage for Multi-Partner Projects at the Stanford Library, discussing the technical and legal aspects of cooperation on preserving software via emulation. Below the fold is an edited text of the talk with links to the sources.

Tuesday, February 20, 2018

Notes from FAST18

I attended the technical sessions of Usenix's File And Storage Technology conference this week. Below the fold, notes on the papers that caught my attention.

Thursday, February 15, 2018

Do You Need A Blockchain?

David Gerard's Do you need a Blockchain? Probably less than Wüst and Gervais think you do reviews an interesting paper, Do you need a Blockchain? by Karl Wüst and Arthur Gervais of ETH Zurich. Their abstract says:
In this article we critically analyze whether a blockchain is indeed the appropriate technical solution for a particular application scenario. We differentiate between permissionless (e.g., Bitcoin/Ethereum) and permissioned (e.g. Hyperledger/Corda) blockchains and contrast their properties to those of a centrally managed database.
Gerard is, for him, pretty enthusiastic about the paper:
This paper is worth your time. They explain the jargon at length, and discuss many commonly-advocated blockchain use cases — it’s a useful survey of the area — even as the authors are huge Bitcoin and blockchain advocates, and somewhat more optimistic for applying blockchains than is really warranted.
Below the fold, I look at both the paper and Gerard's review.
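The heart of the Wüst and Gervais paper is a short decision flowchart. As I read it, the flowchart reduces to the chain of yes/no questions below. This is my paraphrase in Python, and the function and argument names are mine, so check it against the paper's figure before relying on it:

```python
def wust_gervais(store_state: bool, multiple_writers: bool,
                 online_trusted_third_party: bool, writers_known: bool,
                 writers_trusted: bool, public_verifiability: bool) -> str:
    """Paraphrase of the paper's 'do you need a blockchain?' flowchart."""
    if not store_state:
        return "no blockchain"               # nothing to record
    if not multiple_writers:
        return "no blockchain"               # a single writer can use a database
    if online_trusted_third_party:
        return "no blockchain"               # let the trusted party run a database
    if not writers_known:
        return "permissionless blockchain"
    if writers_trusted:
        return "no blockchain"               # trusted writers can share a database
    if public_verifiability:
        return "public permissioned blockchain"
    return "private permissioned blockchain"
```

Note how early most paths exit. If an always-online trusted third party is acceptable, the answer is a conventional database, regardless of how the later questions would be answered.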

Wednesday, February 14, 2018

Tuesday, February 13, 2018

Correlated Cryptojacking

On February 11 at least 4,275 Web sites were found to have been simultaneously cryptojacked:
they include The City University of New York (cuny.edu), Uncle Sam's court information portal (uscourts.gov), Lund University (lu.se), the UK's Student Loans Company (slc.co.uk), privacy watchdog The Information Commissioner's Office (ico.org.uk) and the Financial Ombudsman Service (financial-ombudsman.org.uk), plus a shedload of other .gov.uk and .gov.au sites, UK NHS services, and other organizations across the globe.

Manchester.gov.uk, NHSinform.scot, agriculture.gov.ie, Croydon.gov.uk, ouh.nhs.uk, legislation.qld.gov.au, the list goes on.
They were all running Coinhive's Monero miner in visitors' browsers. How and why did this happen and what should these sites have been doing to prevent it? Follow me below the fold.
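One widely recommended defense against a third-party script being altered behind a site's back is Subresource Integrity (SRI). The page pins a hash of the script it expects, and the browser refuses to run a file that doesn't match. Below is a minimal Python sketch of computing the `integrity` value. The helper names are mine. SHA-384 is one of the hash algorithms the SRI specification allows, alongside SHA-256 and SHA-512.

```python
import base64
import hashlib


def sri_hash(script_bytes: bytes, algorithm: str = "sha384") -> str:
    """Return a Subresource Integrity value of the form '<alg>-<base64 digest>'."""
    digest = hashlib.new(algorithm, script_bytes).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(src: str, script_bytes: bytes) -> str:
    """Build a <script> tag that a browser won't execute if the file changes."""
    return (f'<script src="{src}" integrity="{sri_hash(script_bytes)}" '
            f'crossorigin="anonymous"></script>')
```

Two caveats apply. For a cross-origin script, the browser also needs CORS, hence the `crossorigin` attribute. And pinning only works if the third party serves a fixed version of the file, because any legitimate update changes the hash just as an attack would.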

Monday, February 12, 2018

Lessons From Arquivo.pt

Daniel Gomes' video
I'd like to draw your attention to Daniel Gomes' excellent video entitled Improving the robustness of the Arquivo.pt web archive.

Arquivo.pt is the Portuguese Web Archive. It got started in 2007, and in 2010 was an early archive to support full-text search. In 2013 it suffered a hardware malfunction that took the service down and lost 17% of its content. This led to a complete re-think of the system architecture, implementation, and operations. Daniel describes this process and the encouraging results in detail. It is well worth the 20 minutes to watch it.

Daniel divides the re-think into 5 major sections:
  1. Hardware and software architecture shifted to shared-nothing
  2. Reinforced replication policies
  3. Monitor the service
  4. Quality assurance for software development
  5. Document and test procedures
I'd agree with all these points. Many of the details correspond to things the LOCKSS Program focused on during preparation for the TRAC audit of the CLOCKSS Archive in 2014. This is especially the case for the last of Daniel's sections; the audit forced us to document our processes, which forced us to think about whether they were actually achieving their goals, which led to the discovery that in a number of cases they weren't.
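Reinforced replication and service monitoring together imply a routine check. One way to do it is to hash every copy of an object held on independent, shared-nothing replicas and flag the odd one out. The Python sketch below does exactly that. The replica names and the simple majority rule are my assumptions, not Arquivo.pt's procedure, and LOCKSS's actual audit protocol is considerably more sophisticated than a vote.

```python
import hashlib
from collections import Counter
from typing import Dict


def fixity_audit(replicas: Dict[str, bytes]) -> Dict[str, str]:
    """Mark each replica's copy 'ok' or 'damaged' by majority agreement of hashes."""
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in replicas.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        # With no clear majority, automatic repair would be a guess.
        raise RuntimeError("no majority among replicas; needs human attention")
    return {name: ("ok" if d == majority_digest else "damaged")
            for name, d in digests.items()}
```

The check is only meaningful if the replicas really are independent. Copies that share hardware, software, or administration can be damaged together and still out-vote a good copy.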

Thursday, February 8, 2018

Meta: Blog Switched To HTTPS (Updated)

Because From July, Chrome will name and shame insecure HTTP websites, I followed the instructions Hamad Ansari provides in Blogger Released Free SSL (HTTPS) For Custom Domains and enabled both "connections over HTTPS" and "HTTPS redirect", so that:
http://blog.dshr.org/
gets redirected to:
https://blog.dshr.org/
Everything I've tried so far works. Please comment on this post if you find things that don't work.

Source
Update: Scott Helme points out that I'm just part of an encouraging trend. The graph shows the top million sites from Alexa in groups of 4,000. For each group, it shows the number of sites that are HTTPS (only, I believe). It shows that the pace of sites going HTTPS-only is increasing. The effect of Chrome's naming and shaming will presumably increase the rate of adoption further in July.

Tuesday, February 6, 2018

DNA's Niche in the Storage Market

I've been writing about storing data in DNA for the last five years, both enthusiastically about DNA's long-term prospects as a technology for storage, and pessimistically about its medium-term prospects. This time, I'd like to look at DNA storage systems as a product, and ask where their attributes might provide a fit in the storage marketplace.

As far as I know no-one has ever built a storage system using DNA as a medium, let alone sold one. Indeed, the only work I know on what such a system would actually look like is by the team from Microsoft Research and the University of Washington. Everything below the fold is somewhat informed speculation. If I've got something wrong, I hope the experts will correct me.

Thursday, January 25, 2018

Magical Thinking At The New York Times

Steven Johnson's Beyond The Bitcoin Bubble in the New York Times Magazine, a 9000-word explanation of how the blockchain can decentralize the Internet, appeared 5 days after my It Isn't About The Technology. That is a good thing, because otherwise my post would have had to be much longer to address his tome. Follow me below the fold for the part I would have had to add to it.

Tuesday, January 23, 2018

Herbert Van de Sompel's Paul Evan Peters Award Lecture

In It Isn't About The Technology, I wrote about my friend Herbert Van de Sompel's richly-deserved Paul Evan Peters award lecture entitled Scholarly Communication: Deconstruct and Decentralize?, but only in the context of the push to "decentralize the Web". I believe Herbert's goal for this lecture was to spark discussion. In that spirit, below the fold, I have some questions about Herbert's vision of a future decentralized system for scholarly communications built on existing Web protocols. They aren't about the technology but about how it would actually operate.

Thursday, January 18, 2018

Tuesday, January 16, 2018

Not Really Decentralized After All

Here are two more examples of the phenomenon that I've been writing about ever since Economies of Scale in Peer-to-Peer Networks more than three years ago, centralized systems built on decentralized infrastructure in ways that nullify the advantages of decentralization: