DSHR's Blog: 2018

Thursday, December 27, 2018

Securing The Hardware Supply Chain

This is the third part of a series about trust in digital content that might be called:

Is this the real life?
Is this just fantasy?

We are moving down the stack:

The first part was Certificate Transparency, about how we know we are getting content from the Web site we intended to.
The second part was Securing The Software Supply Chain, about how we know we're running the software we intended to, such as the browser that got the content whose certificate was transparent.
This part is about how we can know that the hardware the software we secured is running on is doing what we expect it to.

Below the fold, some rather long and scary commentary.

Meta: Impending Blog Slowdown

So far this year, I've been averaging 1.8 posts/week, but this rate can't be maintained for the near future. I'm sorry about that, but quite apart from grandparent duties over the festive season, there are several reasons:

Certificate Transparency and Securing The Software Supply Chain each took quite a lot of research and thought. The promised hardware follow-up looks like it will take significantly more.
I've been asked to write a report on using cloud storage for preservation. I will start on this after the New Year, and expect it will take only slightly less work than Emulation and Virtualization as Preservation Strategies.
I've been consistently critical of the prospects for decentralized Web applications such as Solid, decentralized storage services, and decentralized social networks. I need to find some time to try them out for myself and report on the experience, as Andreas Brekken did for the Lightning network.

So the 40-odd draft and note-form posts in my queue will just continue to gather dust for a while.

Tuesday, December 18, 2018

Securing The Software Supply Chain

This is the second part of a series about trust in digital content that might be called:

Is this the real life?
Is this just fantasy?

The first part was Certificate Transparency, about how we know we are getting content from the Web site we intended to. This part is about how we know we're running the software we intended to. This question, how to defend against software supply chain attacks, has been in the news recently:

A hacker or hackers sneaked a backdoor into a widely used open source code library with the aim of surreptitiously stealing funds stored in bitcoin wallets, software developers said Monday.

The malicious code was inserted in two stages into event-stream, a code library with 2 million downloads that's used by Fortune 500 companies and small startups alike. In stage one, version 3.3.6, published on September 8, included a benign module known as flatmap-stream. Stage two was implemented on October 5 when flatmap-stream was updated to include malicious code that attempted to steal bitcoin wallets and transfer their balances to a server located in Kuala Lumpur.

See also here and here. The good news is that this was a highly specific attack against a particular kind of cryptocurrency wallet software; things could have been much worse. The bad news is that, however effective they may be against some supply chain attacks, none of the techniques I discuss below the fold would defend against this particular attack.

Software Preservation Network

The Software Preservation Network has two major interrelated achievements under its belt already:

Below the fold I look at both of them.

Blockchain: What's Not To Like?

I gave a talk at the Fall CNI meeting entitled Blockchain: What's Not To Like? The abstract was:

We're in a period when blockchain or "Distributed Ledger Technology" is the Solution to Everything™, so it is inevitable that it will be proposed as the solution to the problems of academic communication and digital preservation. These proposals typically assume, despite the evidence, that real-world blockchain implementations actually deliver the theoretical attributes of decentralization, immutability, anonymity, security, scalability, sustainability, lack of trust, etc. The proposers appear to believe that Satoshi Nakamoto revealed the infallible Bitcoin protocol to the world on golden tablets; they typically don't appreciate or cite the nearly three decades of research and implementation that led up to it. This talk will discuss the mis-match between theory and practice in blockchain technology, and how it applies to various proposed applications of interest to the CNI audience.

Below the fold, an edited text of the talk with links to the sources, and much additional material. The colored boxes contain quotations that were on the slides but weren't spoken.

Update: the video of my talk has now been posted on YouTube and Vimeo.

Irina Bolychevsky on Solid

Although I'm an enthusiast for the idea of a decentralized Web, I've been consistently skeptical that the products proposed to implement it have viable businesses. Two months ago, in How solid is Tim’s plan to redecentralize the web?, Irina Bolychevsky (@redecentralize founder and self-described "product person") made related points. Below the fold, some commentary.

Selective Amnesia

Last year's series of posts and PNC keynote entitled The Amnesiac Civilization were about the threats to our cultural heritage from inadequate funding of Web archives, and the resulting important content that is never preserved. But content that Web archives do collect and preserve is also under a threat that can be described as selective amnesia. David Bixenspan's When the Internet Archive Forgets makes the important, but often overlooked, point that the Internet Archive isn't an elephant:

On the internet, there are certain institutions we have come to rely on daily to keep truth from becoming nebulous or elastic. Not necessarily in the way that something stupid like Verrit aspired to, but at least in confirming that you aren’t losing your mind, that an old post or article you remember reading did, in fact, actually exist. It can be as fleeting as using Google Cache to grab a quickly deleted tweet, but it can also be as involved as doing a deep dive of a now-dead site’s archive via the Wayback Machine. But what happens when an archive becomes less reliable, and arguably has legitimate reasons to bow to pressure and remove controversial archived material?
...
Over the last few years, there has been a change in how the Wayback Machine is viewed, one inspired by the general political mood. What had long been a useful tool when you came across broken links online is now, more than ever before, seen as an arbiter of the truth and a bulwark against erasing history.

Below the fold, some commentary on the vulnerability of Web history to censorship.

Certificate Transparency

Today is 2018's World Digital Preservation Day. It might appear that this post has little to do with digital preservation. However, I hope that the long post below the fold is the start of a series asking the simple question underlying not just digital preservation, but many areas of the digital world, "how do I know this digital content is real?" It is another of the many problems for which blockchain is touted as a solution by people lacking real-world understanding of either the problem or the technology, or both.

Cryptocurrency Collapse

Bitcoin 2-year "price" history

In the second half of last year Bitcoin experienced a massive pump-and-dump. The (heavily manipulated) "price" was pumped from under $2K in mid-July to almost $20K in mid-December. Then came the dump. Hannah Murphy's Bitcoin: Who really owns it, the whales or small fry? reports, based on data from Chainalysis, that in the dump phase:

longer-term holders sold at least $30 billion worth of bitcoin to new speculators over the December to April period, with half of this movement taking place in December alone.

“This was an exceptional transfer of wealth,” says Philip Gradwell, Chainalysis’ chief economist, who dubs the past six months as bitcoin’s “liquidity event”.

This dump drove the "price" down to the mid-$6K region by mid-June, where it stayed until mid-November. But in the last two weeks, things have become "dynamic". At 4pm yesterday on Coinbase the price of a bitcoin was $3718.96. It has lost more than 80% of its value this year. Below the fold, I look at why this might have happened.

John Wharton RIP

My friend John Wharton died last Wednesday of cancer. He was intelligent, eccentric, opinionated on many subjects, and occasionally extraordinarily irritating. He was important in the history of computing as the instruction set architect (PDF) of the Intel 8051, Intel's highest-volume microprocessor, and probably the most-implemented instruction set of all time, and as the long-time chair of the Asilomar Microcomputer Workshop. I attended AMW first in 1987, and nearly all years since. I served on the committee with John from 2000 through 2016, when grandparent duties forced a hiatus.

On hearing of his death, I thought to update his Wikipedia page, but found none. I collected much information from fellow Asilomar attendees, and drafted a page for him, which is has been published. Most of the best stories about John have no chance of satisfying Wikipedia's strict standards for sourcing and relevance, so I have collected some below the fold for posterity.

Cryptocurrencies' Seven Deadly Paradoxes

John Lewis of the Bank of England pens a must-read, well-linked summary of the problems of cryptocurrencies in The seven deadly paradoxes of cryptocurrency. Below the fold, a few comments on each of the seven.

Kids Today Have No Idea

One of the downsides of getting old is that every so often something triggers the Grumpy Grandpa. You kids have no idea what it was like back in the day! You need to watch Rob Pike's video to learn where the hardware and software you take for granted came from!

I'm eight years to the day older than Rob, so I got to work with even earlier technology than he did. As far as I know, Rob never encountered the IBM1401, the PDP-7 with its 340 display, the Titan and its time-sharing system, 7-hole paper tape and Flexowriters, or the horrible Data General Nova mini-computer. I never used an IBM System /360, but we did both work with CDC machines, and punch cards.

I think Rob and I started on PDP-11s at about the same time in 1975, me on RSX-11M at Imperial and Rob on Unix at Toronto. Rob was always much closer to the center of the Unix universe than I was in the UK, but the Unix history he recounts was mine too, from Version 6 on. Rob's talk is a must-watch video.

Thursday, November 8, 2018

What's Happening To Storage?

My only post about storage since May, was October's Betteridge's Law Violation, another critique of IDC's Digital Universe, and their constant pushing of the idea that the demand for storage is insatiable. So its time for an update on what is happening in the real world of storage media, instead of IDC's Universe. Below the fold, some quick takes.

Making PIEs Is Hard

In The Four Most Expensive Words In The English Language I wrote:

Since the key property of a cryptocurrency-based storage service is a lack of trust in the storage providers, Proofs of Space and Time are required. As Bram Cohen has pointed out, this is an extraordinarily difficult problem at the very frontier of research.

The post argues that the economics of decentralized storage services aren't viable, so the difficulty of Proofs of Space and Time isn't that important. All the same, this area of research is fascinating. Now, in One File for the Price of Three: Catching Cheating Servers in Decentralized Storage Networks Ethan Cecchetti, Ian Miers, and Ari Juels have pushed the frontier further out by inventing PIEs. Below the fold, some details.

Ithaka's Perspective on Digital Preservation

Oya Rieger of Ithaka S+R has published a report entitled The State of Digital Preservation in 2018: A Snapshot of Challenges and Gaps. In June and July Rieger:

talked with 21 experts and thought leaders to hear their perspectives on the state of digital preservation. The purpose of this report is to share a number of common themes that permeated through the conversations and provide an opportunity for broader community reaction and engagement, which will over time contribute to the development of an Ithaka S+R research agenda in these areas.

Below the fold, a critique.

Controlled Digital Lending

Three years ago in Emulation and Virtualization as Preservation Strategies I wrote about Controlled Digital Lending (CDL):

One idea that might be worth exploring as a way to mitigate the legal issues is lending. The Internet Archive has successfully implemented a lending system for their collection of digitized books; readers can check a book out for a limited period, and each book can be checked out to at most one reader at a time. This has not encountered much opposition from copyright holders.

A similar system for emulation would be feasible; readers would check out an emulation for a limited period, and each emulation could be checked out to at most one reader at a time. One issue would be dependencies. An archive might have, say, 10,000 emulations based on Windows 3.1. If checking out one blocked access to all 10,000 that might be too restrictive to be useful.

Now, Controlled Digital Lending by Libraries offers libraries the opportunity to:

better understand the legal framework underpinning CDL,
communicate their support for CDL, and
build a community of expertise around the practice of CDL.

Below the fold, some details.

Syndicating Journal Publisher Content

There's a lot of good information in Roger Schonfeld's Will Publishers Syndicate Their Content?. It starts:

The scholarly publishing sector has struggled to address the problems that users face in their discovery-to-access workflow and thereby stave off skyrocketing piracy. The top-line impact of these struggles is becoming clearer, starting with Elsevier’s absence from Germany. This makes the efforts to establish seamles single-platform access to all scholarly publications — equal in extent as Sci-Hub but legitimate, and which I term a Supercontinent of Scholarly Publishing — all the more urgent. The technical solutions are challenging, and at the STM meeting in Frankfurt last week it became clear that, although progress is being made, policy, governance, and competition issues may complicate the drive to consensus.

Schonfeld asserts that providing a seamless, uniform view of the publisher's content, whether paywalled or open access, requires two services:

First, it requires an ability to authorize appropriate access in a decentralized distribution environment. A Shared Entitlements System, as it is sometimes called, would be a kind of common authorization service for all publishers. As I will discuss below, there are at least two options for how Entitlements can be addressed. Second, it requires Distributed Usage Logging, which is to say the ability for all usage, wherever it takes place, to be “counted” in measuring the value of articles on behalf of authors and licenses on behalf of publishers.

Below the fold, a rather long explanation of why I think Schonfeld's analysis doesn't go far enough.

Gini Coefficients Of Cryptocurrencies

The Gini coefficient expresses a system's degree of inequality or, in the blockchain context, centralization. It therefore factors into arguments, like mine, that claims of blockchains' decentralization are bogus.

In his testimony to the US Senate Committee on Banking, Housing and Community Affairs' hearing on “Exploring the Cryptocurrency and Blockchain Ecosystem" entitled Crypto is the Mother of All Scams and (Now Busted) Bubbles While Blockchain Is The Most Over-Hyped Technology Ever, No Better than a Spreadsheet/Database, Nouriel Roubini wrote:

wealth in crypto-land is more concentrated than in North Korea where the inequality Gini coefficient is 0.86 (it is 0.41 in the quite unequal US): the Gini coefficient for Bitcoin is an astonishing 0.88.

The link is to Joe Weisenthal's How Bitcoin Is Like North Korea from nearly five years ago, which was based upon a Stack Exchange post, which in turn was based upon a post by the owner of the Bitcoinica exchange from 2011! Which didn't look at all holdings of Bitcoin, let alone the whole of crypto-land, but only at Bitcoinica's customers!

Follow me below the fold as I search for more up-to-date and comprehensive information. I'm not even questioning how Roubini knows the Gini coefficient of North Korea to two decimal places.

Betteridge's Law Violation

Erez Zadok points me to Wasim Ahmed Bhat's Is a Data-Capacity Gap Inevitable in Big Data Storage? in IEEE Computer. It is a violation of Betteridge's Law of Headlines because the answer isn't no. But what, exactly, is this gap? Follow me below the fold.

Software Heritage Foundation Update

I first wrote about the Software Heritage Foundation two years ago. It is four months since their Archive officially went live. Now Roberto di Cosmo and his collaborators have an article, and a video, entitled Building the Universal Archive of Source Code in Communications of the ACM describing their three challenges, of collection, preservation and sharing, and setting out their current status:

Software Heritage is an active project that has already assembled the largest existing collection of software source code. At the time of writing the Software Heritage Archive contains more than four billion unique source code files and one billion individual commits, gathered from more than 80 million publicly available source code repositories (including a full and up-to-date mirror of GitHub) and packages (including a full and up-to-date mirror of Debian). Three copies are currently maintained, including one on a public cloud.

As a graph, the Merkle DAG underpinning the archive consists of 10 billion nodes and 100 billion edges; in terms of resources, the compressed and fully de-duplicated archive requires some 200TB of storage space. These figures grow constantly, as the archive is kept up to date by periodically crawling major code hosting sites and software distributions, adding new software artifacts, but never removing anything. The contents of the archive can already be browsed online, or navigated via a REST API.

I have always believed, as I wrote in 2013:

Software, and in particular open source software is just as much a cultural production as books, music, movies, plays, TV, newspapers, maps and everything else that research libraries, and in particular the Library of Congress, collect and preserve so that future scholars can understand our society.

I'm very disappointed that national libraries haven't accepted this argument, let alone the argument that preservation and access to their other digital collections largely depend on preserving and providing access to open source software. Since they have failed in this task, it is up to the Software Heritage Foundation to step into the breach.

You can find out more at their Web site, and support this important work by donating.

Thursday, October 11, 2018

I'm Shocked, Shocked To Find Collusion Going On

The security of a permissionless peer-to-peer system generally depends upon the assumption of uncoordinated choice, the idea that each peer acts independently upon its own view of the system's state. Vitalik Buterin, a co-founder of Ethereum, wrote in The Meaning of Decentralization:

In the case of blockchain protocols, the mathematical and economic reasoning behind the safety of the consensus often relies crucially on the uncoordinated choice model, or the assumption that the game consists of many small actors that make decisions independently.

Another way of saying this is that the system isn't secure if enough peers collude with each other. Below the fold, I look at why this is a big problem.

Click On The Llama

There was lots of great stuff at the Internet Archive's Annual Bash. But for those of us who can remember the days before PCs played music, the highlight was right at the end of the presentations when the awesome Jason Scott introduced the port of 1997's WinAmp to the Web. Two years earlier:

WinPlay3 was the first real-time MP3 audio player for PCs running Windows, both 16-bit (Windows 3.1) and 32-bit (Windows 95). Prior to this, audio compressed with MP3 had to be decompressed prior to listening.

Source

WinPlay3 was the first, but it was bare-bones.It was WinAmp that really got people to realize that the PC was a media device. But the best part was that WinAmp was mod-able. It unleashed a wave of creativity (Debbie does WinAmp, anyone?), now preserved in the Archive's collection of over 5,000 WinAmp skins!

Jason has the details in his blog post Don't Click on the Llama:

Thanks to Jordan Eldredge and the Webamp programming community for this new and strange periscope into the 1990s internet past.

When I first clicked on the llama on The Swiss Family Robinson on my Ubuntu desktop the sound ceased. It turns out that the codec selection mechanism is different between the regular player and WinAmp, and it needed a codec I didn't have installed. The fix was:

sudo apt install ubuntu-restricted-extras

Source

I should also note that the Archive's amazing collection of emulations now includes the Commodore 64 (Jason's introduction is here), and 1,100 additional arcade machines.

Thursday, October 4, 2018

I Don't Really Want To Stop The Show

But I thought you might like to know,
...
It was twenty years ago today that Vicky Reich and I walked into Mike Keller's office in the Stanford Library and got the go-ahead to start the LOCKSS Program. I told the story of its birth five years ago.

Over the last couple of years, as we retired, the program has migrated from being an independent operation under the umbrella of the Stanford Library, to being one of the programs run by the Library's main IT operation, Tom Cramer's DLSS. The transition will shortly be symbolized by a redesigned website (its predecessor looked like this).

Now we are retired, on my blog there are lists of Vicky's and my publications from 1981 on (the LOCKSS ones start in 2000), and talks from 2006 on.

Thanks again to the NSF, Sun Microsystems, and the Andrew W. Mellon Foundation for the funding that allowed us to develop the system. Many thanks to the steadfast support of the libraries of the LOCKSS Alliance, and the libraries and publishers of the CLOCKSS Archive, that has sustained it in production. Special thanks to Don Waters for facilitating the program's evolution off grant funding, and to Margaret Kim for the original tortoise logo.

PS - Google is just one week older. Vicky was the librarian on the Stanford Digital Library Project with Larry Page and Sergey Brin that led to Google.

Wednesday, October 3, 2018

Brief Talk At Internet Archive Event

Vicky Reich gave a brief talk at the Building A Better Web: The Internet Archive’s Annual Bash. She followed Jefferson Bailey's talk, which reported that the Internet Archive's efforts to preserve the journals have already accumulated full text and metadata of nearly 8.7M articles, of which nearly 1.5M are from "at-risk" small journals. This is around 10% of the entire academic literature.

Below the fold, an edited text of Vicky's talk with links to the sources.

Bitcoin's Academic Pedigree

Bitcoin's Academic Pedigree (also here) by Arvind Narayanan and Jeremy Clark starts:

If you've read about bitcoin in the press and have some familiarity with academic research in the field of cryptography, you might reasonably come away with the following impression: Several decades' worth of research on digital cash, beginning with David Chaum, did not lead to commercial success because it required a centralized, banklike server controlling the system, and no banks wanted to sign on. Along came bitcoin, a radically different proposal for a decentralized cryptocurrency that didn't need the banks, and digital cash finally succeeded. Its inventor, the mysterious Satoshi Nakamoto, was an academic outsider, and bitcoin bears no resemblance to earlier academic proposals.

They comprehensively debunk this view, showing that each of the techniques Nakamoto used had been developed over the preceding three decades of academic research, and that Nakamoto's brilliant contribution was:

the specific, complex way in which the underlying components are put together.

Below the fold, details on the specific techniques.

Web Archives As Evidence

In Blockchain Solves Preservation! I critiqued John Collomosse et al's ARCHANGEL: Trusted Archives of Digital Public Documents. They argue that
integrity validation via hashes is needed because:

Document integrity is fundamental to public trust in archives. Yet currently that trust is built upon institutional reputation — trust at face value in a centralised authority, like a national government archive or University.

But they also write that:

acceptance of content evidence might eventually become similar to acceptance of DNA evidence in court, but that establishing that level of confidence would require strong public engaged to explain Blockchain in an accessible manner particularly explaining why one could trust the cryptographic assurances inherent in a DLT solution.

At least as far as courts are concerned, they're wrong about both "face value" and how trust is established. Below the fold, an explanation.

Vint Cerf on Traceability

Vint Cerf's Traceability addresses a significant problem:

how to preserve the freedom and openness of the Internet while protecting against the harmful behaviors that have emerged in this global medium. That this is a significant challenge cannot be overstated. The bad behaviors range from social network bullying and misinformation to email spam, distributed denial of service attacks, direct cyberattacks against infrastructure, malware propagation, identity theft, and a host of other ills

Cerf's proposed solution is:

differential traceability. The ability to trace bad actors to bring them to justice seems to me an important goal in a civilized society. The tension with privacy protection leads to the idea that only under appropriate conditions can privacy be violated. By way of example, consider license plates on cars. They are usually arbitrary identifiers and special authority is needed to match them with the car owners ... This is an example of differential traceability; the police department has the authority to demand ownership information from the Department of Motor Vehicles that issues the license plates. Ordinary citizens do not have this authority.

Below the fold I examine this proposal and one of the responses.

Blockchain Solves Preservation!

We're in a period when blockchain or "Distributed Ledger Technology" is the Solution to Everything™, so it is inevitable that it will be proposed as the solution to the problems of digital preservation. John Collomosse et al's abstract for ARCHANGEL: Trusted Archives of Digital Public Documents states:

We present ARCHANGEL; a de-centralised platform for ensuring the long-term integrity of digital documents stored within public archives. Document integrity is fundamental to public trust in archives. Yet currently that trust is built upon institutional reputation --- trust at face value in a centralised authority, like a national government archive or University. ARCHANGEL proposes a shift to a technological underscoring of that trust, using distributed ledger technology (DLT) to cryptographically guarantee the provenance, immutability and so the integrity of archived documents. We describe the ARCHANGEL architecture, and report on a prototype of that architecture build over the Ethereum infrastructure. We report early evaluation and feedback of ARCHANGEL from stakeholders in the research data archives space.

This is a wonderful example of the way people blithely assume that the claimed properties of blockchain systems are actually delivered in the real world. Below the fold I ask whether Collomosse et al have applied appropriate skepticism to blockchain's claims, and whether they have considered the sustainability of their proposal.

What Does Data "Durability" Mean

Source

In What Does 11 Nines of Durability Really Mean? David Friend writes:

No amount of nines can prevent data loss.

There is one very important and inconvenient truth about reliability: Two-thirds of all data loss has nothing to do with hardware failure.

The real culprits are a combination of human error, viruses, bugs in application software, and malicious employees or intruders. Almost everyone has accidentally erased or overwritten a file. Even if your cloud storage had one million nines of durability, it can’t protect you from human error.

Friend may be right that these are the top 5 causes of data loss, but over the timescale of preservation as opposed to storage they are far from the only ones. In Requirements for Digital Preservation Systems: A Bottom-Up Approach we listed 13 of them. Below the fold, some discussion of the meaning and usefulness of durability claims.

Chia Network

Back in March I wrote Proofs of Space, analyzing Bram Cohen's fascinating EE380 talk. I've now learned more about Chia Network, the company that is implementing a network using his methods. Below the fold I look into their prospects.

What Does The Decentralized Web Need?

In, among others, It Isn't About The Technology, Decentralized Web Summit2018: Quick Takes and Special Report on Decentralizing the Internet I've been skeptical at considerable length about the prospect of a decentralized Web. I would really like the decentralized Web to succeed, so I admit I'm biased, just pessimistic.

I was asked to summarize what would be needed for success apart from working technology (which we pretty much have)? My answer was four things:

A sustainable business model
Anti-trust enforcement
The killer app
A way to remove content

Below the fold, I try to explain of each of them at more readable length.

Lending Emulations?

In my report Emulation and Virtualization as Preservation Strategies I discussed the legal issues around emulating obsolete software, the basis for the burgeoning retro-gaming industry. These issues have attracted attention recently, as Kyle Orland reports:

In the wake of Nintendo's recent lawsuits against other ROM distribution sites, major ROM repository EmuParadise has announced it will preemptively cease providing downloadable versions of copyrighted classic games.

Below the fold, some comments on this threat to our cultural history.

Triumph Of Greed Over Arithmetic

I discussed FileCoin's ICO in The Four Most Expensive Words in the English Language and worked out that:

Filecoin needs to generate $25.7M/yr over and above what it pays the providers. But it can't charge the customers more than S3, or $0.276/GB/yr. If it didn't pay the providers anything it would need to be storing over 93PB right away to generate a 10% return. That's a lot of storage to expect providers to donate to the system.

On my bike ride this morning I thought of another way of looking at FileCoin's optimistic economics.

FileCoin won't be able, as S3 does, to claim 11 nines of durability and triple redundancy across data centers. So the real competition is S3's Reduced Redundancy Storage, which currently costs $23K/PB/month. Assuming that Amazon continues its historic 15%/year Kryder rate, storing a Petabyte in RRS for a decade is $1.48M. So, if you believe cryptocurrency "prices", FileCoin's "investors" pre-paid $257M for data storage at some undefined time in the future. They could instead have, starting now, stored 174PB in S3's RRS for 10 years. So FileCoin needs to store at least 174PB for 10 years before breaking even.

It gets worse. S3 is by no means the low-cost provider in the storage market. If we assume that the competition is Backblaze's B2 service at $0.06/GB/yr and that their Kryder rate is zero, FileCoin would need to store 428PB for 10 years before breaking even. Nearly half an Exabyte for a decade!

Tuesday, August 21, 2018

Optical media durability

At last I started clearing out the ~~garage~~ laundry room cupboards, which is where amongst much other stuff the optical media backups I take every week have been accumulating for many years. They have been stored in a fairly warm shirt-sleeve environment with no special precautions. So to get some idea of the durability of writable optical media, I've been somewhat randomly pulling groups of backups out of the stacks and re-verifying the MD5 checksums, which were all verified immediately after writing.

TL;DR: Surprisingly, I'm getting good data from CD-Rs more than 14 years old, and from DVD-Rs nearly 12 years old. Your mileage may vary. Below the fold, my results.

The Internet of Torts

Rebecca Crootof at Balkinization has two interesting posts:

Introducing the Internet of Torts, in which she describes "how IoT devices empower companies at the expense of consumers and how extant law shields industry from liability."
Accountability for the Internet of Torts, in which she discusses "how new products liability law and fiduciary duties could be used to rectify this new power imbalance and ensure that IoT companies are held accountable for the harms they foreseeably cause.

Below the fold,some commentary on both.

The Blockchain Trilemma

The blockchain trilemma

In The economics of blockchains Markus K Brunnermeier and Joseph Abadi (BA) write:

much of the innovation in blockchain technology has been aimed at wresting power from centralised authorities or monopolies. Unfortunately, the blockchain community’s utopian vision of a decentralised world is not without substantial costs. In recent research, we point out a ‘blockchain trilemma’ – it is impossible for any ledger to fully satisfy the three properties shown in Figure 1 simultaneously (Abadi and Brunnermeier 2018). In particular, decentralisation has three main costs: waste of resources, scalability problems, and network externality inefficiencies.

Below the fold, some commentary.

Decentralized Web Summit 2018: Quick Takes

Last week I attended the main two days of the 2018 Decentralized Web Summit put on by the Internet Archive at the San Francisco Mint. I had many good conversations with interesting people, but it didn't change the overall view I've written about in the past. There were a lot of parallel sessions, so I only got a partial view, and the acoustics of the Mint are TERRIBLE for someone my age, so I may have missed parts even of the sessions I was in. Below the fold, some initial reactions.

Shitcoin And The Lightning Network

The Lightning Network is an overlay on the Bitcoin network, intended to remedy the fact that Bitcoin is unusable for actual transactions. Andreas Brekken, of shitcoin.com, tried installing, running and using a node. He describes his experience in four blog posts:

Brekken's final TL;DR was “Operating the largest node on the Bitcoin Lightning Network has been educational, frustrating, fun, and at times terrifying. I look forward to trying it again once the technology matures.” Below the fold I look into some of the details.

Amazon's Margins Again

AMZN operating margins

I've been pointing out that economies of scale allow for the astonishing margins Amazon enjoys on S3, and the rest of AWS, for six years. Now, This is the Amazon everyone should have feared — and it has nothing to do with its retail business by Jason Del Rey and Rani Molla documents AWS' margins in this table.

Amazon’s $52.9 billion of revenue in the second quarter of the year came in a tad below what Wall Street analysts expected — and that doesn’t matter whatsoever.

That’s because the massive online retailer once again posted its largest quarterly profit in history — $2.5 billion for the quarter — on the back of two businesses that were afterthoughts just a few years ago: Amazon Web Services, its cloud computing unit, as well as its fast-growing advertising business.

Below the fold, I discuss one of the implications of these amazing margins.

DINO and IINO

One of the things that I, as an observer of the blockchain scene, find fascinating is how the various heists illuminate the deficiencies of actual, as opposed to the Platonic ideal, blockchain-based systems.

I've been writing for more than 4 years that, at scale, blockchains are DINO (Decentralized In Name Only) because irresistible economies of scale drive centralization. Now, a heist illuminates that, in practice, "smart contracts" such as those on the Ethereum blockchain (which is DINO) are also IINO (Immutable In Name Only). Follow me below the fold for the explanation.

School's out (meta)

Grandkids are sick, or didn't get into the camp their parents wanted, so blogging will be close to non-existent for a while. Sorry about that!

Tuesday, July 3, 2018

Special Report on Decentralizing the Internet (Updated)

The Economist's June 30th issue features a special report from Ludwig Siegele entitled How to fix what has gone wrong with the internet consisting of the following articles:

I really like the way The Economist occasionally allows its writers to address a topic at length. Siegele provides a good overview of what has gone wrong and the competing views of how to fix it. Below the fold, my overall critique, and commentary on some of the articles.

Josh Marshall on Facebook

Last September in Josh Marshall on Google, I wrote:

a quick note to direct you to Josh Marshall's must-read A Serf on Google's Farm. It is a deep dive into the details of the relationship between Talking Points Memo, a fairly successful independent news publisher, and Google. It is essential reading for anyone trying to understand the business of publishing on the Web.

Marshall wasn't happy with TPM's deep relationship with Google. In Has Web Advertising Jumped The Shark? I quoted him:

We could see this coming a few years ago. And we made a decisive and longterm push to restructure our business around subscriptions. So I'm confident we will be fine. But journalism is not fine right now. And journalism is only one industry the platform monopolies affect. Monopolies are bad for all the reasons people used to think they were bad. They raise costs. They stifle innovation. They lower wages. And they have perverse political effects too. Huge and entrenched concentrations of wealth create entrenched and dangerous locuses of political power.

Have things changed? Follow me below the fold.

Cryptocurrencies Have Limits

The Economic Limits Of Bitcoin And The Blockchain by Eric Budish is an important analysis of the economics of two kinds of "51% attack" on Bitcoin and other cryptocurrencies, such as those becoming endemic on Bitcoin Gold and other alt-coins:

A "double spend" attack, in which an attacker spends cryptocurrency to obtain goods, then makes the spend disappear in order to spend the cryptocurrency again.
A "sabotage" attack, in which short-sellers discredit the cryptocurrency to reduce its value.

Below the fold, some commentary on Budish's paper.

Rate limits

Andrew Marantz writes in Reddit and the Struggle to Detoxify the Internet:

[On 2017's] April Fools’, instead of a parody announcement, Reddit unveiled a genuine social experiment. It was called r/Place, and it was a blank square, a thousand pixels by a thousand pixels. In the beginning, all million pixels were white. Once the experiment started, anyone could change a single pixel, anywhere on the grid, to one of sixteen colors. The only restriction was speed: the algorithm allowed each redditor to alter just one pixel every five minutes. “That way, no one person can take over—it’s too slow,” Josh Wardle, the Reddit product manager in charge of Place, explained. “In order to do anything at scale, they’re gonna have to coöperate."

The r/Place experiment successfully forced coöperation, for example with r/AmericanFlagInPlace drawing a Stars and Stripes, or r/BlackVoid trying to rub out everything:

Toward the end, the square was a dense, colorful tapestry, chaotic and strangely captivating. It was a collage of hundreds of incongruous images: logos of colleges, sports teams, bands, and video-game companies; a transcribed monologue from “Star Wars”; likenesses of He-Man, David Bowie, the “Mona Lisa,” and a former Prime Minister of Finland. In the final hours, shortly before the experiment ended and the image was frozen for posterity, BlackVoid launched a surprise attack on the American flag. A dark fissure tore at the bottom of the flag, then overtook the whole thing. For a few minutes, the center was engulfed in darkness. Then a broad coalition rallied to beat back the Void; the stars and stripes regained their form, and, in the end, the flag was still there.

What is important about the r/Place experiment? Follow me below the fold for an explanation.

Software Heritage Archive Goes Live

June 7^th was a big day for software preservation; it was the formal opening of Software Heritage's archive. Congratulations to Roberto di Cosmo and the team! There's a post on the Software Heritage blog with an overview:

Today, June 7th 2018, we are proud to be back at Unesco headquarters to unveil a major milestone in our roadmap: the grand opening of the doors of the Software Heritage archive to the public (the slides of the presentation are online). You can now look at what we archived, exploring the largest collection of software source code in the world: you can explore the archive right away, via your web browser. If you want to know more, an upcoming post will guide you through all the features that are provided and the internals backing them.

Morane Gruenpeter's Software Preservation: A Stepping Stone for Software Citation is an excellent explanation of the role that Software Heritage's archive plays in enabling researchers to cite software:

In recent years software has become a legitimate product of research gaining more attention from the scholarly ecosystem than ever before, and researchers feel increasingly the need to cite the software they use or produce. Unfortunately, there is no well established best practice for doing this, and in the citations one sees used quite often ephemeral URLs or other identifiers that offer little or no guarantee that the cited software can be found later on.

But for software to be findable, it must have been preserved in the first place: hence software preservation is actually a prerequisite of software citation.

The importance of preserving software, and in particular open source software, is something I've been writing about for nearly a decade. My initial post about the Software Heritage Foundation started:

Back in 2009 I wrote:

who is to say that the corpus of open source is a less important cultural and historical artifact than, say, romance novels.
Back in 2013 I wrote:

Software, and in particular open source software is just as much a cultural production as books, music, movies, plays, TV, newspapers, maps and everything else that research libraries, and in particular the Library of Congress, collect and preserve so that future scholars can understand our society.

Please support this important work by donating to the Software Heritage Foundation.

Tuesday, June 19, 2018

The Four Most Expensive Words in the English Language

There are currently a number of attempts to deploy a cryptocurrency-based decentralized storage network, including MaidSafe, FileCoin, Sia and others. Distributed storage networks have a long history, and decentralized, peer-to-peer storage networks a somewhat shorter one. None have succeeded; Amazon's S3 and all other successful network storage systems are centralized.

Despite this history, initial coin offerings for these nascent systems have raised incredible amounts of "money", if you believe the heavily manipulated "markets". According to Sir John Templeton the four words are "this time is different". Below the fold I summarize the history, then ask what is different this time, and how expensive is it likely to be?

No-one could have predicted ...

... the threats posed by information technology to civil liberties. But my friend Robert G. Kennedy III came close. In April 1989 he wrote Technological Threats To Civil Liberties. From almost 30 years later it is an amazingly perceptive piece. Here are two samples to encourage you to read the whole thing:

An alarming synergy could occur when debit card data is accessed by connectionist machines (neural networks) for business applications. There are patterns to our behavior (economic and otherwise) of which we ourselves might be unaware; these can be extracted by neural nets without the need for formal rules, models, or a priori knowledge. A net is very, very good at pattern inference and recognition. ... One can see the potential for some truly subtle forms of embezzlement, irresistable invasive advertising keyed to surreptitiously acquired psychological profiles, or consumer fraud on a grand scale, among other things.

and:

An executive I know has told me of an office surveillance/attendance system being installed at his company, along the same lines as home security systems. Commercial versions have been on the market for over a year. It uses interactive badges and scanners, sort of transponders-in-an-ID, to track the location, time, and identity of personnel in a building: sort of an electronic leash. (He confided that it is silly to treat employees as bar-coded merchandise; for my part, I was polite enough not to mention the phrase, "Big Brother".)

As you read, remember that it was written two-and-a-half years before the first US Web page went up (which was around 6^th Dec. 1991).

Thursday, June 7, 2018

The Island of Misfit Toys

The Berkman Center's Johnathan Zittrain has a New York Times editorial entitled From Westworld to Best World for the Internet of Things starts:

Last month the F.B.I. issued an urgent warning: Everyone with home internet routers should reboot them to shed them of malware from “foreign cyberactors.”

Below the fold, some details and a critique of Zittrain’s proposals for improving the IoT.

Cryptographers on Blockchains: Part 2 (updated)

Back in April I wrote Cryptographers on Blockchains; they weren't enthusiastic. It is time for some more of the same, so follow me below the fold.

Recreational Bugs

At the San Diego Usenix in January 1989 I presented Visualizing X11 Clients, a paper written by David Lemke and myself. In email conversation about his Pie Menus: A 30 Year Retrospective, Don Hopkins unearthed the script for the talk I gave, which I posted to the "xpert@athena.mit.edu" mail list. To record the script for posterity, a slightly edited version is below the fold.

Don also unearthed A Window Manager for Bitmapped Displays and Unix, the paper James Gosling and I wrote describing the Andrew window manager for the Alvey Workshop at Cosener's House, Abingdon (29^th April to 1^st May 1985) (DOI). The entire workshop proceedings were subsequently published as Methodology of Window Management, and are online here. The Andrew window manager tiled the screen with windows because, as the quote at the head of the paper said:

You will get a better Gorilla effect if you use as big a piece of paper as possible. Kunihiko Kasahara, Creative Origami.

In retrospect, this wasn't a great idea.

Pie Menus

Don's NeWS Pie Menu

IIRC it is 1988, and James Gosling and I are in the Sun Microsystems booth at SIGGRAPH demo-ing the NeWS window system. Don Hopkins walks up with a tape cartridge in his hand and says "load this". Knowing Don, we do, and all of a sudden all the menus in the system are transformed from the conventional pull-right rectangles to circles divided into pie-slices. And Don, at that time the most caffeinated person I'd ever met, is blazing through the menus faster than we've ever seen before.

Why am I writing this thirty years later? Follow me below the fold.

How Far Is Far Enough?

When collecting an individual web site for preservation by crawling it is necessary to decide where its edges are, which links encountered are "part of the site" and which are links off-site. The crawlers use "crawl rules" to make these decisions. A simple rule would say:

Collect all URLs starting https://www.nytimes.com/

NoScript on http://nytimes.com

If a complex "site" is to be properly preserved the rules need to be a lot more complex. The image shows the start of the list of DNS names from which the New York Times home page embeds resources. Preserving this single page, let alone the "whole site", would need resources from at least 17 DNS names. Rules are needed for each of these names. How are all these more complex rules generated? Follow me below the fold for the answer, and news of an encouraging recent development.

ASICs and Mining Centralization

Three and a half years ago, as part of my explanation of why peer-to-peer networks that were successful would become centralized, I wrote in Economies of Scale in Peer-to-Peer Networks:

When new, more efficient technology is introduced, thus reducing the cost per unit contribution to a P2P network, it does not become instantly available to all participants. As manufacturing ramps up, the limited supply preferentially goes to the manufacturers best customers, who would be the largest contributors to the P2P network. By the time supply has increased so that smaller contributors can enjoy the lower cost per unit contribution, the most valuable part of the technology's useful life is over.

I'm not a blockchain insider. But now in a blockbuster post a real insider, David Vorick, the lead developer of Sia, a blockchain based cloud storage platform, makes it clear that the effect I described has been dominating the Bitcoin and other blockchains for a long time, and that it has led to centralization in the market for mining hardware:

The biggest takeaway from all of this is that mining is for big players. The more money you spend, the more of an advantage you have, and there’s not an easy way to change that equation. At least with traditional Nakamoto style consensus, a large entity that produces and controls most of the hashrate seems to be more or less the outcome, and at the very best you get into a situation where there are 2 or 3 major players that are all on similar footing. But I don’t think at any point in the next few decades will we see a situation where many manufacturing companies are all producing relatively competitive miners. Manufacturing just inherently leads to centralization, and it happens across many different vectors.

Below the fold, the details.

Shorter talk at MSST2018

I was invited to give both a longer and a shorter talk at the 34^th International Conference on Massive Storage Systems and Technology at Santa Clara University. Below the fold is the text with links to the sources of the shorter talk, which was updated from and entitled DNA's Niche in the Storage Market .

Longer talk at MSST2018

I was invited to give both a longer and a shorter talk at the 34^th International Conference on Massive Storage Systems and Technology at Santa Clara University. Below the fold is the text with links to the sources of the longer talk, which was updated from and entitled The Medium-Term Prospects for Long-Term Storage Systems.

Blockchain for Peer Review

An initiative has started in the UK called Blockchain for Peer Review. It claims:

The project will develop a protocol where information about peer review activities (submitted by publishers) are stored on a blockchain. This will allow the review process to be independently validated, and data to be fed to relevant vehicles to ensure recognition and validation for reviewers. By sharing peer review information, while adhering to laws on privacy, data protection and confidentiality, we will foster innovation and increase interoperability.

Everything about this makes sense and could be implemented with a database run by a trusted party, as for example CrossRef does for DOI resolution. Implementing it with a blockchain is effectively impossible. Follow me below the fold for the explanation.

Prof. James Morris: "One Last Lecture"

The most important opportunity in my career was when Prof. Bob Sproull, then at Xerox PARC, suggested that I should join the Andrew Project (paper) then just starting at Carnegie-Mellon and run by Prof. James (Jim) Morris. The two years I spent working with Jim and the incredibly talented team he assembled (James Gosling, Mahadev Satyanarayanan, Nathaniel Borenstein, ...) changed my life.

Jim's final lecture at CMU is full of his trademark insights and humor, covering the five mostly CMU computing pioneers who influenced his career. You should watch the whole hour-long video, but below the fold I have transcribed a few tastes:

Might Need Some Work

"I Agree" - Source

Cory Doctorow writes:

"I Agree" is Dima Yarovinsky's art installation for Visualizing Knowledge 2018, with printouts of the terms of service for common apps on scrolls of colored paper, creating a bar chart of the fine print that neither you, nor anyone else in the history of the world, has ever read.

Earlier, Doctorow explained that the GDPR requires that:

Under the new directive, every time a European's personal data is captured or shared, they have to give meaningful consent, after being informed about the purpose of the use with enough clarity that they can predict what will happen to it.

Wednesday, May 2, 2018

"Privacy Is No Longer A Social Norm"

It is widely believed that in 2010 Mark Zuckerberg said "Privacy is no longer a social norm" but apparently that wasn't exactly what he said. Below the fold, I take off from this and other misquotes to look at our home-town's major industry, surveillance. Facebook (now headquartered in Menlo Park) has been getting all the attention recently, but they probably know less about you than Palantir Technologies, still headquartered in Palo Alto.

Michael Nelson's Fifteen Minutes Of Fame

We interrupt our regularly scheduled blogging for this special announcement. Go read Michael Nelson's post Why we need multiple web archives: the case of blog.reidreport.com right now! Its a detailed account in several updates of the forensic analysis of Joy-Ann Reid claim that either her blog or the Internet Archive was hacked. Michael's work landed him a spot on CNN at 0930 April 29^th. He did an excellent job of explanation. Half an hour later Reid walked back her claims.

Michael is right about the importance of multiple independent Web archives; once again the Lots Of Copies Keep Stuff Safe principle. But the economics of this multiplicity are problematic.

Thursday, April 26, 2018

Cryptographers On Blockchains

David Gerard's April 21^st blog post is a real linkfest. Below the fold, commentary on four of the links.

All Your Tweets Are Belong To Kannada

Gerd Badur CC BY-SA 3.0, Source

Sawood Alam and Plinio Vargas have a fascinating blog post documenting their investigation into why:

47% of mementos of Barack Obama's Twitter page were in non-English languages, almost half of which were in Kannada alone. While language diversity in web archives is generally a good thing, in this case though, it is disconcerting and counter-intuitive.

Kannada is an Indian language spoken by only about 38 million people. Below the fold, some commentary.

Your Tax Dollars At Work

When I was writing Pre-publication Peer Review Subtracts Value, Springer wanted to charge me $39.95 for access to Comparing Published Scientific Journal Articles to Their Pre-print Versions by Martin Klein et al. This despite the fact that the copyright notice said:

This is a U.S. government work and its text is not subject to copyright protection in the United States

Fortunately, you can now follow the link to the final version at arXiv.org. I'm not the only one annoyed by the publishers charging for access to papers not subject to copyright. Below the fold, some more on this scam.

Natural Redundancy

Most uncompressed files contain significant redundancy, which is why they can be made smaller by a compression algorithm; they work by reducing redundancy. The better the algorithm, the less redundancy left in the output. If the files are then stored for the long term, they need to be protected, for example by erasure coding, which adds some redundancy back. In Exploiting Source Redundancy to Improve the Rate of Polar Codes, Ying Wang, Krishna R. Narayanan and Anxiao (Andrew) Jiang of Texas A&M explore using the original redundancy to reduce the amount of protection redundancy needed for a given level of reliability. Below the fold, some commentary.

John Perry Barlow RIP

By Mohamed Nanabhay
from Qatar CC BY 2.0

Vicky Reich and I were both acquainted with John Perry Barlow in the 90s; we met at one of the parties he threw at the DNA Lounge. He was perhaps the most charismatic person I've ever encountered. So we were anxious to attend the symposium the EFF and the Internet Archive organized last Saturday to honor one aspect of his life, his writing and activism around civil liberties in cyberspace.

The Economist, The Guardian and the New York Times had good obituaries, but they mentioned only his Declaration of the Independence of Cyberspace among his writings. It was undoubtedly an important rallying-cry at the time, but it should not be allowed to overshadow his other cyberspace-related writings, thankfully collected by the EFF in the John Perry Barlow Library. Below the fold, the one I would have chosen.

Emulating Stephen Hawking's Voice

Jason Fagone at the San Francisco Chronicle has a fascinating story of heroic, successful (and timely) emulation in The Silicon Valley quest to preserve Stephen Hawking’s voice. It's the story of a small team which started work in 2009 trying to replace Hawking's voice synthesizer with more modern technology. Below the fold, some details to get you to read the whole article

Falling Research Productivity

Are Ideas Getting Harder to Find? by Nicholas Bloom et al looks at the history of investment in R&D and its effect on the product across several industries. Their main example is Moore's Law, and they show that [page 19]:

research effort has risen by a factor of 18 since 1971. This increase occurs while the growth rate of chip density is more or less stable: the constant exponential growth implied by Moore’s Law has been achieved only by a massive increase in the amount of resources devoted to pushing the frontier forward.

Assuming a constant growth rate for Moore’s Law, the implication is that research productivity has fallen by this same factor of 18, an average rate of 6.8 percent per year.

If the null hypothesis of constant research productivity were correct, the growth rate underlying Moore’s Law should have increased by a factor of 18 as well. Instead, it was remarkably stable. Put differently, because of declining research productivity, it is around 18 times harder today to generate the exponential growth behind Moore’s Law than it was in 1971.

Below the fold, some commentary on this and other relevant research.

Flash vs. Disk (Again)

Gartner's graph

Chris Mellor's NAND chips are going to stay too pricey for flash to slit disk's throat... is based on analysis from Gartner. It continues the theme that I've been stressing for quite some time; flash will not displace hard disk from the bulk storage layer of the hierarchy in the medium term. Follow me below the fold for some commentary, and more graphs.

Bitcoin: The Future World Currency?

I had a lot of fun applying arithmetic to DNA's prospects as a storage medium. Jamie Powell must have had just as much fun applying arithmetic to the prospect of Bitcoin becoming the world's currency in Sorry Jack, Bitcoin will not become the global currency., which is part of the FT Alphaville's excellent new Someone is wrong on the Internet series. Below the fold, some of the entertainment.

Bad Blockchain Content

A Quantitative Analysis of the Impact of Arbitrary Blockchain Content on Bitcoin by Roman Matzutt et al examines the stuff in the Bitcoin blockchain that isn't a monetary transaction. They:

provide the first systematic analysis of the benefits and threats of arbitrary blockchain content. Our analysis shows that certain content, e.g., illegal pornography, can render the mere possession of a blockchain illegal. Based on these insights, we conduct a thorough quantitative and qualitative analysis of unintended content on Bitcoin's blockchain. Although most data originates from benign extensions to Bitcoin's protocol, our analysis reveals more than 1600 files on the blockchain, over 99% of which are texts or images.

Below the fold, some details.

Proofs of Space

Bram Cohen, the creator of BitTorrent, gave an EE380 talk entitled Stopping grinding attacks in proofs of space. Two aspects were really interesting:

A detailed critique of both the Proof of Work system used by most cryptocurrencies and blockchains, and schemes such as Proof of Stake that have been proposed to replace it.
An alternate scheme for securing blockchains based on combining Proof of Space with Verifiable Delay Functions.

But there was another aspect that concerned me. Follow me below the fold for details.

Pre-publication Peer Review Subtracts Value

Pre-publication peer review is intended to perform two functions; to prevent bad science being published (gatekeeping), and to improve the science that is published (enhancement). Over the years I've written quite often about how the system is no longer "fit for purpose". Its time for another episode draw attention to two not-so recent contributions:

Nearly two years ago Martin Klein and co-authors posted Comparing Published Scientific Journal Articles to Their Pre-print Versions to arXiv.org. They had presented the work at the Fall 2015 CNI. I really thought I had written about it at the time, but apparently it slipped through the cracks. I was reminded about it by Glyn Moody, who wrote on the occasion of its "official publication" two weeks ago in the International Journal of Digital Libraries. They show that pre-publication review's effect on article texts is barely detectable.
Last October Sir Timothy Gowers, Rouse Ball Professor of Mathematics at Cambridge, had an important article in the Times Literary Supplement entitled The end of an error?, describing how the switch from pre- to post-publication review has improved mathematics research.

Below the fold, the details.

Ethics and Archiving the Web

I wanted to draw attention to what looks like a very interesting conference, Rhizome's National Forum on Ethics and Archiving the Web, March 22-24 at the New Museum in New York:

The dramatic rise in the public’s use of the web and social media to document events presents tremendous opportunities to transform the practice of social memory.

Web archives can serve as witness to crimes, corruption, and abuse; they are powerful advocacy tools; they support community memory around moments of political change, cultural expression, or tragedy. At the same time, they can cause harm and facilitate surveillance and oppression.

As new kinds of archives emerge, there is a pressing need for dialogue about the ethical risks and opportunities that they present to both those documenting and those documented. This conversation becomes particularly important as new tools, such as Rhizome’s Webrecorder software, are developed to meet the changing needs of the web archiving field.

Thursday, December 27, 2018

Thursday, December 20, 2018

Tuesday, December 18, 2018

Thursday, December 13, 2018

Monday, December 10, 2018

Thursday, December 6, 2018

Tuesday, December 4, 2018

Thursday, November 29, 2018

Tuesday, November 27, 2018

Tuesday, November 20, 2018

Thursday, November 15, 2018

Tuesday, November 13, 2018

Thursday, November 8, 2018

Tuesday, November 6, 2018

Thursday, November 1, 2018

Tuesday, October 30, 2018

Thursday, October 25, 2018

Tuesday, October 23, 2018

Thursday, October 18, 2018

Tuesday, October 16, 2018

Thursday, October 11, 2018

Tuesday, October 9, 2018

Thursday, October 4, 2018

Wednesday, October 3, 2018

Tuesday, October 2, 2018

Tuesday, September 25, 2018

Tuesday, September 18, 2018

Thursday, September 13, 2018

Tuesday, September 11, 2018

Tuesday, September 4, 2018

Thursday, August 30, 2018

Tuesday, August 28, 2018

Friday, August 24, 2018

Tuesday, August 21, 2018

Tuesday, August 14, 2018

Thursday, August 9, 2018

Tuesday, August 7, 2018

Thursday, August 2, 2018

Tuesday, July 31, 2018

Tuesday, July 17, 2018

Monday, July 9, 2018

Tuesday, July 3, 2018

Monday, July 2, 2018

Friday, June 29, 2018

Thursday, June 28, 2018

Thursday, June 21, 2018

Tuesday, June 19, 2018

Tuesday, June 12, 2018

Thursday, June 7, 2018

Tuesday, June 5, 2018

Thursday, May 31, 2018

Tuesday, May 29, 2018

Thursday, May 24, 2018

Tuesday, May 22, 2018

Wednesday, May 16, 2018

Monday, May 14, 2018

Tuesday, May 8, 2018

Monday, May 7, 2018

Wednesday, May 2, 2018

Monday, April 30, 2018

Thursday, April 26, 2018

Tuesday, April 24, 2018

Thursday, April 12, 2018

Tuesday, April 10, 2018

Monday, April 9, 2018

Thursday, April 5, 2018

Tuesday, April 3, 2018

Thursday, March 29, 2018

Wednesday, March 28, 2018

Tuesday, March 27, 2018

Thursday, March 22, 2018

Tuesday, March 20, 2018

Thursday, March 15, 2018