Thursday, September 13, 2018

Blockchain Solves Preservation!

We're in a period when blockchain or "Distributed Ledger Technology" is the Solution to Everything™, so it is inevitable that it will be proposed as the solution to the problems of digital preservation. John Collomosse et al's abstract for ARCHANGEL: Trusted Archives of Digital Public Documents states:
We present ARCHANGEL; a de-centralised platform for ensuring the long-term integrity of digital documents stored within public archives. Document integrity is fundamental to public trust in archives. Yet currently that trust is built upon institutional reputation --- trust at face value in a centralised authority, like a national government archive or University. ARCHANGEL proposes a shift to a technological underscoring of that trust, using distributed ledger technology (DLT) to cryptographically guarantee the provenance, immutability and so the integrity of archived documents. We describe the ARCHANGEL architecture, and report on a prototype of that architecture build over the Ethereum infrastructure. We report early evaluation and feedback of ARCHANGEL from stakeholders in the research data archives space.
This is a wonderful example of the way people blithely assume that the claimed properties of blockchain systems are actually delivered in the real world. Below the fold I ask whether Collomosse et al have applied appropriate skepticism to blockchain's claims, and whether they have considered the sustainability of their proposal.

Collomosse et al start by claiming:
ARCHANGEL breaks new ground by proposing the use of a blockchain payload to record digital signatures (content evidence) derived from either scanned physical, or born-digital, document to ensure their integrity over decade- or century-long timespans.
Seven years ago I wrote Do Digital Signatures Assure Long-Term Integrity? in response to Duane Dunston's advocacy of the use of digital signatures to assure the integrity of preserved digital documents. After reviewing the issues I concluded:
Basing the long-term integrity of digital documents on digital signatures, and thus on the ability to keep secrets for the long term is unwise. Fortunately, it is not necessary. There are at least two different approaches to doing so that do not depend on long-term secrets:
  • The technique of entangling hashes, patented by Stuart Haber and others, and implemented in the ACE system, provides tamper-evident storage without secrets. It can detect but not recover from tampering using a minimum of tamper-proof storage. There are practical difficulties in implementing it securely enough, but these are much less significant than those involved in long-term use of digital signatures.
  • The protocol underlying the LOCKSS system provides tamper-resistant storage against a powerful adversary without long-term secrets. It does use short-term secrets, whose life is a day or less, but it limits the damage caused if even these leak.
So they are proposing to use blockchain technology to solve a problem for which a commercial centralized solution has been available for twenty-four years, an open-source academic solution for eleven years (both based on patents issued twenty-six years ago), and a decentralized solution for fifteen years, which has been in economically sustainable production use for twelve years. None of this work is cited.
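For readers unfamiliar with hash entangling, here is a minimal Python sketch of the idea. It assumes SHA-256 as the hash and a single "head" value kept in a small amount of tamper-proof storage. It is a simplified linked-hash chain in the spirit of Haber and Stornetta's linked timestamping, not ACE's actual implementation; the names `link`, `chain_head`, `verify` and `GENESIS` and the sample records are hypothetical. Note that it needs no secrets, and that it can detect tampering but not undo it.

```python
import hashlib

GENESIS = bytes(32)  # hypothetical fixed starting value for the chain

def link(prev: bytes, record: bytes) -> bytes:
    # Each link hashes the previous link together with the record's digest,
    # so every link depends on every record that came before it.
    return hashlib.sha256(prev + hashlib.sha256(record).digest()).digest()

def chain_head(records: list[bytes]) -> bytes:
    head = GENESIS
    for record in records:
        head = link(head, record)
    return head

def verify(records: list[bytes], witnessed_head: bytes) -> bool:
    # Detection only: a mismatch says that *something* changed,
    # but the hashes cannot say what the original bytes were.
    return chain_head(records) == witnessed_head

archive = [b"deed, 1850", b"census page 12", b"minutes, 1 April 1999"]
witness = chain_head(archive)  # the one value kept in tamper-proof storage
print(verify(archive, witness))  # True

archive[1] = b"census page 12 (altered)"
print(verify(archive, witness))  # False
```

Because every link depends on all earlier records, altering any one record changes the head. But the mismatch says only that something changed; recovering the original bytes requires another copy.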

The authors do not establish that there is significant unmet demand for services of these kinds, presumably because they are unaware that such services have existed for more than a decade. ARCHANGEL does not even "break new ground" in proposing the use of blockchain technology for this purpose. Victoria Lemieux's 2016 paper Trusting records: is Blockchain technology the answer? concluded:
Blockchain technology can be used to address issues associated with information integrity in the present and near term, assuming proper security architecture and infrastructure management controls. It does not, however, guarantee reliability of information in the first place, and would have several limitations as a long-term solution for maintaining trustworthy digital records.
It is important to observe that:
ARCHANGEL utilises a permissioned blockchain model, in which operators or automatic processes authorised to add content to the AMI commit blocks into the chain encoding content evidence.
They go on to propose that their permissioned blockchain use proof of work to establish consensus:
In our architecture we propose two modes of consensus checking, both predicated upon a permissioned DLT model:
  1. The Blockchain is maintained via proof of work across a private set of nodes, which are maintained collectively by multiple AMIs each with independent governance structure e. g. national archives of different nation states. As such an unprecedented level of collusion would be required to corrupt the Blockchain.
  2. The Blockchain is maintained via proof of work across a public Blockchain maintained globally. In such case a program embedded within the Blockchain (a ’smart contract’) with sole permission to write to the Blockchain is invoked in order to the append data. Access to the smart contract end-point is granted via secret key. In this case corruption would require more than half of the public DLT infrastructure miners to collude, which is again unlikely e. g. on the Ethereum main network.
There are a number of problems here. The first is that permissioned blockchains do not need proof of work to maintain consensus. The canonical permissioned blockchain, IBM's Hyperledger, uses Byzantine Fault Tolerance protocols and thereby avoids the appalling and obviously unsustainable energy use of current proof of work blockchains (the top 5 cryptocurrencies are estimated to use as much energy as The Netherlands). Option 1 could be implemented using Hyperledger far more easily and efficiently than by using proof of work.

The second is that the security of blockchains assumes that a large number of nodes act independently. It has been known for more than four years that this is not the case in public blockchains such as Ethereum, which are dominated by a small number of large "mining pools":
in Ethereum — 3 pools control more than 60% of the hashrate, and 6 pools will get you over 85%.
For major cryptocurrencies these pools contain large numbers of nodes, which were attracted by the rapidly rising "price" of the coin. Since the beginning of this year, however:
MVIS CryptoCompare Digital Assets 10 Index extended its collapse from a January high to 80 percent. ... Wednesday’s losses were led by Ether, the second-largest virtual currency. It fell 6 percent to $171.15 at 7:50 a.m. in New York, extending this month’s retreat to 40 percent.
This makes mining uneconomic for most miners, as do the electricity price increases utilities are imposing on miners, so the networks will lose nodes. Committing digital preservation for the long term to infrastructure as volatile as cryptocurrencies is not a route to sustainability.
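The economics can be sketched as a back-of-the-envelope calculation. This Python sketch uses entirely hypothetical parameters (hash share, block rate, reward, power draw, electricity price), not measured figures, and it ignores hardware depreciation, pool fees, and the difficulty adjustment that raises the share of the miners who remain when others leave. It shows how an 80% fall in the coin's price, or a rise in electricity prices, can turn a miner's margin negative.

```python
def daily_margin(hash_share: float, blocks_per_day: float, reward_coins: float,
                 coin_price_usd: float, power_kw: float, usd_per_kwh: float) -> float:
    """Expected daily revenue minus electricity cost for one miner, in USD."""
    revenue = hash_share * blocks_per_day * reward_coins * coin_price_usd
    cost = power_kw * 24 * usd_per_kwh
    return revenue - cost

def break_even_price(hash_share: float, blocks_per_day: float, reward_coins: float,
                     power_kw: float, usd_per_kwh: float) -> float:
    """Coin price below which this miner loses money on electricity alone."""
    return (power_kw * 24 * usd_per_kwh) / (hash_share * blocks_per_day * reward_coins)

# Hypothetical miner: one millionth of the hash rate, 1.5 kW of equipment.
MINER = dict(hash_share=1e-6, blocks_per_day=5000, reward_coins=2.0, power_kw=1.5)

for price in (1000.0, 200.0):  # before and after an 80% fall
    m = daily_margin(coin_price_usd=price, usd_per_kwh=0.10, **MINER)
    print(f"price {price:.0f}: {m:+.2f} USD/day")

print(f"break-even at 0.10/kWh: {break_even_price(usd_per_kwh=0.10, **MINER):.0f}")
print(f"break-even at 0.15/kWh: {break_even_price(usd_per_kwh=0.15, **MINER):.0f}")
```

With these made-up numbers the miner earns about 6.40 USD a day before the fall and loses about 1.60 after it, and raising electricity from 0.10 to 0.15 per kWh lifts the break-even price from 360 to 540.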

[Figure: British Library budget]
Equally, a private blockchain operated by memory institutions faces great difficulty in maintaining adequate participation. These institutions are under severe budget pressure and competition for skilled staff. They are being forced to outsource their IT operations to "the cloud", and are unlikely to take on new in-house tasks, or even to maintain existing ones. In networks with a small number of nodes, Byzantine Fault Tolerance is a better approach to maintaining consensus than proof of work, even ignoring proof of work's wastefulness.
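To make the small-network point concrete, here is a Python sketch of the standard sizing rule for Byzantine Fault Tolerant protocols such as PBFT: n replicas tolerate at most f = (n - 1) // 3 arbitrarily faulty ones, and a quorum must be large enough that any two quorums overlap in at least f + 1 replicas, so at least one honest replica is in both. The function name `bft_parameters` and the consortium sizes are hypothetical. Even four independent archives get a precise guarantee of this kind, without the energy cost of proof of work.

```python
def bft_parameters(n: int) -> tuple[int, int]:
    """For n replicas, return (f, q): the most Byzantine replicas tolerated
    and the quorum size, using the classic bound n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one replica")
    f = (n - 1) // 3
    # Smallest q with 2q - n >= f + 1, i.e. q = ceil((n + f + 1) / 2):
    # any two quorums then share at least one honest replica.
    q = (n + f + 2) // 2
    return f, q

for n in (4, 7, 10):  # hypothetical consortium sizes
    f, q = bft_parameters(n)
    print(f"{n} archives: tolerates {f} faulty, quorum {q}")
```

This prints a tolerance of 1, 2 and 3 faulty archives, with quorums of 3, 5 and 7 respectively. The quorum never exceeds n - f, so the honest replicas alone can always make progress.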

Third, Collomosse et al pay scant attention to the fact that digital preservation is primarily an economic problem. A sustainable digital preservation technology requires a viable business model:
We might also explore ... new business models to encourage sustainability. For example, the maintenance of the DLT (in terms of computational effort for mining) might be facilitated by users who seek document verification ’paying’ for that service via contribution of mining effort to maintain the DLT.
The whole point of archives and memory institutions is that they hold materials that are very rarely accessed. Thus the idea of sustaining them economically by charging for access can never work. Not to mention that the idea of individual scholars "paying" to verify a document by running a network node fails the laugh test.

Fourth, the authors uncritically accept blockchain's marketing hype that the only security threat is the unlikely one of collusion among the independent nodes:
The security of a blockchain is afforded by the immutability of data with the blocks, delivered by the compounding effect of each new block being hashed to include the hashes of previous blocks. Thus as content is committed into the blockchain, the security of the content is reinforced.
There are a number of issues with this:
  • It is the verifiability, not the security, "of a blockchain" that "is afforded by the immutability of data with the blocks". The security of the blockchain is afforded by the large number of replicas of the blocks at the large number of nodes.
  • The immutability of the blocks depends on the security of the hash algorithm in use. This is not absolute; "over decade- or century-long timespans" it degrades, as happened to SHA-1, for which a practical collision was published in 2017.
  • Their system uses "smart contracts", which in practice are "immutable in name only".
  • Because the security of the blockchain depends upon the entire chain being replicated at each of the large number of nodes, successful blockchain systems suffer scaling problems, consuming large amounts of storage.
  • Unlike the decentralized LOCKSS system, ARCHANGEL and systems such as ACE do not contain "the content", merely hashes of "the content" which can be used for verification only if "the content" itself has survived through some other mechanism.
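The difference between holding hashes and holding content can be sketched in Python. The poll below is a drastic simplification in the spirit of LOCKSS, assuming each peer answers a fresh random nonce with a hash of the nonce and its copy; it is not the actual LOCKSS protocol, which among other things samples peers and defends against adversaries in ways omitted here. `vote`, `poll`, `check_evidence` and the sample peers are hypothetical names. A replica that finds itself in the minority can repair from a peer in the majority, whereas a hash-only content-evidence check, as in ARCHANGEL or ACE, can only report a mismatch, and is useless unless the content has survived elsewhere.

```python
import hashlib
import secrets
from collections import Counter

def vote(nonce: bytes, content: bytes) -> bytes:
    # A fresh nonce per poll stops a peer replaying an old answer:
    # it must still hold the content to compute this hash.
    return hashlib.sha256(nonce + content).digest()

def poll(my_copy: bytes, peer_copies: dict[str, bytes]) -> bytes:
    """Simplified majority poll; returns the copy this replica should keep."""
    if not peer_copies:
        return my_copy  # nobody to compare with
    nonce = secrets.token_bytes(16)
    votes = {peer: vote(nonce, held) for peer, held in peer_copies.items()}
    majority, _ = Counter(votes.values()).most_common(1)[0]
    if vote(nonce, my_copy) == majority:
        return my_copy  # agrees with the majority: nothing to do
    # In the minority: repair from a peer whose vote matched the majority.
    peer = next(p for p, v in votes.items() if v == majority)
    return peer_copies[peer]

def check_evidence(content: bytes, recorded_digest: bytes) -> bool:
    # Hash-only evidence: can say "changed" but cannot supply the original.
    return hashlib.sha256(content).digest() == recorded_digest

good = b"archived report, original text"
peers = {"peer-a": good, "peer-b": good, "peer-c": b"archived report, bit-rotted"}
damaged = b"archived report, tampered"

print(poll(damaged, peers) == good)  # True: repaired from the majority
evidence = hashlib.sha256(good).digest()
print(check_evidence(damaged, evidence))  # False: detected, but not repaired
```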
To do Collomosse et al justice, they do not claim to solve the whole problem of digital preservation, just that:
Blockchain offers a shield which archives can use to defend the records as authentic. ... It was also noted that acceptance of content evidence might eventually become similar to acceptance of DNA evidence in court, but that establishing that level of confidence would require strong public engaged to explain Blockchain in an accessible manner particularly explaining why one could trust the cryptographic assurances inherent in a DLT solution.
But, to the extent that blockchains and cryptocurrencies are perceived by the public as a single technology, and cryptocurrencies perceived as rancid with theft, fraud and manipulation, this may be difficult.

Sustainability is job #1 for archives. There's no point in setting up an archiving system and filling it with content only to have it fail after a decade or so. If the contents are to survive the wide range of threats to which archived data is subject, sustainability has to be designed in from the start, into both the technology and the organization in which it is embedded. Layering it on afterwards isn't going to be effective.

Basing the architecture of a preservation system on economically and environmentally unsustainable proof of work blockchains, and failing to identify any sustainable business model to support them isn't realistic.

1 comment:

Geoff said...

I'm really disappointed by how many of my friends are diving in to blockchain (and in one case has quit his job to found a blockchain startup). It's not clear if they are True Believers, or simply looking for an opportunity to Fleece The Gullible. Both alternatives are disheartening....