Thursday, September 13, 2018

Blockchain Solves Preservation!

We're in a period when blockchain or "Distributed Ledger Technology" is the Solution to Everything™, so it is inevitable that it will be proposed as the solution to the problems of digital preservation. John Collomosse et al's abstract for ARCHANGEL: Trusted Archives of Digital Public Documents states:
We present ARCHANGEL; a de-centralised platform for ensuring the long-term integrity of digital documents stored within public archives. Document integrity is fundamental to public trust in archives. Yet currently that trust is built upon institutional reputation --- trust at face value in a centralised authority, like a national government archive or University. ARCHANGEL proposes a shift to a technological underscoring of that trust, using distributed ledger technology (DLT) to cryptographically guarantee the provenance, immutability and so the integrity of archived documents. We describe the ARCHANGEL architecture, and report on a prototype of that architecture build over the Ethereum infrastructure. We report early evaluation and feedback of ARCHANGEL from stakeholders in the research data archives space.
This is a wonderful example of the way people blithely assume that the claimed properties of blockchain systems are actually delivered in the real world. Below the fold I ask whether Collomosse et al have applied appropriate skepticism to blockchain's claims, and whether they have considered the sustainability of their proposal.

Collomosse et al start by claiming:
ARCHANGEL breaks new ground by proposing the use of a blockchain payload to record digital signatures (content evidence) derived from either scanned physical, or born-digital, document to ensure their integrity over decade- or century-long timespans.
Seven years ago I wrote Do Digital Signatures Assure Long-Term Integrity? in response to Duane Dunston's advocacy of the use of digital signatures to assure the integrity of preserved digital documents. After reviewing the issues I concluded:
Basing the long-term integrity of digital documents on digital signatures, and thus on the ability to keep secrets for the long term is unwise. Fortunately, it is not necessary. There are at least two different approaches to doing so that do not depend on long-term secrets:
  • The technique of entangling hashes, patented by Stuart Haber and others, and implemented in the ACE system, provides tamper-evident storage without secrets. It can detect but not recover from tampering using a minimum of tamper-proof storage. There are practical difficulties in implementing it securely enough, but these are much less significant than those involved in long-term use of digital signatures.
  • The protocol underlying the LOCKSS system provides tamper-resistant storage against a powerful adversary without long-term secrets. It does use short-term secrets, whose life is a day or less, but it limits the damage caused if even these leak.
So they are proposing to use blockchain technology to solve a problem for which a commercial centralized solution has been available for twenty-four years, an open-source academic solution for eleven years (both based on patents issued twenty-six years ago), and a decentralized solution for fifteen years, one that has been in economically sustainable production use for twelve. None of this work is cited.
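The second approach quoted above detects damage by comparing independent replicas rather than by trusting a signature. Here is a minimal sketch of that idea, loosely modelled on LOCKSS-style polling and much simpler than the actual LOCKSS protocol. The poller picks a fresh random nonce, every peer hashes the nonce together with its own copy of the content, and the poller compares those votes with its own hash. Because the nonce is new for each poll, a peer cannot precompute its vote; it has to actually hold the content. The function names and the simple-majority rule are assumptions made for illustration.

```python
import hashlib
import secrets

def vote(nonce: bytes, content: bytes) -> str:
    """One peer's vote: SHA-256 of the poll's nonce followed by its copy of the content."""
    return hashlib.sha256(nonce + content).hexdigest()

def poll(my_copy: bytes, peer_copies: list) -> str:
    """Hypothetical simplified poll run by the peer holding my_copy.

    Returns "agree" if a majority of the peers' votes match the poller's own hash,
    and "repair" otherwise, meaning the poller's copy is probably damaged and
    should be replaced with a copy from a peer in the majority.
    """
    nonce = secrets.token_bytes(32)  # fresh per poll, so votes cannot be precomputed
    expected = vote(nonce, my_copy)
    votes = [vote(nonce, copy) for copy in peer_copies]
    agreeing = sum(1 for v in votes if v == expected)
    return "agree" if agreeing > len(votes) / 2 else "repair"

original = b"archived article, version of record"
print(poll(original, [original, original, b"tampered copy"]))   # agree
print(poll(b"bit-rotted copy", [original, original, original]))  # repair
```

Unlike a hash recorded on a ledger, this approach can also repair damage, because the peers hold the content itself rather than just a fingerprint of it.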

The authors do not establish that there is significant unmet demand for services of these kinds, presumably because they are unaware that such services have existed for more than a decade. ARCHANGEL does not even "break new ground" in proposing the use of blockchain technology for this purpose. Victoria Lemieux's 2016 paper Trusting records: is Blockchain technology the answer? concluded:
Blockchain technology can be used to address issues associated with information integrity in the present and near term, assuming proper security architecture and infrastructure management controls. It does not, however, guarantee reliability of information in the first place, and would have several limitations as a long-term solution for maintaining trustworthy digital records.
It is important to observe that:
ARCHANGEL utilises a permissioned blockchain model, in which operators or automatic processes authorised to add content to the AMI commit blocks into the chain encoding content evidence.
They go on to suggest that their permissioned blockchain use proof-of-work to establish consensus:
In our architecture we propose two modes of consensus checking, both predicated upon a permissioned DLT model:
  1. The Blockchain is maintained via proof of work across a private set of nodes, which are maintained collectively by multiple AMIs each with independent governance structure e. g. national archives of different nation states. As such an unprecedented level of collusion would be required to corrupt the Blockchain.
  2. The Blockchain is maintained via proof of work across a public Blockchain maintained globally. In such case a program embedded within the Blockchain (a ’smart contract’) with sole permission to write to the Blockchain is invoked in order to the append data. Access to the smart contract end-point is granted via secret key. In this case corruption would require more than half of the public DLT infrastructure miners to collude, which is again unlikely e. g. on the Ethereum main network.
There are a number of problems here. The first is that permissioned blockchains do not need proof of work to maintain consensus. The canonical permissioned blockchain, IBM's Hyperledger, uses Byzantine Fault Tolerance protocols and thereby avoids the appalling and obviously unsustainable energy use of current proof-of-work blockchains (the top 5 cryptocurrencies are estimated to use as much energy as the Netherlands). Option 1 could be implemented far more easily and efficiently using Hyperledger than using proof of work.
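For comparison with proof of work, here is a minimal sketch of the quorum arithmetic used by classical Byzantine Fault Tolerance protocols such as PBFT; it is not a description of any particular Hyperledger release. With n nodes, such a protocol tolerates up to f Byzantine (arbitrarily faulty) nodes provided n ≥ 3f + 1, and a decision needs a quorum of ⌈(n + f + 1)/2⌉ matching votes, which is 2f + 1 when n = 3f + 1. Any two quorums then share at least f + 1 nodes, so at least one honest node sits in both. Consensus comes from exchanging authenticated messages among known participants, not from a hashing race.

```python
def max_faulty(n: int) -> int:
    """Largest f with n >= 3f + 1: the Byzantine nodes that n nodes can tolerate."""
    return (n - 1) // 3

def quorum(n: int) -> int:
    """Matching votes needed to commit: ceil((n + f + 1) / 2), i.e. 2f + 1 when n = 3f + 1."""
    f = max_faulty(n)
    return (n + f + 2) // 2

def quorums_overlap_safely(n: int) -> bool:
    """Any two quorums share at least f + 1 nodes, hence at least one honest node."""
    return 2 * quorum(n) - n >= max_faulty(n) + 1

for n in (4, 5, 7, 10):
    print(n, max_faulty(n), quorum(n), quorums_overlap_safely(n))
```

So a hypothetical consortium of seven national archives could tolerate two misbehaving members, which is the property option 1 is after, without any mining.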

The second is that the security of blockchains assumes that a large number of nodes act independently. It has been known for more than four years that this is not the case in public blockchains such as Ethereum, which are dominated by a small number of large "mining pools":
in Ethereum — 3 pools control more than 60% of the hashrate, and 6 pools will get you over 85%.
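One way to quantify this concentration is the so-called Nakamoto coefficient: the smallest number of entities whose combined share of the hash rate exceeds 50%, roughly how many parties would need to collude to attack the chain. A minimal sketch follows; the pool shares are invented for illustration (chosen to be consistent with the quoted 60% and 85% figures) and are not measurements.

```python
def nakamoto_coefficient(shares: list, threshold: float = 0.5) -> int:
    """Smallest number of pools whose combined hash-rate share exceeds threshold."""
    total = 0.0
    for count, share in enumerate(sorted(shares, reverse=True), start=1):
        total += share
        if total > threshold:
            return count
    raise ValueError("the listed pools never exceed the threshold")

# Hypothetical shares of the largest pools; the remaining 14% is many small miners.
pool_shares = [0.26, 0.21, 0.15, 0.10, 0.08, 0.06]

print(nakamoto_coefficient(pool_shares))        # 3
print(nakamoto_coefficient(pool_shares, 0.85))  # 6
```

A coefficient of 3 means a majority attack needs only three pool operators to agree, not "more than half of the public DLT infrastructure miners".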
For major cryptocurrencies these pools contain large numbers of nodes, which were attracted by the rapidly rising "price" of the coin. Since the beginning of this year, however:
MVIS CryptoCompare Digital Assets 10 Index extended its collapse from a January high to 80 percent. ... Wednesday’s losses were led by Ether, the second-largest virtual currency. It fell 6 percent to $171.15 at 7:50 a.m. in New York, extending this month’s retreat to 40 percent.
This makes mining uneconomic for most miners, as do the higher electricity prices some utilities are charging miners, so the networks will lose nodes. Committing digital preservation for the long term to infrastructure as volatile as cryptocurrencies is not a route to sustainability.
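The underlying arithmetic is simple: a miner's expected revenue is its share of the network hash rate times the coins issued per day times the coin price, while its main running cost is electricity. Here is a minimal sketch; apart from the quoted $171.15 Ether price, every number is hypothetical. It ignores hardware depreciation and difficulty adjustment (as miners leave, those remaining get a larger share of the rewards, which partly offsets a price fall).

```python
def daily_profit_usd(hash_share: float, coins_per_day: float, coin_price_usd: float,
                     power_kw: float, usd_per_kwh: float) -> float:
    """Expected daily mining revenue minus electricity cost (hardware costs ignored)."""
    revenue = hash_share * coins_per_day * coin_price_usd
    electricity = power_kw * 24 * usd_per_kwh
    return revenue - electricity

# Hypothetical small miner: 0.001% of the hash rate, drawing 10 kW,
# on a network issuing a hypothetical 20,000 coins per day.
before = daily_profit_usd(1e-5, 20_000, 500.0, 10.0, 0.10)    # coin at $500
crash = daily_profit_usd(1e-5, 20_000, 171.15, 10.0, 0.10)    # coin at $171.15
pricier = daily_profit_usd(1e-5, 20_000, 171.15, 10.0, 0.15)  # and power at $0.15/kWh
print(round(before, 2), round(crash, 2), round(pricier, 2))
```

In this toy case the price fall alone shrinks the margin to a sliver, and a modest rise in electricity prices pushes it below zero.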

[Figure: British Library budget]
Equally, a private blockchain operated by memory institutions faces great difficulty in maintaining adequate participation. These institutions are under severe budget pressure and competition for skilled staff. They are being forced to outsource their IT operations to "the cloud", and are unlikely to take on new or maintain existing in-house tasks. Byzantine Fault Tolerance is a better approach to maintaining consensus than proof of work in networks with a small number of nodes, even ignoring proof of work's wastefulness.

Third, Collomosse et al pay scant attention to the fact that digital preservation is primarily an economic problem. A sustainable digital preservation technology requires a viable business model:
We might also explore ... new business models to encourage sustainability. For example, the maintenance of the DLT (in terms of computational effort for mining) might be facilitated by users who seek document verification ’paying’ for that service via contribution of mining effort to maintain the DLT.
The whole point of archives and memory institutions is that they hold materials that are very rarely accessed, so the idea of sustaining them economically by charging for access can never work. And the idea that individual scholars wanting to verify a document would "pay" by running a network node fails the laugh test.

Fourth, the authors uncritically accept blockchain's marketing hype that the only security threat is the unlikely one of collusion among the independent nodes:
The security of a blockchain is afforded by the immutability of data with the blocks, delivered by the compounding effect of each new block being hashed to include the hashes of previous blocks. Thus as content is committed into the blockchain, the security of the content is reinforced.
There are a number of issues with this:
  • What is afforded "by the immutability of data with the blocks" is the verifiability of a blockchain, not its security. Its security is afforded by the large number of replicas of the blocks at the large number of nodes.
  • The immutability of the blocks depends on the security of the hash algorithm in use. This is not absolute; "over decade- or century-long timespans" it degrades, as happened for example to SHA-1, for which a practical collision was demonstrated in 2017.
  • Their system uses "smart contracts", which in practice are "immutable in name only".
  • Because the security of the blockchain depends upon the entire chain being replicated at each of the large number of nodes, successful blockchain systems suffer scaling problems, consuming large amounts of storage.
  • Unlike the decentralized LOCKSS system, ARCHANGEL and systems such as ACE do not contain "the content", merely hashes of "the content" which can be used for verification only if "the content" itself has survived through some other mechanism.
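The first and last points can be illustrated with a toy hash chain, a minimal sketch that is not ARCHANGEL's actual data structure. Each block records a document hash and the hash of the previous block, so editing an earlier block breaks the links after it. But anyone who controls a copy can recompute every later link and produce a chain that verifies perfectly; only comparison with independently held replicas exposes the rewrite. And the chain stores only hashes, so it can show that a document has changed but cannot restore it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block."""
    return sha256_hex(json.dumps(block, sort_keys=True).encode())

def build_chain(doc_hashes: list) -> list:
    """Toy chain: each block records one document hash and its predecessor's hash."""
    chain, prev = [], GENESIS
    for h in doc_hashes:
        block = {"prev": prev, "doc_hash": h}
        chain.append(block)
        prev = block_hash(block)
    return chain

def verify(chain: list) -> bool:
    """Check every link; catches edits that were not followed by recomputation."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev:
            return False
        prev = block_hash(block)
    return True

def head(chain: list) -> str:
    return block_hash(chain[-1])

docs = [sha256_hex(d) for d in (b"doc A", b"doc B", b"doc C")]
honest = build_chain(docs)
replica = [dict(b) for b in honest]        # an independent copy held elsewhere

tampered = [dict(b) for b in honest]
tampered[0]["doc_hash"] = sha256_hex(b"forged A")
print(verify(tampered))                    # False: the next link no longer matches

rewritten = build_chain([sha256_hex(b"forged A")] + docs[1:])
print(verify(rewritten))                   # True: recomputed links are consistent
print(head(rewritten) == head(replica))    # False: only the replica reveals the forgery
```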
To do Collomosse et al justice, they do not claim to solve the whole problem of digital preservation, just that:
Blockchain offers a shield which archives can use to defend the records as authentic. ... It was also noted that acceptance of content evidence might eventually become similar to acceptance of DNA evidence in court, but that establishing that level of confidence would require strong public engaged to explain Blockchain in an accessible manner particularly explaining why one could trust the cryptographic assurances inherent in a DLT solution.
But, to the extent that blockchains and cryptocurrencies are perceived by the public as a single technology, and cryptocurrencies perceived as rancid with theft, fraud and manipulation, this may be difficult.

Sustainability is job #1 for archives. There's no point in setting up an archiving system and filling it with content only to have it fail after a decade or so. Sustainability has to be designed from the start into both the technology and the organization in which it is embedded if the contents are to survive the wide range of threats to which archived data is subject. Layering it on afterwards isn't going to be effective.

Basing the architecture of a preservation system on economically and environmentally unsustainable proof of work blockchains, and failing to identify any sustainable business model to support them isn't realistic.


Geoff said...

I'm really disappointed by how many of my friends are diving into blockchain (one has even quit his job to found a blockchain startup). It's not clear if they are True Believers, or simply looking for an opportunity to Fleece The Gullible. Both alternatives are disheartening....

David. said...

Among the things that blockchain will fix is the problem of the border between Northern Ireland and the Republic of Ireland after Brexit:

"a statement from Phillip Hammond, chancellor of the exchequer, reported by Reuters on Monday:

“There is technology becoming available (...) I don’t claim to be an expert on it but the most obvious technology is blockchain,” Hammond said when asked about how the government could achieve smooth trade after Brexit.
It is safe to say technology used at the border is a red herring, as even the best database can't poke its nose inside a lorry. Here, for instance, is one of the IT experts quoted in the Irish Times calling the idea of technological solutions to the border question “complete nonsense”:

“It’s one of these things that if people say it often enough it starts to sound like something that could work,” said Sadhbh McCarthy, who set up and led the Centre for Irish and European Security (CIES).

“If border issues were that easy to sort out do you think the US, with all its resources, would be considering building a big wall with Mexico?” added Ms McCarthy."

From Alphaville's Chancellor's blockchain idea is a desperate scrape of the Brexit barrel

David. said...

In I’m very sorry, but you’re going to have to learn to love the blockchain Jon Evans provides yet another example of blithely assuming that the claimed properties of blockchain systems are actually delivered in the real world:

"That’s why the mere existence of a permissionless decentralized alternative, one not financed by ads, one not ruled by any central titanic company, is important. And, my friends, I know you don’t want to hear this, but it’s getting really, really hard to imagine a decentralized network with a new financing model that doesn’t involve blockchains in one ore more way, shape, or form."

Clearly, the now decade-long experience showing that blockchains in the real world aren't decentralized isn't enough to stop people assuming that "blockchain" means "decentralized".

David. said...

In How I Lost My Faith in Private Blockchains, Angus Champion de Crespigny writes:

"It may, therefore, be easiest to think of a blockchain as a distributed database with the ability to administer it taken away.

The key question to ask then, is what are the reasons an enterprise would prefer to sacrifice many measurable metrics – transactions per second, disk space, speed and efficiency of computation, cost of maintenance – and opt instead for deploying a technology that is harder to administer?"

David. said...

In Creating money out of thin ether, Tim Copeland reports on two current attacks on the Ethereum blockchain:

Etherdig is apparently "spy mining". It:

"has mined over 1,250 blocks in the last three months, without validating a single transaction. As a result, it’s received 3,750 ETH ($862,500) in mining rewards."

F2Pool is apparently "selfish mining" (an attack I wrote about 5 years ago):

"In selfish mining, when a miner in a mining pool discovers a block, it lets the rest of the pool work on its block header in order to gain a time advantage on the next block. Essentially, a selfish miner creates a private blockchain that it, and its pool, can work on more quickly. When it’s solved more blocks than the public blockchain, it publishes its version (which is now longer) to the public chain. When this happens, miners spot the longer chain and join it, allowing the selfish miner to gobble up the block solving rewards. Spy miners are effectively eavesdropping on the whole process, making things worse."

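The quoted description is loose. In Eyal and Sirer's model, a pool controlling a fraction α of the hash rate keeps the blocks it finds private and releases them strategically so as to orphan honest miners' blocks; γ is the fraction of honest miners who build on the pool's block when two competing blocks are published at the same time. They showed the strategy earns more than the pool's fair share α whenever α > (1 − γ)/(3 − 2γ), a threshold between 0 and 1/3 depending on γ. The sketch below encodes that threshold and their closed-form expression for the pool's share of blocks, transcribed here for illustration and valid only for α < 1/2.

```python
def selfish_threshold(gamma: float) -> float:
    """Pool share alpha above which selfish mining beats honest mining."""
    return (1 - gamma) / (3 - 2 * gamma)

def selfish_revenue(alpha: float, gamma: float) -> float:
    """Selfish pool's expected share of main-chain blocks (valid for alpha < 1/2)."""
    numerator = (alpha * (1 - alpha) ** 2 * (4 * alpha + gamma * (1 - 2 * alpha))
                 - alpha ** 3)
    denominator = 1 - alpha * (1 + (2 - alpha) * alpha)
    return numerator / denominator

for gamma in (0.0, 0.5, 1.0):
    print(f"gamma={gamma}: profitable above alpha={selfish_threshold(gamma):.3f}")

# A pool with 40% of the hash rate and no network advantage (gamma = 0):
print(f"{selfish_revenue(0.4, 0.0):.3f}")  # 0.484, more than its fair 0.400
```

At exactly the threshold the expression returns α itself, which is a quick consistency check on the transcription.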
David. said...

Daniel Oberhaus' Bitcoin Mining Alone Could Raise Global Temperatures Above Critical Limit By 2033 reports on a study in Nature Climate Change entitled Bitcoin emissions alone could push global warming above 2°C:

"the researchers determined that Bitcoin generated 69 million metric tons of CO2 last year. To put this in perspective, that is a little over one percent of all CO2 emissions from energy production globally. This is a huge energy budget considering that Bitcoin accounted for just 0.03 percent of all cashless transactions globally in the same time frame, according to the study."

I have to say I'm skeptical of the assumptions underlying the study. And Tim Swanson makes the point that other leading proof-of-work blockchains make significant contributions too, with the top four of them perhaps adding another 25%.

David. said...

Timothy B. Lee's takedown of Alex Tapscott's monumentally stupid It’s Time for Online Voting is entitled Blockchain-based elections would be a disaster for democracy:

"Tapscott focuses on the idea that blockchain technology would allow people to vote anonymously while still being able to verify that their vote was included in the final total. Even assuming this is mathematically possible—and I think it probably is—this idea ignores the many, many ways that foreign governments could compromise an online vote without breaking the core cryptographic algorithms."

But Lee misses another of the overwhelming number of problems with Tapscott's idea. In On-Chain Vote Buying and the Rise of Dark DAOs Philip Daian, Tyler Kell, Ian Miers, and Ari Juels write:

"The existence of trust-minimizing vote buying and Dark DAO primitives imply that users of all on-chain votes are vulnerable to shackling, manipulation, and control by plutocrats and coercive forces. This directly implies that all on-chain voting schemes where users can generate their own keys outside of a trusted environment inherently degrade to plutocracy, ... Our schemes can also be repurposed to attack proof of stake or proof of work blockchains profitably, posing severe security implications for all blockchains."

Tapscott is apparently co-founder of the Blockchain Research Institute, clearly a trustworthy source for applications of blockchain technology.

David. said...

Jemima Kelly's The unholiest of holy wars: “Satoshi” vs “Bitcoin Jesus” sums up the conflict between Craig Wright and Roger Ver over the Bitcoin Cash hard fork:

"Bitcoin cash's backers include people like Craig Wright, the man who in 2016 declared that he was Satoshi but then failed to produce the evidence to support that claim, and Calvin Ayre, the gambling kingpin who was indicted in 2012 on money laundering and illegal gambling charges in the US, and who last year pled guilty to misdemeanour. They also include Roger Ver, the man once dubbed "Bitcoin Jesus", who once served 10 months in prison for selling explosives."

David. said...

On the idea of blockchain voting, see Ryan North's I’m a Computer Scientist. Here’s Why You Should Never Trust a Computer and xkcd.

David. said...

More on blockchain voting from Mike Masnick in Blockchain Voting: Solves None Of The Actual Problems Of Online Voting; Leverages None Of The Benefits Of Blockchain.