Tuesday, May 30, 2017

Blockchain as the Infrastructure for Science? (updated)

Herbert Van de Sompel pointed me to Lambert Heller's How P2P and blockchains make it easier to work with scientific objects – three hypotheses as an example of the persistent enthusiasm for these technologies as a way of communicating and preserving research, among other things. Another link from Herbert, Chris H. J. Hartgerink's Re-envisioning a future in scholarly communication from this year's IFLA conference, proposes something similar:
Distributing and decentralizing the scholarly communications system is achievable with peer-to-peer (p2p) Internet protocols such as dat and ipfs. Simply put, such p2p networks securely send information across a network of peers but are resilient to nodes being removed or adjusted because they operate in a mesh network. For example, if 20 peers have file X, removing one peer does not affect the availability of the file X. Only if all 20 are removed from the network, file X will become unavailable. Vice versa, if more peers on the network have file X, it is less likely that file X will become unavailable. As such, this would include unlimited redistribution in the scholarly communication system by default, instead of limited redistribution due to copyright as it is now.
I first expressed skepticism about this idea three years ago discussing a paper proposing a P2P storage infrastructure called Permacoin. It hasn't taken over the world. [Update: my fellow Sun Microsystems alum Radia Perlman has a broader skeptical look at blockchain technology. I've appended some details.]

I understand the theoretical advantages of peer-to-peer (P2P) technology. But after nearly two decades researching, designing, building, deploying and operating P2P systems I have learned a lot about how hard it is for these theoretical advantages actually to be obtained at scale, in the real world, for the long term. Below the fold, I try to apply these lessons.

For the purpose of this post I will stipulate that the implementations of both the P2P technology and the operating system on which it runs are flawless, and their design contains no vulnerabilities that the bad guys can exploit. Of course, in the real world there will be flaws and vulnerabilities, but discussing their effects on the system would distract from the message of this post.

Heller's three hypotheses are based on the idea of using a P2P storage infrastructure such as IPFS that names objects by their hash:
  • It would be better for researchers to allocate persistent object names than for digital archives to do so. There are a number of problems with this hypothesis. First, it doesn't describe the current situation accurately. Archives such as the Wayback Machine or LOCKSS try hard not to assign names to content they preserve, striving to ensure that it remains accessible via its originally assigned URL, DOI or metadata (such as OpenURL). Second, the names Heller suggests are not assigned by researchers, they are hashes computed from the content. Third, hashes are not persistent over the timescales needed because, as technology improves over time, it becomes possible to create "hash collisions", as we have seen recently with SHA1.
  • From name allocation plus archiving plus x as a “package solution” to an open market of modular services. Heller is correct to point out that:
    The mere allocation of a persistent name does not ensure the long-term accessibility of objects. This is also the case for a P2P file system such as IPFS. ... Since name allocation using IPFS or a blockchain is not necessarily linked to the guarantee of permanent availability, the latter must be offered as a separate service.
    The upside of using hashes as names would be that the existence and location of the archive would be invisible. The downside of using hashes as names is that the archive would be invisible, posing insurmountable business model difficulties for those trying to offer archiving services, and insurmountable management problems for those such as the Keeper's Registry who try to ensure that the objects that should be preserved actually are being preserved. There can't be a viable market in archiving services if the market participants and their products are indistinguishable and accessible freely to all. Especially not if the objects in question are academic papers, which are copyright works.
  • It is possible to make large volumes of data scientifically usable more easily without APIs and central hosts. In an ideal world in which both storage and bandwidth were infinite and free, storing all the world's scientific data in an IPFS-like P2P service backed up by multiple independent archive services would indeed make the data vastly more accessible, useful and persistent than it is now. But we don't live in an ideal world. If this P2P network is to be sustainable for the long term, the peers in the network need a viable business model, to pay for both storage and bandwidth. But they can't charge for access to the data, since that would destroy its usability. They can't charge the researchers for storing their data, since it is generated by research that is funded by term-limited grants. Especially in the current financial environment, they can't charge the researchers' institutions, because they have more immediate funding priorities than allowing other institutions' researchers to access the data in the future for free.
I have identified three major problems with Heller's proposal which also apply to Hartgerink's:
  • They would populate the Web with links to objects that, while initially unique, would over time become non-unique. That is, it would become possible for objects to be corrupted. When the links become vulnerable, they need to be replaced with better hashes. But there is no mechanism for doing so. This is not a theoretical concern; the BitTorrent protocol underlying IPFS has been shown to be vulnerable to SHA1 collisions.
  • The market envisaged, at least for archiving services, does not allow for viable business models, in that the market participants are indistinguishable.
  • Unlike Bitcoin, there is no mechanism for rewarding peers for providing services to the network.
None of these has anything to do with the functioning of the software system. Heller writes:
There is hope that we will see more innovative, reliable and reproducible services in the future, also provided by less privileged players; services that may turn out to be beneficial and inspirational to actors in the scientific community.
I don't agree, especially about "provided by less privileged players". Leave aside that the privileged players in the current system have proven very adept at countering efforts to invade their space, for example by buying up the invaders. There is a much more fundamental problem facing P2P systems.

Four months after the Permacoin post, inspired in part by Natasha Lomas' Techcrunch piece The Server Needs To Die To Save The Internet about the MaidSafe P2P storage network, I wrote Economies of Scale in Peer-to-Peer Networks. This is a detailed explanation of how the increasing returns to scale inherent to technologies in general (and networked systems in particular) affect P2P systems, making it inevitable that they will gradually lose their decentralized nature and the benefits that it provides, such as resistance to some important forms of attack.

Unconfirmed transactions
The history of Bitcoin shows this centralizing effect in practice. It also shows that, even when peers have a viable (if perhaps not sustainable) business model, based in Bitcoin's case on financial speculation, Chinese flight capital and crime such as ransomware, resources do not magically appear to satisfy demand.

As I write, about 100MB of transactions are waiting to be confirmed. A week and a half ago, Izabella Kaminska reported that there were over 200,000 transactions in the queue. At around 5 transactions/sec, that's around an 11-hour backlog. Right now, the number is about half that. How much less likely are resources to become available to satisfy demand if the peers lack a viable business model?
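
The backlog figure is simple arithmetic, and a short sketch makes it easy to redo with other numbers. The function name and the assumption of a constant confirmation rate are mine, for illustration only; real throughput varies from block to block.

```python
def backlog_hours(queued_transactions: int, transactions_per_second: float) -> float:
    """Hours needed to clear a queue, assuming a constant confirmation rate."""
    seconds = queued_transactions / transactions_per_second
    return seconds / 3600


# Figures quoted above: 200,000 queued transactions at roughly 5 per second.
reported = backlog_hours(200_000, 5)

# "About half that" queue length, at the same assumed rate.
current = backlog_hours(100_000, 5)

print(f"reported backlog: {reported:.1f} hours")
print(f"current backlog:  {current:.1f} hours")
```

The estimate ignores new transactions arriving while the queue drains, so it should be read as a lower bound on the wait.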

Because Bitcoin has a lot of peers and speculation has driven its value sky-high, it is easy to assume that it is a successful technology. Clearly, it is very successful along some axes. Along others, not so much. For example, Kaminska writes:
The views of one trader:
... This is the biggest problem with bitcoin, it’s not just that it’s expensive to transact, it’s uncertain to transact. It’s hard to know if you’ve put enough of a fee. So if you significantly over pay to get in, even then it’s not guaranteed. There are a lot of people who don’t know how to set their fees, and it takes hours to confirm transactions. It’s a bad system and no one has any solutions.
Transactions which fail to get the attention of miners sit in limbo until they drop out. But the suspended state leaves payers entirely helpless. They can’t risk resending the transaction, in case the original one does clear eventually. They can’t recall the original one either. Our source says he’s had a significant sized transaction waiting to be settled for two weeks.

The heart of the problem is game theoretical. Users may not know it but they’re participating in what amounts to a continuous blind auction.

Legacy fees can provide clues to what fees will get your transactions done — and websites are popping up which attempt to offer clarity on that front — but there’s no guarantee that the state of the last block is equivalent to the next one.
Right now, if you want a median-sized transaction in the next block you're advised to bid nearly $3. The uncertainty is problematic for large transactions and the cost is prohibitive for small ones. Kaminska points out that the irony is:
given bitcoin’s decentralised and real-time settlement obsession, ... how the market structure has evolved to minimise the cost of transaction.

Traders, dealers, wallet and bitcoin payments services get around transaction settlement choke points and fees by netting transactions off-blockchain.

This over time has created a situation where the majority of small-scale payments are not processed on the bitcoin blockchain at all. To the contrary, intermediaries operate for the most part as trusted third parties settling netted sums as and when it becomes cost effective to do so. ... All of which proves bitcoin is anything but a cheap or competitive system. With great irony, it is turning into a premium service only cost effective for those who can’t — for some reason, ahem — use the official system.
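
The netting Kaminska describes can be sketched in a few lines. This is only an illustration of the idea, not a description of how any particular exchange or wallet service works; the party names and amounts are hypothetical.

```python
from collections import defaultdict


def net_positions(payments):
    """Collapse (payer, payee, amount) payments into one net balance per party."""
    balances = defaultdict(int)
    for payer, payee, amount in payments:
        balances[payer] -= amount
        balances[payee] += amount
    # Parties whose payments cancel out exactly need no settlement at all.
    return {party: balance for party, balance in balances.items() if balance != 0}


# Hypothetical off-chain ledger kept by a trusted intermediary (amounts in satoshi).
payments = [
    ("alice", "bob", 50_000),
    ("bob", "carol", 30_000),
    ("carol", "alice", 20_000),
    ("alice", "bob", 10_000),
]

net = net_positions(payments)
print(net)
```

Four payments reduce to three net balances that sum to zero, so the intermediary settles less often and in fewer transfers. The price is exactly the one the quote points out: the parties must trust whoever keeps the ledger.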
There's no guarantee that the axes on which Bitcoin succeeded are those relevant to other blockchain uses; the ones on which it is failing may well be. Among the blockchain's most hyped attributes were the lack of a need for trust, and the lack of a single point of failure. Another of Kaminska's posts:
Coinbase has been intermittently down for at least two days.

With an unprecedented amount of leverage in the bitcoin and altcoin market, a runaway rally that doesn’t seem to know when to stop, the biggest exchange still not facilitating dollar withdrawals and incremental reports about other exchanges encountering service disruption, it could just be there’s more to this than first meets the eye.

(Remember from 2008 how liquidity issues tend to cause a spike in the currency that’s in hot demand?)
These problems illustrate the difficulty of actually providing the theoretical advantages of a P2P technology "at scale, in the real world, for the long term".

Update: In Blockchain: Hype or Hope? Radia Perlman provides a succinct overview of blockchain technology, asks what is novel about it, and argues that the only feature of the blockchain that cannot be provided at much lower cost by preexisting technology is:
a ledger agreed upon by consensus of thousands of anonymous entities, none of which can be held responsible or be shut down by some malevolent government
But, as she points out:
most applications would not require or even want this property. And, as demonstrated by the Bitcoin community's reaction to forks, there really are a few people in charge who can control the system
She doesn't point out that, in order to make money, the "thousands of ... entities" are forced to cooperate in pools, so that in practice the system isn't very decentralized, and the "anonymous entities" are much less anonymous than they would like to believe (see here and here).

Radia's article is a must-read corrective to the blockchain hype. Alas, although I have it in my print copy of Usenix ;login:, it doesn't appear to be on the Usenix website yet, and even when it is it will only be available to members for a year. I've made a note to post about it again when it is available.


David. said...

Radia Perlman's article is now on-line. It's free for Usenix members but otherwise $8 until this time next year. A sample of her critique:

"The startling aspect of the Bitcoin hash is that it is equally difficult for the community of miners to compute a hash as for someone to forge a hash. This means that the security of Bitcoin depends on the assumption that no entity or collection of entities can amass as much compute power as the Bitcoin mining community. This is a very surprising assumption. It would indeed be easy for a nation-state to amass more compute power than the Bitcoin community."

It doesn't take a nation-state, just a collaboration among miners mustering more than half the mining power. This is not a theoretical possibility; it happened in 2014.

David. said...

Izabella Kaminska's latest post to the FT's Alphaville blog discusses Dissecting Ponzi schemes on Ethereum: identification, analysis, and impact by Bartoletti et al. Kaminska writes:

"For the most part, the question that really needs asking is this: What sort of legitimate organisation really benefits from a decentralised or headless state? Also, what sort of company benefits from decentralised funding options or from providing decentralised services? The answer is almost none.

Decentralisation is, in almost all cases, not an efficiency. To the contrary, it’s a cost that adds complexity and creates an unnecessary burden for both users and operators unless centralised layers are added on top of it — defying the whole point.


At the end of the day, there are only two groups of people prepared to go to costly lengths to decentralise a service which is already available (in what is often a much higher quality form) in a centralised or conventional hierarchal state. One group is criminals and fraudsters. The other is ideologues and cultists. The first sees the additional cost/effort as worthwhile due to the un-censorable utility of these systems. The second consumes it simply as a luxury or cultured good."

David. said...

Ethereum, which has been bubbling even faster than Bitcoin, suffered a "Flash Crash" from $319 to $0.10, followed by a rebound. There was nothing wrong with the technology; it was the response to a large sell order. However, the exchange apparently offered to refund traders' losses, which kind of defeats the idea behind these technologies, that the market is always right.

David. said...

Lambert Heller has posted a second installment of his enthusiasm for the blockchain. He addresses some of the criticisms above.

First, he attempts to address my issue with using hashes as persistent names for objects thus:

"Rosenthal emphasises weaknesses of the hashing method in two places, backing up his claim with the SHA-1 collisions identified at the beginning of 2017. This can be countered by the assessment of cryptography expert Bruce Schneier. Schneier points out that this problem had been looming for many years – which is why NIST declared the algorithm SHA-3 the new standard in 2012. The P2P file system I mentioned – IPFS – consciously decided back in 2014 not to use SHA-1 – for the explicit reason that potential problems were already known at that time."

This is to utterly miss the point I was trying to make. Any hash algorithm will, over time, become vulnerable to collisions. Choosing SHA-3 over SHA-1 is not a solution to this problem, it merely delays the onset of the problem for some unknown period. If hashes are used as names for objects these names will not be persistent, they will be unique only as long as the hash algorithm is unbroken. This poses two problems:

- There is no requirement for the team that breaks the hash algorithm to announce their achievement. The team that announced they had broken SHA-1 may or may not have been the first to do so. So although Schneier is right that SHA-1 has been suspect for "many years", we do not know when it became possible for malign actors to create collisions.

- Suppose SHA-3 hashes become widely used as names, and the Web becomes populated with links using these names to point to objects. Now suppose some public-spirited research group announces the ability to create SHA-3 collisions. It becomes necessary to upgrade the naming system to use SHA-4. Future links will use SHA-4 hashes. But the links to the objects that were created while SHA-3 was the algorithm of choice will still be out there on the Web. It isn't feasible to automatically update the existing links to use SHA-4. Some may be updated manually, most will not be. Now these not-updated SHA-3 links may point to something different. Hashes are not persistently unique names. They are merely names that are probably unique for a reasonable period.

The benefits that accrue to an actor that achieves and conceals the ability to create collisions are so large that concealment must be assumed.
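
A minimal sketch of the second problem, assuming a toy naming scheme of my own invention (algorithm tag plus hex digest) and a placeholder URL. It is not how IPFS or any real system formats names. SHA-1 stands in for the retired algorithm and SHA-256 for its replacement.

```python
import hashlib


def content_name(content: bytes, algorithm: str) -> str:
    """Name an object by the hash of its content (hypothetical name format)."""
    digest = hashlib.new(algorithm, content).hexdigest()
    return f"{algorithm}-{digest}"


def resolve(link: str, store: dict):
    """Look up the object whose name is the last path segment of the link."""
    name = link.rsplit("/", 1)[-1]
    return store.get(name)


document = b"Hypothetical research object"

old_name = content_name(document, "sha1")    # name used while SHA-1 was trusted
new_name = content_name(document, "sha256")  # name after the migration

# Links already published on the Web embed the old name (placeholder URL).
published_links = [f"https://example.org/objects/{old_name}"]

# After migration the store indexes the content only under its new name.
migrated_store = {new_name: document}

for link in published_links:
    print(link, "->", resolve(link, migrated_store))
```

The old link resolves to nothing, and nothing in the old name lets a reader derive the new one without already holding the content. Keeping the old index alive avoids the broken link only by continuing to trust a hash that can no longer guarantee uniqueness.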

David. said...

Heller also writes:

"Rosenthal’s criticism that P2P networks are inappropriate for the reliable provision of services is clearly more serious than the issue of SHA-1 collisions. One of his points of criticism is the concentration of power on individual nodes within networks – the behaviour of which spells incalculable risks for the functioning of the entire network. He also argues that such networks offer too few incentives for many individual participants to make a positive contribution to the overall performance of the network."

Again, this misses the point I made in Economies of Scale in Peer-to-Peer Networks which is that, as we see with Bitcoin, the economies of scale inherent to IT provide a powerful centralizing force which, over time, will destroy the decentralized nature of a blockchain network. This is true whether the participants are motivated by monetary gain, or are institutions like Universities under sustained budget pressure. Heller appears to believe that the properties of the blockchain, such as immutability, are inherent in the technology. But in fact they depend on the assumption that the network is decentralized enough that no single participant, nor any conspiracy among participants, controls a majority of the mining power. Over time, economies of scale mean that this assumption will no longer be true.

Heller concludes:

"In short: the teething troubles of P2P systems anticipated by Rosenthal have largely been overcome at the level of elementary technical concepts. This is reflected in the wide range of applications of products such as Ethereum and Hyperledger in industry and trade. The implementation of these concepts in application areas such as education, research and cultural heritage has only just begun."

Perhaps Heller didn't notice the DAO heist, the Zerocoin heist, the Ethereum "flash crash" and other "teething troubles" that I've written about here.

I would strongly urge Heller, before the next installment of his techno-optimistic series, to spend the $8 and read Radia Perlman's article, referenced in the update above. Radia is far more expert in these matters than I.

David. said...

See also Martin Walker's Seven signs of over-hyped Fintech at the London School of Economics blog.

Lambert said...

By the way, I explicitly linked to an article about the DAO heist in the blog article you replied to. Still, your argument reminds me a bit of talking about early experiments with databases, in which tables were accidentally dropped, and then insisting that this event disproves the concept of relational database management systems altogether.
More importantly, my article argued that advanced concepts of smart contracts simply cannot be interfered with by the way of concentrating mining power. Let's take for example the MaidSafe network I mentioned. Any object that is to be stored within this network is first split into parts, encrypted, and then redundantly spread across the network. No single node - and this even holds true for a node with a hypothetical absolute majority of MaidSafe coins, or mining power - could even tell where a particular object is stored, let alone tamper with that object. So, no, there are a lot of problems with blockchains, but concentration of power on too few nodes is not necessarily one of them. It clearly applies to a lot of traditional P2P network architectures, but it simply doesn't apply to concepts like MaidSafe anymore.
As an aside, I'd love to discuss Radia Perlman's article publicly. I think we both have a reason why we have this discussion in public. So do you think there's any chance Perlman could make the essence of her article's argumentation freely available on the web?

David. said...

See today's post Is Decentralized Storage Sustainable? for a restatement of the economic argument that leads to the conclusion that it isn't.

If, as I argue, these networks won't remain decentralized over time, statements about the behavior of these networks that assume that they are decentralized should be viewed skeptically. See, for example, Another Class Of Blockchain Vulnerabilities.

David. said...

Ethereum has been having an exciting time recently, even ignoring its spectacular bubble. Yesterday, someone stole $7.4M worth from CoinDash using a "ludicrously simple hack":

"the attacker simply switched the cryptocurrency wallet CoinDash pointed to on its website. This meant that, once the hacker took over around three minutes into the ICO, all future payments filled their wallet instead of CoinDash’s.

CoinDash locked their website down once they noticed the attack but it looks likely that the hacker made off with a lot of money. The company says that, within the first three minutes of its ICO, they received around $6 million worth of Ethereum. As Ethereum is like Bitcoin, transactions are traceable and CoinDash can see that around 43,438 ether has landed in the hacker's wallet – this currently equates to around $7.4 million."

Today, someone used a vulnerability in a multi-sig wallet client to steal about $30M worth:

"An unknown hacker has used a vulnerability in an Ethereum wallet client to steal over 153,000 Ether, worth over $30 million dollars.

The hack was possible due to a flaw in the Parity Ethereum client. The vulnerability allowed the hacker to exfiltrate funds from multi-sig wallets created with Parity clients 1.5 and later. Parity 1.5 was released on January 19, 2017.

Multi-sig wallets are Ethereum accounts over which multiple persons have control with their own keys. Multi-sig accounts allow owners to move funds only when a majority of owners sign a transaction with their key."

A different group then took all the remaining funds, about $76M:

"According to messages posted on Reddit and in a Gitter chat, The White Hat Group appears to be formed of security researchers and members of the Ethereum Project that have taken it into their own hands to secure funds in vulnerable wallets.

Based on a message the group posted online, they plan to return the funds they took. Their wallet currently holds 377,116.819319439311671493 Ether, which is over $76 million."

While it is true that neither of these heists was due to flaws in the protocol, "only" to flaws in protocol clients, they show that the cryptocurrency ecosystem lacks the maturity to handle the absurd valuations and the consequent rewards for finding vulnerabilities.

David. said...

Leaving aside the daily multi-million dollar heists, there is the opinion of one of Ethereum's co-founders that the speculative frenzy in Initial Coin Offerings is dangerous:

"Initial coin offerings, a means of crowdfunding for blockchain-technology companies, have caught so much attention that even the co-founder of the ethereum network, where many of these digital coins are built, says it’s time for things to cool down in a big way.

“People say ICOs are great for ethereum because, look at the price, but it’s a ticking time-bomb,” Charles Hoskinson, who helped develop ethereum, said in an interview. “There’s an over-tokenization of things as companies are issuing tokens when the same tasks can be achieved with existing blockchains. People are blinded by fast and easy money.”

Firms have raised $1.3 billion this year in digital coin sales, surpassing venture capital funding of blockchain companies and up more than six-fold from the total raised last year, according to Autonomous Research. Ether, the digital currency linked to the ethereum blockchain, surged from around $8 after its ICO at the start of the year to just under $400 last month. It’s since dropped by about 50 percent."

A wonderful example of this speculative frenzy is block.one, a startup that raised $230M by selling tokens that have no use and no rights attached:

"There’s a token, but it can’t actually be used for anything. This is from the FAQs:

The EOS Tokens do not have any rights, uses, purpose, attributes, functionalities or features, express or implied, including, without limitation, any uses, purpose, attributes, functionalities or features on the EOS Platform.

You might want to read that over a couple of times, keeping in mind that investors have spent over $200m buying these “EOS Tokens”."

David. said...

Last Sunday's Ethereum heist was $8.4M from Veritaseum, the fourth multi-million dollar heist in a month. See my post Initial Coin Offerings.

David. said...

Izabella Kaminska reports on A $4bn bitcoin laundering operation potentially busted. Shortly after the SEC's report on ICOs, the BTC-e bitcoin exchange went down for "unscheduled maintenance" and:

"For years, the identities of the exchange’s owner/operators have been a bit of an industry mystery. ... All that is known is that the site appears to be Russia-based and that quoted prices on BTC-e are regularly and significantly discounted to those on other international exchanges due to the exchange’s lesser standards."

In possibly related news:

"But now Reuters reports that a Russian man, Alexander Vinnik, 38, has been arrested in Greece on suspicion of laundering at least $4bn of criminal funds through bitcoin.

The wire cites two people close to the investigation who say he was a “key person” behind BTC-e ... It was thought that ‘at least’ $4 billion in cash had been laundered through a bitcoin platform since 2011; the platform had 7 million bitcoins deposited, and 5.5 million bitcoins in withdrawals."

David. said...

Izabella Kaminska's The huge significance of the BTC-e bust looks at the implications of the BTC-e bust which, if the indictment is to be believed, was a very important part of the criminal infrastructure of Bitcoin:

"According to the DoJ, since BTC-e’s inception Vinnik and others developed a customer base for BTC-e that was “heavily reliant on criminals, including by not requiring users to validate their identity, obscuring and anonymizing transactions and source of funds, and by lacking any anti-money laundering processes.”

This made it the go-to laundromat for hacked bitcoin across the world, especially after the closure of Liberty Reserve in 2013. Specifically, the indictment alleges, BTC-e was used to facilitate crimes ranging from computer hacking, to fraud, identity theft, tax refund fraud schemes, public corruption, and drug trafficking, receiving more than $4bn worth of bitcoin over the course of its operation."

This activity included laundering the proceeds of the Mt. Gox hack among others. Kaminska points out that:

"irrespective of anonymity and pseudonymity, it transpires the bitcoin network is vulnerable to blockchain analysis which can connect the dots to real world identities in the long run. This implies bitcoin itself isn’t half as useful for criminal enterprise as first perceived. It’s just a matter of time before the Feds catch up, which may undermine bitcoin’s primary utility from now on."

David. said...

More enthusiasm for blockchain as a research infrastructure (without considering the economics) in the context of VIVO.

14Volt said...

Hi David,

I'm very much for heavy criticism and enjoy Kaminska's posts, so fair sympathy for what you're doing here.

I'm still digesting all your arguments, but don't think you're right on SHA-1 attacks being a current risk for IPFS. IPFS states clearly in their GitHub they are using SHA-256, in other words vastly more complicated than SHA-1 or SHA-3. I agree there is an end to the security for SHA-256, but it's not arriving now and it can be planned for. As long as we can consider an encryption methodology to be secure, we can maintain our records as IPFS does; when it starts to weaken we will have the option to migrate to something more secure. This is an unavoidable cat-and-mouse game but doesn't seem as such to be unworkable and also not an IPFS-only problem.

David. said...

14Volt, it is true that the owners of files can migrate them to a better hash when SHA-256 gets close to being broken.

But doing so doesn't fix the problem caused by naming files by their hash. The world will have been populated with SHA-256-based links to the file's content. After the file is migrated, these links will either be broken, or they may point to a hash collision of the original content. There is no way to update these SHA-256 links to point to the new, SHA-1024-based name of the content.

This is a fundamental problem for ANY system that uses hashes as the public name for content.