Distributing and decentralizing the scholarly communications system is achievable with peer-to-peer (p2p) Internet protocols such as dat and ipfs. Simply put, such p2p networks securely send information across a network of peers but are resilient to nodes being removed or adjusted because they operate in a mesh network. For example, if 20 peers have file X, removing one peer does not affect the availability of file X. Only if all 20 are removed from the network will file X become unavailable. Vice versa, the more peers on the network that have file X, the less likely it is that file X will become unavailable. As such, this would include unlimited redistribution in the scholarly communication system by default, instead of limited redistribution due to copyright as it is now.

I first expressed skepticism about this idea three years ago, discussing a paper proposing a P2P storage infrastructure called Permacoin. It hasn't taken over the world. [Update: my fellow Sun Microsystems alum Radia Perlman has a broader skeptical look at blockchain technology. I've appended some details.]
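The resilience claim in the quoted passage is just a replication-probability argument: if peers drop out independently, a file replicated on n peers becomes unavailable only when all n replicas are gone. A minimal sketch (the per-peer failure probability is an assumption for illustration, not a measured number):

```python
def p_unavailable(n_peers: int, p_fail: float) -> float:
    """Probability that every replica of a file is gone,
    assuming each peer fails independently with probability p_fail."""
    return p_fail ** n_peers

# Even if each of 20 peers has a 50% chance of leaving the network,
# the file is lost only if all 20 leave at once.
print(p_unavailable(1, 0.5))    # 0.5
print(p_unavailable(20, 0.5))   # ~9.5e-07
```

The catch, as the rest of this post argues, is that this arithmetic assumes the replicas keep existing; without a business model paying for storage and bandwidth, n tends toward zero.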
I understand the theoretical advantages of peer-to-peer (P2P) technology. But after nearly two decades researching, designing, building, deploying and operating P2P systems I have learned a lot about how hard it is for these theoretical advantages actually to be obtained at scale, in the real world, for the long term. Below the fold, I try to apply these lessons.
For the purpose of this post I will stipulate that the implementations of both the P2P technology and the operating system on which it runs are flawless, and their design contains no vulnerabilities that the bad guys can exploit. Of course, in the real world there will be flaws and vulnerabilities, but discussing their effects on the system would distract from the message of this post.
Heller's three hypotheses are based on the idea of using a P2P storage infrastructure such as IPFS that names objects by their hash:
- It would be better for researchers to allocate persistent object names than for digital archives to do so. There are a number of problems with this hypothesis. First, it doesn't describe the current situation accurately. Archives such as the Wayback Machine or LOCKSS try hard not to assign names to content they preserve, striving to ensure that it remains accessible via its originally assigned URL, DOI or metadata (such as OpenURL). Second, the names Heller suggests are not assigned by researchers; they are hashes computed from the content. Third, hashes are not persistent over the timescales needed because, as technology improves over time, it becomes possible to create "hash collisions", as we have seen recently with SHA1.
- From name allocation plus archiving plus x as a “package solution” to an open market of modular services. Heller is correct to point out that:
The mere allocation of a persistent name does not ensure the long-term accessibility of objects. This is also the case for a P2P file system such as IPFS. ... Since name allocation using IPFS or a blockchain is not necessarily linked to the guarantee of permanent availability, the latter must be offered as a separate service.

The upside of using hashes as names would be that the existence and location of the archive would be invisible. The downside of using hashes as names is that the archive would be invisible, posing insurmountable business model difficulties for those trying to offer archiving services, and insurmountable management problems for those such as the Keeper's Registry who try to ensure that the objects that should be preserved actually are being preserved. There can't be a viable market in archiving services if the market participants and their products are indistinguishable and accessible freely to all. Especially not if the objects in question are academic papers, which are copyright works.
- It is possible to make large volumes of data scientifically usable more easily without APIs and central hosts. In an ideal world in which both storage and bandwidth were infinite and free, storing all the world's scientific data in an IPFS-like P2P service backed up by multiple independent archive services would indeed make the data vastly more accessible, useful and persistent than it is now. But we don't live in an ideal world. If this P2P network is to be sustainable for the long term, the peers in the network need a viable business model, to pay for both storage and bandwidth. But they can't charge for access to the data, since that would destroy its usability. They can't charge the researchers for storing their data, since it is generated by research that is funded by term-limited grants. Especially in the current financial environment, they can't charge the researchers' institutions, because they have more immediate funding priorities than allowing other institutions' researchers to access the data in the future for free.
- They would populate the Web with links to objects that, while initially unique, would over time become non-unique. That is, it would become possible for objects to be corrupted. When the links become vulnerable, they need to be replaced with better hashes, but there is no mechanism for doing so. This is not a theoretical concern: the BitTorrent protocol underlying IPFS has been shown to be vulnerable to SHA1 collisions.
- The market envisaged, at least for archiving services, does not allow for viable business models, in that the market participants are indistinguishable.
- Unlike Bitcoin, there is no mechanism for rewarding peers for providing services to the network.
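The "hashes as names" scheme the bullets above object to can be sketched in a few lines. A name is a digest of the content itself, so no one allocates it, and any change to the content yields an entirely different name (real IPFS wraps the digest in a multihash/CID; a bare SHA-256 hex digest stands in here, and the sample byte strings are invented for illustration):

```python
import hashlib

def content_name(data: bytes) -> str:
    """Derive a name from the content itself, IPFS-style:
    the name is a digest, not an assigned identifier."""
    return hashlib.sha256(data).hexdigest()

paper_v1 = b"Results: effect significant at p < 0.05"
paper_v2 = b"Results: effect significant at p < 0.06"

# No researcher or archive "allocates" these names; they follow
# from the bytes. Any edit, however small, produces a different
# name, so a corrected object gets a completely new identity,
# and every existing link still points at the old bytes.
print(content_name(paper_v1))
print(content_name(paper_v1) == content_name(paper_v2))  # False
```

This also makes concrete why a broken hash function is fatal to the scheme: once collisions are practical, two different objects can share a name, and there is no authority in the system that can reassign the name to the "right" one.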
There is hope that we will see more innovative, reliable and reproducible services in the future, also provided by less privileged players; services that may turn out to be beneficial and inspirational to actors in the scientific community.

I don't agree, especially about "provided by less privileged players". Leave aside that the privileged players in the current system have proven very adept at countering efforts to invade their space, for example by buying up the invaders. There is a much more fundamental problem facing P2P systems.
Four months after the Permacoin post, inspired in part by Natasha Lomas' Techcrunch piece The Server Needs To Die To Save The Internet about the MaidSafe P2P storage network, I wrote Economies of Scale in Peer-to-Peer Networks. This is a detailed explanation of how the increasing returns to scale inherent to technologies in general (and networked systems in particular) affect P2P systems, making it inevitable that they will gradually lose their decentralized nature and the benefits that it provides, such as resistance to some important forms of attack.
As I write, about 100MB of transactions are waiting to be confirmed. A week and a half ago, Izabella Kaminska reported that there were over 200,000 transactions in the queue. At around 5 transactions/sec, that's around an 11-hour backlog. Right now, the number is about half that. How much less likely are resources to become available to satisfy demand if the peers lack a viable business model?
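The backlog arithmetic above is simple queueing: transactions waiting divided by the confirmation rate. A one-line sketch, using the figures quoted in the text:

```python
def backlog_hours(queued_tx: int, tx_per_sec: float) -> float:
    """Hours needed to clear a transaction queue at a fixed
    confirmation rate, assuming no new transactions arrive."""
    return queued_tx / tx_per_sec / 3600

# 200,000 queued transactions at ~5 tx/sec:
print(backlog_hours(200_000, 5))  # ~11.1 hours
```

Note the optimistic assumption in the comment: the real queue keeps growing while the old transactions drain, so the lived backlog is worse than this lower bound.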
Because Bitcoin has a lot of peers and speculation has driven its value sky-high, it is easy to assume that it is a successful technology. Clearly, it is very successful along some axes. Along others, not so much. For example, Kaminska writes:
The views of one trader:
... This is the biggest problem with bitcoin, it’s not just that it’s expensive to transact, it’s uncertain to transact. It’s hard to know if you’ve put enough of a fee. So if you significantly over pay to get in, even then it’s not guaranteed. There are a lot of people who don’t know how to set their fees, and it takes hours to confirm transactions. It’s a bad system and no one has any solutions.

Transactions which fail to get the attention of miners sit in limbo until they drop out. But the suspended state leaves payers entirely helpless. They can’t risk resending the transaction, in case the original one does clear eventually. They can’t recall the original one either. Our source says he’s had a significant sized transaction waiting to be settled for two weeks.
The heart of the problem is game theoretical. Users may not know it but they’re participating in what amounts to a continuous blind auction.
Legacy fees can provide clues to what fees will get your transactions done — and websites are popping up which attempt to offer clarity on that front — but there’s no guarantee that the state of the last block is equivalent to the next one.
given bitcoin’s decentralised and real-time settlement obsession, ... how the market structure has evolved to minimise the cost of transaction.

There's no guarantee that the axes on which Bitcoin succeeded are those relevant to other blockchain uses; the ones on which it is failing may well be. Among the blockchain's most hyped attributes were the lack of a need for trust, and the lack of a single point of failure. Another of Kaminska's posts:
Traders, dealers, wallet and bitcoin payments services get around transaction settlement choke points and fees by netting transactions off-blockchain.
This over time has created a situation where the majority of small-scale payments are not processed on the bitcoin blockchain at all. To the contrary, intermediaries operate for the most part as trusted third parties settling netted sums as and when it becomes cost effective to do so. ... All of which proves bitcoin is anything but a cheap or competitive system. With great irony, it is turning into a premium service only cost effective for those who can’t — for some reason, ahem — use the official system.
Coinbase has been intermittently down for at least two days.

These problems illustrate the difficulty of actually providing the theoretical advantages of a P2P technology "at scale, in the real world, for the long term".
With an unprecedented amount of leverage in the bitcoin and altcoin market, a runaway rally that doesn’t seem to know when to stop, the biggest exchange still not facilitating dollar withdrawals and incremental reports about other exchanges encountering service disruption, it could just be there’s more to this than first meets the eye.
(Remember from 2008 how liquidity issues tend to cause a spike in the currency that’s in hot demand?)
Update: In Blockchain: Hype or Hope? Radia Perlman provides a succinct overview of blockchain technology, asks what is novel about it, and argues that the only feature of the blockchain that cannot be provided at much lower cost by preexisting technology is:
a ledger agreed upon by consensus of thousands of anonymous entities, none of which can be held responsible or be shut down by some malevolent government

But, as she points out:
most applications would not require or even want this property. And, as demonstrated by the Bitcoin community's reaction to forks, there really are a few people in charge who can control the system

She doesn't point out that, in order to make money, the "thousands of ... entities" are forced to cooperate in pools, so that in practice the system isn't very decentralized, and the "anonymous entities" are much less anonymous than they would like to believe (see here and here).
Radia's article is a must-read corrective to the blockchain hype. Alas, although I have it in my print copy of Usenix ;login:, it doesn't appear to be on the Usenix website yet, and even when it is it will only be available to members for a year. I've made a note to post about it again when it is available.