Tuesday, October 7, 2014

Economies of Scale in Peer-to-Peer Networks

In a recent IEEE Spectrum article entitled Escape From the Data Center: The Promise of Peer-to-Peer Cloud Computing, Ozalp Babaoglu and Moreno Marzolla (BM) wax enthusiastic about the potential for Peer-to-Peer (P2P) technology to eliminate the need for massive data centers. Even more exuberance can be found in Natasha Lomas' Techcrunch piece The Server Needs To Die To Save The Internet (LM) about the MaidSafe P2P storage network. I've been working on P2P technology for more than 16 years, and although I believe it can be very useful in some specific cases, I'm far less enthusiastic about its potential to take over the Internet.

Below the fold I look at some of the fundamental problems standing in the way of a P2P revolution, and in particular at the issue of economies of scale. After all, I've just written a post about the huge economies that Facebook's cold storage technology achieves by operating at data center scale.

Economies of Scale

Back in April, discussing a vulnerability of the Bitcoin network, I commented:
Gradually, the economies of scale you need to make money mining Bitcoin are concentrating mining power in fewer and fewer hands. I believe this centralizing tendency is a fundamental problem for all incentive-compatible P2P networks. ... After all, the decentralized, distributed nature of Bitcoin was supposed to be its most attractive feature.
In June, discussing Permacoin, I returned to the issue of economies of scale:
increasing returns to scale (economies of scale) pose a fundamental problem for peer-to-peer networks that do gain significant participation. One necessary design goal for networks such as Bitcoin is that the protocol be incentive-compatible, or as Ittay Eyal and Emin Gun Sirer (ES) express it:
the best strategy of a rational minority pool is to be honest, and a minority of colluding miners cannot earn disproportionate benefits by deviating from the protocol
They show that the Bitcoin protocol was, and still is, not incentive-compatible.

Even if the protocol were incentive-compatible, the implementation of each miner would, like almost all technologies, be subject to increasing returns to scale.
Since then I've become convinced that this problem is indeed fundamental. The simplistic version of the problem is this:
  • The income to a participant in a P2P network of this kind should be linear in their contribution of resources to the network.
  • The costs a participant incurs by contributing resources to the network will be less than linear in their resource contribution, because of the economies of scale.
  • Thus the proportional profit margin a participant obtains will increase with increasing resource contribution.
  • Thus the effects described in Brian Arthur's Increasing Returns and Path Dependence in the Economy will apply, and the network will be dominated by a few, perhaps just one, large participant.
The advantages of P2P networks arise from a diverse network of small, roughly equal resource contributors. Thus it seems that P2P networks which have the characteristics needed to succeed (by being widely adopted) also inevitably carry the seeds of their own failure (by becoming effectively centralized). Bitcoin is an example of this. Some questions arise:
  • Does incentive-compatibility imply income linear in contribution?
  • If not, are there incentive-compatible ways to deter large contributions?
  • The simplistic version is, in effect, a static view of the network. Are there dynamic effects also in play?
Does incentive-compatibility imply income linear in contribution? Clearly, the reverse is true. If income is linear in, and solely dependent upon, contribution there is no way for a colluding minority of participants to gain more than their just reward. If, however:
  • Income grows faster than linearly with contribution, a group of participants can pool their contributions, pretend to be a single participant, and gain more than their just reward.
  • Income goes more slowly than linearly with contribution, a group of participants that colluded to appear as a single participant would gain less than their just reward.
So it appears that income linear in contribution is the limiting case, anything faster is not incentive-compatible.

Are there incentive-compatible ways to deter large contributions? In principle, the answer is yes. Arranging that income grows more slowly than contribution, and depends on nothing else, will do the trick. The problem lies in doing so.

Source: bitcoincharts.com
The actual income received by a participant is the value of the reward the network provides in return for the contribution of resources, for example the Bitcoin, less the costs incurred in contributing the resources, the capital and running costs of the mining hardware, in the Bitcoin case. As the value of Bitcoins collapsed (as I write, BTC is about $320, down from about $1200 11 months ago and half its value in August) many smaller miners discovered that mining wasn't worth the candle.

The network has to arrange not just that the reward grows more slowly than the contribution, but that it grows more slowly than the cost of the contribution to any participant. If there is even one participant whose rewards outpace their costs, Brian Arthur's analysis shows they will end up dominating the network. Herein lies the rub. The network does not know what an individual participant's costs, or even  the average participant's costs, are and how they grow as the participant scales up their contribution.

So the network would have to err on the safe side, and make rewards grow very slowly with contribution, at least above a certain minimum size. Doing so would mean few if any participants above the minimum contribution, making growth dependent entirely on recruiting new participants. This would be hard because their gains from participation would be limited to the minimum reward. It is clear that mass participation in the Bitcoin network was fueled by the (unsustainable) prospect of large gains for a small investment.

Source: blockchain.info
A network that assured incentive-compatibility in this way would not succeed, because the incentives would be so limited. A network that allowed sufficient incentives to motivate mass participation, as Bitcoin did, would share Bitcoin's vulnerability to domination by, as at present, two participants (pools, in Bitcoin's case).

Are there dynamic effects also in play? As well as increasing returns to scale, technology markets exhibit decreasing returns through time. Bitcoin is an extreme example of this. Investment in Bitcoin mining hardware has a very short productive life:
the overall network hash rate has been doubling every 3-4 weeks, and therefore, mining equipment has been losing half its production capability within the same time frame. After 21-28 weeks (7 halvings), mining rigs lose 99.3% of their value.
This effect is so strong that it poses temptations for the hardware manufacturers that some have found impossible to resist. The FBI recently caught Butterfly Labs using hardware that customers had bought and paid for to mine on their own behalf for a while before shipping it to the customers. They thus captured the most valuable week or so of the hardware's short useful life for themselves

Source: blockchain.info
Even though with technology improvement rates much lower than the Bitcoin network hash rate increase, such as Moore's Law or Kryder's Law, the useful life of hardware is much longer than 6 months, this effect can be significant. When new, more efficient technology is introduced, thus reducing the cost per unit contribution to a P2P network, it does not become instantly available to all participants. As manufacturing ramps up, the limited supply preferentially goes to the manufacturers best customers, who would be the largest contributors to the P2P network. By the time supply has increased so that smaller contributors can enjoy the lower cost per unit contribution, the most valuable part of the technology's useful life is over.

Early availability of new technology acts to reduce the costs of the larger participants, amplifying their economies of scale. This effect must be very significant in Bitcoin mining, as Butterfly Labs noticed. At pre-2010 Kryder rates it would be quite noticeable since storage media service lives were less than 60 months. At the much lower Kryder rates projected by the industry storage media lifetimes will be extended and the effect correspondingly less.

Trust

BM admit that there are significant unresolved trust issues in P2P technology:
The people using such a cloud must trust that none of the many strangers operating it will do something malicious. And the providers of equipment must trust that the users won’t hog computer time.

These are formidable problems, which so far do not have general solutions. If you just want to store data in a P2P cloud, though, things get easier: The system merely has to break up the data, encrypt it, and store it in many places.
Unfortunately, even for storage this is inadequate. The system cannot trust the peers claiming to store the shards of the encrypted data but must verify that they actually are storing them. This is a resource-intensive process. Permacoin's proposal, to re-purpose resources already being expended elsewhere, is elegant but unlikely to be successful. Worse, the verification process consumes not just resources, but time. At each peer there is necessarily a window of time between successive verifications. During that time the system believes the peer has a good copy of the shard, but it might no longer have one.

Edge of the internet

P2P enthusiasts describe the hardware from which their network is constructed in similar terms. Here is BM:
the P2P cloud is made up of a diverse collection of different people’s computers or game consoles or whatever
and here is LM:
Users of MaidSafe’s network contribute unused hard drive space, becoming the network’s nodes. It’s that pooling — or, hey, crowdsourcing — of many users’ spare computing resource that yields a connected storage layer that doesn’t need to centralize around dedicated datacenters.
When the idea of P2P networks started in the 90s:
Their model of the edge of the Internet was that there were a lot of desktop computers, continuously connected and powered-up, with low latency and no bandwidth charges, and with 3.5" hard disks that were mostly empty. Since then, the proportion of the edge with these characteristics has become vanishingly small.
The edge is now intermittently powered up and connected, with bandwidth charges, and only small amounts of local storage.

Monetary rewards

This means that, if the network is to gain mass participation, the majority of participants cannot contribute significant resources to it; they don't have suitable resources to contribute. They will have to contribute cash. This in turn means that there must be exchanges, converting between the rewards for contributing resources and cash, allowing the mass of resource-poor participants to buy from the few resource-rich participants.

Both Permacoin and MaidSafe envisage such exchanges, but what they don't seem to envisage is the effect on customers of the kind of volatility seen in the Bitcoin graph above. Would you buy storage from a service with this price history, or from Amazon? What exactly is the value to the mass customer of paying a service such as MaidSafe, by buying SafeCoin on an exchange, instead of paying Amazon directly, that would overcome the disadvantage of the price volatility?

As we see with Bitcoin, a network whose rewards can readily be converted into cash is subject to intense attack, and attracts participants ranging from sleazy to criminal. Despite its admirably elegant architecture, Bitcoin has suffered from repeated vulnerabilities. Although P2P technology has many advantages in resisting attack, especially the elimination of single points of failure and centralized command and control, it introduces a different set of attack vectors.

Measuring contributions

Discussion of P2P storage networks tends to assume that measuring the contribution a participant supplies in return for a reward is easy. A Gigabyte is a Gigabyte after all. But compare two Petabytes of completely reliable and continuously available storage, one connected to the outside world by a fiber connection to a router near the Internet's core, and the other connected via 3G. Clearly, the first has higher bandwidth, higher availability and lower cost per byte transferred, so its contribution to serving the network's customers is vastly higher. It needs a correspondingly greater reward.

In fact, networks would need to reward many characteristics of a peer's storage contribution as well as its size:
  • Reliability
  • Availability
  • Bandwidth
  • Latency
Measuring each of these parameters, and establishing "exchange rates" between them, would be complex, would lead to a very mixed marketing message, and would be the subject of disputes. For example, the availability, bandwidth and latency of a network resource depends on the location in the network from which the resource is viewed, so there would be no consensus among the peers about these parameters.

Conclusion

While it is clear that P2P storage networks can work, and can even be useful tools for small communities of committed users, the non-technical barriers to widespread adoption are formidable. They have been effective in preventing widespread adoption since the late 90s, and the evolution of the Internet has since raised additional barriers.

5 comments:

David. said...

Steve Randy Waldman has an interesting post Econometrics, open science, and cryptocurrency arguing that the infrastructure for research should be a P2P network with a cryptocurrency-like consensus system.

I have a lot of sympathy for his vision of a shared, preserved research infrastructure: "Ultimately, we should want to generate a reusable, distributed, permanent, and ever-expanding web of science, including conjectures, verifications, modifications, and refutations, and reanalyses as new data arrives. Social science should become a reified public commons. It should be possible to build new analyses from any stage of old work, by recruiting raw data into new projects, by running alternative models on already cleaned-up or normalized data tables, by using an old model's estimates to generate inputs to simulations or new analyses."

But, alas, for the reasons set out above, it will be very difficult to implement this using P2P cryptocurrency techniques.

There will be significant real costs associated with running nodes in such a network. To motivate participation, there has to be some way to defray these costs, so there has to be an exchange between the cryptocurrency and currency real enough to pay electricity bills and purchase hardware. "From each according to his ability, to each according to his need" isn't going to cut it on a scale big enough to matter in research.

David. said...

My comment on Waldman's post is here.

David. said...

On the general topic of digital currencies, the FT has an excellent corrective to misplaced enthusiasm, Izabella Kaminska's From the annals of disruptive digital currencies past.

David. said...

One major problem with the centralization that is driven by economies of scale is that it reduces the resilience of the network to failures. A "data center" that was a major contributor to the overall Bitcoin hash rate was just destroyed by fire.

David. said...

I apologize for the mistake I made writing this post. I linked to, rather than captured, the pool size graph. blockchain.info has moved the graph, it is now here.

As I write four pools, F2Pool, AntPool, GHash.io and BTCChinaPool control 54% of the mining power. This is better than one or two pools, But, as Ittay Eyal points out in an important post:

"The dismantling of overly large pools is one of the most important and difficult tasks facing the Bitcoin community."

Pools are needed to generate consistent income but:

"[Miners] can get steady income from pools well below 10%, and they have only little incentive to use very large pools; it's mostly convenience and a feeling of trust in large entities."

Right now, at least 46% of the mining power is in pools larger than 10%.