Tuesday, July 11, 2017

Is Decentralized Storage Sustainable?

There are many reasons to dislike centralized storage services. They include business risk, as we see in le petit musée des projets Google abandonnés, monoculture vulnerability and rent extraction. There is thus naturally a lot of enthusiasm for decentralized storage systems, such as MaidSafe, DAT and IPFS. In 2013 I wrote about one of their advantages in Moving vs. Copying. Among the enthusiasts is Lambert Heller. Since I posted Blockchain as the Infrastructure for Science, Heller and I have been talking past each other. Heller is talking technology; I have some problems with the technology but they aren't that important. My main problem is an economic one that applies to decentralized storage irrespective of the details of the technology.

Below the fold is an attempt to clarify my argument. It is a re-statement of part of the argument in my 2014 post Economies of Scale in Peer-to-Peer Networks, specifically in the context of decentralized storage networks.

To make my argument I use a model of decentralized storage that abstracts away the details of the technology. The goal is a network with a large number of peers each providing storage services. This network is:
  • decentralized in the sense that no single entity, or small group of entities, controls the network (the peers are independently owned and operated), and
  • sustainable, in that the peers do not lose financially by providing storage services to the network.
I argue that this network is economically unstable and will, over time, become centralized. This argument is based on work from the 80s by the economist W. Brian Arthur [1].

Let us start by supposing that such a decentralized storage network has, by magic, been created:
  • It consists of a large number of peers, initially all providing the same amount of storage resource to the network.
  • Users submit data to be stored to the network, not to individual peers. The network uses erasure coding to divide the data into shards and peers store shards.
  • Each peer incurs costs to supply this resource, in the form of hardware, bandwidth, power, cooling, space and staff time.
  • The network has no central organization which could contract with the peers to supply their resource. Instead, it rewards the peers in proportion to the resource they supply by a token, such as a crypto-currency, that the peers can convert into cash to cover their costs.
  • The users of the network rent space in the network by buying tokens for cash on an exchange, setting a market price at which peers can sell their tokens for cash. This market price sets the $/TB/month rent that users must pay, and that peers receive as income. It also ensures that users do not know which peers store their data.
Although the income each peer receives per unit of storage is the same, as set by the market, their costs differ. One might be in Silicon Valley, where space, power and staff time are expensive. Another might be in China, where all these inputs are cheap. So providing resources to the network is more profitable in China than in Silicon Valley.
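The point can be made concrete with a minimal sketch. All the dollar figures below are hypothetical, chosen only for illustration; the one structural assumption taken from the model is that every peer receives the same market-set rent per TB while its costs depend on where it operates.

```python
# All figures are hypothetical, chosen only to illustrate the argument.
RENT_PER_TB_MONTH = 5.00  # market-set income, identical for every peer

COST_PER_TB_MONTH = {
    "Silicon Valley": 4.50,  # expensive space, power and staff time
    "China": 2.50,           # the same inputs, but cheaper
}


def margin_per_tb(region: str, rent: float = RENT_PER_TB_MONTH) -> float:
    """Monthly profit per TB for a peer in the given region."""
    return rent - COST_PER_TB_MONTH[region]


for region in COST_PER_TB_MONTH:
    print(f"{region}: ${margin_per_tb(region):.2f}/TB/month")
```

Calling margin_per_tb("Silicon Valley", rent=4.00) with these made-up numbers gives a negative margin while the China peer is still profitable, which is the situation the rest of the argument turns on.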

Suppose the demand for storage is increasing. That demand will preferentially be supplied from China, where the capital invested in adding capacity can earn a greater reward. Thus peers in China will add capacity faster than those in Silicon Valley and will enjoy not merely a lower cost base because of location, but also a lower cost base from economies of scale. This will increase the cost differential driving the peers to China, and create a feedback process.

Competition among the peers and decreasing hardware costs will drive down the $/TB/month rent to levels that are uneconomic for Silicon Valley peers, concentrating the storage resource in China (as we see with Bitcoin miners).

Let's assume that all the peers in China share the same low cost base. But some will have responded to the increase in demand before others. They will have better economies of scale than the laggards, so they will in turn grow at the laggards' expense. Growth may be by increasing the capacity of existing peers, or adding peers controlled by the entity with the economies of scale.

The result of this process is a network in which the aggregate storage resource is overwhelmingly controlled by a small number of entities, controlling large numbers of large peers in China. These are the ones which started with a cost base advantage and moved quickly to respond to demand. The network is no longer decentralized, and will suffer from the problems of centralized storage outlined above.
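The feedback process can be sketched as a toy simulation. This is not Arthur's model, just a rough illustration under assumptions of my own: unit cost falls as a power of capacity (scale_exponent), competition sets the rent at the capacity-weighted average cost, peers grow or shrink in proportion to their margin (reinvest_rate), and total demand is held fixed so that only shares of the network are tracked. The starting costs and capacities are hypothetical.

```python
def unit_cost(base_cost, capacity, scale_exponent=0.2):
    """Cost per TB; falls as a peer's capacity grows (economies of scale)."""
    return base_cost * capacity ** (-scale_exponent)


def simulate(peers, rounds=200, reinvest_rate=0.5, scale_exponent=0.2):
    """Return each peer's final share of total capacity.

    peers: list of (base_cost, initial_capacity) tuples.
    """
    base_costs = [b for b, _ in peers]
    capacities = [c for _, c in peers]
    for _ in range(rounds):
        costs = [unit_cost(b, c, scale_exponent)
                 for b, c in zip(base_costs, capacities)]
        total = sum(capacities)
        # Assumption: competition pushes the rent down to the
        # capacity-weighted average cost across the network.
        rent = sum(k * c for k, c in zip(costs, capacities)) / total
        updated = []
        for k, c in zip(costs, capacities):
            # Relative margin, clamped so a peer shrinks but never hits zero.
            margin = max(-1.0, min(1.0, (rent - k) / rent))
            updated.append(c * (1.0 + reinvest_rate * margin))
        capacities = updated
    total = sum(capacities)
    return [c / total for c in capacities]


# Hypothetical starting point: four high-cost peers and four low-cost peers,
# two of which responded to demand slightly earlier than the others.
PEERS = [(4.5, 1.0)] * 4 + [(2.5, 1.0), (2.5, 1.0), (2.5, 1.1), (2.5, 1.2)]

shares = simulate(PEERS)
for (base_cost, start), share in zip(PEERS, shares):
    print(f"base cost {base_cost}, initial capacity {start}: {share:.1%} of network")
```

With these made-up parameters the high-cost peers are squeezed out first, and then the low-cost peer that started slightly ahead ends up with nearly all the capacity. The exact numbers mean nothing; the point is that a small early lead compounds once unit cost depends on scale.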

This should not be a surprise. We see the same winner-take-all behavior in most technology markets. We see this behavior in the Bitcoin network.

I believe it is up to the enthusiasts to explain why this model does not apply to their favorite decentralized storage technology, and thus why it won't become centralized. Or, alternatively, why they aren't worried that their decentralized storage network isn't actually decentralized after all.

References:

  1. Arthur, W. Brian. Competing technologies and lock-in by historical small events: the dynamics of allocation under increasing returns. Center for Economic Policy Research, Stanford University, 1985. Reprinted in Arthur, W. Brian. Increasing Returns and Path Dependence in the Economy. University of Michigan Press, 1994.

5 comments:

David. said...

Abraham Othman's post Smart contracts will need human juries on the FT's Alphaville blog is a very interesting take on Ethereum's "smart contracts" and on a technique for adjudicating disputes in this environment.

David. said...

I'm working on a long-ish post, which will take some time, based on two important recent essays by pioneers in the field: Nick Szabo's Money, blockchains, and social scalability and Vitalik Buterin's The Meaning of Decentralization. Both are must-reads.

David. said...

Paul Frazee has a thoughtful response to this post. I'll work it into the post I mentioned above.

Jarechiga said...

It could work if location is another one of the supply and demand factors.

I may be willing to pay more for decentralized storage if it is backed up in four different regions.
If enough demand is built for markets outside of China, then it will drive the prices up for those locations, making up for the economic difference and incentivizing a multi-region supply.

As in real estate: "location, location, location"

David. said...

Jarechiga, you need to read the post, where I say:

"Users submit data to be stored to the network, not to individual peers. The network uses erasure coding to divide the data into shards and peers store shards."

If the storage is truly decentralized, you don't know where the shards are stored. You can pay for more redundancy, but you don't get to decide the locations of the shards. The network does that. If you lock shards to peers, when those peers go away so does your redundancy.

Your model is more properly called distributed, not decentralized. And it is in effect little different from what you can buy from Amazon today, because you can specify the Amazon region(s). And it is definitely subject to increasing returns to scale, which is why Amazon dominates the market.