Tuesday, July 11, 2017

Is Decentralized Storage Sustainable?

There are many reasons to dislike centralized storage services. They include business risk, as we see in le petit musée des projets Google abandonnés, monoculture vulnerability and rent extraction. There is thus naturally a lot of enthusiasm for decentralized storage systems, such as MaidSafe, DAT and IPFS. In 2013 I wrote about one of their advantages in Moving vs. Copying. Among the enthusiasts is Lambert Heller. Since I posted Blockchain as the Infrastructure for Science, Heller and I have been talking past each other. Heller is talking technology; I have some problems with the technology but they aren't that important. My main problem is an economic one that applies to decentralized storage irrespective of the details of the technology.

Below the fold is an attempt to clarify my argument. It is a re-statement of part of the argument in my 2014 post Economies of Scale in Peer-to-Peer Networks, specifically in the context of decentralized storage networks.

To make my argument I use a model of decentralized storage that abstracts away the details of the technology. The goal is a network with a large number of peers each providing storage services. This network is:
  • decentralized in the sense that no single entity, or small group of entities, controls the network (the peers are independently owned and operated), and
  • sustainable, in that the peers do not lose financially by providing storage services to the network.
I argue that this network is economically unstable and will, over time, become centralized. This argument is based on work from the 1980s by the economist W. Brian Arthur [1].

Let us start by supposing that such a decentralized storage network has, by magic, been created:
  • It consists of a large number of peers, initially all providing the same amount of storage resource to the network.
  • Users submit data to be stored to the network, not to individual peers. The network uses erasure coding to divide the data into shards and peers store shards.
  • Each peer incurs costs to supply this resource, in the form of hardware, bandwidth, power, cooling, space and staff time.
  • The network has no central organization which could contract with the peers to supply their resource. Instead, it rewards the peers in proportion to the resource they supply by a token, such as a crypto-currency, that the peers can convert into cash to cover their costs.
  • The users of the network rent space in the network by buying tokens for cash on an exchange, setting a market price at which peers can sell their tokens for cash. This market price sets the $/TB/month rent that users must pay, and that peers receive as income. It also ensures that users do not know which peers store their data.
Although the income each peer receives per unit of storage is the same, as set by the market, their costs differ. One might be in Silicon Valley, where space, power and staff time are expensive. Another might be in China, where all these inputs are cheap. So providing resources to the network is more profitable in China than in Silicon Valley.
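The arithmetic behind this point can be sketched in a few lines. The rent and cost figures below are hypothetical, chosen only to illustrate the asymmetry; they are not measurements of any real network or location.

```python
def monthly_profit(capacity_tb, rent_per_tb, cost_per_tb):
    """Profit per month for a peer supplying capacity_tb of storage.

    Every peer receives the same market rent per TB; only the cost differs.
    """
    return capacity_tb * (rent_per_tb - cost_per_tb)


# Hypothetical numbers: identical capacity and rent, different cost bases.
RENT = 10.0            # $/TB/month, set by the token market (assumed)
HIGH_COST_PEER = 9.0   # $/TB/month, e.g. expensive space, power, staff (assumed)
LOW_COST_PEER = 4.0    # $/TB/month, e.g. cheap inputs (assumed)

print(monthly_profit(100, RENT, HIGH_COST_PEER))
print(monthly_profit(100, RENT, LOW_COST_PEER))
```

With these assumed figures the low-cost peer earns several times the profit of the high-cost peer on the same capacity, which is the incentive the rest of the argument builds on.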

Suppose the demand for storage is increasing. That demand will preferentially be supplied from China, where the capital invested in adding capacity can earn a greater reward. Thus peers in China will add capacity faster than those in Silicon Valley and will enjoy not merely a lower cost base because of location, but also a lower cost base from economies of scale. This will widen the cost differential that is driving the peers to China, creating a feedback process.

Competition among the peers and decreasing hardware costs will drive down the $/TB/month rent to levels that are uneconomic for Silicon Valley peers, concentrating the storage resource in China (as we see with Bitcoin miners).

Let's assume that all the peers in China share the same low cost base. But some will have responded to the increase in demand before others. They will have better economies of scale than the laggards, so they will in turn grow at the laggards' expense. Growth may be by increasing the capacity of existing peers, or adding peers controlled by the entity with the economies of scale.

The result of this process is a network in which the aggregate storage resource is overwhelmingly controlled by a small number of entities, controlling large numbers of large peers in China. These are the ones which started with a cost base advantage and moved quickly to respond to demand. The network is no longer decentralized, and will suffer from the problems of centralized storage outlined above.
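The feedback process described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model taken from Arthur's paper: the function name, the power-law form of the economies of scale, the rule that new demand is allocated in proportion to margin times existing capacity, and every numeric parameter are hypothetical choices made for illustration.

```python
def simulate(base_costs, capacities, rounds=30, rent=10.0,
             rent_decline=0.03, growth=0.10, scale_exponent=0.1):
    """Return each peer's final share of total capacity.

    Assumptions (all hypothetical):
      - unit cost falls with capacity: base_cost * (capacity / 100) ** -scale_exponent
      - each round, demand grows by `growth` of total capacity
      - new capacity goes to peers in proportion to (margin * capacity),
        i.e. profitable peers reinvest and larger peers can add more
      - the market rent falls by `rent_decline` per round
    """
    capacity = list(capacities)
    for _ in range(rounds):
        unit_cost = [c * (cap / 100.0) ** -scale_exponent
                     for c, cap in zip(base_costs, capacity)]
        # Peers that cannot cover their costs stop adding capacity.
        margin = [max(rent - uc, 0.0) for uc in unit_cost]
        weights = [m * cap for m, cap in zip(margin, capacity)]
        total_weight = sum(weights)
        new_demand = growth * sum(capacity)
        if total_weight > 0:
            capacity = [cap + new_demand * w / total_weight
                        for cap, w in zip(capacity, weights)]
        rent *= 1.0 - rent_decline
    total = sum(capacity)
    return [cap / total for cap in capacity]


# Two high-cost and two low-cost peers, all starting with equal capacity.
by_location = simulate([8.0, 8.0, 4.0, 4.0], [100.0] * 4)
print("share held by low-cost peers:", by_location[2] + by_location[3])

# Same low cost base, but one peer responded to demand earlier.
by_timing = simulate([4.0, 4.0], [200.0, 100.0])
print("share held by early mover:", by_timing[0])
```

In this sketch the low-cost peers end up with most of the capacity even though everyone starts equal, and among peers with the same cost base the early mover's share grows beyond its initial two thirds. Different parameter choices change how fast this happens, but not the direction, as long as returns to scale are increasing.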

This should not be a surprise. We see the same winner-take-all behavior in most technology markets. We see this behavior in the Bitcoin network.

I believe it is up to the enthusiasts to explain why this model does not apply to their favorite decentralized storage technology, and thus why it won't become centralized. Or, alternatively, why they aren't worried that their decentralized storage network isn't actually decentralized after all.


  1. Arthur, W. Brian. Competing technologies and lock-in by historical small events: the dynamics of allocation under increasing returns. Center for Economic Policy Research, Stanford University, 1985. In Arthur, W. Brian. Increasing Returns and Path Dependence in the Economy, University of Michigan Press, 1994.


David. said...

Abraham Othman's post Smart contracts will need human juries on the FT's Alphaville blog is a very interesting take on Ethereum's "smart contracts" and on a technique for adjudicating disputes in this environment.

David. said...

I'm working on a long-ish post, which will take some time, based on two important recent essays by pioneers in the field: Nick Szabo's Money, blockchains, and social scalability and Vitalik Buterin's The Meaning of Decentralization. Both are must-reads.

David. said...

Paul Frazee has a thoughtful response to this post. I'll work it into the post I mentioned above.

Jarechiga said...

It could work if location is another one of the supply and demand factors.

I may be willing to pay more for decentralized storage if it is backed up in four different regions.
If enough demand is built for markets outside of China, then it will drive the prices up for those locations, making up for the economic difference and incentivizing a multi-region supply.

As in real estate: "location, location, location".

David. said...

Jarechiga, you need to read the post, where I say:

"Users submit data to be stored to the network, not to individual peers. The network uses erasure coding to divide the data into shards and peers store shards."

If the storage is truly decentralized, you don't know where the shards are stored. You can pay for more redundancy, but you don't get to decide the locations of the shards. The network does that. If you lock shards to peers, when one of those peers goes away so does your redundancy.

Your model is more properly called distributed, not decentralized. And it is in effect little different from what you can buy from Amazon today, because you can specify the Amazon region(s). And it is definitely subject to increasing returns to scale, which is why Amazon dominates the market.

David. said...

To understand the extent of the cost base and scale advantages in China, see Photos: Life inside of China’s massive and remote bitcoin mines by Johnny Simon and photographer Liu Xingzhe.

DidgetMaster said...

I read with interest your blog posts on distributed storage systems. I may have stumbled onto a possible solution and would love to get your input on it. I have created a new kind of general-purpose data management system that is essentially a highly efficient object store with a bunch of database capabilities.

I can put over 200 million data objects in one of my containers; put a variety of meta-data tags on each one; and find everything of a certain type or with a certain kind of tag in just a couple of seconds. I designed it so that each of the containers could be a node on a network so that they could share data between them and give the user a unified view of all their data, not just the data on the device they were using at the time.

What I am now thinking is that, in addition to lots of independent 'data networks' where individuals or companies manage all their own data across multiple devices, this system could also be the basis for a world-wide data network where millions of nodes could interact with each other on a global scale.

I would be interested in discussing different aspects of this architecture with anyone who has a lot of interest in this area. For more info see www.DidgetMaster.blogspot.com

David. said...

"A distributed cloud storage project, Maidsafe, recently announced it was pursuing alternative fundraising strategies after the value of their tokens dramatically fell, cutting their revenues from over $8 million to around $2 million." as reported in the MIT Media Lab's report Defending Internet Freedom Through Decentralization. The details are in this post to MaidSafe's blog from May 2016. Clearly, I should have been paying closer attention.