Tuesday, May 7, 2013

Storing "all that stuff"

In two CNN interviews, former FBI counter-terrorism specialist Tim Clemente attracted a lot of attention when he said:

"We certainly have ways in national security investigations to find out exactly what was said in that conversation. ... No, welcome to America. All of that stuff is being captured as we speak whether we know it or like it or not." and "all digital communications in the past" are recorded and stored
Many people assumed that the storage is in the Utah Data Center which, according to Wikipedia:
"is a data storage facility for the United States Intelligence Community that is designed to be a primary storage resource capable of storing data on the scale of yottabytes."
Whatever the wisdom of collecting everything, I'm a bit skeptical about the practicality of storing it. Follow me below the fold for a look at the numbers.

What medium are they storing the yottabyte on? The disk drive industry builds less than 100 exabytes per quarter, say 400 exabytes/year. Disk is about 70% of the total storage produced, so the total is about 400/0.7, or less than 600 exabytes/year. A yottabyte is a million exabytes, or about 1700 times the world's annual production of storage.

How much data is "all that stuff"? IDC estimates the world in 2020 will generate 40 zettabytes, or 4% of a yottabyte. This year they estimate 3.6 zettabytes, about 6 times the amount of storage produced in a year. So from both the demand and the supply sides it'll be a while before the intelligence community has a yottabyte.
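As a rough check on the supply and demand arithmetic above, here is a minimal Python sketch. The variable names are mine, and it assumes decimal (SI) prefixes and the rounded figures quoted in the text; it is a back-of-the-envelope calculation, not a model of the storage market.

```python
# Supply vs. demand, using the rounded figures quoted in the post.
# Assumption: decimal SI prefixes (1 EB = 10**18 bytes, and so on).
EB = 10**18
ZB = 10**21
YB = 10**24

disk_per_year = 400 * EB          # "less than 100 exabytes per quarter"
disk_share = 0.70                 # disk as a fraction of all storage produced

total_exact = disk_per_year / disk_share   # roughly 571 EB/year
total_per_year = 600 * EB                  # the post's rounded upper bound

# How many years of total world production make one yottabyte?
years_of_production = YB / total_per_year  # roughly 1667, "about 1700"

# IDC's 2020 estimate as a fraction of a yottabyte.
share_2020 = 40 * ZB / YB                  # 0.04, i.e. 4%

# This year's data generated vs. this year's storage produced.
demand_vs_supply = 3.6 * ZB / total_per_year   # about 6

print(f"total storage produced: {total_exact / EB:.0f} EB/year")
print(f"yottabyte = {years_of_production:.0f} years of production")
print(f"2020 data = {share_2020:.0%} of a yottabyte")
print(f"2013 data = {demand_vs_supply:.1f}x annual storage production")
```

Using the unrounded 571 exabytes/year instead of 600 would only make the gap between a yottabyte and annual production larger.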

How hard is it to keep up with "all that stuff"? Just this year the write bandwidth would be 3.6×10^21 / 3.15×10^7 = 1.1×10^14 byte/sec, or over a petabyte every 10 seconds. Next year more.
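The bandwidth figure is easy to reproduce. This is a small sketch with names of my own choosing, assuming a 365-day year and decimal petabytes:

```python
# Sustained write rate needed to ingest one year's data as it is generated.
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000, about 3.15e7
PB = 1e15                            # assumption: decimal petabyte

data_per_year = 3.6e21               # 3.6 zettabytes, IDC's estimate for this year

write_rate = data_per_year / SECONDS_PER_YEAR   # bytes per second
seconds_per_petabyte = PB / write_rate

print(f"write rate: {write_rate:.2e} byte/sec")
print(f"one petabyte every {seconds_per_petabyte:.1f} seconds")
```

The rate comes out a little above 1.1×10^14 byte/sec, so a petabyte arrives in somewhat under 10 seconds, which is what "over a petabyte every 10 seconds" means.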

Can you afford to store "all that stuff"? Suppose you used Glacier, at $0.01/GB/mo. Assuming the data arrives evenly through the year, so that on average half of it is in storage, just this year's bill for 3.6 zettabytes would be 0.5 × 12 × $0.01 × 3.6×10^12, or a bit over $200B. This contrasts oddly with the $2B it is claimed the data center's equipment will cost. Next year, this year's data would cost about $400B and 2014's data would cost more than $200B because of data growth. A billion here, a billion there and pretty soon you're talking real money, even for the black budget. This is estimated at perhaps $120B/yr, so in 2014 we're talking 6 times the current black budget for both DoD and the intelligence community. That's just for storing it; accessing it would be extra.

Storing a yottabyte in Glacier would cost over $111T/yr, although you might get a price break for volume.
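The Glacier numbers can be sketched the same way. The function and parameter names below are hypothetical, and two things are my assumptions rather than anything stated in the post: that the factor of 0.5 reflects data arriving evenly through its first year, and that the $111T figure corresponds to counting a gigabyte as 2^30 bytes (a decimal gigabyte gives $120T).

```python
# Annual storage bill at a flat per-gigabyte monthly price.
PRICE_PER_GB_MONTH = 0.01   # dollars, the Glacier price quoted in the post

def annual_cost(bytes_stored, bytes_per_gb=10**9, avg_fraction=1.0):
    """Dollars per year to keep `bytes_stored` in storage.

    avg_fraction is the average share of the data actually stored over
    the year (0.5 if it arrives evenly during the year, an assumption).
    """
    gigabytes = bytes_stored / bytes_per_gb
    return gigabytes * PRICE_PER_GB_MONTH * 12 * avg_fraction

this_year = 3.6e21   # 3.6 zettabytes

first_year_bill = annual_cost(this_year, avg_fraction=0.5)   # a bit over $200B
second_year_bill = annual_cost(this_year)                    # over $400B

yottabyte_decimal_gb = annual_cost(1e24)                      # $120T
yottabyte_binary_gb = annual_cost(1e24, bytes_per_gb=2**30)   # just under $112T

print(f"first year:  ${first_year_bill / 1e9:.0f}B")
print(f"second year: ${second_year_bill / 1e9:.0f}B")
print(f"yottabyte:   ${yottabyte_binary_gb / 1e12:.1f}T to ${yottabyte_decimal_gb / 1e12:.0f}T per year")
```

Either way of counting gigabytes leaves the yearly bill for a yottabyte above $100T, so the conclusion does not depend on that choice.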

Statements about the intelligence "community", especially by allegedly "former" members, should be treated with skepticism. They have a vested interest in exaggerating their powers, to intimidate the public into self-censorship. I believe the intelligence community is collecting vast amounts of data. I don't believe they are storing all of it, and certainly not storing all of it for the long haul.


David. said...

Brewster Kahle estimates that storing just one copy of the content for all voice calls in the US, about 272PB/yr, or less than 1/3 exabyte, would cost about $27M capital.

Voice isn't a big part of "all that stuff". You don't need a yottabyte to store it.

David. said...

Roger Bohn refines Brewster Kahle's estimate and arrives at 0.7 exabyte/yr for one copy of all world voice traffic.

David. said...

For more on this, see these two stories on Slashdot.

David. said...

Forbes has the blueprints of the Utah center and this has enabled a more detailed analysis. Still not yottabytes.