Tuesday, August 27, 2013

More on storing "all that stuff"

In a post last May I expressed skepticism about the claims that the organizations on the dark side could store yottabytes of data, for example at the Utah data center. I wasn't alone; here for example from June is Mark Burnett on the same theme. The skeptical chorus has had some effect; the Wikipedia article has been edited to remove this hyperbolic claim:
a data storage facility for the United States Intelligence Community that is designed to be a primary storage resource capable of storing data on the scale of yottabytes.
In this clip from a NOVA documentary from last January entitled "Rise of the Drones", Yannis Antoniades of BAE Systems discusses the Argus camera used for drone surveillance. The video claims:
Argus streams live to the ground and also stores everything, a million terabytes of video a day, ...
Below the fold, let's look at this seemingly innocuous claim.

A million terabytes is an exabyte. Let's try to fit an exabyte in a drone. The current state of the art in storage packing density using flash is, I believe, Skyera's Sky Eagle at 500TB in 1U. So 1M TB would be 2,000U, or about 50 full racks, and, at 800W/1U, would burn 1.6MW, let alone the cooling. You might do that in a 747, but not in the kind of drones flying today. And the 50 racks only gets you 1 day of storage. Yannis doesn't say how long they can keep video, but he does claim to be able to wind back more than 3 days. So you'd need at least 150 racks and nearly 5MW.
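The rack and power estimate can be reproduced with a short Python sketch. The density, power, and retention figures come from the paragraph above; `U_PER_RACK = 40` is my assumption about what a "full rack" holds, chosen because it matches the 50-rack figure, and the function name is purely illustrative.

```python
# Back-of-envelope check of the flash storage estimate.
TB_PER_DAY = 1_000_000   # "a million terabytes of video a day" (NOVA claim)
TB_PER_U = 500           # Skyera Sky Eagle density quoted in the post
WATTS_PER_U = 800        # power draw per 1U quoted in the post
U_PER_RACK = 40          # ASSUMPTION: usable rack units in a "full rack"


def storage_footprint(days):
    """Return (rack units, racks, megawatts) needed to hold `days` of video."""
    units = days * TB_PER_DAY / TB_PER_U
    racks = units / U_PER_RACK
    megawatts = units * WATTS_PER_U / 1e6
    return units, racks, megawatts


for days in (1, 3):
    units, racks, mw = storage_footprint(days)
    print(f"{days} day(s): {units:.0f}U, {racks:.0f} racks, {mw:.1f} MW")
```

Changing `U_PER_RACK` to 42 or 48 moves the rack count a little but leaves the power figure, which depends only on the number of rack units, untouched.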

If you can't fit the racks in the drone, you need to use a data link to get the data to the ground. What is the bandwidth of this link? 1 exabyte/day = 8*10^18/(24*3600) = 0.9*10^14 bit/s, or nearly 100,000 Gb/s. The current record for a fiber-optic link is 400 Gb/s, so the drone needs to trail a cable with about 230 fibers in it.
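The link arithmetic is easy to get wrong by a power of ten, so here is a minimal sketch of it. It assumes decimal units (1 exabyte = 10^18 bytes, 1 Gb/s = 10^9 bit/s) and takes the 400 Gb/s per-fiber figure from the paragraph above; the function names are hypothetical.

```python
import math

EXABYTE_BYTES = 10**18       # decimal exabyte, i.e. a million terabytes
SECONDS_PER_DAY = 24 * 3600
FIBER_GBPS = 400             # per-fiber record quoted in the post


def link_rate_gbps(bytes_per_day):
    """Sustained rate in Gb/s needed to move `bytes_per_day` in one day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e9


def fibers_needed(bytes_per_day, fiber_gbps=FIBER_GBPS):
    """Whole number of fibers at `fiber_gbps` each needed to carry the load."""
    return math.ceil(link_rate_gbps(bytes_per_day) / fiber_gbps)


rate = link_rate_gbps(EXABYTE_BYTES)
print(f"{rate:,.0f} Gb/s over {fibers_needed(EXABYTE_BYTES)} fibers")
```

This ignores protocol overhead and assumes the link runs flat out for all 24 hours, so it is a lower bound on the fiber count.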

Getting it to the ground doesn't obviate the need to store it. The storage industry claims to build about 600 exabytes/year of all forms of storage (disk, tape, flash, ...). So the three days of video from just one drone would take 0.5% of the industry's entire annual storage output. Maybe WD and Seagate do have vast secret contracts to manufacture many times the storage that gets into the open market. But if they do, either they make no money on them, or their financial statements are fraudulent.
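As a rough sketch of where the 0.5% comes from: it is the three-day buffer of a single drone set against the industry's annual output. The 600 EB/year and 1 EB/day figures are from the text; the full-year case is a hypothetical of mine, included only to show how quickly the share grows if nothing is ever discarded.

```python
ANNUAL_OUTPUT_EB = 600    # all storage media built per year, per the post
DRONE_EB_PER_DAY = 1      # NOVA claim: a million terabytes a day


def share_of_output(days_retained):
    """Fraction of a year's storage production one drone's video would fill."""
    return days_retained * DRONE_EB_PER_DAY / ANNUAL_OUTPUT_EB


print(f"3-day buffer:    {share_of_output(3):.1%}")
# Hypothetical: keeping a whole year of video instead of a rolling 3 days.
print(f"full-year store: {share_of_output(365):.1%}")
```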

In other words, this is yet another rent in the intelligence community's threadbare credibility, and another instance of journalists incapable of doing simple math, or even using Google.

If you do use Google, you find a much more credible story from Brian Dodson at Gizmag (admittedly, it was published Feb 11, whereas the NOVA story aired January 23):
Now comes the hard part. The ARGUS-IS takes 12 frames a second to maintain video surveillance over its field of view. The sensor data amounts to 12 bits per pixel, so the camera delivers a flood of raw image data amounting to 32.4 GB/s, while the Common Data Link used by the ARGUS-IS has a capacity of 34.25 MB/s. Clearly, a great deal of data compression must take place in the airborne ARGUS unit. To do so, a 32-processor data compression unit that carries out the data compression and object tracking function is flown along with the ARGUS-IS camera.
Lawrence Livermore National Laboratory (LLNL) was given the task of developing methods to compress and analyze the raw video data. Most of the visual information in an aerial image does not change from frame to frame – rooftops don't change unless someone walks on them (or a bird flies by). The LLNL software works by identifying interesting moving objects, tracking them as they move, and recording changes in their appearance. The researchers claim that this approach, together with JPEG2000 video compression, results in a thousand-fold compression of the raw video data. This amazing level of compression allows the use of the Common Data Link for air to ground communications.
DARPA's aim is to be able to store 70 hours of imagery data within the ground station so that a commander can look at an area that was ignored in yesterday's real-time surveillance, and see the entire day's video record of that area. To store the decompressed raw video would require nearly ten petabytes per day of raw video. Instead, the compressed data stream from the ARGUS-IS is stored, which only requires about six terabytes of data storage – only twice the size of my US$200 backup drive.
See, the real story doesn't need any magic technology, and it isn't even secret.
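The figures in the Gizmag excerpt can be sanity-checked with a short sketch. It assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and, for the storage estimate, a Common Data Link running at full capacity for the whole 70 hours, which is an upper bound rather than anything the article states.

```python
RAW_BYTES_PER_S = 32.4e9     # raw sensor output quoted by Gizmag
LINK_BYTES_PER_S = 34.25e6   # Common Data Link capacity quoted by Gizmag
BUFFER_HOURS = 70            # DARPA's stated ground-station goal


def required_compression():
    """Compression ratio needed to squeeze the raw stream into the link."""
    return RAW_BYTES_PER_S / LINK_BYTES_PER_S


def buffer_terabytes(bytes_per_s, hours=BUFFER_HOURS):
    """Terabytes accumulated by a constant stream over `hours`."""
    return bytes_per_s * hours * 3600 / 1e12


print(f"compression needed: {required_compression():.0f}x")
print(f"70 h at link rate:  {buffer_terabytes(LINK_BYTES_PER_S):.1f} TB")
print(f"70 h of raw video:  {buffer_terabytes(RAW_BYTES_PER_S) / 1000:.1f} PB")
```

The required ratio comes out just under a thousand, in line with the thousand-fold compression the researchers claim, and even a saturated link fills only single-digit terabytes in 70 hours, the same order of magnitude as the six terabytes quoted.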


David. said...

NYU's Brennan Center is out with a disturbing report that looks into what the intelligence community does with "all that stuff".

Yochai Benkler has an op-ed in The Guardian pointing out that whatever they are doing with "all that stuff" isn't actually catching terrorists or frustrating their plots. This is something that Senator Patrick Leahy forced General Alexander to admit in an Oct. 2nd Senate hearing.

David. said...

SA Mathieson at The Register discusses David Anderson's report on GCHQ. Anderson says that GCHQ, at least, isn't storing everything:

'And bulk interception is not as wide-ranging as it sounds, Anderson writes: “GCHQ currently only has the capacity to intercept the data travelling through a small percentage of the 100,000 bearers, including undersea cables, which make up the global communications core infrastructure.” An estimated 10 to 25 per cent of global telecom traffic transits the UK through such undersea cables, although the agency reckons the correct figure is closer to 10 per cent. “For reasons of resource constraint as well as proportionality, GCHQ considers carefully what communications channels it seeks to intercept”, he adds, with limited storage capacity another constraint.'