- per byte, storage media are getting cheaper very rapidly (Kryder's Law), and
- the demand for storage greatly exceeds the supply.
Why can't these statements both be true? If the demand for storage greatly exceeded the supply, the price would rise until supply and demand were in balance.
In late 2011, floods inundated large parts of Thailand, including the areas where disks were manufactured. The flooding didn't change the demand for disks, because these parts of Thailand were not large consumers of disks. What happened? As shown in this 2013 graph from Preeti Gupta, the price of disks immediately nearly doubled, choking off demand to match the available supply, and then fell slowly as supply recovered.
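The market-clearing argument can be made concrete with a toy model. This sketch assumes a hypothetical constant-elasticity demand curve, Q = scale * P^(-elasticity). The function names `demand` and `clearing_price` and all the numbers are illustrative assumptions, not data from Preeti's graph. With unit elasticity, halving supply doubles the clearing price, and demand at the new price falls to exactly the reduced supply.

```python
def demand(price: float, scale: float, elasticity: float) -> float:
    """Quantity demanded under a hypothetical constant-elasticity curve:
    Q = scale * price ** (-elasticity)."""
    return scale * price ** (-elasticity)


def clearing_price(supply: float, scale: float, elasticity: float) -> float:
    """Price at which demand(price) equals the available supply,
    found by solving scale * P ** (-elasticity) == supply for P."""
    return (scale / supply) ** (1.0 / elasticity)


# Hypothetical numbers: demand scale 100, unit elasticity.
scale, elasticity = 100.0, 1.0
before = clearing_price(100.0, scale, elasticity)  # normal supply
after = clearing_price(50.0, scale, elasticity)    # supply halved by a shock
print(f"price before: {before:.2f}, after: {after:.2f}")
print(f"demand at new price: {demand(after, scale, elasticity):.1f}")
```

A more elastic demand curve gives a smaller price rise for the same supply shock. Either way, a shortage shows up as a higher price that chokes off demand, not as "demand exceeding supply".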
So we have two statements. The first is "per byte, storage media are getting cheaper very rapidly". We can argue about exactly how rapidly, but there are decades of factual data recording the drop in cost per byte of disk and other storage media (see Preeti's graph). So it is reasonable to believe the first statement. Anyone who has been buying computers for a few years can testify to it.
The orange bars are labeled "output", which I believe represents the total number of bytes of storage media manufactured each year. This number should be fairly accurate, but it overstates the amount of newly created information stored each year for many reasons:
- Newly manufactured media does not instantly get filled. There are delays in the distribution pipeline; for example, I have nearly half a terabyte of unwritten DVD-R media sitting on a shelf. This is likely to be a fairly small percentage.
- Some media that gets filled turns out to be faulty and gets returned under warranty. This is likely to be a fairly small percentage.
- Some of the newly manufactured media replaces obsolete media, so isn't available to store newly created information.
- Because of overhead from file systems and so on, newly created information occupies more bytes of storage than its raw size. This is typically a small percentage.
- If newly created information does actually get written to a storage medium, several copies of it normally get written. This is likely to be a factor of about two.
- Some newly created information exists in vast numbers of copies. For example, my iPhone 6 claims to have 64GB of storage. That corresponds to the amount of newly manufactured storage medium (flash) it consumes. But about 8.5GB of that is consumed by a copy of iOS, the same information that consumes 8.5GB in every iPhone 6. Between October 2014 and October 2015 Apple sold 222M iPhones, so those 8.5GB of information are replicated 222M times, consuming about 1.9EB of the storage manufactured in that year.
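The iOS arithmetic, and the rough factor-of-two correction for multiple copies, can be checked with a few lines of Python. This is a sketch: `replicated_bytes` and `unique_information_upper_bound` are hypothetical helper names, sizes use decimal units (1GB = 10^9 bytes, 1EB = 10^18 bytes), and the factor of two is the estimate from the list above, not a measured value.

```python
GB = 10**9   # decimal gigabyte
EB = 10**18  # decimal exabyte


def replicated_bytes(copy_size_bytes: int, copies: int) -> int:
    """Hypothetical helper: media consumed by `copies` identical copies."""
    return copy_size_bytes * copies


def unique_information_upper_bound(media_bytes: float, copies_per_item: float = 2.0) -> float:
    """Hypothetical helper: if each item is normally written `copies_per_item`
    times, media output overstates newly stored information by that factor."""
    return media_bytes / copies_per_item


# Figures from the text: about 8.5GB of iOS on each of 222M iPhones
ios_bytes = 8_500_000_000  # 8.5GB
iphones_sold = 222_000_000
ios_total = replicated_bytes(ios_bytes, iphones_sold)
print(f"iOS copies consume about {ios_total / EB:.1f} EB")

# Applying the rough factor-of-two replication estimate to 1EB of new media
print(f"1 EB of media holds at most {unique_information_upper_bound(1 * EB) / EB:.1f} EB of new information")
```

The other items in the list (pipeline delays, warranty returns, replacement of obsolete media, file system overhead) would reduce the figure further, so the division by two is only one of several corrections.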
What do the blue bars represent? They are labeled "demand" but, as we have seen, the demand for storage depends on the price. There's no price specified for these bars. The caption of the graph says "Source: Recode", which I believe refers to this 2014 article by Rocky Pimentel entitled Stuffed: Why Data Storage Is Hot Again. (Really!). Based on the IDC/EMC Digital Universe report, Pimentel writes:
The total amount of digital data generated in 2013 will come to 3.5 zettabytes (a zettabyte is 1 with 21 zeros after it, and is equivalent to about the storage of one trillion USB keys). The 3.5 zettabytes generated this year will triple the amount of data created in 2010. By 2020, the world will generate 40 zettabytes of data annually, or more than 5,200 gigabytes of data for every person on the planet.

The operative words are "data generated". Not "data stored permanently", nor "bytes of storage consumed". The numbers projected by IDC for "data generated" have always greatly exceeded the numbers actually reported for storage media manufactured in a given year, which in turn, as discussed above, exaggerate the capacity added to the world's storage infrastructure. So where were the extra projected bytes stored?
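It is worth checking what the quoted IDC figures imply. The sketch below uses only the numbers in the quote, in decimal units, and derives three things: the 2010 figure implied by "triple", the population implied by 5,200 gigabytes per person, and the compound annual growth from 3.5 zettabytes in 2013 to 40 zettabytes in 2020. The variable names are illustrative.

```python
ZB = 10**21  # decimal zettabyte: 1 with 21 zeros after it
GB = 10**9   # decimal gigabyte

# Figures quoted from the Recode article (IDC/EMC Digital Universe report)
generated_2013 = 3.5 * ZB
generated_2020 = 40 * ZB
per_person_2020 = 5_200 * GB

# "will triple the amount of data created in 2010"
implied_2010 = generated_2013 / 3

# "more than 5,200 gigabytes ... for every person": because of "more than",
# this is an upper bound on the population the projection assumes
implied_population = generated_2020 / per_person_2020

# Compound annual growth rate over the seven years 2013 -> 2020
years = 2020 - 2013
implied_growth = (generated_2020 / generated_2013) ** (1 / years) - 1

print(f"implied 2010 data generated: {implied_2010 / ZB:.2f} ZB")
print(f"implied population: {implied_population / 1e9:.1f} billion")
print(f"implied growth: {implied_growth:.0%} per year")
```

The result, roughly 42% a year, is growth in "data generated"; as argued above, it says nothing directly about bytes written to storage media.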
The assumption behind "demand exceeds supply" is that every byte of "data generated" in the IDC report is a byte of demand for permanent storage capacity. But even in a world where storage was free, much of the data generated would never be intended to be stored for any length of time, and so would not represent demand for storage media.
In the real world, where Storage Will Be Much Less Free Than It Used To Be, there are at least two answers to the question Why Not Store It All? Data costs money to store and, as Maciej Cegłowski's Haunted By Data points out, it costs a whole lot more when it leaks.
For a long time, discussions of storage have been bedevilled by the confusion between IDC's projections for "data generated" and the actual demand for storage media. Don't get fooled. Any articles using the IDC/EMC numbers as storage demand can be ignored.