Bhat's introduction draws heavily on IDC's Digital Universe report, which:
forecasts that the amount of data generated globally will reach 44 zettabytes (ZBs) in 2020 and 163 ZBs in 2025. Even the estimates are increasing, as earlier it was forecast to be 35 ZBs in 2020 instead of 44.
Seagate ... subscribes to IDC’s estimate that around 13 ZBs of 44 ZBs generated in 2020 would be critical and should be stored. ... Seagate also anticipates that the storage capacity available in 2020 will not be able to fulfill the minimum required storage demand, and will lead to a data-capacity gap of at least 6 ZBs
Figure: Preeti Gupta's 2014 graph
Unfortunately, Bhat does not cite or seem to have read my 2016 post Where Did All Those Bits Go? in which I point out a number of flaws in IDC's reports, and in the analyses such as Seagate's based on them. The most important of these flaws is the implicit assumption that the demand for storage is independent of the price of storage:
Seagate ... subscribes to IDC’s estimate that around 13 ZBs of 44 ZBs generated in 2020 would be critical and should be stored. ... the storage capacity available in 2020 will not be able to fulfill the minimum required storage demand, and will lead to a data-capacity gap of at least 6 ZB
Note the lack of any concept of the price of storing the 13ZB. Since it is evident that neither IDC nor Seagate nor Bhat believe that the 6ZB of additional media "required" would be available at any price, something has to give. But what?
In practice, a big data storage user, and indeed any data storage user, compares a prediction of the value to be realized by storing the data with the cost of doing so. Data whose potential value does not justify its storage doesn't get stored, which is what will happen to the 6ZB.
IDC, Seagate and Bhat suffer from the collision of two ideas, both of which are wrong:
- The "Big Data" hype, which is that the value of keeping everything is almost infinite.
- The "storage is free" idea left over from the long-gone days of 40+% Kryder rates.
Thus the gap to which Bhat refers is between what data centers would store if storage were free, and what they will store given the actual cost of storing it. This gap would only be something new or unexpected if storage were free. This hasn't been the perception, let alone the reality, since very early in the history of Big Data centers.
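To make the argument concrete, here is a toy model in Python. It is only a sketch: the `DataSet` class, the function names, the per-ZB values and the cost figure are all invented for illustration. Only the totals (13ZB worth storing if storage were free, a 6ZB gap) are chosen to echo the figures quoted above. The model assumes each data set is stored exactly when its predicted value per ZB is at least the cost per ZB of storing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    size_zb: float       # size in zettabytes
    value_per_zb: float  # predicted value of keeping it, per ZB (made-up units)


def stored_zb(datasets, cost_per_zb):
    """Total ZB whose predicted value per ZB is at least the storage cost per ZB."""
    return sum(d.size_zb for d in datasets if d.value_per_zb >= cost_per_zb)


def gap_zb(datasets, cost_per_zb):
    """Difference between what would be stored if storage were free
    and what is stored at the given cost."""
    return stored_zb(datasets, 0.0) - stored_zb(datasets, cost_per_zb)


# Hypothetical data sets: the split and the values are assumptions, not IDC's
# or Seagate's numbers. Only the 13ZB total mirrors the quoted estimate.
EXAMPLE = [
    DataSet("high-value", 4.0, 10.0),
    DataSet("medium-value", 3.0, 5.0),
    DataSet("low-value", 6.0, 1.0),
]

if __name__ == "__main__":
    cost = 2.0  # assumed cost per ZB, in the same made-up units
    print("stored if free:", stored_zb(EXAMPLE, 0.0))
    print("stored at cost:", stored_zb(EXAMPLE, cost))
    print("gap:", gap_zb(EXAMPLE, cost))
```

In this model the gap is zero only when the cost is zero. At any positive cost the low-value data simply isn't stored, so the "gap" is an ordinary consequence of storage having a price rather than a shortage.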