Robert Fontana & Gary Decad
Robert Fontana and Gary Decad gave an invaluable overview of the storage landscape. Their key points include:
- [Slide 5] The total amount of storage manufactured each year continues its exponential growth at around 20%/yr. The vast majority (76%) of it is HDD, but the proportion of flash (20%) is increasing. Tape remains a very small proportion (4%).
- [Slide 12] They contrast this 20% growth in supply with the traditionally ludicrous 40% growth in "demand". Their analysis assumes one byte of storage manufactured in a year represents one byte of data stored in that year, which is not the case (see my 2016 post Where Did All Those Bits Go? for a comprehensive debunking). So their supposed "storage gap" is actually a huge, if irrelevant, underestimate. But they hit the nail on the head with:
Key Point: HDD 75% of bits and 30% of revenue, NAND 20% of bits and 70% of revenue.
- [Slide 9] The Kryder rates for NAND Flash, HDD and Tape are comparable;
$/GB decreases are competitive with all technologies.
But, as I've been writing since at least 2012's Storage Will Be A Lot Less Free Than It Used To Be, the Kryder rate has decreased significantly from the good old days:
$/GB decreases are in the 19%/yr range and not the classical Moore’s Law projection of 28%/yr associated with areal density doubling every 2 years
As my economic model shows, this makes long-term data storage a significantly greater investment.
- [Slide 11] In 2017 flash was 9.7 times as expensive as HDD. In 2018 the ratio was 9 times. Thus, despite recovering from 2017's supply shortages, flash has not made significant progress in eroding HDD's $/GB advantage. By continuing current trends, they project that by 2026 flash will ship more bytes than HDD. But they project it will still be 6 times as expensive per byte. So they ask a good question:
In 2026 is there demand for 7X more manufactured storage annually and is there sufficient value for this storage to spend $122B more annually (2.4X) for this storage?
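To make the arithmetic behind Slides 9 and 11 concrete, here is a minimal Python sketch. The function names are mine, and it assumes constant geometric rates, which is a simplification and not something the slides claim; it is also not the economic model mentioned above. It compares a decade of 19%/yr and 28%/yr Kryder rates, and works out the annual change in the flash-to-HDD price ratio implied by going from 9 times in 2018 to the projected 6 times in 2026.

```python
import math


def relative_cost(annual_decrease: float, years: float) -> float:
    """Fraction of today's $/GB left after `years` of a constant annual decrease."""
    return (1.0 - annual_decrease) ** years


def implied_annual_change(start: float, end: float, years: float) -> float:
    """Constant annual rate of change that takes `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years) - 1.0


def years_to_reach(start: float, target: float, annual_change: float) -> float:
    """Years for `start` to reach `target` at a constant annual rate of change."""
    return math.log(target / start) / math.log(1.0 + annual_change)


# Slide 9: $/GB after ten years at the current and the "classical" Kryder rates.
slow = relative_cost(0.19, 10)
fast = relative_cost(0.28, 10)

# Slide 11: flash/HDD price ratio of 9 in 2018, projected to be 6 in 2026.
ratio_change = implied_annual_change(9.0, 6.0, 2026 - 2018)

# Assumption for illustration only: the same rate continues until parity (ratio 1).
parity_years = years_to_reach(9.0, 1.0, ratio_change)

print(f"$/GB after 10 years at 19%/yr: {slow:.3f} of today's price")
print(f"$/GB after 10 years at 28%/yr: {fast:.3f} of today's price")
print(f"Implied annual change in flash/HDD ratio: {ratio_change:.1%}")
print(f"Years from 2018 to price parity at that rate: {parity_years:.0f}")
```

Under these assumptions, a decade at 19%/yr leaves media costing roughly 12% of today's price, against under 4% at 28%/yr, and the flash-to-HDD ratio shrinks by only about 5% a year.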
Jon Trantham
As it has been for a decade, the date for volume shipments of HAMR drives is still slipping in real time; "Seagate is now shipping HAMR drives in limited quantities to lead customers".
His presentation is interesting in that he provides some details of the extraordinary challenges involved in manufacturing HAMR drives, with pictures showing how small everything is:
- The height from the bottom of the slider to the top of the laser module is less than 500 um
- The slider will fly over the disk with an air-gap of only 1-2 nm
As usual, I will predict that the industry is far more likely to achieve the 15% CAGR in areal density line on the graph than the 30% line. Note the flatness of the "HDD Product" curve for the last five years or so.
Tape
The topic of tape provided a point-counterpoint balance.
Gary Decad and Robert Fontana from IBM made the point that tape's roadmap is highly credible by showing that:
Tape, unlike HDD, has consistently achieved published capacity roadmaps
For the last 8 years, the ratio of manufactured EB of tape to manufactured EB of HDD as remained constant in the 5.5% range
and that:
Unlike HDD, tape magnetic physics is not the limiting issues since tape bit cells are 60X larger than HDD bit cells ... The projected tape areal density in 2025 (90 Gbit/in2) is 13x smaller than today’s HDD areal density and has already been demonstrated in laboratory environments.
Carl Watts' Issues in Tape Industry needed only a few bullets to make his counterpoint that the risk in tape is not technological:
- IBM is the last of the hardware manufacturers:
  - IBM is the only builder of LTO8
  - IBM is the only vendor left with enterprise class tape drives
- If you only have one manufacturer how do you mitigate risk?
- These cloud archival solutions all use tape:
  - Amazon AWS Glacier and Glacier Deep ($1/TB/month)
  - Azure General Purpose v2 storage Archive ($2/TB/month)
  - Google GCP Coldline ($7/TB/month)
- If it's all the same tape, how do we mitigate risk?
If, as Decad and Fontana claim:
Tape storage is strategic in public, hybrid, and private “Clouds”
then IBM has achieved a monopoly, which could have implications for tape's cost advantage.
Jon Trantham's presentation described Seagate's work on robots, similar to tape robots and the Blu-Ray robots developed by Facebook, but containing hard disk cartridges descended from those we studied in 2008's Predicting the Archival Life of Removable Hard Disk Drives. We showed that the bits on the platters had similar life to bits on tape. Of course, tape has the advantage of being effectively a 3D medium where disk is effectively a 2D medium.
Cloud Storage
Amazon, Wasabi and Ceph gave useful marketing presentations. Julian Morley reported on Stanford's transition from in-house tape to cloud storage, with important cost data. I reported previously on the economic modeling Morley used to support this decision.
|10,000 write operations||0.05|
|10,000 read operations||0.004|
|Early deletion charge:||180 days|
|10,000 write operations||0.1|
|10,000 read operations||5|
|Early deletion charge:||180 days|
|Early deletion charge:||365 days|
This table, note, is an over-simplification. The pricing is complex; operations are broken down more precisely than read and write; the exact features vary; and there may be discounts for reserved storage. Costs for data transfer within your cloud infrastructure may be less. The only way to get a true comparison is to specify your exact requirements (and whether the cloud provider can meet them), and work out the price for your particular case.
DNA
I have been writing about the medium-term future of DNA as an archival storage medium for more than seven years. I've always been impressed by the work of the Microsoft/UW team in this field, and Karin Strauss and Luis Ceze's DNA data storage and computation is no exception. It includes details of their demonstration of a complete write-to-read automated system (see also video), and discussion of techniques for performing "big data" computations on data stored in DNA.
Anne Fischer reported on DARPA's research program in Molecular Informatics. One of its antecedents was a DARPA workshop in 2016. Her presentation stressed the diverse range of small molecules that can be used as storage media. I wrote about one non-DNA approach from Harvard last year.
In Cost-Reducing Writing DNA Data I wrote about Catalog's approach, assembling a strand from a library of short sequences of bases. It is a good idea, addressing one of the big deficiencies of DNA as a storage medium, its write bandwidth. But Devin Leake's slides are short on detail, more of an elevator pitch for investment. They start by repeating the ludicrous IDC projection of "bytes generated" and equating it to demand for storage, and in particular archival storage. If you're starting a company you need a much better idea than this about the market you're addressing.
Henry Newman
The good Dr. Pangloss loved Henry Newman's enthusiasm for 5G networking, but I'm a lot more skeptical. It is true that early 5G phones can demo nearly 2Gb/s in very restricted coverage areas in some US cities. But 5G phones are going to be more expensive to buy, more expensive to use, have less battery life, overheat, have less consistent bandwidth and almost non-existent coverage. In return, you get better peak bandwidth, which most people don't use. Customers are already discovering that their existing phone is "good enough". 5G is such a deal!
The reason the carriers are building out 5G networks isn't phones, it is because they see a goldmine in the Internet of Things. But combine 2Gb/s bandwidth with the IoT's notoriously non-existent security, and you have a disaster the carriers simply cannot allow to happen.
The IoT has proliferated for two reasons: the Things are very cheap, and connecting them to the Internet is unregulated, so ISPs cannot impose hassles. But connecting a Thing to the 5G Internet will require a data plan from the carrier, so they will be able to impose requirements, and thus costs. Among the requirements will have to be that the Things have UL certification, adequate security and support, including timely software updates for their presumably long connected life. It is precisely the lack of these expensive attributes that has made the IoT so ubiquitous and such a security dumpster-fire!
Fixity
Two presentations discussed fixity checks. Mark Cooper reported on an effort to validate both the inventory and the checksums of part of LC's digital collection. The conclusion was that the automated parts were reliable, the human parts not so much:
- Content on storage is correct, inventory is not
- Content custodians working around system limitations, resulting in broken inventory records
- Content in the digital storage system needs to be understood as potentially dynamic, in particular for presentation and access
- System needs to facilitate required actions in ways that are logged and versioned
Buzz Hayes from Google explained their recommended technique for performing fixity checks on data in Google's cloud. They provide scripts for the two traditional approaches:
- Read the data back and hash it, which at scale gets expensive in access and bandwidth charges.
- Hash the data in the cloud that stores it, which involves trusting the cloud to actually perform the hash rather than simply remember the hash computed at ingest.
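The first approach, reading the data back and hashing it, can be sketched in a few lines of Python. This is not Google's script; the function names are hypothetical, and the choice of SHA-256 is my assumption, since the presentation summary above does not name a hash algorithm. The point is that every byte has to be read, which is where the access and bandwidth charges come from.

```python
import hashlib
from pathlib import Path


def sha256_of_stream(stream, chunk_size: int = 1024 * 1024) -> str:
    """Hash a binary stream in fixed-size chunks so large objects need little memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, expected_hex: str) -> bool:
    """Re-read a file and compare its hash with the one recorded at ingest.

    `expected_hex` stands in for whatever the inventory stores; the inventory
    format itself is outside the scope of this sketch.
    """
    with Path(path).open("rb") as stream:
        return sha256_of_stream(stream) == expected_hex.lower()
```

The second approach replaces the local read with a request to the storage service for its own hash, which is cheaper but only as trustworthy as the service's claim that it recomputed the value.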
Blockchain
Sharmila Bhatia reported on an initiative by NARA to investigate the potential for blockchain to assist government records management, which concluded:
Authenticity and Integrity
- Blockchain distributed ledger functionality presents a new way to ensure electronic systems provide electronic record authenticity / integrity.
- May not help with preservation or long term access and may make these issues more complicated.
It is important to note that what NARA means by "government records" is quite different from what is typically meant by "records", and the legislative framework under which they operate may make applying blockchain technology tricky.
Ben Fino-Radin and Michelle Lee pitched Starling, a startup claiming:
Simplified & coordinated decentralized storage on the Filecoin network
Their slides describe how the technology works, but give no idea of how much it would cost to use. Just as with DNA and other exotic media, the real issue is economic not technical.
I wrote skeptically about the economics of the Filecoin network in The Four Most Expensive Words in the English Language and Triumph Of Greed Over Arithmetic, comparing its possible pricing to Amazon's S3 and S3 RRS. Of course, the numbers would have looked much worse for Filecoin had I compared it with Wasabi's pricing.