Data Storage Trends
As usual, IBM's Georg Lauhoff provided an invaluable overview of the storage industry as of late 2025, co-authored with Sassan Shahidi. They make an important point that I have been making since at least 2018's Archival Media: Not a Good Business:
Challenges of Alternative Archival Technologies
• Alternative archival technologies face technical and economic hurdles.
This justifies their focus on flash, hard disk and tape. Their "exabytes shipped" graph shows that indeed Hard Disk Unexpectedly Not Dead; the dramatic decline in HDD's share since 2008 reversed in 2024.
The key metric for technological progress in traditional storage media is areal density:
- Lauhoff and Shahidi's graph shows that tape, which has the easiest path because of the relatively large size of the bits, has continued its steady growth, although one could argue both that their 24% annual growth overstates the growth since 2017, and that INSIC's projection of 28% is optimistic.
- It is clear that HDD areal density progress slowed dramatically about 2010 to around 11% per year. But the developments Jon Trantham reported (see the next section) could lead to a significant acceleration in HDD areal density.
- Flash has continued a steady 30% per year growth since about 2010, thanks to stacking cells vertically and storing multiple bits in them. Both of these have limits, into which the industry will eventually run.
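To make the gap between these growth rates concrete, here is a minimal sketch that converts each quoted annual rate into a doubling time and a ten-year multiple. It assumes constant compound growth, which is a simplification; the function names are mine, and only the three rates come from the text above.

```python
import math

def doubling_time_years(annual_growth: float) -> float:
    """Years for areal density to double at a constant compound annual rate."""
    return math.log(2) / math.log(1 + annual_growth)

def growth_over(annual_growth: float, years: float) -> float:
    """Multiplicative increase after `years` of compound growth."""
    return (1 + annual_growth) ** years

# Annual areal density growth rates quoted in the list above.
rates = {"tape": 0.24, "HDD": 0.11, "flash": 0.30}

for medium, rate in rates.items():
    print(f"{medium}: doubles every {doubling_time_years(rate):.1f} years, "
          f"{growth_over(rate, 10):.1f}x in a decade")
```

At 11% per year density takes about 6.6 years to double, against roughly 2.6 years at 30%.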
Lauhoff and Shahidi conclude that:
- Tape Storage: continues to evolve.
- HDD: improvements slow down but recently high demand.
- NAND: well-suited for hot storage but not for archival purposes.
- Lack of Alternatives: Within the foreseeable future (within 10 years), there are no viable alternatives to Tape, HDD, and NAND storage.
- AI leads to storage demands across the tiers
Mass Capacity Storage in an AI Era
Jon Trantham of Seagate reported that after more than a quarter-century of work and 14 years after HAMR was demonstrated in the lab, Seagate has finally been shipping HAMR drives in quantity since early 2025. He also announced that they have started to ship their 40TB HAMR drives. Their roadmap to 100TB/drive presents some significant challenges, as shown in Trantham's slide. The history of HAMR shows that Seagate can surmount major technical challenges, but it may take longer than they project.
One of Trantham's slides vividly illustrated the technology challenges the HDD industry faces, showing to scale the evolution since 1997 of the sizes of the bits on the media, the reader, and the writer. Note the 1610-fold decrease in the area of the writer, the 305-fold decrease in the area of the bit, and the 289-fold decrease in the area of the reader.
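Area factors this large are hard to picture, so the sketch below converts them into an equivalent shrink in a single linear dimension. It assumes each feature shrank equally in both dimensions, which is my simplification and not something the slide states; real bit aspect ratios have changed over time.

```python
import math

# Area reduction factors since 1997, as quoted from Trantham's slide.
area_shrink = {"writer": 1610, "bit": 305, "reader": 289}

def linear_shrink(area_factor: float) -> float:
    """Equivalent shrink per linear dimension, assuming an unchanged aspect ratio."""
    return math.sqrt(area_factor)

for feature, factor in area_shrink.items():
    print(f"{feature}: {factor}x smaller in area, "
          f"about {linear_shrink(factor):.0f}x in each dimension")
```

Under that assumption the reader and the bit are each about 17 times smaller in each dimension, and the writer about 40 times.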
Flash for Archival Storage
Fifteen years ago, Ethan Miller, Ian Adams and I published Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes. It was inspired by work at Carnegie-Mellon from 2009, FAWN: a fast array of wimpy nodes, which argued that implementing fast storage using large numbers of small nodes built from cell-phone technology could save two orders of magnitude in energy per query. We argued that it would be possible to build low cost, low energy archival storage systems using a similar approach.
Our idea was ignored, but at this meeting Ethan Miller revived the idea of using flash as an archival medium. He argues for a rack-scale system storing 500PB/rack built from 5U shelves, similar to Backblaze's, each holding 216 of Pure Storage's 300TB DFMs (direct flash modules) stacked vertically.
There are three big challenges:
- First, if all the DFMs were actively I/O-ing the rack would draw 45KW. Supplying the rack with that much power and cooling it would be very difficult (see the design of Nvidia's racks). But, just as with Facebook's hard disk cold storage, this can be mitigated by scheduling accesses so that only a small proportion of the drives are active.
- Second, flash cells gradually leak electrons, so must be regularly refreshed by reading and re-writing them. This task must be scheduled along with the application's reads and writes, but doing so is fairly easy since the refresh timing isn't critical.
- Third, flash is more expensive per TB than hard disk or tape. As I have argued for a long time, in the archival storage market the time value of money makes it difficult to justify trading increased capex for decreased opex:
  - The opex savings are significant, with essentially no mechanical failures, more benign failure modes, and much higher bandwidth for erasure code recovery.
  - Miller argues that the capex isn't as bad as the cost of the media makes it look, because at 0.5EB/rack there are savings in space, power and cooling. He doesn't point out that the lower latency for read access potentially allows for the elimination of an entire warm layer of the storage hierarchy.
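The rack figures quoted above can be checked with some simple arithmetic. This is a rough sketch, not Miller's design: it uses decimal units, assumes the 45kW figure applies to the whole rack, and treats idle modules and shelf overhead as drawing no power. The 15kW budget in the last line is a hypothetical value chosen only for illustration.

```python
import math

DFM_TB = 300             # capacity per direct flash module (from the text)
DFMS_PER_SHELF = 216     # modules per shelf (from the text)
SHELF_U = 5              # shelf height in rack units (from the text)
RACK_TARGET_PB = 500     # target capacity per rack (from the text)
RACK_FULL_POWER_KW = 45  # draw with every module active (from the text)

shelf_pb = DFM_TB * DFMS_PER_SHELF / 1000       # PB per shelf
shelves = math.ceil(RACK_TARGET_PB / shelf_pb)  # shelves needed to reach the target
rack_u = shelves * SHELF_U                      # rack units occupied
modules = shelves * DFMS_PER_SHELF              # modules in the rack
watts_per_module = RACK_FULL_POWER_KW * 1000 / modules

def max_active_fraction(power_budget_kw: float) -> float:
    """Fraction of modules that may be active at once under a rack power budget.

    Simplification: idle modules and shelf overhead are assumed to draw nothing.
    """
    return min(1.0, power_budget_kw / RACK_FULL_POWER_KW)

print(f"{shelf_pb:.1f} PB/shelf, {shelves} shelves, {rack_u}U, {modules} modules")
print(f"about {watts_per_module:.0f} W per active module")
print(f"hypothetical 15 kW budget: {max_active_fraction(15):.0%} of modules active")
```

With these figures the target needs 8 shelves (40U) holding 1,728 modules, which implies roughly 26 W per active module. An access scheduler would have to keep the active fraction below whatever the rack's real power and cooling budget allows.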
Although I'm naturally biassed, I think Miller's case for archival flash is worth a detailed investigation.
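The time-value-of-money argument in the third challenge can be illustrated with a discounted cost comparison. The sketch below is only that: every cost figure in it is hypothetical, in arbitrary units, and it ignores media replacement and price declines. It shows how a high-capex, low-opex option that wins on undiscounted cost can lose once future opex is discounted.

```python
def present_value_cost(capex: float, annual_opex: float,
                       years: int, discount_rate: float) -> float:
    """Up-front capex plus the discounted sum of end-of-year opex payments."""
    discounted_opex = sum(annual_opex / (1 + discount_rate) ** t
                          for t in range(1, years + 1))
    return capex + discounted_opex

# Hypothetical costs per unit of capacity; not taken from any vendor or from Miller.
YEARS = 10
disk = dict(capex=100.0, annual_opex=20.0)
flash = dict(capex=220.0, annual_opex=5.0)

for rate in (0.0, 0.05, 0.10):
    d = present_value_cost(years=YEARS, discount_rate=rate, **disk)
    f = present_value_cost(years=YEARS, discount_rate=rate, **flash)
    print(f"discount rate {rate:.0%}: disk {d:.0f}, flash {f:.0f}")
```

With these made-up numbers flash is cheaper at a 0% discount rate but more expensive at 5% or 10%, which is the effect that makes capex-for-opex trades hard to justify in archival budgets.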
Avoiding the Pitfalls of Cloud Storage for AI Applications
Fourteen years ago in Cloud vs. Local Storage Costs and More on Glacier Pricing I started writing about the way the complex and somewhat opaque pricing models of cloud storage platforms made it difficult to estimate how much you would end up paying. People are just now figuring out that AI has the same problem. Neither is an accident; these pricing models serve two goals important for the platform's business model. First, the purchase decision is based on the "Low, Low" advertised price. Second, once you discover how much more you're actually paying, you face the lock-in created by egress fees. In 2019's Cloud for Preservation I wrote about how egress charges implement vendor lock-in.
David Boland of Wasabi presented a current analysis of this issue. He reports that about half of all the organizations they surveyed exceeded their budget for public cloud storage.
The budget overruns arose because the actual spend was about double the sticker price for the storage. The culprit was fees, which by design are much harder to project than the storage charge.
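A toy model shows how fees can double a bill. All prices and usage figures below are hypothetical, chosen so that fees equal the storage charge; they are not Wasabi's survey data or any real platform's price list, and real bills have many more line items.

```python
from dataclasses import dataclass

@dataclass
class CloudPricing:
    """Hypothetical per-unit prices in dollars."""
    storage_per_tb_month: float
    egress_per_tb: float
    per_million_requests: float

def monthly_bill(p: CloudPricing, stored_tb: float, egress_tb: float,
                 million_requests: float) -> tuple[float, float]:
    """Return (sticker, fees): the storage charge and the usage-dependent fees."""
    sticker = p.storage_per_tb_month * stored_tb
    fees = p.egress_per_tb * egress_tb + p.per_million_requests * million_requests
    return sticker, fees

pricing = CloudPricing(storage_per_tb_month=20.0, egress_per_tb=90.0,
                       per_million_requests=5.0)
sticker, fees = monthly_bill(pricing, stored_tb=100, egress_tb=20, million_requests=40)
print(f"sticker ${sticker:.0f}, fees ${fees:.0f}, "
      f"total {(sticker + fees) / sticker:.1f}x sticker")
```

The storage charge depends only on capacity, which is easy to budget. The fees depend on how much data is read back and how often, which an organization often cannot predict when it signs up.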
Digital Storage Architectures for AI and ML
Will Cavin of Amazon had two important items of news:
- Amazon has been proactive and now supports post-quantum encryption on TLS connections.
- Using Amazon's cloud services for long-term preservation has always suffered from the fact that, to confirm the fixity of the preserved content, it had to be read and thus incur fees. Finally, two advances have improved things. First, it is now possible to use SHA-256 and SHA-512 checksums when uploading data. Second, it is now possible to use an S3 batch job to validate the checksums on objects without reading them.
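For fixity checking to be meaningful, the depositor needs their own record of each object's checksum, computed before upload. The sketch below streams data through SHA-256 using only Python's standard library and returns the digest base64-encoded. As far as I know that is the encoding S3 uses for its checksum fields, but treat it as an assumption and check the current S3 documentation. The function name is mine, and no AWS API is called here.

```python
import base64
import hashlib
import io
from typing import BinaryIO

def sha256_base64(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Stream binary data through SHA-256 and return the base64-encoded digest."""
    h = hashlib.sha256()
    # Read in fixed-size chunks so large objects never need to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return base64.b64encode(h.digest()).decode("ascii")

# Usage: an in-memory stream stands in for a file opened with open(path, "rb").
local_checksum = sha256_base64(io.BytesIO(b"example archival object"))
print(local_checksum)
```

Keeping this value in a local manifest means a later validation report from the storage service can be compared against a checksum the service did not compute itself.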







