Thursday, September 18, 2025

Hard Disk Unexpectedly Not Dead

As I read Zak Killian's Expect HDD, SSD shortages as AI rewrites the rules of storage hierarchy — multiple companies announce price hikes, too I realized I had forgotten to write this year's version of my annual post on the Library of Congress' Designing Storage Architectures meeting, which was back in March. So below the fold I discuss a few of the DSA talks, Killian's more recent post, and yet another development in DNA storage. The TL;DR is that the long-predicted death of hard disks is continuing to fail to materialize, and so is the equally long-predicted death of tape.

Killian's post starts:
The computing market is absolutely ablaze with AI-driven growth. Regardless of how sustainable it might be, companies are spending untold amounts of wealth on hardware, with most headlines revolving around GPUs. But the storage market is also under pressure, especially hard drive vendors who purportedly haven't done much to increase manufacturing capacity in a decade. TrendForce says lead times for high-capacity "nearline" hard drives have ballooned to over 52 weeks — more than a full year.
Western Digital is:
warning of "unprecedented demand for every capacity in [its] portfolio," and stating that it is raising prices on all of its hard drives.
The unprecedented demand from AI farms is because:
You don't just need the data required to run inference. You also need the history of everything to prove to regulators that you're not laundering bias, to retrain when new data comes in, and to roll back to a previous checkpoint if your fine-tuned model goes feral and, say, starts referring to itself as MechaHitler. This stuff can't go to offline storage until you're certain it isn't needed in the short term. But it's too big to live in the primary storage of all but the beefiest servers. Thus, the need for nearline hard drives.
WD's projection
At the meeting, Western Digital's Dave Landsman's HDDs are here to stay made the same point with this graph using data from IDC and TrendFocus. They are projecting that both disk and enterprise SSD will grow in the low 20%/year range, so the vast bulk of data in data centers will remain on disk. Landsman claims that SSDs are and will remain 6 times as expensive per bit as hard disk and that 81% of installed data center capacity is on hard disk.
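The reasoning that equal growth rates leave disk's share unchanged can be checked with a little arithmetic. This is only a rough sketch under stated assumptions: it starts from the 81% figure above, treats the growth rates as applying to installed capacity (a simplification), and uses 22% as a stand-in for "low 20%/year". The function name and the faster-SSD scenario are hypothetical, not Landsman's.

```python
def disk_share(years, start_share=0.81, disk_growth=0.22, ssd_growth=0.22):
    """Disk's fraction of installed capacity after compound annual growth.

    All defaults are illustrative assumptions, not figures from the talk
    (apart from the 81% starting share quoted in the post).
    """
    disk = start_share * (1 + disk_growth) ** years
    ssd = (1 - start_share) * (1 + ssd_growth) ** years
    return disk / (disk + ssd)


# Equal growth rates: the share does not move.
for years in (0, 5, 10):
    print(years, round(disk_share(years), 3))

# Hypothetical scenario: SSD capacity grows 10 points faster than disk.
print(round(disk_share(10, ssd_growth=0.32), 3))
```

Even in the hypothetical case where SSD capacity grows ten percentage points a year faster for a decade, disk would still hold about two thirds of the capacity under these assumptions.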

Keeping the data on hard disk might actually be a good idea. Sustainability in Datacenters by Shruti Sethi presented a joint Microsoft/Carnegie Mellon study of the scope 2 (operational) and scope 3 (embedded) carbon emissions of compute, SSD and HDD racks in Azure's data centers. The study, A Call for Research on Storage Emissions by Sara McAllister et al, concluded that:
an SSD storage rack has approximately 4× the operational emissions per TB of an HDD storage rack. Storage devices (SSDs and HDDs) are the largest single contributor of operational emissions. For SSD racks, storage devices account for 39% of emissions, whereas for HDD racks they account for 48% of emissions. These numbers contradict the conventional wisdom that processing units dominate energy consumption: storage servers carry so many storage devices that they become the dominant energy consumers.
...
SSD racks emit approximately 10× the embodied emissions per TB as that of HDD storage racks. The storage devices themselves dominate embodied emissions, accounting for 81% and 55% of emissions in SSD and HDD racks, respectively.
Areal Density Trends
As usual, the authoritative word on the performance of the storage industry comes from IBM. Georg Lauhoff & Sassan Shahidi's Data Storage Trends: NAND, HDD and Tape Storage added another year's data points to their invaluable graphs and revealed that:
  • NAND areal density continues to increase rapidly, because 3D scales faster than the 2D of disk and tape.
  • Disk's 8%/year areal density increase continued, but note that, although their graph includes Seagate's 32TB HAMR drive, the effect of Seagate's and later WD's deployment of HAMR didn't really start until later in 2025.
  • Tape continued its 27%/year increase.
Coming from a tape supplier, this comment isn't surprising, but it is correct:
Despite the promise of alternative archive storage technologies, challenges persist. Enduring relevance of tape storage, which itself is rapidly evolving.
The main problem is that the huge investment and long time horizon needed to displace tape's 7% of the storage market can't generate the necessary return.
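To put the areal density growth rates listed above in perspective, it helps to convert them into doubling times. The sketch below assumes each rate stays constant and compounds annually, which real technologies rarely do for long; the function name is just for illustration.

```python
import math


def doubling_time(annual_growth):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Rates taken from the bullets above: disk 8%/year, tape 27%/year.
for name, rate in (("disk", 0.08), ("tape", 0.27)):
    print(f"{name}: {doubling_time(rate):.1f} years to double areal density")
```

At these rates disk takes about 9 years to double its areal density, while tape takes a little under 3.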

Product vs. Demo
One fascinating graph shows the difference between demonstrations and products for tape and disk. I keep pointing out the very long timescales in the storage industry. In January's Storage Roundup I noted that HAMR was just starting to be deployed 23 years after Seagate demonstrated it. Lauhoff & Shahidi's graph shows that the current tape density was demo-ed in 2006 and shipped in 2022, and that disk's current density was demo-ed in 2012.

This graph reinforces that tape's roadmap is credible, but the good Dr. Pangloss noticed the optimism of the NAND and disk roadmaps. New technologies tend to scale faster at first, then slower as they age. So it is likely that the advent of HAMR will accelerate disk's areal density increase somewhat. And it is possible that the difficulty of moving from 3D NAND to 4D NAND will slow its increase.

Cost Ratio
Lauhoff & Shahidi's cost ratio graph shows that the relative costs of the different media were roughly stable. If Killian is right that the disk manufacturers are increasing prices and lengthening lead times because of demand from AI, this could be different in next year's graph. But Killian also notes that, despite the fact that QLC SSDs are at least "four times the cost per gigabyte":
Trendforce reports that memory suppliers are actively developing SSD products intended for deployment in nearline service. These should help bring costs down once they hit the market. But in the short term, we can expect the storage crunch to cause rising SSD prices as well, at least for enterprise drives.
Annual Bit Shipments
Lauhoff & Shahidi's bit shipment graph is interesting for two reasons:
  • Disk's proportion of total bit shipments increased.
  • They started tracking the proportion of NAND flash that was SSDs. SSDs represented only about 30% of disk's bit shipments. The claim that the bulk of data still lives on hard disk is true and looks to continue. Disk ships mostly to the nearline enterprise market, while SSDs ship mostly to the online enterprise market. Disk is shipping roughly three times as many bits as SSDs.
Tape's predicted demise is just as delayed as disk's. Back in July Simon Sharwood posted And now for our annual ‘Tape is still not dead’ update:
Shipments of tape storage media increased again in 2024, according to HPE, IBM, and Quantum – the three companies that back the Linear Tape-Open (LTO) Format.

The three companies on Tuesday claimed they shipped 176.5 Exabytes worth of tape during 2024, a 15.4 percent increase on 2023’s 152.9 Exabytes.
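The quoted growth figure is easy to sanity-check from the two shipment totals. A minimal sketch, using only the numbers in the quote (the helper name is hypothetical):

```python
def percent_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


# LTO media shipments in exabytes, as quoted above.
growth = percent_change(152.9, 176.5)
print(f"{growth:.1f}% year-over-year growth")
```

The result rounds to the 15.4 percent that the LTO companies reported.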
DNA Storage
I have been writing skeptically about the medium-term prospects for DNA storage since 2012 and Lauhoff & Shahidi share my skepticism in their graph of the technology's progress in the lab. DNA can only compete in the archival storage market, so the relevant comparison is with LTO tape. Even if you believe Wang's estimate, DNA is more than ten million times too expensive.

Figure 1
Via Brandon Vigliarolo's Boffins invent DNA tape that could pack 375 petabytes into an LTO cart we find the latest idea for improving DNA storage in A compact cassette tape for DNA-based data storage by Jiankai Li et al from the Southern University of Science and Technology in Shenzhen. Their idea is to deposit DNA on "good old-fashioned polyester-nylon composite tape" that could reside in an LTO cartridge. Vigliarolo notes that:
DNA is a very dense storage medium and storage researchers have tried to use it for data storage, but without much success, because it’s hard to find info within DNA and read times are slow.

Jiang's team claims to have addressed that problem, establishing a sequence of data partitions on the tape and identifying each of these with a bar code.
The Shenzhen team's focus on reading misunderstands the fundamental requirements of the hyperscaler archive market, which are, in order:
  1. Write bandwidth. Kestutis Patiejunas pointed this out over a decade ago.
  2. Cost per byte. Archival storage is for data that can no longer earn its keep on lower-latency media, so it has to be very cheap.
Read latency and bandwidth are pretty much irrelevant because, as Kestutis Patiejunas said, the main reason the data would be read is a subpoena.

The paper claims they demonstrated:
a completely automated closed-loop operation involving addressing, recovery, removal, subsequent file deposition, and file recovery again, all accomplished within 50 min.
Vigliarolo notes that:
Jiang's team only wrote 156.6 kilobytes of data to a test tape for their experiment, consisting of four "puzzle pieces" depicting a Chinese lantern. If the data were damaged, it wouldn't assemble correctly. The researchers managed to successfully recover the lantern image without issue, but it took two and a half hours or not-quite one kilobyte per minute.
And the team admit that:
DNA synthesis costs are still very high
Because archival data is guaranteed to be written but is very likely never read, the cost of storing data in DNA is dominated by synthesis. The team effectively admits that they can't compete.

I summed up my skepticism in 2018's DNA's Niche in the Storage Market, posing this challenge to a hypothetical product team:
Engineers, your challenge is to increase the speed of synthesis by a factor of a quarter of a trillion, while reducing the cost by a factor of fifty trillion, in less than 10 years while spending no more than $24M/yr.

Finance team, your challenge is to persuade the company to spend $24M a year for the next 10 years for a product that can then earn about $216M a year for 10 years.
Write bandwidth and cost remain the core problems of DNA storage, and while progress has been made in other areas, both are still many orders of magnitude away from competing with hard disk, let alone LTO tape.

Nevertheless, as I concluded more than seven years ago:
That isn't to say that researching DNA storage technologies is a waste of resources. Eventually, I believe it will be feasible and economic. But eventually is many decades away. This should not be a surprise, time-scales in the storage industry are long. Disk is a 60-year-old technology, tape is at least 65 years old, CDs are 35 years old, flash is 30 years old and has yet to impact bulk data storage.
