Tuesday, May 28, 2024

Library of Congress: Designing Storage Architectures 2024

I participated virtually in the 2024 Library of Congress Designing Storage Architectures for Digital Collections meeting. As usual, there was a set of very interesting talks. The slides from the presentations are now online so, below the fold, I discuss the talks I found particularly interesting.

NAND, HDD and Tape Storage Technology Trends

As has become traditional, IBM's Georg Lauhoff and Gary M Decad presented a detailed overview of the storage market. Five slides are of particular interest.

The first is their log-linear graph of the progress of areal density of hard disk (HDD), tape and NAND flash over the last three decades:
  • Tape has improved its areal density at a very consistent ~27%/year. This has been possible in part because of improvements in both media and heads but, as the graph shows, primarily because the bits on tape are currently about the size that bits on hard disk were two decades ago. This gives tape a lot of headroom before it runs into the physical limits.
  • HDD was improving its areal density at around 35%/year until, around 2010, it got too close to the physical limits. Since then it has been growing around 6%/year, despite continual Panglossian predictions from the industry.
  • NAND, in which category IBM includes everything from enterprise SSDs to SD cards, saw explosive growth from the late 1990s to the late 2000s, growing about 3 orders of magnitude in around a decade. It then slowed to a mere 32%/year, meaning that it is still growing significantly faster than the other media.
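To put these annual rates in perspective, here is a minimal sketch of the compound-growth arithmetic behind them. The rates are the ones quoted above; the function names are mine, and the calculation assumes each rate stays constant over the period, which real roadmaps do not guarantee.

```python
import math

def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

def growth_factor(annual_growth: float, years: float) -> float:
    """Total multiplier after `years` of constant compound growth."""
    return (1 + annual_growth) ** years

def implied_annual_growth(total_factor: float, years: float) -> float:
    """Constant annual rate that yields `total_factor` growth over `years`."""
    return total_factor ** (1 / years) - 1

# Rates as quoted in the text; labels are illustrative only.
rates = {
    "tape": 0.27,
    "HDD before ~2010": 0.35,
    "HDD since ~2010": 0.06,
    "NAND (recent)": 0.32,
}

for name, rate in rates.items():
    print(f"{name}: doubles every {doubling_time_years(rate):.1f} years, "
          f"about {growth_factor(rate, 10):.1f}x per decade")

# NAND's early "3 orders of magnitude in around a decade"
early_nand = implied_annual_growth(1000, 10)
print(f"early NAND: roughly {early_nand:.0%}/year")
```

On these assumptions, the difference between 6%/year and 27%/year is the difference between doubling in more than a decade and doubling in about three years. NAND's early three orders of magnitude in a decade corresponds to areal density roughly doubling every year.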
IBM's second pair of graphs shows (a) a log-linear plot of the Exabytes shipped each year since 2008, and (b) the percentage of the total represented by each medium:
  • The graph of Exabytes shipped shows that NAND is growing three times faster than the other media, that tape's growth rate is again very consistent, and that drops in HDD shipments for 2022 and 2023 have made the total shipments for HDD and NAND about the same.
  • The market share graph shows tape maintaining a small share slightly eroded by HDD. But NAND is rapidly eroding HDD's share.
The next slide shows (a) a log-linear plot of the cost of a terabyte over time (the Kryder rate), and (b) the ratio between NAND and HDD, and HDD to tape:
  • The Kryder rate graph shows both NAND's and HDD's rate slowing in the late 2000s. Tape again has a more consistent rate.
  • The cost ratio graph shows why NAND is rapidly eroding HDD's market share, as its cost disadvantage is decreasing exponentially. The cost ratio between tape and HDD has been fairly constant, which likely explains why tape's market share has suffered slight erosion.
Their table of 2023 costs shows that NAND is more than 3 times as expensive as HDD. But this is an average across the whole range of NAND and HDD markets. Much NAND goes into market segments where it does not compete with HDD, such as SD cards, USB drives and phones, whereas most HDD is 3.5" drives in PCs and data centers, where it competes only with enterprise flash. So in markets where they compete, the cost differential will be substantially higher.

They have two slides that echo topics I have posted about fairly often. The first points out that industry projections of future areal density (and thus cost) routinely exaggerate the rate of growth by at least 10%. This is why I frequently report on how happy the good Dr. Pangloss is with the storage industry.

The second of them expresses skepticism about the prospect of DNA storage impacting the market. Their cost graph shows the cost to write a terabyte of DNA is around 10 orders of magnitude more than for HDD, and the cost to read it once is around 3 orders of magnitude more than buying a terabyte of HDD. They write:
  • Very slow to read and write.
  • Very expensive.
  • Large Market size needed to develop technology; a sub-tier below tape storage is small.
I pointed this out in 2018's DNA's Niche In The Storage Market. Even if projections about the increase in demand for archival storage pan out, it will be hard for new media to compete with tape.

They also write something mystifying — "Not so stable", citing DNA Data Storage by Tomasz Buko, Nella Tuczko, and Takao Ishikawa of the University of Warsaw. This paper contends that DNA is vastly more stable than tape, or any current medium:
For years, a DNA specimen collected from a 700,000-year-old horse was considered to be the oldest extracted DNA. However, in 2021, this record was pushed to 1 million years. DNA extracted from mammoth teeth was successfully extracted and sequenced. Additionally, scientists managed to sequence 300,000-year-old mitochondrial DNA from humans and bears. These examples perfectly illustrate the longevity of DNA and proves its usefulness for archeological purposes or data storekeeping. If stored in optimal conditions and dehydrated, DNA can possibly endure for millions of years.

Seagate Storage Update

Jon Trantham's look at the hard disk market was interesting. First, his graph of demand for nearline HDDs explains why the industry suffered hard times recently and why they believe the future looks brighter. A period of rapid growth led to an inventory buildup from exuberant demand forecasts, but the inventory has been depleted and demand is now rising. One might be skeptical of the rate of demand recovery in the graph, but at least the US economy suggests rising demand is plausible.

His slide on IDC's market forecast supports my contention that IBM's cost ratio between NAND and HDD is misleading. The non-cloud HDD market is projected to be a small proportion of the total, and to grow slowly. The vast bulk of the HDD market is for cloud storage, and thus the effective cost ratio is between HDD and enterprise SSDs. This would be much greater than IBM's overall estimate of 3.

Seagate projects that the capacity per platter will rise from today's 2.4TB to over 5TB by 2026 (see Dr. Pangloss and IBM) using their Mozaic HAMR technology. Current 16TB drives have 9 1.78TB platters. Seagate recently started shipping 30TB drives with 10 3TB platters. If they are right a 10 platter drive in 2026 would be 50TB, or around 3x the current drive capacity. This would certainly help maintain HDD's market share.
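The arithmetic behind those drive capacities is simple enough to check. This is just a sketch using the platter counts and per-platter capacities quoted above; the function name is mine, and the 2026 figure rests entirely on Seagate's projection coming true.

```python
def drive_capacity_tb(platters: int, tb_per_platter: float) -> float:
    """Raw drive capacity as platters times capacity per platter."""
    return platters * tb_per_platter

current = drive_capacity_tb(9, 1.78)     # today's 16TB-class drive
mozaic = drive_capacity_tb(10, 3.0)      # recently shipped 30TB drive
projected = drive_capacity_tb(10, 5.0)   # 2026, if the projection holds

print(f"current:   {current:.1f} TB")
print(f"Mozaic:    {mozaic:.1f} TB")
print(f"projected: {projected:.1f} TB")
print(f"projected vs current 16TB class: {projected / current:.1f}x")
print(f"projected vs shipping 30TB:      {projected / mozaic:.1f}x")
```

Measured against the 30TB drives already shipping rather than the 16TB class, the projected gain is well under 2x.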

Some additional topics:
  • Seagate is working to move HDDs to the NVMe interface, to simplify and speed up datacenter systems.
  • Their sustainability efforts include trying to avoid customers shredding drives at their end-of-life. This is driven by customers' requirement for data security. But Seagate encrypts the data on the platters and securely erases it by overwriting the key. The problem is to persuade customers that this satisfies their requirement. Seagate also wants to recover the rare earth magnets by industrial-scale disassembly of the drives.
  • Given a target of 98% drive reliability, as the number of platters increases the number of heads increases and thus the required reliability of the heads increases. Seagate is advocating that, in the case of head failure, the drive remains in use with reduced capacity. By provisioning a pool of spare drives, and avoiding failing an entire drive if a single head fails, the cost of ownership can be significantly reduced. 40-60% of all drive failures are single head failures.
Western Digital's presentation also focused on sustainability.
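Seagate's point about head reliability can be illustrated with a back-of-the-envelope model. This is my sketch, not Seagate's analysis. It assumes head failures are independent, that any single head failure fails the whole drive, that heads are the only failure source, and that there are two heads per platter (one per surface). None of these assumptions comes from the slides.

```python
def required_head_reliability(drive_target: float, heads: int) -> float:
    """Per-head reliability needed so that all `heads` survive with
    probability `drive_target`, assuming independent head failures."""
    return drive_target ** (1 / heads)

DRIVE_TARGET = 0.98      # the 98% drive reliability target from the talk
HEADS_PER_PLATTER = 2    # assumption: one head per platter surface

for platters in (5, 9, 10, 12):
    heads = platters * HEADS_PER_PLATTER
    per_head = required_head_reliability(DRIVE_TARGET, heads)
    print(f"{platters:2d} platters, {heads:2d} heads: "
          f"each head must be {per_head:.4%} reliable "
          f"(failure budget {1 - per_head:.4%})")
```

Under this simplified model the failure budget per head shrinks roughly in proportion to the number of heads, which is why keeping a drive in service at reduced capacity after a single head failure is attractive.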

Design and Operation of Exascale Archives in Azure

The first slide of Microsoft's Aaron Ogus and Shashidar Joshi's talk shows why cloud systems like Azure are dominating the industry.

Their observations from the experience of running Exabytes of archival storage containing trillions of objects and servicing billions of requests each month are:
  • Writes dominate
  • Reads are infrequent
  • Small reads require low latency random access
  • Large reads require good throughput
  • Archive Storage system needs dynamic provisioning to account for workload variances
All the early studies showed that archival storage workloads are write-dominated, because most archival data is written and very rarely accessed. That's why it has been archived. It is reassuring to know that this is still true in the cloud era. Given this, and that the Service Level Agreements for cloud archives do not require low latency, it isn't clear why Microsoft thinks this is important.

The challenges Microsoft sees with their current technologies are:
  • Mechanical overheads lead to latencies
  • Environmental conditions limit deployment capabilities
  • Uncertainty with roadmaps, capacities and costs
  • Need for media migrations at EOL
  • Opportunity for new storage technologies
The environmental requirements of current technologies were one reason for Facebook's use of optical storage; it only needed warehouse space, not a data center. Anecdotal reports at the meeting revealed serious issues with moving tapes between environments.

The uncertainties are always with us, not least because the economics of long-term storage depend strongly on interest rates. The "need for media migration at EOL" is one major motivation for the use of quasi-immortal media, such as Project Silica. But this is illusory. As I wrote about Facebook's optical cold storage:
No-one expects the racks to sit in the data center for 50 years, at some point before then they will be obsoleted by some unknown new, much denser and more power-efficient cold storage medium.
Earlier this year I discussed Microsoft Research's view of the "opportunity for new storage technologies" in Microsoft's Archival Storage Research. They are pursuing two technologies, DNA storage and Project Silica. Ogus and Joshi discuss Project Silica, claiming that it provides "Performance (Random IO) per TB metric better than currently available Archive technologies". But part of the design of Project Silica is that the write and read drives are different, and thus:
This allows independent scaling of read and write throughput.
Their claim depends upon the system being configured with enough write (if they meant I) or read (if they meant O) drives.

The Data Storage Industry Gets Ready for AI

Fred Moore made the important point that archival storage faced a significant risk of a Vertical Market Failure (VMF):
  • The Zettabyte scale secondary storage market (cold, archive) has become the exclusive domain of few suppliers.
  • IBM is the only (1) tape drive developer/supplier controlling the entire tape ecosystem specifications.
  • Fujifilm and Sony are the only (2) LTO tape media suppliers.
  • HPE, IBM, Quantum and Spectra are the primary large-scale tape and library suppliers.
  • Seagate, Toshiba and WD are the only (3) remaining HDD suppliers.
  • (Tape - WD), (HDD - Seagate and TDK) are the only R/W head manufacturers.
  • Will current HDD and tape development roadmaps keep pace with demand?
  • HSDCs leverage their bargaining and buying power to drive down prices impacting vendor margins, R&D investments.
  • In the event of a secondary storage VMF, sustainability challenges will become insurmountable for HDDs to address.
  • As supplier profit margins become insufficient, future R&D funding, roadmaps, will place innovation at significant risk.
Archival storage is a very small part of the total storage industry. NAND and HDDs pay for their R&D in the much bigger online and nearline markets. Tape leverages a little of that, in that its head technology is based on HDD head technology, but in general has to fund R&D solely from the archive market. And so will potential novel archival storage technologies, such as DNA and Project Silica. Fundamentally, companies don't want to invest in archival storage because it doesn't generate income, so the market isn't just small, but under significant margin pressure. This is why I wrote Archival Media: Not a Good Business six years ago. If and when NAND eventually displaces HDD for nearline storage, the reduced market will definitely cause a VMF, with knock-on effects on tape.
