Designing Storage Architectures
At the end of March the Library of Congress hosted this year's Designing Storage Architectures meeting. Here are some of the highlights:
Media Overview & Projections
Georg Lauhoff of IBM continued the invaluable analytic work of Robert Fontana and Gary Decad, presenting four key slides.
The first shows the trends of areal density for NAND flash, hard drives and tape. It shows how approaching the physical limits is making it very difficult for hard drive vendors to increase areal density, forcing them to add platters to continue increasing capacity. In contrast, NAND vendors are still rapidly increasing areal density.
The second shows the trend in $/TB for these media. The rate of cost reduction is gradually slowing for all three. Flash memory is still more than 4 times as expensive in capex per TB as hard disk, albeit with lower opex.
The third shows the distribution of media types among the total exabytes shipped. Flash is a rapidly increasing proportion of the total.
The last shows the annual revenue for each media type. Note that tape is an insignificant fraction of the total, so is on an expanded axis. It is striking that both flash and hard disk saw revenue decline in 2022 from the previous year.
Disk Trends
Manuel Offenberg of Seagate presented a more optimistic view of the hard disk industry, with a slide showing that the vast majority of cloud storage was still on hard disk. But note that this represents past purchases; in the future the proportion of hard disk purchased is likely to be lower.
Offenberg claimed that:
HAMR is working! Seagate is shipping HAMR based products.
Perpendicular Magnetic Recording (PMR) seems to have hit its limit at around 1300GB/in². Offenberg's graph claims lab demos of their current HAMR technology are now demonstrating 3000GB/in², enough for a 50TB 10 platter 3.5" drive. But as we have seen, there is a very long way from lab demos to volume shipments. Offenberg reports Seagate's CEO as claiming they:
expect to launch our 30-plus terabyte platform in the June quarter, slightly ahead of schedule.
I don't think by "launch" they mean volume shipments to regular customers. A 30TB drive needs around 1800GB/in², which the graph claims they demo-ed four years ago.
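The 1800 figure follows from scaling the lab-demo numbers down linearly. Here is a minimal sketch of that arithmetic, assuming capacity is proportional to areal density at a fixed platter count and form factor; the function and constant names are mine, not Seagate's, and the units are those quoted in the text:

```python
# Figures quoted in the text (density units as given there).
DEMO_DENSITY = 3000       # areal density of the current HAMR lab demos
DEMO_CAPACITY_TB = 50     # capacity that density supports in a 10 platter 3.5" drive


def density_needed(capacity_tb, ref_density=DEMO_DENSITY, ref_capacity_tb=DEMO_CAPACITY_TB):
    """Areal density needed for capacity_tb.

    Assumption: capacity scales linearly with areal density when the
    platter count and form factor stay the same.
    """
    return ref_density * capacity_tb / ref_capacity_tb


print(density_needed(30))  # 1800.0
```

Real drives will deviate from this because of formatting overhead and differences in platter count, so treat it as a sanity check rather than a specification.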
I've been writing skeptically about the timeframe for mass adoption of HAMR for the past thirteen years, based on Dave Anderson's presentation to the 2009 DSA meeting. It predicted HAMR would ship in 2010, and be obsoleted by Bit Patterned Media in 2015, so skepticism has been amply justified. At the 2019 DSA meeting Seagate reported that:
Seagate is now shipping HAMR drives in limited quantities to lead customers.
Four years later, they still don't seem to claim that they're shipping HAMR in volume to regular customers. Maybe 2023 will be the year it happens. The bottleneck is probably scaling up the major changes needed to the manufacturing technology for the heads and the actuators.
Seagate stressed the importance for sustainability of the ability to sanitize media, a topic I'll return to below.
Paul Peck of Western Digital was equally optimistic but provided much less detail.
Understanding Storage Intermediaries
Helen Hockx-Yu presented a paper she and Dan Brower gave at iPres21, Understanding Storage Intermediaries, whose abstract reads:
Storage intermediaries are software, and sometimes hardware appliances that act as a link between applications and storage media, performing a range of tasks, such as protocol translation, caching, compression or even encryption. This paper describes storage intermediaries and their key functions that librarians and archivists should be aware of, as these introduce technical dependencies that can impact digital preservation.
The paper is based on experience at Notre Dame with migrating data between cloud storage systems, identifying transformations that intermediaries implement such as:
- Underlying storage organization may not match logical organization
- Files might be renamed
  - e.g. UUID, Hashed value (Content Addressable Storage)
- Files might be broken up, compressed and stored as many separate chunks
- Additional metadata
  - file systems metadata versus in-file metadata
- "Deleted" files
- “Phantom” files
Their point is that the intermediaries and their transformations, which may in some cases be invisible at the top level, are dependencies for long-term preservation.
Project Silica
Ioan Stefanovici presented Microsoft's work moving a decade-old idea from the University of Southampton closer to reality. The idea is to use femtosecond lasers to write into tablets of silica. Microsoft Research has built a prototype robot to write, manage and read the tablets, with the idea that this quasi-immortal write-once medium could be deployed as part of their Azure storage stack.
I think this is more plausible in the medium term than DNA, but it faces similar economic challenges to those I described in DNA's Niche in the Storage Market, namely that the cost (a) is incurred up-front, (b) is high because archival storage is a niche market without the volumes and competition that drive the Kryder rate of hard disk and flash, and (c) is high because the R&D of the ephemeral media it competes with has been amortized in a much larger market. It may be true that the total life-cycle cost of quasi-immortal media is much lower, but history shows that long-term data preservation is not among the priorities companies invest in each quarter.
SSD Reliability
Andy Klein continues his valuable role as the Principal Cloud Storage Storyteller at Backblaze with The SSD Edition: 2022 Drive Stats Review. Backblaze has nearly 3,000 SSDs of various types acting as boot and log disks. Some are M.2 PCIe, some are SATA. Some are enterprise models, some are consumer drives. Since these drives are not mission-critical, Backblaze is fine with this. The overall AFR was 0.89%, but this includes several models with very little data. The three models with a 1% confidence interval are:
- Dell model DELLBOSS VD: lifetime AFR 0.00%
- Seagate model ZA250CM10003: lifetime AFR 0.66%
- Seagate model ZA250CM10002: lifetime AFR 0.96%
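For readers unfamiliar with how an annualized failure rate is derived, here is a minimal sketch of the drive-days calculation Backblaze describes in its Drive Stats reports: failures divided by drive-years of service, expressed as a percentage. The function name and the input numbers are hypothetical, chosen only for illustration; they are not Backblaze's data.

```python
def annualized_failure_rate(failures, drive_days):
    """AFR in percent: failures per drive-year of service.

    drive_days is the sum, over all drives of a model, of the number of
    days each drive was in service during the observation period.
    """
    drive_years = drive_days / 365
    return failures / drive_years * 100


# Hypothetical inputs: 5 failures over 200,000 drive-days of service.
afr = annualized_failure_rate(failures=5, drive_days=200_000)
print(f"{afr:.2f}%")
```

With few drive-days the result swings widely on a single failure, which is why models with very little data are set aside when comparing AFRs.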
For 2022, the average temperature was 34.9 degrees Celsius. The average temperature of the hard drives in the same storage servers over the same period was 29.1 degrees Celsius. This difference seems to fly in the face of conventional wisdom that says SSDs run cooler than HDDs. One possible reason is that, in all of our storage servers, the boot drives are further away from the cool aisle than the data drives.
As the latest generation of M.2 SSDs have trickled out to consumer platforms we've seen some wild and wacky cooling solutions strapped to them: heat pipes, 20,000 rpm fans, and even tiny liquid coolers.
Perhaps the most extreme example we've seen so far is Adata Project NeonStorm. It packs a self contained liquid-cooling system, complete with pump, reservoir, radiator and pair of fans to the gum-stick-sized drive. However, it is hardly the only one. TeamGroup and Inland have also strapped fans and even whole cooling towers to their SSDs.
Mann quotes Micron's Jon Tanguy:
NAND ... is happiest within a relatively narrow temperature band. "NAND flash actually likes to be 'hot' in that 60° to 70° [Celsius] range in order to program a cell because when it's that hot, those electrons can move a little bit easier," he explained.
Go a little too hot — say 80°C — and things become problematic, however. At these temps, you risk the SSD's built-in safety mechanisms forcibly powering down the hardware to prevent damage. However, before this happens users are likely to see the performance of their drives plummet, as the SSD's controller throttles itself to prevent data loss.
...
The takeaway is that with PCIe 5.0 SSDs — the performance oriented models in particular — some kind of cooler is necessary to achieve peak performance. Whether it needs to be actively cooled is another question entirely.
If the SSDs run hot it is likely that their lifetime AFRs will increase. But there are other sources of failure too, as shown by Scharon Harding's SanDisk Extreme SSDs keep abruptly failing—firmware fix for only some promised:
SanDisk customers have been complaining about the company's Extreme and Extreme Pro portable SSDs suddenly wiping data and, in some cases, becoming unreadable. Complaints go back at least four months, and SanDisk told Ars today that a firmware fix is coming "soon." However, SanDisk only confirmed a firmware update for the 4TB models, despite an Ars staffer and online users reporting issues with 2TB drives.
An Ars reader tipped us (thanks!) to online discussions filled with panicked and disappointed users detailing experiences with recently purchased Extreme V2 and Extreme Pro V2 portable SSDs. Most users seemed to be using a 4TB model, but there were also complaints from owners of 2TB drives.
...
Ars Technica's Lee Hutchinson confirmed suffering not one, but two 2TB Extreme Pros dying. After filling about halfway, each drive met a slew of read and write errors. When he disconnected and reconnected the SSD, it showed it was unformatted with the drive completely wiped, including its file system. Wiping and reformatting didn't help, and this happened with two different units.
Finally, I should note Chris Mellor catching Pure Storage "talking their book" in Pure: No more hard drives will be sold after 2028:
In the latest blast of the HDD vs SSD culture wars, a Pure Storage exec is predicting that no more hard disk drives will be sold after 2028 because of electricity costs and availability, as well as NAND $/TB declines.
Lauhoff's graph of IBM's projections suggests that NAND might ship more bits than hard disks in 2028, but that would still leave hard disks shipping somewhere around 1,500 Exabytes/year to someone. It is more than a decade since I first reported on the good Dr. Pangloss' appreciation of storage media marketing hype in Dr. Pangloss' Notes From Dinner and he is glad to see this great tradition is alive and well.
Hard Disk Reliability
This chart combines all of the manufacturer’s drive models regardless of their age. In our case, many of the older drive models are from Seagate and that helps drive up their overall AFR. For example, 60% of the 4TB drives are from Seagate and are, on average, 89 months old, and over 95% of the 8TB drives in production are from Seagate and they are, on average, over 70 months old. As we’ve seen when we examined hard drive life expectancy using the Bathtub Curve, older drives have a tendency to fail more often.
Inspired by Chris Mellor's Most failed disk drives fail just before they hit 3 years’ use, says data recovery biz, based on a blog post by Timothy Burlee of Secure Data Recovery, Klein looked into how old drives were when they failed, producing this table:
SOURCE | FAILED DRIVE COUNT | AVERAGE FAILED AGE |
---|---|---|
Secure Data Recovery | 2,007 failed drives | 2 years, 10 months |
Backblaze | 17,155 failed drives (all models) | 2 years, 6 months |
Backblaze | 3,379 failed drives (only drive models no longer in production) | 2 years, 7 months |
It said that 37 percent of the Western Digital and Seagate drives analyzed featured SMR. The firm found that, "Western Digital's SMR models had a 12.7 percent shorter lifespan than their CMR counterparts, [and] Seagate's SMR models had a 19.7 percent shorter lifespan than their CMR counterparts," when looking at average power-on hours. Further, eight of the 13 SMR drives from Western Digital and Seagate registered fewer than 15,000 power-on hours on average.
In their typical customer-friendly way, Western Digital seized on this data. The result was, as Scharon Harding reported in “Clearly predatory”: Western Digital sparks panic, anger for age-shaming HDDs:
some customers have been panicked, confused, and/or angered to see their Western Digital NAS hard drive automatically given a warning label in Synology's DiskStation Manager (DSM) after they were powered on for three years. With no other factors considered for these automatic flags, Western Digital is accused of age-shaming drives to push people to buy new HDDs prematurely.
Since most drives are warranted for at least three years, does this mean the average drive fails under warranty? That would be really bad for the vendor's financials.
There are two ways in which drives can leave the fleet: failure, or retirement after a full working life. Examining Backblaze's set of drive models that are no longer in the fleet, 3,379 drives from 35 drive models failed at an average age of two years and seven months. It is important to focus on these drives, because drive models that are still in the fleet haven't finished failing yet.
If we assume that these models had the same 1.4% AFR as the average of the whole fleet, the failed drives would be 0.014 * (31/12) = 0.0362 of the population of these 35 models, implying a population size of about 93,343 drives. So 96.4% of the drives were retired and only 3.6% failed. Since this is a pretty loose approximation, I hope to find time to do this analysis from the actual data rather than from Klein's summary.
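The back-of-the-envelope estimate above can be reproduced in a few lines. This is only a sketch of the same loose approximation, using the 1.4% fleet-wide AFR, the 31-month average age at failure and the 3,379 failed drives from the text; the function name is mine.

```python
def estimate_population(failed_drives, afr, avg_failed_age_months):
    """Estimate the total population from the count of failed drives.

    Assumption (as in the text): the fraction of the population that
    failed is roughly the AFR times the average age at failure in years.
    """
    failed_fraction = afr * (avg_failed_age_months / 12)
    population = failed_drives / failed_fraction
    return failed_fraction, population


failed_fraction, population = estimate_population(3379, 0.014, 31)
print(f"failed fraction: {failed_fraction:.4f}")    # 0.0362
print(f"population:      {population:,.0f}")        # 93,429
print(f"retired:         {1 - failed_fraction:.1%}")  # 96.4%
```

The population printed here is slightly larger than the 93,343 quoted above because that figure divides by the fraction already rounded to 0.0362; the retired and failed percentages are unaffected.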
End-of-life
This analysis suggests that the vast majority of drives are fully functional when they are retired. What happens to them? Shaun McManus explains Why millions of usable hard drives are being destroyed:
Millions of storage devices are being shredded each year, even though they could be reused. "You don't need an engineering degree to understand that's a bad thing," says Jonmichael Hands.
He is the secretary and treasurer of the Circular Drive Initiative (CDI), a partnership of technology companies promoting the secure reuse of storage hardware. He also works at Chia Network, which provides a blockchain technology.
Chia Network could easily reuse storage devices that large data centres have decided they no longer need. In 2021, the company approached IT Asset Disposition (ITAD) firms, who dispose of old technology for businesses that no longer need it. The answer came back: "Sorry, we have to shred old drives."
"What do you mean, you destroy them?" says Mr Hands, relating the story. "Just erase the data, and then sell them! They said the customers wouldn't let them do that. One ITAD provider said they were shredding five million drives for a single customer."
This waste is why Seagate's presentation stresses the standards for cryptographic sanitization:
- IEEE 2883-2022: IEEE Standard for Sanitizing Storage
- ISO/IEC 27040-2015 Information technology — Security techniques — Storage security
- NIST SP 800-88R1 2014: Guidelines for Media Sanitization
- 2007: At-speed Data Encryption, Instant Secure Erase
- 2010: FIPS 140-2 Certified, TCG Storage Security
- 2014: Digitally Signed Firmware, Secure Boot & Diagnostics
- 2018: ISO / NIAP Common Criteria, ISO Trusted Technology Provider
- 2023: Secure Isolation, Open Secure Hardware
The importance of this effort was emphasized in early 2015 with the revelations of the Equation Group's malware. Dan Goodin's How “omnipotent” hackers tied to NSA hid for 14 years—and were found at last describes their storage malware:
One of the Equation Group's malware platforms, for instance, rewrote the hard-drive firmware of infected computers—a never-before-seen engineering marvel that worked on 12 drive categories from manufacturers including Western Digital, Maxtor, Samsung, IBM, Micron, Toshiba, and Seagate.
The malicious firmware created a secret storage vault that survived military-grade disk wiping and reformatting, making sensitive data stolen from victims available even after reformatting the drive and reinstalling the operating system. The firmware also provided programming interfaces that other code in Equation Group's sprawling malware library could access. Once a hard drive was compromised, the infection was impossible to detect or remove.
Related supply chain issues are still prevalent. Andy Greenberg's The US Navy, NATO, and NASA are using a shady Chinese company’s encryption chips reports on one for USB-connected encrypted storage media:
In July of 2021, the Commerce Department's Bureau of Industry and Security added the Hangzhou, China-based encryption chip manufacturer Hualan Microelectronics, also known as Sage Microelectronics, to its so-called “Entity List,” a vaguely named trade restrictions list that highlights companies “acting contrary to the foreign policy interests of the United States.” Specifically, the bureau noted that Hualan had been added to the list for “acquiring and ... attempting to acquire US-origin items in support of military modernization for [China's] People's Liberation Army.”
Yet nearly two years later, Hualan—and in particular its subsidiary known as Initio, a company originally headquartered in Taiwan that it acquired in 2016—still supplies encryption microcontroller chips to Western manufacturers of encrypted hard drives, including several that list as customers on their websites Western governments' aerospace, military, and intelligence agencies: NASA, NATO, and the US and UK militaries. Federal procurement records show that US government agencies from the Federal Aviation Administration to the Drug Enforcement Administration to the US Navy have bought encrypted hard drives that use the chips, too.
...
Hualan's Initio chips are used in encrypted storage devices as so-called bridge controllers, sitting between the USB connection in a storage device and memory chips or a magnetic disk to encrypt and decrypt data on a USB thumbdrive or external hard drive. Security researchers' teardowns have shown that storage device manufacturers including Lenovo, Western Digital, Verbatim, and Zalman have all at times used encryption chips sold by Initio.
While encrypting the data on the medium is important for both internal and external media once the owner no longer has physical control of the device, it is worth noting a potential downside. If there were to be a remotely exploitable vulnerability in the mechanism for generating a new key and thus sanitizing the device, ransomware groups' task would be much easier. Instead of having to both exfiltrate and encrypt the files, both time-consuming processes, once exfiltrated they could remotely sanitize the drives, a much quicker process.
It is not a surprise that the Circular Drive Initiative is being driven by Chia Network. Bitcoin and many other blockchains waste electricity in the form of provably performed but otherwise useless computation to defend against Sybil attacks, a technique called Proof-of-Work. In contrast, Chia's blockchain defends against Sybil attacks by wasting hardware and some electricity in the form of provably performed but otherwise useless data storage, a technique called Proof-of-Space-and-Time. I have written a number of posts about Chia.
The technology is clever but significantly more complex than Bitcoin's or Ethereum's. There is some truth to the idea that it is less environmentally damaging than Proof-of-Work; if scaled up by 2,100 times to Bitcoin's size it would certainly use less electricity but would consume 53,000EB of storage (about 10 times the global annual production). It would likely have at least as big an e-waste problem because the way it uses hard disks and SSDs is well outside their design parameters. Chia could definitely reduce their cost by using the remaining life of retired drives. But by reducing their cost they would reduce the effectiveness of their Sybil defense.
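To put the scaling figures in perspective, here is a rough sketch of what they imply when worked backwards. The two derived quantities are my inferences from the numbers in the paragraph above, not figures reported by Chia or by storage vendors, and the variable names are mine.

```python
# Figures from the text.
SCALE_FACTOR = 2_100         # multiple needed to reach Bitcoin's size
SCALED_STORAGE_EB = 53_000   # storage consumed at that scale, in exabytes
PRODUCTION_MULTIPLE = 10     # "about 10 times the global annual production"

# Inference: storage the network uses today, before scaling up.
implied_current_eb = SCALED_STORAGE_EB / SCALE_FACTOR

# Inference: global annual storage production implied by the comparison.
implied_annual_production_eb = SCALED_STORAGE_EB / PRODUCTION_MULTIPLE

print(f"implied current storage:   {implied_current_eb:.1f} EB")
print(f"implied annual production: {implied_annual_production_eb:,.0f} EB")
```

Both inputs are approximate ("about 10 times"), so the outputs are order-of-magnitude estimates only.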
But to give you some idea of the insanity of allowing anyone to invent their own "money", Chia is in the top 1% of the universe of cryptocurrencies. The average coin has a "market cap" of about $100M, the median coin has a "market cap" of about $25M, about 10% of Chia's.
1 comment:
Scharon Harding's Backblaze probes increased annualized failure rate for its 240,940 HDDs reports on Andy Klein's Backblaze Drive Stats for Q2 2023 and as usual the data is fascinating:
"The drive model with the oldest average age is still the 6TB Seagate (model: ST6000DX000) at 98.3 months (8.2 years), with the oldest drive of this cohort being 104 months (8.7 years) old.
The oldest operational data drive in the fleet is a 4TB Seagate (model: ST4000DM000) at 105.2 months (8.8 years). That is quite impressive, especially in a data center environment, but the winner for the oldest operational drive in our fleet is actually a boot drive: a WDC 500GB drive (model: WD5000BPKT) with 122 months (10.2 years) of continuous service."
This quarter's data shows:
"The AFR for Q2 2023 was 2.28%, up from 1.54% in Q1 2023. While quarterly AFR numbers can be volatile, they can also be useful in identifying trends which need further investigation. In this case, the rise was expected as the age of our fleet continues to increase. But was that the real reason?"
The analysis involved in answering that question is well worth reading.