Tuesday, August 8, 2017

Approaching The Physical Limits

As storage media technology gets closer and closer to the physical limits, progress on reducing the $/GB number slows down. Below the fold, a recap of some of these issues for both disk and flash.

The current examples of this are Heat Assisted Magnetic Recording (HAMR) and its planned successor Bit Patterned Media (BPM). As I wrote last December:
Seagate 2008 roadmap 
Here is a Seagate roadmap slide from 2008 predicting that the then (and still) current technology, perpendicular magnetic recording (PMR), would be replaced in 2009 by heat-assisted magnetic recording (HAMR), which would in turn be replaced in 2013 by bit-patterned media (BPM).
...

ASTC 2016 roadmap
Here is a recent roadmap from ASTC showing HAMR starting in 2017 and BPM in 2021. So in 8 years HAMR has gone from next year to next year, and BPM has gone from 5 years out to 5 years out. The reason for this real-time schedule slip is that as technologies get closer and closer to the physical limits, the difficulty and above all cost of getting from lab demonstration to shipping in volume increases exponentially.
HAMR is still slipping in real time. About the same time I was writing, Seagate was telling the trade press that:
"It is targeting 2018 for HAMR drive deliveries, with a 16TB 3.5-inch drive planned, featuring 8 platters and 16 heads."
It is tempting to imagine that this slippage gives flash the opportunity to kill off hard disk. As I, along with others such as Google's Eric Brewer and IBM's Robert Fontana, have pointed out, this scenario is economically implausible:
NAND vs. HDD capex/TB
The argument is that flash, despite its many advantages, is and will remain too expensive for the bulk storage layer. The graph of the ratio of capital expenditure per TB of flash and hard disk shows that each exabyte of flash contains about 50 times as much capital as an exabyte of disk. Fontana estimates that last year flash shipped 83EB and hard disk shipped 565EB. For flash to displace hard disk immediately, the industry would need 32 new state-of-the-art fabs at around $9B each, or nearly $300B in total investment.
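
The arithmetic behind those figures is easy to check. Here is a minimal sketch in Python; the shipment volumes, fab count and cost per fab are the ones quoted above, while the function names and the derived per-fab output are my own illustrative assumptions, not Fontana's numbers.

```python
# Figures quoted in the text: Fontana's shipment estimates and the fab count/cost.
FLASH_SHIPPED_EB = 83
HDD_SHIPPED_EB = 565
NEW_FABS = 32
COST_PER_FAB_BILLION = 9


def total_investment_billion(fabs: int, cost_per_fab: float) -> float:
    """Capital needed to build `fabs` new fabs, in billions of dollars."""
    return fabs * cost_per_fab


def flash_share_of_bytes(flash_eb: float, hdd_eb: float) -> float:
    """Fraction of the shipped flash-plus-disk exabytes that were flash."""
    return flash_eb / (flash_eb + hdd_eb)


investment = total_investment_billion(NEW_FABS, COST_PER_FAB_BILLION)
share = flash_share_of_bytes(FLASH_SHIPPED_EB, HDD_SHIPPED_EB)
# Derived for illustration, not from the source: the disk exabytes each
# new fab would have to replace per year if the 32 fabs split the load evenly.
eb_per_fab = HDD_SHIPPED_EB / NEW_FABS

print(f"Total investment: ${investment:.0f}B")
print(f"Flash share of shipped bytes: {share:.1%}")
print(f"Implied output per new fab: {eb_per_fab:.1f} EB/year")
```

The point of the sketch is only that the "nearly $300B" is 32 fabs times $9B, and that flash was a small fraction of the bytes shipped.
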
But there's also a technological reason why the scenario is implausible. Flash already hit one physical limit:
"when cell lithography reached the 15-16nm area ... NAND cells smaller than that weren’t reliable data stores; there were too few electrons to provide a stable and recognisable charge level."
Despite this, flash is currently reducing in cost quite fast, thanks to two technological changes:
  • 3D, which stacks up to 96 layers of cells on top of each other.
  • Quad-Level Cell (QLC), which uses 16 voltage levels per cell to store 4 bits per cell.
Going from 2D to 3D is a one-time gain because, unfortunately, there are unsolved technical problems in reducing cost further by going from 3D to 4D. QLC requires more electrons per cell, so requires bigger cells to hold them. Reducing cost again by going to 32 voltage levels would need bigger cells again, so won't be easy or cost-effective. Thus the current rate of $/GB decrease is unlikely to be sustained.
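
The diminishing return from adding bits per cell can be illustrated with a rough sketch. It assumes only that n bits per cell need 2^n voltage levels, as with QLC's 16 levels for 4 bits; the function names are hypothetical, and the sketch deliberately ignores the bigger cells needed to hold more electrons, which make the real picture worse.

```python
# Illustrative only: levels needed per cell and the relative capacity gain
# from adding one more bit. Cell size is NOT modelled here.

def voltage_levels(bits_per_cell: int) -> int:
    """Distinct charge levels a cell must resolve to store this many bits."""
    return 2 ** bits_per_cell


def capacity_gain(bits_from: int, bits_to: int) -> float:
    """Fractional increase in bits per cell, ignoring any change in cell size."""
    return bits_to / bits_from - 1


for bits in range(1, 5):
    nxt = bits + 1
    print(f"{bits} -> {nxt} bits/cell: "
          f"{voltage_levels(bits)} -> {voltage_levels(nxt)} levels, "
          f"+{capacity_gain(bits, nxt):.0%} bits per cell")
```

Each extra bit doubles the number of levels the cell must distinguish, but the step from 4 to 5 bits adds only a quarter more bits per cell, before accounting for the larger cells.
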

At The Register, Chris Mellor has a clear, simple overview of the prospects for flash technology entitled Flash fryers have burger problems: You can't keep adding layers:
"The flash foundry folk took on 3D NAND because it provided an escape hatch from the NAND scaling trap of ever-decreasing cell sizes eventually to non-functioning flash.

But 3D NAND, the layering of many 2D planar NAND chip structures, will run into its own problems."
The piece is quite short and easy to understand; it is well worth a read.

5 comments:

David. said...

Chris Mellor comes away from the Flash Summit enthusiastic about flash displacing hard disk:

"El Reg considers that 2.5-inch disk drive sales will collapse in the next couple of years, accelerating a trend that is already apparent."

and:

"With QLC flash, another layer count increase in 3D NAND, and string stacking of one 3D NAND die above another, we will surely be looking at 256TB SSDs in 2019/2020 when we might have 20TB disk drives.

IDC analysts are saying that the general solid-state drive price premium over disk should decline from 6.6x now to 2.2x in 2021."

I agree that in the long term this displacement is inevitable, but I have a long track record of pointing out that "industry analysts" are optimistic about time-scales. And I think Mellor is too focused on the density increase of tape, and not enough on its operational disadvantages.

David. said...

An anonymous storage industry insider deflates Chris Mellor's enthusiasm for flash killing disk with some arithmetic:

"The NAND industry is going to use 3D to try to get closer and closer to that 40 per cent density CAGR that they had before they started hitting limits on planar. But 40 per cent CAGR on 60EB in 2015 is ~320EB by 2020 – not even half what HDD vendors were putting on the market in 2015. They'd need 10X that level of output to reach 3ZB."

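The insider's arithmetic can be reproduced with a short sketch. The 60EB starting point, the 40 per cent CAGR and the 3ZB target come from the quote; the function names and the conversion of 1ZB to 1000EB are my assumptions.

```python
EB_PER_ZB = 1000  # assumption: decimal units, 1 ZB = 1000 EB


def project(start_eb: float, cagr: float, years: int) -> float:
    """Compound `start_eb` at annual growth rate `cagr` for `years` years."""
    return start_eb * (1 + cagr) ** years


def required_cagr(start_eb: float, target_eb: float, years: int) -> float:
    """Annual growth rate needed to go from start_eb to target_eb in `years`."""
    return (target_eb / start_eb) ** (1 / years) - 1


flash_2020 = project(60, 0.40, 5)             # quoted as ~320EB
needed = required_cagr(60, 3 * EB_PER_ZB, 5)  # growth rate to hit 3ZB by 2020

print(f"Flash output in 2020 at 40% CAGR: {flash_2020:.0f} EB")
print(f"Ten times that: {10 * flash_2020 / EB_PER_ZB:.1f} ZB")
print(f"CAGR needed to reach 3ZB by 2020: {needed:.0%}")
```

Under these assumptions, reaching 3ZB in five years would take a growth rate well above 100 per cent a year, not 40.
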
David. said...

Chris Mellor at The Register has a series looking at how 3D flash is made. Part 1 starts with 2D flash manufacture. Part 2 uses Toshiba's BiCS (Bit Cost Scalable) 3D NAND as an example of manufacturing 3D flash. So far, the series tends to reinforce my skepticism that the industry can sustain anything like 40% CAGR for the next 5 years.

David. said...

Four years ago Jim Handy posted 3D NAND: Making A Vertical String, a truly awesome look at what 3D flash manufacturing needs to cope with. It was part of the series What is 3D NAND? Why do we need it? How do they make it?

David. said...

It isn't just storage that gets hard near the physical limits. Extreme UV chip defects may force a new approach to processor design, by Peter Bright at Ars Technica, is subtitled "EUV has been the next big thing in chip manufacturing for nearly 30 years" and explains:

"Chips built with extreme ultraviolet (EUV) light are plagued with random defects with no obvious solution, according to research presented at a chipmakers' conference reported in EETimes. The EUV hardware seems to work acceptably for 20nm or larger processes, but below this scale, small defects are cropping up that ruin the chip and prove hard to detect."