Thursday, August 29, 2019

SSD vs. HDD (Updated)

IDC & TrendForce data
via Aaron Rakers
Chris Mellor's How long before SSDs replace nearline disk drives? starts with a quote I think the good Dr. Pangloss would love:
Aaron Rakers, the Wells Fargo analyst, thinks enterprise storage buyers will start to prefer SSDs when prices fall to five times or less that of hard disk drives. They are cheaper to operate than disk drives, needing less power and cooling, and are much faster to access.
Below the fold, some skepticism.

Mellor continues:
So when will the wholesale switch from nearline HDD to SSDs begin? We don’t have a clear picture yet but a chart of $/TB costs for enterprise SSDs and nearline disk drives shows how much closer the two storage mediums have come in the past 18 months.

It is unwise to extrapolate too much but it is clear the general trend direction is that Enterprise SSD cost per terabyte is falling faster than nearline disk drive cost/TB. Our chart below shows the price premium for enterprise SSDs has dropped from 18x in the fourth 2017 quarter to 9x in the second 2019 quarter.
The chart is interesting, but I think Rakers' estimate of 5x as the tipping point is too optimistic, for several reasons:
  • The purchase cost of an HDD is much more than 20% of the power and cooling costs over its service life.
  • So the benefit of the extra speed would have to be large. But nearline drives only see accesses that the SSDs above them in the storage hierarchy don't handle. So speed isn't as important as low $/TB. Speed in nearline is nice, but it isn't what the nearline tier is for.
  • As SSDs get cheaper, the size of the tier above will grow slowly but steadily relative to the nearline tier. The upper tier will service more of the traffic, which will reduce the benefit of SSDs' speed in the nearline tier. At 5x, their cost won't justify wholesale replacement of the nearline tier.
  • The recent drop in SSD price reflects the transition to 3D flash. The transition to 4D flash is far from imminent, so this is a one-time effect.
  • Industry projections should always be taken with many grains of salt, as my earlier posts about the good Dr. Pangloss indicate.
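The arithmetic behind the first two bullets can be sketched as follows. This is a minimal illustration, not Rakers' or Mellor's model: the function name and every dollar figure are hypothetical, chosen only to show the shape of the calculation. At a 5x price multiple the SSD carries a premium of four HDD purchase prices, and whatever power and cooling savings fail to cover is what the extra speed would have to be worth.

```python
def speed_benefit_needed(hdd_price, ssd_multiple, hdd_power_cooling, ssd_power_cooling):
    """Return the lifetime cost gap that the SSD's speed would have to justify.

    All arguments are per-drive costs in dollars over the service life.
    """
    premium = hdd_price * (ssd_multiple - 1)       # extra purchase cost of the SSD
    savings = hdd_power_cooling - ssd_power_cooling  # operating cost saved by the SSD
    return premium - savings


# Hypothetical figures, for illustration only (not taken from the post).
gap = speed_benefit_needed(
    hdd_price=400.0,
    ssd_multiple=5,
    hdd_power_cooling=300.0,
    ssd_power_cooling=100.0,
)
print(f"Speed must be worth ${gap:.0f} per drive")
```

With these made-up numbers the gap stays large even if the SSD's power and cooling cost were zero, which is the point of the first bullet: the premium dwarfs the operating savings unless those savings are several times the HDD's purchase price.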
Update 4th Sept. 2019:

Western Digital shares my skepticism, as Chris Mellor now reports in Holy MAMR: Western Digital's 18TB and 20TB microwave disk drives out soon:
Western Digital said demand for high-capacity data centre disk drives will keep up over the next few years as it told the world it would begin shipping samples of its new MAMR 18 and 20TB drives over the next four months.
Seagate's 2009 roadmap
But note that, as has been true for an entire decade, we are still being promised that MAMR (and HAMR) will happen next year:
WD will sample the HC650 and HC550 drives to select customers by the end of the year, with qualification and volume shipments beginning in the first half of 2020. In other words, volume ships of these drives could be up to nine months away.


Blissex2 said...

«18 and 20TB drives»

Disk drives like that still have only one arm, so IOPS-per-TB are terrible. In practice I reckon that drives over 2TB are best thought of as sequential tapes capable of quick but infrequent random access, rather than as random access devices. Flash SSDs instead have so many IOPS that even a large number of TBs of capacity still results in good IOPS-per-TB ratios.

David. said...

IOPs measure the latency of small, random transfers. SSDs are good at this, which is why they live in a tier above the nearline tier of the storage hierarchy. If the nearline tier of your storage hierarchy is primarily serving small, random transfers, there is something seriously wrong with your storage system's design. IOPs/$ or IOPs/TB are not useful criteria for the nearline tier, since it should not be serving lots of small random I/Os. Ideally, its workload should be mostly large, contiguous writes, making write bandwidth the important criterion.

Also, see Chris Mellor's Seagate spins off a bit of cash from slowing disk drive business:

"Seagate's MACH.2 dual-actuator tech will begin shipping later this calendar year, starting "around" the 20TB capacity point, the firm's CEO Dave Mosley has confirmed.

Competitor Western Digital is also developing dual-actuator technology to increase disk drive IO rates."

David. said...

Chris Mellor's Amazon drops infrequent access file storage prices reports that:

[AWS' Steve Roberts] cited “Industry analysts such as IDC, and our own analysis of usage patterns confirms, that around 80 per cent of data is not accessed very often. The remaining 20 per cent is in active use.”

AWS cut prices on EFS IA by 44% when Lifecycle Management is used to automate moving files that haven't been accessed recently from Standard to Infrequent Access. The two tiers are transparent to applications, but:

"The data remains accessible within the same file system namespace albeit with a slightly higher latency; double digit ms vs single digit ms"

This is probably the difference between SSD and HDD. Standard costs $0.30/GB-month, and IA costs $0.025/GB-month, or 8.3% of Standard. If this were solely due to the hardware cost difference between SSD and HDD, SSD would cost 12x HDD, so this is plausible.
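The two ratios quoted above can be checked in a few lines. This is only a sketch of the arithmetic: the prices are the ones given in the comment, the variable names are mine, and the reading of the ratio as an SSD-to-HDD multiple rests on the comment's own assumption that hardware cost alone explains the price difference.

```python
# Prices as quoted in the comment, in $/GB-month.
standard_price = 0.30   # EFS Standard
ia_price = 0.025        # EFS Infrequent Access

# IA as a fraction of Standard, and the implied price multiple.
ia_fraction = ia_price / standard_price
implied_multiple = standard_price / ia_price

print(f"IA is {ia_fraction:.1%} of Standard")
print(f"Implied multiple: {implied_multiple:.0f}x")
```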

David. said...

In SSDs are on track to get bigger and cheaper thanks to PLC technology, Jim Salter reports that Intel and Toshiba are announcing 5 bits/cell NAND technology, promising (somewhat less than) 25% improvement in density over QLC flash. He points out one of the downsides of PLC:

"Unfortunately, while PLC SSDs will likely be bigger and cheaper, they'll probably also be slower. Modern SSDs mostly use TLC storage with a small layer of SLC write cache. As long as you don't write too much data too fast, your SSD writes will seem as blazingly fast as your reads—for example, Samsung's consumer drives are rated for up to 520MB/sec. But that's only as long as you keep inside the relatively small SLC cache layer; once you've filled that and must write directly to the main media in real time, things slow down enormously."

Another downside is that the error rate of PLC will be worse than QLC, necessitating more bits devoted to error correction. A third downside is that endurance will be reduced, meaning more of the write bandwidth is consumed by internal refresh cycles. So, overall, the potential improvement is likely less than 20%. The effect on the nearline layer will thus likely be relatively small.
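A rough sketch of why the net gain falls short of the headline 25%. The raw figure follows directly from going from 4 to 5 bits per cell. The 5% extra overhead for error correction is a hypothetical value picked purely for illustration, not a measured or published number, and both function names are mine.

```python
def raw_density_gain(bits_from, bits_to):
    """Fractional capacity gain from storing more bits in each cell."""
    return bits_to / bits_from - 1


def net_density_gain(raw_gain, extra_overhead):
    """Gain left after giving up a fraction of capacity to extra overhead."""
    return (1 + raw_gain) * (1 - extra_overhead) - 1


raw = raw_density_gain(4, 5)   # QLC (4 bits/cell) to PLC (5 bits/cell)

# ASSUMPTION: 5% additional ECC overhead, a made-up figure for illustration.
net = net_density_gain(raw, extra_overhead=0.05)

print(f"raw gain: {raw:.1%}, net gain: {net:.2%}")
```

Any nonzero extra overhead pulls the usable gain below the raw 25%, before accounting for the endurance and write-speed penalties described above.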

David. said...

At The Register, industry veteran Hubbert Smith makes a related point:

"QLC is only 25 per cent better capacity than TLC, and with every generation the industry trades slower and slower performance with poorer write endurance. With just 25 per cent better capacity than TLC, QLC shows diminishing returns."