Tuesday, October 27, 2015

More interesting numbers from Backblaze

On the 17th of last month Amazon cut the Glacier price in some regions from 1c/GB/month to 0.7c/GB/month. The price had been stable since Glacier was announced in August 2012. As usual with Amazon, they launched at an aggressive and attractive price, and stuck there for a long time. Glacier wasn't under a lot of competitive pressure, so they didn't need to cut the price. Below the fold, I look at how Backblaze changed this.

The announcement also introduced a new S3 product, called S3 Infrequent Access:
The new S3 Standard – Infrequent Access (Standard – IA) storage class offers the same high durability, low latency, and high throughput of S3 Standard. You now have the choice of three S3 storage classes (Standard, Standard – IA, and Glacier) that are designed to offer 99.999999999% (eleven nines) of durability. Standard – IA has an availability SLA of 99%.
...
Prices for Standard – IA start at $0.0125 / gigabyte / month (one and one-quarter US pennies), with a 30 day minimum storage duration for billing, and a $0.01 / gigabyte charge for retrieval (in addition to the usual data transfer and request charges). Further, for billing purposes, objects that are smaller than 128 kilobytes are charged for 128 kilobytes of storage.
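The billing rules in the announcement interact in ways that are easy to miss, so here is a minimal sketch of them in Python. The prices and minimums are the ones quoted above; the function name, the use of binary gigabytes and the 30-day billing month are my assumptions, and the "usual data transfer and request charges" are left out.

```python
# Figures quoted in the announcement; everything else is illustrative.
IA_STORAGE_PER_GB_MONTH = 0.0125   # dollars per GB per month
IA_RETRIEVAL_PER_GB = 0.01         # dollars per GB retrieved
IA_MIN_OBJECT_BYTES = 128 * 1024   # smaller objects are billed as 128 KB
IA_MIN_DAYS = 30                   # minimum storage duration for billing
BYTES_PER_GB = 1024 ** 3           # assumption: binary gigabytes
DAYS_PER_MONTH = 30                # assumption: a 30-day billing month


def ia_object_cost(size_bytes, days_stored, retrieved_bytes=0):
    """Estimate storage plus retrieval cost in dollars for one object.

    Data transfer and request charges are NOT included.
    """
    # Apply the two billing minimums before computing the storage charge.
    billed_bytes = max(size_bytes, IA_MIN_OBJECT_BYTES)
    billed_days = max(days_stored, IA_MIN_DAYS)
    storage = (
        (billed_bytes / BYTES_PER_GB)
        * IA_STORAGE_PER_GB_MONTH
        * (billed_days / DAYS_PER_MONTH)
    )
    retrieval = (retrieved_bytes / BYTES_PER_GB) * IA_RETRIEVAL_PER_GB
    return storage + retrieval
```

Under these assumptions a 1 KB object kept for one day is billed exactly like a 128 KB object kept for 30 days, which is why Standard – IA is a poor fit for lots of small or short-lived objects.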
The competitive landscape changed 5 days later. Backblaze announced their B2, a storage service priced at 0.5c/GB/month. The details are interesting:
  • Storage: the first 10GB is free, then it costs 0.5c/GB/month.
  • Inbound bandwidth: free.
  • Outbound bandwidth: free for the first GB each day; above that it is 0.5c/GB.
  • Retrieval: $0.004 per 10,000 API calls. The first 2500 per day are free.
  • All other API calls: $0.004 per 1,000 calls. The first 2500 per day are free.
Although this pricing competes with Glacier, the B2 service actually competes with S3. Amazon needed to make sure that Glacier did not compete with S3, so it gave Glacier its own API with significant access latency. But B2 appears to have the S3 API, so it does compete with S3. At 0.5c/GB/month, B2's storage is 2.5 times cheaper than S3's cheapest option (Standard – IA at 1.25c/GB/month), with simpler and cheaper access charges.
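To make the comparison concrete, here is a rough sketch of a monthly B2 bill using the prices listed above. The function name and the 30-day month are my assumptions, and API call charges are ignored, so treat it as a back-of-the-envelope estimate rather than Backblaze's billing logic.

```python
# Prices as listed in this post; names are illustrative.
B2_STORAGE_PER_GB_MONTH = 0.005      # dollars, after the free tier
B2_FREE_STORAGE_GB = 10              # first 10GB stored is free
B2_DOWNLOAD_PER_GB = 0.005           # dollars, after the daily free GB
B2_FREE_DOWNLOAD_GB_PER_DAY = 1      # first GB downloaded each day is free
S3_IA_STORAGE_PER_GB_MONTH = 0.0125  # S3 Standard - IA, for comparison


def b2_monthly_cost(stored_gb, daily_download_gb, days=30):
    """Estimate a monthly B2 bill in dollars, ignoring API call charges."""
    billable_storage = max(stored_gb - B2_FREE_STORAGE_GB, 0)
    billable_download = max(daily_download_gb - B2_FREE_DOWNLOAD_GB_PER_DAY, 0)
    storage = billable_storage * B2_STORAGE_PER_GB_MONTH
    download = billable_download * B2_DOWNLOAD_PER_GB * days
    return storage + download


# Ratio of the per-GB storage prices, the source of the "2.5 times" figure.
storage_price_ratio = S3_IA_STORAGE_PER_GB_MONTH / B2_STORAGE_PER_GB_MONTH
```

For example, `b2_monthly_cost(1010, 1)` charges for 1000GB of storage and nothing for downloads, since the daily download stays within the free GB.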

How, despite Amazon's dominance of the cloud and its consequent economies of scale, can Backblaze undercut them substantially? Their explanation is:
For nearly a decade, Backblaze has focused on building the lowest cost cloud storage for its cloud backup business. Through a combination of the Backblaze Storage Pod server design, Backblaze Vault cloud storage file system, and highly operationalized processes, the company has a system that scales to zettabytes incredibly cost effectively.
My explanation is threefold. First, like most things in the real world, economies of scale follow an S-curve. You have to get to a certain size to get any economy at all, but once you get a certain amount bigger the economies of getting even bigger tail off. Backblaze, at about 150PB of storage, is big enough to get almost all the possible economies of scale.

Second, although Amazon is famous for running on very low margins, I've been pointing out for a long time that their margins on storage are extortionate.

Third, S3 and Backblaze take different approaches to reliability and availability. Backblaze uses erasure coding to provide 20/17 replication - the data is broken into blocks, the blocks are grouped into 17s, and 3 parity blocks are then derived from each group of 17 in such a way that all 20 blocks can be regenerated as long as no more than 3 of them are lost. Backblaze reports:
Backblaze stores each file redundantly across multiple drives, in multiple servers, in multiple locations in our datacenter. For details on this, read our Backblaze Vault post. As a result, the Backblaze Cloud Storage system is designed for 99.999999% durability.
S3 is designed for 11 nines of durability, instead of B2's 8 nines. The key differences are actually more about availability than reliability, since S3 replicates across 3 of Amazon's data centers in a region and Backblaze has only one. So for the extra money you get more reliability and availability. Given that you need to have at least one copy outside your cloud provider anyway, it isn't clear that the extra reliability and availability are important.
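The arithmetic behind these numbers is simple enough to sketch. The code below computes the raw storage overhead of a 17+3 erasure code and the gap between 8 and 11 nines. The function names are mine, and the comparison with keeping 3 whole copies is a generic illustration, not a claim about how S3 is implemented.

```python
def storage_overhead(data_blocks, parity_blocks):
    """Raw bytes stored per byte of user data for an erasure-coded group."""
    return (data_blocks + parity_blocks) / data_blocks


def loss_probability(nines):
    """Design loss probability implied by a durability of N nines."""
    return 10.0 ** -nines


# Backblaze's 17 data + 3 parity blocks: about 1.18x raw storage,
# surviving the loss of any 3 of the 20 blocks.
b2_overhead = storage_overhead(17, 3)

# For comparison, 3 whole copies (1 "data" + 2 "parity"): 3x raw storage,
# surviving the loss of any 2 copies.
three_copy_overhead = storage_overhead(1, 2)

# 8 nines versus 11 nines differ by three orders of magnitude in the
# design probability of losing an object.
durability_gap = loss_probability(8) / loss_probability(11)
```

So the erasure code tolerates more failures than 3 whole copies while storing well under half the raw bytes, which is part of how the lower price is possible. The 1000-fold difference in design durability sounds large, but both loss probabilities are tiny.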

The other interesting numbers from Backblaze, as usual, are their reports on the reliability of the disk drives they use. Here are the Q2 and Q3 reports, showing that the 4TB drive generation is performing well, and the early signs for the 6TB generation look good.

3 comments:

Unknown said...

David,
Andy Klein here from Backblaze. Very even-handed treatment of things in your post. Wanted to know if you've signed up for the B2 Beta or would like to do so? Certainly your thoughts on what we are building would be very useful. Thanks for your time.
---Andy

David. said...

Thanks, Andy. I'm swamped with work on the report on emulation and other stuff, including a list of 13 topics I need to write blog posts about. So the last thing I need right now is new stuff to try! Be happy this post was simple enough I could write it while waiting for a delayed flight.

Unknown said...

I have done some of my best work while waiting in airports. Stay busy.
---Andy