S3's data is, in the jargon, hot: it has to be available with low latency. Internally, Amazon has a lot of data with even tighter latency requirements. They could store this data in flash, but that is costly. Before the advent of flash, the only way to provide low latency for hot data on disk was to "short-stroke" the drives. Using only a small range of the tracks on each disk minimized the seek time between accesses to the data. But it was expensive, because most of each drive's capacity was left unused.
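The seek-distance benefit of short-stroking can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions. Accesses are assumed to be independent and uniformly spread over the tracks in use, and distance is measured in tracks. `TOTAL_TRACKS` and `mean_seek_distance` are hypothetical names chosen for this example. Under that model the mean distance over k tracks is (k^2 - 1)/(3k), roughly a third of the span in use, so confining data to 15% of the tracks cuts the mean seek distance to about 15% of the full-stroke figure. Real seek time is not proportional to distance, because every seek also pays a roughly fixed settling cost and rotational latency is unchanged. The saving in time is therefore smaller than the saving in distance.

```python
from fractions import Fraction


def mean_seek_distance(tracks_used: int) -> Fraction:
    """Exact mean |i - j| (in tracks) between two independent accesses,
    each assumed uniformly distributed over tracks 0 .. tracks_used - 1."""
    if tracks_used < 1:
        raise ValueError("need at least one track")
    # A gap of d tracks occurs for 2 * (tracks_used - d) ordered pairs (i, j).
    total = sum(2 * (tracks_used - d) * d for d in range(1, tracks_used))
    return Fraction(total, tracks_used * tracks_used)


# Hypothetical track count, for illustration only; real drives have far more.
TOTAL_TRACKS = 1000

if __name__ == "__main__":
    # Compare using every track with short-stroking to the first 15% of them.
    for fraction in (Fraction(1), Fraction(15, 100)):
        k = int(TOTAL_TRACKS * fraction)
        mean = float(mean_seek_distance(k))
        print(f"using {float(fraction):.0%} of tracks: mean seek {mean:.1f} tracks")
```

The exact `Fraction` result avoids floating-point noise, which makes the closed form easy to check against the brute-force count.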
Glacier's data is cold; Amazon is prepared for it to take several hours to access. Suppose the disk is shared between a small amount, say 15%, of hot S3 data and a large amount, say 85%, of cold Glacier data. S3 data generates at least 5.5c/GB/mo, or $660/TB/yr. Glacier data generates 1c/GB/mo, or $120/TB/yr. Consider a group of three 3TB drives in service for 4 years. Over that period the revenue they generate and the costs they incur are roughly:
- S3 revenue: 15% of 3TB at $660/TB/yr for 4 years = $1188.
- Glacier revenue: 85% of 3TB at $120/TB/yr for 4 years = $1224.
- 3 drives: say $225.
- 3 drives' worth of server: say $135.
- Power, space, cooling, etc. at the rate Backblaze reports (about 1/9 of the hardware purchase price per year, so $360/9 × 4 years): say $160.
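The arithmetic behind these figures can be reproduced in a few lines. The sketch below uses only the prices, shares and cost estimates given above. The function names are hypothetical. One reading is an assumption: the revenue lines bill 3TB rather than the 9TB of raw capacity, which is consistent with each byte being stored on all three drives, but the text does not say so. Exact fractions are used so that conversions such as 5.5c/GB/mo to $660/TB/yr come out exactly rather than with floating-point noise.

```python
from fractions import Fraction

GB_PER_TB = 1000  # decimal units, matching the $/TB figures in the text
MONTHS_PER_YEAR = 12


def dollars_per_tb_year(cents_per_gb_month: str) -> Fraction:
    """Convert a price in cents/GB/month to dollars/TB/year, exactly."""
    return Fraction(cents_per_gb_month) / 100 * GB_PER_TB * MONTHS_PER_YEAR


def four_year_budget(usable_tb=3, hot_share=Fraction(15, 100), years=4):
    """Revenue and cost for one group of three drives.

    Assumption: revenue is billed on `usable_tb` (3TB), i.e. the data is
    taken to be replicated across the three 3TB drives.
    """
    s3_rate = dollars_per_tb_year("5.5")     # $660/TB/yr
    glacier_rate = dollars_per_tb_year("1")  # $120/TB/yr
    revenue = {
        "S3": hot_share * usable_tb * s3_rate * years,
        "Glacier": (1 - hot_share) * usable_tb * glacier_rate * years,
    }
    drives, servers = Fraction(225), Fraction(135)
    costs = {
        "drives": drives,
        "servers": servers,
        # Overhead at about 1/9 of the hardware purchase price per year.
        "power, space, cooling": (drives + servers) / 9 * years,
    }
    return revenue, costs


if __name__ == "__main__":
    revenue, costs = four_year_budget()
    for name, amount in {**revenue, **costs}.items():
        print(f"{name}: ${float(amount):,.0f}")
    print(f"total revenue ${float(sum(revenue.values())):,.0f}, "
          f"total cost ${float(sum(costs.values())):,.0f}")
```

With the default arguments it reports $2,412 of revenue against $520 of hardware and overhead cost. Costs the list above does not price, such as bandwidth or staff, are not modeled.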