Thursday, March 21, 2019

Cost-Reducing Writing DNA Data

In DNA's Niche in the Storage Market, I addressed a hypothetical DNA storage company's engineers and posed this challenge:
increase the speed of synthesis by a factor of a quarter of a trillion, while reducing the cost by a factor of fifty trillion, in less than 10 years while spending no more than $24M/yr.
Now, a company called Catalog plans to demo a significant step in the right direction:
The goal of the demonstration, says Park, is to store 125 gigabytes, ... in 24 hours, on less than 1 cubic centimeter of DNA. And to do it for $7,000.
That would be 1E12 bits for $7E3. At the theoretical maximum 2 bits/base, it would be $1.4E-8 per base, versus last year's estimate of 1E-4, or around 7,000 times better.
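
The per-base arithmetic can be checked with a short sketch. It assumes decimal gigabytes (1 GB = 1E9 bytes), the theoretical maximum of 2 bits/base, and last year's estimate of $1E-4 per base; the variable and function names are mine, chosen for illustration.

```python
# Back-of-envelope check of the demo's cost per base.
# Assumptions: decimal gigabytes (1 GB = 1e9 bytes), the theoretical
# maximum of 2 bits per base, and a prior estimate of $1e-4 per base.
DEMO_BYTES = 125e9
DEMO_COST_USD = 7_000
BITS_PER_BASE = 2
PRIOR_COST_PER_BASE_USD = 1e-4


def cost_per_base(n_bytes, cost_usd, bits_per_base=BITS_PER_BASE):
    """Return dollars per synthesized base for a given payload and price."""
    bits = n_bytes * 8
    bases = bits / bits_per_base
    return cost_usd / bases


demo_bits = DEMO_BYTES * 8
demo_cost_per_base = cost_per_base(DEMO_BYTES, DEMO_COST_USD)
improvement = PRIOR_COST_PER_BASE_USD / demo_cost_per_base

print(f"bits written:  {demo_bits:.1e}")
print(f"cost per base: ${demo_cost_per_base:.1e}")
print(f"improvement:   {improvement:,.0f}x")
```

Real encodings store fewer than 2 bits/base once redundancy and addressing are included, so the true cost per base would be somewhat lower and the cost per stored bit unchanged.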

If the demo succeeds, it marks a major achievement. But below the fold I continue to throw cold water on the medium-term prospects for DNA storage.

Catalog's technique is different from experiments such as Microsoft's, which synthesize DNA strands a base at a time:
When Park and Roquet formed CATALOG in 2016, they shunned the idea of assembling bases one by one to represent the digital “alphabet.” ... CATALOG opted for prefab: it buys or makes fragments of DNA, “in massive quantities,” and then assembles with a custom-made liquid-handling robot.

“DNA molecules are like Lego blocks,” says Park. “We can string them together in virtually infinite combinations. We take advantage of that and start with a few hundred molecules to generate in the end, trillions of different molecules.”

Park likens the approach to movable type. Instead of having to write out every letter each time you want to write something, old-style typesetters cast their letters in advance, and then slotted them into position.
But, despite Catalog's technical ingenuity, they face enormous obstacles to market success:
  • At $56,000/TB, DNA is still an extraordinarily expensive storage medium. 10TB hard disks retail at around $300, or about $30/TB, so they are nearly 2000 times cheaper. 125GB in 24 hours is around 1.4MB/s, compared to the 240MB/s transfer rate of a single current 10TB drive. Because DNA storage is very slow to both write and read, and is not rewritable, it is restricted to competing in the archival storage market. It has to be much cheaper than tape and optical media, not just hard disk, before it can compete successfully.
  • Catalog's pitch is based on the idea that demand for data storage is insatiable:
    For a startup, a solution is less important than a solid problem, Park told the Weinert Center’s Distinguished Entrepreneurs Lunch on Feb. 27. And Park’s problem – the glut of information sometimes called the “datapocalypse” — is a result of a tsunami of data from pretty much every sphere of human activity.
    But, as I discussed in Where Did All Those Bits Go?, the actual shipment data for storage vendors shows this is a fallacy. The demand for storage media, like the demand for any good, depends upon the price. At current prices demand for bytes of hard disk is growing steadily but more slowly than the Kryder rate, so unit shipments are falling.
  • As I pointed out a year ago in Archival Media: Not a Good Business, the total market is probably less than $1B/yr, and new archival media have to compete with legacy media, such as hard disk, whose R&D and manufacturing investments have long been amortized. Given the long latency of DNA storage, to compete with these fully depreciated and much faster media it has to be vastly cheaper.
  • In The Future Of Storage I discussed the fundamental problems of long-lived media such as DNA, including:
    The research we have been doing in the economics of long-term preservation demonstrates the enormous barrier to adoption that accounting techniques pose for media that have high purchase but low running costs, such as these long-lived media.
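
The cost and throughput comparison in the first bullet above can be reproduced with a short sketch. It assumes decimal units (1 TB = 1E12 bytes, 1 MB = 1E6 bytes) and takes the hard disk price and transfer rate as the approximate figures quoted there; the variable names are mine.

```python
# Compare the planned DNA demo with a 10TB hard disk on cost and speed.
# Assumptions: decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes);
# the disk price and transfer rate are approximate retail figures.
DEMO_BYTES = 125e9
DEMO_COST_USD = 7_000
DEMO_SECONDS = 24 * 3600

DISK_BYTES = 10e12
DISK_PRICE_USD = 300
DISK_MB_PER_S = 240

# Cost per terabyte for each medium, and how many times cheaper disk is.
dna_usd_per_tb = DEMO_COST_USD / (DEMO_BYTES / 1e12)
disk_usd_per_tb = DISK_PRICE_USD / (DISK_BYTES / 1e12)
cost_ratio = dna_usd_per_tb / disk_usd_per_tb

# Sustained write rate of the demo, and how many times faster disk is.
dna_mb_per_s = DEMO_BYTES / DEMO_SECONDS / 1e6
speed_ratio = DISK_MB_PER_S / dna_mb_per_s

print(f"DNA:  ${dna_usd_per_tb:,.0f}/TB at {dna_mb_per_s:.1f} MB/s")
print(f"Disk: ${disk_usd_per_tb:,.0f}/TB at {DISK_MB_PER_S} MB/s")
print(f"Disk is about {cost_ratio:,.0f}x cheaper and {speed_ratio:,.0f}x faster")
```

The comparison covers media cost and write bandwidth only; it ignores the running costs of both media and the cost of reading DNA back.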
To sum up, while Catalog may be able to demonstrate a significant advance in the technology of DNA storage, they will still be many orders of magnitude away from a competitive product in the archival storage market.

1 comment:

David. said...

The team from Microsoft Research and U.W. have a paper and a video describing a fully-automated write-store-read pipeline for DNA. This is, I believe, the first automated end-to-end demonstration. From their abstract:

"Our device encodes data into a DNA sequence, which is then written to a DNA oligonucleotide using a custom DNA synthesizer, pooled for liquid storage, and read using a nanopore sequencer and a novel, minimal preparation protocol. We demonstrate an automated 5-byte write, store, and read cycle with a modular design enabling expansion as new technology becomes available."

Their system is base-at-a-time, so it is still slow:

"Our system’s write-to-read latency is approximately 21 h. The majority of this time is taken by synthesis, viz., approximately 305 s per base, or 8.4 h to synthesize a 99-mer payload and 12 h to cleave and deprotect the oligonucleotides at room temperature. After synthesis, preparation takes an additional 30 min, and nanopore reading and online decoding take 6 min."

Again, this is a significant step forward, but a practical product is a long way away.
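
The latency breakdown quoted above can be tallied in a few lines. This is only a sketch using the figures from the quote; the effective data rate at the end is my own derived number, assuming the 5-byte payload from the abstract is the only data moved in one cycle.

```python
# Tally the write-to-read latency from the quoted per-stage figures.
SECONDS_PER_BASE = 305      # synthesis time per base
PAYLOAD_BASES = 99          # 99-mer payload
CLEAVE_DEPROTECT_H = 12.0   # cleave and deprotect at room temperature
PREPARATION_H = 0.5         # 30 min preparation
READ_DECODE_H = 0.1         # 6 min nanopore reading and decoding
PAYLOAD_BYTES = 5           # assumption: one 5-byte payload per cycle

synthesis_h = SECONDS_PER_BASE * PAYLOAD_BASES / 3600
total_h = synthesis_h + CLEAVE_DEPROTECT_H + PREPARATION_H + READ_DECODE_H

# Effective end-to-end data rate over one full cycle, in bytes per second.
bytes_per_second = PAYLOAD_BYTES / (total_h * 3600)

print(f"synthesis: {synthesis_h:.1f} h")
print(f"total:     {total_h:.1f} h")
print(f"effective rate: {bytes_per_second:.1e} bytes/s")
```

Synthesis plus cleaving and deprotection account for nearly all of the roughly 21 hours, which is why base-at-a-time writing dominates the latency.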