Tuesday, July 27, 2021

Yet Another DNA Storage Technique

An alternative approach to nucleic acid memory by George D. Dickinson et al from Boise State University describes a fundamentally different way to store and retrieve data using DNA strands as the medium. Will Hughes et al have an accessible summary in DNA ‘Lite-Brite’ is a promising way to archive data for decades or longer:
We and our colleagues have developed a way to store data using pegs and pegboards made out of DNA and retrieving the data with a microscope – a molecular version of the Lite-Brite toy. Our prototype stores information in patterns using DNA strands spaced about 10 nanometers apart.
Below the fold I look at the details of the technique they call digital Nucleic Acid Memory (dNAM).

The traditional way to use DNA as a storage medium is to encode the data in the sequence of bases in a synthesized strand, then use sequencing to retrieve the data. Instead:
dNAM uses advancements in super-resolution microscopy (SRM)15 to access digital data stored in short oligonucleotide strands that are held together for imaging using DNA origami. In dNAM, non-volatile information is digitally encoded into specific combinations of single-stranded DNA, commonly known as staple strands, that can form DNA origami nanostructures when combined with a scaffold strand. When formed into origami, the staple strands are arranged at addressable locations ... that define an indexed matrix of digital information. This site-specific localization of digital information is enabled by designing staple strands with nucleotides that extend from the origami.


In dNAM, writing their 20 character message "Data is in our DNA!\n" involved encoding it into 15 16-bit fountain code droplets then synthesizing two different types of DNA sequences:
  • Origami: There is one origami for each 16 bits of data to be stored. It forms a 6x8 matrix holding a 4 bit index, the 16 bits of droplet data, 20 bits of parity, 4 bits of checksum, and 4 orientation bits. Each of the 48 cells thus contains a unique, message-specific DNA sequence.
  • Staples: There is one staple for each of the 15x48 matrix cells, with one end of the strand matching the matrix cell's sequence, and the other indicating a 0 or a 1 by the presence or absence of a sequence that binds to the fluorescent DNA used for reading.
When combined, the staple strands bind to the appropriate cells in the origami, labelling each cell as a 0 or a 1.
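The bit budget described above can be checked with a short sketch. The field widths and the 15-droplet count come from the description; the names, the ASCII encoding, the byte order and the way the message is cut into 16-bit segments are assumptions made for illustration, not the authors' encoder.

```python
# Field widths are taken from the description above; everything else here
# (names, byte order, segment packing) is an illustrative assumption.
FIELD_BITS = {"index": 4, "data": 16, "parity": 20, "checksum": 4, "orientation": 4}
ROWS, COLS = 6, 8
DROPLETS = 15


def split_into_segments(message, bits_per_segment=16):
    """Cut an ASCII message into fixed-width integer segments (hypothetical helper)."""
    raw = message.encode("ascii")
    step = bits_per_segment // 8
    return [int.from_bytes(raw[i:i + step], "big") for i in range(0, len(raw), step)]


message = "Data is in our DNA!\n"
segments = split_into_segments(message)

cells_per_origami = sum(FIELD_BITS.values())   # must fill the 6x8 matrix exactly
total_cells = DROPLETS * cells_per_origami     # one staple position per cell
payload_bits = 8 * len(message)                # bits of actual message
max_droplets = 2 ** FIELD_BITS["index"]        # droplets a 4-bit index can address

print(f"{len(segments)} segments, {cells_per_origami} cells per origami")
print(f"{total_cells} cells in total, payload fraction {payload_bits / total_cells:.2f}")
print(f"index field can address up to {max_droplets} droplets")
```

Under these assumptions the 20 characters become 10 segments of 16 bits, so the 15 droplets carry 5 redundant droplets' worth of data, and only 160 of the 720 matrix cells (about 22%) hold message bits.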


The key difference between dNAM and traditional DNA storage techniques is that dNAM reads data without sequencing the DNA. Instead, it uses optical microscopy to identify each "peg" (staple strand) in each matrix cell as either a 0 or a 1:
The patterns of DNA strands – the pegs – light up when fluorescently labeled DNA bind to them. Because the fluorescent strands are short, they rapidly bind and unbind. This causes them to blink, making it easier to separate one peg from another and read the stored information.
The difficulty in doing so is that the pegs are on a 10 nanometer grid:
Because the DNA pegs are positioned closer than half the wavelength of visible light, we used super-resolution microscopy, which circumvents the diffraction limit of light.
The technique is called "DNA-Points Accumulation for Imaging in Nanoscale Topography (DNA-PAINT)". The process to recover the 20 character message was:
40,000 frames from a single field of view were recorded using DNA-PAINT (~4500 origami identified in 2982 µm2). The super-resolution images of the hybridized imager strands were then reconstructed from blinking events identified in the recording to map the positions of the data domains on each origami ... Using a custom localization processing algorithm, the signals were translated to a 6 × 8 grid and converted back to a 48-bit binary string — which was passed to the decoding algorithm for error correction, droplet recovery, and message reconstruction ... The process enabled successful recovery of the dNAM encoded message from a single super-resolution recording.
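The step from blinking events to a 48-bit string can be pictured with a toy sketch. This is not the authors' localization algorithm: it assumes each origami has already been aligned so that its corner sits at the origin, that cells lie on an exact 10 nm pitch, that bits are read in row-major order, and that a cell counts as a 1 once it has collected a hypothetical minimum number of events (`min_events`).

```python
from collections import Counter

ROWS, COLS = 6, 8
PITCH_NM = 10.0  # grid spacing quoted in the post


def localizations_to_bits(points, min_events=3):
    """Bin (x_nm, y_nm) blink localizations into a 6x8 grid and threshold.

    Hypothetical helper: assumes a pre-aligned origami and row-major bit order.
    """
    counts = Counter()
    for x, y in points:
        col, row = int(x // PITCH_NM), int(y // PITCH_NM)
        if 0 <= row < ROWS and 0 <= col < COLS:
            counts[(row, col)] += 1  # points outside the grid are ignored
    return "".join(
        "1" if counts[(r, c)] >= min_events else "0"
        for r in range(ROWS)
        for c in range(COLS)
    )


# Made-up localizations: three blinks in the first cell, four in the last,
# and two in the second cell (below the threshold, so read as 0).
points = [(5.0, 5.0)] * 3 + [(75.0, 55.0)] * 4 + [(15.0, 5.0)] * 2
bits = localizations_to_bits(points)
print(bits)
```

A real pipeline also has to cope with drift, origami orientation and missed pegs, which is where the orientation, parity and checksum bits come in.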


The first thing to note is that whereas traditional DNA storage techniques are volumetric, dNAM, like hard disk or tape, is areal. It will therefore be unable to match the extraordinary data density potentially achievable using the traditional approach. dNAM claims:
After accounting for the bits used by the algorithms, our prototype was able to read data at a density of 330 gigabits per square centimeter.
Current hard disks have an areal density of 1.3Tbit/inch2, or about 200Gbit/cm2, so for a prototype this is good but not revolutionary. The areal density is set by the 10nm grid spacing, which it may not be possible to reduce greatly. Hard disk vendors have demonstrated 400Gbit/cm2 and have roadmaps to around 800Gbit/cm2.
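The arithmetic behind these figures is easy to check. The sketch below converts the hard disk number from square inches to square centimeters, then makes a back-of-envelope estimate for dNAM, assuming one cell per 10nm x 10nm square, origami tiling the surface with no gaps, and 16 message bits in each 48-cell matrix. That estimate is an assumption of this sketch and may not be how the authors arrived at their figure.

```python
CM_PER_INCH = 2.54
NM_PER_CM = 1e7


def per_in2_to_per_cm2(bits_per_in2):
    """Convert an areal density from bits/inch^2 to bits/cm^2."""
    return bits_per_in2 / CM_PER_INCH ** 2


# Hard disk: 1.3 Tbit/inch^2 expressed per cm^2.
hdd_bits_per_cm2 = per_in2_to_per_cm2(1.3e12)

# dNAM estimate (assumptions: gap-free tiling, 10 nm pitch, 16 of 48 cells are data).
grid_pitch_nm = 10
cells_per_cm2 = (NM_PER_CM / grid_pitch_nm) ** 2
data_fraction = 16 / 48
dnam_bits_per_cm2 = cells_per_cm2 * data_fraction

print(f"hard disk: {hdd_bits_per_cm2 / 1e9:.1f} Gbit/cm2")
print(f"dNAM raw:  {cells_per_cm2 / 1e9:.0f} Gbit/cm2")
print(f"dNAM data: {dnam_bits_per_cm2 / 1e9:.0f} Gbit/cm2")
```

Under these assumptions the raw grid holds about 1000Gbit/cm2 and the data bits about 333Gbit/cm2, close to the quoted 330 gigabits per square centimeter. Since density scales as the inverse square of the pitch, doubling it would need the spacing to shrink to roughly 7nm.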

dNAM's writing process seems more complex than the traditional approach, so it is unlikely to be faster or cheaper. The read process is likely to be both faster and cheaper, because DNA-PAINT images a large number of origami in parallel, whereas sequencing is sequential (duh!). But, as I have written, the big barrier to adoption of DNA storage is the low bandwidth and high cost of writing the data.
