First, Expanding the Molecular Alphabet of DNA-Based Data Storage Systems with Neural Network Nanopore Readout Processing by S. Kasra Tabatabaei et al from UIUC. From their abstract:
Here, we describe a prototype of a DNA data storage system that uses an extended molecular alphabet combining natural and chemically modified nucleotides. Our results show that MspA nanopores can discriminate different combinations and ordered sequences of natural and chemically modified nucleotides in custom-designed oligomers. We further demonstrate single-molecule sequencing of the extended alphabet using a neural network architecture that classifies raw current signals generated by Oxford Nanopore sequencers with an average accuracy exceeding 60% (39× larger than random guessing). ... Overall, the extended molecular alphabet may potentially offer a nearly 2-fold increase in storage density and potentially the same order of reduction in the recording latency, thereby enabling new implementations of molecular recorders.
Jenna Kurtzweil's news release Expanded alphabet, precise sequencing make DNA the next data storage solution provides a readable summary. Essentially, the team constructed DNA segments with eleven different bases instead of the usual four, increasing the amount of data that can be stored in a given length of DNA. Both the rate at which bases can be added to a DNA molecule (writing) and the rate at which the DNA can be passed through a nanopore to identify the bases (reading) are limited, so increasing the data content per base increases the data rate. Second, the team used machine learning to improve the accuracy of the nanopore reading process, reducing the need for additional error-correcting data. This further improves the read data rate.
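Where does "nearly 2-fold" come from? A rough check, assuming the idealized case where every symbol is equally likely and any sequence of bases is allowed (real encodings add constraints and error-correction overhead, so actual gains are lower): each position then carries log2 of the alphabet size in bits. The function name below is hypothetical, chosen for illustration.

```python
import math


def bits_per_base(alphabet_size: int) -> float:
    """Idealized upper bound on information per nucleotide position,
    assuming all symbols are equally likely and unconstrained."""
    return math.log2(alphabet_size)


natural = bits_per_base(4)    # A, C, G, T
extended = bits_per_base(11)  # four natural plus seven chemically modified bases
gain = extended / natural     # density gain per unit length of DNA

print(f"{natural:.2f} -> {extended:.2f} bits/base, gain {gain:.2f}x")
```

Going from 2 to about 3.46 bits per base is roughly a 1.73× gain, consistent with the abstract's "nearly 2-fold" figure. At a fixed rate of writing or reading bases, the data rate scales by the same factor.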
Second, Dynamics of driven polymer transport through a nanopore by Kaikai Chen et al from Cambridge's Cavendish Lab and U. Massachusetts. From their abstract:
The movement of polymers into and out of confinement is also the basis for a wide range of sensing technologies used for single-molecule detection and sequencing. Acquiring an accurate understanding of the translocation dynamics is an essential step in the quantitative analysis of polymer structure, including the localization of binding sites or sequences. Here we use synthetic nanopores and nanostructured DNA molecules to directly measure the velocity profile of driven polymer translocation through synthetic nanopores.
Through the nano hole: Lego technique reveals the physics of DNA transport through nanopores by co-author Nicholas Bell provides a summary for the lay reader. He writes:
At the moment the process of reading the sequence of the DNA relies on the use of molecular motors that slow down the DNA sufficiently to allow reading of the sequence. However, the molecular motors only make steps every few milliseconds which limits the read-out speeds. If we can avoid the use of these molecular motors, the reading speed could be greatly increased. To achieve this goal we need a quantitative understanding of the physics of DNA molecules driven through a nanopore.
Kaikai Chen writes:
These results will help improve the accuracy of nanopore sensors in their various applications, for instance in localising specific sequences on DNA with nanometer accuracy, or detecting diseases early through target RNA detection. The superior resolution in analysing molecules passing through nanopores will also allow for low-error decoding of digital information stored on DNA. We are exploring and improving the utility of nanopore sensors for their applications in DNA/RNA sequence detection, DNA data storage and DNA sequence mapping.
Because this research improves understanding of the precise dynamics of the DNA molecule as it passes through the nanopore, it appears to offer the prospect of passing the DNA through much faster while retaining the accuracy with which bases are identified, thus increasing the read data rate.
It seems likely that a combination of these two efforts would result in a really significant increase in the read data rate.
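Why would combining them be "really significant"? Under an idealized model where the read rate in bits per second is translocation speed (bases per second) times bits per base, the two improvements multiply rather than add. The numbers below are purely hypothetical, chosen for illustration: a motor stepping once every 2 ms (within Bell's "every few milliseconds") and an assumed 10× speedup from motor-free translocation. Neither figure comes from the papers, and the model ignores error-correction overhead.

```python
import math


def read_rate(bases_per_second: float, alphabet_size: int) -> float:
    """Idealized read rate in bits/s: translocation speed times
    log2(alphabet size). Ignores errors and coding overhead."""
    return bases_per_second * math.log2(alphabet_size)


STEP_MS = 2.0                    # hypothetical motor step interval
motor_speed = 1000.0 / STEP_MS   # bases per second with a molecular motor
SPEEDUP = 10.0                   # hypothetical gain from dropping the motor

baseline = read_rate(motor_speed, 4)              # natural 4-letter alphabet
combined = read_rate(motor_speed * SPEEDUP, 11)   # faster and 11-letter alphabet

print(f"{baseline:.0f} bit/s -> {combined:.0f} bit/s "
      f"({combined / baseline:.1f}x)")
```

The overall gain is the speedup factor times the roughly 1.73× bits-per-base gain. So whatever speedup motor-free reading eventually delivers, the extended alphabet would multiply it.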
At this year's Library of Congress Designing Storage Architectures meeting, a session included several talks on DNA storage:
- Karin Strauss updated the status of the Microsoft/UW team that I last reported in December. Their publications can be found here.
- Three years ago I reported on Catalog, who encode data not in individual bases, but in short strands of pre-synthesized DNA. The idea is to sacrifice ultimate density for write speed. David Turek reported that, by using conventional ink-jet heads to print successive strands on dots on a polymer tape, they have demonstrated writing at 1Mb/s.
- David Chadash of Twist Bioscience reported on a synthesis chip, using technology I think is similar to Microsoft's, that is capable of writing 1TB of data. He also reported a significant development: the establishment of the DNA Storage Alliance, which includes most of the players in the space.