Using DNA as a digital storage medium promises the ability to store vast quantities of information, with a raw limit of one exabyte — equivalent to one billion gigabytes — per cubic millimeter. Now the researchers say they have produced the first demonstration of random access in large-scale DNA data storage.
When using DNA as a digital storage medium, the data must be converted from digital zeros and ones into sequences of DNA's four nucleotide bases (adenine, thymine, cytosine, and guanine). To restore the data to its digital form, the DNA must be sequenced and the files decoded back to zeros and ones - a process that becomes more challenging as the amount of data increases.
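The conversion step above can be illustrated with a minimal sketch. One simple scheme maps each pair of bits to one of the four bases; real codecs, including the one described in this article, layer addressing information and error-correcting codes on top of such a mapping. The function names here are illustrative, not the researchers' actual codec.

```python
# Minimal sketch: map each 2-bit pair to one of the four DNA bases.
# Real DNA storage codecs add addressing and error correction on top.

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(bits: str) -> str:
    """Convert a binary string (even length) into a DNA sequence."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(dna: str) -> str:
    """Convert a DNA sequence back into the original binary string."""
    return "".join(BASE_TO_BITS[base] for base in dna)

strand = encode("00011011")   # -> "ACGT"
restored = decode(strand)     # -> "00011011"
```

With two bits per base, capacity scales directly with strand length, which is where the raw density figure for DNA storage comes from.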
Currently, say the researchers, without random access - the ability to retrieve any item of data from a pool of addressable elements as easily and efficiently as any other - recovering stored data at scale requires sequencing all the DNA in a pool, even if only a subset of the information is needed. In addition, DNA synthesis and sequencing are error-prone processes, which can result in data loss.
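Conceptually, primer-based random access works by tagging every strand of a file with a primer pair unique to that file, so that amplification pulls out only the matching strands for sequencing. The following toy analogy, with hypothetical names and toy payloads, shows the selection step in ordinary code; it is a sketch of the idea, not the actual laboratory protocol.

```python
# Toy analogy for primer-based random access (hypothetical names and
# payloads): each strand in the pool carries the primer tag of the file
# it belongs to. Selecting by primer stands in for PCR amplification,
# which copies only the strands whose ends match the chosen primers.

pool = [
    {"primer": "file_A", "payload": "GATTACA"},
    {"primer": "file_B", "payload": "CCTAGGT"},
    {"primer": "file_A", "payload": "TTGACAC"},
]

def selective_amplify(pool, primer):
    """Return only the strands tagged with the requested primer."""
    return [strand["payload"] for strand in pool if strand["primer"] == primer]

# Without random access, every strand in the pool must be sequenced.
# With it, only the strands belonging to the requested file are read:
selected = selective_amplify(pool, "file_A")
```

The cost saving is proportional to how small the requested file is relative to the whole pool, which is why random access matters most at large scale.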
The researchers addressed these problems by designing and validating an extensive library of primers - short strands of RNA or DNA that serve as a starting point for DNA synthesis - and by developing a new decoding algorithm. Using synthetic DNA, the researchers encoded and successfully retrieved 35 distinct files ranging in size from 29 kilobytes to over 44 megabytes, representing a record-setting 200 megabytes of high-definition video, audio, images, and text - a nearly tenfold increase over the previous record of 22 megabytes.
"Our work reduces the effort, both in sequencing capacity and in processing, to completely recover information stored in DNA," says Microsoft Senior Researcher Sergey Yekhanin, who was instrumental in creating the codec and algorithms used to achieve the team's results.