Data Storage: Easy as ATCG
In the digital age, our world revolves around data. Archives of data provide proof of our own existence, such as birth records, and proof of the mundanity of everyday life, like that grocery list you wrote in your Notes app. Tech companies in particular create a demand for efficient and secure data storage that is beginning to outstrip our current ability to store and retrieve this information. Today, we produce an almost inconceivable 16 zettabytes of data per year (1 zettabyte = 1 billion terabytes). Some estimates suggest that by 2040, worldwide data storage needs will outpace the expected supply of microchip-grade silicon, the material from which most memory chips are built. As our data storage needs quickly near the limits of existing technology, data storage innovators have begun to look within themselves for solutions. Specifically, researchers have started to look within their own cells to the quintessential storage platform: the DNA molecule.
DNA, nature’s bespoke server farm, can store data at a density of about 214 petabytes per gram (1 zettabyte = 1 million petabytes). That means that the 16 zettabytes of data that we currently produce per year could be stored in about 75,000 grams of DNA, or about 75 kg, the mass of about two large German Shepherds. Although biological molecules aren’t always associated with durability, DNA can withstand the test of time. When stored in a dry, cool place, DNA lasts for thousands of years, as evidenced by DNA sequenced from long-dead organisms, including a 700,000-year-old horse. Another advantage of DNA storage is that the tools required to copy data exist in each and every one of our cells: DNA polymerase, an enzyme that can read and copy DNA ad infinitum.
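The back-of-the-envelope arithmetic above can be checked in a few lines of Python. This is only a sketch that reuses the figures quoted in this article (16 zettabytes per year, about 214 petabytes per gram); the variable names are made up for illustration.

```python
# Figures quoted in the article; variable names are illustrative only.
PETABYTES_PER_ZETTABYTE = 1_000_000
density_pb_per_gram = 214        # approximate DNA storage density
yearly_data_zb = 16              # data produced worldwide per year

# Convert the yearly data volume to petabytes, then divide by the density.
yearly_data_pb = yearly_data_zb * PETABYTES_PER_ZETTABYTE
grams_needed = yearly_data_pb / density_pb_per_gram
kilograms_needed = grams_needed / 1000

print(f"{grams_needed:,.0f} g, or about {kilograms_needed:.0f} kg of DNA")
```

The result lands just under 75,000 grams, which is where the roughly 75 kg figure comes from.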
So far, plans for storing and retrieving data from DNA involve a specific workflow pioneered by Yaniv Erlich and Dina Zielinski. First, the files to be stored are encoded as binary strings of 1s and 0s, the ubiquitous language of computers. These binary strings of information are then assigned to short stretches of DNA using an algorithm the researchers call the “DNA Fountain.” From here, the DNA can be synthesized in short tracts by biotech companies. To read the data back, the DNA is sequenced and the information it contains is reassembled in the proper order to reconstitute the original files. Remarkably, the researchers reported complete fidelity in data retrieval: each of the files was entirely intact after being encoded in DNA.
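To make the idea of turning 1s and 0s into A, C, G and T more concrete, here is a toy sketch in Python. It is not the DNA Fountain algorithm, which is considerably more sophisticated; it simply assumes a fixed mapping of two bits per base (00 to A, 01 to C, 10 to G, 11 to T) and tags each short strand with a one-byte index so the pieces can be put back in order. The mapping, the function names and the strand length are all assumptions made for illustration.

```python
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, two bits per base."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(seq) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4 bases")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def to_oligos(data: bytes, payload_bytes: int = 4) -> list[str]:
    """Split data into short strands, each prefixed with a one-byte index."""
    chunks = [data[i:i + payload_bytes]
              for i in range(0, len(data), payload_bytes)]
    if len(chunks) > 256:
        raise ValueError("a one-byte index allows at most 256 strands")
    return [bytes_to_dna(bytes([idx]) + chunk)
            for idx, chunk in enumerate(chunks)]


def from_oligos(oligos: list[str]) -> bytes:
    """Decode strands in any order and reassemble them by index."""
    decoded = sorted((dna_to_bytes(o) for o in oligos), key=lambda d: d[0])
    return b"".join(d[1:] for d in decoded)


message = b"grocery list: eggs, milk"
strands = to_oligos(message)
# Sequencing returns strands in no particular order; reverse them to mimic that.
recovered = from_oligos(list(reversed(strands)))
print(strands[0], recovered == message)
```

A real scheme also has to avoid sequences that are hard to synthesize or read and has to survive lost or damaged strands, which this sketch ignores entirely.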
Storage of information in DNA is not without its drawbacks. For example, the DNA polymerase molecules that replicate DNA are not error-proof. In fact, errors introduced by unfaithful replication are the raw material of genetic variation and the basis of natural selection. However, scientists are hard at work designing algorithms that make DNA storage systems more robust against these natural copying errors.
Furthermore, the capacity for efficient and compact data storage using the DNA molecule is still being expanded. Groundbreaking scientists have recently synthesized DNA made up of eight bases rather than four. Alongside our classic favorites, A, C, T and G, Hachimoji (“eight letters”) DNA also features four new bases: P, Z, B and S. Doubling the alphabet from four letters to eight means each position in the strand can carry more information, raising the ceiling on the density at which information can be stored in the molecule.
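How much more could an eight-letter alphabet hold? A rough upper bound comes from counting the bits each letter can represent, which is the base-2 logarithm of the alphabet size. The short Python sketch below works that out; it is a theoretical ceiling only, ignores any practical constraints on synthesis or sequencing, and uses a hypothetical helper name.

```python
import math


def bits_per_base(alphabet_size: int) -> float:
    """Theoretical maximum information per base for a given alphabet size."""
    return math.log2(alphabet_size)


standard = bits_per_base(4)    # A, C, G, T
hachimoji = bits_per_base(8)   # A, C, G, T plus P, Z, B, S

print(standard, hachimoji, hachimoji / standard)
```

By this measure, going from four letters to eight raises the ceiling from two bits per base to three, a 50 percent gain per position.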
The need for data storage will only accelerate in the years to come. In the meantime, scientists are taking inspiration from the DNA molecule: the solution nature has already been tweaking for four billion years.
Peer edited by Joanna Warren.