DNA Storage In a Zettabyte World

By Hubert Yoshida posted 11-15-2019 00:00

Ever since I saw the IDC report projecting that the Global Data Sphere would reach 175 zettabytes by 2025, I have posted several blogs sounding the alarm that we will not be able to store that amount of data. According to IDC, data will grow from 40 zettabytes in 2019 to 175 zettabytes by 2025, an increase of 135 zettabytes, while shipments of storage capacity over that same time frame will total only about 21.9 zettabytes. What makes it worse is that some of that new capacity will be needed to tech-refresh existing storage, and about 30% of storage is allocated but unused. That leaves only about 10 zettabytes of new capacity to store 135 zettabytes of new data! 
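The arithmetic behind that gap can be sketched in a few lines. This is a back-of-the-envelope illustration using the IDC figures quoted above; the tech-refresh share is a hypothetical assumption on my part, not an IDC number.

```python
# Back-of-the-envelope check of the storage gap, in zettabytes,
# using the IDC figures quoted in this post.
data_2019 = 40.0          # global data sphere, 2019
data_2025 = 175.0         # projected global data sphere, 2025
shipped_capacity = 21.9   # storage capacity shipped over the same period

data_growth = data_2025 - data_2019   # 135 ZB of new data to store

# Assumptions: refresh_share is illustrative, not an IDC figure;
# the 30% allocated-but-unused share is from the post.
refresh_share = 0.25      # hypothetical share consumed by tech refresh
unused_share = 0.30       # allocated-but-unused share

usable_new_capacity = shipped_capacity * (1 - refresh_share) * (1 - unused_share)
print(f"New data:            {data_growth:.0f} ZB")
print(f"Usable new capacity: {usable_new_capacity:.1f} ZB")
```

Under those assumptions the usable new capacity comes out to roughly 11 zettabytes, in line with the "about 10 zettabytes" figure above.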

I do not see any way to solve this dilemma with current technologies. Even if we doubled QLC cell densities and stacked hundreds of layers of 3D NAND, we could not come close to closing the gap before we ran out of electricity, and we would spend all our time refreshing zettabytes of worn-out flash drives every few years.

What disturbs me is that no one seems to be concerned about this mismatch. I guess we have been desensitized to these numbers by all the articles about the explosion of data. Everyone seems to be talking about the explosion of data from IoT, and the billions of new devices on the edge generating zettabytes of data, but no one seems to have a solution for how we store that data. Everyone seems to trust that, somehow, the storage manufacturers will continue to build denser devices and Moore’s law will continue to save the day. I have noticed that even my blogs about data growth get the lowest number of hits. No one seems to care, but I keep blogging about this because I think it is so important.  

The only way that I can see to store zettabytes, and future yottabytes, of data is through the use of DNA storage, an idea that has been around for 60 years. The idea of DNA digital data storage dates back to 1959, when the physicist Richard P. Feynman published his paper "There's Plenty of Room at the Bottom: An Invitation to Enter a New Field of Physics". 

In 2012, George Church and colleagues at Harvard University published an article in which DNA was encoded with digital information that included an HTML draft of a 53,400-word book written by the lead researcher, eleven JPG images, and one JavaScript program. They showed that 5.5 petabits can be stored in each cubic millimeter of DNA. (Some recent papers are projecting a zettabyte in one gram of DNA.) This demonstrated that DNA can serve as a storage medium, like hard drives and magnetic tape, but with much higher density.  

Ötzi, also called the Iceman, is the well-preserved natural mummy of a man who lived between 3400 and 3100 BC and was discovered in the Ötztal Alps in 1991. His DNA was found to be intact and provided a wealth of information, including what he had eaten two hours before he died. In 2012 a DNA search linked him to 19 relatives in Austria's Tyrol region. This attests to the longevity of DNA storage. Imagine storing data for thousands of years with no requirement for electrical power.

In recent years, researchers at Columbia University and the New York Genome Center published a method known as DNA Fountain that stored data at a density of 215 petabytes per gram of DNA. It cost $7,000 to synthesize 2 megabytes of data and another $2,000 to read it. Research published by Eurecom and Imperial College in January 2019 demonstrated the ability to store structured data in synthetic DNA. The research showed how to encode structured or, more specifically, relational data in synthetic DNA and also demonstrated how to perform data processing operations (similar to SQL) directly on the DNA as chemical processes.
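To put that density in perspective, here is a rough scale check using the DNA Fountain figure quoted above and the standard conversion of 1 zettabyte = 1,000,000 petabytes:

```python
# Rough scale check: how much DNA would hold the projected 2025 data sphere,
# assuming the DNA Fountain density quoted above (215 PB per gram)?
PB_PER_GRAM = 215
ZB_TO_PB = 1_000_000

grams_needed = 175 * ZB_TO_PB / PB_PER_GRAM
print(f"{grams_needed / 1000:.0f} kg of DNA for 175 ZB")
```

At that density, the entire projected 175-zettabyte data sphere would fit in well under a metric ton of DNA, which shows why density is not the obstacle; cost and access speed are.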

DNA storage appears to be the only possible way to store data in a zettabyte world. It appears to be feasible, except for the cost and access speed, so there will need to be a mix of conventional and DNA storage. Other technologies from biology, chemistry, and microscopy may help to reduce costs and increase access speeds. Scanning tunneling electron microscopes and atomic force microscopes can capture images of DNA’s chemical building blocks (nucleotides): adenine (A), cytosine (C), guanine (G), and thymine (T), the four main components that make up DNA, and provide a quick readout of DNA images. A technique called CRISPR, which was recently developed for gene therapy, can be used to edit DNA. There are also some articles that describe how CRISPR could be used to create a DNA computer.

There are many new developments that can be used to solve the storage crisis. We need to push for them and not be complacent. DNA storage will radically change how we store data, going from the 0s and 1s of bits and bytes to groups of four nucleotides. The change will be similar to what we will need to go through for quantum computing. There will be whole new infrastructures for the creation, modification, and access of DNA, and a whole new set of storage and data management tools to be developed. I suspect that, as with quantum computing, most of this will be done in the cloud and storage will be provided as a service. The danger is that this gives more control to the cloud providers. However, this seems to be the only way forward.