Awaken The World Through Enlightened Media

The Rise of DNA Data Storage

by Megan Molteni: Could DNA as an archival medium be the solution to our information overload?

The 144 words of Robert Frost’s seminal poem “The Road Not Taken” fit neatly onto a single printed page. Or in a 1-kilobyte data file. Or, in Hyunjun Park’s hands, in a few drops of water at the bottom of a pink Eppendorf tube. Well, really what’s inside the water: invisible floating strands of DNA.

Scientists have long touted DNA’s potential as an ideal storage medium; it’s dense, easy to replicate, and stable over millennia. And in the past few years, researchers have encoded all kinds of things in those strings of As, Ts, Cs, and Gs: War and Peace, Deep Purple’s “Smoke on the Water,” a galloping horse GIF. But in order to replace existing silicon-chip or magnetic-tape storage technologies, DNA will have to get a lot cheaper to predictably read, write, and package.

That’s where scientists like Park come in. He and the other cofounders of Catalog, an MIT DNA-storage spinoff emerging out of stealth on Tuesday, have come a long way since encoding their first poetic kilobyte by hand a year and a half ago. Now they’re building a machine that will write a terabyte of data a day, using 500 trillion molecules of DNA. They plan to launch industrial-scale storage services for IT companies, the entertainment industry, and the federal government within the next few years—joining several much larger tech companies, like Microsoft, Intel, and Micron, which are funding their own DNA storage projects.

If successful, DNA storage could be the answer to a uniquely 21st-century problem: information overload. Five years ago humans had produced 4.4 zettabytes of data; that’s set to explode to 160 zettabytes (each year!) by 2025. Current infrastructure can handle only a fraction of the coming data deluge, which is expected to consume all the world’s microchip-grade silicon by 2040.

Most digital archives—from music to satellite images to research files—are currently saved on magnetic tape. Tape is cheap. But it takes up space. And it has to be replaced roughly every 10 years. “Today’s technology is already close to the physical limits of scaling,” says Victor Zhirnov, chief scientist of the Semiconductor Research Corporation. “DNA has an information-storage density several orders of magnitude higher than any other known storage technology.”

How dense, exactly? Imagine formatting every movie ever made into DNA; it would be smaller than a sugar cube. And it would last for 10,000 years.

The trouble, of course, is cost. Sequencing, or reading, DNA has gotten far less expensive in the last few years. But the economics of writing DNA remain problematic if it’s going to become a standard archiving technology. DNA-synthesis companies like Twist Bioscience charge between 7 and 9 cents per base. That means a single minute of high-quality stereo sound could be stored for just under $100,000.
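To get a feel for how per-base pricing adds up, here is a rough back-of-the-envelope sketch in Python. It assumes an idealized encoding of 2 bits per base with no error-correction or addressing overhead, which is a simplification and not a figure from any synthesis company. The function name `synthesis_cost_usd` is made up for illustration. The sketch does not try to reproduce the audio estimate above, since that depends on encoding details the article does not state. It prices the 1-kilobyte poem file instead.

```python
def synthesis_cost_usd(num_bytes, cost_per_base_usd, bits_per_base=2.0):
    """Estimate the cost of writing num_bytes of data as synthesized DNA.

    Assumption (not from the article): each base carries bits_per_base bits,
    with no redundancy, indexing, or error correction.
    """
    bases_needed = num_bytes * 8 / bits_per_base
    return bases_needed * cost_per_base_usd


poem_bytes = 1024  # the 1-kilobyte data file mentioned earlier

low = synthesis_cost_usd(poem_bytes, 0.07)   # 7 cents per base
high = synthesis_cost_usd(poem_bytes, 0.09)  # 9 cents per base

print(f"Bases needed: {poem_bytes * 8 // 2}")
print(f"Estimated cost: ${low:,.2f} to ${high:,.2f}")
```

Under these assumptions a single kilobyte needs 4,096 bases and costs a few hundred dollars to write. Real schemes add redundancy, so actual costs would likely be higher.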

Catalog thinks it can rewrite those cost curves by decoupling the process of writing DNA from the process of encoding it. Traditional methods map the sequence of bits (0s and 1s) onto a sequence of DNA’s four bases. In 2016, when Microsoft set a record by storing 200 megabytes of data in nucleotide strands, the company used 13,448,372 unique pieces of DNA. What Catalog does, instead, is cheaply generate large quantities of just a few different DNA molecules, none longer than 30 base pairs. Then it uses billions of enzymatic reactions to encode information into the recombination patterns of those prefab bits of DNA. Instead of mapping bits directly onto individual bases, bits are arranged in multidimensional matrices, and sets of molecules represent their locations in each matrix.
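One simple way to picture the traditional bit-to-base approach is a fixed lookup table that turns every two bits into one of the four letters. The Python sketch below is only an illustration of that idea. The particular table (00 to A, 01 to C, 10 to G, 11 to T) and the `encode` and `decode` helpers are assumptions made for this example, not the scheme used by Microsoft, Catalog, or any other group. Practical systems also add addressing and error correction, and avoid sequences that are hard to synthesize.

```python
# Hypothetical 2-bits-per-base lookup table, chosen for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A/C/G/T, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse encode(): turn a string of A/C/G/T back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

In a scheme like this, every new file requires synthesizing new, unique sequences base by base, which is the per-base cost that the prefabricated-molecule approach described above aims to sidestep.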
