All of the world's information, about 1.8 zettabytes, could be stored in about four grams of DNA
by Lucas Mearian
Researchers have created a way to store data in the form of DNA, which can last for tens of thousands of years.
The encoding method makes it possible to store at least 100 million hours of high-definition video in about a cup of DNA, the researchers said in a paper published in the journal Nature this week.
The researchers, from UK-based EMBL-European Bioinformatics Institute (EMBL-EBI), claimed to have stored encoded versions of an .mp3 of Martin Luther King's "I Have a Dream" speech, along with a .jpg photo of EMBL-EBI and several text files.
"We already know that DNA is a robust way to store information because we can extract it from wooly mammoth bones, which date back tens of thousands of years, and make sense of it," Nick Goldman, co-author of the study at EMBL-EBI, said in a statement. "It's also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy."
Reading DNA is fairly straightforward, but writing it has been a major hurdle. There are two challenges. First, using current methods, it is only possible to manufacture DNA in short strings. Second, both writing and reading DNA are prone to errors, particularly when the same DNA letter is repeated.
Goldman and co-author Ewan Birney, associate director of EMBL-EBI, set out to create a code that overcomes both problems. The new method requires synthesizing DNA from the encoded information. EMBL-EBI worked with California-based Agilent Technologies, a maker of electronic and bio-analytical measurement instruments such as oscilloscopes and signal generators, to transmit the data and then encode it in DNA.
Agilent downloaded the files from the Web and then synthesized hundreds of thousands of pieces of DNA to represent the data. "The result looks like a tiny piece of dust," said Emily Leproust of Agilent.
Agilent then mailed the sample to EMBL-EBI, where the researchers were able to sequence the DNA and decode the files without errors.
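To illustrate how a code can sidestep the repeated-letter problem described above, here is a minimal Python sketch. It is not the researchers' published scheme: the function names, the fixed six-digit base-3 representation of each byte and the starting letter are all assumptions made for illustration. The idea is that each base-3 digit picks one of the three DNA letters that differ from the previous letter, so the same letter never appears twice in a row.

```python
# Illustrative sketch only: a rotating base-3 code that never repeats a DNA letter.
# Helper names and the six-digit byte layout are assumptions, not the published method.
BASES = "ACGT"


def bytes_to_trits(data: bytes) -> list:
    """Write each byte as six base-3 digits (3**6 = 729, enough for 256 values)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))  # most significant digit first
    return trits


def trits_to_bytes(trits: list) -> bytes:
    """Inverse of bytes_to_trits: read six base-3 digits per byte."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def encode(data: bytes, start: str = "A") -> str:
    """Map each base-3 digit to one of the three letters that differ from the last one."""
    prev = start
    strand = []
    for trit in bytes_to_trits(data):
        choices = [b for b in BASES if b != prev]
        prev = choices[trit]
        strand.append(prev)
    return "".join(strand)


def decode(strand: str, start: str = "A") -> bytes:
    """Recover the digits by asking which of the three allowed letters was used."""
    prev = start
    trits = []
    for base in strand:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits_to_bytes(trits)


message = b"I Have a Dream"
strand = encode(message)
print(strand[:24], "...")
print(decode(strand) == message)
```

A real system would also need to split the strand into short overlapping pieces and add indexing and error detection, which this sketch leaves out.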
This is not the first time DNA has been shown to be an effective method of storing data. Last fall, researchers at Harvard University demonstrated the ability to store 70 billion copies of a book in HTML form in DNA binary code.
The researchers created the binary code through DNA markers to preserve, in DNA, the text of the book Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves.
The Harvard researchers stored 5.5 petabits, or 5.5 million gigabits, per cubic millimeter in the DNA storage medium. Because of the slow process for setting down the data, the researchers consider the DNA storage medium suitable only for data archive purposes -- for now.
"The total world's information, which is 1.8 zettabytes, [could be stored] in about four grams of DNA," Sriram Kosuri, a senior scientist at Harvard's Wyss Institute and senior author of the paper explaining the science, said at the time.
Researchers are pursuing methods of storing data in smaller and smaller packets because of the tremendous growth of data.
During the next eight years, the amount of digital data produced will exceed 40 zettabytes, which is the equivalent of 5,200GB of data for every man, woman and child on Earth, according to the latest Digital Universe study by research firm IDC.
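The per-person figure can be sanity-checked by dividing one number by the other. The Python snippet below is a rough check only; it assumes decimal units and uses illustrative variable names.

```python
# Rough consistency check of the IDC figures, assuming decimal (SI) units.
ZB = 10**21  # bytes in a zettabyte
GB = 10**9   # bytes in a gigabyte

total_bytes = 40 * ZB          # "40 zettabytes"
per_person_bytes = 5_200 * GB  # "5,200GB ... for every man, woman and child"

implied_population = total_bytes / per_person_bytes
print(f"{implied_population / 1e9:.2f} billion people")
```

The two figures together imply a world population of a little under 7.7 billion.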
The majority of data between now and 2020 will not be produced by humans but by machines as they talk to each other over data networks. That would include, for example, machine sensors and smart devices communicating with other devices.
"We've created a code that's error tolerant using a molecular form we know will last in the right conditions for 10,000 years, or possibly longer," Nick said. "As long as someone knows what the code is, you will be able to read it back if you have a machine that can read DNA."
The researchers said the next step in development is to perfect the coding scheme and explore practical aspects, paving the way for a commercially viable DNA storage model.
This graphic details the mind-boggling numbers.
A zettabyte (symbol ZB, derived from the SI prefix zetta-) is a quantity of information or information storage capacity equal to 10^21 bytes, or 1,000 exabytes: one sextillion (one long-scale trilliard) bytes.
As of April 2012, no storage system has achieved one zettabyte of information. The combined space of all computer hard drives in the world was estimated at approximately 160 exabytes in 2006. This has increased rapidly, however: Seagate reported selling 330 exabytes' worth of hard drives during its 2011 fiscal year. As of 2009, the entire World Wide Web was estimated to contain close to 500 exabytes, or half a zettabyte.
1,000,000,000,000,000,000,000 bytes = 1000^7 bytes = 10^21 bytes
The term "zebibyte" (ZiB), using a binary prefix, is used for the corresponding power of 1024.
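The gap between the decimal and binary units is easy to see with exact integer arithmetic. The Python snippet below is a small illustration; the constant names are chosen here and are not standard library identifiers.

```python
# Decimal vs. binary prefixes, using exact integer arithmetic.
ZETTABYTE = 1000**7  # SI prefix: 10^21 bytes
ZEBIBYTE = 1024**7   # binary prefix: 2^70 bytes
TERABYTE = 1000**4   # 10^12 bytes

print(ZETTABYTE == 10**21)               # the two decimal forms agree
print(ZETTABYTE // TERABYTE)             # terabytes in one zettabyte
print(f"{ZEBIBYTE / ZETTABYTE:.4f}")     # how much larger a zebibyte is
```

A zebibyte is about 18 percent larger than a zettabyte, so the distinction matters at this scale.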
Comparisons for scale: A zettabyte is equal to 1 billion terabytes.
The world's technological capacity to receive information through one-way broadcast networks was 0.432 zettabytes of (optimally compressed) information in 1986, 0.715 in 1993, 1.2 in 2000, and 1.9 (optimally compressed) zettabytes in 2007 (this is the informational equivalent to every person on earth receiving 174 newspapers per day).
According to International Data Corporation, the total amount of global data is expected to grow to 2.7 zettabytes during 2012. This is 48% up from 2011.
Mark Liberman calculated the storage requirements for all human speech ever spoken at 42 zettabytes if digitized as 16 kHz 16-bit audio. This was done in response to a popular expression that states "all words ever spoken by human beings" could be stored in approximately 5 exabytes of data (see exabyte for details). Liberman did freely confess that "maybe the authors [of the exabyte estimate] were thinking about text".
Research from the University of Southern California reports that in 2007, humankind successfully sent 1.9 zettabytes of information through broadcast technology such as televisions and GPS.
Research from the University of California, San Diego reports that in 2008, Americans consumed 3.6 zettabytes of information.