HudsonAlpha

Several news outlets this week have reported on the idea of storing the world's data on DNA strands. You can read one of those articles here; below, Neil Lamb, HudsonAlpha's director of educational outreach, offers more insight into exactly how the process would work.

Scientists have developed a method for storing documents, images and sound files inside the strands of the DNA double helix. The technology could open new avenues to keep copies of your favorite photos, that short story you wrote in fifth grade or those home movies of Christmas and birthday parties. Best of all, data stored this way could last for thousands of years and take up less space than a tube of lipstick.

Let’s back up for a moment and discuss storing data. Information, whether from text, image or sound, is digitally encoded as long strings of 0s and 1s. Eight of these digits make up a “byte” of information. A typed page takes roughly 2,000 bytes, while a movie download contains about a billion bytes. It’s been estimated that all of the world’s digital data takes up roughly three zettabytes (a zettabyte is a billion trillion bytes).
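To make the idea of a byte concrete, here is a minimal Python sketch (the function names are our own, for illustration only) that turns a short piece of text into 0s and 1s and back. It assumes the text is stored as UTF-8, where each plain English letter takes exactly one byte.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as a string of 0s and 1s, 8 digits per byte."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse to_bits: read 8 digits at a time and rebuild the text."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


bits = to_bits("DNA")
print(bits)             # 010001000100111001000001
print(len(bits))        # 24: three letters, three bytes, eight digits each
print(from_bits(bits))  # DNA
```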

DNA also uses a code to store information. In this case the code is four chemical “bases”: adenine (A), thymine (T), guanine (G) and cytosine (C). Several years ago, scientists began to look at how the digital code of 1s and 0s could be stored inside DNA. The digital string of 0s and 1s is rewritten as a series of A, T, C and G. (Keep in mind, the DNA fragments used for storage have no biological function and are kept inside a vial rather than inside a cell.) When stored under particular conditions, the DNA is stable for tens of thousands of years. When it’s time to recover the information, the DNA is sequenced and the order of the bases converted back to the corresponding bytes.
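The round trip described above (bits in, bases out, then back again) can be sketched in a few lines of Python. The two-digits-per-base table is our own illustrative assumption, not the encoding used by any particular research group, and `encode`/`decode` are hypothetical names.

```python
# Hypothetical 2-digits-per-base table, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Write each byte as 8 binary digits, then map every pair of digits to a base."""
    bits = "".join(format(byte, "08b") for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """'Sequence' the strand: turn each base back into two digits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```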

Early attempts to store information as DNA code directly mapped 0s and 1s onto the bases; for example, a 0 was represented by A or C and a 1 by T or G. Unfortunately, this approach is problematic when the string of 0s and 1s leads to a repeat in the DNA sequence, like CCCCC. Current DNA sequencing technology struggles to correctly identify these repeat regions, miscounting how many “Cs” are present and introducing errors into the numerical data.
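A short Python sketch shows how a direct mapping can produce the kind of repeat described above. For simplicity it assumes the encoder always writes C for a 0 and G for a 1 (one of the options in the early scheme); real encoders could alternate between the two allowed bases, and the function names here are hypothetical.

```python
from itertools import groupby

# Simplest possible direct mapping (an assumption for illustration):
# every 0 becomes C and every 1 becomes G, one base per binary digit.
DIGIT_TO_BASE = {"0": "C", "1": "G"}


def direct_encode(data: bytes) -> str:
    """Map each binary digit of the data straight onto a base."""
    bits = "".join(format(byte, "08b") for byte in data)
    return "".join(DIGIT_TO_BASE[d] for d in bits)


def longest_run(dna: str) -> tuple[str, int]:
    """Return the base with the longest stretch of identical repeats and its length."""
    return max(((base, len(list(group))) for base, group in groupby(dna)),
               key=lambda pair: pair[1])


strand = direct_encode(b"\x00\xff")
print(strand)               # CCCCCCCCGGGGGGGG
print(longest_run(strand))  # ('C', 8): a run a sequencer may miscount
```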

Here’s where the recent media attention comes into play. Nick Goldman and colleagues at the European Bioinformatics Institute in the UK have devised a method to minimize the likelihood of copying errors. Rather than use a direct link between 0s and 1s and DNA bases, they created an intermediate code that prevents repeating bases. To further reduce errors, the original code is split into fragments four different ways, with the breakpoints occurring at different locations each time. This way, if an error does occur, other copies of the same region can be used for comparison.
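Here is a minimal Python sketch of the two ideas in the paragraph above, written under simplifying assumptions rather than as the team's published method. Each byte is first rewritten as six base-3 digits (the published scheme used a more compact code, so treat this step as a stand-in), and each digit then picks one of the three bases that differ from the previous base, so no base can ever repeat. The `fragment` helper assumes 100-base fragments starting every 25 bases, which places each base away from the ends in four overlapping fragments with staggered breakpoints. All names, the starting base "A" and the fragment sizes are hypothetical.

```python
ORDER = "ACGT"  # cyclic order used to pick "the next three bases" (assumption)


def byte_to_trits(byte: int) -> list[int]:
    """Write one byte as six base-3 digits (3**6 = 729 > 256), most significant first."""
    trits = []
    for _ in range(6):
        byte, remainder = divmod(byte, 3)
        trits.append(remainder)
    return trits[::-1]


def encode(data: bytes, start: str = "A") -> str:
    """Each digit (0, 1 or 2) selects one of the three bases that differ from the previous one."""
    prev, out = start, []
    for byte in data:
        for trit in byte_to_trits(byte):
            prev = ORDER[(ORDER.index(prev) + 1 + trit) % 4]
            out.append(prev)
    return "".join(out)


def decode(dna: str, start: str = "A") -> bytes:
    """Read each digit back from the step between neighbouring bases, then rebuild the bytes."""
    trits, prev = [], start
    for base in dna:
        trit = (ORDER.index(base) - ORDER.index(prev) - 1) % 4
        if trit == 3:  # only happens if a base repeats, which encode() never produces
            raise ValueError("repeated base: likely a sequencing error")
        trits.append(trit)
        prev = base
    out = []
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def fragment(dna: str, length: int = 100, step: int = 25) -> list[tuple[int, str]]:
    """Cut overlapping fragments that start every `step` bases; the last ones may be shorter."""
    starts = range(0, max(len(dna) - length, 0) + step, step)
    return [(s, dna[s:s + length]) for s in starts]


def coverage(pieces: list[tuple[int, str]], position: int) -> int:
    """Count how many fragments contain a given position of the original strand."""
    return sum(start <= position < start + len(piece) for start, piece in pieces)


strand = encode(b"Hang on to your CDs, DVDs and thumb drives")
assert all(a != b for a, b in zip(strand, strand[1:]))  # no base ever repeats
print(decode(strand))            # b'Hang on to your CDs, DVDs and thumb drives'
pieces = fragment(strand)
print(len(strand), len(pieces))  # 252 8
print(coverage(pieces, 126))     # 4: this region is stored in four fragments
```

Because a correctly encoded strand never contains two identical bases in a row, the decoder can treat any repeat it sees as a sign of a reading error and fall back on the overlapping copies.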

The scientific team encoded multiple files, including part of an MP3 recording of Martin Luther King’s “I Have a Dream” speech, a text file of all the sonnets of William Shakespeare and a PDF of the 1953 paper by Francis Crick and James Watson describing the structure of DNA. All told, 757,000 bytes of information were encoded on over 153,000 DNA fragments. The scientists estimate their approach, which is described online in the journal Nature, can store over two petabytes (or two million billion bytes) of information on a single gram of DNA. That’s a mind-boggling amount of information contained in something about the size of 15 grains of sugar.
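To put these figures in perspective, here is a back-of-the-envelope calculation in Python using only the numbers quoted in this post. It assumes the two-petabytes-per-gram density applies unchanged at world scale, which is a large simplification.

```python
# Figures quoted in the post.
bytes_encoded = 757_000
fragments = 153_000
bytes_per_gram = 2 * 10**15     # "over two petabytes ... on a single gram of DNA"
world_data_bytes = 3 * 10**21   # "roughly three zettabytes"

# Average share of the original files carried by each fragment
# (small partly because every region is stored in four overlapping fragments).
print(round(bytes_encoded / fragments, 1))        # 4.9 bytes

# Grams of DNA needed for all the world's data, converted to kilograms.
print(world_data_bytes / bytes_per_gram / 1000)   # 1500.0 kg
```

By this rough estimate, all of the world's digital data would fit in at most about 1.5 metric tons of DNA.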

Speed and cost are the two biggest drawbacks to DNA-based storage. It took four days to synthesize the code into DNA and the process of sequencing and decoding the fragments required two weeks. The synthesis and decoding process costs $12,620 per megabyte of information – millions of times more expensive than storing data on magnetic tape. However, as technology continues to improve, both the price and timeframe are expected to drop dramatically. If current trends continue, the researchers estimate that in less than a decade DNA-based storage will be cost-effective for information stored 50 years or more. This could be especially useful for long-term archiving of governmental, historical or scientific data that only rarely would be accessed.
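The quoted price adds up quickly. This Python sketch assumes the $12,620-per-megabyte figure scales linearly to any amount of data and treats a megabyte as 1,000,000 bytes; both are simplifications, and `dna_cost` is a hypothetical helper.

```python
COST_PER_MEGABYTE = 12_620  # US dollars, figure quoted in the post


def dna_cost(num_bytes: int) -> float:
    """Synthesis-plus-decoding cost, assuming 1 MB = 1,000,000 bytes and linear scaling."""
    return num_bytes / 1_000_000 * COST_PER_MEGABYTE


print(round(dna_cost(757_000)))        # 9553: roughly the files in this experiment
print(round(dna_cost(1_000_000_000)))  # 12620000: one ~1 GB movie download
```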

If you’ve ever had to search for a way to pull data from an old floppy disk, zip drive or VHS tape, you know how quickly digital storage technologies change. The researchers note DNA has been storing biological information for more than 3 billion years, meaning the odds are high it will be around in the future, available for conversion into whatever new technology civilizations are using to share data. Hang on to your CDs, DVDs and thumb drives a little bit longer, but this technology is certainly worth watching.

Dr. Neil Lamb is HudsonAlpha's director of educational outreach. Trained as a human geneticist, he now focuses his energy on creating programs and activities that help Alabama's teachers, students and the public understand genetics and biotechnology.

So here’s a thought process I had a few months ago. It’s a bit of a trail, but just follow me.

I was reading an old edition of Edutopia’s Summer Rejuvenation Guide, which contained a “Try Something New” section discussing Pecha Kucha nights:

Pecha Kucha is the onomatopoeic Japanese word for the sound of conversation. The equivalent English term is chit-chat. However, it tends to carry a slightly negative connotation like chitter-chatter or a frivolous exchange of words. (From Wikipedia).


According to Edutopia, Pecha Kucha is a fast-paced speaking format in which speakers have only a few minutes and a few slides to make a point (the classic format is 20 slides shown for 20 seconds each). Interesting, right? The article also mentioned Ignite, a competitive version of Pecha Kucha, with events all over the world. Ignite’s tagline is “enlighten us, but make it quick.”

My brain is now fully engaged, so I follow the links to watch a few streamed videos at Ignite Show.*

Warning: these will suck you in completely. With titles such as “Commutapult”, “How to Fight Dirty in Scrabble” and “The Secret History of Fonts”, it’s easy to waste lots of time. I learned about “Botanicalls”, an iPhone app that tells you when to water your plants, and what physical computing really is, and why they are teaching ethics using social media in Australia, and how to get the best deal on a new car... I said they would suck you in.

In all seriousness, these videos feature people talking about what they’re passionate about. They are pointed, opinionated, funny, short, and I’m learning the whole time.

My brain immediately jumps to real-life classroom applications. I’m thinking my colleagues would be wicked good at this. Can we craft a few genetics/genomics lessons that take advantage of available media and a YouTube attention span?

*Ignite is not an educational website. Not all content is appropriate for the classroom.

Madelene Loftin works as an education specialist at HudsonAlpha. She was named Mississippi's Outstanding Biology Teacher of the Year in 2008. Since joining HudsonAlpha, she's been encouraging Alabama students to pursue careers in science while inspiring science teachers to be better educators.


Friday, February 1, 2013

Could DNA store all the world's data?

Monday, January 28, 2013

A thought process

Tuesday, November 27, 2012

UAB epigenetics symposium
