A study published in Nature last summer has revealed that the vast, uncharted regions of the human genome are far more complex than previously supposed. “Junk DNA,” the noncoding sequences that make up the bulk of the genome’s 3 billion letters, may have a purpose after all. Now the challenge is to figure out what all that DNA is for. Doing so may prove crucial for understanding complex human diseases.

“We’re trying to map out what’s there,” said Michael Snyder, Ph.D., professor of molecular, cellular and developmental biology.

Snyder’s lab is part of the Encyclopedia of DNA Elements (ENCODE) project, a mammoth undertaking of the National Human Genome Research Institute (NHGRI) at the National Institutes of Health (NIH) involving 35 groups of researchers at 80 institutions in 11 nations. Researchers have spent the last four years sifting through more than 400 million data points to make sense of just 1 percent of the human genome. Their analysis has turned up some surprises.

For one thing, the genome hosts a lot more activity than expected. Conventional wisdom has long held that only the important pieces of DNA, the readily decipherable genes making up 1.5 percent of the genome, are copied into RNA via a process called transcription. RNA in turn instructs the cell to make proteins. Scientists have generally assumed that each gene is transcribed into one RNA fragment and that the remaining gene-free portions of our DNA aren’t transcribed at all.

Not so, according to the ENCODE project. Most letters in the genomic instruction manual wind up being transcribed. A gene is often transcribed along with a surprisingly large number of nonprotein-coding (NPC) sequences to produce some extraordinarily long RNA fragments. A single gene can also be transcribed into many different RNA fragments of varying lengths. The purpose of all these extra transcripts remains unclear.

Even more perplexing is the prevalence of RNA molecules transcribed entirely from gene-free portions of the genome. NPC RNA transcripts were previously known to exist, but the ENCODE project identified many new ones. Again, their purpose is unknown.

Snyder is even more excited that the project has identified new regulatory regions that do not encode proteins but instead control when, where and to what extent genes are expressed. Recent studies have linked complex diseases with variations in NPC regions of the human genome that could have regulatory functions. Might variations in NPC DNA promote disease by interfering with the expression of genes at distal sites?

Snyder and his collaborators hope that the project will answer such questions. “This is really what the ENCODE project is all about,” he said.