The Case for Junk DNA

Genomes are like books of life. But until recently, their covers were locked. Now we can finally open the books and page through them. But we only have a modest understanding of what we’re actually seeing. We are still not sure how much of our genome encodes information that is important to our survival, and how much is just garbled padding.

Today is a good day to dip into the debate over what the genome is made of, thanks to the publication of an interesting commentary from Alex Palazzo and Ryan Gregory in PLOS Genetics. It’s called “The Case for Junk DNA.”

The debate over the genome can get dizzying. I find the best antidote to the vertigo is a little history. This history starts in the early 1900s.

At the time, geneticists knew that we carry genes–factors passed down from parents to offspring that influence our bodies–but they didn’t know what genes were made of.

That changed by the 1950s, when scientists established that genes are made of DNA. Our DNA is a string of units called bases. Our cells read the bases in a stretch of DNA–a gene–and build a molecule called RNA with a corresponding sequence. The cells then use the RNA as a guide to build a protein. Our bodies contain many different proteins, which give them structure and carry out jobs like digesting food.

But in the 1950s, scientists also began to discover bits of DNA outside the protein-coding regions that were important too. These so-called regulatory elements acted as switches for protein-coding genes. A protein latching onto one of those switches could prompt a cell to make lots of proteins from a given gene. Or it could shut down the gene completely.

Meanwhile, scientists were also finding pieces of DNA in the genome that appeared to be neither protein-coding genes nor regulatory elements. In the 1960s, for example, Roy Britten and David Kohne found hundreds of thousands of repeating segments of DNA, each of which turned out to be just a few hundred bases long. Many of these repeating sequences were the product of virus-like stretches of DNA. These pieces of “selfish DNA” made copies of themselves that were inserted back in the genome. Mutations then reduced them into inert fragments.

Other scientists found extra copies of genes that had mutations preventing them from making proteins–what came to be known as pseudogenes.

The human genome, we now know, contains about 20,000 protein-coding genes. That may sound like a lot of genetic material. But it only makes up about 2 percent of the genome. Some plants are even more extreme. While we have about 3.2 billion bases in our genomes, onions have 16 billion, mostly consisting of repeating sequences and virus-like DNA.
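To put those numbers side by side, here is a back-of-the-envelope calculation in Python. It uses only the rounded figures quoted above, so the results are rough, and the variable names are my own.

```python
# Rounded figures quoted in the text above.
HUMAN_GENOME_BASES = 3.2e9      # about 3.2 billion bases
ONION_GENOME_BASES = 16e9       # about 16 billion bases
PROTEIN_CODING_FRACTION = 0.02  # about 2 percent of the human genome

# How many bases sit in protein-coding genes, and how many do not.
coding_bases = HUMAN_GENOME_BASES * PROTEIN_CODING_FRACTION
noncoding_bases = HUMAN_GENOME_BASES - coding_bases

# How much bigger the onion genome is than ours.
onion_to_human = ONION_GENOME_BASES / HUMAN_GENOME_BASES

print(f"Protein-coding bases: about {coding_bases / 1e6:.0f} million")
print(f"Everything else: about {noncoding_bases / 1e9:.2f} billion")
print(f"Onion genome vs. human genome: about {onion_to_human:.0f}x")
```

In other words, the protein-coding portion comes to tens of millions of bases, while the remainder runs to billions.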

The rest of the genome became a mysterious wilderness for geneticists. They would go on expeditions to map the non-coding regions and try to figure out what they were made of.

Some segments of DNA turned out to have functions, even if they didn’t encode proteins or serve as switches. For example, sometimes our cells make RNA molecules that don’t simply serve as templates for proteins. Instead, they have jobs of their own, such as sensing chemicals in the cell. So those stretches of DNA are considered genes, too–just not protein-coding genes.

With the exploration of the genome came a bloom of labels, some of which came to be used in confusing–and sometimes careless–ways. “Non-coding DNA” came to be a shorthand for DNA that didn’t encode proteins. But non-coding DNA could still have a function, such as switching off genes or producing useful RNA molecules.

Scientists also started referring to “junk DNA.” Different scientists used the term to refer to different things. The Japanese geneticist Susumu Ohno used the term when developing a theory for how DNA mutates. Ohno envisioned protein-coding genes being accidentally duplicated. Later, mutations would hit the new copies of those genes. In a few cases, the mutations would give the new gene copies a new function. In most, however, they just killed the gene. He referred to the extra useless copies of genes as junk DNA. Other people used the term to refer broadly to any piece of DNA that didn’t have a function.

And then–like crossing the streams in Ghostbusters–junk DNA and non-coding DNA got mixed up. Sometimes scientists discovered a stretch of non-coding DNA that had a function. They might clip out the segment from the DNA in an egg and find that the egg couldn’t develop properly. BAM!–there was a press release declaring that non-coding DNA had long been dismissed as junk, but lo and behold, non-coding DNA can do something after all.

Given that regulatory elements were discovered in the 1950s (the discovery was recognized with Nobel Prizes), this is just illogical.

Nevertheless, a worthwhile question remained: how much of the genome had a function? How much was junk?

To Britten and Kohne, the idea that repeating DNA was useless was “repugnant.” Seemingly on aesthetic grounds, they preferred the idea that it had a function that hadn’t been discovered yet.

Others, however, argued that repeating DNA (and pseudogenes and so on) was just junk–vast vestiges of disabled genetic material that we carry down through the generations. If the genome was mostly functional, then it was hard to see why it takes five times more functional DNA to make an onion than a human–or to explain the huge range of genome sizes.

In recent years, a consortium of scientists carried out a project called the Encyclopedia of DNA Elements (ENCODE for short) to classify all the parts of the genome. To see if non-coding DNA was functional, they checked for proteins that were attached to it–possibly switching on regulatory elements. They found a lot of them.

“These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions,” they reported.

Science translated that conclusion into a headline, “ENCODE Project writes eulogy for junk DNA.”

A lot of defenders of junk have attacked this conclusion–or, to be more specific, how the research got translated into press releases and then into news articles. In their new review, Palazzo and Gregory present some of the main objections.

Just because proteins grab onto a piece of DNA, for example, doesn’t actually mean that there’s a gene nearby that is going to make something useful. It could just happen to have the right sequence to make the proteins stick to it.

And even if a segment of DNA does give rise to RNA, that RNA may not have a function. The cell may accidentally make RNA molecules, which it then chops up.

If I had to guess why Britten and Kohne found junk DNA repugnant, it probably had to do with evolution. Darwin, after all, had shown how natural selection can transform a population, and how, over millions of years, it could produce adaptations. In the 1900s, geneticists turned his idea into a modern theory. Genes that boosted reproduction could become more common, while ones that didn’t could be eliminated from a population. You’d expect that natural selection would have left the genome mostly full of functional stuff.

Palazzo and Gregory, on the other hand, argue that evolution should produce junk. The reason has to do with the fact that natural selection can be quite weak in some situations. The smaller a population gets, the less effective natural selection is at favoring beneficial mutations. In small populations, a mutation can spread even if it’s not beneficial. And compared to bacteria, the population of humans is very small. (Technically speaking, it’s the “effective population size” that’s small, which is not the same thing as a simple head count.) When non-functional DNA builds up in our genome, it’s harder for natural selection to strip it out than if we were bacteria.
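For readers who like to see the effect in numbers, here is a minimal sketch of how population size changes the fate of a slightly harmful mutation. It uses Kimura's textbook diffusion approximation for the chance that a single new mutation eventually spreads through a whole diploid population. That formula is my choice for illustration, not something taken from Palazzo and Gregory's paper, and the function name, the selection coefficient, and the two population sizes are all hypothetical values picked to show the contrast.

```python
import math

def fixation_probability(s: float, n_e: float) -> float:
    """Chance that one new mutation eventually spreads to everyone.

    Kimura's diffusion approximation for a diploid population, assuming
    the census size equals the effective size n_e. The selection
    coefficient s is negative for a harmful mutation, zero for a neutral one.
    """
    if s == 0:
        return 1 / (2 * n_e)  # pure chance (genetic drift)
    x = 4 * n_e * s
    if s > 0:
        return -math.expm1(-2 * s) / -math.expm1(-x)
    # Same formula rearranged so exp() cannot overflow when s is negative.
    return -math.expm1(-2 * s) * math.exp(x) / math.expm1(x)

# Hypothetical values: a very slightly harmful mutation (say, a useless
# extra stretch of DNA) in a smaller and a larger population.
SLIGHT_COST = -1e-5
for n_e in (1e4, 1e6):
    harmful = fixation_probability(SLIGHT_COST, n_e)
    neutral = fixation_probability(0, n_e)
    print(f"N_e = {n_e:.0e}: harmful/neutral fixation ratio = {harmful / neutral:.2e}")
```

With these made-up numbers, the slightly harmful mutation in the smaller population spreads nearly as often as a neutral one would, while in the larger population selection all but guarantees its removal.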

While junk is expected, a junk-free genome is not. Palazzo and Gregory based this claim on a concept with an awesome name: mutational meltdown.

Here’s how it works. A population of, say, frogs is reproducing. Every time they produce a new tadpole, that tadpole gains a certain number of mutations. A few of those mutations may be beneficial. The rest will be neutral or harmful. If harmful mutations emerge at a rate that’s too fast for natural selection to weed them out, they’ll start to pile up in the genome. Overall, the population will get sicker, producing fewer offspring. Eventually the mutations will drive the whole population to extinction.

Mutational meltdown puts an upper limit on how many genes an organism can have. If a frog has 10,000 genes, those are 10,000 potential targets for a harmful mutation. If the frog has 100,000 genes, it has ten times more targets.

Estimates of the human mutation rate suggest that somewhere between 70 and 150 new mutations strike the genome of every baby. Based on the risk of mutational meltdown, Palazzo and Gregory estimate that only ten percent of the human genome can be functional.* The other ninety percent must be junk DNA. If a mutation alters junk DNA, it doesn’t do any harm because the junk isn’t doing us any good to begin with. If our genome were 80 percent functional–the figure batted around when the ENCODE project results first came out–then we should be extinct.
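The arithmetic behind that contrast is easy to sketch. The snippet below is not Palazzo and Gregory's actual calculation. It simply assumes that new mutations fall uniformly at random across the genome and asks how many would land in functional DNA under each scenario. Only some of those hits would actually be harmful, so treat the results as upper bounds; the function name is hypothetical.

```python
def hits_in_functional_dna(new_mutations: float, functional_fraction: float) -> float:
    """Expected number of new mutations landing in functional DNA,
    assuming mutations fall uniformly at random across the genome."""
    return new_mutations * functional_fraction

# Range of new mutations per baby quoted in the text above.
MUTATIONS_PER_BABY = (70, 150)

for label, fraction in [("10% functional", 0.10), ("80% functional", 0.80)]:
    low, high = (hits_in_functional_dna(m, fraction) for m in MUTATIONS_PER_BABY)
    print(f"{label}: roughly {low:.0f} to {high:.0f} hits per baby")
```

Under the ten percent scenario, each baby carries somewhere around 7 to 15 new mutations in functional DNA. Under the 80 percent scenario, that rises to around 56 to 120, a far heavier burden for natural selection to clear every generation.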

It may sound wishy-washy for me to say this, but the junk DNA debates will probably settle somewhere in between the two extremes. Is the entire genome functional? No. Is everything aside from protein-coding genes junk? No–we’ve already known that non-coding DNA can be functional for over 50 years. Even if “only” ten percent of the genome turns out to be functional, that’s a huge collection of DNA. It’s about five times bigger than the DNA found in all our protein-coding genes. There could be thousands of RNA molecules scientists have yet to understand.

Even if ninety percent of the genome does prove to be junk, that doesn’t mean the junk hasn’t played a role in our evolution. As I wrote last week in the New York Times, it’s from these non-coding regions that many new protein-coding genes evolve. What’s more, much of our genome is made up of viruses, and every now and then evolution has, in effect, harnessed those viral genes to carry out a job for our own bodies. The junk is a part of us, and it, too, helps to make us what we are.

*I mean functional in terms of its sequence. The DNA might still do something important structurally–helping the molecule bend in a particular way, for example.

[Update: Fixed caption. Tweaked the last paragraph to clarify that it’s not a case of teleology.]