In the mid-2000s, David Markovitz, a scientist at the University of Michigan, and his colleagues took a look at the blood of people infected with HIV. Human immunodeficiency viruses kill their hosts by exhausting the immune system, which allows all sorts of pathogens to sweep into their hosts’ bodies. So it wasn’t a huge surprise for Markovitz and his colleagues to find other viruses in the blood of the HIV patients. What was surprising was where those other viruses had come from: the patients’ own DNA.
HIV belongs to a class of viruses called retroviruses. All retroviruses share three genes. One, called gag, gives rise to the inner shell where the virus’s genes are stored. Another, called env, makes knobs on the outer surface of the virus that allow it to latch onto cells and invade them. A third, called pol, makes an enzyme that inserts the virus’s genes into its host cell’s DNA.
It turns out that the human genome contains segments of DNA that match gag, pol, and env. Lots of them. Scientists have identified 100,000 pieces of retrovirus DNA in our genome, together making up eight percent of it. That’s a huge portion of our DNA when you consider that protein-coding genes make up just over one percent of the genome.
Scientists have studied these so-called endogenous retroviruses both in humans and in other species, and the evidence all points to the same scenario for how they genetically merged with us. Our ancestors were infected with retroviruses on a regular basis. On rare occasions, a virus infected a sperm or egg and managed to end up in an embryo. Every new cell in the embryo inherited the retrovirus DNA implanted in its genome. The embryo then grew into an adult, which had offspring of its own and passed the virus DNA on to them as well.
At first, the virus still retained some of its old powers. Its DNA could sometimes still give rise to new viruses. Over time, mutations arose in the viral genes, and some of them may have prevented the virus from making shells. Yet the dying virus could still make a new copy of its genes and insert it back into its host’s genome. That would explain why it’s possible to classify our many endogenous retroviruses into different families. Each family is made up of copies descended from a single ancestral virus.
Eventually, however, the endogenous retroviruses got so hobbled by mutations that they became nothing more than baggage. (In some cases, we’ve domesticated their genes, co-opting them for our own functions, such as building a placenta.) Given that many matching endogenous retroviruses can be found in other primates, this process has been going on for millions of years, even tens of millions.
The world of our inner viruses is still a murky, mysterious one that scientists are only beginning to survey. Markovitz’s discovery enabled him to add considerably to our understanding of these shadowy creatures. He discovered new members of a particularly interesting class of endogenous retroviruses: ones that, even today, can still have life breathed into them.
Markovitz and his colleagues analyzed the sequence of the virus genes they found in the patients with HIV. The genes belonged to a family of endogenous retroviruses called HERV-K, but they were not quite like any HERV-K virus found before.
The Michigan scientists wondered if this new HERV-K virus was hidden in the human genome. They checked the most complete draft of the human genome and couldn’t find a match. They knew that the human genome sequence was only about 95% finished, so they turned instead to the chimpanzee genome, on the off chance that the virus had infected the common ancestor of humans and chimpanzees over six million years ago. Bingo: a single copy of the virus turned up in the chimp genome. They dubbed it K111.
Having found this match, the scientists decided to return to the human genome and search for K111. They isolated DNA from their HIV patients, as well as from healthy people. They then split apart the two strands of the DNA and added a short piece of DNA that would bind to K111, should it be lurking there. In all 189 of their subjects, the scientists found the virus’s DNA.
Remarkably, though, the scientists didn’t find just one copy of K111 in each of their subjects’ genomes, as is the case in chimps. The more the scientists looked, the more variants they found. Some K111 viruses were fairly intact, while others were vestiges. The scientists found over 100 copies of the virus in the human genome, scattered across fifteen chromosomes.
To figure out the origin of K111, the scientists looked back at other primates. They couldn’t find a version of K111 in any species other than chimpanzees. They concluded that the virus infected our ancestors not long before the split between humans and chimpanzees roughly six million years ago.
To find out what happened next, Markovitz and his colleagues turned to the genomes of extinct humans. Svante Pääbo of the Max Planck Institute and his colleagues have sequenced the Neanderthal genome, as well as the genome of a lineage of mysterious cousins of Neanderthals known as Denisovans. Our own ancestors diverged from those of Neanderthals and Denisovans about 800,000 years ago. Markovitz and his colleagues looked for K111 in these genomes, and there it was. The scientists found seven copies of K111 in Neanderthal DNA and four in the Denisovan genome.
This finding suggests that between 6 million and 800,000 years ago, K111 was duplicated a few times at a fairly slow pace. It’s possible that Markovitz and his colleagues missed some other copies because the reconstruction of those ancient genomes wasn’t quite accurate enough for their search. But even if we generously assume that Neanderthals and Denisovans had twenty K111 viruses apiece, that’s still a small fraction of the 100 or more copies of K111 the scientists found in the human genome. It was only later, in the past 800,000 years, that K111 started proliferating at a faster pace.
One reason that K111 has gone overlooked till now is that it found a good place to hide: the center of chromosomes. This region, called the centromere, is a genomic Bermuda Triangle. It’s loaded with short, repetitive stretches of DNA. When scientists reconstruct the sequence of a genome, they break the DNA into many short fragments, sequence them, and then try to piece the fragments back together by matching up the places where they overlap. Centromere DNA is so similar to itself that it’s easy to line up fragments in many different arrangements. As a result, centromeres make up much of the last 5% of the human genome that has yet to be mapped.
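To see why repeats defeat this kind of reconstruction, consider a toy sketch in Python. It makes heavy simplifying assumptions. The sequences are made up. Every fragment is an exact, fixed-length "read" taken from every position. Real assemblers are far more sophisticated than this. The sketch builds two different arrangements of the same pieces around a repeated stretch and shows that short reads from both arrangements are identical, so the reads alone can't reveal which arrangement is real.

```python
from collections import Counter

def read_counts(genome: str, k: int) -> Counter:
    """Count every overlapping length-k fragment ("read") of a sequence."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))

# Hypothetical toy sequences: R stands in for a repeated stretch of
# centromere-like DNA; a, b, c, d are unique stretches around it.
R = "ACGTACGTAC"
a, b, c, d = "TTTTT", "GGGGG", "CCCCC", "AAAAA"

# Two different arrangements of the same pieces: b and c trade places.
genome_1 = a + R + b + R + c + R + d
genome_2 = a + R + c + R + b + R + d

# Short reads (6 bases, shorter than the 10-base repeat) come out
# exactly the same for both arrangements.
print(read_counts(genome_1, 6) == read_counts(genome_2, 6))    # True

# Reads long enough to span the whole repeat plus one base on each
# side can tell the two arrangements apart.
print(read_counts(genome_1, 12) == read_counts(genome_2, 12))  # False
```

A read that falls entirely inside the repeat, or straddles only one of its edges, looks the same no matter which unique stretch sits on the repeat's far side. That ambiguity is why a distinctive landmark inside the repeats would help.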
Another reason K111 has been able to hide for so long is that it’s fairly feeble. It lacks genes to make shells, so it can’t escape from its host cells anymore. In fact, our own centromeres appear to have made all the extra copies of K111. The repeating DNA in centromeres is not just tricky for human gene sequencers. It’s also tricky for the enzymes in a cell that make new copies of our DNA. They can slip up and accidentally swap segments between two chromosomes. K111 was thus able to spread from the centromere of one chromosome to another. Our cells also sometimes stutter when they try to copy centromere DNA, making extra copies of segments there. Markovitz and his colleagues argue that this is how new copies of K111 proliferated within each centromere.
Ironically, it was the HIV in the patients Markovitz and his colleagues studied that brought K111 back to light. When people get infected with HIV, the virus makes a protein called Tat, which uncoils tightly wound stretches of human DNA and so allows its host cell to make more HIV at a faster rate.
Markovitz and his colleagues wondered if the Tat in their HIV-infected patients was spurring cells to also make copies of K111. To find out, they injected Tat proteins into human cells that were free of HIV. As they predicted, out came new genes for K111.
It’s conceivable that K111 interacts with HIV to contribute to AIDS, but Markovitz and his colleagues found no evidence of that. It’s certainly worth investigating further. But there’s another reason to keep learning about K111. Now that scientists have discovered K111, they can look for more copies of it in centromeres. Markovitz suggests that their distinctive genes might serve as a kind of genetic barcode that could help genome mappers orient themselves in the hall of mirrors that is centromere DNA. Perhaps the human genome sequence will finally be completely mapped thanks to a virus that has been hiding in it for six million years.
(For more information, see my book, A Planet of Viruses.)
Originally published May 10, 2013. Copyright 2013 Carl Zimmer.