Stalking the Perfect Tree

When Charles Darwin was thrashing out his theory of evolution, he would doodle sometimes in his notebooks. To explain how new species came into existence, he wrote down letters on a page and then connected them with branches. In the process, he created a simple tree. Across the top of the page, he wrote, “I think.”

That single tree has given rise to the thousands of trees that are published in scientific journals these days. A particular tree may show that humans are more closely related to chimpanzees than to gorillas. It might show how the SARS virus in humans descends from viruses in other animals.

When you look at the picture of a tree in a scientific paper, it is easy to take it as an illustration of an unadorned fact. That is not, however, how science works. A tree represents a hypothesis that offers the best explanation of the data at hand. It shows the most likely pattern by which new species might have branched off one another, taking on new traits along the way, and giving rise to the range of species a scientist is studying.

These hypotheses are not simple to come by. Large-scale studies of phylogeny only became possible when computers began turning up on the desks of biologists. You need that computing power because even in a simple comparison of a dozen species, there are so many alternative trees to test. Say you have three species, A, B, and C. A and B might be more closely related, or maybe A and C, or B and C. Three choices. But as you add more species, the possibilities explode to millions and more. Sifting through those possibilities takes both gigaflops and smart statistics.
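To get a feel for how fast the possibilities pile up, here is a small sketch. It assumes rooted trees in which every branch point splits in exactly two and every tip carries a species name; under that assumption the number of distinct trees for n species is the product of the odd numbers from 3 up to 2n - 3. The function name is my own invention for illustration.

```python
def num_rooted_trees(n_species):
    """Count rooted, strictly two-way-branching trees for n labeled species.

    Assumption: every internal node has exactly two descendants, so the
    count is (2n - 3)!! = 3 * 5 * 7 * ... * (2n - 3).
    """
    if n_species < 2:
        raise ValueError("need at least two species to draw a tree")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product when n == 2).
    for odd in range(3, 2 * n_species - 2, 2):
        count *= odd
    return count


if __name__ == "__main__":
    for n in (3, 4, 10, 12):
        print(n, "species:", num_rooted_trees(n), "possible trees")
```

Three species give the three choices described above. By a dozen species, the count has passed 13 billion, which is why nobody sorts through these trees by hand.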

From life’s long reign, we have relatively few pieces of information to figure out the shape of its tree. The first evolutionary biologists to draw trees could only compare features that they could see through a microscope or on a fossilized skeleton. These days, most trees are based on genes. Once scientists could sequence genes, they tapped into a far richer lode of information than previous generations could reach. What’s more, genes offer a much crisper picture of the evolutionary process than, say, a horn or a petal. After all, mutations to genes lead to inherited changes in how the body develops. Whereas the change to the body may be hard to tease out, the mutation may be as simple as snipping out a few nucleotides in a gene sequence.

But gene trees are not unadorned facts, either. Some genes have evolved relatively quickly, so that if you compare them in different species that took millions and millions of years to diverge, they may offer a distorted picture of how those species are related. On the other hand, a gene that evolves too slowly may not be able to distinguish the fine details of a recent explosion (like the cichlids of Africa that I wrote about recently). In bacteria and other single-celled organisms, the picture gets even fuzzier when you consider the fact that they can trade genes with one another, rather than just inheriting them from ancestors. In some regions, the tree of life is more like a mangrove, with branches grafting together rather than splitting apart.

One convenient thing about building evolutionary trees is that you can get an idea of how much confidence you can have in them. One way is to pick out a random subset of your data to base a new tree on. In some cases, the switch may produce a tree with a different shape. Perhaps just one section of it changes. Or perhaps the tree barely changes at all. By repeatedly testing the evidence in different combinations, it’s possible to estimate how likely it is that each branch point is authentic.
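Here is a rough sketch of that resampling idea, in the flavor usually called the bootstrap, where sites are drawn at random with replacement rather than as a strict subset. Everything in it is a simplification of my own: the sequences are made up, and instead of building a full tree, the hypothetical `closest_pair` helper just asks which two sequences differ at the fewest sites. Real phylogenetic software does far more, but the logic of the confidence score is the same.

```python
import random


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_pair(alignment):
    """Stand-in for tree building: the two names with the fewest differences."""
    names = sorted(alignment)
    pairs = [(names[i], names[j])
             for i in range(len(names))
             for j in range(i + 1, len(names))]
    return min(pairs, key=lambda p: hamming(alignment[p[0]], alignment[p[1]]))


def bootstrap_support(alignment, replicates=1000, seed=0):
    """Fraction of resampled data sets that recover the original closest pair."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    reference = closest_pair(alignment)
    hits = 0
    for _ in range(replicates):
        # Draw alignment columns at random, with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {name: "".join(seq[c] for c in cols)
                     for name, seq in alignment.items()}
        if closest_pair(resampled) == reference:
            hits += 1
    return reference, hits / replicates


# Made-up sequences, purely for illustration.
toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGAACTTAT"}

if __name__ == "__main__":
    pair, support = bootstrap_support(toy)
    print("closest pair:", pair, "support:", support)
```

A support value near 1 means the grouping survives almost any reshuffling of the evidence. A value near one half means the data are barely leaning one way.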

Gene trees have shed a lot of light on the history of life. Just to pick one case among many, several different studies have strongly supported the notion that hippos are the closest relatives to whales on land. But these studies are like telescopes for looking back in evolutionary time, and they are only as precise as their design allows. Studying one gene in all animals may give you a different picture of animal evolution than studying a different one. It’s not as if one gene will point to snapdragons as the closest relative of fish, or link mushrooms and monkeys. But it can get hard to determine whether comb jellies are more closely related to jellyfish or to crustaceans, vertebrates, and other more complex animals. This may sound esoteric, but it’s not really. If comb jellies are closer to us, scientists could find some important clues in them about our own evolution. If they’re out on a more distant branch, they aren’t so important to our own evolutionary story.

In recent years, some scientists have argued that the best way to bring the evolutionary telescope into tighter focus is to study a bunch of genes at once. Fortunately, in this age of genomics, we’re swimming in genes. Scientists have just started running studies in which they compare dozens of genes in various species. The results have been promising. But until now, no one had looked systematically at how much help multiple genes could offer to unsolved mysteries in phylogeny.

All of this is a very long preamble to a fascinating study in Nature this week from Sean Carroll at the University of Wisconsin and some of his current and former students. They looked at seven species of yeast, all of whose genomes have been fully sequenced in recent years. They picked out 106 genes in all seven species, choosing them because they clearly show signs of being variations of each other, descended from a common ancestral gene that duplicated many times. Then they used each gene to come up with a tree showing how the yeast are related. Many of the genes produced different trees. Not surprising. What was surprising was what happened when they analyzed all 106 genes together. Suddenly, a single tree emerged as the most likely. And no matter how they tested the tree, they found 100% confidence at every node. As the authors note, this certainty is unprecedented, and they argue that they have established the evolutionary history of these seven species.
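To see why pooling genes can settle an argument between them, here is a toy sketch. It is not the method used in the study, which relied on full-blown tree-building statistics. It simply strings several invented gene alignments end to end and, as a crude stand-in for a tree, asks which two species differ at the fewest sites. The sequences and the helper names are assumptions made up for illustration.

```python
def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_pair(alignment):
    """Crude stand-in for a tree: the two names with the fewest differences."""
    names = sorted(alignment)
    pairs = [(names[i], names[j])
             for i in range(len(names))
             for j in range(i + 1, len(names))]
    return min(pairs, key=lambda p: hamming(alignment[p[0]], alignment[p[1]]))


def concatenate(gene_alignments):
    """Join each species' sequences from every gene into one long sequence."""
    names = sorted(gene_alignments[0])
    return {name: "".join(gene[name] for gene in gene_alignments)
            for name in names}


# Three invented genes for three invented species. The third gene disagrees
# with the first two about which pair is closest.
genes = [
    {"A": "ACGT", "B": "ACGA", "C": "TGCA"},
    {"A": "GGCC", "B": "GGCT", "C": "AATT"},
    {"A": "TTAA", "B": "CCAG", "C": "TTAG"},
]

if __name__ == "__main__":
    for i, gene in enumerate(genes, start=1):
        print("gene", i, "says:", closest_pair(gene))
    print("all genes together say:", closest_pair(concatenate(genes)))
```

In this toy, the lone dissenting gene is simply outvoted once all the sites are counted together. The real analysis is far more sophisticated, but the intuition is similar.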

It seems that the annoying disagreements from individual genes fade away when a computer can crunch down on a lot of them. Carroll and his co-authors realized that they may have been indulging in overkill by using 106 genes, and so they narrowed down their data set to see how few genes they needed to get the same sort of overpowering results. They could get down to just 20 genes and still produce the same tree.

Carroll et al. haven’t found the guaranteed method to figure out every evolutionary tree. Each group of species will have its own peculiarities to take into account. But their astonishing results offer a very sunny forecast for phylogenies in the next few years. The days of “I think” may be over.