It was eight years ago that some computer programmers got together and issued a manifesto for something they called open source software. Conventional software development–kept hidden behind walls of intellectual property, copyright, and secrecy–was clumsy and slow. It would be far better, the open source advocates declared, to make software open to all. It would foster the growth of a vast decentralized community of developers and consumers who could work together to create better software. Individuals would grab software created by others, tinker with it, and then make it available in turn to the community for more testing and tinkering.
The open source movement may not have taken over the world yet, but it certainly has thrived. Take a look at the web site for the eighth annual Open Source Convention. Along with a vast range of talks about everything from Perl to seventeenth-century censorship, you may notice the big corporations such as IBM that are sponsoring the event. Corporate fear of the open source movement seems to be shifting to acceptance, if not enthusiasm. Another sign of the open source movement’s health is its influence beyond the world of software development, from mash-ups to Wikipedia to open source as a force for democracy to open-source biology.
Its success has drawn curious minds back to the origins of the open source movement. In a sense, people were thinking about it long before it had a name. Eric Raymond, one of the founders of the official open source movement, puts its origins four decades ago, in the hacker culture of the 1960s. Back then it was expected that each hacker would share his secrets with the rest of the hacker tribe.
I’d suggest that Raymond is not thinking big enough. The open source movement is a wee bit older. Instead of four decades, try four billion years.
Biologists have long recognized some striking parallels between genes and software. Genes store information in a language of DNA, with the four nucleotides serving as its alphabet. A genetic code allows cells to translate the information in genes into the separate language of proteins, which uses an alphabet of twenty amino acids. From one generation to the next, mutations introduce slight tweaks to the software. Sex combines different versions of subroutines. If the software performs better–in the sense that an organism has more reproductive success–the changes may become incorporated into the genome across an entire species. This is only a metaphor, but it is a powerful one. One example of its power is the rise of genetic algorithms. Rather than trying to find a perfect solution to a problem–the ideal shape for a plane, for example–genetic algorithms create simulations and tweak them through a process that mimics evolution. The algorithm can seek out good solutions very effectively.
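For readers who have never seen a genetic algorithm, here is a minimal sketch in Python. It is a toy of my own devising, not any particular published algorithm: the "genome" is a string of bits, and "reproductive success" is simply the number of 1s it contains. The function names, population size, and mutation rate are all assumptions chosen for illustration.

```python
import random

def fitness(genome):
    # Toy stand-in for reproductive success: count the 1 bits.
    return sum(genome)

def mutate(genome, rate, rng):
    # Flip each bit with a small probability, like a point mutation.
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]

def crossover(mother, father, rng):
    # Single-point recombination: a crude analogue of sex.
    cut = rng.randrange(1, len(mother))
    return mother[:cut] + father[cut:]

def evolve(genome_length=32, population_size=40, generations=60,
           mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged as parents (truncation selection).
        population.sort(key=fitness, reverse=True)
        parents = population[:population_size // 2]
        # Refill the population with mutated offspring of random parent pairs.
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   mutation_rate, rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(fitness(best), "of", len(best), "bits set")
```

Note that everything here happens "in-house": each genome inherits only from parents in its own population, and nothing is ever borrowed from outside it.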
This sort of evolution resembles old-fashioned, closed-source software. All of the innovations happen in-house–that is, within a single species. None of the solutions from one species can be incorporated into the operating system of another. While this process has indeed been an important one in the history of life, a number of scientists have argued for an open-source side to evolution.
The simplest example is antibiotic resistance. A person who gets sick with dysentery takes a single kind of antibiotic, and before you know it, he or she has a gut full of multiple-drug resistant pathogens. This occurs because the harmless bacteria in the gut may carry genes that provide massive resistance to many antibiotics. They can pass these genes en masse to a few of the newly arrived pathogens. The antibiotics kill off the vulnerable bugs, leaving behind the resistant ones to thrive.
Well beyond antibiotic resistance, genes are showing an impressive capacity to move from species to species. The causes of this transfer are many, but viruses have proven to be the most important. They pick up genes from one host and insert them into another. Viruses themselves can mix their genomes together, blending genes from their hosts in the process. This horizontal gene transfer is not so important in the evolution of animals like us (although we do carry thousands of dead viruses in our genome). But it is very important among single-celled organisms. Single-celled organisms (and viruses) represent most of the genetic diversity on the planet, and for most of its history life was microbial. So this trade in genes has been a significant force. In many cases these transferred genes turned out to be useless to their new hosts. But often enough, the new genes proved to be a powerful addition to a genome. And in their new home, those genes were modified through ordinary natural selection before being spread back out to other species. The parallels to Linux and other examples of open source software did not go unnoticed by biologists. One recent review described this process as open-source evolution.
Some scientists have argued that the sort of gene trading we see today is nothing like what was going on during the early years of life on Earth. Back then, primitive organisms did not have major barriers in place to keep foreign genes from joining their genomes. No one has championed this view more than Carl Woese of the University of Illinois. Woese is famous for having carried out the first significant studies on the tree of life by analyzing molecules from different species. He discovered that life formed three major branches, the eukaryotes (that includes us), bacteria (E. coli and such), and archaea (microbes that some scientists argue are closer to eukaryotes than to bacteria). In recent years Woese has argued that the base of this tree was actually more of a web of relationships, as genes moved from host to host.
I’ve been reading Woese’s latest venture into this weird time, a paper that was just published in the Proceedings of the National Academy of Sciences called “Collective evolution and the genetic code.” The genetic code is the dictionary by which genes can be translated into proteins. Each amino acid corresponds to a series of three nucleotides in a gene. In some cases, different triplets produce the same amino acid. Thus, twenty amino acids are generated from sixty-four possible DNA triplets. This new paper is something of a nostalgia trip for Woese, inasmuch as in the 1960s, before he started studying the tree of life, he helped make sense of the genetic code.
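The arithmetic of the genetic code is easy to check for yourself. The sketch below builds the standard code from the compact layout commonly used in reference tables: the bases in the order T, C, A, G, and one letter per triplet for the amino acid it encodes, with `*` marking a stop signal. The variable names are my own.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# One letter per triplet, in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every DNA triplet (codon) to its amino acid letter.
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many triplets spell each amino acid (or stop signal)?
redundancy = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))            # number of possible triplets: 4 * 4 * 4
print(len(redundancy) - 1)         # distinct amino acids, not counting stop
print(redundancy["L"], redundancy["W"])  # leucine vs. tryptophan
```

Four letters taken three at a time give sixty-four triplets. In the standard code, three of them are stop signals and the other sixty-one are shared out among the twenty amino acids, which is why leucine can be spelled six ways while tryptophan gets only one.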
One of the biggest discoveries of that time was that the genetic code is pretty much the same in every living organism on Earth. Scientists have long debated how the same genetic code wound up in all living things. Why twenty amino acids? Why three nucleotides? One possibility was that it was just a “frozen accident.” Another has been that it evolved in an ancient lineage and provided an evolutionary edge against others with different codes. Indeed, when scientists run computer simulations to compare it to alternative codes, it does work extremely well.
Woese, along with Kalin Vetsigian and Nigel Goldenfeld, also of the University of Illinois, offer an open-source perspective on the origin of the genetic code. They argue that in the gene-swapping early world, genes encoded proteins in a very sloppy manner. Cells had not yet evolved the enzymes that ensure that one codon always produces one amino acid. The translation was thus much rougher than today. An ambiguous dictionary may seem like a serious handicap for any species. But back then, there were no modern, finely-tuned organisms around to compete with the early organisms. They weren’t great, but they were the best at the time.
Evolution gradually produced more precise genetic codes, Woese and his colleagues argue, but different communities of microbes evolved different codes. In each community, a shared code made it easier for microbes to share genes. If you plug a gene into an organism with a radically different code, it will produce a radically different protein–most likely one that is useless as well. It’s like grabbing a piece of software and trying to run it on the wrong operating system.
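To make the wrong-operating-system problem concrete, here is a small sketch that reads the same stretch of DNA under two different dictionaries. The first is a five-triplet excerpt of the real standard code. The second is a hypothetical alternative code that I invented purely for illustration. It is not taken from Woese's paper or from any known organism.

```python
# A five-triplet excerpt of the standard genetic code ("*" means stop).
STANDARD = {"ATG": "M", "GGT": "G", "CTG": "L", "TGG": "W", "TAA": "*"}

# Hypothetical alternative code, invented for this example only.
ALTERNATIVE = {"ATG": "L", "GGT": "W", "CTG": "G", "TGG": "M", "TAA": "*"}

def translate(dna, code):
    """Read the DNA three letters at a time until a stop signal."""
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = code[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

gene = "ATGGGTCTGTGGTAA"

print(translate(gene, STANDARD))     # MGLW
print(translate(gene, ALTERNATIVE))  # LWGM
```

The gene has not changed by a single letter, yet the second cell builds a completely different protein from it. A borrowed gene is only useful if the borrower reads it the way the lender wrote it.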
The more microbes used the same genetic code, the bigger the pool of genes they could all take advantage of. Those shared innovations benefited the entire community as it competed with communities with other genetic codes. Imagine microbes colonizing some bizarre new ecological niche–a seep of petroleum, for example, or undersea volcanic chambers. The microbes that can take advantage of more innovations will outcompete the ones that belong to the smaller community. This advantage would also drive the evolution of different genetic codes to be more like one another, because communities of microbes would get access to even more innovations.
Over time, the benefits of a big innovation pool wiped out the original diversity of rare codes, replacing it with one universal language. Only later did life begin to lose its communal nature and begin to evolve into separate lineages that we see now as the tree of life. While those lineages produced things as different as humans and bacteria, they all share the same genetic code that evolved during that communal age.
Woese’s provocative ideas are at least consistent with the evidence at hand. He and his colleagues offer some mathematical simulations in the paper that show that gene swapping can drive life to a universal genetic code, but that our more familiar generation-to-generation heredity cannot. But we’re still in the early days of understanding the nature of life four billion years ago. (I wrote about some competing models in Science in May.) At this point, what intrigues me most is how the human-based open source movement may be able to shed light on the early evolution of life on Earth (and vice versa). How have open source languages merged together to make meaningful exchanges of innovation possible? Perhaps open source evolution could inspire new ideas about building software the open-source way–much as genetic algorithms emerged a couple decades ago.
This blog being semi-open source (no hacking my text, please), I invite your comments…
Originally published July 12, 2006. Copyright 2006 Carl Zimmer.