A big part of what biologists do is to catalog the diversity of life. That diversity can take many forms. There are hundreds of thousands of species of beetles on Earth, for example. There is also an untold number of genes in the human genome that play a part in different types of cancer.
In his early years, the Nobel Prize-winning biologist James Watson infamously derided nature’s catalogers as “stamp collectors.” Simply creating lists didn’t seem to get at the heart of things, like the structure of DNA, which Watson helped decipher. But it’s impossible to fully understand nature only by taking it apart into its smallest parts. An organism is a sum of those parts, as is an ecosystem or a biosphere. Watson implicitly acknowledged that fact when he became the first director of the Human Genome Project, which sought to sequence all our genes. The catalog itself (whether it’s a catalog of plant species in a forest, or bacteria in a microbiome, or genes in a genome) doesn’t automatically reveal insights. But scientists can explore a catalog to test their ideas about large-scale biology.
Right now, cancer biologists are in the midst of a similar catalog project. They are spotting cancer genes by examining the genomes of cancer cells and comparing them to the genomes of normal ones. So far, they’ve identified roughly 150 such cancer genes. Presumably there’s a finite number of such cancer genes, and some scientists argue that knowing them all would give cancer research a huge boost. We could, in effect, get to know all the tricks in cancer’s bag.
But, as I write in my latest “Matter” column in the New York Times, scientists may have a long way to go to finish such a catalog. A new study suggests that scientists will need to examine 100,000 cancer samples, about ten times the number they’ve looked at so far, to get close to finishing it.
One thing that I found fascinating in working on this column is the method the scientists used to estimate how much work remains. You might think the answer was unknowable. It’s as if cancer biologists are walking on a foggy road at night, with a flashlight casting a beam that only reaches a few feet ahead. How can they know how far they have left to go until they get to the end of the road?
What the scientists did in this case was look at the genes they had found so far. They randomly picked out cancer samples, creating sets of various sizes, and checked how many cancer genes turned up in each set. If they were close to finding all the cancer genes, they’d expect that the more samples they looked at, the fewer new genes they’d add to their list. That is not what they saw: the list kept growing as the sets got bigger.
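For readers who like to see the idea in code, here is a minimal Python sketch of that kind of down-sampling. It is not the study's actual method: the real analysis runs a statistical test on each subset to decide which genes count as cancer genes, while this toy version simply counts the distinct genes that appear. The function names (`discovery_curve`, `make_toy_samples`) and the made-up data, in which a few genes are common and most are rare, are my own assumptions for illustration.

```python
import random


def discovery_curve(samples, sizes, trials=200, seed=0):
    """Average number of distinct genes seen in random subsets of each size.

    `samples` is a list of sets, one set of gene names per cancer sample.
    """
    rng = random.Random(seed)
    curve = {}
    for size in sizes:
        total = 0
        for _ in range(trials):
            subset = rng.sample(samples, size)        # draw `size` samples at random
            total += len(set().union(*subset))        # distinct genes in this draw
        curve[size] = total / trials
    return curve


def make_toy_samples(n_samples=400, n_genes=60, seed=1):
    """Hypothetical data: gene i appears in a sample with probability 0.3 / (i + 1)."""
    rng = random.Random(seed)
    return [
        {f"gene{i}" for i in range(n_genes) if rng.random() < 0.3 / (i + 1)}
        for _ in range(n_samples)
    ]


samples = make_toy_samples()
curve = discovery_curve(samples, sizes=[25, 50, 100, 200, 400])
for size, found in curve.items():
    print(f"{size:>4} samples -> {found:.1f} genes on average")
```

If the numbers in the right-hand column stop climbing as the sets get bigger, the catalog is close to complete. If they keep rising, as they do with rare genes like the ones in this toy data, there is still road ahead.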
It turns out that other scientists use similar methods to gauge how much diversity remains to be discovered. Scientists who are cataloging the residents of our gut chart their curve of discovery to see if it’s reaching a plateau. And, as I wrote in the Times in 2011, biologists use a similar method to estimate the total number of species on Earth, based on the history of discovery up to this point.
I was surprised to discover this link between these different catalogs. But in hindsight, it makes sense. Cancer biologists and beetle experts are not all that different when you think about it. They walk the same foggy road together, a long way from the end.