By JOHN MARKOFF
STANFORD, Calif. — Scientists at Stanford University and the J. Craig Venter Institute have developed the first software simulation of an entire organism, a humble single-cell bacterium that lives in the human genital and respiratory tracts.
The scientists and other experts said the work was a giant step toward developing computerized laboratories that could carry out many thousands of experiments much faster than is possible now, helping scientists penetrate the mysteries of diseases like cancer and Alzheimer’s.
“You read in the paper just about every week, ‘Cancer gene discovered’ or ‘Alzheimer gene discovered,’ ” said the leader of the new research, Markus W. Covert, an assistant professor of bioengineering at Stanford. “A lot of the public wonders, ‘Why haven’t we cured all these things?’ The answer, of course, is that cancer is not a one-gene problem; it’s a many-thousands-of-factors problem.”
For medical researchers and biochemists, simulation software will vastly speed the early stages of screening for new compounds. And for molecular biologists, models that are of sufficient accuracy will yield new understanding of basic cellular principles.
This kind of modeling is already in use to study individual cellular processes like metabolism. But Dr. Covert said: “Where I think our work is different is that we explicitly include all of the genes and every known gene function. There’s no one else out there who has been able to include more than a handful of functions or more than, say, one-third of the genes.”
The simulation of the complete life cycle of the pathogen, Mycoplasma genitalium, was presented on Friday in the journal Cell. The scientists called it a “first draft” but added that the effort was the first time an entire organism had been modeled in such detail: in this case, all of its 525 genes.
The simulation, which runs on a cluster of 128 computers, models the complete life span of the cell at the molecular level, charting the interactions of 28 categories of molecules — including DNA, RNA, proteins and small molecules known as metabolites, which are generated by cell processes.
“The model presented by the authors is the first truly integrated effort to simulate the workings of a free-living microbe, and it should be commended for its audacity alone,” wrote two independent commentators, Peter L. Freddolino and Saeed Tavazoie, both of Columbia University, in an editorial accompanying the article. “This is a tremendous task, involving the interpretation and integration of a massive amount of data.”
They called the simulation an important advance in the new field of computational biology, which has recently yielded such achievements as the creation of a synthetic life form: an entire bacterial genome created by a team led by the genome pioneer J. Craig Venter. The team then transplanted that genome into an existing bacterial cell, where it took control.
Efforts to build computer models of cell behavior are not new. A decade ago, scientists developed simulations of metabolism that are now being used to study a wide array of cells, including bacteria, yeast and photosynthetic organisms. Other models exist for processes like protein synthesis.
“These models are now in routine use around the world to study the metabolic properties of many organisms,” said Bernhard O. Palsson, a professor of bioengineering at the University of California, San Diego, who added that they were used commercially to formulate commodity chemicals and biofuels.
For the new computer simulation, the researchers had the advantage of extensive scientific literature on the bacterium. They were able to use data taken from more than 900 scientific papers to validate the accuracy of their software model.
Still, they said, modeling even this simplest of biological systems was pushing the limits of their computers.
“Right now, running a simulation for a single cell to divide only one time takes around 10 hours and generates half a gigabyte of data,” Dr. Covert wrote. “I find this fact completely fascinating, because I don’t know that anyone has ever asked how much data a living thing truly holds. We often think of the DNA as the storage medium, but clearly there is more to it than that.”
In designing their model, the scientists chose an approach called object-oriented programming, the same approach used to design many modern software systems. Software designers organize their programs into modules, which communicate with one another by passing data and instructions back and forth.
Similarly, the simulated bacterium is a series of modules that mimic the various functions of the cell.
“The major modeling insight we had a few years ago was to break up the functionality of the cell into subgroups, which we could model individually, each with its own mathematics, and then to integrate these submodels together into a whole,” Dr. Covert said. “It turned out to be a very exciting idea.”
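To make the modular idea concrete, here is a minimal sketch in Python, assuming a drastically simplified cell: a shared table of molecule counts and two toy processes, each with its own update rule. Every name in it (CellState, Submodel, Metabolism, Transcription, Simulation) and every rate and yield is hypothetical and purely illustrative; it is not the researchers' code. It also calls each submodel in a fixed order within each time step, a simplification chosen for clarity rather than the published integration scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CellState:
    """Shared state every submodel reads and writes (hypothetical, toy-sized).

    A real whole-cell model tracks far more; this tracks three quantities.
    """
    counts: Dict[str, float] = field(default_factory=dict)


class Submodel:
    """One cellular process with its own mathematics, behind a common interface."""

    def step(self, state: CellState, dt: float) -> None:
        raise NotImplementedError


class Metabolism(Submodel):
    """Toy metabolism: turns nutrient into ATP at a fixed (assumed) rate."""

    def __init__(self, rate: float) -> None:
        self.rate = rate

    def step(self, state: CellState, dt: float) -> None:
        # Consume up to rate * dt nutrient, never more than is available.
        used = min(state.counts["nutrient"], self.rate * dt)
        state.counts["nutrient"] -= used
        # Assumed yield of 2 ATP per unit of nutrient; purely illustrative.
        state.counts["ATP"] += 2 * used


class Transcription(Submodel):
    """Toy transcription: spends ATP to make RNA at a fixed (assumed) rate."""

    def __init__(self, rate: float) -> None:
        self.rate = rate

    def step(self, state: CellState, dt: float) -> None:
        # Limited by the ATP that metabolism has produced so far.
        made = min(state.counts["ATP"], self.rate * dt)
        state.counts["ATP"] -= made
        state.counts["RNA"] += made


class Simulation:
    """Integrates independent submodels by stepping them over one shared state."""

    def __init__(self, submodels: List[Submodel], state: CellState) -> None:
        self.submodels = submodels
        self.state = state

    def run(self, steps: int, dt: float = 1.0) -> CellState:
        for _ in range(steps):
            # Simplification: submodels run one after another in a fixed order.
            for submodel in self.submodels:
                submodel.step(self.state, dt)
        return self.state


if __name__ == "__main__":
    cell = CellState({"nutrient": 10.0, "ATP": 0.0, "RNA": 0.0})
    sim = Simulation([Metabolism(rate=1.0), Transcription(rate=0.5)], cell)
    print(sim.run(steps=4).counts)
    # prints {'nutrient': 6.0, 'ATP': 6.0, 'RNA': 2.0}
```

Adding another process in this sketch means writing one more Submodel subclass and appending it to the list; the existing modules do not change, which is the property Dr. Covert describes: separate submodels, each with its own mathematics, integrated into a whole.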
M. genitalium, a parasite that causes sexually transmitted disease, has the smallest genome of any independent organism. It played a role in 2008 in the Venter Institute’s synthesis of the first artificial chromosome; the researchers were able to stitch together the entire genome of the bacterium.
The bacterium, with its 525 genes, is far less complex than E. coli, another bacterium widely used in laboratory experiments; E. coli has 4,288 genes. The researchers said that more complex cells would present significant challenges. Currently, simulating a single division of the smallest cell takes about 9 to 10 hours of computer time, roughly as long as the cell itself takes to divide in its natural environment.
“The real question on our minds is: what happens when we bring this to a bigger organism, like E. coli, yeast or even eventually a human cell?” Dr. Covert said. He noted that E. coli divides every 20 to 30 minutes and involves many times as many molecular interactions, which would significantly extend the time required to run the simulation.
“I’ll have the answer in a couple of years,” he wrote.
A version of this article appeared in print on July 21, 2012, on page A14 of the New York edition with the headline: In a First, an Entire Organism (All 525 Genes) Is Simulated With Software.