DOE Human Genome Program Contractor-Grantee
60. Information Systems to Support Experimental and Computational Research into Complex Biological Systems and Functional Genomics: Several Pilot Projects
Jay Snoddy1, Denise Schmoyer4, Kathe Fischer5, Gwo-Lin Chen5, Miriam Land3, Sergey Petrov1, Sheryl Martin4, Ed Michaud3, Bob Barry7, Gene Rinchik3, Peter Hoyt3, Mitch Doktycz2, and E. Uberbacher1
1Computational Biosciences Section, Life Sciences Division; 2Biochemistry and Biophysics Section, Life Sciences Division; 3Mammalian Genetics and Development Section, Life Sciences Division; 4Toxicology and Risk Analysis, Life Sciences Division; 5Computational Physics Division; 6Computer Science and Mathematics Division; and 7Robotics and Process Systems Division; Oak Ridge National Laboratory, P.O. Box 2009, Oak Ridge, TN 37831; 5Department of Biochemistry, Cellular, and Molecular Biology, University of Tennessee, Knoxville, TN 37996
Understanding complex biological systems requires the ability to acquire, manage, and interpret the complex information of biology. First, computerized systems must automate the routine operations needed in large-scale, data-driven research projects. Second, information systems must let the biologist analyze data from both data-driven and hypothesis-driven research; this analytical support needs to connect genome-scale and other data-driven approaches with more focused, smaller-scale, hypothesis-driven research into complex biosystems. These analytical connections need to be made, in part, by generating inferences and supporting the decision-making of biologists.
Chip-based technologies for mRNA expression analysis are a large-scale, data-driven approach that can supply experimental information useful in exploring tissue-specific systems and pathways. In addition, advances in genomics, mutagenesis/phenotype screening, and other areas facilitate higher-throughput mouse biology research for insights into complex traits and systems. For example, a pilot project was recently initiated to begin developing information systems that can help ORNL, the Tennessee Mouse Genome Consortium (TMGC), and other collaborators gain insights into complex biological systems. This will result in a Complex Biosystems Information Warehouse (CBIW) that will be developed in ORACLE 8i and will be closely associated with the Genome Information Warehouse (see related abstract of Petrov et al.).
Users will enter and retrieve data through application-specific modules. Four related bioinformation modules, all supported by this data warehouse, are currently planned: MuTrack, GIMS, Gene and Protein Catalog, and CompariSys.
These systems are all interrelated, will need to share some data, and will need to work together.
The first proposed information module, MuTrack, must acquire information about specific mice, tissues, and especially molecular samples, and track them as they are processed through mouse phenotype screens and other experiments. Once out of the pilot stage, this system must track the distribution of mice, track mouse tissue samples, and catalog observations about the phenotypes of these mice. Part of the problem to be solved for the TMGC is moving mutagenized mice and samples from ORNL to UT Memphis, Vanderbilt University, and other sites, and returning phenotype screen data to a central, shared information system. This system also needs to connect to the GIMS chip expression system, especially to share information about mouse RNA samples sent by the mouse biologists to the chip lab for expression analysis. The chip lab must also return some data to be integrated with other observations about specific mice or mouse strains.
An electronic notebook is being developed to provide some of this needed functionality and will be demonstrated at the meeting. The general electronic notebook approach should allow a reasonable compromise between the power of an information system (e.g., the ability to query the data) and the flexibility required to store different kinds of lab data.
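One way to picture this compromise is a record with a few fixed, queryable fields plus a free-form section for lab-specific observations. The following is only a minimal sketch under that assumption, not the notebook described here; the class names, field names, site labels, and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class NotebookEntry:
    # Fixed fields: the same for every record, so they can be queried.
    item_id: str
    item_type: str                   # e.g. "mouse", "tissue", "rna"
    site: str                        # current location of the item
    parent_id: Optional[str] = None  # item this one was derived from
    # Free-form part: whatever a given lab or screen needs to record.
    observations: Dict[str, Any] = field(default_factory=dict)


class Notebook:
    def __init__(self) -> None:
        self._entries: Dict[str, NotebookEntry] = {}

    def add(self, entry: NotebookEntry) -> None:
        self._entries[entry.item_id] = entry

    def at_site(self, site: str) -> List[str]:
        """Query on a fixed field: ids of all items currently at a site."""
        return sorted(e.item_id for e in self._entries.values() if e.site == site)

    def lineage(self, item_id: str) -> List[str]:
        """Follow parent links from an item back to its source."""
        chain: List[str] = []
        current: Optional[str] = item_id
        # Stop at an unknown id, a missing parent, or a repeated id (cycle).
        while current in self._entries and current not in chain:
            chain.append(current)
            current = self._entries[current].parent_id
        return chain


# Hypothetical example data: a mouse, a tissue from it, and an RNA sample.
notebook = Notebook()
notebook.add(NotebookEntry("mouse-1", "mouse", "site-A",
                           observations={"coat": "agouti"}))
notebook.add(NotebookEntry("tissue-1", "tissue", "site-B", parent_id="mouse-1"))
notebook.add(NotebookEntry("rna-1", "rna", "chip-lab", parent_id="tissue-1",
                           observations={"yield_ug": 12.5}))
```

Here `notebook.lineage("rna-1")` traces an RNA sample at the chip lab back through its tissue to the source mouse, while the `observations` dictionary can hold different keys for each kind of item.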
GIMS, the second information module, will automate the acquisition, handling, and interpretation of data produced by the Genosensor chips and other similar microarrays (see related abstract of Doktycz et al.). This information system will address one of the major current bottlenecks of this technology: the data handling and, especially, the computational interpretation needed to find patterns in the expression data. We are using commercially available software modules for some of this component, at least initially, but other operational logic and analytical reasoning will need to be developed to glue these components together and provide both operational and analytical support.
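As a rough illustration of what "finding patterns" in expression data can mean, the sketch below groups genes whose expression profiles correlate with a chosen gene across samples. It is not the method used in GIMS or in the commercial software mentioned above; Pearson correlation is simply one common choice, and the function names, threshold, and profile values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / math.sqrt(var_x * var_y)


def co_expressed(profiles: Dict[str, Sequence[float]],
                 seed: str, threshold: float = 0.9) -> List[str]:
    """Genes whose profile correlates with the seed gene above a threshold."""
    return sorted(
        gene for gene, values in profiles.items()
        if gene != seed and pearson(profiles[seed], values) >= threshold
    )


# Hypothetical expression levels for four genes across four samples.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],   # rises with g1
    "g3": [4.0, 3.0, 2.0, 1.0],   # falls as g1 rises
    "g4": [5.0, 5.0, 5.0, 5.0],   # flat
}
partners = co_expressed(profiles, "g1")
```

With these made-up values, only `g2` tracks `g1` closely enough to be reported as co-expressed.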
The third module, Gene and Protein Catalog, is a user interface to new data about the structure and system functions of genes and proteins. This user interface is being designed and developed so that it can take data from both the Genome Information Warehouse and the Complex Biosystems Information Warehouse. It should provide access to relevant new data discovered by our experimental collaborators, any expert-curated information, predicted gene and protein models from genome annotation, and any cross-links to the underlying archival data from community databases. A pilot project, for example, is testing the addition of single nucleotide polymorphisms (SNPs) that lie in or next to genes predicted by GenScan and Grail-EXP.
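Attaching SNPs to predicted genes can be thought of as a coordinate comparison: a SNP is "in" a gene if its position falls within the predicted span, and "next to" it if it falls within some flanking distance. The sketch below assumes that reading; the abstract does not define "next to", so the flank size, the record layouts, and all identifiers and coordinates are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class PredictedGene(NamedTuple):
    name: str
    contig: str
    start: int  # inclusive
    end: int    # inclusive


class Snp(NamedTuple):
    snp_id: str
    contig: str
    position: int


def relate(snp: Snp, gene: PredictedGene, flank: int) -> Optional[str]:
    """Return "in", "near", or None for one SNP and one predicted gene."""
    if snp.contig != gene.contig:
        return None
    if gene.start <= snp.position <= gene.end:
        return "in"
    if gene.start - flank <= snp.position <= gene.end + flank:
        return "near"
    return None


def annotate(snps: List[Snp], genes: List[PredictedGene],
             flank: int = 1000) -> List[Tuple[str, str, str]]:
    """List (snp_id, relation, gene_name) for every SNP in or near a gene."""
    hits = []
    for snp in snps:
        for gene in genes:
            relation = relate(snp, gene, flank)
            if relation is not None:
                hits.append((snp.snp_id, relation, gene.name))
    return hits


# Hypothetical coordinates, for illustration only.
genes = [PredictedGene("gene-1", "contig-1", 5000, 9000)]
snps = [
    Snp("snp-1", "contig-1", 6000),    # inside the predicted span
    Snp("snp-2", "contig-1", 9500),    # within 1000 bases of the end
    Snp("snp-3", "contig-1", 20000),   # too far away
    Snp("snp-4", "contig-2", 6000),    # different contig
]
hits = annotate(snps, genes)
```

The nested loop is adequate for a small pilot set; a genome-scale version would more likely sort by position or use an interval index.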
The last information module, CompariSys, is proposed to follow after the other systems are further developed. This system should help classify and cross-link homologous genes and proteins, which will help the user extrapolate from genes and systems in the mouse, for example, to genes and systems in the human. It will use existing methods of sequence similarity, conservation of synteny, and protein classification, and possibly other developing methods such as large-scale phylogenetic gene tree generation, to help navigate and create links among the gene and system data found in MuTrack, Gene and Protein Catalog, and GIMS. This should allow a user or another computer system to move automatically from data about one gene to data about homologous genes, proteins, and systems. The resulting comparative approach is critical to understanding and navigating the biological data about genes, proteins, and the pathways or systems that involve them.
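The navigation step can be sketched as follows: given pairwise homology links (however they were derived), group the linked genes into families and then look up a gene's family members in another species. This is only an illustration under that assumption, not the CompariSys design; the union-find grouping, the function names, and the gene identifiers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

GeneKey = Tuple[str, str]  # (species, gene identifier)


def build_families(
    links: Iterable[Tuple[GeneKey, GeneKey]]
) -> Dict[GeneKey, List[GeneKey]]:
    """Group genes connected by pairwise homology links (union-find)."""
    parent: Dict[GeneKey, GeneKey] = {}

    def find(x: GeneKey) -> GeneKey:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    families: Dict[GeneKey, List[GeneKey]] = defaultdict(list)
    for key in list(parent):
        families[find(key)].append(key)
    # Map every gene to the sorted member list of its family.
    return {member: sorted(members)
            for members in families.values() for member in members}


def homologs_in(index: Dict[GeneKey, List[GeneKey]],
                gene: GeneKey, species: str) -> List[GeneKey]:
    """Move from one gene to its linked homologs in another species."""
    return [g for g in index.get(gene, []) if g[0] == species and g != gene]


# Hypothetical pairwise links, e.g. from sequence similarity or synteny.
links = [
    (("mouse", "m-gene-1"), ("human", "h-gene-1")),
    (("human", "h-gene-1"), ("rat", "r-gene-1")),
    (("mouse", "m-gene-2"), ("human", "h-gene-2")),
]
index = build_families(links)
```

Note that grouping by transitive links can merge families too eagerly when a single link is spurious, which is one reason methods such as phylogenetic gene trees may be needed to refine the groups.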
(Research sponsored by the Office of Biological and Environmental Research, USDOE under contract number DE-AC05-96OR22464 with Lockheed Martin Energy Research Corp.)