Chris Fields, Gifford Keen, Shelley March, David Rider, John Rorlich, and Charles Troop
National Center for Genome Resources, 1800 Old Pecos Trail, Santa Fe, NM 87505 USA
High-throughput production sequencing operations require robust information management systems to function efficiently and cost-effectively. Key requirements for such systems include the following:
Process integration: The sequencing process includes steps ranging from clone library preparation through both automated and interactive analysis of the resulting sequence data. A robust data management system must represent all steps in this process in a way that allows queries to access and correlate any data or metadata generated during the process.
User-definable procedures: Procedures for materials preparation, sequencing, and data analysis may change weekly. Use of inappropriate or obsolete procedures can cause data inconsistency and render results unusable. The data management system must provide data representations and user interfaces that allow new procedures to be defined, accessed by those implementing them, and tracked as they are applied.
Quality control and failure analysis: Without appropriate data management, quality control often suffers seriously. The data management system must support both longitudinal and retrospective data quality analyses spanning the entire sequencing and analysis process. It must also support explicit failure-mode tracking.
Cost accounting and process optimization: A laboratory's managers often cannot tell how much alternative processes cost in materials and staff time, and even the locations of the relevant bottlenecks are often hard to identify. The data management system must track resource and personnel use, together with process success and failure rates and the conditions under which they occur, in a way that allows appropriate cost accounting and process optimization.
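As an illustration only, the tracking requirements above can be sketched as a per-step record that carries the procedure version applied, the resources consumed, and an explicit failure mode. Every name here (StepRecord, summarize_by_step, the field names, the hourly rate) is a hypothetical chosen for the example and is not part of the NCGR specification.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StepRecord:
    """One execution of one process step on one sample (all fields hypothetical)."""
    sample_id: str
    step: str                  # e.g. "library_prep", "sequencing", "analysis"
    procedure_version: str     # which user-defined procedure was applied
    materials_cost: float      # consumables cost, arbitrary currency units
    staff_hours: float
    failure_mode: Optional[str] = None  # None means the step succeeded


def summarize_by_step(records, hourly_rate):
    """Return {step: {"runs", "failures", "total_cost", "failure_rate"}}."""
    summary = defaultdict(lambda: {"runs": 0, "failures": 0, "total_cost": 0.0})
    for r in records:
        s = summary[r.step]
        s["runs"] += 1
        if r.failure_mode is not None:
            s["failures"] += 1
        # Cost of a run = materials plus staff time at an assumed flat rate.
        s["total_cost"] += r.materials_cost + r.staff_hours * hourly_rate
    for s in summary.values():
        s["failure_rate"] = s["failures"] / s["runs"]
    return dict(summary)
```

A summary of this kind, computed per step, is one way a bottleneck or an expensive failure mode might become visible; a production system would presumably derive it from database queries rather than from in-memory records.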
NCGR is developing a detailed requirements analysis and an implementable functional specification for a Sequencing Information Management System with these capabilities. This system will run on a commercial relational database management platform with multiplatform graphical user interfaces, and will include an application programming interface capable of supporting both public and commercial data analysis tools.
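To make the relational approach concrete, the following is a minimal sketch of how versioned procedures and step outcomes might be stored and correlated by a query. It uses Python's built-in sqlite3 module purely as a stand-in for the commercial platform; the table names, column names, and sample rows are assumptions made for illustration, not the system's actual schema.

```python
import sqlite3

# Hypothetical two-table schema: user-defined procedures and the runs that apply them.
SCHEMA = """
CREATE TABLE procedures (
    procedure_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    version      TEXT NOT NULL,
    UNIQUE (name, version)
);
CREATE TABLE step_runs (
    run_id       INTEGER PRIMARY KEY,
    sample_id    TEXT NOT NULL,
    procedure_id INTEGER NOT NULL REFERENCES procedures(procedure_id),
    failure_mode TEXT              -- NULL when the step succeeded
);
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO procedures (procedure_id, name, version) VALUES (?, ?, ?)",
        [(1, "template_prep", "v1"), (2, "template_prep", "v2")],
    )
    conn.executemany(
        "INSERT INTO step_runs (sample_id, procedure_id, failure_mode) VALUES (?, ?, ?)",
        [("clone-001", 1, None), ("clone-002", 1, "low_yield"), ("clone-003", 2, None)],
    )
    return conn


def failures_by_procedure(conn):
    """Return (name, version, runs, failures) for each procedure version."""
    return conn.execute(
        """
        SELECT p.name, p.version,
               COUNT(*)              AS runs,
               COUNT(r.failure_mode) AS failures   -- COUNT(col) skips NULLs
        FROM step_runs AS r
        JOIN procedures AS p ON p.procedure_id = r.procedure_id
        GROUP BY p.name, p.version
        ORDER BY p.name, p.version
        """
    ).fetchall()
```

Keeping the procedure version as a row that each run references, rather than as free text, is one design that would let retrospective queries separate results produced under obsolete procedures from current ones.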
*Supported by Cooperative Agreement 95ER62062 with the U.S. Department of Energy, Office of Health and Environmental Research.
 Fields, C. in Automated Technologies for Genome Characterization, ed. T. Beugelsdijk, Wiley (in press).