The electronic form of this document may be cited in the following style:
Human Genome Program, U.S. Department of Energy, DOE Human Genome Program Contractor-Grantee Workshop IV, 1994.
Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.
An Integrated Data Management System for DNA Sequencing
Arthur Kobayashi (email@example.com), David J. Ow, Mark C. Wagner, T. Mimi Yeh, and Tom Slezak
Human Genome Center, L-452, Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, California 94550.
We are implementing an integrated data management system to support our local DNA sequencing efforts. The system consists of a loosely coupled set of tools that allow users to define, track, process, and analyze sequence data. We have completed system analysis, design, and prototyping, and are currently implementing the individual components. Our goals are to minimize data entry, apply consistent naming conventions across sample types, provide online interactive data access, generate sample sheets and setup files automatically where possible, track replications (reprocessing of samples at any level), and ultimately present an integrated view of source, processing, analysis, and mapping data.
We have characterized the system in terms of the functions needed to complete sequencing tasks. Each function can then be implemented using the most practical and cost-effective means available, such as commercial packages or custom software. This approach yields a loose coupling in which functions may be stand-alone, single-purpose programs or larger, multi-function programs, and any program may be modified or replaced with minimal impact on other parts of the system. Integration is achieved through our high-level model of sequencing tasks and sequencing data transformations, together with a common schema and a relational database as the underlying data repository.
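The loose coupling described here can be illustrated with a minimal Python sketch. It is not the authors' implementation: the registry, the function name "analyze", the record fields, and the in-memory dictionary standing in for the relational database are all hypothetical. The point is only that programs share data through the common repository, so one program can replace another without changes elsewhere.

```python
from typing import Callable, Dict

# Stand-in for the shared relational database: sample name -> record.
# (Hypothetical; the abstract does not describe the repository's interface.)
Store = Dict[str, dict]
Tool = Callable[[Store, str], None]

_registry: Dict[str, Tool] = {}


def register(function_name: str, tool: Tool) -> None:
    """Bind (or rebind) a system function to the program that implements it."""
    _registry[function_name] = tool


def run(function_name: str, store: Store, sample: str) -> None:
    """Run whichever program is currently bound to the named function."""
    _registry[function_name](store, sample)


def count_length(store: Store, sample: str) -> None:
    """First implementation: record the raw sequence length."""
    record = store[sample]
    record["length"] = len(record["sequence"])


def count_called_bases(store: Store, sample: str) -> None:
    """Replacement implementation: count only unambiguous A/C/G/T calls."""
    record = store[sample]
    record["length"] = sum(base in "ACGT" for base in record["sequence"].upper())


def report(store: Store, sample: str) -> str:
    """Downstream function: reads only the repository, never the analysis tool."""
    return f"{sample}: {store[sample]['length']} bases"
```

Registering `count_called_bases` under the same function name swaps the analysis step, and `report` needs no change because it depends only on the stored record.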
Sequence information consists of source, processing, analysis, and physical mapping data. We have developed a sequence source hierarchy that describes clone libraries at five levels: library, clone, prep, DNA, and sequence reaction. We have also defined "batch" tables in our database schema, where a batch is a user-defined set of similar items. Users may define batches and then use them to reference related items conveniently for display or processing.
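One way to picture the source hierarchy and the batch tables is the following sketch, which uses Python's built-in sqlite3 module. The table names, column names, and sample names are assumptions made for illustration; the abstract does not publish the actual schema or name the database product.

```python
import sqlite3

# Hypothetical table names for the five hierarchy levels named in the text.
LEVELS = ["library", "clone", "prep", "dna", "seq_reaction"]


def create_schema(conn):
    """Create one table per level, each pointing at its parent, plus batch tables."""
    parent = None
    for level in LEVELS:
        parent_col = (
            f", parent_id INTEGER NOT NULL REFERENCES {parent}(id)" if parent else ""
        )
        conn.execute(
            f"CREATE TABLE {level} "
            f"(id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE{parent_col})"
        )
        parent = level
    # A batch is a named set of items that all belong to one level.
    conn.execute(
        "CREATE TABLE batch "
        "(id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, level TEXT NOT NULL)"
    )
    # item_id refers to a row in the table named by batch.level.
    conn.execute(
        "CREATE TABLE batch_item "
        "(batch_id INTEGER NOT NULL REFERENCES batch(id), "
        "item_id INTEGER NOT NULL, PRIMARY KEY (batch_id, item_id))"
    )


def add_item(conn, level, name, parent_id=None):
    """Insert an item at the given level and return its id."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    if parent_id is None:
        cur = conn.execute(f"INSERT INTO {level} (name) VALUES (?)", (name,))
    else:
        cur = conn.execute(
            f"INSERT INTO {level} (name, parent_id) VALUES (?, ?)", (name, parent_id)
        )
    return cur.lastrowid


def define_batch(conn, name, level, item_ids):
    """Record a user-defined set of items from one level under a batch name."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    batch_id = conn.execute(
        "INSERT INTO batch (name, level) VALUES (?, ?)", (name, level)
    ).lastrowid
    conn.executemany(
        "INSERT INTO batch_item (batch_id, item_id) VALUES (?, ?)",
        [(batch_id, item_id) for item_id in item_ids],
    )
    return batch_id


def batch_names(conn, batch_name):
    """Return the names of all items in a batch, sorted alphabetically."""
    row = conn.execute(
        "SELECT level, id FROM batch WHERE name = ?", (batch_name,)
    ).fetchone()
    if row is None:
        raise KeyError(batch_name)
    level, batch_id = row
    rows = conn.execute(
        f"SELECT t.name FROM {level} t JOIN batch_item b ON b.item_id = t.id "
        "WHERE b.batch_id = ? ORDER BY t.name",
        (batch_id,),
    )
    return [r[0] for r in rows]


def demo():
    """Build one library-to-DNA chain, add three reactions, and batch two of them."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    parent = None
    for level in LEVELS[:-1]:
        parent = add_item(conn, level, f"{level}-1", parent)
    rxn_ids = [add_item(conn, "seq_reaction", f"rxn-{n}", parent) for n in (1, 2, 3)]
    define_batch(conn, "example-batch", "seq_reaction", rxn_ids[:2])
    return batch_names(conn, "example-batch")
```

Keeping the batch tables generic (a level name plus item ids) lets the same two tables hold batches of clones, preps, or reactions, at the cost of not enforcing the item reference with a foreign key.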
We have completed a core set of functions that allows rapid definition and editing of sequence source and processing information. A graphical user interface (GUI) front end provides interactive "dialogs" to select libraries and to add, display, edit, or print entries at any level. Sequence reactions can be assigned to a labelling run, added to a sample sheet in the sample sheet editor, and output to a printer. The sample sheet assignments are then used to set up the labelling run. Once labelled, the sequence reactions can be assigned to a sequencing run and, using a sequencing run editor, output as a setup file for electrophoresis. The setup file is transferred over the network and used to configure the sequencer. Users then use commercial programs or Unix scripts to edit, assemble, or analyze sequence data. Results of these functions are stored in the database and associated with sequence sources.
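The automatic generation of a sample sheet from run assignments might look like the sketch below. The tab-delimited layout, the field names, and the default lane count are hypothetical; the abstract does not specify the sample sheet or sequencer setup file formats, so this is not the format any particular instrument expects.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reaction:
    """A sequence reaction assigned to a run (field names are illustrative)."""
    name: str
    template: str  # name of the DNA sample the reaction was made from
    primer: str


def build_sample_sheet(run_name: str, reactions: List[Reaction], lanes: int = 36) -> str:
    """Assign reactions to consecutive lanes and return the sheet as text.

    The lane count of 36 is an assumed default, not a value from the source.
    """
    if len(reactions) > lanes:
        raise ValueError(f"{len(reactions)} reactions do not fit in {lanes} lanes")
    lines = [f"# run: {run_name}", "lane\treaction\ttemplate\tprimer"]
    for lane, rxn in enumerate(reactions, start=1):
        lines.append(f"{lane}\t{rxn.name}\t{rxn.template}\t{rxn.primer}")
    return "\n".join(lines) + "\n"
```

Because the sheet is derived entirely from records already stored for each reaction, nothing is retyped when a run is set up, which is the data-entry saving the text describes.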
Our strategy allows us to use a heterogeneous mix of software tools to implement the desired system functions, and relies on a high-level system view for integration of functions and data into a coherent system.
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract no. W-7405-ENG-48.