The electronic form of this document may be cited in the following style:
Human Genome Program, U.S. Department of Energy, DOE Human Genome Program Contractor-Grantee Workshop IV, 1994.
Abstracts scanned from text submitted for the November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.
Integrated Informatics Support for Large-Scale Sequencing at the LBL Human Genome Center
E. Theil, A. Aggarwal, D. Davy, F. Eeckman, T. Fleming, V. Markowitz, J. McCarthy, S. Pitluck, E. Veklerov, M. Zorn
Human Genome Center, Lawrence Berkeley Laboratory, Berkeley, CA 94720
After approximately two years of effort, the LBL Human Genome Center has demonstrated that it can sequence DNA at a sustained rate of approximately 750 kb per six-person sequencing team. The Informatics Group has supported this project with a variety of software programs and databases, most of which have been described at earlier conferences or are detailed in separate abstracts for this meeting.
LBL is now planning to scale up its sequencing effort significantly. To do so while continuing to lower the cost per base, it will not be enough merely to hire additional sequencing teams and support them with existing software. What is required instead is a more fully integrated and automated system in which the data produced by the directed sequencing strategy developed at LBL are both captured and modeled in software, so that all the information generated before sequencing can be exploited later, during assembly and dissemination. This more highly integrated system will also raise the level of quality control on data and operations by identifying errors and providing computer assistance in troubleshooting. We do not believe that general solutions are available, but we can combine a number of robust software components developed elsewhere with our own software to form an integrated system tailored to our particular needs.
We discuss a system architecture designed to capture data either automatically or through manual input; it is gradually replacing personal laboratory notebooks with a unified, up-to-date view of the data. The first components of this system will be modules for automatic inspection of ABI sequencing runs, automatic trace cutting, and trace browsing. Recently introduced modules provide automatic generation of transposon maps and the ability to perform post mortems by comparing maps based on the actual sizes of sequenced clones with maps based on sizes estimated from electrophoresis. These comparisons help to reduce the number of gaps encountered when assembling the sequence.
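The post-mortem comparison can be illustrated with a small Python sketch. It assumes a hypothetical, deliberately simple map model in which clones are laid end to end, so each clone's map position is the cumulative size of the clones before it; the names `Clone`, `map_positions`, and `post_mortem` and the 10% tolerance are illustrative choices, not part of LBL's software. Clones whose electrophoresis estimate differs markedly from the sequenced size are flagged, and the drift value shows how one poor estimate shifts every later position, which suggests how sizing errors could propagate into assembly.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import Dict, List


@dataclass
class Clone:
    name: str
    estimated_kb: float  # size estimated from gel electrophoresis
    actual_kb: float     # size determined from the sequenced clone


def map_positions(sizes: List[float]) -> List[float]:
    """Left-end coordinates of clones laid end to end (hypothetical map model)."""
    return [0.0] + list(accumulate(sizes))[:-1]


def post_mortem(clones: List[Clone], tolerance: float = 0.10) -> List[Dict]:
    """Compare the estimated-size map with the actual-size map, clone by clone.

    A clone is flagged when its relative sizing error exceeds `tolerance`;
    drift_kb is how far its map position moves when estimated sizes are used.
    """
    for c in clones:
        if c.actual_kb <= 0:
            raise ValueError(f"actual size of {c.name} must be positive")
    est_pos = map_positions([c.estimated_kb for c in clones])
    act_pos = map_positions([c.actual_kb for c in clones])
    report = []
    for c, e, a in zip(clones, est_pos, act_pos):
        rel_error = abs(c.estimated_kb - c.actual_kb) / c.actual_kb
        report.append({
            "clone": c.name,
            "rel_error": rel_error,
            "drift_kb": e - a,
            "flagged": rel_error > tolerance,
        })
    return report


if __name__ == "__main__":
    clones = [Clone("A", 3.0, 3.0), Clone("B", 2.0, 2.5), Clone("C", 4.0, 4.0)]
    for row in post_mortem(clones):
        print(row)
```

Here clone B is flagged (20% error), and clone C, though sized correctly, sits 0.5 kb out of place on the estimated map because of B.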
Another important component now under development is a figure of merit associated with each base call. It will be a significant aid as we move toward more automatic editing of sequences and the systematic demonstration of quality in sequenced data.
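The abstract does not say how the figure of merit will be computed. As an assumption only, the sketch below uses a Phred-style logarithmic scale, Q = -10 log10(p), where p is an estimated probability that the base call is wrong, and shows two uses suggested by the text: flagging low-scoring bases for editing and summarizing quality across a read. `figure_of_merit`, `bases_needing_review`, `fraction_at_or_above`, and the threshold of 20 are hypothetical.

```python
import math
from typing import List, Tuple

BaseCall = Tuple[str, float]  # (called base, estimated probability the call is wrong)


def figure_of_merit(p_error: float) -> float:
    """Phred-style score Q = -10 * log10(p_error); an assumed scale, not LBL's definition."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in (0, 1]")
    return -10.0 * math.log10(p_error)


def bases_needing_review(calls: List[BaseCall], threshold: float = 20.0) -> List[int]:
    """Indices of calls whose score falls below the threshold (candidates for editing)."""
    return [i for i, (_, p) in enumerate(calls) if figure_of_merit(p) < threshold]


def fraction_at_or_above(calls: List[BaseCall], threshold: float = 20.0) -> float:
    """Fraction of calls meeting the threshold: a simple read-level quality summary."""
    if not calls:
        return 0.0
    good = sum(1 for _, p in calls if figure_of_merit(p) >= threshold)
    return good / len(calls)


if __name__ == "__main__":
    read = [("A", 0.001), ("C", 0.05), ("G", 0.0001), ("T", 0.5)]
    print(bases_needing_review(read))   # [1, 3]
    print(fraction_at_or_above(read))   # 0.5
```

A logarithmic scale is convenient because a threshold such as 20 corresponds directly to an error probability (here 1 in 100), which makes quality statements about sequenced data easy to state and check.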
To integrate information from and for these modules, our Syndb database will store operational data, finished sequence, and up-to-date maps, all linked to one another. Application modules will communicate with the database both to access data and to update it as it is produced. Syndb will also support more than one analysis of the basic data (typically gels) so that inconsistencies (typically false positives) can be investigated as required.
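The linking and multiple-analysis ideas can be sketched with Python dataclasses. This is not Syndb's schema, which the abstract does not give: `GelAnalysis`, `SyndbSketch`, and their fields are hypothetical, maps are left out for brevity, and in-memory dictionaries stand in for the database. Keeping every analysis of a gel, rather than overwriting the previous one, lets a band called by one analysis but not another be surfaced as a candidate false positive.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class GelAnalysis:
    analysis_id: str
    gel_id: str
    bands_kb: FrozenSet[float]  # band sizes called by this analysis


@dataclass
class SyndbSketch:
    """Hypothetical in-memory stand-in for linked Syndb records (not the real schema)."""
    analyses: Dict[str, List[GelAnalysis]] = field(default_factory=dict)  # gel_id -> analyses
    clone_for_gel: Dict[str, str] = field(default_factory=dict)           # gel_id -> clone
    sequence_for_clone: Dict[str, str] = field(default_factory=dict)      # clone -> finished sequence

    def add_analysis(self, analysis: GelAnalysis) -> None:
        # Several analyses of the same gel are kept side by side, not overwritten.
        self.analyses.setdefault(analysis.gel_id, []).append(analysis)

    def link(self, gel_id: str, clone: str, sequence: Optional[str] = None) -> None:
        # Record which clone a gel belongs to and, optionally, its finished sequence.
        self.clone_for_gel[gel_id] = clone
        if sequence is not None:
            self.sequence_for_clone[clone] = sequence

    def gels_for_clone(self, clone: str) -> List[str]:
        # Follow the gel -> clone links backwards.
        return sorted(g for g, c in self.clone_for_gel.items() if c == clone)

    def inconsistent_bands(self, gel_id: str) -> Set[float]:
        """Bands called by some but not all analyses of a gel: candidate false positives."""
        band_sets = [set(a.bands_kb) for a in self.analyses.get(gel_id, [])]
        if len(band_sets) < 2:
            return set()
        return set.union(*band_sets) - set.intersection(*band_sets)


if __name__ == "__main__":
    db = SyndbSketch()
    db.add_analysis(GelAnalysis("a1", "g1", frozenset({1.2, 2.5, 3.0})))
    db.add_analysis(GelAnalysis("a2", "g1", frozenset({1.2, 3.0})))
    db.link("g1", "clone7", "ACGT")
    print(db.inconsistent_bands("g1"))  # {2.5}
    print(db.gels_for_clone("clone7"))  # ['g1']
```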