DOE Human Genome Program Contractor-Grantee Workshop IV

Santa Fe, New Mexico, November 13-17, 1994

The electronic form of this document may be cited in the following style:
Human Genome Program, U.S. Department of Energy, DOE Human Genome Program Contractor-Grantee Workshop IV, 1994.

Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.

Software Support for Large Scale Sequencing

Joe Gatewood[1], Robert M. Pecherer[2], and Elaine Best[3]
[1] Genomics and Structural Biology Group; LS-2, MS 880; Los Alamos National Laboratory; Los Alamos, New Mexico 87545. [2] Theoretical Biology and Biophysics Group; T-10, MS K710, LANL. [3] Applications Programming Group; CIC-12, MS B295; LANL.

Current and projected DNA sequencing rates effectively prohibit direct human interaction with experimentally derived raw sequence data -- i.e., inspection, elimination of cloning artifacts, and editing in general. The investigator-as-data-processor bottleneck is further compounded during sequence analysis, where DNA homology comparisons against public sequence databases return redundant or extraneous information: relevant homology is diluted by the irrelevant.

Our goals in developing software to support large scale sequencing are:

  1. High speed sequence data entry where routine computer processing is the rule and human intervention the exception;
  2. Sequence analysis where homology comparison and reporting are interactively customizable to enduser needs;
  3. Automated and customizable primer selection for STS generation and sequencing;
  4. Sequence order relationships and base confidence information are captured during sequencing and exploited in consensus sequence assembly.
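As an illustration of what automated, customizable primer selection (goal 3) can look like, the following Python sketch scans a sequence for fixed-length candidate primers and filters them on GC fraction and an estimated melting temperature. It is not the selection algorithm used in this system, which the abstract does not describe. The function names, the default thresholds, and the use of the Wallace rule (Tm = 2(A+T) + 4(G+C)) are all assumptions chosen for brevity.

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def candidate_primers(seq, length=20, gc_range=(0.4, 0.6), tm_range=(55, 65)):
    """Return (start, primer, tm) for every window that passes the filters.

    All thresholds are hypothetical defaults; a real tool would expose them
    to the user and add further checks (self-complementarity, repeats, ...).
    """
    seq = seq.upper()
    results = []
    for start in range(len(seq) - length + 1):
        primer = seq[start:start + length]
        if set(primer) - set("ACGT"):
            continue  # skip windows containing ambiguous bases such as N
        gc_fraction = (primer.count("G") + primer.count("C")) / length
        tm = wallace_tm(primer)
        if gc_range[0] <= gc_fraction <= gc_range[1] and tm_range[0] <= tm <= tm_range[1]:
            results.append((start, primer, tm))
    return results
```

Calling `candidate_primers(sequence, length=18, tm_range=(50, 60))` shows the kind of per-user customization the goal refers to: the filters are parameters rather than fixed constants.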

The first three goals have been addressed. For primer directed sequencing, we have developed database representations and are exploring assembly algorithms.
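To make the fourth goal concrete, here is a minimal Python sketch of how per-base confidence and known read order might be exploited when calling a consensus. It assumes each read has already been placed at a known offset on a common coordinate system (the "sequence order relationship") and carries one confidence value per base. The data layout and the confidence-sum voting scheme are assumptions for illustration, not the assembly algorithm under development here.

```python
from collections import defaultdict


def weighted_consensus(reads):
    """Call a consensus from reads given as (offset, bases, confidences).

    At each position, every read votes for its base with a weight equal to
    that base's confidence; the base with the largest total wins. Positions
    covered by no read are reported as 'N'.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for offset, bases, confidences in reads:
        for i, (base, conf) in enumerate(zip(bases, confidences)):
            votes[offset + i][base] += conf
    if not votes:
        return ""
    consensus = []
    for pos in range(max(votes) + 1):
        column = votes.get(pos)
        if not column:
            consensus.append("N")
        else:
            # sorted() makes tie-breaking deterministic (alphabetical order)
            consensus.append(max(sorted(column), key=column.get))
    return "".join(consensus)
```

In this scheme a single high-confidence call can outvote a low-confidence disagreement, which is the practical benefit of capturing confidence at sequencing time rather than discarding it.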

Our system architecture includes four components: Enduser interface, database, context management, and analytical tools. The enduser interface is implemented using Gain Momentum (Sybase) and programmed in GEL, a proprietary scripting language specialized for interactive I/O and task management. Database functionality is provided by the Relational DBMS Sybase using recursive DNA representations. (See "Recursive Relational Representation for DNA and Attribute-Value Lists: Techniques for Reducing Schema Modifications", Pecherer et al., DOE Human Genome Contractors Workshop IV, Santa Fe, NM, November 13-17, 1994.) Context management and analytical tools are written in C for performance and flexibility. Context management provides data management capability for the objects and collections obtained from the database and/or operated upon by analysis software. The analytical tools include homology comparison, feature selection, and primer selection algorithms.
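The companion abstract cited above describes the actual schema; the sketch below only illustrates the general idea of a recursive relational representation with attribute-value lists. It uses Python's built-in `sqlite3` module in place of Sybase, and every table, column, and sample name is hypothetical. A self-referencing `parent_id` column lets one table hold clones, subclones, and reads at any depth, and a separate attribute-value table lets new properties be recorded without altering the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dna (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES dna(id),  -- NULL for a top-level object
    name      TEXT NOT NULL
);
CREATE TABLE dna_attribute (
    dna_id INTEGER NOT NULL REFERENCES dna(id),
    attr   TEXT NOT NULL,
    value  TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO dna VALUES (?, ?, ?)", [
    (1, None, "cosmid_A"),
    (2, 1, "subclone_A1"),
    (3, 2, "read_A1_f"),
    (4, 1, "subclone_A2"),
])
conn.executemany("INSERT INTO dna_attribute VALUES (?, ?, ?)", [
    (3, "primer", "M13F"),
    (3, "length", "412"),
])


def descendants(conn, root_id):
    """Return (name, depth) for root_id and everything derived from it."""
    rows = conn.execute("""
        WITH RECURSIVE tree(id, name, depth) AS (
            SELECT id, name, 0 FROM dna WHERE id = ?
            UNION ALL
            SELECT d.id, d.name, t.depth + 1
            FROM dna d JOIN tree t ON d.parent_id = t.id
        )
        SELECT name, depth FROM tree ORDER BY id
    """, (root_id,))
    return rows.fetchall()


def attributes(conn, dna_id):
    """Return the attribute-value list of one DNA object as a dict."""
    rows = conn.execute(
        "SELECT attr, value FROM dna_attribute WHERE dna_id = ?", (dna_id,))
    return dict(rows.fetchall())
```

Adding a new kind of annotation in this layout is an `INSERT` into `dna_attribute` rather than an `ALTER TABLE`, which is one plausible reading of how such a design reduces schema modifications.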

All analytical tools are designed as integral system components to avoid file parsing. We have implemented BLAST with a global alignment capability (BLASTga) to avoid segmented DNA homologies and have incorporated post screening of homology results to eliminate redundant or extraneous information resulting from repetitive elements.
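The screening criteria are not given in this abstract, so the following Python sketch shows only one plausible form of post screening. It drops a hit when most of its query interval lies inside known repetitive elements, and drops a lower-scoring hit when it covers essentially the same query region as a hit already kept. The `Hit` structure, the interval conventions, and both thresholds are assumptions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    q_start: int   # query start, inclusive
    q_end: int     # query end, exclusive
    score: float


def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def post_screen(hits, repeats, max_repeat_frac=0.8, max_redundant_frac=0.9):
    """Filter homology hits.

    repeats: non-overlapping (start, end) query intervals annotated as
    repetitive elements. Hits are visited best score first, so a redundant
    hit is always the lower-scoring one.
    """
    kept = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        length = hit.q_end - hit.q_start
        if length <= 0:
            continue
        masked = sum(overlap(hit.q_start, hit.q_end, s, e) for s, e in repeats)
        if masked / length >= max_repeat_frac:
            continue  # extraneous: explained by a repetitive element
        if any(overlap(hit.q_start, hit.q_end, k.q_start, k.q_end) / length
               >= max_redundant_frac for k in kept):
            continue  # redundant: same region as a better hit
        kept.append(hit)
    return kept
```

Because the tools are integrated components rather than separate programs, a filter like this would operate on in-memory hit objects directly instead of re-parsing a report file.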

Research funded by U.S. Department of Energy under Contract W-7405-ENG-36.


Last modified: Wednesday, October 29, 2003
