DOE Human Genome Program Contractor-Grantee Workshop IV

Santa Fe, New Mexico, November 13-17, 1994

The electronic form of this document may be cited in the following style:
Human Genome Program, U.S. Department of Energy, DOE Human Genome Program Contractor-Grantee Workshop IV, 1994.

Abstracts were scanned from text submitted for the November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.

Statistical Methods to Improve DNA Sequencing Accuracy

David O. Nelson [1,2] and Terence P. Speed [2]
[1] Human Genome Center, L-452, Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, California 94550. [2] Statistics Department, University of California, Berkeley, California 94720.

LLNL is investigating statistical approaches to the problem of determining the DNA sequence underlying data obtained from fluorescence-based gel electrophoresis. Several features of electrophoresis make it interesting to statisticians and probabilists:

  • the physical, chemical, and stochastic behavior of the process is complex and still not completely understood
  • the yield of fragments of any given size can be quite small and variable
  • the mobility of fragments of a given size can depend in predictable ways on the terminating base

In addition, the data generation process in fluorescence-based sequencing poses interesting statistical problems:

  • the data consists of samples from one or more continuous, non-stationary signals
  • boundaries between segments generated by distinct elements of the underlying sequence are ill-defined or nonexistent in the signal
  • the sampling rate of the signal greatly exceeds the transition rate of the underlying discrete sequence
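
As a rough illustration of the last two points, the sketch below simulates a toy four-channel trace in which each base contributes one Gaussian-shaped peak spread over many samples. It is not the LLNL model: the function name `simulate_trace`, the Gaussian peak shape, and every parameter value (`samples_per_base`, `peak_width`, the amplitude range) are assumptions chosen only for illustration.

```python
import math
import random

def simulate_trace(sequence, samples_per_base=12, peak_width=5.0, seed=0):
    """Return four sampled channels (one per base) for a toy sequencing run.

    Assumed model: each base adds one Gaussian peak to its own channel.
    Peak heights vary at random to mimic variable fragment yield.
    """
    rng = random.Random(seed)
    n_samples = samples_per_base * (len(sequence) + 2)
    channels = {base: [0.0] * n_samples for base in "ACGT"}
    for i, base in enumerate(sequence):
        center = samples_per_base * (i + 1)
        amplitude = rng.uniform(0.5, 1.5)
        for t in range(n_samples):
            z = (t - center) / peak_width
            channels[base][t] += amplitude * math.exp(-0.5 * z * z)
    return channels

sequence = "GATTACA"
trace = simulate_trace(sequence)

# Many samples are recorded for every base in the underlying sequence.
samples_per_symbol = len(trace["T"]) / len(sequence)

# The two adjacent T peaks (centered at samples 36 and 48) overlap, so the
# T channel never returns to baseline between them.
lowest_between_ts = min(trace["T"][36:49])
print(samples_per_symbol, lowest_between_ts)
```

With these assumed settings the two neighboring T peaks merge into one broad feature, which is the sense in which segment boundaries can be ill-defined in the sampled signal.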

Recently published approaches to base calling, such as those of Giddings et al. [1] and Tibbetts et al. [2], address some of these issues using elementary statistics and heuristic decision procedures. While such approaches do tend to outperform the instruments' native base-calling software, the level of improvement in four-dye-per-lane systems appears to diminish rapidly beyond 350-400 bases. Further improvements through software will have to come from a more sophisticated approach to recovering sequence from signal.

Our approach to signal recovery and base calling combines a stochastic model of the electrophoresis process, which describes the diffusion of DNA through a gel, with adaptive equalization techniques from digital communications theory to recover the underlying sequence. We will present initial results on the extent to which this approach increases base-calling accuracy by providing a rational, statistical foundation for the process of deducing sequence from signal.
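
The abstract does not say which adaptive equalization algorithm is used. As a hedged illustration of the general idea, the sketch below applies the least-mean-squares (LMS) rule, a standard textbook adaptive equalizer, to a toy problem: a stream of +1/-1 symbols (a stand-in for the discrete base sequence) is blurred by a short channel response and then recovered. The channel taps, noise level, step size, and all function names are assumptions made for this example.

```python
import random

def channel(symbols, taps, noise_sd, rng):
    """Blur a symbol stream with a short FIR response and add Gaussian noise."""
    out = []
    for n in range(len(symbols)):
        value = sum(h * symbols[n - k] for k, h in enumerate(taps) if n - k >= 0)
        out.append(value + rng.gauss(0.0, noise_sd))
    return out

def equalizer_output(weights, received, n):
    """Apply the tapped-delay-line equalizer to the samples ending at index n."""
    return sum(w * received[n - k] for k, w in enumerate(weights))

def train_lms(received, known, n_taps=7, step=0.02, delay=4):
    """Adapt equalizer weights with the LMS rule against known symbols."""
    weights = [0.0] * n_taps
    for n in range(n_taps - 1, len(received)):
        error = known[n - delay] - equalizer_output(weights, received, n)
        weights = [w + step * error * received[n - k] for k, w in enumerate(weights)]
    return weights

rng = random.Random(1)
symbols = [rng.choice((-1.0, 1.0)) for _ in range(4000)]
received = channel(symbols, taps=(0.3, 1.0, 0.3), noise_sd=0.05, rng=rng)

split = 3000  # the first part trains the equalizer, the rest is held out
weights = train_lms(received[:split], symbols[:split])

held_out = range(split, len(symbols))
# Without equalization, received[n] is compared with symbols[n - 1] (the
# channel's main tap); the equalizer adds three more samples of delay.
raw_mse = sum((received[n] - symbols[n - 1]) ** 2 for n in held_out) / len(held_out)
eq_mse = sum((equalizer_output(weights, received, n) - symbols[n - 4]) ** 2
             for n in held_out) / len(held_out)
errors = sum((equalizer_output(weights, received, n) > 0) != (symbols[n - 4] > 0)
             for n in held_out)
print(raw_mse, eq_mse, errors)
```

In this toy setting the equalizer is trained on symbols that are known in advance. A base caller has no such training sequence for real data, which is one reason a stochastic model of the electrophoresis process is a natural companion to the equalizer.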

Research by D. O. Nelson was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract no. W-7405-ENG-48, with additional support from NSF grant DMS-91-13527. Research by T. P. Speed was partially supported by NSF grant DMS-91-13527.

[1] Giddings, M.-C., R. L. Brumley, M. Haker, and L. M. Smith (1993). An adaptive, object-oriented strategy for base calling in DNA sequence analysis. Nucleic Acids Research, 21(19), 4530-4540.
[2] Tibbetts, C., J. M. Bowling, and J. B. Golden III (1994). Neural networks for automated base calling of gel-based DNA sequencing ladders. In J. C. Venter (Ed.), Automated DNA Sequencing and Analysis Techniques, Chapter 31, 219-229. Academic Press.


Last modified: Wednesday, October 29, 2003

Document Use and Credits
Publications and webpages on this site were created by the U.S. Department of Energy Genome Program's Biological and Environmental Research Information System (BERIS). Permission to use these documents is not needed, but please credit the U.S. Department of Energy Genome Programs and provide the website http://genomics.energy.gov. All other materials were provided by third parties and not created by the U.S. Department of Energy. You must contact the person listed in the citation before using those documents.

Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program