DOE Genomes
-

Human Genome Project Information


Archive

logo

DOE Human Genome Program Contractor-Grantee Workshop IV

Santa Fe, New Mexico, November 13-17, 1994

PDF

Introduction to the Workshop
URLs Provided by Attendees

Abstracts
Mapping
Informatics
Sequencing
Instrumentation
Ethical, Legal, and Social Issues
Infrastructure
 

The electronic form of this document may be cited in the following style:
Human Genome Program, U.S. Department of Energy, DOE Human Genome Program Contractor-Grantee Workshop IV, 1994.

Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.

A Battery of Programs for STS Content Mapping

Farid Alizadeh [2], Richard M. Karp [1], Deborah Weisser [1], and Geoffrey Zweig [1]
[1] International Computer Science Institute, Berkeley, CA and University of California at Berkeley. [2] Rutgers University

The goal of STS content mapping is to infer the linear ordering of a set of Sequence Tagged Sites, given data about the incidence of these sites to the clones in a library. The task is complicated by the presence of errors in the incidence data and clone abnormalities such as chimerism. We have derived a battery of computer programs for performing this task. The programs have performed successfully both on synthetic data and on data from chromosome 21.

Our algorithms are presented with a zero-one matrix giving the measured incidences between probes (STSs) and clones. The generation of our synthetic data is controlled by several parameters, including the numbers of probes and clones, the distribution of clone lengths, the length of the DNA strand being mapped, the false positive and false negative rates, and the fraction of clones that are chimeric.

The main engine of our approach is a simulated annealing algorithm which, given estimates of the false negative and false positive rates and the chimerism rate, seeks to construct the most likely ordering of the STSs. We use several methods of enhancing the performance of the basic simulated annealing algorithm. A preprocessing routine is used to screen out false positives and to identify and split chimeric clones. Because the screening routine occasionally errs by identifying a true probe-clone incidence as a false positive, we couple it with a postprocessing routine which inspects the solution produced by simulated annealing, identifies anomalies traceable to errors in screening, and corrects these errors. To further speed up simulated annealing we also employ routines for generating a good starting solution.

We have also formulated the probe ordering problem as a traveling-salesman problem in which the cities are zero-one vectors and the distance between cities is the number of positions in which the vectors differ. This method is much faster than simulated annealing and performs nearly as well. It is also applicable to a wide range of other physical mapping strategies such as radiation hybrid mapping.

We are presently extending our programs to deal with pooled incidence data rather than incidences between probes and individual clones, to accept in situ hybridization data and genetic mapping data, and to work in a feedback loop with experimenters to identify and correct errors in the probe-clone incidence data.

[1] Alizadeh, F., Karp, R.M., Newberg, L. and Weisser, D. (1993) Physical Mapping of Chromosomes: A Combinatorial Problem in Molecular Biology, Algorithmica, to appear.
[2] Alizadeh, F., Karp, R.M., Weisser, D., Zweig, G. (1994) Physical Mapping Using Unique Probes, Proc. ACM/SIAM Symp. on Discrete Algorithms.


Last modified: Wednesday, October 29, 2003

Home * Contacts * Disclaimer

Document Use and Credits
Publications and webpages on this site were created by the U.S. Department of Energy Genome Program's Biological and Environmental Research Information System (BERIS). Permission to use these documents is not needed, but please credit the U.S. Department of Energy Genome Programs and provide the website http://genomics.energy.gov. All other materials were provided by third parties and not created by the U.S. Department of Energy. You must contact the person listed in the citation before using those documents.

Base URL: www.ornl.gov/hgmis

Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program