Introduction to the Workshop
URLs Provided by Attendees
- Ethical, Legal, and Social Issues
The electronic form of this document may be cited in the following style:
Human Genome Program, U.S. Department of Energy, DOE Human Genome Program Contractor-Grantee Workshop IV, 1994.
Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.
A Battery of Programs for STS Content Mapping
Farid Alizadeh , Richard M. Karp , Deborah Weisser , and Geoffrey Zweig 
 International Computer Science Institute, Berkeley, CA and University of California at Berkeley.  Rutgers University
The goal of STS content mapping is to infer the linear ordering of a set of Sequence Tagged Sites, given data about the incidence of these sites to the clones in a library. The task is complicated by the presence of errors in the incidence data and clone abnormalities such as chimerism. We have derived a battery of computer programs for performing this task. The programs have performed successfully both on synthetic data and on data from chromosome 21.
Our algorithms are presented with a zero-one matrix giving the measured incidences between probes (STSs) and clones. The generation of our synthetic data is controlled by several parameters, including the numbers of probes and clones, the distribution of clone lengths, the length of the DNA strand being mapped, the false positive and false negative rates, and the fraction of clones that are chimeric.
The main engine of our approach is a simulated annealing algorithm which, given estimates of the false negative and false positive rates and the chimerism rate, seeks to construct the most likely ordering of the STSs. We use several methods of enhancing the performance of the basic simulated annealing algorithm. A preprocessing routine is used to screen out false positives and to identify and split chimeric clones. Because the screening routine occasionally errs by identifying a true probe-clone incidence as a false positive, we couple it with a postprocessing routine which inspects the solution produced by simulated annealing, identifies anomalies traceable to errors in screening, and corrects these errors. To further speed up simulated annealing we also employ routines for generating a good starting solution.
We have also formulated the probe ordering problem as a traveling-salesman problem in which the cities are zero-one vectors and the distance between cities is the number of positions in which the vectors differ. This method is much faster than simulated annealing and performs nearly as well. It is also applicable to a wide range of other physical mapping strategies such as radiation hybrid mapping.
We are presently extending our programs to deal with pooled incidence data rather than incidences between probes and individual clones, to accept in situ hybridization data and genetic mapping data, and to work in a feedback loop with experimenters to identify and correct errors in the probe-clone incidence data.
 Alizadeh, F., Karp, R.M., Newberg, L. and Weisser, D. (1993) Physical Mapping of Chromosomes: A Combinatorial Problem in Molecular Biology, Algorithmica, to appear.
 Alizadeh, F., Karp, R.M., Weisser, D., Zweig, G. (1994) Physical Mapping Using Unique Probes, Proc. ACM/SIAM Symp. on Discrete Algorithms.