Genome Informatics Section
DOE Human Genome Program Contractor-Grantee Workshop
112. Probabilistic Physical Map Assembly
David J. States, Thomas W. Blackwell,
John McCrow, and Volker Nowotny
Physical map assembly is the inference of genome structure from experimental data on clones and markers, and it is central to genome analysis. Map assembly depends on the integration of diverse data, including sequence tagged site (STS) marker content, clone sizing, and restriction digest fingerprints (RDF). Like any experimental data, these data are uncertain and error-prone. Physical map assembly from error-free data is algorithmically straightforward and can be accomplished in time linear in the number of clones. However, the assembly of an optimal map from error-prone data is an NP-hard problem [Turner, Shamir].
In this abstract we present an approach to physical map assembly that is based on a probabilistic view of the data and seeks to identify those features of the map that can be reliably inferred from the available data. With this approach, we achieve several goals: the use of multiple data sources, appropriate representation of uncertainties in the underlying data, the use of clone length information in fingerprint map assembly, and the use of higher-order information in map assembly. By higher-order information, we mean relationships that are not expressible in terms of neighboring clone relationships. These include triplet and higher-order constraints (a+, b, c+ => b likely to be +), the uniqueness of STS position, and fingerprint marker locations.
Probabilistic descriptions of the map provide an alternative approach to the problem of physical mapping. In this view, we assert that it is impossible to know which of the many possible map assemblies is correct; we can only state which assemblies are more likely than others given the available experimental observations. Parameters of interest are then derived as likelihood-weighted averages over map assemblies.
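
One way to write such a likelihood-weighted average is sketched below. The notation is an assumption made here for illustration and does not appear in the abstract: $M$ ranges over candidate map assemblies, $D$ is the experimental data, $L(D \mid M)$ is the likelihood of the data under assembly $M$, and $\theta(M)$ is any map feature of interest (for example, an indicator that two given clones overlap).

```latex
\[
  \langle \theta \rangle
  \;=\;
  \frac{\sum_{M} \theta(M)\, L(D \mid M)}
       {\sum_{M} L(D \mid M)}
\]
```

With equal prior weight on every assembly, this is the posterior expectation of $\theta$ given $D$. The number of candidate assemblies grows factorially with the number of clones, so the sums cannot be enumerated for realistic problems.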
Ideally these averages would be sums or integrals over all possible map assemblies, but this is computationally infeasible for real-world map assembly problems. Instead, Gibbs sampling is used to estimate the averages, approaching the desired parameter values asymptotically. Software implementing our probabilistic approach to mapping has been written. Mixed RDF and STS maps containing up to 60 clones can be assembled on a desktop PC with run times under an hour. A Java-based physical map viewing tool has also been written to display the results of these calculations.
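
To make the sampling idea concrete, here is a minimal Python sketch of a Gibbs sampler over clone orderings. It is an illustration under strong assumptions and is not the authors' software. It uses STS content only (no fingerprints or clone lengths), assumes a uniform prior over orderings, and uses a toy error model in which a missing hit inside a marker's span is a false negative with an assumed rate `FALSE_NEG`. The data in `STS_HITS` and all function names are hypothetical. Each Gibbs step removes one clone and redraws its position from the conditional distribution given the relative order of the remaining clones. The parameter estimated is the probability that two clones are adjacent in the map.

```python
import math
import random
from collections import Counter

# Hypothetical toy data: clone -> STS markers that tested positive.
STS_HITS = {
    "A": {"m1", "m5"},
    "B": {"m1", "m2", "m5"},
    "C": {"m2", "m3", "m5", "m6"},
    "D": {"m3", "m4", "m6"},
    "E": {"m4", "m6"},
}
MARKERS = sorted(set().union(*STS_HITS.values()))
FALSE_NEG = 0.1  # assumed false-negative rate (illustrative value)


def log_likelihood(order):
    """Toy model: each miss inside a marker's span is a false negative."""
    total = 0.0
    for marker in MARKERS:
        pos = [i for i, clone in enumerate(order) if marker in STS_HITS[clone]]
        hits = len(pos)
        gaps = (pos[-1] - pos[0] + 1) - hits  # negatives between first and last hit
        total += hits * math.log(1 - FALSE_NEG) + gaps * math.log(FALSE_NEG)
    return total


def gibbs_sweep(order, rng):
    """Resample each clone's position given the order of all other clones."""
    for clone in list(order):
        rest = [c for c in order if c != clone]
        candidates = [rest[:k] + [clone] + rest[k:] for k in range(len(rest) + 1)]
        logs = [log_likelihood(c) for c in candidates]
        peak = max(logs)  # subtract the maximum for numerical stability
        weights = [math.exp(value - peak) for value in logs]
        order = rng.choices(candidates, weights=weights, k=1)[0]
    return order


def adjacency_probabilities(clones, sweeps=2000, burn_in=200, seed=0):
    """Estimate P(two clones are neighbors) as an average over sampled orders."""
    rng = random.Random(seed)
    order = list(clones)
    rng.shuffle(order)
    counts = Counter()
    for sweep in range(sweeps):
        order = gibbs_sweep(order, rng)
        if sweep >= burn_in:
            for a, b in zip(order, order[1:]):
                counts[frozenset((a, b))] += 1
    kept = sweeps - burn_in
    return {pair: n / kept for pair, n in counts.items()}


probs = adjacency_probabilities(sorted(STS_HITS))
for pair, p in sorted(probs.items(), key=lambda item: -item[1]):
    print("-".join(sorted(pair)), round(p, 3))
```

Because an ordering and its reversal have the same likelihood, the sketch reports adjacency, which is unaffected by reversal, rather than absolute position. Adjacencies supported by shared markers should receive high estimated probability, and pairs with no shared markers low probability.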