Guochun Xie, Michael L. Engle, and Christian Burks
Theoretical Biology and Biophysics Group & Center for Human Genome Studies; T-10, MS K710; Los Alamos National Laboratory, Los Alamos, NM 87545.
Several groups have developed variations on the general strategy of clone end-sequence sampling [1-4]. Given a target cloned region, the goal is to drive a sub-clone layout using the assembly of sub-clone end-sequences, information about sub-clone lengths, and offset and orientation relationships among pairs of end-sequences. Most sequence assembly packages currently available do not provide for this 'meta-assembly' task. Generating layouts for and among the sequence contigs in this context is part of the general challenge of taking advantage of ancillary information during sequence assembly [5]. Using the cosmid-based SASE (SAmple SEquencing) data sets at Los Alamos [4] as a starting point, we have developed a simple suite of tools for: (i) extracting the end-sequence contig information from ABI Autoassembler data files; (ii) linking related end-sequence contigs to one another based on sub-clone assignments; (iii) developing a layout for the contigs based on the links established in (ii) and on sub-clone lengths; and (iv) displaying the layouts developed in (iii) in an X Window System plot. These tools are meant for use during in-stream evaluation of the coverage and experimental consistency of sampled sequence data. The generic specification of objects in the input file for the display module in (iv) should make it useful for graphically displaying layout relationships among a variety of different types of sequencing and mapping data. We will describe these tools in greater detail and show examples of their application to clone end-sequence sampling data.
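As an illustration only, steps (ii) and (iii) can be sketched in a few lines of Python. This is not the authors' implementation: the record layout (`EndRead`), the function names, and the single-link gap estimate are all assumptions made for the example, and the real tools read ABI Autoassembler files rather than hand-built records.

```python
from collections import defaultdict
from typing import NamedTuple


class EndRead(NamedTuple):
    """Hypothetical record for one sub-clone end-sequence placed in a contig."""
    subclone: str   # sub-clone the end-sequence came from
    contig: str     # contig the end-sequence was assembled into
    start: int      # contig offset (bp) of the sub-clone end, i.e. the read's 5' base
    forward: bool   # True if the read runs left-to-right in the contig


def link_contigs(reads):
    """Step (ii), simplified: link contigs holding opposite ends of one sub-clone."""
    by_subclone = defaultdict(list)
    for read in reads:
        by_subclone[read.subclone].append(read)
    links = []
    for subclone, ends in by_subclone.items():
        # Only sub-clones whose two ends fall in different contigs create a link.
        if len(ends) == 2 and ends[0].contig != ends[1].contig:
            links.append((subclone, ends[0], ends[1]))
    return links


def estimate_gap(left, right, left_contig_len, subclone_len):
    """Step (iii), simplified: gap (bp) between two contigs implied by one link.

    Assumes `left` is a forward read in the left-hand contig and `right` is a
    reverse read in the right-hand contig, so the sub-clone covers the tail of
    the left contig, the gap, and the head of the right contig.
    """
    covered_left = left_contig_len - left.start
    covered_right = right.start
    return subclone_len - covered_left - covered_right


reads = [
    EndRead("sc1", "A", 800, True),
    EndRead("sc1", "B", 300, False),
    EndRead("sc2", "A", 100, True),   # both ends in contig A: no link
    EndRead("sc2", "A", 900, False),
]
links = link_contigs(reads)
subclone, left, right = links[0]
gap = estimate_gap(left, right, left_contig_len=1000, subclone_len=2000)
print(subclone, gap)  # sc1 1500
```

A fuller layout step would have to reconcile several links per contig pair, handle contigs in either orientation, and allow for uncertainty in the measured sub-clone lengths; the sketch deliberately ignores all three.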
* This work was done under the auspices of the Department of Energy, and was supported through the DOE/OHER genome project (R. Moyzis, P.I., ERW-F137).
[1] Chen EY; Schlessinger D; and Kere J. (1993) Ordered shotgun sequencing, a strategy for integrated mapping and sequencing of YAC clones. Genomics 17: 651-656.
[2] Smith MW; Holmsen AL; Wei YH; Peterson M; and Evans GA. (1994) Genomic sequence sampling: a strategy for high resolution sequence-based physical mapping of complex genomes. Nature Genetics 7: 40-47.
[3] Roach JC; Boysen C; Wang K; and Hood L. (1995) Pairwise end sequencing: a unified approach to genomic mapping and sequencing. Genomics 26: 345-353.
[4] Moyzis RK; Doggett NA; Altherr MR; and Deaven LL. (1995) An integrated physical map of human chromosome 16: Sample Sequencing (SASE) analysis as a framework for complete genomics sequencing. Genome Science & Technology 1: P-18.
[5] Burks C; Parsons RJ; and Engle ML. (1994) Integration of competing ancillary assertions in genome assembly. In "Proceedings: Second International Conference on Intelligent Systems for Molecular Biology", Altman et al., Eds. AAAI Press, Menlo Park, CA, pp. 62-69.