Genome Sequencing Technologies and Resources
DOE Human Genome Program Contractor-Grantee Workshop
of a Genome-Wide, Highly Characterized Clone Resource for Genome Sequencing
Gregory G. Mahairas, Keith D. Zackrone, Stephanie Tipton, Sarah Schmidt,
Alan Blanchard, Anne West, and Leroy Hood
As the genome project shifts into the large-scale sequencing phase, a major technical challenge lies in developing an efficient method for producing minimum tiling paths of sequence-ready clones across the entire genome. BACs are becoming the large-fragment clones of choice among sequencing centers, and they provide significant advantages as source material for sequencing (Kim et al., 1996a). BAC clones are stable, have a small vector size (7.4 kb), can be sequenced directly by the shotgun approach, are sufficiently long to traverse most tandem arrays of homology units or genome-wide repeats, and have been shown to be randomly distributed across the human genome (Kim et al., 1996b). By the fall of 1998, together with The Institute for Genomic Research (TIGR), we will have sequenced the BAC ends, or sequence tagged connectors (STCs), from 150,000 clones (i.e., 300,000 STCs, 150,000 from each laboratory). We have also generated a HindIII restriction digest for each BAC whose end sequences have been determined at the University of Washington. We have developed a strategy and tools for using this resource in support of large-scale genomic sequencing and have demonstrated proof of concept for its use. Together with TIGR, we propose to complete the characterization of an STC clone resource from two IRB-approved human BAC libraries to 22.5-fold clone (BAC) coverage (i.e., 450,000 BAC clones, assuming an average insert size of 150 kb). These data will be immediately available on the World Wide Web through dbGSS and our web sites (www.genome.washington.edu and www.tigr.org), and the clones will be available for distribution to the scientific community through Research Genetics. Nine hundred thousand STC sequences will provide a sequence marker of 300 to 500 base pairs (bp), on average, every 3,100 bp across the genome.
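The coverage and marker-spacing figures quoted above follow from simple arithmetic, sketched below. The haploid genome size of 3.0 Gb is an assumption that the abstract does not state, and the function names are illustrative only. Under that assumption, 450,000 clones of 150 kb give exactly 22.5-fold coverage, while the mean STC spacing comes out near 3.3 kb; the quoted figure of roughly 3.1 kb presumably reflects a slightly different genome-size estimate.

```python
# Back-of-the-envelope check of the coverage and spacing figures.
# GENOME_BP is an assumption (not stated in the abstract).
GENOME_BP = 3.0e9          # assumed haploid human genome size, in bp
N_CLONES = 450_000         # BAC clones proposed in the abstract
MEAN_INSERT_BP = 150_000   # average insert size assumed in the abstract
N_STCS = 2 * N_CLONES      # one STC from each end of every BAC


def clone_coverage(n_clones, mean_insert_bp, genome_bp):
    """Fold coverage: total cloned bases divided by genome size."""
    return n_clones * mean_insert_bp / genome_bp


def mean_stc_spacing(n_stcs, genome_bp):
    """Average distance in bp between consecutive STC markers."""
    return genome_bp / n_stcs


coverage = clone_coverage(N_CLONES, MEAN_INSERT_BP, GENOME_BP)
spacing = mean_stc_spacing(N_STCS, GENOME_BP)

print(f"clone coverage: {coverage:.1f}-fold")
print(f"mean STC spacing: {spacing / 1000:.1f} kb")
```

Changing `GENOME_BP` shows how sensitive both figures are to the genome-size estimate.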
The BAC libraries and the data pertaining to them will enable the facile selection of minimum tiling paths of BAC clones across each of the human chromosomes for large-scale sequence analysis (see below).
1) By fall of 1998, TIGR and the University of Washington High-Throughput Sequence Center (HTSC) will have each sequenced 150,000 STCs (total 300,000). By fall of 1999, TIGR and the University of Washington HTSC each propose to sequence an additional 300,000 STCs for a total of 900,000 STCs. This will, on average, place 1 STC every 3.1 kb across the genome and provide a 22.5-fold clone coverage of the genome. All clones sequenced after June of 1998 will come from IRB-approved BAC libraries.
2) Produce a restriction map of each BAC clone characterized at the University of Washington. These maps will be useful for identifying clones that contain no inserts or short inserts, for verifying genomic fidelity, and for checking the assembly process.
3) Continue the development of tools to extract biologically relevant information from the data and utilize the resource for high-throughput genomic sequencing.
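As an illustration of how the restriction digests in goal 2 could flag clones with no insert or a short insert, the sketch below sums the fragment sizes of a digest and subtracts the 7.4 kb vector. This is a minimal sketch and not the authors' pipeline: the function names and both thresholds are hypothetical, and it assumes that every fragment in the digest, including vector-derived ones, has been sized in base pairs.

```python
# Minimal sketch: estimate BAC insert size from digest fragment sizes.
# Thresholds and names are hypothetical, chosen only for illustration.
VECTOR_BP = 7_400          # BAC vector size given in the abstract
NO_INSERT_BP = 1_000       # hypothetical tolerance for "no insert"
SHORT_INSERT_BP = 50_000   # hypothetical cutoff for "short insert"


def estimate_insert_bp(fragment_sizes_bp):
    """Sum all fragments and subtract the vector to estimate the insert."""
    return max(0, sum(fragment_sizes_bp) - VECTOR_BP)


def classify_clone(fragment_sizes_bp):
    """Label a clone as 'no insert', 'short insert', or 'ok'."""
    insert_bp = estimate_insert_bp(fragment_sizes_bp)
    if insert_bp <= NO_INSERT_BP:
        return "no insert"
    if insert_bp < SHORT_INSERT_BP:
        return "short insert"
    return "ok"


# Example digests (fragment sizes in bp, invented for illustration).
digests = {
    "empty_vector": [7_400],
    "short_clone": [7_400, 12_000, 8_000],
    "typical_clone": [7_400] + [10_000] * 15,
}
for name, fragments in digests.items():
    print(name, estimate_insert_bp(fragments), classify_clone(fragments))
```

In practice, fragment sizing error and comigrating bands would make the estimate approximate, so any cutoff would need to be tuned against clones of known insert size.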