Sponsored by the U.S. Department of Energy Human Genome Program
Santa Fe, New Mexico, November 13-17, 1994
The electronic form of this document may be cited in the following style: Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.
WHOLE GENOME SHOTGUN SEQUENCING OF THE 2.0 Mb GENOME OF HAEMOPHILUS INFLUENZAE
Carol J. Bult, Robert D. Fleischmann, Mark D. Adams, Joseph M. Merrick, Jeannine Gocayne, Li-ing Liu, Anthony Kerlavage, Hamilton O. Smith, and J. Craig Venter
The accepted approach for sequencing large segments of DNA (>100 kb) has been to spend substantial effort developing lambda or cosmid libraries and then mapping them. Developing approaches for rapid and efficient sequencing and assembly of large segments of DNA is critical for genome sequencing projects. We have developed a whole genome sequencing approach that eliminates these "top down" efforts. We undertook a random shotgun approach to whole genome sequencing of the approximately 2.0 Mb genome of Haemophilus influenzae by creating a sheared random genomic library with an average insert size of 1.5-2.0 kb cloned into pUC18. Using the high-throughput capacity of the TIGR DNA sequencing facility, we have prepared approximately 20,000 randomly selected H. influenzae double-stranded templates and analyzed them on ABI 373A DNA sequencers with the 48 cm gel plate modification. Initial sequencing and assembly data from our random genomic library fit the Lander and Waterman model for predicting the rate of assembly of a 2 Mb genome.
We have developed a number of software programs that allow us to manage large-scale genome sequencing projects. All data and analysis results are written directly into our SYBASE database. A number of methods are being evaluated for assembling the sequence data into contigs and generating a consensus sequence. One of the more successful approaches involves clustering all the available sequences by pairwise Smith-Waterman comparisons. The clusters are then assembled by a multiple sequence alignment program, written at TIGR, that runs on our MASPAR computer. Large clusters are assembled using the ABI AutoAssembler program.
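
The clustering step described above can be illustrated with a minimal Python sketch. It is not the TIGR software and does not reproduce the MASPAR implementation. The scoring values (`match`, `mismatch`, `gap`), the `min_score` threshold, and the function names are all assumptions chosen for illustration. The sketch also ignores reverse-complement reads and sequencing errors, both of which a real shotgun assembler has to handle.

```python
from itertools import combinations


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b (linear gap penalty).

    The scoring parameters are illustrative assumptions, not values
    taken from the abstract.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def cluster_reads(reads, min_score):
    """Group reads by single linkage: two reads share a cluster when their
    local alignment score reaches min_score (a hypothetical threshold)."""
    parent = list(range(len(reads)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All-against-all comparison, quadratic in the number of reads.
    for i, j in combinations(range(len(reads)), 2):
        if smith_waterman_score(reads[i], reads[j]) >= min_score:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Toy reads: the first two overlap on "GTGACC", the third is unrelated.
    toy_reads = ["ACGTACGTGACC", "GTGACCTTAGGA", "TTTTTTTTTTTT"]
    print(cluster_reads(toy_reads, min_score=10))
```

The all-against-all comparison grows quadratically with the number of reads, so for roughly 20,000 templates it is a large computation, which is consistent with running it on dedicated hardware.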
To date, more than 10 Mb of raw sequence data has been generated. This accounts for approximately 1.6 Mb of the H. influenzae genome, which we have currently ordered into 61 groups containing 182 contigs. The remainder of the genome can be accounted for by the 6 rRNA repeats, 3 additional repeats, and gaps. Closure is being accomplished by using our database information to target the sequencing of specific templates that are likely to close gaps.
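
The figures quoted in this abstract can be run through the Lander and Waterman model mentioned above as a rough check. The Python sketch below uses the standard idealized formulas: coverage c = NL/G, expected number of contigs N·exp(-cσ) with σ = 1 - T/L, and expected fraction of the genome covered 1 - exp(-c). The 500-base read length is an assumption, obtained by dividing 10 Mb of raw data by 20,000 templates; the abstract does not state it. The `min_overlap` parameter and the function name are also hypothetical. The model assumes uniformly random reads and no repeats, so it is not expected to reproduce the reported contig counts.

```python
import math


def lander_waterman(genome_size, n_reads, read_length, min_overlap=0):
    """Idealized Lander-Waterman estimates for a random shotgun project.

    Returns (coverage, expected_contigs, fraction_covered).
    min_overlap is the number of bases two reads must share to be
    joined; it is an illustrative parameter, not from the abstract.
    """
    coverage = n_reads * read_length / genome_size
    sigma = 1 - min_overlap / read_length
    expected_contigs = n_reads * math.exp(-coverage * sigma)
    fraction_covered = 1 - math.exp(-coverage)
    return coverage, expected_contigs, fraction_covered


if __name__ == "__main__":
    # 2.0 Mb genome and about 20,000 templates are from the abstract;
    # the 500-base read length is an assumption (10 Mb / 20,000).
    c, contigs, covered = lander_waterman(2_000_000, 20_000, 500)
    print(f"coverage: {c:.1f}x")
    print(f"expected contigs: {contigs:.0f}")
    print(f"expected fraction covered: {covered:.4f}")
```

Raising `min_overlap` lowers σ and so raises the expected contig count at a given coverage, which is one reason observed contig numbers tend to exceed the simplest estimate.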
Last modified: Wednesday, October 29, 2003