DOE Human Genome Program Contractor-Grantee Workshop IV

Santa Fe, New Mexico, November 13-17, 1994

The electronic form of this document may be cited in the following style:
Human Genome Program, U.S. Department of Energy, DOE Human Genome Program Contractor-Grantee Workshop IV, 1994.

Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.

WHOLE GENOME SHOTGUN SEQUENCING OF THE 2.0 Mb GENOME OF HAEMOPHILUS INFLUENZAE

Carol J. Bult, Robert D. Fleischmann, Mark D. Adams, Joseph M. Merrick, Jeannine Gocayne, Li-ing Liu, Anthony Kerlavage, Hamilton O. Smith, and J. Craig Venter
The Institute for Genomic Research, Gaithersburg, MD, and Johns Hopkins University, Baltimore, MD.

The accepted approach to sequencing large segments of DNA (>100 kb) has been to invest substantial effort in constructing lambda or cosmid libraries and then mapping them. Developing approaches for rapid and efficient sequencing and assembly of large segments of DNA is critical for genome sequencing projects. We have developed a whole genome sequencing approach that eliminates these "top down" mapping efforts. We undertook a random shotgun approach to whole genome sequencing of the approximately 2.0 Mb genome of Haemophilus influenzae by creating a sheared random genomic library with an average insert size of 1.5-2.0 kb cloned into pUC18. Using the high-throughput capacity of the TIGR DNA sequencing facility, we have prepared approximately 20,000 randomly selected H. influenzae double-stranded templates and analyzed them on ABI 373A DNA sequencers with the 48 cm gel plate modification. Initial sequencing and assembly data from our random genomic library fit the Lander and Waterman model for predicting the rate of assembly of a 2 Mb genome.
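
As a rough illustration of what the Lander and Waterman model predicts for a project of this size, the sketch below evaluates the standard Poisson approximations associated with that model: coverage c = NL/G, expected fraction of the genome covered 1 - e^(-c), and expected number of contigs N e^(-c(1 - T/L)). The genome size and template counts follow the abstract. The average read length (450 bp) and the minimum detectable overlap (25 bp) are assumptions chosen only for illustration; the abstract does not state them, and this is not the authors' own calculation.

```python
import math


def lander_waterman(genome_bp, n_reads, read_bp, min_overlap_bp):
    """Return (coverage, expected fraction covered, expected contig count).

    Uses the usual Poisson approximations for random shotgun sequencing.
    """
    coverage = n_reads * read_bp / genome_bp          # c = N * L / G
    sigma = 1.0 - min_overlap_bp / read_bp            # 1 - T / L
    frac_covered = 1.0 - math.exp(-coverage)          # 1 - e^(-c)
    expected_contigs = n_reads * math.exp(-coverage * sigma)
    return coverage, frac_covered, expected_contigs


GENOME_BP = 2_000_000    # approximate genome size given in the abstract
READ_BP = 450            # assumption: average usable read length
MIN_OVERLAP_BP = 25      # assumption: minimum overlap an assembler can detect

for n_reads in (5_000, 10_000, 20_000):
    c, f, k = lander_waterman(GENOME_BP, n_reads, READ_BP, MIN_OVERLAP_BP)
    print(f"{n_reads:>6} reads: coverage {c:.2f}x, "
          f"covered {f:.1%}, about {k:.0f} contigs")
```

Under these assumed parameters, the predicted contig count first rises and then falls as reads accumulate, which is the qualitative behavior that a comparison between observed assembly progress and the model would track.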

We have developed a number of software programs that allow us to manage large-scale genome sequencing projects. All data and analysis results are written directly into our SYBASE database. A number of methods are being evaluated for assembling the sequence data into contigs and generating a consensus sequence. One of the more successful approaches involves clustering all the available sequences by pairwise Smith-Waterman comparisons. The clusters are then assembled by a multiple sequence alignment program written at TIGR, which runs on our MASPAR computer. Large clusters are assembled using the ABI AutoAssembler program. To date, more than 10 Mb of raw sequence data has been generated. This accounts for approximately 1.6 Mb of the H. influenzae genome, which we currently have ordered into 61 groups containing 182 contigs. The remainder of the genome can be accounted for by the 6 rRNA repeats, 3 additional repeats, and gaps. Closure is being accomplished by using our database information to target the sequencing of specific templates that are likely to close gaps.
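
The clustering step described above can be sketched in a few lines. The example below is a minimal illustration, not the TIGR software: it scores every pair of reads with a basic Smith-Waterman local alignment (linear gap penalty) and merges reads whose score reaches a threshold, using a union-find structure so that clusters are transitive. The scoring values, the `min_score` threshold, the function names, and the toy reads are all assumptions made for illustration. Reverse-complement matches and sequencing errors, which a real pipeline would have to handle, are ignored here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def cluster_reads(reads, min_score):
    """Group read indices whose pairwise score reaches min_score."""
    parent = list(range(len(reads)))

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if smith_waterman_score(reads[i], reads[j]) >= min_score:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy reads (hypothetical): the first three overlap end to end, the last does not.
reads = [
    "ACGTTGCATGCCGATA",
    "GCCGATATTCGGAC",
    "TTCGGACCTAGGA",
    "TTTTTTTTTTTTTTTT",
]
print(cluster_reads(reads, min_score=12))
```

All-against-all comparison grows quadratically with the number of reads, which is consistent with the abstract's use of a parallel machine for the alignment work; each resulting cluster would then be passed to a separate assembly step.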

Last modified: Wednesday, October 29, 2003

Office of Science Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program