DOE Human Genome Program Contractor-Grantee
18. Human and Mouse BAC Ends
Shaying Zhao, Mark D. Adams, Joel Malek, Lily Fu, Bola Akinretoye, Sofiya Shatsman, Maureen Levins, Stephany McGann, Keita Geer, Getahun Tsegaye, Margaret Krol, Peter Choi, Tamara Feldblyum, William Nierman, and Claire Fraser
The Institute for Genomic Research, Rockville, MD 20850
End sequences from Bacterial Artificial Chromosomes (BACs) provide highly specific sequence markers in large-scale sequencing projects. To date, we have generated >300,000 BAC end sequences (BESs) from >186,000 human BAC clones with the following properties. 1) Over 60% of the clones have BESs from both ends, representing 5X coverage of the human genome by the paired-end clones. 2) The average read length is ~460 bp, providing a total of 141 Mb covering ~4.7% of the genome. 3) The average phred Q20 length is ~400 bp, giving an identity of >99% to finished human sequences. 4) Over 90% of the BESs faithfully represent the original clones, and over 85% of the paired-end clones have both ends tracked correctly. This high data quality gives BAC end users high confidence in 1) retrieving the right clones from the BAC libraries based on BAC end sequence matches; and 2) building a minimum tiling path of sequence-ready clones across the genome and building genome assembly scaffolds. Our sequence analyses indicate that BESs from the human BAC libraries developed at the California Institute of Technology (CalTech) and Roswell Park Cancer Institute (RPCI) have similar properties. The analyses have highlighted differences in insert size among different segments of the CalTech library. Problems with the fidelity of tracking sequence data back to physical clones have been observed in some subsets of the overall BES dataset. Annotation of the BESs for their content of available genomic sequences, sequence tagged sites (STSs), expressed sequence tags (ESTs), protein-encoding regions, and repeats indicates that this resource will be valuable in many areas of genome research. (human BAC ends URL)
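The coverage figures quoted above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes a haploid human genome size of 3 Gb (not stated in the abstract), treats the reported lower bounds (>300,000 reads, >186,000 clones, over 60% paired) as exact values, and uses hypothetical function names. The mean insert size it derives is an implied value under those assumptions, not a figure reported here.

```python
# Assumption: haploid human genome size of ~3 Gb (not given in the abstract).
GENOME_BP = 3.0e9


def sequence_fraction(n_reads, mean_read_bp, genome_bp=GENOME_BP):
    """Return total sequenced bases and the fraction of the genome they span."""
    total_bp = n_reads * mean_read_bp
    return total_bp, total_bp / genome_bp


def implied_insert_bp(n_clones, paired_fraction, coverage, genome_bp=GENOME_BP):
    """Mean insert size implied by a given clone coverage.

    Clone coverage = (paired clones * mean insert size) / genome size,
    so mean insert size = coverage * genome size / paired clones.
    """
    n_paired = n_clones * paired_fraction
    return coverage * genome_bp / n_paired


# Reported lower bounds, used here as if they were exact counts.
total_bp, fraction = sequence_fraction(300_000, 460)
insert_bp = implied_insert_bp(186_000, 0.60, 5.0)

print(f"total sequence: {total_bp / 1e6:.0f} Mb ({fraction:.1%} of genome)")
print(f"implied mean insert size: {insert_bp / 1e3:.0f} kb")
```

With these rounded inputs the total comes to roughly 138 Mb, or about 4.6% of the genome, which is in line with the reported 141 Mb and ~4.7% given that the read count is a lower bound.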
We have been funded to end sequence the mouse BACs from the RPCI-23 library within the next year. To date, we have over 25,000 mouse BESs with a quality similar to that of our human ends. In addition, all end sequencing is being conducted on ABI 3700 sequencers to eliminate the lane tracking errors experienced on the ABI 377 sequencers. We expect that our mouse ends will have 1) an average read length of 500 bp; 2) an average of 400 phred Q20 bases; 3) over 90% of the clones having paired ends; and 4) a clone tracking accuracy of 99%. The mouse resource will thus be of even higher quality than the current human ends.
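For readers unfamiliar with the Q20 metric used above: a phred quality value Q corresponds to an estimated base-calling error probability of 10^(-Q/10), so Q20 means an estimated error rate of 1%. A minimal sketch of counting Q20 bases in one read follows. The function names and the quality values are made up for illustration, and the abstract does not say whether its Q20 length counts all qualifying bases or only a contiguous stretch; this sketch counts all of them.

```python
def error_probability(q):
    """Estimated base-calling error probability for phred quality q."""
    return 10 ** (-q / 10)


def q20_length(quals, threshold=20):
    """Count the bases in a read whose phred quality meets the threshold."""
    return sum(1 for q in quals if q >= threshold)


# Hypothetical per-base quality values for a short read (not real data).
example_quals = [8, 12, 25, 40, 40, 35, 30, 22, 15, 9]

n_q20 = q20_length(example_quals)
print(f"Q20 bases: {n_q20} of {len(example_quals)}")
print(f"error probability at Q20: {error_probability(20):.2%}")
```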
The online presentation of this publication is a special feature of the Human Genome Project Information Web site.