DOE Human Genome Program Contractor-Grantee
8. Progress of Concatenation cDNA Sequencing at the BCM-Human Genome Sequencing Center
Baylor College of Medicine, Houston, TX
Concatenation cDNA sequencing (CCS) has been used to complete sequencing of more than 750 clones from the human brain cDNA library (1NIB) and 30 clones representing childhood leukemia. Together, these represent a total length of 1.2 megabases of assembled sequence. An additional 390 clones are currently in the sequencing pipeline. Statistics from 14 completed projects continue to show that CCS is as efficient as sequencing of single large DNA fragments, requiring an average of 17 reads and one custom primer to complete each kb of sequence. Methodological improvements, including pooling of clones during growth and the use of Phred and Phrap for assembly, have further simplified CCS.
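As a rough illustration of what the per-kb averages imply, the sketch below scales them to the 1.2 megabases of assembled sequence. This is an extrapolation, not a reported figure: the averages come from 14 completed projects, and applying them to the whole dataset is an assumption. The function name `estimate_effort` is hypothetical.

```python
def estimate_effort(total_kb, reads_per_kb=17, primers_per_kb=1):
    """Scale per-kb averages (from the 14 completed projects) to a total length.

    Assumes the averages hold uniformly across all sequence, which the
    abstract does not claim.
    """
    return {
        "reads": round(total_kb * reads_per_kb),
        "custom_primers": round(total_kb * primers_per_kb),
    }


# 1.2 megabases of assembled sequence = 1200 kb
effort = estimate_effort(1200)
print(effort)
```

Under these assumptions the total comes to roughly twenty thousand reads and about twelve hundred custom primers.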
For 596 different clones from the 1NIB brain library, a similarity search was performed against the public cDNA database. Of these, 58% were novel and the remaining 42% (251 clones) had partial matches to known sequences or genes from human or other organisms. Of the latter, 159 clones displayed similarity to known proteins. A comparison against the UniGene cDNA dataset revealed that 61% of the submitted cDNAs (258 of 424) represented novel contributions.
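The percentages quoted above can be recomputed from the clone counts. The following is a minimal sketch; the variable names are illustrative, and the count of novel clones is derived here by subtraction rather than stated in the text.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


total_clones = 596                # 1NIB clones searched against the public cDNA database
matched = 251                     # clones with partial matches to known sequences
novel = total_clones - matched    # derived, not stated in the text
protein_matched = 159             # subset of matched clones similar to known proteins

unigene_set = 424                 # cDNAs compared against UniGene
unigene_novel = 258               # of those, novel contributions

pct_novel = percent(novel, total_clones)
pct_matched = percent(matched, total_clones)
pct_unigene_novel = percent(unigene_novel, unigene_set)

print(pct_novel, pct_matched, pct_unigene_novel)
```

The recomputed values agree with the 58%, 42%, and 61% figures given in the text.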
Of the 159 clones with partial protein matches, only 32 (20%) had a complete ORF (open reading frame). This indicates that a low percentage of cDNA clones from these libraries represent full-length mRNAs. To generate better libraries, a postdoctoral researcher traveled to Japan, where four cDNA libraries (one human infant brain, one mouse brain and two childhood leukemia) were constructed using the CAP-trapping technology developed by Hayashizaki's group at the RIKEN Institute. Three libraries have been evaluated in detail by acquiring ESTs (expressed sequence tags) from 192 clones of each library. The data show good quality: little contamination with vector (1.8-2.5%) or ribosomal DNA (1.3-1.8%) sequences, and low redundancy (3.1-4.2%). About half of the ESTs lacked matches with UniGene cDNA sequences. Some 65.0-66.6% of clones possessed the first ATG codon of the encoded protein, indicating very high quality of the libraries. Thus, the three analyzed cDNA libraries are suitable for large-scale, full-length sequencing. About 8,000 ESTs have been generated from these cDNAs, and potentially novel clones are being selected for subsequent full-length sequencing.