Functional Genomics Section 

DOE Human Genome Program Contractor-Grantee Workshop VII 
January 12-16, 1999  Oakland, CA


123. Development and Application of Subtractive Hybridization-Based Approaches to Facilitate Gene Discovery 

Maria de Fatima Bonaldo, Brian Berger, and Marcelo Bento Soares 
The University of Iowa, 451 Eckstein Medical Research Building, Iowa City, IA 52242 
bento-soares@uiowa.edu 

It is widely recognized that generating Expressed Sequence Tags (ESTs) from the 3' terminal exons of cDNA clones picked at random from libraries is an efficient strategy for identifying genes (Adams et al. 1992; Adams et al. 1991; Adams et al. 1995; Adams et al. 1993; Houlgatte et al. 1995; Khan et al. 1992; Matsubara and Okubo 1993; Okubo et al. 1992). Despite its advantages, however, the EST approach has several problems. One that is commonly observed in large-scale EST programs is the redundant generation of ESTs corresponding to the most common RNAs (i.e., mRNAs of the super-prevalent and intermediate frequency classes, mitochondrial RNAs, and rRNAs). This redundancy can significantly impair the overall efficiency of a gene discovery program that relies solely on ESTs from cDNA clones randomly picked from standard libraries. The use of normalized cDNA libraries has been shown to expedite gene discovery in large-scale EST programs (Berry et al. 1995; Hillier et al. 1996). Because the frequencies of all clones in a typical normalized cDNA library fall within one order of magnitude (Soares et al. 1994), redundant identification of the most common RNAs is greatly reduced. Normalized libraries can be generated by a number of approaches based on reassociation kinetics (Bonaldo et al. 1996; Soares and Bonaldo 1996; Soares et al. 1994). It is noteworthy, however, that normalization helps minimize redundancy only within libraries, not across them. Redundant identification of ESTs derived from mRNAs that are expressed in multiple tissues, and are therefore represented in multiple libraries, becomes a major problem in the advanced phases of gene discovery programs, and normalized libraries cannot solve it. 
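The effect of normalization on within-library redundancy can be illustrated with a small simulation. This is only a sketch under stated assumptions: the library size, the 10,000-fold abundance range assigned to the "standard" library, the log-uniform shape of the abundance distribution, and the function names are all hypothetical choices made for illustration and do not come from the work described here. Only the "within one order of magnitude" range of the normalized library follows the text.

```python
import random

def make_library(n_genes, fold_range, rng):
    """Relative clone frequencies, log-uniform across a `fold_range`-fold span.

    The log-uniform shape is an assumption made for illustration only.
    """
    return [fold_range ** rng.random() for _ in range(n_genes)]

def distinct_genes(weights, n_picks, rng):
    """Pick clones at random in proportion to frequency; count distinct genes hit."""
    picks = rng.choices(range(len(weights)), weights=weights, k=n_picks)
    return len(set(picks))

rng = random.Random(0)          # fixed seed so the sketch is repeatable
N_GENES, N_PICKS = 10_000, 5_000  # hypothetical library size and EST count

# Assumed 10,000-fold abundance range for a standard library (hypothetical).
standard = make_library(N_GENES, 10_000.0, rng)
# Normalized library: all frequencies within one order of magnitude.
normalized = make_library(N_GENES, 10.0, rng)

n_standard = distinct_genes(standard, N_PICKS, rng)
n_normalized = distinct_genes(normalized, N_PICKS, rng)
print("distinct genes, standard library:  ", n_standard)
print("distinct genes, normalized library:", n_normalized)
```

For the same number of randomly picked clones, the narrower abundance range yields more distinct genes, because fewer picks are spent on the most abundant transcripts.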
Hence, we have argued that redundancy across libraries can be addressed more effectively by using subtractive libraries that are progressively enriched for novel ESTs (Bonaldo et al. 1996). With support from the U.S. Department of Energy, we have developed a subtractive hybridization-based gene discovery strategy, which we named "Serial Subtraction of Normalized Libraries", that involves generating ESTs from subtracted libraries enriched for novel cDNAs. It is an iterative approach in which all arrayed cDNA clones from a library (those that have been or will be used to generate ESTs) are pooled and used as a driver in a subtractive hybridization with the library from which they originated. Because the representation of the driver population is significantly reduced in the resulting subtracted library, redundant generation of ESTs is greatly reduced regardless of transcript abundance, and every new library in a series is enriched for novel ESTs. Most importantly, the process significantly increases the proportional representation of rare transcripts. Such transcripts are likely to be missed by purely random sampling unless very large numbers of ESTs are generated from a library, which becomes costly and inefficient because of the very high redundancy levels that are reached. This strategy has been successfully applied in the rat gene discovery program that we are conducting at The University of Iowa with NIH support. We have been able to reduce redundancy significantly and thus maintain a high frequency of identification of novel ESTs (62% overall average) across the approximately 32,000 ESTs submitted to GenBank since February 1998. In total, we identified approximately 20,000 unique clusters from 32,000 3' ESTs, a gene discovery efficiency that is unprecedented in any EST program described to date. 
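The iterative logic of serial subtraction can be sketched as a toy simulation. It is not the laboratory procedure itself, and every number in it is an assumption made for illustration: the library size, the number of clones picked per round, and in particular the 20-fold depletion of driver sequences are hypothetical, as are the function names. The driver is modeled as the cumulative pool of all clones picked so far. The last lines check the reported figures: 20,000 unique clusters from 32,000 ESTs corresponds to a ratio of 0.625, consistent with the 62% overall novelty rate.

```python
import random

def sample_ests(weights, n_picks, rng):
    """Pick clones at random in proportion to their frequency in the library."""
    genes = list(weights)
    return rng.choices(genes, weights=[weights[g] for g in genes], k=n_picks)

def subtract(weights, driver, depletion):
    """Reduce the frequency of every driver gene by an assumed `depletion` factor."""
    return {g: (w / depletion if g in driver else w) for g, w in weights.items()}

def run_series(weights, rounds, n_picks, depletion, seed):
    """Return the number of newly identified genes in each round of the series."""
    rng = random.Random(seed)
    seen = set()            # pooled arrayed clones, used as the driver
    new_per_round = []
    current = dict(weights)
    for _ in range(rounds):
        picked = set(sample_ests(current, n_picks, rng))
        new_per_round.append(len(picked - seen))
        seen |= picked
        # Next library of the series: original library minus the pooled driver.
        current = subtract(weights, seen, depletion)
    return new_per_round

lib_rng = random.Random(0)
# Hypothetical normalized library: 5,000 genes, frequencies within 10-fold.
library = {g: 10.0 ** lib_rng.random() for g in range(5_000)}

without_subtraction = run_series(library, rounds=3, n_picks=1_000, depletion=1.0, seed=1)
with_subtraction = run_series(library, rounds=3, n_picks=1_000, depletion=20.0, seed=1)
print("new genes per round, no subtraction:  ", without_subtraction)
print("new genes per round, with subtraction:", with_subtraction)

# Figures reported in the text: unique clusters per EST submitted.
reported_ratio = 20_000 / 32_000
print("reported clusters per EST:", reported_ratio)
```

In this model the first round is identical in both runs. The difference appears in later rounds, where depleting previously sampled genes keeps the rate of new identifications from falling off as quickly.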


 