DOE Human Genome Program Contractor-Grantee
69. Discovery of Distant Regulatory Elements by Comparative Sequence-Based Approaches.
Inna Dubchak1, Chris Mayor1, Lior Pachter2, Gabriella Cretu1, Edward M. Rubin1, and Kelly A. Frazer1
1Genome Sciences Department, Lawrence Berkeley National Laboratory, 1 Cyclotron Road MS 84-171, Berkeley, CA 94720 and 2Mathematics Department, University of California, Berkeley, CA 94720
Distant regulatory elements, such as enhancers, silencers, and insulators, are experimentally difficult to identify. Exploiting the fact that these elements tend to be highly conserved among mammals, we are using comparative sequence-based approaches to discover them. To find conserved non-coding sequences with physical attributes of distant regulatory elements, we compared ~1 Mb of orthologous human (5q31 interleukin cluster region) and mouse (chromosome 11) sequences. Ninety non-coding sequences (>= 100 bp and >= 70% identity) were identified; analysis of 15 of them found that ~70% were conserved across mammals but unique in the human genome. Although this study discovered numerous conserved non-coding sequences with features of distant regulatory elements, only one of the two enhancers previously identified in the human 5q31 region was detected.
To improve the ability of comparative sequence analysis to identify distant regulatory elements, we have developed a new method that globally aligns the sequences being compared and plots the percent identity of a moving average point (MAP). The advantages of MAP analysis over the previous method are that it can detect conserved non-coding sequences containing small insertions/deletions and that it is capable of three-way species comparisons. Comparison of ~200 kb of orthologous human (5q31), mouse (chromosome 11), and dog (chromosome 4) sequences using MAP analysis found all the known conserved non-coding sequences (>= 100 bp and >= 70% identity) in the region as well as additional non-coding elements, including the enhancer previously undetected by comparative analysis. The overall pattern of non-coding sequence conservation in the orthologous human, dog, and mouse genomic DNA is strikingly similar, suggesting that the majority of elements identified on the basis of conservation are likely to have biological function. Experimental characterization of the largest non-coding element identified in these studies determined it to be a potent regulatory element of three genes, IL-4, IL-13, and IL-5, spread over 120 kb.
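The idea of scoring percent identity along a global alignment can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the window size, the treatment of gap columns, or how windows are merged into elements, so the choices below (a fixed window measured in alignment columns, gaps counted as mismatches, adjacent passing windows merged) are assumptions, and the function names `window_identity` and `conserved_regions` are hypothetical. The input is assumed to be two rows of an already computed pairwise global alignment, with `-` marking gaps.

```python
from typing import List, Tuple


def window_identity(aln_a: str, aln_b: str, window: int = 100) -> List[float]:
    """Percent identity for every window of `window` alignment columns.

    Assumption: a column counts as a match only when both rows hold the
    same base; any column containing a gap ('-') counts as a mismatch.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [
        int(a == b and a != "-")
        for a, b in zip(aln_a.upper(), aln_b.upper())
    ]
    if len(matches) < window:
        return []
    # Running sum so each step costs O(1) instead of re-counting the window.
    total = sum(matches[:window])
    identities = [100.0 * total / window]
    for i in range(window, len(matches)):
        total += matches[i] - matches[i - window]
        identities.append(100.0 * total / window)
    return identities


def conserved_regions(
    identities: List[float], window: int = 100, min_identity: float = 70.0
) -> List[Tuple[int, int]]:
    """Merge consecutive passing windows into (start, end) column spans.

    `end` is exclusive. Coordinates are alignment columns, not genomic
    positions; mapping back to either sequence is left out of this sketch.
    """
    regions: List[Tuple[int, int]] = []
    start = None
    end = 0
    for i, pid in enumerate(identities):
        if pid >= min_identity:
            if start is None:
                start = i
            end = i + window
        elif start is not None:
            regions.append((start, end))
            start = None
    if start is not None:
        regions.append((start, end))
    return regions
```

With the defaults, `conserved_regions(window_identity(row_a, row_b))` reports spans that satisfy the >= 100 bp and >= 70% identity criteria quoted above, under the stated assumptions. Because a window can tolerate a few gap columns and still pass the threshold, a scheme of this kind can retain elements that contain small insertions or deletions.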