| Gene hunts. Genes, the regions that actually code for proteins, constitute only a small fraction, perhaps 10%, of the human genome. Thus, even with the sequence in hand, finding the genes remains a daunting further step. One tool built to help in the hunt is GRAIL, a computer program developed at Oak Ridge National Laboratory that uses heuristics based on existing data, together with artificial neural networks, to identify likely genes. Coding and noncoding regions of the genome differ in many subtle respects, for example in the frequency with which certain short sequences appear. Further, particular landmarks are known to characterize the boundaries of many genes. In the example shown here, GRAIL has searched for likely genes in both strands of a 3583-base-pair sequence. The results are shown at the upper left. The upper white trace indicates five possible exons (coding regions within a single gene) in one strand, whereas the lower white trace suggests two possible exons in the other strand. However, the lower trace scores worse on other tests, leaving a candidate set of exons shown by the five green rectangles. By refining this set further, GRAIL then produces the final gene model, shown in light blue. The lower part of the figure zeros in on the end of the candidate exon outlined in yellow, providing a detailed look at one of the differences between the preliminary and final models. The sequence is shown in violet, together with the amino acids it codes for, in yellow. The preliminary model begins with the sequence GTCGCA..., which codes for the amino acids valine and alanine. In fact, though, the protein encoded by almost any gene begins with the amino acid methionine, and the final gene model reflects this. At the upper right, GRAIL displays the results of a database search for sequences similar to the final five-exon gene model. Close matches were found among species as diverse as soybean and the nematode Caenorhabditis elegans. | |
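The caption's reading of GTCGCA as valine followed by alanine, and its remark that coding and noncoding regions differ in how often short sequences appear, can be illustrated with a small Python sketch. This is not GRAIL's method, which combines heuristics with neural networks. The function names `translate`, `starts_with_methionine`, and `kmer_counts` are hypothetical, and the codon table is deliberately partial, holding only the three codons needed here.

```python
from collections import Counter

# Partial codon table (an assumption for illustration): only the codons
# used in this example. The full standard genetic code has 64 entries.
CODON_TABLE = {
    "ATG": "Met",  # methionine, the usual start codon
    "GTC": "Val",  # valine
    "GCA": "Ala",  # alanine
}


def translate(dna):
    """Translate a coding-strand DNA string codon by codon.

    Trailing bases that do not fill a codon are ignored, and codons
    missing from the partial table are reported as '???'.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE.get(codon, "???") for codon in codons]


def starts_with_methionine(dna):
    """Return True if the sequence opens with the start codon ATG."""
    return dna.upper().startswith("ATG")


def kmer_counts(dna, k):
    """Count every overlapping substring of length k in the sequence."""
    dna = dna.upper()
    return Counter(dna[i:i + k] for i in range(len(dna) - k + 1))


# The opening bases of the preliminary model quoted in the caption.
print(translate("GTCGCA"))               # ['Val', 'Ala']
print(starts_with_methionine("GTCGCA"))  # False
```

A check like `starts_with_methionine` captures only one of the landmarks the caption mentions. Counts such as those from `kmer_counts` would be just one raw input to a statistical or neural-network classifier, not a gene finder in themselves.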
To Know Ourselves was prepared at the request of the U.S. Department of Energy, Office of Health and Environmental Research, as an overview of the Human Genome Project.