Beyond the Identification of Transcribed
Sequences:
Functional and Expression Analysis
11th Annual Workshop
November 9-12, 2001
Washington D.C.
Katheleen Gardiner, Ph.D.
Eleanor Roosevelt Institute
1899 Gaylord Street
Denver, CO 80206-1210
USA
telephone: 303-336-5652
fax: 303-333-8423
email: gardiner@eri.uchsc.edu
prestype: Platform
presenter: Katheleen Gardiner, Ph.D.
Katheleen Gardiner and Muriel Davisson
Eleanor Roosevelt Institute, Denver, Colorado and The Jackson Laboratory, Bar
Harbor, Maine
The 34 Mb finished sequence of human chromosome 21 was published in May 2000 with 225 genes/models reported. With hand curation, the gene number is now 250; however, much annotation of the genomic sequence remains to be added. We will present information on alternative splicing within coding regions, putative antisense transcripts, and conservation with homologous mouse sequences.
i) Alternative splicing: To determine the number of protein isoforms encoded
by chromosome 21, information from dbEST entries was used to detect potential
splice variants within coding regions. Of the 200 genes for which sufficient
EST data were available, >40% showed two or more splice variants. These included
novel splice variants of well-studied genes (e.g. APP and HMG14), and prediction
of possibly >30 forms for the Intersectin gene.
ii) Antisense transcripts: Information from spliced ESTs was also used to identify
putative antisense transcripts to known protein coding genes. Eight examples
were found where spliced opposite-strand transcripts contained exons complementary
to one or more coding exons of the sense strand gene. Biological functions of
the sense genes are diverse; preliminary experimental analysis of the antisense
genes suggests that transcription is very low in the tissues tested.
iii) Comparison with mouse genomic sequence: Draft genomic sequences from the public sector and the Celera databases were searched for all segments with homology to human chromosome 21. Comparison of >30 Mb shows that most, but not all, known genes and complete cDNAs are present in both species; gene order and orientation are generally, but not always, conserved; a 7 Mb gene desert is conserved; intergenic and intronic regions, mostly within non-GC-rich regions, show numerous segments of significant conservation but unknown function; and a large number of spliced EST sequences appear to be species specific.
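
The EST-based screens described in items i) and ii) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the half-open (start, end) coordinate convention, and the toy coordinates are all assumptions made for illustration. The sketch treats two EST-supported introns that overlap without being identical as evidence of a splice variant, and treats a spliced EST on the opposite strand as a putative antisense transcript when one of its exons overlaps a coding exon of the sense gene.

```python
from itertools import combinations


def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def conflicting_introns(introns):
    """Return pairs of distinct introns that overlap.

    Overlapping, non-identical introns cannot belong to the same transcript,
    so each pair is evidence of alternative splicing (hypothetical criterion).
    """
    unique = sorted(set(introns))
    return [(a, b) for a, b in combinations(unique, 2) if overlaps(a, b)]


def is_putative_antisense(est_strand, est_exons, gene_strand, coding_exons):
    """Apply the three conditions named in item ii) to one aligned EST."""
    if len(est_exons) < 2:          # must be spliced
        return False
    if est_strand == gene_strand:   # must lie on the opposite strand
        return False
    # at least one EST exon must overlap a coding exon of the sense gene
    return any(overlaps(e, c) for e in est_exons for c in coding_exons)


# Toy data (hypothetical coordinates, not taken from chromosome 21).
est_introns = [(200, 300), (200, 300), (200, 500)]
print(conflicting_introns(est_introns))  # [((200, 300), (200, 500))]

coding_exons = [(100, 200), (300, 400)]
print(is_putative_antisense("-", [(350, 450), (600, 700)], "+", coding_exons))  # True
```

The intron-conflict test only captures variants such as exon skipping or alternative splice sites; events like intron retention would need a separate check, and real EST alignments would also need filtering for alignment quality and strand assignment.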