DOE Human Genome Program Contractor-Grantee
90. Updated ASDB: Database of Alternatively Spliced Genes
I. Dralyuk, M. Brudno, M.S. Gelfand(1), S. Spengler, M. Zorn, and I. Dubchak
National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA and (1)State Scientific Center for Biotechnology NIIGenetika, Moscow, 113545, Russia
Version 2.1 of ASDB consists of two divisions: ASDB(proteins), which contains 1922 amino acid sequences, and ASDB(nucleotides), which contains 2486 genomic sequences. ASDB(nucleotides) was developed in 1999, while ASDB(proteins) was updated with the latest data from SwissProt and improved clustering procedures. The database can be accessed on the Web.
SwissProt describes alternative splicing in two formats. The protein sequences were therefore selected from SwissProt by a full-text search for the words "alternative splicing" and "varsplic".
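The selection step can be illustrated with a minimal sketch. It assumes the SwissProt flat-file layout in which each entry ends with a line containing only "//"; the function names and the toy entries are hypothetical, and the actual search tool used for ASDB is not specified here.

```python
KEYWORDS = ("alternative splicing", "varsplic")  # search terms named in the text


def split_entries(flat_text):
    """Split flat-file text into entries, each terminated by a '//' line."""
    entries, current = [], []
    for line in flat_text.splitlines():
        if line.strip() == "//":
            if current:
                entries.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if current:  # tolerate a final entry without a terminator
        entries.append("\n".join(current))
    return entries


def select_spliced(entries, keywords=KEYWORDS):
    """Keep entries whose text contains any keyword, ignoring case.

    Simplification: a phrase wrapped across two lines is not matched.
    """
    return [e for e in entries if any(k in e.lower() for k in keywords)]


# Toy entries, invented for illustration only.
sample = (
    "ID   TOY1\n"
    "CC   -!- Isoforms are produced by alternative splicing.\n"
    "//\n"
    "ID   TOY2\n"
    "DE   Some unrelated protein.\n"
    "//\n"
    "ID   TOY3\n"
    "FT   VARSPLIC     10     20       Missing.\n"
    "//\n"
)
selected = select_spliced(split_entries(sample))
```

In this toy input, the first and third entries are kept because they contain one of the two search terms.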
To group proteins that could arise by alternative splicing of the same gene, we developed a clustering procedure. Two proteins were linked if they had a common fragment of at least 20 amino acids, and clusters were initially defined as maximal connected groups of linked proteins. Each cluster was represented by a multiple alignment of its members.
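The linking and clustering rule can be sketched as follows. This is an illustration rather than the ASDB implementation: it assumes that a "common fragment" means an identical contiguous substring, so two proteins are linked exactly when they share at least one 20-residue substring, and it finds connected groups with a union-find structure. All function names and the toy sequences are hypothetical, and the multiple alignment step is not covered.

```python
MIN_FRAGMENT = 20  # minimum shared fragment length, taken from the text


def fragments(seq, k=MIN_FRAGMENT):
    """Return the set of all length-k substrings of seq (empty if seq is shorter)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_proteins(proteins, k=MIN_FRAGMENT):
    """Group protein IDs into connected components of the 'shares a k-mer' graph.

    proteins: dict mapping protein ID -> amino acid sequence.
    Returns a sorted list of sorted ID lists, one per cluster.
    """
    parent = {pid: pid for pid in proteins}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_owner = {}  # fragment -> first protein seen containing it
    for pid, seq in proteins.items():
        for frag in fragments(seq, k):
            if frag in first_owner:
                union(first_owner[frag], pid)  # shared fragment: link the two
            else:
                first_owner[frag] = pid

    clusters = {}
    for pid in proteins:
        clusters.setdefault(find(pid), []).append(pid)
    return sorted(sorted(members) for members in clusters.values())


# Toy sequences, invented for illustration only.
core = "ACDEFGHIKLMNPQRSTVWY"          # 20 residues
toy = {
    "A": core + "AAAAA",               # contains the full core
    "B": "GGGGG" + core,               # contains the full core
    "C": "W" * 25,                     # unrelated
    "D": core[:19] + "A",              # shares only 19 residues with the core
}
result = cluster_proteins(toy)
```

With these toy sequences, A and B fall into one cluster, while C and D each remain alone, because D shares only 19 consecutive residues with the others.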
It turned out that some clusters were chimeric, in the sense that they contained members of multigene families rather than alternatively spliced variants of one gene. The multiple alignments were therefore subjected to additional analysis aimed at detecting chimeric clusters.
This processing covers the cases in which alternatively spliced variants are described in separate SwissProt entries. The other kind of ASDB record, originating from SwissProt entries with the "varsplic" field in the feature table, usually provides information on the variable fragments of the several proteins that result from alternative splicing of a single gene. ASDB(proteins) entries are therefore marked with different symbols to allow easy differentiation among three types: proteins that are part of ASDB clusters and the corresponding multiple alignments, proteins that have information on different variants in the associated SwissProt entries, and proteins for which information on the variants is not available at present. ASDB contains internal links between entries and/or clusters, as well as external links to Medline, GenBank and SwissProt entries.
The ASDB(nucleotides) division was generated by collecting all GenBank entries containing the words "alternative splicing" and then selecting those entries that contain complete gene sequences (all CDS fields are complete, i.e. they do not have continuation signs).
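The completeness filter can be sketched as a check on CDS location strings. The sketch assumes that the "continuation signs" are the "<" and ">" symbols that GenBank feature locations use to mark partial features; the function names are hypothetical, and the CDS locations are assumed to have been extracted from the entry beforehand.

```python
def is_complete_location(location):
    """True if a feature location string carries no partial markers ('<' or '>')."""
    return "<" not in location and ">" not in location


def is_complete_spliced_gene(entry_text, cds_locations):
    """Apply both selection criteria to one GenBank entry.

    entry_text: full text of the entry.
    cds_locations: list of CDS location strings already extracted from it.
    """
    mentions_splicing = "alternative splicing" in entry_text.lower()
    all_complete = bool(cds_locations) and all(
        is_complete_location(loc) for loc in cds_locations
    )
    return mentions_splicing and all_complete


# Toy location strings, invented for illustration only.
complete = ["join(12..78,134..202)", "join(12..78,300..410)"]
partial = ["join(12..78,134..202)", "<1..206"]
```

An entry is kept only if it mentions alternative splicing and every one of its CDS locations is free of partial markers, so a single incomplete CDS excludes the entry.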