DOE Human Genome Program Contractor-Grantee Workshop IV

Santa Fe, New Mexico, November 13-17, 1994

The electronic form of this document may be cited in the following style:
Human Genome Program, U.S. Department of Energy, DOE Human Genome Program Contractor-Grantee Workshop IV, 1994.

Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.

Biological evaluation of d^2: an algorithm for high performance novel DNA sequence functional assignment.

Winston Hide, John Burke and Daniel Davison
MasPar Computer Corporation and University of Houston.

A number of algorithms exist for searching sequence databases for biologically significant similarities based on primary sequence similarity of aligned sequences. We present here the biological sensitivity and selectivity of d^2, a high-performance comparison algorithm that rapidly determines the relative dissimilarity of large datasets of DNA/protein sequences. We have determined that d^2 is uniquely capable of detecting significant functional similarities between sequences that have no measurable alignable sequence similarity. These relationships remain undetectable by alternative methods.

Querying with a lipoprotein lipase DNA sequence returns hits that share functional similarity with the query, such as DNA sequences coding for proteins that regulate lipid metabolism, membrane and fat interaction, and lipogenesis. No other algorithm can provide such important pointers to sequence functionality. d^2 uses sequence-word multiplicity as a simple measure of dissimilarity. It is not constrained by the comparison of direct sequence alignments and so can use word contexts to yield new functional information on relationships. It is extremely efficient: comparing a query of 884 bases (INS/ECLAC) with the 19,540,603 bases of the bacterial division of GenBank 76.0 takes 52 CPU seconds on a Cray Y/MP-48 supercomputer. A parallel version (in development) is projected to run 100 times faster on a MasPar MP-2216. d^2 is also unique in that subsequences of biological interest can be weighted to improve the sensitivity and selectivity of a search over existing methods.
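The idea of word multiplicity as a dissimilarity measure can be illustrated with a minimal sketch. The abstract does not give the d^2 formula, so the code below assumes a simple form: the sum, over all words of length k, of the weighted squared difference between the word counts of two sequences. The function names, the default word length, and the `weights` mapping are hypothetical and are not taken from the authors' implementation.

```python
from collections import Counter


def word_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_dissimilarity(seq_a, seq_b, k=2, weights=None):
    """Assumed form of the measure: weighted squared difference of word counts.

    weights maps a word to a multiplier (default 1.0), which is one way
    words of biological interest could be emphasised.
    """
    counts_a = word_counts(seq_a, k)
    counts_b = word_counts(seq_b, k)
    weights = weights or {}
    total = 0.0
    for word in counts_a.keys() | counts_b.keys():
        diff = counts_a[word] - counts_b[word]
        total += weights.get(word, 1.0) * diff * diff
    return total


if __name__ == "__main__":
    # Identical sequences share all word counts, so the score is zero.
    print(d2_dissimilarity("ACGT", "ACGT", k=2))
    # Sequences with no words in common score higher.
    print(d2_dissimilarity("AAAA", "CCCC", k=2))
```

Because the score depends only on word counts, no alignment between the two sequences is computed at any point, which is the property the abstract emphasises.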

We have determined the ability of d^2 to detect biologically significant matches between a query and large datasets of DNA sequences while varying parameters such as word length and window size. We have optimized these parameters to give maximal sensitivity and selectivity relative to FASTA.
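A parameter study of this kind can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the work: it assumes the score for a sequence pair is the lowest squared word-count difference over all pairs of fixed-size windows, and it simply tabulates that score for each combination of word length and window size. All function and parameter names are hypothetical.

```python
from collections import Counter


def count_words(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def squared_count_distance(counts_a, counts_b):
    """Sum of squared differences between two word-count tables."""
    words = counts_a.keys() | counts_b.keys()
    return sum((counts_a[w] - counts_b[w]) ** 2 for w in words)


def best_window_score(query, target, k, window):
    """Lowest dissimilarity over all query-window / target-window pairs."""
    query_windows = [
        count_words(query[i:i + window], k)
        for i in range(max(1, len(query) - window + 1))
    ]
    best = None
    for j in range(max(1, len(target) - window + 1)):
        target_counts = count_words(target[j:j + window], k)
        for query_counts in query_windows:
            score = squared_count_distance(query_counts, target_counts)
            if best is None or score < best:
                best = score
    return best


def parameter_sweep(query, target, word_lengths, window_sizes):
    """Score one sequence pair for each (word length, window size) setting."""
    return {
        (k, w): best_window_score(query, target, k, w)
        for k in word_lengths
        for w in window_sizes
        if k <= w  # a word cannot be longer than its window
    }


if __name__ == "__main__":
    scores = parameter_sweep("ACGTAC", "TTTTACGTACTTTT", [2, 3], [4, 6])
    for setting, score in sorted(scores.items()):
        print(setting, score)
```

In a real evaluation, each setting would be scored against a set of known related and unrelated sequences so that sensitivity and selectivity could be compared with a reference method such as FASTA; the sketch shows only the sweep itself.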

Not funded by DOE.


Last modified: Wednesday, October 29, 2003
