Daniel B. Davison[1,3], Brent A. Wiese[2], Istvan Ladunga[2], Kim C. Worley[2] and Randall F. Smith[1,2]
Department of Cell Biology[1], Human Genome Center[2] and Department of Molecular and Human Genetics, Baylor College of Medicine and University of Houston[3], Houston TX
We are providing a variety of molecular biology-related search and analysis services to Genome Program investigators to improve the identification of new genes and their functions. These services are available via the BCM Search Launcher World Wide Web (WWW) pages, which are organized by function and provide a single point of entry for related searches. Pages are included for 1) protein sequence searches, 2) nucleic acid sequence searches, 3) multiple sequence alignments, 4) pairwise sequence alignments, 5) gene feature searches, 6) sequence utilities, and 7) protein secondary structure prediction. The Protein Sequence Search Page, for example, provides a single form for submitting sequences to WWW servers that provide remote access to a variety of protein sequence search tools, including BLAST, FASTA, Smith-Waterman, BEAUTY, BLASTPAT, FASTAPAT, PROSITE, and BLOCKS searches. The BCM Search Launcher extends the functionality of other WWW services by adding hypertext links to the results returned by remote servers. For example, links to the NCBI's Entrez database and to the Sequence Retrieval System (SRS) are added to search results returned by the NCBI's WWW BLAST server. These links provide easy access to Medline abstracts, related sequences, and additional information that can be extremely helpful when analyzing database search results. For novice or infrequent users of sequence database search tools, we have pre-set the parameter values to provide the most informative first-pass sequence analysis possible.
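The link-adding step described above can be illustrated with a minimal Python sketch. It is not the Search Launcher's actual implementation. The hit-line format (`db|ACCESSION|...` at the start of a line), the `LINK_TEMPLATE` URL, and the function name `add_links` are all assumptions made for illustration; the URL in particular is a placeholder, not the real Entrez or SRS address scheme.

```python
import html
import re

# Hypothetical placeholder URL; NOT the real Entrez or SRS URL scheme.
LINK_TEMPLATE = "https://example.org/lookup/{db}/{acc}"

# Assumed hit-line format: "db|ACCESSION|" at the start of a line,
# for example "sp|P12345|NAME". Real server output may differ.
HIT_PATTERN = re.compile(
    r"^(?P<db>[a-z]{2,3})\|(?P<acc>[A-Z0-9_.]+)\|", re.MULTILINE
)


def add_links(report: str) -> str:
    """Wrap a plain-text search report in <pre> and hyperlink accessions."""
    # Escape first so that characters such as "<" in the report stay literal.
    escaped = html.escape(report)

    def link(match: re.Match) -> str:
        url = LINK_TEMPLATE.format(db=match["db"], acc=match["acc"])
        return f'{match["db"]}|<a href="{url}">{match["acc"]}</a>|'

    return "<pre>" + HIT_PATTERN.sub(link, escaped) + "</pre>"
```

Escaping the report before substituting keeps the inserted anchor tags as the only markup in the output, so the original text layout of the search results is preserved inside the `<pre>` element.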
A batch client interface to the BCM Search Launcher for Unix and Macintosh computers has also been developed to allow multiple input sequences to be searched automatically as a background task, with the results returned as individual HTML documents directly on the user's system. The BCM Search Launcher and the batch client are available on the WWW at URL http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html.
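The batch workflow described above (many input sequences, one HTML result document per sequence) might be sketched as follows. This is an illustrative outline only, not the batch client itself. The names `read_fasta` and `run_batch` are hypothetical, FASTA input is an assumption, and the `search` callable stands in for whatever submits a sequence to a remote server and returns HTML.

```python
from pathlib import Path
from typing import Callable, Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from multi-record FASTA text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else "unnamed"), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def run_batch(
    fasta_text: str,
    search: Callable[[str, str], str],  # hypothetical: returns an HTML report
    out_dir: str,
) -> List[Path]:
    """Run `search` on every sequence and save each result as its own HTML file."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for index, (name, seq) in enumerate(read_fasta(fasta_text), start=1):
        # Replace characters that are unsafe in file names.
        safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in name)
        path = target / f"{index:03d}_{safe}.html"
        path.write_text(search(name, seq), encoding="utf-8")
        written.append(path)
    return written
```

Passing the search step in as a function keeps the sketch independent of any particular server; a real client would supply a function that performs the network request and could be started as a background process.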
The BCM/UH Server Core provides the necessary computational resources and continuing support infrastructure for the BCM Search Launcher. The BCM/UH Server Core is composed of three network servers and currently supports electronic mail and WWW-based access; ultimately, specialized client-server access will also be provided. The hardware used includes a 2048-processor MasPar massively parallel SIMD computer, a DEC Alpha AXP running OSF/1, a Sun 2-processor SparcCenter 1000 server, and several Sun Sparc workstations.
In addition to grouping services available elsewhere on the WWW, and providing access to services developed at BCM and UH, the BCM/UH Server Core will also provide access to services from developers who are unwilling or unable to provide their own Internet network servers. One such service from Dr. Don Gilbert, "GenBank Subset Search", will allow one to search a user-specified subset of GenBank via E-mail and the WWW. Another tool under development is The Institute for Genomic Research's multiple sequence alignment program (MSA) which uses simulated annealing.
A major focus for future development is our collaboration with the Genome Sequence Data Base (GSDB) to allow analysis services provided by the Server Core to be integrated within the GSDB Annotator by inter-process communication. This will allow a researcher to perform in-depth sequence analysis and easily integrate the results into sequence database annotations.
This research is supported by grants to D.D. from the U.S. Department of Energy Office of Health and Environmental Research (DE-FG03-95ER62097/A000), the National Library of Medicine (1R01-LM05792), the National Science Foundation (BIR 91-11695), a National Research Service Award to K. C. W. (1F32-HG00133-01), a grant to the Baylor Human Genome Center (P30-HG00210), and a grant to R. F. S. (1R01-HG00973-01) from the National Center for Human Genome Research, National Institutes of Health.