The electronic form of this document may be cited in the following style:
Human Genome Program, U.S. Department of Energy, DOE Human Genome Program Contractor-Grantee Workshop IV, 1994.
Abstracts scanned from text submitted for November 1994 DOE Human Genome Program Contractor-Grantee Workshop. Inaccuracies have not been corrected.
BioPOET: Large Scale Sequence Analysis On Workstation Farms
Manfred D. Zorn, Jane F. Macfarlane, Rob Armstrong, Michael H. Cooper, and Nicholas C. Weaver
Software Technologies and Applications Group, Information and Computing Sciences Division, Lawrence Berkeley Laboratory, Berkeley CA 94720
Sandia National Laboratories, Livermore, CA 94550
The rate at which new sequences are being generated has dramatically increased. A standard procedure to analyze sequences is to compare them with already known sequences. Thus longer sequences are matched against increasingly larger databases of sequences.
Sophisticated computing technology to tackle such problems, e.g., faster machines, parallel processing, and distributed computing, already exists. However, using these resources optimally requires detailed knowledge of each particular resource.
We developed a framework that partitions the necessary tasks and executes them on a workstation farm. A master reads the database and creates a task for each sequence. Each worker requests a new task, compares the database sequence with the query sequence, and reports the results back to the master. A graphical user interface allows easy input and parameter specification, interacts with a network server to launch the program, and displays the final results graphically. The tasks themselves use existing software, e.g., filter (Chang and Marr, 1994) to search the database efficiently and align (Huang and Miller, 1990) to generate the final alignment.
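The master/worker scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the BioPOET implementation: it uses Python threads and in-process queues in place of C++ and PVM, and the toy `score` function (shared k-mer count) merely stands in for the filter and align programs. All names (`run_farm`, `worker`, `score`, `kmer_set`) and the example sequences are hypothetical.

```python
import queue
import threading

def kmer_set(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def score(query, subject, k=3):
    """Toy similarity (assumption): number of distinct k-mers shared."""
    return len(kmer_set(query, k) & kmer_set(subject, k))

def worker(query, tasks, results):
    """Request tasks until the master signals that none are left."""
    while True:
        task = tasks.get()
        if task is None:              # sentinel: no more work
            break
        name, subject = task
        results.put((name, score(query, subject)))

def run_farm(query, database, n_workers=4):
    """Master: create one task per database sequence and collect results."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(query, tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in database.items():     # one task per database sequence
        tasks.put(item)
    for _ in threads:                 # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    hits = [results.get() for _ in range(len(database))]
    # Best score first; ties broken by sequence name for a stable order.
    return sorted(hits, key=lambda h: (-h[1], h[0]))

database = {"seqA": "ACGTACGTGA", "seqB": "TTTTTTTTTT", "seqC": "GACGTAC"}
ranked = run_farm("ACGTAC", database)
```

Because each worker pulls its next task only when it is idle, faster machines naturally process more sequences, which is one reason a pull model suits a farm of unevenly loaded workstations.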
The framework makes use of the Parallel Object-oriented Environment and Toolkit (POET), which is modeled after the X11 toolkit and enables both high- and low-level control of the computational methods. The object-oriented programming paradigm allows data encapsulation and methods to hide implementation details so as to present a unified object view to the user. Existing software can be adapted to exploit the power of parallel processing. Thus sequence analysis can be performed transparently to the user in reasonable time: POET divides either the query sequence or the database into multiple pieces to run on parallel computers or on a number of workstations in a distributed environment.
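The two ways of dividing the work mentioned above, by database or by query, might look roughly as follows. The abstract does not say how POET chooses piece boundaries, so both helpers are assumptions made for illustration: `split_database` deals records round-robin, and `split_query` cuts overlapping windows so that a match of at most `overlap + 1` residues is always fully contained in at least one window.

```python
def split_database(records, n_pieces):
    """Deal database records round-robin into n_pieces lists."""
    pieces = [[] for _ in range(n_pieces)]
    for i, rec in enumerate(records):
        pieces[i % n_pieces].append(rec)
    return pieces

def split_query(query, piece_len, overlap):
    """Cut query into (offset, window) pairs of up to piece_len residues.

    Consecutive windows share `overlap` residues; the offset lets hits
    found in a window be mapped back to positions in the full query.
    """
    if not 0 <= overlap < piece_len:
        raise ValueError("need 0 <= overlap < piece_len")
    step = piece_len - overlap
    pieces = []
    start = 0
    while True:
        pieces.append((start, query[start:start + piece_len]))
        if start + piece_len >= len(query):   # last window reaches the end
            break
        start += step
    return pieces

db_pieces = split_database(["a", "b", "c", "d", "e"], 2)
query_pieces = split_query("ACGTACGTAC", piece_len=4, overlap=2)
```

Splitting the database keeps every comparison whole and needs no merging beyond collecting hits, whereas splitting the query helps when a single very long sequence dominates the run time, at the cost of reconciling hits that fall in the overlapping regions.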
We will present a prototype system that integrates sequence analysis into the sequencing protocol and performs comparisons of sequences on a workstation farm. The framework has been implemented in C++ and uses PVM as its communication package. The graphical user interface is implemented in VisualWorks\Smalltalk from ParcPlace Systems.
Chang, W. and Marr, T., Approximate String Matching and Local Similarity, in Combinatorial Pattern Matching, Springer-Verlag, 1994.
Huang, X. and Miller, W., A space-efficient algorithm for local similarities, Comput. Appl. Biosci. 6:373-381 (1990).
This work was supported by the Director, Office of Energy Research, Office of Health and Environmental Research, Human Genome Program, of the US Department of Energy under Contract No. DE-AC03-76SF00098.