Genome Informatics Section
DOE Human Genome Program Contractor-Grantee Workshop
91. A Graphical Work-Flow Environment Seamlessly Integrating Database Querying and Data Analysis
Dong-Guk Shin1, Lung-Yung Chu1, Lei Liu1, Nori Ravi1, Joseph Leone2, Rich Landers2, and Wally Grajewski2
In the past, we have been very successful in developing a graphical ad hoc query interface capable of accessing heterogeneous public genome databases. That project aimed to develop a suite of user-friendly software to help computational biologists access various independently managed genome databases. This software makes SQL query syntax manageable for novice users and makes unfamiliar, complex genome database schemas quickly understandable to less experienced users. Furthermore, it helps users quickly express semantically correct ad hoc queries. Wide distribution of this software is expected to have a significant impact: computational biologists who have been reluctant to use genome databases would begin to query them directly, thanks to the many user-friendly features built into the graphical interfaces. Most distinctively, computational biologists will be able to pose cross-database queries against the multiple genome databases that are springing up within the genome community.
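As a rough illustration of what a cross-database query looks like once two sources are reachable from a single SQL session, the sketch below uses Python's standard `sqlite3` module and SQLite's `ATTACH DATABASE` statement to stand in for two independently managed genome databases. The table names (`gene`, `locus`), the schema alias `mapdb`, the rows, and the function names are all hypothetical; the abstract does not describe how the actual interface reaches remote, heterogeneous databases.

```python
import sqlite3


def build_demo_databases():
    """Create two toy 'genome databases' (hypothetical schemas and rows).

    The main in-memory database stands in for a gene catalogue; a second,
    separately created in-memory database is attached under the schema
    name 'mapdb' and stands in for an independently managed mapping database.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS mapdb")
    conn.execute("CREATE TABLE main.gene (gene_id TEXT PRIMARY KEY, symbol TEXT)")
    conn.execute("CREATE TABLE mapdb.locus (gene_id TEXT, chromosome TEXT)")
    conn.executemany("INSERT INTO main.gene VALUES (?, ?)",
                     [("g1", "geneA"), ("g2", "geneB"), ("g3", "geneC")])
    conn.executemany("INSERT INTO mapdb.locus VALUES (?, ?)",
                     [("g1", "4"), ("g3", "X")])
    return conn


def cross_database_query(conn, chromosome=None):
    """Join rows from both databases in one SQL statement, optionally filtered."""
    sql = ("SELECT g.symbol, l.chromosome "
           "FROM main.gene AS g JOIN mapdb.locus AS l ON g.gene_id = l.gene_id")
    params = []
    if chromosome is not None:
        # Parameter binding keeps user-supplied values out of the SQL text.
        sql += " WHERE l.chromosome = ?"
        params.append(chromosome)
    sql += " ORDER BY g.symbol"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = build_demo_databases()
    print(cross_database_query(conn))       # [('geneA', '4'), ('geneC', 'X')]
    print(cross_database_query(conn, "X"))  # [('geneC', 'X')]
```

A graphical query tool of the kind described would, in effect, assemble the `SELECT ... JOIN ... WHERE` text from the user's selections so that the user never writes it by hand.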
We are currently investigating ways of embedding this user-friendly database access tool into a graphical work-flow management environment. Although being able to query various genome databases easily and to make associations between remotely located data is essential, we consider it imperative to produce an integrative environment in which database querying and data analysis can be carried out seamlessly and cohesively. This requirement is critical because many biologically significant questions center on running analysis programs. In the proposed scenario, a computational biologist should be able to persistently store the results of a Blast search in a database and subsequently query and cross-link the filtered Blast results with existing genome databases. Similarly, a computational biologist should be able to perform a database query and funnel the query results into a subsequent Blast or Fasta search. Furthermore, the user should be able to conveniently convert analysis or query results into the data format required as input by multiple sequence alignment programs, such as CLUSTALW, or by tree-building programs, such as Phylip and Puzzle, for the final stages of analysis and visualization. The ultimate goal of this project is to produce an easy-to-use work-flow editing environment in which the user can easily specify data flows involving both database querying and data analysis. This project is being pursued in collaboration with JGI.
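The Blast-to-database-to-alignment scenario above can be sketched as a short linear work-flow. The example below is only an illustration under stated assumptions: it parses the standard 12-column tabular BLAST report (for example BLAST+ `-outfmt 6`), stores the hits in an SQLite table, filters them by E-value while cross-linking them to a sequence table, and writes the surviving sequences as FASTA, a format CLUSTALW accepts as input. The table and function names, the `run_workflow` helper, and the toy rows are hypothetical and are not taken from the project's software.

```python
import sqlite3


def parse_blast_line(line):
    """Turn one line of 12-column tabular BLAST output into a typed tuple."""
    f = line.rstrip("\n").split("\t")
    # qseqid, sseqid, pident, 7 integer count/coordinate columns, evalue, bitscore
    return (f[0], f[1], float(f[2]), *map(int, f[3:10]), float(f[10]), float(f[11]))


def store_blast_hits(conn, report_lines):
    """Step 1: persist BLAST hits so they can be queried later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blast_hit (qseqid TEXT, sseqid TEXT, pident REAL,"
        " length INTEGER, mismatch INTEGER, gapopen INTEGER, qstart INTEGER,"
        " qend INTEGER, sstart INTEGER, send INTEGER, evalue REAL, bitscore REAL)")
    conn.executemany("INSERT INTO blast_hit VALUES (" + ", ".join(["?"] * 12) + ")",
                     [parse_blast_line(line) for line in report_lines if line.strip()])
    return conn


def filtered_hits_with_sequences(conn, max_evalue):
    """Step 2: keep hits at or below an E-value cutoff, cross-linked to sequences."""
    return conn.execute(
        "SELECT DISTINCT h.sseqid, s.residues FROM blast_hit AS h"
        " JOIN sequence AS s ON s.seq_id = h.sseqid"
        " WHERE h.evalue <= ? ORDER BY h.sseqid", (max_evalue,)).fetchall()


def to_fasta(records, width=60):
    """Step 3: write (id, residues) pairs as FASTA text, wrapping long sequences."""
    lines = []
    for seq_id, residues in records:
        lines.append(">" + seq_id)
        lines.extend(residues[i:i + width] for i in range(0, len(residues), width))
    return "\n".join(lines) + "\n"


def run_workflow(data, *steps):
    """Feed each step's output into the next step, like a linear work-flow graph."""
    for step in steps:
        data = step(data)
    return data


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for an existing sequence database.
    conn.execute("CREATE TABLE sequence (seq_id TEXT PRIMARY KEY, residues TEXT)")
    conn.executemany("INSERT INTO sequence VALUES (?, ?)",
                     [("subj1", "ACGTTGCA"), ("subj2", "GGGCCCAA"), ("subj3", "TTGACCAG")])
    # Hypothetical BLAST report lines (tab-separated, 12 columns).
    report = [
        "q1\tsubj1\t98.5\t120\t2\t0\t1\t120\t5\t124\t1e-50\t230.0",
        "q1\tsubj2\t45.0\t80\t40\t3\t10\t90\t1\t80\t0.5\t30.1",
        "q1\tsubj3\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t150.0",
    ]
    return run_workflow(
        report,
        lambda lines: store_blast_hits(conn, lines),
        lambda c: filtered_hits_with_sequences(c, max_evalue=1e-10),
        to_fasta,
    )


if __name__ == "__main__":
    print(demo(), end="")  # FASTA records for subj1 and subj3 only
```

Each step takes the previous step's output as its only input, which mirrors the idea of a user wiring boxes together in a work-flow editor; a graphical environment would build the same chain from the user's diagram instead of from code.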