Genome Informatics Section 

DOE Human Genome Program Contractor-Grantee Workshop VII 
January 12-16, 1999  Oakland, CA


91. A Graphical Work-Flow Environment Seamlessly Integrating Database Querying and Data Analysis 

Dong-Guk Shin1, Lung-Yung Chu1, Lei Liu1, Nori Ravi1, Joseph Leone2, Rich Landers2, and Wally Grajewski2 
1Computer Science & Engineering, University of Connecticut, Storrs, CT 06269-3155 and 2CyberConnect EZ, LLC, Storrs, CT 06268 
shin@engr.uconn.edu 

In the past, we have been very successful in developing a graphical ad hoc query interface capable of accessing heterogeneous public genome databases. This project aimed to develop a suite of user-friendly software that helps computational biologists access various independently managed genome databases. The software makes SQL query syntax manageable for novice users and makes unfamiliar, complex genome database schemas quickly understandable to less experienced users. It also helps users quickly express semantically correct ad hoc queries. We expect wide distribution of this software to have a significant impact: computational biologists who have been reluctant to use genome databases would begin to query them directly, thanks to the numerous user-friendly features built into the graphical interfaces. Most distinctively, computational biologists will be able to pose cross-database queries against the multiple genome databases that are springing up within the genome community. 
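One way a graphical query tool can spare novices from writing SQL is to let them pick tables and columns and have the tool work out the joins from the schema's foreign keys. The Python sketch below illustrates that idea under stated assumptions only: the schema (`gene`, `locus`, `map_position`, `sequence`) and the helper names `join_path` and `build_query` are hypothetical, and breadth-first search over the foreign-key graph is an illustrative choice, not necessarily the method used in the interface described here.

```python
from collections import deque

# Hypothetical schema: each edge (table_a, table_b) maps to its join condition.
# Table and column names are illustrative, not taken from any real genome database.
FOREIGN_KEYS = {
    ("gene", "locus"): "gene.locus_id = locus.id",
    ("locus", "map_position"): "locus.id = map_position.locus_id",
    ("gene", "sequence"): "gene.id = sequence.gene_id",
}


def neighbors(table):
    """Yield (adjacent_table, join_condition) pairs, treating the schema graph as undirected."""
    for (a, b), cond in FOREIGN_KEYS.items():
        if a == table:
            yield b, cond
        elif b == table:
            yield a, cond


def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for nxt, cond in neighbors(table):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(nxt, cond)]))
    raise ValueError(f"no join path from {start} to {goal}")


def build_query(columns, start, goal, where=None):
    """Assemble a SELECT statement that joins start to goal along the shortest path."""
    sql = f"SELECT {', '.join(columns)} FROM {start}"
    for table, cond in join_path(start, goal):
        sql += f" JOIN {table} ON {cond}"
    if where:
        sql += f" WHERE {where}"
    return sql


if __name__ == "__main__":
    print(build_query(["gene.symbol", "map_position.chromosome"], "gene", "map_position"))
```

Selecting `gene.symbol` and `map_position.chromosome` yields a query that joins through `locus` without the user having to know that intermediate table exists. Identifiers are interpolated directly into the SQL string here, so a real tool would accept only names validated against the schema.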

We are currently investigating ways of embedding this user-friendly database access tool into a graphical work-flow management environment. Although the ability to query various genome databases easily and to make associations between remotely located data is essential, we consider it imperative to produce an integrative environment in which database querying and data analysis can be carried out seamlessly and cohesively. This requirement is critical because answering many biologically significant questions centers on running analysis programs. In the proposed scenario, a computational biologist should be able to store the results of a Blast search persistently in a database and subsequently query the filtered Blast results and cross-link them with existing genome databases. Similarly, a computational biologist should be able to perform a database query and funnel the query results into a subsequent Blast or Fasta search. Furthermore, the user should be able to conveniently convert analysis or query results into the data formats required as input by multiple sequence alignment programs, such as CLUSTALW, or tree-building programs, such as Phylip and Puzzle, for the final stages of analysis and visualization. The ultimate goal of this project is to produce an easy-to-use work-flow editing environment in which the user can easily specify data flows involving both database querying and data analysis. This project is being pursued in collaboration with JGI. 
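The Blast-to-database-to-alignment flow described above can be approximated in three steps. The following Python sketch is hypothetical throughout: it assumes Blast hits in the 12-column tabular layout produced by BLAST+ with `-outfmt 6`, uses an in-memory SQLite database in place of a real genome database, and invents a `genome_seq(seq_id, organism, sequence)` table for the cross-link. It stores the hits, filters them by E-value, joins them to the sequence table, and writes FASTA text, a format ClustalW accepts as input and one that could equally serve as the query for a follow-up Blast or Fasta search.

```python
import sqlite3

# Assumed field positions in the 12-column tab-separated BLAST+ "-outfmt 6" report;
# the abstract does not state which Blast output format the environment consumes.
QSEQID, SSEQID, PIDENT, EVALUE, BITSCORE = 0, 1, 2, 10, 11


def store_blast_hits(conn, tabular_report):
    """Step 1: persist selected fields of each Blast hit in a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS blast_hit "
                 "(qseqid TEXT, sseqid TEXT, pident REAL, evalue REAL, bitscore REAL)")
    rows = []
    for line in tabular_report.strip().splitlines():
        f = line.split("\t")
        rows.append((f[QSEQID], f[SSEQID], float(f[PIDENT]),
                     float(f[EVALUE]), float(f[BITSCORE])))
    conn.executemany("INSERT INTO blast_hit VALUES (?, ?, ?, ?, ?)", rows)


def filtered_hits_with_sequences(conn, max_evalue):
    """Step 2: keep hits at or below max_evalue and join them to the hypothetical genome_seq table."""
    return conn.execute(
        "SELECT s.seq_id, s.organism, s.sequence "
        "FROM blast_hit h JOIN genome_seq s ON h.sseqid = s.seq_id "
        "WHERE h.evalue <= ? ORDER BY h.bitscore DESC",
        (max_evalue,),
    ).fetchall()


def to_fasta(records, width=60):
    """Step 3: format (seq_id, organism, sequence) rows as FASTA text with wrapped lines."""
    lines = []
    for seq_id, organism, sequence in records:
        lines.append(f">{seq_id} {organism}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical local table standing in for a public genome database.
    conn.execute("CREATE TABLE genome_seq (seq_id TEXT, organism TEXT, sequence TEXT)")
    conn.executemany("INSERT INTO genome_seq VALUES (?, ?, ?)",
                     [("seqA", "E. coli", "ATGCGT"), ("seqB", "B. subtilis", "ATGAAA")])
    report = ("q1\tseqA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-40\t180\n"
              "q1\tseqB\t45.0\t80\t40\t3\t5\t85\t2\t82\t0.5\t30\n")
    store_blast_hits(conn, report)
    print(to_fasta(filtered_hits_with_sequences(conn, 1e-10)), end="")
```

Running the script prints only the `seqA` record, because the second hit's E-value (0.5) fails the 1e-10 cutoff. Converting the same rows into Phylip input would need a different formatter; it is omitted here.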

Acknowledgments 
1The author's work was supported in part by the NIH/NHGRI Grant No. HG00772-05. 
2The author's work was supported in part by the DOE SBIR Phase II Grant No. DE-FG02-95ER81906. 

