DOE Human Genome Program Contractor-Grantee
83. A Visual Data-Flow Editor Capable of Integrating Data Analysis and Database Querying
Dong-Guk Shin¹, Ravi Nori², Rich Landers², and Wally Grajewski²
¹Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-3155, and ²CyberConnect EZ, LLC, Storrs, CT 06268
Determining and mapping sequence variations or polymorphisms between homologous genomic regions requires access to genomic data from different sources and the use of many data analysis and visualization programs. It is imperative that software be developed to enable genome scientists to automate tedious and repetitive data handling, database querying, and analysis tasks. Our approach is to develop a data-flow editing environment in which genome scientists with minimal computer training can easily describe data analysis tasks. Using the tool, scientists organize and coordinate individual data retrieval tasks against different data sources and combine them with data analysis tasks to answer biologically significant questions.
Phase I aimed to develop prototype software that demonstrates the feasibility of full-scale development of a data-flow editing environment in which genome scientists with minimal computer training can freely describe interactions between data access and data analysis. The feasibility study is based on a working scenario: determining homology relationships between known DNA sequences from one species and unknown sequences from a taxonomically related species.
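As an illustration only, the working scenario can be pictured as a small data flow in which two retrieval tasks feed a comparison task, which in turn feeds a report task. The Python sketch below is not the prototype described in this abstract. All task names, the sample sequences, and the 0.8 threshold are hypothetical, and the position-by-position identity score is a simple stand-in for a real homology search. It assumes Python 3.9 or later for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter


def run_flow(tasks, deps):
    """Run every task after the tasks it depends on; return results by name."""
    results = {}
    for name in TopologicalSorter(deps).static_order():
        # Pass upstream results in the order the dependencies were declared.
        inputs = [results[d] for d in deps.get(name, ())]
        results[name] = tasks[name](*inputs)
    return results


def fetch_known():
    # Hypothetical stand-in for a database query returning known sequences.
    return {"geneA": "ATGGCCATTG", "geneB": "TTGACCGTAA"}


def fetch_unknown():
    # Hypothetical stand-in for retrieving unknown sequences from another source.
    return {"clone1": "ATGGCCATTC", "clone2": "GGGGGGGGGG"}


def identity(a, b):
    """Fraction of positions at which two sequences carry the same base."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def compare(known, unknown):
    # Score every known/unknown pair (toy replacement for a homology search).
    return {
        (k, u): identity(ks, us)
        for k, ks in known.items()
        for u, us in unknown.items()
    }


def report(scores, threshold=0.8):
    # Keep only the pairs whose score reaches the assumed threshold.
    return sorted(pair for pair, s in scores.items() if s >= threshold)


tasks = {
    "known": fetch_known,
    "unknown": fetch_unknown,
    "compare": compare,
    "report": report,
}
# Each task maps to the tasks whose outputs it consumes.
deps = {"compare": ["known", "unknown"], "report": ["compare"]}

results = run_flow(tasks, deps)
print(results["report"])
```

Keeping the dependency table separate from the task functions mirrors the idea of an editor in which the scientist draws the connections while the individual retrieval and analysis steps remain reusable.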
Software of this kind is expected to be immediately usable in molecular biology and the pharmaceutical industry, both of which are becoming more computationally intensive. Since data-flow management problems are not unique to computational biology, the software is expected to be useful in many other data- and computation-intensive areas, e.g., physics, chemistry, engineering, and finance.
The proposed software will enable scientists to automate repetitive analysis tasks on the enormous amount of DNA sequence data that must be analyzed to understand its implications for biological and environmental processes. Without such a tool, the difficulties of conducting these large-scale data analysis projects could be insurmountable, given the magnitude of the data available and the variety of analysis techniques involved.
This work was supported in part by the DOE SBIR Phase I Grant No. DE-FG02-99ER82773.