DOE Human Genome Program Contractor-Grantee
81. Working Examples of XML in the Management of Genomic Data
J. D. Cohn and M. O. Mundt
Bioscience Division and DOE Joint Genome Institute, Los Alamos National Laboratory, Los Alamos, NM 87545
XML is fast becoming the universal format for structured data exchange and documents on the Web. Standards for XML were developed by the World Wide Web Consortium (W3C) and have been adopted for a wide range of applications from e-commerce to mathematics and chemistry. Unlike HTML, XML was designed to be extended and offers a much richer base to build upon (including capabilities for using binary as well as ASCII data).
Until now, exchange of genomic data has been limited primarily to FASTA files and a few proprietary or application-specific formats. XML seems to offer an ideal means of enhancing our capability of exchanging data within the genomic community. With a growing array of XML parsers and other development tools, XML-formatted data can be used by software applications written in a variety of languages across multiple hardware platforms. Major database systems (e.g., Oracle) are beginning to offer XML output for SQL database queries. Further, the W3C is working on a standard for an XML query language for searching
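To make the contrast with FASTA concrete, the following is a minimal sketch of wrapping a single FASTA record in XML. It is not the format used in our system: the element and attribute names (`sequence`, `id`, `length`, `description`, `residues`) and the class name `FastaToXml` are assumptions chosen for illustration, and the standard Java StAX writer stands in for whichever XML toolkit is available.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class FastaToXml {
    // Converts one FASTA record into a small XML document.
    // All element and attribute names here are hypothetical.
    public static String toXml(String fasta) {
        String[] lines = fasta.trim().split("\\r?\\n");
        if (!lines[0].startsWith(">")) {
            throw new IllegalArgumentException("FASTA record must start with '>'");
        }
        // Header line: first token is the identifier, the rest is free text.
        String header = lines[0].substring(1).trim();
        int space = header.indexOf(' ');
        String id = space < 0 ? header : header.substring(0, space);
        String description = space < 0 ? "" : header.substring(space + 1).trim();

        // Remaining lines hold the residues, possibly wrapped.
        StringBuilder residues = new StringBuilder();
        for (int i = 1; i < lines.length; i++) {
            residues.append(lines[i].trim());
        }

        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("sequence");
            w.writeAttribute("id", id);
            w.writeAttribute("length", String.valueOf(residues.length()));
            if (!description.isEmpty()) {
                w.writeStartElement("description");
                w.writeCharacters(description); // special characters are escaped by the writer
                w.writeEndElement();
            }
            w.writeStartElement("residues");
            w.writeCharacters(residues.toString());
            w.writeEndElement();
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml(">clone42 example read\nACGTAC\nGGTTAA\n"));
    }
}
```

Unlike the free-text FASTA header, each field in the XML form is named, so a receiving application can read the identifier or length without knowing the sender's header conventions.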
Recently we have taken the first steps in making use of XML in our distributed sequencing informatics system. Among the applications for XML that we will describe are: 1) automated loading and analysis of sample files from multiple sources (production sequencing, finishing, cDNA, outside laboratories, etc.) using naming convention documents; 2) distributed data management; 3) user preference files; and 4) sequence annotation. All of these have been accomplished using the XML Parser for Java from DataChannel. Examples of XML code as well as descriptions of the applications will be presented.
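As an illustration of how a naming convention document might drive automated loading, the sketch below reads a small XML document and uses it to decide which source a sample file came from. Everything specific in it is an assumption: the `namingConventions` and `source` elements, the `name` and `pattern` attributes, the example patterns, and the class name `NamingConventions` are hypothetical, and the standard JAXP DOM parser is used here in place of the DataChannel parser.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamingConventions {
    // Hypothetical naming convention document: each <source> carries a regular
    // expression that sample file names from that source are assumed to match.
    public static final String EXAMPLE =
        "<namingConventions>"
      + "<source name=\"production\" pattern=\"[A-Z]{4}\\d+\\.[xy]\\d\"/>"
      + "<source name=\"finishing\" pattern=\"FIN_\\w+\\.ab1\"/>"
      + "</namingConventions>";

    // Keeps sources in document order so earlier entries take precedence.
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    public NamingConventions(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList sources = doc.getElementsByTagName("source");
            for (int i = 0; i < sources.getLength(); i++) {
                Element e = (Element) sources.item(i);
                patterns.put(e.getAttribute("name"), Pattern.compile(e.getAttribute("pattern")));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid naming convention document", e);
        }
    }

    // Returns the first source whose pattern matches the whole file name, or null if none does.
    public String classify(String fileName) {
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            if (entry.getValue().matcher(fileName).matches()) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NamingConventions conventions = new NamingConventions(EXAMPLE);
        System.out.println(conventions.classify("FIN_clone42.ab1"));
        System.out.println(conventions.classify("unknown.txt"));
    }
}
```

Keeping the conventions in a document rather than in code means a new source, such as an outside laboratory, could be supported by adding one `source` entry without recompiling the loader.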