Bioinformatics Section 

DOE Human Genome Program Contractor-Grantee Workshop VIII
February 27-March 2, 2000  Santa Fe, NM


81. Working Examples of XML in the Management of Genomic Data

J. D. Cohn and M. O. Mundt

Bioscience Division and DOE Joint Genome Institute, Los Alamos National Laboratory, Los Alamos, NM 87545

jcohn@lanl.gov

XML is fast becoming the universal format for structured data exchange and documents on the Web. Standards for XML were developed by the World Wide Web Consortium (W3C) and have been adopted for a wide range of applications from e-commerce to mathematics and chemistry. Unlike HTML, XML was designed to be extended and offers a much richer base to build upon (including capabilities for using binary as well as ASCII data).

Until now, exchange of genomic data has been limited primarily to FASTA files and a few proprietary or application-specific formats. XML appears to offer an ideal means of improving data exchange within the genomic community. With a growing array of XML parsers and other development tools, XML-formatted data can be used by software applications written in many different languages and running on multiple hardware platforms. Major database systems (e.g., Oracle) are beginning to offer XML output for SQL database queries. Further, the W3C is working on a standard for an XML query language for searching XML documents directly.
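As a rough illustration of the point about language-independent parsing, the sketch below reads a small XML sequence record and writes it back out as FASTA. It uses Python's standard-library parser rather than the Java parser named in this abstract, and the element and attribute names (sequence, id, source, description, residues) are hypothetical, not taken from the authors' system.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; the tag and attribute names are assumptions
# made for this example only.
SEQUENCE_XML = """\
<sequence id="clone_001" source="finishing">
  <description>example clone</description>
  <residues>ACGTACGTTTGA</residues>
</sequence>
"""


def parse_sequence(xml_text):
    """Parse one <sequence> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "source": root.get("source"),
        "description": root.findtext("description", default=""),
        "residues": root.findtext("residues", default="").strip(),
    }


def to_fasta(record):
    """Render a parsed record as a single FASTA entry."""
    return ">{id} {description}\n{residues}\n".format(**record)


record = parse_sequence(SEQUENCE_XML)
fasta = to_fasta(record)
print(fasta, end="")
```

Unlike the FASTA header line, the XML form keeps fields such as the sample source as separately addressable attributes, so a consumer does not need a format-specific header parser.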

We have recently taken the first steps toward using XML in our distributed sequencing informatics system. The applications of XML that we will describe include 1) automated loading and analysis of sample files from multiple sources (production sequencing, finishing, cDNA, outside laboratories, etc.) using naming convention documents; 2) distributed data management; 3) user preference files; and 4) sequence annotation. All of these have been implemented using the XML Parser for Java from DataChannel. Examples of XML code and descriptions of the applications will be presented.
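One way a naming convention document could drive automated loading is sketched below: an XML file lists one file-name pattern per sample source, and incoming sample files are assigned to a source by the first pattern that matches. The document structure, the patterns, and the file names are all assumptions made for illustration; the abstract does not describe the authors' actual format, and this sketch uses Python's standard library rather than their Java parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical naming convention document: one regular expression per source.
NAMING_XML = r"""
<namingConventions>
  <convention source="production" pattern="^prod_\d+_[A-H]\d{2}\.ab1$"/>
  <convention source="finishing" pattern="^fin_\w+\.ab1$"/>
  <convention source="cDNA" pattern="^cdna_\w+\.ab1$"/>
</namingConventions>
"""


def load_conventions(xml_text):
    """Return (source, compiled pattern) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [
        (node.get("source"), re.compile(node.get("pattern")))
        for node in root.findall("convention")
    ]


def classify(filename, conventions):
    """Return the source of the first matching convention, or None."""
    for source, pattern in conventions:
        if pattern.match(filename):
            return source
    return None


conventions = load_conventions(NAMING_XML)
for name in ["prod_0012_A01.ab1", "fin_gapclose7.ab1", "notes.txt"]:
    print(name, "->", classify(name, conventions))
```

Keeping the patterns in a document instead of in code means a new sample source can be supported by editing the XML file alone.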

 


The online presentation of this publication is a special feature of the Human Genome Project Information Web site.