DOE Human Genome Program Contractor-Grantee Workshop
87. Verification of Finished Sequence at JGI-LLNL
Karolyn J. Burkhart-Schultz, Amy M. Brower, Arthur Kobayashi, Matt Nolan, Melissa Ramirez, and Jane E. Lamerdin
The JGI-LLNL sequencing group submitted 8.6 Mb of genomic sequence to the NCBI database in the 1997-1998 fiscal year, an eightfold increase over our submissions for the previous year. Verification is an integral part of our finishing process: it provides an independent assessment of the validity of the finished assembly and final consensus sequence of each large-insert clone (i.e., cosmid or BAC) project. Verification involves: 1) re-checking the finisher's validation of the assembled clone/project; 2) independent re-assembly of all the reads in the project with all finisher edits removed, and comparison of this "no-edits" consensus with the one submitted by the finisher; and 3) analysis of the extent and quality of the overlap of the finished clone with adjacent clones in the sequencing tiling path.
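Step 3, the overlap check against adjacent clones, can be illustrated with a minimal Python sketch. It is not the LLNL tooling: the function name `best_overlap` and its thresholds are hypothetical, and it assumes both clones are in the same orientation and that the overlap contains no insertions or deletions.

```python
def best_overlap(left, right, min_len=20, max_mismatch_rate=0.01):
    """Find the longest suffix of `left` that aligns, without gaps, to a
    prefix of `right`.

    Returns (overlap_length, mismatches), or None if no overlap of at least
    `min_len` bases stays within `max_mismatch_rate`. The thresholds are
    illustrative assumptions, not values from the abstract.
    """
    left, right = left.upper(), right.upper()
    # Try the longest possible overlap first and shrink until one qualifies.
    for length in range(min(len(left), len(right)), min_len - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_rate * length:
            return length, mismatches
    return None
```

The extent of the overlap is the returned length and its quality is the mismatch count. This brute-force scan is quadratic in the worst case, so clone-sized sequences would call for a proper alignment tool.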
A finished project is submitted for verification along with a validation report prepared by the finisher. The verifier uses this report, together with Consed and LLNL-developed tools, to identify regions where strict standards for quality and double stranding may not be met. The verifier re-checks any problematic or difficult regions encountered during the assembly process. In addition, the final consensus is digested in silico and the predicted fragment sizes are compared with the restriction mapping data compiled for each cosmid or BAC. At least three digests (e.g., BamHI, BglII, EcoRI, EcoRI/BglII, or XhoI) are used in these comparisons, and any significant deviations between the map and sequence data are flagged.
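The digest comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the LLNL tool: the function names `digest` and `flag_deviations` are hypothetical, the consensus is treated as a linear molecule, and the 5% tolerance for a "significant" deviation is an arbitrary placeholder. The recognition sites and cut positions are the standard ones for these enzymes.

```python
import re
from itertools import zip_longest

# Recognition site and cut offset within the site (top strand).
# All four sites are palindromic, so scanning one strand is enough.
ENZYMES = {
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "BglII": ("AGATCT", 1),  # A^GATCT
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
}

def digest(seq, enzymes):
    """Return fragment lengths (largest first) from cutting a linear
    sequence with one or more enzymes, e.g. ["EcoRI", "BglII"]."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # The lookahead lets the scan find every site, including overlapping ones.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)

def flag_deviations(predicted, observed, tolerance=0.05):
    """Pair size-sorted fragments and return (predicted, observed) pairs that
    differ by more than `tolerance` (assumed threshold) or lack a partner."""
    flagged = []
    pairs = zip_longest(sorted(predicted, reverse=True),
                        sorted(observed, reverse=True))
    for p, o in pairs:
        if p is None or o is None or abs(p - o) > tolerance * o:
            flagged.append((p, o))
    return flagged
```

A double digest such as EcoRI/BglII is `digest(consensus, ["EcoRI", "BglII"])`. Pairing fragments by size rank is a simplification; real map data would also need to account for vector sequence and for fragments too small to resolve on a gel.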
The purpose of the "no-edits" re-assembly of a project is to remove any possible biases introduced by the finisher in the process of obtaining contiguous sequence. Currently this assembly is performed using an earlier version of the Phrap assembly engine. If the product of the "no-edits" re-assembly is not contiguous, the reasons for any breaks are examined and explained. Similarly, base discrepancies between the "no-edits" and finished consensus sequences are examined to determine which contains the valid basecall. If the contig breaks or sequence discrepancies cannot be resolved, the verifier may request that further data be generated.
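The comparison of the two consensus sequences can be sketched as follows. This is an illustration only: the function name `consensus_discrepancies` is hypothetical, and Python's general-purpose `difflib.SequenceMatcher` stands in for whatever alignment the verification tools actually perform. It is not a biological aligner and can misplace differences in repetitive sequence.

```python
from difflib import SequenceMatcher

def consensus_discrepancies(finished, no_edits):
    """List the differences between the finished and "no-edits" consensus.

    Each entry is (operation, finished_pos, finished_bases,
    no_edits_pos, no_edits_bases), where operation is "replace",
    "delete" (bases only in the finished consensus) or "insert"
    (bases only in the "no-edits" consensus). Positions are 0-based.
    """
    # autojunk=False disables the heuristic that ignores frequently occurring
    # elements, which would otherwise discard the four bases as "junk"
    # in sequences of 200 bases or more.
    matcher = SequenceMatcher(None, finished, no_edits, autojunk=False)
    return [
        (tag, i1, finished[i1:i2], j1, no_edits[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Each reported discrepancy marks a position where the verifier would go back to the underlying reads to decide which basecall is valid. `SequenceMatcher` can be slow on sequences as long as a cosmid or BAC, so a dedicated alignment tool would be the practical choice at that scale.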
Verification is complete only when all of the issues discussed above have been resolved and the final assembly and consensus sequence are supported by the data. A verification report explaining how each issue was resolved is added to the validation report in the project directory. Rigorous verification of finished sequence ensures the integrity and quality of the final sequence submitted to the public database.
This work was performed by Lawrence Livermore National Laboratory under the auspices of the U.S. Department of Energy, Contract No. W-7405-Eng-48.