Genome Sequencing Section
DOE Human Genome Program Contractor-Grantee Workshop VII
10. Sequence Validation and Quality Assessment at the Joint Genome Institute
M. Bussod, N. Doggett, J. Fawcett, D. Ricke, K. Watson, O. Tatum, P.S. White, and M. Mundt
The Joint Genome Institute (JGI) is committed to producing high-quality finished sequence data with fewer than 1 error in 10,000 bases. To ensure that we meet this standard, the JGI subscribes to a quality control process requiring that more than 95% of all finished bases have Phrap scores greater than 40 and that at least 95% of all bases are covered by reads from both strands (or 2 chemistries). In addition to these quality control criteria, the JGI has implemented post-sequencing Validation and Quality Assessment processes, which occur in 2 phases.

Sequence Validation occurs at each sequencing site before a sequence is submitted and involves comparing the final assembled sequence to 3 independent high-resolution restriction fingerprints. This pre-submission process ensures that the finished sequence has been assembled correctly. Quality Assessment is a post-submission assessment of the sequence produced by the JGI. LANL is responsible for performing Quality Assessment on all sequence produced by the JGI.

During the summer of 1998 our group successfully completed one NIH-sponsored round of sequence quality assessment covering 600 kb of finished sequence from 3 NIH centers, and we have recently begun a second round involving more than 1.2 Mb of finished sequence from three centers, including the Sanger Institute. Our strategy for Quality Assessment is to identify the poorest-quality regions within each finished clone and target these for verification. Software tools are being developed to evaluate the quality of clone sequencing projects based on Phred and Phrap scores. In addition, the techniques used and software modules written can be applied to the task of choosing optimal targets for resequencing.
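As an illustration only, the two quality control criteria stated above (more than 95% of finished bases with Phrap scores greater than 40, and at least 95% of bases covered on both strands or by 2 chemistries) can be written as a simple check. This is a minimal sketch, not the JGI's software: the function name, the argument layout (one score and one coverage flag per consensus base), and the use of Python are all assumptions made for the example.

```python
def meets_finishing_criteria(quals, double_covered, min_q=40, min_frac=0.95):
    """Check a finished clone against the two criteria quoted in the abstract.

    quals          -- hypothetical list of per-base Phrap scores
    double_covered -- hypothetical list of booleans, True where the base is
                      covered by reads from both strands (or 2 chemistries)
    """
    n = len(quals)
    if n == 0 or len(double_covered) != n:
        raise ValueError("need one score and one coverage flag per base")

    # Fraction of bases with a Phrap score strictly greater than min_q.
    frac_high_quality = sum(1 for q in quals if q > min_q) / n
    # Fraction of bases covered on both strands (or by 2 chemistries).
    frac_double = sum(1 for c in double_covered if c) / n

    # "greater than 95%" for quality, "at least 95%" for coverage.
    return frac_high_quality > min_frac and frac_double >= min_frac
```

For example, a 100-base clone with 96 bases above score 40 and 96 bases covered on both strands passes, while one with only 90 bases above score 40 does not.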
Base-calling and structural assembly errors can be identified by PCR, for example, followed by sequencing if necessary. The sequence error probability is determined from the Phrap values of the consensus bases: each base is assigned a P-value, the probability that the base is incorrect, according to its quality. If the data are given as a histogram, the probability values for each clone project are calculated from the proportion of bases within each quality range. We used this technique to find good candidates for our JGI validation effort without requiring the full set of quality values. If the Phrap value of every base is available, however, a more accurate prediction of the error rate is possible. In this case, sliding windows of consecutive bases can also be evaluated to detect regions with higher error rates and to design targets for resequencing. In either case, correction factors can be applied to the error calculations to account for the supposedly conservative nature of the Phrap scoring system.

The approaches described above are among those being compiled into a set of Java tools whose uses extend beyond validation. Finishing requirements often mirror the needs of a quality assessment project. We currently use a similar Java filter around a Primer3-based program to select oligonucleotide sequences for both finishing primer walks and validation PCR primers. In the NIH QA exercise, our success rate for obtaining PCR products was about 85%, even though we targeted regions that are more difficult to sequence.

We recently received over 130,000 BAC end sequences from two centers to evaluate the DOE-funded BAC end sequencing effort. Studying these should be a new, exciting challenge with great potential benefit to the sequencing community.

Supported by USDOE under contract W-7405-ENG-36.
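The error estimates described in this abstract can be sketched as follows. The sketch relies on the standard Phred/Phrap convention that a quality score Q corresponds to an error probability P = 10^(-Q/10), so a score of 40 corresponds to 1 error in 10,000 bases. Everything else is an assumption made for illustration and is not the authors' Java implementation: the function names, the histogram layout of (lowest score, highest score, base count) bins, the choice of each bin's lowest score as its representative, the default window size, and the `correction` multiplier standing in for the unspecified correction factors.

```python
def error_probability(q, correction=1.0):
    """Probability that a base with quality score q is wrong.

    correction is a hypothetical multiplier (< 1.0 would model scores
    that are believed to be conservative); the abstract gives no value.
    """
    return correction * 10.0 ** (-q / 10.0)


def expected_errors(quals, correction=1.0):
    """Expected number of errors given every per-base score."""
    return sum(error_probability(q, correction) for q in quals)


def expected_errors_from_histogram(bins, correction=1.0):
    """Expected errors when only a histogram of scores is available.

    bins -- list of (q_low, q_high, count) tuples (assumed layout).
    Uses q_low as the representative score, a pessimistic assumption.
    """
    return sum(count * error_probability(q_low, correction)
               for q_low, _q_high, count in bins)


def worst_window(quals, window=50, correction=1.0):
    """Slide a fixed-size window along the consensus and return
    (start_index, expected_errors) for the window most likely to
    contain errors, i.e. a candidate target for resequencing."""
    if window <= 0 or window > len(quals):
        raise ValueError("window must be between 1 and len(quals)")
    probs = [error_probability(q, correction) for q in quals]

    running = sum(probs[:window])          # expected errors in first window
    best_start, best_sum = 0, running
    for start in range(1, len(probs) - window + 1):
        # Add the base entering the window, drop the one leaving it.
        running += probs[start + window - 1] - probs[start - 1]
        if running > best_sum:
            best_start, best_sum = start, running
    return best_start, best_sum
```

With the full set of scores, `expected_errors(quals) / len(quals)` gives a per-base error rate that can be compared against the 1-in-10,000 target, and `worst_window` points to the stretch of consensus where verification effort is most likely to pay off. The histogram version is coarser because every base in a bin is treated as having the same score.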