Quality Scores for 32,000 Genomes...

Publication Type
Journal Name
Standards in Genomic Sciences

More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October 2013). We examined the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank, as well as in three other major public databases, and assigned quality scores to more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition, and the presence of a set of 120 conserved genes in prokaryotes. Most (~87.6%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis, although only 6.1% of the genomes had a perfect quality score. About 2.9% of the genomes had a score below 0.6 and are probably of too low a quality to yield reliable analysis; a score of 0.6 corresponds to more than 1000 contigs. Comparison of codon usage across 15,000 quality genomes found that anti-codons beginning with an ‘A’ (which pair with codons ending with a ‘U’) are almost non-existent, with the exception of the anti-codon for one arginine codon (CGU).
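
The link between anti-codons beginning with ‘A’ and codons ending with ‘U’ follows from antiparallel base pairing: the first anti-codon base pairs with the third codon base. The short Python sketch below illustrates this with plain Watson-Crick complements. The function name `anticodon` is hypothetical and not taken from the paper, and the sketch ignores wobble pairing and base modifications.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Return the exact Watson-Crick anti-codon (5'->3') for an mRNA codon (5'->3').

    Illustrative helper only: real tRNAs also use wobble pairing and
    modified bases, which are not modelled here.
    """
    codon = codon.upper().replace("T", "U")  # accept DNA-style input
    if len(codon) != 3 or any(base not in COMPLEMENT for base in codon):
        raise ValueError(f"not a valid codon: {codon!r}")
    # Pairing is antiparallel, so reverse the codon before complementing.
    return "".join(COMPLEMENT[base] for base in reversed(codon))


if __name__ == "__main__":
    # The arginine codon named in the abstract.
    print(anticodon("CGU"))  # ACG
```

Under this simple model, any codon ending in ‘U’ maps to an anti-codon whose first base is ‘A’, which is the class of anti-codons the abstract reports as almost absent.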