Exceptional Chromosome Regions II
Jeffrey A. Bailey, Amy M. Yavor, Julie E. Horvath, Barbara Trask, and Evan
E. Eichler
Department of Genetics and Center for Human Genetics, Case Western Reserve School
of Medicine and University Hospitals of Cleveland, Cleveland, OH, 44106
Segmental duplications play fundamental roles in both genomic disease and gene
evolution. Over the past year, we developed the computational tools and methods
necessary to detect identity between long stretches of genomic sequence despite
the presence of high copy repeats and large insertion-deletions. We focused
our analysis on large recent duplication events that fell well below levels
of draft sequencing error (alignments 90-98% similar and >1 kb in length)
revealing an unprecedented amount of duplicated sequence (3.6%) in the human
draft assembly (oo15). Here we present a more refined analysis of the most recent
genome assembly (oo23) in which we focus on the role duplications play in the whole-genome
assembly process. Duplications (90-98%; >1 kb) comprise 3.6% of all sequence
in oo23. These duplications show clustering and up to 10-fold enrichment within
pericentromeric and subtelomeric regions. Despite this bias, complex regions
of duplication have also been identified within gene rich regions. In terms
of assembly, duplicated sequences are 6.7-fold over-represented in unordered
and unassigned contigs indicating that duplicated sequences are difficult to
assign to their proper position. Further, utilizing data from 134 sequenced BACs
with FISH signals to multiple chromosomes, only 57% (280/571) of chromosomes
positive by FISH had a corresponding chromosomal BLAST hit to oo23. We present
data indicating that this is due to misassembly/misassignment and decreased
sequencing coverage within duplicated regions. Surprisingly, if we consider putative
duplications >98% similar, we identify 10.3% (286 Mb) of the current assembly as
paralogous, that is, involved in pairwise alignments at this level of similarity.
The majority of these alignments, we believe, represent
unmerged overlaps within unique regions. Taken together, the above data indicate
that segmental duplications represent a significant impediment to accurate human
genome assembly, requiring the development of specialized techniques to finish
these exceptional regions of the genome. Specific examples from chromosomes
16 and 19 will be presented.
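The similarity and length thresholds quoted above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Alignment` record, the `classify` function, and the category labels are hypothetical names introduced here, and the only values taken from the abstract are the cutoffs themselves (90-98% similarity, >1 kb, and the separate >98% class).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one pairwise alignment between two genomic segments."""
    identity: float  # fraction of identical aligned bases, 0.0 to 1.0
    length_bp: int   # aligned length in base pairs


def classify(aln: Alignment,
             min_len_bp: int = 1000,
             low: float = 0.90,
             high: float = 0.98) -> str:
    """Sort an alignment into the categories discussed in the abstract.

    Thresholds follow the text: >1 kb in length, 90-98% similar for
    duplications analysed with confidence, and >98% for alignments that
    may be unmerged overlaps rather than true duplications.
    """
    if aln.length_bp <= min_len_bp or aln.identity < low:
        return "ignored"
    if aln.identity <= high:
        return "segmental_duplication"
    return "putative_duplication_or_unmerged_overlap"


if __name__ == "__main__":
    examples = [
        Alignment(identity=0.95, length_bp=12_000),
        Alignment(identity=0.995, length_bp=40_000),
        Alignment(identity=0.93, length_bp=800),
    ]
    for aln in examples:
        print(aln, "->", classify(aln))
```

Separating the >98% class reflects the abstract's caution that most alignments at that level are believed to be unmerged overlaps within unique regions, so they should not be counted as duplications without further evidence.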
Base URL: www.ornl.gov/meetings/ecr2/
Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program