Scientists complete first chapter of book of life with decoding of chromosome
An international team of researchers has achieved a scientific milestone
by unraveling for the first time the genetic code of an entire human chromosome.
As reported in this week's issue of Nature (Dec. 2), researchers at the Sanger Centre near
Cambridge, England; the University of Oklahoma, Norman, OK; Washington University, St. Louis,
MO; and Keio University in Japan have succeeded in deciphering the sequence of the 33.5
million "letters," or chemical components, that make up the DNA of chromosome 22.
This sequence includes the longest continuous stretch of DNA ever deciphered and assembled,
more than 23 million letters in length.
Each human gene is made up of a series of chemical building blocks represented by the letters A
(adenine), T (thymine), G (guanine) and C (cytosine). The number and order of these letters,
also called bases, determine what we are, how we look, and the diseases to which we may be
predisposed. The chromosome 22 team has deduced the text of one chapter of the human genetic
The next mammoth task is to determine what it all means. Sequencing and mapping efforts have
already revealed that chromosome 22 is implicated in the workings of the immune system,
congenital heart disease, schizophrenia, mental retardation, birth defects, and several cancers,
including leukemia. But the scientific team agrees that many more secrets are to be discovered
in this decoded text.
The sequencing of chromosome 22 permits scientists for the first time to view the entire DNA of
"This is the first time that we have been able to see the organization of a chromosome at the base
pair level," said Dr. Ian Dunham, senior research fellow at the Sanger Centre and leader of the
research team that deciphered chromosome 22. "This immediately suggests new experiments
and avenues of research which can be pursued."
"To see the entire sequence of a human chromosome for the first time is like seeing an ocean
liner emerge out of the fog, when all you've ever seen before were rowboats," said Dr. Francis
Collins, director of the National Human Genome Research Institute of the National Institutes of
Health, which supported the U.S. contribution to the sequencing of chromosome 22.
University of Oklahoma scientist Dr. Bruce Roe, one of the researchers who deciphered the
sequence of chromosome 22, added, "It's incredible. For the first time we can stand back and
view a picture of all the structures and other features of a human chromosome, to see how a
chromosome is organized. Now we can begin to understand where genes are located on
chromosomes, how they express themselves, how deletions that give rise to disease-causing
mutations occur, and how chromosomes are duplicated and inherited."
Chromosome 22 is the first of the 23 human chromosome pairs to be deciphered. It was chosen
because of its relatively small size, its association with several diseases, and the groundwork
laid by several scientists beginning in the early 1990s.
Because protein-coding genes do not seem to occur on the short arm of chromosome 22, the
scientists focused on the chromosome's long arm, which is richer in genes relative to other
human chromosomes. Ninety-seven percent of this arm was sequenced.
The sequence contains 11 gaps, or areas that could not be deciphered with current technology.
The location and size of the gaps were determined. The 33.5 million bases of sequenced DNA
are of extremely high quality, with an error rate of less than one in 50,000 bases.
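As a back-of-the-envelope illustration of what that error rate implies, the short Python sketch below multiplies the two figures quoted above. The names are illustrative only, and the result is simply the upper bound implied by the stated rate, not a number reported by the research team.

```python
# Figures quoted in the release
SEQUENCED_BASES = 33_500_000   # bases of sequenced DNA
BASES_PER_ERROR = 50_000       # "less than one [error] in 50,000 bases"


def max_expected_errors(bases: int, bases_per_error: int) -> float:
    """Upper bound on the expected number of errors at the stated rate."""
    return bases / bases_per_error


upper_bound = max_expected_errors(SEQUENCED_BASES, BASES_PER_ERROR)
print(f"Fewer than {upper_bound:.0f} expected errors in {SEQUENCED_BASES:,} bases")
```

At the stated rate, the whole 33.5-million-base sequence would be expected to contain fewer than 670 incorrect letters.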
The sequence reveals the following about the landscape of chromosome 22:
At least 545 genes and 134 pseudogenes (genes that once functioned but no
longer do) were detected on the chromosome, with 200 to 300 additional ones likely. If
representative of other chromosomes, this count suggests that the total number of genes on all
human chromosomes will not be substantially more or less than the previously estimated number
The genes range in size from 1,000 to 583,000 bases of DNA with a mean size of about 19,000 bases.
A total of 39 percent of the chromosome is copied into RNA (exons and introns), while only 3
percent of the chromosome encodes protein.
A total of 247 genes were revealed by computer analyses to be identical to previously identified
human genes or protein sequences. Computer analysis of the chromosome 22 sequence found
150 additional genes with DNA sequence similarity to known genes. An additional 148 predicted
genes containing sequence homologous to known genetic markers (ESTs) were identified.
Several gene families appear to have arisen by tandem duplication. There are families of genes
that are interspersed among other genes and distributed over large chromosomal regions.
There is unexpected long-range complexity of the chromosome with an elaborate array of repeat
sequences near the centromere of the chromosome. The existence of so much repetitive DNA
information could help explain how this chromosome rearranges or reshuffles its DNA, leading
to human disorders such as DiGeorge syndrome, which includes a form of mental retardation,
and how chromosome structure changes over time.
Unexpectedly, recombination is increased in several regions and suppressed in others,
and these differences will probably play a role in health and disease.
Comparing the chromosome 22 sequence to known gene sequences of the mouse, a lab animal
frequently used to facilitate understanding of human genetic disorders, the research team found
160 human genes that have comparable sequences in the mouse. Examining the chromosomal
locations of the mouse genes that have counterparts on the human chromosome 22 shows that
the order of the genes along the chromosome in the two species is genetically conserved,
although the mouse homologs of human genes on chromosome 22 are dispersed to eight
different mouse chromosomal regions.
The sequencing of the DNA of chromosome 22 was conducted as part of the international
Human Genome Project, which involves scientists in the U.S., England, Japan, France, Germany
In deciphering chromosome 22, scientists used the approach that has been developed and widely
tested by the Human Genome Project. This approach involves sequencing overlapping cloned
segments of DNA from known locations on the chromosome.
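The idea of joining overlapping segments from known locations can be pictured with a toy Python sketch. This is not the software or procedure used by the Human Genome Project; the function names `assemble` and `find_gaps`, the tiny example sequences, and the assumption that each clone's start position is known exactly and that reads are error-free are all hypothetical simplifications for illustration.

```python
from typing import List, Tuple

# (start position on the chromosome, sequence read from the clone)
Clone = Tuple[int, str]


def assemble(clones: List[Clone]) -> List[Clone]:
    """Merge clones that overlap (or abut) into longer contiguous stretches."""
    contigs: List[Clone] = []
    for start, seq in sorted(clones):
        if contigs:
            c_start, c_seq = contigs[-1]
            c_end = c_start + len(c_seq)
            if start <= c_end:
                overlap = c_end - start
                shared = min(overlap, len(seq))
                offset = start - c_start
                # The letters in the shared region must agree in both clones.
                if c_seq[offset:offset + shared] != seq[:shared]:
                    raise ValueError(f"clones disagree at position {start}")
                # Append only the part of the clone that extends the contig.
                contigs[-1] = (c_start, c_seq + seq[overlap:])
                continue
        # No overlap with the previous contig: start a new one.
        contigs.append((start, seq))
    return contigs


def find_gaps(contigs: List[Clone]) -> List[Tuple[int, int]]:
    """Return (gap_start, gap_end) for each break between neighbouring contigs."""
    return [
        (a_start + len(a_seq), b_start)
        for (a_start, a_seq), (b_start, _) in zip(contigs, contigs[1:])
    ]


# Hypothetical toy data: three overlapping clones and one isolated clone.
clones = [(0, "ACGTAC"), (4, "ACGGTT"), (8, "TTCA"), (15, "GGAT")]
contigs = assemble(clones)
print(contigs)             # two contiguous stretches
print(find_gaps(contigs))  # one gap of known location and size
```

In this toy run the first three clones merge into one 12-letter stretch, and the fourth remains separate, leaving a single gap whose position and size are known even though its letters are not.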
Until now, scientists were uncertain about whether an entire human chromosome could be
sequenced in this manner. For example, they did not know whether insurmountable problems
would prevent assembling their sequencing data. The presence of a small number of unclonable
gaps was not unexpected, but the scientists carrying out this project adhered to the agreed-upon
standard that a chromosome should not be considered "essentially complete" until the sequence
of regions that are clonable and sequenceable with current technology has been determined to
high accuracy, and the sizes of any remaining gaps have been determined.
"That chromosome 22 was essentially sequenced by using overlapping clones increases our
confidence that the Human Genome Project will be able to complete a 'working draft' of the
DNA sequence of the human genome in Spring 2000 and finish it by 2003," said Dr. Richard
Wilson, co-director of the Genome Sequencing Center at Washington University School of
Medicine in St. Louis and a member of the research team that deciphered chromosome 22.
The results of the Human Genome Project, which are freely accessible through public databases
such as GenBank (www.ncbi.nlm.nih.gov/genome/seq),
give scientists insight into the way genes are arranged along a strip of DNA and pave the way for
major advances in the diagnosis and treatment of disease.
Knowing the identity and order of the chemical components of the DNA of the 23 pairs of
chromosomes that are found in almost every human cell provides a tool to determine the basis of
health and disease. "The fact that all of this information is now freely available for scientists to
use, without the constraints of patents and fees, is of major importance, if the knowledge of our
genetic make-up is to be used for the good of mankind," said Dr. Michael Morgan, chief
executive of the Wellcome Trust Genome Campus, which is home to the Sanger Centre.
For more information contact:
- Department of Energy Human Genome Program, Jeff Sherwood, 202-586-5806
- NHGRI, Cathy Yarborough, 301-594-0954
- Sanger Centre, Jane Rogers, 44 122 383-4244 (United Kingdom)
- Washington University School of Medicine in St. Louis, Linda Sage, 314-286-0119