| This joint
journal article index is a special feature of the Human
Genome Project Information Web site in conjunction with the completion
of the Human Genome Project (2003) and the publication of the working
draft sequence (2001). |
 |
Insights
Learned from the Sequence
What
has been learned from analysis of the working draft sequence of the human
genome? What is still unknown?*
*information
taken from Science, Nature, Wellcome Trust, and Human
Genome News publications (February 15 & 16, 2001)
By the Numbers
- The human genome contains
3164.7 million chemical nucleotide bases (A, C, T, and G).
- The average gene consists
of 3000 bases, but sizes vary greatly, with the largest known human gene being
dystrophin at 2.4 million bases.
- The total number of genes
is estimated at 30,000, much lower than previous estimates of 80,000 to 140,000
that had been based on extrapolations from gene-rich areas as opposed to a
composite of gene-rich and gene-poor areas.
- The order of almost all
(99.9%) nucleotide bases is exactly the same in all people.
- The functions are unknown
for over 50% of discovered genes.
The Wheat from the Chaff
- Less than 2% of the genome
encodes for the production of proteins.
- Repetitive
sequences that do not code for proteins (sometimes called "junk DNA") make
up at least 50% of the human genome.
- Repetitive sequences
are thought to have no direct functions, but they shed light on chromosome
structure and dynamics. Over time, these repeats reshape the genome by rearranging
it, thereby creating entirely new genes or modifying and reshuffling existing
genes.
- During the past 50 million
years, a dramatic decrease seems to have occurred in the rate of accumulation
of repeats in the human genome.
How It's Arranged
- The human genome's gene-dense
"urban centers" are predominantly composed of the DNA building blocks G and
C.
- In contrast, the gene-poor
"deserts" are rich in the DNA building blocks A and T. GC- and AT-rich regions
usually can be seen through a microscope as light and dark bands on chromosomes.
- Genes appear to be concentrated
in random areas along the genome, with vast expanses of noncoding DNA between.
- Stretches of up to 30,000
C and G bases repeating over and over often occur adjacent to gene-rich areas,
forming a barrier between the genes and the "junk DNA." These CpG islands
are believed to help regulate gene activity.
- Chromosome 1 has the
most genes (2968), and the Y chromosome has the fewest (231).
How the Human Genome Compares
with That of Other Organisms
- Unlike the human's seemingly
random distribution of gene-rich areas, many other organisms' genomes are
more uniform, with genes evenly spaced throughout.
- Humans have on average
three times as many kinds of proteins as the fly or worm because of mRNA transcript
"alternative splicing" and chemical modifications to the proteins. This process
can yield different protein products from the same gene.
- Humans share most of
the same protein families with worms, flies, and plants, but the number of
gene family members has expanded in humans, especially in proteins involved
in development and immunity.
- The human genome has
a much greater portion (50%) of repeat sequences than the mustard weed (11%),
the worm (7%), and the fly (3%).
- Although humans appear
to have stopped accumulating repeated DNA over 50 million years ago, there
seems to be no such decline in rodents. This may account for some of the fundamental
differences between hominids and rodents, although gene estimates are similar
in these species. Scientists have proposed many theories to explain evolutionary
contrasts between humans and other organisms, including those of life span,
litter sizes, inbreeding, and genetic drift.
Variations and Mutations
- Scientists have identified
about 1.4 million locations where single-base DNA differences (SNPs) occur
in humans. This information promises to revolutionize the processes of finding
chromosomal locations for disease-associated sequences and tracing human history.
- The ratio of germline
(sperm or egg cell) mutations is 2:1 in males vs females. Researchers point
to several reasons for the higher mutation rate in the male germline, including
the greater number of cell divisions required for sperm formation than for
eggs.
What We Still Don't Understand:
A Checklist for Future Research
- Exact gene number, exact
locations, and functions
- Gene regulation
- DNA sequence organization
- Chromosomal structure
and organization
- Noncoding DNA types,
amount, distribution, information content, and functions
- Coordination of gene
expression, protein synthesis, and post-translational events
- Interaction of proteins
in complex molecular machines
- Predicted vs experimentally
determined gene function
- Evolutionary conservation
among organisms
- Protein conservation
(structure and function)
- Proteomes (total protein
content and function) in organisms
- Correlation of SNPs (single-base
DNA variations among individuals) with health and disease
- Disease-susceptibility
prediction based on gene sequence variation
- Genes involved in complex
traits and multigene diseases
- Complex systems biology,
including microbial consortia useful for environmental restoration
- Developmental genetics,
genomics
Applications, Future
Challenges
Deriving meaningful knowledge from the DNA sequence will define research through
the coming decades to inform our understanding of biological systems. This enormous
task will require the expertise and creativity of tens of thousands of scientists
from varied disciplines in both the public and private sectors worldwide.
The draft sequence already
is having an impact on finding genes associated with disease. Over 30 genes
have been pinpointed and associated with breast cancer, muscle disease, deafness,
and blindness. Additionally, finding the DNA sequences underlying such common
diseases as cardiovascular disease, diabetes, arthritis, and cancers is being
aided by the human variation maps (SNPs) generated in the HGP in cooperation
with the private sector. These genes and SNPs provide focused targets for the
development of effective new therapies.
One of the greatest impacts
of having the sequence may well be in enabling an entirely new approach to biological
research. In the past, researchers studied one or a few genes at a time. With
whole-genome sequences and new high-throughput technologies, they can approach
questions systematically and on a grand scale. They can study all the genes
in a genome, for example, or all the transcripts in a particular tissue or organ
or tumor, or how tens of thousands of genes and proteins work together in interconnected
networks to orchestrate the chemistry of life.
Post-sequencing projects
are well under way worldwide. (See Genomics:GTL).
These explorations will result in a profound, new, and more comprehensive understanding
of complex living systems, with applications to agriculture, human health, energy,
global climate change, and environmental remediation, among others.