Announcements on the First Analysis of the Human Genome Sequence
February 12, 2001
Over the last decade, genomes have been sequenced for more than 40 species, mostly bacteria. The human genome sequence is eight times larger than all previously sequenced genomes put together. The Human Genome Project (HGP) began in 1990 as an international collaboration, propelled by the hope that global views of entire genomes would allow researchers to attack scientific problems in systematic and unbiased ways. In its early years, the HGP produced maps of the human and mouse genomes and sequenced the genomes of yeast and the nematode worm. It has now produced a working draft of the human genome sequence (the totality of human DNA) that is 94% complete, in which each letter has been read an average of five times. About 30% of the human genome has been sequenced with more than twice this redundancy, yielding highly accurate "finished" sequence; chromosomes 21 and 22, for example, have been sequenced in their entirety to a finished state. All human chromosomes are expected to be sequenced to a finished state no later than 2003.
The Human Genome Project first divided the genome into large "clones," segments of DNA each representing about 0.005% of the whole genome, and then broke each clone into small fragments for sequencing. Because the positions of these clones in the genome were known, this approach increased confidence that the sequence would be assembled correctly, and it allowed the work to be shared effectively among international collaborators. All collaborators in the project made their sequence data publicly available without restriction within 24 hours. Large blocks of highly repetitive sequence, for example at the tips of chromosome arms and at the centromeres (the portions of chromosomes that appear as pinched centers when chromosomes are condensed), have been avoided because current technology cannot yet sequence these regions.
The total human genome, contained in a set of 23 chromosomes, is now estimated to contain 3,164.7 million letters (nucleotides). Genome size does not always correlate with the apparent complexity of a species, because many genomes contain large amounts of repetitive sequence. In humans, the part of the genome that codes for proteins makes up less than 2% of the total, while repeated sequences make up at least 50%. Repetitive sequences are thought to have no direct function, but they shed light on chromosome structure and dynamics. They hold important clues about evolutionary events and help chart mutation rates, and by seeding DNA rearrangements they can modify existing genes and create new ones. They also serve as tools for genetic studies.
The vast majority of repeated sequences in the human genome are derived from transposable elements: sequences, resembling those that form viral genomes, that propagate by inserting fresh copies of themselves at random places in the genome. A full 45% of the human genome derives from such transposons. A major surprise of this first global analysis of the human genome is that many components of this diverse array of repeated sequences, traditionally considered to be "junk," appear to have played a beneficial role over the course of human evolution.
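To put these percentages on the scale of the whole genome, the short Python sketch below converts each of them into an approximate number of letters, assuming every percentage applies to the 3,164.7-million-letter total quoted above. It is back-of-the-envelope arithmetic only; the helper name `letters_for_percent` is hypothetical, and because "less than 2%" and "at least 50%" are bounds, the corresponding results are an upper and a lower bound respectively.

```python
from fractions import Fraction

# Genome size quoted in the text: 3,164.7 million letters (nucleotides).
GENOME_LETTERS = 3_164_700_000


def letters_for_percent(percent: str) -> int:
    """Convert a percentage of the genome into an approximate letter count.

    The percentage is passed as a string so that Fraction parses it exactly,
    avoiding floating-point rounding; int() then truncates toward zero.
    """
    return int(GENOME_LETTERS * Fraction(percent) / 100)


# Percentages taken from the paragraphs above.
# "<2%" gives an upper bound and ">=50%" gives a lower bound.
figures = {
    "one clone (~0.005%)": "0.005",
    "protein-coding (<2%)": "2",
    "transposon-derived (45%)": "45",
    "repeated sequence (>=50%)": "50",
}

for label, pct in figures.items():
    print(f"{label}: about {letters_for_percent(pct):,} letters")
```

Running it shows, for instance, that a single clone of about 0.005% corresponds to roughly 158,000 letters, and that the protein-coding portion amounts to fewer than about 63 million letters.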
Human genes are spread over much larger regions of DNA than those of the fruit fly and the nematode worm. Because genes form such a small and scattered portion of the human genome, they remain difficult to identify. Even so, the total number of human genes appears to be 30,000 to 35,000: close to the number originally estimated some 20 years ago, but much smaller than more recent estimates. Humans thus appear to have only about twice as many genes as the fly or worm, yet they have on average three times as many kinds of proteins, because "alternative splicing" can yield different protein products from the same gene.
Compared with the organisms whose genomes have been sequenced before, humans have a particular abundance of proteins involved in cell structure, defense and immunity, DNA copying, the synthesis of RNA and proteins, and communication between cells. Humans have an unusually high number of complex proteins that fit into more than one functional category and many proteins that are embedded in the surface of cells.
Because the genome sequence has been released as it was generated over the last four years, the data have already spawned a large number of discoveries. At least 30 disease genes have been identified directly from sequence produced by the HGP. In the coming years, the human genome should be sequenced to a finished state, in which all gaps are closed and the sequence is at least 99.99% accurate. Genome sequences from other species will provide crucial insights into genes and the regions that regulate their activity. There will be a pressing need for improved methods to analyze the abundance of information being generated. Genetics will also become an increasingly important part of mainstream medicine, and pressure will grow both to encourage informed use of genetic information and to set thoughtful limits on it.
Papers are available online at http://www.nature.com/genomics/
Last modified: Wednesday, October 29, 2003
Document Use and Credits
Publications and webpages on this site were created by the U.S. Department of Energy Genome Program's Biological and Environmental Research Information System (BERIS). Permission to use these documents is not needed, but please credit the U.S. Department of Energy Genome Programs and provide the website http://genomics.energy.gov. All other materials were provided by third parties and not created by the U.S. Department of Energy. You must contact the person listed in the citation before using those documents.
Base URL: www.ornl.gov/hgmis
Site sponsored by the U.S. Department of Energy Office of Science, Office of Biological and Environmental Research, Human Genome Program