
Introducing the Human Genome




The Recipe for Life

For all the diversity of the world's five and a half billion people, full of creativity and contradictions, the machinery of every human mind and body is built and run with fewer than 100,000 kinds of protein molecules. And for each of these proteins, we can imagine a single corresponding gene (though there is sometimes some redundancy) whose job it is to ensure an adequate and timely supply. In a material sense, then, all of the subtlety of our species, all of our art and science, is ultimately accounted for by a surprisingly small set of discrete genetic instructions. More surprising still, the differences between two unrelated individuals, between the man next door and Mozart, may reflect a mere handful of differences in their genomic recipes -- perhaps one altered word in five hundred. We are far more alike than we are different. At the same time, there is room for near-infinite variety.

It is no overstatement to say that to decode our roughly 100,000 genes in some fundamental way would be an epochal step toward unraveling the manifold mysteries of life.

Some DNA details (49k GIF)

Some definitions

The human genome is the full complement of genetic material in a human cell. (Despite five and a half billion variations on a theme, the differences from one genome to the next are minute; hence, we hear about the human genome -- as if there were only one.) The genome, in turn, is distributed among 23 pairs of chromosomes, which, in each of us, have been replicated and re-replicated since the fusion of sperm and egg that marked our conception. The source of our personal uniqueness, our full genome, is therefore preserved in each of our body's several trillion cells. At a more basic level, the genome is DNA, deoxyribonucleic acid, a natural polymer built up of repeating nucleotides, each consisting of a simple sugar, a phosphate group, and one of four nitrogenous bases. The hierarchy of structure from chromosome to nucleotide is shown in Some DNA details. In the chromosomes, two DNA strands are twisted together into an entwined spiral -- the famous double helix -- held together by weak bonds between complementary bases, adenine (A) in one strand to thymine (T) in the other, and cytosine to guanine (C-G). In the language of molecular genetics, each of these linkages constitutes a base pair. All told, if we count only one of each pair of chromosomes, the human genome comprises about three billion base pairs.

The specificity of these base-pair linkages underlies all that is wonderful about DNA. First, replication becomes straightforward. Unzipping the double helix provides unambiguous templates for the synthesis of daughter molecules: One helix begets two with near-perfect fidelity. Second, by a similar template-based process, depicted in From genes to proteins, a means is also available for producing a DNA-like messenger to the cell cytoplasm. There, this messenger RNA, the faithful complement of a particular DNA segment, directs the synthesis of a particular protein. Many subtleties are entailed in the synthesis of proteins, but in a schematic sense, the process is elegantly simple.
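
The template logic of replication can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a biochemical model: the function names and the example sequence are invented for the sketch, each strand is a plain string, and the opposite orientation of the two strands in a real helix is ignored.

```python
# Toy model of template-based replication.
# All names and the example sequence are illustrative assumptions.

# The two base-pairing rules: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the partner strand dictated by A-T and C-G pairing."""
    return "".join(PAIR[base] for base in strand)


def replicate(helix):
    """Unzip a (strand, partner) helix and rebuild a partner for each half."""
    first, second = helix
    daughter_1 = (first, complement(first))    # old first strand + new partner
    daughter_2 = (complement(second), second)  # new partner + old second strand
    return daughter_1, daughter_2


parent = ("ATGCCGTA", complement("ATGCCGTA"))
daughter_1, daughter_2 = replicate(parent)

# Each daughter helix carries the same sequence as the parent.
print(daughter_1 == parent and daughter_2 == parent)
```

Because each base admits only one partner, either strand alone is enough to reconstruct the whole helix, which is why the two daughters come out identical to the parent.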

From genes to proteins (67k GIF)

Every protein is made up of one or more polypeptide chains, each a series of (typically) several hundred molecules known as amino acids, linked by so-called peptide bonds. Remarkably, only 20 different kinds of amino acids suffice as the building blocks for all human proteins. The synthesis of a protein chain, then, is simply a matter of specifying a particular sequence of amino acids. This is the role of the messenger RNA. (The same nitrogenous bases are at work in RNA as in DNA, except that uracil takes the place of the DNA base thymine.) Each linear sequence of three bases (both in RNA and in DNA) corresponds uniquely to a single amino acid. The RNA sequence AAU thus dictates that the amino acid asparagine should be added to a polypeptide chain, GCA specifies alanine -- and so on. A segment of the chromosomal DNA that directs the synthesis of a single type of protein constitutes a single gene.
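
The path from a DNA segment to a polypeptide chain can be sketched the same way. The sketch below is a simplification under stated assumptions: the function names and the 15-base template are hypothetical, strand orientation is ignored, and the codon table is deliberately partial, listing only five of the 64 possible three-base words (the two named in the text plus a start codon, tryptophan, and one stop signal, which adds no amino acid and ends the chain).

```python
# Toy sketch of transcription and translation.
# Function names and the example template are illustrative assumptions.

# Pairing rules for building messenger RNA from a DNA template:
# uracil (U) takes the place of thymine (T) in RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Deliberately partial codon table; a real table covers all 64 codons.
CODON_TABLE = {
    "AUG": "Met",  # methionine, also the usual start signal
    "AAU": "Asn",  # asparagine, as in the text
    "GCA": "Ala",  # alanine, as in the text
    "UGG": "Trp",  # tryptophan
    "UAA": None,   # stop signal: no amino acid is added
}


def transcribe(template_dna):
    """Build the messenger RNA complementary to a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template_dna)


def translate(mrna):
    """Read the messenger three bases at a time until a stop signal."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:
            break
        chain.append(amino_acid)
    return chain


mrna = transcribe("TACTTACGTACCATT")
print(mrna)             # AUGAAUGCAUGGUAA
print(translate(mrna))  # ['Met', 'Asn', 'Ala', 'Trp']
```

A codon missing from the partial table raises a KeyError, which is intentional here: the sketch makes no claim about codons it does not list.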

A plan of action

In 1990 the Department of Energy and the National Institutes of Health developed a joint research plan for their genome programs, outlining specific goals for the ensuing five years. Three years later, emboldened by progress that was on track or even ahead of schedule, the two agencies put forth an updated five-year plan. Improvements in technology, together with the experience of three years, allowed an even more ambitious prospect.

In broad terms, the revised plan includes goals for genetic and physical mapping of the genome, DNA sequencing, identifying and locating genes, and pursuing further developments in technology and informatics. To a large extent, the following pages are devoted to a discussion of just what these goals mean, and what part the DOE is playing in pursuing them. In addition, the plan emphasizes the continuing importance of the ethical, legal, and social implications of genome research, and it underscores the critical roles of scientific training, technology transfer, and public access to research data and materials. Most of the goals focus on the human genome, but the importance of continuing research on widely studied "model organisms" is also explicitly recognized.

Among the scientific goals of human genome research, several are especially notable, as they provide clear milestones for future progress. In reciting them, however, it is important to note an underlying assumption of adequate research support. Such support is obviously crucial if the joint plan is to succeed. Some of the central goals for 1993-98 follow:

  • Complete a genetic linkage map at a resolution of two to five centimorgans by 1995 -- As discussed in Exploring the Genomic Landscape, this goal was far surpassed by the fall of 1994.

  • Complete a physical map at a resolution of 100 kilobases by 1998 -- This implies a genome map with 30,000 "signposts," separated by an average of 100,000 base pairs. Further, each signpost will be a sequence-tagged site, a stretch of DNA with a unique and well-defined DNA sequence. Such a map will greatly facilitate "production sequencing" of the entire genome. By the end of 1995, molecular biologists were halfway to this goal: A physical map was announced with 15,000 sequence-tagged signposts. Physical mapping is discussed in the first half of Exploring the Genomic Landscape.

  • By 1998 develop the capacity to sequence 50 million base pairs per year in long continuous segments -- Adequate fiscal investment and continuing progress beyond 1998 should then produce a fully sequenced human genome by the year 2005 or earlier. Sequencing is the subject of the second half of Exploring the Genomic Landscape.

  • Develop efficient methods for identifying and locating known genes on physical maps or sequenced DNA -- The goals here are less quantifiable, but the aim is central to the Human Genome Project: to home in on and ultimately to understand the most important human genes, namely, the ones responsible for serious diseases and those crucial for healthy development and normal functions.

  • Pursue technological developments in areas such as automation and robotics -- A continuing emphasis on technological advance is critical. Innovative technologies, such as those described in Beyond Biology, are the necessary underpinnings of future large-scale sequencing efforts.

  • Continue the development of database tools and software for managing and interpreting genome data -- This is the area of informatics, discussed in Beyond Biology. The challenge is not so much the volume of data, but rather the need to mount a system compatible with researchers around the world, and one that will allow scientists to contribute new data and to freely interrogate the existing databases. The ultimate measure of success will be the ease with which biologists can fruitfully use the information produced by the genome project.

  • Continue to explore the ethical, legal, and social implications of genome research -- Much emphasis continues to be placed on issues of privacy and the fair use of genetic information. New goals focus on defining additional pertinent issues and developing policy responses to them, disseminating policy options regarding genetic testing services, fostering greater acceptance of human genetic variation, and enhancing public and professional education that is sensitive to sociocultural and psychological issues. This side of the genome project is discussed in Ethical, Legal, and Social Implications.
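
The signpost figures quoted in the physical-map goal above follow from simple arithmetic on numbers given in the text. A minimal check, with variable names chosen only for illustration:

```python
# Arithmetic behind the physical-map goal, using figures from the text.
GENOME_BASE_PAIRS = 3_000_000_000  # about three billion base pairs
MAP_RESOLUTION = 100_000           # one signpost per 100 kilobases
SIGNPOSTS_MAPPED_1995 = 15_000     # sequence-tagged sites announced by end of 1995

signposts_needed = GENOME_BASE_PAIRS // MAP_RESOLUTION
fraction_done = SIGNPOSTS_MAPPED_1995 / signposts_needed

print(signposts_needed)  # 30000
print(fraction_done)     # 0.5
```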




To Know Ourselves was prepared at the request of the U.S. Department of Energy, Office of Health and Environmental Research, as an overview of the Human Genome Project.