Mapping and Resources
abstracts from the
Maria de Fatima Bonaldo, Greg Lennon and Marcelo Bento Soares
The methods we originally developed to normalize directionally cloned cDNA libraries (Soares et al., 1994; Bonaldo, Lennon & Soares, 1996) have been successfully used to generate a number of human, mouse and rat cDNA libraries. All human and mouse libraries (and soon the rat libraries as well) have been contributed to the I.M.A.G.E. consortium, and they have been used extensively for large-scale generation of expressed sequence tags (ESTs). Both the ESTs and their respective clones are publicly available. Although normalized libraries have proven most advantageous in minimizing the redundant identification of mRNAs of the super-prevalent and intermediate frequency classes within a particular tissue, normalization cannot prevent the redundant identification of mRNAs (of any frequency class) that are expressed in multiple tissues. In other words, normalization alone cannot avoid the redundant identification of ESTs that have been obtained previously from other libraries. This problem is becoming increasingly relevant as we approach completion of the ongoing human and mouse gene discovery efforts. Hence, we proposed to take advantage of subtractive hybridization strategies that we developed to generate libraries enriched for novel cDNAs. Briefly, pools of I.M.A.G.E. clones from which ESTs have been derived are used as drivers in hybridizations with single or multiple normalized libraries, thus generating subtracted libraries enriched for cDNAs not yet represented in public databases. Subtracted libraries are characterized by Southern hybridization to assess the reduction in representation of clones of the driver population and are then contributed to the I.M.A.G.E. consortium for large-scale arraying and sequencing. Sequence analysis of two subtracted libraries that we have generated indicated a four-fold reduction in representation of the driver population.
It is anticipated that the use of subtracted libraries will become increasingly advantageous as we strive towards the ultimate goal of identifying all human and mouse genes. This project has two major goals: (1) to optimize the method for construction of subtracted libraries, and (2) to generate subtracted libraries to facilitate the ongoing human and mouse EST programs.
Auffray, C.1, Devignes, M.D.1, Pietu, G.1,
Ansorge, W.2, Schwager, C.2, Ballabio, A.3,
Borsani, G.3, Banfi, S.3, Estivill, X.4, Lynch,
Gibson, K.5, Mundy, C.5, Lehrach, H.6, Poustka,
A.7, Wiemann, S.7, Korn, B.7, O'Brien, J.7,
M.8, Lundeberg, J.8
The general objectives of the European IMAGE Consortium are:
The European IMAGE Consortium will devote 20% of its resources to the assembly of the physical resources (cDNA and CpG island arrays characterized by end sequencing and fingerprinting, minimal sets of clones selected after comprehensive sequence, clone and functional clustering, and a master set of 3,000 full-length clones). These resources will be available throughout the European Union to the user community in both academia and industry. The Consortium will serve the future needs of the scientific community in the systematic identification of all human genes and their regulatory sequences by deciphering, in an efficient and economical manner, 6 Mb of complete, finished sequence for 3,000 transcripts of average size 2 kb: 50% of the resources will be devoted to the sequencing of full-length cDNAs and CpG islands selected by the Consortium on the basis of their map position, similarity to known families or expression profiles. In order to ensure that advances in basic genetic knowledge are used to further enhance human health, the Consortium will seek to contribute to the identification of genes involved in human biology and disease by correlating precise map location and phenotypic expression data, exploiting various comparative approaches for 1,000 of the genes represented in the master set: 10% of the resources will be devoted to high-resolution and comparative functional mapping, in close interaction with the mapping consortia, in order to obtain the most precise and evolutionarily relevant map locations. Last but not least, 20% of the resources will be devoted to the IMAGE Consortium Data Base. This will provide the community with up-to-date, integrated sequence, mapping and expression data related to the IMAGE consortium arrays, as they are collected by IMAGE Consortium members in Europe, the United States and Japan, and will help in sharing and harmonizing such data.
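The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The following is only a sanity check of the stated numbers (3,000 transcripts averaging 2 kb, and the four resource shares); the function and variable names are illustrative and are not part of the Consortium's plan.

```python
# Sanity check of the figures quoted in the abstract.
# All names here are illustrative assumptions, not Consortium terminology.

TRANSCRIPTS = 3_000      # full-length clones in the master set
AVERAGE_SIZE_KB = 2      # average transcript size, in kb

# Resource shares stated in the text, in percent.
RESOURCE_SHARES = {
    "assembly of physical resources": 20,
    "sequencing of full-length cDNAs and CpG islands": 50,
    "high-resolution and comparative functional mapping": 10,
    "IMAGE Consortium Data Base": 20,
}


def total_sequence_mb(transcripts: int, average_size_kb: float) -> float:
    """Return the total finished sequence in Mb (1 Mb = 1,000 kb)."""
    return transcripts * average_size_kb / 1_000


if __name__ == "__main__":
    print(total_sequence_mb(TRANSCRIPTS, AVERAGE_SIZE_KB))  # 6.0 Mb, as stated
    print(sum(RESOURCE_SHARES.values()))                    # 100 percent
```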
Supported by the participating institutions and the European Union BIOMED 2 Program.
Dabney Johnson, Monica Justice, Ken Beattie,
Michelle Buchanan, Michael Ramsey, Rose
Ramsey, Michael Paulus, Nance Ericson, David
Allison, Reid Kress, Richard Mural, Ed
Uberbacher, Reinhold Mann
The Functional Genomics Initiative at the Oak Ridge National Laboratory integrates outstanding capabilities in mouse genetics, bioinformatics, and instrumentation. The 50-year investment by the DoE in mouse genetics/mutagenesis has created a one-of-a-kind resource for generating mutations and understanding their biological consequences. It is generally accepted that, through the mouse as a surrogate for human biology, we will come to understand the function of human genes. In addition to this world-class program in mammalian genetics, ORNL has also been a world leader in developing bioinformatics tools for the analysis, management and visualization of genomic data. Combining this expertise with new instrumentation technologies will provide a unique capability to understand the consequences of mutations in the mouse at both the organism and molecular levels.
The goal of the Functional Genomics Initiative is to develop the technology and methodology necessary to understand gene function on a genomic scale and apply these technologies to megabase regions of the human genome. The effort is scoped so as to create an effective and powerful resource for functional genomics. ORNL is partnering with the Joint Genome Institute and other large scale sequencing centers to sequence several multimegabase regions of both human and mouse genomic DNA, to identify all the genes in these regions, and to conduct fundamental surveys to examine gene function at the molecular and organism level. The Initiative is designed to be a pilot for larger scale deployment in the post-genome era. Technologies will be applied to the examination of gene expression and regulation, metabolism, gene networks, physiology and development.
The initiative was launched in 1996 and comprises the following component efforts:
Directed High-Efficiency Mouse Mutagenesis (Monica Justice) - We have established a comprehensive high-efficiency mouse germline mutagenesis program to examine mammalian gene function. N-ethyl-N-nitrosourea (ENU) is currently the mutagen of choice because it induces point mutations that reflect single gene function. Large genomic regions can be scanned for mutations that may reflect loss of function, gain of function, or partial loss of function. In parallel, we are developing sequence-ready BAC contigs of a region that we are mutagenizing. Allelic series for loci will be obtained that will complement other mutagenesis approaches, such as gene disruptions. Our program incorporates many different methods of genetic screening to isolate mutations, and the mutations will serve as a resource to the mouse and human genome communities.
The Mouse Screenotyping Center (Dabney Johnson) - We have established a mouse phenotype-screening center to complement and extend current mouse mutational analyses by developing screening protocols for biochemical and behavioral abnormalities. High-throughput methodologies are being developed to efficiently screen for induced aberrations in behaviors, in locomotor and neuromuscular function, and in biochemical and hematological parameters. The screens are designed to detect mouse models of a variety of human diseases from among large populations of potentially mutant mice.
Comprehensive Cellular Protein Analysis using Microfluidic Devices (Rose Ramsey) - ORNL's unique lab-on-a-chip technology is being combined with mass spectrometric methods to provide powerful tools for comprehensive cellular protein analysis. The microchips will integrate, on a single structure, elements that enable multidimensional separations of protein mixtures with electrospray ionization of the analytes for direct, on-line interfacing with mass spectrometry. This system will provide unparalleled throughput and sensitivity, allowing robust protein identification in small samples.
Gene Expression Studies using Genosensor Microchip Technology (Ken Beattie) - Porous glass flowthrough Genosensor chip technology provides unique hybridization surface area and sensitivity for use in large-scale gene expression studies, mapping, and mutational surveillance.
Genome Lab-on-a-Chip (Michael Ramsey) - Microfluidic and microinstrumentation capabilities are being applied to labor intensive assays used in gene mapping and analysis of genetic mutations and polymorphisms. "Laboratory-on-a-chip" technology is being used to automate assay procedures, increase analysis rates, eliminate the use of radioactive isotopes, and reduce the consumption of limited DNA samples. Current efforts are focused on development of multiplexed microdevices for PCR-based analysis of simple tandem repeat markers used in the genetic and physical mapping of the mouse genome.
Mouse Physiological Monitoring Microchip (Nance Ericson) - Unique integrated circuitry is being created to provide a capability for large scale monitoring of mouse physiological parameters related to physiological or behavioral abnormalities due to mouse mutations. Application of this new technology to genome studies will accelerate mass specimen screening by providing automated detailed observation and reporting of multiple key physiological parameters, such as body temperature, heart rate, physical activity level, location and motion patterns using a microscale implanted device.
High-Throughput Tomographic Imaging of Mouse Phenotypes (Michael Paulus) - A new high-resolution high-throughput automated 3D mouse imaging and screening methodology is being created which will rapidly identify, quantify and record subtle phenotypes in mutagenized mice, based on a novel x-ray imaging technology with 50 micron spatial resolution and a 3D dataset acquisition time of <1 minute per mouse.
Development of DNA Sequencing and Automated Genotyping Infrastructure (Richard Mural) - Basic capabilities for DNA sequencing and automated genotyping are being developed to support mouse genetics and molecular biology. Not only are these capabilities necessary for addressing basic problems in genetics and molecular biology, but they will also add flexibility to the program for initiating new projects. Having the DNA sequence of regions that are being mutagenized will also be important for mutation detection.
Event-Based Flexible Automated DNA Sequencing (Reid Kress) - By moving automated DNA sequencing from a sequential, batch-mode process to a continuous-mode, hands-off process, we will be able to maximize the efficiency and cost-effectiveness of our limited DNA sequencing resources at ORNL.
Bioinformatics Support for Gene Function (Ed Uberbacher) - A comprehensive system to collect and assemble information relevant to the function of newly discovered genes is being constructed for application to sequenced human and mouse regions in the Functional Genomics Initiative. This includes support for initial gene finding and informatics-based characterization, and compilation of experimental information related to mouse mutations and to gene and protein expression studies, together with computational and experimental information derived from numerous external databases and other model organisms. The resulting catalog of gene function information will be linked to genome-wide browsing being produced by the Genome Annotation Consortium.
Direct Visualization of Regulatory Protein-DNA complexes and Mutations using AFM (David Allison) - High-throughput atomic force microscopy will be used to precisely locate and visualize proteins bound to individual DNA molecules or genes, including complexes related to regulation of gene expression and repair enzymes site-specifically bound to lesion sites. Automated image analysis and interpretation systems are being applied to locate and characterize the protein-DNA complexes.
Mass Spectrometry for DNA Analysis (Michelle Buchanan) - To complement the genome-wide mutation screening efforts, we are examining parameters in mass spectrometry-based DNA analysis. Our initial efforts have focused on optical imaging and diagnostics of the matrix-assisted laser desorption ionization (MALDI) plume, and on investigations of the crystallization of DNA for the MALDI matrix. In parallel, we are examining methods for the higher-throughput DNA analysis necessary for large-scale genome-wide polymorphism analysis.
Sandra L. McCutchen-Maloney, Mark Shannon,
Jane Lamerdin and Michael P. Thelen
The human XPF gene is required for the cellular response to DNA damage caused by ultraviolet radiation and mutagenic chemicals. Mutation in this gene leads to one form of the disease syndrome xeroderma pigmentosum. Homologs of the XPF protein are present in fly, worm, plant, yeast and archaea, indicating conservation of an essential function throughout evolution. The expression of mouse XPF in pachytene spermatocytes corroborates other indications that XPF has an additional role in genetic recombination during meiosis. Information from all of the homologs has been valuable in directing experiments to determine the specific characteristics of the XPF protein. We have demonstrated the predicted endonuclease activity, protein-protein interactions, and DNA binding in recombinant XPF protein that was overexpressed in and purified from E. coli. Fragments of XPF obtained during purification retain endonuclease activity. This result, together with the sequence analysis of homologous proteins, has led us to produce a series of deletion fragments of XPF in order to identify functional domains. The use of comparative genomics coupled with protein chemistry has thus enabled us to begin defining the function of this protein, which is crucial to human health.
Work was funded by NIH grants to SLM-M and MPT; research was performed under the auspices of the U.S. DOE by LLNL under contract No. W-7405-ENG-48.
Natalay Kouprina*, Lois Annab**, Michael A.
Resnick*, J. Carl Barrett** and Vladimir
Recently we demonstrated that transformation-associated recombination (TAR) in the yeast Saccharomyces cerevisiae can be used to selectively isolate single copy genes, BRCA2 and BRCA1, from total human DNA as large circular YACs (1). The TAR cloning method is based on co-penetration into yeast spheroplasts of gently isolated genomic DNA along with vector DNA that contains 5' and 3' sequences specific for the gene of interest, followed by recombination between the vector and the human DNA to establish a YAC (2). We investigated whether a single copy gene could be isolated directly from total human DNA by TAR cloning using only one piece of its sequence information. A TAR cloning vector was constructed that contained a small amount of 3' HPRT sequence and an Alu repeat. Transformation with the vector along with human DNA led to the selective isolation of large circular YACs containing the entire HPRT gene. YACs of up to 400 kb were generated that extended from the unique 3' HPRT position to various Alu repeats, in a manner similar to "genome walking". Transfection of the NeoR-retrofitted YAC clones into mouse cells showed that the YACs contained a functional HPRT gene. Use of the common Alu repeat as a second targeting sequence greatly expands the utility of TAR cloning. We propose that this new TAR cloning approach is readily applicable to the direct isolation of single copy genes from mammalian genomes.
(1) Larionov, V., Kouprina, N., Solomon, G, Barrett, J. C. and Resnick, M. A. Direct isolation of human BRCA2 gene by transformation-associated recombination in yeast. Proc. Nat. Acad. Sci. USA, 94, p.7384-7387, 1997.
(2) Larionov, V., Kouprina, N., Graves, J., Chen, X.-N., Korenberg, J. R. and Resnick, M. A. Specific cloning of human DNA as YACs by transformation-associated recombination. Proc. Nat. Acad. Sci. USA, 93, p.491-496, 1996.
During the last two years or so, our DOE-sponsored research led to the discovery and characterization of DNA targets for retroposon integration (1,2). These targets are most likely recognized and cut by L1-ORF2 enzymes already existing in mammalian cells and have been postulated to be hot spots for homologous recombination (3). Recently, our team in collaboration with Dr. Sun-Yu Ng, has demonstrated that the preferred integration targets are also hot spots for homologous recombination which is increased by up to two orders of magnitude in test cancer cell lines (4). In another collaborative effort, we have discovered that mobile elements integrate primarily at kinkable DNA sites (5).
We continued our repeat annotation service to the community (6), as well as the discovery and analysis of new repetitive families (7,8).
1. Jurka, J., Klonowski, P. J. Mol. Evol. 43, 685-689 (1996)
2. Jurka, J. Proc. Natl. Acad. Sci. 94, 1872-1877 (1997)
3. Jurka, J. Site-Directed Recombination. U.S. Patent No. 08/643,886
4. Ng, S.-Y., Ma, J., Jurka, J. (in preparation)
5. Jurka, J., Klonowski, P., Trifonov, E.N. Mammalian Retroposons Integrate at Kinkable DNA sites (submitted)
6. CENSOR server: http://www.girinst.org
7. Jurka, J.,Kapitonov, V.V., Klonowski, P., Walichiewicz, J. and Smit, A.F.A. Genetica 98, 235-247 (1996)
8. Kapitonov, V.V. and Jurka, J. DNA Sequence (in press)
Alexander Boitsov1,2, Boris Oskin1, Pieter J. de
Construction of large-insert libraries in bacterial hosts, such as those using PAC/BAC vectors and DH10B cells, has been limited by the inefficient transformation of Escherichia coli with large DNA molecules by electroporation. The goal of this project was to elucidate the mechanism and kinetics of the entrance of large DNA molecules into DH10B cells on exposure to electric fields and to exploit this information to increase the efficiency of PAC/BAC library construction. The essence of the approach is the independent optimization of the three distinct stages of electrotransformation (ET): i) electrophoretic movement of DNA towards cells in the cell-DNA suspension, ii) permeabilization of the cell wall (production of electric pores) and iii) electrophoretic permeation of DNA molecules into the cells through the electric pores. This was feasible thanks to an apparatus constructed at Saint Petersburg State Technical University that can generate multiple independently regulated electric pulses. Results obtained previously with Escherichia coli strains other than DH10B revealed that the optimal time of electrophoretic permeation of DNA molecules into the cells through electric pores is proportional to the square root of the molecular length of supercoiled plasmids in the range from 7 to 250 kb. This explains the failure of previous large DNA ET work that had been based on results with smaller DNA. To characterize the ET process of DH10B cells with large DNA more fully, we have determined the effects of each of the electric field parameters on transformation with respect to plasmid size (and topology). Plasmids ranging in size up to 250 kb have been used, and the resulting transformed molecules have been examined to verify their fidelity. The data to be presented show that the kinetics of DNA penetration into DH10B cells on exposure to an electric field differ drastically from those obtained for the other E. coli strains used (C600, HB101, K802, DH5α).
First, no clear dependence on the molecular size of supercoiled plasmids was observed. Our preliminary interpretation is that the electropores formed in DH10B are much larger than those in other E. coli cells. This would explain the high ET efficiency that can be achieved for DH10B cells with large supercoiled plasmids (up to 10^8 transformants per µg of 250 kb plasmid), which is, on a molar basis, only several times lower than the highest ET efficiency with the small pUC18 plasmid.
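The square-root relationship reported for the non-DH10B strains can be written as t_opt(L) = t_ref * sqrt(L / L_ref). The sketch below is a minimal illustration of that scaling only; the reference time of 1.0 (arbitrary units) and the function name are assumptions made for the example, and the abstract reports that DH10B cells do not show this size dependence.

```python
import math


def optimal_permeation_time(length_kb: float,
                            ref_length_kb: float,
                            ref_time: float) -> float:
    """Scale a reference permeation time by the square root of plasmid length.

    Illustrates the proportionality reported for E. coli strains other than
    DH10B (supercoiled plasmids, 7 to 250 kb). `ref_time` is a hypothetical
    measured optimum for a plasmid of `ref_length_kb`, in arbitrary units.
    """
    if length_kb <= 0 or ref_length_kb <= 0:
        raise ValueError("plasmid lengths must be positive")
    return ref_time * math.sqrt(length_kb / ref_length_kb)


if __name__ == "__main__":
    # Going from a 7 kb to a 250 kb plasmid lengthens the optimal pulse
    # by a factor of sqrt(250 / 7), i.e. roughly six-fold.
    factor = optimal_permeation_time(250, ref_length_kb=7, ref_time=1.0)
    print(f"{factor:.2f}")
```

Under this scaling, pulse settings tuned on small plasmids would be several-fold too short for 250 kb molecules, which is consistent with the explanation given above for earlier failures.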
Acknowledgment: This work has been sponsored in part by a DOE humanitarian grant OR0033-93CIS015 awarded to A.B. and by DE-FC03-96ER62294 to PdJ.
Jonathan L. Longmire, Nancy C. Brown, Evelyn
W. Campbell, Mary L. Campbell, John J. Fawcett,
Phil Jewett, Mary Maltbie, and Larry L. Deaven
Because of the advantages of large insert size and stability associated with BAC cloning systems, we have attempted to adapt the pBeloBAC vector for use with flow sorted human chromosomes. Compared to making genomic libraries (where DNA mass is not limiting), the cloning of chromosomes into the BAC vector is very challenging because only microgram quantities of DNA can be obtained, even after extensive periods of sorting (months). The technical challenges involved in making chromosome-specific BAC libraries include developing methodologies for 1) efficient recovery of flow sorted chromosomes into agarose plugs; 2) predictable partial digestion of small masses of chromosomal DNA embedded in agarose; and 3) improved BAC cloning efficiencies that allow construction of multiple representation libraries from microgram quantities of chromosomal DNA.
We have found that partial digestions using HindIII can be performed in a predictable and reliable manner on microgram quantities of genomic DNA by carefully controlling the incubation time and the ratio of enzyme concentration to DNA mass. In addition, these small amounts of partially digested genomic DNA can be size selected using PFG electrophoresis and cloned into the BAC vector with efficiencies that are sufficient for producing multiple representation chromosome libraries (10^3-10^4 cfu per µg starting DNA; average insert size approximately 90 kb).
Improvements have also been made in the collection of flow sorted chromosomes prior to DNA isolation. The previously used method for collecting chromosomes involved sorting into an agarose-coated tube until the tube was full of sheath fluid and then harvesting the relatively small number of chromosomes by centrifugation. We have developed a new method that allows larger numbers of chromosomes to be sorted into a single agarose-coated tube. This is accomplished by using a series of centrifugation, decanting, and re-sorting steps to 'stack' chromosome pellets within the tube. A final spin followed by brief melting and regelling of the agarose is used to embed the chromosomes within the agarose plug. Higher DNA yields have resulted from using the stacked pellet method. However, chromosomes are lost during this process: the yield of chromosomal DNA was not directly proportional to the number of chromosomes that were originally sorted. This loss of chromosomes during the embedding step represents the single major problem that remains to be solved in order to allow the production of chromosome-specific BAC libraries. This work was supported by the US DOE under contract W-7405-ENG-36.
N. Kouprina, M. Campbell, J. Graves, E.
Campbell, L. Meincke, J. Tesmer, N. Brown, J.
Fawcett, P. Jewett, R. K. Moyzis, N. Doggett, L.
Deaven, and V. Larionov
Transformation-associated recombination (TAR) in yeast was exploited for the selective isolation of human DNA as circular YACs from monochromosomal mouse/human hybrid cell lines. Chromosome 5 and 16 specific YAC libraries were produced from the hybrid cell lines Q826-20 and CY18, respectively, using F-factor based TAR vectors containing human Alu repeats. The presence of an F-factor origin in the TAR vectors provides the opportunity to transfer the generated YACs to E. coli to produce BACs. Although <2% of the DNA in the hybrid cells was human, as many as 80% of the transformants contained human DNA YACs, based on colony hybridization of a representative number of clones with both human and mouse probes. Thus, the level of enrichment of human DNA to mouse was nearly 3,000-fold. The YAC libraries of chromosomes 5 and 16 consist of 299 and 1320 clones, respectively, with an average size of 150 kb. Approximately 266 chromosome 5 and 748 chromosome 16 specific BACs were obtained after electroporation of YACs into E. coli. Based on the size of the clones generated from chromosome 5, the YAC and BAC libraries are ~0.24X and ~0.21X, respectively, in chromosomal coverage. Based on the size of the clones generated from chromosome 16, the YAC and BAC libraries are ~2.4X and ~1.2X, respectively, in chromosomal coverage. The chromosomal distribution of 41 TAR BACs each from chromosomes 5 and 16 was evaluated by fluorescence in situ hybridization. The distribution of FISH signals was random along the length of each chromosome. We conclude that TAR cloning may provide an efficient means for generating YAC/BACs from specific chromosomes.
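Coverage figures of this kind are conventionally obtained by dividing the total cloned insert length by the chromosome length. The sketch below shows that calculation for the chromosome 5 YAC library (299 clones, 150 kb average). The chromosome length of 190,000 kb is an assumed round figure chosen for illustration; it is not given in the abstract, and the function names are likewise hypothetical.

```python
def total_insert_kb(n_clones: int, average_insert_kb: float) -> float:
    """Total cloned DNA in the library, in kb."""
    return n_clones * average_insert_kb


def fold_coverage(n_clones: int,
                  average_insert_kb: float,
                  chromosome_size_kb: float) -> float:
    """Library coverage expressed as multiples (X) of the chromosome length."""
    if chromosome_size_kb <= 0:
        raise ValueError("chromosome size must be positive")
    return total_insert_kb(n_clones, average_insert_kb) / chromosome_size_kb


if __name__ == "__main__":
    # Chromosome 5 YAC library from the abstract: 299 clones, 150 kb average.
    # ASSUMPTION: 190,000 kb is an illustrative chromosome 5 length.
    print(total_insert_kb(299, 150))                    # 44850 kb of insert
    print(round(fold_coverage(299, 150, 190_000), 2))   # 0.24
```

The same function applies to the BAC libraries once their own average insert size is known; the 150 kb figure above is quoted in the text for the YACs.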
Steve C. Mitchell, Diana Bocskai, Yicheng Cao2,
Robert Xuequn Xu, Mei Wang, Troy Moore3, So
Hee Dho4, Enrique Colayco2, Christie Gomez,
Gabriella Rodriguez, Annabel Echeverria, Melvin
I. Simon2, and Ung-Jin Kim2
The goal of the human genome project is to characterize and sequence the entire genomes of human and several model organisms, thus providing complete information on the structure of transcribed, regulatory and other functional regions for these organisms. In recent years, a number of useful genetic and physical markers on the human and mouse genomes have been made available, along with BAC library resources for these organisms. These advances in technology and resource development have made it feasible to efficiently construct genome-wide physical BAC contigs for human and other genomes. Currently, over 30,000 mapped STSs and 27,000 mapped Unigenes are available for human genome mapping. ESTs and cDNAs are excellent resources for building contig maps for two reasons. First, they exist in two alternative forms, as sequence information for PCR primer pairs and as cDNA clones, thus making possible both library screening by colony hybridization and pooled library PCR. We are now able to screen genomic libraries efficiently with large numbers of DNA probes by combining over 100 cDNA probes in each hybridization. Second, the linkage and order of genes are largely conserved among human, mouse and other model organisms. Therefore, gene markers have advantages over random anonymous STSs in building maps for comparative genomic studies.
As preliminary work for the ongoing "BAC-EST" project, we are currently screening our human BAC libraries with thousands of cDNA probes. We have thus far used over 3,000 Unigene probes, and the number will increase to 7,000 this year. Our goal is to screen the library with at least 27,000 markers, most of which are in the form of cDNA probes. This represents at least 1 marker per 100 kb of euchromatic regions. We plan to deconvolute the positive BACs for each marker by sorting the library into groups of BACs that are positive for specific pools, arraying each group on small hybridization filters, then hybridizing the filters with individual probes. We are also determining the end sequences of the positive BACs. BAC end sequence (BES) information can be extremely useful for precisely aligning these mapped BAC clones against any known sequence contigs by means of sequence match. Putative contigs or clone overlaps identified by markers or sequence match are verified via restriction fingerprint analysis. The BAC clone resources integrated with physical mapping information will be useful for building sequence-ready contigs in any chromosomal region.
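The two-round deconvolution plan described above (pooled hybridization, then individual probes against the re-arrayed positives) can be sketched as follows. This is an illustrative model only: the `hybridizes` callback stands in for a laboratory hybridization result, and the function, pool and clone names are hypothetical rather than taken from the project.

```python
from typing import Callable, Dict, Iterable, List, Set


def deconvolute(pools: Dict[str, List[str]],
                library: Iterable[str],
                hybridizes: Callable[[str, str], bool]) -> Dict[str, Set[str]]:
    """Assign positive BACs to individual probes in two rounds.

    pools      -- pool name -> list of probe names combined in that pool
    library    -- names of all BAC clones on the full library filters
    hybridizes -- stand-in for the wet-lab result: True if probe hits BAC
    """
    clones = list(library)
    assignments: Dict[str, Set[str]] = {}
    for probes in pools.values():
        # Round 1: pooled hybridization against the whole library.
        # A BAC is positive for the pool if any probe in the pool hits it.
        pool_positives = [bac for bac in clones
                          if any(hybridizes(p, bac) for p in probes)]
        # Round 2: each probe is hybridized only to the small filter
        # carrying this pool's positives, not to the whole library.
        for probe in probes:
            assignments[probe] = {bac for bac in pool_positives
                                  if hybridizes(probe, bac)}
    return assignments


if __name__ == "__main__":
    # Toy data: which probe truly hits which BAC (purely illustrative).
    truth = {("cDNA1", "BAC_A"), ("cDNA2", "BAC_B"), ("cDNA3", "BAC_B")}
    result = deconvolute(
        pools={"pool1": ["cDNA1", "cDNA2"], "pool2": ["cDNA3"]},
        library=["BAC_A", "BAC_B", "BAC_C"],
        hybridizes=lambda probe, bac: (probe, bac) in truth,
    )
    print(result)
```

The point of the design is that the second round only touches the pool's positive clones, so the number of individual hybridization targets stays small even when the library is large.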
N. A. Doggett, L. A. Goodwin, J. G. Tesmer, L. J.
Meincke, D. C. Bruce, M. R. Altherr, R. D.
Sutherland, U.-J. Kim, and L. L. Deaven
We have previously reported on the construction of an integrated physical map of human chromosome 16 (Doggett et al., Nature 377:Suppl:335-365, 1995). This map was constructed against a framework somatic cell hybrid breakpoint map which divides the chromosome into 90 intervals. The physical map consists of both a low resolution YAC contig map and a high resolution cosmid/P1/BAC contig map. The low resolution YAC contig map now comprises 900 CEPH megaYACs and 300 flow-sorted 16-specific miniYACs that are localized to and ordered within the breakpoint intervals with 1150 STSs. (These include 200 megaYACs and 300 STSs which were incorporated from the Whitehead Institute's total genome mapping effort.) The YAC/STS map provides practically complete coverage of the euchromatic arms of the chromosome and provides STS markers on average every 78 kb. The integrated map also includes 470 genes/ESTs/exons and 400 genetic markers--as part of an ongoing effort to incorporate all available loci into a single map of this chromosome. A high resolution 'sequence ready' cosmid contig map consisting of 4000 fingerprinted cosmids assembled into contigs covering 60% of the chromosome is anchored to the YAC and cytogenetic breakpoint maps via STSs developed from cosmid contigs and by hybridizations between YACs and cosmids. Current work is focused on completing the 'sequence ready' map using a combination of cosmids and BACs. IRS-bubble PCR products from a minimally tiling set of YACs are being hybridized to the chromosome 16 cosmid library to localize cosmids to gaps in the existing map, and over 200 BACs identified by library screening are now linked to the cosmid map. Several large contigs have been completed across disease gene regions in collaboration with several other investigators by supplying the available map resources (YACs, STSs, and cosmid contigs) and high density cosmid filter arrays for cosmid walking and YAC to cosmid hybridization experiments.
The largest of these is a 4 Mb restriction mapped 'sequence ready' cosmid contig extending proximally from the p telomere. A 3.0 Mb region of this contig, extending proximally from the PKD1 locus is the substrate for a finished sequencing effort underway at Los Alamos. Supported by the US DOE, OBER under contract W-7405-ENG-36.
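As a rough check on the marker density quoted for the YAC/STS map, the number of STSs multiplied by the average spacing gives the length of chromosome the markers would span. The sketch below performs only that arithmetic on the two figures from the abstract (1150 STSs, one every 78 kb on average); the function names are illustrative.

```python
def implied_span_kb(n_markers: int, mean_spacing_kb: float) -> float:
    """Length spanned by n_markers placed at the given average spacing."""
    return n_markers * mean_spacing_kb


def mean_spacing_kb(span_kb: float, n_markers: int) -> float:
    """Average distance between markers over a region of span_kb."""
    if n_markers <= 0:
        raise ValueError("need at least one marker")
    return span_kb / n_markers


if __name__ == "__main__":
    span = implied_span_kb(1150, 78)
    print(span)            # 89700 kb
    print(span / 1_000)    # 89.7 Mb
```

The result, 89.7 Mb, is the length implied by the two quoted figures taken together.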
Yicheng Cao, Diana Bocskai, Steve C. Mitchell,
Robert Xuequn Xu, So Hee Dho#, Jun-Ryul Huh#,
Byeong-Jae Lee#, Anna Glodek*, Mei Wang,
Enrique Colayco, Gony H. Kim, Christie Gomez,
Gabriella Rodriguez, Judith G. Tesmer**, Annabel
Echeverria, Robin Hua Li, Melvin I. Simon,
Norman A. Doggett**, Mark D. Adams*, and
Extensive physical mapping efforts and advances in automated sequencing technology have resulted in the initiation of genomic sequencing of large human chromosomal regions. Currently, both NIH and DOE are supporting several centers in the U.S. to begin sequencing the genomes of human and model organisms systematically and in massive scale. In the past 1.5 years, Caltech has been building physical contig maps on the 20 Mbp region of the human chromosome 16p arm (16p13.1-11.2) jointly with TIGR and LANL as a pilot experiment to generate sequence-ready physical map using large insert human BAC libraries (A, B and C) that have been constructed at Caltech.
First, the pooled library A (with 3.5 X genomic coverage) was screened with 98 ordered STS primer pairs taken from the integrated chromosome 16 YAC-STS map constructed by LANL. In this initial screening, 77 STS markers successfully identified 184 positive BACs. Positive BACs were characterized by checking multiple single colonies per clone, and restriction fingerprint analysis. For the clones to be sequenced, FISH mapping and genomic Southern hybridization steps were added for the verification of the chromosomal localization and genomic colinearity. Inserts from these positive BACs were used for screening library B and C (approximately 10X genomic coverage) by colony hybridization. The libraries were also screened with approximately 90 Unigene cDNA probes that have been localized to the 20 Mb region, and with approximately 250 Unigene cDNA probes mapped to the other regions on chromosome 16. Shotgun clones derived from the ends of completely sequenced BACs were used as probes to efficiently identify BACs that overlap minimally with the sequenced BACs. We have thus far identified nearly 1,000 putative BACs belonging to the 20 Mb region, and more than 2,000 BACs over the entire chromosome 16. End sequences were determined from all of these BACs. The BES (BAC end sequence) have been used to align these clones against the sequenced BACs by sequence match, thus allowing rapid and precise determination of the extent of overlaps between clones. The putative overlaps from the sequence match are verified by restriction fingerprint analysis.
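The idea of using a BAC end sequence to measure overlap with an already sequenced BAC can be illustrated with a deliberately simplified sketch. It assumes exact string matching with no sequencing errors, and it assumes the candidate clone extends past the end of the sequenced BAC; real work would rely on a proper alignment tool. The function names and the toy sequences are hypothetical.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlap_from_end_sequence(sequenced_bac: str,
                              end_read: str) -> Optional[Tuple[str, int, int]]:
    """Locate a BAC end read in a sequenced BAC and estimate the overlap.

    Returns (strand, match_start, overlap_bp), or None if there is no
    exact match. An end read points into its own insert, so:
      '+' match: the clone runs from the match start to the right end;
      '-' match: the clone runs from the match end back to the left end.
    """
    pos = sequenced_bac.find(end_read)
    if pos != -1:
        return "+", pos, len(sequenced_bac) - pos
    pos = sequenced_bac.find(reverse_complement(end_read))
    if pos != -1:
        return "-", pos, pos + len(end_read)
    return None


if __name__ == "__main__":
    sequenced = "GATTACAGGCTCAT"   # toy stand-in for a finished BAC sequence
    print(overlap_from_end_sequence(sequenced, "ACAGG"))  # forward-strand hit
    print(overlap_from_end_sequence(sequenced, "GAGCC"))  # reverse-strand hit
    print(overlap_from_end_sequence(sequenced, "TTTTT"))  # no hit
```

A clone whose end read gives a small overlap value is a candidate for extending the sequenced region with little redundant sequencing, which is why such matches are then checked by restriction fingerprinting.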
Currently we are building the BacDB database, a modified version of ACeDB.4_1, by entering and organizing all information related to human BAC clones and physical mapping data available from Caltech, LANL, and TIGR, as well as from public resources. The database contains available BAC-related data and mapping information, STSs, ESTs, BESs, and completed BAC sequences. BacDB will serve not only as an integrated database for mapping and sequencing, but also as a tool for the efficient identification of sequence-ready clones.
Mark D. Adams, Steve Rounsley, Casey Field,
Jenny Kelley, Steve Bass, Brook Craven, and J. Craig Venter
Libraries constructed in BAC vectors have become the choice for clone sets in high throughput genomic sequencing projects because of their higher stability as compared to their YAC or cosmid counterparts. We have proposed the use of BAC end sequences as a primary means of selecting minimally overlapping clones for sequencing large genomic regions. A necessary prerequisite of this is the collection of end sequences from all the clones in deep coverage BAC libraries. This is now being pursued for both the human and Arabidopsis genomes. BAC vectors are based on the E. coli F-factor replicon and offer strict copy number control limiting the number of BACs to 1-2 copies per cell. However, in addition to minimizing the chances of chromosomal rearrangements, the low copy number also poses a challenge for high throughput direct sequencing of the BAC clone ends because of the difficulty in obtaining sufficient quantities of high quality template from standard minipreps.
We have developed reliable approaches for both a multiprep for BAC DNA purification and a protocol for the direct sequencing of BAC DNA using Dye Terminator chemistry. The combination of these two methods allows us to produce daily about 400 high quality BAC end sequences with an average edited length of 400 bases using 4 ABD 377 sequencers and a small team of personnel. To aid in sample tracking and high throughput processing, the prep is processed completely in a 96 well format, from clone storage and growth through DNA purification, isopropanol precipitation, and final resuspension. Sequencing reactions are also processed in a 96 well format, including the removal of excess dyes before loading onto the sequencers. Both high throughput methods are amenable to future automation. Our methods will be presented along with discussion regarding the advantages and disadvantages of various methods we tried.
Gregory G. Mahairas, Keith D. Zackrone,
Stephanie Tipton, Sarah Schmidt, Alan
Blanchard, Anne West, Joe Slagel and Leroy Hood
The STC approach has been proposed as an attractive strategy to provide a sequence-ready scaffold for the efficient and directed sequencing of the complete human genome (1). This work is being carried out as a collaboration between the California Institute of Technology, TIGR and the University of Washington, and is funded by the U.S. Department of Energy. The approach entails sequencing the ends of 300,000 Bacterial Artificial Chromosomes (BACs) that constitute a 20X-deep human DNA library, in order to construct a sequence-ready scaffold of the human genome.
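The scale of the proposed scaffold can be illustrated with a back-of-envelope calculation. The sketch below uses the clone count and coverage quoted above; the haploid genome size of 3,000 Mb is an assumption that the abstract does not state, and the function names are purely illustrative.

```python
# Back-of-envelope density of the STC scaffold.
# ASSUMPTION: a 3,000 Mb haploid genome (not stated in the abstract).
GENOME_BP = 3_000_000_000

N_BACS = 300_000   # BACs to be end-sequenced (quoted in the abstract)
COVERAGE = 20      # library depth (quoted in the abstract)


def stc_spacing_bp(n_bacs: int, genome_bp: int) -> float:
    """Average distance between tags when both ends of every BAC are read."""
    n_tags = 2 * n_bacs
    return genome_bp / n_tags


def implied_insert_bp(n_bacs: int, coverage: float, genome_bp: int) -> float:
    """Mean insert size implied by the clone count and the library depth."""
    return coverage * genome_bp / n_bacs


spacing = stc_spacing_bp(N_BACS, GENOME_BP)
insert = implied_insert_bp(N_BACS, COVERAGE, GENOME_BP)
print(f"about one STC every {spacing:,.0f} bp; implied mean insert {insert:,.0f} bp")
```

Under the assumed genome size, 600,000 end sequences place a tag roughly every 5 kb, and the quoted depth would imply a mean insert of about 200 kb. Both figures shift proportionally if a different genome size is assumed.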
At the Univ. of Washington we have assembled a high throughput automated end sequencing and fingerprinting process with its associated informatics. BAC clones are robotically inoculated from 384 well plates into 4 ml 96 well culture format, grown and the BAC DNA robotically extracted using AutoGen 740 robots. BAC template DNA from the AutoGen is then robotically transferred into 96 well microtiter plates from which DNA sequencing and fingerprinting reactions are set up. DNA fingerprinting is performed by digestion with a single restriction enzyme (EcoRV) and conventional agarose electrophoresis, followed by automated imaging and analysis. DNA sequencing is performed using PE-ABD High Sensitivity dye primers and ABI 377 DNA sequencers. Laboratory protocols, automated data production, data processing, quality control measures and LIMS will be described in detail. During a 50 day period the STC laboratory sequenced 23317 BAC ends (STCs). 19224 (82.4%) were greater than 100 bp non trimmed and the average nontrimmed read length was 388 bp for a total of 7.46 Mb (0.25%) of the genome. 29% of the STCs contained repetitive DNA but less than 11% were entirely repeat. 12% of the repetitive DNA was LINE sequence, 4.6% LTR, 6.7% SINE sequence and 1.3% of the STCs contain a microsatellite or simple sequence repeat. The total G + C content was 40% and the average CpG content was 0.28, both expected numbers for human genomic DNA. 224 STCs had CpG scores of 1 representing CpG islands. 3103 STCs (16.8%) hit the EST, non-redundant nucleotide or Sixframe database. 1103 STCs hit the EST database (DB), 517 of which hit only the EST DB; 1087 STCs hit the nr nucleotide DB, 471 of which hit the nr nucleotide DB only; and 913 STCs hit the nr protein DB, 500 of which hit only the nr protein DB. 181 STCs (1%) hit all three databases, 131 hit the nr nucleotide and nr protein DBs, 101 hit the EST and nr protein DBs, and 304 hit the EST and nr nucleotide DBs, i.e., 4% hit more than one of these DBs and probably represent genes.
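The production figures and database hit counts above can be cross-checked with simple arithmetic. The sketch below is one reading of the numbers, not the authors' analysis. It assumes a 3,000 Mb genome, assumes that the pairwise hit counts exclude STCs that hit all three databases, and assumes that percentages are taken relative to the 19224 reads longer than 100 bp. None of these assumptions is stated in the abstract.

```python
# Figures quoted in the abstract.
total_reads = 23317
good_reads = 19224        # reads longer than 100 bp, untrimmed
mean_len_bp = 388

# ASSUMPTION: 3,000 Mb haploid genome (not stated in the abstract).
GENOME_MB = 3000

pass_rate = 100 * good_reads / total_reads
total_mb = good_reads * mean_len_bp / 1e6
genome_pct = 100 * total_mb / GENOME_MB

# Database hits, read as mutually exclusive regions of a three-set Venn diagram.
# ASSUMPTION: the pairwise counts do not include STCs hitting all three DBs.
est_only, nuc_only, prot_only = 517, 471, 500
est_nuc, nuc_prot, est_prot = 304, 131, 101
all_three = 181

est_total = est_only + est_nuc + est_prot + all_three
nuc_total = nuc_only + est_nuc + nuc_prot + all_three
prot_total = prot_only + nuc_prot + est_prot + all_three

# STCs hitting more than one database.
# ASSUMPTION: the percentage is relative to the reads longer than 100 bp.
multi = est_nuc + nuc_prot + est_prot + all_three
multi_pct = 100 * multi / good_reads

print(f"pass rate {pass_rate:.1f}%, {total_mb:.2f} Mb, {genome_pct:.2f}% of genome")
print(f"EST {est_total}, nr nucleotide {nuc_total}, nr protein {prot_total}")
print(f"{multi} STCs ({multi_pct:.1f}%) hit more than one database")
```

Under these assumptions the exclusive regions sum to the quoted per-database totals (1103, 1087 and 913), which supports reading the pairwise counts as exclusive.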
1. Venter, J. C., Smith, H. O., and Hood, L. (1996) Nature 381: 364-366
Joomyeong Kim, Ethan A. Carver, and Lisa
PCR-based methods have provided invaluable tools for the analysis of large genomic clones, such as YACs and BACs, that comprise the bulk of most existing and emerging genomic physical maps. However, applications of the available methods are often limited, or made more difficult to apply, because of the need for significant identity between primers and sequences located on both sides of a targeted site. We have developed a method that permits target sequences to be exponentially amplified, with a high degree of purity and specificity, from low-complexity templates when only a single sequence-specific anchor primer is present in the mixture. Amplification is efficiently driven by specific binding of the primer to one end of the target locus, with reverse priming initiated at nearby regions containing only a 5-6 bp sequence match with the anchor oligonucleotide. Using standard vector-derived primers, we have applied the single-primer protocol to amplify end-fragments directly from a number of different BAC-containing colonies, and have generated high quality DNA sequence information from the PCR mixtures without the need for further purification. This work introduces single-primer PCR as a simple, efficient and convenient alternative to existing methods for isolation of sequence-ready end fragments and other sequences from within BACs and other large-insert genomic clones.
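To make the priming logic concrete, the following sketch searches a template for products that a single anchor primer could generate. It assumes that the short reverse-priming match involves the 3'-terminal bases of the primer, which the abstract does not state. The primer, the toy template, and the `max_len` cutoff are hypothetical values chosen only for illustration.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def candidate_products(template: str, primer: str, k: int = 6, max_len: int = 2000):
    """Return (start, end) spans of `template` that one primer might amplify.

    Forward priming requires a full-length primer match. Reverse priming is
    ASSUMED to need only the primer's 3'-terminal k bases to pair with the
    opposite strand; on the top strand such a site reads as the reverse
    complement of those k bases.
    """
    start = template.find(primer)
    if start < 0:
        return []
    site = revcomp(primer[-k:])
    spans = []
    pos = template.find(site, start + len(primer))
    while pos >= 0 and pos + k - start <= max_len:
        spans.append((start, pos + k))
        pos = template.find(site, pos + 1)
    return spans


# HYPOTHETICAL primer and template, for illustration only.
primer = "GATTACAGGCTCAGT"
template = "CC" + primer + "T" * 10 + revcomp(primer[-6:]) + "GGGG"
print(candidate_products(template, primer))
```

A real design tool would also need to consider mismatches, melting temperature, and competing sites, none of which are modeled here.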
Ze Peng, Steve Lowry, Duncan Scott, Yiwen Zhu,
Eddy Rubin and Jan-Fang Cheng
One of the JGI genomic sequencing targets is the distal 45 Mb of the long arm of human chromosome 5. This region was chosen because it contains a cluster of cytokine growth factor (IL3, IL4, IL5, IL9, IL12, IL13, GM-CSF, FGFA, M-CSF) and receptor genes (GRL, ADRB2, M-CSFR, PDGFR) and is likely to yield new and functionally related genes through long range sequence analysis. This region is relatively rich in disease-associated genes including susceptibility to asthma, several autosomal dominant corneal dystrophies, low-frequency hearing loss, dominant limb-girdle muscular dystrophy, Treacher Collins syndrome, and myeloid disorders associated with the 5q- syndrome.
The mapping strategy is based on a combination of both hybridization and PCR approaches. Inter-Alu fragments generated from non-chimeric YACs covering 3-5 Mb of DNA were used to isolate regionally specific P1/PAC/BAC clones. These clones were sized by pulsed-field gel electrophoresis, and their map locations were confirmed using fluorescent in situ hybridization. We estimated that approximately 60% of the clones representing a targeted region in the P1, PAC or BAC libraries were identified by this approach. Overlaps between P1s, PACs and BACs were mainly established by PCR using STSs generated from ends of clones. Contigs were further oriented using STSs developed from known genes, ordered markers, and ends of P1s, PACs, BACs and YACs. The STS content mapping also allowed us to identify new clones to fill gaps.
We have used this strategy to map 198 P1s, 60 PACs and 1,407 BACs so far in the region of 5q23-q35. These clones were linked by 1,321 STSs to form 74 contigs. The contigs are approximately 0.2-4 Mb in size. The average density of STSs is about 3.7 per 100 Kb in the proximal 20 Mb and about 2.3 per 100 Kb in the distal 25 Mb. Most STSs were derived from the ends of BACs, which allowed detection of 3.4% chimeras by PCR analysis. The clone coverage of the proximal 20 Mb is over 95% and the coverage of the distal 25 Mb is between 70% and 80% at different locations. To date, 121 P1/PAC/BAC clones spanning the proximal 10 Mb have entered the pipeline for production sequencing. This clone map with STS information is distributed through our Web site (http://www-hgc.lbl.gov/sequence-archive.html) and is being updated periodically.
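As a consistency check, the regional STS densities quoted above can be converted back into marker counts and compared with the stated total of 1,321 STSs. This is a minimal sketch using only the figures in the abstract; the function and variable names are illustrative.

```python
# Region sizes and STS densities quoted in the abstract.
regions = {
    "proximal": {"size_mb": 20, "sts_per_100kb": 3.7},
    "distal": {"size_mb": 25, "sts_per_100kb": 2.3},
}


def expected_sts(size_mb: float, sts_per_100kb: float) -> float:
    """Number of STSs implied by a density, with ten 100 kb windows per Mb."""
    return size_mb * 10 * sts_per_100kb


estimates = {name: expected_sts(**r) for name, r in regions.items()}
total_estimate = sum(estimates.values())
overall_density = 1321 / (45 * 10)   # STSs per 100 kb over the whole 45 Mb

print(estimates, round(total_estimate), round(overall_density, 2))
```

The two regional densities imply about 740 and 575 STSs, or roughly 1,315 in total, which is close to the 1,321 reported. The small difference is consistent with rounding of the quoted densities.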
L.A. Gordon, A. Georgescu, M. Christensen, S.
Ross, L. Woo, L.K. Ashworth, H. W.
Mohrenweiser, A.V. Carrano and A. S. Olsen.
We have developed an integrated genetic and physical map for human chromosome 19 consisting of metrically ordered, well-annotated cosmid/BAC contigs that provide a critical resource for sequencing.
At approximately 65 Mb, chromosome 19 represents 2% of the haploid genome; it is the most GC-rich human chromosome, suggesting an especially high gene density. The current map consists of 185 ordered cosmid/BAC contigs of average size 225 kb (range 40 kb to 3.3 Mb) spanning a total of 42 Mb, i.e. over 75% of the non-centromeric portion of chromosome 19. An additional 8 Mb of small EcoRI mapped cosmid contigs, average size 80 kb, have not yet been incorporated into the ordered map. The order of the constituent contigs has been determined by standard FISH techniques applied to a series of chromatin targets with increasing resolution. High-resolution FISH to decondensed human sperm pronuclei establishes genomic distances between 286 ordered FISH markers in 19p and q, thereby providing a "metric" framework that links the ordered contigs to the cytogenetic map. Complete digest EcoRI maps have been constructed for all contigs, which provide validation of the contig assembly and constituent clones, as well as an indication of contig size and extent of clone overlap.
The map currently includes 15 Mb of restriction mapped contigs greater than 500 kb (average size 1 Mb), which provide ideal substrates for large-scale genomic sequencing of this chromosome. About 7.5 Mb have been sequenced or are currently in the sequencing pipeline. The high depth of coverage (average 5X) and mix of cosmid and BAC clones enables selection of an optimum set of spanning clones with minimum overlap for sequencing.
The map is extensively annotated, with over 300 genes/cDNAs and 180 polymorphic markers that have been localized at the clone, and occasionally restriction fragment, level. Placement of the genetic markers in the physical map demonstrates excellent correspondence with existing genetic maps and provides relative order of many markers that cannot be distinguished by recombination. This map provides a unique resource for identification of disease genes mapped to this chromosome.
Work performed under the auspices of the US DOE by Lawrence Livermore National Laboratory under contract No. W-7405
Cliff S. Han, Mark O. Mundt, Linda J. Meincke,
Judy G. Tesmer, Robert K. Moyzis, Larry L.
Deaven and Norman A. Doggett
The framework genome-wide physical maps have largely been constructed with YAC clones. YACs, however, are often unstable and chimeric and, because cloned YAC DNA is difficult to isolate in pure form, are unsuitable for DNA sequencing. Therefore, BAC libraries, which retain the advantage of large insert size while being stable and easy to manipulate, were constructed. Physical mapping with these libraries is now underway and is a critical component of large-scale genomic sequencing.
Clone-based physical maps have been constructed by many methods, including clone fingerprinting and STS content mapping. We are developing an alternative method that is applicable to clone libraries of small to moderate complexity, such as chromosome-specific cosmid and BAC libraries and plasmid subclone libraries of BACs. This two-dimensional hybridization method is based on clone hybridization with pooled clone DNA as probes. The principal experimental design is as follows: First, several grids of the library are made. Second, DNA from the clones is pooled by a two-dimensional strategy (rows and columns) and purified by subtractive hybridization to remove both low abundance and high abundance repetitive elements. Third, the grids are hybridized separately with the row and column pooled DNAs. Positive hybridizing clones that are in common between a row pool probe and a column pool probe will overlap with the clone located at the intersection of that row and column. We are using the program MAP (written by Mark Mundt) to construct contigs from the two-dimensional hybridization results.
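The row and column intersection logic described above can be sketched as follows. This is an illustrative toy, not the MAP program. The simulated hybridization, the 2 x 2 grid, and the reciprocal-confirmation filter are all assumptions introduced here, and the subtractive removal of repeats is not modeled.

```python
from itertools import product


def pool_hits(pool, all_clones, overlaps):
    """Clones expected to be positive when probed with DNA pooled from `pool`."""
    return {
        x for x in all_clones
        if x in pool or any(frozenset((x, p)) in overlaps for p in pool)
    }


def candidate_overlaps(grid, row_hits, col_hits):
    """For the clone at (r, c), keep clones positive with both row r and column c."""
    cands = {}
    for r, c in product(range(len(grid)), range(len(grid[0]))):
        clone = grid[r][c]
        cands[clone] = (row_hits[r] & col_hits[c]) - {clone}
    return cands


def reciprocal_pairs(cands):
    """ASSUMED extra filter: keep a pair only if each clone nominates the other."""
    return {
        frozenset((a, b))
        for a, partners in cands.items()
        for b in partners
        if a in cands[b]
    }


# HYPOTHETICAL 2 x 2 pooling grid in which only clones A and D truly overlap.
grid = [["A", "B"], ["C", "D"]]
true_overlaps = {frozenset(("A", "D"))}
clones = [x for row in grid for x in row]

row_hits = [pool_hits(set(row), clones, true_overlaps) for row in grid]
col_hits = [
    pool_hits({row[c] for row in grid}, clones, true_overlaps)
    for c in range(len(grid[0]))
]

cands = candidate_overlaps(grid, row_hits, col_hits)
pairs = reciprocal_pairs(cands)
print(cands)
print(pairs)
```

In this toy grid the plain intersection also nominates A and D as partners of B and C, because pool members are themselves positive. The reciprocal filter removes those spurious pairs here, although in general it reduces rather than eliminates false positives, so independent confirmation remains necessary.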
This method was tested with 350 cosmids, chosen from the chromosome 16 specific cosmid library by grid hybridization with 3 YACs. Overlaps identified by two-dimensional hybridization were confirmed by restriction fingerprinting. We are currently implementing this approach for rapid construction of contigs from a chromosome 16 specific BAC library and for the ordering of plasmid subclones of cosmids and BACs into minimal tiling sets prior to their sequencing. Supported by the US DOE, OBER under contract W-7405-ENG-36.
Norman A. Doggett, Robert D. Sutherland and
David C. Torney
ISCN 1995 established a new set of Human chromosome ideograms that comprise 850 metaphase-chromosome bands, differentiated by five shades of staining intensity [ISCN (1995), Mitelman, F. (ed), S. Karger, Basel; and Francke, U. Cytogenet. Cell Genet. 65 206-219 (1994)]. The five band shades are referred to as white, light gray, medium gray, dark gray, and black. These shades, presumably, reflect different underlying states of chromatin. The publication of the mapping of 16,000 partially sequenced genes, or cDNAs, to a framework radiation hybrid map of the Human genome has established the chromosome assignment for as much as 20% of all Human genes [Schuler et al., Science 274 540-546 (1996)]. In this work, the genes were observed to be distributed non-uniformly along the chromosomes. Because none of these cDNAs were localized to a specific chromosome band, no conclusions were drawn about gene densities in the different types of bands.
To establish a relationship between gene density and band type we have used the proportion of each type of band on each chromosome to estimate the respective gene densities, by optimizing the consistency of the model with the numbers of cDNAs on the 22 autosomes. In detail, the theoretical prediction for the number of genes on chromosome number j equals $\sum_{i=1}^{5} a_i p_{ij}$, in which the $a_i$ are gene density parameters for band type i and the $p_{ij}$ are the proportions of the five types of bands on this chromosome. The statistical analysis involves integrating over the posterior joint distribution for the parameters, given the linear model and the cDNA data. In one data analysis we grouped the white and light gray bands together, assigning them a shared gene density, and also grouped the two darkest bands together, thereby reducing the number of gene-density parameters to three. The inferred gene density for the white and light gray bands equaled 7.3/Mb, with a standard deviation of 0.3; the inferred gene density in medium gray bands equaled 1.45/Mb, with a standard deviation of 1.1; and the inferred gene density in dark gray and black bands equaled 0.45/Mb, with a standard deviation of 0.4. If we knew the total number, T, of genes in the Human genome, the expected gene densities would, of course, be our estimates multiplied by T/16,000. Reasonable choices for T range from 60,000 to 100,000. We will present the gene density estimates for all types of bands, plus confidence intervals.
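The linear model and the T/16,000 rescaling can be illustrated numerically. The sketch below evaluates the sum of density times band proportion as written in the text, and rescales the three quoted density estimates for the two ends of the suggested range of T. The band proportions in the example are hypothetical, and the Bayesian posterior integration used in the actual analysis is not reproduced.

```python
# Density estimates quoted in the abstract (mapped cDNAs per Mb),
# for the three grouped band classes.
density_per_mb = {
    "white + light gray": 7.3,
    "medium gray": 1.45,
    "dark gray + black": 0.45,
}
MAPPED_CDNAS = 16_000


def predicted_value(a, p):
    """Linear model from the text: sum over band types i of a_i * p_ij."""
    return sum(a_i * p_i for a_i, p_i in zip(a, p))


def rescale(density, total_genes, mapped=MAPPED_CDNAS):
    """Expected gene density if the genome holds `total_genes` genes."""
    return density * total_genes / mapped


# HYPOTHETICAL band proportions for one chromosome (they must sum to 1).
example_p = [0.5, 0.25, 0.25]
example_prediction = predicted_value(density_per_mb.values(), example_p)

scaled = {
    T: {band: rescale(d, T) for band, d in density_per_mb.items()}
    for T in (60_000, 100_000)
}
for T, by_band in scaled.items():
    print(T, {band: round(v, 2) for band, v in by_band.items()})
```

For T = 60,000 the lightest bands rescale to roughly 27 genes per Mb, and for T = 100,000 to roughly 46 genes per Mb. The ordering of the three classes is unaffected by the choice of T.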
It is reasonable to conclude that the expected gene density decreases with band darkness. These results have practical implications for large scale sequencing of the Human genome and have ramifications for the study of the evolution of genomes.
Dabney K. Johnson, Edward J. Michaud, and
Monica J. Justice
As sequencing of the human genome progresses, the role of the mouse as proxy mammal for functional studies will become crucial. We at ORNL have designed a comprehensive program that will combine the power of our unparalleled mouse 'mutation machine' with our massive computational capability in bioinformatics and our integrated technology development effort in detection and analysis of new mutant mouse phenotypes. Our goal is to assign function to all the genes that reside in defined segments of the mouse genome, via efficient identification of phenotypic changes in behavior, biochemistry, and gene expression that accompany induced genetic changes.
The creation of functional maps of the mammalian genome must keep pace with the rapidly expanding physical and transcriptional maps. High-efficiency mutagenesis in mouse spermatogonial stem cells, using the supermutagen N-ethyl-N-nitrosourea (ENU) to induce point mutations in single genes, will provide the functional information for given segments of DNA sequence within the physical map. Other mutagens, for made-to-order mutagenesis, are also being tested, as are DNA-repair deficient mouse strains as mutagenesis targets. For an immediate target, we propose to 'saturate' a long and well-characterized chromosomal deletion with ENU-induced mutations to develop a gene-by-gene phenotype map of the region uncovered by the deletion. This efficient experimental protocol will generate phenotypes evident in deletion hemizygotes after only the second generation post-treatment, and all progeny in that generation will be 'color-coded' for instant genotyping. Submapping of phenotypes will be quite efficient because we have existing mutant stocks that carry many additional deletions whose endpoints nest within the large deletion. Physical and transcriptional mapping within this set of nested deletions are well under way.
We will also create similar deletion reagents in the mouse cognate for a human chromosomal region already being sequenced; good choices would be mouse chr 16/human chr 21, or mouse chr 8/ human chrs 16 and 19. A complex of nested deletions will be generated molecularly using the Cre/loxP approach, or via radiation-induced mutations in ES cells.
Pilot experiments conducted here at ORNL demonstrated that high-efficiency ENU mutagenesis generated numerous visible and lethal mutations that fall within a large deletion encompassing the p locus in mouse chr 7. These experiments resulted in the isolation and fine localization of at least seven new loci in 1244 gametes tested; since only lethal and visible phenotypes were under scrutiny, more subtle mutations/alterations were missed. We propose to extend this analysis to a second p-region deletion, p30PUb, that has been shown to contain multiple interesting phenotypes. These phenotypes have been submapped to specific intervals between deletion breakpoints, but we now must create the necessary intragenic mutant alleles for any genes in the deletion. We expect also to generate new phenotypes as we disable additional genes within the large deletion. DNA sequence of the deleted region will be determined in conjunction with the JGI to facilitate gene identification and mutational analysis.
Additional crucial components of our strategy are automation of phenotype screening by microfabrication of subdermal sensors, computer-aided imaging of both skeletal and soft tissues, and improvements in through-put for behavior-testing paradigms and GC/MS analysis of blood, urine, and breath for biochemical alterations. Changes in gene expression will be detected by the use of chip-based DNA/RNA analysis, and mass-spectrometry-based protein analysis from mutant vs normal cell populations.
We are also developing sperm-freezing protocols for mutation preservation and distribution, and efficient artificial insemination for recovery.
This work is supported by the U.S. Department of Energy FWP ERKP260 under contract no. DE-AC05-96OR22464 with Lockheed Martin Energy Research Corporation.
M.J. Justice1, E.M. Rinchik2, D.A.
S.E. Thomas1, and D.K. Johnson1
The mouse, with its powerful genetic tools and its extensive comparative molecular linkage map with the human, is a useful model organism to study mammalian gene function. Multiple mutant alleles of genes can be derived by mutagenesis that may reflect loss of function, partial loss of function, or gain of function. The entire series of alleles must be studied together to dissect the function of the gene.
Overlapping deletions obtained at albino (c) are useful genetic reagents for functional genomics. Saturation mutagenesis with ethylnitrosourea (ENU) utilizing the deletions at the c region on mouse Chromosome 7 revealed many new functional units that reflect single gene changes (Rinchik et al. 1990; Rinchik et al. 1995). The region is homologous to human Chromosome 11q13-q21, and is linked to the mouse homologue of human oculocutaneous albinism, type 1A. Because of the nature of the phenotypic screen, many of the new mutants die as embryos. The region is particularly valuable for functional studies because of the variety of genetic reagents, including overlapping deletions and point mutations, that are available. Our focus is a group of alleles isolated at a locus (axis) that affects the development of the body axis. The homozygous phenotypes of six alleles at axis that are likely to be point mutations range from early prenatal lethality to adult viability. The baseline function of axis is demonstrated by two alleles that arrest during formation of the primitive streak. However, two other alleles have a severely disorganized body axis later in development. One allele exhibits a variety of neural tube defects, including exencephaly. Together, these observations suggest that the axis locus functions in the formation of the rostral-caudal body axis and neurulation. Intriguingly, homozygotes of one allele of axis survive to adulthood, and have axial skeletal abnormalities. Complete complementation studies of these six alleles reveal complex genetic characteristics such as intragenic complementation among two of the lethal alleles and a maternal effect of the viable allele. The maternal effect is a phenotype of severe neural kinking, accompanied by somite abnormalities, suggesting again, a role in the determination or maintenance of the integrity of the body axis. 
The varied features of axis will provide important insights into the mechanism of gene function at this complex locus.
The chromosomal region that includes axis contains other genes that affect body axis and skeletal development. Predictions of the potential role of a gene or genes at axis will be presented, as well as a view of the functional organization and possible interactions of other loci in the region. These analyses will give us essential data for subsequent large-scale expansions of the functional map of the mouse genome in parallel with human gene maps using additional induced and targeted deletions combined with chemical mutagenesis.
E.M. Rinchik, D.A. Carpenter & P.B. Selby. Proc. Natl. Acad. Sci. USA 87, 896-900 (1990). E.M. Rinchik, D.A. Carpenter & M.A. Handel. Genetics 92, 6394-6398 (1995).
This work is supported by the U.S. Department of Energy FWP ERKP260 under contract with Lockheed Martin Energy Research Corporation.
Joomyeong Kim, Mark Shannon, Linda
Ashworth, Elbert Branscomb, and Lisa Stubbs
One of the larger syntenically homologous blocks of human and mouse genomes includes the long arm of human chromosome 19 and the proximal portion of mouse chromosome 7. As part of an extensive comparative genome mapping of human chromosome 19, we have targeted a region spanning approximately 2.5 Mb near the telomere of H19q. As a first step in the characterization of this region, we have assigned several known genes and cDNA sequences to the 19q13.4 physical map. Most genes assigned to this region are C2H2-type, zinc finger (ZNF)-containing genes, which include ZNF134, ZNF154, ZIK1, C2H2-25, and one EST (T26651). These genes are all localized to a centrally located contig (577/1514) and are KRAB (Kruppel-associated box)-type ZNFs. To provide clues to the potential role of these genes, the expression patterns of these ZNFs have been determined; most genes are expressed ubiquitously in all tissues examined, but the highest expression level for each gene has been observed in different tissues. Using an interspecific backcross system, we have mapped ZNF134-related sequences in the mouse and confirmed that the mouse genome has similar zinc-finger gene clusters in proximal Mmu 7. We are currently constructing mouse BAC contigs and, at the same time, isolating the mouse homologs or related genes for the human ZNFs from these contigs.
Recently, we have also assigned the human homolog of a mouse imprinted gene, Peg3 (paternally expressed gene 3), to a contig (174) located approximately 1Mb proximal of a ZNF134 contig. Studies of several imprinted domains, including Prader-Willi and Angelman syndrome-region (H15q11-13)/central Mmu 7 and Beckwith-Wiedemann syndrome-region (11p15.5)/distal Mmu7, indicated that genomic imprinting is generally conserved among mammalian species and also that imprinted domains are large--spanning distances ranging from several hundred kilobases to two megabases. Due to these observations, we have decided to characterize PEG3/Peg3-containing regions in both human and mouse. In human, the adjacent contig to a PEG3-contig appears to have numerous ZNFs. In mouse, we have isolated two zinc finger genes located very close to Peg3 and are currently investigating the imprinting status and genomic organization of these genes relative to Peg3.
Julie R. Korenberg, Xiao-Ning Chen, Steve
Mitchell, Rajesh Puri, Zheng-Yang Shi and Dean
Chromosome duplication is a force that drives evolution. We now suggest that this may also be true of the primates and that the resulting duplications in part determine the spectrum of human chromosomal rearrangements. To investigate the existence and origin of duplications in the human genome, and their consequences, 5,000 bacterial artificial chromosomes (BACs) were mapped at 2-5 Mb resolution on human high resolution chromosomes by using fluorescence in situ hybridization. A subset of 469 of these was defined that generated two or more signals, excluding those located in regions of known repeated sequences, viz., the regions of centromeres, telomeres and ribosomal genes. Although a subset of these multiple site BACs represent the chimeric artifacts of cloning, derived from two different chromosomal regions, others reflect regions of true homology in the human genome.
Two questions were considered; first, the extent to which the multiple sites of hybridization of single BACs within single chromosomes reflected the breakpoints of naturally occurring human inversions, and second, the extent to which these same multiple hybridization points reflected the chromosomal inversion points in primate evolution.
For human inversions, the results of the analyses revealed a total of 124 BACs (2.5%) mapping to two or more sites on the same chromosome, of which 81 (65%) mapped to one of 27 distinct human inversion sites, the largest share of which recognized the well-established pericentromeric inversions of chromosomes 1, 2, 9, and 18, as well as the paracentric inverted region of chromosome 7q11/q22. From this, we infer that meiotic mispairing involving the homologous regions may be responsible for the inversions.
With respect to primate evolution, a significant proportion of the inversion breakpoints that characterize the chromosomal changes seen in the evolution of the great apes through man are also reflected in the distribution of BAC multiple intrachromosomal sites. Further analyses of the 29 independent BACs recognizing the pericentromeric region of human chromosome 9 suggest at least three classes, two of which recognize only single sites in Pan troglodytes. These data suggest that inversions occurring through primate evolution may generate small duplications that, although they can cause chromosomal imbalance in single individuals, also provide additional genetic material for speciation.
Marion D. Johnson and Jacques R. Fresco
An in situ methodology employing solution conditions has been developed for binding oligodeoxyribonucleotide 'third-strands' to chromosomal DNA targets in non-denatured protein-depleted metaphase spreads and interphase nuclei. Third-strand in situ hybridization (TISH) was performed on slides at pH 6.0 using a dual psoralen- and biotin-modified 17-nt pyrimidine-rich third strand to target a unique multicopy sequence in human chromosome 17 alpha-satellite (D17Z1 locus). UVA photofixed third strands, rendered fluorescent by FITC-labeled avidin, are reproducibly centromere-specific for chromosome 17, and visible without amplification in human lymphocyte and somatic cell hybrid spreads and interphase nuclei. Two D17Z1 haplotypes, one positive and the other negative for third-strand binding, were identified in three combinations (+/+, +/-, and -/-). Third-strand probes specific for unique multicopy alpha-satellite targets in human chromosomes X and 16 have also been developed. Similar alpha-satellite target sequences have been identified in 22 of the 24 human chromosomes, making centromere-specific chromosome identification by TISH applicable to virtually all human chromosomes. The technology is presently being applied to chromosomes of other eukaryotes. TISH has potential diagnostic, biochemical, and flow cytometric applicability to native metaphase and interphase chromatin.
P. Scott White1, Owatha L. Tatum1,2, Larry
Deaven1, Jonathan L. Longmire1
Human genetic polymorphisms are valuable for extracting information about population structure and evolutionary histories. Clonally inherited DNA such as mitochondrial or Y chromosome-specific DNA is of particular use due to lack of recombination. Microsatellites have been used to reconstruct human evolutionary histories. Until recently, only six male-specific tetranucleotide repeats have been publicly available as PCR markers. We have developed six additional microsatellite markers using a cosmid library of flow-sorted human Y-chromosomes. These microsatellites are tetranucleotide (GATA)n repeats of nine to twelve repeat units each, and are polymorphic among unrelated individuals. All markers were analyzed using Applied Biosystems Genescan fluorescent fragment sizing. At least three alleles were identified for each marker when diverse genomic DNAs were used as PCR template. The allele sizes range from 162 to 372 nucleotides. An additional marker was identified, having polymorphic alleles in both males and females of sizes 217-237. These markers fall into sets of non-overlapping alleles that allow for efficient gel multiplexing with fluorescent dyes. Because six of these markers are male-specific or have male-specific alleles, they are valuable for evolutionary and population studies where non-recombining DNA is desired.
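One way to picture how markers fall into sets of non-overlapping alleles is a greedy interval partitioning of their allele size ranges, so that markers in the same set could share a dye without ambiguity. The sketch below is only an illustration of that idea, not the procedure used by the authors, and the marker names and size ranges are hypothetical.

```python
def group_nonoverlapping(ranges):
    """Greedy interval partitioning of allele size ranges.

    `ranges` maps marker name -> (min_size, max_size) in nucleotides.
    Returns a list of groups; within a group no two size ranges overlap.
    """
    groups = []  # each group is a list of (marker, lo, hi) in increasing size order
    for marker, (lo, hi) in sorted(ranges.items(), key=lambda kv: kv[1][0]):
        for group in groups:
            if group[-1][2] < lo:  # fits above the last range already in this group
                group.append((marker, lo, hi))
                break
        else:
            groups.append([(marker, lo, hi)])
    return [[marker for marker, _, _ in group] for group in groups]


# HYPOTHETICAL markers and allele size ranges (nt), for illustration only.
example = {
    "m1": (162, 178),
    "m2": (170, 190),
    "m3": (200, 216),
    "m4": (240, 256),
    "m5": (350, 372),
}
groups = group_nonoverlapping(example)
print(groups)
```

In this example four of the five hypothetical markers fit into one set, and the marker whose range overlaps another is placed in a second set, which would correspond to a second dye.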
Rubin, E.M., Zhu, Y., Fraser, K., Ueda, Y.,
Smith, D.J., Symula, D., Cheng, J.F.
Libraries of all or part of the mammalian genome have been propagated in single cells and have been used as tools in gene discovery through in vitro analyses. We have expanded upon this concept by the creation of panels of YAC and P1 transgenic mice containing defined contiguous regions of the mouse or human genome. Since each library member contains a large (80-700 kb) transgene, together several megabases of contiguous DNA from a defined region of the genome can be propagated using the mouse as a host. We have successfully used such libraries to sift through large genomic regions and to localize and clone genes based on phenotyping members of the library. Examples of our use of these libraries to link sequence with function include:
(1) Biological annotation of human 5q31 genomic sequence data. Computational analysis of 1.2 Mb of sequence from human 5q31 generated as part of the JGI sequencing program has revealed multiple putative genes and exons. As a tool to validate the computationally predicted novel genes in this region, and to determine their site and timing of expression, we have created and analyzed a 1.5 Mb in vivo library of human 5q31 and documented the expression patterns of the newly identified genes in the YAC transgenics.
(2) Fine mapping a 5q31 QTL for asthma. Several human studies have mapped a major QTL determining IgE levels in asthmatics to 5q31. Through analysis of IgE levels in members of the 5q31 in vivo library following an airway irritant we have identified a single YAC noted in two separate founder lines to be associated with a marked decrease in IgE levels.
We are in the process of looking at mice containing fragments of this YAC to move from this disease associated 5q31 QTL to the causative gene.
(3) In vivo complementation for cloning mouse mutations. We developed an in vivo library of the 550 kb region to which the mouse recessive neurological mutation vibrator had been localized by meiotic mapping. Through the in vivo complementation of the phenotype with a member of the library, we have been able to fine map and then clone the gene (PITP-N) responsible for the disorder.
(4) Identifying genes contributing to defects in cognition on chromosome 21. We have created a 1.8 megabase in vivo library of human chromosome 21q22.2 in a panel of YAC transgenic mice. Analysis of these animals with regard to learning and behavior identified a 550 kb YAC responsible for specific deficits. Through fragmentation of the YAC we have been able to identify a gene (GIRK) whose altered level of expression is responsible for behavioral abnormalities in mice, a gene whose altered expression has also been linked to learning defects in Drosophila.
These studies on panels of transgenic mice containing large inserts have effectively enabled phenotypic assays at the organismal level to be performed on many genes at once. This, in essence, constitutes a multiplex analysis that permits increased throughput of data collection relating sequence to phenotype.
Y. Ding, M. Johnson, Y. Chen, J. Colayco, J.
Melnyk, S. Khan, D. Gilbert, and H. Shizuya
Bacterial Artificial Chromosomes (BACs) are extensively used for large-scale mapping and sequencing. In order to identify and characterize BAC clones for contig assembly, we have developed a rapid fingerprinting method using fluorescently labeled dideoxyadenosine triphosphates ([F]ddATP). Taq FS incorporates [F]ddATP at the first nucleotide of a 5' overhang generated by Hind III digestion. The labeled fragments are further digested by four-base cutters to generate even smaller fragments (less than 500 bp) for visualization on a sequencing gel. The reaction uses [F]ddATP labeled with one of three mobility-matched fluorescent dyes for the fragments of each four-base cutter, which allows the analysis to be multiplexed in a single gel lane. The fourth color in the same lane is used to locate fragments of known molecular weight, from which the size of each fragment is calculated. Fragment size data assigned by Genescan (ABI) are converted into FPC (Fingerprinted Contigs, Sanger Center, UK) format and then electronically transferred to FPC for automatic contig assembly. We have tested the system on one of the best characterized BAC contigs of human chromosome 22, using 96 BACs from the chromosome 22 regions where large-scale sequencing is underway. Without manual intervention, the system produced no false overlaps among the 96 BAC clones, and assembled 16 contigs and 31 singletons. The accuracy of the overlaps constructed by the method is comparable to that of overlaps established using BAC end sequences and by comparison with completely sequenced BACs.
Fragment patterns determined by this method provide each BAC with a digital fingerprint, and may be stored for searching for clones with minimal overlap and for identification of clones.
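As a rough illustration of how stored digital fingerprints might be compared, the sketch below counts fragments that two clones share within a size tolerance. It is not the scoring used by FPC; the function name, the tolerance value and the fragment sizes are assumptions made for the example.

```python
def shared_fragments(fingerprint_a, fingerprint_b, tolerance=1.0):
    """Count fragment sizes (bp) that match within +/- tolerance between two clones."""
    a, b = sorted(fingerprint_a), sorted(fingerprint_b)
    i = j = matches = 0
    # Walk both sorted lists in step, pairing each fragment at most once.
    while i < len(a) and j < len(b):
        difference = a[i] - b[j]
        if abs(difference) <= tolerance:
            matches += 1
            i += 1
            j += 1
        elif difference < 0:
            i += 1
        else:
            j += 1
    return matches


# Hypothetical fingerprints (fragment sizes in bp, all below 500 bp).
clone_a = [82, 120, 155, 231, 304, 410]
clone_b = [120.5, 156, 230, 350, 410]
clone_c = [60, 95, 180, 275]

print(shared_fragments(clone_a, clone_b))  # many shared fragments: candidate overlap
print(shared_fragments(clone_a, clone_c))  # no shared fragments: no evidence of overlap
```

A pair of clones sharing many fragments is a candidate overlap; how many shared fragments are needed before an overlap is called is a statistical question that this sketch does not address.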
Y. Sheng, Y-J. Chen, C. Neal, and H. Shizuya
Human BAC clones have been extensively used in a variety of research areas of the Human Genome Project because of their stability in E. coli, the easy handling of BAC DNA, and their relatively large insert size. Over the past four years, we have generated over 400,000 BAC clones from two individuals, and arrayed them in 384-well microtiter plates to organize these clones. Recently we initiated a new round of construction of BAC libraries for the community involved in the high throughput sequencing efforts. In order to build high quality libraries, we have implemented many checks and extensive experiments to test for the degree of representation and the degree to which BACs accurately reflect the human genome. We are extending our improved quality control procedures to the new library-making endeavor. For these libraries, we made a new generation BAC vector, pIndigoBAC45, which gives much darker blue colonies on X-gal plates. This feature enables us to identify clones with inserts more accurately, resulting in a low percentage of empty clones, and to shorten the time required for library construction. To estimate the number of empty wells and the number of clones that lack inserts, we stamp all the clones from each 384-well microtiter plate onto LB agar plates. This detects mistakes made during the colony picking procedure. To check for chimerism, co-habitation of wells by multiple clones, and representation of the human genome, we end-sequence BACs and carry out RH mapping based on PCR using primers designed from the sequences. In collaboration with Mark Adams at TIGR, we have thus far sequenced both ends of about 1,000 BAC clones in the newly constructed BAC library. Furthermore, we plan to compare the new libraries with previously constructed BAC libraries by testing with the same probes used for those libraries. We will report the progress of this library construction and discuss the quality of these BAC clones.
Holger Schmitt, Mitzi Shpak, Yan Ding, Melvin
Simon, and Hiroaki Shizuya
The major goals of the Human Genome Project are the identification and the localization of the 50,000-100,000 genes expected in the human genome, the generation of physical maps for each individual chromosome, and finally, the determination of the nucleotide sequence for the entire 3,000 megabase pairs of DNA. There is a clear need to develop reliable clone resources which are accurately mapped and at the same time can provide templates for direct use in large scale sequencing. A new generation of cloning system, the Bacterial Artificial Chromosomes (BACs), has been developed in our lab and is now extensively used throughout the community in a variety of research areas.
To initially demonstrate the usefulness of the BAC system for long range physical mapping, a scaffold integrated contig map of the entire long arm of chromosome 22 was constructed. Individual clones were ordered into contigs by fingerprint analysis and placed along the genomic stretch according to their content of genetic anchorpoints. The map consists of more than 700 BAC clones and spans the length of approximately 45 megabase pairs. Each BAC clone is further characterized by fluorescence fingerprinting and end-sequencing the inserts. Sequence information of the clone ends is presently used to select new clones from the 15x coverage BAC library, which extend contigs and close the gaps.
Our ultimate goal is to generate a well characterized collection of BACs, which provides complete physical coverage of the entire long arm of chromosome 22 with minimal overlaps. This map will serve as an excellent resource to discover all of the transcripts mapped on the chromosome, and can readily be used for cost-efficient genomic DNA sequencing with minimal redundancy.
Collaborators in our current mapping project are Dr. N. Blin, Univ. of Tuebingen, and Dr. E. Meese, Univ. of Saarland.
M. N. Ericson, D. K. Johnson, R. S. Burlage, T.
L. Ferrell, D. E. McMillan, K. G. Falter, A. D.
McMillan, S. F. Smith, G. E. Jellison, and C. L.
Researchers at the Oak Ridge National Laboratory are developing a highly automated integrated-circuit based research tool for subdermal monitoring of physiological parameters in mice used for gene expression studies. Application of this new instrumentation capability to genome studies will accelerate mass specimen screening by providing automated detailed observation and reporting of multiple key physiological parameters of interest. Body temperature, heart rate, physical activity level, movement trajectory, and possibly blood pressure will be measured by an implanted integrated-circuit based instrument containing multiple integrated sensors. Measured data will be transmitted periodically via wireless techniques for subsequent data processing, visualization, fusion, and storage. The integrated sensor/telemetry package will be low-cost, reusable, and sufficiently miniaturized to be directly injectable. The system will provide detailed parameter measurement and analysis capabilities not presently available to genomics researchers. Advanced multi-parametric data presentation will permit improved detail and accuracy in high-volume phenotype screening and increased detectability of subtle genetic defects. This paper will present preliminary information on parameter measurement methods, sensor selection, instrument and system architectures, instrument miniaturization techniques, and data processing methods.
P. Scott White, Michelle Petrovic, Owatha L.
Tatum, Nancy Lehnert, Usha S-Nair, Zaolin
Wang, Larry L. Deaven, and Babetta Marrone
Beryllium alloys are used in several industrial processes and products, including the synthesis of components of nuclear weapons. Chronic beryllium disease (CBD) affects a percentage of human individuals exposed to airborne particles of beryllium. CBD is an autoimmune disease affecting the lungs of susceptible individuals. Antigen-presenting cell surface proteins have been the focus of investigation into possible genetic susceptibility to this disease. It was discovered previously that certain HLA-DPβ1 alleles correlated with the development of CBD. Because the suspected allele is also found in normal populations at frequencies of over 40%, it was necessary to collect sequence information from more individuals. In order to determine if the implicated genotype, with a Glu-69 mutation, was the sole contributing mutation, we sequenced over 30 individuals for exon II of this locus. More than 90% of affected individuals possessed the Glu-69 mutation, although most of these were in the heterozygous state. This larger data set supports previous results, but because of the high frequency of this mutation in normal individuals it will require more extensive investigation to determine if other contributing genetic factors are present. In addition to this locus, we are sequencing two other MHC Class II HLA loci, HLA-DQβ1 and HLA-DRβ1, for CBD-correlating genotypes. Screening for susceptibility to CBD will require robust assays, and the development of a genetic screen is one goal of this research.
M.J. Paulus, H. Sari-Sarraf, D.K. Johnson, D.H.
Lowndes, M. L. Simpson, C.L. Britton, Jr., F.F.
Knapp, Jr., J.S. Hicks
The Oak Ridge National Laboratory has recently begun the development of a new high-resolution, high-throughput 3-D mouse imaging and computer-aided screening methodology to rapidly identify, quantify and record subtle phenotypes in mutagenized mice. The research goals for this program are to develop a novel x-ray imaging technology with <50 µm spatial resolution, a 3-D data acquisition time of <1 minute per mouse and an estimated cost of a few dollars per image. A key element of this new system will be a novel cadmium zinc telluride detector operating in pulse counting mode. This new detector technology provides spatial resolutions and x-ray energy discrimination capabilities unattainable with traditional x-ray computed tomography detectors. Due to its high atomic number, the new detector is also suitable for traditional nuclear medicine studies. Additionally, new image processing and pattern recognition algorithms will be developed for the system to assist researchers in identifying phenotypes. In this paper we present the objectives and approach for this research program and some preliminary data.
Jeff G. Hall, Andrea L. Mast, Victor Lyamichev,
James R. Prudent, Michael W. Kaiser, Tsetska
Takova, Bob Kwiatkowski, Bruce Neri, and Mary
Ann D. Brow
We have developed an enzymatic assay for direct, sensitive and quantitative nucleic acid detection. This assay is based on cleavage of a unique secondary structure that can be formed between two DNA probe oligonucleotides and a target nucleic acid of interest. The assay depends on the coordinate action between the two synthetic oligonucleotides. By the extent of their complementarity to the target strand, each of these oligonucleotides defines a specific region of the target strand. These regions are oriented such that when the two oligonucleotides are hybridized to the target strand, the 3' end of the upstream oligonucleotide overlaps with the 5' end of the labeled downstream "signal" oligonucleotide. The resulting structure is recognized by a structure-specific nuclease, which cuts the signal oligonucleotide to release a labeled fragment. When the reaction contains an excess of the signal molecules and is performed at elevated temperature to promote rapid dissociation and association of these molecules, the cleaved signal oligonucleotide is readily replaced by an intact copy so that the process can be repeated. In this way many signal molecules are cleaved for each copy of the target nucleic acid. The amount of target present may then be calculated from the yield of cleavage product, the rate of product accumulation and the time of incubation.
We have found that the accumulation of cleaved product is linear over both time and target concentration, with the rate of accumulation dependent on the turnover rate of the enzyme. With a turnover rate of about 35 cleavage events per minute, our current system permits signal to be amplified by more than 1000-fold, compared to single round hybridization, in 30 minutes, thus allowing the quantitative detection of sub-attomole levels of target nucleic acid.
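The amplification figure quoted above follows from simple arithmetic if product accumulation is taken to be linear in time and target amount, as stated. The sketch below is only a worked illustration of that linear model; the function names and the example product yield are assumptions, not part of the assay protocol.

```python
def signal_amplification(turnover_per_min, minutes):
    """Cleaved signal molecules produced per target copy under linear accumulation."""
    return turnover_per_min * minutes


def target_amount(cleaved_product, turnover_per_min, minutes):
    """Back-calculate the amount of target from the yield of cleavage product.

    The result is in the same units as cleaved_product (for example attomoles).
    """
    return cleaved_product / signal_amplification(turnover_per_min, minutes)


# About 35 cleavage events per minute for 30 minutes.
print(signal_amplification(35, 30))       # 1050, i.e. more than 1000-fold

# Hypothetical yield: 525 attomoles of cleaved product after 30 minutes.
print(target_amount(525.0, 35, 30))       # 0.5 attomole of target
```

At 35 cleavage events per minute, 30 minutes gives 1050 cleavages per target copy, consistent with the more than 1000-fold amplification reported.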
Quantitative detection of nucleic acids in this fashion has several advantages over other methods of oligonucleotide-based detection. Foremost, because the cleavage requires the precise coordination and hybridization of two oligonucleotides, this reaction has a high level of specificity for the intended target sequence. The specificity of the detection is further enhanced because the investigator can select the site of cleavage in the signal molecule by designing the appropriate amount of overlap with the upstream oligonucleotide. The production of a discrete cleavage product of expected size allows that product to be more easily distinguished from the products of oligonucleotide destruction that may arise from thermal degradation or from nuclease contaminants in diagnostic test samples. Further, the use of oligonucleotides that are mostly or completely composed of DNA, rather than RNA, eliminates background that could arise from target-independent degradation by ribonucleases. Finally, because each cleavage event is dependent on the presence of the actual target material, and not on the products of the cleavage reaction, contamination by material carried over from completed detection reactions cannot induce additional signal in subsequent reactions. We have applied this method to direct detection of DNA targets such as DNA viral genomes, and to the detection of mRNA for monitoring of gene expression. Judicious placement of the oligonucleotide pair around splice junctions allows mRNA detection without prior destruction of genomic DNA. The products of the cleavage reaction can be analyzed by gel electrophoresis, or by non-gel methods such as capture to solid support. We will show how additional post-cleavage manipulation of the products can lower the limit of detection by 2 to 3 orders of magnitude when compared to gel-based readout.
Heinz-Ulrich G. Weier, Stanislav Volik, Jenny
Wu, Thomas Duell, Mei Wang, Ung-Jin Kim1,
Jan-Fang Cheng and Joe W. Gray
The construction of high resolution physical maps and definition of a minimal tiling path are indispensable for directed approaches to large scale DNA sequencing. Similarly, closure of gaps and completion of shot-gun sequencing projects depends on knowledge of the physical location and size of gaps. We applied 'Quantitative DNA Fiber Mapping (QDFM)', an optical procedure for mapping based on hybridization of fluorescently labeled probes on to individual stretched DNA molecules, for construction of high resolution physical maps and definition of minimal tiling paths as well as for quality control of sequencing templates derived from human chromosome 20. Digital image analysis allowed localization of probes with near kilobase(kb)-resolution in intervals of several hundred kb(1,2). When the technique was applied to construct physical maps for regions on the proximal long arms of human chromosomes 11 and 22 (3), respectively, we encountered numerous unstable yeast artificial chromosome (YAC) clones. Deletions in these YACs prohibited their use for map construction. We therefore modified our mapping scheme and prepared DNA fibers comprised of genomic DNA. This proved to be a rapid method for determination of clone overlap and genomic distances between genes or markers. Furthermore, the genomic fibers provide a 'gold standard' for validation of clones and their contigs as well as the delineation of deletions in large clones. Compared to mapping on to fibers that were prepared from cloned fragments and purified by pulsed field gel electrophoresis, genomic DNA fibers are expected to significantly shorten the mapping cycle time, because slides carrying these fibers can be prepared in large batches and stored. The use of genomic fiber slides for mapping will also facilitate the implementation of standardized protocols that are amenable to automation and increase the mapping throughput. 
Using clones derived from selected regions on human chromosomes 5 and 16, respectively, we are presently evaluating the utility of genomic fibers for clone/contig validation.
*Supported by a grant from the Director, Office of Energy Research, Office of Health and Environmental Research, Department of Energy, under contract DE-AC-03-76SF00098.
1 H.-U.G. Weier et al. Human Molecular Genetics 4, 1903-1910 (1995).
2 M. Wang et al. Bioimaging 4, 73-83 (1996).
3 T. Duell et al. Genomics (in press).
D. G. Albertson1, R. Segraves2, D. Sudar1,
S. Clark2, C. Collins1, C. Chen2, W.-L.
Kuo2, D. Kowbel1, S. H. Dairkee3, I.
Poole4, M. Dürst5, J.
W. Gray1,2 , D. Pinkel1,2
Gene dosage alterations underlie many diseases. For example, variations in DNA sequence copy number are associated with a significant proportion of the genetic aberrations involved in cancers, and also with certain developmental abnormalities. Comparative genomic hybridization (CGH) has proven to be an effective method for detecting and mapping these genetic alterations. In CGH total genomic DNA from a test specimen and a normal genomic reference DNA are labeled with different fluorochromes and hybridized to normal metaphase chromosomes. The ratio of the fluorescence intensities at a location on the chromosomes is approximately proportional to the ratio of sequences in the test and reference genomes that bind there. The use of metaphase chromosomes as the hybridization target has previously limited the resolution of CGH to 10-20 Mb. However, we have now implemented a new form of high resolution CGH by replacing the normal metaphase spread with an array of genomic cosmid, P1, and BAC clones (Sudar et al., these Abstracts). This approach provides a resolution at least a factor of 100 better than standard CGH, as it is determined only by the size and spacing of the target genomic clones. Thus, measurements of copy number can be made at low resolution on clones spaced at several Mb, or at high resolution using closely spaced or overlapping clones. We have performed both low and high resolution analyses of the DNA sequence copy number variation occurring on chromosome 20 in breast cancer and in pre-cancerous models. A low resolution, 'scanning' array of the entire chromosome was constructed from genomic clones spaced at ~1-3 Mb intervals on human chromosome 20. The analysis of breast tumors and breast tumor cell lines revealed at least five independent regions of copy number increase and a region of decrease on 20q that were present in various combinations. Thus, multiple, interacting genes involved in breast and perhaps other cancers may be located on 20q. 
Two of these regions were also recurrently present at elevated copy number in HPV-immortalized keratinocytes, suggesting that the processes of tumorigenesis in vivo and immortalization of cells in culture may proceed through common pathways of amplification and overexpression of certain genes mapping to these two loci. High resolution analysis was also performed on breast tumors with elevated copy number occurring within a ~1 Mb region at 20q13.2. The array was composed of contiguous and overlapping clones from a contig that has been constructed spanning this region (Collins et al., these Abstracts). For some tumors, a constant level of elevated copy number was observed across the region. However, in others, an abrupt variation in copy number was recorded, which mapped the boundaries of different levels of amplification to within a fraction of a BAC or P1 clone. These copy number alterations measured with array CGH are concordant with data obtained by using the array target clones as probes for interphase fluorescent in situ hybridization. However, array CGH appeared to be both more quantitative and substantially faster, since only one hybridization was required to obtain the data at all loci. The increasing availability of genomic resources and the technology to print arrays with more than 10^4 elements/cm^2 (Sudar et al., these Abstracts) make it reasonable to consider performing genome-wide analyses of copy number variation using arrays that would provide 1 Mb or better resolution for the entire human genome.
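The ratio measurement underlying array CGH can be illustrated with a short sketch: for each array clone, the test/reference fluorescence ratio is computed and then expressed relative to a baseline. Normalizing to the median ratio assumes that most clones on the array are present at unchanged copy number; that assumption, the function name and the intensity values are illustrative and are not taken from the analysis described above.

```python
from statistics import median


def relative_copy_number(test_intensities, reference_intensities):
    """Return per-clone test/reference ratios normalized to the median ratio."""
    ratios = [t / r for t, r in zip(test_intensities, reference_intensities)]
    baseline = median(ratios)  # assumed to represent unchanged copy number
    return [ratio / baseline for ratio in ratios]


# Hypothetical intensities for five clones ordered along a chromosome.
test = [100, 110, 300, 95, 50]
reference = [100, 100, 100, 100, 100]

for position, value in enumerate(relative_copy_number(test, reference), start=1):
    print(position, round(value, 2))
```

In this made-up example the third clone shows a roughly threefold relative increase and the fifth a twofold decrease, the kind of abrupt change between neighboring clones that maps an amplicon boundary.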
C. Collins1, J. Rommens2, D. Kowbel1, G.
Nonet1, L. Stubbs3, M. Shannon3, M.
Wernick1, J. Froula1,
G. Hutchinson4, T. Godfrey5, D. Polikoff1, T.
Cloutier1, K. Myambo1, C. Martin1, M.
Dan Pinkel1,5, D. Albertson1,5, and J.W. Gray1,5
High level amplification of chromosome 20 band 13.2 occurs in 10% of primary breast tumors and correlates with poor prognosis in node negative patients. This amplification is also detected in numerous other solid tumors including bladder, brain, colon, head and neck and melanoma. We hypothesize that selection for overexpression of one or more genes encoded within this amplicon drives the 20q13.2 amplification event. To fine map the amplicon and identify the hypothesized oncogene(s) a 1.0 Mb interval spanning the amplicon was cloned in contiguous BAC, PAC, and P1 clones.
To fully explore the genomics of the 20q13.2 amplicon we have sequenced ~80% of the 1 Mb contig and analyzed it extensively for genes using a suite of bioinformatics tools. This, combined with exon trapping and cDNA direct selection, has led to the discovery of four genes (ZABC1, ZABC2, NABC1 and PIC1-related) and a new cyclophilin pseudogene. ZABC2 may be the ortholog of the Drosophila melanogaster homeotic gene teashirt. PIC1-related is ~93% identical to PIC1 or sentrin, a gene that encodes a ubiquitin-homology domain protein. NABC1 encodes a novel cytoplasmic protein of unknown function. ZABC1 is especially interesting. The cDNA sequence of ZABC1 is predicted to encode a putative transcription factor containing eight C2H2 zinc finger domains. To manage these data and make them available to collaborators we have developed a tailored ACEDB database accessible via the World-Wide Web (Davy et al., these abstracts).
A ~260 kb "minimum common amplicon" was defined by performing interphase FISH using P1 and BAC probes from the sequence-ready contig on ~300 primary tumors. ZABC1 is centrally located in the 260 kb critical region. Quantitative PCR and Northern analysis were employed to analyze the expression of this gene in breast cancer cell lines and primary tumors. Expression of ZABC1 is elevated in all cell lines and tumors in which it is amplified. High level expression of ZABC1 was found in the breast cancer cell line 600MPE, a notable finding because 600MPE is not amplified at the ZABC1 locus. This suggests alternative mechanisms may elevate ZABC1 transcripts in some tumors. ZABC1 is the only transcribed sequence identified that maps in the 260 kb critical region with a pattern of expression so strongly correlated with copy number. The murine ortholog of ZABC1 has been cloned to study its expression in murine mammary tumors and its normal spatial and temporal pattern of expression.
We have now constructed a BAC contig across the mouse ZABC1 locus. It is our goal to sequence this contig and identify conserved noncoding regulatory elements through comparative sequence analysis. BACs from the contig will also be used to determine if ZABC1 is amplified in murine mammary tumors. It is anticipated that the combination of comparative and functional genomics will elucidate key regulatory elements of the ZABC1 gene and lead to insights regarding the mechanism of amplification at 20q13.2. In addition, comparative genomic hybridization to DNA microarrays is being used to map this amplicon at unprecedented resolution, revealing new regions of consistent copy number abnormalities and suggesting additional co-selected and counter-selected genes involved in the evolution of breast tumors (Albertson et al., these abstracts).
Supported by grants from US DOE contract DEAC0376SF00098, USPHS grants CA44768, CA45919, CA52807 and Vysis.
SJ Lockett+, C Ortiz de Solorzano+, A
Rodriguez+, K Chino, C Fernandezo, D
Pinkel+o, JW Gray+o
The Resource for Molecular Cytogenetics is developing computer assisted microscopy and image analysis techniques to allow combined genotypic and phenotypic analysis of intact cells in tissue. The long range goal is to obtain quantitative information about the copy number of DNA sequences, levels of expression of RNA and proteins and their distributions in cells; to morphologically characterize cells, nuclei and other organelles, and to analyze the cellular organization of tissue. This information will contribute to our understanding of functional processes in normal and diseased tissue involving candidate genes. Some of these genes will have been discovered using genome-wide surveillance techniques, such as array-based comparative genomic hybridization(CGH).
The technical procedure we have adopted for this project, which is at an early stage, is as follows: Adjacent thin (4 micron) and thick (30 micron) sections are cut from cancer specimens. The thin sections are used for standard histological staging, while the thick sections are used for quantifying the specific molecular species of interest in the individual cells. Thick sections are employed, because they contain intact cells, thus permitting accurate quantification at the individual cell level, and the cellular organization of the tissue is preserved. However, such sections require careful optimization of fluorescent in situ hybridization (FISH) and immunocytochemical procedures for labeling the particular molecular species, followed by 3D (confocal) microscopic image acquisition and 3D image analysis. We have developed several image analysis programs for this project. The first interactively enumerates punctate FISH signals in the individual intact nuclei of thick sections, and the second registers adjacent thick and thin sections. It is thus now possible to use this procedure to study the relationship between the copy number of specific genomic sequences in individual cells and histological stage of the tissue. In a preliminary application to a breast biopsy specimen, we demonstrated that histologically benign regions contained cells with two copies of a chromosome 1 alpha-satellite probe, whereas invasive cancer regions had variable copy numbers per cell of the probe. We are continuing these experiments in order to determine the degree of genetic heterogeneity in tumor cells and surrounding histologically normal tissue.
The image analysis programs mentioned above limit the procedure to the enumeration of punctate FISH signals in each cell. In order to expand the range of applications, we have developed algorithms for segmenting individual cell nuclei within thick sections. This enables quantification of diffuse molecular markers inside nuclei, the size and shape of nuclei, the spatial positions of FISH signals inside the nuclei, and the spatial relationships of cells to each other in the tissue. The input to the algorithms is a 3D image of nuclei labeled with a DNA counterstain. The first of the algorithms automatically thresholds the image into regions of background and nuclei. The next algorithm is a custom-designed, interactive 3D rendering program, which the analyst uses to inspect each nuclear region and indicate if each region is a single nucleus or a cluster of nuclei. Clusters are then divided by an automatic algorithm, which employs a variant of the Hough transform to shrink nuclei and consequently separate them. The resulting divided regions are presented to the analyst, who indicates if they are nuclei, are still clusters, or have been incorrectly divided and should be rejoined. The alternating steps of automatic cluster division and human inspection are repeated as many times as necessary to correctly segment all nuclei in the image.
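The first step of the pipeline above (separating a counterstained 3D image into background and nuclear regions) can be sketched as thresholding followed by connected-component labeling. This is a minimal illustration, not the Resource's software: the threshold is passed in as a fixed value rather than chosen automatically, 6-connectivity is an assumption, and the interactive inspection and Hough-transform-based cluster division are not reproduced.

```python
from collections import deque


def label_regions(image, threshold):
    """Label 6-connected regions of voxels with intensity >= threshold.

    image: nested lists indexed as image[z][y][x].
    Returns (labels, count); labels has the same shape, with 0 for background.
    """
    nz, ny, nx = len(image), len(image[0]), len(image[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    count = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if image[z][y][x] < threshold or labels[z][y][x]:
                    continue
                # Start a new region and flood-fill it breadth first.
                count += 1
                labels[z][y][x] = count
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in steps:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and image[pz][py][px] >= threshold
                                and not labels[pz][py][px]):
                            labels[pz][py][px] = count
                            queue.append((pz, py, px))
    return labels, count


# Tiny hypothetical image: two z-slices, each 3 rows by 4 columns.
image = [
    [[9, 9, 0, 0], [0, 0, 0, 8], [0, 0, 0, 8]],
    [[9, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 8]],
]
labels, count = label_regions(image, threshold=5)
print(count)
```

Each labeled region would then be shown to the analyst, who decides whether it is a single nucleus or a cluster that needs to be divided further.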
This work was supported by the US DOE contract DEAC0376SF00098, NIH grant CA67412 and a contract with Carl Zeiss Inc.
David C. Schwartz, Thomas Anantharaman, Christopher Aston, Ginger Clarke,
Stephanie Delobette, Eileen Dimalanta, Joanne Edington, Ariella Evenzahav, Veronica Gibaja,
Yuzhi Gao, Joe Giacalone, Catharina Hiort, Edward Huff, Junping Jing, John Lai, Ernest Lee,
Jieyi Lin, Bud Mishra, Lei Ni, Brett Porter, Rong Qi, Arvind Ramanathan, Jason Reed, Akhtar
Samad, Alex Shenker, Yianni Skiadas, Hui Wang, Jonathan J. Vafai, Weining Wang, Hongjuan
Current molecular biology techniques were developed primarily for characterization of single genes, not entire genomes, and, as such, are not ideally suited to high resolution analysis of complex traits and the molecular genetics of very large populations. Despite rapid progress in the human genome project effort, there is little doubt that radically new conceptual approaches are needed before routine whole genome-based analyses can be undertaken by both basic research and clinical laboratories.
Physical mapping of genomes using restriction endonucleases has played a major role in the identification and characterization of various loci, for example, by aiding clone contig formation and by characterizing genetic lesions. Unlike ordered sequence-based landmarks such as Sequence Tagged Sites (STSs), restriction maps provide precise genomic distances, which are essential for optimizing the efficiency of sequencing efforts and for determining the spatial relationships of specific loci. When compared to tedious hybridization-based fingerprinting approaches, ordered restriction maps offer relatively unambiguous clone characterization that is useful in contig formation, establishment of minimal tiling paths for sequencing, and preliminary characterization of sequence lesions. In addition, such maps provide a useful scaffold for sequence assembly, often critical in the final sequence finishing stage. Despite the broad applications of restriction maps, the associated techniques for their generation have changed little over the last ten years, primarily because they still utilize electrophoretic analysis. To help overcome these shortcomings, our laboratory developed the first practical non-electrophoretic genomic mapping approach, Optical Mapping.
Optical Mapping is a single molecule methodology for the rapid production of ordered restriction maps from single DNA molecules. Ordered restriction maps were originally constructed from yeast chromosomes by imaging restriction endonuclease cutting events on single, stained DNA molecules with fluorescence microscopy. Cut sites appeared as gaps that widened as the DNA fragments relaxed. Maps were then constructed by measuring fragment sizes via relative fluorescence intensity or apparent length. Modern Optical Mapping technology uses aminosilane-treated surfaces to adhere molecules prior to digestion. In this way, multiple samples can be robotically gridded onto a single surface and digested in parallel. Deposition techniques developed in our laboratory elongate and fix molecules to these surfaces while retaining biochemical accessibility of the samples. Following staining with a fluorochrome, cleaved molecules are imaged by a fully automated microscope system developed in our laboratory. Importantly, cleaved molecule fragments retain their order, which facilitates fragment sizing and obviates complicated schemes to re-establish fragment order. The final product is therefore an informative ordered restriction map rather than a mere fingerprint.
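Sizing by relative fluorescence intensity can be illustrated with a minimal sketch. It assumes that integrated fluorescence scales linearly with DNA mass and that the total size of the imaged molecule is known (for example, a clone of known insert size); the function name and the input values are hypothetical and are not taken from our software.

```python
def sizes_from_intensity(intensities, total_kb):
    """Apportion a known molecule size among its ordered fragments.

    intensities: integrated fluorescence of each fragment, in map order.
    total_kb:    assumed known size of the whole molecule, in kb.
    Assumes fluorescence is proportional to DNA mass (an idealization).
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total fluorescence must be positive")
    return [total_kb * i / total for i in intensities]


# Hypothetical molecule of 100 kb cut into two fragments
fragments_kb = sizes_from_intensity([1.0, 3.0], total_kb=100.0)
print(fragments_kb)
```

Because the fragments stay in order on the surface, the returned list is already an ordered map for that molecule; no reassembly step is needed.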
Intensive efforts in our laboratory have been directed at the development of machine vision systems and map construction algorithms that automatically construct maps from images of digested molecules. This analysis is based on Bayesian inference techniques and enables the construction of maps from noisy data. For example, the map construction algorithms can produce maps from a population of partially digested clone molecules (BACs, cosmids, phage) having a digestion rate as low as 15%.
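The idea of recovering a map from partially digested molecules can be sketched with a deliberately simplified stand-in. The code below is not the Bayesian method described above; it is a plain voting heuristic that pools cut positions from many molecules, groups nearby positions, and keeps groups supported by enough molecules. It assumes that all molecules share the same origin and orientation, and the names `consensus_cuts`, `tol_kb` and `min_support` are hypothetical.

```python
from statistics import mean


def consensus_cuts(molecules, tol_kb=2.0, min_support=0.15):
    """Estimate consensus cut sites from partially digested molecules.

    molecules:   list of lists of observed cut positions (kb from one end),
                 all assumed to share origin and orientation.
    tol_kb:      neighbouring positions closer than this are grouped
                 (simple single-linkage grouping, an assumption).
    min_support: minimum fraction of molecules that must show a cut.
    """
    positions = sorted(p for cuts in molecules for p in cuts)
    clusters = []
    for p in positions:
        if clusters and p - clusters[-1][-1] <= tol_kb:
            clusters[-1].append(p)  # extend the current group
        else:
            clusters.append([p])    # start a new group
    needed = min_support * len(molecules)
    return [round(mean(c), 1) for c in clusters if len(c) >= needed]


# Hypothetical data: five molecules, each showing only some of the cuts
observed = [[10.2, 40.1], [9.8], [40.3, 70.0], [10.0, 39.9], [25.0]]
print(consensus_cuts(observed, tol_kb=2.0, min_support=0.4))
```

A probabilistic treatment additionally models sizing error, false cuts and missing cuts explicitly, which is what allows maps to be built at low digestion rates; the voting threshold here is only a crude substitute for that.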
Using the approaches discussed above, this laboratory has generated ordered restriction maps for the Beckwith-Wiedemann locus in humans (in collaboration with Dr. D. Housman's group at MIT), the Brca2 locus (in collaboration with Dr. S. Fisher's group at Columbia University), and the mouse olfactory locus (in collaboration with Dr. R. Axel's group at Columbia University). Optical Maps are currently being generated from phage, cosmid, YAC and Bacterial Artificial Chromosome (BAC) clones. Our laboratory has been extensively mapping BAC contigs derived from the human Y chromosome, in collaboration with Dr. David Page's group at MIT. The aims are to disambiguate clones and markers, to provide a basis for critically understanding the functionality of many loci, and to provide a scaffold for sequence assembly. Analysis of the Y chromosome requires such detailed maps because the chromosome is densely punctuated with repeated sequences that frequently confound traditional characterization techniques.
More recent efforts have been directed at the high resolution mapping of bacterial genomes (in collaboration with Dr. O. White, TIGR) and parasite genomes (in collaboration with Drs. M. Gardner and D. Carucci; TIGR, Naval Medical Research Institute) to provide high resolution scaffolds that facilitate sequence assembly and verification. Here, we have been using high molecular weight DNA gently extracted from cells, entirely obviating the need to map clonal material. We have used Optical Mapping to generate physical maps of two microbial genomes: Nhe I maps of the E. coli (4.6 Mb) and Deinococcus radiodurans (3.1 Mb) genomes were generated directly from chromosomal DNA, without the use of clones for the construction of primary maps. DNA samples, prepared from gel inserts, were fixed onto derivatized glass surfaces, and molecules as large as 2.4 Mb were measured. Co-mounted lambda bacteriophage DNA was used as a sizing standard and to estimate cutting efficiency. Contig maps were created by aligning maps from multiple molecules. To benchmark our system, we compared the E. coli Nhe I optical map with the map predicted from the published sequence. The 150-fragment optical map had an average fragment size of 30 kb and a relative sizing error of 5% for fragments >5 kb. We then generated a whole genome Nhe I map of D. radiodurans. The final map was assembled without gaps at an average depth of 35X, using 157 molecules, with an average restriction fragment size of 29 kb. This map will significantly aid sequence assembly and verification by our collaborators at TIGR, who are sequencing this microorganism.
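The benchmark against the published sequence can be illustrated with a minimal sketch: an in silico Nhe I digest of a sequence, followed by a mean relative sizing error over fragments above a size cutoff. It assumes a linear sequence, the Nhe I recognition site GCTAGC cut after the first base, and optical and predicted fragments that are already paired one-to-one in map order. The function names and the toy inputs are hypothetical and are not our data.

```python
def digest(seq, site="GCTAGC", cut_offset=1):
    """Return ordered fragment lengths from a linear in silico digest.

    site / cut_offset default to Nhe I (G^CTAGC); a circular genome would
    need its first and last fragments joined, which this sketch omits.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def mean_relative_error(observed, predicted, min_size=0):
    """Mean of |observed - predicted| / predicted for fragments > min_size.

    Assumes the two lists are already aligned fragment by fragment.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if p > min_size]
    if not pairs:
        raise ValueError("no fragments above the size cutoff")
    return sum(abs(o - p) / p for o, p in pairs) / len(pairs)


# Toy sequence with two Nhe I sites
print(digest("AAAGCTAGCTTTTGCTAGCAA"))
# Hypothetical sizes in kb; the 2 kb fragment falls below the 5 kb cutoff
print(mean_relative_error([21.0, 9.5, 3.0], [20.0, 10.0, 2.0], min_size=5))
```

In practice the pairing of optical and predicted fragments is itself an alignment problem, since missing or false cuts merge or split fragments; the sketch sidesteps that step.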
Full automation of Optical Mapping holds enormous promise for miniaturization, with expected increases in throughput and reductions in cost. Thus, the advantages of Optical Mapping include high throughput and resolution, safety, and low cost. Compared with traditional electrophoresis-based methods, Optical Mapping produces information-rich physical maps of whole genomes at a much higher resolution. High throughput and the obviation of clones make Optical Mapping ideally suited for population-based genomic studies. We expect that the advantages of Optical Mapping will facilitate closure of the initial objectives of the human genome project and help reduce the costs associated with the sequencing of microbial genomes.