Genome Mapping Abstracts
DOE Human Genome Program
54. Third-Strand Binding Probes for Duplex DNA in Particles of Varying Size
Third-strand oligonucleotide binding to native double-stranded DNA, either in free form or in chromatin, via triple helix formation provides a sequence-specific way to attach ligands, including cytochemically detectable ones, to DNA-containing structures. Last year we described the development of an in situ methodology for identifying metaphase chromosomes. In particular, we demonstrated binding of a highly specific fluorescent third-strand probe directed to a 16 base pair α-satellite DNA target of human chromosome 17. We continue to exploit this type of interaction for various purposes of relevance to the Human Genome Project.
Some of our recent efforts have been directed at:
1. Developing probes specific for the α-satellite regions of human chromosomes X and 16. Such probes expand our capacity to cytogenetically identify and isolate individual human chromosomes.
2. Exploiting such third-strand probes to greatly facilitate isolation by flow sorting of individual human chromosomes and possibly chromosomes of other species.
3. Developing fluorescent-labeled probes to identify accessible multicopy sequences within the Drosophila histone gene cluster. These probes are being employed to locate such target sequences in Drosophila ovarioles.
To achieve these ends, we have also investigated the consequences of DNA structure and size on the mechanism and equilibria of probe binding.
55. Optical Mapping: A Complete System For Whole Genome Shotgun Mapping
T. Anantharaman2, J. Apodaca1, C. Aston1, V. Clarke1, D. Gebauer1, S. Delobette1, E. Dimalanta1, J. Edington1, A. Evenzehav1, J. Giacalone1, V. Gibaja1, C. Hiort1, E. Huff1, J. Jing1, Z. Lai1, D. Lazaro1, E. Lee1, J. Lin1, K. Lin1, B. Mishra2, L. Ni1, S. Paxia2, B. Porter1, R. Qi1, A. Ramanathan1, Y. Skiadis1, J. Vafai1, W. Wang1, H. Zhao1
Optical Mapping is a single-molecule approach for the rapid production of ordered restriction maps from individual DNA molecules. Fluorescence microscopy is used to directly image individual DNA molecules that are bound to derivatized glass surfaces and cleaved by restriction enzymes. Fragments retain their original order, and cut sites are flagged by small, visible gaps. The Optical Mapping system has advanced in several critical areas to emerge as a means for the detailed mapping of both clones and entire genomes (Deinococcus radiodurans and Plasmodium falciparum). We mapped these entire microbial genomes using megabase-sized genomic DNA molecules (600-10,000 kb). Because large fragments of randomly sheared DNA are mapped with high cutting efficiency, many overlapping restriction site landmarks allow contigs to be assembled, and a shotgun mapping strategy can be employed. High-resolution whole genome maps can therefore be assembled without library construction and its associated cloning artifacts. Because ensembles of single molecules are analyzed, only small amounts of starting material are required, enabling the mapping of microorganisms that are problematic to culture. Whole genome maps enable the size of the genome to be accurately determined, an important prelude to any sequencing endeavor. Most importantly, whole genome maps from genomic DNA provide an in situ picture of the architecture of the entire genome, revealing the number of chromosomes, the existence of extrachromosomal elements, etc. Populations can potentially be characterized by comparing maps from different strains. Recent efforts have been to create high-resolution maps of E. coli O157 strain (5.4 Mb) as a scaffold for facilitated sequence assembly and verification (Collaborator: F. Blattner, U. Wisconsin). We will compare these maps with maps generated "in silico" from the sequence of E. coli K12 (4.6 Mb) to identify regions that are unique to O157 and could be targeted for sequencing.
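As an illustration of what a map generated "in silico" from sequence amounts to, the following minimal Python sketch cuts a sequence at every exact occurrence of a recognition site and reports the ordered fragment lengths. It is not the authors' software: the function name, the toy sequence, and the treatment of the molecule as linear (a bacterial chromosome such as that of E. coli is circular) are assumptions made for the example.

```python
import re

def in_silico_restriction_map(sequence, site, cut_offset=0):
    """Return ordered fragment lengths (bp) for a linear sequence cut at every
    exact occurrence of `site`; `cut_offset` is the cut position within the site."""
    sequence = sequence.upper()
    site = site.upper()
    # A zero-width lookahead finds every occurrence, including overlapping ones.
    pattern = re.compile("(?=" + re.escape(site) + ")")
    cuts = [m.start() + cut_offset for m in pattern.finditer(sequence)]
    boundaries = [0] + cuts + [len(sequence)]
    # Drop zero-length pieces that would arise from a cut at either end.
    return [end - start for start, end in zip(boundaries, boundaries[1:]) if end > start]

# Hypothetical toy sequence with two EcoRI sites; EcoRI cuts G^AATTC, hence offset 1.
toy = "AAGAATTCTTTTGAATTCCC"
fragments = in_silico_restriction_map(toy, "GAATTC", cut_offset=1)
print(fragments)  # [3, 10, 7]
```

An ordered list of fragment sizes of this kind is what would be aligned against the fragment sizes measured from imaged molecules; regions of one strain's map with no counterpart in the other would then stand out as candidates for sequencing.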
Given the success we enjoyed in the restriction mapping of whole microbial genomes, and the proven reliability of the contig assembly algorithms developed for these efforts, we decided to construct a reference restriction map of the entire human genome. In four weeks our laboratory mapped 0.6 human genome equivalents at 40 kb resolution, using genomic fragments with an average size of 2.1 Mb. Our analysis of the contigs formed showed good correspondence with suitably modified Lander-Waterman physical mapping criteria in terms of the number and depth of overlapped genomic fragments. Our goals are to simultaneously complete the human reference map to include 10-15x coverage and to link with other physical maps by the alignment of restriction mapped BAC contigs. The utility of this map will be to facilitate large scale sequencing projects and to provide a novel resource for the analysis of large populations.
56. Verifying Sequence By Atomic Force Microscopy
David P. Allison and Peter R. Hoyt
Atomic force microscopy (AFM) technology can be developed, as a sequencing alternative, to verify homologies between DNA species, accurately, inexpensively, and with high-throughput.
By forming heteroduplexes between sequenced and test molecules, deletions, substitutions, and perhaps even point mutations, can be imaged and precisely located by AFM.
Using AFM imaging we have identified deletions of 22 to 450 bp in heteroduplexes of linearized mutant and wild-type pSV-β-Galactosidase plasmid (6821 bp). Additionally, utilizing an AFM technique we developed for mapping cosmids, which employs imaging a cleavage-deficient mutant EcoRI endonuclease site-specifically bound to active sites, we have simultaneously located deletions relative to the EcoRI sites on the pSV-β-Galactosidase heteroduplexes.
We have imaged the specific binding of mismatch repair enzymes to heteroduplexes between wild-type plasmids and plasmids with point mutations. Conditions to maximize binding efficiency and define specificity for all combinations of mismatches are being evaluated on plasmid constructs.
When developed, this technology could provide high-throughput sequence verification of heteroduplexes generated from long-range PCR or clone libraries. Furthermore, these procedures can be accomplished by technicians, use readily available and relatively inexpensive instrumentation, and should be fully transferable to most laboratories.
57. Molecular Cytogenetics Comes of Age: A Resource that Extends From "T" to Shining "T"
J. R. Korenberg1, X. N. Chen1, D. Noya1, X. Wu2, B. Birren2,
With the maturation of the DNA sequence of the human genome, it becomes necessary to link this sequence to the language of clinical medicine. Therefore, to provide this bridge and, at the same time, to anchor the genetic map to the chromosomal map, a genome-wide resource of bacterial artificial chromosomes (BACs) carrying defined Genethon polymorphic markers has been established.
By PCR, 17,200 BAC DNAs were screened with a five-dimensional pooling scheme. Positives were confirmed by PCR, linked to a mapped BAC array and/or streaked and re-confirmed by FISH (fluorescence in situ hybridization) using fluorescence reverse banding at the 500-700 band stage. All data were recorded in a Fourth Dimension relational database and images archived on a gigabyte optical disc system.
The resource is now represented by a BAC/STS map covering all human chromosomes, and includes 860 STS/BAC combinations, representing 882 total markers, each BAC mapping to a single chromosome sub-band. Of these, 648 carry Genethon markers, 84 carry ESTs, 42 carry known genes, 9 carry markers that were unmappable by any previous technique, and 108 carry random markers. A further 163 STS/BAC pairs mapped to one of multiple sites defined by FISH. Of the total 1122 marker/BAC pairs tested, 1023 of the chromosome assignments were in agreement; 99 were not.
This framework resource uniquely provides integration with all existing maps. It anchors early sequencing, provides rapid access to cancer and prenatal breakpoints and their candidate genes, can map new genes to single bands without FISH when integrated with RH data, and can be used as targets for CGH (comparative genome hybridization) on slides, chips or filters. The resource integrates genome with genetics and medicine. It should speed solutions for diagnosis, prognosis and ultimately treatment.
It may be viewed on http://www.csmc.edu/csri/korenberg/ and is available.
58. Automated Purification of Blood or Bacterial Genomic DNA
Dan P. Langhoff, Tuyen Nguyen, and William
In Phase I of the SBIR research we have developed a prototype of a fully automatic high-throughput blood or bacteria genomic DNA isolation instrument. Unlike any other process currently used for genomic DNA isolation, this instrument uses a derivative of electrophoretic separation technology that was developed by our company for automated purification of plasmid DNA. The separation technique is novel and powerful in that it requires no moving parts and can be performed with the combination of a simple disposable sample cassette and an inexpensive processing instrument. The process purifies high molecular weight genomic DNA directly from a cell lysate through electrophoretic movement of the DNA, which is placed between barriers of agarose medium that are contained in a disposable cassette device. The purification method results in highly pure genomic DNA over a wide range of input sample quantities, including as few as 1000 starting cells, and it gives high DNA yields while not relying on chromatography adsorption or solvent precipitation at any step. The purified DNA is suitable for use in PCR sequencing and in RFLP analysis. The disposable sample cassette is designed to process twelve or more samples in parallel and the processing instrument holds several of these cassettes. The instrument will be inexpensive to manufacture and it will occupy less than one square foot of laboratory bench space.
The isolation of genomic DNA from blood, bacteria, and viruses is a necessary starting point for molecular diagnosis of infection, genetic disease, inherited traits and identity determination, as well as in research applications. The ability to rapidly and reproducibly isolate DNA from blood and other bodily samples is required to identify, characterize and treat factors involved in human disease and disorders.
DOE SBIR PHASE I GRANT #DE-FG-03-98ER82612
59. New Host Strains for Stabilization and Modification of YAC Clones
Natalay Kouprina, Maxim Koriabine, and
The recent development of a new approach (TAR cloning) for the selective isolation of specific regions and genes from complex genomes as large linear or circular YACs greatly advanced YAC cloning technology1,2. While TAR cloning provides many opportunities for studying mammalian genomes, some YAC isolates containing multiple repeats may be mitotically unstable. Recently we systematically studied the contribution of several RAD genes to the stability of human YAC clones in yeast. Using a variety of linear and circular internally marked YACs, we demonstrated that rad52 substantially stabilizes human DNA inserts, decreasing YAC instability 25- to 400-fold compared to a recombination-proficient host strain. In contrast, the other rad mutant strains analyzed (rad1, rad50, rad51, rad54 and rad55) had a minor effect (2- to 5-fold reduction) on YAC instability. Thus, if YAC stabilization is desired, propagation in a rad52-deficient strain is strongly advisable. However, there is no opportunity to manipulate YACs by recombination in rad52 strains. Moreover, rad52-deficient strains cannot be used for specific gene isolation by TAR cloning. Therefore, we chose to develop a rad52-based system that could be used for stable maintenance of any YAC, while providing the opportunity for recombinational manipulation. We constructed a set of kar1 strains that have a conditional RAD52 gene under the control of the galactose-inducible GAL1/GAL10 promoter. These strains are rad52-deficient on glucose-containing medium and recombination-proficient on medium containing galactose. A YAC from any genetic background can be efficiently and accurately transferred into new hosts during mating with karyogamy-deficient kar1 strains3. To further expand the utility of the new YAC transfer system, the telomerase RNA gene TLC1 in the RAD52-conditional strains was modified to produce (TTAGGG)n repeats specific to human telomere sequences4. The kar1-induced transfer of a YAC into the strains with the modified TLC1 gene resulted in replacement of yeast-specific telomeric repeats by human-specific repeats in the YAC.
1Larionov et al. (1997) Proc. Natl. Acad. Sci. USA 94: 7384-7387.
60. Direct Isolation of a Centromeric Region from a Human Mini-Chromosome by in Vivo Recombination in Yeast
Natalay Kouprina, Motonobu Katoh,
Mitsuo Oshimura, and Vladimir Larionov
Isolation of specific chromosomal regions and entire genes has typically involved cloning of random fragments as BACs or YACs followed by a long and laborious process to identify the region of interest. Using the recently developed TAR cloning technique in yeast1, it has been possible to directly isolate specific chromosomal regions and genes from complex genomes as large linear or circular YACs. In this study we applied a modified version of this technique2 for isolation of a centromeric region of the human mini-chromosome D1 containing 5 Mb of the human chromosome Y3. This mini-chromosome was generated by two rounds of telomere-directed chromosome breakage leading to a loss of sequences from both arms of the chromosome. Despite the small size and loss of a significant part of the centromeric repeats (there is only 140 kb of alphoid DNA left), the D1 mini-chromosome segregates accurately in mitosis, suggesting that a 140 kb block of alphoid DNA, alone or along with the short arm flanking sequences, is sufficient for centromere function. Taking advantage of the fact that the first round of chromosome Y breakage resulted in truncation of the chromosome within a block of alphoid DNA (i.e. a new telomere and a block of alphoid DNA became physically linked), we developed a scheme to isolate a centromeric region from the mini-chromosome D1. Direct transformation of genomic DNA isolated from hybrid cells carrying the mini-chromosome into yeast spheroplasts resulted in rescue of a centromeric region as a set of linear YACs with sizes from 50 kb to 300 kb. To prevent YAC rearrangements due to the presence of multiple repeats, the isolates were maintained in the host strain with the conditional RAD52. Each YAC isolate containing an entire block of alphoid DNA (i.e. YACs bigger than 140 kb) was circularized, retrofitted into a BAC with the NeoR mammalian selectable marker and accurately transferred into E. coli cells. Since no detectable changes in the YACs were observed after retrofitting to BACs, the BAC DNAs were transferred into human cells for further functional analysis.
1Kouprina et al. (1998) Proc. Natl. Acad. Sci. USA 95: 4469-4474.
61. Insert Clone Selection by Sorting GFP-Expressing E. coli
Juno Choe and Ger van den Engh
At a previous DOE contractors meeting (Santa Fe, 1995), we proposed a method for insert clone selection by sorting Green Fluorescent Protein-containing E. coli into individual culture wells. Direct selection of insert-containing bacteria with a cell sorter abolishes the need for clone picking. We are now using this approach in an automated, integrated process for large-fragment DNA subcloning.
The process utilizes a vector in which insertion of DNA at a cloning site causes GFP expression. Insert-containing bacteria are selected in a cell sorter and deposited into 10 microliter wells. The vector may be amplified either by culturing the bacteria or by PCR. The amplified product may be used as a template for fluorescent dye-based sequencing reactions. Amplification by PCR avoids the need for a DNA extraction step.
We have now demonstrated proof-of-principle for several steps of the process. We have created a vector with a GFP gene downstream of a strongly regulated promoter. The plasmid contains a Lac repressor gene. Insertion of a DNA fragment inside the repressor gene disrupts its function, resulting in GFP expression. We currently have vector strains with Blue as well as Green Fluorescent Proteins. The green protein offers the best signal to noise ratio of the two. Individual bacterial clones containing inserts can easily be detected and sorted. Results of insert amplification by PCR from single-sorted bacteria will be presented.
62. A Resource of Mapped BAC Clones for Identifying Cancer Chromosome Aberrations
Norma J. Nowak1, Jeffrey Conroy1, Greg P. Caldwell1, Joseph Catanese1, Barbara Trask2, John D. McPherson3, David R. Bentley4, Grace Shen5, and Pieter J. de Jong1
We are generating a resource of mapped BAC clones from the arrayed human BAC library (RPCI-11) for use in fluorescent in situ hybridization (FISH) analysis of chromosomal rearrangements in human tumors. This work is performed under the auspices of the NCI Cancer Chromosome Aberrations Project (CCAP). Our goal is to establish mapped BAC clones by screening a 6-fold redundant portion of the BAC library with 4,000 markers mapped at high resolution through radiation hybrid (RH) panels. Markers are selected to be spaced less than 1 Mb apart, with preference for markers mapped to both high- and low-resolution RH panels. To increase the success rate of the marker-to-BAC correlation, we are employing overlapping oligonucleotide probes ("overgo") based on the EST and STS sequences for the markers. The overgos are designed with the same average melting characteristics and are labeled by replicating the 5' overhangs using P-32 nucleotide triphosphates. To increase the throughput of screening high-density BAC colony membranes, the probes are pooled in mixtures of 36 probes each. Informative probe mixtures are prepared through the use of a 3-dimensional (6x6x6) probe pooling strategy consisting of 216 distinct probes. All predicted probe-BAC pairs are being confirmed by PCR, and the expected 4-6 overlapping BACs for each marker are subsequently validated by restriction digest fingerprinting. In our initial mapping effort, we recovered BAC clones for 586 genetic markers on chromosomes 1, 5, 6, 18, 19, 21, 22 and Xp. After completion of three rounds of screening (736 markers), we have attained an average 2.5 Mb level of resolution and 4.2 BACs per marker. Overgos for the remaining Sanger framework markers are being designed, and all mapped clones along with corresponding mapping information will be deposited in the public domain. Up to one BAC clone per marker will be characterized by in situ hybridization experiments to establish its usefulness as a FISH probe and to provide additional independent confirmation of the map location.
Work supported by NCI, DOE and the Wellcome Trust.
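The 3-dimensional (6x6x6) probe pooling described in this abstract can be sketched in Python as follows. Each of the 216 probes is given a coordinate in a cube and is placed in one pool per axis, giving 18 mixtures of 36 probes each. The indexing scheme and function names are assumptions for illustration, not the authors' laboratory procedure.

```python
from itertools import product

SIDE = 6  # 6 x 6 x 6 = 216 probes
AXES = "xyz"

def probe_coordinates(probe_index):
    """Map a probe index in 0..215 to its (x, y, z) position in the cube."""
    x, rest = divmod(probe_index, SIDE * SIDE)
    y, z = divmod(rest, SIDE)
    return x, y, z

def build_pools():
    """Return {(axis, level): [probe indices]}: 18 pools of 36 probes each."""
    pools = {(axis, level): [] for axis in AXES for level in range(SIDE)}
    for probe in range(SIDE ** 3):
        for axis, level in zip(AXES, probe_coordinates(probe)):
            pools[(axis, level)].append(probe)
    return pools

def decode(positive_pools):
    """List the probes consistent with the pools in which one BAC was positive."""
    levels = {axis: sorted(lvl for a, lvl in positive_pools if a == axis)
              for axis in AXES}
    return [x * SIDE * SIDE + y * SIDE + z
            for x, y, z in product(levels["x"], levels["y"], levels["z"])]

pools = build_pools()
# A BAC positive in exactly one pool per axis points to a single probe.
single = decode({("x", 2), ("y", 5), ("z", 0)})
# A BAC hit by two probes that differ on every axis leaves 2 x 2 x 2 candidates.
ambiguous = decode({("x", 2), ("x", 3), ("y", 5), ("y", 1), ("z", 0), ("z", 4)})
print(single, len(ambiguous))
```

In this sketch 18 hybridizations stand in for 216 single-probe hybridizations. A BAC positive for more than one probe yields several candidate pairs, which is one reason a confirmation step such as the PCR confirmation mentioned above is useful.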
63. Preparation of New BAC Vectors for BAC Cloning and Transformation-Associated Recombination ("TAR") Cloning
Changjiang Zeng1, Yu Wang1, Kazutoyo Osoegawa1, Natasha Kouprina2, Vladimir Larionov2, and Pieter J. de Jong1
Recently, efficient procedures have been reported for the re-cloning of large genomic DNA fragments in yeast as circular YACs. This approach of "Transformation-Associated Recombination" cloning utilizes homologous recombination between a linear YAC vector and homologous sequences in complex genomic DNA to generate circular yeast artificial chromosomes. For this purpose, a specific YAC vector equipped with short (unique) genomic sequences from the targeted region needs to be constructed. The sequences are positioned at the ends of the linear YAC fragment used for transformation into yeast spheroplasts. To make the process of TAR rescue of genomic DNA more universally applicable, we constructed several hybrid BAC/YAC vectors designated as pTARBAC-1, -2 and -4. These vectors differ from our earlier BAC vector (pBACe3.6) by the presence of a yeast centromere (CEN3) and a yeast-selectable marker (his3). The TARBAC vectors have been used to prepare BAC libraries for several species, including Trypanosoma brucei, Giardia and cat. Clones lacking inserts do not generate viable yeast colonies after transformation of yeast spheroplasts and selection for his-function. However, most (25 out of 30) of the Trypanosome BACs, with 130 kb average inserts, transform yeast at high efficiency, indicating the presence of ARS elements in most of the genomic insert fragments. Most of the insert sequences in the Trypanosome BACs can be deleted by treating the BACs with a restriction enzyme (e.g. EcoRI) which lacks corresponding sites in the TARBAC vector. Such EcoRI-deleted BAC clones are functionally similar to the previous TAR-rescue vectors because they have (unique) genomic sequences at the ends of a hybrid BAC/YAC vector. Hence, most of the BAC clones in a TARBAC library can be used to generate TAR-rescue vectors to (re-)clone genomic segments from different haplotypes or from related species. We have confirmed that most of the deleted TARBACs have also lost the ARS elements, as indicated by the loss of their capability to successfully transform yeast spheroplasts. We are currently exploring the re-isolation of the deleted sequences by co-transformation of deleted BAC DNA with genomic Trypanosome DNA. The less-complex genomes of unicellular eukaryotes are used to model future work with mammalian TARBAC libraries. Information on our current libraries can be obtained from our Web page: http://bacpac.med.buffalo.edu.
* Supported in part by grants from the U.S. DOE, NHGRI and the Deutsche Forschungsgemeinschaft (DFG).
64. "RPCI" Human and Mouse Bacterial Artificial Chromosome Libraries: Construction and Characterization
Kazutoyo Osoegawa1, ChungLi Shu1, Baohui Zhao1, Minako Tateno2, Eirik Frengen1, Joseph J. Catanese1, Yoshihide Hayashizaki2 and Pieter J. de Jong1
Human male and female bacterial artificial chromosome (BAC) libraries have been constructed in the pBACe3.6 vector in compliance with the new NIH-DOE guidelines on anonymous donor selection and informed consent. A 25-fold genome equivalent male BAC library (RPCI-11) was constructed by partial digestion with a combination of EcoRI and EcoRI methylase. The library has been distributed to 30 genome centers as a major resource for the human genome sequencing effort. The average insert size was estimated to be 173 kb. The average redundancy of the library was determined at 23.9 positives per marker by hybridization of 45 single locus probes. All 1,076 marker-positive BAC clones were confirmed to be part of 45 single-marker contigs by restriction fingerprinting. An additional 123 BAC clones were identified using probes derived from a minimal overlapping set of PAC clones on 14q24.3. The resulting 1.5-Mb BAC contig has been assembled by hybridization using BAC-end probes and PCR with STS markers, thus allowing the mapping of 264 markers within the contig. The genomic sequence for the contig region, generated at Washington University, facilitates the analysis of contig integrity and allows the determination of BAC clone fidelity. A total of 87 clones were confirmed to be non-chimeric because both insert ends map back to the contig. No rearranged clones have been observed within the 5.5-kb STS-resolution contig map. More recently, a second BAC library (RPCI-13) has been constructed from an anonymous female donor. This library has been generated from genomic DNA partially digested with EcoRI (segments 1 & 2, 10x redundant) and partially digested with DpnII (segments 3 & 4, 10x redundant). In addition to the human BAC libraries, two approximately 10-fold redundant murine BAC libraries have been prepared and extensively characterized. The source DNA for these libraries was obtained from female mice from two inbred strains: 129SvEvTAC and C57B6 for the RPCI-21 & 23 libraries, respectively. Information on the current libraries can be obtained from our Web page at: http://bacpac.med.buffalo.edu.
* Supported by grants from the U.S. DOE (#DE-FGO3-94ER61883), NIH (#1RO1RGOl 165).
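The redundancy figure quoted in this abstract follows directly from the hybridization counts, as the short Python sketch below shows. The second function is an added illustration, not a claim from the abstract: it applies the usual Poisson assumption of random, unbiased cloning to estimate how likely a locus is to be absent from a library of a given redundancy. Function names are hypothetical.

```python
import math

def observed_redundancy(positive_clones, probes):
    """Average number of positive clones per single-locus probe."""
    return positive_clones / probes

def probability_locus_missed(redundancy):
    """Poisson chance that a locus lies in no clone (assumes random cloning)."""
    return math.exp(-redundancy)

# Counts from the abstract: 1,076 marker-positive clones for 45 probes.
redundancy = observed_redundancy(1076, 45)
print(round(redundancy, 1))  # 23.9

# Illustration only: chance of missing a locus at about 10-fold redundancy.
print(f"{probability_locus_missed(10.0):.1e}")
```

Real libraries deviate from the random-cloning assumption (for example through uneven restriction site distribution), so the second figure is best read as an optimistic lower bound.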
65. Characterization of a BAC Clone Resource for Human Genomic Sequencing: Analysis of 150 Mb of Human STCs and Implications for Human Genomic Sequencing
G. G. Mahairas, J. C. Wallace, J. Furlong, K. Smith, S. Swartzell, A. Keller, HTSC Staff and L. Hood
Together with The Institute for Genomic Research (TIGR), we have sequenced the BAC ends or sequence tagged connectors (STCs) from 160,000 BAC clones. We have also generated a HindIII restriction digest for each BAC whose end sequences have been determined at the University of Washington, and developed strategies and tools for using this resource in support of large-scale genomic sequencing. We have demonstrated proof of concept for its use. Together with TIGR, we propose to complete the characterization of an STC clone resource from two IRB-approved human BAC libraries to 22.5-fold clone (BAC) coverage (e.g. 450,000 BAC clones assuming an average insert size of 150 kb). These data are available on the World Wide Web through dbGSS and our web sites (www.genome.washington.edu and www.tigr.org), and the clones are available for distribution to the scientific community through Research Genetics. Nine hundred thousand STC sequences will provide a sequence marker of 300 to 500 base pairs (bp) on average every 3,100 bp across the genome. The BAC libraries and the data pertaining to them will enable the facile selection of minimum tiling paths of BAC clones across each of the human chromosomes for large-scale sequence analysis. Here we present data to support the STC approach for sequencing of the human genome and other moderate to large genomes. The STC approach eliminates the need for up-front physical mapping and uses BAC clones as the basic sequencing reagent. The major advantages of the STC approach are:
(i) Cost and effort to obtain complete low- and high-resolution maps are reduced, and front-end automation is greatly simplified.
(ii) The BAC clones are readily available through Research Genetics.
(iii) As improved techniques for generating BACs, or other yet-to-be-developed libraries, appear, reasonable numbers of these new clones could easily be added to the database and clone collection.
(iv) This approach will obviate the significant problem of closure for high-resolution physical mapping.
(v) The existing chromosomal landmarks (STSs, PCR-specific sites, ESTs, or partial cDNA sequences) can be easily placed on the BAC clones, adding markers for the BAC clones and taking significant advantage of any associated biological information.
(vi) The 10% of the genome obtained in the STCs can be searched against the sequence databases to identify many interesting landmarks (e.g. genes, STSs, ESTs, etc.) that could locate the BAC clone on the preexisting chromosomal maps.
(vii) Chromosomal regions of key biological interest can be identified and sequenced first.
(viii) The human genome can be sequenced earlier and for less cost.
(ix) The STC approach will provide useful clones for biological studies even at the very early STC sequencing stages, when only 3- to 4-fold coverage is achieved.
The STC approach streamlines the task of clone selection by doing much of the work up front and by using sequence alignment and computers as the primary tools to identify sequencing targets. Additional major advantages of the STC strategy are that it is rapid, because clone selection is automated; that STC data correlate directly with a clone that can be used for shotgun sequencing without further evaluation; and that it surveys the entire genome at a higher marker density, allowing greater versatility in the use of the data, including genotyping analysis. Perhaps the greatest advantage of the STC resource is that it can be used by any investigator for clone or sequencing target selection via the World Wide Web. The STC clone library also serves as a large-scale genomic survey tool and provides access to many characterized clones in any part of the genome. The implications of this type of resource transcend simple genomic sequencing. Additionally, we will describe the University of Washington High Throughput Sequencing Facility, capable of producing 2 million BAC end sequences per year.
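The coverage and marker-density figures in this abstract can be reproduced with a few lines of Python. The 3,000 Mb genome size is an assumption made here (the abstract does not state the value used), and the function names are illustrative.

```python
GENOME_BP = 3_000_000_000  # assumed human genome size; not stated in the abstract

def clone_coverage(n_clones, insert_bp, genome_bp=GENOME_BP):
    """Fold coverage of the genome in cloned inserts."""
    return n_clones * insert_bp / genome_bp

def marker_spacing(n_markers, genome_bp=GENOME_BP):
    """Average distance in bp between sequence markers spread over the genome."""
    return genome_bp / n_markers

def fraction_sequenced(n_reads, read_bp, genome_bp=GENOME_BP):
    """Fraction of the genome represented in end reads (ignoring read overlap)."""
    return n_reads * read_bp / genome_bp

coverage = clone_coverage(450_000, 150_000)
print(coverage)  # 22.5

spacing = marker_spacing(900_000)          # two end reads per clone
low = fraction_sequenced(900_000, 300)     # 300 bp reads
high = fraction_sequenced(900_000, 500)    # 500 bp reads
print(round(spacing), f"{low:.0%} to {high:.0%}")
```

With a 3,000 Mb genome the 22.5-fold coverage is reproduced exactly, and the quoted 10% of the genome falls inside the range obtained for 300 to 500 bp reads. The computed spacing comes out slightly above the 3,100 bp quoted, which would be consistent with a somewhat smaller genome size having been assumed in the original estimate.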
66. Human BAC End Sequencing
Shaying Zhao, Mark Adams, Bill Nierman,
and Joel Malek
BAC end sequences (BESs) provide highly specific markers. In genomic sequencing, the clones to be sequenced next can be selected by searching the completed sequence against a BES database. The average insert size of BAC clones is about 150 kb, and therefore BESs are useful in chromosomal walking and assembly. End sequences from 300,000 clones (15x clone coverage) will be generated by TIGR and the University of Washington. At TIGR, we have sequenced BESs from both the CalTech and Pieter de Jong libraries with a success rate >80% and an average read length of 450 bases. The pair percentage is >65% and the average phred score is 28. We also resequence both ends of clones for which one end failed, to obtain a higher pair percentage and for quality control. For the clones we have resequenced so far, the redo sequences always match the original ones. The average cost is about $4.50 per BES and $0.10 per base. We continue improving our protocol to decrease the cost and increase the success rate and read length. To date, we have submitted more than 130,000 BESs to GenBank. We have collected more than 300,000 BESs from TIGR, the University of Washington and CalTech for our search database at http://www.tigr.org/tdb/humgen/bac_end_search/bac_end_search.html and ftp site (ftp://ftp.tigr.org/pub/data/h_sapiens/bac_end_sequences/).
The finished 600,000 BESs will cover about 10% of the human genome and provide a sequence marker every 5 kb across the genome. BESs can be used to survey the whole genome. We searched BESs against existing databases of repeats, STSs and ESTs, and the results are presented at our web site (http://www.tigr.org/tdb/humgen/bac_end_search/bac_end_anno.html). On average, 50% of BESs contain known repeats, whose length ranges from 21 to 806 bases with an average of 185 bases, and 30% of bases are repeat-masked. At identity >=95%, 3% of BESs match ESTs, while 0.2% match STSs, which are used to locate some of the BACs on the preexisting chromosomal maps. BESs are also used to assess the representativeness of BAC libraries and to tie together the existing contigs. We are collaborating with other institutes to map some of the BESs, and the results will be presented on the web.
67. Construction of a Genome-Wide Human BAC-Unigene Resource
Bum-chan Park1, Robert Xuequn Xu, Chang-Su Lim, Mei Wang, Aaron Rosin, Steve Mitchell, Hee Moon Park1, Eunpyo Moon2, Ung-Jin Kim, and Melvin I. Simon
With the availability of high quality BAC
libraries with stable, large inserts, it is now feasible to rapidly develop
genome-wide physical BAC contig resources to cover the large mammalian
genomes. For this purpose, we have tried to screen human BAC libraries
using mapped Unigene cDNA clones as probes. Currently, over 52,000 mapped
Unigenes (non-redundant, unigene sets of cDNA representing EST clusters)
are available for human alone. A total of 44,000 Unigene cDNA clones have
been supplied to us by Research Genetics. We have currently deconvoluted
over 10,000 Unigene probes against a 4X coverage human BAC library D using
high density colony hybridization filters. 10,000 batches of Unigenes are
arrayed in a logical array of 100 X 100 matrix from which 100 row pools
and 100 column pools are derived. Library filters are hybridized with pooled
probes, thus reducing the number of hybridization required for addressing
the positives for each Unigene from 10,000 to 200. Details on the experimental
scheme as well as daily progress report is posted on our WEB site (http://www.tree.caltech.edu).
Initial assessment of the deconvolution data indicates that over 95% of
the Unigenes have been deconvoluted, so that a BAC-Unigene
resource could be made for them. An additional 800 Unigene probes and 1,200 Unigene probes
that were already deconvoluted by 100 x 100 pooling have been re-screened by 20 x 20 pooling
to determine the accuracy, to estimate the rate of false positive hits
as a function of probe complexity, and to improve the accuracy. To circumvent
the cross-hybridization problems inherent to some Unigene probes, we are
also designing OVERGOes from sequences derived from mapped, well annotated
genes. Human BAC-Unigene resources generated in this effort will contribute
toward the realization of the "whole genome" approaches for human and other
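The row and column pooling described in abstract 67 can be sketched as follows. This is a minimal illustration, not the Caltech software: the function names and the toy BAC names are hypothetical, and only the 100 x 100 matrix size comes from the abstract.

```python
# Row/column pooling of a logical 100 x 100 probe matrix.
# Each probe sits at one (row, column) address; a BAC hit by that probe
# is positive in exactly one row pool and one column pool.
from collections import defaultdict

N = 100  # matrix side, as in the abstract


def address(probe_index: int, n: int = N):
    """Map a probe index (0 .. n*n-1) to its (row pool, column pool)."""
    return divmod(probe_index, n)


def deconvolute(row_hits, col_hits, n: int = N):
    """row_hits / col_hits: dict pool number -> set of positive BAC names.

    Returns dict probe index -> set of BACs positive in both the probe's
    row pool and its column pool. A BAC that is positive in several rows
    and several columns yields ambiguous (possibly false) assignments.
    """
    assigned = defaultdict(set)
    for r, row_bacs in row_hits.items():
        for c, col_bacs in col_hits.items():
            shared = row_bacs & col_bacs
            if shared:
                assigned[r * n + c] |= shared
    return dict(assigned)


# Toy data (hypothetical BAC names): probe 4217 hits "b1", probe 305 hits "b2".
row_hits = {42: {"b1"}, 3: {"b2"}}
col_hits = {17: {"b1"}, 5: {"b2"}}
print(deconvolute(row_hits, col_hits))
print("hybridizations:", N * N, "->", 2 * N)
```

The ambiguity noted in the docstring is one plausible reason a second, smaller pooling pass (such as the 20 x 20 re-screen mentioned in the abstract) is useful for estimating false positives.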
Sangdun Choi, Yu-Jiun Chen, Mel
Simon, and Hiroaki Shizuya
BAC (bacterial artificial chromosome) cloning has served an important role in human and mammalian genomics since its introduction in 1992 by our laboratory. BAC libraries are currently in use or under development for virtually every important genome. The primary reasons BACs are so useful are that they can stably maintain large DNA inserts (up to 350 kb) in E. coli and are amenable to virtually all of the sophisticated molecular biology techniques developed for E. coli. We have been constructing human and mouse BAC libraries over the last several years. The total number of human BAC clones generated is now close to one million. Recently we developed a new BAC vector and an improved method for constructing BAC libraries, and began constructing a series of BAC libraries with much larger insert sizes (182 - 202 kb) from human and a variety of other organisms including Arabidopsis, maize, and rice. The larger-insert genomic BAC libraries will provide significant improvement to applications in physical mapping, positional cloning, and DNA sequencing.
69. One Tier Pooling of a Total Genomic BAC Library
D.C. Torney, J.L. Longmire, D.C. Bruce,
J. Fawcett, M. Campbell, J. Tesmer, M. Maltbie, B. Taggett, T. Tatum, P.
Jewett, J. Meyne, N. Lenhert, Y. Valdez, S. Bailey, A. Schliep1,
L.L. Deaven, and N.A. Doggett
We have developed a single-tier pooling approach which enables the screening of a 12X diploid human genomic BAC library of 221,184 clones, 165 kb average insert size (one-half of the total 24X RPCI-11 library, http://bacpac.med.buffalo.edu) in a single screen of 376 PCR reactions, followed by confirmatory reactions of the predicted positive BAC clones. Prior to pooling of the library, the 384 well library stock plates were translated to 96 well plates. Pooling of the library was performed in increments of a quarter at a time: each a 3x coverage containing 55,296 clones. Ninety-four pools were made from each quarter of the library including two sets of plate pools (11 pools each), two sets of row pools (12 pools each), two sets of column pools (12 pools each), and one set of left and right diagonal pools (12 pools each). Each of these eight sets of pools was made from a plate-rearranged 24x24 array of 96-well microtitre dishes. Thus, most pools contain 4,608 clones. The plate rearrangements were selected to minimize the numbers of co-incidences of pairs of clones in pools. For the row, column, and diagonal pools, the "lines" of clones from the array are combined to make the final 12 pools, with no pool containing more than one line incident on any plate. Pool construction was accomplished, straightforwardly, with manifolds which have been designed to work on the Robbins Hydra (manuscript in preparation). Prior to pooling, the performance of the single-tier design (in the presence of experimental errors) was simulated. As with real data, the Markov chain Monte Carlo procedure was used to rank the candidate positives. Results compared favorably with a random 8-sets pooling design (Knill et al., J. Comp. Biol., 3, 395-406 (1996), and Bruno et al., Genomics, 26, 21-30 (1995)). 
We will present results from screening the pools with STS primer pairs that demonstrate that these single-tier pools will serve as valuable resources for rapidly isolating BAC clones for mapping and sequencing. Supported by the U.S. D.O.E. Office of Biological and Environmental Research under contract W-7405-ENG-36.
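The pool counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only numbers stated above; the variable names are illustrative.

```python
# Arithmetic behind the single-tier pooling design described above.
# All numbers come from the abstract; variable names are illustrative.
from math import isqrt

LIBRARY_CLONES = 221_184
QUARTERS = 4
WELLS_PER_PLATE = 96

# Pool sets made from each quarter: (name, number of sets, pools per set)
POOL_SETS = [
    ("plate", 2, 11),
    ("row", 2, 12),
    ("column", 2, 12),
    ("diagonal (left and right)", 2, 12),
]

clones_per_quarter = LIBRARY_CLONES // QUARTERS           # 55,296
plates_per_quarter = clones_per_quarter // WELLS_PER_PLATE
array_side = isqrt(plates_per_quarter)                     # side of the plate array
pools_per_quarter = sum(sets * pools for _, sets, pools in POOL_SETS)
total_pcr = pools_per_quarter * QUARTERS
clones_per_12_pool = clones_per_quarter // 12              # row/column/diagonal pools

print(f"{plates_per_quarter} plates per quarter ({array_side} x {array_side} array)")
print(f"{pools_per_quarter} pools per quarter, {total_pcr} PCR reactions in total")
print(f"{clones_per_12_pool} clones in each pool of a 12-pool set")
```

The eight pool sets sum to 94 pools per quarter, which gives the 376 reactions for the full library, and the 12-pool sets account for the figure of 4,608 clones per pool.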
70. High Density Colony Filter Production and Automated Data Analysis for Efficient Hybridization Screening of BAC Libraries
Anca Georgescu, Laura Kegelmeyer,
Bernadette Lato, Hummy Badri, Matthew Groza, and Anne Olsen
Bacterial Artificial Chromosomes (BACs) have proven to be excellent reagents for construction of sequence-ready maps. As the demand for mapped BACs continues to increase, efficient methods for screening these libraries are needed to identify clones at the required throughput. We describe a format for high-density colony filter production in conjunction with a program for automated analysis of hybridization results, which has greatly improved the accuracy of identifying positive signals.
Colony filters are plated at a density of 6 x 6 x 384, or 13,824 colonies per 8 x 12 cm filter. All colonies are plated in duplicate in a unique offset pattern optimized to prevent ambiguity in identification of positive offsets. A positive control is plated in the first and last offsets of each subgrid to enable automated drawing of major grid lines by the analysis program. Hybridization probes consist of Alu-PCR products of cosmid or BAC clones, or overgos (J. McPherson, Washington Univ.) designed from cosmid or BAC end sequences. Multiple probes are pooled in a single hybridization, and positive colonies are re-arrayed for hybridization with individual probes. Hybridization signals are analyzed by the "blot-score" program developed at LLNL. The program currently operates in a semi-automated manner, with the potential for full automation in the near future. In the current mode, the user loads a phosphorimager file of hybridized filters and selects the filters to be analyzed. Using the positive control offset signals, the program dynamically draws gridlines to indicate the 384 major subdivisions on the filter. When a subgrid containing a positive is selected, the program displays an enlarged view of the positive subgrid with the 6 x 6 minor gridlines drawn. True positives appear in unique duplicate patterns. The program highlights the position of the expected duplicate signal when the user selects a positive, thus providing immediate feedback on the validity of a given pair of observed signals. The program is linked to the LLNL mapping database to facilitate entry of hybridization results into the database and retrieval from the database of map information relevant to positive clones identified.
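The duplicate-signal check described above can be illustrated with a small sketch. The actual offset pattern used by the "blot-score" program is not given in the abstract, so the pairing rule below (offset k paired with offset 35 - k) is purely hypothetical, as are the function names; only the 6 x 6 x 384 layout and the control positions come from the text.

```python
# Sketch of the duplicate-offset check on one 6 x 6 subgrid.
# The pairing used here (offset k <-> offset 35 - k) is a hypothetical
# stand-in for the real, unpublished offset pattern.
SIDE = 6
N_OFFSETS = SIDE * SIDE            # 36 positions per subgrid
SUBGRIDS = 384
CONTROLS = {0, N_OFFSETS - 1}      # first and last offsets hold the positive control


def expected_duplicate(offset: int) -> int:
    """Return the offset where the duplicate of this colony should appear."""
    return N_OFFSETS - 1 - offset


def score_subgrid(positive_offsets):
    """Split observed signals into confirmed duplicate pairs and unpaired signals."""
    signals = set(positive_offsets) - CONTROLS
    pairs = sorted({tuple(sorted((o, expected_duplicate(o))))
                    for o in signals if expected_duplicate(o) in signals})
    paired = {o for pair in pairs for o in pair}
    return pairs, sorted(signals - paired)


print("colonies per filter:", N_OFFSETS * SUBGRIDS)
pairs, unpaired = score_subgrid([0, 35, 4, 31, 10])
print("confirmed pairs:", pairs, "unpaired signals:", unpaired)
```

In this toy case the signals at offsets 4 and 31 confirm each other, while the lone signal at offset 10 has no partner and would be treated as a likely artifact.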
Work performed under the auspices of the US DOE by Lawrence Livermore National Laboratory under contract W-7405-ENG-48.
71. Systematic Conversion of a YAC/STS Map into a Sequence Ready BAC Map
C. Han and N.A. Doggett
We are starting with a previously constructed integrated physical map of human chromosome 16 (Doggett et al., Nature 377:Suppl:335-365, 1995) and converting this to a new sequence-ready BAC map of this chromosome. The YAC/STS component of the integrated map consists of 900 CEPH megaYACs and 300 flow-sorted 16-specific miniYACs that are localized to and ordered within somatic cell hybrid breakpoint intervals with 1150 STSs. This YAC/STS map provides nearly complete coverage of the euchromatic arms of the chromosome and provides STS markers on average every 78 kb. The integrated map also includes 470 genes/ESTs/exons, 400 genetic markers, and 530 cosmid contigs (110 kb average size, and covering 60% of the chromosome). To create large sequenceable targets of this chromosome we are using a systematic approach to screen high density BAC filters with evenly spaced probes. Probes are pooled overlapping oligonucleotides (overgos, method developed by John McPherson, Wash U.). In order to select evenly spaced probes we first identified all available sequences in the integrated map. These include sequences from genes, ESTs, STSs, and cosmid end sequences. Since the integrated map was constructed on a physical scale, we are able to select sequences at a spacing of 50 kb - 100 kb when these are available. We then used BLAST to identify 36 bp unique fragments of DNA for overgo probes. Up to 236 overgos have been pooled in a single hybridization against a 12X coverage human BAC library (RPCI-11). Positive BACs that are identified from the pooled overgos are rearrayed on membranes and hybridized with two-dimensional subpools of overgos to determine which BAC clones are positive for individual overgos. Probe-content BAC contigs are constructed in this manner. BAC contigs are then restriction mapped to select the optimal tiling sets for sequencing.
Thus far we have identified over 6000 BACs from the chromosome 16 long arm and from an 11 Mb region of the short arm by the hybridization of 1,003 overgos. 35 Mb of BAC probe-content maps (by completion of the 2-dimensional hybridizations) and 10 Mb of sequence ready restriction maps have been constructed from these targets. Supported by the US DOE, OBER under contract W-7405-ENG-36.
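The selection of evenly spaced probe sequences described in abstract 71 could be approached with a simple greedy pass along the map. The sketch below is one possible way to do it, not the authors' procedure; the marker names, positions, and function names are made up for illustration, and only the 50 kb - 100 kb spacing window comes from the text.

```python
# Greedy choice of probe sequences at a target spacing along a physical map.
# Positions are map coordinates in kb; marker names and positions in the
# example are made up for illustration.

def select_spaced(markers, min_gap_kb=50):
    """markers: iterable of (name, position_kb).

    Walk along the map and keep a marker whenever it lies at least
    min_gap_kb beyond the last marker kept.
    """
    chosen = []
    for name, pos in sorted(markers, key=lambda m: m[1]):
        if not chosen or pos - chosen[-1][1] >= min_gap_kb:
            chosen.append((name, pos))
    return chosen


def gaps_over(chosen, max_gap_kb=100):
    """Report intervals between chosen markers that exceed the upper bound."""
    return [(a[0], b[0], b[1] - a[1])
            for a, b in zip(chosen, chosen[1:]) if b[1] - a[1] > max_gap_kb]


markers = [("sts1", 0), ("est1", 20), ("cos_end1", 60),
           ("sts2", 95), ("gene1", 130), ("sts3", 300)]
chosen = select_spaced(markers)
print([name for name, _ in chosen])
print(gaps_over(chosen))
```

Intervals reported by `gaps_over` correspond to stretches where no sequence was available at the desired spacing.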
72. An Arrayed BAC Resource for the High Resolution Mapping of Cancer-Related Chromosome Aberrations
Eunpyo Moon1, Jonghyeob Lee1,
Mei Wang, Bum-Chan Park, Ken Myambo2, Colin Collins2,
Melvin Simon, Ung-Jin Kim
Numerous human chromosomal aberrations known to be related to cancer phenotypes have been catalogued to date. 27,000 such aberrations have been documented and mapped to chromosomal subregions at the resolution of cytobanding technologies. Mapping and characterizing these aberrations at high resolution will provide clues to the underlying molecular nature of many of these cancers.
We are currently establishing a BAC resource that will cover the known cancer-related regions. The resource will also include 2,000-3,000 BACs spread over the genome. These BACs are being identified by cDNA inserts or OVERGO probes designed from published ESTs that have been well annotated and mapped to genomic locations. To date, over 200 cDNA probes have been selected from the I.M.A.G.E. cDNA library and used for screening against the approved Caltech Human BAC library D. A total of 800 BACs have been selected by these probes and have been deconvoluted against the individual probes. Numerous OVERGOes are being designed and used for library screening. The resulting BAC resource will provide a BAC array of roughly 1 Mb resolution and will serve as a framework for high resolution FISH mapping of the chromosome aberrations. The BACs and BAC contigs from the array will also be used for the identification of the culprit genes and the molecular basis of the cancers incurred by the aberrations, and for the development of the "oncochip" to be used for efficient diagnosis of chromosome aberrations by the Comparative Genomic Hybridization (CGH) technique.
73. A 12 Mbp Completely Contiguous Sequence-Ready BAC Contig in Human Chromosome 16p13.1-11.2
Yicheng Cao, Hyung Lyun Kang, So
Hee Dho1, Diana Bocskai, Mei Wang, Xuequn Xu, Jun-Ryul Huh1,
Byeong-Jae Lee1, Francis Kalush2, Judith G. Tesmer3,
Eunpyo Moon4, Norman A. Doggett3, Mark D. Adams2,
Melvin Simon, and Ung-Jin Kim
Here we present 12 Mbp of BAC contiguity in the centromeric half of the chromosome 16p arm. The work initially involved extensive screening of deep human BAC libraries developed at Caltech using the STS markers that have been mapped to the target regions by Los Alamos National Laboratory. The positive BACs were characterized by sizing, FISH mapping, BAC end sequencing and sequence matches, and restriction fingerprint analysis. For the clones submitted for complete sequencing, genomic Southern blot analysis was performed to confirm the colinearity of the clones with the genomic DNA. 51 BAC clones in this region have been completely sequenced by TIGR. We post a comprehensive summary of the screening and characterization data for this and all other projects on our Web site, http://www.tree.caltech.edu.
We use the AceDraw program developed by the CS140 team at Caltech for the construction and updating of the contig map. This tool allows freehand, real-scale map drawing using various data and human judgement, and is capable of communicating with databases including ACeDB. We make extensive use of STS content, restriction fingerprint data, and BAC end sequence matches to establish clone-to-clone contiguity. For gap closure, we also used BAC end probes, OVERGOes, and additional BAC libraries such as RPCI.
74. Completing the Sequence-Ready Map of Chromosome 19
Laurie Gordon, Anca Georgescu, Mari
Christensen, Sha Hammond, Hummy Badri, Bernadette Lato, Matthew Groza,
Linda Ashworth, Mark Wagner, and Anne Olsen
Chromosome 19 is the most GC-rich chromosome, suggesting an especially high gene density. This prediction is supported by transcript mapping results (Deloukas et al., Science 282, 744-746, 1998), which indicate that chromosome 19 has the highest number of gene-based STSs relative to size of all the chromosomes. Thus this chromosome should be an extremely rewarding sequencing target in terms of gene discovery and elucidation of gene structure and organization.
We are nearing completion of a sequence-ready map of chromosome 19. The current map consists of 72 BAC/cosmid contigs with an average size of 710 kb. The contigs have been ordered along the chromosome by high resolution FISH, so their location is well defined relative to the cytogenetic map. The ordered contigs span a total of 51 Mb, or 93% of the non-centromeric portion of the chromosome. The average size of remaining gaps is an estimated 80 kb. For gap closure, probes developed from the ends of contigs are hybridized to high-density BAC colony filters, and positive BACs are incorporated into the existing map by analysis of restriction digests.
All contigs have been restriction mapped with EcoRI, resulting in a high-resolution restriction map of almost an entire chromosome. The distribution of EcoRI sites varies along the chromosome, with relatively larger fragments more common in light band regions. Several EcoRI polymorphisms between clones from different sources have been detected in the process of assembling restriction maps. The average depth of coverage of restriction mapped contigs is 8.5-fold. The high depth of coverage and mix of cosmid and BAC clones generally enables selection of an optimum set of spanning clones with minimum overlap for sequencing. About 30 Mb of chromosome 19 have been sequenced or are currently in the sequencing queue. The average size of contigs being sequenced is 830 kb. An average overlap between sequence tiling path clones of 10% is estimated from the map. All sequence tiling path clones are digested with three additional restriction enzymes to provide data for confirmation of final sequence assembly. Updated chromosome 19 data, including all restriction maps, are available on the LLNL Genome Center web site at http://bbrp.llnl.gov/bbrp/genome/html/chrom_map.html.
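Selecting a set of spanning clones with minimum overlap can be illustrated with the classic greedy interval-cover algorithm. This is a sketch of the general idea, not the LLNL selection procedure; the clone names and coordinates are invented, and overlap is expressed here as a fraction of the downstream clone's length, which is an assumption about how the 10% figure is defined.

```python
# Greedy minimal tiling path through one contig.
# Clone names and coordinates (kb) in the example are invented.

def tiling_path(clones):
    """clones: list of (name, start_kb, end_kb) within one contig.

    Among clones that start at or before the current covered end, take the
    one reaching farthest right. Raises ValueError if the contig has a gap.
    """
    clones = sorted(clones, key=lambda c: c[1])
    target_end = max(c[2] for c in clones)
    path, covered, i = [], clones[0][1], 0
    while covered < target_end:
        best = None
        while i < len(clones) and clones[i][1] <= covered:
            if best is None or clones[i][2] > best[2]:
                best = clones[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in contig at {covered} kb")
        path.append(best)
        covered = best[2]
    return path


def mean_overlap_fraction(path):
    """Average overlap between neighbours, as a fraction of the next clone's length."""
    overlaps = [(a[2] - b[1]) / (b[2] - b[1]) for a, b in zip(path, path[1:])]
    return sum(overlaps) / len(overlaps) if overlaps else 0.0


clones = [("c1", 0, 40), ("b1", 0, 150), ("c2", 30, 70), ("b2", 120, 290),
          ("b3", 135, 300), ("c3", 280, 320), ("b4", 285, 450)]
path = tiling_path(clones)
print([name for name, _, _ in path])
print(f"mean overlap: {mean_overlap_fraction(path):.1%}")
```

A deeper clone coverage gives the greedy step more candidates at each position, which is consistent with the abstract's point that high depth makes a low-overlap spanning set easier to find.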
Work performed under the auspices of the US DOE by Lawrence Livermore National Laboratory under contract W-7405-ENG-48.
75. High-Throughput Multiplexed Fluorescent-Labeled Fingerprinting of BAC Clones
Yan Ding1, Martin D.
Johnson2, Wang Q. Chen3, Gigi E Park1,
Yujin Chen1, and Hiroaki Shizuya1
The Human Genome Project has entered a large-scale sequencing stage. Numerous maps, for example STS or EST content maps and YAC contig maps, have now been constructed across all human chromosomes. However, the building of physical maps that involve large contigs of sequencing units such as BACs still lags far behind. In order to fill this gap in a timely fashion, we have been working on high-throughput contig assembly of BAC clones through multiplexed fluorescent-labeled fingerprinting. Projects that rely on restriction fragment size lists to establish relationships between clones require extensive overlaps due to the limited resolution inherent in these strategies. We are currently developing a fingerprinting method using certain class IIS restriction enzymes, which cut DNA a few base pairs away from their recognition sites and generate 5' overhangs consisting of 1 to 5 unknown bases. With the recessed strand serving as primer, these overhangs can be sequenced using modified fluorescent dideoxy terminator sequencing reagents1. When a fifth dye is used for an internal lane size standard, each fragment can be characterized by both its size and the sequence of its terminal 1 to 5 bases. This enhanced detail greatly increases the power to detect minimum overlap. Using this method, it is theoretically possible to identify overlaps of 15% for a project with 10,000 clones. The increased information content of each fragment also assists in assembling accurate overlaps and establishing a minimal tiling path with fewer clones. We will report our latest efforts on optimizing the fingerprinting technique, on software development to interpret the data, and on a test of 500 to 1000 BAC clones, which have been identified with markers located in a 20 Mb region from 16p13.1 to 16p11.2, to assemble contigs.
1 Brenner, S. and Livak, K. J. 1989, Proc. Natl. Acad. Sci. USA, 86:8902-8906.
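The gain from adding end sequence to fragment size can be seen in a toy comparison of two fingerprints. The sketch below is only an illustration of the matching idea: the fragment sizes, overhang sequences, size tolerance, and function name are all invented, and the simple greedy one-to-one matching is an assumption rather than the authors' software.

```python
# Comparing two clone fingerprints in which each fragment carries a size and
# the 1-5 bases read from its overhang. Fragment values and the size
# tolerance are made-up illustration values.

def shared_fragments(fp_a, fp_b, size_tol=2, use_end_seq=True):
    """fp_a, fp_b: lists of (size_bp, end_seq). Count one-to-one matches.

    Two fragments match when their sizes differ by at most size_tol and,
    if use_end_seq is set, their overhang sequences are identical.
    """
    unused = list(fp_b)
    shared = 0
    for size_a, seq_a in fp_a:
        for frag in unused:
            size_b, seq_b = frag
            if abs(size_a - size_b) <= size_tol and (not use_end_seq or seq_a == seq_b):
                unused.remove(frag)   # each fragment may be matched only once
                shared += 1
                break
    return shared


clone_a = [(212, "ACGT"), (305, "GG"), (418, "TCA"), (530, "A")]
clone_b = [(213, "ACGT"), (304, "CT"), (419, "TCA"), (777, "GAT")]

print("size only:        ", shared_fragments(clone_a, clone_b, use_end_seq=False))
print("size + end bases: ", shared_fragments(clone_a, clone_b, use_end_seq=True))
```

In this toy case the 305/304 bp pair matches by size alone but is rejected once the overhang bases are compared, which is the kind of chance match that otherwise forces size-only methods to demand larger overlaps.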
76. Progress Towards a High Resolution Sequence-Ready Map of Human Chromosome 5
Steve Lowry, Ze Peng, Duncan Scott, Yiwen
Zhu, Mei Wang, Roya Hosseini, Michele Bakis, Joel Martin, Ingrid Plajzer-Frick,
Jeff Shreve, and Jan-Fang Cheng
The high resolution map of chromosome 5 at JGI/LBNL began at the distal portion of the long arm. The region was chosen because it contains a cluster of cytokine growth factors (IL3, IL4, IL5, IL9, IL12, IL13, GM-CSF, FGFA, M-CSF) and receptor genes (GRL, ADRB2, M-CSFR, PDGFR) and was thought likely to yield related genes through full sequence analysis. The expanded region also contains a number of disease genes. These include genes associated with susceptibility to asthma, schizophrenia, corneal dystrophies, low-frequency hearing loss, Treacher-Collins syndrome, various types of myeloid disorders including acute myeloid leukemia, Cockayne syndrome, spinal muscular atrophy, split hand/split foot (DSS1), and polyposis coli. The putative colorectal cancer tumor suppressor MCC, Zinc-finger Protein 131 associated with lymphadenogenesis, and the Leukemia Inhibitory Factor Receptor (LIFR) are other disease-associated genes in the region.
The isolation of BACs is based on a combination of colony hybridization and PCR approaches using STSs obtained mostly from public databases. Contigs are expanded by end-sequence STS walking. Contigs are oriented using STSs developed from known genes and ordered genetic and RH markers. All clones are sized by pulsed-field gel electrophoresis, and their map locations are confirmed by fluorescent in situ hybridization. The size of overlaps between BACs is determined by comparison of restriction fragments from a single endonuclease digest.
We have so far mapped 2463 clones to 5q. Ninety-four percent of these are BACs. A total of 2341 STSs have been employed in the contig forming process. Over 50% of the STSs were derived from clone ends. We have in excess of 120 contigs from the distal 65 Mb of 5q, ranging in size from 200 kb to 4.2 Mb.
Clones with minimal overlap that form contigs as determined by the STS content and restriction maps are selected for sequencing. To date, 390 clones on the q arm of chromosome 5 have entered the sequencing pipeline, totaling approximately 45 Mb of unique target or 71% of the clone insert total.
Detailed information on STS and restriction maps can be found at our Web site: (http://www-hgc.lbl.gov/human-maps.html)
77. High Throughput Fingerprinting and Contig Assembly to Supply Sequence Ready Templates to the JGI-PSF
Linda Meincke, Robert Sutherland,
Connie Campbell, Joe Fawcett, Phil Jewett, Lynn Clark, Cliff Han, Larry
Deaven, and Norman Doggett
LANL's mapping goal for FY99 is to produce 48 Mb of sequence ready maps (40% of the JGI production goal). Our primary target is the q arm of chromosome 16, for which approximately 5000 BAC clones have been identified. A probe-content map is first constructed with the use of overgo probes (see abstract by Cliff Han for details). Gaps in this map are closed by PCR-screening with BAC-end STSs of a single-tier pooled BAC library (see abstract by David Torney for details). Clones are selected for fingerprinting based on map order, and DNA is prepared in 96 well deepwell dishes. This DNA is restricted with EcoRI, also in a 96 well format, and is run on twelve fingerprinting gels. Gels are stained with ethidium bromide, and images are captured on the Biorad Fluor-S MultiImager system and scored using the Bio Image Advanced Quantifier software. Fragment data is generated and organized into Excel spreadsheets and processed into a Sybase mapping database. Input files for contig assembly are generated from the Sybase database and passed to GRAM, a program that provides graphical representation and utilizes algorithms to assist in contig assembly. Minimal tiling sets of clones are then selected from the GRAM contigs and these are fingerprinted with two additional enzymes. Overlap among the tiling set is confirmed with the additional fragment data using GRAM. These confirmed clones are then released for sequencing.
Supported by the US DOE, OBER under contract W-7405-ENG-36.