W.H. Benner and J.M. Jaklevic
Human Genome Group; Engineering Science Department; Lawrence Berkeley National Laboratory; University of California; Berkeley, CA 94720
510/486-7194, Fax: -5857, firstname.lastname@example.org
Mass spectrometry is an instrumental method capable of producing rapid analyses with high mass accuracy. When applied to genome research, it is an attractive alternative to gel electrophoresis. At present, routine DNA analysis by mass spectrometry is limited to small DNA fragments. In contrast to other mass spectrometry facilities, which emphasize the development of ladder sequencing, we are exploring the application of mass spectrometry to procedures that identify short sequences. This approach helps the molecular biologists associated with LBL's Human Genome Center to identify redundant sequences and vector contamination in clones rapidly, thereby improving sequencing efficiency. We are also attempting to implement a rapid mass spectrometry-based screening procedure for PCR products.
The implementation of these applications requires that the performance of matrix-assisted laser desorption/ionization (MALDI) and electrospray mass spectrometry be improved. Our focus is the development of new ion detectors that will advance the state of the art of each of these two types of spectrometers. One of the limitations in applying mass spectrometry to DNA analysis is the poor efficiency with which conventional electron multipliers detect large ions, a problem most apparent in MALDI-TOF-MS. To solve this problem, we are developing alternative detection schemes that rely on heat-pulse detection. The kinetic energy of impacting ions is converted into heat when the ions strike a detector, and we are attempting to measure such heat pulses indirectly. We are developing a type of cryogenic detector called a superconducting tunnel junction device, which responds to the phonons produced when ions strike the detector. This detector does not rely on the formation of secondary electrons. We have demonstrated this type of detector to be at least two orders of magnitude more sensitive, on an area-normalized basis, than microchannel plate ion detectors. This development could extend the upper mass limit of MALDI-TOF-MS and increase sensitivity.
Electrospray ion sources generate ions of megadalton DNA with minimal fragmentation, but the mass spectrometric analysis of these large ions usually leads only to a mass-to-charge distribution. If the ion charge were known, actual mass data could be determined. To address this problem, we are developing a detector that will simultaneously measure the charge and velocity of individual ions. We have been able to mass analyze DNA molecules in the 1 to 10 MDa range using charge-detection mass spectrometry. In this technique, individual electrospray ions are directed to fly through a metal tube that detects their image charge. Simultaneous measurement of their velocity provides a way to measure their mass when ions of known energy are sampled. Several thousand ions can be analyzed in a few minutes, thus generating statistically significant mass values for the ions in a sample population. We are attempting to apply this technology to the analysis of PCR products.
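The relation between charge, velocity, and mass described above can be sketched as follows. This is only an illustration, not the authors' data-reduction code: it assumes that "known energy" means a known energy per unit charge, written here as an effective accelerating voltage, and the function name, tube length, and example values are hypothetical.

```python
# Minimal sketch of the charge-detection mass calculation for one ion.
# Assumption: the ion's kinetic energy equals z * e * V, where V is an
# effective accelerating voltage (hypothetical parameter, not from the abstract).

E_CHARGE = 1.602176634e-19    # elementary charge in coulombs
DALTON = 1.66053906660e-27    # unified atomic mass unit in kilograms


def ion_mass_da(charge_e, tube_length_m, transit_time_s, accel_voltage_v):
    """Return the ion mass in daltons.

    charge_e        -- number of elementary charges read from the image-charge signal
    tube_length_m   -- length of the detector tube
    transit_time_s  -- time the ion spends inside the tube
    accel_voltage_v -- energy per unit charge, expressed as a voltage
    """
    velocity = tube_length_m / transit_time_s
    # Energy balance: z*e*V = (1/2)*m*v^2, so m = 2*z*e*V / v^2
    mass_kg = 2.0 * charge_e * E_CHARGE * accel_voltage_v / velocity ** 2
    return mass_kg / DALTON


if __name__ == "__main__":
    # Hypothetical ion: 1000 charges, 5 cm tube, 3403 m/s, 300 V per charge
    print(ion_mass_da(1000, 0.05, 0.05 / 3403.0, 300.0))
```

Because each ion yields its own charge and velocity, repeating the calculation over several thousand ions gives a mass distribution rather than only a mass-to-charge distribution.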
DOE Contract No. DE-AC03-76SF00098.
Mass Spectrometer for Human Genome Sequencing
Chung-Hsuan Chen, Steve L. Allman, and K. Bruce Jacobson
Oak Ridge National Laboratory; Oak Ridge, TN 37831
423/574-5895, Fax: -2115, email@example.com
The objective of this program is to develop an innovative fast DNA sequencing technology for the Human Genome Project. It can also be applied to fast screening of genetic and contagious diseases, DNA fingerprinting, and environmental impact analysis.
The approach of this program is to replace conventional gel electrophoresis sequencing methods with lasers and mass spectrometry. The present gel sequencing method usually takes hours to days to complete a DNA analysis or sequencing run, since DNA segments of different lengths must be separated in a dense gel. With the laser desorption mass spectrometry (LDMS) approach, DNA segments of various sizes are separated in the vacuum chamber of a mass spectrometer. Thus, the time taken to separate DNA of various sizes is less than one second, compared to hours with other methods.
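To give a feel for the time scale of separation in vacuum, the sketch below estimates an ion's flight time. The abstract does not say which analyzer is used, so this assumes a simple linear time-of-flight geometry; the drift length, accelerating voltage, and fragment mass are hypothetical values chosen only for illustration.

```python
import math

# Minimal sketch, assuming a linear time-of-flight analyzer (an assumption;
# the abstract only states that separation happens in the vacuum chamber).

E_CHARGE = 1.602176634e-19    # elementary charge in coulombs
DALTON = 1.66053906660e-27    # unified atomic mass unit in kilograms


def flight_time_s(mass_da, charge_e, accel_voltage_v, drift_length_m):
    """Drift time of an ion accelerated through accel_voltage_v.

    From z*e*V = (1/2)*m*v^2 the velocity is sqrt(2*z*e*V/m),
    so the flight time over the drift length L is L * sqrt(m / (2*z*e*V)).
    """
    mass_kg = mass_da * DALTON
    return drift_length_m * math.sqrt(
        mass_kg / (2.0 * charge_e * E_CHARGE * accel_voltage_v)
    )


if __name__ == "__main__":
    # Hypothetical case: ~9,000 Da fragment, single charge, 20 kV, 1 m drift
    print(flight_time_s(9000.0, 1, 20000.0, 1.0))
```

Under these assumed values the flight time is on the order of tens of microseconds, which is consistent with the abstract's statement that separation takes less than one second.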
Recently, we successfully demonstrated sequencing of short DNA segments with this approach. We have also succeeded in using LDMS for fast screening of cystic fibrosis, identifying both a point mutation and a deletion associated with the disease. In addition, we have had preliminary success in using LDMS for DNA fingerprinting. Thus, LDMS is expected to emerge as a new and important biotechnological tool for DNA analysis.
DOE Contract No. DE-AC05-84OR21400.
Genomic Sequence Comparisons
Harvard Medical School; Boston, MA 02115
617/432-0503 or -7562, Fax: -7266
The first objective of this project is completion of an automated system to sequence DNA using electrophore mass-tag (EMT) primers for dideoxy sequencing. The prototype machine will contain a 60-capillary array with 400 EMT-labeled sequence ladders per capillary. The system is designed to use 100-fold less reagent and have 500-fold higher speed (1000 bases per sec per instrument) than current sequencing technology. EMTs are cleaved and laser-desorbed from membranes for subsequent detection by EC-TOF mass spectrometry. The second objective is to overcome the limitations of purely hypothetical annotation of the growing number of reading frames in new genome sequences. We measure gene product levels and interactions using DNA microarrays, whole-genome in vivo footprinting, and crosslinking.
Our approach applies system integration of instrumentation, organic chemistry, molecular biology, electrophoresis, and software to the task of increasing sequencing accuracy and efficiency. Likewise, we integrate these and other instruments with the needs of acquiring and annotating large-scale microbial and human genomic sequences and population polymorphisms.
To establish functions for new genes, we use large-scale phenotyping by multiplexed growth competition assays, both by targeted deletion and by saturation insertional mutagenesis. We will continue to develop a system to sequence DNA using electrophore mass-tags (EMTs). We will establish genome-scale experimental methods for sequence annotation.
The most significant findings in 1995-1996 were 1) demonstration of the use of electrophore mass-tags in dideoxy sequencing; 2) development of an IR-laser desorption method and model; 3) a novel dsDNA microarray synthesis strategy; 4) a new amplifiable differential display for whole-genome in vivo DNA-protein interactions; and 5) establishment and application of a microbial DNA-protein interaction database.
DOE Grant No. DE-FG02-87ER60565.
A PAC/BAC End-Sequence Data Resource for Sequencing the Human Genome: A 2-Year Pilot Study
Roswell Park Cancer Institute; Buffalo, NY 14263
716/845-3168, Fax: -8849, firstname.lastname@example.org
Large-scale sequencing of the human genome requires the availability of high-fidelity clones with large genomic inserts and a mechanism for finding clones with minimal overlaps within the clone collections. The first need can be satisfied with the bacterial artificial chromosome libraries (PACs and BACs) that already exist and with further such libraries now being developed. However, a cost-effective way of establishing high-resolution contig maps for the human genome has not yet been established. Recently, a new approach to virtual screening for overlapping clones has been proposed by several research groups and has been discussed eloquently in a manuscript by Venter et al., 1996 (Nature). We will implement this approach for use with our human PAC and BAC libraries and use the first year as a pilot stage. The goal of the one-year pilot is to prove the feasibility of large-scale end sequencing and to demonstrate its usefulness.
The first goal will be met by sequencing the ends of 40,000 clones from our existing PAC library and from BAC libraries currently being developed under NIH funding within our laboratory. The end sequencing will be based on our new DOP-vector PCR procedure (Chen et al., 1996, Nucleic Acids Research 24, 2614-2616). All sequence data will be made available through public databases (GSDB, GDB, GenBank) and will also become BLAST-searchable through the UTSW WWW site of our collaborator, Glen Evans. In view of our currently underdeveloped informatics structure, we do not expect to provide BLAST search access through our own web site during the pilot phase.
To prove the usefulness of available end sequences, we will prepare a chromosome 14-enriched clone collection from our current 20-fold deep PAC library. To detect the chromosome 14 clones, we will use as hybridization probes a set of 1,000 mapped STS markers available from Paul Dear (MRC, Cambridge, UK), the approximately 600 markers present in the Whitehead map, and the in situ mapped BAC and PAC clones available from Julie Korenberg. We will hybridize with these existing markers in probe pools specific for regions of chromosome 14, and thus isolate region-enriched PAC clone collections.
Assuming that the clone collections will be at least 50% specific for chromosome 14 (that is, up to 50% false positives) and will include most of the chromosome 14 PACs from our library, a collection of about 35,000 clones is expected.
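The size of the expected collection can be checked with a short back-of-the-envelope calculation. The 20-fold library depth and 50% specificity come from the text, and the roughly 100 Mbp region size is the coverage figure the abstract quotes for the chromosome 14 set; the mean PAC insert size of 115 kbp is an assumption made here for illustration and is not stated in the abstract.

```python
# Minimal sketch of the clone-count estimate.
# ASSUMPTION: mean insert size of 115 kbp (not given in the abstract).


def expected_collection_size(region_bp, library_depth, mean_insert_bp, specificity):
    """Estimate how many clones a region-enriched collection will contain.

    region_bp      -- size of the targeted region in base pairs
    library_depth  -- fold coverage of the library (e.g. 20 for 20-fold deep)
    mean_insert_bp -- average insert size per clone
    specificity    -- fraction of collected clones that truly map to the region
    """
    true_clones = library_depth * region_bp / mean_insert_bp
    # False positives inflate the collection by a factor of 1 / specificity
    return true_clones / specificity


if __name__ == "__main__":
    print(round(expected_collection_size(100e6, 20, 115e3, 0.5)))
```

With these inputs the estimate lands close to the "about 35,000 clones" figure; a different assumed insert size shifts the result proportionally.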
Hence, the bulk of the end sequences obtained during the first year will be derived from the chromosome 14-enriched set and should result in a sequence-ready clone collection covering about 100 Mbp of the human genome. The purity of the chromosome 14 PAC collection will be characterized in a number of different ways, including testing with independent markers not used as probes and FISH analysis of a representative set of PAC clones. To test the usefulness of the end-sequence resource, the Sanger Centre will sequence chromosome 14 PACs from our collection and identify overlapping clones by virtual screening, using our end-sequence database.
If overlapping clones cannot be found with the expected level of redundancy in the end-sequence database, we will screen the original PAC library with probes or STS markers derived from the sequenced PAC clones.
Subcontract under Glen Evans' DOE Grant No. DE-FC03-96ER62294.
Multiple-Column Capillary Gel Electrophoresis
Department of Chemistry; University of Alberta; Edmonton, Alberta, Canada T6G 2G2
403/492-2845, Fax: -8231, email@example.com
The objective of this project is to develop high-throughput DNA sequencing instrumentation. A two-dimensional arrayed capillary electrophoresis instrument is under development.
We have developed multiple-capillary DNA sequencers. These instruments have several important attributes. First, by operating at electric fields greater than 100 V/cm, we are able to separate DNA sequencing fragments rapidly and efficiently. Second, the separation is performed with 3%T 0%C polyacrylamide. This low-viscosity, non-crosslinked matrix can be pumped from the capillary and replaced with fresh material when required. Third, we operate the capillary at elevated temperature. High-temperature operation eliminates compressions, speeds the separation, and increases the read length. Fourth, our fluorescence detection cuvette is manufactured locally by means of microlithography technology. These detection cuvettes provide robust and precise alignment of the optical system. Currently, 5-, 16-, and 90-capillary instruments are in operation in our lab; 32- and 576-capillary devices are under development. Fifth, we use both avalanche photodiode photodetectors and CCD cameras for high-sensitivity detection. We have obtained detection limits of 120 fluorescein molecules injected onto the capillaries. High sensitivity is important in detecting the low-concentration fragments generated in long sequencing reads. This combination of low-concentration acrylamide, high-temperature operation, and high-sensitivity detection allows separation of fragments over 800 bases in length in 90 minutes.
DOE Grant No. DE-FG02-91ER61123.
DNA Sequencing with Primer Libraries
John J. Dunn, Laura-Li Butler-Loffredo, and F. William Studier
Biology Department; Brookhaven National Laboratory; Upton, NY 11973
516/344-3012, Fax: -3407, firstname.lastname@example.org
Primer walking using oligonucleotides selected from a library is an attractive strategy for large-scale DNA sequencing. Strings of three adjacent hexamers can prime DNA sequencing reactions specifically and efficiently when the template is saturated with a single-stranded DNA-binding protein (1), and a library of all 4,096 hexamers is manageable. We would like to be able to sequence directly on 35-kbp fesmid templates, but the signal from a single round of synthesis is relatively weak and triple-hexamer priming has not yet been adapted for cycle sequencing. We reasoned that a hexamer library might be used for cycle sequencing if combinations of hexamers could be selectively ligated by using other hexamers as the template for alignment. In this way, the longer primers needed for cycle sequencing could be generated easily and economically without the need for complex machines for de novo synthesis.
We found that ordered ligation of 3 hexamers to form an 18-mer occurs readily on a template of the 3 complementary hexamers (offset by three base pairs) that can base pair unambiguously to form a double-stranded complex of indefinite length (2). Each hexamer forms three complementary base pairs with each of two other hexamers, generating complementary chains of contiguous hexamers with strand breaks staggered by three bases. Two adjacent hexamers in the chain to be ligated contain 5' phosphate groups and the others are unphosphorylated. Both T4 and T7 DNA ligase can ligate the phosphorylated hexamers to their neighbors in such a complex at hexamer concentrations in the 50-100 µM range, producing an 18-mer and leaving three unphosphorylated hexamers. The products of these ligation reactions can be used directly for fluorescent cycle sequencing of 35-kbp templates.
Unambiguous ligation requires that alternative complexes with perfect base pairing not be possible with the combination of hexamers used. Since the combination of hexamers is dictated by the sequence of the desired ligation product, some oligonucleotides cannot be produced unambiguously by this method. However, 82.5% of all possible 18-mers could potentially be generated starting with a library of all 4096 hexamers, more than adequate for high throughput DNA sequencing by primer walking.
DOE Grant No. DE-AC02-76CH00016.
(2) Dunn, J. J., Butler-Loffredo, L. and Studier, F. W. Ligation of hexamers on hexamer templates to produce primers for cycle sequencing or the polymerase chain reaction. Anal. Biochem. 228, 91-100 (1995).
Rapid Preparation of DNA for Automated Sequencing
John J. Dunn, Matthew Randesi, and F. William Studier
Biology Department; Brookhaven National Laboratory; Upton, NY 11973
516/344-3012, Fax: -3407, email@example.com
We have developed a vector, referred to as a fesmid, for making libraries of approximately 35-kbp DNAs for mapping and sequencing. The high-efficiency lambda packaging system is used to generate libraries of clones. These clones are propagated at very low copy number under control of the replication and partitioning functions of the F factor, which helps to stabilize potentially toxic clones. A P1 lytic replicon under control of the lac repressor allows amplification simply by adding IPTG. The cloned DNA fragment is flanked by packaging signals for bacteriophage T7, and infection with an appropriate T7 mutant packages the cloned sequence into T7 phage particles, leaving most of the vector sequence behind. The size of the vector portion is such that genomic fragments packageable in lambda (normal capacity 48.5 kbp) should also be packaged in T7 (normal capacity 40 kbp).
We have made fesmid libraries of several bacterial DNAs, including Borrelia burgdorferi (the cause of Lyme disease), Bartonella henselae (the cause of cat scratch fever), E. coli, B. subtilis, H. influenzae, and S. pneumoniae, some of which have been reported to be difficult to clone in cosmid vectors. Human DNA is also readily cloned in these vectors. Brief amplification followed by infection with a gene 3 and 17.5 double mutant of T7, which is defective in replicating its own DNA, produces lysates in which essentially all of the phage particles contain the cloned DNA fragment. Simple techniques yield high-quality DNA from these phage particles. Primers for direct sequencing from the ends of fesmid clones have been made.
Primer walking from the ends of fesmid clones could be an efficient way to sequence bacterial genomes, YACs, or other large DNAs without the need for prior mapping of clones. The ends of fesmids from a random library provide multiple sites to initiate primer walking. Merging of the elongating sequences from different clones will simultaneously generate the sequence of the original DNA and determine the order of the clones. The packaged fesmid DNAs are a convenient size for multiple restriction analyses to confirm the accuracy of the nucleotide sequence.
DOE Grant No. DE-AC02-76CH00016.
A PAC/BAC End-Sequence Database for Human Genomic Sequencing
Glen A. Evans, Dave Burbee, Chris Davies, Trey Fondon, Tammy Oliver, Terry Franklin, Lisa Hahner, Shane Probst, and Harold R. (Skip) Garner
Genome Science and Technology Center and McDermott Center for Human Growth and Development; University of Texas Southwestern Medical Center at Dallas; Dallas, TX 75235-8591
214/648-1660, Fax: -1666, firstname.lastname@example.org
While current plans call for completing the human genome sequence in 2003, major obstacles remain in achieving the speed and efficiency necessary to complete the task of mapping and sequencing. To address this problem, we proposed a novel approach to large-scale construction of sequence-ready physical clone maps of the human genome using end-specific sequence sampling. An earlier pilot project was carried out to develop a GSS (genomic sequence sampled) map of human chromosome 11 by sequencing the ends of 17,952 chromosome 11-specific cosmids. This chromosome 11-specific end-sequence database allows rapid and sensitive detection of clone overlaps for chromosome 11 sequencing.
In this project, we propose to evaluate the utility of PAC and BAC end sequences representing the entire human genome as a tool for complete, high-accuracy mapping and sequencing. This approach uses total genomic PAC/BAC libraries (constructed by P. de Jong, RPCI), followed by sequencing of both ends of each clone in the library and limited regional mapping of a subset of clones, by fluorescence in situ hybridization (FISH), as sequencing nucleation points.
To initiate regional analysis, a single clone would be sequenced by shotgun or primer-directed sequencing, the entire sequence used to search the end-sequence database for overlapping clones, and the minimally overlapping clones selected for extending the sequence. This approach would allow rational and efficient simultaneous mapping and sequencing, and would expedite the coordination and exchange of information between large and small groups participating in the human genome project.
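The search step described above can be illustrated with a toy example. This sketch uses exact substring matching on both strands as a simple stand-in for a real similarity search such as BLAST; the data layout, clone names, and sequences are hypothetical and far shorter than real end reads.

```python
# Minimal sketch of "virtual screening": look for clone end sequences inside
# a fully sequenced clone. Exact matching replaces a real alignment search.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def screen_end_database(query, end_db):
    """Return (clone_id, end_label, position, strand) for each end found in query.

    end_db maps clone_id -> {end_label: end_sequence}.
    Hits are sorted so that those nearest the right-hand end of the query
    come first; such clones overlap the query least on that side.
    """
    hits = []
    for clone_id, ends in end_db.items():
        for end_label, end_seq in ends.items():
            for strand, probe in (("+", end_seq), ("-", revcomp(end_seq))):
                pos = query.find(probe)
                if pos >= 0:
                    hits.append((clone_id, end_label, pos, strand))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    sequenced_clone = "GATTACAGGCTTCAGTCCATGCAAGT"
    toy_db = {
        "PAC-1": {"left": "TACAGGCT", "right": "TTTTTTTT"},
        "PAC-2": {"left": "ACTTGCAT", "right": "GGGGGGGG"},
    }
    for hit in screen_end_database(sequenced_clone, toy_db):
        print(hit)
```

In practice the choice of the next clone would also weigh insert size, orientation of the matching end, and sequence repeats, none of which this sketch models.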
In this pilot project we are carrying out automated end sequencing of approximately 40,000 PAC and BAC clones representing the entire human genome, as well as about 500 PAC clones localized to human chromosomes 11 and 15. The clones and the resulting end-sequence database will be used to 1) nucleate regions of interest for large-scale sequencing, concentrating on regions of chromosomes 11 and 15; 2) correspond with regions mapped by other methods to confirm the mapping accuracy; and 3) evaluate the use of random clone end-sequence libraries. DNA sequencing is being carried out in an entirely automated fashion using a Beckman/Sagian robotic system, ABI 377 automated sequencers, and automated sequence data processing, annotation, and publication using a Hewlett Packard/Convex superparallel computer located at the UTSW genome center. FISH analysis of a sample of PAC clones has been carried out and defines the potential chimera rate in existing PAC libraries as less than 1.2%. This effort will be coordinated with the efforts of other groups carrying out PAC and BAC library construction, PAC and BAC end sequencing, and FISH analysis to avoid duplication of effort and to provide a comprehensive end-sequence library and data set for use by the international human genome sequencing effort.
DOE Grant No. DE-FC03-96ER62294.
Automated DNA Sequencing by Parallel Primer Walking
Glen A. Evans, Dave Burbee, Chris Davies, Jeff Schageman, Shane Probst, Terry Franklin, Ken Kupfer, and Harold R. (Skip) Garner
Genome Science and Technology Center and McDermott Center for Human Growth and Development; University of Texas Southwestern Medical Center at Dallas; Dallas, TX 75235-8591
214/648-1660, Fax: -1666, email@example.com
The development of efficient mapping approaches coupled with high-throughput, automated DNA sequencing remains one of the key challenges of the Human Genome Project. Over the past few years, a number of strategies to expedite clone-by-clone DNA sequencing have been developed, including efficient shotgun sequencing, sequencing of nested deletions, and transposon-mediated primer insertion. We have developed a novel sequencing strategy applicable to high-throughput, large-scale genomic analysis, based on DNA sequencing primed directly on cosmid templates using custom-designed, automatically synthesized oligonucleotide primers. This approach of directed primer "walking" would allow the number of sequencing reactions to be greatly reduced and the efficiency of sequencing to be vastly improved over traditional shotgun sequencing.
Custom primer design has been carried out using software we developed for prediction of "walking" primers directly from the output of ABI 377 automated DNA sequencers, and the output is used to automatically program synthesis of the custom primers on 96- or 192-channel oligonucleotide synthesizers constructed at UTSW. Automated operation of the sequencing system is thus possible, where the results of each sequencing reaction are used to predict, synthesize, and carry out appropriate extension reactions for downstream "walking". An automated prototype system has been assembled in which dye-terminator DNA sequencing can be carried out from 96 cosmid templates simultaneously, followed by prediction of oligonucleotide "walking" primers for extending the sequence of each fragment and programming of an attached 96-channel oligonucleotide synthesizer to initiate a second round of sequencing. Using a set of nested cosmids covering 800 kb at 5X redundancy, primer-directed sequencing should allow completion of 800 kb of finished, high-accuracy DNA sequence in 8 to 16 cycles. Furthermore, coupling of automated DNA sequencing instrumentation to DNA sequence analysis programs and multichannel oligonucleotide synthesizers will allow almost complete automation of the sequencing process and the development of instrumentation for completely unattended DNA sequencing.
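The primer-prediction step can be illustrated with a deliberately simple sketch. It is not the UTSW software, whose selection rules the abstract does not describe; it merely scans back from near the 3' end of a read for a window with acceptable GC content and no ambiguous bases. The primer length, the number of trailing bases skipped as low quality, and the GC limits are all hypothetical parameters.

```python
# Minimal sketch of choosing a "walking" primer from the end of a read.
# All thresholds below are illustrative assumptions.


def pick_walking_primer(read, primer_len=20, skip_3prime=30,
                        gc_min=0.40, gc_max=0.60):
    """Return (start, primer) for the 3'-most acceptable window, or None.

    read        -- base-called sequence, 5' to 3', uppercase
    primer_len  -- length of the primer to synthesize
    skip_3prime -- trailing bases ignored as likely low quality
    gc_min/max  -- allowed GC fraction of the primer
    """
    last_start = len(read) - skip_3prime - primer_len
    # Walk backwards so the chosen primer extends the sequence as far as possible
    for start in range(last_start, -1, -1):
        window = read[start:start + primer_len]
        if "N" in window:
            continue  # skip windows containing uncalled bases
        gc_fraction = (window.count("G") + window.count("C")) / primer_len
        if gc_min <= gc_fraction <= gc_max:
            return start, window
    return None


if __name__ == "__main__":
    toy_read = "A" * 10 + "ACGT" * 5 + "T" * 30
    print(pick_walking_primer(toy_read))
```

The returned primer has the same sense as the read, so synthesis primed from it continues downstream of the sequence already determined. A production tool would also check melting temperature, self-complementarity, and uniqueness within the template.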
DOE Grant No. DE-FG03-95ER62055.
*Parallel Triplex Formation as Possible Approach for Suppression of DNA-Viruses Reproduction
V.L. Florentiev, A.K. Shchyolkina, I.A. Il'icheva, E.N. Timofeev, and S.Yu Tsybenko
Engelhardt Institute of Molecular Biology; Russian Academy of Sciences; Moscow 117984, Russia
Fax: +7095/1351405, firstname.lastname@example.org
It is well known that homopurine or homopyrimidine single-stranded oligonucleotides can bind to homopurine-homopyrimidine sequences of double-stranded DNA to form stable three-stranded helices. In such triplexes the two identical strands have antiparallel orientation. We denote these triplexes as "antiparallel" or "classical" triplexes.
Particular interest in triplexes has arisen from the elegant idea of using them as sequence-specific tools for targeted action on DNA duplexes. Triplex-forming oligonucleotides have been shown to be potentially useful as regulators of gene expression and, consequently, as therapeutic (antiviral) agents.
A significant limitation to the practical application of antiparallel triplexes is the requirement for homopurine tracts in target DNA sequences. Numerous investigations have slightly expanded the repertoire of triplex-forming sequences but have not completely removed this limitation.
It was recently shown that during homologous recombination promoted by RecA a triple-stranded DNA intermediate is formed. Such a structure is a new form of the triple helix. In sharp contrast with the "classical" triplexes, its third strand is parallel to the identical strand of the Watson-Crick duplex. We denote this structure as a "parallel" triplex. Until recently, the parallel triplex had been obtained only by deproteinization of joint molecules generated by recombination proteins.
We first obtained experimental results (chemical probing, melting curves, and fluorescent dye binding) that provide convincing evidence for protein-independent formation of the parallel triplex, and then confirmed this fact with FTIR data. Because the parallel triplex can be formed for any sequence, it might be an "ideal" potential tool for sequence-specific recognition of DNA. Unfortunately, the low stability of parallel triplexes prohibits practical application of these structures.
Earlier we found that propidium iodide selectively stabilizes parallel triplexes. This finding is the basis of a new approach to stabilizing parallel triplexes that we are now developing. The approach uses a targeting oligonucleotide that contains, in an internucleotide linkage, an alkyl insert coupled through a linker to an intercalating ligand. The length of the linker was chosen, by molecular dynamics calculations, to allow the ligand to intercalate in the same stacking contact.
A preliminary study showed that the presence of intercalating inserts considerably increases the stability of DNA duplexes. We are now investigating in detail the effect of such modification of targeting oligonucleotides on the stability of parallel triplexes.
DOE Grant No. OR00033-93CIS005.
2. Dagneaux, C., Gousset, H., Shchyolkina, A. K., Ouali, M., Lettelier, R., Liquier, J., Florentiev, V. L. and Taillander, E. (1996) Parallel and antiparallel AAT intramolecular triple helices. Nucleic Acids Res., 24, 4506-4512.
3. Borisova, O. F., Shchyolkina, A. K., Timofeev, E. N., Tsybenko, S. Yu., Mirzabekov, A. and Florentiev, V. L. (1995) Stabilization of parallel triplex with propidium iodide. J. Biomol. Struct. Dynam., 13, 15-27.
4. Timofeev, E. N., Smirnov I. P., Haff, L. A., Tishchenko, E. I., Mirzabekov, A. D. and Florentiev, V. L. (1996) Methidium intercalator inserted into synthetic oligonucleotides. Tetrahedron Lett., 37, 8467-8470.
Advanced Automated Sequencing Technology: Fluorescent Detection for Multiplex DNA Sequencing
Andy Marks, Tony Schurtz, F. Mark Ferguson, Leonard DiSera, Alvin Kimball, Diane Dunn, Doug Adamson, Peter Cartwright, Robert B. Weiss,1 and Raymond F. Gesteland1
Department of Human Genetics and 1Howard Hughes Medical Institute; University of Utah; Salt Lake City, UT 84112
Gesteland: 801/581-5190, Fax: /585-3910
Automation of a large-scale sequencing process based on instrumentation for automated DNA hybridization and detection is a focal point of our research. Recently, we have devised a method for amplifying fluorescent light output on nylon membranes by using an alkaline phosphatase-conjugated probe system combined with a fluorogenic alkaline phosphatase substrate. The amplified signal allows sensitive detection of DNA hybrids in the sub-femtomole/band range.
On the basis of this detection chemistry, automated devices for detecting DNA on blotted microporous membranes using enzyme-linked fluorescence, termed Probe Chambers, have been built. The fluorescent signal is collected by a CCD camera operating in a Time Delay and Integration mode. Concentrated solutions of probes and enzymes are stored in Peltier-cooled, septum-sealed vials and delivered by syringe pumps residing in a gantry-style pipetting robot. Fluorescence excitation is generated by a mercury arc lamp acting through a fiber optic "light line". Three 30 x 63 centimeter sequencing membranes can be processed simultaneously, currently revealing up to 108 lane sets per multiplex cycle. A probing cycle is completed approximately every eight hours.
Integration of the Probe Chamber into the production pipeline is accomplished through connections to the laboratory database. A critical component of a high-throughput sequencing laboratory is the software for interfacing to instrumentation and managing workflow. The Informatics Group of the Utah Genome Center has designed and implemented an innovative system for automating and managing laboratory processes. This software allows the model of workflow to be easily defined. Given such a model, the system allows the user to direct and track the flow of laboratory information. The core of the system is a generic, client-server process management engine that allows users to define new processes without the need for custom programming. Based on these definitions, the software routes information to the next process, tracks the progress of each task, performs any automated operations, and provides reports on these processes. To further increase the usefulness of our laboratory information system, we have augmented it with handheld mobile computing devices (Apple Newtons) that link to the database through RF networking cards.
Base-calling software has been developed to support our automated, large-scale sequencing effort. First-stage sequence calling identifies putative bands; however, depending on the number of reader indel errors (26%), merging first-stage sequence without the aid of cutoff information can be difficult. To improve our base calling, we have employed fuzzy logic to establish confidence metrics. The logic produces a confidence metric for each band using band height, width, uniqueness, shape, and the gaps to adjacent bands. The confidence metric is then used to identify the largest block of highest-quality sequence to be merged.
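The two steps described above, scoring each band and then selecting the best block, can be sketched as follows. The abstract does not say how the fuzzy rules are combined, so this example assumes the common fuzzy AND (the minimum of the per-feature membership grades) and a hypothetical confidence threshold; the feature names and values are illustrative only.

```python
# Minimal sketch: fuzzy confidence per band, then the longest run of
# high-confidence bands. Combination rule and threshold are assumptions.


def band_confidence(memberships):
    """Combine per-feature membership grades (each in [0, 1]) with a fuzzy AND."""
    return min(memberships.values())


def best_block(confidences, threshold=0.8):
    """Return (start, end) of the longest run with confidence >= threshold.

    end is exclusive; (0, 0) is returned when no band reaches the threshold.
    """
    best = (0, 0)
    run_start = None
    # A sentinel below any threshold closes a run that reaches the last band
    for i, conf in enumerate(list(confidences) + [float("-inf")]):
        if conf >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return best


if __name__ == "__main__":
    print(band_confidence({"height": 0.9, "width": 0.7, "gap": 0.8}))
    print(best_block([0.9, 0.5, 0.85, 0.9, 0.95, 0.3, 0.9, 0.9]))
```

Using the minimum means a band is only as trustworthy as its weakest feature; other combination rules (products, weighted averages) would give a softer metric.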
DOE Grant No. DE-FG03-94ER61817.
Resource for Molecular Cytogenetics
Donna Albertson, Colin Collins, Joe Gray,1 Steven Lockett, Daniel Pinkel,1 Damir Sudar, Heinz Ulrich Weier, and Manfred Zorn
Lawrence Berkeley National Laboratory; Berkeley, CA 94720 and 1University of California; San Francisco, CA 94143
Gray: 415/476-3461, Fax: -8218, email@example.com
Pinkel: 415/476-3659, Fax: -8218, firstname.lastname@example.org
The purpose of the Resource for Molecular Cytogenetics is to develop the molecular cytogenetic techniques, instruments, and reagents needed to facilitate large-scale genomic DNA sequencing and to assist in the identification and functional characterization of genes involved in disease susceptibility, genesis, and progression. This work is closely coordinated with the LBNL Human Genome Program and directly supports research in the LBNL Life Sciences Division and the UCSF Cancer Center. Work currently is in four areas: a) genome analysis technology, b) probe development and physical map assembly, c) digital imaging microscopy, and d) informatics. The Resource acts as a catalyst for research in several areas, so some support comes from industry, the NIH, and NIST.
Probe development and physical map assembly: The Resource maintains a list of over a thousand publicly available probes suitable for molecular cytogenetic studies. These include approximately 600 probes each selected by the Resource to contain a known STS or EST. Probes selected by the Resource can be requested through our web page.
The Resource also participates in the development of low- and high-resolution physical maps to facilitate analysis and characterization of genetic abnormalities associated with human disease. Low-resolution mapping panels with probes distributed at intervals of a few megabases have been completed this year for chromosomes 1, 2, 3, 7, 8, 10, and 20. The mapped STSs associated with these probes facilitate movement from low- to high-resolution physical maps. STS content mapping and DNA fingerprinting have been applied to develop a high-resolution, sequence-ready map composed of BAC and P1 clones for the ~1-Mb region of chromosome 20 between WI9227 and D20S902. This region is amplified in ~10% of human breast cancers. Approximately 300 kb of this region has been sequenced by the LBNL Human Genome Program.
Quantitative DNA fiber mapping (QDFM) has been developed this year to facilitate high resolution analysis of genomic overlap between cloned probes. In this approach, cloned DNA molecules are uniformly stretched during drying by the hydrodynamic action of a receding meniscus. The position of specific sequences along the stretched DNA molecules is visualized by fluorescence in situ hybridization (FISH) and measured by digital image analysis. QDFM has been used to map gamma alpha transposons, plasmid or cosmid probes along P1 molecules, and P1 or PAC clones along straightened YAC molecules with few kilobase resolution. QDFM is now being studied to determine its utility in the assembly of minimally overlapping, sequenceready contigs, assessment of the integrity of cloned BACs and mapping of subclones prepared for directed DNA sequencing along the clone from which they were derived.
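Because the molecules are described as uniformly stretched, a measured image distance can in principle be converted to kilobases by scaling against a molecule of known length. The following is a minimal sketch of that conversion, not the Resource's software; the function name and all numbers are hypothetical.

```python
def fiber_position_kb(probe_start_px, probe_end_px, fiber_length_px, fiber_length_kb):
    """Convert a probe's pixel extent along a stretched fiber to kb coordinates.

    Assumes uniform stretching, so that one pixel corresponds to the same
    number of kilobases everywhere along the fiber.
    """
    if fiber_length_px <= 0:
        raise ValueError("fiber length in pixels must be positive")
    kb_per_px = fiber_length_kb / fiber_length_px
    return probe_start_px * kb_per_px, probe_end_px * kb_per_px


# Hypothetical measurement: a probe spanning pixels 120 to 200 on a
# 400-pixel image of an 80 kb P1 molecule.
start_kb, end_kb = fiber_position_kb(120, 200, 400, 80.0)
print(f"probe maps to {start_kb:.1f}-{end_kb:.1f} kb")
```

In practice the scale factor would be averaged over many molecules, since individual fibers can break or stretch unevenly.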
Genome analysis technology: The Resource has participated in the development of comparative genomic hybridization (CGH) as a tool for detection and mapping of changes in relative DNA sequence copy number in humans and mouse. This year, CGH to arrays of cloned probes (CGHa) has been demonstrated. This is advantageous because it allows aberrations to be mapped with resolution determined by the genomic spacing of probes on the array. CGHa also is attractive since it appears to be linear over a relative copy number range of at least 10^4 between the two nucleic acid samples being compared.
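The comparison underlying CGHa can be illustrated as a per-probe ratio of the signals from the two samples. The sketch below is an assumption-laden illustration rather than the Resource's analysis package: the log2 scale is a common convention that the text does not specify, and the probe names and intensities are invented.

```python
import math


def copy_number_ratios(test_signal, reference_signal):
    """Return per-probe log2(test / reference) fluorescence ratios."""
    ratios = {}
    for probe, test in test_signal.items():
        ref = reference_signal[probe]
        if test <= 0 or ref <= 0:
            continue  # skip probes without usable signal in either channel
        ratios[probe] = math.log2(test / ref)
    return ratios


# Hypothetical background-corrected intensities for three array probes.
test = {"probeA": 400.0, "probeB": 100.0, "probeC": 50.0}
ref = {"probeA": 100.0, "probeB": 100.0, "probeC": 100.0}
for probe, value in copy_number_ratios(test, ref).items():
    print(probe, round(value, 2))
```

Plotting these ratios against the genomic position of each probe gives a copy number profile whose resolution is set by the probe spacing.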
The Resource has participated in the development of FISH approaches to analysis of relative gene expression in normal and aberrant tissues. FISH with cloned or predicted expressed sequences, previously developed in C. elegans, is now being applied to the assessment of expression of human genes. The C. elegans work suggests a throughput of several dozen sequences per month. Information from this approach will be important in assessment of the function of newly discovered genes, including those predicted from DNA sequencing.
Digital imaging microscopy: The Resource supports work in microscopy, image processing and analysis methods needed for CGH and CGHa, 3D FISH, tissue analysis, rare event detection, multicolor image acquisition, aberration scoring for biodosimetry, and analysis of FISH to DNA fibers. Developments this year include an improved package for CGH and prototype systems for analysis of DNA fibers, CGHa arrays and semiautomatic segmentation of nuclei in three dimensions.
Informatics: The Resource maintains a web site at http://rmc-www.lbl.gov that summarizes information about mapped probes. Probes developed by the Resource can be requested directly through this page. In addition, the Resource has developed a Web page for exchange of genomic, genetic and biologic information between geographically dispersed collaborators. The page, under password control, carries information about physical maps, genomic sequence, sequence annotation, and gene expression images.
DOE Contract No. DE-AC03-76SF00098.
DNA Sample Manipulation and Automation
Center for Genome Research; Whitehead Institute/Massachusetts Institute of Technology; Cambridge, MA 02139
617/252-1910, Fax: -1902, email@example.com
The objective of this project is to develop a high-throughput, fully automated robotic device for the complete automation of the sequencing process. We also aim to further develop DNA sequencing electrophoresis systems and to integrate these devices with our robotics.
We have built the Sequatron, an integrated, robotic device which automates the tasks of DNA purification and setup of thermal cycle sequencing reactions. The major component of our system is an articulated CRS 255A robotic arm which is track mounted. The deck of the robot contains several new or modified XYZ robotic workstations, a novel thermal cycler with automated heated lids, carousels, and custom-built plate feeders.
Biochemically, we have employed our solid-phase reversible immobilization (SPRI) technique to isolate and manipulate the DNA throughout the process.
Specifically, we have set up the Sequatron to isolate DNA from M13 phage or crude PCR products using the same protocol and procedures. From M13 phage we obtain approximately 1 µg of DNA per well, which is sufficient for multiple sequencing reactions.
The current throughput of the system is 80 microtiter plates every 24 hours, taking samples from M13 phage supernatants or crude PCR products to sequence-ready samples. Recently, new enzymes, new energy transfer primers and higher density microtiter plates have opened up possible increases to in excess of 25,000 samples per 24 hour period.
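As a rough illustration of these throughput figures, the arithmetic can be sketched as follows. The plate formats (96 and 384 wells) are assumptions, since the text does not state the well count of the plates in use.

```python
import math


def samples_per_day(plates_per_day, wells_per_plate=96):
    """Samples processed per day, assuming every well holds one sample."""
    return plates_per_day * wells_per_plate


def plates_needed(samples, wells_per_plate):
    """Whole plates required to hold a given number of samples."""
    return math.ceil(samples / wells_per_plate)


print(samples_per_day(80))         # 80 plates of an assumed 96 wells: 7680
print(plates_needed(25_000, 96))   # plates per day for 25,000 samples: 261
print(plates_needed(25_000, 384))  # the same load in assumed 384-well plates: 66
```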
DOE Grant No. DE-FG02-95ER62099.
Construction of a Genome-Wide Characterized Clone Resource for Genome Sequencing
Leroy Hood, Mark D. Adams,1 and Melvin Simon2
University of Washington; Seattle, WA 98195-7730
206/616-5014, Fax: /685-7301, firstname.lastname@example.org
1The Institute for Genomic Research; Rockville, MD 20850; email@example.com
2California Institute of Technology; Pasadena, CA 91125; firstname.lastname@example.org
Bacterial artificial chromosomes (BACs) represent the state-of-the-art cloning system for human DNA because of their stability and ease of manipulation. Venter, Smith and Hood (Nature 381:364-366, 1996) have proposed a strategy based on the use of sequences from the ends of all clones in a deep coverage BAC library to produce a sequence-ready set of clones for the human genome. We propose to demonstrate the effectiveness of this strategy by performing a directed test, initially on chromosomes 16 and 22, and continuing on to chromosome 1. All available markers on chromosome 16 (including the large number of soon-to-be-available radiation hybrid markers) will be used to screen the existing 8x BAC library at CalTech. This will serve to evaluate the quality of the library in terms of representation of broad chromosomal regions. A similar procedure will be used for chromosome 22, except that the existing BAC map will be used to select more evenly spaced markers for screening, including use of end-sequence markers from the current chromosome 22 BAC map constructed in the Simon lab. Each identified clone will be rearrayed from the library and end sequenced. This information will dovetail nicely with ongoing sequencing projects at TIGR and the Sanger Centre, which will in turn provide additional information on the average degree of BAC overlap detectable by this method, the degree of interference with genome-wide repeats, and the appropriate use of fingerprinting as an early or late addition to the end-sequencing information. In addition, we will develop and implement cost-effective, high-throughput methods of preparing and end-sequencing BAC DNA that are suitable for scaling to characterization of the full 400,000 clones necessary for characterization of a 15x human BAC library.
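The relationship between library depth, clone count and insert size quoted above can be checked with simple arithmetic. This sketch assumes a haploid genome size of 3 x 10^9 bp, a figure not given in the text; the function names are hypothetical.

```python
GENOME_BP = 3.0e9  # assumed haploid human genome size, not stated in the abstract


def implied_insert_bp(coverage, genome_bp, n_clones):
    """Mean insert size implied by a library of n_clones at a given coverage."""
    return coverage * genome_bp / n_clones


def mean_end_spacing_bp(genome_bp, n_clones):
    """Average genomic spacing of end sequences, two ends per clone."""
    return genome_bp / (2 * n_clones)


print(implied_insert_bp(15, GENOME_BP, 400_000))  # 112500.0 bp per insert
print(mean_end_spacing_bp(GENOME_BP, 400_000))    # 3750.0 bp between end sequences
```

Under these assumptions, end-sequencing every clone in such a library would place a sequence tag roughly every few kilobases along the genome, which is what makes the ends usable for choosing minimally overlapping clones.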
DOE Grant No. DE-FC03-96ER62299.
DNA Sequencing Using Capillary Electrophoresis
Barry L. Karger
Barnett Institute; Northeastern University; Boston, MA 02115
617/373-2867 or -2868, Fax: -2855
During the past year, we have made major progress in the design of a replaceable polymer matrix for DNA sequencing and the development of the first generation multiple capillary array of 12 capillaries. We also implemented ultrafast separation of dsDNA (e.g., 30 sec for complete resolution of the standard φX174-HaeIII restriction fragments).
In the separation of sequencing reaction products, we completed a study on the role of polymer molecular weight and concentration. Using linear polyacrylamide (LPA), the polymer with which we have had our most success, we have achieved 1000 base read lengths in 1 1/2 hrs. Optimization of column length, electric field and column temperature (50 °C) was required. Using emulsion polymerization, we are now able to produce LPA powders with MW of ~10^4 kDa. The fully replaceable matrix is very powerful for rapid sequencing of long reads.
We have successfully implemented a 12-capillary array instrument and are using it to study issues of ruggedness in routine sequencing. As part of this, we have developed a sample clean-up procedure which reduces all reactions to a similar state in terms of sample solution prior to injection. The results of this work have led to the design of a 96-capillary array that we will implement over the next year.
We have also achieved very fast separations of ss- and dsDNA using short capillaries and very high yields. For example, sequencing 300 bases in 3-4 min has been shown, as well as very rapid mutational analysis. Implementation of such speeds on a capillary array will create an instrument for high throughput automated analysis.
DOE Grant No. DE-FG02-90ER60985.
Ultrasensitive Fluorescence Detection of DNA
Richard A. Mathies and Alexander N. Glazer
Departments of Chemistry and Molecular and Cell Biology; University of California; Berkeley, CA 94720
510/642-4192, Fax: -3599, email@example.com
The overall goal of this project is to develop new fluorescence labeling methods, separation methods and detection technologies for DNA sequencing and genomic analysis.
Highlights along with representative publications are given below.
Energy Transfer Primers. Families of sequencing and PCR primers have been developed that contain both fluorescence donor and acceptor chromophores.1 These labeled primers with optimized excitation and emission properties provide from 2- to 20-fold enhanced signal intensities in automated DNA sequencing with slab gels and with capillary arrays.2 The reduced spectral cross talk of these ET primers also makes them valuable in PCR product and STR analyses.3
New Intercalation Dye Labels. A new family of heterodimeric bisintercalation dyes has been synthesized exploiting the concept of fluorescence energy transfer between two different cyanine intercalators.4 By tailoring the spectroscopic properties of the dyes, labels with intense emission above 650 nm following 488 nm excitation have been fabricated. By adjusting the spacing linker between the two dyes, the binding affinity has also been optimized. These molecules are useful for noncovalent multiplex labeling of dsDNA in a wide variety of multicolor analyses.5
Capillary Electrophoresis Chips. Capillary and capillary array electrophoresis systems have been photolithographically fabricated on 2 x 3 inch glass substrates.6 These devices provide high quality electrophoretic separations of dsDNA fragments and DNA sequencing reactions with a 10-fold increase in speed.7 Arrays of up to 32 capillaries on a single chip have been fabricated.
Single DNA Molecule Fluorescence Burst Detection. A confocal fluorescence system has been used to demonstrate that single molecule fluorescence burst counting can be used to detect CE separations of dsDNA fragments. Fragments as small as 50 bp can be counted and mass sensitivities as low as 100 molecules per electrophoresis band are possible. This technology should be valuable in incipient cancer and trace pathogen detection.8
DOE Grant No. DE-FG03-91ER61125.
2. Ju, J., Glazer, A. N. and Mathies, R. A. Energy Transfer Primers: A New Fluorescence Labeling Paradigm for DNA Sequencing and Analysis, Nature Medicine 2, 180-182 (1996).
3. Wang, Y., Ju, J., Carpenter, B., Atherton, J. M., Sensabaugh, G. F. and Mathies, R. A. High-Speed, High-Throughput THO1 Allelic Sizing Using Energy Transfer Fluorescent Primers and Capillary Array Electrophoresis, Analytical Chemistry 67, 1197-1203 (1995).
4. Benson, S. C., Zeng, Z., and Glazer, A. N. Fluorescence Energy Transfer Cyanine Heterodimers with High Affinity for Double-Stranded DNA. I. Synthesis and Spectroscopic Properties, Anal. Biochem. 231, 247-255 (1995).
5. Zeng, Z., Benson, S. C., and Glazer, A. N. Fluorescence Energy Transfer Cyanine Heterodimers with High Affinity for Double-Stranded DNA. II. Applications to Multiplex Restriction Fragment Sizing, Anal. Biochem. 231, 256-260 (1995).
6. Woolley, A. T. and Mathies, R. A. Ultra-High-Speed DNA Fragment Separations Using Microfabricated Capillary Array Electrophoresis Chips, Proc. Natl. Acad. Sci. U.S.A. 91, 11348-11352 (1994).
7. Woolley, A. T. and Mathies, R. A. Ultra-High-Speed DNA Sequencing Using Capillary Array Electrophoresis Chips, Analytical Chemistry 67, 3676-3680 (1995).
8. Haab, B. B. and Mathies, R. A. Single Molecule Fluorescence Burst Detection of DNA Fragments Separated by Capillary Electrophoresis, Analytical Chemistry 67, 3253-3260 (1995).
Joint Human Genome Program Between Argonne National Laboratory and the Engelhardt Institute of Molecular Biology
Andrei Mirzabekov,1,2 G. Yershov,1,2 Y. Lysov,2 V. Barsky,2 V. Shick,2 and S. Bavikin1
1Argonne National Laboratory; Argonne, IL 60439
630/252-3161 or -3361, Fax: /252-3387
2Engelhardt Institute of Molecular Biology; 117984 Moscow, Russia
In 1996, more than thirty U.S. and Russian researchers participated in the joint Human Genome Program between Argonne National Laboratory and the Engelhardt Institute of Molecular Biology on the development of sequencing by hybridization with oligonucleotide microchips (SHOM).
During this year, about twenty Russian scientists worked at ANL for periods of 3 months to 1 year. In this period, 3 papers have been published, 5 have been accepted for publication, and 3 more have been submitted.
The main research efforts of the group have been concentrated in three directions:
I. Improvement of SHOM technology.
II. Development of SHOM for the needs of Human Genome Program.
III. Development of new approaches based on SHOM technology.
I. Improvement of SHOM technology
As a major result of the work in this direction, simple, reliable and effective methods of microchip manufacturing, sample preparations, and quantitative hybridization analysis by fluorescence microscopy have been developed or improved.
1. The photopolymerization technique for production of micromatrices of polyacrylamide gel pads on a hydrophobicized glass surface was improved to become a simple, highly reproducible and inexpensive procedure (7).
2. A new and cheaper chemistry of oligonucleotide immobilization has been developed and introduced for production of more durable microchips. It is based on the use of amino-oligonucleotides and aldehyde gels instead of 3-methyluridine-oligonucleotides and hydrazide gels (3).
3. A four-pin robot has been constructed with computer control of the production of every microchip element. High quality microchips with 4100 immobilized oligonucleotides have been manufactured, and the complexity of the microchips can easily be scaled up to a few tens of thousands of elements.
4. A two-color fluorescence microscope has been equipped for regular use with proper mechanics and software. It allows investigators to routinely use automatic quantitative monitoring of hybridization on the whole microchip and to measure the kinetics of hybridization as well as the melting curves of duplexes formed with all microchip oligonucleotides (1,2,8).
5. A four-color fluorescence microscope has been manufactured, and four suitable fluorescence dyes are at present being selected.
6. Chemical methods for introducing several fluorescence dyes into DNA and RNA, with or without fragmentation, have been developed and are regularly used in SHOM experiments (4).
7. A theory describing the kinetics of hybridization with gel-immobilized oligonucleotides has been developed (5).
8. Simple and relatively inexpensive equipment (around $10,000 per set) has been produced for manual manufacturing of microchips and fluorescence measurement of hybridization, which will enable every laboratory to produce and practically use microchips containing up to 100 immobilized oligonucleotides or other compounds.
II. Application of SHOM
Although the main goal of our SHOM development is to produce a simple de novo sequencing procedure, a number of other SHOM applications have been tested as intermediate steps in the SHOM research.
1. Sequence analysis and sequencing
A number of technical problems must be solved for de novo sequencing; they are much less stringent for comparative sequence analysis. Among these:
a) Reliable discrimination of perfect and mismatched duplexes. We have significantly improved the discrimination by decreasing the length of hybridized oligonucleotides to 6- and 8-mers (1, 7) and by using 5-mers in "contiguous stacking" hybridization (1,2). Essential improvement was also achieved by automatically measuring the melting curves for duplexes formed in each microchip element, calculating their thermodynamic parameters (free energy, enthalpy and entropy) for different regions of the melting curves, and comparing them with the corresponding parameters for perfect duplexes. In addition, highly reliable discrimination was achieved by using two-color fluorescence microscopy and by quantitative comparison of the hybridization patterns of a known DNA or synthetic oligonucleotides and the DNA under study, labeled with different fluorophores (8).
b) Dependence of hybridization efficiency on the GC content and the length of the duplex. We have equalized the efficiency by choosing a proper concentration for each immobilized oligonucleotide (6,7) and also by increasing the effective length of immobilized oligonucleotides by adding, at one or both of their ends, 5-nitroindole as a universal base or a mixture of four bases (2).
c) Interference of hairpins and other structures in DNA with the less stable duplexes formed when the DNA hybridizes with the comparatively short immobilized oligonucleotides of the microchip. This interference was decreased by fragmentation of the analyzed DNA or RNA sample in the course of incorporation of a fluorescence label (4). We have also tested chemically linking an intercalator to immobilized oligonucleotides, which stabilizes their base pairing with DNA over hairpin formation (10).
d) Necessity to increase the microchip complexity for sequencing long DNA stretches. As an alternative, further development of so-called contiguous stacking hybridization was shown to improve the efficiency of an 8-mer microchip up to that of a 13-mer microchip, so that DNA of several kilobases in length could be sequenced by SHOM (2).
e) 6-mer microchips for sequencing and sequence analysis. We have now come to the stage of manufacturing microchips containing 4,096 (i.e., all possible) 6-mers. The control tests partly described above have shown that these microchips can be effectively used for sequence analysis, mutation diagnostics and detection of sequencing mistakes made by conventional gel-sequencing methods. We hope that after demonstrating the efficiency of 6-mer microchips, we shall be able to get sufficient financial support for production of the microchip with all 65,536 8-mers.
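The microchip sizes quoted above follow from counting all sequences of length k over four bases (4^6 = 4,096 and 4^8 = 65,536). The sketch below enumerates them and lists the k-mers contained in a short target, which is the idealized hybridization pattern a complete chip would report. It is an illustration only: the target sequence is hypothetical, and for simplicity the code reports the target's own k-mers rather than the complementary immobilized probes.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Every possible sequence of length k, i.e. the content of a complete chip."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def spectrum(sequence, k):
    """Set of k-mers occurring in a sequence (error-free hybridization assumed)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


print(len(all_kmers(6)))   # 4096 elements on a complete 6-mer chip
print(4 ** 8)              # 65536 elements on a complete 8-mer chip

target = "ACGTTGCAAC"      # hypothetical 10-base target
print(sorted(spectrum(target, 6)))
```

Reconstructing a sequence from its spectrum becomes ambiguous once a (k-1)-mer repeats within the target, which is one reason longer effective probe lengths are pursued for long DNA stretches.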
2. Mutation diagnostics and gene polymorphism analysis
The improvements described above have been introduced for reliable ("Yes" or "No" mode) identification of single-base changes in human genomic DNA. The efficiency of SHOM has been demonstrated for identification of a number of β-thalassemia mutations (1,2,8) and HLA allele variations in the human genome.
3. Identification of microorganisms and gene expression monitoring
Bacterial microchips have been manufactured and tested. Their ability to reliably identify a number of bacterial strains in a sample has been demonstrated (6). The chips, containing oligonucleotides complementary to specific regions of 16S ribosomal RNA, were hybridized with samples of rRNA, total RNA, DNA and RNA transcripts of PCR-amplified genomic rDNA. Similar preliminary experiments demonstrated the efficiency of SHOM for monitoring gene expression.
III. Development of new approaches based on the SHOM technology
1. Enzymatic modification of nucleic acids on selected elements of the oligonucleotide chip. The gel pads of the oligonucleotide chip are separated from each other by a hydrophobic glass surface. This prevents cross-talk between chip elements when a drop of solution is applied to specified elements. At the same time, the high porosity of the gel allows diffusion of large proteins into the gel. We have demonstrated that immobilized oligonucleotides can be enzymatically phosphorylated and ligated with a contiguously stacked 5-mer after hybridization with DNA. A walking sequencing procedure using stacked pentanucleotides was proposed that is based on enzymatic ligation and phosphorylation on oligonucleotide chips (9).
2. DNA fractionation on oligonucleotide chips. Owing to the same properties, oligonucleotide chips can be used for fractionation of DNA after its hybridization with complementary oligonucleotides on the chip. A new procedure for sequencing long DNA pieces was proposed that is based on fractionation of DNA on fractionating oligonucleotide chips followed by sequencing of the isolated DNA by SHOM on sequencing microchips. The procedure allows the investigator to skip cloning and mapping of long DNA pieces (9).
It appears that the major technical problems of SHOM have for the most part been solved, and this technology can already be applied for sequence analysis and for checking the accuracy of conventional sequencing methods. A number of other applications in the Human Genome Program are within the reach of SHOM, such as mutation screening, gene polymorphism studies, detection of microorganisms, and gene expression studies. Application of SHOM to de novo DNA sequencing requires manufacturing of more complicated microchips and improvement of some other, already available methods.
DOE Contract No. W-31-109-Eng-38.
2. Parinov S., Barsky V., Yershov G., Kirillov Eu., Timofeev E., Belgovskiy A., Mirzabekov A. DNA sequencing by hybridization to microchip octa- and decanucleotides extended by stacked pentanucleotides. Nucl. Acids Res. 1996. Vol. 24. N 15. P. 2998-3004.
4. Prudnikov D., Mirzabekov A. Chemical methods of DNA and RNA fluorescent labelling. Nucl. Acids Res. 1996., in press.
5. Livshits M., Mirzabekov A. Theoretical analysis of the kinetics of DNA hybridization with gel-immobilized oligonucleotides. Biophys. J. 1996. Vol. 71, in print.
6. Guschin D., Mobarry B., Proudnikov D., Stahl D., Rittmann B., Mirzabekov A. Oligonucleotide microchips as genosensors for determinative and environmental studies in microbiology. Applied and Environmental Microbiology, in print.
7. Guschin D., Yershov G., Zaslavsky A., Gemmell A., Shick V., Lysov Yu., Mirzabekov A. A simple method of oligonucleotide microchip manufacturing and properties of the microchips. Submitted for publication.
8. Drobyshev A., Mologina N., Shik V., Pobedimskaya D., Yershov G., Mirzabekov A. Sequence analysis by hybridization with oligonucleotide microchip: identification of beta-thalassemia mutations. Gene, in print.
9. Dubiley S., Kirillov Eu., Lysov Yu., Mirzabekov A. DNA fractionation, sequence analysis and ligation of immobilized oligomers on oligonucleotide chips. Submitted for publication.
10. Timofeev E., Smirnov I.P., Haff L.A., Tishchenko E.I., Mirzabekov A.D., Florentiev V.L. Methidium Intercalator Inserted into Synthetic Oligonucleotides. Tetrahedron Letters 1996, v.37, N47, p.8467.
High-Throughput DNA Sequencing: SAmple SEquencing (SASE) Analysis as a Framework for Identifying Genes and Complete Large-Scale Genomic Sequencing
Robert K. Moyzis and Jeffrey K. Griffith1
The human chromosome 5 and 16 physical maps (Doggett et al., Nature 377:Suppl:335-365, 1995; Grady et al., Genomics 32:91-96, 1996) provide the ideal framework for initiating large-scale DNA sequencing. These physical mapping studies have shown clearly that gene density in humans will vary greatly. For example, band 16q21, consisting of 8 Mb of DNA, has no genes or trapped exons assigned to it, as yet. In contrast, band 16p13.3 has an extremely high density of coding regions in the DNA examined to date (i.e., multiple genes/cosmid). Given this wide variation in gene density and current sequencing costs, we propose that newly targeted genomic regions should be analyzed first by a "Lewis and Clark" exploratory approach, before committing to full length DNA sequencing. We are using a SAmple SEquencing (SASE) approach to rapidly generate aligned sequences along the chromosome 5 and 16 physical maps. SASE analysis is a method for rapidly "scanning" large genomic regions at minimal cost, identifying and localizing most genes. Briefly, individual cosmids are partially digested with Sau3A and 3 kb fragments are recloned into double-strand sequencing vectors. By sequencing both ends of a 1X sampling of these recloned fragments along with end sequences of the cosmid, 70% sequence coverage is achieved with 98% clone coverage. The majority of this clone coverage is ordered by the relationship between the subclone end sequences. These ordered sequences are ideal substrates for directed sequencing strategies (for example, primer walking or transposon sequencing). SASE analysis has been initiated on the 40 Mb short arm of chromosome 16 and the 45 Mb short arm of chromosome 5. We propose to make SASE sequences, along with feature annotation, publicly available through GSDB.
Such data are sufficient to allow PCR amplification of the sequenced region from GSDB submissions alone, eliminating the need for extensive clone archiving and distribution. This will allow for the effective "democratization" of the genome, enabling numerous laboratories to share and contribute to the growing genome databases.
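The sequence and clone coverage figures quoted for SASE can be compared with the standard idealized model of random sampling, in which the fraction of a target covered at redundancy c is 1 - exp(-c). This is a sketch under that Poisson assumption, not the authors' own calculation; the quoted values also include cosmid end sequences and need not match the model exactly.

```python
import math


def fraction_covered(redundancy):
    """Expected fraction of a target covered at a given sampling redundancy."""
    return 1.0 - math.exp(-redundancy)


def redundancy_for(fraction):
    """Sampling redundancy needed to reach a given expected coverage fraction."""
    return -math.log(1.0 - fraction)


print(round(fraction_covered(1.0), 3))  # 0.632 of the bases at 1X sequence sampling
print(round(redundancy_for(0.70), 2))   # 1.2X would give 70% under this model
print(round(redundancy_for(0.98), 2))   # 3.91X of subclone inserts would give 98%
```

The model illustrates why a light sequence sampling can still give nearly complete clone coverage: each 3 kb subclone spans far more of the cosmid than the two end reads taken from it.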
DOE Grant No. DE-FG03-96ER62298.
One-Step PCR Sequencing
Kenneth W. Porter, J. David Briley, and Barbara Ramsay Shaw
Department of Chemistry; Duke University; Durham, NC 27708
919/660-1553, Fax: -1605, firstname.lastname@example.org
A method is described to simultaneously amplify and sequence DNA using a new class of nucleotides containing boron. During the polymerase chain reaction, boron-modified nucleotides, i.e., 2'-deoxynucleoside 5'-α-[P-borano]-triphosphates,1,2 are incorporated into the product DNA. The boranophosphate linkages are resistant to nucleases, and thus the positions of the boranophosphates can be revealed by exonuclease digestion, thereby generating a set of fragments that defines the DNA sequence. The boranophosphate method offers an alternative to current PCR sequencing methods.
Single-sided primer extension with dideoxynucleotide chain terminators is avoided, with the consequence that the sequencing fragments are derived directly from the original PCR products. Boranophosphate sequencing is demonstrated with the Pharmacia and the Applied Biosystems 373A automatic sequencers, producing data that are comparable to cycle sequencing.
DOE Grant No. DE-FG02-97ER62376 and NIH Grant No. HG00782.
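The logic of reading a sequence from nuclease-resistant positions can be illustrated with a toy model. The sketch assumes four base-specific reactions, an exonuclease that digests from the 3' end and halts at the first boranophosphate, and partial incorporation so that every position of the modified base is represented in the population; these details and the template are illustrative assumptions, not taken from the abstract.

```python
def ladder_lengths(sequence, base):
    """Fragment lengths expected when digestion halts at each occurrence of `base`."""
    return [i + 1 for i, b in enumerate(sequence) if b == base]


def read_sequence(ladders):
    """Rebuild the sequence by ordering the fragments of all four ladders by length."""
    by_length = {length: base
                 for base, lengths in ladders.items()
                 for length in lengths}
    return "".join(by_length[n] for n in sorted(by_length))


template = "GATTACA"  # hypothetical product strand, written 5' to 3'
ladders = {base: ladder_lengths(template, base) for base in "ACGT"}
print(ladders)                 # one set of fragment lengths per base-specific reaction
print(read_sequence(ladders))  # GATTACA
```

The four ladders play the same role as the four lanes of a chain-terminator reaction, which is why the fragments can be run on standard automated sequencers.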
Automation of the Front End of DNA Sequencing
Lloyd M. Smith and Richard A. Guilfoyle
University of Wisconsin; Madison, WI 53706
Guilfoyle: 608/265-6138, Fax: -6780
The objective of this project is to continue developing more efficient tools and methods addressing the "front-end" processes of large-scale DNA sequencing. Our specific aims are high-throughput purification and mapping of cosmid inserts, controlled fragmentation of random inserts, direct selection vectors for cloning and sequencing, high-throughput M13 clone isolations, and high-throughput template purifications.
An approach to multi-cosmid purifications was developed using a cell harvester and binding to GF/C glass fiber filter-bottom microtiter plates. This method proved inadequate because the yields were low and the DNA was easily fragmented. In the last year we have started examining the use of triplex-affinity capture (TAC) for this purpose as applied to BACs, based on our previous success with TAC purification and restriction mapping of cosmids (1,2).
We initially proposed to control random fragmentation for shotgun cloning using CviJ1 and its methyltransferase. Instead, we are now exploring automating it by scaled-down nebulization and parallel processing.
We have made a vector, M13-102 (3,4; patented), for facilitating construction and improving the quality of M13 shotgun libraries. It allows direct selection of recombinants and dephosphorylation of inserts to reduce chimeras, and it contains universal primers for fluorescent sequencing and a triplex sequence for easy TAC purification of linearized RF DNA. We also made a version of this vector, M13-100Z, which expresses the alpha-peptide of β-gal. Its utility is in flow cytometry based clone isolation. We continue to develop these vectors to provide multiple cloning sites and insert flipping for use in the closing steps of large-scale sequencing projects.
We continue to develop high-throughput clone isolations by flow cytometric cell sorting. M13 or plasmid clones can be isolated into microtiter wells at rates of up to 2 per second using our present FacStarPlus cytometer and collection assembly. Theoretical rates are much higher. This bypasses plating onto solid media and any need for plaque/colony picking. We initially tried isolations after microencapsulation of cells in agarose gel microbeads, but with hardware and software improvements we can now distinguish positively selected transfected cells from background. Efficiency of sorting is very sensitive to detection efficiency. We continue to investigate different methods of fluorescence detection for various plasmid and M13 vector systems, including fluorogenic substrates for β-gal, fluorescent-tagged antibodies to M13 or cell surface proteins, and green fluorescent protein as a reporter.
We have been developing a solid-phase filter plate method for M13 template purifications using carboxylated polystyrene beads (Bangs Labs, IN) for automation on the Hamilton 2200. It should process 96 samples in under 30 minutes and deliver 1-2 micrograms per sample for cycle sequencing. This approach has proven superior to others we have tried with respect to amenability to automation (5,6).
Ancillary projects. We reported a method for direct fluorescence analysis of genetic polymorphisms using oligonucleotide arrays on glass supports (7), which spun off other projects including (a) enhanced discrimination by artificial mismatch hybridization (8), (b) restriction hybridization ordering of shotgun clones, and (c) restriction site indexing-PCR (RSI-PCR) (9, patent applied for). RSI-PCR is an alternative strategy to extra-long PCR which has application in large gap filling (>45 kb), differential gene expression analysis, RFLP and EST marker production, end-sequencing and others.
Our most significant findings are the following:
1. Improved direct selection M13 cloning vector
2. Rapid restriction mapping of cosmids using triple-helix affinity capture
3. High-throughput M13 template production using carboxylated beads
4. Sequencing of a cosmid encoding the Drosophila GABA receptor
5. Improved detection of sequencing clones by flow cytometry
6. RSI-PCR, a strategy to obtain mapped and sequence-ready DNA directly from up to 0.5 kb regions of a complex genome using palindromic class II restriction enzymes; bypasses conventional cloning methodology (see previous section for applications).
DOE Grant No. DE-FG02-91ER61122.
2. Ji, H., Francisco, T., Smith, L.M. and Guilfoyle, R.A. (1996) Genomics 31, 185-192.
3. Guilfoyle, R. and Smith, L.M. (1994) Nucleic Acids Res. 22, 100-107.
4. Chen, D., Johnson, A.F., Severin, J.M., Rank, D.R., Smith, L.M. and Guilfoyle, R.A. (1996) Gene 172, 53-57.
5. Kolner, D.E., Guilfoyle, R.A., and Smith, L. (1994) DNA Sequence 4, 253-257.
6. Johnson, A.F., Wang, R., Ji, H., Chen, D., Guilfoyle, R.A. and Smith, L.M. (1996) Anal Biochem 234, 83-95.
7. Guo, Z., Guilfoyle, R.A., Thiel, A.J., Wang, R. and Smith, L.M. (1994) Nucleic Acids Res, 22, 5456-5465.
8. Guo, Z., Liu, Q., and Smith, L.M. (submitted).
9. Guilfoyle, R.A., Guo, Z., Kroening, D., Leeck, C. and Smith, L.M. (submitted).
High-Speed DNA Sequence Analysis by Matrix-Assisted Laser Desorption Mass Spectrometry
Lloyd M. Smith and Brian Chait1
Department of Chemistry; University of Wisconsin; Madison, WI 53706
608/263-2594, Fax: /265-6780, email@example.com
1Rockefeller University; New York, NY 10021
Our mass spectrometry research has focused primarily on the possibility of using Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI-MS) as an alternative to conventional gel electrophoresis for DNA sequence analysis. In this approach, extension fragments generated by the Sanger sequencing reactions are separated by size and detected in the mass spectrometer in one step.
Our group has shown fragmentation to be a major factor limiting accessible mass range, sensitivity, and mass resolution in the analysis of DNA by MALDI-MS. This DNA fragmentation was shown to be strongly dependent on both the MALDI matrix and the nucleic acid sequence employed. Fragmentation is proposed to follow a pathway in which nucleobase protonation leads to cleavage of the N-glycosidic bond with base loss, followed by cleavage of the phosphodiester backbone. Modifications of the deoxyribose sugar ring by replacing the 2' hydrogen with more electron-withdrawing groups such as the hydroxyl or fluoro group were shown to stabilize the N-glycosidic bond, partially or completely blocking fragmentation at the modified nucleosides. The stabilization provided by these chemical modifications was also shown to expand the range of matrices useful for nucleic acid analysis, yielding in some cases greatly improved performance.
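To illustrate why mass resolution matters for this kind of readout, the following minimal sketch calls bases from the mass differences between adjacent peaks of a ladder. It is an idealized illustration, not the authors' software: it assumes each peak differs from the next by exactly one nucleotide residue (ignoring terminator-specific masses and fragmentation), uses standard average residue masses, and the function name `read_ladder` and the 1 Da tolerance are hypothetical.

```python
# Average masses (Da) of deoxynucleotide residues within a DNA chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def read_ladder(peak_masses, tolerance=1.0):
    """Call one base per gap between adjacent ladder peaks.

    A gap that matches no residue mass within `tolerance` is called "N".
    """
    peaks = sorted(peak_masses)
    calls = []
    for lighter, heavier in zip(peaks, peaks[1:]):
        delta = heavier - lighter
        # Pick the residue whose mass is closest to the observed gap.
        base, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - delta))
        calls.append(base if abs(mass - delta) <= tolerance else "N")
    return "".join(calls)


# Build an idealized ladder from a hypothetical 5000 Da primer.
ladder = [5000.0]
for base in "GATC":
    ladder.append(ladder[-1] + RESIDUE_MASS[base])
print(read_ladder(ladder))
```

In this table the closest pair of residues, A and T, differ by only about 9 Da, so peaks broadened or shifted by fragmentation and adducts quickly make such calls ambiguous as fragment mass grows.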
Analysis of Oligonucleotide Mixtures by Electrospray Ionization Mass Spectrometry
Richard D. Smith, David C. Muddiman, James E. Bruce, and Harold R. Udseth
Environmental Molecular Sciences Laboratory; Pacific Northwest National Laboratory; Richland, WA 99352
509/376-0723, Fax: -5824, firstname.lastname@example.org
This project aims to develop electrospray ionization mass spectrometry (ESI-MS) methods for high-speed DNA sequencing of oligonucleotide mixtures that can be integrated into an effective overall sequencing strategy. A second goal is to develop mass spectrometric methods that can be effectively utilized in post-genomic research in broad areas of DNA characterization, such as with the polymerase chain reaction (PCR) to rapidly and accurately identify single-base polymorphisms. ESI produces intact molecular ions from DNA fragments of different size and sequence with high efficiency. Our aim is to determine ESI mass spectrometry conditions that are compatible with biological sample preparation, allow efficient ionization of DNA, and allow the analysis of complex mixtures (e.g., Sanger sequencing ladders). We have developed a novel on-line microdialysis method at PNNL to remove salts, detergents, and buffers from such biological preparations as PCR and dideoxy sequencing mixtures. This has allowed rapid and efficient desalting (e.g., of samples containing 0.25 M NaCl), permitting ESI mass spectral analysis without the typically problematic Na adducts. Oligonucleotide ions are typically produced by ESI with a broad distribution of net charge states for each molecular species, which leads to difficulties in the analysis of complex mixtures. To make identification of each component in a sequencing mixture possible, the charge states of molecular ions can be reduced using gas-phase reactions.
The charge-state reduction methods being examined include: (1) reactions with organic acids and bases (in the solution to be electrosprayed and in the ESI-MS interface or the gas phase); (2) the labeling of the oligonucleotides with a designed functional group for production of molecular ions of very low charge states; and (3) the shielding of potential charge sites on the oligonucleotide phosphate/phosphodiester groups with polyamines (and the subsequent gas-phase removal of the neutral amines). In initial studies, two methods for charge-state reduction of gas-phase oligonucleotide negative ions have been tested: (1) the addition of acids and bases to the oligonucleotide solution and (2) the formation of diamine adducts followed by dissociation in the interface region [2,3]. Several methods show promise for charge-state reduction, and results have been demonstrated for series of smaller oligonucleotides. We have recently demonstrated for the first time that PCR products can be rapidly detected using ESI-MS, with significant improvements projected [4,5]. Finally, new mass spectrometric methods have been developed to provide the dynamic range expansion necessary for addressing DNA sequencing mixtures. Our overall aim is to provide a foundation for the development of an overall approach to high-speed sequencing (including rapid and precise PCR product characterization) using cost-effective, high-throughput instrumentation.
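The effect of a broad charge-state distribution on spectral complexity can be sketched with the standard relation for a deprotonated ion [M - zH]^z-, m/z = (M - z*m_H)/z. The code below is a minimal illustration under that assumption (no adducts, charge carried only by proton loss); the function names and the 6000 Da example mass are hypothetical.

```python
PROTON_MASS = 1.00728  # Da


def negative_ion_mz(neutral_mass, charge):
    """m/z of an [M - zH]^z- ion formed by losing `charge` protons."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass - charge * PROTON_MASS) / charge


def charge_state_series(neutral_mass, min_charge, max_charge):
    """Map each charge state in the range to its m/z value."""
    return {
        z: negative_ion_mz(neutral_mass, z)
        for z in range(min_charge, max_charge + 1)
    }


# A hypothetical 6000 Da oligonucleotide observed in charge states 2- to 5-.
for z, mz in charge_state_series(6000.0, 2, 5).items():
    print(f"{z}-: m/z {mz:.3f}")
```

Each species contributes one peak per charge state, so a mixture of n components observed in k charge states yields roughly n*k peaks; reducing every component to a single low charge state brings this back toward n peaks.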
DOE Contract No. DE-AC06-76RLO-1830.
"Charge State Reduction of Oligonucleotide Negative Ions from Electrospray Ionization", X. Cheng, D. C. Gale, H. R. Udseth, and R. D. Smith, Anal. Chem., 67, 586-593 (1995).
"Charge-State Reduction with Improved Signal Intensity of Oligonucleotides in Electrospray Ionization Mass Spectrometry", D. C. Muddiman, X. Cheng, H. R. Udseth, and R. D. Smith, J. Am. Soc. Mass Spectrom., 7 (8), 697-706 (1996).
"Analysis of Double-stranded Polymerase Chain Reaction Products from the Bacillus cereus Group by Electrospray Ionization Fourier Transform Ion Cyclotron Resonance Mass Spectrometry", D. S. Wunschel, K. F. Fox, A. Fox, J. E. Bruce, D. C. Muddiman, and R. D. Smith, Rapid Commun. in Mass Spectrom., 10, 29-35 (1996).
"Characterization of PCR Products From Bacilli Using Electrospray Ionization FTICR Mass Spectrometry", D. C. Muddiman, D. S. Wunschel, C. Liu, L. Pasa-Tolic, K. F. Fox, A. Fox, G. A. Anderson, and R. D. Smith, Anal. Chem., 68, 3705-3712 (1996).
"Colored Noise Waveforms and Quadrupole Excitation for the Dynamic Range Expansion in Fourier Transform Ion Cyclotron Resonance Mass Spectrometry", J. E. Bruce, G. A. Anderson, and R. D. Smith, Anal. Chem., 68, 534-541 (1996).
High-Speed Sequencing of Single DNA Molecules in the Gas Phase by FTICR-MS
Richard D. Smith, David C. Muddiman, S.A. Hofstadler, and J.E. Bruce
Environmental Molecular Sciences Laboratory; Pacific Northwest National Laboratory; Richland, WA 99352
509/376-0723, Fax: -5824, email@example.com
This project is aimed at the development of a totally new concept for high-speed DNA sequencing based upon the analysis of single (i.e., individual) large DNA fragments using electrospray ionization (ESI) combined with Fourier transform ion cyclotron resonance (FTICR) mass spectrometry. In our approach, large single-stranded DNA segments extending to as much as 25 kilobases (and possibly much larger) are transferred to the gas phase using ESI. The multiply charged molecular ions are trapped in the cell of an FTICR mass spectrometer, where one or more individual ions are then selected for analysis, in which the mass-to-charge ratio (m/z) of each is measured both rapidly and nondestructively. Single-ion detection is achievable due to the high charge state of the electrosprayed ions and the unique sensitivity of new FTICR detection methodologies.
Initial efforts under this project have demonstrated the capability for the formation, extended trapping, isolation, and monitoring of sequential reactions of highly charged DNA molecular ions with molecular weights well into the megadalton range. We have shown that large, multiply charged individual ions of both single- and double-stranded DNA can also be efficiently trapped in an FTICR cell, and their mass-to-charge ratios measured with very high accuracy. Thus, it is feasible to quickly determine the mass of each lost unit as the DNA is subjected to rapid reactive degradation steps. One approach is to develop methods based upon the use of ion-molecule or photochemical processes that can promote a stepwise reactive degradation of gas-phase DNA anions. Successful development of one of these approaches could greatly reduce the cost and enhance the speed of DNA sequencing, potentially allowing DNA segments of more than 25 kilobases in length to be sequenced on a time scale of minutes with negligible error rates, with the added potential for conducting many such measurements in parallel. Instrumentation optimized for these purposes is currently being introduced and promises to greatly advance the methodology. The techniques being developed promise to lead to a host of new methods for DNA characterization, potentially extending to the size of much larger DNA restriction fragments (>500 kilobases).
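One way an individual ion's charge and mass can be recovered from m/z measurements alone is to observe the same ion before and after its charge changes by one unit. The sketch below works through that algebra under a simplifying assumption that is not taken from the abstract: the anion [M - zH]^z- gains exactly one proton, so that m/z + m_H equals M/z before the shift and M/(z - 1) after it. The function name is hypothetical.

```python
PROTON_MASS = 1.00728  # Da


def charge_and_mass_from_shift(mz_before, mz_after):
    """Infer (z, M) for one anion from m/z before and after it gains a proton.

    Assumes the charge goes from z to z - 1 by a single proton transfer.
    """
    if mz_after <= mz_before:
        raise ValueError("losing a charge must raise m/z")
    a = mz_before + PROTON_MASS  # equals M / z
    b = mz_after + PROTON_MASS   # equals M / (z - 1)
    z = round(b / (b - a))       # from M = z*a = (z - 1)*b
    return z, z * a


# Hypothetical 1 MDa ion carrying 500 charges.
true_mass, true_z = 1_000_000.0, 500
before = true_mass / true_z - PROTON_MASS
after = true_mass / (true_z - 1) - PROTON_MASS
print(charge_and_mass_from_shift(before, after))
```

For a highly charged ion the two m/z values lie close together (about 4 units apart in this example), which is why the accuracy of the m/z measurement is central to the approach.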
DOE Contract No. DE-AC06-76RLO-1830.
"Charge State Shifting of Individual Multiply-Charged Ions of Bovine Albumin Dimer and Molecular Weight Determination Using an Individual-Ion Approach", X. Cheng, R. Bakhtiar, S. Van Orden, and R. D. Smith, Anal. Chem., 66, 2084-2087 (1994).
"Trapping, Detection, and Mass Measurement of Individual Ions in a Fourier Transform Ion Cyclotron Resonance Mass Spectrometer", J. E. Bruce, X. Cheng, R. Bakhtiar, Q. Wu, S. A. Hofstadler, G. A. Anderson, and R. D. Smith, J. Amer. Chem. Soc., 116, 7839-7847 (1994).
"Direct Charge Number and Molecular Weight Determination of Large Individual Ions by Electrospray Ionization-Fourier Transform Ion Cyclotron Resonance Mass Spectrometry", R. Chen, Q. Wu, D. W. Mitchell, S. A. Hofstadler, A. L. Rockwood, and R. D. Smith, Anal. Chem., 66, 3964-3969 (1994).
"Trapping, Detection and Mass Determination of Coliphage T4 (108 MDa) Ions by Electrospray Ionization Fourier Transform Ion Cyclotron Resonance Mass Spectrometry", R. Chen, X. Cheng, D. W. Mitchell, S. A. Hofstadler, A. L. Rockwood, Q. Wu, M. G. Sherman, and R. D. Smith, Anal. Chem., 67, 1159-1163 (1995).
"Accurate Molecular Weight Determination of Plasmid DNA Using Mass Spectrometry", X. Cheng, D. G. Camp II, Q. Wu, R. Bakhtiar, D. L. Springer, B. J. Morris, J. E. Bruce, G. A. Anderson, C. G. Edmonds, and R. D. Smith, Nucleic Acids Res., 24, 2183-2189 (1996).
Characterization and Modification of DNA Polymerases for Use in DNA Sequencing
Harvard University; Boston, MA 02115-5730
617/432-3128, Fax: -3362, firstname.lastname@example.org
Our studies are directed towards improving the properties of DNA polymerases for use in DNA sequencing. The primary focus is understanding the mechanism by which DNA polymerases discriminate against nucleotide analogs, and the mechanism by which they incorporate nucleotides processively without dissociating from the DNA template.
We are comparing three DNA polymerases that have been used extensively for DNA sequencing: E. coli DNA polymerase I, T7 DNA polymerase, and Taq DNA polymerase. These are related to one another, and this homology has been exploited to construct active-site hybrids that have been used to determine the structural basis for differences in their activities. Specifically, the hybrids have been used (1) to determine why E. coli DNA polymerase I and Taq DNA polymerase discriminate strongly against dideoxynucleotides, and (2) to understand how T7 DNA polymerase interacts with its processivity factor, thioredoxin, to confer high processivity.
Based on these studies, we have been able to modify Taq DNA polymerase and E. coli DNA polymerase I to make them incorporate dideoxynucleotides much more efficiently, and to have increased processivity in the presence of thioredoxin. The ability to incorporate dideoxynucleotides efficiently greatly improves the uniformity of band intensities on a DNA sequencing gel, thereby increasing the accuracy of the DNA sequence obtained. In addition, the efficient use of dideoxynucleotides reduces the amount of these analogs required for DNA sequencing, an important issue when using fluorescently modified dideoxy terminators. In an approach that complements these studies, we are determining, in collaboration with Dr. Thomas Ellenberger (Harvard Medical School), the crystal structure of T7 DNA polymerase in a complex with thioredoxin and a primer-template. Knowledge of this structure will allow the rational design of specific mutations that will enable DNA polymerases to incorporate more efficiently other analogs useful for DNA sequencing, such as those with fluorescent moieties on the bases.
DOE Grant No. DE-FG02-96ER62251.
Modular Primers for DNA Sequencing
Mugasimangalam Raja,1,2 Dina Sonkin,2 Lev Lvovsky,2 and Levy Ulanovsky1,2
1Center for Mechanistic Biology and Biotechnology; Argonne National Laboratory, Argonne, IL 60439-4833
Ulanovsky: 630/252-3940; Fax: -3387, email@example.com
2Dept. of Structural Biology; Weizmann Institute of Science; Rehovot 76100, Israel
We are developing molecular approaches to DNA sequencing that enable primer walking without the step of chemically synthesizing oligonucleotide primers between walks. One such approach involves the "modular primers" described earlier, consisting of 5-mers, 6-mers or 7-mers (selected from a presynthesized library) that anneal to the template contiguously with each other. Another approach, which we have termed DENS (Differential Extension with Nucleotide Subsets), works by selectively extending a short primer, making it a long one at the intended site only. DENS starts with a limited initial extension of the primer (at 20-30 °C) in the presence of only 2 of the 4 possible dNTPs. The primer is extended by 6-9 bases or longer at the intended priming site, which is deliberately selected (as is the two-dNTP set) to maximize the extension length. The subsequent sequencing/termination reaction at 60-65 °C then accepts the extended primer at the intended site, but not at alternative sites, where the initial extension (if any) is generally much shorter. DENS allows the use of primers as long as 8-mers (degenerate in 2 positions), which prime much more strongly than modular primers involving 5-7-mers and which (unlike the latter) can be used with thermostable polymerases, thus allowing cycle sequencing with dye terminators for Taq, as well as making double-stranded DNA sequencing more robust.
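The site-selection logic of DENS can be sketched as follows. This is a simplified model, not the authors' procedure: it assumes the polymerase extends the primer until it needs a dNTP that is absent from the reaction, ignores misincorporation and primer degeneracy, and takes as input the sequence to be synthesized immediately downstream of the primer's 3' end. The function names and example sequences are hypothetical.

```python
from itertools import combinations


def extension_length(downstream, dntps):
    """Count bases added before a dNTP missing from `dntps` is required.

    `downstream` is the sequence to be synthesized (5'->3') starting
    right after the primer's 3' end.
    """
    count = 0
    for base in downstream:
        if base not in dntps:
            break
        count += 1
    return count


def best_two_dntp_set(downstream):
    """Return the dNTP pair that gives the longest initial extension."""
    return max(
        combinations("ACGT", 2),
        key=lambda pair: extension_length(downstream, pair),
    )


intended_site = "GAGGAAGAGTCCA"   # hypothetical intended priming site
alternative_site = "GATTACAGGA"   # hypothetical off-target site
pair = best_two_dntp_set(intended_site)
print(pair, extension_length(intended_site, pair),
      extension_length(alternative_site, pair))
```

With these example sequences the chosen pair extends the primer by 9 bases at the intended site but only 2 at the alternative site, which is the length difference the higher-temperature sequencing step relies on.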
These technologies are expected to speed up genome sequencing in more than one way:
a) Reduction in redundancy would result from more efficient and rapid closure of even long gaps, which are currently avoided at the price of 7- to 9-fold redundancy in shotgun sequencing. Instantly available primers would also improve the quality of sequencing. Stretches of sequence that have too low a confidence level (high suspected error rate) can be resequenced without synthesizing new oligos and without growing any new subclones.
b) Further down the road, the completion of the automation of the closed cycle of primer walking will be made possible via the elimination of the need to synthesize the walking primers. Combined with the capillary sequencers, the instant availability of the walking primers should reduce the time per walking cycle from 2-3 days now to about 1.5-2.0 hours, an improvement in speed by a factor of 20-50.
c) The closed-end automation would minimize both the labor cost and human errors. As primer walking has minimal, if any, front-end and back-end bottlenecks inherent to shotgun, the cost of sequencing would be essentially that of reagents, 5 cents/base or less.
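The quoted speed improvement can be checked with a short calculation. The sketch below assumes that a "day" means 24 hours of elapsed time (an interpretation, not something the abstract states) and compares the extremes of the two ranges; the function name is hypothetical.

```python
def speedup_range(old_hours, new_hours):
    """Return (smallest, largest) speedup for two (low, high) time ranges."""
    old_low, old_high = old_hours
    new_low, new_high = new_hours
    # Smallest gain: fastest old cycle vs. slowest new cycle, and vice versa.
    return old_low / new_high, old_high / new_low


low, high = speedup_range((2 * 24.0, 3 * 24.0), (1.5, 2.0))
print(low, high)
```

Under that assumption the ratio runs from 24 to 48, consistent with the rounded factor of 20-50 given in item b).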
DOE Grant No. DE-FG02-94ER61831.
Time-of-Flight Mass Spectroscopy of DNA for Rapid Sequence
Peter Williams, Chau-Wen Chou, David Dogruel, Jennifer Krone, Kathy Lewis, and Randall Nelson
Department of Chemistry and Biochemistry; Arizona State University; Tempe, AZ 85287
602/965-4107, Fax: -2747, firstname.lastname@example.org
There are three potential roles for mass spectrometry relevant to the Human Genome Project:
a) The most obvious role is the one on which all groups have been focusing: development of an alternative, faster sequence ladder readout method to speed up large-scale sequencing. Progress here has been difficult and slow because the requirements exceed the current capabilities of mass spectrometry even for proteins, and DNA presents significantly more difficulty than proteins. We have shown previously that pulsed laser ablation of DNA from frozen aqueous films has the potential to yield sequence-quality mass spectra, but that ionization in this approach is erratic and uncontrollable. We are focusing on developing ionization methods using ion (or electron) attachment to vapor-phase DNA (ablated from ice films) in an electric field-free environment; results of this approach will be reported.
b) Mass spectrometry may not ultimately compete favorably in speed with large-scale multiplexing of conventional or near-term technologies such as capillary electrophoresis. However, as the Genome Project nears completion there will be an increasing need for rapid small-scale DNA analysis, where the multiplex advantage will not be so great and mass spectrometry could play a more significant role. With this in mind we are looking at ways to speed up the overall mass spectrometric analysis, e.g., simple rapid cleanup of sequence mixtures, and at generation of short sequence ladders by exopeptidase digestion.
c) Given the genome database(s) at the completion of the project, with rapid search capability, a need will arise for comparably rapid generation of search input data to identify often very small quantities of proteins isolated from biochemical investigations. With this in mind we have developed extremely rapid enzyme digestion techniques optimized for mass spectrometric readout, using endopeptidases covalently coupled directly to the mass spectrometer probe tip. The elimination of autolysis and transfer losses allows rapid (few-minute) endopeptidase digestion and mass analysis of as little as 1 picomole of protein, leading to an unambiguous database identification. An alternative search procedure uses partial amino acid sequence information. With the added use of exopeptidases to generate a peptide ladder sequence in the mass spectrum of the endopeptidase digest, on the order of a dozen residues of internal sequence can be generated in a total analysis time of 20 minutes or less, again using only picomoles of sample.
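The database search described in item c) can be sketched as a peptide mass comparison: digest each candidate sequence in silico, compute the peptide masses, and count how many observed masses each candidate explains. The code below is an illustration only. The abstract does not name the endopeptidase, so a trypsin-like rule (cleave after K or R unless followed by P) is assumed, and the function names, toy sequences, and 0.5 Da tolerance are hypothetical; residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[residue] for residue in peptide) + WATER


def count_matches(observed, protein, tolerance=0.5):
    """Number of observed masses explained by the protein's peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(protein)]
    return sum(
        any(abs(mass - t) <= tolerance for t in theoretical)
        for mass in observed
    )


def identify(observed, database, tolerance=0.5):
    """Return the database entry name with the most matching masses."""
    return max(
        database,
        key=lambda name: count_matches(observed, database[name], tolerance),
    )


database = {"protein_1": "AAKGGRPLKSS", "protein_2": "MKWVTFK"}  # toy entries
observed = [peptide_mass(p) for p in tryptic_peptides(database["protein_1"])]
print(identify(observed, database))
```

Counting matches is the simplest possible score; it is used here only to show how a handful of peptide masses can single out one database entry.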
DOE Grant No. DE-FG02-91ER61127.
Development of Instrumentation for DNA Sequencing at a Rate of 40 Million Bases Per Day
Edward S. Yeung, Huan Tsung Chang, Qingbo Li, Xiandan Lu, and Eliza Fung
Ames Laboratory and Department of Chemistry; Iowa State University; Ames, IA 50011
515/294-8062, Fax: -0266, email@example.com
We have developed novel separation, detection, and imaging techniques for real-time monitoring in capillary electrophoresis. These techniques will be used to substantially increase the speed, throughput, reliability, and sensitivity of DNA sequencing in highly multiplexed capillary arrays. We estimate that it should eventually be possible to achieve a raw sequencing rate of 40 million bases per day in one instrument based on the standard Sanger protocol. We have reached a stage where an actual sequencing instrument with 100 capillaries can be built to replace the Applied Biosystems 373 or 377 instruments, with a net gain in speed and throughput of 100-fold and 24-fold, respectively.
The substantial increase in sequencing rate is a result of several technical advances in our laboratory. (1) The use of commercial linear polymers for sieving allows replaceable yet reproducible matrices to be prepared that have lower viscosity (thus faster migration rates) compared to polyacrylamide. (2) The use of a charge-injection device camera allows random data acquisition to decrease data storage and data transfer time. (3) The use of distinct excitation wavelengths and cutoff emission filters allows maximum light throughput for efficient excitation and sensitive detection employing the standard 4-dye coding. (4) The use of index matching and 1:1 imaging reduces stray light without sacrificing the convenience of on-column detection.
Continuing efforts include further optimization of the separation matrix, development of new column conditioning protocols, refinement of the excitation/emission optics, design of a pressure injection system for 96-well titer plates, validation of a new 2-color base-calling scheme, simplification of software to allow essentially real-time data processing, implementation of voltage programming to shorten the total run times, and scale-up of the technology to allow parallel sequencing in up to 1,000 capillaries.
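To see what the 40-million-base target implies for run time, the following back-of-the-envelope sketch divides the daily target across a 1,000-capillary array. The read length of 500 bases per capillary per run is an assumed figure for illustration, not a number from this abstract, and the function name is hypothetical.

```python
def required_runs_per_day(target_bases_per_day, capillaries, bases_per_run):
    """Runs per day the whole array must complete to reach the target."""
    return target_bases_per_day / (capillaries * bases_per_run)


# Assumption: 500 raw bases read per capillary per run.
runs = required_runs_per_day(40_000_000, 1000, 500)
minutes_per_run = 24 * 60 / runs
print(runs, minutes_per_run)
```

Under that assumption the array would need about 80 runs per day, or one run roughly every 18 minutes including reloading; a longer read length would relax the cycle time proportionally.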
X. Lu and E. S. Yeung, "Optimization of Excitation and Detection Geometry for Multiplexed Capillary Array Electrophoresis of DNA Fragments", Appl. Spectrosc. 49, 605-609 (1995).
Q. Li and E. S. Yeung, "Evaluation of the Potential of a Charge Injection Device for DNA Sequencing by Multiplexed Capillary Electrophoresis", Appl. Spectrosc. 49, 825-833 (1995).
E. N. Fung and E. S. Yeung, "High-Speed DNA Sequencing by Using Mixed Poly(ethyleneoxide) Solutions in Uncoated Capillary Columns," Anal. Chem. 67, 1913-1919 (1995).
Q. Li and E. S. Yeung, "Simple Two-Color Base-Calling Schemes for DNA Sequencing Based on Standard 4-Label Sanger Chemistry", Appl. Spectrosc. 49, 1528-1533 (1995).
Note: The proceedings of the 1997 DOE Human Genome Program Contractor-Grantee Workshop VI, which include updated research abstracts, can be found at: