Essential techniques of molecular genetics
Reminder: an excellent (though slightly out of date) web site to visit for relevant information is The US Department of Energy's Primer of Molecular Genetics. Another really well thought out and presented site is the MIT Biology Hypertextbook, and in particular, the molecular genetics chapter.
Electrophoresis
Sieving properties of agarose and polyacrylamide
Matrix | Optimum size range of the molecules to be separated
Agarose | 0.5 - 20 kb (DNA)
Polyacrylamide | 10 - 1000 bp (DNA); 0 - 100 kDa (proteins)
Electrophoresis is one of the most convenient methods for separating molecules that differ in size, charge or both. There are many forms of electrophoresis in use, but all work by placing a solution of the molecules to be separated in an electric field. Depending on their charge, the molecules migrate towards one electrode or the other. To prevent convection from disrupting the separation, the solution is supported by an immobile matrix. The matrix may do little more than suppress convection currents (as in starch gel electrophoresis), or it may, by impeding the passage of the moving molecules, actively sieve them and so contribute to their separation on the basis of size.
In all electrophoresis experiments the electrodes are positioned at opposite ends of the separation matrix, which may take the form of a tube or of a thick or thin slab. In recent years the most common forms of electrophoresis have used either agarose or polyacrylamide matrices, and many different commercial and home-made apparatuses are in use. For unimportant technical reasons, acrylamide gels are usually held vertically, with the samples loaded into wells formed in the top edge, whereas agarose gels are usually held horizontally, with the samples loaded into wells in their top face.
You will find many variations on this basic theme in the course of this lecture series.
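In practice, the size of a DNA fragment is usually estimated by comparing it with a lane of size markers. As a rough rule, over the range a given gel resolves well, the distance a linear DNA fragment migrates decreases approximately linearly with the logarithm of its size. The Python sketch below assumes that relationship holds exactly; the marker sizes and distances are invented for illustration, and `fit_standard_curve` and `estimate_size` are hypothetical helper names.

```python
import math

def fit_standard_curve(markers):
    """Least-squares fit of log10(size) = a + b * distance.

    `markers` is a list of (distance_mm, size_bp) pairs from a ladder lane.
    Assumes log10(size) is linear in distance over the gel's resolving range.
    """
    xs = [d for d, _ in markers]
    ys = [math.log10(s) for _, s in markers]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx  # negative slope: bigger fragments travel less far
    a = mean_y - b * mean_x
    return a, b

def estimate_size(distance_mm, a, b):
    """Convert a band's migration distance into an estimated size in bp."""
    return 10 ** (a + b * distance_mm)

# Hypothetical ladder: 10 kb, 1 kb and 100 bp bands at 10, 30 and 50 mm.
a, b = fit_standard_curve([(10, 10_000), (30, 1_000), (50, 100)])
print(round(estimate_size(20, a, b)))  # unknown band at 20 mm; prints 3162
```

Outside the linear range (very large or very small fragments) this approximation breaks down, which is one reason the matrix and its concentration are chosen to suit the sizes being separated.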
Cloning
This is the general term for isolating a specific fragment of target (e.g. human) DNA by propagating that fragment in a microorganism (usually a bacterium, sometimes a yeast). Using standard microbiological techniques, a single microorganism carrying one small fragment of human DNA can be isolated and grown in culture. The human DNA can then readily be purified from the bacteria, in gram quantities if necessary.
Almost all clones come from libraries. Essentially, a library is one of two kinds, cDNA or genomic, and it is vital to remember the distinction between them.
cDNA clones
Remember that cDNA means complementary DNA: it has been copied directly from mRNA. Because that RNA will have been processed on its journey from gene to cytoplasm (and on to the test tube), it will not contain introns. Nor will it contain any sequences from upstream or downstream of the transcribed region, such as the promoter. Every cDNA library is made from a particular tissue source and contains only the sequences transcribed in that tissue; for example, do not expect to isolate beta globin cDNA from a skin fibroblast cDNA library. Even though cDNA is made from mRNA, it can still contain repetitive sequences: 5%-10% of human transcripts contain an interspersed repetitive element, such as an Alu repeat, in the 3´ UTR.
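To make the point about introns concrete, here is a toy Python sketch of the path from gene to cDNA: transcription and splicing keep only the exons, and reverse transcriptase then copies the mRNA into a complementary DNA strand. The gene sequence, exon coordinates and function names are all hypothetical, and real mRNAs also carry UTRs and a poly(A) tail, which are ignored here.

```python
# Translation table from RNA bases to their complementary DNA bases.
RNA_TO_COMPLEMENTARY_DNA = str.maketrans("ACGU", "TGCA")

def mature_mrna(gene, exons):
    """Join the exons (0-based, half-open coordinates) of the sense-strand
    gene sequence and write the result as RNA (T -> U); introns are lost."""
    return "".join(gene[start:end] for start, end in exons).replace("T", "U")

def first_strand_cdna(mrna):
    """Copy mRNA into complementary DNA, written 5'->3' (reverse complement)."""
    return mrna.translate(RNA_TO_COMPLEMENTARY_DNA)[::-1]

# Hypothetical gene: exon 1, a 12 bp intron (GT...AG), exon 2.
gene = "ATGAAA" + "GTAAGTTTTCAG" + "CCCTAG"
mrna = mature_mrna(gene, [(0, 6), (18, 24)])
cdna = first_strand_cdna(mrna)
print(mrna)  # AUGAAACCCUAG
print(cdna)  # CTAGGGTTTCAT
```

Neither the mRNA nor its cDNA copy contains the intron, which is why cDNA and genomic clones of the same gene differ.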
Most libraries contain cDNA clones in the same relative abundances as their corresponding mRNAs in the tissue of origin; for example, liver cDNA libraries are a rich source of serum albumin cDNA clones. Some libraries have been normalised, i.e. an attempt has been made to equalise the frequencies of all types of cDNA in the library (provided they were present at all in the mRNA source).
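Because clone frequencies mirror mRNA abundance, the number of clones that must be screened to find a particular cDNA depends strongly on how abundant its mRNA is. A common way to estimate this is the Clarke and Carbon formula, N = ln(1 - P) / ln(1 - f), where f is the fraction of clones in the library that represent the transcript and P is the desired probability of finding at least one. The Python sketch below applies it; the abundances used are illustrative assumptions, not measured values, and the function names are hypothetical.

```python
import math

def clones_to_screen(fraction, p=0.99):
    """Clarke-Carbon estimate: clones needed for probability p of picking at
    least one clone of a sequence that makes up `fraction` of the library."""
    return math.ceil(math.log(1 - p) / math.log(1 - fraction))

def probability_found(fraction, n_clones):
    """Probability that at least one of n randomly picked clones is the target."""
    return 1 - (1 - fraction) ** n_clones

# Illustrative abundances: a very abundant message (1 clone in 10)
# versus a rare one (1 clone in 100,000).
for fraction in (1e-1, 1e-5):
    print(fraction, clones_to_screen(fraction))
```

Normalising a library raises the fraction f for rare transcripts, which is exactly what shrinks N for the clones that are hardest to find.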
cDNA clones may be made for various purposes
Expression studies
Constructed in specially designed vectors containing promoters which may be regulatable.
Gene structure and function studies
Full length (or near full length) cDNAs which can be sequenced for comparison with the genomic sequence.
"tagging" genes
ESTs are short fragments of cDNA.
From the point of view of mapping and sequencing the genome, the latter two classes of clone are most relevant.
Genomic clones
Genomic clones are designed to carry as much genomic DNA as possible, so as to minimise the number of clones that must be isolated. Vector systems have evolved over the years. The first generation of genomic libraries was built in vectors based on bacteriophage lambda; later libraries used plasmid-phage hybrid vectors such as cosmids. More recently, yeast artificial chromosomes (YACs) have been popular, but they are gradually being replaced by bacterial systems based on either the phage P1 or the F element origins of replication (PACs and BACs, respectively).
Cloning vectors
Vector | Maximum insert size | Approx. no. of clones required in library | Advantages | Disadvantages
lambda | 20 kb | 5 x 10^5 | easy to construct libraries; relatively stable inserts | many clones required; hard to prepare DNA from clones
cosmid | 45 kb | 2 x 10^5 | easy to construct libraries; easy to prepare DNA from clones | not always stable
YAC | 1 Mb | 10^4 | few clones required | very prone to rearrangement; difficult to construct
PAC | ~120 kb | 10^5 | fewer clones required than for cosmids; stable | single-copy origin of replication, therefore harder to prepare DNA
BAC | up to ~300 kb | 5 x 10^4 | few clones required; very stable | single-copy origin of replication, therefore harder to prepare DNA
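One way to read the clone numbers in the table above is in terms of genome equivalents: N clones of insert size L cover a genome of size G about N x L / G times over. If clones are placed at random, the Poisson approximation gives the probability that a given point in the genome lies within at least one clone as 1 - exp(-N x L / G). The Python sketch below applies this to the table's figures, assuming a haploid human genome of 3 x 10^9 bp and using the maximum insert sizes (so it somewhat overstates coverage); BACs are left out because their insert sizes vary widely. The function names are hypothetical.

```python
import math

GENOME_BP = 3.0e9  # assumed haploid human genome size

def genome_equivalents(n_clones, insert_bp, genome_bp=GENOME_BP):
    """How many times over the library covers the genome."""
    return n_clones * insert_bp / genome_bp

def probability_point_covered(n_clones, insert_bp, genome_bp=GENOME_BP):
    """Poisson approximation: chance a given point lies in at least one clone."""
    return 1 - math.exp(-genome_equivalents(n_clones, insert_bp, genome_bp))

# (clones in library, maximum insert size in bp), taken from the table.
libraries = {
    "lambda": (5e5, 20e3),
    "cosmid": (2e5, 45e3),
    "PAC": (1e5, 120e3),
    "YAC": (1e4, 1e6),
}
for vector, (n, insert) in libraries.items():
    cover = genome_equivalents(n, insert)
    p = probability_point_covered(n, insert)
    print(f"{vector:7s} {cover:4.1f} genome equivalents, P = {p:.2f}")
```

For the four vectors shown, the table's numbers work out at roughly three to four genome equivalents, i.e. about a 95% chance that any given point is represented. Finding a whole gene within a single clone is a stricter requirement and needs more clones.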
Another innovation has been the use of gridded and chromosome-specific libraries. In a gridded library every clone has its own unique address: a particular well in a particular microtitre tray. Compared with ungridded, amplified libraries, this makes it far easier for laboratories to exchange information about individual clones. Chromosome-specific libraries have been made from individual metaphase chromosomes separated by flow sorting, using a machine originally designed to sort different populations of cells.
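A gridded library is essentially a lookup table from clone number to physical location. The Python sketch below shows one plausible addressing scheme, assuming 384-well microtitre trays (16 rows, A to P, by 24 columns) filled row by row; real libraries use their own naming conventions, so the address format and function names here are hypothetical.

```python
ROWS = "ABCDEFGHIJKLMNOP"  # 16 rows in a 384-well tray
COLUMNS = 24
WELLS_PER_TRAY = len(ROWS) * COLUMNS  # 384

def clone_address(index):
    """Map a 0-based clone number to (tray number, well), filling row by row."""
    tray, position = divmod(index, WELLS_PER_TRAY)
    row, column = divmod(position, COLUMNS)
    return tray + 1, f"{ROWS[row]}{column + 1:02d}"

def clone_number(tray, well):
    """Inverse mapping: (tray number, well such as 'J17') back to the clone number."""
    row = ROWS.index(well[0])
    column = int(well[1:]) - 1
    return (tray - 1) * WELLS_PER_TRAY + row * COLUMNS + column

print(clone_address(1000))  # (3, 'J17')
```

Because the mapping is fixed and reversible, two laboratories holding copies of the same gridded library can refer to a clone simply by its tray and well.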
Southern and Northern Blotting
Before PCR and cheap, fast sequencing changed our view of the universe that is genetics, the Southern blot was a universal workhorse: there was scarcely an experiment in molecular genetics that did not at some stage employ one. It is still a useful tool, and you need to know about it so that you can interpret historical data.
Sickled and normal erythrocytes
Named after its inventor, Prof. Ed Southern, the Southern blot is a fast way of analysing a small number of specific DNA fragments that may be present in a complex mixture. For instance, the sickle cell mutation is a point mutation in the beta globin gene that changes a single amino acid in the beta globin polypeptide. In homozygotes, under conditions of low oxygen concentration, the mutant globin polymerises, forcing the red cell into a bizarre (sickled) shape.
Suppose that we wish to ask whether the sickle cell mutation is present in an individual and the only material which we have available is a DNA sample. We could employ a Southern Blot:
DNA is first digested with a restriction endonuclease, and the fragments are then separated according to size by electrophoresis through an agarose gel; small fragments migrate through the gel more quickly than long ones. Most restriction enzymes will digest human DNA into about a million fragments, but we want to compare only those that contain the human beta globin gene. To do this, the DNA is first denatured (made single-stranded) by treatment with NaOH. It is then transferred to the surface of a nylon filter by blotting, as in part a) of the figure.
A probe is made. Usually this means incorporating a label into a fragment of DNA from the target sequence (beta globin in this case). The label may be either a radioactive isotope (often ³²P) or a chemical hapten, such as biotin, attached via a long side chain.
The single-stranded probe is annealed (hybridised) to its target sequence on the filter, and the filter is then washed under stringent conditions to remove any probe that has bound other than by forming a long run of perfectly complementary base pairs. See section b) of the figure.
Finally, the position of the annealed probe is determined (by autoradiography in the case of a radioactive probe); see c). In this example, sample 1 appears to be homozygous for the slower-moving allele, sample 3 is homozygous for the faster allele and sample 2 is a heterozygote.
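The figure of "about a million fragments" follows from simple probability. If the genome were a random sequence, a particular recognition site would be expected about once every 1/p bases, where p is the product of the probabilities of its bases (4^-n for an n-base site when all four bases are equally common). The Python sketch below makes that calculation under those assumptions; the ~41% GC figure is an approximate value for human DNA, the function names are hypothetical, and because real genomes are not random (for example, CpG dinucleotides are under-represented in human DNA) actual fragment counts differ.

```python
def site_probability(site, gc_content=0.5):
    """Chance that a random position starts `site`, given the genome's GC fraction.

    Assumes a palindromic site (such as GAATTC), so scanning one strand
    counts every site once.
    """
    p_gc = gc_content / 2        # probability of G, and of C
    p_at = (1 - gc_content) / 2  # probability of A, and of T
    p = 1.0
    for base in site:
        p *= p_gc if base in "GC" else p_at
    return p

def expected_fragments(site, genome_bp=3.0e9, gc_content=0.5):
    """Approximate number of fragments in a complete digest (about one per site)."""
    return genome_bp * site_probability(site, gc_content)

# EcoRI recognises the six-base site GAATTC.
print(f"{expected_fragments('GAATTC'):.2e}")                   # equal base composition
print(f"{expected_fragments('GAATTC', gc_content=0.41):.2e}")  # assuming ~41% GC
```

A six-base cutter therefore gives fragments averaging a few kb, a convenient size range for agarose gels.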
In the case of the sickle cell mutation, the single base change, as well as causing a missense mutation in the beta globin gene, also abolishes a restriction site for the enzyme MstII. As a consequence, the restriction fragment containing the 5´ end of the gene increases in size from 1.15 kb to 1.35 kb. See the figure below, where MstII sites are shown as arrows.
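The loss of the MstII site can be checked directly on sequence. MstII recognises CCTNAGG (N being any base), and the sickle mutation changes codon 6 of beta globin from GAG (glutamic acid) to GTG (valine), turning CCTGAGG into CCTGTGG. The Python sketch below searches a short excerpt (codons 4 to 8 only), then shows how removing a cut site merges two fragments and how band sizes could be turned into a genotype call. The fragment coordinates are hypothetical, chosen only to reproduce the 1.15 kb and 1.35 kb sizes quoted above; the genotype rule assumes the probe detects only that fragment; and the function names are illustrative.

```python
import re

# MstII recognition site CCTNAGG; the lookahead lets overlapping matches be found.
# The site is palindromic, so scanning one strand finds every site.
MSTII = re.compile(r"(?=CCT[ACGT]AGG)")

NORMAL_EXCERPT = "ACTCCTGAGGAGAAG"  # codons 4-8: Thr-Pro-Glu-Glu-Lys
SICKLE_EXCERPT = "ACTCCTGTGGAGAAG"  # codon 6 GAG -> GTG (Glu -> Val)

def mstii_sites(seq):
    return [m.start() for m in MSTII.finditer(seq)]

def fragment_lengths(cut_positions, region_length):
    """Fragment sizes produced by cutting a region at the given positions."""
    edges = [0, *sorted(cut_positions), region_length]
    return [right - left for left, right in zip(edges, edges[1:])]

def call_genotype(band_sizes_kb):
    """Interpret the probed band(s): 1.15 kb = normal (A), 1.35 kb = sickle (S)."""
    bands = {round(size, 2) for size in band_sizes_kb}
    has_a, has_s = 1.15 in bands, 1.35 in bands
    if has_a and has_s:
        return "AS"
    if has_s:
        return "SS"
    if has_a:
        return "AA"
    return "no call"

print(mstii_sites(NORMAL_EXCERPT), mstii_sites(SICKLE_EXCERPT))  # [3] []
# Hypothetical 1,350 bp region between two flanking MstII sites;
# the codon 6 site lies 1,150 bp from the left-hand end.
print(fragment_lengths([1150], 1350), fragment_lengths([], 1350))  # [1150, 200] [1350]
```

Applied to the three samples in the figure, a single 1.35 kb band reads as SS, a single 1.15 kb band as AA, and both bands together as AS.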
The Polymerase Chain Reaction
This invention has revolutionised molecular genetics by removing the need to clone DNA in many situations where cloning used to be necessary. It is so powerful that it can produce microgram amounts (that's a lot!) of DNA starting from just a single molecule. This has applications in forensic science; in archaeology (Neandertal mitochondrial DNA amplified by PCR from ancient bones was recently sequenced); and in medicine, where, for example, it enables antenatal DNA tests to be performed in just a few hours' work and allows large populations to be screened for particular mutations very quickly and cheaply.
Normally about 30 cycles of amplification are carried out. If the reaction were 100% efficient, the amount of target would double at every cycle, giving 2^30, or approximately a 10^9-fold, amplification. In fact, although the early rounds are highly efficient, the later rounds are much less so because nucleotide triphosphates and primers become depleted and the DNA polymerase is gradually inactivated.
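The arithmetic of amplification is simple compounding. If a fraction E of the molecules present is copied in a cycle, the amount of product is multiplied by (1 + E), so 30 perfectly efficient cycles give 2^30, a little over 10^9. The Python sketch below shows this; the declining-efficiency profile in the example is an invented illustration of the fall-off in later rounds described above, not measured data, and `fold_amplification` is a hypothetical name.

```python
def fold_amplification(efficiencies):
    """Overall fold increase after cycles with the given per-cycle efficiencies (0 to 1)."""
    fold = 1.0
    for e in efficiencies:
        fold *= 1 + e  # a fraction e of the existing molecules is copied this cycle
    return fold

ideal = fold_amplification([1.0] * 30)
print(f"{ideal:.3g}")  # 1.07e+09, i.e. 2**30

# Hypothetical profile: 20 efficient cycles, then efficiency falling as
# primers and nucleotides run out and the polymerase loses activity.
realistic = fold_amplification(
    [0.95] * 20 + [0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.05, 0.02, 0.01]
)
print(f"{realistic:.3g}")
```

Because the gain compounds, a small drop in per-cycle efficiency early on costs far more product than the same drop near the end.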
DNA sequencing
At the heart of the so-called "new genetics" is our ability to sequence DNA rapidly and cheaply. Knowledge of sequence brings knowledge of gene structure and, very often, the beginnings of an understanding of gene function. The simplest method of sequencing DNA was invented by Dr. Fred Sanger, for which he was awarded his second Nobel Prize. In the UK our major national Human Genome Project DNA sequencing centre, the Sanger Centre in Cambridgeshire, is named after him.
The Sanger method is also known as the "dideoxy chain termination method".
The DNA to be sequenced must be obtained in pure form, either by cloning or by PCR.
The two DNA strands must be separated. This is usually done by cloning into a vector, M13 bacteriophage, which has a single-stranded phase in its life cycle, but sometimes by some other cunning technique or even just by heating.
A primer is allowed to anneal to a known sequence at one end of the target sequence. The known sequence may be part of the bacteriophage vector DNA flanking the cloned insert.
Deoxynucleotide triphosphates (dATP, dCTP, dGTP and dTTP) and DNA polymerase are added and the primer sequence is thus elongated along the target "template" DNA.
Also included in the mixture are small amounts of the four dideoxynucleotide triphosphates (ddNTPs), each tagged with a different coloured fluorescent dye. When a ddNTP is incorporated into an elongating DNA strand, it prevents further elongation, because it lacks the 3´ OH group needed to add the next nucleotide.
The products of synthesis are then separated according to size by electrophoresis on a polyacrylamide gel. This figure shows an earlier method, in which the four bases were read in separate reactions and a radioactive "label" was incorporated into the newly synthesised DNA, but the principle is the same.
As the products pass a point on the gel one by one, the coloured dye tags can be read by a laser scanner and computer.
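The logic of reading a dye-terminator run can be mimicked in a few lines. Because ddNTPs are present only in small amounts, the reaction contains a population of strands terminated at every position, each carrying the dye of its final base. Sorting those products by length (the order in which they pass the detector) and reading off the dye colours gives the sequence of the newly synthesised strand. In the Python sketch below the template is written 3'->5' so that the new strand is simply its base-by-base complement; the primer length, sequences and function names are hypothetical, and real reads also have to cope with noise and uneven peak heights.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def terminated_products(template_3to5, primer_length):
    """One (length, terminal base) pair for each position at which a ddNTP
    can stop synthesis; the terminal base identifies the dye colour."""
    new_strand = template_3to5.translate(COMPLEMENT)  # grows 5'->3'
    return [(primer_length + i + 1, base) for i, base in enumerate(new_strand)]

def read_sequence(products):
    """Electrophoresis separates products by size; read dyes shortest first."""
    return "".join(base for _, base in sorted(products))

template = "TACGGATC"  # hypothetical template region, written 3'->5'
products = terminated_products(template, primer_length=20)
random.Random(0).shuffle(products)  # the reaction is an unordered mixture
print(read_sequence(products))  # ATGCCTAG
```

Only the ordering by size carries the positional information; the four colours then label each position with its base.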
STSs and ESTs
PCR and sequencing together have made it possible to create useful landmarks in the genome: several thousand short stretches of known DNA sequence whose presence in any DNA sample can be tested by PCR. These are known as STSs (Sequence Tagged Sites). An STS that is part of a transcribed sequence is known as an EST (Expressed Sequence Tag). Hundreds of thousands of ESTs have been created and can be accessed by computer. One major attempt to classify them all is known as UniGene.
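Testing for an STS amounts to asking whether its two PCR primers would bind in the right orientation and give a product of the expected size. The Python sketch below performs a simplified in-silico version of that check on a sequence string: it looks for an exact match to the forward primer and, downstream of it, the reverse complement of the reverse primer. The function names, primers and sequence are hypothetical, and the sketch searches only one strand and tolerates no mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def sts_product_size(sequence, forward_primer, reverse_primer):
    """Size of the PCR product the primer pair would give, or None."""
    start = sequence.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to this strand, so its site reads as its reverse complement.
    reverse_site = reverse_complement(reverse_primer)
    end = sequence.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return end + len(reverse_site) - start

def contains_sts(sequence, forward_primer, reverse_primer, expected_size, tolerance=0):
    size = sts_product_size(sequence, forward_primer, reverse_primer)
    return size is not None and abs(size - expected_size) <= tolerance

fwd, rev = "ACGTACGTAC", "GATTACAGAT"
sample = "GGG" + fwd + "TTTTTTTTTT" + reverse_complement(rev) + "CCC"
print(sts_product_size(sample, fwd, rev))  # 30
```

Requiring the expected product size, not just the presence of both primer sites, mirrors how a real STS assay is scored on a gel.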
Recommended reading
The topics include:
gene cloning
Southern and Northern blotting
The Polymerase Chain Reaction
DNA sequencing
Reading:
Mange and Mange Chapters 6 and 8 (good chapters which cover the essentials and more)
Lewis is weak on methodology but strong on applications. Some of this material is in Chapters 8 and 17 (pp. 316-322; the rest of that chapter is interesting but not directly relevant); you could also read Chapter 21
Mueller and Young Chapter 4
Jorde et al. Chapter 3 (pp. 42-56)
Thompson, McInnes and Willard Chapter 5 (an excellent account)
Connor and Ferguson-Smith Chapter 3 (too little, and perhaps too advanced, but have a look if it's all that's available)
--------------------------------------------------------------------------------
Self Assessment Questions
Dystrophin is the protein product of a gene called DMD on the human X chromosome. Mutations in this gene cause the muscle-wasting disease Duchenne muscular dystrophy. If, as part of a study comparing human and kangaroo muscle, you wanted to compare the sequences of human and kangaroo dystrophin, which of the following resources might provide useful material, and how?
A cDNA library made from kangaroo brain
A cDNA library made from the muscle of a patient with Duchenne muscular dystrophy.
A cDNA library made from normal human muscle.
A human genomic library in a BAC vector made from leukocyte DNA
An arrayed genomic library in a YAC vector made from kangaroo liver DNA
RNA purified from kangaroo muscle.
Answers