Genetic polymorphism is the existence of variants with respect to a gene locus (alleles), a chromosome structure (e.g., size of centromeric heterochromatin), a gene product (variants in enzymatic activity or binding affinity), or a phenotype. The term DNA polymorphism refers to a wide range of variations in nucleotide base composition, length of nucleotide repeats, or single nucleotide variants. DNA polymorphisms are important as genetic markers to identify and distinguish alleles at a gene locus and to determine their parental origin.
A. Single nucleotide polymorphism (SNP)
These allelic variants differ in a single nucleotide at a specific position. At least one in a thousand DNA bases differs among individuals (1). The detection of SNPs does not require gel electrophoresis, which facilitates large-scale detection. An SNP can be visualized in a Southern blot as a restriction fragment length polymorphism (RFLP) if the difference between the two alleles alters the recognition site of a restriction enzyme (see Southern blot, p. 62).
B. Simple sequence length polymorphism (SSLP)
These allelic variants differ in the number of tandemly repeated short nucleotide sequences in noncoding DNA. Short tandem repeats (STRs), also called microsatellites, consist of units of 1, 2, 3, or 4 base pairs repeated from 3 to about 10 times. Typical short tandem repeats are CA repeats in the 5′ to 3′ strand, i.e., alternating CG and AT base pairs in the double strand. Each allele is defined by the number of CA repeats, e.g., 3 and 5, as shown (1). The size differences due to the number of repeats are determined by PCR. Variable number of tandem repeats (VNTR), also called minisatellites, consist of repeat units of 20–200 base pairs (2).
C. Detection of SNP by oligonucleotide hybridization analysis
Oligonucleotides, short stretches of about 20 nucleotides with a sequence complementary to the single-stranded DNA to be examined, will hybridize completely only if perfectly matched. If there is a difference of even one base, such as one caused by an SNP, the resulting mismatch can be detected because the DNA hybrid is unstable and gives no signal.
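The all-or-nothing logic of oligonucleotide hybridization can be illustrated with a minimal sketch. The sequences and function names below are hypothetical and chosen only for illustration, and the model is deliberately simplified: real hybridization depends on temperature and salt conditions, whereas here any single mismatch is treated as "no signal".

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def hybridizes(probe: str, target: str) -> bool:
    """True only if the probe pairs perfectly with some stretch of the target.

    Simplifying assumption: one mismatch is enough to abolish the signal.
    """
    return reverse_complement(probe) in target


# Hypothetical 20-nucleotide region; the two alleles differ at one base (an SNP).
allele_1 = "GATTACAGGCTTACGATCCA"
allele_2 = "GATTACAGGCATACGATCCA"

# An oligonucleotide designed to be perfectly complementary to allele 1.
probe_for_allele_1 = reverse_complement(allele_1)

signal_1 = hybridizes(probe_for_allele_1, allele_1)  # perfect match
signal_2 = hybridizes(probe_for_allele_1, allele_2)  # single-base mismatch
print(signal_1, signal_2)
```

In this toy model the probe gives a signal with allele 1 only; a second probe complementary to allele 2 would be needed to score that allele positively.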
D. Detection of STRs by PCR
Short tandem repeats (STRs) can be detected by the polymerase chain reaction (PCR). The allelic regions of a stretch of DNA are amplified; the resulting DNA fragments of different sizes are subjected to electrophoresis; and their sizes are determined.
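As a rough sketch of why PCR fragment size identifies an STR allele, the following example builds two alleles with 3 and 5 CA repeats and computes the length of the amplified fragment for each. The flanking sequences, primer sites, and function names are hypothetical. As a simplification, the reverse primer is represented by its binding site on the same strand rather than by its own complementary sequence.

```python
import re

# Hypothetical flanking sequences that double as primer binding sites.
LEFT_FLANK = "GGTTCGATGT"
RIGHT_FLANK = "TTGGATCTGG"


def make_allele(ca_repeats: int) -> str:
    """Build a toy allele: flank, a run of CA repeats, flank."""
    return LEFT_FLANK + "CA" * ca_repeats + RIGHT_FLANK


def count_ca_repeats(allele: str) -> int:
    """Number of units in the longest uninterrupted run of CA."""
    runs = re.findall(r"(?:CA)+", allele)
    return max((len(run) // 2 for run in runs), default=0)


def amplicon_length(allele: str) -> int:
    """Length of the fragment spanning both primer sites, inclusive."""
    start = allele.find(LEFT_FLANK)
    end = allele.find(RIGHT_FLANK, start + len(LEFT_FLANK))
    if start == -1 or end == -1:
        raise ValueError("primer site not found in template")
    return end + len(RIGHT_FLANK) - start


for repeats in (3, 5):
    allele = make_allele(repeats)
    print(count_ca_repeats(allele), amplicon_length(allele))
```

The two fragments differ by 4 base pairs, i.e., two CA units, which is the size difference that electrophoresis would have to resolve.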
E. CEPH families
An important step in gene identification is linkage analysis of polymorphic marker loci in a chromosomal region near a locus of interest. Large families are of particular value for this purpose. DNA from such families has been collected by the Centre pour l’Étude du Polymorphisme Humain (CEPH) in Paris, now called the Centre Jean Dausset, after the founder. Immortalized cell lines are stored from each family. A CEPH family consists of four grandparents, the two parents, and eight children. If four alleles are present at a given locus, they are designated A, B, C, and D. Starting with the grandparents, the inheritance of each allele through the parents to the grandchildren can be traced (shown here as a schematic pattern in a Southern blot). Of the four grandparents shown, three are heterozygous (AB, CD, BC) and one is homozygous (CC). Since the parents are heterozygous for different alleles (AD the father and BC the mother), all eight children are heterozygous (BD, AB, AC, or CD).
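The allele tracing described for this family can be checked with a short sketch that enumerates the genotypes a pair of parents can transmit. The function name is hypothetical. The pairing of grandparents (AB with CD on the father's side, BC with CC on the mother's side) is inferred from the genotypes given, not stated explicitly in the text.

```python
from itertools import product


def child_genotypes(father: str, mother: str) -> list[str]:
    """All genotypes obtainable by taking one allele from each parent.

    Genotypes are written with alleles in alphabetical order, e.g. "BD".
    """
    return sorted({"".join(sorted(a + b)) for a, b in product(father, mother)})


# Parents as given in the text: father AD, mother BC.
offspring = child_genotypes("AD", "BC")
print(offspring)

# Every possible child carries two different alleles.
all_heterozygous = all(g[0] != g[1] for g in offspring)
print(all_heterozygous)

# Inferred grandparental pairings are consistent with the parents' genotypes.
print("AD" in child_genotypes("AB", "CD"))
print("BC" in child_genotypes("BC", "CC"))
```

The four genotypes produced for the children match those listed above (BD, AB, AC, CD), and none is homozygous because the parents share no allele.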