How to use chi-squared to test for Hardy-Weinberg equilibrium
Updated: Apr 28, 2021
This post demonstrates the use of chi-squared to test for Hardy-Weinberg equilibrium. There is a question on a recent (February 2020) AP Biology practice test that required this calculation. The question is a secure item, so the exact question will not be discussed here. There is a previous post on this blog explaining how to test for evolution using the null hypothesis and chi-squared.
For our examples, we'll use the fictional species featured in many of the evolution simulations. The population shows incomplete dominance for color. There are two alleles, red (R) and blue (B), and heterozygotes (RB) have a purple phenotype.
Chi-squared is a statistical test used to determine whether observed data (o) differ significantly from expected data (e). A population is at Hardy-Weinberg equilibrium for a gene if five conditions are met: random mating, no mutation, no gene flow, no natural selection, and a large population size. Under these conditions, the allele and genotype frequencies of the population are expected to remain constant from generation to generation (equilibrium). For a population at equilibrium, the H-W equations predict the genotype frequencies from the allele frequencies. If the population is not at equilibrium (for example, if selection is occurring), the equations may not accurately predict the genotype frequencies. However, a population can still show the expected H-W proportions even when an evolutionary force is acting on it.
When a trait shows incomplete dominance, heterozygotes are phenotypically distinct from both homozygotes, so the genotype and allele frequencies can be calculated directly from the phenotype counts (without the H-W equations). These directly calculated values can then be compared with the values predicted by the H-W equations to determine whether the population is at H-W equilibrium.
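For reference, the direct calculation and the H-W predictions can be written as follows. The symbols are introduced here for illustration: N is the number of individuals, n_RR, n_RB, and n_BB are the genotype counts, p is the R allele frequency, and q is the B allele frequency.

```latex
\begin{aligned}
p &= \frac{2n_{RR} + n_{RB}}{2N}, \qquad q = 1 - p = \frac{2n_{BB} + n_{RB}}{2N} \\[4pt]
e_{RR} &= p^2 N, \qquad e_{RB} = 2pq\,N, \qquad e_{BB} = q^2 N
\end{aligned}
```

The first line is pure counting; only the second line relies on the assumption of H-W equilibrium.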
For the first example, we'll use a simple data set (not generated by a simulation): a population of 50 individuals, of which 10 are red, 10 are purple, and 30 are blue. These are the observed values for the chi-squared analysis.
First, we need to find the allele frequencies. The population has 50 individuals, and each individual has two alleles, so there are 100 alleles in the population. Each red (RR) individual has two copies of the R allele and each purple (RB) individual has one, so there are (10 × 2) + 10 = 30 R alleles in the population. The R allele frequency is therefore 30/100 = 0.3, and the B allele frequency is 1 − 0.3 = 0.7 (work shown below).
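As a minimal sketch, the same counting step can be written in a few lines of Python. The variable names are illustrative only; the counts are the ones from this example.

```python
# Genotype counts from the first example: RR = red, RB = purple, BB = blue
n_rr, n_rb, n_bb = 10, 10, 30

total_alleles = 2 * (n_rr + n_rb + n_bb)  # each individual carries two alleles
r_alleles = 2 * n_rr + n_rb               # RR carries two R alleles, RB carries one
p = r_alleles / total_alleles             # frequency of the R (red) allele
q = 1 - p                                 # frequency of the B (blue) allele

print(total_alleles, r_alleles, round(p, 2), round(q, 2))
```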
Next, we find the expected frequency of each genotype from the H-W equation: p² = 0.3² = 0.09 for RR, 2pq = 2(0.3)(0.7) = 0.42 for RB, and q² = 0.7² = 0.49 for BB. The work is shown below.
Once we have the frequencies for each genotype, we can find the expected numbers by multiplying each frequency by the total number of individuals (50): 4.5 red, 21 purple, and 24.5 blue.
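A short sketch of the expected-value step, assuming the allele frequencies p = 0.3 and q = 0.7 found above. The dictionary keys are just labels chosen for readability.

```python
p, q = 0.3, 0.7   # allele frequencies from the previous step
n = 50            # number of individuals in the population

# H-W expected genotype frequencies: p^2 (RR), 2pq (RB), q^2 (BB)
expected_freq = {"red (RR)": p ** 2, "purple (RB)": 2 * p * q, "blue (BB)": q ** 2}

# Expected number of individuals = expected frequency x population size
expected_counts = {g: f * n for g, f in expected_freq.items()}

for genotype, freq in expected_freq.items():
    print(f"{genotype}: {freq:.2f} x {n} = {expected_counts[genotype]:.1f}")
```

Because p + q = 1, the three expected counts always add back up to the population size (50 here), which is a quick check on the arithmetic.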
Now that we have both observed and expected values, we can plug them into the chi-squared equation.
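Written out term by term with the observed values (10, 10, 30) and the expected values (4.5, 21, 24.5), the calculation is:

```latex
\begin{aligned}
\chi^2 &= \sum \frac{(o - e)^2}{e} \\
&= \frac{(10 - 4.5)^2}{4.5} + \frac{(10 - 21)^2}{21} + \frac{(30 - 24.5)^2}{24.5} \\
&\approx 6.722 + 5.762 + 1.235 \approx 13.72
\end{aligned}
```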
The resulting chi-squared value is approximately 13.72. At a significance level of 0.05 with 1 degree of freedom, the critical value is 3.84. (One degree of freedom is generally used for H-W tests even though there are three phenotypes, because all of the expected values are calculated from a single allele frequency that was estimated from the same data: 3 classes − 1 − 1 estimated allele frequency = 1.*) The chi-squared value for this sample (13.72) is greater than 3.84, so we reject the null hypothesis that the observed and expected values are equivalent. This suggests that the population is not at H-W equilibrium. See this post for a more involved discussion of how to use chi-squared results.
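The full test for this example can be sketched in Python using only the standard library. Instead of reading 3.84 from a chi-squared table, this sketch uses the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable, so the critical value at α = 0.05 is 1.96² ≈ 3.84. `statistics.NormalDist` is available in Python 3.8 and later; the rest of the names are illustrative.

```python
from statistics import NormalDist

observed = [10, 10, 30]        # red, purple, blue
expected = [4.5, 21.0, 24.5]   # from p = 0.3, q = 0.7, n = 50

# Chi-squared statistic: sum of (o - e)^2 / e over the three phenotypes
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for df = 1: square of the two-tailed standard normal cutoff
alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2

print(f"chi-squared = {chi_sq:.2f}, critical value = {critical:.2f}")
print("reject H-W equilibrium" if chi_sq > critical else "cannot reject H-W equilibrium")
```

The square-of-a-normal shortcut only works for 1 degree of freedom; for other df values a table (or a statistics library) is still needed.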
For the next examples, we'll use data generated by the population genetics simulation. See this blog post for an explanation of the simulation. In this run, there is no selection against any of the phenotypes, the mutation chance is zero, and the population size is set to 500.
At the end of the simulation run, the red allele frequency is 0.541 and the blue allele frequency is 0.459. The frequencies for the phenotypes are 0.278 for red, 0.526 for purple, and 0.196 for blue.
We can multiply the phenotype frequencies by the total population size (500) to find the number of individuals with each phenotype: 139 red, 263 purple, and 98 blue. These numbers are the observed values for the chi-squared calculation.
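A small sketch of this conversion, using the frequencies reported by the simulation run. It also recomputes the red allele frequency from the counts as a consistency check against the reported value of 0.541.

```python
n = 500
phenotype_freq = {"red": 0.278, "purple": 0.526, "blue": 0.196}

# Observed values: convert each frequency to a whole number of individuals
observed = {color: round(freq * n) for color, freq in phenotype_freq.items()}
print(observed)

# Consistency check: red allele frequency counted directly from the genotypes
p_red = (2 * observed["red"] + observed["purple"]) / (2 * n)
print(p_red)
```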
The next step is to find the expected values. If the population is at H-W equilibrium, the phenotype counts predicted from the allele frequencies will be close to the observed counts. Under the H-W equation, the expected frequency of red (RR) individuals is the square of the red allele frequency (0.541² ≈ 0.293); multiplying by 500 gives the expected number of red individuals (about 146.3). The expected purple and blue values are found the same way from 2pq and q². The work for the expected values is shown below.
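The same expected-value calculation, sketched in Python with the allele frequencies from this simulation run:

```python
p, q, n = 0.541, 0.459, 500  # red (R) and blue (B) allele frequencies, population size

# H-W expected genotype frequencies multiplied by n give expected counts
expected = {
    "red": p ** 2 * n,         # p^2
    "purple": 2 * p * q * n,   # 2pq
    "blue": q ** 2 * n,        # q^2
}

for color, count in expected.items():
    print(f"{color}: {count:.2f}")
```

Expected counts do not need to be whole numbers; rounding them before the chi-squared step only adds a small error.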
At this point, the chi-squared value can be determined by plugging the observed and expected values into the chi-squared equation. In this case, the chi-squared value is approximately 1.75 (work is shown below).
Our value of 1.75 is smaller than the critical value (3.84), so we cannot reject the null hypothesis that the observed and expected values are equivalent. This means that the final population distribution is consistent with H-W equilibrium.
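A sketch of the chi-squared step for this run, starting from the observed counts (139, 263, 98) and the allele frequencies (0.541, 0.459) stated above. The critical value is the 3.84 table value used earlier in this post (df = 1, α = 0.05).

```python
observed = {"red": 139, "purple": 263, "blue": 98}
p, q, n = 0.541, 0.459, 500
expected = {"red": p ** 2 * n, "purple": 2 * p * q * n, "blue": q ** 2 * n}

# Sum of (o - e)^2 / e over the three phenotypes
chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)

CRITICAL_VALUE = 3.84  # df = 1, alpha = 0.05, from a chi-squared table

print(f"chi-squared = {chi_sq:.2f}")
print("reject H-W equilibrium" if chi_sq > CRITICAL_VALUE else "cannot reject H-W equilibrium")
```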
For one more example, we'll again use the population genetics simulation to generate data, this time set up to test a heterozygote advantage scenario. The survival chance is 50% for red (RR) individuals, 100% for purple (RB) individuals, and 0% for blue (BB) individuals. All other settings are the same as in the previous example.
As before, we start by finding the observed numbers from the phenotype frequencies and the expected numbers from the allele frequencies.
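Since every example repeats the same steps, they can be collected into one helper function. `hw_chi_squared` is a hypothetical name used only for this sketch, and it assumes an incompletely dominant trait, so phenotype frequencies equal genotype frequencies. The third run's phenotype frequencies are not listed in this post, so the usage lines reuse the first two examples (the first example's counts of 10, 10, and 30 out of 50 are frequencies of 0.2, 0.2, and 0.6).

```python
def hw_chi_squared(phenotype_freqs, n):
    """Chi-squared statistic comparing observed genotypes with H-W expectations.

    phenotype_freqs: (red, purple, blue) frequencies, i.e. (RR, RB, BB).
    n: number of individuals in the population.
    Assumes incomplete dominance, so each phenotype identifies a genotype.
    """
    # Observed counts of individuals
    observed = [round(f * n) for f in phenotype_freqs]
    red, purple, _blue = observed

    # Allele frequencies counted directly from the genotypes
    p = (2 * red + purple) / (2 * n)
    q = 1 - p
    if p in (0, 1):
        raise ValueError("one allele is fixed; expected counts of zero make the test undefined")

    # Expected counts from the H-W equation
    expected = [p ** 2 * n, 2 * p * q * n, q ** 2 * n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(round(hw_chi_squared((0.2, 0.2, 0.6), 50), 2))         # first example
print(round(hw_chi_squared((0.278, 0.526, 0.196), 500), 2))  # second example
```

The allele frequency is recomputed from the counts rather than passed in, which keeps the observed and expected values tied to the same data.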
Here is the chi-squared calculation for this example:
The chi-squared value is 97.15, far greater than the critical value (3.84), so the null hypothesis that the observed and expected values are equivalent is rejected. This indicates that the population is not at H-W equilibrium.
*The degrees of freedom can get confusing here. Using 2 degrees of freedom comes from 3 phenotype classes − 1; using 1 additionally subtracts one for the allele frequency estimated from the data. Keep in mind that an AP Biology question is unlikely to depend on distinguishing between 1 and 2 degrees of freedom in this context (it is, after all, NOT AP Statistics). Note that the question that prompted this post asked only for the chi-squared value, not for the degrees of freedom or the critical value.