How to use Chi-square in Biology classes

Jolene Pappas
Jul 21

Updated: Jul 22

This post is written from an AP Biology viewpoint, but could apply to most introductory biology instruction, where students do not have prior knowledge of statistical tests.

A summary of exactly what students need for AP Biology appears in the final section of this post.

AP Biology navigates a delicate balance with statistical tests. The course is designed to help students understand the role of data analysis in science. However, it's not a statistics class, and there is no requirement for students to have taken a statistics class. AP Biology teachers and students can only go so in-depth with statistics and still be able to cover the wide range of biological concepts we are responsible for.

To simplify things a bit, only one statistical test, chi-square, is used in AP Biology. Chi-square is used to evaluate whether an observed set of data matches an expected set of data. The null hypothesis is that the observed and expected are not significantly different. There are variations in the approaches to chi-square, including the goodness-of-fit test and a test of independence (homogeneity and association are additional terms that come up in a more advanced/detailed statistical setting).

Chi-square tests

Goodness-of-fit is used when the expected proportions are determined separately from experimental data, based on a prediction. In a biology setting, this is frequently utilized in heredity problems. For example, if two pea plant traits result from two genes that are on separate chromosomes, we predict a 9:3:3:1 ratio of phenotypes in the F2 offspring.

Genetic diagram showing pea traits: yellow/round, green/wrinkled, and combinations. Includes P, F1, and F2 generation probabilities.

The test of independence is used to determine whether groups or variables are related. In this type of test, the expected values are calculated from the collected data rather than from a prediction. They are the counts we would expect if the variables were unrelated, and we find them by combining the row and column frequencies in the recorded data.

In terms of the math involved, the practical difference between these approaches is how the expected values are determined. The same equation is used to calculate the chi-square value, and the same table is used for interpreting the results. How the data was collected and the purpose of the analysis both play a role in determining which type of test to use, but this decision is beyond the requirements for most introductory Biology courses. Selecting the correct statistical test for a situation is generally for people who have completed a statistics course.

Expected values

How we find the expected values differs depending on whether we are performing a goodness-of-fit test or a test of independence.

Goodness-of-fit

For a goodness-of-fit test, we start with a prediction. Let's say we predict that a sample with four categories is evenly divided. This means that we expect each category to be 25%. We multiply the total of the observed data by the expected frequency for each category, in this case 25%.

Table compares observed (8, 4, 6, 12) and expected values (7.5) for four categories. Equation: 30 x 25% = 7.5. — Sample observed and expected data for a goodness-of-fit test
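As a rough sketch of the same arithmetic, the short Python snippet below multiplies the observed total by each predicted proportion. The variable names are only illustrative, and the numbers come from the sample table above.

```python
# Observed counts from the sample table above.
observed = [8, 4, 6, 12]

# Predicted proportion for each category (an even split across four categories).
expected_proportions = [0.25, 0.25, 0.25, 0.25]

total = sum(observed)  # 30

# Expected count = total of the observed data * expected proportion.
expected = [total * p for p in expected_proportions]

print(expected)  # [7.5, 7.5, 7.5, 7.5]
```

For a different prediction, such as a 9:3:3:1 ratio, only the list of proportions would change.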

Test of Independence

In a test of independence, the expected values are found using probability rules. Short version of the probability rules: OR means add, AND means multiply.

In this case, we start with the observed data.

Table with categories I and II, showing data in columns for Data 1, Data 2, and their totals. — Sample data for a test of independence

To find the expected value for Category I, Data 1, we do the following:

Find the frequency of individuals in Category I (either Data 1 OR Data 2): (8 + 4) / 30 = 0.4
Find the frequency of individuals in Data 1 (Category I OR Category II): (8 + 6) / 30 = 0.47
Find the frequency of Category I AND Data 1: 0.4 * 0.47 = 0.188
Use the frequency to calculate the expected number (multiply by the total): 0.188 * 30 = 5.6

As an alternative to understanding what's going on with frequencies and probability rules, use the following chart as a guide. In summary, each expected value is the row total multiplied by the column total and then divided by the overall total.

Grid with A, B, C, D cells. Below, four math fractions featuring these letters and equations. — Expected calculation guide

Here are each of the expected values from the sample.

Table with Category I and II data, showing observed and expected values. Calculations in handwriting are on the right. — Sample observed and expected values for a test of independence
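The row total * column total / overall total rule can also be sketched in a few lines of Python. The function name `expected_counts` is made up for this illustration, and the table layout (rows are categories, columns are Data 1 and Data 2) is assumed from the sample data above.

```python
def expected_counts(observed):
    """Return expected counts for a table of observed counts (a list of rows)."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    overall_total = sum(row_totals)
    # Each expected value is (row total * column total) / overall total.
    return [
        [row_total * column_total / overall_total for column_total in column_totals]
        for row_total in row_totals
    ]


observed = [
    [8, 4],   # Category I: Data 1, Data 2
    [6, 12],  # Category II: Data 1, Data 2
]

expected = expected_counts(observed)
for row in expected:
    print([round(value, 1) for value in row])
# [5.6, 6.4]
# [8.4, 9.6]
```

The first value, 5.6, matches the step-by-step frequency calculation for Category I, Data 1.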

Calculating Chi-Square

The chi-square equation is the same regardless of how the expected values were determined.

Chi-squared formula on black background: χ² = Σ((o-e)²/e). — chi-square equation

For the chi-square calculation, subtract the expected from the observed, square that, and then divide by the expected for each of the categories. Then add those values for the final chi-square value.

Table shows observed and expected frequencies for four categories. Chi-square formula and calculation equal 4.66. — Chi-square calculation example
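For anyone who wants to check the arithmetic, here is a minimal Python sketch of the chi-square sum. The function name `chi_square` is an illustrative choice, and the data are the goodness-of-fit sample values (8, 4, 6, 12 observed and 7.5 expected).

```python
def chi_square(observed, expected):
    """Sum (o - e)^2 / e over every category."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [8, 4, 6, 12]
expected = [7.5, 7.5, 7.5, 7.5]

value = chi_square(observed, expected)
print(round(value, 2))  # 4.67
```

The unrounded sum is about 4.667, so it rounds to 4.67. The figure above shows 4.66, which is the same result cut off after two decimal places.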

Degrees of freedom

Degrees of freedom are the number of values in a data set that can change. This number is necessary to interpret the results of the chi-square test. The AP Biology Equations and Formulas sheet states, "Degrees of freedom are equal to the number of distinct possible outcomes minus one." For example, consider the following data set: 8, 4, 6, 12.

If the '8' is unknown (x, 4, 6, 12), but we know the average is 7.5, we can figure out what that unknown value is.

7.5 x 4 = 30

4 + 6 + 12 = 22

30 - 22 = 8

This shows that one of the values in this data set is not "free" to vary. But if more than one of the values was unknown, we could not identify them. For this data set, three of the values can vary. This is why the "possible outcomes minus one" definition found on the equation sheet works. The data set in the example has four values, so df = 4 - 1 = 3.

Table with categories I-IV and data A-D in orange. Handwritten text reads "One column of data, df = 4 - 1 = 3" on a black background. — Degrees of freedom for a simple data set

In AP Biology, this is the situation that students are most likely to encounter if they are required to find the degrees of freedom. In heredity problems, degrees of freedom are calculated by subtracting one from the number of different possible phenotypes. For example, a monohybrid cross that has two possible phenotypic outcomes has one degree of freedom. A dihybrid cross with four possible phenotypes has 3 degrees of freedom.

Two tables list data on round, wrinkled, yellow, and green categories. Handwritten notes: "df=4-1=3" and "df=2-1=1". — Degrees of freedom, heredity examples

Things get a little more complicated when there are multiple values per category. In this case, when the data is arranged in a chart, df = (rows - 1)(columns - 1). Degrees of freedom are based on the data listed across rows as well as down each column.

Two tables compare data: top has 2 rows, bottom has 5. Annotations show calculations for degrees of freedom, highlighting differences. — Degrees of freedom examples

In a 2x2 table, if we know the totals for each row and column, and if we have at least 1 value, we can find any other value, so there is only one degree of freedom.

Table with data for two categories. Handwritten equations and an arrow highlight a degree of freedom. — Degrees of freedom demonstration
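The two degrees-of-freedom rules can be written as a pair of small Python helpers. This is only a sketch, and the function names are made up for illustration.

```python
def df_single_column(number_of_categories):
    """One column of data: number of possible outcomes minus one."""
    return number_of_categories - 1


def df_table(number_of_rows, number_of_columns):
    """Data arranged in rows and columns: (rows - 1)(columns - 1)."""
    return (number_of_rows - 1) * (number_of_columns - 1)


print(df_single_column(4))  # 3 (four categories, such as a dihybrid cross)
print(df_single_column(2))  # 1 (two categories, such as a monohybrid cross)
print(df_table(2, 2))       # 1 (a 2x2 table)
```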

In these situations, it is easy for students to misinterpret the degrees of freedom, often calculating too many. In Part 2 of Investigation 7 from AP Biology Investigative Labs, there appear to be 4 categories: Mitosis and Interphase for the control and Mitosis and Interphase for the treated. Without additional instruction, this could be interpreted as 3 degrees of freedom.

However, the lab states that "The degrees of freedom (df) equals the number of treatment groups minus one multiplied by the number of phase groups minus one (T133, S90)." The lab does not explain to students why this is different than what we see on the formula sheet, but it does provide the correct outcome of 1 degree of freedom (df = (2 − 1) (2 − 1) = 1).

A table shows cell phases: Interphase and Mitosis with control and experimental groups. Notes: "2 rows," "2 columns," and "df=(2-1)(2-1)=1." — Mitosis lab df calculation

Hardy-Weinberg

Chi-square can be used to test if a population is at Hardy-Weinberg equilibrium (assume one gene, two alleles). This is done by using the frequency of an allele to calculate the expected frequencies, and from those, the expected values. See this post for a full breakdown of this process.

Three tables show genotype data: "Observed," "Count," and "Expected" with values for alleles AA, Aa, and aa, connected by arrows. Demonstrates an overview of finding the expected values to use for a chi-square testing for Hardy-Weinberg Equilibrium. — Testing for Hardy-Weinberg equilibrium

In these situations, there may be three phenotypes (if the heterozygous phenotype is different than both homozygous phenotypes). However, because we can do all the expected calculations based only on one of the two alleles, there is only 1 degree of freedom.

Table shows alleles A (125) and a (75) counts. Handwritten notes: "2 categories," "df = 2-1 = 1." — degrees of freedom for a H-W equilibrium test
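As a hedged sketch of the expected-value step, the Python below starts from the allele counts in the figure above (125 A and 75 a). It assumes diploid individuals, so 200 alleles correspond to 100 individuals. The variable names are illustrative only.

```python
# Allele counts taken from the figure above.
count_A = 125
count_a = 75

total_alleles = count_A + count_a  # 200
individuals = total_alleles // 2   # 100, assuming each individual carries two alleles

p = count_A / total_alleles  # frequency of A: 0.625
q = count_a / total_alleles  # frequency of a: 0.375

# Hardy-Weinberg expected genotype frequencies are p^2, 2pq, and q^2.
expected = {
    "AA": p ** 2 * individuals,
    "Aa": 2 * p * q * individuals,
    "aa": q ** 2 * individuals,
}

for genotype, count in expected.items():
    print(genotype, count)
# AA 39.0625
# Aa 46.875
# aa 14.0625
```

These expected counts would then be compared with the observed genotype counts using the usual chi-square equation and 1 degree of freedom.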

Interpreting Results

The critical value table helps us to interpret the chi-square results. To find the critical value, we need to know the degrees of freedom and the p-value. The p-value used here is a threshold: the probability of getting a difference at least this large between observed and expected by chance alone if the null hypothesis is true. Introductory biology situations typically use a p-value of 0.05, and in most situations the p-value will be specified by the lab/question.

Chi-square table with degrees of freedom and p-values. The value 3.84 is circled in red under degree 1 at p-value 0.05. — Chi-Square Table

To interpret the results, compare the calculated chi-square value to the critical value. If the chi-square value is greater than the critical value, the null hypothesis is rejected. This means that the observed values differ significantly from the expected values. If the chi-square value is not greater than the critical value, then we fail to reject the null hypothesis. Failing to reject the null hypothesis means that the data do not show a significant difference between the observed and expected values.
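The comparison step can be sketched in Python as follows. The lookup table holds only the p = 0.05 critical values quoted in this post (for 1, 2, and 3 degrees of freedom), and the function name `interpret` is an illustrative choice.

```python
# Critical values at p = 0.05, limited to the ones quoted in this post.
CRITICAL_VALUES_P05 = {1: 3.84, 2: 5.99, 3: 7.82}


def interpret(chi_square_value, degrees_of_freedom):
    """Compare a chi-square value to the p = 0.05 critical value."""
    critical_value = CRITICAL_VALUES_P05[degrees_of_freedom]
    if chi_square_value > critical_value:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(interpret(6.11, 1))  # reject the null hypothesis
print(interpret(1.65, 3))  # fail to reject the null hypothesis
```

In class, students would read the critical value from the table on the Equations and Formulas sheet rather than from a dictionary like this one.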

Examples

Heredity Example

If seed color (yellow is dominant to green) and seed shape (round is dominant to wrinkled) are unlinked and we cross two plants heterozygous for both traits, we expect:

9/16 of the offspring to be round and yellow,
3/16 to be wrinkled and yellow,
3/16 to be round and green, and
1/16 to be wrinkled and green.

The data below is the observed data.

Chart with orange header reads "Data." Rows list attributes: "Round/Yellow," "Wrinkled/Yellow," "Round/Green," "Wrinkled/Green" with values. — observed data

Because we have predicted ratios, we will use the goodness-of-fit approach here. The total number of plants produced in this example is 113. To find the expected values, we multiply the total by the expected proportions.

Round/Yellow: 113 * 9/16 = 63.56
Wrinkled/Yellow: 113 * 3/16 = 21.19
Round/Green: 113 * 3/16 = 21.19
Wrinkled/Green: 113 * 1/16 = 7.06

For the chi-square calculation, subtract the expected from the observed, square that, and then divide by the expected for each of the categories. For this example, the chi-square value is 1.65.

Chi-square calculation on black background showing formula and results: 0.199 + 0.067 + 0.155 + 1.224 = 1.65.

There are 3 degrees of freedom, so at p = 0.05, the critical value is 7.82. Because 1.65 is less than 7.82, we fail to reject the null hypothesis that the observed values match the expected values. Conclusion: The data are consistent with the two traits being unlinked.
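Here is a small Python sketch that reproduces the numbers in this example. It computes the expected values from the total of 113 and the 9:3:3:1 ratio, then adds up the four chi-square terms exactly as they appear in the calculation figure above (the observed counts themselves are only in the data image, so they are not retyped here). The variable names are illustrative.

```python
from fractions import Fraction

total = 113

# Predicted proportions for a dihybrid cross of two unlinked genes.
proportions = {
    "Round/Yellow": Fraction(9, 16),
    "Wrinkled/Yellow": Fraction(3, 16),
    "Round/Green": Fraction(3, 16),
    "Wrinkled/Green": Fraction(1, 16),
}

# Expected count = total * expected proportion.
expected = {name: float(total * fraction) for name, fraction in proportions.items()}
for name, count in expected.items():
    print(name, count)
# Round/Yellow 63.5625
# Wrinkled/Yellow 21.1875
# Round/Green 21.1875
# Wrinkled/Green 7.0625

# The (o - e)^2 / e terms as reported in the calculation figure.
terms = [0.199, 0.067, 0.155, 1.224]
chi_square_value = sum(terms)
print(f"{chi_square_value:.3f}")  # 1.645

critical_value = 7.82  # df = 3, p = 0.05
print(chi_square_value > critical_value)  # False, so we fail to reject
```

The unrounded expected values round to the 63.56, 21.19, 21.19, and 7.06 listed above, and 1.645 rounds to the reported 1.65.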

Mitosis Example

In this example, we are examining the effect of a treatment on cell division by counting the number of dividing and non-dividing cells in a control and treated sample. Instead of testing a prediction, we are examining whether the treatment affects the chances that a cell is dividing.

The table below shows the observed data for both the control and experimental (treated) groups:

Table comparing cell cycle phases: Control vs. Experimental. Interphase shows 86 (Control) and 96 (Experimental); Mitosis shows 14 and 4. — Observed Data, 200 total cells

The following chart has both the observed and expected values for this example. We find the expected values using the (row total * column total) / overall total method described in the Test of Independence section above.

Table comparing observed and expected values for control and experimental interphase and mitosis — Observed and expected data

For this example, the chi-square value is 6.11.

Chi-squared equation on a black background. Calculation shows steps and results, with a final sum of 6.11 in white text.

For this example, there is 1 degree of freedom, so at p = 0.05, the critical value is 3.84. Because 6.11 is greater than 3.84, we reject the null hypothesis that the observed values match the expected values. Conclusion: The experimental treatment has an impact on the frequency of cell division.
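The whole mitosis example can be checked with a short Python sketch. The observed counts come from the data table above. The variable names and the row/column layout are choices made for this illustration.

```python
# Rows: Interphase, Mitosis. Columns: Control, Experimental.
observed = [
    [86, 96],
    [14, 4],
]

row_totals = [sum(row) for row in observed]                  # [182, 18]
column_totals = [sum(column) for column in zip(*observed)]   # [100, 100]
overall_total = sum(row_totals)                              # 200

# Expected value = (row total * column total) / overall total.
expected = [
    [row_total * column_total / overall_total for column_total in column_totals]
    for row_total in row_totals
]
print(expected)  # [[91.0, 91.0], [9.0, 9.0]]

# Chi-square: sum of (o - e)^2 / e over all four cells.
chi_square_value = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)
print(round(chi_square_value, 2))  # 6.11

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(degrees_of_freedom)  # 1

critical_value = 3.84  # df = 1, p = 0.05
print(chi_square_value > critical_value)  # True, so we reject the null hypothesis
```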

Ecology Example

Let's say a student is studying whether the size of a pill bug affects what choices it makes in a choice chamber. The student has set up the choice chamber with a dark side and a light side. The table below shows the observed data.

Table with length categories (5-9, 10-14, 15-19mm) showing values for Light Side (2, 5, 1) and Dark Side (10, 15, 7) in blue. — Observed data

The null hypothesis here is that length does not impact whether a bug chooses the light or dark side. This is a test of independence, so each expected value is (row total * column total) / total. For example, the 5-9 mm light expected = ((2+10) * (2+5+1)) / 40 = 2.4

Table shows categories with observed and expected numbers. Categories include variations of light and dark with length measurements. — Observed and expected data

For this example, the chi-square value is 0.678.

Mathematical equation for chi-squared test calculation, resulting in 0.678.

There are 2 degrees of freedom, so at p = 0.05, the critical value is 5.99. Because 0.678 is less than 5.99, we fail to reject the null hypothesis that the observed values match the expected values. Conclusion: These data provide no evidence that pill bug length impacts light choice.
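A Python sketch of the pill bug calculation is below, using the observed counts from the data table. The variable names and the row/column layout are illustrative choices.

```python
# Rows: 5-9 mm, 10-14 mm, 15-19 mm. Columns: Light Side, Dark Side.
observed = [
    [2, 10],
    [5, 15],
    [1, 7],
]

row_totals = [sum(row) for row in observed]                  # [12, 20, 8]
column_totals = [sum(column) for column in zip(*observed)]   # [8, 32]
overall_total = sum(row_totals)                              # 40

# Expected value = (row total * column total) / overall total.
expected = [
    [row_total * column_total / overall_total for column_total in column_totals]
    for row_total in row_totals
]
print(expected[0][0])  # 2.4 (5-9 mm, light side)

# Chi-square: sum of (o - e)^2 / e over all six cells.
chi_square_value = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)
print(round(chi_square_value, 3))  # 0.677

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(degrees_of_freedom)  # 2

critical_value = 5.99  # df = 2, p = 0.05
print(chi_square_value > critical_value)  # False, so we fail to reject
```

The sketch gives 0.677 rather than the 0.678 reported above. A small difference like this is most likely from rounding the individual terms before adding them, and it does not change the conclusion.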

What do AP Biology students need to know related to chi-square?

According to the AP Biology Course and Exam Description (2025), students should be able to:

Write/identify a null hypothesis.
Calculate a chi-square value.
- The chi-square equation and table are on the Equations and Formulas sheet (no need to memorize).
Interpret a chi-square value (reject/fail to reject null hypothesis).
Use chi-square for genetics problems (including finding the expected values and degrees of freedom).
Understand that chi-square is used in situations beyond genetics.
Understand that there are more statistical tests than just chi-square, even though this is the one used in this course.

Both tests of independence and goodness-of-fit tests can be used in AP Biology questions; however, students do not need to differentiate between them. Based on the chi-square questions available in the AP Classroom Question Bank (as of July 2025), if students are asked to perform a calculation for a test of independence, they are provided the observed and expected values. Nearly all available questions where students are expected to calculate expected values are genetics questions (with one exception: a Hardy-Weinberg question from a practice exam).

Types of chi-square questions in the Question Bank on AP Classroom

Given a chi-square value, degrees of freedom, and a p-value, interpret the results (reject/fail to reject null hypothesis).
Given a heredity experiment description and a chi-square value, identify the critical value and interpret the results.
Given observed and expected values, a chi-square value, and a p-value, interpret the results.
Given observed and expected values, identify the critical value, calculate the chi-square value, and/or interpret results.
Given observed data for a heredity experiment, calculate the chi-square value.
Given observed data for a trait with two alleles producing three phenotypes, identify expected values for H-W equilibrium, calculate chi-square to test H-W equilibrium, and/or identify the critical value.
