Allele Frequency Calculator
Allele frequency measures how common a genetic variant is in a population. Enter genotype counts (AA, Aa, aa), phenotype counts, or raw allele counts, and the calculator computes p and q, predicts Hardy-Weinberg expected genotype frequencies (p², 2pq, q²), highlights the carrier frequency (2pq), and runs a chi-square test to check whether the population is in genetic equilibrium.
Hardy-Weinberg expected frequencies assume random mating, no selection, no mutation, no migration, and a large population. Chi-square test uses df=1 and a significance threshold of p=0.05 (critical value 3.841).
Quick Answer
Allele frequency (p for dominant, q for recessive) is the proportion of a specific allele among all alleles at a given locus in a population. Calculate it from genotype counts using: p = (2 x AA + Aa) / (2 x N) and q = 1 - p, where N is the total number of individuals. Under Hardy-Weinberg equilibrium, expected genotype frequencies are: AA = p², Aa = 2pq, aa = q². A chi-square test compares observed versus expected counts to test whether the population is in equilibrium.
What Is Allele Frequency?
Allele frequency is the proportion of a specific allele variant among all copies of a gene in a population. For a gene with two alleles, A (dominant) and a (recessive), the frequencies are conventionally labelled p and q respectively. Every copy of the gene at a locus must be one allele or the other (a diploid population of N individuals carries 2N copies in total), so the frequencies of all alleles at that locus sum to 1: p + q = 1. This simple constraint means that once you know p, you automatically know q = 1 - p.
Allele frequencies are the currency of population genetics. They quantify how common or rare any genetic variant is in a group of individuals and form the foundation for tracking evolutionary change, predicting disease risk, and interpreting genomic data. A population in which allele frequencies are shifting over time is, by definition, evolving. A population in which they remain stable generation after generation is said to be at genetic equilibrium. For a quick visual of how allele combinations produce offspring, try our Punnett square calculator or dihybrid cross calculator.
The concept was formalised in 1908 independently by British mathematician G.H. Hardy and German physician Wilhelm Weinberg. Their foundational insight, now known as the Hardy-Weinberg principle, showed that in a large population with random mating, no selection, no mutation, and no migration, allele frequencies remain constant indefinitely and genotype frequencies settle into a predictable mathematical relationship determined entirely by p and q.
In practice, few real populations meet all these idealised conditions. Natural selection favours some alleles over others. Populations are finite, making them susceptible to random drift. Individuals sometimes mate non-randomly. Migration introduces new alleles. Mutation creates new variants. Detecting deviations from Hardy-Weinberg equilibrium using a chi-square test is therefore a powerful diagnostic: it can reveal hidden population structure, genotyping errors, selection in action, or non-random mating patterns.
How to Use the Allele Frequency Calculator
- Select your input mode. Choose from three options based on what data you have. Genotype Counts is the most precise: you know exactly how many AA, Aa, and aa individuals are in your sample. Phenotype Counts is used when you can only observe the phenotype (dominant vs recessive) but not the specific genotype: the calculator estimates p and q by assuming HWE and taking q = square root of (recessive count / total). Allele Counts is used when you have direct allele counts from sequencing or genotyping array data.
- Enter your counts. For genotype mode, enter the number of individuals in each genotype class (AA, Aa, aa). All values must be whole numbers greater than or equal to zero and the total must be at least 1. For phenotype mode, enter how many individuals show the dominant phenotype and how many show the recessive phenotype. For allele count mode, enter the total count of A alleles and a alleles observed across all individuals.
- Read p and q. The calculator displays p (dominant allele frequency) and q (recessive allele frequency) to four decimal places, alongside their percentages. Verify that p + q = 1.0000 as a sanity check.
- Review Hardy-Weinberg expected frequencies. The bar chart shows the expected proportions of AA (p²), Aa (2pq), and aa (q²) genotypes if the population were in perfect HWE. Compare these to your observed genotype frequencies to spot deviations.
- Check the HWE chi-square test (genotype mode). The calculator compares observed genotype counts to Hardy-Weinberg expected counts using a chi-square test with one degree of freedom. A chi-square value below 3.841 indicates the population is consistent with HWE at the 5% significance level. A value above 3.841 indicates a statistically significant deviation.
- Note the carrier frequency. The result panel highlights 2pq, the expected proportion of heterozygous carriers. This is particularly important in medical genetics: carrier frequency tells you how many people in the population carry one copy of a recessive disease allele without being affected.
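The three input modes map to three small formulas. The sketch below is one possible way to write them, and the function names are hypothetical: this is not the calculator's actual code. Only the phenotype function relies on the HWE assumption.

```python
import math


def from_genotypes(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Genotype mode: count alleles directly (no HWE assumption)."""
    n = n_AA + n_Aa + n_aa
    if min(n_AA, n_Aa, n_aa) < 0 or n < 1:
        raise ValueError("counts must be >= 0 with at least one individual")
    p = (2 * n_AA + n_Aa) / (2 * n)  # each AA adds 2 A alleles, each Aa adds 1
    return p, 1 - p


def from_phenotypes(n_dominant: int, n_recessive: int) -> tuple[float, float]:
    """Phenotype mode: assumes HWE, so the recessive proportion estimates q**2."""
    n = n_dominant + n_recessive
    if min(n_dominant, n_recessive) < 0 or n < 1:
        raise ValueError("counts must be >= 0 with at least one individual")
    q = math.sqrt(n_recessive / n)
    return 1 - q, q


def from_alleles(n_A: int, n_a: int) -> tuple[float, float]:
    """Allele count mode: p is simply the share of A alleles."""
    total = n_A + n_a
    if min(n_A, n_a) < 0 or total < 1:
        raise ValueError("counts must be >= 0 with at least one allele")
    p = n_A / total
    return p, 1 - p


print(from_genotypes(36, 48, 16))  # p = (2 x 36 + 48) / (2 x 100)
print(from_phenotypes(84, 16))     # q = square root of (16 / 100)
print(from_alleles(120, 80))       # p = 120 / 200
```

All three example inputs describe the same population (p = 0.6, q = 0.4). They agree here only because the genotype counts happen to match HWE exactly.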
Allele Frequency Formulas
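There are three standard approaches to calculating allele frequency, depending on the data available. They rest on different assumptions. The genotype and allele count methods are exact for the sample. The phenotype method is only an estimate, and it depends on HWE.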
- From genotype counts (most accurate): p = (2 x AA + Aa) / (2 x N) and q = (2 x aa + Aa) / (2 x N), where N = AA + Aa + aa. This approach counts alleles directly: each AA individual contributes 2 A alleles, each Aa individual contributes 1, and each aa contributes 0. This method does not assume HWE and gives the true allele frequency in your sample regardless of whether the population is in equilibrium.
- From phenotype counts (requires HWE assumption): If you only know the dominant and recessive phenotype counts, you must assume HWE to estimate q. Because only aa individuals show the recessive phenotype, q² = recessive count / N, so q = square root of (recessive count / N) and p = 1 - q. This estimate is only valid if the population is genuinely in HWE.
- From allele counts (genomics/sequencing): p = count of A alleles / total allele count, q = count of a alleles / total allele count. Modern sequencing platforms report allele counts directly, making this the standard approach in genomics databases such as gnomAD and dbSNP.
- Hardy-Weinberg expected genotype frequencies: Once p and q are known, expected frequencies under HWE are AA = p², Aa = 2pq, aa = q². These sum to 1: p² + 2pq + q² = (p + q)² = 1.
- Chi-square HWE test: chi² = sum of (observed - expected)² / expected across all three genotype classes. Degrees of freedom = 1. Start from three genotype classes, subtract one because the class counts must sum to N, and subtract one more for the allele frequency p estimated from the data (q = 1 - p is not a separate parameter). Critical value at p=0.05, df=1 is 3.841. A chi-square above this threshold indicates significant departure from HWE.
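Here is a minimal sketch of the genotype-count HWE test described in the list above, using only the standard library. The function name is hypothetical. The p-value line relies on a standard identity: a chi-square variable with one degree of freedom is the square of a standard normal, so its upper tail is erfc(sqrt(chi²/2)).

```python
import math

CRITICAL_DF1_P05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float, bool]:
    """Return (chi-square, p-value, significant?) for a two-allele HWE test."""
    n = n_AA + n_Aa + n_aa
    if n < 1:
        raise ValueError("need at least one individual")
    p = (2 * n_AA + n_Aa) / (2 * n)  # allele counting, no HWE assumption
    q = 1 - p
    if p == 0 or q == 0:
        # Monomorphic sample: expected counts include zeros, so the test is undefined.
        raise ValueError("both alleles must be present")
    observed = (n_AA, n_Aa, n_aa)
    expected = (p * p * n, 2 * p * q * n, q * q * n)  # p², 2pq, q² scaled to N
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With df = 1 the statistic is a squared standard normal, so
    # P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, chi2 > CRITICAL_DF1_P05


print(hwe_chi_square(36, 48, 16))  # observed counts exactly match p = 0.6
print(hwe_chi_square(50, 20, 30))  # same p = 0.6, but a heterozygote deficit
```

Both samples have p = 0.6, so both have expected counts of 36, 48 and 16. The second sample gives chi² = 196/36 + 784/48 + 196/16 = 1225/36, or about 34.03. That is far above 3.841.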
Real-World Applications
In medical genetics, allele frequency is the starting point for estimating carrier frequency of autosomal recessive diseases. For cystic fibrosis in Northern European populations, the disease allele frequency (q) is approximately 0.02, meaning q² (affected individuals) = 0.0004 or about 1 in 2,500 newborns, and 2pq (carrier frequency) = approximately 0.04, or 1 in 25 people. Genetic counsellors use these calculations to explain population-level risk to families considering carrier testing.
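The cystic fibrosis arithmetic above can be checked in a few lines. The sketch assumes HWE and takes q = 0.02 from the text. The function name is hypothetical.

```python
def recessive_disease_stats(q: float) -> tuple[float, float]:
    """Return (affected frequency q**2, carrier frequency 2pq) under HWE."""
    p = 1 - q
    return q ** 2, 2 * p * q


affected, carriers = recessive_disease_stats(0.02)  # approximate CF allele frequency
print(f"affected: {affected:.4f}, about 1 in {1 / affected:,.0f}")
print(f"carriers: {carriers:.4f}, about 1 in {1 / carriers:.1f}")
```

This prints an affected frequency of 0.0004 (about 1 in 2,500) and a carrier frequency of 0.0392 (about 1 in 25.5). The text rounds the second figure to 1 in 25.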
In conservation biology, allele frequency monitoring is used to assess the genetic health of endangered species. When a population crashes to a small size, random genetic drift can rapidly alter allele frequencies, eliminating rare alleles entirely through chance rather than selection. This reduces genetic diversity and can make the population less resilient to future disease or environmental changes. Biologists track allele frequencies over time to detect these warning signs early in recovery programmes.
In forensic genetics, allele frequencies from population databases are used to calculate the random match probability: the probability that an unrelated person would match a given DNA profile by chance. This is the statistical foundation of DNA evidence presented in court. According to the National Institute of Justice, forensic labs use reference population databases containing allele frequencies for each STR locus to estimate the rarity of a particular genetic profile.
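As a rough illustration of how per-locus allele frequencies become a profile frequency, the sketch below uses the basic "product rule". At each locus it applies the HWE genotype frequency (p² for a homozygote, 2pq for a heterozygote), then multiplies across loci, which assumes the loci are independent. The function name, locus names and allele frequencies are hypothetical toy values, not real database figures. Real casework typically applies further adjustments, for example for population substructure, and this sketch omits them.

```python
def profile_frequency(profile: dict[str, tuple[str, str]],
                      allele_freqs: dict[str, dict[str, float]]) -> float:
    """Product-rule estimate of how common a multi-locus genotype is.

    Assumes HWE within each locus and independence between loci.
    """
    freq = 1.0
    for locus, (a1, a2) in profile.items():
        f1 = allele_freqs[locus][a1]
        f2 = allele_freqs[locus][a2]
        # Homozygote: p**2; heterozygote: 2pq (the HWE genotype frequencies).
        locus_freq = f1 * f1 if a1 == a2 else 2 * f1 * f2
        freq *= locus_freq
    return freq


# Toy numbers for two hypothetical loci, not real database values.
toy_freqs = {
    "LOCUS_1": {"12": 0.10, "14": 0.20},
    "LOCUS_2": {"9": 0.10},
}
toy_profile = {"LOCUS_1": ("12", "14"), "LOCUS_2": ("9", "9")}
print(profile_frequency(toy_profile, toy_freqs))  # 2 x 0.1 x 0.2, times 0.1²
```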
In plant and animal breeding, tracking allele frequencies across generations reveals whether artificial selection is working. If breeders are successfully selecting for a favourable trait, the frequency of the allele conferring that trait should increase in each generation. Monitoring allele frequencies allows breeders to quantify selection response and adjust breeding programmes accordingly.
Allele Frequency and Evolutionary Forces
Hardy-Weinberg equilibrium describes a population at evolutionary rest. Real populations deviate from equilibrium under the influence of five evolutionary forces, each of which shifts allele frequencies in a different way.
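Natural selection increases the frequency of alleles that improve survival or reproductive success and decreases the frequency of harmful alleles. The rate of change depends on three things: the selection coefficient (how much the allele affects fitness), the current allele frequency, and dominance. When an allele is rare, almost every copy sits in a heterozygote. A rare dominant allele is therefore exposed to selection immediately. A rare recessive allele is largely hidden from it and changes frequency very slowly. For an allele with additive effects, the change per generation is roughly proportional to spq (s being the selection coefficient). It is therefore fastest at intermediate frequencies and slow when the allele is either rare or nearly fixed.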
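The role of dominance can be sketched with the standard deterministic one-locus, two-allele selection recurrence. The model assumes random mating, an effectively infinite population and constant genotype fitnesses. The selection coefficient s = 0.1, the starting frequency and the function names are illustrative choices, not values from the text.

```python
def next_p(p: float, w_AA: float, w_Aa: float, w_aa: float) -> float:
    """One generation of selection on allele A (random mating, infinite population)."""
    q = 1 - p
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa  # mean fitness
    return (p * p * w_AA + p * q * w_Aa) / w_bar


def run(p: float, fitness: tuple[float, float, float], generations: int) -> float:
    """Iterate the recurrence for a fixed number of generations."""
    for _ in range(generations):
        p = next_p(p, *fitness)
    return p


s = 0.1       # illustrative selection coefficient
start = 0.01  # rare beneficial allele A
dominant = run(start, (1 + s, 1 + s, 1.0), 100)   # A beneficial and dominant
recessive = run(start, (1 + s, 1.0, 1.0), 100)    # A beneficial and recessive
print(f"after 100 generations: dominant {dominant:.3f}, recessive {recessive:.3f}")

# Additive effects: the per-generation change tracks s * p * q,
# so it is largest at intermediate frequencies.
for p in (0.01, 0.5, 0.99):
    print(p, next_p(p, 1 + s, 1 + s / 2, 1.0) - p)
```

Starting from the same 1% frequency, the dominant beneficial allele becomes common within 100 generations. The recessive one barely moves, because its rare copies sit almost entirely in heterozygotes.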
Genetic drift is random change in allele frequencies caused by chance sampling from one generation to the next, and its effect is strongest in small populations. In very small populations (an effective population size of a few hundred individuals or fewer), drift can fix or eliminate alleles largely regardless of their fitness effects, overriding weak selection. The founder effect and bottleneck effect are extreme cases where a small group of individuals establishes a new population or survives a population crash, carrying only a subset of the original allele diversity.
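Drift is commonly modelled with the Wright-Fisher model. In each generation, the 2N allele copies are drawn at random from the previous generation's allele frequency. The sketch below assumes that model with no selection, mutation or migration. The function name, population sizes and generation count are illustrative.

```python
import random


def wright_fisher(p0: float, n_individuals: int, generations: int,
                  rng: random.Random) -> list[float]:
    """Neutral Wright-Fisher drift: each generation resamples 2N allele copies."""
    two_n = 2 * n_individuals
    count = round(p0 * two_n)
    trajectory = [count / two_n]
    for _ in range(generations):
        p = count / two_n
        # Binomial draw: each of the 2N new copies is A with probability p.
        count = sum(1 for _ in range(two_n) if rng.random() < p)
        trajectory.append(count / two_n)
        if count in (0, two_n):  # allele lost or fixed: drift has finished
            break
    return trajectory


rng = random.Random(1)
for n in (10, 500):
    finals = [wright_fisher(0.5, n, 200, rng)[-1] for _ in range(10)]
    absorbed = sum(p in (0.0, 1.0) for p in finals)
    print(f"N = {n}: {absorbed}/10 replicates lost or fixed A within 200 generations")
```

With N = 10, essentially every replicate loses or fixes A well within 200 generations. With N = 500, the frequency typically wanders but stays polymorphic over the same period.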
Gene flow, the movement of alleles between populations through migration, homogenises allele frequencies across connected populations and can introduce new variants not present in the original population. Mutation creates new alleles at a low but constant rate, providing the raw material for selection to act upon. Non-random mating, such as inbreeding or assortative mating, alters genotype frequencies without changing allele frequencies, which is why HWE must be tested separately from whether p and q are stable.
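The last point, that non-random mating changes genotype frequencies but not allele frequencies, can be illustrated with Wright's inbreeding coefficient F. The standard parameterisation is AA = p² + Fpq, Aa = 2pq(1 - F) and aa = q² + Fpq, and F = 0 gives HWE. F is not used elsewhere on this page. The values below are illustrative.

```python
def genotype_freqs(p: float, F: float) -> tuple[float, float, float]:
    """Genotype frequencies with inbreeding coefficient F (F = 0 gives HWE)."""
    q = 1 - p
    AA = p * p + F * p * q
    Aa = 2 * p * q * (1 - F)
    aa = q * q + F * p * q
    return AA, Aa, aa


for F in (0.0, 0.25):
    AA, Aa, aa = genotype_freqs(0.6, F)
    p_back = AA + Aa / 2  # allele frequency recovered from genotypes
    print(f"F = {F}: AA {AA:.3f}, Aa {Aa:.3f}, aa {aa:.3f}, p {p_back:.3f}")
```

With F = 0.25, heterozygotes fall from 0.48 to 0.36 and both homozygote classes rise by 0.06, yet the recovered allele frequency stays at 0.6. This is why a stable p and q does not by itself imply HWE.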
Common Mistakes
Confusing allele frequency with genotype frequency. Allele frequency (p, q) is the proportion of a specific allele among all alleles at a locus. Genotype frequency (p², 2pq, q²) is the proportion of individuals with a specific genotype. These are related by the Hardy-Weinberg equations but are not the same thing. A common error is to assume that if an allele has frequency 0.3, then 30% of individuals are homozygous for it. Under HWE the homozygous frequency is in fact q² = 0.09, or just 9%.
Using phenotype mode without checking HWE first. Estimating allele frequencies from phenotype counts by taking q = square root (recessive proportion) assumes the population is in Hardy-Weinberg equilibrium. If the population has non-random mating or recent selection, this estimate can be substantially wrong. Whenever possible, genotype all individuals and use genotype count mode, which requires no HWE assumption.
Ignoring sample size. Allele frequency estimates from small samples are unreliable. A sample of 20 individuals gives you 40 allele observations. An observed frequency of 0.5 from such a sample carries an approximate 95% confidence interval of about 0.35 to 0.65. Population genetics studies typically require at least 100 to 200 individuals for stable allele frequency estimates, and large-scale genomics studies use thousands.
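The interval quoted above can be reproduced with the simple normal-approximation (Wald) interval for a proportion. This sketch treats the 2N alleles as independent draws, which is reasonable under HWE. Other interval methods, such as Wilson, give slightly different bounds. The function name is hypothetical.

```python
import math


def allele_freq_ci(allele_count: int, total_alleles: int, z: float = 1.96):
    """Point estimate and normal-approximation (Wald) 95% interval."""
    p = allele_count / total_alleles
    se = math.sqrt(p * (1 - p) / total_alleles)  # binomial standard error
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


for individuals in (20, 200, 2000):
    alleles = 2 * individuals
    est, lo, hi = allele_freq_ci(alleles // 2, alleles)  # observed p = 0.5
    print(f"N = {individuals}: p = {est:.2f}, 95% CI {lo:.3f} to {hi:.3f}")
```

The interval's half-width shrinks with the square root of the number of alleles. Going from 20 to 2,000 individuals narrows it from roughly ±0.155 to roughly ±0.016.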
Misinterpreting the chi-square result. A non-significant chi-square (below 3.841) does not prove that the population is in HWE. It means only that the data are consistent with HWE, and in small samples the test may lack the statistical power to detect modest deviations. Conversely, a significant chi-square in a very large sample may reflect a statistically detectable but biologically trivial deviation. Always consider sample size when interpreting chi-square results.
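To see why sample size matters, this sketch keeps the genotype proportions fixed and scales the sample up. The chi-square statistic grows in proportion to N even though the deviation from HWE is identical. The genotype counts are invented for illustration, and the helper function is hypothetical.

```python
def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Chi-square statistic for HWE from genotype counts (df = 1).

    Assumes both alleles are present, so no expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# The same genotype proportions (a mild heterozygote deficit: 44% observed
# vs 48% expected) at three sample sizes.
for scale in (1, 10, 100):
    counts = (38 * scale, 44 * scale, 18 * scale)
    chi2 = hwe_chi_square(*counts)
    verdict = "significant" if chi2 > 3.841 else "not significant"
    print(f"N = {sum(counts)}: chi-square = {chi2:.2f} ({verdict})")
```

The same 44% versus 48% heterozygote shortfall gives chi² = 0.69 at N = 100, 6.94 at N = 1,000 and 69.44 at N = 10,000. Only the first of these is below 3.841.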
Forgetting that p + q must equal exactly 1. After calculating p from genotype counts, calculate q as 1 - p rather than with a separate formula. Two practices can give values that do not sum to 1: computing and rounding p and q independently, or mixing methods (for example, taking p from genotype counts but q from the square root of the aa proportion). Either can make later checks, such as whether p² + 2pq + q² = 1, come out inconsistent.
Frequently Asked Questions
What is allele frequency?
Allele frequency is the proportion of a specific allele among all copies of a gene at a particular locus in a population. It is expressed as a decimal between 0 and 1, or as a percentage. For a two-allele system, the frequencies of the two alleles (p and q) must sum to 1.
What is the difference between allele frequency and genotype frequency?
Allele frequency refers to how common a specific allele is among all alleles in the population (p or q). Genotype frequency refers to how common a specific genotype is among all individuals (p², 2pq, or q²). Under Hardy-Weinberg equilibrium, genotype frequencies can be predicted from allele frequencies, but they are different measurements.
What does p + q = 1 mean?
For a gene locus with two alleles, p is the frequency of the dominant allele and q is the frequency of the recessive allele. Because every allele at that locus must be either A or a, the two frequencies must account for all alleles in the population, so they must sum to 1. This is a mathematical constraint, not an assumption.
What is Hardy-Weinberg equilibrium?
Hardy-Weinberg equilibrium (HWE) is a state in which allele and genotype frequencies in a population remain constant from generation to generation. It occurs when the population is large, mating is random, and there is no selection, mutation, or migration. Under HWE, genotype frequencies are: AA = p², Aa = 2pq, aa = q². Real populations deviate from HWE under evolutionary pressure.
How do I calculate allele frequency from genotype counts?
Count all alleles in the sample: each AA individual contributes 2 dominant alleles, each Aa individual contributes 1, and each aa contributes 0. Then p = (2 x AA + Aa) / (2 x N) where N is the total number of individuals. q = 1 - p. This method does not require the population to be in HWE.
What is carrier frequency and why does it matter?
Carrier frequency is the proportion of heterozygous individuals (Aa) in a population, equal to 2pq under HWE. Carriers carry one recessive disease allele but do not express the condition. Carrier frequency is important in medical genetics because it predicts how often two carriers will meet and have a risk of producing an affected child. For cystic fibrosis in Northern European populations, 2pq is approximately 1 in 25.
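The "two carriers meeting" risk can be sketched as follows. The sketch assumes HWE, random mating (so the two partners' carrier statuses are independent) and the Mendelian 1/4 chance that an Aa x Aa mating produces an aa child. The q = 0.02 figure is the cystic fibrosis value used above. The function name is hypothetical.

```python
def carrier_couple_risk(q: float) -> tuple[float, float, float]:
    """Carrier frequency, chance both partners are carriers, and risk per birth."""
    p = 1 - q
    carrier = 2 * p * q                 # HWE heterozygote frequency
    both_carriers = carrier ** 2        # partners assumed to pair at random
    affected_birth = both_carriers / 4  # Aa x Aa gives aa with probability 1/4
    return carrier, both_carriers, affected_birth


carrier, couple, birth = carrier_couple_risk(0.02)
print(f"carrier 1 in {1 / carrier:.1f}, carrier couple 1 in {1 / couple:.0f}, "
      f"affected birth to a carrier couple 1 in {1 / birth:.0f}")
```

The last figure equals p²q², roughly 1 in 2,600. This is slightly below q² (1 in 2,500) because q² also includes the rarer affected births to parents who are themselves aa.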
What causes deviation from Hardy-Weinberg equilibrium?
Five forces can cause deviation from HWE: natural selection (which favours some alleles over others), genetic drift (random frequency changes in small populations), gene flow (migration of alleles between populations), mutation (creation of new alleles), and non-random mating (such as inbreeding or assortative mating). A significant chi-square test result suggests that at least one of these forces may be acting, although genotyping error or hidden population structure can produce the same signal.
What does a chi-square test for HWE tell me?
The chi-square test compares observed genotype counts to the counts expected under HWE. A chi-square value above 3.841 (the critical value for one degree of freedom at p=0.05) indicates the observed genotype frequencies differ significantly from HWE expectations. This can signal selection, non-random mating, population structure, or genotyping error. It does not tell you which force is responsible.
Can I calculate allele frequency from phenotype data alone?
Yes, but only if you assume HWE. Under HWE, the proportion of individuals showing the recessive phenotype equals q², so you can estimate q as the square root of that proportion, then p = 1 - q. This method introduces error if the population is not in HWE, because the recessive phenotype frequency will then not equal q². For precision, genotyping all individuals and using genotype counts is always preferable.
What is the difference between allele frequency and minor allele frequency (MAF)?
Allele frequency is the proportion of a specific allele in the population. Minor allele frequency (MAF) is specifically the frequency of the less common allele at a locus, by convention always the smaller of p and q. In genomics databases such as gnomAD and dbSNP, variants are typically reported by their MAF rather than by which allele is dominant or recessive, because dominance depends on the phenotype being studied while MAF is a purely descriptive measure of rarity.
About the Author
S. Siddiqui is the founder and editor-in-chief of YourToolsBase, overseeing all content, tool accuracy, and editorial standards.
Authoritative Sources
Formulas and data in this tool are based on guidelines from the above sources.