What Causes Linkage Disequilibrium? (6 Forces)

Linkage disequilibrium (LD) is created by six main forces: physical linkage, genetic drift, natural selection, population admixture, bottlenecks and founder effects, and new mutation. Recombination constantly works to erase LD, so for LD to exist, one of these forces has to generate it faster than recombination can break it down. Several of them produce LD even between loci on different chromosomes, which is the clearest sign that LD is about more than physical proximity.
This guide walks through each cause and the signature it leaves. Understanding what generates LD is what lets researchers read it backward, using LD patterns to detect selection, reconstruct population history, and map disease genes. For the concept itself, our explainer on what linkage disequilibrium is sets it up, and you can experiment with how allele combinations produce LD by entering haplotype frequencies into an LD calculator.
Why LD Needs a Cause
LD is not the default state. Recombination pushes every pair of loci toward linkage equilibrium each generation, so without something actively generating it, LD decays away. This is the key framing: LD always reflects a force strong enough to outpace recombination.
That force can be physical, keeping alleles together because recombination rarely separates them, or it can be population-level, correlating alleles through drift, selection, mixing, or history. The physical causes produce LD only between nearby loci. The population-level causes can produce LD between any loci, near or far, linked or unlinked. Telling these apart is how LD becomes informative: LD between unlinked loci cannot come from linkage, so it points to one of the other forces.
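The pull of recombination toward equilibrium can be made concrete with the standard textbook decay relation, D_t = D_0 (1 - c)^t, where c is the recombination fraction between the two loci and t is the number of generations. The sketch below assumes a large, randomly mating population with no drift, selection, or new mutation. The function name `ld_after` is illustrative and does not come from any particular library.

```python
def ld_after(d0: float, c: float, generations: int) -> float:
    """Return D after `generations` of random mating.

    Assumes D_t = D_0 * (1 - c)**t, i.e. recombination is the only
    force acting (no drift, selection, admixture, or mutation).
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction c must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - c) ** generations


# Compare unlinked loci (c = 0.5) with tightly linked loci (c = 0.001).
for label, c in [("unlinked", 0.5), ("tightly linked", 0.001)]:
    share = ld_after(1.0, c, 10)
    print(f"{label}: share of initial D left after 10 generations = {share:.4f}")
```

Between unlinked loci, D halves every generation, so LD there disappears within a handful of generations unless a population-level force keeps regenerating it. Between tightly linked loci, almost all of it is still present after the same interval.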
Physical Linkage
The most familiar cause is physical linkage: loci sitting close together on the same chromosome. This is where the name comes from, even though it is only one source among several.
When two loci are close, recombination between them is rare, so the alleles they carry stay associated for many generations. The closer the loci, the stronger and more persistent the LD. This is the LD that disease mapping exploits, because a marker in strong LD with a nearby variant is physically close to it. Physical linkage produces the orderly, distance-dependent LD that fades as loci move apart, the pattern behind haplotype blocks. It is the only one of the six causes that requires the loci to be on the same chromosome.
Genetic Drift
Genetic drift generates LD by chance, and it is the dominant source in small populations. This connects LD directly to population size.
In a finite population, the haplotypes that get passed to the next generation are a random sample of those in the current one. By chance, some allele combinations are over-represented and others lost, which creates associations between loci, even unlinked ones. This sampling effect, described by William Hill and Alan Robertson in a 1968 paper on linkage disequilibrium in finite populations, produces LD at a rate inversely proportional to the effective population size: the smaller the population, the more drift-generated LD. In large populations, drift is weak and recombination keeps LD low; in small ones, drift constantly regenerates LD faster than recombination can clear it.
This is why small populations show LD between distant and even unlinked loci, and why the amount of LD in a population is used to estimate its effective size. The drift-LD connection runs both ways: drift creates LD, and measured LD reveals the population size that drift implies. This bridge to population size is developed in our guide on effective population size, and the broader behavior of drift is covered in our explainer on what genetic drift is.
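The two-way link between drift and LD can be sketched with a widely used approximation that this guide does not itself cite: Sved's (1971) result that the expected r squared between two neutral loci is roughly 1 / (1 + 4 Ne c), where Ne is the effective population size and c the recombination fraction. The code is a minimal illustration under that assumption. It ignores the sample-size correction that practical estimators apply, and both function names are hypothetical.

```python
def expected_r2(ne: float, c: float) -> float:
    """Approximate expected r^2 from drift alone: 1 / (1 + 4 * Ne * c).

    Assumes Sved's approximation for a neutral, randomly mating
    population at drift-recombination balance.
    """
    if ne <= 0 or not 0.0 <= c <= 0.5:
        raise ValueError("need ne > 0 and 0 <= c <= 0.5")
    return 1.0 / (1.0 + 4.0 * ne * c)


def ne_from_r2(r2: float, c: float) -> float:
    """Invert the same approximation to get Ne from an observed r^2.

    No correction for finite sample size is applied here.
    """
    if not 0.0 < r2 <= 1.0 or not 0.0 < c <= 0.5:
        raise ValueError("need 0 < r2 <= 1 and 0 < c <= 0.5")
    return (1.0 / r2 - 1.0) / (4.0 * c)


# Smaller populations carry more drift-generated LD at the same distance.
for ne in (50, 500, 5000):
    print(ne, round(expected_r2(ne, c=0.01), 4))
```

Read forward, the first function shows LD rising as the population shrinks. Read backward, the second turns a measured r squared into a rough estimate of effective size.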

Natural Selection
Selection generates LD when it favors or disfavors particular combinations of alleles, and it can do so even between unlinked loci. Three mechanisms matter: hitchhiking, epistatic selection, and Hill-Robertson interference.
The first is genetic hitchhiking, a process sometimes called genetic draft. When a beneficial mutation rises in frequency under positive selection, the alleles physically near it on the same chromosome rise with it, because they are dragged along before recombination can separate them. This sweep creates a long stretch of strong LD around the selected site. Scanning for these unusually long, high-LD haplotypes is a standard way to detect recent selection in a genome.
The second is epistatic selection, where particular combinations of alleles at different loci work better together than apart. As Michael Bulmer and others showed, selection of this kind can maintain LD indefinitely, even between loci on different chromosomes, because it keeps rebuilding the favored combinations each generation as recombination breaks them up. Directional and stabilizing selection tend to generate negative LD between loci, while disruptive selection generates positive LD, and epistatic selection generates LD whose sign follows the sign of the epistasis. Selection is therefore one of the few forces that can sustain LD permanently rather than just transiently.
A third, subtler selective process is Hill-Robertson interference, which arises from the interaction of selection and drift in finite populations. When two beneficial mutations arise at linked loci on different copies of a chromosome, that is, on different haplotypes, neither can fix without the other being lost unless recombination brings them together onto one haplotype. Selection at one locus therefore interferes with selection at the other. This builds negative LD among selected loci and slows the overall response to selection. The effect is strongest where recombination is low, which is part of why regions of suppressed recombination, like the interiors of inversions, accumulate distinctive LD. It also creates an evolutionary pressure favoring recombination itself, since recombination relieves the interference. This is one of the leading explanations for why sex and recombination are so widespread.
Population Admixture
When two genetically different populations mix, the result is strong LD, even between unlinked loci. Admixture is one of the most powerful LD-generating forces.
The reason is straightforward. If two source populations have different allele frequencies at two loci, then in the generation right after mixing, the alleles that were common together in one source still tend to appear together, producing an association. Suppose one source population is mostly A at the first locus and the other mostly a, and the second locus shows the same split between B and b. Mixing them yields strong LD between the two loci, regardless of whether they are linked. Jonathan Pritchard and Noah Rosenberg characterized this admixture LD in a 1999 paper on detecting population stratification, showing that it could be used to detect hidden population structure.
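The mixing argument can be checked with a few lines of arithmetic. The sketch below assumes each source population is itself in linkage equilibrium, so any LD in the mixture comes purely from the mixing. It computes D directly from the pooled haplotype frequencies; under that assumption the result equals m(1 - m)(pA1 - pA2)(pB1 - pB2), where m is the share contributed by the first source. The function and parameter names are illustrative.

```python
def admixture_d(m: float, p_a1: float, p_b1: float,
                p_a2: float, p_b2: float) -> float:
    """D between loci A and B immediately after mixing two sources.

    m      : proportion of the mixture drawn from source 1
    p_a1.. : frequencies of alleles A and B in sources 1 and 2
    Assumes each source is in linkage equilibrium (D = 0 within it).
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("mixing proportion m must lie in [0, 1]")
    # Frequency of the AB haplotype in the pooled population.
    p_ab = m * p_a1 * p_b1 + (1 - m) * p_a2 * p_b2
    # Pooled allele frequencies.
    p_a = m * p_a1 + (1 - m) * p_a2
    p_b = m * p_b1 + (1 - m) * p_b2
    return p_ab - p_a * p_b


# Equal mixing of two sources that differ sharply at both loci.
d = admixture_d(0.5, p_a1=0.9, p_b1=0.9, p_a2=0.1, p_b2=0.1)
print(round(d, 4))
```

Nothing in the calculation involves the recombination fraction, which is why the initial association arises whether or not the loci are linked. It also vanishes if the sources have the same frequency at either locus.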
Admixture LD has a distinctive property: it appears across the whole genome, not just between nearby loci, because mixing correlates allele frequencies everywhere at once. It then decays over generations as recombination breaks down the cross-population associations, fastest for unlinked loci and slowest for tightly linked ones. This decay is exploited in admixture mapping, which uses the LD created by recent mixing to locate disease genes that differ in frequency between the source populations.
The same allele-frequency differences that create admixture LD are what population-differentiation measures quantify, so admixture LD and between-population differentiation are two views of the same underlying divergence, a link our guide on genetic differentiation and gene flow develops. Population structure, the milder cousin of admixture, generates LD the same way: a sample that unknowingly pools individuals from differentiated subpopulations shows LD reflecting the mixture, not any real association within either group. This is the notorious source of false positives in association studies, where a marker appears linked to a trait only because both track ancestry, which is why modern studies correct for structure before trusting any signal.

Bottlenecks and Founder Effects
A sharp reduction in population size (a bottleneck) or the founding of a population by a few individuals (a founder effect) generates substantial LD. Both are really intense episodes of drift.
When a population crashes or is founded by a handful of individuals, only a small sample of haplotypes survives, so the haplotype frequencies in the survivors are far from the original population's. This creates strong, genome-wide LD that persists for many generations as the population recovers. Recurrent bottlenecks are especially effective at building LD. The classic human example is the out-of-Africa bottleneck, which left non-African populations with more extensive LD and larger haplotype blocks than African populations. Domestication shows the same pattern: domesticated crops and livestock, founded from small wild samples and bred intensively, carry far more LD than their wild ancestors, as seen when domesticated soybean is compared to its wild relative.
This makes bottleneck-driven LD a fingerprint of demographic history. The amount and extent of LD a population carries records the size reductions in its past, which is why LD is a standard tool for reconstructing bottlenecks, founder events, and expansions. The connection between dramatic size changes and their genetic aftermath is explored in our guide on population bottlenecks.
Mutation
Every new mutation begins life in complete linkage disequilibrium with the alleles around it. Mutation is the ultimate origin of all LD, even though its effect on any one variant fades over time.
When a mutation arises, it occurs on one specific chromosome carrying one specific combination of alleles at neighboring loci. At that instant, the new allele is found only with those particular neighbors, so it is in perfect LD with all of them. Over generations, recombination gradually shuffles the new allele onto other backgrounds, eroding this initial association, fastest with distant loci and slowest with close ones. The LD around a young mutation is therefore strong and extends far; the LD around an old mutation is weak and confined to its immediate neighbors.
This time dependence is useful. The extent of LD around an allele indicates its age: a common allele still sitting in a long block of strong LD is likely young and possibly recently selected, because there has not been time for recombination to break up its haplotype. This logic underlies several tests for recent positive selection, which look for young, high-frequency alleles still embedded in long, undisrupted haplotypes.
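The age logic can be turned into a deliberately crude calculation. If a fraction x of the chromosomes carrying the allele still carry the original allele at a marker a recombination fraction c away, and recombination is assumed to be the only thing that has broken the association, then x is roughly (1 - c)^t and the age is about t = ln(x) / ln(1 - c) generations. The sketch below is illustrative only. It ignores the marker allele's background frequency, drift, and selection, and the function name is hypothetical.

```python
import math


def allele_age(intact_fraction: float, c: float) -> float:
    """Rough allele age in generations from surviving haplotype LD.

    intact_fraction : share of carrier chromosomes that still hold the
                      original marker allele
    c               : recombination fraction between allele and marker
    Assumes intact_fraction = (1 - c)**t with no other forces acting.
    """
    if not 0.0 < intact_fraction <= 1.0:
        raise ValueError("intact_fraction must lie in (0, 1]")
    if not 0.0 < c <= 0.5:
        raise ValueError("c must lie in (0, 0.5]")
    return math.log(intact_fraction) / math.log(1.0 - c)


# 80% of carriers still share the original marker 1 cM away (c = 0.01).
print(round(allele_age(0.8, 0.01), 1))
```

An estimate like this is best read as an order of magnitude, since real dating methods model many markers and the population's demography together.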
Mutation also shapes how the two standard LD measures, D prime and r squared, behave. Because a brand-new variant sits on exactly one haplotype background, it starts in complete LD by D prime. Being rare, however, it shows low r squared with the common variants around it. This is the divergence between the two measures for rare alleles in its purest form. Recurrent mutation, where the same change happens independently on different backgrounds, slightly erodes D prime below one even without recombination, which is one reason a D prime of exactly one is rarer for highly mutable sites. These wrinkles matter mainly at the finest scale, but they explain why mutation is both the source of all LD and a quiet ongoing influence on its measured value.
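The contrast between the two measures for a new mutation is easy to verify numerically. The sketch below uses the usual definitions: D = p_AB - p_A p_B, D prime as |D| divided by its maximum possible value given the allele frequencies, and r squared as D^2 / (p_A (1 - p_A) p_B (1 - p_B)). The example population of 1,000 chromosomes and the function name are assumptions chosen for illustration.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, |D'|, r^2) from haplotype and allele frequencies.

    p_ab : frequency of the haplotype carrying both A and B
    p_a  : frequency of allele A;  p_b : frequency of allele B
    """
    d = p_ab - p_a * p_b
    # Largest |D| that these allele frequencies allow, given the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# One new mutant copy among 1,000 chromosomes, arising on a background
# that carries allele B, where B has frequency 0.5.
d, d_prime, r2 = ld_measures(p_ab=0.001, p_a=0.001, p_b=0.5)
print(f"D' = {d_prime:.3f}, r^2 = {r2:.4f}")
```

Here D prime comes out at its maximum while r squared is close to zero. The new allele is never seen without B, yet knowing whether a chromosome carries B says almost nothing about whether it carries the rare allele.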
The Six Causes at a Glance
The forces differ in whether they create LD between unlinked loci and how long the LD lasts. The table summarizes them.
| Cause | LD between unlinked loci? | Persistence |
|---|---|---|
| Physical linkage | No, nearby loci only | Long for close loci |
| Genetic drift | Yes | Ongoing in small populations |
| Selection (hitchhiking) | Mainly nearby | Transient, until recombination clears it |
| Selection (epistatic) | Yes | Can be indefinite |
| Admixture | Yes, genome-wide | Decays over generations after mixing |
| Bottleneck / founder effect | Yes, genome-wide | Many generations |
| Mutation | Nearby loci | Decays as the allele ages |
The pattern worth remembering is the unlinked-loci column. Physical linkage and hitchhiking act locally, on nearby loci. Drift, epistatic selection, admixture, and bottlenecks all generate LD genome-wide, between loci anywhere. So LD between loci on different chromosomes is a signal of one of those population-level forces, never of linkage, which is precisely why it is so useful for detecting structure and history.
Frequently Asked Questions
Can linkage disequilibrium occur between unlinked loci?
Yes. While physical linkage produces LD only between nearby loci, several other forces (genetic drift, epistatic selection, population admixture, and bottlenecks) generate LD between loci anywhere in the genome, including on different chromosomes. LD between unlinked loci is in fact a key signal of population structure, admixture, or recent demographic events, because it cannot come from linkage.
What is the main cause of linkage disequilibrium in small populations?
Genetic drift. In a small population, the haplotypes passed to the next generation are a random sample of the current ones, and chance over-representation of some combinations creates LD, even between unlinked loci. This drift-generated LD increases as effective population size decreases, which is why small populations show extensive LD and why LD is used to estimate population size.
Reading LD Backward
Linkage disequilibrium exists because some force generates it faster than recombination erases it, and there are six main culprits: physical linkage, genetic drift, selection, admixture, bottlenecks, and mutation. Physical linkage and hitchhiking act locally on nearby loci, while drift, epistatic selection, admixture, and bottlenecks generate LD across the whole genome, even between loci on different chromosomes.
That distinction is what makes LD a powerful tool rather than a curiosity. Because each force leaves a characteristic signature, researchers read LD patterns backward to detect natural selection, reveal hidden population structure, reconstruct bottlenecks and migrations, and map disease genes. The same associations that physical linkage creates locally, the population-level forces create globally, and telling which force is responsible turns a pattern of correlations into a story about a population's genetics and past. To see how these patterns get put to work in human genetics, our guide on linkage disequilibrium in GWAS covers the applications.