Glossary

Index of coincidence

Also known as: IC, coincidence index

The Index of Coincidence measures the probability that two letters selected at random from a text are identical.

Definition and formula

The Index of Coincidence (IC) is the probability that two letters drawn at random, without replacement, from a text are the same. For letter counts nᵢ (one count per letter of the alphabet) and total letter count N, IC = Σᵢ nᵢ(nᵢ − 1) / (N(N − 1)). Spaces and punctuation are normally excluded, and whichever convention is chosen should be applied consistently.
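The formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `index_of_coincidence` is hypothetical, and it assumes a 26-letter ASCII alphabet with case folded and all other characters excluded.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Return sum(n_i * (n_i - 1)) / (N * (N - 1)) over the ASCII letters in text."""
    # Assumption: only A-Z count, case is ignored, everything else is excluded.
    letters = [c.upper() for c in text if c.isascii() and c.isalpha()]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters to draw a pair")
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))
```

For example, "AABB" has counts 2 and 2 with N = 4, so the value is (2·1 + 2·1) / (4·3) = 4/12 ≈ 0.333. A text with no repeated letter gives 0, and a text made of one repeated letter gives 1.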

How to interpret it

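Natural languages have uneven letter frequencies, so their IC is generally higher than that of uniformly random text, which for a 26-letter alphabet is 1/26 ≈ 0.038. English text typically scores roughly 0.067. Monoalphabetic substitution only relabels the letters, so the set of frequency counts, and therefore the IC, is unchanged. Polyalphabetic encryption mixes several shifted distributions and often lowers the overall value toward the uniform baseline.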
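The following sketch illustrates the two claims on a small sample. The helper names (`ic`, `substitute`, `vigenere`), the sample sentence, and the key are all assumptions made for illustration. A sample this short is noisy, so the printed numbers indicate the direction of the effect rather than typical values.

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_uppercase


def ic(text):
    """Index of coincidence over the letters A-Z in text."""
    letters = [c for c in text.upper() if c in ALPHABET]
    n = len(letters)
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def substitute(text, seed=0):
    """Monoalphabetic substitution with a shuffled alphabet (illustrative helper)."""
    shuffled = list(ALPHABET)
    random.Random(seed).shuffle(shuffled)
    table = str.maketrans(ALPHABET, "".join(shuffled))
    return text.upper().translate(table)


def vigenere(text, key):
    """Vigenère encryption of the letters in text; non-letters are dropped."""
    letters = [c for c in text.upper() if c in ALPHABET]
    shifts = [ALPHABET.index(k) for k in key.upper()]
    return "".join(
        ALPHABET[(ALPHABET.index(c) + shifts[i % len(shifts)]) % 26]
        for i, c in enumerate(letters)
    )


sample = (
    "The index of coincidence measures how unevenly "
    "the letters of a text are distributed"
)
print("plaintext     ", ic(sample))
print("substitution  ", ic(substitute(sample)))         # same counts, relabelled
print("vigenere      ", ic(vigenere(sample, "LEMON")))   # several alphabets mixed
print("uniform 1/26  ", 1 / 26)
```

The substitution result matches the plaintext value exactly because the letter counts are only reassigned to different letters. The Vigenère value depends on the key and the sample.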

Estimating a key period

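To test a candidate Vigenère period, split the ciphertext into that many columns by taking every letter at that interval, so that each column was encrypted with a single key letter. Compute the IC of each column and average the results. If the average is close to the value expected for the plaintext language, each column resembles one shifted language distribution, which supports the candidate. Expected values depend on language, alphabet, normalization, and text length, so IC is evidence rather than proof.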
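The column test described above can be sketched as follows. The function names and the `max_period` parameter are hypothetical, and the sketch assumes a 26-letter ASCII alphabet with non-letters dropped before the columns are formed.

```python
from collections import Counter


def ic(letters):
    """Index of coincidence of a sequence of letters (0.0 if fewer than two)."""
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def average_column_ic(ciphertext, period):
    """Average IC of the columns formed by taking every period-th letter."""
    letters = [c for c in ciphertext.upper() if "A" <= c <= "Z"]
    columns = [letters[i::period] for i in range(period)]
    return sum(ic(col) for col in columns) / period


def rank_periods(ciphertext, max_period=12):
    """Return (period, average IC) pairs, highest average first."""
    scores = {p: average_column_ic(ciphertext, p) for p in range(1, max_period + 1)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

On the toy input "ABCABCABCABC", period 3 yields three columns of identical letters and an average of 1.0, while periods 1 and 2 score far lower. Period 6 also scores 1.0, which shows why a multiple of the true period can look equally convincing and why the ranking should be treated as a list of candidates.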

Frequently asked questions

A high IC indicates an uneven letter distribution with many repeated letters. That is often compatible with natural language or monoalphabetic substitution, but it does not identify the cipher by itself.

The column-averaged IC suggests likely periods. Multiples, divisors, and statistical noise can produce competing peaks, so other tests should confirm the result.

The IC becomes noisy on short samples because a few letter counts dominate the estimate. Longer ciphertext provides more stable evidence.

See also