Glossary

Frequency Analysis

letter frequency attack

A cryptanalytic method that compares symbol and letter-group frequencies with language patterns to infer plaintext in classical ciphers.

Definition

Frequency analysis examines how often letters, symbols, words, or groups such as bigrams and trigrams occur in a text. Natural languages have recurring statistical patterns: in English, for example, E, T, and A are usually among the most common letters.
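As a rough illustration of the counting described above, the following Python sketch tallies single letters and overlapping n-grams. The function names and the sample sentence are illustrative assumptions, not part of any standard tool, and the sketch only handles the unaccented letters A to Z.

```python
import re
from collections import Counter

def letter_counts(text):
    """Count the letters A-Z, ignoring case and every other character."""
    return Counter(re.findall(r"[A-Z]", text.upper()))

def ngram_counts(text, n):
    """Count overlapping n-grams inside each run of letters (no word-crossing)."""
    counts = Counter()
    for word in re.findall(r"[A-Z]+", text.upper()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts

# Illustrative sample only; real analysis needs far more text.
sample = "The weather there is better than the weather here."
print(letter_counts(sample).most_common(3))
print(ngram_counts(sample, 2).most_common(3))
```

Counting n-grams within words rather than across spaces is a design choice; for ciphertext with word breaks removed, the whole text would be treated as one run.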

Many classical ciphers change the symbols but preserve some of those patterns. A cryptanalyst compares the observed ciphertext distribution with a reference profile to propose likely plaintext letters and combinations. The result is evidence for candidate mappings, not an automatic decryption.

How it works

First normalize the text and count individual symbols. Then test several mappings between frequent ciphertext symbols and common letters in the suspected language. Bigrams, trigrams, repeated words, and word shapes help confirm or reject each hypothesis.
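A first set of hypotheses can be generated mechanically by pairing ciphertext symbols with reference letters in rank order. The sketch below assumes an approximate, commonly cited frequency ordering for English (sources differ in the lower ranks), and the name rank_match is hypothetical. Its output is only a starting point: on real text several of the pairings are usually wrong and must be revised using bigrams and word shapes.

```python
from collections import Counter

# Assumption: an approximate frequency ordering of English letters.
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def rank_match(ciphertext, reference_order=ENGLISH_ORDER):
    """Pair ciphertext letters with reference letters by frequency rank."""
    counts = Counter(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    # Most frequent first; ties are broken alphabetically for repeatability.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return dict(zip(ranked, reference_order))

# Toy input: X is most frequent, then Y, then Z.
print(rank_match("XXXYYZ"))
```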

A Caesar cipher shifts the entire frequency profile by the same amount, while a monoalphabetic substitution permutes it. With a repeating-key polyalphabetic cipher such as Vigenère, analysts first estimate the key period, often with the Index of Coincidence or Kasiski examination, and then analyze each ciphertext column separately, because every letter in a column was enciphered with the same key letter.
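Both ideas can be sketched in a few lines of Python. The first part tries all 26 Caesar shifts and keeps the one whose letter profile is closest to English by a chi-squared score; the second computes the Index of Coincidence, overall and averaged over the columns of a guessed period. The frequency table holds approximate, commonly quoted percentages, and all function names are assumptions made for this illustration.

```python
from collections import Counter

# Assumption: approximate English letter frequencies in percent.
ENGLISH_FREQ = dict(zip(
    "ETAOINSHRDLCUMWFGYPBVKJXQZ",
    [12.7, 9.1, 8.2, 7.5, 7.0, 6.7, 6.3, 6.1, 6.0, 4.3, 4.0, 2.8, 2.8,
     2.4, 2.4, 2.2, 2.0, 2.0, 1.9, 1.5, 1.0, 0.8, 0.15, 0.15, 0.10, 0.07]))

def caesar_shift(text, shift):
    """Shift ASCII letters by `shift` positions, leaving other characters alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def chi_squared(text):
    """Distance between the text's letter counts and the English profile."""
    counts = Counter(ch for ch in text.upper() if ch in ENGLISH_FREQ)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    total = sum(ENGLISH_FREQ.values())
    return sum((counts[ch] - n * f / total) ** 2 / (n * f / total)
               for ch, f in ENGLISH_FREQ.items())

def best_caesar_shift(ciphertext):
    """Return the encryption shift whose reversal looks most like English."""
    return min(range(26), key=lambda s: chi_squared(caesar_shift(ciphertext, -s)))

def index_of_coincidence(text):
    """Probability that two letters drawn without replacement are equal."""
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    n = sum(counts.values())
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def average_column_ioc(text, period):
    """Average Index of Coincidence over the columns of a guessed period."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    columns = ["".join(letters[i::period]) for i in range(period)]
    return sum(index_of_coincidence(col) for col in columns) / period
```

English text typically has an Index of Coincidence near 0.067, while uniformly random letters give about 0.038 (1/26). A guessed period whose columns average close to the English value is a plausible key length, and each column can then be passed to a routine such as best_caesar_shift.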

Practical example

Suppose one symbol dominates a sufficiently long monoalphabetic ciphertext. It may represent E, but it could also stand for T, A, a space, or a letter favored by the text’s topic. The analyst tests several candidates and looks for supporting patterns: common pairs such as TH and HE, plausible repeated words, and readable fragments. A mapping becomes convincing only when it explains many observations at once.
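Testing a candidate is easier when the partial decryption is visible. The sketch below applies an incomplete mapping and prints a placeholder for every letter that has not been assigned yet, so readable fragments stand out. The function name, the placeholder character, and the toy ciphertext are assumptions for illustration.

```python
def apply_partial_mapping(ciphertext, mapping, unknown="."):
    """Replace mapped letters with lowercase guesses; mark the rest as unknown."""
    out = []
    for ch in ciphertext.upper():
        if "A" <= ch <= "Z":
            out.append(mapping.get(ch, unknown).lower())
        else:
            out.append(ch)  # keep spaces and punctuation as word-shape clues
    return "".join(out)

# Hypothesis under test: W=T, K=H, H=E.
guess = {"W": "T", "K": "H", "H": "E"}
print(apply_partial_mapping("WKH FDW", guess))  # the ..t
```

If the fragments that emerge are implausible, the analyst revises one assignment and looks again.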

Limitations

Frequency analysis is unreliable on very short texts and can be distorted by names, specialized vocabulary, spelling conventions, or an unusual genre. Homophonic substitution, polyalphabetic systems, compression, and deliberate padding can weaken the visible language profile.

Modern encryption is designed to hide such regularities. A correctly used one-time pad goes further: its independent random key makes the ciphertext statistically independent of the plaintext. A particular ciphertext need not look perfectly flat; the crucial point is that its frequencies reveal no information about the original message.
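The one-time pad property can be illustrated with a byte-wise XOR pad. In this sketch, which uses Python's standard secrets module for the random key and two made-up messages of equal length, a key exists that "decrypts" the same ciphertext to a completely different plaintext. This is why the ciphertext alone cannot favor one message over another. The helper name xor_bytes is an assumption.

```python
import secrets

def xor_bytes(a, b):
    """XOR two byte strings of equal length."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

message = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(message))   # random, as long as the message
ciphertext = xor_bytes(message, key)

# For any other plaintext of the same length, some key maps to it too.
decoy = b"RETREAT AT SIX"
decoy_key = xor_bytes(ciphertext, decoy)

print(xor_bytes(ciphertext, key))        # recovers the real message
print(xor_bytes(ciphertext, decoy_key))  # yields the decoy instead
```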

Frequently asked questions

Frequency analysis is most useful against Caesar, affine, and other monoalphabetic substitution ciphers. It can also assist attacks on repeating-key Vigenère when combined with key-period detection and enough ciphertext.

There is no universal minimum ciphertext length for frequency analysis. Longer samples produce more stable letter and n-gram distributions, while short messages may support several equally plausible mappings. Text type and cipher design matter as much as raw length.

When a one-time pad’s key is truly random, at least as long as the message, used only once, and kept secret, every possible plaintext of the same length is compatible with the ciphertext. The ciphertext’s statistics therefore reveal nothing about which plaintext was sent.

Frequency analysis does not break modern encryption such as AES when it is implemented and used correctly. These algorithms diffuse plaintext patterns throughout the ciphertext; attacks instead target keys, protocols, implementations, or misuse.

See also