Frequency Analysis

Analyze letter, character, word, bigram, and trigram frequencies in any text. Compare distributions against language profiles and use frequency analysis for classical cipher cryptanalysis.

✓ Client-side processing only ✓ Unicode and multilingual text support ✓ Real-time analysis as you type
Examples
English text analysis
Input: The quick brown fox jumps over the lazy dog

This pangram contains every letter of the English alphabet at least once.

Caesar ciphertext
Input: KHOOR ZRUOG

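HELLO WORLD encrypted with a Caesar cipher (shift 3). O and R dominate the ciphertext because they stand for L and O, the most frequent letters in the plaintext. The full mapping is H→K, E→H, L→O, O→R, W→Z, R→U, D→G.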

Hamlet quote
Input: To be or not to be that is the question

A famous English sentence for testing natural language letter distribution.

Repeated letter pattern
Input: ATTACK AT DAWN ATTACK AT DUSK

A short phrase with repeated words and letter patterns. Useful for testing word frequency, bigrams, trigrams, and repeated-symbol analysis.

How frequency analysis works

Frequency analysis measures how often letters, symbols, words, or character groups appear in a text. Natural languages follow predictable statistical patterns, which means some letters occur far more frequently than others. In English, for example, E, T, A, O, I, N, S, H, and R are among the most common letters.

This tool calculates frequencies for letters, words, bigrams, and trigrams, allowing you to compare an unknown text against expected language distributions. Large deviations from normal language patterns often reveal encryption, encoding, or unusual text structures.
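The letter count described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function name `letter_frequencies` is hypothetical, and it assumes that a "letter" is any character accepted by `str.isalpha()` and that case is ignored.

```python
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share of all letters in `text`, in percent."""
    # Assumption: case-insensitive, and only alphabetic characters count.
    letters = [ch.upper() for ch in text if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # Most frequent first; ties are broken alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {letter: 100 * n / total for letter, n in ordered}


freqs = letter_frequencies("ATTACK AT DAWN ATTACK AT DUSK")
for letter, pct in list(freqs.items())[:3]:
    print(f"{letter} {pct:.1f}%")
# A 29.2%
# T 25.0%
# K 12.5%
```

The sample phrase has 24 letters, 7 of which are A, so A accounts for about 29% of the text.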

The results can be sorted and compared against language profiles to help identify the probable language of a text and detect statistical anomalies.

Using frequency analysis to break ciphers

Frequency analysis is one of the oldest techniques in cryptanalysis. Simple substitution ciphers preserve the statistical structure of a language, meaning the most common ciphertext symbols usually correspond to the most common plaintext letters.

To analyze a ciphertext, compare the observed frequencies with the expected frequencies of the suspected language. High-frequency symbols, common bigrams, and common trigrams can provide valuable clues when reconstructing the original message.
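One common way to make this comparison concrete is a chi-squared statistic: the lower the score, the closer the text is to the expected profile. The sketch below is an assumption-laden illustration rather than the tool's own method. `ENGLISH_PERCENT` holds commonly cited approximate English letter frequencies, and the function name `chi_squared` is hypothetical.

```python
from collections import Counter

# Commonly cited approximate English letter frequencies, in percent.
# Assumption: these are illustrative values, not the tool's own profile.
ENGLISH_PERCENT = {
    "A": 8.167, "B": 1.492, "C": 2.782, "D": 4.253, "E": 12.702,
    "F": 2.228, "G": 2.015, "H": 6.094, "I": 6.966, "J": 0.153,
    "K": 0.772, "L": 4.025, "M": 2.406, "N": 6.749, "O": 7.507,
    "P": 1.929, "Q": 0.095, "R": 5.987, "S": 6.327, "T": 9.056,
    "U": 2.758, "V": 0.978, "W": 2.360, "X": 0.150, "Y": 1.974,
    "Z": 0.074,
}


def chi_squared(text: str, profile: dict[str, float] = ENGLISH_PERCENT) -> float:
    """Score how far the letter counts in `text` are from `profile`."""
    letters = [ch for ch in text.upper() if ch in profile]
    total = len(letters)
    if total == 0:
        raise ValueError("text contains no letters from the profile")
    counts = Counter(letters)
    score = 0.0
    for letter, percent in profile.items():
        expected = total * percent / 100  # expected count for this length
        score += (counts[letter] - expected) ** 2 / expected
    return score


plain = "To be or not to be that is the question"
shifted = "WR EH RU QRW WR EH WKDW LV WKH TXHVWLRQ"  # same text, Caesar shift 3
print(chi_squared(plain) < chi_squared(shifted))  # True
```

Because the statistic grows with text length, scores are only comparable between texts of the same length, as in the example.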

For a Caesar cipher, the frequency peaks often reveal the shift directly: if the most common ciphertext letter stands for E, the distance between the two gives the key. For Vigenère and other polyalphabetic ciphers, frequency analysis is commonly combined with the Index of Coincidence and the Kasiski examination to estimate the key length before attempting decryption.
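For the Caesar case, a minimal sketch is to try all 26 shifts and keep the one whose decryption looks most like English. The version below scores each candidate with a chi-squared statistic against commonly cited approximate English frequencies. The names `shift_text`, `chi_squared`, and `crack_caesar` are hypothetical, and the frequency table is an assumption rather than the tool's own profile.

```python
from collections import Counter

# Commonly cited approximate English letter frequencies, in percent (assumed).
ENGLISH_PERCENT = {
    "A": 8.167, "B": 1.492, "C": 2.782, "D": 4.253, "E": 12.702,
    "F": 2.228, "G": 2.015, "H": 6.094, "I": 6.966, "J": 0.153,
    "K": 0.772, "L": 4.025, "M": 2.406, "N": 6.749, "O": 7.507,
    "P": 1.929, "Q": 0.095, "R": 5.987, "S": 6.327, "T": 9.056,
    "U": 2.758, "V": 0.978, "W": 2.360, "X": 0.150, "Y": 1.974,
    "Z": 0.074,
}


def shift_text(text: str, shift: int) -> str:
    """Shift A-Z and a-z by `shift` positions; leave other characters alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - 65 + shift) % 26 + 65))
        elif "a" <= ch <= "z":
            out.append(chr((ord(ch) - 97 + shift) % 26 + 97))
        else:
            out.append(ch)
    return "".join(out)


def chi_squared(text: str) -> float:
    """Distance between the letter counts in `text` and the English profile."""
    letters = [ch for ch in text.upper() if ch in ENGLISH_PERCENT]
    total = len(letters)
    if total == 0:
        raise ValueError("text contains no A-Z letters")
    counts = Counter(letters)
    return sum(
        (counts[letter] - total * pct / 100) ** 2 / (total * pct / 100)
        for letter, pct in ENGLISH_PERCENT.items()
    )


def crack_caesar(ciphertext: str) -> tuple[int, str]:
    """Return the most English-looking (shift, plaintext) pair."""
    best = min(range(26), key=lambda s: chi_squared(shift_text(ciphertext, -s)))
    return best, shift_text(ciphertext, -best)


print(crack_caesar("WR EH RU QRW WR EH WKDW LV WKH TXHVWLRQ"))
# (3, 'TO BE OR NOT TO BE THAT IS THE QUESTION')
```

Very short ciphertexts may not produce a clear winner, so it is worth inspecting the few best-scoring shifts rather than trusting the top one blindly.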

Understanding language frequency profiles

Every language has a characteristic statistical fingerprint. In English, just six letters account for about half of all written text (roughly 51% by the figures below), and this distribution stays fairly stable across topics, authors, and time periods once the sample is large enough.

Letter  Frequency
E       12.7%
T       9.1%
A       8.2%
O       7.5%
I       7.0%
N       6.7%

Cryptanalysts compare observed ciphertext frequencies against profiles like these to identify the probable language and map high-frequency symbols to likely plaintext letters. The classic mnemonic ETAOIN SHRDLU lists roughly the twelve most common English letters in approximate order of frequency. Exact rankings vary slightly by corpus (C and U, for example, are nearly tied), but the phrase remains a shorthand familiar to most students of classical cryptanalysis.
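The mapping step can be sketched as a simple rank match: sort the ciphertext letters by frequency and pair them with English letters in their usual frequency order. This is a first guess only, under the assumption that `ENGLISH_ORDER` (derived from commonly cited frequency tables) fits the plaintext. The function name `guess_substitution` is hypothetical.

```python
from collections import Counter

# English letters from most to least frequent (approximate, corpus-dependent).
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"


def guess_substitution(ciphertext: str) -> dict[str, str]:
    """Map each ciphertext letter to a plaintext guess by frequency rank."""
    counts = Counter(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    # Most frequent first; ties are broken alphabetically.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return dict(zip(ranked, ENGLISH_ORDER))


guess = guess_substitution("KHOOR ZRUOG")
print(guess["O"], guess["R"])  # E T
```

For this ten-letter ciphertext the guess is wrong (O actually stands for L and R for O), which illustrates why rank matching needs a reasonably long text and is usually refined with bigram and trigram evidence.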

FAQ

What is frequency analysis?
Frequency analysis is the study of how often letters, symbols, words, or character groups appear in a text. It is a fundamental cryptanalytic technique used to identify language patterns and attack many classical ciphers.

Which ciphers can frequency analysis break?
Frequency analysis works best against monoalphabetic substitution systems such as the Caesar cipher, the affine cipher, and simple substitution ciphers. It can also assist with attacks on the Vigenère cipher when combined with other techniques. Modern encryption algorithms such as AES are not vulnerable to frequency analysis.

What are the most common letters in English?
The most common English letters are approximately E, T, A, O, I, N, S, H, R, D, L, and C. The exact distribution varies by text type, but E typically represents around 12–13% of all letters in natural English writing.

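What is the Index of Coincidence?
The Index of Coincidence (IC) measures how likely it is that two letters picked at random from a text are identical. English text has an IC of about 0.067, while uniformly random letters over a 26-letter alphabet give about 0.038. A monoalphabetic substitution only relabels letters, so it leaves the IC unchanged; a polyalphabetic cipher flattens the distribution and pulls the IC toward the random value. Cryptanalysts therefore use IC to separate plaintext and monoalphabetic ciphertext from polyalphabetic ciphertext, and to estimate the key length of ciphers such as Vigenère.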
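As a rough illustration, the IC of a text with letter counts n_i and N letters in total is the sum of n_i(n_i - 1) divided by N(N - 1). The sketch below assumes an A-Z alphabet and ignores everything else. The function name `index_of_coincidence` is hypothetical.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement are equal."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters")
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# A Caesar shift only relabels letters, so the IC does not change.
print(index_of_coincidence("HELLO WORLD") == index_of_coincidence("KHOOR ZRUOG"))
# True
```

Texts this short give noisy values; the figures of roughly 0.067 for English and 0.038 for random letters only emerge on much longer samples.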

What are bigrams and trigrams?
Bigrams are two-character sequences such as TH, HE, or ER. Trigrams are three-character sequences such as THE, ING, or AND. They provide additional statistical information that can significantly improve classical cipher analysis.
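Counting n-grams is a sliding-window exercise. The sketch below is one possible approach, assuming that n-grams are counted within words only and never across spaces; the tool itself may handle word boundaries differently. The function name `ngram_counts` is hypothetical.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count letter n-grams inside each whitespace-separated word."""
    counts = Counter()
    for word in text.upper().split():
        # Drop punctuation and digits so only letters form n-grams.
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts


text = "ATTACK AT DAWN ATTACK AT DUSK"
print(ngram_counts(text, 2).most_common(1))  # [('AT', 4)]
print(ngram_counts(text, 3)["ATT"])          # 2
```

Passing `n=1` reduces this to a plain letter count, so the same function covers letters, bigrams, and trigrams.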

Can frequency analysis identify the language of a text?
Yes. By comparing observed frequencies against known language profiles, frequency analysis can often estimate the most likely language of a text, especially when sufficient text is available.