Part 1: Foundations of Cryptography
Part 2: Classical Ciphers
Part 3: Mechanized Cryptography
Part 4: Theoretical Breakthroughs
Part 5: Modern Symmetric Cryptography
Part 6: Principles of Good Cryptosystems
Part 7: Introduction to Cryptanalysis
Introduction
The mechanized era showed that engineering sophistication could create much stronger ciphers than classical methods. Yet even Enigma, with its astronomical key space, was defeated. The lesson: complexity and size alone do not guarantee security. Cryptography needed a mathematical foundation — a way to measure security, understand what made ciphers strong or weak, and design systems with provable properties.
In 1949, Claude Shannon, who created the field of information theory, published Communication Theory of Secrecy Systems. Written during World War II at Bell Labs, it gave cryptography its first rigorous framework: how to measure uncertainty, how to reason about secrecy, and how to define when a cipher is secure.
Entropy: Measuring Information Content
Shannon's breakthrough was to treat information as something that could be measured mathematically. His concept of entropy quantifies the uncertainty or randomness in a message.
The formula for Shannon entropy is \(H(X) = -\sum_{i=1}^{n} p(x_i) \log_{2} p(x_i)\), where:
- Random variable \(X\): something that can take on different outcomes. For example, \(X\) could represent a coin flip that could be "heads" or "tails."
- Outcomes \(x_i\): one of the possible values of \(X\). For example, for a coin flip, \(x_1\) = heads, \(x_2\) = tails.
- Probability \(p(x_i)\): the probability that the outcome \(x_i\) occurs. Probabilities always add up to 1. For example, for a fair coin, \(p(\text{heads}) = 0.5\).
- The base-2 logarithm \(\log_{2} p(x_i)\): the log base 2 of the probability. Log base 2 measures information in bits. Intuitively, rare events (those with small probabilities) are more surprising, and hence give more information, when they happen, so they contribute more to entropy.
- The product \(p(x_i) \log_{2} p(x_i)\): a weighted measure of probability times information. Common events (high \(p(x_i)\)) are less surprising and rare events (low \(p(x_i)\)) are more surprising. The weighting balances them.
- The minus sign: since \(\log_{2} p(x_i)\) is negative because probabilities are between 0 and 1, the minus sign makes entropy positive.
- The summation \(\sum_{i=1}^{n}\): adds up contributions from all possible outcomes of \(X\). The result is the average amount of information you get when observing \(X\).
Entropy \(H(X)\) gives us the expected (average) information per outcome. High entropy represents more unpredictability (like a fair coin). Low entropy represents less unpredictability (like a weighted coin that almost always lands heads).
Entropy measures the average number of bits needed to encode the outcome. Here are some examples (reproduced in the short sketch after this list):
- Fair coin: H = 1 bit (maximum for 2 outcomes)
- Biased coin (90% heads): H ≈ 0.47 bits
- Fair 6-sided die: H = log₂(6) ≈ 2.58 bits
- Uniform n-bit string: H = n bits
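A few lines of Python make the formula concrete. This sketch, using only the standard library, reproduces the examples above:

```python
import math

def shannon_entropy(probs):
    """H(X) in bits for a list of outcome probabilities summing to 1."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))       # fair coin: 1.0 bit
print(shannon_entropy([0.9, 0.1]))       # biased coin: ~0.47 bits
print(shannon_entropy([1/6] * 6))        # fair die: ~2.58 bits
print(shannon_entropy([1/256] * 256))    # uniform 8-bit string: 8.0 bits
```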
English Text Has Low Entropy
This mathematical framework revealed why classical ciphers failed. Text has much less entropy than its symbol space suggests:
Maximum possible: If all 27 symbols (A-Z plus space) appeared equally, entropy would be log₂(27) ≈ 4.75 bits per character.
Actual entropy: Shannon's experiments with human subjects revealed that English text carries only about 1-1.5 bits per character when context is considered (that is, in real text).
This means most of each letter's information capacity is redundancy: predictable information that doesn't need to be transmitted. Redundancy is what lets us guess letters in crossword puzzles, allows us to understand speech with missing words, and enables spell-checkers to guess what we meant.
But redundancy is also what made frequency analysis work. A cipher that preserves this redundancy (like simple substitution) leaves the statistical fingerprints that cryptanalysts can exploit.
The Cryptographic Implication
Strong encryption must eliminate redundancy. The ciphertext should have high entropy: it should look statistically random, with no predictable patterns. If identical plaintext blocks produce identical ciphertext blocks, or if letter frequencies show through, the redundancy is preserved and the cipher is vulnerable.
Perfect Secrecy: The Theoretical Ideal
Shannon formalized the intuitive notion of "perfect security" with a precise definition: perfect secrecy means the ciphertext reveals absolutely nothing about the plaintext.
Formally, for every plaintext \(p\) and ciphertext \(c\), \(\Pr[P=p \mid C=c] = \Pr[P=p]\).
In other words, seeing the ciphertext doesn't change your knowledge about what the plaintext might be. Before seeing the ciphertext, certain plaintexts were more likely than others based on context. After seeing the ciphertext, those probabilities are exactly the same.
The One-Time Pad: Achieving Perfect Secrecy
Shannon proved that perfect secrecy is achievable, and he identified the conditions required. The one-time pad meets these conditions:
Algorithm
- Plaintext: P (n bits)
- Key: K (n bits, truly random)
- Encryption: C = P ⊕ K (bitwise XOR)
- Decryption: P = C ⊕ K
Example
Plaintext: 10110010
Key: 01101100 (random)
Ciphertext: 11011110
To decrypt: 11011110 ⊕ 01101100 = 10110010
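In code, the one-time pad is almost trivial; everything hard about it lives in the key. A minimal Python sketch, assuming `os.urandom` is an acceptable source of random key bytes:

```python
import os

def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """XOR the message with a same-length key. Decryption is the same call."""
    assert len(key) == len(plaintext), "key must be as long as the message"
    return bytes(p ^ k for p, k in zip(plaintext, key))

message = b"ATTACK AT DAWN"
key = os.urandom(len(message))            # fresh key, used exactly once
ciphertext = otp_encrypt(message, key)
assert otp_encrypt(ciphertext, key) == message   # C XOR K recovers P
```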
Historical Context: The Vernam Cipher
The one-time pad has its roots in the Vernam cipher, invented by Gilbert Vernam at AT&T in 1917. Vernam was working on secure telegraph communications and developed a system that combined plaintext with a key tape using the XOR operation (though he described it in terms of the Baudot code used in teleprinters).
Vernam's original system used a key tape that could be reused, which made it vulnerable to attack. The crucial insight, that the key must be used only once, came later through the work of Army cryptographer Joseph Mauborgne. In 1919, Mauborgne showed that if the key tape was truly random, never reused, and at least as long as the message, the system would be unbreakable.
This combination of Vernam's mechanical implementation and Mauborgne's theoretical insight created what we now call the one-time pad. Shannon's later work provided the mathematical framework to prove why these conditions were necessary and sufficient for perfect secrecy.
Why the One-Time Pad Is Perfect
The proof is elegant. For any plaintext P and any ciphertext C, there exists exactly one key K such that P ⊕ K = C. Since the key is chosen uniformly at random, every possible plaintext is equally likely to have produced the observed ciphertext.
An adversary who intercepts the ciphertext 11011110 gains no information about whether the plaintext was 10110010, 00101010, 11111111, or any other 8-bit string. Each is equally probable given a random key.
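For small blocks you can check this exhaustively. The sketch below fixes the intercepted ciphertext, decrypts it under every possible 8-bit key, and confirms that every plaintext appears exactly once, so the posterior over plaintexts matches the prior, whatever that prior is:

```python
ciphertext = 0b11011110

# Decrypt under every possible key. XOR with a fixed value is a bijection,
# so each of the 256 plaintexts is produced by exactly one key.
plaintexts = [ciphertext ^ key for key in range(256)]
assert sorted(plaintexts) == list(range(256))
```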
The Price of Perfect Secrecy
Shannon also proved the conditions under which perfect secrecy is possible:
- Keys must be truly random: Any bias or predictability breaks the proof
- Keys must be at least as long as the message: You need as much random key material as data to protect
- Keys must never be reused: Using the same key twice catastrophically breaks security
Why key reuse is fatal: If two messages P₁ and P₂ use the same key K:
C₁ = P₁ ⊕ K
C₂ = P₂ ⊕ K
Therefore: C₁ ⊕ C₂ = P₁ ⊕ P₂
The key cancels out, revealing the relationship between the two plaintexts. With language redundancy, this often reveals both messages.
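The cancellation is easy to demonstrate. In the Python sketch below (the messages are made-up illustrative values), the XOR of the two ciphertexts equals the XOR of the two plaintexts, with the key nowhere in sight:

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

p1, p2 = b"MEET AT NOON", b"CODE IS SAFE"
k = os.urandom(len(p1))                  # same key used twice: the fatal mistake
c1, c2 = xor(p1, k), xor(p2, k)

assert xor(c1, c2) == xor(p1, p2)        # the key has cancelled out
```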
Practical problems with the one-time pad
The one-time pad's requirements are impractical for most applications: the key must be as long as the data and can never be reused. This creates several practical problems:
- Key distribution: Securely sharing as much key material as the data to be protected
- Key storage: Safely storing the key
- Key synchronization: Ensuring sender and receiver use the same key bits in the same order
- Key generation: Producing truly random keys without bias or predictability
And because the key cannot be reused, the one-time pad ultimately replaces the problem of sharing a message securely with the problem of securely sharing a key that is just as long as the message.
Historical Use
Despite these limitations, one-time pads have been used when perfect security justified the cost:
- Soviet diplomatic communications during the Cold War
- The Washington-Moscow hotline during tense periods
- High-level military communications for critical operations
- Emergency backup systems when other methods failed
These systems required elaborate key distribution networks, diplomatic pouches, and careful operational procedures, demonstrating both the possibility and the cost of perfect secrecy.
Computational Security: The Practical Alternative
Since perfect secrecy is usually impractical, real-world cryptography aims for computational security: making attacks infeasible with available resources rather than impossible in principle.
The Modern Goal
Instead of perfect secrecy, we target computational indistinguishability: to any adversary with realistic computational resources, the ciphertext should be indistinguishable from random data.
Practical interpretation:
- No statistical biases in ciphertext
- No visible patterns or repetitions
- No compression possible (randomness doesn't compress; see the check after this list)
- No feasible way to recover plaintext without the key
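The compression criterion is simple to check empirically. In this sketch, repetitive English-like text shrinks dramatically under zlib, while random bytes, standing in for well-encrypted data, do not compress at all:

```python
import os
import zlib

text = b"the quick brown fox jumps over the lazy dog " * 200
random_bytes = os.urandom(len(text))     # stand-in for good ciphertext

print(len(text), len(zlib.compress(text)))           # shrinks dramatically
print(len(random_bytes), len(zlib.compress(random_bytes)))  # no shrinkage
```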
Attack Models
To make this concrete, cryptographers define specific attack models that represent what adversaries can do and test ciphers against these models:
- Ciphertext-only attack (COA): Adversary sees only ciphertext
- Known-plaintext attack (KPA): Adversary has some plaintext-ciphertext pairs
- Chosen-plaintext attack (CPA): Adversary can request encryption of chosen plaintexts
- Chosen-ciphertext attack (CCA): Adversary can request decryption of chosen ciphertexts
Modern ciphers are designed to remain secure even under the strongest of these models, and the models give cryptanalysts a standard, concrete way to evaluate new designs.
Confusion and Diffusion: Design Principles
Shannon identified two essential properties that secure ciphers must exhibit: confusion and diffusion.
- Confusion: Hides the relationship between the key and the ciphertext. Each output bit should depend on the key in a complex, nonlinear way that resists analysis. Small changes to the key or input should scramble many output bits in ways that are hard to predict. Modern ciphers use substitution boxes (S-boxes), small lookup tables that implement nonlinear transformations. An 8-bit S-box takes an 8-bit input and produces an 8-bit output, but the mapping is carefully designed so that simple relationships (like XOR) don't hold.
- Diffusion: Spreads the influence of each input bit across many output bits. A change in any single plaintext or key bit should affect many ciphertext bits in unpredictable ways. Modern ciphers use permutation operations that rearrange and mix bits or bytes. Linear transformations like matrix multiplication can provide diffusion while remaining invertible for decryption.
- The avalanche effect: Proper diffusion creates an "avalanche effect," the property that changing one input bit changes about half the output bits. This ensures that small differences in input produce large, unpredictable differences in output (the sketch after this list demonstrates it).
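The avalanche effect is easy to observe. The sketch below uses SHA-256 as a stand-in primitive (a hash function rather than a cipher, but one built on the same confusion-and-diffusion principles) and counts how many of the 256 digest bits flip when a single input bit changes; the count lands near 128:

```python
import hashlib

def digest_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

msg = bytearray(b"avalanche test input")
flipped = bytearray(msg)
flipped[0] ^= 0x01                       # flip one input bit

print(digest_distance(bytes(msg), bytes(flipped)))  # typically ~128 of 256
```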
Combining Confusion and Diffusion
Neither property alone is sufficient:
- Confusion without diffusion: Local changes remain local, allowing divide-and-conquer attacks
- Diffusion without confusion: Relationships between input and output remain linear and solvable
Most ciphers apply both properties in multiple rounds. Each round does:
1. Substitute (confusion): Apply nonlinear S-boxes
2. Permute (diffusion): Mix and rearrange the result
3. Mix key material: Combine with a round key derived from the main key
After enough rounds, small changes in input or key affect the entire output in highly nonlinear ways.
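Here is a deliberately tiny substitution-permutation round in Python that mirrors those three steps. The 4-bit S-box, the bit permutation, and the round keys are all made-up illustrative values, and an 8-bit block is hopelessly insecure; this is a sketch of the structure, not a cipher:

```python
# Toy 8-bit substitution-permutation network: illustration only, not secure.
SBOX = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
        0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]   # invertible 4-bit lookup
PERM = [1, 5, 2, 0, 3, 7, 4, 6]                    # where each bit moves

def round_fn(state: int, round_key: int) -> int:
    # 1. Substitute (confusion): each 4-bit half goes through the S-box
    state = (SBOX[state >> 4] << 4) | SBOX[state & 0xF]
    # 2. Permute (diffusion): rearrange the 8 bits
    state = sum(((state >> i) & 1) << PERM[i] for i in range(8))
    # 3. Mix key material: XOR in the round key
    return state ^ round_key

def encrypt(block: int, round_keys) -> int:
    for rk in round_keys:
        block = round_fn(block, rk)
    return block

print(bin(encrypt(0b10110010, [0x3A, 0xC5, 0x5F, 0x91])))
```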
Randomness
Both perfect secrecy and computational security depend critically on randomness. One-time pads need truly random keys; modern ciphers need random keys and often random initialization values.
Random vs. Pseudorandom
In practice, we obtain sequences of random or pseudorandom bits:
- Random bits come from physical processes we model as unpredictable, such as radioactive decay, thermal noise, and quantum measurements.
- Pseudorandom bits come from deterministic algorithms (pseudorandom number generators, or PRNGs) that produce sequences that "look random" but are actually computed from a short seed.
- Cryptographically secure PRNGs (CSPRNGs) are pseudorandom generators whose outputs are computationally indistinguishable from true randomness to anyone who doesn't know the seed. If someone examines a stream of data from a CSPRNG, they should be unable to make any predictions about what data will follow. Operating systems collect a small amount of physical randomness to seed a CSPRNG, then stretch that seed into as many bits as applications require (a toy illustration follows this list).
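The seed-stretching idea can be sketched in a few lines. The toy generator below expands a short seed by hashing it with a counter; it illustrates the concept and is not a vetted design, and real code should use the operating-system facilities described next:

```python
import hashlib

def toy_stream(seed: bytes, nbytes: int) -> bytes:
    """Stretch a short seed into nbytes of output (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        # Each block is the hash of the secret seed plus a counter.
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])

stream = toy_stream(b"short truly random seed", 1024)  # 1 KiB from one seed
```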
How operating systems provide randomness
Modern operating systems provide cryptographically secure randomness:
- Linux: The kernel maintains an entropy pool sourced from device behavior and timing jitter. Programs call `getrandom()` or read from `/dev/urandom`. Before the pool is initialized, `getrandom()` blocks so you do not receive low-entropy output. Once it is seeded, both interfaces draw from the same CSPRNG and are suitable for cryptography.
- Windows: Windows exposes a similar service through the Cryptography Next Generation API; `BCryptGenRandom` returns output from a seeded CSPRNG backed by kernel sources. In both cases, the right approach is to call the system API rather than invent a generator in application code.
- macOS: Similar to Linux, with additional hardware sources of randomness.
These systems:
1. Collect entropy from hardware events (keyboard timing, disk delays, network interrupts) or from CPU instructions that generate hardware random numbers
2. Hash and mix entropy sources to eliminate bias
3. Seed a CSPRNG that can produce unlimited output
4. Provide applications with cryptographically secure random numbers
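From application code, the right move is to call whatever wraps these kernel interfaces rather than roll your own. In Python, for example, `os.urandom` and the `secrets` module draw from the OS CSPRNG (`getrandom()` on Linux, `BCryptGenRandom` on Windows), while the `random` module is a statistical PRNG that attackers can predict:

```python
import os
import secrets

key = os.urandom(32)                  # 256-bit key from the OS CSPRNG
token = secrets.token_urlsafe(32)     # session token from the same source

# Never use the `random` module (a Mersenne Twister) for keys or tokens:
# its output is fine for simulations but predictable to an attacker.
```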
Why getting randomness right is hard
Freshly booted machines may not have gathered enough environmental noise yet. Virtual machines and embedded devices often have fewer and less diverse hardware events. When developers bypass the OS interface, seed a general-purpose PRNG, or accidentally reuse per-operation randomness, the results are often predictable.
There have been several high-profile examples where bad random values broke security. Some of these are:
- Netscape SSL (1995): Seeded its PRNG with predictable values (time and process ID); session keys were easily guessed.
- Debian OpenSSL bug (2008): A code change accidentally eliminated most entropy sources; keys generated on affected systems were easily broken.
- PlayStation 3 ECDSA (2010): Sony reused the same random value for different signatures, exposing their signing key and allowing homebrew software.
- Android Bitcoin wallets (2013): Weak randomness led to ECDSA nonce reuse; attackers recovered private keys and stole funds.
- Keyfactor study (2019): Scanning 75 million Internet-facing RSA certificates found widespread shared-factor keys, largely from IoT devices with poor entropy. [See the Keyfactor paper or the ITPro article.]
- form-data multipart boundaries (2025): Using `Math.random()` let attackers predict HTTP multipart boundaries, since `Math.random()` is not a cryptographically secure random number generator (CVE-2025-7783). The report states that an "attacker who can observe a few sequential values can determine the state of the PRNG and predict future values."
Quantum randomness (current research)
Researchers continue to search for sources of high-quality randomness available at high rates. Quantum physics offers entropy sources that are unpredictable in principle. A simple example is sending single photons into a 50–50 beam splitter and recording which detector clicks.
More ambitious work uses Bell-test techniques that certify randomness under clear physical assumptions, even if the device internals are untrusted.
In June 2025, NIST and partners launched CURBy (the Colorado University Randomness Beacon), a free public service that publishes traceable, verifiable quantum-generated random numbers using an entanglement-based Bell test and a provenance protocol ("Twine"). Today, such sources are mostly used to seed conventional generators or in specialized links; for general software, the guidance remains the same: use the operating system's CSPRNG.
Shannon's Fundamental Insights
Shannon's theoretical framework transformed cryptography from art to science. His contributions include:
- Information can be measured mathematically using entropy
- Perfect secrecy is possible but expensive (one-time pad)
- Practical security can target computational bounds instead of information-theoretic ones
- Confusion and diffusion are essential design principles for strong ciphers
- Randomness is crucial and must be protected throughout the system
Modern Applications
Shannon's principles directly influenced the design of modern ciphers:
- AES (Advanced Encryption Standard): Uses S-boxes for confusion and matrix operations for diffusion, applied over multiple rounds
- Block cipher modes: Designed to ensure that identical plaintext blocks don't produce identical ciphertext
- Stream ciphers: Generate pseudorandom keystreams to approximate one-time pad behavior
- Security proofs: Modern cryptanalysis uses Shannon's framework to prove security bounds
The Path Forward
Shannon's work established the theoretical foundation that modern cryptography stands on. His framework allows us to:
- Analyze existing systems to understand their strengths and weaknesses
- Design new systems with provable security properties
- Evaluate proposals using mathematical criteria rather than intuition
- Understand the relationship between theoretical ideals and practical constraints
This mathematical foundation made possible the next phase of cryptographic development: engineered systems that deliberately implement confusion and diffusion to create computationally secure ciphers suitable for widespread use.