pk.org: Computer Security/Lecture Notes

Hash Pointers, Blockchains, Merkle Trees, and Bitcoin

Study Guide

Paul Krzyzanowski – October 9, 2025

Cryptographic Foundations

Bitcoin's trust model depends on cryptographic hash functions and authenticated data structures.

A hash function such as SHA-256 converts any input into a fixed-length digest that changes unpredictably when even one bit of input changes.
Hashing enables Bitcoin to verify data integrity, detect tampering, and provide compact digital fingerprints.
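
The fixed-length and unpredictable-change properties can be observed directly. The following is a minimal sketch using Python's standard hashlib module; the two messages are invented for illustration and differ in exactly one bit.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg1 = b"pay Alice 10"
msg2 = b"pay Alice 11"   # '0' (0x30) vs '1' (0x31): a one-bit change

d1 = hashlib.sha256(msg1).digest()
d2 = hashlib.sha256(msg2).digest()

print(sha256_hex(msg1))
print(sha256_hex(msg2))
print(len(d1) * 8, "bit digests;", differing_bits(d1, d2), "digest bits differ")
```

Both digests are 256 bits long regardless of input size, and a one-bit input change typically flips about half of the output bits.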

A hash pointer is a pointer that also stores a hash of the referenced data.
If the data changes, the hash no longer matches, revealing tampering.
Hash pointers are used in systems such as Git, where each commit records the hash of its parent commit.
Because a commit's hash covers both its content and its parent's hash, changing any file in an old commit changes that commit's hash and the hash of every commit that follows it.

The blockchain uses the same idea. Each block includes the hash of the previous block's header.
If an attacker modifies one block, every later block becomes invalid because the hashes no longer align.
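
A toy chain makes the tamper-evidence concrete. This sketch is a simplification: the Block class and its fields are hypothetical, and it hashes a whole toy block rather than a real Bitcoin block header.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Block:
    prev_hash: str   # hash pointer to the previous block ("" for the first block)
    data: str        # stand-in for a real block's contents

    def hash(self) -> str:
        return hashlib.sha256((self.prev_hash + "|" + self.data).encode()).hexdigest()

def build_chain(items):
    """Link each new block to the hash of the block before it."""
    chain, prev = [], ""
    for item in items:
        block = Block(prev, item)
        chain.append(block)
        prev = block.hash()
    return chain

def first_broken_link(chain):
    """Return the index of the first block whose prev_hash does not match, or None."""
    for i in range(1, len(chain)):
        if chain[i].prev_hash != chain[i - 1].hash():
            return i
    return None

chain = build_chain(["tx batch 0", "tx batch 1", "tx batch 2"])
print(first_broken_link(chain))   # None: every link matches

chain[0].data = "tampered"
print(first_broken_link(chain))   # 1: block 1 no longer points at block 0's hash
```

To hide the change, the attacker would have to recompute the stored hash in block 1, which changes block 1's own hash and breaks block 2, and so on to the head of the chain.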

A Merkle tree organizes data into a binary tree of hashes.
Each internal node stores the hash of its two children, and the root hash commits to all the data below it.
Merkle trees make it possible to verify that a transaction or file is included in a dataset without retrieving everything.
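They are used in many systems, including Bitcoin itself, where each block's Merkle root commits to all of the transactions in that block.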
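
The inclusion check can be sketched as follows. This is an illustration under stated assumptions, not Bitcoin's exact procedure: it uses a single SHA-256 per node, duplicates the last hash when a level has an odd number of entries, and the function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_levels(leaves):
    """Return every level of the tree, leaf hashes first, root level last.
    Assumes at least one leaf."""
    levels = []
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:               # odd count: duplicate the last hash
            level.append(level[-1])
        levels.append(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    levels.append(level)                 # the single root hash
    return levels

def merkle_proof(levels, index):
    """Sibling hashes from leaf to root, each tagged True if the sibling is on the left."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(leaf, proof, root):
    """Recompute the path to the root using only the leaf and its proof."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

txs = [b"tx-a", b"tx-b", b"tx-c", b"tx-d", b"tx-e"]
levels = merkle_levels(txs)
root = levels[-1][0]
proof = merkle_proof(levels, 2)
print(verify(b"tx-c", proof, root))   # True
print(verify(b"tx-x", proof, root))   # False
print(len(proof), "hashes prove membership among", len(txs), "leaves")
```

The verifier needs only the leaf, the root, and one sibling hash per tree level, so the proof grows logarithmically with the number of leaves.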

The Double-Spending Problem

With physical cash, you cannot spend the same bill twice—once you hand it over, you no longer have it.
Digital files, however, can be copied infinitely.
Without a trusted authority to verify transactions, how do we prevent someone from sending the same bitcoin to two different people?

Traditional systems solve this with a central authority: a bank verifies that you have sufficient funds before approving a transaction.
Bitcoin solves it through its distributed ledger and consensus mechanism: all nodes maintain a complete transaction history and agree on which transactions are valid.

The Distributed Ledger

Bitcoin's ledger is not stored in one place. Instead, it is a distributed ledger: tens of thousands of nodes around the world each maintain a complete, independent copy of the entire transaction history.

This is not a fragmented database where different nodes hold different pieces.
Every participating node stores the full ledger, from the very first transaction in 2009 to the most recent block.
When a new block is added, it propagates across the network, and each node independently verifies it and appends it to its local copy.

Nodes can verify they have the correct ledger by comparing block hashes with other nodes.
Because the blockchain is tamper-evident, any discrepancy in block hashes immediately reveals that someone has a corrupted or fraudulent copy.

This redundancy is central to Bitcoin's resilience.
There is no single point of failure, no server to shut down, and no organization that controls the data.
As long as even a handful of nodes remain online, the ledger persists.

The Ledger: Transactions vs. Accounts

Banking systems maintain account balances that change as money moves between accounts.
Bitcoin takes a different approach. It does not track balances but records every transaction ever made.
The current state of the system is the set of unspent transaction outputs (UTXOs).

Each Bitcoin transaction consumes prior outputs as inputs and creates new outputs that represent new ownership records.

Example:
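
As a hypothetical illustration (the names, transaction IDs, and amounts below are invented, not taken from these notes), the UTXO set can be modeled as a dictionary that each transaction updates by removing the outputs it spends and adding the outputs it creates. Signature checks are omitted from this sketch.

```python
def apply_transaction(utxos, txid, inputs, outputs):
    """Spend the referenced outputs and add the new ones.

    utxos:   dict mapping (txid, output index) -> (owner, amount in satoshis)
    inputs:  list of (txid, output index) keys being spent
    outputs: list of (owner, amount) pairs being created
    """
    if len(set(inputs)) != len(inputs):
        raise ValueError("same output referenced twice")
    for ref in inputs:
        if ref not in utxos:
            raise ValueError(f"{ref} is missing or already spent")
    total_in = sum(utxos[ref][1] for ref in inputs)
    total_out = sum(amount for _, amount in outputs)
    if total_out > total_in:
        raise ValueError("outputs exceed inputs")
    for ref in inputs:                      # consumed outputs leave the set
        del utxos[ref]
    for index, output in enumerate(outputs):
        utxos[(txid, index)] = output       # new outputs become spendable

utxos = {("tx0", 0): ("alice", 50_000)}

# Alice pays Bob 30,000 and returns 19,000 to herself; 1,000 is left over as a fee.
apply_transaction(utxos, "tx1", [("tx0", 0)], [("bob", 30_000), ("alice", 19_000)])
print(utxos)
# {('tx1', 0): ('bob', 30000), ('tx1', 1): ('alice', 19000)}

try:
    # A second attempt to spend the same output fails: it is no longer in the set.
    apply_transaction(utxos, "tx2", [("tx0", 0)], [("carol", 50_000)])
except ValueError as err:
    print("rejected:", err)
```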

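This model ensures that each output can be spent at most once: a node rejects any transaction whose inputs reference an output that does not exist or has already been spent.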

Keys and Addresses

Ownership and authorization in Bitcoin rely on public-key cryptography.
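Each user generates a key pair: a private key, which is kept secret and used to sign transactions, and a public key, which anyone can use to verify those signatures.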

There are no usernames or real-world identities on the blockchain.
A user proves ownership simply by producing a valid digital signature with the correct private key.

Bitcoin uses addresses as compact, safer representations of public keys.
Addresses are derived from public keys by hashing them, adding a checksum, and encoding the result in a readable format.
They are used in transaction outputs to specify who can spend a given output.

When a recipient later spends funds, they reveal their public key and signature, allowing others to verify that it matches the address in the earlier transaction.
This system keeps participants pseudonymous while ensuring that only authorized users can spend funds.
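
The hash, checksum, and encode steps can be sketched with the Base58Check scheme used by Bitcoin's original address format. Two loud assumptions apply: the public key bytes are a placeholder, and the 20-byte key hash is produced by truncating SHA-256 as a stand-in for Bitcoin's actual SHA-256 followed by RIPEMD-160, because RIPEMD-160 is not available in every Python build. The output is therefore not a real Bitcoin address.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def checksum(payload: bytes) -> bytes:
    """First 4 bytes of SHA-256 applied twice to the payload."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]

def base58_encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    chars = []
    while n > 0:
        n, rem = divmod(n, 58)
        chars.append(ALPHABET[rem])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))

def base58_decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)   # ValueError on an invalid character
    leading_ones = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_ones + n.to_bytes((n.bit_length() + 7) // 8, "big")

def make_address(key_hash: bytes, version: bytes = b"\x00") -> str:
    payload = version + key_hash
    return base58_encode(payload + checksum(payload))

def is_valid_address(address: str) -> bool:
    try:
        raw = base58_decode(address)
    except ValueError:
        return False
    return len(raw) > 4 and checksum(raw[:-4]) == raw[-4:]

fake_pubkey = b"\x02" + bytes(32)                      # placeholder, not a real key
key_hash = hashlib.sha256(fake_pubkey).digest()[:20]   # stand-in for the real key hash
addr = make_address(key_hash)
typo = addr[:-1] + ("2" if addr[-1] != "2" else "3")   # change the final character

print(addr, is_valid_address(addr))
print(typo, is_valid_address(typo))   # expected False: the checksum no longer matches
```

The checksum is what makes addresses safer to handle than raw keys: a mistyped character is almost always detected before any funds are sent.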

Transactions

A Bitcoin transaction contains inputs and outputs. Inputs identify where the bitcoin comes from, and outputs identify to whom it is being transferred.
Each input references an earlier transaction output and provides a digital signature and public key as proof of ownership.
Outputs specify the recipient's address and amount.

Every input must be completely spent, so transactions often include a change output that returns excess funds to the sender.
The small difference between total inputs and outputs becomes the transaction fee, which goes to the miner who includes the transaction in a block.
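
The arithmetic behind change and fees can be sketched as follows. The greedy largest-first input selection, the function name, and the labels are assumptions made for illustration; Bitcoin does not prescribe how a wallet picks its inputs.

```python
def build_payment(available, amount, fee):
    """Choose inputs and compute outputs for a payment.

    available: list of (utxo_id, satoshis) the sender can spend
    Returns (selected input ids, list of (label, satoshis) outputs).
    """
    selected, total = [], 0
    for utxo_id, value in sorted(available, key=lambda u: u[1], reverse=True):
        if total >= amount + fee:
            break
        selected.append(utxo_id)
        total += value
    if total < amount + fee:
        raise ValueError("insufficient funds")

    outputs = [("recipient", amount)]
    change = total - amount - fee        # inputs are spent whole, so return the excess
    if change > 0:
        outputs.append(("sender_change", change))
    return selected, outputs

available = [("a:0", 30_000), ("b:1", 80_000)]
inputs, outputs = build_payment(available, amount=50_000, fee=1_000)
print(inputs)    # ['b:1']
print(outputs)   # [('recipient', 50000), ('sender_change', 29000)]
```

The fee never appears as an output. It is simply the 1,000 satoshis by which the 80,000 in inputs exceeds the 79,000 in outputs.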

When a transaction is created, it is broadcast to nearby Bitcoin nodes and propagated across the network within seconds.
Nodes independently verify each transaction by checking signatures, ensuring that referenced outputs exist and have not been spent, and validating the total value.
Once validated, transactions wait in a pool until included in a block.

Blocks and Linking

Transactions are grouped into blocks to simplify verification and synchronization.
A block bundles many transactions and links to the previous block, forming a continuous chain.

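Each block has two main parts: a header, which includes the hash of the previous block's header, the Merkle root of the block's transactions, and the nonce used in mining, and the list of transactions itself.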

Changing any transaction alters its hash, which changes the Merkle root, the block hash, and every later block's reference.
Because each block depends on the one before it, the blockchain acts as an append-only, tamper-evident ledger.

What is Mining?

Mining is the process by which new blocks are added to the Bitcoin blockchain.
Miners are specialized nodes that collect valid transactions from the network, bundle them into a candidate block, and compete to publish that block by solving a computational puzzle.

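The miner who successfully solves the puzzle first gets to add their block to the chain and receives a reward consisting of the block reward (newly created bitcoins) and the transaction fees from every transaction in the block.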

Mining serves two critical purposes:

  1. It creates new bitcoins in a controlled, predictable way

  2. It secures the network by making it computationally expensive to alter transaction history

Proof of Work and the Mining Puzzle

Bitcoin uses Proof of Work to determine which miner can publish the next block.
The mining puzzle requires finding a nonce (a number in the block header) such that the hash of the entire block header is less than a specific threshold called the target hash.
Bitcoin computes this hash by applying SHA-256 twice.

Formally: H(block header) < target hash, where H(x) = SHA-256(SHA-256(x)).

The target hash is a 256-bit number that determines how difficult it is to mine a new block.
The lower the target hash, the harder it is to find a valid solution.
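Because hash outputs are unpredictable, miners must try billions or trillions of candidate headers until one produces a hash below the target.
The nonce field is only 32 bits wide, so once its range is exhausted a miner changes another part of the block, such as the timestamp or the coinbase transaction, and starts over.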

This process is computationally expensive but easy to verify.
Once a miner finds a valid nonce, any node can instantly verify the solution by computing a single hash.
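
The asymmetry between finding and checking a solution shows up even in a toy version. This sketch assumes a made-up header prefix and a very easy target so that it finishes quickly. It applies SHA-256 twice, as Bitcoin does, but the byte layout and integer interpretation are simplifications.

```python
import hashlib
import struct

def header_hash(header_prefix: bytes, nonce: int) -> int:
    """Double SHA-256 of prefix + 4-byte nonce, read as a 256-bit integer."""
    data = header_prefix + struct.pack("<I", nonce)
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "big")

def mine(header_prefix: bytes, target: int):
    """Try nonces 0, 1, 2, ... until the hash falls below the target."""
    for nonce in range(2**32):
        if header_hash(header_prefix, nonce) < target:
            return nonce
    return None   # nonce range exhausted; a real miner would alter other fields

def verify(header_prefix: bytes, nonce: int, target: int) -> bool:
    """Checking a claimed solution takes one hash computation."""
    return header_hash(header_prefix, nonce) < target

target = 1 << 244     # toy target: roughly 1 hash in 2**12 qualifies
prefix = b"toy header: prev hash, merkle root, timestamp"

nonce = mine(prefix, target)
print("nonce:", nonce, "valid:", verify(prefix, nonce, target))
```

Halving the target doubles the expected number of attempts in mine, while verify still costs a single call to header_hash.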

The Difficulty Adjustment Algorithm

To keep the average time between blocks near 10 minutes, Bitcoin automatically adjusts the mining difficulty every 2016 blocks (roughly every two weeks) using the Difficulty Adjustment Algorithm.

If miners collectively produce blocks too quickly, the algorithm decreases the target hash, making the puzzle harder.
If blocks are mined too slowly, it increases the target hash, making it easier.
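
The adjustment amounts to scaling the target by the ratio of actual to expected time for the last 2016 blocks. The sketch below follows that idea. The limit of a factor of 4 per adjustment and the cap at an easiest-allowed target mirror the behavior of Bitcoin's reference client as commonly described; they are not stated in these notes, and the function and constant names are hypothetical.

```python
TARGET_SPACING = 10 * 60                        # desired seconds per block
INTERVAL = 2016                                 # blocks per adjustment period
EXPECTED_TIMESPAN = TARGET_SPACING * INTERVAL   # two weeks, in seconds

def retarget(old_target: int, actual_timespan: int, max_target: int) -> int:
    """Return the new target given how long the last period actually took."""
    # Limit any single adjustment to a factor of 4 in either direction.
    clamped = max(EXPECTED_TIMESPAN // 4, min(actual_timespan, EXPECTED_TIMESPAN * 4))
    new_target = old_target * clamped // EXPECTED_TIMESPAN
    return min(new_target, max_target)          # never easier than the allowed maximum

max_target = 1 << 224
old = 1 << 220
one_week = EXPECTED_TIMESPAN // 2

print(retarget(old, one_week, max_target) == old // 2)            # blocks came too fast: target halves
print(retarget(old, EXPECTED_TIMESPAN, max_target) == old)        # on schedule: unchanged
print(retarget(old, 2 * EXPECTED_TIMESPAN, max_target) == 2 * old)  # too slow: target doubles
```

A smaller target means fewer hashes qualify, so the first case makes mining twice as hard and the last makes it twice as easy.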

This self-regulating mechanism ensures that Bitcoin's block production remains stable regardless of how much mining power joins or leaves the network.
Even as miners deploy more powerful hardware, the difficulty adjusts to maintain the 10-minute average.

Mining Hardware Evolution

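Bitcoin mining has evolved through several generations of hardware: general-purpose CPUs, then GPUs, then FPGAs, and finally ASICs (application-specific integrated circuits) designed solely to compute SHA-256 hashes.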

Because finding a valid block hash is probabilistic (like winning a lottery), individual miners often join mining pools to share both the computational work and rewards.
Each miner's chance of success is proportional to their share of the total network computing power.
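
A rough calculation shows why pooling is attractive. The sketch treats each block as an independent draw that a miner wins with probability equal to its share of network hash power. The share used in the example is hypothetical.

```python
BLOCKS_PER_DAY = 24 * 60 // 10     # 144 blocks at one block per 10 minutes

def expected_days_per_block(share: float) -> float:
    """Average number of days between blocks won by a miner with this share."""
    return 1 / (share * BLOCKS_PER_DAY)

def prob_at_least_one_block(share: float, days: float) -> float:
    """Chance of winning at least one block within the given number of days."""
    blocks = BLOCKS_PER_DAY * days
    return 1 - (1 - share) ** blocks

share = 1e-6   # hypothetical solo miner with one millionth of the network's power
print(f"{expected_days_per_block(share):.0f} days per block on average")
print(f"{prob_at_least_one_block(share, 30):.4%} chance of a block in 30 days")
```

With a one-in-a-million share the average wait is roughly 6,944 days, about 19 years. A pool does not change the expected income per unit of hash power, but it replaces rare large payouts with frequent small ones.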

Consensus and Chain Selection

Nodes always follow the longest valid chain, meaning the chain with the greatest cumulative proof of work (not necessarily the most blocks).
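
The difference between the most blocks and the most work can be shown with a small calculation. This sketch estimates the work behind one block as the expected number of hashes needed to land below its target, roughly 2^256 / (target + 1). The chains and targets are invented for illustration.

```python
def block_work(target: int) -> int:
    """Expected number of hashes needed to find a hash below the target."""
    return (1 << 256) // (target + 1)

def chain_work(targets) -> int:
    """Cumulative work of a chain, given the target of each of its blocks."""
    return sum(block_work(t) for t in targets)

def best_chain(chains):
    """Select the chain with the greatest cumulative proof of work."""
    return max(chains, key=chain_work)

chain_a = [1 << 240] * 3    # three blocks mined against an easier target
chain_b = [1 << 238] * 2    # two blocks mined against a target 4x harder

print(chain_work(chain_a), chain_work(chain_b))
print(best_chain([chain_a, chain_b]) is chain_b)   # True: fewer blocks, more work
```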

Bitcoin doesn't have a single central authority to decide which chain is correct.
Instead, the network uses consensus mechanisms to ensure all nodes agree on which block represents the head of the chain.

Competing Chains and Forks

When two valid blocks are found nearly simultaneously, the blockchain temporarily splits into competing chains, a situation called a fork.

Because miners are distributed globally and it takes time for blocks to propagate across the network, it's possible for a miner in Asia and a miner in Europe to both find valid blocks at nearly the same moment.
Each broadcasts their block to nearby nodes, and for a short time, different parts of the network may be working on different versions of the chain.

Most miners simply work on whichever valid block they received first.
Over the next few minutes, one branch will likely grow longer as more blocks are added to it.
Once one chain becomes longer, all honest nodes switch to that chain, and the shorter branch is abandoned.
Transactions in the abandoned blocks return to the memory pool and typically get included in future blocks on the winning chain.

This is why Bitcoin transactions are not considered truly final until several blocks have been added after them—a practice called waiting for confirmations.

While accidental forks resolve naturally within minutes, an attacker could attempt to create a competing chain deliberately to reverse a transaction.
However, the computational cost of sustaining a competing chain long enough to overtake the honest chain makes such attacks impractical.

Security and the 51% Attack

For an attacker to modify an earlier transaction, they would need to redo all proof of work from that block onward and surpass the rest of the network.
With thousands of miners contributing massive computational power, catching up is practically impossible.
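
How quickly catching up becomes impractical can be estimated with the simplified gambler's-ruin model from the Bitcoin whitepaper, which is not part of these notes: an attacker holding a fraction q of the hash power, starting z blocks behind, ever catches up with probability (q/p)^z, where p = 1 - q, provided q < p.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q ever erases a z-block deficit."""
    p = 1 - q
    if q >= p:
        return 1.0            # a majority attacker eventually catches up
    return (q / p) ** z

for q in (0.10, 0.30, 0.45, 0.51):
    row = ", ".join(f"z={z}: {catch_up_probability(q, z):.2e}" for z in (1, 3, 6))
    print(f"q={q:.2f}  {row}")
```

For a minority attacker the probability falls exponentially with each additional block, while at or above half of the hash power it is 1 in this model.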

An attacker who controlled more than half of the total computational power of the network could, in theory, execute a 51% attack—rewriting recent history or excluding specific transactions.
However, the cost of acquiring and operating enough hardware to do this across the global Bitcoin network is so high that such an attack is effectively infeasible in practice.

Even if an attacker succeeded, the attack would likely destroy confidence in Bitcoin, making their stolen coins worthless—a strong economic disincentive.

Mining Rewards and Economics

Each newly mined block includes one special transaction, the coinbase transaction, that creates new bitcoins from nothing.
This is how new coins enter circulation.

The initial reward in 2009 was 50 BTC per block.
Every 210,000 blocks (roughly every four years), it halves: to 25 BTC in 2012, 12.5 BTC in 2016, 6.25 BTC in 2020, and 3.125 BTC in 2024.

Over time, as the block reward continues to halve, transaction fees are expected to become the main incentive for mining.
After 33 halvings, the reward, which is computed in whole satoshis (hundred-millionths of a bitcoin), rounds down to zero, leaving a maximum of just under 21 million bitcoins in circulation.
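
The supply cap follows from summing the halving schedule. The sketch below assumes the reward is tracked in whole satoshis (100,000,000 per bitcoin) and halved by integer division, which is how the reference client is commonly described. Under that assumption the reward is 1 satoshi after 32 halvings and 0 after 33.

```python
COIN = 100_000_000            # satoshis per bitcoin
HALVING_INTERVAL = 210_000    # blocks between halvings

def block_subsidy(height: int) -> int:
    """Block reward in satoshis at the given block height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:        # guard against shifting by an absurd amount
        return 0
    return (50 * COIN) >> halvings   # integer halving, rounding down

def supply_schedule():
    """Return (total satoshis ever created, number of halvings until the reward is zero)."""
    total, era = 0, 0
    while (subsidy := block_subsidy(era * HALVING_INTERVAL)) > 0:
        total += subsidy * HALVING_INTERVAL
        era += 1
    return total, era

total, halvings_to_zero = supply_schedule()
print(halvings_to_zero)                          # 33
print(f"{total // COIN}.{total % COIN:08d} BTC")  # 20999999.97690000 BTC
```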

Miners act honestly because their revenue depends on following the rules.
Other nodes reject any block that breaks the rules, so a miner who cheats forfeits that block's reward along with the cost of the work spent producing it.
This self-interest forms the backbone of Bitcoin's decentralized stability.

System Overview

Bitcoin's architecture combines four reinforcing layers:

Layer            Purpose
Cryptography     Provides data integrity and authorization using hashes and signatures.
Data structures  Blockchain and Merkle trees maintain authenticated, tamper-evident storage.
Consensus        Proof of Work coordinates the network without central authority.
Economics        Block rewards and transaction fees motivate miners to act honestly.

Key concepts in Bitcoin's design:

Together, these layers allow strangers to agree on a single version of history without a trusted intermediary.
Bitcoin's design shows how cryptography, distributed computing, and incentives can replace institutional trust with mathematical verification.
