pk.org: CS 417/Lecture Notes

Distributed Systems Foundations and Communication - Keywords

Paul Krzyzanowski – 2026-01-05

Core distributed systems concepts

Distributed system
A collection of independent computers connected by a network that coordinate to accomplish a common goal.
Autonomous computer
A computer that has its own processor, memory, operating system, and clock, and operates independently of others.
Message passing
Explicit communication between processes using network messages rather than shared memory.
Partial failure
A failure mode in which some components fail while others continue operating.
All-or-nothing failure
A failure mode in which the entire system stops functioning when a failure occurs.
Single point of failure
A component whose failure causes the entire system to fail.
Horizontal scaling
Increasing system capacity by adding more machines and distributing work across them.
Vertical scaling
Increasing system capacity by adding resources to a single machine.

Laws and principles

Moore’s Law
The historical observation that transistor counts, and thus computing capacity, roughly doubled every 18 to 24 months.
Amdahl’s Law
A principle stating that the speedup from parallelism is limited by the portion of a task that must remain sequential.
Metcalfe’s Law
The idea that the value of a network grows roughly with the square of the number of its participants.
End-to-end principle
A network design principle that places functionality such as reliability and security at the communicating endpoints rather than in the network.
Fate sharing
The principle that communication state should reside at the endpoints, so that the state is lost only when an endpoint that depends on it also fails.
Best-effort delivery
A network service model in which the network attempts to deliver each packet but does not guarantee delivery, ordering, or a bound on delivery time.
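Amdahl's Law, defined above, can be made concrete with a short calculation. This is a minimal sketch that assumes the common formulation speedup = 1 / ((1 - p) + p / n), where p is the fraction of the task that can run in parallel and n is the number of processors. The function names are illustrative and do not come from the notes.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's Law for a given number of processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the processor count grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    if serial_fraction == 0.0:
        return float("inf")  # a fully parallel task has no sequential bound
    return 1.0 / serial_fraction
```

With 90% of a task parallelizable, 10 processors give a speedup of about 5.26, and no number of processors can exceed a speedup of about 10.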

Failure models

Fail-stop failure
A failure in which a component halts execution and produces no further output, and the failure can be detected.
Fail-silent failure
A failure in which a component produces no output, but other components cannot reliably distinguish failure from delay.
Crash-restart failure
A failure in which a component crashes and later restarts, possibly with lost or stale state.
Network partition
A failure that divides a system into disconnected groups that cannot communicate.
Byzantine failure
A failure in which a component continues running but does not follow the system specification, producing incorrect or inconsistent behavior.
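The fail-silent definition above implies that a timeout can only suspect a failure, never confirm one. The following sketch shows a heartbeat-based detector under that assumption. The class name, method names, and the injectable clock are hypothetical choices made for illustration.

```python
import time
from typing import Callable, Dict


class TimeoutFailureDetector:
    """Suspects a node when no heartbeat has arrived within a timeout.

    A suspicion is not proof of failure: a slow node or a delayed message
    looks exactly the same as a crashed node from the observer's side.
    """

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        """Record that a message from the node arrived just now."""
        self._last_seen[node] = self._clock()

    def suspected(self, node: str) -> bool:
        """Return True if the node is silent for longer than the timeout."""
        last = self._last_seen.get(node)
        if last is None:
            return True  # never heard from this node
        return self._clock() - last > self._timeout
```

A later heartbeat clears the suspicion, which is why a detector like this cannot tell a crash-restart or a long delay apart from a permanent failure.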

Fault tolerance and availability

Fault tolerance
The ability of a system to continue operating correctly despite component failures.
Redundancy
The use of multiple components to tolerate failures and improve availability.
Availability
The fraction of time a system is usable from a client’s perspective.
Reliability
A measure of how long a system or component operates correctly before it fails.
Series system
A system structure in which failure of any component causes system failure.
Parallel system
A system structure in which the system continues operating as long as at least one of its redundant components remains functional.
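The series and parallel definitions above translate into simple availability arithmetic. This sketch assumes that component failures are independent, which real systems often violate, and that each availability is given as a fraction between 0 and 1. The function names are illustrative.

```python
from math import prod
from typing import Iterable


def series_availability(availabilities: Iterable[float]) -> float:
    """All components must be up, so the availabilities multiply."""
    return prod(availabilities)


def parallel_availability(availabilities: Iterable[float]) -> float:
    """The system is down only if every component is down at once."""
    return 1.0 - prod(1.0 - a for a in availabilities)
```

Two components that are each 99% available yield about 98.01% availability in series and about 99.99% in parallel, which is why redundancy helps and long dependency chains hurt.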

Networking fundamentals

Packet switching
A networking approach in which data is divided into packets that are routed independently through the network.
Layered architecture
A design approach that separates networking functionality into layers with well-defined responsibilities.
OSI model
A conceptual seven-layer model used to describe and reason about network protocol design.
Data link layer
The layer responsible for communication on a single physical network.
Network layer
The layer responsible for routing packets between machines across networks.
Transport layer
The layer responsible for process-to-process communication.

Internet and IP networking

Internet Protocol (IP)
A network-layer protocol that provides connectionless, best-effort delivery of packets between machines.
Datagram
An independent packet of data sent over a network without guarantees of delivery or ordering.
Port
A transport-layer identifier used to deliver data to the correct process on a machine.

Transport protocols and sockets

Transmission Control Protocol (TCP)
A transport protocol that provides reliable, ordered, congestion-controlled byte-stream communication.
User Datagram Protocol (UDP)
A transport protocol that provides connectionless, best-effort datagram delivery with minimal overhead.
Head-of-line blocking
A delay that occurs when data that has already arrived must wait for earlier, missing data so that everything can be delivered in order.
Socket
An operating system abstraction that provides an interface for network communication.
Connection-oriented communication
Communication that involves explicit connection setup and teardown.
Connectionless communication
Communication in which messages are sent independently without establishing a connection.
QUIC
A transport protocol built on UDP, typically implemented in user space, that provides reliable, multiplexed communication.
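The connectionless and connection-oriented definitions above can be contrasted with Python's standard socket module. This is a minimal sketch that sends one small payload over the loopback interface in a single thread. The helper names are hypothetical, and the single-threaded TCP version is suitable only for small payloads that fit in the kernel's socket buffers.

```python
import socket

LOOPBACK = "127.0.0.1"


def udp_roundtrip(payload: bytes) -> bytes:
    """Connectionless: no setup; each sendto() is an independent datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind((LOOPBACK, 0))  # port 0 lets the OS pick a free port
        receiver.settimeout(2.0)      # UDP may drop data, so never wait forever
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(65535)
        return data


def tcp_roundtrip(payload: bytes) -> bytes:
    """Connection-oriented: explicit setup, a byte stream, then teardown."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((LOOPBACK, 0))
        listener.listen(1)
        listener.settimeout(2.0)
        # Connection setup: the kernel completes the handshake before accept().
        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            conn, _addr = listener.accept()
            with conn:
                conn.settimeout(2.0)
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal the end of the stream
                chunks = []
                while True:
                    chunk = conn.recv(4096)  # a stream has no message boundaries
                    if not chunk:
                        break
                    chunks.append(chunk)
                return b"".join(chunks)
```

The UDP receiver gets the payload as one datagram from a single recvfrom() call, while the TCP receiver must loop because a byte stream may arrive in pieces of any size.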

Data placement

Replication
The creation of multiple authoritative copies of data to improve availability and fault tolerance.
Caching
The storage of temporary copies of data to reduce latency and load, potentially serving stale data.
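The caching definition above notes that a cache may serve stale data. The following sketch shows one common way to bound that staleness, a time-to-live (TTL) on each entry. The class name, the fetch callback, and the injectable clock are hypothetical and chosen only for illustration.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Keeps temporary copies of values and refetches them after a TTL."""

    def __init__(self, fetch: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # reads from the authoritative source
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: may be stale if the source changed
        value = self._fetch(key)  # miss or expired: go to the source
        self._entries[key] = (value, now + self._ttl)
        return value
```

Until an entry expires, the cache returns the old copy even if the authoritative value has changed, so the TTL trades freshness against latency and load on the source.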