Distributed Systems Foundations and Communication

Core distributed systems concepts

Distributed system: A collection of independent computers connected by a network that coordinate to accomplish a common goal.
Autonomous computer: A computer that has its own processor, memory, operating system, and clock, and operates independently of others.
Message passing: Explicit communication between processes using network messages rather than shared memory.
Partial failure: A failure mode in which some components fail while others continue operating.
All-or-nothing failure: A failure mode in which the entire system stops functioning when a failure occurs.
Single point of failure: A component whose failure causes the entire system to fail.
Horizontal scaling: Increasing system capacity by adding more machines and distributing work across them.
Vertical scaling: Increasing system capacity by adding resources to a single machine.

Moore’s Law: The historical observation that the number of transistors on an integrated circuit, and with it computing capacity, roughly doubled every 18 to 24 months.
Amdahl’s Law: A principle stating that the speedup from parallelism is limited by the portion of a task that must remain sequential.
Metcalfe’s Law: The idea that the value of a network grows roughly with the square of the number of its participants.
End-to-end principle: A network design principle that places functionality such as reliability and security at the communicating endpoints rather than in the network.
Fate sharing: The principle that communication state should reside at the communicating endpoints, so that the state is lost only when an endpoint itself fails and failures elsewhere in the network do not destroy it.
Best-effort delivery: A network service model in which the network attempts to deliver each packet but does not guarantee delivery, ordering, or delivery within a fixed time.
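
Amdahl’s Law can be made concrete with a small calculation. If a fraction p of a task can be parallelized and n workers are available, the predicted speedup is 1 / ((1 - p) + p / n). The sketch below assumes this standard form; the function names are illustrative and not taken from any library.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's Law for a task whose parallelizable
    share is `parallel_fraction`, run on `workers` processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0 else 1.0 / serial_fraction
```

For a task that is 90% parallelizable, 10 workers give a speedup of about 5.26, and no number of workers can exceed a speedup of 10, because the sequential 10% always remains.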

Fail-stop failure: A failure in which a component halts execution and produces no further output, and the failure can be detected.
Fail-silent failure: A failure in which a component produces no output, but other components cannot reliably distinguish failure from delay.
Crash-restart failure: A failure in which a component crashes and later restarts, possibly with lost or stale state.
Network partition: A failure that divides a system into disconnected groups that cannot communicate.
Byzantine failure: A failure in which a component continues running but does not follow the system specification, producing incorrect or inconsistent behavior.
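
The difference between fail-stop and fail-silent failures shows up as soon as one tries to detect them. A common approach, assumed here purely for illustration, is to suspect a node when no heartbeat message has arrived within a timeout. The class and method names below are hypothetical. The sketch shows why such a detector can only suspect a failure: a node whose heartbeat is merely delayed looks the same as one that has crashed.

```python
class TimeoutDetector:
    """Suspects any node that has been silent for longer than `timeout`."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_heard: dict[str, float] = {}

    def heartbeat(self, node: str, at: float) -> None:
        # Record the time at which a heartbeat from `node` was received.
        self.last_heard[node] = at

    def suspects(self, now: float) -> set[str]:
        # A silent node may have crashed, or its messages may just be slow;
        # the detector cannot tell these cases apart.
        return {
            node
            for node, heard in self.last_heard.items()
            if now - heard > self.timeout
        }


detector = TimeoutDetector(timeout=3.0)
detector.heartbeat("a", at=0.0)
detector.heartbeat("b", at=0.0)
detector.heartbeat("a", at=4.0)
suspected_early = detector.suspects(now=5.0)   # "b" has been silent too long
detector.heartbeat("b", at=6.0)                # "b" was only delayed
suspected_late = detector.suspects(now=7.0)    # the suspicion is withdrawn
```

Here node "b" is suspected at time 5.0 and cleared at time 7.0, so the earlier suspicion was a false positive caused by delay rather than failure.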

Fault tolerance: The ability of a system to continue operating correctly despite component failures.
Redundancy: The use of multiple components to tolerate failures and improve availability.
Availability: The fraction of time a system is usable from a client’s perspective.
Reliability: A measure of how long a system or component operates correctly before failing, often expressed as mean time to failure.
Series system: A system structure in which failure of any component causes system failure.
Parallel system: A system structure in which the system continues operating as long as some components remain functional.
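
The series and parallel structures above lead to simple availability arithmetic. The sketch below assumes that component failures are independent, which real systems often violate (shared power, shared software bugs), so treat the results as optimistic estimates. The function names are illustrative.

```python
from math import prod
from typing import Iterable


def series_availability(parts: Iterable[float]) -> float:
    """All components must work, so the availabilities multiply."""
    return prod(parts)


def parallel_availability(parts: Iterable[float]) -> float:
    """The system fails only if every component fails at the same time."""
    return 1.0 - prod(1.0 - a for a in parts)


def downtime_hours_per_year(availability: float) -> float:
    """Expected unavailable hours in a 365-day year."""
    return (1.0 - availability) * 365 * 24
```

Two components that are each 99% available yield about 98.01% availability in series and about 99.99% in parallel. An availability of 99.9% corresponds to roughly 8.76 hours of downtime per year.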

Packet switching: A networking approach in which data is divided into packets that are routed independently through the network.
Layered architecture: A design approach that separates networking functionality into layers with well-defined responsibilities.
OSI model: A conceptual seven-layer model used to describe and reason about network protocol design.
Data link layer: The layer responsible for communication on a single physical network.
Network layer: The layer responsible for routing packets between machines across networks.
Transport layer: The layer responsible for process-to-process communication.

Internet Protocol (IP): A network-layer protocol that provides connectionless, best-effort delivery of packets between machines.
Datagram: An independent packet of data sent over a network without guarantees of delivery or ordering.
Port: A transport-layer identifier used to deliver data to the correct process on a machine.

Transmission Control Protocol (TCP): A transport protocol that provides reliable, ordered, congestion-controlled byte-stream communication.
User Datagram Protocol (UDP): A transport protocol that provides connectionless, best-effort datagram delivery with minimal overhead.
Head-of-line blocking: A delay that occurs when data that has already arrived cannot be delivered because earlier data is missing or late and delivery must be in order.
Socket: An operating system abstraction that provides an interface for network communication.
Connection-oriented communication: Communication that involves explicit connection setup and teardown.
Connectionless communication: Communication in which messages are sent independently without establishing a connection.
QUIC: A transport protocol built on UDP that provides reliable, multiplexed communication and is typically implemented in user space.
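
The contrast between connectionless and connection-oriented communication is visible in the socket API. The sketch below uses Python's standard socket module over the loopback interface; the helper names are hypothetical, and it is meant for small payloads only. The UDP helper sends one datagram with no setup, while the TCP helper must listen, connect, and accept before any data moves.

```python
import socket

LOOPBACK = "127.0.0.1"


def udp_round_trip(payload: bytes) -> bytes:
    """Connectionless: a single datagram, no setup or teardown."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind((LOOPBACK, 0))   # port 0 lets the OS pick a free port
        receiver.settimeout(2.0)       # UDP may drop data; do not wait forever
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(4096)
        return data


def tcp_round_trip(payload: bytes) -> bytes:
    """Connection-oriented: explicit setup, a byte stream, then teardown."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((LOOPBACK, 0))
        listener.listen(1)
        # The kernel completes the handshake and queues the connection,
        # so connecting before accept() works within a single thread.
        with socket.create_connection(listener.getsockname(), timeout=2.0) as client:
            conn, _addr = listener.accept()
            with conn:
                conn.settimeout(2.0)
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)   # signal end of the stream
                chunks = []
                while True:
                    chunk = conn.recv(4096)       # TCP has no message boundaries
                    if not chunk:
                        break
                    chunks.append(chunk)
                return b"".join(chunks)
```

In both helpers the port number returned by getsockname() is what directs the data to the right socket on the machine. On loopback the UDP datagram normally arrives, but nothing in the protocol guarantees it, which is why the receiver sets a timeout.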

Replication: The creation of multiple authoritative copies of data to improve availability and fault tolerance.
Caching: The storage of temporary copies of data to reduce latency and load, potentially serving stale data.
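
The staleness mentioned in the caching definition can be shown with a minimal time-to-live (TTL) cache. This is one possible expiry policy, chosen here as an assumption for illustration; the class name, the fetch callback, and the injectable clock are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Keeps a temporary copy of each fetched value for `ttl_seconds`."""

    def __init__(
        self,
        fetch: Callable[[Hashable], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # reads from the authoritative source
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so behavior can be tested
        self._entries: Dict[Hashable, Tuple[Any, float]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            # Cache hit: fast, but possibly stale if the source has changed.
            return entry[0]
        value = self._fetch(key)
        self._entries[key] = (value, now + self._ttl)
        return value


origin = {"k": "v1"}                 # stands in for the authoritative copy
fake_now = [0.0]
cache = TTLCache(lambda key: origin[key], ttl_seconds=10,
                 clock=lambda: fake_now[0])

first = cache.get("k")               # miss: fetched from the origin
origin["k"] = "v2"                   # the origin changes
fake_now[0] = 5.0
stale = cache.get("k")               # hit: still the old value
fake_now[0] = 10.0
fresh = cache.get("k")               # entry expired: fetched again
```

Between the update and the expiry the cache serves "v1" even though the origin holds "v2". A shorter TTL narrows that window at the cost of more load on the origin.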