CAP theorem explained

The CAP theorem is one of the most cited and most misunderstood concepts in distributed systems. Most explanations generate confusion rather than clarity. This guide explains what it actually means, why partition tolerance is not optional, and — most usefully — how to apply CAP reasoning concretely in a system design interview.

What the CAP theorem actually states

Eric Brewer proposed the CAP conjecture in 2000, and Gilbert and Lynch formally proved it in 2002. The theorem states: a distributed data system cannot simultaneously guarantee all three of the following properties.

Consistency (C). Every read returns either the most recent write or an error. There is no scenario in which a client reads stale data from a node that has not yet received a recent write. This is linearisability — the system behaves as if all operations are executed on a single, serialised log, regardless of how many nodes actually participate.
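
To make the definition concrete, here is a minimal sketch, in plain Python, of the rule a linearisable system must obey: once a write has completed, no read that starts afterwards may return the older value. The two-operation history and helper function are illustrative only, assuming x starts at 0.

```python
# A linearisability check for one specific shape of history:
# a completed write of x = 1, followed by a read that starts later.
# Each entry: (operation, key, value, start_time, end_time).
history = [
    ("write", "x", 1, 10, 12),  # write x = 1, finished at t = 12
    ("read",  "x", 0, 15, 16),  # read starting at t = 15 returns 0
]

def violates_linearisability(h):
    (_, _, written, _, write_end), (_, _, seen, read_start, _) = h
    # If the read began after the write completed, it must see the
    # written value; anything older is a stale read.
    return read_start > write_end and seen != written

print(violates_linearisability(history))  # True: this history is not linearisable
```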

Availability (A). Every request to a non-failing node receives a response — not an error, not a timeout. The system remains operational and responsive. Availability in the CAP sense is a strong property: it says every functioning node must respond to every request, not just "most requests most of the time."

Partition Tolerance (P). The system continues to operate even when some messages between nodes are dropped or delayed due to a network partition. A partition is a communication failure between two subsets of nodes: Node A cannot reach Node B, even though both are functioning correctly individually.

The theorem says you can have at most two of these three at the same time. This is often illustrated as a triangle where you pick two vertices — CA, CP, or AP — and sacrifice the third.

The CA option is misleading in practice. You cannot give up partition tolerance in a distributed system, because network partitions happen regardless of whether you designed for them. A system with no partition tolerance is a single-node system. Any multi-node system will eventually experience a partition. The real choice is between C and A — but only during a partition event. When the network is healthy, you can have both C and A simultaneously. CAP only constrains your behaviour during the failure scenario.

Why partition tolerance is not a choice

This is the most important clarification in understanding CAP, and it is the point most explanations get wrong. The framing "pick two of three" implies that partition tolerance is optional. It is not.

Consider a system with two database nodes, Node A and Node B, replicating data between them. A network partition occurs: Node A and Node B can no longer communicate. A write comes in to Node A. Node A has two options. Option 1: accept the write, update its local state, return success to the client. The system is available. But if Node B later accepts a read for the same key, it returns the old value — inconsistency. Option 2: refuse the write, return an error to the client. The system is consistent (no stale reads possible) but unavailable.

There is no Option 3 that avoids the choice. You cannot update both nodes if they cannot communicate. You must choose: accept a write with risk of inconsistency (AP), or refuse the write to preserve consistency (CP). The partition forces the choice. You cannot opt out of partition tolerance; you can only decide in advance what you will sacrifice when one inevitably occurs.
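
A toy sketch makes the two options concrete. The code below is not a real database; the Node class, mode flag, and messages are all illustrative. It shows the only two behaviours available to a node that cannot reach its peer.

```python
# A toy node facing a partition: accept the write (AP) or refuse it (CP).

class Node:
    def __init__(self, name, mode):
        self.name = name
        self.mode = mode            # "CP" or "AP"
        self.data = {}              # local key-value state
        self.peer_reachable = True

    def write(self, key, value):
        if self.peer_reachable:
            # Healthy network: replicate to the peer (omitted here),
            # then acknowledge. Both C and A hold.
            self.data[key] = value
            return "ok"
        if self.mode == "CP":
            # Option 2: refuse the write. Consistent, but unavailable.
            return "error: peer unreachable, write rejected"
        # Option 1: accept locally. Available, but the peer now serves
        # stale data until the partition heals and the nodes reconcile.
        self.data[key] = value
        return "ok (peer may serve stale reads)"

node_a = Node("A", mode="AP")
node_a.peer_reachable = False                   # the partition begins
print(node_a.write("user:42", "new@mail.com"))  # ok (peer may serve stale reads)

node_a.mode = "CP"
print(node_a.write("user:42", "new@mail.com"))  # error: write rejected
```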

The practical implication: when choosing a database, you are not choosing between three theoretical properties. You are choosing how your system behaves during the network failures that will definitely happen. Frame the conversation that way in interviews.

CP systems: consistency during partition

A CP system chooses to reject requests rather than serve potentially inconsistent data during a partition. It sacrifices availability to guarantee that every successful response reflects the current state of the system.

ZooKeeper. Used for distributed coordination, leader election, and configuration management. When a network partition creates a minority partition (nodes that cannot reach a quorum), those nodes stop serving reads and writes entirely. They return errors until connectivity is restored and they can rejoin the quorum. Applications that rely on ZooKeeper for lock management or leader election will stall during the partition — by design. The correctness guarantee (only one leader, consistent config) is more valuable than availability for these use cases.
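
The quorum arithmetic behind this behaviour is simple enough to sketch. The helper names below are illustrative; the point is that serving requires a strict majority of the full ensemble, and no partition can produce two majorities, so split-brain is impossible.

```python
# Majority-quorum arithmetic behind ZooKeeper-style CP behaviour.

def majority(ensemble_size: int) -> int:
    # Strict majority of the full ensemble, not of the reachable nodes.
    return ensemble_size // 2 + 1

def can_serve(reachable: int, ensemble_size: int) -> bool:
    return reachable >= majority(ensemble_size)

# A 5-node ensemble split 3 / 2 by a partition:
print(can_serve(3, 5))  # True:  the majority side keeps serving
print(can_serve(2, 5))  # False: the minority side returns errors
```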

etcd. The configuration store for Kubernetes. Same quorum-based approach as ZooKeeper — minority partitions become unavailable. This is the correct choice for a system that controls cluster state: a split-brain in the Kubernetes control plane would be worse than brief control plane unavailability.

HBase. Built on top of HDFS, with a master that assigns regions to region servers. Each region (a contiguous range of rows) is served by exactly one region server at a time, so every read and write for a row goes through a single authority and is strictly consistent. When a partition cuts a region server off, its regions stop accepting reads and writes until the master reassigns them. Consistent, but not available during the partition. The right choice for big-data workloads where correctness of large sequential scans matters more than write availability during failures.

When to choose CP. When the cost of a stale read is high — financial transactions, distributed locks, configuration management, leader election, inventory reservation. Any use case where two clients simultaneously reading "the answer" and getting different values causes a correctness problem downstream.

AP systems: availability during partition

An AP system chooses to keep serving requests during a partition, accepting that different nodes may temporarily return different values for the same key. It sacrifices consistency to guarantee that clients always get a response.

Cassandra. Uses a quorum model with tunable consistency. Writes are acknowledged once W replicas confirm, and reads are served once R replicas respond; each of W and R can be one, a quorum, or all. If W + R exceeds the replication factor N, every read set overlaps every write set, so a read always touches at least one replica holding the latest write. If W + R ≤ N, a read can land entirely on replicas that missed the most recent write, creating a window of inconsistency. Cassandra's eventual consistency means all replicas converge to the same value, but reads during the convergence window may return stale data. AP systems like Cassandra are used when the cost of a stale read is low: a social media post appearing with a slightly wrong like count, a product recommendation being two seconds stale, a user profile taking a moment to update across regions.
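
The overlap rule is worth sketching, because the whole mechanism is two lines of arithmetic. The function name is illustrative:

```python
# Quorum overlap for tunable consistency.
# n = replication factor, w = write acks required, r = read responses required.

def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    # If r + w > n, every set of r read replicas must intersect every
    # set of w write replicas, so at least one queried replica holds
    # the most recent write.
    return r + w > n

print(read_sees_latest_write(n=3, w=2, r=2))  # True:  QUORUM writes + QUORUM reads
print(read_sees_latest_write(n=3, w=1, r=1))  # False: ONE/ONE, stale reads possible
```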

DynamoDB (default configuration). Eventually consistent reads by default. You can opt into strongly consistent reads (paying higher latency) for operations where staleness is unacceptable. DynamoDB gives you the AP default with a per-request CP escape hatch — an elegant design that aligns with the PACELC model.
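
A sketch of that escape hatch with boto3; the table name and key are hypothetical, while ConsistentRead is the real per-request parameter:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Default: eventually consistent read. Cheaper and faster, but may
# return data that predates a very recent write.
casual_read = dynamodb.get_item(
    TableName="Orders",                 # hypothetical table
    Key={"order_id": {"S": "o-1001"}},  # hypothetical key
)

# Per-request escape hatch: a strongly consistent read, at higher
# read-capacity cost, for operations where staleness is unacceptable.
checkout_read = dynamodb.get_item(
    TableName="Orders",
    Key={"order_id": {"S": "o-1001"}},
    ConsistentRead=True,
)
```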

DNS. Globally distributed, AP by definition. DNS propagation takes minutes to hours. During that window, different clients resolve the same domain to different IP addresses. This is acceptable because the cost of a stale DNS resolution is a misdirected request, not data corruption.

When to choose AP. When the cost of a stale read is low — social feeds, product catalogues, user preferences, recommendation systems, analytics dashboards. Any use case where brief inconsistency is a minor UX issue, not a correctness problem.

Run it in the simulator

SysSimulator's chaos engineering panel lets you inject network partitions directly into your running architecture and observe CP vs AP behaviour in real time.

Load any blueprint with a replicated database (the Distributed Cache or E-Commerce blueprints work well). Inject a network partition — splitting the database cluster into two groups that cannot communicate. Watch how the system responds: if configured with strong consistency (CP-mode), write requests to the minority partition return errors and read requests stall. If configured with eventual consistency (AP-mode), writes continue to be accepted on both sides of the partition, and reads from each side return different values for the same key.

The metrics dashboard shows the concrete cost of each choice: in CP mode, the error rate spikes during the partition while stale reads stay at zero; in AP mode, errors stay at zero while the read inconsistency count climbs as the two sides diverge. When the partition heals, you watch eventual consistency do its work: the two sides reconcile and the inconsistency count returns to zero.

This is the difference between understanding CAP theoretically and having seen its consequences in a running system. The simulation numbers are what you reference in the interview.

Inject a partition in SysSimulator →

Applying CAP in an interview

The most common CAP-related interview mistake: naming a database without justifying the choice through the product semantics. "I'll use Cassandra because it's AP" is not a sufficient answer. "I'll use Cassandra because the primary read pattern is a social feed where a 1–2 second staleness window is acceptable, and I need the write availability during network partitions to ensure posts are never lost — the cost of a stale read is a slightly out-of-date feed, which users tolerate" is a strong answer.

For every data store in your design, ask and answer: what is the cost of a stale read from this store? If the cost is high (money, correctness, security), choose CP. If the cost is low (UX inconvenience, approximate analytics), choose AP. State the tradeoff explicitly. The interviewer is not checking that you know what CP and AP stand for — they are checking that you can map abstract consistency properties to concrete product consequences.

The PACELC model is worth mentioning in senior interviews: even without a partition, there is a latency-consistency tradeoff. Replicating synchronously before acknowledging a write is consistent but slower. Replicating asynchronously is faster but introduces a window of inconsistency during normal operation. This is the everyday tradeoff, not just the partition scenario.
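
A sketch of that everyday tradeoff, with sleeps standing in for replica round trips; the 5 ms figure is an assumption, not a measurement:

```python
import time

REPLICA_RTT = 0.005  # assumed 5 ms round trip per replica

def write_sync(replicas: int) -> float:
    # Wait for every replica before acknowledging: consistent, slower.
    start = time.monotonic()
    time.sleep(replicas * REPLICA_RTT)  # stand-in for replication round trips
    return time.monotonic() - start

def write_async(replicas: int) -> float:
    # Acknowledge immediately; replication would continue in the
    # background (omitted here), leaving a window of stale replicas.
    start = time.monotonic()
    return time.monotonic() - start

print(f"sync  ack latency: {write_sync(3) * 1000:.1f} ms")   # ~15 ms
print(f"async ack latency: {write_async(3) * 1000:.3f} ms")  # ~0 ms
```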

The question behind the question

"Can you give up partition tolerance?" No. Partition tolerance is not a design choice in a distributed system — it is an operational reality. The actual choice is what the system does when a partition occurs. Demonstrate that you understand this distinction.

"What's the difference between consistency in CAP and consistency in ACID?" They use the same word to mean different things. ACID consistency means the database is in a valid state before and after a transaction (constraints are enforced). CAP consistency means linearisability — every read reflects the most recent write. These are entirely different properties. ACID consistency is about data validity; CAP consistency is about distributed visibility timing.

"Is Cassandra always AP?" No. Cassandra's consistency is tunable per operation. With CONSISTENCY ALL, Cassandra requires all replicas to respond — effectively CP behaviour at the cost of high latency. With CONSISTENCY ONE, it is strongly AP. The default is eventual consistency (AP). The right answer is: Cassandra provides tunable consistency, defaulting to AP, with per-operation consistency levels allowing CP semantics when needed.

"What is the difference between CAP and PACELC?" CAP describes the partition scenario only. PACELC adds that even without a partition, there is a latency-consistency tradeoff. A system that acknowledges writes synchronously across all replicas is more consistent but slower than one that acknowledges locally and replicates asynchronously. PACELC gives a more complete picture of real-world database behaviour.

Frequently asked questions

What does the CAP theorem actually mean?
During a network partition, a distributed system must choose between consistency (refusing requests that might return stale data) and availability (serving requests even if data may be stale). Partition tolerance is not optional — partitions happen. The theorem constrains your behaviour during that partition event.

Is partition tolerance optional?
No. Any multi-node distributed system will eventually experience a network partition. Designing without partition tolerance means designing a single-node system. The CAP choice is between C and A during partitions, not between P and something else.

What are examples of CP and AP systems?
CP: ZooKeeper, etcd, HBase, Consul — stop accepting requests on minority partitions to preserve consistency. AP: Cassandra (default), DynamoDB (default), CouchDB, Riak — keep serving requests during partitions, accepting temporary inconsistency.

How do you choose between CP and AP in an interview?
Map to product semantics: what is the cost of a stale read from this store? High cost (financial, locks, config) → CP. Low cost (feeds, recommendations, analytics) → AP. State the tradeoff explicitly in terms of the product consequence, not just the database name.

What is the PACELC theorem?
An extension of CAP. When there is no partition (the normal case), there is still a latency vs consistency tradeoff: synchronous replication is consistent but slower; async replication is faster but stale. PACELC is a more complete model for everyday database selection, not just failure scenarios.

Inject a partition in SysSimulator →
Chaos engineering scenarios

Next in the series: Consistent hashing explained →