Caching is the most leveraged performance technique in distributed systems — and the most misapplied. The wrong pattern for the wrong use case creates subtle consistency bugs, cache stampede failures, and hot node problems that are hard to diagnose under production load. This guide covers every major caching pattern, when to use each, and the failure modes that interviewers probe.
In almost every system design interview question, caching appears at some point. The interviewer expects you to make a caching decision and justify it. Candidates who say "add a Redis cache" and move on are giving the minimum viable answer. Candidates who specify the cache pattern (cache-aside? write-through?), the eviction policy (LRU? LFU?), the TTL reasoning, the stampede prevention strategy, and the hot key handling are demonstrating the depth that earns senior and staff-level signals.
Caching decisions are also a proxy for understanding the read/write ratio of your system, the acceptable staleness window for your data, the consistency requirements of your use case, and the operational failure modes your team must be prepared to handle. Interviewers who ask "how do you cache this?" are really asking all of those questions simultaneously.
The patterns in this guide appear across every major system design topic: Twitter's timeline (cache-aside, fan-out pre-population), payments (explicitly no caching for financial data), rate limiters (Redis counter with TTL), YouTube (CDN as distributed cache for video segments), URL shorteners (cache-aside for redirect path). Knowing these patterns by name and tradeoff lets you apply them precisely in any context.
Cache-aside is the default pattern for most read-heavy systems. The application is responsible for cache population: reads check the cache first, miss on a cold key, read from the database, write to the cache, and return the result. Subsequent reads hit the cache until the key expires or is evicted.
Read path: GET from cache → if hit, return. If miss: GET from database → SET in cache with TTL → return result.
Write path: Write to database → delete (invalidate) the cache key. On the next read, the key is fetched fresh from the database and re-cached. Do not update the cache on write — updating creates a race condition between the write and any concurrent read that might populate the cache with stale data simultaneously.
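A minimal sketch of both paths, assuming a redis-py client and a hypothetical `db` object with `fetch_user` / `update_user` methods; the key naming and the 300-second TTL are illustrative:

```python
import json
import redis

r = redis.Redis(decode_responses=True)
TTL_SECONDS = 300  # acceptable staleness window for this data

def get_user(user_id, db):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:                        # cache hit
        return json.loads(cached)
    user = db.fetch_user(user_id)                 # cache miss: read from the database
    r.set(key, json.dumps(user), ex=TTL_SECONDS)  # populate the cache for later reads
    return user

def update_user(user_id, fields, db):
    db.update_user(user_id, fields)   # write to the database first
    r.delete(f"user:{user_id}")       # then invalidate; the next read re-populates
```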
When to use: Read-heavy workloads where the data changes infrequently relative to how often it is read. User profiles, product catalogue pages, article content, configuration data. Accepts a staleness window equal to the key TTL.
Failure mode: Cold start — when the cache is empty (after a restart or full eviction), every request is a cache miss until the cache warms up. This can cause a thundering herd on the database during recovery. Mitigation: cache warming (pre-populating hot keys before cutting over), or gradual traffic ramp after cache restart.
Numbers to commit: Redis GET takes ~0.2ms. A database read takes 5–15ms. At 10,000 RPS with 95% hit rate: 9,500 Redis reads + 500 database reads per second. Without cache: 10,000 database reads per second — 20× the database load for the same traffic. This is the leverage that makes caching worth its operational complexity.
Write-through keeps the cache and database synchronised by writing to both on every update. When a write comes in: write to cache AND write to database in the same operation (or in sequence), returning success only when both writes complete.
Write path: SET in cache → INSERT/UPDATE in database → return success. (Or database first, then cache — order matters for consistency semantics.)
Read path: GET from cache → usually a hit, because every write also lands in the cache. A miss means the key was either never written through or has since been evicted, not that a recent write is missing.
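A sketch of write-through for session data, assuming a redis-py client and a hypothetical `db.upsert_session`; the database is written first here so that a cache failure leaves the key missing rather than wrong:

```python
import json
import redis

r = redis.Redis(decode_responses=True)

def put_session(session_id, data, db):
    key = f"session:{session_id}"
    db.upsert_session(session_id, data)   # durable write first
    r.set(key, json.dumps(data))          # then the cache; success only after both complete
    return True

def get_session(session_id):
    cached = r.get(f"session:{session_id}")
    # A miss means the session was never written through (or has been evicted).
    return json.loads(cached) if cached is not None else None
```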
When to use: Write patterns where reads must see recent writes immediately. Session data (users should see their own writes instantly), shopping cart contents (the cart must be consistent between devices), user preference updates. Also useful when cache miss cost is very high — write-through ensures the cache is always warm for data that has been written.
Downside: Write latency is doubled — every write pays two storage latencies. For systems where write latency matters (high-frequency writes), this is prohibitive. Write-through is most appropriate when write volume is low relative to read volume, which is the common case for user data.
Important distinction from cache-aside writes: Cache-aside invalidates (deletes) the cache key on write, forcing a re-fetch on next read. Write-through updates the cache key on write. Write-through ensures the next read is always a cache hit; cache-aside with invalidation ensures the next read is always fresh. Both prevent stale reads but via different mechanisms.
Write-behind decouples the write acknowledgement from the database write. The application writes to the cache, receives an immediate success response, and a background process asynchronously flushes dirty cache entries to the database.
Write path: SET in cache (mark as dirty) → return success immediately. Background worker: read dirty keys → write to database → mark as clean.
Read path: GET from cache → always a hit for recently written data (since writes land in cache first). May miss for data that was never written through or has been evicted before being read.
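A sketch of write-behind for view counters, assuming a redis-py client and a hypothetical `db.upsert_view_count`; the dirty-set name and flush interval are illustrative:

```python
import time
import redis

r = redis.Redis(decode_responses=True)

def record_view(video_id):
    # The write lands only in the cache and is acknowledged immediately.
    r.incr(f"views:{video_id}")
    r.sadd("views:dirty", video_id)   # remember which counters still need flushing

def flush_worker(db, interval=5):
    # Background loop: drain dirty counters into the database as fewer,
    # larger writes instead of one database write per view.
    while True:
        for vid in r.spop("views:dirty", 1000) or []:
            count = int(r.get(f"views:{vid}") or 0)
            db.upsert_view_count(vid, count)
        time.sleep(interval)
```

Anything still sitting in the dirty set when the cache node dies is lost, which is exactly why the pattern is restricted to loss-tolerant data.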
When to use: Write-intensive workloads where database write throughput is the bottleneck. Counters (view counts, like counts, real-time metrics), session state with frequent small updates, gaming leaderboards. The aggregate of many small writes is flushed as fewer larger database writes, reducing write amplification.
Critical failure mode: If the cache node crashes before dirty entries are flushed, those writes are lost. Write-behind is inappropriate for any data where loss is unacceptable: payment records, user account data, order history. It is appropriate for data where approximate or slightly stale values are acceptable: view counts, analytics counters, temporary state. Always ask: "Can we lose the last N seconds of writes to this data?" If no, do not use write-behind.
Read-through is similar to cache-aside but the cache itself (rather than the application) is responsible for loading missing data from the database. On a cache miss, the cache calls the database, populates itself, and returns the result. The application never speaks to the database directly.
When to use: When the database access logic is complex and you want it centralised in the cache layer rather than duplicated across application servers. Some managed cache services (e.g., DynamoDB Accelerator) and caching libraries with loader hooks (e.g., Ehcache) implement read-through natively.
In practice: Most applications use cache-aside rather than read-through because it gives the application more control over what gets cached and how. Read-through is more common in cache frameworks than in hand-rolled Redis usage.
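If you do want read-through semantics in application code, the loader can be centralised in a thin wrapper; this is a sketch assuming a redis-py client and a caller-supplied `loader` function that reads the database and returns a string value:

```python
import redis

class ReadThroughCache:
    """Callers only talk to the cache; the database loader lives here, not at call sites."""

    def __init__(self, client, loader, ttl=300):
        self.client = client   # e.g. redis.Redis(decode_responses=True)
        self.loader = loader   # function key -> value (string), reads the database
        self.ttl = ttl

    def get(self, key):
        value = self.client.get(key)
        if value is not None:
            return value
        value = self.loader(key)                   # the cache loads the data itself on a miss
        self.client.set(key, value, ex=self.ttl)
        return value

# usage sketch: articles = ReadThroughCache(redis.Redis(decode_responses=True), load_article_from_db)
```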
When a cache reaches its memory limit, it must evict keys to make room. The eviction policy determines which keys are removed. Redis supports multiple eviction policies; the two most important for interviews are LRU and LFU.
LRU (Least Recently Used). Evicts the key that was accessed least recently. Assumes temporal locality: if a key hasn't been accessed in a while, it's unlikely to be accessed soon. Works well for workloads where recently accessed data is likely to be accessed again, which covers most web application read patterns. In Redis this is the allkeys-lru policy; note that Redis's out-of-the-box maxmemory-policy is noeviction, so an eviction policy has to be configured explicitly.
LFU (Least Frequently Used). Evicts the key that has been accessed fewest times. Assumes frequency locality: a key accessed 10,000 times in the past is likely to be accessed again, even if the last access was an hour ago. Works better for workloads with stable popular items that may not be accessed recently but have high lifetime access frequency. Redis added LFU support in version 4.0 (allkeys-lfu). Use LFU when you have long-lived popular items (product pages for best-selling products, static configuration loaded infrequently but read constantly).
The practical difference. LRU can evict a popular product page that hasn't been accessed in 2 hours because it was idle during off-peak hours — even though it will be accessed 10,000 times tomorrow morning. LFU would retain it because its frequency score is high. For most e-commerce caches, LFU produces better hit rates for popular items. For caches that mirror recent activity (news feeds, trending topics), LRU is more appropriate.
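Switching policies is a one-line configuration change; a sketch via the redis-py client (in production this usually lives in redis.conf, and the memory limit shown is illustrative):

```python
import redis

r = redis.Redis(decode_responses=True)

# Redis evicts nothing until a memory limit and an eviction policy are set.
r.config_set("maxmemory", "2gb")
r.config_set("maxmemory-policy", "allkeys-lfu")   # or "allkeys-lru" for recency-driven workloads
print(r.config_get("maxmemory-policy"))
```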
A cache stampede is one of the most common production failure modes and one of the most important patterns to name explicitly in interviews. It occurs when a popular cache key expires and many concurrent requests simultaneously experience a cache miss, all rushing to query the database and repopulate the cache.
The problem: with 10,000 RPS and a 95% cache hit rate, the database handles 500 RPS of cache misses. If the popular key expires, all 10,000 RPS suddenly miss the cache and hit the database simultaneously — a 20× database load spike in the space of one request duration.
Prevention strategy 1: Probabilistic early expiration (PER). Re-cache a key slightly before its TTL expires, with the probability of an early refresh rising as expiry approaches. In the standard per-request formulation (XFetch), a request that reads the key recomputes it early when now - Δ·β·ln(rand()) ≥ expiry, where Δ is how long the recomputation takes and β ≥ 1 controls how eager the early refresh is. Because the decision is randomised per request, rehydration is spread across the final stretch of the TTL window instead of every request missing at the same instant. This is the most elegant solution and the one to describe in interviews.
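A sketch of the per-request version, assuming a redis-py client and a hypothetical `recompute` loader; β and the TTL are illustrative:

```python
import json
import math
import random
import time
import redis

r = redis.Redis(decode_responses=True)
BETA = 1.0   # values > 1 make early refresh more aggressive

def get_with_early_expiry(key, recompute, ttl=300):
    raw = r.get(key)
    if raw is not None:
        entry = json.loads(raw)
        # Serve the cached value unless the randomised early-expiry check fires.
        # The check fires more often as real expiry approaches, and earlier for
        # values that are slow to recompute (large delta).
        if time.time() - entry["delta"] * BETA * math.log(random.random()) < entry["expiry"]:
            return entry["value"]
    start = time.time()
    value = recompute(key)                       # hypothetical loader that hits the database
    delta = time.time() - start
    entry = {"value": value, "delta": delta, "expiry": time.time() + ttl}
    r.set(key, json.dumps(entry), ex=ttl)
    return value
```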
Prevention strategy 2: Mutex lock. When a cache miss occurs, the first request acquires a distributed lock (Redis SETNX), queries the database, and populates the cache. Other requests that miss the cache concurrently wait for the lock to release, then read the freshly populated cache value. Simple to implement; creates a brief head-of-line blocking delay for the lock waiters. Acceptable for single-key stampedes; problematic for stampedes across thousands of keys simultaneously.
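A sketch of the mutex approach using Redis SET with NX and EX, assuming a redis-py client and a hypothetical `load_from_db` function; the lock TTL and poll interval are illustrative, and a production version would release the lock atomically with a Lua script:

```python
import json
import time
import uuid
import redis

r = redis.Redis(decode_responses=True)

def get_with_lock(key, load_from_db, ttl=300, lock_ttl=5):
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    lock_key, token = f"lock:{key}", str(uuid.uuid4())
    # Only one request wins the lock; it expires on its own so a crashed
    # holder cannot block every other request forever.
    if r.set(lock_key, token, nx=True, ex=lock_ttl):
        try:
            fresh = load_from_db(key)
            r.set(key, json.dumps(fresh), ex=ttl)
            return fresh
        finally:
            if r.get(lock_key) == token:   # release only our own lock
                r.delete(lock_key)
    # Lock waiters: poll briefly until the winner repopulates the cache.
    for _ in range(50):
        time.sleep(0.05)
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
    return load_from_db(key)   # give up waiting and fall through to the database
```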
Prevention strategy 3: Jittered TTLs. Instead of setting all keys with a fixed TTL (e.g., 3600 seconds), add random jitter: TTL = 3600 + random(0, 600). Keys set at the same time will expire across a 10-minute window rather than simultaneously. Simple to implement; significantly reduces stampede risk for keys populated in bulk (e.g., at cache warm-up or after a full invalidation).
Consistent hashing distributes keys uniformly across cache nodes — but some keys are accessed far more frequently than others. A product page for a viral item might receive 100,000 requests per second while the average key receives 10. That single key, and therefore that single cache node, becomes the bottleneck regardless of how many total nodes you have.
Detection. Redis's redis-cli --hotkeys command identifies hot keys by sampling LFU access frequency (it requires an LFU maxmemory-policy to be configured). In production, monitoring per-key access rates via a sampling layer or your cache client library reveals hot keys before they become bottlenecks.
Local in-process cache. The most effective solution for the hottest keys: store them in application server memory (a small bounded LRU map, e.g., 1,000 entries). Zero network hop — the lookup is a hash map access in the same process. The trade-off: each application server has its own copy, so writes must invalidate across all servers (broadcast invalidation via pub/sub or short TTLs). For read-dominated hot keys that change rarely, this is the right answer. Example: a configuration value read on every request but updated once a day.
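A sketch of the local tier using the third-party cachetools library in front of Redis; the size and the 30-second TTL are illustrative, and the short TTL is what bounds staleness if an invalidation broadcast is missed:

```python
import cachetools
import redis

r = redis.Redis(decode_responses=True)

# Small bounded cache inside the application process: ~1,000 entries,
# short TTL so stale copies age out even without explicit invalidation.
local = cachetools.TTLCache(maxsize=1000, ttl=30)

def get_config(key):
    if key in local:
        return local[key]    # zero network hops: a hash map lookup in-process
    value = r.get(key)       # fall back to the shared cache tier
    if value is not None:
        local[key] = value
    return value
```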
Key replication. Store the hot key on multiple cache nodes with a numeric suffix: hot_product:1, hot_product:2, ..., hot_product:N. Reads are distributed round-robin or randomly across the replicas. Writes update all N replicas. For a key receiving 100,000 RPS distributed across 10 replicas, each replica sees 10,000 RPS — back in the normal range. N should be proportional to the hot key's request rate divided by the per-node capacity.
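A sketch of suffix-based replication, assuming a redis-py client pointed at a cluster where each suffixed key hashes to a different node; the replica count is illustrative:

```python
import json
import random
import redis

r = redis.Redis(decode_responses=True)
REPLICAS = 10   # roughly hot-key RPS divided by per-node capacity

def get_hot(key):
    # Reads pick one replica at random, spreading load across nodes.
    value = r.get(f"{key}:{random.randint(1, REPLICAS)}")
    return json.loads(value) if value is not None else None

def set_hot(key, value, ttl=300):
    # Writes must update every replica to keep them in step.
    payload = json.dumps(value)
    for suffix in range(1, REPLICAS + 1):
        r.set(f"{key}:{suffix}", payload, ex=ttl)
```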
Load the Distributed Cache blueprint in SysSimulator to observe all of these patterns under real simulated load.
Start with cache-aside at 20,000 RPS. Open the Metrics view and note: cache hit rate, database read RPS, p99 latency for cache hits vs misses. This is your baseline. Then change the eviction policy from LRU to LFU and observe whether hit rate improves for the configured access pattern.
Inject a cache stampede via the Chaos panel. Watch: hit rate drops to zero, database load spikes 20×, connection pool exhaustion, p99 latency collapse. Record the recovery time with and without probabilistic early expiration enabled — the difference is the value of the mitigation in concrete seconds.
Then configure a hot key scenario and inject a hot node failure — the specific node holding the hot key goes down. With consistent hashing and no hot key replication, all hot key traffic falls through to the database. With local in-process cache enabled on app servers, the hot key continues to be served from memory despite the cache node failure. This demonstrates the value of the local cache layer for truly hot data.
Open Distributed Cache blueprint →
Use this framework in interviews when you reach the caching decision point:
Step 1: What is the read/write ratio? Mostly reads → caching adds high value. Mostly writes → caching adds less value and write-through or write-behind overhead may dominate.
Step 2: What is the acceptable staleness window? No staleness (financial data, sessions) → write-through or no cache. Seconds of staleness acceptable → cache-aside with short TTL. Minutes acceptable → cache-aside with longer TTL. High write volume where the database is allowed to lag behind the cache (counters, metrics) → write-behind.
Step 3: What is the cost of a cache miss? High cost (slow origin, overloaded database) → invest in cache warm-up, probabilistic early expiration, mutex locking. Low cost (fast database, low traffic) → simple cache-aside with TTL expiry.
Step 4: Are there hot keys? Yes, identified or expected → local in-process cache for the hottest keys + key replication for the next tier. No → consistent hashing with virtual nodes is sufficient.
What is the cache-aside pattern?
The application checks the cache on reads; on a miss, fetches from the database and populates the cache. On writes, invalidates the cache key. The cache only holds data that has been requested — no unused data is cached. The most common caching pattern for read-heavy workloads.
What is the difference between write-through and write-behind?
Write-through writes to cache and database synchronously — consistent, but doubles write latency. Write-behind writes to cache first and flushes to database asynchronously — faster writes, but risks data loss if the cache crashes before flushing. Write-through for correctness-critical data; write-behind for high-frequency counters and temporary state.
What is a cache stampede and how do you prevent it?
Multiple concurrent cache misses on a popular key all hitting the database simultaneously. Prevention: probabilistic early expiration (re-cache before TTL expires), mutex lock (only one request queries the database on a miss), jittered TTLs (spread expiry across a time window).
What is LRU vs LFU eviction?
LRU evicts the least recently accessed key and suits workloads with temporal locality. LFU evicts the least frequently accessed key and suits long-lived popular items that may be idle but have high lifetime access rates. Redis supports both (allkeys-lru, allkeys-lfu); neither is enabled by default, since the default maxmemory-policy is noeviction.
How do you handle hot keys in a distributed cache?
Local in-process cache in application servers (zero network hop, best for top-10 hot keys), key replication across multiple cache nodes (distribute read load), or read-through replicas. Detect hot keys with Redis --hotkeys flag or client-side access frequency monitoring.
Run this in SysSimulator → Browse all blueprints
Start from the top: Design Twitter's system architecture →