The most common senior system design interview question — and the one that exposes the widest range of preparation gaps. This guide covers the exact estimation, tradeoff reasoning, and failure narration that earns a hire signal.
When an interviewer says "design Twitter," they are not asking whether you know what Twitter is. Everyone in the room knows what Twitter is. They are evaluating four specific skills in roughly 45 minutes.
Load reasoning. Can you derive numbers rather than memorise them? The interviewer will not accept "it scales." They want: "At 100M DAU with 1% peak concurrency, I have roughly 1 million concurrent users. If each active user triggers a timeline read every ten seconds or so, that's approximately 100,000 read RPS at peak, higher still during trending events." Numbers derived from first principles, not recited from a prep blog.
Failure narration. "What if the cache goes down?" is not a trivia question. The expected answer has a structure: what breaks first, what cascades in what order, the blast radius, how the system degrades gracefully rather than collapsing entirely, and the mitigation. Each step should have a number attached. Skipping any of these steps signals a gap even if the overall answer sounds correct.
Cost awareness. "Why not just add ten more Redis nodes?" is a trap. The right answer has a dollar number. Engineers who reason about cost at the design stage get hired at staff level. Engineers who handwave it do not.
Tradeoff articulation. Every architecture decision should be followed by the word "because" and a number. Not "I chose Redis for the timeline cache" — "I chose Redis because timeline reads are the dominant workload at 100K RPS, and Redis delivers sub-millisecond latency at that scale. The alternative — reading from Cassandra directly — would add 5–15ms per request and require 3× the database replicas to handle the read volume." The "because" is the signal. The number makes it credible.
This guide delivers each of those four skills applied to Twitter specifically. Work through it in order, run the simulation, and practise the narration script out loud before your interview.
Do not skip this section. Most system design guides open with architecture diagrams and treat estimation as a footnote. Interviewers at top companies open with "walk me through your assumptions" — and if you can't derive the numbers, the rest of the interview suffers. This is also the section that differentiates senior from mid-level: seniors estimate, juniors describe.
Daily active users and concurrency. Twitter has roughly 250M monetisable daily active users. For an interview, 100M DAU is a clean number that produces round estimates. At any given moment, approximately 1% of DAU are concurrently active — this is a real-world rule of thumb that holds across most consumer social platforms. That gives us 1 million concurrent users at peak.
Read volume. A typical session involves viewing a timeline (one read) and scrolling through it (one to two more reads). Call it 2 read requests per user per day as a conservative single-session baseline. Spread across 86,400 seconds in a day: 100M × 2 / 86,400 ≈ 2,300 RPS sustained. But the sustained average is misleading: peak is what your system must handle, and peak is typically 5–10× sustained for social platforms. Set your design target at 23,000 RPS, with headroom for 100,000 RPS during extreme peaks (trending events, breaking news).
Write volume. Twitter's historical write-to-read ratio is approximately 1:100. Against the 23,000 read RPS target, that's roughly 230 write RPS (new tweets). This seems low — it is. Writes are cheap at Twitter scale. The expensive part is not writing the tweet; it's the fan-out that follows it.
Fan-out amplification. Every tweet triggers timeline updates for each follower. The average Twitter user has 200 followers. At 230 write RPS, fan-out generates 230 × 200 = 46,000 timeline update operations per second. During an extreme write spike (say 2,300 write RPS), that's 460,000 timeline updates per second. This is the number that drives your entire write architecture — not the raw write RPS, but the fan-out amplification.
Storage. Each tweet is roughly 280 characters (560 bytes UTF-8) plus metadata. At 230 writes/sec: 230 × 560 bytes × 86,400 seconds ≈ 11 GB/day raw tweet storage. For media (images, video), multiply by 10–50× depending on attachment rate. Ten years of tweets: roughly 40 TB for text, measured in petabytes with media.
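If you want to sanity-check the arithmetic before the interview, the whole derivation fits in a few lines of Python. These are the interview assumptions from this section, not measured Twitter figures:

```python
# Back-of-envelope estimation for the "design Twitter" numbers above.
DAU = 100_000_000            # daily active users (interview-friendly round number)
CONCURRENCY_RATIO = 0.01     # ~1% of DAU online at any moment
READS_PER_USER_PER_DAY = 2   # timeline load plus a scroll or two
PEAK_MULTIPLIER = 10         # peak vs sustained for social traffic
WRITE_READ_RATIO = 1 / 100   # ~1 write per 100 reads
AVG_FOLLOWERS = 200
TWEET_BYTES = 560            # 280 characters, budgeted at 2 bytes each in UTF-8
SECONDS_PER_DAY = 86_400

concurrent_users = DAU * CONCURRENCY_RATIO                            # ~1M
sustained_read_rps = DAU * READS_PER_USER_PER_DAY / SECONDS_PER_DAY   # ~2,300
peak_read_rps = sustained_read_rps * PEAK_MULTIPLIER                  # ~23,000
write_rps = peak_read_rps * WRITE_READ_RATIO                          # ~230
fanout_ops_per_sec = write_rps * AVG_FOLLOWERS                        # ~46,000
storage_gb_per_day = write_rps * TWEET_BYTES * SECONDS_PER_DAY / 1e9  # ~11 GB
ten_year_text_tb = storage_gb_per_day * 365 * 10 / 1000               # ~40 TB

print(f"concurrent users:      {concurrent_users:,.0f}")
print(f"sustained read RPS:    {sustained_read_rps:,.0f}")
print(f"peak read RPS target:  {peak_read_rps:,.0f}")
print(f"write RPS:             {write_rps:,.0f}")
print(f"fan-out ops/sec:       {fanout_ops_per_sec:,.0f}")
print(f"tweet text GB/day:     {storage_gb_per_day:,.1f}")
print(f"10-year text TB:       {ten_year_text_tb:,.0f}")
```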
Write these numbers down at the start of the interview. Interviewers take note when a candidate derives estimates from first principles rather than reciting them. The derivation signals quantitative reasoning; the number enables everything that follows.
Walk through design decisions in order of load impact, not alphabetical component order. The 100K read RPS dominates — design the read path first.
Read path: Redis timeline cache. At 100K read RPS, a cold database query per request is a non-starter. MySQL or Cassandra at 100K RPS would require dozens of replicas and still deliver 5–15ms latency. Redis delivers sub-millisecond reads at this volume from a modest cluster. The approach: pre-compute each user's timeline as a sorted list in Redis. On read, the app server calls Redis — one network hop, no join, no aggregation. Cache hit rate target: 95%+ for logged-in users.
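As a concrete sketch of that read path, this is roughly what the app server does, assuming each timeline is stored as a Redis sorted set keyed by user ID (the key format is illustrative):

```python
# Read-path sketch: each home timeline is a Redis sorted set
# (member = tweet ID, score = tweet timestamp). Key format is illustrative.
import redis

r = redis.Redis(host="localhost", port=6379)

def get_home_timeline(user_id: int, page_size: int = 50) -> list[bytes]:
    # One network hop, no joins, no aggregation: just the newest N tweet IDs.
    return r.zrevrange(f"timeline:{user_id}", 0, page_size - 1)
```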
The core Twitter tradeoff: fan-out on write vs fan-out on read. This is the question interviewers ask most often, and the question behind every follow-up for the next 20 minutes. You must know both sides precisely.
Fan-out on write: when a user tweets, immediately write that tweet into every follower's Redis timeline cache. A user with 200 followers triggers 200 Redis writes per tweet. Fast reads (the cache is already assembled), expensive writes (amplification proportional to follower count). Works well for 99% of users. Catastrophically fails for celebrities: Lady Gaga with 30 million followers generates 30 million Redis writes per tweet — at several tweets per day, this is 300M+ cache writes/day just from one account, saturating your write tier.
Fan-out on read: assemble the timeline at read time by fetching the latest tweets from everyone the user follows. No write amplification. Slow reads — a user following 500 accounts requires 500 DB lookups or a complex join per timeline load, adding 200–500ms latency. Acceptable for users who follow few accounts; unusable at scale for users following hundreds.
Twitter's hybrid solution: fan-out on write for users with fewer than ~10,000 followers. Fan-out on read for celebrities. At read time, the app server merges the pre-computed timeline from Redis with a small set of celebrity tweets fetched on demand. The merge is cheap because the celebrity set is small and known in advance (maintained separately). This is the specific answer interviewers are looking for — state it explicitly, name the threshold, and explain why the threshold exists.
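A minimal sketch of the hybrid model follows. The 10,000-follower threshold and key formats are illustrative, and the graph lookups (follower_count, get_followers, get_followed_celebrities) are hypothetical helpers standing in for MySQL queries:

```python
# Hybrid fan-out sketch: write-time fan-out for regular users,
# read-time merge for celebrities. Thresholds and helpers are illustrative.
import redis

r = redis.Redis()
CELEBRITY_THRESHOLD = 10_000  # above this follower count, skip write-time fan-out

def get_followers(author_id: int) -> list[int]:
    return []  # stub: real lookups hit the MySQL social graph

def follower_count(author_id: int) -> int:
    return len(get_followers(author_id))

def get_followed_celebrities(user_id: int) -> list[int]:
    return []  # stub: a small, precomputed set per user

def on_tweet(author_id: int, tweet_id: int, ts: float) -> None:
    if follower_count(author_id) >= CELEBRITY_THRESHOLD:
        # Celebrity path: store once, pulled by followers at read time.
        r.zadd(f"celebrity_tweets:{author_id}", {tweet_id: ts})
    else:
        # Regular path: push into every follower's cached timeline (~200 writes).
        for follower_id in get_followers(author_id):
            r.zadd(f"timeline:{follower_id}", {tweet_id: ts})

def read_timeline(user_id: int, page_size: int = 50) -> list:
    own = r.zrevrange(f"timeline:{user_id}", 0, page_size - 1, withscores=True)
    celeb = []
    for celebrity_id in get_followed_celebrities(user_id):
        celeb += r.zrevrange(f"celebrity_tweets:{celebrity_id}", 0,
                             page_size - 1, withscores=True)
    # Cheap merge: both lists are small and already scored by timestamp.
    merged = sorted(own + celeb, key=lambda pair: pair[1], reverse=True)
    return [tweet_id for tweet_id, _ in merged[:page_size]]
```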
Why Kafka. At 230 write RPS with 200× fan-out amplification, the tweet writer cannot synchronously perform 46,000 Redis writes before returning a response to the user. A synchronous call chain would block for 1–5 seconds. Kafka decouples the tweet write from the fan-out: the tweet is written to Cassandra and a Kafka topic in milliseconds, then returned to the user as "posted." Kafka consumers (fan-out workers) read the topic and perform the Redis writes asynchronously. Fan-out latency becomes invisible to the writer. Fan-out worker failure becomes recoverable — the Kafka offset holds the position, no tweets are lost.
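Sketched with kafka-python and redis-py, the decoupling looks roughly like this; the topic and consumer-group names are illustrative, and the durable Cassandra write is left out:

```python
# Asynchronous fan-out sketch. Topic, group, and get_followers are illustrative.
import json

import redis
from kafka import KafkaConsumer, KafkaProducer

r = redis.Redis()
producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda v: json.dumps(v).encode())

def get_followers(author_id: int) -> list[int]:
    return []  # stub: a real system queries the social graph store

def post_tweet(author_id: int, tweet_id: int, ts: float) -> None:
    # Durable write to Cassandra happens here (not shown), then enqueue the
    # fan-out and return "posted" to the author within milliseconds.
    producer.send("tweet-fanout", {"author": author_id, "tweet": tweet_id, "ts": ts})

def run_fanout_worker() -> None:
    # Workers scale horizontally: one consumer group, up to one consumer per partition.
    consumer = KafkaConsumer("tweet-fanout",
                             bootstrap_servers="localhost:9092",
                             group_id="fanout-workers",
                             value_deserializer=lambda b: json.loads(b))
    for msg in consumer:
        event = msg.value
        for follower_id in get_followers(event["author"]):
            r.zadd(f"timeline:{follower_id}", {event["tweet"]: event["ts"]})
```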
CDN for media. At 100K RPS, serving images and video from your own servers is not cost-effective. CDNs cache media at edge PoPs globally, reducing origin requests to a fraction of total requests. The important design question is cache invalidation: for profile pictures and static assets, long TTLs (days) are fine. For ephemeral media or time-sensitive content, shorter TTLs or cache-bust via content-addressed URLs.
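Content addressing is the simplest way to get that cache-bust behaviour: derive the URL from a hash of the bytes, so a changed asset gets a new URL and the old one can stay cached indefinitely. A sketch, with an illustrative CDN domain:

```python
# Content-addressed media URL sketch: the URL changes whenever the bytes change,
# so edge caches never need explicit invalidation. Domain is illustrative.
import hashlib

def media_url(content: bytes) -> str:
    digest = hashlib.sha256(content).hexdigest()[:16]
    return f"https://cdn.example.com/media/{digest}"
```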
Database layer. User profiles and the social graph (follower/following relationships) fit well in MySQL with appropriate indexing. Tweet content goes to Cassandra — write-optimised, wide-column, append-only access pattern aligns with tweet storage requirements. The social graph needs low-latency neighbour lookups ("who follows user X?") which MySQL handles well with indexed foreign keys at Twitter's follower count distribution.
This is the section no static guide can replicate. Load the Social Feed blueprint in SysSimulator and you will see the numbers from the estimation section materialise in real time. More importantly, you will see what breaks — and that's what you narrate in the interview.
Open the simulator and load the Social Feed blueprint. Set traffic to 23,000 RPS (your design target). Watch the metrics bar: cache hit rate should sit above 95%, p99 latency should be under 50ms, error rate should be near zero. The system is healthy at this load.
Now open the Metrics page (toolbar → Metrics) and note the exact numbers: your p99 latency at 23K RPS, the database connection pool depth, the cache utilisation. Write these down — they are your interview numbers.
Push traffic to 100,000 RPS (peak). Watch what changes: cache hit rate may hold, but p99 latency rises and the database connection pool starts filling. Note when error rate appears — that's your bottleneck. The Metrics page will identify the specific component: this is the component you "add replicas to" in the interview narration.
Then inject a cache stampede via the Chaos panel. Watch the cascade: cache hit rate drops to near zero, every request hits the database, connection pool exhausts within seconds, error rate spikes, p99 goes from 48ms to 2,000ms+. Record the exact numbers. This cascade — with your specific numbers — is what you narrate when the interviewer asks "what happens if the cache fails?"
Practise saying this out loud before your interview. The structure is: what breaks first, what cascades, blast radius, mitigation, recovery. The numbers come from your simulator session — substitute your actual recorded values.
"I'm running the Social Feed architecture at 23,000 RPS — p99 is 48ms and cache hit rate is 97%. I'm going to inject a cache stampede to show what happens when the timeline cache fails during peak traffic."
"[inject] You can see the cache hit rate drops from 97% to near zero immediately. Every timeline read is now a cache miss. The requests that were being served from Redis in under a millisecond are now hitting Cassandra directly. Watch the database — it was handling roughly 700 QPS of cache-miss traffic. It's now absorbing the full 23,000 RPS. Connection pool depth is climbing. Within 800 milliseconds, database connections are exhausted. P99 latency spikes from 48ms to over 2,000ms. Error rate jumps to 31%."
"The blast radius is the entire read path — every user who loads their timeline gets either a timeout or a degraded response. Write operations are unaffected; tweets are still being accepted and fan-out is still queued in Kafka."
"My mitigation: a circuit breaker on the cache client monitors miss rate. When it exceeds a threshold — say 20% miss rate sustained for 500ms — it trips and starts shedding traffic to the database. Instead of 23,000 RPS hitting the DB, we cap it at 2,000 RPS and serve stale timeline data or a 'timeline temporarily unavailable' response to the remaining traffic. This protects the database from exhaustion while the cache repopulates."
"Recovery time depends on cache warm-up speed. With a pre-warming strategy — writing popular timelines back to Redis before the circuit breaker releases — recovery is under 90 seconds. Without pre-warming, a cold cache stampede can recur on restart if you release all traffic at once. Probabilistic early expiration on cache entries prevents the thundering herd on restart."
That answer — specific numbers, named mitigation, recovery timeline, and a production nuance — earns a strong hire signal. The difference between this answer and a vague "the database would get overloaded" is the exact structure and the numbers from the simulation.
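If the interviewer pushes on how the circuit breaker actually works, it helps to have a minimal mental model. The sketch below uses the thresholds from the narration; the windowing and shedding logic are deliberately simplified and the names are illustrative:

```python
# Minimal cache circuit-breaker sketch: trip on sustained miss rate,
# then let only a capped fraction of reads fall through to the database.
import random
import time

class CacheCircuitBreaker:
    def __init__(self, miss_threshold=0.20, window_s=0.5,
                 shed_to_rps=2_000, expected_rps=23_000):
        self.miss_threshold = miss_threshold
        self.window_s = window_s
        self.allow_fraction = shed_to_rps / expected_rps  # ~9% of reads reach the DB
        self.hits = 0
        self.misses = 0
        self.window_start = time.monotonic()
        self.tripped = False

    def record(self, hit: bool) -> None:
        # Called once per timeline read with the cache lookup result.
        self.hits += hit
        self.misses += (not hit)
        now = time.monotonic()
        if now - self.window_start >= self.window_s:
            total = self.hits + self.misses
            miss_rate = self.misses / total if total else 0.0
            self.tripped = miss_rate > self.miss_threshold
            self.hits = self.misses = 0
            self.window_start = now

    def allow_db_fallthrough(self) -> bool:
        # When tripped, most misses get stale data or a "temporarily
        # unavailable" response instead of a database query.
        return (not self.tripped) or random.random() < self.allow_fraction
```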
Every interviewer follow-up on the Twitter design probes a specific decision. Here are the five most common and what they're actually evaluating.
"Why not fan-out on read for everyone?" Cost: at 100K RPS, a user following 200 accounts requires 200 DB lookups per read. That's 100,000 × 200 = 20 million database reads per second. Even at 1ms per read, you need 20,000 database cores to handle it. The interviewer is checking that you understand why fan-out on write exists — it trades write amplification for dramatically cheaper reads at scale.
"How do you handle celebrities specifically?" The hybrid model: celebrity tweets are excluded from fan-out on write. Celebrities are identified by follower count (threshold typically 10K–100K followers). Their tweets are stored in a separate "celebrity tweet" data structure and injected into the user's timeline at read time. The merge is a small set (users follow few celebrities) and adds negligible latency. The interviewer is checking that you've thought about the edge case, not just the happy path.
"What happens during a thundering herd when the cache restarts?" Every cache key expires simultaneously after a cold restart. All requests hit the database at once, creating a new stampede worse than the original failure. The mitigation is probabilistic early expiration: cache entries are given a random jitter on their TTL (±10%), so expirations are spread over a window rather than concentrated at a single moment. The interviewer is checking production operational awareness.
"How does the timeline stay consistent?" It doesn't need to be strongly consistent — and that's the right answer. A tweet appearing in your timeline 1–2 seconds after posting is acceptable for a social feed. This is eventual consistency by design. The interviewer is checking that you can map consistency requirements to the actual product semantics rather than defaulting to "strong consistency is safer."
"How do you scale the fan-out workers?" Horizontally — each Kafka consumer group can have many consumer instances, each processing a partition. The number of partitions caps the number of parallel consumers. For Twitter's fan-out volume, the bottleneck is Redis write throughput, not Kafka consumer capacity. The interviewer is checking that you understand horizontal scaling in a queue-based system.
What is the hardest part of designing Twitter's system?
The timeline fan-out problem. Pre-computing every follower's timeline update for a celebrity with 50 million followers would require 50 million Redis writes per tweet. Twitter's hybrid model (fan-out on write for regular users, fan-out on read for celebrities) is the core architectural decision that makes the system viable.
Should Twitter use SQL or NoSQL?
Both. User profiles and the social graph fit in MySQL with indexed foreign keys for follower lookups. Tweet storage goes to Cassandra — wide-column, write-optimised, append-only. Redis for pre-computed timelines. Use the access pattern to drive the database choice, not a general preference.
How does Twitter handle 100 million daily active users?
Read path: Redis serves pre-computed timelines at sub-millisecond latency, handling 100K+ RPS from a modest cluster. Write path: Kafka decouples tweet writes from fan-out, making fan-out asynchronous and resilient to consumer failures. CDN serves media at the edge. Every layer is horizontally scaled.
What database does a Twitter-like system use?
MySQL for user and graph data, Cassandra for tweets, Redis for timelines, and a blob store with CDN for media. Real Twitter ran sharded MySQL with aggressive caching and Redis-based timeline caches, and later built Manhattan, its own distributed key-value store, for tweet and user data.
How does Twitter's timeline work at scale?
Fan-out on write: when a non-celebrity tweets, a Kafka consumer writes that tweet into each follower's timeline cache in Redis. When the user opens Twitter, the timeline is already assembled — no join, no aggregation, one Redis read. Celebrity tweets are injected at read time to avoid 50-million-write amplification per tweet.
Why does Twitter use Kafka?
Fan-out from a single tweet can generate 200+ Redis writes. Doing this synchronously before returning to the tweet author would add 1–5 seconds of latency. Kafka makes the fan-out asynchronous: the tweet is published instantly, Kafka queues the fan-out work, and consumers process it in the background. Failure is recoverable from the Kafka offset.
Run this in SysSimulator → Browse all blueprints
Next in the series: Design WhatsApp's messaging architecture →