Design a notification system

Notification systems are underestimated in interviews. They appear simple — just send a push notification — but the real challenges are fan-out at scale, multi-channel delivery, priority queuing, rate limiting, and handling delivery failures across third-party providers. This guide covers the full architecture and the failure narration that distinguishes candidates who have operated these systems from those who have only used them.

What the interview is really asking

Fan-out throughput. Sending a single notification to 100 million users is a fan-out problem. The interviewer is checking whether you understand that this requires a queue-based architecture with high parallelism, not a synchronous loop that calls APNs/FCM one user at a time. The follow-up "how do you send 10 million notifications in 30 seconds?" is the specific probe.

Multi-channel architecture. Production notification systems support push (iOS via APNs, Android via FCM), email (via SMTP or transactional email APIs), SMS (via Twilio/Vonage), and in-app notifications (real-time via WebSocket). Each channel has different latency characteristics, cost structures, and failure modes. The interviewer is checking that you design a unified notification pipeline with channel-specific adapters, not a separate system per channel.

Priority and reliability tiering. A payment failure alert and a "You have a new follower" notification are not equally urgent. The first must be delivered immediately with retry-until-success semantics. The second can be delayed or dropped if delivery is difficult. The interviewer is checking that you distinguish transactional vs marketing notifications and design priority queues accordingly.

Rate limiting and user preferences. An unthrottled notification system can become a spam engine. The interviewer is checking that you build rate limiting per user and per channel, and that you respect user opt-out preferences — both as a UX concern and as a legal compliance requirement (GDPR, CAN-SPAM).

Back-of-envelope estimation

Notification volume. A mid-to-large platform: 100 million MAU, each receiving an average of 10 notifications/day across all channels. 100M × 10 / 86,400 = ~11,600 notifications/second at average. At peak (marketing campaign launch, major event): 50× burst = ~580,000 notifications/second. This is the throughput that drives the fan-out architecture design.

Channel breakdown (typical). ~60% push (mobile), ~30% email, ~8% in-app, ~2% SMS. Push notifications dominate by volume. SMS is the most expensive (typically $0.005–$0.05 per message) and most sparingly used. Email has the lowest cost and highest tolerance for latency.

APNs/FCM throughput limits. APNs allows approximately 10,000 push notifications per second per HTTP/2 connection. FCM supports similar throughput. With 100 worker processes each maintaining a connection pool: 100 × 10,000 = 1,000,000 push notifications/second maximum dispatch rate. This covers even peak campaign loads.

Storage for notification history. Notification record: ~500 bytes (recipient, content, channel, timestamps, delivery status). At 11,600 notifications/sec: 11,600 × 500 bytes = 5.8 MB/sec = ~500 GB/day. Keep 90 days of history: ~45 TB. Partition by creation date in Cassandra (write-heavy, time-series access pattern).
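The estimates above are easy to sanity-check in a few lines. This sketch reproduces the arithmetic with the same hypothetical inputs (100M MAU, 10 notifications/day, 500-byte records, 100 workers at 10,000 push/sec):

```python
# Back-of-envelope check for the capacity numbers above (hypothetical inputs).
SECONDS_PER_DAY = 86_400

mau = 100_000_000
notifications_per_user_per_day = 10
avg_rate = mau * notifications_per_user_per_day / SECONDS_PER_DAY  # ~11,600/sec
peak_rate = avg_rate * 50                                          # ~580,000/sec

workers, per_worker = 100, 10_000
dispatch_capacity = workers * per_worker                           # 1M push/sec

record_bytes = 500
daily_storage_gb = avg_rate * record_bytes * SECONDS_PER_DAY / 1e9  # ~500 GB/day
retention_tb = daily_storage_gb * 90 / 1_000                        # ~45 TB

print(f"avg {avg_rate:,.0f}/s, peak {peak_rate:,.0f}/s, "
      f"capacity {dispatch_capacity:,}/s, "
      f"{daily_storage_gb:.0f} GB/day, {retention_tb:.0f} TB for 90 days")
```

Note that peak campaign load (~580k/sec) stays comfortably under the 1M/sec dispatch ceiling, which is why the worker count works.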

Architecture decisions and why

Three-layer pipeline. Every notification flows through three stages. (1) Notification API: receives notification requests from product services (e.g., "send payment confirmation to user 12345"), validates, enriches with user contact details, publishes to a priority queue. (2) Fan-out service: for bulk notifications (marketing campaigns), reads the recipient list and publishes one job per recipient to the channel-specific dispatch queue. For single-user notifications (transactional), publishes directly to the dispatch queue. (3) Channel workers: pool of workers consuming from dispatch queues, calling channel-specific APIs (APNs, FCM, SES, Twilio) and recording delivery status.
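The three stages can be sketched with in-memory stand-ins (a deque for the Kafka dispatch queue, a list for the history store). All names here are illustrative, not a real API:

```python
# Minimal in-memory sketch of the three-stage pipeline. Real systems would
# use Kafka topics between stages and channel SDKs (APNs, FCM, SES, Twilio).
from collections import deque

dispatch_queue = deque()   # stage 2 output -> stage 3 input
delivery_log = []          # stand-in for the notification history store

def notification_api(request, contacts):
    """Stage 1: validate, enrich with contact details, hand off to fan-out."""
    assert request["user_id"] in contacts, "unknown recipient"
    enriched = {**request, "email": contacts[request["user_id"]]}
    fan_out(enriched)

def fan_out(notification):
    """Stage 2: publish one dispatch job per recipient (single-user here,
    so it goes straight to the dispatch queue)."""
    dispatch_queue.append(notification)

def channel_worker():
    """Stage 3: drain the queue, call the provider, record delivery status."""
    while dispatch_queue:
        job = dispatch_queue.popleft()
        delivery_log.append((job["user_id"], job["channel"], "delivered"))

contacts = {12345: "user@example.com"}
notification_api({"user_id": 12345, "channel": "email",
                  "body": "Payment confirmed"}, contacts)
channel_worker()
print(delivery_log)  # [(12345, 'email', 'delivered')]
```

The value of the separation is that each stage scales independently: the API is stateless, fan-out is parallelized by recipient, and workers are parallelized per channel.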

Priority queues. Two separate Kafka topics per channel: HIGH_PRIORITY (transactional: OTPs, payment alerts, security notices) and LOW_PRIORITY (marketing: promotions, newsletters, recommendations). Channel workers consume from the high-priority topic first. When it is empty, they consume from low-priority. This ensures transactional notifications are never delayed by a large marketing campaign sending simultaneously. The prioritization is explicit and measurable: bulk traffic can never block urgent traffic.
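The consume-high-first rule can be sketched with two in-memory queues standing in for the Kafka topics (illustrative only; real consumers would poll both topics in a priority-aware loop):

```python
# Workers always drain HIGH before touching LOW. Deques stand in for the
# two Kafka topics described above.
from collections import deque

HIGH, LOW = deque(), deque()

def next_job():
    """Consume high-priority first; fall back to low only when high is empty."""
    if HIGH:
        return HIGH.popleft()
    if LOW:
        return LOW.popleft()
    return None

# A marketing burst is already queued when a single OTP arrives:
for i in range(3):
    LOW.append(f"promo-{i}")
HIGH.append("otp-12345")

order = []
while (job := next_job()) is not None:
    order.append(job)
print(order)  # ['otp-12345', 'promo-0', 'promo-1', 'promo-2']
```

The OTP jumps the entire campaign backlog, which is exactly the behaviour the simulator exercise below demonstrates under a 10× low-priority spike.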

Device token registry. A user's push notification tokens are stored in a device token registry (PostgreSQL or Cassandra): user_id → list of (device_token, platform, created_at, last_seen). A user may have multiple devices (phone + tablet). The fan-out service fetches all active device tokens for a user before dispatching. Invalid tokens (returned by APNs/FCM as "invalid token" errors) are immediately removed from the registry. This prevents wasted API calls on uninstalled devices.
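A minimal sketch of the registry's contract, assuming an in-memory map in place of PostgreSQL/Cassandra (class and method names are hypothetical):

```python
# Device token registry sketch: multiple devices per user, with immediate
# pruning when APNs/FCM reports a token as invalid.
class DeviceTokenRegistry:
    def __init__(self):
        self._tokens = {}  # user_id -> {device_token: platform}

    def register(self, user_id, token, platform):
        self._tokens.setdefault(user_id, {})[token] = platform

    def tokens_for(self, user_id):
        """Fan-out fetches all active tokens before dispatching."""
        return list(self._tokens.get(user_id, {}))

    def remove_invalid(self, user_id, token):
        """Called when the provider returns an 'invalid token' error."""
        self._tokens.get(user_id, {}).pop(token, None)

reg = DeviceTokenRegistry()
reg.register(42, "tok-phone", "ios")
reg.register(42, "tok-tablet", "ios")
reg.remove_invalid(42, "tok-phone")   # APNs rejected it: app uninstalled
print(reg.tokens_for(42))             # only the tablet token remains
```

The pruning step is what keeps dispatch capacity from being wasted on uninstalled devices over time.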

Idempotency for at-least-once delivery. Workers retry failed notifications (transient errors: network timeout, rate limit) with exponential backoff. This means a notification may be dispatched more than once. Deduplication: each notification has a unique ID. APNs supports an apns-collapse-id header that merges duplicate notifications on the device side. FCM has a collapse_key for the same purpose. For email and SMS, the notification ID is stored per recipient — before dispatch, check whether this notification ID has already been successfully delivered.
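For email and SMS, the pre-dispatch dedup check looks roughly like this (a set stands in for the per-recipient delivery record; in production this would be a Redis SETNX or a conditional database write):

```python
# Dedup for at-least-once delivery: a retry of an already-delivered
# notification becomes a no-op instead of a duplicate email/SMS.
delivered = set()  # (notification_id, recipient) -> delivery record stand-in

def dispatch_once(notification_id, recipient, send):
    key = (notification_id, recipient)
    if key in delivered:
        return "skipped"    # already delivered; the retry does nothing
    send(recipient)
    delivered.add(key)
    return "sent"

sends = []
first = dispatch_once("n-1", "a@example.com", sends.append)
retry = dispatch_once("n-1", "a@example.com", sends.append)  # after a timeout
print(first, retry, len(sends))  # sent skipped 1
```

Note the check-then-send here is not atomic; a real implementation needs the check and the claim to be a single operation (e.g. SETNX) or two workers racing on the same retry could both send.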

User preference check. Before dispatching to any channel, the worker checks the user's notification preferences (opt-out state per channel per category). Preferences are cached in Redis with a 5-minute TTL. For GDPR compliance, checking preferences is not optional — it must happen for every notification before dispatch. Preference violations are logged for audit.
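A sketch of the cached preference check, with an in-process dict standing in for Redis (the 5-minute TTL and the `opted_out` shape are assumptions for illustration):

```python
# Preference check with a TTL cache: every dispatch consults preferences,
# but the preference store (DB) is only hit on cache miss or expiry.
import time

class PreferenceCache:
    TTL = 300  # seconds, mirroring the 5-minute Redis TTL above

    def __init__(self, load_from_db):
        self._load = load_from_db
        self._cache = {}  # user_id -> (prefs, fetched_at)

    def allows(self, user_id, channel, category):
        entry = self._cache.get(user_id)
        if entry is None or time.monotonic() - entry[1] > self.TTL:
            entry = (self._load(user_id), time.monotonic())
            self._cache[user_id] = entry
        return (channel, category) not in entry[0]["opted_out"]

db_calls = []
def load_from_db(user_id):
    db_calls.append(user_id)
    return {"opted_out": {("push", "marketing")}}

prefs = PreferenceCache(load_from_db)
print(prefs.allows(7, "push", "marketing"))  # False: user opted out
print(prefs.allows(7, "push", "security"))   # True, served from cache
print(db_calls)                              # [7] -> one DB hit for two checks
```

The TTL is the staleness bound: an opt-out can take up to 5 minutes to take effect, which is usually an acceptable trade for removing a DB read from every dispatch.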

Delivery receipts and analytics. APNs and FCM provide delivery receipts asynchronously (delivered, failed, device_offline). An analytics service consumes these receipts from a Kafka topic, updates the notification status in the history store, and emits metrics: delivery rate per channel, latency from publish to delivery, failure rate by error code. Delivery analytics feed back into the product — low delivery rates for a notification type signal that users are opting out or uninstalling the app.
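The receipt consumer's core job is aggregation. A minimal sketch, with a Counter standing in for the metrics store and dicts standing in for Kafka messages:

```python
# Receipt processing sketch: consume delivery receipts, maintain per-channel
# counters, and derive the delivery-rate metric from them.
from collections import Counter

status_by_channel = Counter()

def process_receipt(receipt):
    """Consume one receipt from the Kafka topic and update counters."""
    status_by_channel[(receipt["channel"], receipt["status"])] += 1

for r in [{"channel": "push", "status": "delivered"},
          {"channel": "push", "status": "failed"},
          {"channel": "push", "status": "delivered"}]:
    process_receipt(r)

delivered = status_by_channel[("push", "delivered")]
total = sum(n for (ch, _), n in status_by_channel.items() if ch == "push")
print(f"push delivery rate: {delivered / total:.0%}")
```

In production the consumer would also write the terminal status back to the notification history store; the counters are what feed the dashboards and the uninstall-signal analysis mentioned above.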

Run it in the simulator

Load the Notification Service blueprint in SysSimulator. The blueprint models the API layer, priority queues (high/low), fan-out workers, channel adapters (push, email), and a delivery receipt processor.

Set traffic to 10,000 notifications/second (a mid-scale platform). Set 80% to low-priority and 20% to high-priority. Observe queue depth for both priorities — at steady state, the high-priority queue should be near-empty (processed immediately), the low-priority queue should have moderate depth but be draining steadily.

Inject a marketing campaign spike — 10× increase in low-priority notifications. Watch: low-priority queue depth climbs, but high-priority notifications continue to be processed without delay. This is the priority queue design working correctly: campaigns don't starve transactional notifications.

Then inject an APNs gateway timeout — the push channel becomes unavailable. Watch: push workers back off and retry; queue depth climbs for push notifications while email continues to drain normally. The multi-channel isolation means one channel failure doesn't affect others.

Open Notification Service blueprint →

Failure narration — word for word

"I'll inject an APNs gateway failure — Apple's push notification service returns timeouts for all push requests. This affects iOS push delivery only."

"[inject] Push workers start getting timeouts from APNs. They back off with exponential backoff — first retry at 1s, then 2s, 4s, 8s. The push dispatch queue depth climbs because workers are blocked waiting for retries. High-priority push notifications (OTPs, payment alerts) are in the high-priority queue — they're also queuing, but they jump ahead when workers become available."

"The blast radius: iOS users don't receive push notifications during the outage window. Android (FCM) and email channels are completely unaffected — they use separate worker pools and separate queues. In-app notifications (WebSocket) also unaffected — they're a separate path entirely."

"Recovery: when APNs comes back, workers resume consuming the queue. High-priority messages are processed first. Total time to clear the backlog depends on queue depth — at 8,000 push/sec net accumulation during a 30-second outage, the backlog is ~240,000 notifications. Workers drain at 10,000 push/sec, but live traffic keeps arriving at 8,000 push/sec, so the net drain rate is 2,000 push/sec and the backlog clears in roughly 2 minutes. Users receive delayed notifications in order."

The question behind the question

"How do you handle users who have notifications turned off?" Preference check happens before dispatch — if a user has opted out of a notification category, the notification is dropped at the fan-out stage and logged as "suppressed." For legally mandated notifications (account security alerts, data breach notifications), opt-out is not honoured — these are dispatched regardless of preference settings. The distinction must be documented in the notification type definition.
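The suppression rule, including the legally-mandated override, reduces to a small predicate (the category names here are illustrative):

```python
# Drop at fan-out unless the category is legally mandated. Mandated
# categories (security, breach notices) ignore opt-out settings.
MANDATORY = {"security_alert", "data_breach"}

def should_dispatch(category, opted_out):
    if category in MANDATORY:
        return True  # opt-out is not honoured for these
    return category not in opted_out

opted_out = {"marketing", "security_alert"}  # user tried to opt out of both
print(should_dispatch("marketing", opted_out))       # False: suppressed
print(should_dispatch("security_alert", opted_out))  # True: mandated
```

Suppressed notifications should still be logged (as the paragraph above notes), both for audit and so product teams can see how often a notification type is being dropped.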

"How do you handle a notification that needs to be cancelled after it's been queued?" Cancellation requires marking the notification as cancelled in a status store before workers process it. Workers check the status before dispatching. For notifications already dispatched, cancellation is not possible on the delivery infrastructure side — APNs and FCM don't support cancellation of already-dispatched notifications. The only option is sending a follow-up notification that corrects or retracts the original.
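The race this creates is simple: cancellation only works if the worker re-checks the status store after the cancel lands and before the provider call. A sketch, with a set standing in for the status store:

```python
# Cancellation sketch: workers check the status store immediately before
# calling the provider. Already-dispatched notifications cannot be recalled.
CANCELLED = set()  # status store stand-in (a DB/Redis flag in production)
sent = []

def cancel(notification_id):
    CANCELLED.add(notification_id)

def worker_dispatch(notification_id):
    if notification_id in CANCELLED:
        return "dropped"       # never reaches APNs/FCM
    sent.append(notification_id)
    return "sent"

cancel("n-99")
print(worker_dispatch("n-99"))  # dropped
print(worker_dispatch("n-42"))  # sent: too late to cancel this one now
```

The window between the status check and the provider call is irreducible, which is why the paragraph above frames post-dispatch cancellation as impossible rather than merely hard.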

"How do you aggregate multiple notifications into a digest?" A digest worker groups notifications by user within a time window (e.g., "You have 5 new followers" instead of 5 separate notifications). The worker buffers individual notification events per user in Redis, runs on a schedule (every 5 minutes), aggregates buffered events into a single digest notification, and dispatches the digest. The aggregation window is configurable per notification type.
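The buffer-then-flush shape of the digest worker can be sketched as follows (a defaultdict stands in for the per-user Redis buffer; the message template is illustrative):

```python
# Digest aggregation: buffer individual events per user, flush on a schedule
# into one combined notification per user, then clear the buffer.
from collections import defaultdict

buffer = defaultdict(list)  # user_id -> buffered events (Redis in production)

def record_event(user_id, event):
    buffer[user_id].append(event)

def flush_digests():
    """Runs on a schedule (e.g. every 5 minutes)."""
    digests = {user_id: f"You have {len(events)} new followers"
               for user_id, events in buffer.items()}
    buffer.clear()
    return digests

for follower in ["a", "b", "c", "d", "e"]:
    record_event(1, {"type": "new_follower", "from": follower})

print(flush_digests())  # {1: 'You have 5 new followers'}
```

Two details a real worker must add: flush-and-clear has to be atomic (or events arriving mid-flush are lost), and the aggregation window should be configurable per notification type, as noted above.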

Frequently asked questions

How do you send 10 million push notifications in 30 seconds?
Fan-out via Kafka, parallel dispatch workers, HTTP/2 multiplexed connections to APNs/FCM. Each worker maintains a persistent connection pool and processes thousands of notifications/sec. 100 workers × 10,000 push/sec = 1M push/sec capacity. 10 million delivered in ~10 seconds.

What is the difference between transactional and marketing notifications?
Transactional: triggered by user action, required immediately, never batched (OTP, payment, security). Marketing: bulk, scheduled, tolerate delay (promotions, newsletters). Separate priority queues ensure campaigns never delay transactional notifications.

How do you handle notification delivery failures?
Invalid device token: remove from registry. Device offline: APNs/FCM queue server-side, deliver on reconnect. Transient error: exponential backoff retry. Persistent failure: log, alert on-call. Track per-device delivery rates and suppress consistently failing devices.

How do you prevent notification spam?
Redis rate limiters per user/channel/category. User preference opt-outs checked before every dispatch. Transactional notifications exempt from marketing opt-outs. GDPR compliance: preference violations logged for audit. Digest aggregation for high-frequency notification types.
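A fixed-window counter keyed by user/channel/category has the same shape as the INCR-plus-EXPIRE pattern used on Redis. A minimal in-memory sketch (limits and key shape are assumptions):

```python
# Fixed-window rate limiter per (user, channel, category) — the in-memory
# analogue of Redis INCR with an EXPIRE on the window key.
import time
from collections import defaultdict

class WindowRateLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)

    def allow(self, user_id, channel, category, now=None):
        now = time.time() if now is None else now
        window_id = int(now // self.window)       # key expires with the window
        key = (user_id, channel, category, window_id)
        self.counts[key] += 1
        return self.counts[key] <= self.limit

limiter = WindowRateLimiter(limit=3, window_seconds=3600)
results = [limiter.allow("u1", "push", "marketing", now=1000.0)
           for _ in range(4)]
print(results)  # [True, True, True, False] -> 4th notification suppressed
```

Fixed windows allow a burst of up to 2× the limit at a window boundary; a sliding-window or token-bucket variant tightens that if it matters for the product.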

Run this in SysSimulator →   Browse all blueprints

Next: Design a web crawler →