The URL shortener is the classic entry-level system design question — deceptively simple to describe, revealing in its edge cases. Interviewers use it early in a loop because it touches every fundamental: hashing, caching, database choice, read/write ratio reasoning, and analytics pipeline design. Getting it right demonstrates clear systems thinking.
The URL shortener question appears simple because the product is simple. A user pastes a long URL, gets a short one back, and anyone who clicks the short URL gets redirected to the long one. Interviewers use precisely this simplicity to test whether you can identify and reason about the non-obvious challenges.
Read/write ratio reasoning. A URL is created once. It may be clicked thousands of times. The system is massively read-heavy — the redirect path must be optimised for the 99% case (read), not the 1% case (write). Candidates who design a system that queries the database on every redirect are failing the fundamental load analysis.
Hash function selection and collision handling. Generating a short code sounds trivial but has real tradeoffs: hash-based (simple but requires collision detection), ID-based (collision-free but reveals database ID), random (collision-free and opaque but requires uniqueness checks). The interviewer is checking whether you think through edge cases, not just the happy path.
Redirect semantics. HTTP 301 vs 302 is a product design question that has infrastructure consequences. The answer depends on whether click analytics are a product requirement. Knowing this distinction — and explaining it in terms of the product tradeoff — signals API and web infrastructure literacy.
Analytics at scale without slowing down the redirect. If click tracking is a feature, writing to a database on every redirect will become the bottleneck. The interviewer is checking whether you know to decouple the analytics write from the redirect path using an async queue.
Create rate. Assume a Twitter-scale URL shortener: 100 million new short URLs created per day. 100M / 86,400 seconds ≈ 1,160 writes per second. This is modest — a single database writer can handle this volume without sharding.
Read rate (redirect). Popular short URLs are clicked far more than they are created. Assume a 10:1 read-to-write ratio. That gives 1 billion redirects per day / 86,400 ≈ 11,600 RPS sustained, and a viral tweet carrying a short URL can spike a single popular link well above that. The read path must handle this load without touching the database on every request.
Storage. Each URL record: short code (7 bytes), long URL (average ~200 bytes), user ID (8 bytes), created timestamp (8 bytes), expiry (8 bytes), click count (8 bytes) ≈ 240 bytes per record. At ~100 million new records/day × 240 bytes = ~24 GB/day, or ~87 TB over 10 years. Partition by creation date and archive cold partitions to cheap storage; the active working set fits comfortably on a single MySQL or PostgreSQL instance. No need for NoSQL or sharding for the URL metadata table.
Short code space. A 7-character base62 code gives 62⁷ ≈ 3.5 trillion unique codes. At 100 million URLs/day, that's roughly 35,000 days (about 96 years) before exhausting the code space. A 6-character code (62⁶ ≈ 56.8 billion) would last only about 18 months at that rate. Use 7 characters at the Twitter-scale assumption above; 6 characters is fine for a lower-volume service.
Cache sizing. The top 20% of short URLs generate 80% of clicks (Pareto distribution). 20% of 100 million URLs = 20 million records × 240 bytes ≈ 4.8 GB. A single Redis instance with 8 GB of memory can cache the entire hot URL set with room to spare. Cache hit rate for the redirect path: 95%+ is achievable, meaning the database sees only 5% of redirect traffic.
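The back-of-envelope numbers above are easy to sanity-check in a few lines. This sketch uses the section's own assumptions (100 million creates/day, a ~240-byte record, a 20% hot set):

```python
# Capacity estimation for the URL shortener, using the assumptions above.
SECONDS_PER_DAY = 86_400

writes_per_day = 100_000_000
write_rps = writes_per_day / SECONDS_PER_DAY              # ~1,160 writes/sec

# short code + long URL + user ID + created + expiry + click count
record_bytes = 7 + 200 + 8 + 8 + 8 + 8                    # ~240 bytes per record

storage_per_day_gb = writes_per_day * record_bytes / 1e9  # ~24 GB/day
storage_10y_tb = storage_per_day_gb * 365 * 10 / 1000     # ~87 TB over 10 years

# Pareto assumption: the top 20% of a day's URLs take 80% of the clicks.
hot_set_gb = 0.20 * writes_per_day * record_bytes / 1e9   # ~4.8 GB of hot URLs

print(round(write_rps), round(storage_per_day_gb),
      round(storage_10y_tb, 1), round(hot_set_gb, 1))
```

Being able to reproduce these numbers out loud is most of the estimation exercise; the exact byte counts matter far less than the orders of magnitude.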
Short code generation: ID-based with base62 encoding. The cleanest approach for production: use a database auto-increment ID (or a distributed ID generator like Snowflake if you need multiple writers). Encode the integer ID in base62 (characters 0-9, a-z, A-Z). ID 1 → "1", ID 62 → "10", ID 56,800,235,583 → "ZZZZZZ" (the largest 6-character code). This gives you: guaranteed uniqueness (IDs never collide), compact codes (6 characters handles 56 billion URLs), and no collision detection overhead. The only downside: sequential IDs are guessable — users can enumerate all short URLs by incrementing the decoded ID. If that's a concern, XOR the ID with a fixed salt before encoding.
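The encoding itself is about a dozen lines. A minimal sketch (the alphabet order is a convention; any fixed permutation of the 62 characters works, as long as encode and decode agree):

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a base62 short code."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))        # most significant digit first

def decode_base62(code: str) -> int:
    """Invert encode_base62: map a short code back to the integer ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode_base62(1), encode_base62(62), encode_base62(56_800_235_583))
# "1", "10", "ZZZZZZ"
```

The redirect service decodes the incoming code back to an ID for the database lookup, so the two functions must be exact inverses.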
Database: single MySQL with read replicas. The URL mapping is relational and fits naturally in a single table. Write volume (1,160 writes/sec) is well within a single MySQL primary's capacity. Read volume (11,600 RPS) is almost entirely served by Redis cache — the database handles only cache misses. One primary plus two read replicas provides failover and absorbs the read traffic the cache misses. This is a case where the correct answer is "no NoSQL needed" — the scale doesn't justify the complexity.
Redis for the redirect cache. The redirect path is: receive short code → Redis GET → if hit, return 302 to long URL. Cache miss path: Redis miss → MySQL read → Redis SET → return 302. The Redis lookup adds ~1ms to every redirect. Without Redis, the MySQL read adds 5–15ms. At 11,600 RPS, a 95% cache hit rate means only 580 RPS hit MySQL — well within a single instance's capacity. TTL on cache entries: 24 hours for active links, with background refresh for links still receiving traffic.
Redirect: HTTP 302 with async click tracking. Use 302 (temporary redirect) if click analytics are a feature — this ensures every click passes through the server. Return the redirect immediately. After returning the response, write a click event to Kafka asynchronously (in a background goroutine/thread, fire-and-forget). The click event contains: short code, timestamp, user agent, referrer, IP (for geo lookup). A stream processor aggregates these into per-URL click counts and time-series data written to an analytics store. The redirect latency is unaffected by analytics write complexity.
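A minimal fire-and-forget sketch of that decoupling, with an in-process `queue.Queue` standing in for the Kafka producer and a thread standing in for the stream processor (all names here are illustrative):

```python
import queue
import threading
import time

click_events: queue.Queue = queue.Queue()   # stand-in for a Kafka topic

def record_click(short_code: str, user_agent: str, referrer: str) -> None:
    # Fire-and-forget: enqueue and return immediately. The redirect response
    # has already been sent; this never blocks the hot path.
    click_events.put({"code": short_code, "ts": time.time(),
                      "ua": user_agent, "ref": referrer})

counts: dict[str, int] = {}                 # stand-in for the analytics store

def aggregator() -> None:
    # Consumer side: a stream processor folding events into per-URL counts.
    while True:
        event = click_events.get()
        if event is None:                   # sentinel used for clean shutdown
            return
        counts[event["code"]] = counts.get(event["code"], 0) + 1

worker = threading.Thread(target=aggregator, daemon=True)
worker.start()
```

The queue absorbs bursts, so a spike in clicks backs up the analytics pipeline rather than the redirect latency.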
URL expiry. URLs can have an optional expiry date set at creation. A background worker runs every hour and marks expired URLs as inactive in the database and deletes them from the Redis cache. Redirect requests for expired URLs return 410 Gone. This prevents the database from growing indefinitely with abandoned short URLs.
Load the URL Shortener blueprint in SysSimulator. The blueprint models the create path (API → DB writer), the redirect path (load balancer → redirect service → Redis → MySQL), and an async analytics pipeline via a message queue.
Set redirect traffic to 10,000 RPS and observe: Redis hit rate (should be 95%+), database read RPS (should be ~500, the 5% cache misses), p99 redirect latency (should be under 10ms for cache hits).
Inject a Redis failure. Watch: all 10,000 RPS fall through to the database. MySQL goes from 500 RPS to 10,000 RPS instantly. Connection pool exhausts. P99 latency spikes from 8ms to 800ms+. Error rate climbs. Record these numbers — this is your interview answer for "what happens if the cache goes down?"
Open URL Shortener blueprint →
"The redirect path at 10,000 RPS — p99 is 8ms, cache hit rate is 96%, the database is handling 400 RPS of cache misses. I'll inject a Redis failure to show what happens when the cache layer goes down."
"[inject] The hit rate drops to zero. All 10,000 RPS immediately fall through to MySQL. Database connections climb from 25 to 400 within 3 seconds — that's our connection pool limit. New redirect requests start timing out waiting for a connection. P99 goes from 8ms to 1,200ms. Error rate is now 28%."
"The blast radius: all redirect traffic is affected — this is a full redirect service degradation. Create operations are unaffected because they use the write path, which doesn't depend on Redis."
"My mitigation: a circuit breaker on the Redis client. When Redis becomes unavailable, the circuit breaker trips and the redirect service switches to a 'degraded mode' that queries MySQL directly but rate-limits itself to 2,000 RPS to avoid overwhelming the database. The remaining 8,000 RPS get a 503 with a Retry-After header. It's a partial outage rather than a full one — 20% of requests work, 80% get a clean error they can retry. As Redis recovers, the circuit breaker half-opens, tests connectivity, and gradually re-enables caching. Full recovery in under 2 minutes."
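The degraded mode described in that answer can be sketched as a consecutive-failure breaker plus a per-second cap on database fallback reads. The thresholds are the ones quoted above; the class and method names are illustrative, not a real library API:

```python
class RedisCircuitBreaker:
    """Sketch: trip after N consecutive Redis failures, then allow a capped
    trickle of direct-to-MySQL reads and shed the rest with 503s."""

    def __init__(self, failure_threshold: int = 5, degraded_rps: int = 2000):
        self.failures = 0
        self.threshold = failure_threshold
        self.degraded_rps = degraded_rps
        self.window_start = 0.0     # start of the current one-second window
        self.window_count = 0       # DB fallback reads allowed so far this window

    def record_failure(self) -> None:
        self.failures += 1

    def record_success(self) -> None:
        self.failures = 0           # any success resets the consecutive count

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold

    def allow_db_fallback(self, now: float) -> bool:
        # Fixed-window limiter: at most degraded_rps DB reads per second;
        # requests over the cap get a 503 with Retry-After instead.
        if now - self.window_start >= 1.0:
            self.window_start = now
            self.window_count = 0
        if self.window_count < self.degraded_rps:
            self.window_count += 1
            return True
        return False
```

A real implementation would also add the half-open probe state for gradual recovery; this sketch only covers the tripped, rate-limited mode.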
"What if two users shorten the same long URL?" A product decision: return the same short code (deduplication — saves storage, but original creator loses exclusivity over their link analytics) or generate a new code (every shortening is independent — simpler, correct analytics isolation). Most URL shorteners generate new codes per user per shortening. The interviewer is checking that you identify this as a product decision, not a technical one.
"How do you prevent short URLs from being enumerated?" Sequential base62 IDs are predictable. Mitigations: XOR the ID with a fixed secret before encoding (reversible, but unpredictable without the secret); use a random 7-character code with a uniqueness check against the database (simple; collisions stay rare while the keyspace is sparsely filled, but every insert needs a retry loop); hash the ID with HMAC. The XOR approach is the cleanest for an interview — simple to explain, secure enough for most use cases.
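The XOR trick is a one-liner, and because XOR is its own inverse, decoding a short code back to the real database ID just applies the same operation. The salt value here is a made-up example; keep the real one secret:

```python
SECRET = 0x2F1A93C55B7   # assumed fixed salt for illustration; never publish it

def obfuscate(seq_id: int) -> int:
    # x ^ SECRET ^ SECRET == x, so this same function both obfuscates the
    # sequential ID before encoding and recovers it after decoding.
    return seq_id ^ SECRET
```

The create path runs the auto-increment ID through `obfuscate` before base62-encoding it; the redirect path decodes the short code, runs `obfuscate` again, and looks up the original ID.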
"How do you scale the write path globally?" A single auto-increment sequence doesn't work across multiple independent database writers — two writers handing out IDs on their own can collide. Solutions: (1) pre-allocate ID ranges to each writer (Writer A owns 1–1,000,000, Writer B owns 1,000,001–2,000,000); (2) use a distributed ID generator (Snowflake IDs combine timestamp + machine ID + sequence, globally unique without coordination); (3) use UUIDs (16 bytes, globally unique, but long — encoding in base62 gives 22 characters, too long for a short URL).
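Option (2) composes three fields into one 64-bit integer. A sketch of the commonly used layout (41-bit millisecond timestamp, 10-bit machine ID, 12-bit per-millisecond sequence; the custom epoch is an assumed constant):

```python
import threading
import time

class SnowflakeIDGenerator:
    """Sketch of a Snowflake-style ID generator: coordination-free uniqueness
    because each machine ID is unique and the sequence disambiguates
    IDs minted within the same millisecond."""

    EPOCH_MS = 1_600_000_000_000   # assumed custom epoch (Sep 2020)

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024          # 10 bits
        self.machine_id = machine_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now_ms = int(time.time() * 1000) - self.EPOCH_MS
            if now_ms == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF   # 4,096 IDs per ms per machine
                if self.seq == 0:                   # sequence exhausted:
                    while now_ms <= self.last_ms:   # spin until the next ms
                        now_ms = int(time.time() * 1000) - self.EPOCH_MS
            else:
                self.seq = 0
            self.last_ms = now_ms
            # 41-bit timestamp | 10-bit machine ID | 12-bit sequence
            return (now_ms << 22) | (self.machine_id << 12) | self.seq
```

Because the timestamp occupies the high bits, IDs sort roughly by creation time, which keeps database inserts append-friendly.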
"What do you do about malicious URLs?" Check submitted URLs against a blocklist (Google Safe Browsing API) at creation time. Block known phishing and malware domains. Run asynchronous re-checks on existing URLs periodically. Return a warning interstitial page instead of a direct redirect for flagged URLs. This is a product safety question, and mentioning it proactively signals that you think about abuse surface area.
How does a URL shortener generate short codes?
ID-based with base62 encoding is the cleanest production approach. Auto-increment a database ID, encode in base62 (0-9, a-z, A-Z). 6 characters covers 56 billion URLs. No collisions possible. Hash-based approaches require collision detection overhead.
Why is a URL shortener read-heavy and how do you design for it?
URLs are created once, clicked many times. At 100:1 read-to-write ratio, the redirect path dominates. Cache the short code → long URL mapping in Redis (sub-millisecond). A 95% hit rate means the database handles only 5% of redirect traffic.
What is the difference between HTTP 301 and 302 for URL shorteners?
301 (permanent): browser caches the destination, future clicks bypass the server — reduces load but loses analytics. 302 (temporary): every click goes through the server — enables analytics at the cost of server load. Choose based on whether click tracking is a product requirement.
How do you handle hash collisions?
With ID-based generation, there are no collisions — IDs are unique by definition. With hash-based generation: check for existing code before inserting; if collision, re-hash with a salt. ID-based avoids the problem entirely.
How do you design the analytics pipeline?
Fire-and-forget: write click events to Kafka asynchronously after returning the redirect. Don't block the redirect on analytics writes. A stream processor aggregates events and writes to an analytics database. Redirect latency stays under 10ms regardless of analytics complexity.
Run this in SysSimulator → Browse all blueprints
Next in the series: Chaos engineering 101 →