Design a proximity service

Proximity service questions — "design Yelp's nearby search," "design Google Maps nearby places," "find the nearest K restaurants" — are fundamentally about one problem: how do you efficiently query spatial data at scale? The interviewer is not checking whether you know about geospatial databases. They're checking whether you understand geohashing, the boundary problem, expanding ring search, and how to serve high read QPS on location data that's mostly static. This guide covers the full architecture from coordinates to results.

What the interview is really asking

Spatial indexing fundamentals. A naive approach — "store lat/lon in a database, query WHERE distance(lat, lon, user_lat, user_lon) < radius" — does a full table scan. The interviewer is checking whether you know that spatial queries need spatial indexing: geohashing, quadtrees, or R-trees. The core insight is that distance queries need to be converted into prefix or range queries that an index can serve efficiently.

The geohash boundary problem. A geohash cell has 8 neighboring cells. Two points that are close to each other in real space might have different geohash prefixes if they're near a cell boundary. The interviewer is checking whether you know to search the target cell plus all 8 neighbors, not just the target cell — and that this is always 9 cells, regardless of precision level.
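A minimal sketch of the 9-cell lookup, assuming the python-geohash package (its encode and neighbors functions — other geohash libraries expose similar calls under slightly different names):

```python
import geohash  # python-geohash; encode() and neighbors() assumed from that package

def search_cells(lat: float, lon: float, precision: int = 6) -> list[str]:
    """Return the user's cell plus its 8 neighbors -- always 9 cells."""
    center = geohash.encode(lat, lon, precision)
    return [center] + geohash.neighbors(center)  # neighbors() yields the 8 adjacent cells

# Two nearby points straddling a cell boundary still appear in each other's 9-cell set.
print(search_cells(40.7484, -73.9857))  # Empire State Building, precision 6
```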

Static vs. dynamic location data. Business locations (restaurants, stores) change rarely — a write-through cache is appropriate. Driver or user locations (ride-sharing, live tracking) update every few seconds — a different architecture (Redis GEORADIUS with frequent writes) is needed. The interviewer is checking that you distinguish these cases and don't design a static-data system for a moving-target problem.

Scale: read-heavy, geographically distributed. A Yelp-scale proximity service handles hundreds of thousands of search queries per second globally. The interviewer is checking that you design for read scalability (caches, read replicas, CDN for static business data) and geographic distribution (serve queries from the nearest data center).

Back-of-envelope estimation

Scale target. Yelp-scale: 500 million registered users, 50 million DAU. Each user performs ~2 proximity searches/day. Peak QPS: 50M × 2 / 86,400 × 3 (peak multiplier) = ~3,500 search QPS. Round to 5,000 QPS for headroom.

Business data size. 50 million businesses globally (Yelp has ~5M listings in the US; global scale implies ~50M). Business record: ~1 KB (name, address, category, lat/lon, rating, hours, metadata). 50M × 1 KB = 50 GB. Fits in memory: a Redis cluster with 200 GB of RAM holds the entire business index with room for geospatial indexes. PostgreSQL with PostGIS is the source of truth; Redis serves hot reads.

Geolocation index size. 50 million business locations in a Redis sorted set (as geohashes). Each entry: ~70 bytes (geohash score + name). 50M × 70 bytes = 3.5 GB. Trivially fits in Redis memory.

Cache hit rate. Popular cities (NYC, London, Tokyo) will have most searches concentrated in dense geohash cells. A geohash cell at precision 6 covers a ~1.2km × 0.6km area. A results cache keyed by (geohash_6, category, radius) has high hit rate in dense urban areas. Estimated cache hit rate for top 100 cities: ~85%. Effective QPS hitting database: 5,000 × 0.15 = 750 QPS — easily handled by PostgreSQL.
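The arithmetic above as a quick sanity-check script — the numbers are this section's assumptions, not measurements:

```python
# Back-of-envelope numbers from this section -- illustrative, not measured.
dau, searches_per_user, peak_multiplier = 50e6, 2, 3
peak_qps = dau * searches_per_user / 86_400 * peak_multiplier
print(f"peak search QPS: {peak_qps:,.0f}")                      # ~3,500; round to 5,000

businesses, record_bytes, geo_entry_bytes = 50e6, 1_024, 70
print(f"metadata: {businesses * record_bytes / 1e9:.0f} GB")     # ~50 GB
print(f"geo index: {businesses * geo_entry_bytes / 1e9:.1f} GB") # ~3.5 GB

cache_hit = 0.85
print(f"DB QPS after cache: {5_000 * (1 - cache_hit):,.0f}")     # ~750
```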

Architecture decisions and why

Geohashing for the spatial index. Encode each business's (lat, lon) into a geohash string. Store in a database table with an index on the geohash column (standard B-tree index). To search: compute the geohash of the user's location at the appropriate precision for the search radius (radius 500m → precision 6; radius 5km → precision 5; radius 50km → precision 4). Query: SELECT * FROM businesses WHERE geohash LIKE 'prefix%'. Execute this query for the target cell and 8 neighboring cells. The 9-cell search ensures no boundary misses. Filter the results to the exact distance using the Haversine formula (much smaller result set to filter than a full table scan).
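A sketch of that flow, assuming a psycopg-style cursor, a hypothetical businesses(id, lat, lon, geohash) table, and python-geohash for encoding:

```python
import math
import geohash  # python-geohash (assumed); any encoder with the same API works

PRECISION_FOR_RADIUS_M = {500: 6, 5_000: 5, 50_000: 4}  # radius (m) -> geohash precision

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearby(cursor, lat, lon, radius_m, precision):
    cells = [geohash.encode(lat, lon, precision)]
    cells += geohash.neighbors(cells[0])  # the 9-cell neighborhood
    rows = []
    for cell in cells:  # each LIKE 'prefix%' is a B-tree prefix scan, not a table scan
        cursor.execute(
            "SELECT id, lat, lon FROM businesses WHERE geohash LIKE %s",
            (cell + "%",),
        )
        rows.extend(cursor.fetchall())
    # Exact-distance filter on the small candidate set.
    return [r for r in rows if haversine_m(lat, lon, r[1], r[2]) <= radius_m]
```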

Redis GEORADIUS for hot path. For real-time search serving (read path), load all business locations into a Redis sorted set using GEOADD. Serve search requests with GEOSEARCH (which supersedes GEORADIUS as of Redis 6.2): GEOSEARCH key FROMLONLAT lon lat BYRADIUS 1 km ASC COUNT 20. Redis executes this in-memory, returning distance-sorted results in microseconds, and handles 50,000+ GEOSEARCH operations per second per node. A small Redis cluster (3 nodes) handles the full 5,000 QPS with substantial headroom. PostgreSQL is the write master; Redis is the read-optimized replica (synced via the business update pipeline).
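What that looks like with redis-py (the key name and member ID are illustrative):

```python
import redis

r = redis.Redis()  # connection details are deployment-specific

# Index load (write path): one GEOADD per business -- note the (lon, lat, member) order.
r.geoadd("businesses:geo", (-73.9857, 40.7484, "biz:4417"))

# Hot read path: GEOSEARCH, sorted by distance, top 20 within 1 km.
ids = r.geosearch(
    "businesses:geo",
    longitude=-73.99, latitude=40.75,
    radius=1, unit="km",
    sort="ASC", count=20,
)
```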

Business data separation from spatial index. The spatial index stores only (business_id, lat, lon). Business metadata (name, address, rating, hours, photos) is stored separately in PostgreSQL and cached in a key-value store (Redis Hash or a dedicated CDN-backed service) keyed by business_id. The search flow: (1) GEOSEARCH returns list of business_ids sorted by distance; (2) batch fetch metadata for the top K results from cache/database. This separation means spatial index updates (business moves) don't require rewriting large metadata blobs, and metadata updates (new review, updated hours) don't require touching the spatial index.
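A sketch of the two-step flow, assuming metadata lives in Redis hashes under a hypothetical biz:meta:{id} key scheme:

```python
def search_with_metadata(r, lon, lat, radius_km=1, k=20):
    # Step 1: the spatial index returns only IDs, nearest first.
    ids = r.geosearch("businesses:geo", longitude=lon, latitude=lat,
                      radius=radius_km, unit="km", sort="ASC", count=k)
    # Step 2: batch-fetch metadata blobs keyed by business_id in one pipeline.
    pipe = r.pipeline()
    for biz_id in ids:
        pipe.hgetall(f"biz:meta:{biz_id.decode()}")
    return list(zip(ids, pipe.execute()))
```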

Expanding ring search. If GEOSEARCH within the requested radius returns fewer than minimum_results (e.g., 20), expand the search: re-query at 2× the radius, then 4×, capping at a maximum of 50km. Implement the retry loop in the proximity API server rather than the end-user client, so a single API request covers all expansions. Since each GEOSEARCH is one in-memory Redis command, three expanding searches still complete in <5ms total. The expansion logic: if results < 20 AND radius < max_radius, double the radius and retry. Return the actual search radius used so the UI can display "Showing results within 10km" accurately.
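A sketch of that retry loop as it might live in the API server (key name and thresholds taken from this section):

```python
MIN_RESULTS, MAX_RADIUS_KM = 20, 50

def expanding_search(r, lon, lat, radius_km):
    """Double the radius until enough hits or the 50 km cap; report the radius used."""
    while True:
        hits = r.geosearch("businesses:geo", longitude=lon, latitude=lat,
                           radius=radius_km, unit="km", sort="ASC",
                           count=MIN_RESULTS)
        if len(hits) >= MIN_RESULTS or radius_km >= MAX_RADIUS_KM:
            return hits, radius_km  # radius_km drives the "within X km" UI label
        radius_km = min(radius_km * 2, MAX_RADIUS_KM)
```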

Read replica scaling. The Redis geospatial cluster is read-heavy (queries) and write-light (business location updates). Use Redis replication: one primary for writes (business adds/updates), N read replicas for GEOSEARCH queries. Route all search traffic to read replicas via a Redis cluster proxy (or application-level read/write split). Adding replicas scales read capacity linearly. Each replica holds the complete geospatial dataset (3.5 GB is tiny by Redis standards).
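One way to sketch the read/write split at the application level — hostnames and replica count are illustrative:

```python
import random
import redis

write_client = redis.Redis(host="geo-primary")  # illustrative hostname
read_clients = [redis.Redis(host=f"geo-replica-{i}") for i in range(3)]

def geo_write(key, values):
    return write_client.geoadd(key, values)      # GEOADD always goes to the primary

def geo_read_client():
    return random.choice(read_clients)           # GEOSEARCH spread across replicas
```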

Run it in the simulator

Model the proximity service in SysSimulator: a load balancer, proximity API servers, a Redis geospatial cluster (read replicas), and a PostgreSQL primary for business metadata.

Set traffic to 5,000 search QPS. Observe that nearly all load is absorbed by the Redis layer (cache hit rate ~85%, per the estimate above), with minimal PostgreSQL traffic for cache misses. Inject a Redis replica failure — one of three replicas goes down. Watch: traffic redistributes to the remaining two replicas; QPS per replica increases but stays within capacity. No user-visible impact.

Then inject a hot geohash event — a viral event causes 10× normal traffic for a single geohash cell (e.g., a city block). Observe the Redis primary's CPU for that key. Mitigate by adding a local in-process cache at the API server layer for hot geohash results (10-second TTL).
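That mitigation is a few lines of code — a minimal in-process TTL cache sketch (eviction is omitted for brevity; a production cache would bound its size):

```python
import time

class TTLCache:
    """Tiny per-process cache for hot geohash results; 10 s staleness is acceptable."""
    def __init__(self, ttl: float = 10.0):
        self.ttl, self.store = ttl, {}

    def get(self, key):
        hit = self.store.get(key)
        if hit and time.monotonic() - hit[1] < self.ttl:
            return hit[0]
        return None

    def put(self, key, value):
        self.store[key] = (value, time.monotonic())

cache = TTLCache()
# key = (geohash_6, category, radius); consult the cache before calling Redis
```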

Open SysSimulator and model this →

Failure narration — word for word

"I'll inject a Redis primary failure — the node holding the write master for the geospatial sorted set goes down. This affects new business writes but not reads."

"[inject] The Redis cluster sentinel detects the primary is down — heartbeat timeout after 5 seconds. Sentinel promotes the replica with the lowest replication lag to primary. Promotion completes in ~10 seconds. During this window: reads continue serving from the remaining replicas (unaffected). Writes — new business adds and location updates — fail with a connection error."

"Impact assessment: business location updates are infrequent (restaurants don't move). A 10-second window of failed writes means a small number of new business registrations or location corrections are buffered in the API server's write queue. After the new primary is elected, the queue drains within seconds. No searches are affected."

"Recovery path: the API server's Redis client auto-reconnects to the new primary (sentinel notifies clients of the new primary address). The old primary, when it recovers, rejoins as a replica and resyncs from the new primary. Full cluster health is restored within ~2 minutes."

The question behind the question

"How would you design this for real-time moving objects (e.g., Uber drivers)?" Static business data (our primary design) updates rarely — write-through cache is fine. Moving objects update every 3–5 seconds. At Uber scale: 1M active drivers × 1 update/4 seconds = 250,000 location writes/second. Redis GEORADIUS handles this: driver locations stored in Redis, updated via GEOADD on each GPS update. Nearby driver search uses GEOSEARCH with a small radius (1–2km). Key differences from the business design: no separate metadata store (driver data is simple), no expanding ring search (if no drivers within 2km, show "no cars available"), and a much higher write-to-read ratio.

"How would you shard the Redis geospatial dataset?" For static business data at 50M locations, a single Redis node handles it (3.5 GB). For larger datasets: shard by geohash prefix — geohash cell 'u09t' maps to shard based on the first character or first two characters. All businesses with matching prefix land on the same shard. GEOSEARCH queries hit 1 shard for a single cell search and at most 9 shards for the 9-cell neighborhood search. The API server fans out the 9-cell queries in parallel across shards and merges results.

"What database would you use instead of Redis for geospatial queries?" PostgreSQL + PostGIS: full SQL, geospatial operators (ST_DWithin, ST_Distance), spatial indexes (GiST), handles complex queries (nearby + filter by category + open now). Slower than Redis (milliseconds vs. microseconds) but far more expressive. Use PostGIS as the source of truth with a Redis geospatial cache for the hot read path. For extreme scale, specialized systems: Elasticsearch with geo_point fields and geo_distance queries handles billions of locations with full-text and geo combined.

Frequently asked questions

What is geohashing and how does it work?
Encodes lat/lon to an alphanumeric string by recursively halving the space. Nearby locations share common prefixes. Search by querying target cell + 8 neighbors (always 9 cells) to handle boundary effects. 6-char geohash = ~1.2km × 0.6km; 5-char = ~5km × 5km.

What is the difference between geohashing and a quadtree?
Geohashing: fixed grid, uniform cell size, simple to index. Quadtree: adaptive subdivision, small cells in dense areas, larger in sparse. Geohashing is simpler and cache-friendly; quadtrees give more uniform result counts. Use geohashing for mostly-static business data; quadtrees for real-time moving objects with variable density.

How does Redis GEORADIUS work?
Internally, Redis stores coordinates as 52-bit geohash-encoded scores in a sorted set. GEOSEARCH (which supersedes the deprecated GEORADIUS) queries by bounding-box cell lookup, then applies an exact distance filter. Complexity is O(N + log M), where N is the number of elements inside the bounding box and M is the number of items in the index. A single node handles tens of thousands of queries per second. With the ASC option, GEOSEARCH returns results sorted by distance.

How do you handle users searching in areas with very few nearby businesses?
Expanding ring search: start at requested radius, double until minimum results (e.g., 20) found or maximum radius (50km) reached. Each level adds neighboring geohash cells. Report actual search radius used to the client so the UI can display it accurately.

Run this in SysSimulator →   Browse all blueprints

Next: SQL vs NoSQL: how to choose →