SQL vs NoSQL: how to choose

Q: When should you use a relational database (SQL)?

Use a relational database when: (1) you need ACID transactions across multiple entities (payment records, order and inventory, bank transfers); (2) your data has a well-defined, stable schema with relationships between entities; (3) you need complex queries with joins, aggregations, and filters — SQL is vastly more expressive than any NoSQL query language; (4) data integrity constraints (foreign keys, unique constraints, not-null) are important for correctness. PostgreSQL handles hundreds of thousands of queries per second with proper indexing and read replicas. The performance ceiling of a relational database is higher than most engineers expect — exhaust optimization (indexes, query tuning, read replicas, connection pooling) before migrating to NoSQL.

Q: When should you use a document database like MongoDB?

Use a document database when: (1) your data is naturally document-shaped — a product catalog where each product has different attributes, a CMS where each article has different metadata, a user profile with a variable set of preferences; (2) you frequently read or write an entire document as a unit and rarely need cross-document joins; (3) your schema evolves rapidly (new product attributes added weekly) and you can't afford migration downtime. Avoid document databases when: you need multi-document ACID transactions, your access patterns require joins across many document types, or your data has a stable relational structure. MongoDB 4.0+ supports multi-document transactions, but this negates most of the performance benefit over PostgreSQL.

Q: What is a column-family database and when do you use it?

Column-family databases (Cassandra, HBase, Google Bigtable) store data as rows but group columns into 'column families' — sparse columns that are stored together on disk. The primary access pattern is a single-key lookup or range scan on a compound primary key. Cassandra is optimized for: high write throughput (hundreds of thousands of writes/second), time-series data (messages ordered by timestamp, event logs, IoT sensor readings), and data that's naturally organized by a partition key (user_id, device_id). The trade-off: Cassandra has no joins, limited secondary indexes, no aggregations. You design your tables around your queries — the same data is often duplicated in multiple tables to support different query patterns. Use Cassandra for chat message history, IoT time series, activity feeds.

Q: Can you use both SQL and NoSQL in the same system?

Yes — polyglot persistence is common in production systems. Different data has different requirements. A typical architecture: PostgreSQL for financial records and user accounts (ACID, complex queries), Redis for session data and caches (low latency key-value), Cassandra for activity feeds and message history (high write throughput, time-series), Elasticsearch for full-text search (inverted index, relevance ranking). The trade-off: operational complexity increases with each additional database system. In an interview, default to PostgreSQL for most use cases and explain specific NoSQL choices with concrete justification — don't add NoSQL for its own sake.

Database choice is one of the first decisions you'll justify in every system design interview. The interviewer isn't looking for "NoSQL scales better" — that's a misconception that signals surface-level preparation. They're looking for a structured reasoning process: what are the access patterns, what are the consistency requirements, what's the data model, and what's the operational cost of each option? This guide gives you that framework.

The most important question: what are your access patterns?

Database choice follows access patterns, not the other way around. Before deciding SQL vs NoSQL, answer these questions:

Do you need joins? If yes, relational. Joins are the fundamental advantage of SQL — they let you store data once and query it from multiple angles. If your queries always look up one entity by ID and return everything you need in a single response, joins aren't required and a document or key-value store might be simpler.

Do you need multi-entity ACID transactions? If yes, relational. The canonical example: debit one account, credit another — both must succeed or both must fail. PostgreSQL handles this with row-level locking and WAL-based durability. Most NoSQL systems offer single-document atomicity but not cross-document transactions.

What is the read/write ratio and throughput? Relational databases handle very high throughput with proper indexing and connection pooling (hundreds of thousands of reads/second per replica, tens of thousands of writes/second). Cassandra and DynamoDB are optimized for extreme write throughput at the cost of query flexibility. If your write throughput is >100K/second with simple access patterns, consider a column-family store. Otherwise, PostgreSQL with read replicas is likely sufficient.

Is the schema stable or variable? Stable schema with relationships → relational. Variable schema where each record has a different structure (product catalog, CMS content, user-defined fields) → document database. The ability to add fields without schema migrations is a real operational advantage for rapidly evolving products.

The four NoSQL categories

Key-value stores (Redis, DynamoDB, Memcached). The simplest model: get(key), put(key, value). No query language beyond key lookup. Use cases: session storage, caching, feature flags, rate limiting counters, leaderboards (Redis sorted sets). Redis adds data structures (lists, sets, sorted sets, hashes) that enable more sophisticated use cases. Key-value stores are the fastest option for single-key operations — Redis latency is sub-millisecond. Never use a key-value store when you need to query by anything other than the primary key.

Document stores (MongoDB, CouchDB, Firestore). Store semi-structured JSON/BSON documents. Query on any field (with indexes). Natural fit for object-shaped data. Strength: flexible schema, rich query language within a single collection. Weakness: joins across collections are expensive (application-side or $lookup). Use for: product catalogs, content management, user profiles, configuration data. MongoDB 4.0+ supports multi-document transactions but with significant performance overhead.

Column-family stores (Cassandra, HBase, Bigtable). Rows with a compound primary key (partition key + clustering key). Data is stored sorted by the clustering key within a partition. The data model is designed around your queries — you define tables (or "materialized views") per access pattern. Cassandra's strength is linear write scalability: adding nodes adds write throughput proportionally. Use for: time-series (messages ordered by timestamp, IoT data, activity logs), write-heavy workloads where the access pattern is known upfront. Weakness: no joins, limited secondary indexes, write-time schema design required.

Graph databases (Neo4j, Amazon Neptune). Nodes and edges with properties. Traversal queries (BFS/DFS) across relationships. Use when relationships between entities are the primary query concern: social network "friends of friends," fraud detection (shared accounts, device fingerprints), knowledge graphs, recommendation engines based on graph traversal. Avoid for data that is primarily relational but has some graph-like properties — a relational database with recursive CTEs handles most "graph" queries adequately.

SQL decision rules for interviews

Default to PostgreSQL. Start with PostgreSQL for any storage layer in your design. It handles most use cases up to large scale. PostgreSQL supports JSON columns (for semi-structured data), full-text search (for basic search), geospatial queries via PostGIS, and time-series data adequately at moderate volume. The operational complexity of adding a second database system is a real cost that must be justified.

Add Cassandra when write throughput exceeds ~100K/sec with time-series patterns. Chat messages, activity feeds, IoT sensor data — these are classic Cassandra use cases. The write path is optimized: writes go to a commit log and memtable (in-memory), flushed to SSTables (immutable files on disk). No locks, no B-tree rebalancing — just sequential writes.

Add Redis for caching, rate limiting, and session management. Not for primary storage. Redis data lives in memory — durability is limited (AOF/RDB persistence is not a substitute for a relational database). Redis is a first-class data store for ephemeral data: rate limit counters, session tokens, feature flag states, pub/sub channels.

Add Elasticsearch for search. PostgreSQL full-text search handles simple cases. When you need relevance ranking, typo tolerance, faceted search (filter by category + price + rating simultaneously), or multi-field boosting, Elasticsearch is the right tool. It maintains an inverted index and BM25 relevance scoring that PostgreSQL cannot match at scale.

Common interview mistake: "NoSQL scales better"

This statement is misleading. PostgreSQL with read replicas scales to millions of reads/second. Amazon Aurora Postgres scales horizontally with up to 15 read replicas and 100 TB storage. The write bottleneck of a single-primary relational database is typically >100K writes/second with modern hardware — sufficient for most systems. NoSQL does not automatically scale "better" — it scales differently, often trading query expressiveness and consistency for write throughput and horizontal scalability.

The correct framing in an interview: "I'll use PostgreSQL for [this data] because [access patterns, ACID requirements, schema stability]. I'll use [Cassandra/Redis/Elasticsearch] for [this data] because [specific throughput/access pattern/query requirement that relational doesn't handle well]." Always justify with specifics, never with vague scalability claims.

How to apply this in simulator design sessions

When you add a data store component in SysSimulator, the component type communicates your design choices to the interviewer. Use a Database component for PostgreSQL/MySQL (relational), a Cache component for Redis (key-value, in-memory), and a NoSQL Database component for Cassandra/DynamoDB (column-family, document). Label each with the specific database system and data it holds. The choice should match the access pattern you've described.

Practice database component placement →

Frequently asked questions

When should you use a relational database (SQL)?
ACID transactions, complex joins, stable schema, relational data. PostgreSQL handles hundreds of thousands of QPS — exhaust optimization before migrating to NoSQL.

When should you use a document database like MongoDB?
Document-shaped data with variable structure, read/write entire document as unit, rapidly evolving schema. Avoid when you need multi-document transactions or frequent cross-collection joins.

What is a column-family database and when do you use it?
Cassandra/HBase: optimized for high write throughput and time-series access patterns. Design tables around queries. Use for chat history, activity feeds, IoT data. No joins, limited secondary indexes.

Can you use both SQL and NoSQL in the same system?
Yes — polyglot persistence. PostgreSQL for financial/relational data, Redis for cache/sessions, Cassandra for time-series writes, Elasticsearch for search. Justify each with specific access pattern requirements, not "NoSQL scales better."

Practice in SysSimulator → Browse all blueprints

Next: Database indexing explained →