Database indexing explained

Indexes are the single most important performance lever in relational databases — and one of the most commonly misused. In system design interviews, the question isn't "should you have indexes?" (always yes on primary keys) — it's "which indexes, on which columns, in which order, and at what trade-off to write throughput?" This guide covers the mechanics and the decision framework.

How B-tree indexes work

The default index type in PostgreSQL, MySQL, and most SQL databases is the B-tree (balanced tree). A B-tree index stores key-value pairs in a tree structure where all leaf nodes are at the same depth (balanced). Internal nodes hold key range boundaries for navigation; leaf nodes hold the actual indexed key values and pointers to the corresponding table rows (or, in a clustered index, the rows themselves).

A lookup for WHERE email = 'alice@example.com' traverses the tree from root to the leaf containing that key in O(log N) steps — typically 3–5 steps even for billion-row tables (each node holds hundreds of keys). The leaf node contains a pointer to the heap (the actual table), and the engine fetches the row. This is called a "heap fetch."

B-trees support: equality queries (WHERE col = ?), range queries (WHERE col BETWEEN ? AND ?, WHERE col > ?), prefix LIKE queries (WHERE col LIKE 'prefix%'), and sort elimination (ORDER BY col on an indexed column needs no separate sort step, because the B-tree is already in sorted order). They do not support: suffix LIKE queries (WHERE col LIKE '%suffix'), or queries that apply a function to the indexed column (WHERE LOWER(email) = ? won't use a plain index on email; create a functional index on LOWER(email) instead).
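These rules can be checked directly against a query planner. The sketch below uses Python's built-in SQLite engine as a stand-in (its B-tree index rules match PostgreSQL and MySQL here, though its EXPLAIN output format differs); the table and index names are invented for illustration.

```python
import sqlite3

# Hypothetical schema; SQLite as a stand-in for any B-tree-backed SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users(email)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows end with a description of the access path
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Equality on the indexed column: the planner seeks through the B-tree
print(plan("SELECT id FROM users WHERE email = 'alice@example.com'"))

# Wrapping the column in a function defeats the index: full table scan
print(plan("SELECT id FROM users WHERE LOWER(email) = 'alice@example.com'"))
```

The first plan reports an index search; the second falls back to a scan, which is exactly why functional indexes exist.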

Hash indexes

Hash indexes compute a hash of the key and use it to locate the bucket containing the matching entries. They are O(1) for equality lookups, faster than a B-tree's O(log N) for a single equality query. However, hash indexes support neither range queries nor ordering, and before PostgreSQL 10 they were not write-ahead logged (a crash-durability concern; modern versions log them fully). In practice, B-tree indexes are preferred for most use cases: the O(log N) vs O(1) difference is negligible because the bottleneck is disk I/O rather than CPU, and a B-tree handles the range queries a hash index cannot.
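The structural difference is easy to see with in-memory stand-ins: a Python dict models a hash index, a sorted list of keys models B-tree leaf order. This is a toy comparison, not a storage engine; the key format is invented.

```python
import bisect

# A dict models a hash index; a sorted list models B-tree key order.
rows = {f"user{i:05d}@example.com": i for i in range(10_000)}
hash_index = rows           # O(1) equality lookups, no ordering
btree_keys = sorted(rows)   # O(log N) lookups, but keys stay ordered

# Equality: both structures answer it
assert hash_index["user00042@example.com"] == 42
i = bisect.bisect_left(btree_keys, "user00042@example.com")
assert btree_keys[i] == "user00042@example.com"

# Range: only the sorted structure can answer "all keys between lo and hi"
lo, hi = "user00100@example.com", "user00110@example.com"
in_range = btree_keys[bisect.bisect_left(btree_keys, lo):
                      bisect.bisect_right(btree_keys, hi)]
# A hash table has no key order, so the same query would require
# scanning every bucket.
```

The range slice returns the 11 keys between lo and hi inclusive; the dict offers no equivalent without a full scan.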

Redis uses hash tables for its Hash data structure. Cassandra uses a hash-partitioned primary key to determine which node holds each row. These are partitioning hashes, not query-serving indexes — a different concept.

Composite indexes: column order matters

A composite index on (col_a, col_b, col_c) is sorted lexicographically: first by col_a, then by col_b within each col_a group, then by col_c. This structure makes the index usable for queries filtering on leading prefix columns:

Usable: WHERE col_a = ?, WHERE col_a = ? AND col_b = ?, WHERE col_a = ? AND col_b = ? AND col_c = ?. Not usable (without the leading prefix): WHERE col_b = ?, WHERE col_c = ?.

Column ordering rules. Put equality-filter columns before range-filter columns: a range condition on a column stops the index from being used for any column to its right. Among equality columns, put the more selective one (highest cardinality, most distinct values) first. Example: for queries WHERE user_id = ? AND created_at > ?, use index (user_id, created_at). The equality-before-range rule holds even when the equality column has low cardinality: for WHERE status = 'active' AND created_at > ? where status has only a few values, the index (status, created_at) is not selective on status alone, but it lets the engine jump straight to the status = 'active' group and then scan a contiguous range of created_at values within it.
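The leading-prefix rule can be verified against a real planner. Again using SQLite as a stand-in (PostgreSQL and MySQL apply the same prefix rule), with invented table and column names:

```python
import sqlite3

# Hypothetical events table with a composite index on (user_id, created_at).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INT, created_at INT, payload TEXT)")
conn.execute("CREATE INDEX idx_user_time ON events(user_id, created_at)")

def plan(sql):
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Query filters on the leading column plus a range: index is usable
print(plan("SELECT payload FROM events WHERE user_id = 7 AND created_at > 100"))

# Query filters only on the second column: no leading prefix, full scan
print(plan("SELECT payload FROM events WHERE created_at > 100"))
```

The first query seeks into the index; the second cannot use it at all, because the index is sorted by user_id first.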

Covering indexes: eliminate the heap fetch

A "heap fetch" is the extra step where the database follows an index pointer to the actual table row to retrieve columns not in the index. This extra I/O is expensive when rows are not in the buffer cache. A covering index includes all columns the query needs — the database returns results entirely from the index, never touching the table.

Example: SELECT user_id, email, name FROM users WHERE account_status = 'active' AND created_at > '2024-01-01'. Index (account_status, created_at) filters the rows — but the engine still needs to heap-fetch each row to get email and name. A covering index (account_status, created_at) INCLUDE (email, name) (PostgreSQL syntax) stores email and name in the index leaf node. The query is served entirely from the index — no heap fetch.

Covering indexes are especially valuable for high-frequency read paths (login lookups, dashboard queries) where performance matters most. The trade-off: the index is larger on disk, and writes update more index data per row.
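A covering plan is visible in EXPLAIN output. SQLite has no INCLUDE clause, so the sketch below emulates the PostgreSQL example by appending the payload columns to the index key itself (a common technique in engines without INCLUDE; it makes the internal nodes slightly larger than INCLUDE would):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    user_id INTEGER PRIMARY KEY, account_status TEXT,
    created_at TEXT, email TEXT, name TEXT)""")

# No INCLUDE in SQLite: emulate a covering index by adding email and name
# as trailing key columns.
conn.execute("""CREATE INDEX idx_status_time_cover
    ON users(account_status, created_at, email, name)""")

def plan(sql):
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

p = plan("""SELECT email, name FROM users
            WHERE account_status = 'active' AND created_at > '2024-01-01'""")
print(p)  # the plan names a COVERING INDEX: no heap fetch is needed
```

Every column the query touches lives in the index, so the planner reports a covering index search rather than an index search plus row fetch.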

Partial indexes: index only what you query

A partial index indexes only rows matching a condition. Example: CREATE INDEX ON orders(created_at) WHERE status = 'pending'. This index is much smaller than a full index on (status, created_at) across all rows — it only indexes the small fraction of orders that are pending. Queries that filter WHERE status = 'pending' AND created_at > ? use this partial index efficiently, and the index is cheap to maintain (only updated when pending orders are inserted or resolved).

Partial indexes are a powerful tool for tables with high-cardinality status values where most queries target a small active subset (unread notifications, pending jobs, incomplete transactions).
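SQLite supports the same partial-index syntax, which makes the behavior easy to demonstrate; the orders schema below mirrors the example above with invented column values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY, status TEXT, created_at TEXT)""")

# Partial index: only rows with status = 'pending' are indexed at all
conn.execute("""CREATE INDEX idx_pending
    ON orders(created_at) WHERE status = 'pending'""")

def plan(sql):
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# The query's WHERE clause implies the index predicate, so the planner
# can use the (much smaller) partial index
p = plan("""SELECT id FROM orders
            WHERE status = 'pending' AND created_at > '2024-01-01'""")
print(p)
```

Queries that do not pin status to 'pending' cannot use this index, which is the intended trade: a tiny, cheap index that serves only the hot query.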

When indexes hurt: the write overhead

Every write (INSERT, UPDATE, DELETE) must update every index on the affected table, so each index adds roughly one extra index-page write per modified row: a table with 10 indexes incurs roughly 10× the write I/O of the same table with none. For write-heavy systems (Cassandra-level write rates), this overhead is why Cassandra uses a completely different storage model (append-only LSM tree, with no B-tree indexes on non-primary-key columns by default).

In a SQL database: audit indexes regularly. Remove indexes not being used by any active query (PostgreSQL's pg_stat_user_indexes shows index scan counts). For bulk data loads, consider dropping non-essential indexes, performing the load, then rebuilding — this is often 10× faster than updating indexes row-by-row during insertion.
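The overhead is measurable even in an in-memory engine. The sketch below loads identical rows into two invented tables, one plain and one carrying three single-column indexes; absolute timings depend on the machine, but the indexed load is consistently slower.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plain (a INT, b INT, c INT)")
conn.execute("CREATE TABLE indexed (a INT, b INT, c INT)")
for col in ("a", "b", "c"):
    conn.execute(f"CREATE INDEX idx_{col} ON indexed({col})")

rows = [(i, i * 2, i * 3) for i in range(50_000)]

def timed_load(table):
    # Bulk-insert all rows in one transaction and time it
    start = time.perf_counter()
    with conn:
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    return time.perf_counter() - start

t_plain = timed_load("plain")
t_indexed = timed_load("indexed")
print(f"no indexes: {t_plain:.3f}s, three indexes: {t_indexed:.3f}s")
```

This is also the drop-load-rebuild argument in miniature: building the three indexes once after loading 50,000 rows costs far less than maintaining them across 50,000 individual inserts.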

Index selectivity: why low-cardinality indexes are often useless

Index selectivity is the ratio of distinct values to total rows. High selectivity = index is useful (unique index: 1.0 selectivity). Low selectivity = planner may ignore the index. Example: index on a boolean column is_deleted with 95% false, 5% true. A query WHERE is_deleted = false returns 95% of rows — a sequential scan is cheaper than an index scan that fetches 95% of rows via random I/O. The query planner knows this and will skip the index. Rule: only create indexes on columns with selectivity >5% for typical queries (unless combined with a high-selectivity leading column in a composite index).
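The selectivity ratio itself is a one-line computation. The sketch below builds the is_deleted example from above (a made-up users table where 5% of rows are deleted) and compares the boolean column against a unique one:

```python
import sqlite3

# Selectivity = distinct values / total rows (the planner-style estimate)
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    id INTEGER PRIMARY KEY, is_deleted INT, email TEXT)""")
conn.executemany(
    "INSERT INTO users (is_deleted, email) VALUES (?, ?)",
    [(1 if i % 20 == 0 else 0, f"u{i}@example.com") for i in range(1000)],
)

def selectivity(col):
    distinct, = conn.execute(f"SELECT COUNT(DISTINCT {col}) FROM users").fetchone()
    total, = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    return distinct / total

print(selectivity("is_deleted"))  # 0.002: two distinct values over 1,000 rows
print(selectivity("email"))       # 1.0: every value unique, maximally selective
```

A planner with statistics sees the 0.002 figure and concludes that an index on is_deleted rarely pays for its random I/O; the unique email column is the ideal index candidate.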

Frequently asked questions

What is a B-tree index and how does it work?
Sorted tree with O(log N) lookup. Leaf nodes hold keys and row pointers. Supports equality, range, and prefix queries. 3–5 levels even for billion-row tables. Default index type in PostgreSQL and MySQL.

What is a covering index?
An index that includes all columns needed by a query — eliminates the heap fetch step. Created with INCLUDE clause in PostgreSQL. Best for high-frequency read paths. Trade-off: larger index, more write overhead.

How do composite indexes work and what order should columns be in?
Sorted by leading columns first. Usable only when query filters on a leading prefix. Put highest-cardinality columns first; put equality conditions before range conditions.

When do indexes hurt performance?
Every index adds write overhead per row change. Audit and remove unused indexes. Low-selectivity indexes are often ignored by the query planner. Bulk loads: drop non-essential indexes before loading, rebuild after.
