Database indexing is one of the highest-leverage optimizations available to developers, yet it remains poorly understood by many. A well-placed index can reduce query execution time by orders of magnitude. A poorly chosen index wastes storage, slows writes, and gives a false sense of optimization. Understanding how indexes work and when to apply different strategies separates performant applications from sluggish ones.
How Indexes Work Under the Hood
At the most basic level, a database index is a separate data structure that maintains a sorted or hashed reference to rows in a table. Without an index, the database must perform a full table scan, reading every row to find matches. With an index, the database can jump directly to relevant rows, much like using a book's index to find a topic instead of reading every page.
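The most common index type is the B-tree. B-tree indexes store keys in a sorted, balanced, hierarchical structure that allows the database to find any value in logarithmic time. For a table with one million rows, a B-tree index can locate a specific key in roughly 20 comparisons (log2 of one million is about 20) rather than up to one million. Because each index page typically holds hundreds of keys, those comparisons are spread across only three or four page reads.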
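To make the logarithmic claim concrete, the Python sketch below counts probes for a linear scan versus a binary search over one million sorted keys. Binary search stands in for the sorted-comparison idea only. A real B-tree groups those comparisons into wide pages, and the fanout of 300 keys per page used by the hypothetical btree_levels helper is an illustrative assumption, not a property of any particular database.

```python
import math


def linear_search(keys, target):
    """Full-scan analogue: examine keys one by one until the target is found."""
    probes = 0
    for position, key in enumerate(keys):
        probes += 1
        if key == target:
            return position, probes
    return -1, probes


def binary_search(keys, target):
    """Sorted-index analogue: halve the candidate range on every probe."""
    lo, hi = 0, len(keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] == target:
            return mid, probes
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


def btree_levels(n_keys, fanout):
    """Tree levels needed if every page holds `fanout` entries (assumed value)."""
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


keys = list(range(1_000_000))
_, scan_probes = linear_search(keys, 999_999)
_, index_probes = binary_search(keys, 999_999)
print(f"full scan: {scan_probes} probes")
print(f"binary search: {index_probes} probes "
      f"(log2 bound: {math.ceil(math.log2(len(keys)))})")
print(f"B-tree levels with fanout 300: {btree_levels(len(keys), 300)}")
```

The full scan touches every key, while the sorted search needs at most 20 probes, and with a few hundred keys per page the whole tree is only three levels deep.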
When you create a primary key in PostgreSQL, MySQL, or most relational databases, a B-tree index is automatically created. But primary key indexes alone rarely satisfy the query patterns of a real application.
Single-Column Indexes
The simplest indexing strategy targets individual columns that appear frequently in WHERE clauses, JOIN conditions, or ORDER BY expressions. If your application frequently queries users by email address, an index on the email column eliminates full table scans for those queries.
Creating single-column indexes is straightforward, but choosing which columns to index requires analysis. Start by identifying your slowest and most frequently executed queries. Most databases provide tools for this: PostgreSQL has the pg_stat_statements extension (and the log_min_duration_statement setting for logging slow statements), MySQL has the slow query log, and cloud databases typically surface the same information through dashboards.
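Once a candidate column is identified, the before-and-after check can be scripted. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL or MySQL, where you would run EXPLAIN on the same query. The users table, its columns, and the idx_users_email name are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); detail describes the access path.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(10_000)],
)

lookup = "SELECT id, name FROM users WHERE email = ?"
before = query_plan(conn, lookup, ("user42@example.com",))  # no index yet: full scan

conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, lookup, ("user42@example.com",))  # should now seek the index

print(before)
print(after)
```

The first plan reports a SCAN of the whole table; after the index exists, the plan switches to a SEARCH using idx_users_email.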
Not every column benefits from indexing. Columns with low cardinality, meaning they contain few distinct values relative to the total row count, produce indexes that the query planner may ignore entirely. A boolean is_active column on a million-row table has only two distinct values. If those values are evenly split, either lookup matches half the table, and fetching half a million rows through the index, with scattered page reads for the matches, is typically slower than reading the table sequentially, so the planner chooses the full scan.
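A quick way to reason about this is to compute each column's cardinality ratio and the share of rows an equality filter would return. The column_stats helper below is a hypothetical plain-Python illustration, not a planner formula. Real planners rely on their own statistics (PostgreSQL exposes them in the pg_stats view) and cost models.

```python
from collections import Counter


def column_stats(values):
    """Summarize how selective an equality filter on this column can be."""
    total = len(values)
    counts = Counter(values)
    _, most_common_count = counts.most_common(1)[0]
    return {
        "distinct": len(counts),
        # Distinct values relative to row count: close to 1.0 is highly selective.
        "cardinality_ratio": len(counts) / total,
        # Share of the table returned by an equality lookup on the commonest value.
        "worst_case_fraction": most_common_count / total,
    }


rows = 1_000_000
is_active = [i % 2 == 0 for i in range(rows)]  # hypothetical even split
emails = [f"user{i}@example.com" for i in range(rows)]  # unique per row

print(column_stats(is_active))
print(column_stats(emails))
```

An equality lookup on is_active can return half the table, while one on email returns a single row, which is why the email index is useful and the boolean index usually is not.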
Composite Indexes
Real-world queries rarely filter on a single column. A query searching for orders placed by a specific customer within a date range touches two columns. A composite index (also called a multi-column index) covers both columns in a single index structure.
Column order in composite indexes matters critically. The index is sorted by the first column, then by the second column within each group of the first, and so on. This means a composite index on (customer_id, created_at) efficiently supports queries filtering by customer_id alone or by both customer_id and created_at. However, it does not efficiently support queries filtering only by created_at.
This principle is called the leftmost prefix rule. A composite index on columns (A, B, C) supports queries on (A), (A, B), and (A, B, C), but not on (B), (C), or (B, C) alone. A query on (A, C) can use the index only to narrow the search by A; the condition on C is then checked against each matching entry. Understanding this rule prevents creating redundant indexes and helps you design indexes that serve multiple query patterns.
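The rule is easy to confirm empirically. In this SQLite sketch (again standing in for a server database, with hypothetical table and index names), the planner can seek into the (customer_id, created_at) index for the first two queries but falls back to a scan when only created_at is filtered. Some engines, including MySQL 8.0 and SQLite once ANALYZE has gathered statistics, can perform a skip scan over a low-cardinality leading column, so treat the prefix rule as the default behavior rather than an absolute.

```python
import sqlite3


def uses_index_search(conn, sql):
    """True if SQLite plans an index SEARCH (a seek) rather than a SCAN."""
    details = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return any(d.startswith("SEARCH") for d in details)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "created_at TEXT, total REAL)"
)
conn.execute("CREATE INDEX idx_orders_cust_date ON orders (customer_id, created_at)")

queries = {
    "customer_id only": "SELECT * FROM orders WHERE customer_id = 42",
    "customer_id + created_at": (
        "SELECT * FROM orders WHERE customer_id = 42 "
        "AND created_at >= '2024-01-01'"
    ),
    # created_at is not a leftmost prefix of the index, so no seek is possible.
    "created_at only": "SELECT * FROM orders WHERE created_at >= '2024-01-01'",
}
for label, sql in queries.items():
    print(f"{label:26} index seek: {uses_index_search(conn, sql)}")
```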
Designing Effective Composite Indexes
- Place equality conditions before range conditions. An index on (status, created_at) is more effective than (created_at, status) when status uses equality and created_at uses a range.
- Consider the most common query patterns together. One well-designed composite index often replaces two or three single-column indexes.
- Use EXPLAIN to verify the database actually uses the index you created.
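The equality-before-range guideline follows directly from sort order. The plain-Python sketch below models each index as a sorted list of key tuples, a simplification of a B-tree's leaf level, over hypothetical data (three statuses, 365 days, ten orders per status per day). It counts how many entries a range scan must walk to find pending orders placed in March 2024.

```python
import bisect
from datetime import date, timedelta

statuses = ["cancelled", "completed", "pending"]
days = [date(2024, 1, 1) + timedelta(days=d) for d in range(365)]
rows = [(s, day.isoformat()) for s in statuses for day in days for _ in range(10)]


def walk_range(index, low, high):
    """Return the entries a range scan visits: low <= entry < high."""
    return index[bisect.bisect_left(index, low):bisect.bisect_left(index, high)]


# Index on (status, created_at): pending March orders form one contiguous run.
status_first = sorted((status, day) for status, day in rows)
a = walk_range(status_first, ("pending", "2024-03-01"), ("pending", "2024-04-01"))

# Index on (created_at, status): the March range interleaves every status.
date_first = sorted((day, status) for status, day in rows)
b = walk_range(date_first, ("2024-03-01",), ("2024-04-01",))

print("status first:", len(a), "walked,", sum(1 for s, _ in a if s == "pending"), "match")
print("date first:  ", len(b), "walked,", sum(1 for _, s in b if s == "pending"), "match")
```

With status first, every visited entry matches. With created_at first, the scan also walks past the completed and cancelled orders inside the date range and must filter them out, tripling the work in this example.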
Covering Indexes
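A covering index includes all columns that a query needs, eliminating the need to read the actual table rows. When the database can satisfy a query entirely from the index, it performs an index-only scan, which is significantly faster because it avoids random I/O to the main table. In PostgreSQL, this benefit depends on the visibility map: rows on pages that VACUUM has not yet marked all-visible still require a table lookup.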
In PostgreSQL 11 and later, you can create covering indexes with the INCLUDE clause, which stores extra columns in the index without making them part of the search key. In MySQL, appending the extra columns to the end of a composite index achieves the same effect. For queries that run thousands of times per second, covering indexes can cut response times dramatically.
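SQLite has no INCLUDE clause, so this sketch uses the MySQL-style approach of appending the extra column to the index key; in PostgreSQL the equivalent would be an index on (customer_id, created_at) INCLUDE (total). Table and index names are hypothetical. The planner reports a covering index only when every selected column lives in the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "created_at TEXT, total REAL, notes TEXT)"
)
# total is appended to the key so the index alone can answer the first query.
conn.execute(
    "CREATE INDEX idx_orders_covering ON orders (customer_id, created_at, total)"
)


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Every selected column is in the index: index-only access.
covered = plan("SELECT created_at, total FROM orders WHERE customer_id = 42")
# notes is not in the index, so each match still needs a table lookup.
not_covered = plan("SELECT created_at, notes FROM orders WHERE customer_id = 42")
print(covered)
print(not_covered)
```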
The trade-off is index size. Every column added to an index increases storage requirements and slows write operations. Covering indexes work best for read-heavy queries on relatively stable data.
Partial Indexes
Not every row needs to be indexed. A partial index (called a filtered index in SQL Server) indexes only rows that match a specified condition. If you frequently query active users but rarely query deactivated ones, a partial index on email WHERE is_active = true is smaller, faster to maintain, and faster to query than a full index.
Partial indexes are particularly valuable for tables with skewed data distributions. An orders table where 95% of orders are completed and only 5% are pending benefits enormously from a partial index on pending orders. Because the index holds only the pending 5% of rows, it is roughly twenty times smaller than a full index, and it only needs updating when orders enter or leave the pending state.
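The sketch below builds such an index in SQLite, whose CREATE INDEX ... WHERE syntax matches PostgreSQL's. The data is hypothetical (10,000 orders, one in twenty pending), and the index name is illustrative. The planner can use the partial index only when the query's WHERE clause implies the index predicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
)
# Hypothetical skewed data: every 20th order (5%) is pending.
conn.executemany(
    "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
    [(i % 500, "pending" if i % 20 == 0 else "completed") for i in range(10_000)],
)
# Only rows satisfying the predicate receive index entries.
conn.execute(
    "CREATE INDEX idx_orders_pending ON orders (customer_id) WHERE status = 'pending'"
)


def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


indexed_rows = conn.execute(
    "SELECT count(*) FROM orders WHERE status = 'pending'"
).fetchone()[0]
# The query predicate implies the index predicate, so the index is usable.
pending_plan = plan("SELECT id FROM orders WHERE status = 'pending' AND customer_id = 7")
# Completed orders are not in the index, so this query must scan the table.
completed_plan = plan(
    "SELECT id FROM orders WHERE status = 'completed' AND customer_id = 7"
)
print(indexed_rows, pending_plan, completed_plan)
```

The index holds 500 entries instead of 10,000, and only the pending-order query can use it.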
Hash Indexes
B-tree indexes support equality checks, range queries, and sorting. If you only need equality lookups, hash indexes can be more efficient. Hash indexes apply a hash function to each value to select a bucket that stores the matching row locations, giving average-case O(1) lookups for exact matches.
In PostgreSQL, hash indexes have been fully WAL-logged and crash-safe since version 10, making them production-ready. MySQL's MEMORY engine supports hash indexes natively. However, hash indexes cannot support range queries, ordering, or partial matches, limiting their applicability.
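Both the speed and the limitation come from discarding order. The HashIndex class below is a hypothetical toy, not how any database lays out its index; a real implementation such as PostgreSQL's manages buckets on disk pages and stores hash codes rather than full values. Still, it shows the essential behavior: an equality lookup hashes straight to one bucket, but a range query must examine every entry because neighboring values land in unrelated buckets.

```python
from collections import defaultdict


class HashIndex:
    """Toy hash index: buckets keyed by hash(value) % n_buckets, each holding
    (value, row_id) pairs."""

    def __init__(self, n_buckets=1024):
        self.n_buckets = n_buckets
        self.buckets = defaultdict(list)

    def insert(self, value, row_id):
        self.buckets[hash(value) % self.n_buckets].append((value, row_id))

    def lookup(self, value):
        # Only one bucket is inspected; collisions are resolved by comparing values.
        bucket = self.buckets.get(hash(value) % self.n_buckets, [])
        return [row_id for v, row_id in bucket if v == value]

    def range_lookup(self, low, high):
        # Hashing destroys ordering, so a range query must inspect every bucket.
        return sorted(
            row_id
            for bucket in self.buckets.values()
            for v, row_id in bucket
            if low <= v <= high
        )


idx = HashIndex()
for row_id in range(10_000):
    idx.insert(row_id * 3, row_id)  # hypothetical column values: multiples of 3

print(idx.lookup(300))
print(idx.range_lookup(0, 9))
```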
Full-Text and Specialized Indexes
Standard B-tree indexes cannot efficiently search text content. Full-text indexes (GIN indexes in PostgreSQL, FULLTEXT indexes in MySQL) tokenize text content and build inverted indexes that support natural language search. For applications with search functionality, full-text indexes eliminate the need for external search engines in many cases.
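The core idea of an inverted index fits in a few lines of Python. This sketch is a simplification with a crude tokenizer and made-up documents. Real full-text engines also apply stemming, stop-word removal, and positional information; in PostgreSQL, to_tsvector performs those steps before a GIN index stores the resulting lexemes.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents containing every query token (AND semantics)."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "Composite indexes and the leftmost prefix rule",
    2: "Partial indexes for skewed data",
    3: "Rebuilding bloated indexes concurrently",
}
index = build_inverted_index(docs)
print(sorted(search_all(index, "partial indexes")))
```

A search intersects the posting sets of its terms instead of scanning every document's text.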
Spatial data in PostgreSQL typically uses GiST indexes (PostGIS geometry columns, for example), and jsonb columns benefit from GIN indexes. Each specialized index type is optimized for specific data access patterns that B-trees handle poorly.
Index Maintenance and Monitoring
Indexes are not set-and-forget. They require ongoing attention to remain effective.
Detecting Unused Indexes
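Every index consumes storage and slows write operations. Indexes that no queries use are pure overhead. PostgreSQL's pg_stat_user_indexes view records how many times each index has been scanned in its idx_scan column. Indexes with zero or near-zero scans over a meaningful period are candidates for removal, with two caveats: an index that backs a primary key or unique constraint enforces that constraint even if no query ever scans it, and the statistics are kept per server, so scans on read replicas do not appear in the primary's counters.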
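One way to apply those caveats is to fetch usage alongside constraint flags and filter in code. The SQL string below joins the real pg_stat_user_indexes and pg_index catalogs and can be run with any PostgreSQL driver. The IndexUsage type, the drop_candidates helper, its max_scans threshold, and the sample rows are hypothetical, made up purely for illustration.

```python
from typing import NamedTuple

# Catalog query for PostgreSQL; joining pg_index exposes whether each index
# backs a unique or primary key constraint.
UNUSED_INDEX_SQL = """
SELECT s.indexrelname, s.idx_scan, i.indisunique, i.indisprimary
FROM pg_stat_user_indexes AS s
JOIN pg_index AS i ON i.indexrelid = s.indexrelid
ORDER BY s.idx_scan, s.indexrelname;
"""


class IndexUsage(NamedTuple):
    name: str
    idx_scan: int
    is_unique: bool
    is_primary: bool


def drop_candidates(rows, max_scans=0):
    """Names of indexes scanned at most `max_scans` times that do not enforce
    a primary key or uniqueness (dropping those would remove the guarantee)."""
    return [
        r.name
        for r in rows
        if r.idx_scan <= max_scans and not (r.is_unique or r.is_primary)
    ]


# Made-up sample rows in the shape the query above returns.
sample = [
    IndexUsage("orders_pkey", 0, True, True),
    IndexUsage("idx_orders_legacy_status", 0, False, False),
    IndexUsage("idx_orders_customer_date", 48_210, False, False),
]
print(drop_candidates(sample))
```

The unscanned primary key index is kept because it enforces a constraint; only the unscanned, non-unique index is flagged.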
Monitoring Index Bloat
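B-tree indexes in PostgreSQL can become bloated over time as rows are updated and deleted. VACUUM removes dead index entries and makes their space reusable, but it rarely shrinks the index, so sparsely filled pages can accumulate until the index is rebuilt. Monitoring index size relative to table size helps identify bloated indexes, and the pgstattuple extension's pgstatindex function reports detailed statistics such as average leaf density.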
Rebuilding Indexes
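When an index becomes significantly bloated, rebuilding it reclaims space and restores performance. PostgreSQL's REINDEX CONCURRENTLY (available since version 12) rebuilds an index without blocking reads or writes on the table, making it practical to run in production. It takes longer than a plain REINDEX, and if it fails it can leave behind an invalid index that must be dropped manually.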
Common Indexing Mistakes
- Over-indexing: Creating indexes on every column slows writes and wastes storage. Index only columns that queries actually filter, sort, or join on.
- Ignoring the query planner: Always verify with EXPLAIN that the database uses your index. The planner may choose a different strategy based on data distribution.
- Forgetting about writes: Every index adds overhead to INSERT, UPDATE, and DELETE operations. High-write tables need fewer, more targeted indexes.
- Indexing low-cardinality columns: Boolean columns and status fields with few distinct values rarely benefit from standalone indexes.
Database indexing is both a science and a craft. Start with your slowest queries, understand the access patterns, create targeted indexes, and measure the results. The payoff in application performance is almost always worth the investment in understanding how your database uses indexes.