General

Database Indexing and Query Optimization: The Complete Guide

Learn how database indexes turn slow queries into fast ones, common index types, query optimization techniques, anti-patterns to avoid, and a practical workflow to diagnose and fix performance bottlenecks.

June 2026 · 9 min read · 1 views · 0 hearts

Try in editor Tutorial catalog

The Complete Guide to Database Indexing and Query Optimization

If you've ever waited more than a few seconds for a database query to return, you already know the pain. Raw speed isn't just nice to have — it's the difference between a snappy app and one users abandon. The good news? Understanding indexing and query optimization can turn a 10-second query into a 10-millisecond one.

Let's cut through the theory and get into what actually matters.

What Exactly Is a Database Index?

Think of a database index like the index at the back of a textbook. Without it, you'd have to flip through every single page to find "Python" — that's a full table scan. With an index, you jump straight to the relevant pages.

In database terms, an index is a separate data structure (usually a B-tree or hash table) that stores a sorted copy of selected columns along with pointers to the actual rows. The database can search this structure in logarithmic time instead of linear.

Why indexes are deceptively powerful: They turn O(n) operations into O(log n) or even O(1). For a table with a million rows, that's the difference between a million comparisons and maybe 20.

The Most Common Index Types and When to Use Them

B-Tree Indexes

This is the default in virtually every relational database (PostgreSQL, MySQL, SQLite). Works for: - Equality searches (WHERE user_id = 42) - Range queries (WHERE created_at BETWEEN '2024-01-01' AND '2024-06-01') - Sorting (ORDER BY created_at) - Prefix matching on strings (WHERE name LIKE 'Joh%')

Avoid for: Full-text searches or wildcard patterns starting with %.

Hash Indexes

Ultra-fast for exact matches only
No ordering, no range support
Great for primary key lookups in in-memory databases (Redis, some PostgreSQL use)

GIN and GiST Indexes (PostgreSQL)

GIN for array columns, JSONB, full-text search
GiST for geometric data, range types, and similarity searches
Overkill for simple columns but essential for specialized data

Partial Indexes

This is a cheat code many developers overlook. A partial index only indexes a subset of rows:

CREATE INDEX idx_active_users ON users (email) WHERE is_active = true;

If 90% of your searches are for active users, this index is tiny and blazing fast compared to indexing the whole table.

Covering Indexes

An index that contains ALL columns your query needs. The database never touches the actual table — it reads everything from the index alone.

CREATE INDEX idx_order_lookup ON orders (customer_id, order_date, total_amount);

If your query only selects customer_id, order_date, and total_amount, this index covers it completely. No table access needed.

How Query Optimization Actually Works

Indexes are half the battle. The other half is writing queries the database can use effectively.

The Magic of EXPLAIN

Before optimizing anything, always run EXPLAIN (or EXPLAIN ANALYZE). Every serious database has it.

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'someone@example.com';

The output will show you: - Whether it's doing a sequential scan (bad) or index scan (good) - Estimated row counts vs actual — big discrepancies signal outdated statistics - Where the query is spending time

Pro tip: Run ANALYZE regularly on production tables so the query planner has accurate statistics. PostgreSQL even has autovacuum for this.

Common Query Anti-Patterns That Kill Performance

1. SELECT * on large tables You're dragging every column into memory. Only select what you need.

2. Functions on indexed columns

-- This ignores the index on created_at
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-01';

-- This uses the index
SELECT * FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2024-01-02';

Functions wrap columns, forcing full scans.

3. Non-sargable conditions Anything that makes the database unable to use an index: WHERE LOWER(email) = 'x', patterns with leading wildcards like %search%, or WHERE column IS NULL (often, but depends on database).

4. N+1 queries in application code

# Bad: 1 query to get users, then 100 queries to get orders
users = User.all()
for user in users:
    orders = Order.filter(user_id=user.id)

Use JOINs or batch loading (like Django's select_related and prefetch_related).

Real-World Performance Breakdown

Here's a concrete example. A table with 5 million order rows:

Query	Without Index	With Composite Index
`WHERE customer_id = 123 AND created_at > '2024-01-01'`	3.2 seconds	4 ms
`WHERE status = 'pending' ORDER BY created_at DESC LIMIT 10`	2.8 seconds	6 ms
Full-table aggregation (`COUNT(*)`)	1.1 seconds	1.1 seconds (no help)

The third row is important: indexes help lookups, not always aggregations. Know when they matter.

When Indexes Hurt

Indexes aren't free. Every index: - Slows down writes (INSERT, UPDATE, DELETE) because the database must update the index too - Consumes disk space — a compound index on several wide columns can be larger than the table itself - Can confuse the query planner if over-engineered

Rule of thumb: Create indexes based on actual query patterns, not theoretical ones. Monitor slow query logs and optimize the real bottlenecks.

A Practical Optimization Workflow

Enable slow query logging — your database logs queries slower than 100ms.
Identify the worst offenders — highest frequency or highest total time.
Run EXPLAIN ANALYZE — understand what's happening.
Add one index — don't throw ten at once.
Test under load — query latency drops? Write throughput okay?
Verify index usage — use pg_stat_user_indexes or MySQL's sys.schema_unused_indexes to remove unused ones.

The One Optimization That Beats All Indexes

Sometimes the best "index" isn't an index at all — it's changing the data model. Denormalization, materialized views, or precomputed summaries can reduce the need for complex joins.

For example, if you frequently count active users per day, store that count in a separate table updated by a cron job or trigger. No amount of indexing on a raw events table will beat reading a single row.

Final advice: Production data is messy. Query plans depend on data distribution, not just schema. Trust EXPLAIN ANALYZE over intuition. And never add an index to "see what happens" — that's how you end up with 40 unused indexes and write performance in the gutter.

Comments

Questions, corrections, and tips stay visible for everyone reading this page.

0 in thread

Join the discussion

No comments yet

Be the first to leave a note — it helps the next reader.