Indexing and Performance

75 min

Proper indexing is fundamental to PostgreSQL performance, enabling the database to quickly locate rows without scanning entire tables. Indexes are data structures that provide fast access paths to data, similar to an index in a book. Without indexes, queries require sequential table scans, which become prohibitively slow as tables grow. Understanding index types and when to use them is essential for building performant database applications.

B-tree indexes are PostgreSQL's default index type and are suitable for most use cases. B-trees support equality and range queries efficiently, making them ideal for columns used in WHERE clauses, JOIN conditions, and ORDER BY clauses. B-tree indexes work well with most data types and support multi-column indexes. They're automatically used for PRIMARY KEY and UNIQUE constraints. B-tree indexes are the workhorse of PostgreSQL indexing.

Hash indexes are faster than B-trees for equality comparisons but don't support range queries or sorting. Hash indexes are smaller than B-trees but have limitations: they're not WAL-logged (can't be used in replication), don't support unique constraints, and are only useful for exact matches. Hash indexes are best for columns with high cardinality used only in equality comparisons.

GIN (Generalized Inverted Index) indexes are designed for composite values like arrays, full-text search vectors, and JSONB. GIN indexes enable fast searches for values contained within composite types. For example, finding all rows where an array contains a specific value or where JSONB contains a specific key-value pair. GIN indexes are larger than B-trees but enable queries that would be impossible or very slow without them.

GiST (Generalized Search Tree) indexes support custom data types and operators, making them essential for geometric data, full-text search, and other specialized types. GiST indexes enable efficient searches for spatial data (finding points within regions), text search, and range types. PostGIS uses GiST indexes extensively for spatial queries. Understanding when to use GiST vs GIN depends on the data type and query patterns.

The EXPLAIN command shows how PostgreSQL plans to execute a query, revealing which indexes will be used, join strategies, and estimated costs. EXPLAIN ANALYZE actually executes the query and shows actual execution times and row counts. Understanding EXPLAIN output helps identify missing indexes, inefficient query plans, and optimization opportunities. Regular query analysis is essential for maintaining database performance as data grows.

Key Concepts

Indexes provide fast access paths to data, avoiding full table scans.
B-tree indexes are default and support most query types.
GIN indexes are essential for arrays, JSONB, and full-text search.
GiST indexes support custom types like geometric and spatial data.
EXPLAIN shows query execution plans and helps identify optimization opportunities.

Learning Objectives

Master

Creating appropriate indexes for different data types and query patterns
Understanding when to use B-tree, GIN, and GiST indexes
Using EXPLAIN to analyze query performance
Identifying and fixing performance bottlenecks

Develop

Understanding database performance optimization
Designing efficient database schemas
Analyzing and improving query performance

Tips

Index columns used in WHERE, JOIN, and ORDER BY clauses.
Use EXPLAIN ANALYZE to see actual execution times, not just estimates.
Create composite indexes for multi-column queries: CREATE INDEX ON table (col1, col2);
Monitor index usage: SELECT * FROM pg_stat_user_indexes; to find unused indexes.

Common Pitfalls

Creating too many indexes, slowing down INSERT/UPDATE operations.
Not indexing frequently queried columns, causing slow queries.
Using wrong index type, missing performance benefits.
Not analyzing queries, missing optimization opportunities.

Summary

Proper indexing is crucial for database performance.
Different index types suit different data types and query patterns.
EXPLAIN helps identify performance bottlenecks and optimization opportunities.
Balancing index benefits with update performance is important.

Exercise

Create different types of indexes and analyze query performance.

-- Create a large table for testing
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER,
    order_date DATE,
    total_amount DECIMAL(10,2),
    status VARCHAR(20),
    product_categories TEXT[]
);

-- Insert sample data (simplified)
INSERT INTO orders (customer_id, order_date, total_amount, status, product_categories)
SELECT 
    (random() * 1000)::int,
    CURRENT_DATE - (random() * 365)::int,
    (random() * 1000)::decimal(10,2),
    CASE (random() * 3)::int 
        WHEN 0 THEN 'pending'
        WHEN 1 THEN 'shipped'
        ELSE 'delivered'
    END,
    ARRAY['electronics', 'books', 'clothing']
FROM generate_series(1, 10000);

-- Create different types of indexes
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_date ON orders(order_date);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_amount ON orders(total_amount);

-- Analyze query performance
EXPLAIN (ANALYZE, BUFFERS) 
SELECT customer_id, COUNT(*), SUM(total_amount)
FROM orders 
WHERE order_date >= '2024-01-01' 
  AND status = 'delivered'
GROUP BY customer_id
HAVING SUM(total_amount) > 500
ORDER BY SUM(total_amount) DESC;

Exercise Tips

Compare index types: CREATE INDEX USING btree, hash, gin, gist.
Use partial indexes: CREATE INDEX ON table (col) WHERE condition; for filtered indexes.
Analyze table after creating indexes: ANALYZE table; to update statistics.
Check index usage: EXPLAIN shows which indexes are used in query plans.

Indexing and Performance