Back to Curriculum

Partitioning

📚 Lesson 9 of 15 ⏱️ 70 min

Partitioning

70 min

Table partitioning divides large tables into smaller, more manageable pieces called partitions. Each partition is stored separately but can be queried as if it were a single table. Partitioning improves query performance by enabling partition pruning (skipping irrelevant partitions), makes maintenance operations faster, and helps manage very large tables. PostgreSQL supports declarative partitioning (native partitioning) introduced in version 10, making partitioning easier to use.

Range partitioning divides data based on ranges of values, commonly used for time-series data partitioned by date ranges. For example, you might partition a sales table by month, with each partition containing one month's data. Range partitioning is ideal for data with natural ordering and when queries typically filter by the partition key. PostgreSQL automatically prunes partitions that don't contain relevant data, significantly improving query performance.

List partitioning divides data based on specific values, useful when you have discrete categories. For example, you might partition a customer table by region, with partitions for 'North', 'South', 'East', and 'West'. List partitioning is ideal when data naturally falls into distinct categories and queries filter by these categories. Each partition contains rows matching specific values in the partition key.

Hash partitioning distributes data across partitions using a hash function on the partition key. Hash partitioning ensures relatively even distribution of data across partitions, making it useful when you don't have natural ranges or categories. Hash partitioning is less common than range or list partitioning but is useful for distributing load evenly across partitions. The number of partitions is specified when creating the partitioned table.

Partition pruning is a key performance benefit of partitioning. When you query a partitioned table with a WHERE clause on the partition key, PostgreSQL automatically excludes partitions that can't contain matching rows. This partition elimination happens at query planning time, so queries only scan relevant partitions. For large tables with good partition key usage in queries, this can provide dramatic performance improvements.

Partitioning considerations include choosing appropriate partition keys (columns used in WHERE clauses), determining partition size (too many small partitions or too few large partitions both have drawbacks), and managing partition maintenance (adding new partitions, removing old ones). Partitioning adds complexity to schema design and maintenance, so it should be used when tables are large enough to benefit. Understanding when and how to partition enables you to manage very large datasets effectively.

Key Concepts

  • Partitioning divides large tables into smaller, manageable partitions.
  • Range partitioning divides data by value ranges (commonly dates).
  • List partitioning divides data by specific discrete values.
  • Hash partitioning distributes data evenly using hash functions.
  • Partition pruning automatically skips irrelevant partitions in queries.

Learning Objectives

Master

  • Creating range, list, and hash partitioned tables
  • Understanding partition pruning and performance benefits
  • Managing partitions (adding, removing, maintaining)
  • Choosing appropriate partition keys and strategies

Develop

  • Understanding large-scale database management
  • Designing partitioned schemas for performance
  • Optimizing queries on partitioned tables

Tips

  • Partition by columns frequently used in WHERE clauses for partition pruning.
  • Use range partitioning for time-series data: PARTITION BY RANGE (date_column).
  • Create partitions in advance for time-series data to avoid errors.
  • Use ALTER TABLE ... ATTACH PARTITION to add existing tables as partitions.

Common Pitfalls

  • Creating too many partitions, causing planning overhead.
  • Not partitioning by query filter columns, missing partition pruning benefits.
  • Forgetting to create partitions, causing INSERT failures.
  • Not maintaining partitions, accumulating old data unnecessarily.

Summary

  • Partitioning improves performance and manageability of large tables.
  • Range, list, and hash partitioning suit different use cases.
  • Partition pruning automatically optimizes queries on partitioned tables.
  • Proper partition management is essential for partitioned table success.

Exercise

Create a partitioned table for time-series data.

-- Create a partitioned table for sales data
CREATE TABLE sales_partitioned (
    id SERIAL,
    sale_date DATE NOT NULL,
    product_name VARCHAR(100),
    amount DECIMAL(10,2),
    region VARCHAR(50)
) PARTITION BY RANGE (sale_date);

-- Create partitions for different months
CREATE TABLE sales_2024_01 PARTITION OF sales_partitioned
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE sales_2024_02 PARTITION OF sales_partitioned
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

CREATE TABLE sales_2024_03 PARTITION OF sales_partitioned
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

-- Insert data into partitions
INSERT INTO sales_partitioned (sale_date, product_name, amount, region) VALUES
    ('2024-01-15', 'Laptop', 1200.00, 'North'),
    ('2024-02-20', 'Mouse', 25.00, 'South'),
    ('2024-03-10', 'Keyboard', 75.00, 'East');

-- Query partitioned table
SELECT 
    DATE_TRUNC('month', sale_date) as month,
    COUNT(*) as total_sales,
    SUM(amount) as total_amount
FROM sales_partitioned
WHERE sale_date >= '2024-01-01' AND sale_date < '2024-04-01'
GROUP BY DATE_TRUNC('month', sale_date)
ORDER BY month;

Exercise Tips

  • Use DEFAULT partition for values that don't match any partition.
  • Monitor partition sizes: SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(...)).
  • Use pg_partitions view to see partition structure.
  • Consider sub-partitioning for very large datasets (partition of partitions).

Code Editor

Output