Back to Curriculum

Database Design and Normalization

📚 Lesson 11 of 15 ⏱️ 55 min

Database Design and Normalization

55 min

Database normalization is the process of organizing data to reduce redundancy and improve data integrity. Normalization eliminates data duplication, reduces storage requirements, and prevents update anomalies (inconsistencies). Understanding normalization is essential for designing efficient, maintainable databases. Normalization follows a series of normal forms (1NF, 2NF, 3NF, BCNF, etc.), each addressing specific types of redundancy. While normalization is important, over-normalization can hurt performance—balance is key.

First Normal Form (1NF) ensures atomic values and no repeating groups. Each column must contain only atomic (indivisible) values, and each row must be unique. Repeating groups (multiple values in a single column or multiple columns for the same attribute) violate 1NF. Understanding 1NF is the foundation of normalization. 1NF eliminates repeating groups by creating separate rows or tables.

Second Normal Form (2NF) ensures no partial dependencies on composite keys. A table is in 2NF if it's in 1NF and all non-key attributes are fully dependent on the entire primary key (not just part of it). Partial dependencies occur when a non-key attribute depends on only part of a composite key. Understanding 2NF eliminates partial dependencies. 2NF applies only to tables with composite primary keys.

Third Normal Form (3NF) ensures no transitive dependencies. A table is in 3NF if it's in 2NF and no non-key attribute depends on another non-key attribute (transitive dependency). Transitive dependencies create redundancy and update anomalies. Understanding 3NF eliminates transitive dependencies. 3NF is often sufficient for most database designs.

Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF that handles cases where 3NF doesn't eliminate all redundancy. BCNF requires that every determinant (attribute that determines another) must be a candidate key. Understanding BCNF enables you to design highly normalized databases. BCNF is important for complex database designs.

Denormalization is sometimes used to improve query performance at the cost of data redundancy. Denormalization intentionally introduces redundancy to reduce JOINs and improve read performance. It's a trade-off: faster queries vs. more storage and update complexity. Understanding denormalization helps you make informed design decisions. Denormalization should be used judiciously, typically in read-heavy systems or data warehouses. Best practices include normalizing to 3NF for transactional systems, denormalizing selectively for performance, understanding the trade-offs, and documenting design decisions.

Key Concepts

  • Normalization reduces redundancy and improves data integrity.
  • 1NF ensures atomic values and no repeating groups.
  • 2NF eliminates partial dependencies on composite keys.
  • 3NF eliminates transitive dependencies.
  • Denormalization trades redundancy for performance.

Learning Objectives

Master

  • Understanding normalization principles and normal forms
  • Applying 1NF, 2NF, and 3NF to database design
  • Identifying and eliminating data redundancy
  • Understanding when to denormalize for performance

Develop

  • Database design thinking
  • Understanding data integrity and redundancy trade-offs
  • Designing efficient, maintainable database schemas

Tips

  • Normalize to 3NF for transactional systems to ensure data integrity.
  • Consider denormalization for read-heavy systems or data warehouses.
  • Understand the trade-offs between normalization and performance.
  • Document design decisions, especially denormalization choices.

Common Pitfalls

  • Over-normalizing, causing excessive JOINs and poor performance.
  • Not normalizing enough, causing data redundancy and update anomalies.
  • Denormalizing without understanding the trade-offs.
  • Not documenting design decisions, making maintenance difficult.

Summary

  • Normalization reduces redundancy and improves data integrity.
  • 1NF, 2NF, and 3NF address different types of redundancy.
  • Denormalization trades redundancy for query performance.
  • Understanding normalization enables effective database design.
  • Balance normalization with performance requirements.

Exercise

Design a normalized database schema for an e-commerce system with customers, products, and orders.

-- Customers table
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY AUTO_INCREMENT,
    CustomerName VARCHAR(100) NOT NULL,
    Email VARCHAR(100) UNIQUE NOT NULL,
    Phone VARCHAR(20),
    Address TEXT,
    City VARCHAR(50),
    Country VARCHAR(50)
);

-- Products table
CREATE TABLE Products (
    ProductID INT PRIMARY KEY AUTO_INCREMENT,
    ProductName VARCHAR(200) NOT NULL,
    Description TEXT,
    Price DECIMAL(10,2) NOT NULL,
    CategoryID INT,
    StockQuantity INT DEFAULT 0,
    FOREIGN KEY (CategoryID) REFERENCES Categories(CategoryID)
);

-- Orders table
CREATE TABLE Orders (
    OrderID INT PRIMARY KEY AUTO_INCREMENT,
    CustomerID INT NOT NULL,
    OrderDate DATETIME DEFAULT CURRENT_TIMESTAMP,
    TotalAmount DECIMAL(10,2) NOT NULL,
    Status ENUM('pending', 'processing', 'shipped', 'delivered') DEFAULT 'pending',
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)
);

-- OrderItems table (junction table for many-to-many relationship)
CREATE TABLE OrderItems (
    OrderID INT,
    ProductID INT,
    Quantity INT NOT NULL,
    UnitPrice DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (OrderID, ProductID),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID),
    FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);

Code Editor

Output