MongoDB Aggregation Pipeline

50 min

MongoDB's aggregation pipeline is a powerful framework for data processing and analysis. The pipeline processes documents through a series of stages, where each stage transforms the documents and passes results to the next stage. Aggregation enables complex data transformations, grouping, calculations, and analysis that would be difficult or impossible with simple queries. Understanding aggregation is essential for analytics, reporting, and data processing in MongoDB.

Common aggregation stages include $match (filter documents, like WHERE in SQL), $group (group documents and calculate aggregates), $sort (sort documents), $project (reshape documents, like SELECT in SQL), $limit (limit number of documents), $skip (skip documents), $unwind (deconstruct arrays), and $lookup (join collections). Each stage performs specific transformations, and stages can be combined in any order to achieve desired results.

$match filters documents early in the pipeline, reducing the number of documents processed by subsequent stages. This improves performance significantly. $match uses the same query syntax as find(), enabling you to filter by any field or condition. Placing $match early in pipelines is a best practice for performance. Understanding $match enables efficient data filtering in aggregation pipelines.

$group groups documents by specified fields and calculates aggregate values. $group uses accumulator operators like $sum, $avg, $min, $max, $first, $last, and $push to calculate values for each group. $group is essential for analytics, enabling you to calculate totals, averages, counts, and other aggregates by category. Understanding $group enables powerful data analysis capabilities.

$project reshapes documents by including, excluding, or adding computed fields. $project can rename fields, add calculated fields using expressions, and reshape document structure. This is useful for creating reports, transforming data for APIs, and preparing data for further processing. Understanding $project enables flexible data transformation.

Advanced aggregation features include $lookup for joining collections (similar to SQL JOINs), $unwind for working with arrays, $facet for running multiple pipelines in parallel, and expression operators for calculations. Aggregation expressions support arithmetic, string manipulation, date operations, and conditional logic. Understanding these advanced features enables sophisticated data processing and analysis directly in MongoDB.

Key Concepts

Aggregation pipeline processes documents through multiple stages.
$match filters documents early for performance.
$group groups documents and calculates aggregates.
$project reshapes documents by selecting and computing fields.
Stages can be combined in any order to achieve desired results.

Learning Objectives

Master

Building aggregation pipelines with multiple stages
Using $match, $group, and $project stages
Calculating aggregates with accumulator operators
Transforming data with $project and $unwind

Develop

Understanding data processing patterns
Designing efficient aggregation pipelines
Performing complex data analysis in MongoDB

Tips

Use $match early in pipelines to filter documents and improve performance.
Use $group with accumulators: { $group: { _id: '$field', total: { $sum: '$amount' } } }.
Use $project to reshape: { $project: { newField: '$oldField', computed: { $add: ['$field1', '$field2'] } } }.
Use $unwind to work with arrays: { $unwind: '$arrayField' }.

Common Pitfalls

Not using $match early, processing unnecessary documents.
Forgetting _id in $group, causing grouping errors.
Not understanding accumulator operators, using wrong operators.
Creating overly complex pipelines, reducing readability and performance.

Summary

Aggregation pipeline enables powerful data processing and analysis.
Common stages include $match, $group, $project, and $sort.
Stages can be combined to achieve complex transformations.
Understanding aggregation enables sophisticated analytics in MongoDB.

Exercise

Create aggregation pipelines for data analysis.

// Sample data for aggregation
db.sales.insertMany([
  { product: "Laptop", amount: 999, region: "North", date: new Date("2024-01-15") },
  { product: "Phone", amount: 699, region: "South", date: new Date("2024-01-16") },
  { product: "Laptop", amount: 999, region: "North", date: new Date("2024-01-17") },
  { product: "Tablet", amount: 399, region: "East", date: new Date("2024-01-18") },
  { product: "Phone", amount: 699, region: "West", date: new Date("2024-01-19") },
  { product: "Laptop", amount: 999, region: "South", date: new Date("2024-01-20") }
])

// Basic aggregation - total sales by product
db.sales.aggregate([
  { $group: { 
    _id: "$product", 
    totalSales: { $sum: "$amount" },
    count: { $sum: 1 }
  }},
  { $sort: { totalSales: -1 }}
])

// Sales by region with filtering
db.sales.aggregate([
  { $match: { amount: { $gte: 500 } }},
  { $group: { 
    _id: "$region", 
    totalSales: { $sum: "$amount" },
    avgAmount: { $avg: "$amount" }
  }},
  { $sort: { totalSales: -1 }}
])

// Complex aggregation with multiple stages
db.sales.aggregate([
  // Stage 1: Filter by date range
  { $match: { 
    date: { 
      $gte: new Date("2024-01-15"), 
      $lte: new Date("2024-01-20") 
    }
  }},
  
  // Stage 2: Group by product and region
  { $group: { 
    _id: { 
      product: "$product", 
      region: "$region" 
    },
    totalSales: { $sum: "$amount" },
    count: { $sum: 1 }
  }},
  
  // Stage 3: Project to reshape output
  { $project: {
    _id: 0,
    product: "$_id.product",
    region: "$_id.region",
    totalSales: 1,
    count: 1,
    avgSale: { $divide: ["$totalSales", "$count"] }
  }},
  
  // Stage 4: Sort by total sales
  { $sort: { totalSales: -1 }},
  
  // Stage 5: Limit to top 5
  { $limit: 5 }
])

// Lookup example (if we have another collection)
db.products.insertMany([
  { _id: "Laptop", category: "Electronics", price: 999 },
  { _id: "Phone", category: "Electronics", price: 699 },
  { _id: "Tablet", category: "Electronics", price: 399 }
])

// Join sales with products
db.sales.aggregate([
  { $lookup: {
    from: "products",
    localField: "product",
    foreignField: "_id",
    as: "productInfo"
  }},
  { $unwind: "$productInfo" },
  { $project: {
    product: 1,
    amount: 1,
    region: 1,
    category: "$productInfo.category",
    price: "$productInfo.price"
  }}
])

Exercise Tips

Use $lookup for joins: { $lookup: { from: 'collection', localField: 'field', foreignField: 'field', as: 'alias' } }.
Use $unwind after $lookup: { $unwind: '$arrayField' } to flatten arrays.
Use $facet for multiple pipelines: { $facet: { pipeline1: [...], pipeline2: [...] } }.
Use expression operators: { $add: ['$field1', '$field2'] } for calculations.

MongoDB Aggregation Pipeline