Back to Curriculum

Data Analysis and Reporting Queries

📚 Lesson 15 of 15 ⏱️ 55 min

Data Analysis and Reporting Queries

55 min

SQL is powerful for data analysis and generating business reports, enabling you to extract insights from data directly in the database. SQL's aggregation, grouping, window functions, and analytical capabilities make it ideal for reporting and analysis. Understanding SQL for analysis enables you to answer business questions efficiently. SQL can handle complex analytical queries that would be difficult in application code. Modern SQL databases provide extensive analytical functions.

Pivot queries transform rows into columns for cross-tabulation reports, enabling you to create summary tables with categories as columns. PIVOT operations (available in SQL Server, Oracle) rotate data from rows to columns. In databases without PIVOT, you can simulate it with CASE statements and aggregation. Understanding pivot queries enables you to create matrix-style reports. Pivot queries are useful for comparing categories side-by-side.

Time-series analysis tracks trends over time using date functions, enabling you to analyze data by time periods (daily, weekly, monthly, yearly). Date functions (DATE_TRUNC, DATE_PART, EXTRACT, etc.) extract time components. GROUP BY with date functions creates time-based aggregations. Window functions enable running totals, moving averages, and period-over-period comparisons. Understanding time-series analysis enables you to identify trends and patterns.

Cohort analysis groups users or entities by a common characteristic (e.g., signup date) and tracks their behavior over time. Cohort analysis enables you to understand retention, lifetime value, and behavior patterns. SQL window functions and date functions are essential for cohort analysis. Understanding cohort analysis enables you to answer questions like 'How do users who signed up in January behave compared to February users?'

Ranking and top-N queries identify top performers, outliers, and distributions. Window functions (RANK, DENSE_RANK, ROW_NUMBER) enable ranking. TOP N queries (LIMIT, TOP, ROWNUM) retrieve top results. Percentile calculations identify distributions. Understanding ranking queries enables you to identify best/worst performers and outliers.

Best practices include using window functions for analytical queries, using date functions for time-series analysis, optimizing analytical queries with proper indexes, understanding that analytical queries can be resource-intensive, and documenting complex analytical queries. Understanding data analysis and reporting queries enables you to extract valuable insights from data.

Key Concepts

  • SQL is powerful for data analysis and business reporting.
  • Pivot queries transform rows into columns for cross-tabulation.
  • Time-series analysis tracks trends over time using date functions.
  • Cohort analysis groups entities and tracks behavior over time.
  • Ranking queries identify top performers and distributions.

Learning Objectives

Master

  • Creating pivot queries for cross-tabulation reports
  • Performing time-series analysis with date functions
  • Conducting cohort analysis
  • Writing ranking and top-N queries

Develop

  • Data analysis thinking
  • Understanding business intelligence concepts
  • Designing effective reporting queries

Tips

  • Use window functions for analytical queries (ranking, running totals).
  • Use date functions for time-series analysis (GROUP BY date components).
  • Optimize analytical queries with proper indexes.
  • Understand that analytical queries can be resource-intensive.

Common Pitfalls

  • Writing inefficient analytical queries, causing performance issues.
  • Not using window functions when they would simplify queries.
  • Not understanding date functions, making time-series analysis difficult.
  • Not optimizing analytical queries, causing slow reports.

Summary

  • SQL is powerful for data analysis and business reporting.
  • Pivot queries enable cross-tabulation reports.
  • Time-series analysis tracks trends over time.
  • Understanding analytical queries enables valuable insights.
  • Analytical queries require careful optimization.

Exercise

Create a comprehensive sales report with monthly trends and customer analysis.

-- Monthly sales report with year-over-year comparison
WITH MonthlySales AS (
    SELECT 
        YEAR(OrderDate) as Year,
        MONTH(OrderDate) as Month,
        SUM(TotalAmount) as MonthlyTotal,
        COUNT(DISTINCT CustomerID) as UniqueCustomers
    FROM Orders
    GROUP BY YEAR(OrderDate), MONTH(OrderDate)
),
YearlyTotals AS (
    SELECT 
        Year,
        SUM(MonthlyTotal) as YearlyTotal,
        AVG(MonthlyTotal) as AvgMonthlyTotal
    FROM MonthlySales
    GROUP BY Year
)
SELECT 
    ms.Year,
    ms.Month,
    ms.MonthlyTotal,
    ms.UniqueCustomers,
    yt.YearlyTotal,
    yt.AvgMonthlyTotal,
    ROUND((ms.MonthlyTotal / yt.AvgMonthlyTotal - 1) * 100, 2) as MonthlyVariance
FROM MonthlySales ms
JOIN YearlyTotals yt ON ms.Year = yt.Year
ORDER BY ms.Year DESC, ms.Month DESC;

-- Customer cohort analysis (retention by signup month)
WITH CustomerCohorts AS (
    SELECT 
        CustomerID,
        DATE_FORMAT(MIN(OrderDate), '%Y-%m-01') as CohortMonth,
        OrderDate,
        TIMESTAMPDIFF(MONTH, MIN(OrderDate), OrderDate) as MonthsSinceFirstOrder
    FROM Orders
    GROUP BY CustomerID, OrderDate
)
SELECT 
    CohortMonth,
    COUNT(DISTINCT CASE WHEN MonthsSinceFirstOrder = 0 THEN CustomerID END) as Month0,
    COUNT(DISTINCT CASE WHEN MonthsSinceFirstOrder = 1 THEN CustomerID END) as Month1,
    COUNT(DISTINCT CASE WHEN MonthsSinceFirstOrder = 2 THEN CustomerID END) as Month2,
    COUNT(DISTINCT CASE WHEN MonthsSinceFirstOrder = 3 THEN CustomerID END) as Month3
FROM CustomerCohorts
GROUP BY CohortMonth
ORDER BY CohortMonth DESC;

Code Editor

Output