Data Analysis and Reporting Queries
55 minSQL is powerful for data analysis and generating business reports, enabling you to extract insights from data directly in the database. SQL's aggregation, grouping, window functions, and analytical capabilities make it ideal for reporting and analysis. Understanding SQL for analysis enables you to answer business questions efficiently. SQL can handle complex analytical queries that would be difficult in application code. Modern SQL databases provide extensive analytical functions.
Pivot queries transform rows into columns for cross-tabulation reports, enabling you to create summary tables with categories as columns. PIVOT operations (available in SQL Server, Oracle) rotate data from rows to columns. In databases without PIVOT, you can simulate it with CASE statements and aggregation. Understanding pivot queries enables you to create matrix-style reports. Pivot queries are useful for comparing categories side-by-side.
Time-series analysis tracks trends over time using date functions, enabling you to analyze data by time periods (daily, weekly, monthly, yearly). Date functions (DATE_TRUNC, DATE_PART, EXTRACT, etc.) extract time components. GROUP BY with date functions creates time-based aggregations. Window functions enable running totals, moving averages, and period-over-period comparisons. Understanding time-series analysis enables you to identify trends and patterns.
Cohort analysis groups users or entities by a common characteristic (e.g., signup date) and tracks their behavior over time. Cohort analysis enables you to understand retention, lifetime value, and behavior patterns. SQL window functions and date functions are essential for cohort analysis. Understanding cohort analysis enables you to answer questions like 'How do users who signed up in January behave compared to February users?'
Ranking and top-N queries identify top performers, outliers, and distributions. Window functions (RANK, DENSE_RANK, ROW_NUMBER) enable ranking. TOP N queries (LIMIT, TOP, ROWNUM) retrieve top results. Percentile calculations identify distributions. Understanding ranking queries enables you to identify best/worst performers and outliers.
Best practices include using window functions for analytical queries, using date functions for time-series analysis, optimizing analytical queries with proper indexes, understanding that analytical queries can be resource-intensive, and documenting complex analytical queries. Understanding data analysis and reporting queries enables you to extract valuable insights from data.
Key Concepts
- SQL is powerful for data analysis and business reporting.
- Pivot queries transform rows into columns for cross-tabulation.
- Time-series analysis tracks trends over time using date functions.
- Cohort analysis groups entities and tracks behavior over time.
- Ranking queries identify top performers and distributions.
Learning Objectives
Master
- Creating pivot queries for cross-tabulation reports
- Performing time-series analysis with date functions
- Conducting cohort analysis
- Writing ranking and top-N queries
Develop
- Data analysis thinking
- Understanding business intelligence concepts
- Designing effective reporting queries
Tips
- Use window functions for analytical queries (ranking, running totals).
- Use date functions for time-series analysis (GROUP BY date components).
- Optimize analytical queries with proper indexes.
- Understand that analytical queries can be resource-intensive.
Common Pitfalls
- Writing inefficient analytical queries, causing performance issues.
- Not using window functions when they would simplify queries.
- Not understanding date functions, making time-series analysis difficult.
- Not optimizing analytical queries, causing slow reports.
Summary
- SQL is powerful for data analysis and business reporting.
- Pivot queries enable cross-tabulation reports.
- Time-series analysis tracks trends over time.
- Understanding analytical queries enables valuable insights.
- Analytical queries require careful optimization.
Exercise
Create a comprehensive sales report with monthly trends and customer analysis.
-- Monthly sales report with year-over-year comparison
WITH MonthlySales AS (
SELECT
YEAR(OrderDate) as Year,
MONTH(OrderDate) as Month,
SUM(TotalAmount) as MonthlyTotal,
COUNT(DISTINCT CustomerID) as UniqueCustomers
FROM Orders
GROUP BY YEAR(OrderDate), MONTH(OrderDate)
),
YearlyTotals AS (
SELECT
Year,
SUM(MonthlyTotal) as YearlyTotal,
AVG(MonthlyTotal) as AvgMonthlyTotal
FROM MonthlySales
GROUP BY Year
)
SELECT
ms.Year,
ms.Month,
ms.MonthlyTotal,
ms.UniqueCustomers,
yt.YearlyTotal,
yt.AvgMonthlyTotal,
ROUND((ms.MonthlyTotal / yt.AvgMonthlyTotal - 1) * 100, 2) as MonthlyVariance
FROM MonthlySales ms
JOIN YearlyTotals yt ON ms.Year = yt.Year
ORDER BY ms.Year DESC, ms.Month DESC;
-- Customer cohort analysis (retention by signup month)
WITH CustomerCohorts AS (
SELECT
CustomerID,
DATE_FORMAT(MIN(OrderDate), '%Y-%m-01') as CohortMonth,
OrderDate,
TIMESTAMPDIFF(MONTH, MIN(OrderDate), OrderDate) as MonthsSinceFirstOrder
FROM Orders
GROUP BY CustomerID, OrderDate
)
SELECT
CohortMonth,
COUNT(DISTINCT CASE WHEN MonthsSinceFirstOrder = 0 THEN CustomerID END) as Month0,
COUNT(DISTINCT CASE WHEN MonthsSinceFirstOrder = 1 THEN CustomerID END) as Month1,
COUNT(DISTINCT CASE WHEN MonthsSinceFirstOrder = 2 THEN CustomerID END) as Month2,
COUNT(DISTINCT CASE WHEN MonthsSinceFirstOrder = 3 THEN CustomerID END) as Month3
FROM CustomerCohorts
GROUP BY CohortMonth
ORDER BY CohortMonth DESC;