Advanced Data Visualization

📚 Lesson 9 of 10 ⏱️ 70 min

Pandas integrates with Matplotlib and Seaborn for data visualization. The pandas plotting methods (DataFrame.plot and Series.plot) wrap Matplotlib, so a pandas plot hands back a Matplotlib Axes that you can customize further. Seaborn builds on Matplotlib and adds statistical visualizations that accept DataFrames directly. Together they cover the workflow from quick exploratory charts to publication-quality figures.
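As a minimal sketch of that wrapping, the snippet below plots a small DataFrame with pandas and then customizes the result with ordinary Matplotlib calls. The data and the column name units are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data for illustration only
sales = pd.DataFrame(
    {"units": [12, 18, 15, 22]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# DataFrame.plot draws with Matplotlib and returns the Axes it drew on
ax = sales.plot(kind="line", y="units", legend=False)

# From here on it is plain Matplotlib
ax.set_title("Units sold per day")
ax.set_xlabel("Date")
ax.set_ylabel("Units")
```

Call plt.show() afterwards to display the figure in an interactive session.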

Advanced plotting techniques help communicate complex insights: choosing an appropriate chart type, combining several views of the same data, and drawing attention to key patterns. The main tools are subplots and multi-panel layouts for side-by-side comparison, overlays for plotting related series on shared axes, and annotations for marking specific points.
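The sketch below combines three of these techniques in one figure: a two-panel layout, an overlay of a rolling mean on the raw series, and an annotation marking the peak. The series is randomly generated and the 7-day window is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily series for illustration only
rng = np.random.default_rng(0)
daily = pd.Series(
    rng.poisson(50, size=60),
    index=pd.date_range("2024-01-01", periods=60, freq="D"),
)
rolling = daily.rolling(window=7).mean()

# Multi-panel layout: one row, two columns
fig, (ax_trend, ax_hist) = plt.subplots(1, 2, figsize=(12, 4))

# Overlay: raw series and its 7-day rolling mean on the same axes
ax_trend.plot(daily.index, daily.values, alpha=0.4, label="Daily")
ax_trend.plot(rolling.index, rolling.values, linewidth=2, label="7-day mean")
ax_trend.legend()

# Annotation: point an arrow at the highest daily value
peak_date = daily.idxmax()
peak_value = daily.max()
ax_trend.annotate(
    f"Peak: {peak_value}",
    xy=(peak_date, peak_value),
    xytext=(0, 20),
    textcoords="offset points",
    ha="center",
    arrowprops={"arrowstyle": "->"},
)
ax_trend.set_title("Daily values with rolling mean")

# Second panel: distribution of the same values
ax_hist.hist(daily.values, bins=10)
ax_hist.set_title("Distribution of daily values")

fig.tight_layout()
```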

Custom styling and interactive plots make visualizations more engaging and informative. Styling covers colors, fonts, themes, and layout. Interactive libraries such as Plotly or Bokeh let viewers explore a plot by zooming, panning, and hovering over data points.
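One possible way to apply a theme and a custom palette with Matplotlib is sketched below. The store totals and hex colors are placeholders, and the ggplot style is just one of Matplotlib's built-in styles.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical totals for illustration only
revenue = pd.Series({"Store_A": 120, "Store_B": 95, "Store_C": 140})
palette = ["#4ECDC4", "#45B7D1", "#FF6B6B"]

# style.context applies the style only inside the with-block,
# so other figures in the session keep their defaults
with plt.style.context("ggplot"):
    fig, ax = plt.subplots(figsize=(6, 4))
    bars = ax.bar(revenue.index, revenue.values, color=palette)
    ax.set_title("Revenue by store", fontsize=14, fontweight="bold")
    ax.set_ylabel("Revenue (thousand $)")
    # Write each bar's value above it
    ax.bar_label(bars, fmt="%d")
```

Scoping the style with a context manager, rather than calling plt.style.use globally, keeps one chart's theme from leaking into the next.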

Visualization best practices keep charts clear, accurate, and compelling, which improves data storytelling. Choose a chart type that matches the data, label axes and titles clearly, avoid clutter, and highlight the key insight.

Time series visualization uses line charts, area charts, and heatmaps to show temporal patterns, making trends, seasonality, and anomalies visible.
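As a rough illustration, the sketch below draws a line chart of weekly means to show the trend and a month-by-weekday heatmap to expose a weekly pattern. The series is synthetic, with a weekend bump added deliberately so the heatmap has something to reveal.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily series with higher weekend values, for illustration only
rng = np.random.default_rng(1)
dates = pd.date_range("2024-01-01", periods=182, freq="D")  # Jan 1 to Jun 30
weekend_bump = np.where(dates.dayofweek >= 5, 10, 0)
ts = pd.Series(50 + weekend_bump + rng.normal(0, 3, size=len(dates)), index=dates)

# Resample to weekly means to smooth out day-to-day noise
weekly = ts.resample("W").mean()

# Pivot to a month x weekday grid (rows: month number, columns: 0=Mon .. 6=Sun)
grid = ts.groupby([ts.index.month, ts.index.dayofweek]).mean().unstack()

fig, (ax_line, ax_heat) = plt.subplots(1, 2, figsize=(12, 4))

ax_line.plot(ts.index, ts.values, alpha=0.3, label="Daily")
ax_line.plot(weekly.index, weekly.values, linewidth=2, label="Weekly mean")
ax_line.set_title("Trend")
ax_line.legend()

image = ax_heat.imshow(grid.values, aspect="auto", cmap="YlOrRd")
ax_heat.set_xticks(range(7))
ax_heat.set_xticklabels(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])
ax_heat.set_yticks(range(len(grid.index)))
ax_heat.set_yticklabels([str(month) for month in grid.index])
ax_heat.set_ylabel("Month")
ax_heat.set_title("Mean value by month and weekday")
fig.colorbar(image, ax=ax_heat)

fig.tight_layout()
```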

Statistical visualizations cover distributions, correlations, and regressions. Histograms and box plots show how values are distributed, scatter plots (optionally with a regression line) show how two variables relate, and heatmaps summarize correlations across many variables.

Key Concepts

  • Pandas integrates with Matplotlib and Seaborn for visualization.
  • Advanced plotting techniques communicate complex insights effectively.
  • Custom styling and interactive plots enhance data presentation.
  • Understanding visualization best practices improves data storytelling.
  • Time series and statistical visualizations reveal data patterns.

Learning Objectives

Master

  • Creating advanced visualizations with Pandas, Matplotlib, and Seaborn
  • Applying custom styling and themes
  • Creating time series and statistical visualizations
  • Following visualization best practices

Develop

  • Understanding data visualization principles
  • Designing effective visualizations
  • Appreciating visualization's role in data storytelling

Tips

  • Use Seaborn for statistical visualizations—it's built on Matplotlib.
  • Use subplots for comparing multiple visualizations.
  • Customize colors, fonts, and styles for professional appearance.
  • Use annotations to highlight key insights in plots.

Common Pitfalls

  • Using wrong chart types, confusing viewers.
  • Creating cluttered visualizations, obscuring insights.
  • Not labeling axes or adding titles, making plots unclear.
  • Not following best practices, creating misleading visualizations.
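The lesson does not name specific misleading charts, so the following is an assumed example: a bar chart whose y-axis is truncated makes small differences look large. The sketch plots the same made-up store revenues twice, once with a truncated axis and once with a zero baseline, with both panels labeled.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical revenues that differ by only a few percent
revenue = pd.Series({"Store_A": 980, "Store_B": 1000, "Store_C": 1020})

fig, (ax_truncated, ax_zero) = plt.subplots(1, 2, figsize=(10, 4))

# Truncated axis: bar heights no longer reflect the ratio between values
ax_truncated.bar(revenue.index, revenue.values)
ax_truncated.set_ylim(970, 1030)
ax_truncated.set_title("Truncated axis exaggerates the gap")

# Zero baseline: bar heights are proportional to the values
ax_zero.bar(revenue.index, revenue.values)
ax_zero.set_ylim(0, 1100)
ax_zero.set_title("Zero baseline shows a ~4% spread")

# Label both panels so each can be read on its own
for ax in (ax_truncated, ax_zero):
    ax.set_xlabel("Store")
    ax.set_ylabel("Revenue ($)")

fig.tight_layout()
```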

Summary

  • Pandas integrates with Matplotlib and Seaborn.
  • Advanced plotting techniques communicate complex insights.
  • Custom styling enhances data presentation.
  • Understanding visualization best practices improves storytelling.
  • Visualization is essential for effective data communication.

Exercise

Create comprehensive data visualizations using Pandas with Matplotlib and Seaborn.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from datetime import timedelta

# Set style for better-looking plots
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Create sample sales data
np.random.seed(42)
dates = pd.date_range('2024-01-01', periods=365, freq='D')
stores = ['Store_A', 'Store_B', 'Store_C']
products = ['Laptop', 'Phone', 'Tablet', 'Monitor']

data = []
for date in dates:
    for store in stores:
        for product in products:
            sales = np.random.poisson(50)
            revenue = sales * np.random.uniform(800, 1200)
            customers = int(sales * np.random.uniform(0.8, 1.2))
            data.append({
                'Date': date,
                'Store': store,
                'Product': product,
                'Sales': sales,
                'Revenue': revenue,
                'Customers': customers
            })

df = pd.DataFrame(data)

# 1. Time series visualization
print("=== Time Series Visualization ===")
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Daily sales trend
daily_sales = df.groupby('Date')['Sales'].sum()
axes[0, 0].plot(daily_sales.index, daily_sales.values, linewidth=2, color='steelblue')
axes[0, 0].set_title('Daily Sales Trend', fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel('Date')
axes[0, 0].set_ylabel('Total Sales')
axes[0, 0].tick_params(axis='x', rotation=45)

# Monthly revenue by store
monthly_revenue = df.groupby([df['Date'].dt.to_period('M'), 'Store'])['Revenue'].sum().unstack()
monthly_revenue.plot(kind='bar', ax=axes[0, 1], width=0.8)
axes[0, 1].set_title('Monthly Revenue by Store', fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel('Month')
axes[0, 1].set_ylabel('Revenue ($)')
axes[0, 1].tick_params(axis='x', rotation=45)
axes[0, 1].legend(title='Store')

# Product performance heatmap
product_performance = df.groupby(['Store', 'Product'])['Sales'].sum().unstack()
sns.heatmap(product_performance, annot=True, fmt='d', cmap='YlOrRd', ax=axes[1, 0])
axes[1, 0].set_title('Product Sales by Store', fontsize=14, fontweight='bold')

# Sales distribution by product
product_sales = df.groupby('Product')['Sales'].sum()
colors = plt.cm.Set3(np.linspace(0, 1, len(product_sales)))
axes[1, 1].pie(product_sales.values, labels=product_sales.index, autopct='%1.1f%%', 
                colors=colors, startangle=90)
axes[1, 1].set_title('Sales Distribution by Product', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

# 2. Advanced statistical plots
print("\n=== Advanced Statistical Plots ===")
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Box plot with custom styling
sns.boxplot(data=df, x='Store', y='Revenue', hue='Product', ax=axes[0, 0])
axes[0, 0].set_title('Revenue Distribution by Store and Product', fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel('Store')
axes[0, 0].set_ylabel('Revenue ($)')

# Violin plot for sales distribution
sns.violinplot(data=df, x='Product', y='Sales', ax=axes[0, 1])
axes[0, 1].set_title('Sales Distribution by Product', fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel('Product')
axes[0, 1].set_ylabel('Sales')

# Scatter plot with regression line
sns.regplot(data=df, x='Customers', y='Revenue', ax=axes[1, 0], 
            scatter_kws={'alpha': 0.6}, line_kws={'color': 'red'})
axes[1, 0].set_title('Revenue vs Customers with Regression Line', fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel('Number of Customers')
axes[1, 0].set_ylabel('Revenue ($)')

# Correlation heatmap
correlation_matrix = df[['Sales', 'Revenue', 'Customers']].corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, 
            square=True, ax=axes[1, 1])
axes[1, 1].set_title('Correlation Matrix', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

# 3. Animated plots
print("\n=== Animated Plots ===")

# Create animated time series
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots(figsize=(12, 6))
ax.set_xlim(df['Date'].min(), df['Date'].max())
ax.set_ylim(0, df.groupby('Date')['Sales'].sum().max() * 1.1)
ax.set_xlabel('Date')
ax.set_ylabel('Total Sales')
ax.set_title('Animated Daily Sales Trend', fontsize=14, fontweight='bold')

line, = ax.plot([], [], lw=2, color='steelblue')
ax.grid(True, alpha=0.3)

def animate(frame):
    # Show data up to current frame
    current_date = df['Date'].min() + timedelta(days=frame)
    data_to_show = df[df['Date'] <= current_date]
    daily_sales = data_to_show.groupby('Date')['Sales'].sum()
    
    line.set_data(daily_sales.index, daily_sales.values)
    return line,

# Create animation (uncomment to run)
# anim = FuncAnimation(fig, animate, frames=365, interval=100, blit=True)
# plt.show()

# 4. Custom styling and themes
print("\n=== Custom Styling and Themes ===")

# Create a custom color palette
custom_colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4', '#FFEAA7']

# Create a styled plot
fig, ax = plt.subplots(figsize=(12, 8))

# Prepare data for stacked area chart
monthly_data = df.groupby([df['Date'].dt.to_period('M'), 'Product'])['Sales'].sum().unstack()
monthly_data.index = monthly_data.index.astype(str)

# Create stacked area chart
monthly_data.plot(kind='area', stacked=True, ax=ax, color=custom_colors, alpha=0.8)

ax.set_title('Monthly Sales by Product (Stacked Area Chart)', 
             fontsize=16, fontweight='bold', pad=20)
ax.set_xlabel('Month', fontsize=12, fontweight='bold')
ax.set_ylabel('Sales', fontsize=12, fontweight='bold')
ax.legend(title='Product', title_fontsize=12, fontsize=10)
ax.grid(True, alpha=0.3)

# Customize ticks and labels
ax.tick_params(axis='x', rotation=45, labelsize=10)
ax.tick_params(axis='y', labelsize=10)

# Add value labels on the plot
for i, month in enumerate(monthly_data.index):
    total_sales = monthly_data.loc[month].sum()
    ax.text(i, total_sales + 50, f'{total_sales:.0f}',
            ha='center', va='bottom', fontweight='bold', fontsize=10)

plt.tight_layout()
plt.show()

# 5. Subplot grid with different chart types
print("\n=== Subplot Grid with Different Chart Types ===")

fig = plt.figure(figsize=(20, 12))
gs = fig.add_gridspec(3, 4, hspace=0.3, wspace=0.3)

# 1. Line chart - Daily trends
ax1 = fig.add_subplot(gs[0, :2])
daily_trends = df.groupby('Date')['Sales'].sum()
ax1.plot(daily_trends.index, daily_trends.values, color='#FF6B6B', linewidth=2)
ax1.set_title('Daily Sales Trend', fontsize=14, fontweight='bold')
ax1.set_xlabel('Date')
ax1.set_ylabel('Sales')

# 2. Bar chart - Store comparison
ax2 = fig.add_subplot(gs[0, 2:])
store_totals = df.groupby('Store')['Revenue'].sum()
bars = ax2.bar(store_totals.index, store_totals.values, color=custom_colors[:3])
ax2.set_title('Total Revenue by Store', fontsize=14, fontweight='bold')
ax2.set_ylabel('Revenue ($)')

# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height + 1000,
             f'${height:.0f}', ha='center', va='bottom', fontweight='bold')

# 3. Pie chart - Product distribution
ax3 = fig.add_subplot(gs[1, :2])
product_dist = df.groupby('Product')['Sales'].sum()
wedges, texts, autotexts = ax3.pie(product_dist.values, labels=product_dist.index, 
                                   autopct='%1.1f%%', colors=custom_colors)
ax3.set_title('Sales Distribution by Product', fontsize=14, fontweight='bold')

# 4. Scatter plot - Sales vs Revenue
ax4 = fig.add_subplot(gs[1, 2:])
scatter = ax4.scatter(df['Sales'], df['Revenue'], c=df['Customers'], 
                      cmap='viridis', alpha=0.6, s=50)
ax4.set_xlabel('Sales')
ax4.set_ylabel('Revenue ($)')
ax4.set_title('Sales vs Revenue (colored by Customers)', fontsize=14, fontweight='bold')
plt.colorbar(scatter, ax=ax4, label='Customers')

# 5. Heatmap - Monthly performance
ax5 = fig.add_subplot(gs[2, :])
monthly_performance = df.groupby([df['Date'].dt.to_period('M'), 'Store'])['Sales'].sum().unstack()
sns.heatmap(monthly_performance, annot=True, fmt='.0f', cmap='RdYlBu_r', ax=ax5)
ax5.set_title('Monthly Sales Performance by Store', fontsize=14, fontweight='bold')
ax5.set_xlabel('Store')
ax5.set_ylabel('Month')

plt.show()

print("\nVisualization completed! These charts demonstrate:")
print("1. Time series analysis with multiple chart types")
print("2. Statistical plots for data distribution")
print("3. Animated visualizations")
print("4. Custom styling and color schemes")
print("5. Complex subplot layouts for comprehensive analysis")
