AI for Time Series Analysis

📚 Lesson 14 of 15 ⏱️ 90 min

Time series analysis works with data points collected at successive time intervals, enabling forecasting, pattern recognition, and anomaly detection. Unlike independent samples, time series data has temporal structure: each value depends on previous values, trends, and seasonal patterns. This structure is what makes forecasting possible, and it is why time series analysis is central to finance (stock prices), meteorology (temperature), retail (demand forecasting), and many other domains. AI techniques excel at capturing the complex temporal patterns these applications involve.

AI techniques for time series include recurrent neural networks (RNNs), Long Short-Term Memory networks (LSTMs), Gated Recurrent Units (GRUs), and transformer models. RNNs process sequences by maintaining a hidden state that summarizes past inputs, but plain RNNs struggle to learn long-range dependencies. LSTMs address this with gates that control information flow; GRUs use a simpler gating scheme and often perform comparably with fewer parameters. Transformers replace recurrence with attention mechanisms that relate any two time steps directly. Each architecture suits different data characteristics, so understanding them helps you choose an appropriate model.
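
As a concrete illustration of the hidden-state idea, here is a minimal vanilla RNN step in NumPy. The tanh update rule is the standard formulation; the specific dimensions and random weights are arbitrary choices for the sketch, not a trained model:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN update: the new hidden state mixes the
    current input with the previous hidden state."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 5
W_x = rng.normal(0, 0.1, (input_dim, hidden_dim))
W_h = rng.normal(0, 0.1, (hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

# Roll the same cell over a 10-step sequence; h carries the memory forward
h = np.zeros(hidden_dim)
sequence = rng.normal(size=(10, input_dim))
for x_t in sequence:
    h = rnn_step(x_t, h, W_x, W_h, b)

print(h.shape)  # (5,)
```

An LSTM or GRU replaces this single update with several gated updates, which is what lets it retain information over much longer sequences.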

Common applications include forecasting (predicting future values from historical data), anomaly detection (flagging outliers or unusual patterns), pattern recognition (finding recurring structure such as trends and seasonality), and classification (assigning whole time series to categories). These tasks appear across industries, from fraud detection to predictive maintenance, so recognizing them helps you spot opportunities in your own domain.
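
Anomaly detection in particular does not always require a neural network. A minimal sketch, assuming a simple rolling z-score approach (the window size and threshold of 4 are arbitrary illustrative choices):

```python
import numpy as np
import pandas as pd

# Synthetic series with one injected spike at index 300
rng = np.random.default_rng(42)
values = rng.normal(0, 1, 500)
values[300] += 8.0
s = pd.Series(values)

# Rolling z-score against the *preceding* window; shift(1) keeps the
# current point out of its own baseline
window = 30
baseline = s.shift(1).rolling(window)
z = (s - baseline.mean()) / baseline.std()

anomalies = s.index[z.abs() > 4].tolist()
print(anomalies)
```

Points far outside their recent local distribution are flagged; the injected spike at index 300 shows up in the list.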

Time series data requires special handling of its temporal dependencies. Train/test splits must respect temporal order (never use future data to predict the past), feature engineering typically creates lag features and rolling statistics, and non-stationarity (trends, shifting seasonality) must be removed through differencing or detrending. Because future values depend on past values, models must treat the data as an ordered sequence rather than as independent samples.
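
The feature-engineering step can be sketched in pandas. The column names and window sizes here are arbitrary illustrative choices:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": np.arange(10, dtype=float)})

# Lag features: the value 1 and 7 steps in the past
df["lag_1"] = df["value"].shift(1)
df["lag_7"] = df["value"].shift(7)

# Rolling mean over the previous 3 observations; shift(1) keeps the
# current value out of its own feature, avoiding target leakage
df["roll_mean_3"] = df["value"].shift(1).rolling(3).mean()

# Rows made incomplete by shifting are dropped before training
df = df.dropna()
print(df.head())
```

After `dropna()` the first usable row is index 7, since `lag_7` needs seven prior observations.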

Key challenges include non-stationarity (statistical properties that change over time), seasonality (recurring periodic patterns), missing data, and multiple time scales (short-term and long-term patterns coexisting). Non-stationarity calls for preprocessing such as differencing or detrending; seasonality calls for seasonal decomposition or explicitly seasonal models; missing data requires imputation strategies that respect temporal order; and multiple scales require models that can capture both short- and long-range structure. Time series can be complex, and careful preprocessing matters as much as the model itself.
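
Differencing, the most common fix for a trending series, can be shown directly: the first difference of a linearly trending series has a constant mean equal to the slope. A minimal sketch on synthetic data:

```python
import numpy as np
import pandas as pd

# Non-stationary series: linear trend (slope 0.5) plus noise
rng = np.random.default_rng(0)
t = np.arange(200)
series = pd.Series(0.5 * t + rng.normal(0, 1, 200))

# First difference y_t - y_{t-1}: the trend disappears,
# leaving a stationary series whose mean is the slope
diff = series.diff().dropna()

first_half = diff.iloc[:100].mean()
second_half = diff.iloc[100:].mean()
print(f"differenced means: {first_half:.2f}, {second_half:.2f}")
```

The original series has a mean that keeps rising; the differenced series has roughly the same mean (about 0.5) in both halves, which is the stationarity the model needs.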

Best practices: use temporal train/test splits, engineer lag and rolling features, handle non-stationarity before modeling, choose models matched to your data's characteristics, and always validate on out-of-sample (future) data. Effective forecasting systems come from understanding both the data and the technique; proper preprocessing and model selection are as important as the architecture itself.
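
The temporal-split practice can be implemented with scikit-learn's TimeSeriesSplit, which produces expanding training windows where every training index precedes every test index:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)  # stand-in features, ordered in time
tscv = TimeSeriesSplit(n_splits=5)

for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Training data always ends before the test data begins:
    # no future information leaks into the model
    assert train_idx.max() < test_idx.min()
    print(f"fold {fold}: train [0..{train_idx.max()}], "
          f"test [{test_idx.min()}..{test_idx.max()}]")
```

Contrast this with `train_test_split(shuffle=True)`, which would scatter future points into the training set and inflate apparent accuracy.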

Key Concepts

  • Time series analysis involves data points collected over time.
  • AI techniques include RNNs, LSTM, GRU, and transformers.
  • Applications include forecasting, anomaly detection, and pattern recognition.
  • Time series requires special handling for temporal dependencies.
  • Time series faces challenges in non-stationarity and seasonality.

Learning Objectives

Master

  • Understanding time series concepts and characteristics
  • Using RNNs and LSTMs for time series forecasting
  • Handling temporal dependencies and non-stationarity
  • Implementing time series forecasting and anomaly detection

Develop

  • Time series thinking
  • Understanding temporal patterns and dependencies
  • Designing effective time series analysis systems

Tips

  • Use temporal train/test splits—don't randomly split time series data.
  • Engineer temporal features (lags, rolling statistics) for better performance.
  • Handle non-stationarity through differencing or detrending.
  • Choose models appropriate for your data characteristics.

Common Pitfalls

  • Randomly splitting time series data, causing data leakage.
  • Not handling non-stationarity, causing poor model performance.
  • Ignoring seasonality, missing important patterns.
  • Not validating on out-of-sample data, overestimating performance.

Summary

  • Time series analysis involves data collected over time intervals.
  • AI techniques (RNNs, LSTM, transformers) capture temporal patterns.
  • Applications include forecasting, anomaly detection, and pattern recognition.
  • Time series requires special handling for temporal dependencies.
  • Understanding time series enables effective forecasting and analysis.

Exercise

Implement time series forecasting using LSTM networks.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.optimizers import Adam

# Generate synthetic time series data
np.random.seed(42)
n_points = 1000
t = np.linspace(0, 20, n_points)

# Create complex time series with trend, seasonality, and noise
trend = 0.1 * t
seasonality = 2 * np.sin(2 * np.pi * t / 4) + 1.5 * np.sin(2 * np.pi * t / 2)
noise = np.random.normal(0, 0.3, n_points)
data = trend + seasonality + noise

# Create DataFrame
df = pd.DataFrame({
    'timestamp': pd.date_range('2020-01-01', periods=n_points, freq='D'),
    'value': data
})

print("Time Series Data Overview:")
print(f"Data points: {len(df)}")
print(f"Date range: {df['timestamp'].min()} to {df['timestamp'].max()}")
print(f"Mean value: {df['value'].mean():.2f}")
print(f"Standard deviation: {df['value'].std():.2f}")

# Plot original data
plt.figure(figsize=(12, 6))
plt.plot(df['timestamp'], df['value'])
plt.title('Original Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.xticks(rotation=45)
plt.grid(True, alpha=0.3)
plt.show()

# Prepare data for LSTM
def create_sequences(data, seq_length):
    """Create sequences for LSTM input"""
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:(i + seq_length)])
        y.append(data[i + seq_length])
    return np.array(X), np.array(y)

# Normalize data
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df[['value']])

# Create sequences
sequence_length = 20
X, y = create_sequences(scaled_data, sequence_length)

# Split data
train_size = int(len(X) * 0.8)
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

print(f"Training sequences: {X_train.shape}")
print(f"Testing sequences: {X_test.shape}")

# Build LSTM model
model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(sequence_length, 1)),
    Dropout(0.2),
    LSTM(50, return_sequences=False),
    Dropout(0.2),
    Dense(1)
])

model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
print("Model Summary:")
model.summary()

# Train model
history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)

# Plot training history
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Training History')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'][-20:], label='Training Loss (Last 20 epochs)')
plt.plot(history.history['val_loss'][-20:], label='Validation Loss (Last 20 epochs)')
plt.title('Training History (Last 20 Epochs)')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()

# Make predictions
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)

# Inverse transform predictions
train_predict = scaler.inverse_transform(train_predict)
y_train_inv = scaler.inverse_transform(y_train)
test_predict = scaler.inverse_transform(test_predict)
y_test_inv = scaler.inverse_transform(y_test)

# Calculate metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error

train_rmse = np.sqrt(mean_squared_error(y_train_inv, train_predict))
test_rmse = np.sqrt(mean_squared_error(y_test_inv, test_predict))
train_mae = mean_absolute_error(y_train_inv, train_predict)
test_mae = mean_absolute_error(y_test_inv, test_predict)

print("\nModel Performance:")
print(f"Training RMSE: {train_rmse:.4f}")
print(f"Testing RMSE: {test_rmse:.4f}")
print(f"Training MAE: {train_mae:.4f}")
print(f"Testing MAE: {test_mae:.4f}")

# Plot predictions
plt.figure(figsize=(15, 8))

# Training predictions
train_indices = range(sequence_length, len(df) - len(X_test))
plt.subplot(2, 1, 1)
plt.plot(df['timestamp'], df['value'], label='Actual', alpha=0.7)
plt.plot(df['timestamp'].iloc[train_indices], train_predict.flatten(), 
         label='Training Predictions', alpha=0.8)
plt.title('Training Data Predictions')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)

# Testing predictions
test_indices = range(len(df) - len(X_test), len(df))
plt.subplot(2, 1, 2)
plt.plot(df['timestamp'], df['value'], label='Actual', alpha=0.7)
plt.plot(df['timestamp'].iloc[test_indices], test_predict.flatten(), 
         label='Testing Predictions', alpha=0.8)
plt.title('Testing Data Predictions')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Future forecasting
def forecast_future(model, last_sequence, steps):
    """Iteratively forecast future values, feeding each prediction back in"""
    future_predictions = []
    current_sequence = last_sequence.copy()
    
    for _ in range(steps):
        # Predict next value from the current window
        next_pred = model.predict(current_sequence.reshape(1, sequence_length, 1), verbose=0)
        future_predictions.append(next_pred[0, 0])
        
        # Slide the window forward and append the prediction
        current_sequence = np.roll(current_sequence, -1)
        current_sequence[-1] = next_pred[0, 0]
    
    return np.array(future_predictions)

# Get last sequence for forecasting
last_sequence = scaled_data[-sequence_length:]
future_steps = 30

# Make future predictions
future_scaled = forecast_future(model, last_sequence, future_steps)
future_predictions = scaler.inverse_transform(future_scaled.reshape(-1, 1))

# Create future dates
last_date = df['timestamp'].iloc[-1]
future_dates = pd.date_range(last_date + pd.Timedelta(days=1), 
                           periods=future_steps, freq='D')

# Plot forecast
plt.figure(figsize=(12, 6))
plt.plot(df['timestamp'], df['value'], label='Historical Data', alpha=0.7)
plt.plot(future_dates, future_predictions, label='Forecast', 
         color='red', linestyle='--', linewidth=2)
plt.title('Time Series Forecast')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.show()

print("\nTime Series Analysis Considerations:")
print("1. Seasonality: Identify and model periodic patterns")
print("2. Trend: Capture long-term changes in data")
print("3. Stationarity: Ensure data properties are consistent over time")
print("4. Feature Engineering: Create time-based features")
print("5. Validation: Use time-aware cross-validation techniques")
