
Generative AI and GANs

📚 Lesson 9 of 15 ⏱ 110 min

Generative AI creates new content, such as images, text, music, and video, that resembles its training data. Unlike discriminative models, which classify or predict labels, generative models learn the underlying data distribution and can sample new examples from it. Generative AI has revolutionized creative applications, enabling AI-generated art, music, writing, and design. Modern generative models (GANs, VAEs, diffusion models, and transformers) achieve remarkable quality.
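To make "learn the distribution and sample new examples" concrete, here is a minimal sketch (using NumPy, with an assumed 1-D Gaussian dataset) of a generative model in miniature: estimate the distribution's parameters from data, then draw fresh samples that resemble the training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training data": 1,000 points from a distribution we pretend not to know
data = rng.normal(loc=5.0, scale=2.0, size=1000)

# Generative modeling in miniature: estimate the distribution's parameters...
mu, sigma = data.mean(), data.std()

# ...then sample brand-new examples that resemble the training data
new_samples = rng.normal(mu, sigma, size=5)
print(f"estimated mu={mu:.2f}, sigma={sigma:.2f}")
print(new_samples)
```

Real generative models do the same thing with far more expressive distributions, parameterized by neural networks instead of two scalars.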

Generative Adversarial Networks (GANs) pit two competing networks, a generator and a discriminator, against each other in an adversarial training process. The generator creates fake data and tries to fool the discriminator; the discriminator tries to distinguish real data from fake. This competition drives both networks to improve: the generator learns to produce increasingly realistic data, and the discriminator learns to detect increasingly subtle fakes. GANs can be difficult to train due to instability, but they produce high-quality results.
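The adversarial game described above is conventionally written as a minimax objective over a value function (this is the standard formulation, not specific to this lesson's exercise):

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

Here D(x) is the discriminator's estimated probability that x is real, and G(z) maps random noise z to a fake sample. The discriminator maximizes V by scoring real data high and fakes low; the generator minimizes V by making D(G(z)) large.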

Variational Autoencoders (VAEs) combine autoencoders with variational inference: an encoder maps data into a latent space, and a decoder maps latent vectors back to data space. Because the latent space is probabilistic, VAEs learn meaningful latent representations that support smooth interpolation and sampling of new examples. They are more stable to train than GANs but tend to produce blurrier results.
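A minimal VAE sketch in PyTorch may help make this concrete. The class and dimensions below (`TinyVAE`, a 20-dimensional latent space) are illustrative choices, not from the lesson; the key pieces are the probabilistic encoder outputs (`mu`, `logvar`), the reparameterization trick, and the reconstruction-plus-KL loss.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, in_dim=784, latent_dim=20):
        super().__init__()
        self.enc = nn.Linear(in_dim, 64)
        self.mu = nn.Linear(64, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(64, latent_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, in_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)      # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the unit-Gaussian prior
    recon_loss = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

model = TinyVAE()
x = torch.rand(8, 784)                            # dummy batch in [0, 1)
recon, mu, logvar = model(x)
loss = vae_loss(recon, x, mu, logvar)
```

The reparameterization trick (`mu + std * noise`) is what lets gradients flow through the sampling step; after training, new data is generated by decoding `z` drawn from the prior.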

Modern generative models include diffusion models (DALL-E, Stable Diffusion), which generate images by learning to reverse a gradual noising process, and autoregressive models (such as GPT for text), which generate sequences one token at a time. Diffusion models have achieved state-of-the-art image generation, while autoregressive models excel at text and other sequential data. Each approach has strengths for different types of data, so understanding them helps you choose the appropriate method.
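The "noising process" that diffusion models learn to reverse can be sketched directly. The snippet below implements the DDPM-style closed-form forward step under an assumed linear noise schedule (the names `q_sample`, `betas` are illustrative); a trained diffusion model would learn to predict `noise` from `xt` and `t`, enabling generation by iterative denoising.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas_cumprod = torch.cumprod(1 - betas, dim=0) # cumulative signal retention

def q_sample(x0, t):
    """Noise a clean sample x0 directly to timestep t in closed form."""
    a = alphas_cumprod[t].sqrt()                 # how much signal survives
    s = (1 - alphas_cumprod[t]).sqrt()           # how much noise is mixed in
    noise = torch.randn_like(x0)
    return a * x0 + s * noise, noise

x0 = torch.randn(4, 784)                         # dummy "clean" batch
xt, noise = q_sample(x0, torch.tensor(500))      # halfway to pure noise
```

At t = 0 the sample is almost untouched; by t = T-1 it is nearly pure Gaussian noise. Generation runs this process in reverse, denoising step by step.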

Generative models have applications in art, design, content creation, data augmentation, and synthetic data generation. AI-generated art, music, and writing are becoming increasingly common, and synthetic examples can augment training data to improve model performance. Generative AI also raises important questions about creativity, authorship, and authenticity.

Training generative models requires careful architectural design, appropriate loss functions, and often significant computational resources. GANs require balancing generator and discriminator updates; VAEs require proper regularization of the latent space; diffusion models require many denoising steps at sampling time. Best practices include starting from established architectures, using proper initialization, tuning hyperparameters carefully, and monitoring training stability.
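Two widely used GAN stabilization tricks can be sketched briefly. The names below (`real_targets`, `should_update_discriminator`) are illustrative, not from any library: one-sided label smoothing softens the discriminator's real targets, and a simple loss-ratio heuristic skips discriminator updates when it is already winning too easily.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()
batch_size = 64

# One-sided label smoothing: targets of 0.9 instead of 1.0 for real samples
# keep the discriminator from becoming overconfident.
real_targets = torch.full((batch_size, 1), 0.9)

def should_update_discriminator(d_loss, g_loss, ratio=0.1):
    """Simple balancing heuristic: skip discriminator updates once its loss
    is far below the generator's (i.e., D is winning too easily)."""
    return d_loss > ratio * g_loss

# Stand-in loss values to show the heuristic in action
d_loss, g_loss = 0.05, 1.5
update_d = should_update_discriminator(d_loss, g_loss)  # D is ahead: skip it
```

These are heuristics, not guarantees; architectural choices (DCGAN, StyleGAN) and careful learning-rate selection matter at least as much.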

Key Concepts

  • Generative AI creates new content resembling training data.
  • GANs use competing generator and discriminator networks.
  • VAEs learn latent representations for generation.
  • Diffusion models and autoregressive models are modern approaches.
  • Generative models have applications in art, design, and content creation.

Learning Objectives

Master

  • Understanding generative AI concepts and applications
  • Implementing GANs for content generation
  • Understanding VAEs and other generative approaches
  • Applying generative models to creative tasks

Develop

  • Generative AI thinking
  • Understanding creative AI applications
  • Designing effective generative models

Tips

  • Start with simple GANs before tackling complex architectures.
  • Balance generator and discriminator training carefully.
  • Use established architectures (DCGAN, StyleGAN) as starting points.
  • Monitor training stability—GANs can be unstable.

Common Pitfalls

  • Not balancing generator and discriminator, causing training instability.
  • Using GANs when simpler generative models would suffice.
  • Not understanding that generative models require significant resources.
  • Ignoring ethical considerations of AI-generated content.

Summary

  • Generative AI creates new content like images, text, and music.
  • GANs use competing networks for adversarial training.
  • VAEs and diffusion models provide alternative approaches.
  • Understanding generative AI enables creative AI applications.
  • Generative models require careful training and significant resources.

Exercise

Implement a simple GAN for generating synthetic data.

import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

# Generator Network
class Generator(nn.Module):
    def __init__(self, latent_dim=100, hidden_dim=128):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(True),
            nn.Linear(hidden_dim, hidden_dim * 2),
            nn.ReLU(True),
            nn.Linear(hidden_dim * 2, hidden_dim * 4),
            nn.ReLU(True),
            nn.Linear(hidden_dim * 4, 784),  # 28x28 = 784
            nn.Tanh()
        )
    
    def forward(self, x):
        return self.main(x)

# Discriminator Network
class Discriminator(nn.Module):
    def __init__(self, hidden_dim=128):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            nn.Linear(784, hidden_dim * 4),
            nn.LeakyReLU(0.2, True),
            nn.Linear(hidden_dim * 4, hidden_dim * 2),
            nn.LeakyReLU(0.2, True),
            nn.Linear(hidden_dim * 2, hidden_dim),
            nn.LeakyReLU(0.2, True),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        return self.main(x)

# Training setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
latent_dim = 100

generator = Generator(latent_dim).to(device)
discriminator = Discriminator().to(device)

criterion = nn.BCELoss()
g_optimizer = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
d_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

# Training loop
num_epochs = 100
batch_size = 64
d_losses = []
g_losses = []

for epoch in range(num_epochs):
    for i in range(100):  # 100 batches per epoch
        # Train Discriminator
        d_optimizer.zero_grad()
        
        # "Real" data (random vectors as a stand-in; use a real dataset
        # such as MNIST in practice)
        real_data = torch.randn(batch_size, 784).to(device)
        real_labels = torch.ones(batch_size, 1).to(device)
        
        # Fake data (detached so the discriminator's loss does not
        # backpropagate into the generator)
        noise = torch.randn(batch_size, latent_dim).to(device)
        fake_data = generator(noise).detach()
        fake_labels = torch.zeros(batch_size, 1).to(device)
        
        # Combine real and fake data
        all_data = torch.cat([real_data, fake_data])
        all_labels = torch.cat([real_labels, fake_labels])
        
        # Train discriminator
        d_output = discriminator(all_data)
        d_loss = criterion(d_output, all_labels)
        d_loss.backward()
        d_optimizer.step()
        
        # Train Generator
        g_optimizer.zero_grad()
        
        # Generate new fake data
        noise = torch.randn(batch_size, latent_dim).to(device)
        fake_data = generator(noise)
        fake_labels = torch.ones(batch_size, 1).to(device)  # labeled "real" to fool D
        
        # Train generator
        g_output = discriminator(fake_data)
        g_loss = criterion(g_output, fake_labels)
        g_loss.backward()
        g_optimizer.step()
        
        if i == 0:  # Record first batch of each epoch
            d_losses.append(d_loss.item())
            g_losses.append(g_loss.item())
    
    if epoch % 20 == 0:
        print(f'Epoch {epoch}, D Loss: {d_losses[-1]:.4f}, G Loss: {g_losses[-1]:.4f}')

# Generate and display samples
generator.eval()
with torch.no_grad():
    noise = torch.randn(16, latent_dim).to(device)
    fake_images = generator(noise).cpu()
    
    # Reshape to 28x28
    fake_images = fake_images.view(16, 28, 28)
    
    # Display samples
    plt.figure(figsize=(8, 8))
    for i in range(16):
        plt.subplot(4, 4, i + 1)
        plt.imshow(fake_images[i], cmap='gray')
        plt.axis('off')
    plt.suptitle('Generated Images')
    plt.show()

# Plot training progress
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(d_losses, label='Discriminator Loss')
plt.title('Discriminator Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(g_losses, label='Generator Loss')
plt.title('Generator Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()

print("GAN training completed!")
