AI Diffusion Models Explained Mathematically in Simple Terms
What Are Diffusion Models?
Imagine you have a clear photo of a cat. Now imagine gradually adding static noise to it, like on an old TV, until it becomes pure random noise. Diffusion models learn to reverse this process - they start with random noise and gradually remove it to create a clear image.
The Core Math (Made Simple)
Forward Process: Adding Noise
The forward process adds noise to data in small steps. Mathematically:
x(t) = √(1-β) × x(t-1) + √β × noise
Where:
- x(t) = image at time step t
- β = small number (like 0.0001) controlling how much noise to add
- noise = random static
Think of it like adding a tiny bit of fog to a window each second until you can't see through it.
Reverse Process: Removing Noise
The magic happens when we reverse this process. The model learns:
x(t-1) = 1/√(1-β) × [x(t) - √β × predicted_noise]
The AI learns to predict what noise was added, then subtracts it!
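To see these two formulas in action, here is a minimal NumPy sketch (a toy 3-pixel "image" stands in for a real one). If the predicted noise exactly matches the noise that was added, the reverse formula recovers the original pixels:

import numpy as np

beta = 0.0001                          # noise level for this single step
x_prev = np.array([0.2, 0.5, 0.8])     # a toy 3-pixel "image"
noise = np.random.randn(3)             # random Gaussian static

# Forward step: x(t) = sqrt(1 - beta) * x(t-1) + sqrt(beta) * noise
x_t = np.sqrt(1 - beta) * x_prev + np.sqrt(beta) * noise

# Reverse step with a perfect noise prediction:
# x(t-1) = 1/sqrt(1 - beta) * [x(t) - sqrt(beta) * predicted_noise]
x_recovered = (x_t - np.sqrt(beta) * noise) / np.sqrt(1 - beta)

print(np.allclose(x_recovered, x_prev))   # True: the original pixels come back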
Why This Works: The Brilliant Insight
The key insight is that removing a little bit of noise is much easier than generating an entire image from scratch. It's like:
- Hard: "Draw a cat"
- Easy: "This fuzzy image has some static on it, clean it up a bit"
By breaking generation into hundreds of tiny "clean up a bit" steps, we make an impossible problem possible.
The Training Process
- Take a real image
- Pick a random step t (out of T, typically 1000) and add that step's amount of noise in one shot
- Train the AI to predict exactly which noise was added
- Loss function: How wrong was the noise prediction?
Loss = ||actual_noise - predicted_noise||²
Generating New Images
Once trained, to generate a new image:
- Start with pure random noise
- For 1000 steps, ask the model: "What noise should I remove?"
- Remove that predicted noise
- Repeat until you have a clear image
The Score Function (Advanced but Simple)
The model actually learns the "score function" - the gradient of the log of the data distribution:
score = ∇log p(x)
In simple terms: "Which direction should I move to make this look more like a real image?"
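For the curious, the score and the noise the model is trained to predict are the same object up to a scale factor. Here is a tiny NumPy check, using ᾱ for the accumulated noise level (defined properly in the second half of this post):

import numpy as np

alpha_bar = 0.7                        # accumulated product of the (1 - beta) factors at some step
x0 = np.random.rand(5)                 # toy clean "image"
eps = np.random.randn(5)               # the noise that was added
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps

# Score of the Gaussian q(x_t | x_0): gradient of its log-density with respect to x_t
score = -(x_t - np.sqrt(alpha_bar) * x0) / (1 - alpha_bar)

# The same quantity, written in terms of the noise that was added
print(np.allclose(score, -eps / np.sqrt(1 - alpha_bar)))   # True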
Why Diffusion Models Are Special
Compared to GANs:
- More stable training
- Better mode coverage (more variety)
- Easier to control
Compared to VAEs:
- Higher quality images
- More flexible
Key Mathematical Concepts
1. Markov Chain
Each step only depends on the previous step, not the entire history:
- x(999) → x(998) → x(997) → ... → x(0)
2. Gaussian Distribution
The noise added is Gaussian (bell curve) shaped:
- Most values near zero
- Few extreme values
- Natural and mathematically convenient
3. Variance Schedule
We control how much noise to add at each step:
- β₁ = 0.0001 (tiny noise at first)
- β₁₀₀₀ = 0.02 (more noise later)
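As a rough sketch, a linear schedule with exactly these endpoint values can be built in a few lines of NumPy (this is one common choice; cosine and other schedules are also used in practice):

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # beta_1 ... beta_1000, growing linearly
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)         # accumulated product, used for one-shot noising

print(betas[0], betas[-1])             # 0.0001  0.02
print(alpha_bar[-1])                   # close to 0: x_T is almost pure noise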
Simple Code Example (Conceptual)
# Training
for image in dataset:
    t = random_timestep()                       # pick a random step between 1 and T
    noise = generate_random_noise()             # Gaussian static, same shape as the image
    noisy_image = add_noise(image, t, noise)    # forward process, applied in one shot
    predicted_noise = model(noisy_image, t)
    loss = mean_squared_error(noise, predicted_noise)
    update_model(loss)                          # take one gradient step on the loss

# Generation
x = random_noise()
for t in reversed(range(1000)):
    predicted_noise = model(x, t)
    x = remove_noise(x, predicted_noise, t)     # one small "clean it up a bit" step
# x is now your generated image!
Real-World Applications
- Text-to-Image (DALL-E 2, Stable Diffusion): condition the denoising on text descriptions
- Image Editing: start denoising from a partially noised image
- Video Generation: apply diffusion in the time dimension too
- 3D Generation: apply diffusion to 3D voxels or point clouds
The Beautiful Mathematics
The diffusion equation comes from physics (heat diffusion):
∂p/∂t = ∇²p
This describes how heat (or in our case, noise) spreads over time. We're essentially reversing heat flow!
Key Takeaways
- Diffusion = Gradual Noising + Denoising
- Small steps make hard problems easy
- The model learns to predict noise, not images
- Physics-inspired math makes it work
Why Should You Care?
Diffusion models are behind:
- AI art generators
- Photo editing tools
- Video generation
- 3D model creation
- Scientific simulations
They're not just another AI technique - they're a fundamental breakthrough in how we think about generation problems.
Understanding Diffusion Models Mathematically
Introduction
Diffusion models are generative models that progressively transform random noise into meaningful data. They work by first adding noise to the data in a controlled manner and then learning to reverse this process to generate realistic samples.
Gaussian Distribution
A Gaussian (normal) distribution is defined as:
\[ \mathcal{N}(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right) \]
where:
- \( x \) is the random variable (vector in high-dimensional space).
- \( \mu \) is the mean vector, representing the expected value of \( x \).
- \( \Sigma \) is the covariance matrix, which determines the spread and correlation of \( x \) across dimensions.
- \( d \) is the number of dimensions.
The covariance matrix \( \Sigma \) determines the shape of the Gaussian distribution. If \( \Sigma = I \) (identity matrix), the distribution is isotropic (same spread in all directions).
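As a quick sanity check of the formula, the sketch below evaluates this density for an isotropic Gaussian (\( \Sigma = I \)) in NumPy and compares it with SciPy's reference implementation (SciPy is used here only for verification):

import numpy as np
from scipy.stats import multivariate_normal

d = 3
x = np.array([0.5, -1.0, 2.0])
mu = np.zeros(d)
Sigma = np.eye(d)                      # isotropic: same spread in every direction

# The density formula from above, written out explicitly
norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
exponent = -0.5 * (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
density = norm_const * np.exp(exponent)

print(np.isclose(density, multivariate_normal(mu, Sigma).pdf(x)))   # True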
Forward Diffusion Process
In the forward process, we add small amounts of Gaussian noise to an image \( x_0 \) over \( T \) time steps until it becomes pure noise \( x_T \). The transition from step \( t-1 \) to \( t \) follows a Gaussian distribution:
\[ q(x_t | x_{t-1}) = \mathcal{N}(x_t; \sqrt{1 - \beta_t} x_{t-1}, \beta_t I) \]
where:
- \( x_t \) is the noisy image at step \( t \).
- \( \beta_t \) is a small variance term controlling the amount of noise added.
- \( I \) is the identity matrix, meaning noise is added independently to each pixel.
The cumulative effect of noise at any time \( t \) can be written as:
\[ q(x_t | x_0) = \mathcal{N}(x_t; \sqrt{\bar{\alpha}_t} x_0, (1 - \bar{\alpha}_t) I) \]
where:
- \( \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s) \) represents the accumulated noise schedule.
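In code, this closed form lets us jump from a clean image to any noise level without simulating every intermediate step. A minimal NumPy sketch, where a random vector stands in for a flattened image:

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)    # \bar{alpha}_t for t = 1..T (0-based array index t-1)

x0 = np.random.rand(64 * 64)           # stand-in for a flattened 64x64 image
t = 500                                # an arbitrary step
eps = np.random.randn(*x0.shape)       # Gaussian noise

# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
x_t = np.sqrt(alpha_bar[t - 1]) * x0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps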
Reverse Process (Denoising)
To generate new images, we reverse the diffusion process by predicting and removing noise. The reverse step follows another Gaussian distribution:
\[ p_\theta(x_{t-1} | x_t) = \mathcal{N}(x_{t-1}; \mu_\theta(x_t, t), \sigma_t^2 I) \]
where:
- \( \mu_\theta(x_t, t) \) is the predicted mean of \( x_{t-1} \) given \( x_t \), estimated by a neural network.
- \( \sigma_t^2 \) is the variance of the reverse process, typically learned or predefined.
The neural network predicts the noise \( \epsilon_\theta(x_t, t) \), and the denoised mean is computed as:
\[ \mu_\theta(x_t, t) = \frac{1}{\sqrt{1 - \beta_t}} \left( x_t - \beta_t \frac{\epsilon_\theta(x_t, t)}{\sqrt{1 - \bar{\alpha}_t}} \right) \]
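Written out in code, this mean is a one-liner once \( \beta_t \) and \( \bar{\alpha}_t \) are known. In the sketch below the true noise stands in for the network output \( \epsilon_\theta(x_t, t) \), purely to show the arithmetic:

import numpy as np

beta_t = 0.01                          # example values for one step
alpha_bar_t = 0.5
x_t = np.random.randn(4)               # a tiny toy noisy "image"
eps_hat = np.random.randn(4)           # stand-in for epsilon_theta(x_t, t)

# mu_theta(x_t, t) = (x_t - beta_t * eps_hat / sqrt(1 - alpha_bar_t)) / sqrt(1 - beta_t)
mu = (x_t - beta_t * eps_hat / np.sqrt(1 - alpha_bar_t)) / np.sqrt(1 - beta_t)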
Training Objective
The model is trained to predict noise by minimizing the mean squared error (MSE) loss:
\[ L(\theta) = \mathbb{E}_{x_0, \epsilon, t} \left[ ||\epsilon - \epsilon_\theta(x_t, t)||^2 \right] \]
where:
- \( \epsilon \) is the actual Gaussian noise added to the data.
- \( \epsilon_\theta(x_t, t) \) is the noise predicted by the model.
This loss function ensures that the model learns to correctly remove noise at each step.
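Here is a hedged sketch of a single training step in PyTorch. The tiny MLP, the flattened 8x8 "images", and appending the timestep as an extra input are all placeholders rather than a real architecture; only the loss mirrors the objective above:

import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

# Placeholder noise-prediction network: takes (x_t, t) and returns a noise estimate
model = nn.Sequential(nn.Linear(64 + 1, 128), nn.ReLU(), nn.Linear(128, 64))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x0 = torch.rand(16, 64)                        # a batch of 16 flattened 8x8 "images"
t = torch.randint(1, T + 1, (16,))             # a random timestep per sample
eps = torch.randn_like(x0)                     # the actual noise epsilon

ab = alpha_bar[t - 1].unsqueeze(1)             # \bar{alpha}_t for each sample
x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps   # closed-form forward process

# Condition the network on t by appending the scaled timestep as an extra input
eps_pred = model(torch.cat([x_t, t.unsqueeze(1).float() / T], dim=1))

loss = ((eps - eps_pred) ** 2).mean()          # L(theta) = E[ ||eps - eps_theta||^2 ]
optimizer.zero_grad()
loss.backward()
optimizer.step()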
Sampling (Generating Images)
To generate an image, we start with pure Gaussian noise \( x_T \) and iteratively apply the denoising process:
\[ x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z, \quad \text{where } z \sim \mathcal{N}(0, I) \]
This gradually transforms random noise into a structured image.
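Putting the pieces together, here is a minimal NumPy sketch of the sampling loop. The predict_noise function is a placeholder for the trained network \( \epsilon_\theta(x_t, t) \) (it returns zeros so the loop runs end to end), and \( \sigma_t^2 = \beta_t \) is one common choice for the reverse variance:

import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def predict_noise(x, t):
    # Placeholder for the trained network epsilon_theta(x_t, t)
    return np.zeros_like(x)

x = np.random.randn(64 * 64)                    # start from pure Gaussian noise x_T
for t in range(T, 0, -1):
    beta_t = betas[t - 1]
    eps_hat = predict_noise(x, t)
    # Reverse-process mean from the formula above
    mu = (x - beta_t * eps_hat / np.sqrt(1.0 - alpha_bar[t - 1])) / np.sqrt(1.0 - beta_t)
    sigma_t = np.sqrt(beta_t)                   # sigma_t^2 = beta_t
    z = np.random.randn(*x.shape) if t > 1 else 0.0   # no extra noise on the final step
    x = mu + sigma_t * z                        # x_{t-1} = mu_theta + sigma_t * z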
Summary
- Forward Process: Adds noise step by step using \( x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1 - \bar{\alpha}_t} \epsilon \).
- Reverse Process: Uses a neural network to predict noise and gradually remove it.
- Loss Function: Trains the model to predict noise by minimizing \( L(\theta) = \mathbb{E}[||\epsilon - \epsilon_\theta(x_t, t)||^2] \).
- Sampling: Starts with noise and applies learned denoising steps to generate images.