Project 5: Fun With Diffusion Models

Part 5A

Part 0: Setup

'a man wearing a hat', num_inference_steps=20

'a man wearing a hat', num_inference_steps=50

'an oil painting of a snowy mountain village', num_inference_steps=20

'an oil painting of a snowy mountain village', num_inference_steps=50

'a rocket ship', num_inference_steps=20

'a rocket ship', num_inference_steps=50

Across all three prompts, increasing num_inference_steps from 20 to 50 appears to produce higher-quality images.

I’m using 180 as my random seed.

Part 1: Sampling Loops

1.1 Implementing the Forward Process

1.2 Classical Denoising

1.3 One-Step Denoising

original campanile image

1.4 Iterative Denoising

1.5 Diffusion Model Sampling

1.6 Classifier-Free Guidance (CFG)

1.7 Image-to-image Translation

me

sister and me

campanile image-to-image translation

me image-to-image translation

sister and me image-to-image translation

1.7.1 Editing Hand-Drawn and Web Images

luffy

bear

pig

luffy image-to-image translation

bear image-to-image translation

pig image-to-image translation

1.7.2 Inpainting

grass

ocean

campanile mask

campanile inpainted

grass mask

grass inpainted

ocean mask

ocean inpainted

1.7.3 Text-Conditional Image-to-image Translation

zoro

sanji

rocket to campanile

rocket to zoro

rocket to sanji

1.8 Visual Anagrams

'an oil painting of an old man' and 'an oil painting of people around a campfire'

'a photo of a hipster barista' and 'a photo of a dog'

'a lithograph of waterfalls' and 'a lithograph of a skull'

1.9 Hybrid Images

'a lithograph of waterfalls' + 'a lithograph of a skull'

'a rocket ship' + 'an oil painting of a snowy mountain village'

'an oil painting of people around a campfire' + 'an oil painting of an old man'

Part 2: Bells & Whistles

Course Logo

created using Gemini

Part 5B

In this sub-project, I create a diffusion model to generate images from the MNIST dataset.

Part 1: Training a Single-Step Denoising UNet

1.1 Implementing the UNet

1.2 Using the UNet to Train a Denoiser

1.2.1 Training

results after epoch 1

results after epoch 5

1.2.2 Out-of-Distribution Testing

Part 2: Training a Diffusion Model

From the previous part, we saw that the single-step UNet is not sufficient to denoise images that contain significant amounts of noise. We need to create a proper diffusion model! This involves sampling an image of pure noise and generating a realistic image from it, which we can do by iteratively denoising that image.
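To make the idea of iterative denoising concrete, here is a minimal NumPy sketch of a DDPM-style sampling loop. It is an illustration under stated assumptions, not the code used for this project: the number of timesteps (300), the linear beta schedule, and the names `sample`, `predict_noise`, and `oracle` are all hypothetical. The trained UNet is replaced by a stand-in "oracle" noise predictor that assumes the clean image is all zeros, so the loop can be run without a model.

```python
import numpy as np

def sample(predict_noise, shape, betas, rng):
    """Start from pure noise and denoise step by step (DDPM-style update)."""
    alphas = 1.0 - betas
    alphas_cumprod = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # purely noisy starting image
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = predict_noise(x, t)  # model's estimate of the noise in x
        # Mean of the reverse step: remove the predicted noise, then rescale.
        mean = (x - betas[t] / np.sqrt(1.0 - alphas_cumprod[t]) * eps_hat) / np.sqrt(alphas[t])
        # Fresh noise is added at every step except the last one.
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Assumed schedule: 300 steps, betas linearly spaced from 1e-4 to 0.02.
betas = np.linspace(1e-4, 0.02, 300)
alphas_cumprod = np.cumprod(1.0 - betas)

def oracle(x, t):
    # Stand-in for a trained UNet: if the clean image is all zeros,
    # then x_t = sqrt(1 - alpha_bar_t) * eps, so eps can be read off directly.
    return x / np.sqrt(1.0 - alphas_cumprod[t])

rng = np.random.default_rng(180)  # same seed value as used elsewhere in this write-up
samples = sample(oracle, (4, 1, 28, 28), betas, rng)
```

With a real model, `predict_noise` would be the time-conditioned UNet, and `samples` would be a batch of MNIST-sized images rather than the all-zero image the stand-in oracle converges to.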

2.1 Adding Time Conditioning to UNet

We can add a fully connected block to our UNet to inject the time-conditioning signal.
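As a rough sketch of how such a block could inject a scalar timestep into convolutional features, the NumPy example below maps a normalized `t` through two linear layers with a GELU in between, then broadcasts the result over the spatial dimensions of a feature map. The class name `FCBlock`, the layer sizes, the random weights, and the choice to add (rather than multiply) the signal are assumptions made for illustration; the project's actual architecture may differ.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class FCBlock:
    """Hypothetical conditioning block: Linear -> GELU -> Linear."""

    def __init__(self, in_dim, out_dim, rng):
        # Random weights stand in for learned parameters.
        self.w1 = rng.standard_normal((in_dim, out_dim)) * 0.1
        self.b1 = np.zeros(out_dim)
        self.w2 = rng.standard_normal((out_dim, out_dim)) * 0.1
        self.b2 = np.zeros(out_dim)

    def __call__(self, c):
        # c has shape (N, in_dim); the result has shape (N, out_dim).
        return gelu(c @ self.w1 + self.b1) @ self.w2 + self.b2

def inject(features, cond):
    # features: (N, C, H, W), cond: (N, C).
    # Broadcast the per-channel signal over every spatial position (assumed additive).
    return features + cond[:, :, None, None]

rng = np.random.default_rng(180)
fc_t = FCBlock(in_dim=1, out_dim=8, rng=rng)

t = np.array([[0.1], [0.9]])        # normalized timesteps for a batch of 2
features = np.zeros((2, 8, 7, 7))   # placeholder UNet feature map
out = inject(features, fc_t(t))
```

Because the conditioning vector has one value per channel, every spatial location in a given channel receives the same shift, which lets the network change its behavior with the noise level without altering the convolutional layers themselves.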

2.2 Training the UNet

2.3 Sampling from the UNet

2.4 Adding Class-Conditioning to UNet

2.5 Sampling from the Class-Conditioned UNet

epoch 5 samples

epoch 20 samples