Project 5: Fun With Diffusion Models

Part 5A

Part 0: Setup

'a man wearing a hat', num_inference_steps=20
'a man wearing a hat', num_inference_steps=50

'an oil painting of a snowy mountain village', num_inference_steps=20
'an oil painting of a snowy mountain village', num_inference_steps=50
'a rocket ship', num_inference_steps=20
'a rocket ship', num_inference_steps=50

A higher num_inference_steps appears to produce higher-quality images.

I’m using 180 as my random seed.

Part 1: Sampling Loops

1.1 Implementing the Forward Process
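The forward process adds Gaussian noise to a clean image according to x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, with ε ~ N(0, I). A minimal NumPy sketch follows. The linear β schedule (1e-4 to 0.02 over 1000 steps) is an illustrative assumption; a pretrained model's scheduler supplies its own ᾱ_t values.

```python
import numpy as np


def linear_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return abar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def forward(x0, t, alphas_cumprod, rng=None):
    """Noise a clean image x0 to timestep t. Returns the noisy image and the noise used."""
    rng = np.random.default_rng() if rng is None else rng
    abar_t = alphas_cumprod[t]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps
    return x_t, eps
```

Because ᾱ_t decreases as t grows, larger timesteps scale the clean signal down further and let the noise dominate.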

1.2 Classical Denoising
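Classical denoising here means low-pass filtering: a Gaussian blur averages away much of the high-frequency noise, but it also blurs the edges of the underlying image. A minimal NumPy sketch of a separable Gaussian blur follows. The 3σ kernel radius and reflect padding are assumptions, and a library routine such as torchvision's `gaussian_blur` plays the same role.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel with an assumed radius of ceil(3 * sigma)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian blur over the first two axes of an (H, W) or (H, W, C) array."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    out = img.astype(float)
    for axis in (0, 1):
        n = out.shape[axis]
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="reflect")
        # Weighted sum of shifted copies = 1-D convolution along this axis (kernel is symmetric).
        out = sum(w * np.take(padded, np.arange(i, i + n), axis=axis) for i, w in enumerate(k))
    return out
```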

1.3 One-Step Denoising

original campanile image
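One-step denoising passes the noisy image x_t and timestep t to the pretrained model, takes its noise estimate ε̂, and solves the forward-process equation for the clean image: x̂_0 = (x_t − √(1 − ᾱ_t) · ε̂) / √ᾱ_t. A minimal sketch, assuming ε̂ comes from some noise-prediction model (not shown):

```python
import numpy as np


def estimate_clean_image(x_t, eps_hat, abar_t):
    """Solve x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps for x0, using the estimate eps_hat."""
    return (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
```

If ε̂ were exactly the added noise, this would recover x_0 exactly. Any error in ε̂ enters the result scaled by √(1 − ᾱ_t) / √ᾱ_t, which grows with t, so one-step estimates degrade at high noise levels.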

1.4 Iterative Denoising
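Iterative denoising moves from timestep t to an earlier timestep t′ using the DDPM posterior mean, written for strided timesteps with α_t = ᾱ_t / ᾱ_{t′} and β_t = 1 − α_t:

x_{t′} = (√ᾱ_{t′} · β_t / (1 − ᾱ_t)) · x̂_0 + (√α_t · (1 − ᾱ_{t′}) / (1 − ᾱ_t)) · x_t + v_σ

The sketch below assumes v_σ is Gaussian noise with the DDPM posterior variance β_t (1 − ᾱ_{t′}) / (1 − ᾱ_t); a pretrained model may predict its own variance instead. The stride of 30 starting at 990 is likewise an illustrative assumption.

```python
import numpy as np

# Assumed strided schedule: 990, 960, ..., 30, 0.
STRIDED_TIMESTEPS = list(range(990, -1, -30))


def ddpm_step(x_t, x0_hat, abar_t, abar_prev, rng=None, add_noise=True):
    """Move from timestep t to an earlier t' (abar_prev = abar_{t'}) given a clean-image estimate."""
    alpha_t = abar_t / abar_prev
    beta_t = 1.0 - alpha_t
    mean = (np.sqrt(abar_prev) * beta_t / (1.0 - abar_t)) * x0_hat \
        + (np.sqrt(alpha_t) * (1.0 - abar_prev) / (1.0 - abar_t)) * x_t
    if not add_noise:
        return mean
    rng = np.random.default_rng() if rng is None else rng
    var = beta_t * (1.0 - abar_prev) / (1.0 - abar_t)  # posterior variance (assumed choice)
    return mean + np.sqrt(var) * rng.standard_normal(np.shape(x_t))
```

Each iteration estimates x̂_0 from the model's noise prediction at the current timestep, calls ddpm_step to reach the next, less noisy timestep, and repeats until t′ = 0. Running the same loop from pure noise at the first strided timestep corresponds to the sampling setting of 1.5.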

1.5 Diffusion Model Sampling

1.6 Classifier-Free Guidance (CFG)
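Classifier-free guidance computes a conditional noise estimate ε_c (with the text prompt) and an unconditional one ε_u (with an empty prompt), then extrapolates: ε = ε_u + γ (ε_c − ε_u). With γ = 1 this is just the conditional estimate; γ > 1 pushes samples further toward the prompt, typically trading diversity for quality. A minimal sketch (the default γ = 7 is an assumption):

```python
import numpy as np


def cfg_noise(eps_cond, eps_uncond, guidance_scale=7.0):
    """Classifier-free guidance: extrapolate from the unconditional toward the conditional estimate."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```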

1.7 Image-to-image Translation

me
sister and me

campanile image-to-image translation
me image-to-image translation
sister and me image-to-image translation

1.7.1 Editing Hand-Drawn and Web Images

Luffy
bear
pig
Luffy image-to-image translation
bear image-to-image translation
pig image-to-image translation

1.7.2 Inpainting

grass
ocean
campanile mask
campanile inpainted
grass mask
grass inpainted
ocean mask
ocean inpainted
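One common way to inpaint with a diffusion sampler is to run the usual iterative denoising loop and, after every step, overwrite everything outside the edit region with the original image noised to the current timestep, so that only the masked region is generated: x_t ← m · x_t + (1 − m) · forward(x_orig, t). A minimal sketch, assuming m = 1 marks pixels to regenerate:

```python
import numpy as np


def inpaint_project(x_t, x_orig, mask, abar_t, rng=None):
    """Keep generated pixels where mask == 1; elsewhere use x_orig noised to the same level."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(np.shape(x_orig))
    noised_orig = np.sqrt(abar_t) * x_orig + np.sqrt(1.0 - abar_t) * noise
    return mask * x_t + (1.0 - mask) * noised_orig
```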

1.7.3 Text-Conditional Image-to-image Translation

Zoro
Sanji
rocket to campanile
rocket to Zoro
rocket to Sanji

1.8 Visual Anagrams

'an oil painting of an old man' and 'an oil painting of people around a campfire'

'a photo of a hipster barista' and 'a photo of a dog'

'a lithograph of waterfalls' and 'a lithograph of a skull'
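A visual anagram can be produced by combining two noise estimates at each denoising step: one for the upright image with the first prompt, and one computed on the vertically flipped image with the second prompt and then flipped back. Averaging them steers the same image toward both readings. In the sketch below, `predict_noise(x, t, prompt)` is a hypothetical stand-in for the model's (CFG-guided) noise estimate, and images are assumed to be (C, H, W) arrays.

```python
import numpy as np


def anagram_noise(x_t, t, predict_noise, prompt_upright, prompt_flipped):
    """Average the upright estimate with the flipped-back estimate for the second prompt."""
    eps_upright = predict_noise(x_t, t, prompt_upright)
    flipped = np.flip(x_t, axis=-2)  # upside down: reverse the H axis of (C, H, W)
    eps_flipped = np.flip(predict_noise(flipped, t, prompt_flipped), axis=-2)
    return (eps_upright + eps_flipped) / 2.0
```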

1.9 Hybrid Images

'a lithograph of waterfalls' + 'a lithograph of a skull'
'a rocket ship' + 'an oil painting of a snowy mountain village'
'an oil painting of people around a campfire' + 'an oil painting of an old man'
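Hybrid images can be built by splitting noise estimates by frequency: the estimate for the prompt meant to be seen from afar is low-pass filtered, the estimate for the prompt seen up close is high-pass filtered, and the two are summed at every denoising step. The sketch below uses an FFT-based Gaussian low-pass over the last two axes (which treats the image as periodic). The σ = 2 default is an assumption, and a spatial Gaussian blur could be used instead.

```python
import numpy as np


def gaussian_lowpass(x, sigma):
    """Gaussian low-pass over the last two axes via the FFT (periodic boundary)."""
    h, w = x.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # Fourier transform of a unit-sum Gaussian with std `sigma` pixels.
    transfer = np.exp(-2.0 * np.pi ** 2 * sigma ** 2 * (fx ** 2 + fy ** 2))
    spectrum = np.fft.fft2(x, axes=(-2, -1)) * transfer
    return np.real(np.fft.ifft2(spectrum, axes=(-2, -1)))


def hybrid_noise(eps_far, eps_near, sigma=2.0):
    """Low frequencies from the 'far' estimate plus high frequencies from the 'near' estimate."""
    return gaussian_lowpass(eps_far, sigma) + (eps_near - gaussian_lowpass(eps_near, sigma))
```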

Part 2: Bells & Whistles

Course Logo

Created using Gemini.

Part 5B

In this sub-project, I train a diffusion model on the MNIST dataset to generate images of handwritten digits.

Part 1: Training a Single-Step Denoising UNet

1.1 Implementing the UNet

1.2 Using the UNet to Train a Denoiser
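A single-step denoiser of this kind is typically trained on pairs (z, x), where z = x + σε is a clean MNIST digit x corrupted with Gaussian noise at a fixed level σ, by minimizing the mean squared error between the UNet output D(z) and x. A minimal sketch of the pair generation and loss, assuming σ = 0.5 as the training noise level (an illustrative choice):

```python
import numpy as np


def make_noisy_batch(x, sigma=0.5, rng=None):
    """Return (z, x) with z = x + sigma * eps; x is a batch of clean images in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    z = x + sigma * rng.standard_normal(x.shape)
    return z, x


def l2_loss(pred, target):
    """Mean squared error, averaged over all pixels in the batch."""
    return float(np.mean((pred - target) ** 2))
```

Calling make_noisy_batch with σ values not used during training is one way to probe out-of-distribution noise levels, as in 1.2.2.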

1.2.1 Training

results after epoch 1
results after epoch 5

1.2.2 Out-of-Distribution Testing

Part 2: Training a Diffusion Model

From the previous part, we saw that the single-step UNet denoiser is not sufficient to denoise images with large amounts of noise. We need a proper diffusion model! A diffusion model starts from a purely noisy image and generates a realistic image from it by denoising iteratively, removing a little noise at each step.
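The iterative denoising described above is DDPM ancestral sampling: start from x_T ~ N(0, I) and, for t = T, …, 1, compute x_{t−1} = (x_t − (1 − α_t) / √(1 − ᾱ_t) · ε_θ(x_t, t)) / √α_t + √β_t · z, with z ~ N(0, I) at every step except the last. A minimal sketch; `predict_noise(x, i)` is a hypothetical stand-in for the trained UNet, and T = 300 with β linearly spaced from 1e-4 to 0.02 is an assumed schedule.

```python
import numpy as np


def linear_schedule(num_steps=300, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule; returns (betas, alphas, alphas_cumprod)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def ddpm_sample(predict_noise, shape, num_steps=300, rng=None):
    """Ancestral sampling from pure noise; index i = 0..T-1 stands for timestep t = i + 1."""
    rng = np.random.default_rng() if rng is None else rng
    betas, alphas, alphas_cumprod = linear_schedule(num_steps)
    x = rng.standard_normal(shape)  # x_T ~ N(0, I)
    for i in reversed(range(num_steps)):
        eps_hat = predict_noise(x, i)
        x = (x - (1.0 - alphas[i]) / np.sqrt(1.0 - alphas_cumprod[i]) * eps_hat) / np.sqrt(alphas[i])
        if i > 0:  # no noise is added on the final step
            x = x + np.sqrt(betas[i]) * rng.standard_normal(shape)
    return x
```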

2.1 Adding Time Conditioning to UNet

We can add a fully connected block to our UNet to inject the conditioning signal (here, the timestep).
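A fully connected block here is a small MLP (Linear → GELU → Linear) that maps the conditioning signal, for example the timestep normalized to t/T, to one value per channel; the result is broadcast over the spatial dimensions of a UNet feature map. The NumPy sketch below adds the embedding as a per-channel bias. The layer sizes, initialization, tanh-approximated GELU, and the additive form are assumptions rather than a specification of this UNet.

```python
import numpy as np


def gelu(x):
    """tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


class FCBlock:
    """Linear -> GELU -> Linear mapping an (N, in_dim) input to (N, out_dim)."""

    def __init__(self, in_dim, out_dim, rng):
        self.w1 = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)
        self.b1 = np.zeros(out_dim)
        self.w2 = rng.standard_normal((out_dim, out_dim)) / np.sqrt(out_dim)
        self.b2 = np.zeros(out_dim)

    def __call__(self, c):
        return gelu(c @ self.w1 + self.b1) @ self.w2 + self.b2


def inject_time(features, t_norm, fc):
    """Add an embedding of the normalized timestep to an (N, C, H, W) feature map, per channel."""
    emb = fc(np.reshape(t_norm, (-1, 1)))  # (N, C)
    return features + emb[:, :, None, None]
```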

2.2 Training the UNet

2.3 Sampling from the UNet

2.4 Adding Class-Conditioning to UNet
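Class conditioning can reuse the same kind of fully connected block, fed a one-hot vector of the digit label instead of a timestep. To make classifier-free guidance possible at sampling time, the class vector is usually dropped (set to all zeros) for a random fraction of training examples, so the model also learns an unconditional estimate. A minimal sketch of that input preparation; the 10% drop rate is an assumed value.

```python
import numpy as np


def class_vectors(labels, num_classes=10, p_uncond=0.1, rng=None):
    """One-hot encode labels, zeroing each row with probability p_uncond (unconditional examples)."""
    rng = np.random.default_rng() if rng is None else rng
    onehot = np.eye(num_classes)[np.asarray(labels)]
    keep = rng.random(onehot.shape[0]) >= p_uncond
    return onehot * keep[:, None]
```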

2.5 Sampling from the Class-Conditioned UNet

epoch 5 samples

epoch 20 samples