
Final Project: Neural Radiance Fields (NeRFs)!
Part 1: Fit a Neural Field to a 2D Image
I first familiarized myself with representing a 2D image using a neural field.
Model Architecture
I created a multilayer perceptron (MLP) network with sinusoidal positional encoding that takes in 2-dimensional pixel coordinates and outputs 3-dimensional (RGB) pixel colors.
$$PE(x) = \left[x,\ \sin(2^0 \pi x),\ \cos(2^0 \pi x),\ \ldots,\ \sin(2^{L-1} \pi x),\ \cos(2^{L-1} \pi x)\right]$$
This formula represents Sinusoidal Positional Encoding, which helps the network learn high-frequency details better by mapping input coordinates to a higher-dimensional space.
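As a concrete illustration, here is a minimal sketch of this encoding (the naming is my own; the `L=10` default matches the max frequency listed below):

```python
import math
import torch

def positional_encoding(x, L=10):
    # Map each coordinate to [x, sin(2^0*pi*x), cos(2^0*pi*x), ...,
    # sin(2^(L-1)*pi*x), cos(2^(L-1)*pi*x)] along the last dimension.
    features = [x]
    for i in range(L):
        freq = (2.0 ** i) * math.pi
        features.append(torch.sin(freq * x))
        features.append(torch.cos(freq * x))
    return torch.cat(features, dim=-1)  # shape (..., dim * (2L + 1))

coords = torch.rand(4096, 2)              # normalized 2D pixel coordinates
print(positional_encoding(coords).shape)  # torch.Size([4096, 42])
```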
For my first set of hyperparameters, I used:
- number of layers: 4
- channel size: 256
- max frequency (L): 10
- learning rate: 1e-2
- epochs: 1000
I used MSE as my loss and Adam as my optimizer.
To track my model's progress, I used the Peak Signal-to-Noise Ratio (PSNR), a metric that measures how closely a reconstructed image matches the original.
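To make the setup concrete, here is a hedged sketch of the model, the training step, and the PSNR computation, assuming pixel values normalized to [0, 1] (all names here are illustrative, not necessarily what my actual code uses):

```python
import torch
import torch.nn as nn

# Hypothetical 4-layer MLP matching the first hyperparameter set above.
def make_mlp(in_dim, hidden=256, out_dim=3, layers=4):
    mods, d = [], in_dim
    for _ in range(layers - 1):
        mods += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    mods += [nn.Linear(d, out_dim), nn.Sigmoid()]  # RGB in [0, 1]
    return nn.Sequential(*mods)

model = make_mlp(in_dim=42)  # 42 = encoded 2D coords with L = 10
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
criterion = nn.MSELoss()

def train_step(coords_encoded, target_rgb):
    optimizer.zero_grad()
    loss = criterion(model(coords_encoded), target_rgb)
    loss.backward()
    optimizer.step()
    # PSNR for images in [0, 1]: 10 * log10(1 / MSE)
    return 10 * torch.log10(1.0 / loss.detach())
```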
After implementing the architecture above, I used the following two images to train my model.

*(Figures: the two training images, fox.jpg and capybara.jpg.)*
Results
*(Figures: model outputs across training epochs for the fox (epochs_fox.png) and the capybara (epochs_capy.png).)*
My model was ultimately able to re-create the original images!

*(Figures: training PSNR curves, training_psnr_fox.png and training_psnr_capy.png.)*
Hyperparameter Tuning
Although my results were already good, I was curious about what would happen if I changed a few hyperparameters.
For my second set of hyperparameters, I used:
- number of layers: 4
- channel size: 512
- max frequency (L): 10
- learning rate: 1e-3
- epochs: 1000
*(Figures: training PSNR curve (training_psnr_fox_new.png) and per-epoch outputs (epochs_fox_new.png) for the fox with the second hyperparameter set.)*
The new model reaches similar output quality by the final epoch, but the intermediate epochs look worse since the lower learning rate slows convergence.
Part 2: Fit a Neural Radiance Field from Multi-view Images
Part 2.1: Create Rays from Cameras
I implemented `transform`, `pixel_to_camera`, and `pixel_to_ray`. `transform` converts camera coordinates to world coordinates by multiplying them with the `c2w` matrix. `pixel_to_camera` applies the inverse of the intrinsic matrix `K` (built from the focal length) to the homogeneous `[u, v]` pixel coordinates, scaled by `s`, the depth of the point along the optical axis. `pixel_to_ray` computes the ray origin as the camera center, `r_o = -R⁻¹t` (where `R` and `t` are the rotation and translation of the world-to-camera transform; this is exactly the translation column of `c2w`), and the ray direction by subtracting the ray origin from the world coordinates and normalizing.
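Here is a minimal sketch of how these three functions fit together. The function names come from the text; the tensor conventions (a 4×4 `c2w` matrix and row-vector batches of pixels) are my assumptions:

```python
import torch

def pixel_to_camera(K, uv, s):
    # x_c = s * K^{-1} [u, v, 1]^T : lift pixels to camera coords at depth s.
    uv_h = torch.cat([uv, torch.ones_like(uv[..., :1])], dim=-1)  # (N, 3)
    return s * uv_h @ torch.linalg.inv(K).T                       # (N, 3)

def transform(c2w, x_c):
    # Camera -> world: apply the 4x4 c2w matrix to homogeneous points.
    x_h = torch.cat([x_c, torch.ones_like(x_c[..., :1])], dim=-1)  # (N, 4)
    return (x_h @ c2w.T)[..., :3]

def pixel_to_ray(K, c2w, uv):
    # Camera center in world space: the translation column of c2w
    # (equivalently -R^{-1} t for the world-to-camera R and t).
    ray_o = c2w[:3, 3]
    x_w = transform(c2w, pixel_to_camera(K, uv, s=1.0))
    ray_d = x_w - ray_o
    ray_d = ray_d / ray_d.norm(dim=-1, keepdim=True)
    return ray_o.expand_as(ray_d), ray_d
```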
Part 2.2: Sampling
To sample, I created `sample_rays` and `sample_points_from_rays`. `sample_rays` gets `rays_o` and `rays_d` from `pixel_to_ray` and reads the ground-truth pixel colors by indexing into the image at the `[u, v]` locations. `sample_points_from_rays` then samples 3D points along each ray at regular depth intervals between the near and far planes of the scene.
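A sketch of the point sampling, assuming the Lego scene's conventional near/far planes of 2.0 and 6.0 and 64 samples per ray (my assumptions, not values stated above):

```python
import torch

def sample_points_from_rays(rays_o, rays_d, near=2.0, far=6.0,
                            n_samples=64, perturb=True):
    # Depths t uniformly spaced in [near, far]; points are x = r_o + t * r_d.
    t = torch.linspace(near, far, n_samples)                 # (n_samples,)
    if perturb:
        # Jitter each depth within its interval so training sees new points.
        t = t + torch.rand(n_samples) * (far - near) / n_samples
    points = rays_o[:, None, :] + t[None, :, None] * rays_d[:, None, :]
    return points, t                                         # (N, n_samples, 3)
```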
Part 2.3: Putting the Dataloading All Together
I created `NewRaysData` as the dataset. During training there are 100 images and I want 10,000 rays per epoch, so I set `num_rays_per_image` to 10000/100 = 100.
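The per-image sampling logic, sketched with an assumed 200×200 image resolution (illustrative only):

```python
import torch

# A 10,000-ray batch split evenly across 100 training images.
num_rays_per_image = 10_000 // 100   # 100 rays per image

H, W = 200, 200                      # assumed image resolution
image = torch.rand(H, W, 3)          # stand-in for one training image
u = torch.randint(0, W, (num_rays_per_image,))
v = torch.randint(0, H, (num_rays_per_image,))
colors = image[v, u]                 # ground-truth colors; (u, v) also feeds pixel_to_ray
```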
These outputs are from the viser GUI.
*(Screenshots: viser GUI visualizations of the cameras and the sampled rays/points.)*
Part 2.4: Neural Radiance Field
*(Figure: NeRF model architecture diagram.)*
I implemented the above architecture for the NeRF model and trained it with the following settings (a simplified sketch of the model follows the list):
- learning rate: 5e-4
- epochs: 1500
- optimizer: Adam
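This sketch compresses the diagram above. I am assuming positional encoding with L = 10 for 3D positions and L = 4 for view directions (giving 63 and 27 input channels), and it omits details such as skip connections that the full diagram may include:

```python
import torch
import torch.nn as nn

class NeRF(nn.Module):
    # Simplified sketch: trunk on encoded positions, a density head,
    # and a color head that also sees the encoded view direction.
    def __init__(self, pos_dim=63, dir_dim=27, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Sequential(nn.Linear(hidden, 1), nn.ReLU())  # density >= 0
        self.rgb_head = nn.Sequential(
            nn.Linear(hidden + dir_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),                      # color in [0, 1]
        )

    def forward(self, x_enc, d_enc):
        h = self.trunk(x_enc)
        sigma = self.sigma_head(h)
        rgb = self.rgb_head(torch.cat([h, d_enc], dim=-1))
        return sigma, rgb
```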
*(Figure: training PSNR curve for the NeRF model.)*
My PSNR curve got above 23!
Part 2.5: Volume Rendering
$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i, \qquad T_i = \exp\!\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big)$$
This part was pretty complicated :’) I created a `volrend` function that renders the NeRF model's outputs into pixel colors: given the densities (`sigmas`) and `rgb` values sampled along each ray, it follows the equation above.
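A minimal sketch of that discrete rendering step, assuming evenly spaced samples so the per-sample step size is a scalar (shapes are my assumptions):

```python
import torch

def volrend(sigmas, rgbs, step_size):
    # sigmas: (B, N, 1) densities and rgbs: (B, N, 3) colors along each ray;
    # step_size is the depth spacing delta. Returns (B, 3) rendered colors.
    alphas = 1.0 - torch.exp(-sigmas * step_size)           # per-sample opacity
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=1)      # transmittance
    # Shift so T_i excludes the current sample (T_1 = 1).
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=1)
    weights = trans * alphas                                # (B, N, 1)
    return (weights * rgbs).sum(dim=1)                      # (B, 3)
```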
I also had to write methods to visualize the `volrend` output: `sample_full_image_rays` and some additional validation code let me render full images rather than random ray batches.
The results are not perfect but still capture the overall structure and color of the Lego toy. I think with more time I could have debugged further and/or trained longer to achieve a more accurate output. Overall, I still learned a lot about NeRFs and enjoyed challenging myself with the project.
Bells & Whistles
I added some logic in the volrend function to allow a custom background color for the rendered image.
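A sketch of that compositing logic, following the `volrend` structure above: whatever transmittance survives past the last sample is filled with the chosen background color (the function name and shapes are my own):

```python
import torch

def volrend_with_bg(sigmas, rgbs, step_size, bg_color):
    # bg_color: (3,) RGB in [0, 1] to show wherever the rays hit nothing.
    alphas = 1.0 - torch.exp(-sigmas * step_size)
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=1)
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=1)
    weights = trans * alphas
    rendered = (weights * rgbs).sum(dim=1)       # (B, 3)
    t_final = 1.0 - weights.sum(dim=1)           # leftover transmittance, (B, 1)
    return rendered + t_final * bg_color
```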
*(Figure: Lego render composited over a custom background color.)*