
A Review of Neural Radiance Fields (NeRF)

3 min read

NeRF introduces a novel way to represent 3D scenes using neural networks, enabling highly realistic image rendering from novel viewpoints. It outperforms traditional methods in rendering quality but requires significant computational resources.

Introduction

Neural Radiance Fields (NeRF) represent a breakthrough in 3D scene reconstruction and novel view synthesis. By leveraging neural networks, NeRF can generate high-fidelity images from sparse input views, a significant advancement over traditional techniques.

Key Idea

The core idea of NeRF is to represent a scene as a continuous volumetric function, which maps 3D coordinates and viewing directions to RGB color and volume density (opaqueness). This allows for the synthesis of novel views by integrating this function along camera rays.

Methodology

Overview

The network takes as input a 5D coordinate (3D spatial location + 2D viewing direction) and outputs an RGB color and volume density.

  • NeRF does not explicitly model lighting or reflection. Instead, it treats every point in space as radiating RGB light, and a fully connected neural network learns the mapping from a position to its radiance.
  • Light from points near the camera contributes more to a pixel's value, while occluded light contributes less or not at all. This intuition is captured in the volume rendering formula.
  • Volume density represents the opaqueness, or occupancy, of space, so it depends only on the position $\mathbf{x}$. Density is what accounts for occlusion.
  • Although NeRF does not consider lighting effects explicitly, it does model how color changes with viewing direction, which implicitly captures the complex light interactions needed for realistic rendering. Thus, the RGB color depends on both the position $\mathbf{x}$ and the view direction $\mathbf{d}$ (see the sketch after this list).
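
To make the architecture concrete, below is a minimal PyTorch sketch of this two-branch design. It is an illustration under assumptions, not the paper's exact configuration: the real network is deeper (8 layers of width 256 with a skip connection) and operates on positionally encoded inputs rather than raw coordinates.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Illustrative sketch of F_theta split into two MLPs f and g:
    f maps position x to (hidden features, density sigma);
    g maps (hidden features, view direction d) to RGB color c."""

    def __init__(self, hidden_dim: int = 256):
        super().__init__()
        # f(x): density depends on position only.
        self.f = nn.Sequential(
            nn.Linear(3, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim + 1),  # hidden features + sigma
        )
        # g(hidden, d): color depends on position features and view direction.
        self.g = nn.Sequential(
            nn.Linear(hidden_dim + 3, hidden_dim // 2), nn.ReLU(),
            nn.Linear(hidden_dim // 2, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, x: torch.Tensor, d: torch.Tensor):
        out = self.f(x)
        hidden, sigma = out[..., :-1], torch.relu(out[..., -1])  # sigma >= 0
        rgb = self.g(torch.cat([hidden, d], dim=-1))
        return rgb, sigma
```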

Mathematical Formulation

  • The scene function $F_\theta: (\mathbf{x}, \mathbf{d}) \to (\mathbf{c}, \sigma)$:
    This function is decomposed into two separate MLPs, $f$ and $g$:
    $$\begin{aligned} \text{hidden}, \sigma &= f(\mathbf{x}) \\ \mathbf{c} &= g(\text{hidden}, \mathbf{d}) \end{aligned}$$
  • The rendering equation:
    $$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\, \sigma\big(\mathbf{r}(t)\big)\, \mathbf{c}\big(\mathbf{r}(t), \mathbf{d}\big)\, dt \quad \text{where} \quad T(t) = \exp\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\, ds\right)$$
    This continuous form cannot be evaluated numerically as-is, so a discrete version is adopted, where $\delta_i = t_{i+1} - t_i$ is the distance between two consecutive sampled points (see the sketch below):
    $$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \big(1 - \exp(-\sigma_i \delta_i)\big)\, \mathbf{c}_i \quad \text{where} \quad T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right)$$
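
For intuition, the discrete estimator is easy to compute for a single ray. The sketch below (names and shapes are my own, not from the paper's code) uses the identity $T_i = \prod_{j<i} \exp(-\sigma_j \delta_j)$, so the transmittance can be accumulated with an exclusive cumulative product:

```python
import torch

def render_ray(rgb: torch.Tensor, sigma: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Evaluate the discrete rendering sum C_hat(r) for one ray.

    rgb:   (N, 3) colors c_i at the sampled points
    sigma: (N,)   densities sigma_i
    t:     (N+1,) sample distances along the ray, so delta_i = t_{i+1} - t_i
    """
    delta = t[1:] - t[:-1]                      # delta_i
    alpha = 1.0 - torch.exp(-sigma * delta)     # 1 - exp(-sigma_i * delta_i)
    # T_i = exp(-sum_{j<i} sigma_j delta_j), via an exclusive cumprod of (1 - alpha_j).
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10]), dim=0
    )[:-1]
    weights = trans * alpha                     # T_i * (1 - exp(-sigma_i * delta_i))
    return (weights[:, None] * rgb).sum(dim=0)  # C_hat(r), shape (3,)
```

The small epsilon keeps the cumulative product numerically stable; the same `weights` also serve as the sampling distribution for the hierarchical sampling described below.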

Additional Details

  1. To counteract the low-frequency (spectral) bias of neural networks, NeRF applies sinusoidal positional encoding to its inputs (see the sketch after this list).
  2. To improve sampling efficiency, NeRF adopts hierarchical sampling with a coarse-to-fine structure.
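
As an illustration of the first point, here is a minimal sketch of the sinusoidal encoding $\gamma(p) = \big(\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p)\big)$. The paper uses $L = 10$ frequencies for positions and $L = 4$ for view directions; whether the raw input is also concatenated varies between implementations.

```python
import math
import torch

def positional_encoding(p: torch.Tensor, num_freqs: int = 10) -> torch.Tensor:
    """Map each coordinate of p (shape (..., D)) to sin/cos features at
    exponentially spaced frequencies, yielding shape (..., D * 2 * num_freqs).
    This lets the MLP fit high-frequency detail despite its low-frequency bias."""
    freqs = 2.0 ** torch.arange(num_freqs, dtype=torch.float32) * math.pi  # 2^k * pi
    angles = p[..., None] * freqs                       # (..., D, num_freqs)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)
```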

Analysis

Strengths

  • High-quality rendering of complex scenes
  • Novel approach combining volumetric scene representation with neural networks
  • Significant improvements over previous methods in visual quality

Drawbacks

  • High computational cost and long training times
  • Requires a large number of input views for best performance
  • Limited to static scenes without dynamic elements

Future Work

Considering NeRF's drawbacks, several possible extensions stand out:

  • Reducing computational requirements
  • Extending NeRF to dynamic scenes
  • Improving performance with fewer input views
  • Enhancing rendering quality with more sophisticated lighting models

References

Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., & Ng, R. (2020). NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV.