
A Review of Neural Radiance Fields (NeRF)

3 min read

NeRF introduces a novel way to represent 3D scenes using neural networks, enabling highly realistic image rendering from novel viewpoints. It outperforms traditional methods in rendering quality but requires significant computational resources.

Introduction

Neural Radiance Fields (NeRF) represent a breakthrough in 3D scene reconstruction and novel view synthesis. By leveraging neural networks, NeRF can generate high-fidelity images from sparse input views, a significant advancement over traditional techniques.

Key Idea

The core idea of NeRF is to represent a scene as a continuous volumetric function, which maps 3D coordinates and viewing directions to RGB color and volume density (opaqueness). This allows for the synthesis of novel views by integrating this function along camera rays.

Methodology

Overview

The network takes as input a 5D coordinate (3D spatial location + 2D viewing direction) and outputs an RGB color and volume density.

  • NeRF does not explicitly model lighting or reflection. Instead, it treats every point in space as radiating RGB light, and a fully connected neural network learns the mapping from a position to its radiance.
  • Light from points near the camera contributes more to a pixel's value, while occluded light contributes less or not at all. This intuition is captured in the volume rendering formula.
  • Volume density represents the opaqueness, or occupancy, of space, so it depends only on the position $\mathbf{x}$. Density is what accounts for occlusion.
  • Although NeRF does not consider lighting effects explicitly, it does model how color changes with viewing direction, which implicitly captures the complex light interactions needed for realistic rendering. Thus, the RGB color depends on both the position $\mathbf{x}$ and the view direction $\mathbf{d}$ (see the sketch after this list).
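
To make the architecture concrete, below is a minimal PyTorch sketch of this two-branch design. It is an illustration under assumptions, not the paper's exact configuration: the real network is deeper (8 layers of width 256 with a skip connection) and operates on positionally encoded inputs rather than raw coordinates.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Illustrative sketch of F_theta split into two MLPs f and g:
    f maps position x to (hidden features, density sigma);
    g maps (hidden features, view direction d) to RGB color c."""

    def __init__(self, hidden_dim: int = 256):
        super().__init__()
        # f(x): density depends on position only.
        self.f = nn.Sequential(
            nn.Linear(3, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim + 1),  # hidden features + sigma
        )
        # g(hidden, d): color depends on position features and view direction.
        self.g = nn.Sequential(
            nn.Linear(hidden_dim + 3, hidden_dim // 2), nn.ReLU(),
            nn.Linear(hidden_dim // 2, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def forward(self, x: torch.Tensor, d: torch.Tensor):
        out = self.f(x)
        hidden, sigma = out[..., :-1], torch.relu(out[..., -1])  # sigma >= 0
        rgb = self.g(torch.cat([hidden, d], dim=-1))
        return rgb, sigma
```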

Mathematical Formulation

  • The scene function $F_\theta: (\mathbf{x}, \mathbf{d}) \to (\mathbf{c}, \sigma)$:
    This function is decomposed into two separate MLPs, $f$ and $g$:
    $$\begin{aligned} \text{hidden}, \sigma &= f(\mathbf{x}) \\ \mathbf{c} &= g(\text{hidden}, \mathbf{d}) \end{aligned}$$
  • The rendering equation:
    $$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\, \sigma\big(\mathbf{r}(t)\big)\, \mathbf{c}\big(\mathbf{r}(t), \mathbf{d}\big)\, dt \quad \text{where} \quad T(t) = \exp\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\, ds\right)$$
    This continuous form cannot be evaluated numerically as-is, so a discrete version is adopted, where $\delta_i = t_{i+1} - t_i$ is the distance between two consecutive sampled points (see the sketch below):
    $$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \big(1 - \exp(-\sigma_i \delta_i)\big)\, \mathbf{c}_i \quad \text{where} \quad T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right)$$
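
For intuition, the discrete estimator is easy to compute for a single ray. The sketch below (names and shapes are my own, not from the paper's code) uses the identity $T_i = \prod_{j<i} \exp(-\sigma_j \delta_j)$, so the transmittance can be accumulated with an exclusive cumulative product:

```python
import torch

def render_ray(rgb: torch.Tensor, sigma: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Evaluate the discrete rendering sum C_hat(r) for one ray.

    rgb:   (N, 3) colors c_i at the sampled points
    sigma: (N,)   densities sigma_i
    t:     (N+1,) sample distances along the ray, so delta_i = t_{i+1} - t_i
    """
    delta = t[1:] - t[:-1]                      # delta_i
    alpha = 1.0 - torch.exp(-sigma * delta)     # 1 - exp(-sigma_i * delta_i)
    # T_i = exp(-sum_{j<i} sigma_j delta_j), via an exclusive cumprod of (1 - alpha_j).
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10]), dim=0
    )[:-1]
    weights = trans * alpha                     # T_i * (1 - exp(-sigma_i * delta_i))
    return (weights[:, None] * rgb).sum(dim=0)  # C_hat(r), shape (3,)
```

The small epsilon keeps the cumulative product numerically stable; the same `weights` also serve as the sampling distribution for the hierarchical sampling described below.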

Additional Details

  1. To counteract the low-frequency (spectral) bias of neural networks, NeRF applies sinusoidal positional encoding to its inputs (see the sketch after this list).
  2. To improve sampling efficiency, NeRF adopts hierarchical sampling with a coarse-to-fine structure.
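
As an illustration of the first point, here is a minimal sketch of the sinusoidal encoding $\gamma(p) = \big(\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p)\big)$. The paper uses $L = 10$ frequencies for positions and $L = 4$ for view directions; whether the raw input is also concatenated varies between implementations.

```python
import math
import torch

def positional_encoding(p: torch.Tensor, num_freqs: int = 10) -> torch.Tensor:
    """Map each coordinate of p (shape (..., D)) to sin/cos features at
    exponentially spaced frequencies, yielding shape (..., D * 2 * num_freqs).
    This lets the MLP fit high-frequency detail despite its low-frequency bias."""
    freqs = 2.0 ** torch.arange(num_freqs, dtype=torch.float32) * math.pi  # 2^k * pi
    angles = p[..., None] * freqs                       # (..., D, num_freqs)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)
```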

Analysis

Strengths

  • High-quality rendering of complex scenes
  • Novel approach combining volumetric scene representation with neural networks
  • Significant improvements over previous methods in visual quality

Drawbacks

  • High computational cost and long training times
  • Requires a large number of input views for best performance
  • Limited to static scenes without dynamic elements

Future Work

Considering NeRF's drawbacks, several possible extensions stand out:

  • Reducing computational requirements
  • Extending NeRF to dynamic scenes
  • Improving performance with fewer input views
  • Enhancing rendering quality with more sophisticated lighting models

References

Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., & Ng, R. (2020). NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV.