Abstract
Neural radiance fields (NeRF) are a promising approach for generating
photorealistic images and representing complex scenes. However, when trained on
sequentially arriving data, NeRF can suffer from catastrophic forgetting, where
previously seen data is easily forgotten after training on new data. Existing incremental
learning methods using knowledge distillation assume that successive data
chunks contain both 2D images and the corresponding camera pose parameters,
pre-estimated from the complete dataset. This poses a paradox: the required
camera poses must be estimated from the entire dataset, even though the data
arrives sequentially and future chunks are inaccessible. In contrast, we focus
on a practical scenario where camera poses are unknown. We propose IL-NeRF, a
novel framework for incremental NeRF training, to address this challenge.
IL-NeRF's key idea lies in selecting a set of past camera poses as references
to initialize and align the camera poses of incoming image data. This is
followed by a joint optimization of camera poses and replay-based NeRF
distillation. Our experiments on real-world indoor and outdoor scenes show that
IL-NeRF handles incremental NeRF training and outperforms the baselines by up
to $54.04\%$ in rendering quality.