Joint-TensoRF:
Improving Robustness for Joint Optimization of Camera Poses
and Decomposed Low-Rank Tensorial Radiance Fields
AAAI 2024

National Yang Ming Chiao Tung University

Abstract

In this paper, we propose an algorithm that allows joint refinement of camera pose and scene geometry represented by a decomposed low-rank tensor, using only 2D images as supervision. First, we conduct a pilot study based on a 1D signal and relate our findings to 3D scenarios, where naive joint pose optimization on voxel-based NeRFs can easily lead to sub-optimal solutions. Moreover, based on an analysis of the frequency spectrum, we propose to apply convolutional Gaussian filters on 2D and 3D radiance fields for a coarse-to-fine training schedule that enables joint camera pose optimization. Leveraging the decomposition property of the decomposed low-rank tensor, our method achieves an effect equivalent to brute-force 3D convolution while incurring only a small computational overhead. To further improve the robustness and stability of joint optimization, we also propose techniques of smoothed 2D supervision, randomly scaled kernel parameters, and an edge-guided loss mask. Extensive quantitative and qualitative evaluations demonstrate that our proposed framework achieves superior performance in novel view synthesis as well as rapid convergence for optimization.


Robust joint pose refinement on decomposed tensor

Our method enables joint optimization of camera poses and a decomposed voxel representation by applying efficient, separable, component-wise Gaussian convolution to the 3D tensor volume and the 2D supervision images.
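The equivalence between component-wise filtering and dense 3D filtering can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation. It assumes a single vector-matrix mode (X against YZ) of a TensoRF-style decomposition, zero-padded boundaries, and one shared 1D Gaussian kernel. All function names, the grid size, the rank, and the value of sigma are illustrative choices.

```python
import numpy as np


def gaussian_kernel_1d(sigma, radius=None):
    """Normalized 1D Gaussian kernel; the 3-sigma radius is an assumed default."""
    if radius is None:
        radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def conv_along_axis(arr, kernel, axis):
    """Zero-padded 1D convolution along one axis (axis length must be >= kernel length)."""
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, arr
    )


def reconstruct_vm(vectors, matrices):
    """Dense volume from R vector-matrix pairs: T[x, y, z] = sum_r v_r[x] * M_r[y, z]."""
    return np.einsum("rx,ryz->xyz", vectors, matrices)


def blur_components(vectors, matrices, kernel):
    """Filter each factor separately: 1D on the vectors, separable 2D on the matrices."""
    v = conv_along_axis(vectors, kernel, axis=1)
    m = conv_along_axis(conv_along_axis(matrices, kernel, axis=1), kernel, axis=2)
    return v, m


def blur_dense(volume, kernel):
    """Reference: filter the full volume along x, y, and z in turn.

    Because an isotropic 3D Gaussian kernel is the outer product of three
    1D kernels, this equals convolution with the full 3D kernel.
    """
    for axis in range(3):
        volume = conv_along_axis(volume, kernel, axis)
    return volume


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vectors = rng.standard_normal((4, 16))       # rank 4, 16 voxels along x
    matrices = rng.standard_normal((4, 16, 16))  # rank 4, 16 x 16 in the yz-plane
    kernel = gaussian_kernel_1d(sigma=1.0)

    dense = blur_dense(reconstruct_vm(vectors, matrices), kernel)
    factored = reconstruct_vm(*blur_components(vectors, matrices, kernel))
    print("max abs difference:", np.abs(dense - factored).max())
```

In this sketch the factored path filters R vectors and R matrices instead of a full X×Y×Z volume, which is where the saving over dense 3D convolution comes from. A full TensoRF-style representation would repeat the same step for the other two vector-matrix modes.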



Comparison of naive joint pose optimization and our proposed method on voxel-based NeRFs

(a) Naively applying joint optimization on voxel-based NeRFs leads to dramatic failure, because premature high-frequency signals in the voxel volume cause the camera poses to get stuck in local minima. (b) We propose a computationally efficient way to directly control the spectrum of the radiance field by performing separable component-wise convolution of Gaussian filters on the decomposed tensor. The proposed training scheme allows the joint optimization to converge successfully to a better solution.



Qualitative comparisons of the 2D image patch alignment

2D TensoRF + 2D Gaussian successfully registers accurate warping parameters, verifying the analysis of Gaussian filtering on joint optimization.
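The effect of Gaussian filtering on alignment can be illustrated with a 1D toy problem. The sketch below is not the paper's pilot study. It assumes a periodic test signal made of one low and one high frequency, integer circular shifts as the only "pose" parameter, and a blur width of 0.3 rad; all names and values are illustrative. It counts the local minima of the alignment loss over all shifts, with and without blurring the signal.

```python
import numpy as np


def gaussian_blur_periodic(signal, sigma, dx):
    """Circular Gaussian blur, applied as a low-pass filter in the Fourier domain."""
    omega = 2.0 * np.pi * np.fft.rfftfreq(signal.size, d=dx)  # angular frequencies
    response = np.exp(-0.5 * (omega * sigma) ** 2)            # Gaussian frequency response
    return np.fft.irfft(np.fft.rfft(signal) * response, n=signal.size)


def shift_loss_landscape(signal):
    """Mean squared error between the signal and each circular integer shift of it."""
    return np.array(
        [np.mean((np.roll(signal, s) - signal) ** 2) for s in range(signal.size)]
    )


def count_local_minima(loss):
    """Count strict local minima on a periodic 1D grid."""
    left, right = np.roll(loss, 1), np.roll(loss, -1)
    return int(np.sum((loss < left) & (loss < right)))


n = 512
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]

# Assumed test signal: a coarse component plus a high-frequency detail component.
signal = np.sin(x) + 0.5 * np.sin(15.0 * x)
blurred = gaussian_blur_periodic(signal, sigma=0.3, dx=dx)

minima_sharp = count_local_minima(shift_loss_landscape(signal))
minima_blurred = count_local_minima(shift_loss_landscape(blurred))
print("local minima without blur:", minima_sharp)
print("local minima with blur:", minima_blurred)
```

With the high-frequency component present, the loss over shifts has many local minima in which a gradient-based optimizer can stall. After blurring, only the minimum at the true alignment remains, which is the intuition behind starting from a smoothed signal and sharpening it over the course of training.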




Visual comparisons of novel view synthesis



Ours (left) vs. BARF (right). Scenes: Blender/Ficus, Blender/Lego, LLFF/fern, LLFF/horns.


PSNR and training iterations comparison



Citation

Acknowledgements

This work is supported by the National Science and Technology Council (NSTC) 111-2628-E-A49-018-MY4, 112-2221-E-A49-087-MY3, 112-2222-E-A49-004-MY2, and the Higher Education Sprout Project of the National Yang Ming Chiao Tung University, as well as the Ministry of Education (MoE), Taiwan. In particular, Yu-Lun Liu acknowledges the Yushan Young Fellow Program by the MoE in Taiwan.

The website template was borrowed from Michaël Gharbi and Ref-NeRF.