TurboSL: Dense, Accurate and Fast 3D by Neural Inverse Structured Light

Abstract

We show how to turn a noisy and fragile active triangulation technique—three-pattern structured light with a grayscale camera—into a fast and powerful tool for 3D capture: able to output sub-pixel accurate disparities at megapixel resolution, along with reflectance, normals, and a no-reference estimate of its own pixelwise 3D error. To achieve this, we formulate structured-light decoding as a neural inverse rendering problem. We show that despite having just three or four input images—all from the same viewpoint—this problem can be tractably solved by TurboSL, an algorithm that combines (1) a precise image formation model, (2) a signed distance field scene representation, and (3) projection pattern sequences optimized for accuracy instead of precision. We use TurboSL to reconstruct a variety of complex scenes from images captured at up to 60 fps with a camera and a common projector. Our experiments highlight TurboSL’s potential for dense and highly-accurate 3D acquisition from data captured in fractions of a second.

TurboSL Decoder

Our decoder models geometry with an efficient signed distance field built on hash-based feature grids, and renders structured-light images with a precise image formation model. This model accounts for the cosine (foreshortening) factor, the projector's imperfect optics, surface reflectance, and residual contributions from ambient and indirect light. We optimize the unknown scene parameters by minimizing the error between the rendered and captured images. We also show that a reverse rendering pipeline can predict the projected patterns, which lets us further constrain the optimization by minimizing the error between the predicted and actual projection patterns.
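To make the render-and-compare idea concrete, here is a minimal per-pixel sketch in Python. It is a toy under stated assumptions, not the TurboSL implementation: it ignores the signed distance field and the projector's optics, treats ambient and indirect light as a single constant, and swaps in phase-shifted sinusoids for the optimized patterns. The names `render_pixel`, `predict_pattern_value`, `photometric_loss`, `sinusoid_patterns`, and `decode_column` are all hypothetical.

```python
import math
from typing import Callable, List, Sequence

Pattern = Callable[[int], float]  # maps a projector column to an intensity in [0, 1]


def render_pixel(albedo: float, cos_incidence: float,
                 pattern_value: float, ambient: float) -> float:
    """Toy forward model: reflectance x cosine factor x projected intensity,
    plus a constant standing in for ambient and indirect light (assumption)."""
    return albedo * max(cos_incidence, 0.0) * pattern_value + ambient


def predict_pattern_value(captured: float, albedo: float,
                          cos_incidence: float, ambient: float) -> float:
    """Toy reverse model: invert render_pixel to estimate the projected intensity."""
    return (captured - ambient) / (albedo * cos_incidence)


def photometric_loss(rendered: Sequence[float], captured: Sequence[float]) -> float:
    """Mean squared error between rendered and captured intensities."""
    return sum((r - c) ** 2 for r, c in zip(rendered, captured)) / len(captured)


def sinusoid_patterns(num_patterns: int = 3, period: float = 64.0) -> List[Pattern]:
    """Hypothetical phase-shifted sinusoids, NOT the optimized TurboSL patterns."""
    def make(k: int) -> Pattern:
        shift = 2.0 * math.pi * k / num_patterns
        return lambda col: 0.5 + 0.5 * math.cos(2.0 * math.pi * col / period + shift)
    return [make(k) for k in range(num_patterns)]


def decode_column(captured: Sequence[float], patterns: Sequence[Pattern],
                  albedo: float, cos_incidence: float, ambient: float,
                  num_columns: int) -> int:
    """Brute-force search for the projector column whose rendering best
    explains the captured intensities (a stand-in for gradient-based optimization)."""
    best_col, best_loss = 0, math.inf
    for col in range(num_columns):
        rendered = [render_pixel(albedo, cos_incidence, p(col), ambient)
                    for p in patterns]
        loss = photometric_loss(rendered, captured)
        if loss < best_loss:
            best_col, best_loss = col, loss
    return best_col


# Usage: simulate one pixel lit by projector column 17, then decode it.
patterns = sinusoid_patterns(3, 64.0)
captured = [render_pixel(0.8, 0.9, p(17), 0.05) for p in patterns]
decoded = decode_column(captured, patterns, 0.8, 0.9, 0.05, num_columns=64)
```

In this toy the reflectance, cosine factor, and ambient term are given. In the setting described above they are among the unknowns recovered jointly with the geometry.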

Our decoder outperforms the state-of-the-art pixelwise decoding method for a variety of structured light patterns.

Dynamic Scene Reconstruction

TurboSL's fast acquisition makes it possible to reconstruct dynamic scenes. We use a sliding window to reconstruct surfaces from triplets of structured-light images captured at 30 frames per second.
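The sliding-window grouping can be sketched in a few lines of Python. This is an illustration only: the helper name `sliding_triplets` is hypothetical, and it assumes the three patterns are projected in a repeating cycle, so that any three consecutive frames contain each pattern exactly once.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def sliding_triplets(frames: Sequence[Frame], window: int = 3
                     ) -> Iterator[Tuple[int, Tuple[Frame, ...], List[int]]]:
    """Yield (start, frames_in_window, pattern_ids) for each run of `window`
    consecutive frames. pattern_ids[k] is the pattern shown in the k-th frame
    of the window, assuming frame i was lit by pattern i % window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    for start in range(len(frames) - window + 1):
        group = tuple(frames[start:start + window])
        pattern_ids = [(start + k) % window for k in range(window)]
        yield start, group, pattern_ids


# Usage: five captured frames give three overlapping triplets,
# i.e. one reconstruction per new frame once the window is full.
windows = list(sliding_triplets(["f0", "f1", "f2", "f3", "f4"]))
```

Because consecutive windows overlap by two frames, the reconstruction rate under this scheme matches the capture rate rather than one third of it.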

BibTeX

@InProceedings{Mirdehghan_2024_CVPR,
    author    = {Mirdehghan, Parsa and Wu, Maxx and Chen, Wenzheng and Lindell, David B. and Kutulakos, Kiriakos N.},
    title     = {TurboSL: Dense, Accurate and Fast 3D by Neural Inverse Structured Light},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {25067-25076}
}