Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images Using a View-Based Representation

2020
Sai Rajeswar, Fahim Mannan, Florian Golemo, Jérôme Parent-Lévesque, David Vázquez, Derek Nowrouzezahrai, Aaron Courville
Abstract
This work presents an unsupervised method for inferring 3D scene information from single images. The approach comprises four components: an encoder that extracts a latent 3D representation, a decoder that generates a 2.5D surfel-based reconstruction, a differentiable renderer that synthesizes a 2D image, and a critic network that discriminates between generated and real images. Unlike voxel- or mesh-based methods, this view-dependent representation scales with on-screen resolution rather than scene size. The authors demonstrate that the learned representations are consistent across viewpoints, enabling novel viewpoint synthesis, and evaluate performance on ShapeNet as well as a custom 3D-IQTT benchmark for spatial reasoning.
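The four-stage pipeline described above can be sketched as a simple composition of functions. The sketch below is purely illustrative and not the authors' implementation: all component internals (random projections, the surfel layout of position, normal, and radius, the splatting renderer, and the tanh critic) are placeholder assumptions standing in for learned networks and the differentiable renderer.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(image):
    # Placeholder encoder: map a 2D image to a latent scene code
    # via a fixed random projection (stands in for a learned CNN).
    return image.reshape(-1) @ rng.standard_normal((image.size, 16))

def decoder(latent, n_surfels=32):
    # Placeholder decoder: emit a 2.5D surfel set, here assumed to be
    # 7 values per surfel (position xyz, normal xyz, radius).
    w = rng.standard_normal((latent.size, n_surfels * 7))
    return (latent @ w).reshape(n_surfels, 7)

def renderer(surfels, hw=(8, 8)):
    # Renderer stand-in: splat surfel depths onto a pixel grid.
    # A real differentiable renderer would shade surfels with lighting.
    img = np.zeros(hw)
    xy = surfels[:, :2]
    xy = ((xy - xy.min(0)) / (np.ptp(xy, axis=0) + 1e-8)
          * (np.array(hw) - 1)).astype(int)
    for (x, y), z in zip(xy, surfels[:, 2]):
        img[y, x] += z
    return img

def critic(image):
    # Critic stand-in: a scalar real-vs-generated score in [-1, 1].
    return float(np.tanh(image.mean()))

# One forward pass through the full pipeline.
image = rng.standard_normal((8, 8))
score = critic(renderer(decoder(encoder(image))))
```

In the actual method each placeholder would be a trained network, and the critic's score would drive adversarial training of the encoder and decoder; this sketch only shows how the components chain together.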
Type: Publication
Publication: International Journal of Computer Vision (IJCV)