Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images Using a View-Based Representation

2020
Sai Rajeswar, Fahim Mannan, Florian Golemo, Jérôme Parent-Lévesque, David Vázquez, Derek Nowrouzezahrai, Aaron Courville
Abstract
This work presents an unsupervised method for inferring 3D scene information from single images. The approach comprises four components: an encoder that extracts a latent 3D representation, a decoder that generates a 2.5D surfel-based reconstruction, a differentiable renderer that synthesizes a 2D image, and a critic network that discriminates between generated and real images. Unlike voxel- or mesh-based methods, this view-dependent representation scales with on-screen resolution rather than scene size. The authors demonstrate that the learned representations are consistent across viewpoints, enabling novel viewpoint synthesis, and evaluate performance on ShapeNet as well as a custom 3D-IQTT benchmark for spatial reasoning.
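The four-stage pipeline described above can be sketched as a simple composition of functions. The sketch below is purely illustrative and not the authors' implementation: all component internals (random projections, the surfel layout of position, normal, and radius, the splatting renderer, and the tanh critic) are placeholder assumptions standing in for learned networks and the differentiable renderer.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(image):
    # Placeholder encoder: map a 2D image to a latent scene code
    # via a fixed random projection (stands in for a learned CNN).
    return image.reshape(-1) @ rng.standard_normal((image.size, 16))

def decoder(latent, n_surfels=32):
    # Placeholder decoder: emit a 2.5D surfel set, here assumed to be
    # 7 values per surfel (position xyz, normal xyz, radius).
    w = rng.standard_normal((latent.size, n_surfels * 7))
    return (latent @ w).reshape(n_surfels, 7)

def renderer(surfels, hw=(8, 8)):
    # Renderer stand-in: splat surfel depths onto a pixel grid.
    # A real differentiable renderer would shade surfels with lighting.
    img = np.zeros(hw)
    xy = surfels[:, :2]
    xy = ((xy - xy.min(0)) / (np.ptp(xy, axis=0) + 1e-8)
          * (np.array(hw) - 1)).astype(int)
    for (x, y), z in zip(xy, surfels[:, 2]):
        img[y, x] += z
    return img

def critic(image):
    # Critic stand-in: a scalar real-vs-generated score in [-1, 1].
    return float(np.tanh(image.mean()))

# One forward pass through the full pipeline.
image = rng.standard_normal((8, 8))
score = critic(renderer(decoder(encoder(image))))
```

In the actual method each placeholder would be a trained network, and the critic's score would drive adversarial training of the encoder and decoder; this sketch only shows how the components chain together.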
Type: Publication
Publication: International Journal of Computer Vision (IJCV)