SceneConductor

3D Scene Generation from a Single Image with Multi-Agent Orchestration

¹Nanyang Technological University  ·  ²University of Oxford  ·  ³Meshy AI

TL;DR Our framework decomposes single-image 3D scene generation into three structured stages (scene initialization, environment construction, and multi-agent refinement) and introduces a geometry-aware layout predictor trained with sparse point-map priors, achieving consistent gains in geometric accuracy and perceptual realism.

Pipeline Overview

SceneConductor pipeline overview
(a) Scene Initialization
Per-object segmentation masks are lifted to object-level 3D representations, and a geometry-aware layout predictor places them into a coarse initial scene — no scene-level layout annotations required.
(b) Environment Construction
Point-map geometry is fused with the initialization to build the environmental scaffold: floors, walls, materials, and illumination are reconstructed and grounded around the placed objects.
(c) Multi-Agent Refinement
A planner agent detects structural and visual inconsistencies, applies simple corrections itself, and dispatches specialist agents for complex localized edits that are reintegrated into the global scene.
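The control flow of stage (c) can be illustrated with a minimal sketch. Everything here is a toy stand-in, not the authors' implementation: `Scene`, `Issue`, `detect_issues`, `retexture`, the issue labels, and the "snap to floor" correction are hypothetical names and behaviors chosen only to show the split between fixes the planner applies itself and edits it delegates to specialists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Issue:
    object_id: str
    kind: str       # hypothetical labels: "floating", "texture_artifact"
    complex: bool   # True means the planner delegates to a specialist


@dataclass
class Scene:
    heights: Dict[str, float]                       # object id -> base height above floor
    damaged: Set[str] = field(default_factory=set)  # objects needing a localized edit
    log: List[str] = field(default_factory=list)    # record of who fixed what


def detect_issues(scene: Scene, tol: float = 1e-3) -> List[Issue]:
    """Toy detector: floating objects are simple, damaged objects are complex."""
    issues = [Issue(oid, "floating", False)
              for oid, h in sorted(scene.heights.items()) if abs(h) > tol]
    issues += [Issue(oid, "texture_artifact", True) for oid in sorted(scene.damaged)]
    return issues


def retexture(scene: Scene, issue: Issue) -> None:
    """Toy specialist: pretend to repair the object, then hand it back."""
    scene.damaged.discard(issue.object_id)


def refine(scene: Scene,
           specialists: Dict[str, Callable[[Scene, Issue], None]],
           max_rounds: int = 3) -> Scene:
    """Planner loop: detect, fix simple issues directly, dispatch complex ones."""
    for _ in range(max_rounds):
        issues = detect_issues(scene)
        if not issues:
            break
        for issue in issues:
            if issue.complex:
                specialists[issue.kind](scene, issue)   # localized edit, reintegrated in place
                scene.log.append(f"specialist:{issue.kind}:{issue.object_id}")
            else:
                scene.heights[issue.object_id] = 0.0    # simple correction: snap to floor
                scene.log.append(f"planner:snap:{issue.object_id}")
    return scene
```

A call such as `refine(Scene(heights={"chair": 0.2}, damaged={"table"}), {"texture_artifact": retexture})` would snap the chair to the floor in the planner and route the table to the specialist. Re-running detection after each round is one plausible way to confirm that a delegated edit did not introduce new inconsistencies.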

Results

Pick a scene from the gallery below, then toggle through Stage 1 → 2 → 3 to see how SceneConductor reconstructs a complete 3D environment from a single image. All three stages are interactive.


Geometry-Aware Layout Predictor

The layout predictor is supervised by sparse geometric priors derived from point maps, which enables training from segmentation-level data without scene-level layout annotations.

Geometry-aware layout predictor
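One way to picture supervision from sparse point-map priors is sketched below. The page does not specify the loss or the form of the prior, so this is an assumption throughout: `sparse_prior`, `layout_loss`, the median center, the axis-aligned extent, and the L1 penalty are hypothetical choices. The sketch only shows how a segmentation mask plus a per-pixel point map could yield a placement target without any scene-level layout annotation.

```python
import numpy as np


def sparse_prior(point_map, mask, n_samples=64, seed=0):
    """Derive a coarse placement target for one object.

    point_map: (H, W, 3) array of per-pixel 3D points.
    mask:      (H, W) boolean segmentation mask of the object.
    Returns (center, extent), each a length-3 array.
    """
    pts = point_map[mask]                      # (N, 3) points on the object
    pts = pts[np.isfinite(pts).all(axis=1)]    # drop invalid point-map entries
    if len(pts) == 0:
        raise ValueError("mask selects no valid points")
    rng = np.random.default_rng(seed)
    k = min(n_samples, len(pts))
    sample = pts[rng.choice(len(pts), size=k, replace=False)]  # sparse subset
    center = np.median(sample, axis=0)                  # robust location estimate
    extent = sample.max(axis=0) - sample.min(axis=0)    # axis-aligned size
    return center, extent


def layout_loss(pred_center, pred_extent, prior_center, prior_extent):
    """Assumed L1 penalty between a predicted placement and the sparse prior."""
    center_term = np.abs(np.asarray(pred_center) - prior_center).mean()
    extent_term = np.abs(np.asarray(pred_extent) - prior_extent).mean()
    return float(center_term + extent_term)
```

In this sketch the only labels consumed are the mask and the point map, which mirrors the claim that segmentation-level data suffices. A real predictor would likely also estimate rotation and would be trained with a differentiable framework rather than NumPy.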

Qualitative Comparison

The same input photo reconstructed by prior methods and by our geometry-aware layout predictor. Our result is interactive.


BibTeX

@article{kim2026sceneconductor,
  title   = {SceneConductor: 3D Scene Generation from a Single Image with Multi-Agent Orchestration},
  author  = {Kim, Jeonghwan and Lan, Yushi and Chen, Yongwei and Nguyen, Hieu Trung and Pan, Chuanyu and Pan, Xingang},
  journal = {arXiv preprint arXiv:2606.08402},
  year    = {2026}
}