
Direct and Explicit 3D Generation from a Single Image

  • Haoyu Wu
  • Meher Gitika Karumuri
  • Chuhang Zou
  • Seungbae Bang
  • Yuelong Li
  • Dimitris Samaras
  • Sunil Hadap
  • Stony Brook University
  • Amazon.com, Inc.

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Current image-to-3D approaches suffer from high computational costs and lack scalability for high-resolution outputs. In contrast, we introduce a novel framework to directly generate explicit surface geometry and texture using multi-view 2D depth and RGB images along with 3D Gaussian features using a repurposed Stable Diffusion model. We introduce a depth branch into U-Net for efficient and high quality multi-view, cross-domain generation and incorporate epipolar attention into the latent-to-pixel decoder for pixel-level multi-view consistency. By back-projecting the generated depth pixels into 3D space, we create a structured 3D representation that can be either rendered via Gaussian splatting or extracted to high-quality meshes, thereby leveraging additional novel view synthesis loss to further improve our performance. Extensive experiments demonstrate that our method surpasses existing baselines in geometry and texture quality while achieving significantly faster generation time.
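
The back-projection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a standard pinhole camera, a z-depth map (distance along the optical axis), and a known camera-to-world pose. The function name `backproject_depth` and its parameters `K` and `cam_to_world` are hypothetical names chosen for illustration.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) z-depth map to an (H*W, 3) array of world-space points.

    Assumes a pinhole camera: K is a 3x3 intrinsic matrix and cam_to_world
    is a 4x4 rigid transform. Both are hypothetical inputs for this sketch.
    """
    h, w = depth.shape
    # Pixel-center coordinates: u runs along the width, v along the height.
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Camera-space rays with unit z, then scaled by the per-pixel depth.
    rays = pix @ np.linalg.inv(K).T
    pts_cam = rays * depth.reshape(-1, 1)
    # Apply the homogeneous camera-to-world transform.
    pts_h = np.concatenate([pts_cam, np.ones((h * w, 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]
```

Each returned point corresponds to one depth pixel in row-major order, so per-pixel attributes such as RGB or Gaussian features could be attached by index. Under this reading, points from several views would be merged simply by concatenating the outputs for each view's depth map and pose.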

Original language: English
Title of host publication: Proceedings - 2025 International Conference on 3D Vision, 3DV 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 490-501
Number of pages: 12
ISBN (Electronic): 9798331538514
DOIs
State: Published - 2025
Event: 12th International Conference on 3D Vision, 3DV 2025 - Singapore, Singapore
Duration: Mar 25 2025 to Mar 28 2025

Publication series

Name: Proceedings - 2025 International Conference on 3D Vision, 3DV 2025

Conference

Conference: 12th International Conference on 3D Vision, 3DV 2025
Country/Territory: Singapore
City: Singapore
Period: 03/25/25 to 03/28/25

Keywords

  • diffusion models
  • explicit 3d representation
  • image-to-3d
