TY - GEN
T1 - Direct and Explicit 3D Generation from a Single Image
AU - Wu, Haoyu
AU - Karumuri, Meher Gitika
AU - Zou, Chuhang
AU - Bang, Seungbae
AU - Li, Yuelong
AU - Samaras, Dimitris
AU - Hadap, Sunil
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Current image-to-3D approaches suffer from high computational costs and lack scalability for high-resolution outputs. In contrast, we introduce a novel framework that directly generates explicit surface geometry and texture using multi-view 2D depth and RGB images, along with 3D Gaussian features, produced by a repurposed Stable Diffusion model. We introduce a depth branch into the U-Net for efficient, high-quality multi-view, cross-domain generation and incorporate epipolar attention into the latent-to-pixel decoder for pixel-level multi-view consistency. By back-projecting the generated depth pixels into 3D space, we create a structured 3D representation that can be either rendered via Gaussian splatting or extracted to high-quality meshes, which lets us leverage an additional novel view synthesis loss to further improve performance. Extensive experiments demonstrate that our method surpasses existing baselines in geometry and texture quality while achieving significantly faster generation.
AB - Current image-to-3D approaches suffer from high computational costs and lack scalability for high-resolution outputs. In contrast, we introduce a novel framework that directly generates explicit surface geometry and texture using multi-view 2D depth and RGB images, along with 3D Gaussian features, produced by a repurposed Stable Diffusion model. We introduce a depth branch into the U-Net for efficient, high-quality multi-view, cross-domain generation and incorporate epipolar attention into the latent-to-pixel decoder for pixel-level multi-view consistency. By back-projecting the generated depth pixels into 3D space, we create a structured 3D representation that can be either rendered via Gaussian splatting or extracted to high-quality meshes, which lets us leverage an additional novel view synthesis loss to further improve performance. Extensive experiments demonstrate that our method surpasses existing baselines in geometry and texture quality while achieving significantly faster generation.
KW - diffusion models
KW - explicit 3d representation
KW - image-to-3d
UR - https://www.scopus.com/pages/publications/105016266900
U2 - 10.1109/3DV66043.2025.00050
DO - 10.1109/3DV66043.2025.00050
M3 - Conference contribution
AN - SCOPUS:105016266900
T3 - Proceedings - 2025 International Conference on 3D Vision, 3DV 2025
SP - 490
EP - 501
BT - Proceedings - 2025 International Conference on 3D Vision, 3DV 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th International Conference on 3D Vision, 3DV 2025
Y2 - 25 March 2025 through 28 March 2025
ER -