http://arxiv.org/abs/2510.03163v3 ROGR: Relightable 3D Objects using Generative Relighting 2025-12-03T23:29:50Z

We introduce ROGR, a novel approach that reconstructs a relightable 3D model of an object captured from multiple views, driven by a generative relighting model that simulates the effects of placing the object under novel environment illuminations. Our method samples the appearance of the object under multiple lighting environments, creating a dataset that is used to train a lighting-conditioned Neural Radiance Field (NeRF) that outputs the object's appearance under any input environmental lighting. The lighting-conditioned NeRF uses a novel dual-branch architecture to encode the general lighting effects and specularities separately. The optimized lighting-conditioned NeRF enables efficient feed-forward relighting under arbitrary environment maps without requiring per-illumination optimization or light transport simulation. We evaluate our approach on the established TensoIR and Stanford-ORB datasets, where it improves upon the state-of-the-art on most metrics, and showcase our approach on real-world object captures.

2025-10-03T16:35:22Z NeurIPS 2025 Spotlight. Project page: https://tangjiapeng.github.io/ROGR
Jiapeng Tang, Matthew Levine, Dor Verbin, Stephan J. Garbin, Matthias Nießner, Ricardo Martin Brualla, Pratul P. Srinivasan, Philipp Henzler

http://arxiv.org/abs/2512.04076v1 Radiance Meshes for Volumetric Reconstruction 2025-12-03T18:57:03Z

We introduce radiance meshes, a technique for representing radiance fields with constant density tetrahedral cells produced with a Delaunay tetrahedralization. Unlike a Voronoi diagram, a Delaunay tetrahedralization yields simple triangles that are natively supported by existing hardware. As such, our model is able to perform exact and fast volume rendering using both rasterization and ray-tracing. We introduce a new rasterization method that achieves faster rendering speeds than all prior radiance field representations (assuming an equivalent number of primitives and resolution) across a variety of platforms. Optimizing the positions of Delaunay vertices introduces topological discontinuities (edge flips). To solve this, we use a Zip-NeRF-style backbone which allows us to express a smoothly varying field even when the topology changes. Our rendering method exactly evaluates the volume rendering equation and enables high quality, real-time view synthesis on standard consumer hardware. Our tetrahedral meshes also lend themselves to a variety of exciting applications including fisheye lens distortion, physics-based simulation, editing, and mesh extraction.
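Because each tetrahedral cell has constant density, the volume rendering integral along a ray can be evaluated exactly, cell by cell, instead of by quadrature. The sketch below illustrates only that compositing step; it assumes the ray/tetrahedron intersection lengths are already known and that each cell also carries a single constant colour, which is a simplification for illustration rather than the paper's appearance model.

```python
import math
from typing import List, Tuple

Color = Tuple[float, float, float]


def render_ray_through_cells(segments: List[Tuple[float, float, Color]]) -> Color:
    """Composite a ray through piecewise-constant density cells.

    Each segment is (sigma, length, rgb): the constant density of the cell, the
    length of the ray inside it, and (assumed) a constant cell colour. With
    constant density, the optical depth of a segment is exactly sigma * length.
    """
    transmittance = 1.0  # fraction of light not yet absorbed in front of this segment
    out = [0.0, 0.0, 0.0]
    for sigma, length, rgb in segments:
        alpha = 1.0 - math.exp(-sigma * length)  # exact opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            out[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha  # same as multiplying by exp(-sigma * length)
    return (out[0], out[1], out[2])
```

For example, two cells with density ln 2 and unit length each absorb half of the light reaching them, so a red front cell receives weight 0.5 and a green back cell weight 0.25.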

2025-12-03T18:57:03Z Website: half-potato.gitlab.io/rm
Alexander Mai, Trevor Hedstrom, George Kopanas, Janne Kontkanen, Falko Kuester, Jonathan T. Barron

http://arxiv.org/abs/2512.03237v1 LLM-Guided Material Inference for 3D Point Clouds 2025-12-02T21:14:04Z

Most existing 3D shape datasets and models focus solely on geometry, overlooking the material properties that determine how objects appear. We introduce a two-stage large language model (LLM)-based method for inferring material composition directly from 3D point clouds with coarse segmentations. Our key insight is to decouple reasoning about what an object is from what it is made of. In the first stage, an LLM predicts the object's semantics; in the second stage, it assigns plausible materials to each geometric segment, conditioned on the inferred semantics. Both stages operate in a zero-shot manner, without task-specific training. Because existing datasets lack reliable material annotations, we evaluate our method using an LLM-as-a-Judge implemented in DeepEval. Across 1,000 shapes from Fusion/ABS and ShapeNet, our method achieves high semantic and material plausibility. These results demonstrate that language models can serve as general-purpose priors for bridging geometric reasoning and material understanding in 3D data.

2025-12-02T21:14:04Z
Nafiseh Izadyar, Teseo Schneider

http://arxiv.org/abs/2512.03013v1 In-Context Sync-LoRA for Portrait Video Editing 2025-12-02T18:40:35Z

Editing portrait videos is a challenging task that requires flexible yet precise control over a wide range of modifications, such as appearance changes, expression edits, or the addition of objects. The key difficulty lies in preserving the subject's original temporal behavior, which demands that every edited frame remain precisely synchronized with the corresponding source frame. We present Sync-LoRA, a method for editing portrait videos that achieves high-quality visual modifications while maintaining frame-accurate synchronization and identity consistency. Our approach uses an image-to-video diffusion model, where the edit is defined by modifying the first frame and then propagated to the entire sequence. To enable accurate synchronization, we train an in-context LoRA using paired videos that depict identical motion trajectories but differ in appearance. These pairs are automatically generated and curated through a synchronization-based filtering process that selects only the most temporally aligned examples for training. This training setup teaches the model to combine motion cues from the source video with the visual changes introduced in the edited first frame. Trained on a compact, highly curated set of synchronized human portraits, Sync-LoRA generalizes to unseen identities and diverse edits (e.g., modifying appearance, adding objects, or changing backgrounds), robustly handling variations in pose and expression. Our results demonstrate high visual fidelity and strong temporal coherence, achieving a robust balance between edit fidelity and precise motion preservation.
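The abstract does not say how "temporal alignment" is scored during filtering. As a hypothetical illustration only, the sketch below assumes each video has already been reduced to per-frame 2D facial landmarks, scores a pair by the mean landmark distance over all frames, and keeps the best-aligned pairs. The landmark representation, the distance metric, and the function names are all assumptions, not the authors' pipeline.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Track = Sequence[Sequence[Point]]  # frames x landmarks x (x, y); hypothetical representation


def sync_error(a: Track, b: Track) -> float:
    """Mean landmark distance between two equally long landmark tracks."""
    if len(a) != len(b):
        raise ValueError("tracks must have the same number of frames")
    total, count = 0.0, 0
    for frame_a, frame_b in zip(a, b):
        for (xa, ya), (xb, yb) in zip(frame_a, frame_b):
            total += math.hypot(xa - xb, ya - yb)
            count += 1
    return total / count


def filter_pairs(pairs: Dict[str, Tuple[Track, Track]], keep: int) -> List[str]:
    """Return the ids of the `keep` most temporally aligned video pairs."""
    ranked = sorted(pairs, key=lambda pid: sync_error(*pairs[pid]))
    return ranked[:keep]
```

A threshold on `sync_error` could replace the fixed `keep` count; which one suits a given dataset is a design choice the abstract leaves open.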

2025-12-02T18:40:35Z Project page: https://sagipolaczek.github.io/Sync-LoRA/
Sagi Polaczek, Or Patashnik, Ali Mahdavi-Amiri, Daniel Cohen-Or

http://arxiv.org/abs/2512.15719v1 A Fast Volumetric Capture and Reconstruction Pipeline for Dynamic Point Clouds and Gaussian Splats 2025-12-02T17:35:46Z

We present a fast and efficient volumetric capture and reconstruction system that processes either RGB-D or RGB-only input to generate 3D representations in the form of point clouds and Gaussian splats. For Gaussian splat reconstructions, we build on and improve the GPS-Gaussian regressor, enabling high-quality reconstructions with minimal overhead. The system is designed for easy setup and deployment, supporting in-the-wild operation under uncontrolled illumination and arbitrary backgrounds, as well as flexible camera configurations, including sparse setups and arbitrary numbers of cameras and baselines. Captured data can be exported in standard formats such as PLY, MPEG V-PCC, and SPLAT, and visualized through a web-based viewer or Unity/Unreal plugins. A live on-location preview of both input and reconstruction is available at 5-10 FPS. We present qualitative findings focused on deployability and targeted ablations. The complete framework is open-source, facilitating reproducibility and further research.
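For the RGB-D path, the standard way to turn a depth frame into a point cloud is pinhole back-projection. The sketch below shows that generic step, assuming metric depth measured along the optical axis, known intrinsics `fx`, `fy`, `cx`, `cy`, and zeros marking missing samples; it is not taken from the released code and omits colour lookup and the camera-to-world transform.

```python
from typing import List, Sequence, Tuple


def backproject_depth(depth: Sequence[Sequence[float]], fx: float, fy: float,
                      cx: float, cy: float) -> List[Tuple[float, float, float]]:
    """Lift a depth image to camera-space 3D points with a pinhole model.

    depth[v][u] is the depth at pixel column u, row v; values <= 0 are
    treated as missing measurements and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # invalid or missing depth sample
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Merging the clouds from several calibrated cameras then only requires applying each camera's extrinsic transform to its points.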

2025-12-02T17:35:46Z ACM SIGGRAPH European Conference on Visual Media Production (CVMP) 2025. Code available at: https://github.com/irc-hslu/capturestudio
Athanasios Charisoudis, Simone Croci, Lam Kit Yung, Pascal Frossard, Aljosa Smolic
DOI: 10.1145/3756863.3769713

http://arxiv.org/abs/2512.02781v1 LumiX: Structured and Coherent Text-to-Intrinsic Generation 2025-12-02T13:56:02Z

We present LumiX, a structured diffusion framework for coherent text-to-intrinsic generation. Conditioned on text prompts, LumiX jointly generates a comprehensive set of intrinsic maps (e.g., albedo, irradiance, normal, depth, and final color), providing a structured and physically consistent description of an underlying scene. This is enabled by two key contributions: 1) Query-Broadcast Attention, a mechanism that ensures structural consistency by sharing queries across all maps in each self-attention block. 2) Tensor LoRA, a tensor-based adaptation that parameter-efficiently models cross-map relations for efficient joint training. Together, these designs enable stable joint diffusion training and unified generation of multiple intrinsic properties. Experiments show that LumiX produces coherent and physically meaningful results, achieving 23% higher alignment and a better preference score (0.19 vs. -0.41) compared to the state of the art, and it can also perform image-conditioned intrinsic decomposition within the same framework.
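The abstract states only that Query-Broadcast Attention shares queries across all maps in each self-attention block. As one possible reading, and purely as an illustration, the sketch below builds a single set of query tokens (here, an assumed element-wise mean of the maps' tokens, with all learned projections omitted) and lets each intrinsic map attend over its own keys and values. The mean-based query and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # tokens x features


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def attend(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        w = softmax(scores)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out


def query_broadcast_attention(maps: List[Matrix]) -> List[Matrix]:
    """Self-attention in which every map uses the same (shared) query tokens.

    Assumption: the shared query is the element-wise mean of the maps' tokens;
    keys and values remain specific to each map.
    """
    n_tok, dim = len(maps[0]), len(maps[0][0])
    shared_q = [[sum(m[t][c] for m in maps) / len(maps) for c in range(dim)]
                for t in range(n_tok)]
    return [attend(shared_q, m, m) for m in maps]
```

Because every map is queried with the same tokens, the attention pattern is driven by a common signal, which is one way such a design could encourage the maps to stay structurally aligned.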

2025-12-02T13:56:02Z The code will be available at https://github.com/xhanxu/LumiX
Xu Han, Biao Zhang, Xiangjun Tang, Xianzhi Li, Peter Wonka

http://arxiv.org/abs/2508.02443v2 PRIMU: Uncertainty Estimation for Novel Views in Gaussian Splatting from Primitive-Based Representations of Error and Coverage 2025-12-02T11:05:00Z

We introduce Primitive-based Representations of Uncertainty (PRIMU), a post-hoc uncertainty estimation (UE) framework for Gaussian Splatting (GS). Reliable UE is essential for deploying GS in safety-critical domains such as robotics and medicine. Existing approaches typically estimate Gaussian-primitive variances and rely on the rendering process to obtain pixel-wise uncertainties. In contrast, we construct primitive-level representations of error and visibility/coverage from training views, capturing interpretable uncertainty information. These representations are obtained by projecting view-dependent training errors and coverage statistics onto the primitives. Uncertainties for novel views are inferred by rendering these primitive-level representations, producing uncertainty feature maps, which are aggregated through pixel-wise regression on holdout data. We analyze combinations of uncertainty feature maps and regression models to understand how their interactions affect prediction accuracy and generalization. PRIMU also enables an effective active view selection strategy by directly leveraging these uncertainty feature maps. Additionally, we study the effect of separating splatting into foreground and background regions. Our estimates show strong correlations with true errors, outperforming state-of-the-art methods, especially for depth UE and foreground objects. Finally, our regression models generalize to unseen scenes, enabling UE without additional holdout data.
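A minimal sketch of the idea of projecting training errors and coverage onto primitives, then rendering them for a new view. It assumes (our choice, not necessarily the paper's) that the projection weights are the per-pixel alpha-blending weights each primitive contributed during rasterization, that a primitive's error is the blending-weighted mean of the pixel errors it touched, and that coverage is its total blending weight. The pixel-wise regression stage is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Contribution = List[Tuple[int, float]]  # (primitive id, blending weight) along one pixel ray


def project_errors(pixels: List[Tuple[Contribution, float]]
                   ) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Accumulate per-pixel training errors onto the primitives that produced them.

    Returns (mean_error, coverage) per primitive id.
    """
    weighted_err: Dict[int, float] = defaultdict(float)
    coverage: Dict[int, float] = defaultdict(float)
    for contribs, err in pixels:
        for pid, w in contribs:
            weighted_err[pid] += w * err
            coverage[pid] += w
    mean_error = {pid: weighted_err[pid] / c for pid, c in coverage.items() if c > 0}
    return mean_error, dict(coverage)


def render_feature(contribs: Contribution, per_primitive: Dict[int, float]) -> float:
    """Alpha-blend a per-primitive scalar into one pixel of a novel view."""
    return sum(w * per_primitive.get(pid, 0.0) for pid, w in contribs)
```

Rendering `coverage` the same way yields a second feature map; a regressor fitted on holdout views would then map such features to per-pixel uncertainty.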

2025-08-04T14:02:20Z Revised writing and figures; additional Gaussian Splatting experiments; added baselines and datasets; active view-selection experiments
Thomas Gottwald, Edgar Heinert, Peter Stehr, Chamuditha Jayanga Galappaththige, Matthias Rottmann

http://arxiv.org/abs/2512.02621v1 Content-Aware Texturing for Gaussian Splatting 2025-12-02T10:29:10Z

Gaussian Splatting has become the method of choice for 3D reconstruction and real-time rendering of captured real scenes. However, fine appearance details need to be represented as a large number of small Gaussian primitives, which can be wasteful when geometry and appearance exhibit different frequency characteristics. Inspired by the long tradition of texture mapping, we propose to use texture to represent detailed appearance where possible. Our main focus is to incorporate per-primitive texture maps that adapt to the scene in a principled manner during Gaussian Splatting optimization. We do this by proposing a new appearance representation for 2D Gaussian primitives with textures where the size of a texel is bounded by the image sampling frequency and adapted to the content of the input images. We achieve this by adaptively upscaling or downscaling the texture resolution during optimization. In addition, our approach enables control of the number of primitives during optimization based on texture resolution. We show that our approach performs favorably in image quality and total number of parameters used compared to alternative solutions for textured Gaussian primitives. Project page: https://repo-sam.inria.fr/nerphys/gs-texturing/
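The bound "texel size limited by the image sampling frequency" can be made concrete: a texel smaller than the world-space footprint of one pixel in the closest training view cannot be constrained by the images. The sketch below computes the largest texture resolution satisfying that bound for a square primitive. The footprint approximation (depth divided by focal length in pixels), the power-of-two rounding, and the `max_res` cap are assumptions for illustration; the content-driven upscaling and downscaling during optimization is not shown.

```python
import math
from typing import Iterable


def max_texture_resolution(extent: float, depths: Iterable[float], focal_px: float,
                           max_res: int = 64) -> int:
    """Largest power-of-two texture resolution whose texels are no smaller than a pixel.

    extent   -- world-space side length of the (square) textured primitive
    depths   -- camera-space depths at which training views observe it
    focal_px -- focal length in pixels (assumed identical for all views)
    """
    # A pixel at depth z covers roughly z / focal_px world units; the closest
    # view samples the primitive most finely and gives the tightest bound.
    finest_footprint = min(depths) / focal_px
    limit = extent / finest_footprint  # number of texels that each still cover >= 1 pixel
    if limit < 1.0:
        return 1
    res = 2 ** int(math.floor(math.log2(limit)))
    return min(res, max_res)
```

For instance, a primitive 1 unit wide seen at a nearest depth of 4 with a 100-pixel focal length spans about 25 pixels, so this bound allows at most a 16x16 texture.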

2025-12-02T10:29:10Z Project Page: https://repo-sam.inria.fr/nerphys/gs-texturing/ Eurographics Symposium on Rendering (Symposium Track), 2025
Panagiotis Papantonakis, Georgios Kopanas, Fredo Durand, George Drettakis

http://arxiv.org/abs/2504.04634v3 Walk Before You Dance: High-fidelity and Editable Dance Synthesis via Generative Masked Motion Prior 2025-12-02T08:42:28Z

Recent advances in dance generation have enabled the automatic synthesis of 3D dance motions. However, existing methods still face significant challenges in simultaneously achieving high realism, precise dance-music synchronization, diverse motion expression, and physical plausibility. To address these limitations, we propose a novel approach that leverages a generative masked text-to-motion model as a distribution prior to learn a probabilistic mapping from diverse guidance signals, including music, genre, and pose, into high-quality dance motion sequences. Our framework also supports semantic motion editing, such as motion inpainting and body part modification. Specifically, we introduce a multi-tower masked motion model that integrates a text-conditioned masked motion backbone with two parallel, modality-specific branches: a music-guidance tower and a pose-guidance tower. The model is trained using synchronized and progressive masked training, which allows effective infusion of the pretrained text-to-motion prior into the dance synthesis process while enabling each guidance branch to optimize independently through its own loss function, mitigating gradient interference. During inference, we introduce classifier-free logits guidance and pose-guided token optimization to strengthen the influence of music, genre, and pose signals. Extensive experiments demonstrate that our method sets a new state of the art in dance generation, significantly advancing both the quality and editability over existing approaches. Project page: https://foram-s1.github.io/DanceMosaic/
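Classifier-free guidance on logits is a standard technique for masked token models: the model is run with and without the conditioning signal, and the final logits are extrapolated from the unconditional prediction toward the conditional one. The sketch shows only that generic formula for one token position; how the paper combines several guidance signals (music, genre, pose) and which scale it uses are not stated in the abstract, so the single `scale` parameter here is an assumption.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cfg_logits(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance applied to token logits.

    scale = 1 recovers the conditional logits; scale > 1 pushes the
    prediction further away from the unconditional one.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]
```

With conditional logits `[2.0, 0.0]` and unconditional logits `[1.0, 1.0]`, a scale of 3 gives `[4.0, -2.0]`, sharpening the preference for the token the conditioning signal favours.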

2025-04-06T22:05:37Z
Foram N Shah, Parshwa Shah, Muhammad Usama Saleem, Ekkasit Pinyoanuntapong, Pu Wang, Hongfei Xue, Ahmed Helmy

http://arxiv.org/abs/2512.02263v1 DepthScape: Authoring 2.5D Designs via Depth Estimation, Semantic Understanding, and Geometry Extraction 2025-12-01T23:12:30Z

2.5D effects, such as occlusion and perspective foreshortening, enhance visual dynamics and realism by incorporating 3D depth cues into 2D designs. However, creating such effects remains challenging and labor-intensive due to the complexity of depth perception. We introduce DepthScape, a human-AI collaborative system that facilitates 2.5D effect creation by directly placing design elements into 3D reconstructions. Using monocular depth reconstruction, DepthScape transforms images into 3D reconstructions where visual contents are placed to automatically achieve realistic occlusion and perspective foreshortening. To further simplify 3D placement through a 2D viewport, DepthScape uses a vision-language model to analyze source images and extract key visual components as content anchors for direct manipulation editing. We evaluate DepthScape with nine participants of varying design backgrounds, confirming the effectiveness of our creation pipeline. We also test on 100 professional stock images to assess robustness, and conduct an expert evaluation that confirms the quality of DepthScape's results.
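The "realistic occlusion" that comes from placing a design element into a depth reconstruction reduces, at its simplest, to a per-pixel depth test. The sketch below assumes the element has already been rasterized into image space (so perspective foreshortening is not modelled) and sits at one constant depth; both simplifications, and the function name, are illustrative assumptions rather than DepthScape's implementation.

```python
from typing import List, Optional, Tuple

RGB = Tuple[float, float, float]


def composite_with_occlusion(scene_rgb: List[List[RGB]], scene_depth: List[List[float]],
                             elem_rgb: List[List[Optional[RGB]]],
                             elem_depth: float) -> List[List[RGB]]:
    """Insert a flat design element at a fixed depth into an image with estimated depth.

    A pixel shows the element only where the element covers it (not None) and
    the element is closer to the camera than the reconstructed scene surface.
    """
    out = []
    for y, row in enumerate(scene_rgb):
        out_row = []
        for x, color in enumerate(row):
            e = elem_rgb[y][x]
            visible = e is not None and elem_depth < scene_depth[y][x]
            out_row.append(e if visible else color)
        out.append(out_row)
    return out
```

Moving the element to a smaller `elem_depth` brings it in front of more scene content; a larger value tucks it behind nearby objects.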

2025-12-01T23:12:30Z
Xia Su, Cuong Nguyen, Matheus A. Gadelha, Jon E. Froehlich

http://arxiv.org/abs/2508.10201v2 B-repLer: Language-guided Editing of CAD Models 2025-12-01T22:02:11Z

Computer-Aided Design (CAD) models, given their compactness and precision, remain the industry standard for designing and fabricating engineering objects. However, language-guided CAD editing is still in its infancy, largely due to the missing semantic connection between user commands and the underlying shape geometry, a problem exacerbated by the shortage of paired text-and-edit CAD datasets. While recent Multimodal Large Language Models (mLLMs) have attempted to bridge this gap, their reliance on CAD construction history (often an expensive and hard-to-obtain input) severely limits their expressiveness and restricts their usage. We present B-repLer, a novel framework that directly connects natural language with editing CAD models by operating in a learned latent space. Importantly, our approach bypasses the need for construction history, enabling semantic edits on a wide range of geometries, from simple prismatic parts to complex freeform shapes defined by B-Spline surfaces. To facilitate this research, we introduce BrepEDIT-240K, the first large-scale dataset for this task. We demonstrate how this paired dataset can be automatically generated, (user) validated, and scaled by leveraging existing CAD tools, in conjunction with mLLMs, to create the required paired data without relying on any external annotations. Our results demonstrate that B-repLer can accurately perform complex edits on complex CAD shapes, even when the input edit specifications are high-level and ambiguous, consistently producing valid, high-quality CAD outputs and enabling a class of text-guided edits not previously possible.

2025-08-13T21:23:56Z Project page: https://yilinliu77.github.io/brepler.github.io/
Yilin Liu, Niladri Shekhar Dutt, Changjian Li, Niloy J. Mitra

http://arxiv.org/abs/2512.02143v1 CoatFusion: Controllable Material Coating in Images 2025-12-01T19:13:30Z

We introduce Material Coating, a novel image editing task that simulates applying a thin material layer onto an object while preserving its underlying coarse and fine geometry. Material coating is fundamentally different from existing "material transfer" methods, which are designed to replace an object's intrinsic material, often overwriting fine details. To address this new task, we construct a large-scale synthetic dataset (110K images) of 3D objects with varied, physically-based coatings, named DataCoat110K. We then propose CoatFusion, a novel architecture that enables this task by conditioning a diffusion model on both a 2D albedo texture and granular, PBR-style parametric controls, including roughness, metalness, transmission, and a key thickness parameter. Experiments and user studies show CoatFusion produces realistic, controllable coatings and significantly outperforms existing material editing and transfer methods on this new task.

2025-12-01T19:13:30Z
Sagie Levy, Elad Aharoni, Matan Levy, Ariel Shamir, Dani Lischinski

http://arxiv.org/abs/2411.12168v3 Sketch-guided Cage-based 3D Gaussian Splatting Deformation 2025-12-01T15:55:51Z

3D Gaussian Splatting (GS) is one of the most promising novel 3D representations that has received great interest in computer graphics and computer vision. While various systems have introduced editing capabilities for 3D GS, such as those guided by text prompts, fine-grained control over deformation remains an open challenge. In this work, we present a novel sketch-guided 3D GS deformation system that allows users to intuitively modify the geometry of a 3D GS model by drawing a silhouette sketch from a single viewpoint. Our approach introduces a new deformation method that combines cage-based deformations with a variant of Neural Jacobian Fields, enabling precise, fine-grained control. Additionally, it leverages large-scale 2D diffusion priors and ControlNet to ensure the generated deformations are semantically plausible. Through a series of experiments, we demonstrate the effectiveness of our method and showcase its ability to animate static 3D GS models as one of its key applications.
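Cage-based deformation binds every point to the vertices of an enclosing cage through a fixed set of weights, then moves the points by moving only the cage. The sketch below shows the simplest possible instance: a single 2D triangle cage with barycentric coordinates as the weights. Practical cages use many vertices and generalized coordinates (for example, mean value coordinates), the paper additionally combines the cage with a variant of Neural Jacobian Fields, and deforming Gaussians would also require transforming their covariances; none of that is shown here.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]


def barycentric(p: Vec2, a: Vec2, b: Vec2, c: Vec2) -> Tuple[float, float, float]:
    """Barycentric coordinates of p with respect to the triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb


def deform(points: List[Vec2], cage: List[Vec2], new_cage: List[Vec2]) -> List[Vec2]:
    """Bind points to the rest cage, then re-evaluate them on the edited cage."""
    out = []
    for p in points:
        w = barycentric(p, *cage)  # weights are computed once, on the rest pose
        out.append((sum(wi * v[0] for wi, v in zip(w, new_cage)),
                    sum(wi * v[1] for wi, v in zip(w, new_cage))))
    return out
```

Stretching the cage's second vertex from (1, 0) to (2, 0) moves the interior point (0.25, 0.25) to (0.5, 0.25), because its weight on that vertex is 0.25.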

2024-11-19T02:18:19Z 10 pages, 9 figures, accepted at WACV 26, project page: https://tianhaoxie.github.io/project/gs_deform/
Tianhao Xie, Noam Aigerman, Eugene Belilovsky, Tiberiu Popa

http://arxiv.org/abs/2512.01648v1 Textured Word-As-Image illustration 2025-12-01T13:20:23Z

In this paper, we propose a novel, fully automatic pipeline that generates text images that are legible and strongly aligned with a desired semantic concept taken from the user's input. Users provide three inputs to the system: a semantic concept, a word, and a letter. The semantic concept is used to change the shape of the input letter and to generate a texture from a predefined prompt using Stable Diffusion models. Our pipeline maps the texture onto the text image in a way that preserves the legibility of the whole output. The system also provides real-time adjustments that let the user change the scale of the texture and apply it to the text image. User evaluations demonstrate that our method effectively represents semantic meaning without compromising legibility, making it a robust and innovative tool for graphic design, logo creation, and artistic typography.

2025-12-01T13:20:23Z
Mohammad Javadian Farzaneh, Selim Balcisoy

http://arxiv.org/abs/2512.01329v1 TagSplat: Topology-Aware Gaussian Splatting for Dynamic Mesh Modeling and Tracking 2025-12-01T06:41:54Z

Topology-consistent dynamic model sequences are essential for applications such as animation and model editing. However, existing 4D reconstruction methods face challenges in generating high-quality topology-consistent meshes. To address this, we propose a topology-aware dynamic reconstruction framework based on Gaussian Splatting. We introduce a Gaussian topological structure that explicitly encodes spatial connectivity. This structure enables topology-aware densification and pruning, preserving the manifold consistency of the Gaussian representation. Temporal regularization terms further ensure topological coherence over time, while differentiable mesh rasterization improves mesh quality. Experimental results demonstrate that our method reconstructs topology-consistent mesh sequences with significantly higher accuracy than existing approaches. Moreover, the resulting meshes enable precise 3D keypoint tracking. Project page: https://haza628.github.io/tagSplat/

2025-12-01T06:41:54Z
Hanzhi Guo, Dongdong Weng, Mo Su, Yixiao Chen, Xiaonuo Dongye, Chenyu Xu