http://arxiv.org/abs/2412.11762v3 GS-ProCams: Gaussian Splatting-based Projector-Camera Systems 2026-02-17T09:23:20Z We present GS-ProCams, the first Gaussian Splatting-based framework for projector-camera systems (ProCams). GS-ProCams is not only view-agnostic but also significantly enhances the efficiency of projection mapping (PM), which requires establishing geometric and radiometric mappings between the projector and the camera. Previous CNN-based ProCams are constrained to a specific viewpoint, limiting their applicability to novel perspectives. In contrast, NeRF-based ProCams support view-agnostic projection mapping; however, they require an additional co-located light source and demand significant computational and memory resources. To address this issue, we propose GS-ProCams, which employs 2D Gaussians for scene representation and enables efficient view-agnostic ProCams applications. In particular, we explicitly model the complex geometric and photometric mappings of ProCams using projector responses, the projection surface's geometry and materials represented by Gaussians, and the global illumination component. Then, we employ differentiable physically-based rendering to jointly estimate them from captured multi-view projections. Compared to state-of-the-art NeRF-based methods, our GS-ProCams eliminates the need for additional devices, achieving superior ProCams simulation quality. It also uses only 1/10 of the GPU memory for training and is 900 times faster in inference speed. Please refer to our project page for the code and dataset: https://realqingyue.github.io/GS-ProCams/. 
2024-12-16T13:26:52Z This version includes updated experimental results after an implementation fix Qingyue Deng Jijiang Li Haibin Ling Bingyao Huang 10.1109/TVCG.2025.3616841 http://arxiv.org/abs/2507.14841v2 Towards Geometric and Textural Consistency 3D Scene Generation via Single Image-guided Model Generation and Layout Optimization 2026-02-17T07:45:24Z In recent years, 3D generation has made great strides in both academia and industry. However, generating 3D scenes from a single RGB image remains a significant challenge, as current approaches often struggle to ensure both object generation quality and scene coherence in multi-object scenarios. To overcome these limitations, we propose a novel three-stage framework for 3D scene generation with explicit geometric representations and high-quality textural details via single image-guided model generation and spatial layout optimization. Our method begins with an image instance segmentation and inpainting phase, which recovers missing details of occluded objects in the input image, thereby achieving complete generation of foreground 3D assets. Subsequently, our approach captures the spatial geometry of the reference image by constructing a pseudo-stereo viewpoint for camera parameter estimation and scene depth inference, while employing a model selection strategy to ensure optimal alignment between the 3D assets generated in the previous step and the input. Finally, through model parameterization and minimization of the Chamfer distance between point clouds in 3D and 2D space, our approach optimizes layout parameters to produce an explicit 3D scene representation that maintains precise alignment with the input guidance image. Extensive experiments on multi-object scene image sets demonstrate that our approach not only outperforms state-of-the-art methods in terms of geometric accuracy and texture fidelity of individual generated 3D models, but also has significant advantages in scene layout synthesis. 
2025-07-20T06:59:42Z 14 pages, 9 figures, Project page: https://xdlbw.github.io/sing3d/ Xiang Tang Ruotong Li Xiaopeng Fan http://arxiv.org/abs/2501.12369v3 DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial Basis Functions 2026-02-17T06:40:41Z Splatting-based 3D reconstruction methods have gained popularity with the advent of 3D Gaussian Splatting, efficiently synthesizing high-quality novel views. These methods commonly resort to using exponential family functions, such as the Gaussian function, as reconstruction kernels due to their anisotropic nature, ease of projection, and differentiability in rasterization. However, the field remains restricted to variations within the exponential family, leaving generalized reconstruction kernels largely underexplored, partly due to the lack of easy integrability in 3D to 2D projections. In this light, we show that a class of decaying anisotropic radial basis functions (DARBFs), which are non-negative functions of the Mahalanobis distance, supports splatting by approximating the Gaussian function's closed-form integration advantage. With this fresh perspective, we demonstrate varying performances across selected DARB reconstruction kernels, achieving comparable training convergence and memory footprints, with on-par PSNR, SSIM, and LPIPS results. 2025-01-21T18:49:06Z Link to the project page: https://github.com/viruthshaan/darb-splatting/ Hashiru Pramuditha University of Moratuwa Vinasirajan Viruthshaan University of Moratuwa Vishagar Arunan University of Moratuwa Saeedha Nazar University of Moratuwa Sameera Ramasinghe University of Adelaide Simon Lucey University of Adelaide Ranga Rodrigo University of Moratuwa http://arxiv.org/abs/2506.14714v2 SkinCells: Sparse Skinning using Voronoi Cells 2026-02-17T05:37:49Z For decades, real-time skinning has been the cornerstone of character animation in visual effects and games. 
Despite its importance, the creation of animatable digital assets remains a labor-intensive manual process. Existing automated tools frequently struggle with intricate geometries, often necessitating significant manual refinement to reach production standards. We present a robust, fully automated method for generating high-quality skinning weights from a standard mesh and skeleton in a canonical A- or T-pose. Unlike traditional approaches, our framework offers direct sparsity controls to limit bone influences per vertex -- a critical requirement for maintaining performance in large-scale mobile environments. Furthermore, we address the challenge of Level-of-Detail (LoD) management by optimizing weights within a continuous spatial volume rather than on discrete vertices. This allows a single optimization pass to be applied seamlessly across multiple asset resolutions and variations. Central to our approach is a novel parameterized family of functions that we call SkinCells. We demonstrate that our method consistently produces stable, high-quality results even in complex scenarios where standard biharmonic weight computations fail. 2025-06-17T16:51:36Z Egor Larionov Igor Santesteban Hsiao-yu Chen Gene Lin Philipp Herholz Ryan Goldade Ladislav Kavan Doug Roble Tuur Stuyck http://arxiv.org/abs/2509.23607v2 ZeroScene: A Zero-Shot Framework for 3D Scene Generation from a Single Image and Controllable Texture Editing 2026-02-17T05:27:01Z In the field of 3D content generation, single image scene reconstruction methods still struggle to simultaneously ensure the quality of individual assets and the coherence of the overall scene in complex environments, while texture editing techniques often fail to maintain both local continuity and multi-view consistency. In this paper, we propose a novel system ZeroScene, which leverages the prior knowledge of large vision models to accomplish both single image-to-3D scene reconstruction and texture editing in a zero-shot manner. 
ZeroScene extracts object-level 2D segmentation and depth information from input images to infer spatial relationships within the scene. It then jointly optimizes 3D and 2D projection losses of the point cloud to update object poses for precise scene alignment, ultimately constructing a coherent and complete 3D scene that encompasses both foreground and background. Moreover, ZeroScene supports texture editing of objects in the scene. By imposing constraints on the diffusion model and introducing a mask-guided progressive image generation strategy, we effectively maintain texture consistency across multiple viewpoints and further enhance the realism of rendered results through Physically Based Rendering (PBR) material estimation. Experimental results demonstrate that our framework not only ensures the geometric and appearance accuracy of generated assets, but also faithfully reconstructs scene layouts and produces highly detailed textures that closely align with text prompts. 2025-09-28T03:21:12Z 16 pages, 15 figures, Eurographics 2026, Project page: https://xdlbw.github.io/ZeroScene/ Xiang Tang Ruotong Li Xiaopeng Fan http://arxiv.org/abs/2506.03407v2 Multi-Spectral Gaussian Splatting with Neural Color Representation 2026-02-16T08:39:04Z We present MS-Splatting -- a multi-spectral 3D Gaussian Splatting (3DGS) framework that is able to generate multi-view consistent novel views from images of multiple, independent cameras with different spectral domains. In contrast to previous approaches, our method does not require cross-modal camera calibration and is versatile enough to model a variety of different spectra, including thermal and near-infrared, without any algorithmic changes. 
Unlike existing 3DGS-based frameworks that treat each modality separately (by optimizing per-channel spherical harmonics) and therefore fail to exploit the underlying spectral and spatial correlations, our method leverages a novel neural color representation that encodes multi-spectral information into a learned, compact, per-splat feature embedding. A shallow multi-layer perceptron (MLP) then decodes this embedding to obtain spectral color values, enabling joint learning of all bands within a unified representation. Our experiments show that this simple yet effective strategy is able to improve multi-spectral rendering quality, while also leading to improved per-spectra rendering quality over state-of-the-art methods. We demonstrate the effectiveness of this new technique in agricultural applications to render vegetation indices, such as normalized difference vegetation index (NDVI). 2025-06-03T21:36:50Z for project page, see https://meyerls.github.io/ms_splatting Lukas Meyer Josef Grün Maximilian Weiherer Bernhard Egger Marc Stamminger Linus Franke http://arxiv.org/abs/2509.07175v2 Efficient Computation of Voronoi Diagrams Using Point-in-Cell Tests 2026-02-16T06:53:07Z Since the Voronoi diagram appears in many applications, the topic of improving its computational efficiency remains attractive. We propose a novel yet efficient method to compute Voronoi diagrams bounded by a given domain, i.e., the clipped or restricted Voronoi diagrams. The intersection of the domain and a Voronoi cell (domain-cell intersection) is generated by removing the part outside the cell from the domain, which can be accomplished by several clippings. Different from the existing methods, we present an edge-based search scheme to find clipping planes (bisectors). A test called point-in-cell is first set up to tell whether a space point is in a target Voronoi cell or not. 
Then, for each edge of the intermediate domain-cell intersection, we launch a clipping only if its two endpoints are respectively inside and outside the corresponding Voronoi cell, where the bisector for the clipping can be found with a few point-in-cell tests. Therefore, our method only involves the clippings that contribute to the final results, which is a great advantage over the state-of-the-art methods. Additionally, because each domain-cell intersection can be generated independently, we extend the proposed method to GPUs for computing Voronoi diagrams in parallel. The experimental results show the best performance of our method compared to state-of-the-art ones, regardless of site distribution. This paper was first submitted to SIGGRAPH Asia 2025. 2025-09-08T19:48:28Z Yanyang Xiao Juan Cao Zhonggui Chen http://arxiv.org/abs/2602.14493v1 Gaussian Mesh Renderer for Lightweight Differentiable Rendering 2026-02-16T06:15:42Z 3D Gaussian Splatting (3DGS) has enabled high-fidelity virtualization with fast rendering and optimization for novel view synthesis. On the other hand, triangle mesh models remain a popular choice for surface reconstruction but suffer from slow or heavy optimization in traditional mesh-based differentiable renderers. To address this problem, we propose a new lightweight differentiable mesh renderer leveraging the efficient rasterization process of 3DGS, named Gaussian Mesh Renderer (GMR), which tightly integrates the Gaussian and mesh representations. Each Gaussian primitive is analytically derived from the corresponding mesh triangle, preserving structural fidelity and enabling the gradient flow. Compared to the traditional mesh renderers, our method achieves smoother gradients, which especially contributes to better optimization using smaller batch sizes with limited memory. Our implementation is available in the public GitHub repository at https://github.com/huntorochi/Gaussian-Mesh-Renderer. 
2026-02-16T06:15:42Z IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026). GitHub: https://github.com/huntorochi/Gaussian-Mesh-Renderer Xinpeng Liu Fumio Okura http://arxiv.org/abs/2602.14048v1 ProAct: A Dual-System Framework for Proactive Embodied Social Agents 2026-02-15T08:27:34Z Embodied social agents have recently advanced in generating synchronized speech and gestures. However, most interactive systems remain fundamentally reactive, responding only to current sensory inputs within a short temporal window. Proactive social behavior, in contrast, requires deliberation over accumulated context and intent inference, which conflicts with the strict latency budget of real-time interaction. We present ProAct, a dual-system framework that reconciles this time-scale conflict by decoupling a low-latency Behavioral System for streaming multimodal interaction from a slower Cognitive System which performs long-horizon social reasoning and produces high-level proactive intentions. To translate deliberative intentions into continuous non-verbal behaviors without disrupting fluency, we introduce a streaming flow-matching model conditioned on intentions via ControlNet. This mechanism supports asynchronous intention injection, enabling seamless transitions between reactive and proactive gestures within a single motion stream. We deploy ProAct on a physical humanoid robot and evaluate both motion quality and interactive effectiveness. In real-world interaction user studies, participants and observers consistently prefer ProAct over reactive variants in perceived proactivity, social presence, and overall engagement, demonstrating the benefits of dual-system proactive control for embodied social interaction. 
2026-02-15T08:27:34Z Project Page: https://proactrobot.github.io/ Zeyi Zhang Zixi Kang Ruijie Zhao Yusen Feng Biao Jiang Libin Liu http://arxiv.org/abs/2602.12157v2 TexSpot: 3D Texture Enhancement with Spatially-uniform Point Latent Representation 2026-02-14T15:54:01Z High-quality 3D texture generation remains a fundamental challenge due to the view-inconsistency inherent in current mainstream multi-view diffusion pipelines. Existing representations either rely on UV maps, which suffer from distortion during unwrapping, or point-based methods, which tightly couple texture fidelity to geometric density that limits high-resolution texture generation. To address these limitations, we introduce TexSpot, a diffusion-based texture enhancement framework. At its core is Texlet, a novel 3D texture representation that merges the geometric expressiveness of point-based 3D textures with the compactness of UV-based representation. Each Texlet latent vector encodes a local texture patch via a 2D encoder and is further aggregated using a 3D encoder to incorporate global shape context. A cascaded 3D-to-2D decoder reconstructs high-quality texture patches, enabling the Texlet space learning. Leveraging this representation, we train a diffusion transformer conditioned on Texlets to refine and enhance textures produced by multi-view diffusion methods. Extensive experiments demonstrate that TexSpot significantly improves visual fidelity, geometric consistency, and robustness over existing state-of-the-art 3D texture generation and enhancement approaches. Project page: https://texlet-arch.github.io/TexSpot-page. 
2026-02-12T16:37:31Z Project page: https://texlet-arch.github.io/TexSpot-page Ziteng Lu Yushuang Wu Chongjie Ye Yuda Qiu Jing Shao Xiaoyang Guo Jiaqing Zhou Tianlei Hu Kun Zhou Xiaoguang Han http://arxiv.org/abs/2602.13185v1 FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control 2026-02-13T18:52:11Z Effective and generalizable control in video generation remains a significant challenge. While many methods rely on ambiguous or task-specific signals, we argue that a fundamental disentanglement of "appearance" and "motion" provides a more robust and scalable pathway. We propose FlexAM, a unified framework built upon a novel 3D control signal. This signal represents video dynamics as a point cloud, introducing three key enhancements: multi-frequency positional encoding to distinguish fine-grained motion, depth-aware positional encoding, and a flexible control signal for balancing precision and generative quality. This representation allows FlexAM to effectively disentangle appearance and motion, enabling a wide range of tasks including I2V/V2V editing, camera control, and spatial object editing. Extensive experiments demonstrate that FlexAM achieves superior performance across all evaluated tasks. 2026-02-13T18:52:11Z Codes: https://github.com/IGL-HKUST/FlexAM Mingzhi Sheng Zekai Gu Peng Li Cheng Lin Hao-Xiang Guo Ying-Cong Chen Yuan Liu http://arxiv.org/abs/2602.12949v1 Real-time Rendering with a Neural Irradiance Volume 2026-02-13T14:15:46Z Rendering diffuse global illumination in real-time is often approximated by pre-computing and storing irradiance in a 3D grid of probes. As long as most of the scene remains static, probes approximate irradiance for all surfaces immersed in the irradiance volume, including novel dynamic objects. This approach, however, suffers from aliasing artifacts and high memory consumption. 
We propose Neural Irradiance Volume (NIV), a neural-based technique that allows accurate real-time rendering of diffuse global illumination via a compact pre-computed model, overcoming the limitations of traditional probe-based methods, such as the expensive memory footprint, aliasing artifacts, and scene-specific heuristics. The key insight is that neural compression creates an adaptive and amortized representation of irradiance, circumventing the cubic scaling of grid-based methods. Our superior memory-scaling improves quality by at least 10x at the same memory budget, and enables a straightforward representation of higher-dimensional irradiance fields, allowing rendering of time-varying or dynamic effects without requiring additional computation at runtime. Unlike other neural rendering techniques, our method works within strict real-time constraints, providing fast inference (around 1 ms per frame on consumer GPUs at full HD resolution), reduced memory usage (1-5 MB for medium-sized scenes), and only requires a G-buffer as input, without expensive ray tracing or denoising. 2026-02-13T14:15:46Z Accepted at Eurographics 2026 Arno Coomans Giacomo Nazzaro Edoardo A. Dominici Christian Döring Floor Verhoeven Konstantinos Vardis Markus Steinberger http://arxiv.org/abs/2602.12796v1 GSM-GS: Geometry-Constrained Single and Multi-view Gaussian Splatting for Surface Reconstruction 2026-02-13T10:26:32Z Recently, 3D Gaussian Splatting has emerged as a prominent research direction owing to its ultrarapid training speed and high-fidelity rendering capabilities. However, the unstructured and irregular nature of Gaussian point clouds poses challenges to reconstruction accuracy. This limitation frequently causes high-frequency detail loss in complex surface microstructures when relying solely on routine strategies. 
To address this limitation, we propose GSM-GS: a synergistic optimization framework integrating single-view adaptive sub-region weighting constraints and multi-view spatial structure refinement. For single-view optimization, we leverage image gradient features to partition scenes into texture-rich and texture-less sub-regions. The reconstruction quality is enhanced through adaptive filtering mechanisms guided by depth discrepancy features. This preserves high-weight regions while implementing a dual-branch constraint strategy tailored to regional texture variations, thereby improving geometric detail characterization. For multi-view optimization, we introduce a geometry-guided cross-view point cloud association method combined with a dynamic weight sampling strategy. This constructs 3D structural normal constraints across adjacent point cloud frames, effectively reinforcing multi-view consistency and reconstruction fidelity. Extensive experiments on public datasets demonstrate that our method achieves both competitive rendering quality and geometric reconstruction. See our interactive project page 2026-02-13T10:26:32Z https://aislab-sustech.github.io/GSM-GS/ Xiao Ren Yu Liu Ning An Jian Cheng Xin Qiao He Kong http://arxiv.org/abs/2602.12349v1 Variational Green's Functions for Volumetric PDEs 2026-02-12T19:12:44Z Green's functions characterize the fundamental solutions of partial differential equations; they are essential for tasks ranging from shape analysis to physical simulation, yet they remain computationally prohibitive to evaluate on arbitrary geometric discretizations. We present Variational Green's Function (VGF), a method that learns a smooth, differentiable representation of the Green's function for linear self-adjoint PDE operators, including the Poisson, the screened Poisson, and the biharmonic equations. 
To resolve the sharp singularities characteristic of the Green's functions, our method decomposes the Green's function into an analytic free-space component, and a learned corrector component. Our method leverages a variational foundation to impose Neumann boundary conditions naturally, and imposes Dirichlet boundary conditions via a projective layer on the output of the neural field. The resulting Green's functions are fast to evaluate, differentiable with respect to source application, and can be conditioned on other signals parameterizing our geometry. 2026-02-12T19:12:44Z Joao Teixeira Eitan Grinspun Otman Benchekroun http://arxiv.org/abs/2507.18352v3 Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation 2026-02-12T18:17:00Z The training of high-quality, robust machine learning models for speech-driven 3D facial animation requires a large, diverse dataset of high-quality audio-animation pairs. To overcome the lack of such a dataset, recent work has introduced large pre-trained speech encoders that are robust to variations in the input audio and, therefore, enable the facial animation model to generalize across speakers, audio quality, and languages. However, the resulting facial animation models are prohibitively large and lend themselves only to offline inference on a dedicated machine. In this work, we explore on-device, real-time facial animation models in the context of game development. We overcome the lack of large datasets by using hybrid knowledge distillation with pseudo-labeling. Given a large audio dataset, we employ a high-performing teacher model to train very small student models. In contrast to the pre-trained speech encoders, our student models only consist of convolutional and fully-connected layers, removing the need for attention context or recurrent updates. 
In our experiments, we demonstrate that we can reduce the memory footprint to as little as 3.4 MB and the required future audio context to as little as 81 ms while maintaining high-quality animations. This paves the way for on-device inference, an important step towards realistic, model-driven digital characters. 2025-07-24T12:25:12Z Accepted to ACM TOG 2025 (SIGGRAPH journal track); Project page: https://electronicarts.github.io/tiny-voice2face/ ACM Transactions on Graphics, Vol. 44, No. 4, Article 104, July 2025 Zhen Han Mattias Teye Derek Yadgaroff Judith Bütepage 10.1145/3730929