Feed: https://arxiv.org/api/I6Gv1GHleaK2TY/i6t62VJB9UQ8
Feed updated: 2026-06-18T02:41:47Z

http://arxiv.org/abs/2412.11762v3
Title: GS-ProCams: Gaussian Splatting-based Projector-Camera Systems
Updated: 2026-02-17T09:23:20Z
Abstract: We present GS-ProCams, the first Gaussian Splatting-based framework for projector-camera systems (ProCams). GS-ProCams is not only view-agnostic but also significantly enhances the efficiency of projection mapping (PM) that requires establishing geometric and radiometric mappings between the projector and the camera. Previous CNN-based ProCams are constrained to a specific viewpoint, limiting their applicability to novel perspectives. In contrast, NeRF-based ProCams support view-agnostic projection mapping; however, they require an additional co-located light source and demand significant computational and memory resources. To address this issue, we propose GS-ProCams, which employs 2D Gaussians for scene representation and enables efficient view-agnostic ProCams applications. In particular, we explicitly model the complex geometric and photometric mappings of ProCams using projector responses, the projection surface's geometry and materials represented by Gaussians, and the global illumination component. Then, we employ differentiable physically-based rendering to jointly estimate them from captured multi-view projections. Compared to state-of-the-art NeRF-based methods, our GS-ProCams eliminates the need for additional devices, achieving superior ProCams simulation quality. It also uses only 1/10 of the GPU memory for training and is 900 times faster in inference speed.
Please refer to our project page for the code and dataset: https://realqingyue.github.io/GS-ProCams/.
Published: 2024-12-16T13:26:52Z
Comment: This version includes updated experimental results after an implementation fix
Authors: Qingyue Deng, Jijiang Li, Haibin Ling, Bingyao Huang
DOI: 10.1109/TVCG.2025.3616841

http://arxiv.org/abs/2507.14841v2
Title: Towards Geometric and Textural Consistency 3D Scene Generation via Single Image-guided Model Generation and Layout Optimization
Updated: 2026-02-17T07:45:24Z
Abstract: In recent years, 3D generation has made great strides in both academia and industry. However, generating 3D scenes from a single RGB image remains a significant challenge, as current approaches often struggle to ensure both object generation quality and scene coherence in multi-object scenarios. To overcome these limitations, we propose a novel three-stage framework for 3D scene generation with explicit geometric representations and high-quality textural details via single image-guided model generation and spatial layout optimization. Our method begins with an image instance segmentation and inpainting phase, which recovers missing details of occluded objects in the input images, thereby achieving complete generation of foreground 3D assets. Subsequently, our approach captures the spatial geometry of the reference image by constructing a pseudo-stereo viewpoint for camera parameter estimation and scene depth inference, while employing a model selection strategy to ensure optimal alignment between the 3D assets generated in the previous step and the input. Finally, through model parameterization and minimization of the Chamfer distance between point clouds in 3D and 2D space, our approach optimizes layout parameters to produce an explicit 3D scene representation that maintains precise alignment with the input guidance image.
Extensive experiments on multi-object scene image sets have demonstrated that our approach not only outperforms state-of-the-art methods in terms of geometric accuracy and texture fidelity of individual generated 3D models, but also has significant advantages in scene layout synthesis.
Published: 2025-07-20T06:59:42Z
Comment: 14 pages, 9 figures, Project page: https://xdlbw.github.io/sing3d/
Authors: Xiang Tang, Ruotong Li, Xiaopeng Fan

http://arxiv.org/abs/2501.12369v3
Title: DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial Basis Functions
Updated: 2026-02-17T06:40:41Z
Abstract: Splatting-based 3D reconstruction methods have gained popularity with the advent of 3D Gaussian Splatting, efficiently synthesizing high-quality novel views. These methods commonly resort to using exponential family functions, such as the Gaussian function, as reconstruction kernels due to their anisotropic nature, ease of projection, and differentiability in rasterization. However, the field remains restricted to variations within the exponential family, leaving generalized reconstruction kernels largely underexplored, partly due to the lack of easy integrability in 3D to 2D projections. In this light, we show that a class of decaying anisotropic radial basis functions (DARBFs), which are non-negative functions of the Mahalanobis distance, supports splatting by approximating the Gaussian function's closed-form integration advantage.
With this fresh perspective, we demonstrate varying performances across selected DARB reconstruction kernels, achieving comparable training convergence and memory footprints, with on-par PSNR, SSIM, and LPIPS results.
Published: 2025-01-21T18:49:06Z
Comment: Link to the project page: https://github.com/viruthshaan/darb-splatting/
Authors: Hashiru Pramuditha (University of Moratuwa), Vinasirajan Viruthshaan (University of Moratuwa), Vishagar Arunan (University of Moratuwa), Saeedha Nazar (University of Moratuwa), Sameera Ramasinghe (University of Adelaide), Simon Lucey (University of Adelaide), Ranga Rodrigo (University of Moratuwa)

http://arxiv.org/abs/2506.14714v2
Title: SkinCells: Sparse Skinning using Voronoi Cells
Updated: 2026-02-17T05:37:49Z
Abstract: For decades, real-time skinning has been the cornerstone of character animation in visual effects and games. Despite its importance, the creation of animatable digital assets remains a labor-intensive manual process. Existing automated tools frequently struggle with intricate geometries, often necessitating significant manual refinement to reach production standards. We present a robust, fully automated method for generating high-quality skinning weights from a standard mesh and skeleton in a canonical A- or T-pose. Unlike traditional approaches, our framework offers direct sparsity controls to limit bone influences per vertex -- a critical requirement for maintaining performance in large-scale mobile environments. Furthermore, we address the challenge of Level-of-Detail (LoD) management by optimizing weights within a continuous spatial volume rather than on discrete vertices. This allows a single optimization pass to be applied seamlessly across multiple asset resolutions and variations. Central to our approach is a novel parameterized family of functions we call SkinCells.
We demonstrate that our method consistently produces stable, high-quality results even in complex scenarios where standard biharmonic weight computations fail.
Published: 2025-06-17T16:51:36Z
Authors: Egor Larionov, Igor Santesteban, Hsiao-yu Chen, Gene Lin, Philipp Herholz, Ryan Goldade, Ladislav Kavan, Doug Roble, Tuur Stuyck

http://arxiv.org/abs/2509.23607v2
Title: ZeroScene: A Zero-Shot Framework for 3D Scene Generation from a Single Image and Controllable Texture Editing
Updated: 2026-02-17T05:27:01Z
Abstract: In the field of 3D content generation, single image scene reconstruction methods still struggle to simultaneously ensure the quality of individual assets and the coherence of the overall scene in complex environments, while texture editing techniques often fail to maintain both local continuity and multi-view consistency. In this paper, we propose a novel system ZeroScene, which leverages the prior knowledge of large vision models to accomplish both single image-to-3D scene reconstruction and texture editing in a zero-shot manner. ZeroScene extracts object-level 2D segmentation and depth information from input images to infer spatial relationships within the scene. It then jointly optimizes 3D and 2D projection losses of the point cloud to update object poses for precise scene alignment, ultimately constructing a coherent and complete 3D scene that encompasses both foreground and background. Moreover, ZeroScene supports texture editing of objects in the scene. By imposing constraints on the diffusion model and introducing a mask-guided progressive image generation strategy, we effectively maintain texture consistency across multiple viewpoints and further enhance the realism of rendered results through Physically Based Rendering (PBR) material estimation.
Experimental results demonstrate that our framework not only ensures the geometric and appearance accuracy of generated assets, but also faithfully reconstructs scene layouts and produces highly detailed textures that closely align with text prompts.
Published: 2025-09-28T03:21:12Z
Comment: 16 pages, 15 figures, Eurographics 2026, Project page: https://xdlbw.github.io/ZeroScene/
Authors: Xiang Tang, Ruotong Li, Xiaopeng Fan

http://arxiv.org/abs/2506.03407v2
Title: Multi-Spectral Gaussian Splatting with Neural Color Representation
Updated: 2026-02-16T08:39:04Z
Abstract: We present MS-Splatting -- a multi-spectral 3D Gaussian Splatting (3DGS) framework that is able to generate multi-view consistent novel views from images of multiple, independent cameras with different spectral domains. In contrast to previous approaches, our method does not require cross-modal camera calibration and is versatile enough to model a variety of different spectra, including thermal and near-infrared, without any algorithmic changes.
Unlike existing 3DGS-based frameworks that treat each modality separately (by optimizing per-channel spherical harmonics) and therefore fail to exploit the underlying spectral and spatial correlations, our method leverages a novel neural color representation that encodes multi-spectral information into a learned, compact, per-splat feature embedding. A shallow multi-layer perceptron (MLP) then decodes this embedding to obtain spectral color values, enabling joint learning of all bands within a unified representation.
Our experiments show that this simple yet effective strategy is able to improve multi-spectral rendering quality, while also leading to improved per-spectra rendering quality over state-of-the-art methods. We demonstrate the effectiveness of this new technique in agricultural applications to render vegetation indices, such as normalized difference vegetation index (NDVI).
Published: 2025-06-03T21:36:50Z
Comment: for project page, see https://meyerls.github.io/ms_splatting
Authors: Lukas Meyer, Josef Grün, Maximilian Weiherer, Bernhard Egger, Marc Stamminger, Linus Franke

http://arxiv.org/abs/2509.07175v2
Title: Efficient Computation of Voronoi Diagrams Using Point-in-Cell Tests
Updated: 2026-02-16T06:53:07Z
Abstract: Since the Voronoi diagram appears in many applications, the topic of improving its computational efficiency remains attractive. We propose a novel yet efficient method to compute Voronoi diagrams bounded by a given domain, i.e., the clipped or restricted Voronoi diagrams. The intersection of the domain and a Voronoi cell (domain-cell intersection) is generated by removing the part outside the cell from the domain, which can be accomplished by several clippings. Unlike existing methods, we present an edge-based search scheme to find clipping planes (bisectors). A test called point-in-cell is first set up to tell whether a space point is in a target Voronoi cell or not. Then, for each edge of the intermediate domain-cell intersection, we launch a clipping only if its two endpoints are respectively inside and outside the corresponding Voronoi cell, where the bisector for the clipping can be found with a few point-in-cell tests. Therefore, our method only involves the clippings that contribute to the final results, which is a great advantage over the state-of-the-art methods. Additionally, because each domain-cell intersection can be generated independently, we extend the proposed method to GPUs for computing Voronoi diagrams in parallel.
The experimental results show the best performance of our method compared to state-of-the-art ones, regardless of site distribution. This paper was first submitted to SIGGRAPH Asia 2025.
Published: 2025-09-08T19:48:28Z
Authors: Yanyang Xiao, Juan Cao, Zhonggui Chen

http://arxiv.org/abs/2602.14493v1
Title: Gaussian Mesh Renderer for Lightweight Differentiable Rendering
Updated: 2026-02-16T06:15:42Z
Abstract: 3D Gaussian Splatting (3DGS) has enabled high-fidelity virtualization with fast rendering and optimization for novel view synthesis. On the other hand, triangle mesh models remain a popular choice for surface reconstruction but suffer from slow or heavy optimization in traditional mesh-based differentiable renderers. To address this problem, we propose a new lightweight differentiable mesh renderer leveraging the efficient rasterization process of 3DGS, named Gaussian Mesh Renderer (GMR), which tightly integrates the Gaussian and mesh representations. Each Gaussian primitive is analytically derived from the corresponding mesh triangle, preserving structural fidelity and enabling the gradient flow. Compared to the traditional mesh renderers, our method achieves smoother gradients, which especially contributes to better optimization using smaller batch sizes with limited memory. Our implementation is available in the public GitHub repository at https://github.com/huntorochi/Gaussian-Mesh-Renderer.
Published: 2026-02-16T06:15:42Z
Comment: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026). GitHub: https://github.com/huntorochi/Gaussian-Mesh-Renderer
Authors: Xinpeng Liu, Fumio Okura

http://arxiv.org/abs/2602.14048v1
Title: ProAct: A Dual-System Framework for Proactive Embodied Social Agents
Updated: 2026-02-15T08:27:34Z
Abstract: Embodied social agents have recently advanced in generating synchronized speech and gestures. However, most interactive systems remain fundamentally reactive, responding only to current sensory inputs within a short temporal window.
Proactive social behavior, in contrast, requires deliberation over accumulated context and intent inference, which conflicts with the strict latency budget of real-time interaction. We present ProAct, a dual-system framework that reconciles this time-scale conflict by decoupling a low-latency Behavioral System for streaming multimodal interaction from a slower Cognitive System which performs long-horizon social reasoning and produces high-level proactive intentions. To translate deliberative intentions into continuous non-verbal behaviors without disrupting fluency, we introduce a streaming flow-matching model conditioned on intentions via ControlNet. This mechanism supports asynchronous intention injection, enabling seamless transitions between reactive and proactive gestures within a single motion stream. We deploy ProAct on a physical humanoid robot and evaluate both motion quality and interactive effectiveness. In real-world interaction user studies, participants and observers consistently prefer ProAct over reactive variants in perceived proactivity, social presence, and overall engagement, demonstrating the benefits of dual-system proactive control for embodied social interaction.
Published: 2026-02-15T08:27:34Z
Comment: Project Page: https://proactrobot.github.io/
Authors: Zeyi Zhang, Zixi Kang, Ruijie Zhao, Yusen Feng, Biao Jiang, Libin Liu

http://arxiv.org/abs/2602.12157v2
Title: TexSpot: 3D Texture Enhancement with Spatially-uniform Point Latent Representation
Updated: 2026-02-14T15:54:01Z
Abstract: High-quality 3D texture generation remains a fundamental challenge due to the view-inconsistency inherent in current mainstream multi-view diffusion pipelines. Existing representations rely either on UV maps, which suffer from distortion during unwrapping, or on point-based methods, which tightly couple texture fidelity to geometric density and thereby limit high-resolution texture generation. To address these limitations, we introduce TexSpot, a diffusion-based texture enhancement framework.
At its core is Texlet, a novel 3D texture representation that merges the geometric expressiveness of point-based 3D textures with the compactness of UV-based representation. Each Texlet latent vector encodes a local texture patch via a 2D encoder and is further aggregated using a 3D encoder to incorporate global shape context. A cascaded 3D-to-2D decoder reconstructs high-quality texture patches, enabling learning of the Texlet space. Leveraging this representation, we train a diffusion transformer conditioned on Texlets to refine and enhance textures produced by multi-view diffusion methods. Extensive experiments demonstrate that TexSpot significantly improves visual fidelity, geometric consistency, and robustness over existing state-of-the-art 3D texture generation and enhancement approaches. Project page: https://texlet-arch.github.io/TexSpot-page.
Published: 2026-02-12T16:37:31Z
Comment: Project page: https://texlet-arch.github.io/TexSpot-page
Authors: Ziteng Lu, Yushuang Wu, Chongjie Ye, Yuda Qiu, Jing Shao, Xiaoyang Guo, Jiaqing Zhou, Tianlei Hu, Kun Zhou, Xiaoguang Han

http://arxiv.org/abs/2602.13185v1
Title: FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control
Updated: 2026-02-13T18:52:11Z
Abstract: Effective and generalizable control in video generation remains a significant challenge. While many methods rely on ambiguous or task-specific signals, we argue that a fundamental disentanglement of "appearance" and "motion" provides a more robust and scalable pathway. We propose FlexAM, a unified framework built upon a novel 3D control signal. This signal represents video dynamics as a point cloud, introducing three key enhancements: multi-frequency positional encoding to distinguish fine-grained motion, depth-aware positional encoding, and a flexible control signal for balancing precision and generative quality. This representation allows FlexAM to effectively disentangle appearance and motion, enabling a wide range of tasks including I2V/V2V editing, camera control, and spatial object editing.
Extensive experiments demonstrate that FlexAM achieves superior performance across all evaluated tasks.
Published: 2026-02-13T18:52:11Z
Comment: Codes: https://github.com/IGL-HKUST/FlexAM
Authors: Mingzhi Sheng, Zekai Gu, Peng Li, Cheng Lin, Hao-Xiang Guo, Ying-Cong Chen, Yuan Liu

http://arxiv.org/abs/2602.12949v1
Title: Real-time Rendering with a Neural Irradiance Volume
Updated: 2026-02-13T14:15:46Z
Abstract: Rendering diffuse global illumination in real-time is often approximated by pre-computing and storing irradiance in a 3D grid of probes. As long as most of the scene remains static, probes approximate irradiance for all surfaces immersed in the irradiance volume, including novel dynamic objects. This approach, however, suffers from aliasing artifacts and high memory consumption. We propose Neural Irradiance Volume (NIV), a neural-based technique that allows accurate real-time rendering of diffuse global illumination via a compact pre-computed model, overcoming the limitations of traditional probe-based methods, such as the expensive memory footprint, aliasing artifacts, and scene-specific heuristics. The key insight is that neural compression creates an adaptive and amortized representation of irradiance, circumventing the cubic scaling of grid-based methods. Our superior memory-scaling improves quality by at least 10x at the same memory budget, and enables a straightforward representation of higher-dimensional irradiance fields, allowing rendering of time-varying or dynamic effects without requiring additional computation at runtime. Unlike other neural rendering techniques, our method works within strict real-time constraints, providing fast inference (around 1 ms per frame on consumer GPUs at full HD resolution), reduced memory usage (1-5 MB for medium-sized scenes), and only requires a G-buffer as input, without expensive ray tracing or denoising.
Published: 2026-02-13T14:15:46Z
Comment: Accepted at Eurographics 2026
Authors: Arno Coomans, Giacomo Nazzaro, Edoardo A. Dominici, Christian Döring, Floor Verhoeven, Konstantinos Vardis, Markus Steinberger

http://arxiv.org/abs/2602.12796v1
Title: GSM-GS: Geometry-Constrained Single and Multi-view Gaussian Splatting for Surface Reconstruction
Updated: 2026-02-13T10:26:32Z
Abstract: Recently, 3D Gaussian Splatting has emerged as a prominent research direction owing to its ultrarapid training speed and high-fidelity rendering capabilities. However, the unstructured and irregular nature of Gaussian point clouds poses challenges to reconstruction accuracy. This limitation frequently causes high-frequency detail loss in complex surface microstructures when relying solely on routine strategies. To address this limitation, we propose GSM-GS: a synergistic optimization framework integrating single-view adaptive sub-region weighting constraints and multi-view spatial structure refinement. For single-view optimization, we leverage image gradient features to partition scenes into texture-rich and texture-less sub-regions. The reconstruction quality is enhanced through adaptive filtering mechanisms guided by depth discrepancy features. This preserves high-weight regions while implementing a dual-branch constraint strategy tailored to regional texture variations, thereby improving geometric detail characterization. For multi-view optimization, we introduce a geometry-guided cross-view point cloud association method combined with a dynamic weight sampling strategy. This constructs 3D structural normal constraints across adjacent point cloud frames, effectively reinforcing multi-view consistency and reconstruction fidelity. Extensive experiments on public datasets demonstrate that our method achieves both competitive rendering quality and geometric reconstruction.
See our interactive project page
Published: 2026-02-13T10:26:32Z
Comment: https://aislab-sustech.github.io/GSM-GS/
Authors: Xiao Ren, Yu Liu, Ning An, Jian Cheng, Xin Qiao, He Kong

http://arxiv.org/abs/2602.12349v1
Title: Variational Green's Functions for Volumetric PDEs
Updated: 2026-02-12T19:12:44Z
Abstract: Green's functions characterize the fundamental solutions of partial differential equations; they are essential for tasks ranging from shape analysis to physical simulation, yet they remain computationally prohibitive to evaluate on arbitrary geometric discretizations. We present Variational Green's Function (VGF), a method that learns a smooth, differentiable representation of the Green's function for linear self-adjoint PDE operators, including the Poisson, the screened Poisson, and the biharmonic equations. To resolve the sharp singularities characteristic of the Green's functions, our method decomposes the Green's function into an analytic free-space component, and a learned corrector component. Our method leverages a variational foundation to impose Neumann boundary conditions naturally, and imposes Dirichlet boundary conditions via a projective layer on the output of the neural field. The resulting Green's functions are fast to evaluate, differentiable with respect to source application, and can be conditioned on other signals parameterizing our geometry.
Published: 2026-02-12T19:12:44Z
Authors: Joao Teixeira, Eitan Grinspun, Otman Benchekroun

http://arxiv.org/abs/2507.18352v3
Title: Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation
Updated: 2026-02-12T18:17:00Z
Abstract: The training of high-quality, robust machine learning models for speech-driven 3D facial animation requires a large, diverse dataset of high-quality audio-animation pairs. To overcome the lack of such a dataset, recent work has introduced large pre-trained speech encoders that are robust to variations in the input audio and, therefore, enable the facial animation model to generalize across speakers, audio quality, and languages.
However, the resulting facial animation models are prohibitively large and lend themselves only to offline inference on a dedicated machine. In this work, we explore on-device, real-time facial animation models in the context of game development. We overcome the lack of large datasets by using hybrid knowledge distillation with pseudo-labeling. Given a large audio dataset, we employ a high-performing teacher model to train very small student models. In contrast to the pre-trained speech encoders, our student models only consist of convolutional and fully-connected layers, removing the need for attention context or recurrent updates. In our experiments, we demonstrate that we can reduce the memory footprint to up to 3.4 MB and required future audio context to up to 81 ms while maintaining high-quality animations. This paves the way for on-device inference, an important step towards realistic, model-driven digital characters.
Published: 2025-07-24T12:25:12Z
Comment: Accepted to ACM TOG 2025 (SIGGRAPH journal track); Project page: https://electronicarts.github.io/tiny-voice2face/
Journal reference: ACM Transactions on Graphics, Vol. 44, No. 4, Article 104, July 2025
Authors: Zhen Han, Mattias Teye, Derek Yadgaroff, Judith Bütepage
DOI: 10.1145/3730929