http://arxiv.org/abs/2602.12157v2 TexSpot: 3D Texture Enhancement with Spatially-uniform Point Latent Representation 2026-02-14T15:54:01Z

High-quality 3D texture generation remains a fundamental challenge due to the view inconsistency inherent in current mainstream multi-view diffusion pipelines. Existing representations rely either on UV maps, which suffer from distortion during unwrapping, or on point-based methods, which tightly couple texture fidelity to geometric density and thereby limit high-resolution texture generation. To address these limitations, we introduce TexSpot, a diffusion-based texture enhancement framework. At its core is Texlet, a novel 3D texture representation that merges the geometric expressiveness of point-based 3D textures with the compactness of UV-based representations. Each Texlet latent vector encodes a local texture patch via a 2D encoder and is further aggregated using a 3D encoder to incorporate global shape context. A cascaded 3D-to-2D decoder reconstructs high-quality texture patches, enabling the Texlet space to be learned. Leveraging this representation, we train a diffusion transformer conditioned on Texlets to refine and enhance textures produced by multi-view diffusion methods. Extensive experiments demonstrate that TexSpot significantly improves visual fidelity, geometric consistency, and robustness over existing state-of-the-art 3D texture generation and enhancement approaches. Project page: https://texlet-arch.github.io/TexSpot-page.

2026-02-12T16:37:31Z Project page: https://texlet-arch.github.io/TexSpot-page Ziteng Lu Yushuang Wu Chongjie Ye Yuda Qiu Jing Shao Xiaoyang Guo Jiaqing Zhou Tianlei Hu Kun Zhou Xiaoguang Han http://arxiv.org/abs/2602.13185v1 FlexAM: Flexible Appearance-Motion Decomposition for Versatile Video Generation Control 2026-02-13T18:52:11Z

Effective and generalizable control in video generation remains a significant challenge. While many methods rely on ambiguous or task-specific signals, we argue that a fundamental disentanglement of "appearance" and "motion" provides a more robust and scalable pathway. We propose FlexAM, a unified framework built upon a novel 3D control signal. This signal represents video dynamics as a point cloud, introducing three key enhancements: multi-frequency positional encoding to distinguish fine-grained motion, depth-aware positional encoding, and a flexible control signal for balancing precision and generative quality. This representation allows FlexAM to effectively disentangle appearance and motion, enabling a wide range of tasks including I2V/V2V editing, camera control, and spatial object editing. Extensive experiments demonstrate that FlexAM achieves superior performance across all evaluated tasks.
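
The multi-frequency positional encoding mentioned above can be illustrated with a generic sinusoidal encoding of 3D points. This is only a sketch under assumptions: the function name, the number of bands, and the choice of frequencies (powers of two times pi) are hypothetical and are not taken from FlexAM.

```python
import numpy as np

def multi_frequency_encoding(points, num_bands=4):
    """Encode (N, 3) points with sin/cos at frequencies pi * 2^0 .. pi * 2^(num_bands-1).

    Hypothetical illustration: higher bands change quickly with position,
    so small displacements become easier to tell apart.
    """
    points = np.asarray(points, dtype=np.float64)
    freqs = (2.0 ** np.arange(num_bands)) * np.pi            # (B,)
    angles = points[:, :, None] * freqs[None, None, :]       # (N, 3, B)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (N, 3, 2B)
    return enc.reshape(points.shape[0], -1)                  # (N, 6B)

if __name__ == "__main__":
    a = multi_frequency_encoding([[0.10, 0.0, 0.0]], num_bands=6)
    b = multi_frequency_encoding([[0.11, 0.0, 0.0]], num_bands=6)
    # Compare how much the lowest and highest sine bands react to a 0.01 shift in x.
    print("lowest band difference: ", abs(a[0, 0] - b[0, 0]))
    print("highest band difference:", abs(a[0, 5] - b[0, 5]))
```

For each coordinate the output holds the sine bands first and the cosine bands second, so a point cloud of N points becomes an (N, 6 * num_bands) feature array.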

2026-02-13T18:52:11Z Codes: https://github.com/IGL-HKUST/FlexAM Mingzhi Sheng Zekai Gu Peng Li Cheng Lin Hao-Xiang Guo Ying-Cong Chen Yuan Liu http://arxiv.org/abs/2602.12949v1 Real-time Rendering with a Neural Irradiance Volume 2026-02-13T14:15:46Z

Rendering diffuse global illumination in real-time is often approximated by pre-computing and storing irradiance in a 3D grid of probes. As long as most of the scene remains static, probes approximate irradiance for all surfaces immersed in the irradiance volume, including novel dynamic objects. This approach, however, suffers from aliasing artifacts and high memory consumption. We propose Neural Irradiance Volume (NIV), a neural-based technique that allows accurate real-time rendering of diffuse global illumination via a compact pre-computed model, overcoming the limitations of traditional probe-based methods, such as the expensive memory footprint, aliasing artifacts, and scene-specific heuristics. The key insight is that neural compression creates an adaptive and amortized representation of irradiance, circumventing the cubic scaling of grid-based methods. Our superior memory-scaling improves quality by at least 10x at the same memory budget, and enables a straightforward representation of higher-dimensional irradiance fields, allowing rendering of time-varying or dynamic effects without requiring additional computation at runtime. Unlike other neural rendering techniques, our method works within strict real-time constraints, providing fast inference (around 1 ms per frame on consumer GPUs at full HD resolution), reduced memory usage (1-5 MB for medium-sized scenes), and only requires a G-buffer as input, without expensive ray tracing or denoising.
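
The cubic scaling of grid-based probes mentioned above can be made concrete with a small back-of-the-envelope calculation. The storage layout here is an assumption for illustration only (9 spherical-harmonic coefficients per color channel, RGB, 16-bit values); the abstract does not state which probe format the baselines use.

```python
def probe_grid_bytes(resolution, sh_coeffs=9, channels=3, bytes_per_value=2):
    """Memory of a dense resolution^3 probe grid under the assumed layout."""
    return resolution ** 3 * sh_coeffs * channels * bytes_per_value

if __name__ == "__main__":
    for res in (16, 32, 64, 128):
        mib = probe_grid_bytes(res) / (1024 ** 2)
        print(f"{res:>4}^3 probes: {mib:10.2f} MiB")
```

Doubling the grid resolution along each axis multiplies the footprint by eight, which is the scaling a compact learned representation is meant to avoid.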

2026-02-13T14:15:46Z Accepted at Eurographics 2026 Arno Coomans Giacomo Nazzaro Edoardo A. Dominici Christian Döring Floor Verhoeven Konstantinos Vardis Markus Steinberger http://arxiv.org/abs/2602.12796v1 GSM-GS: Geometry-Constrained Single and Multi-view Gaussian Splatting for Surface Reconstruction 2026-02-13T10:26:32Z

Recently, 3D Gaussian Splatting has emerged as a prominent research direction owing to its ultrarapid training speed and high-fidelity rendering capabilities. However, the unstructured and irregular nature of Gaussian point clouds poses challenges to reconstruction accuracy. This limitation frequently causes high-frequency detail loss in complex surface microstructures when relying solely on routine strategies. To address this limitation, we propose GSM-GS: a synergistic optimization framework integrating single-view adaptive sub-region weighting constraints and multi-view spatial structure refinement. For single-view optimization, we leverage image gradient features to partition scenes into texture-rich and texture-less sub-regions. The reconstruction quality is enhanced through adaptive filtering mechanisms guided by depth discrepancy features. This preserves high-weight regions while implementing a dual-branch constraint strategy tailored to regional texture variations, thereby improving geometric detail characterization. For multi-view optimization, we introduce a geometry-guided cross-view point cloud association method combined with a dynamic weight sampling strategy. This constructs 3D structural normal constraints across adjacent point cloud frames, effectively reinforcing multi-view consistency and reconstruction fidelity. Extensive experiments on public datasets demonstrate that our method achieves both competitive rendering quality and geometric reconstruction. See our interactive project page

2026-02-13T10:26:32Z https://aislab-sustech.github.io/GSM-GS/ Xiao Ren Yu Liu Ning An Jian Cheng Xin Qiao He Kong http://arxiv.org/abs/2602.12349v1 Variational Green's Functions for Volumetric PDEs 2026-02-12T19:12:44Z

Green's functions characterize the fundamental solutions of partial differential equations; they are essential for tasks ranging from shape analysis to physical simulation, yet they remain computationally prohibitive to evaluate on arbitrary geometric discretizations. We present Variational Green's Function (VGF), a method that learns a smooth, differentiable representation of the Green's function for linear self-adjoint PDE operators, including the Poisson, the screened Poisson, and the biharmonic equations. To resolve the sharp singularities characteristic of the Green's functions, our method decomposes the Green's function into an analytic free-space component, and a learned corrector component. Our method leverages a variational foundation to impose Neumann boundary conditions naturally, and imposes Dirichlet boundary conditions via a projective layer on the output of the neural field. The resulting Green's functions are fast to evaluate, differentiable with respect to source application, and can be conditioned on other signals parameterizing our geometry.
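
The decomposition described above can be written out for the Poisson case. This is the standard textbook form, stated here as an assumption about notation: the symbols G_0 and c_\theta, the sign convention, and the restriction to 3D with a zero Dirichlet condition are not taken from the paper.

```latex
% Decomposition into an analytic free-space part and a learned corrector
G(x, y) = G_0(x, y) + c_\theta(x, y)

% Free-space Green's function of -\Delta in 3D, which carries the singularity at x = y
G_0(x, y) = \frac{1}{4\pi \lVert x - y \rVert},
\qquad -\Delta_x G_0(x, y) = \delta(x - y)

% The corrector is then smooth: it is harmonic inside the domain \Omega and
% cancels G_0 on the Dirichlet boundary \Gamma_D
-\Delta_x c_\theta(x, y) = 0 \quad \text{for } x \in \Omega,
\qquad c_\theta(x, y) = -G_0(x, y) \quad \text{for } x \in \Gamma_D
```

Because the singular term is handled analytically, the learned part only has to represent a smooth function.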

2026-02-12T19:12:44Z Joao Teixeira Eitan Grinspun Otman Benchekroun http://arxiv.org/abs/2507.18352v3 Tiny is not small enough: High-quality, low-resource facial animation models through hybrid knowledge distillation 2026-02-12T18:17:00Z

The training of high-quality, robust machine learning models for speech-driven 3D facial animation requires a large, diverse dataset of high-quality audio-animation pairs. To overcome the lack of such a dataset, recent work has introduced large pre-trained speech encoders that are robust to variations in the input audio and, therefore, enable the facial animation model to generalize across speakers, audio quality, and languages. However, the resulting facial animation models are prohibitively large and lend themselves only to offline inference on a dedicated machine. In this work, we explore on-device, real-time facial animation models in the context of game development. We overcome the lack of large datasets by using hybrid knowledge distillation with pseudo-labeling. Given a large audio dataset, we employ a high-performing teacher model to train very small student models. In contrast to the pre-trained speech encoders, our student models only consist of convolutional and fully-connected layers, removing the need for attention context or recurrent updates. In our experiments, we demonstrate that we can reduce the memory footprint to up to 3.4 MB and required future audio context to up to 81 ms while maintaining high-quality animations. This paves the way for on-device inference, an important step towards realistic, model-driven digital characters.

2025-07-24T12:25:12Z Accepted to ACM TOG 2025 (SIGGRAPH journal track); Project page: https://electronicarts.github.io/tiny-voice2face/ ACM Transactions on Graphics, Vol. 44, No. 4, Article 104, July 2025 Zhen Han Mattias Teye Derek Yadgaroff Judith Bütepage 10.1145/3730929 http://arxiv.org/abs/2504.13204v2 EDGS: Eliminating Densification for Efficient Convergence of 3DGS 2026-02-12T17:41:41Z

3D Gaussian Splatting reconstructs scenes by starting from a sparse Structure-from-Motion initialization and refining under-reconstructed regions. This process is slow, as it requires multiple densification steps where Gaussians are repeatedly split and adjusted, following a lengthy optimization path. Moreover, this incremental approach often yields suboptimal renderings in high-frequency regions. We propose a fundamentally different approach: eliminate densification with a one-step approximation of scene geometry using triangulated pixels from dense image correspondences. This dense initialization allows us to estimate the rough geometry of the scene while preserving rich details from input RGB images, providing each Gaussian with well-informed color, scale, and position. As a result, we dramatically shorten the optimization path and remove the need for densification. Unlike methods that rely on sparse keypoints, our dense initialization ensures uniform detail across the scene, even in high-frequency regions where other methods struggle. Moreover, since all splats are initialized in parallel at the start of optimization, we remove the need to wait for densification to adjust new Gaussians. EDGS reaches LPIPS and SSIM performance of standard 3DGS significantly faster than existing efficiency-focused approaches. When trained further, it exceeds the reconstruction quality of state-of-the-art models aimed at maximizing fidelity. Our method is fully compatible with other acceleration techniques, making it a versatile and efficient solution that can be integrated with existing approaches.
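
Triangulating a pixel from a correspondence, as described above, can be sketched with the classic linear (DLT) method for two views. This is a generic illustration and not EDGS's implementation; the camera matrices, function names, and test point are hypothetical.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X with a 3x4 projection matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate_point(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one correspondence seen in two views."""
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

if __name__ == "__main__":
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                 # reference camera
    P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # camera shifted along x
    X_true = np.array([0.2, -0.1, 4.0])
    X_est = triangulate_point(P1, P2, project(P1, X_true), project(P2, X_true))
    print(X_est)
```

Applied to every matched pixel pair, a routine like this yields a dense set of 3D positions that could seed one Gaussian each, with color read from the source pixel.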

2025-04-15T18:57:55Z Dmytro Kotovenko Olga Grebenkova Björn Ommer http://arxiv.org/abs/2510.09081v2 Real-Time Rendering of Dynamic Line Sets using Voxel Ray Tracing 2026-02-12T12:45:27Z

Real-time rendering of dynamic line sets is relevant in many visualization tasks, including unsteady flow visualization and interactive white matter reconstruction from Magnetic Resonance Imaging. High-quality global illumination and transparency are important for conveying the spatial structure of dense line sets, yet remain difficult to achieve at interactive rates. We propose an efficient voxel-based ray-tracing framework for rendering large dynamic line sets with ambient occlusion and ground-truth transparency. We introduce a voxelization algorithm that supports efficient on-the-fly construction of acceleration structures for both voxel cone tracing and ray tracing. To further reduce per-frame preprocessing cost, we propose a voxel-based culling method that restricts acceleration structure construction to camera-visible voxels. Together, these contributions enable real-time rendering of large-scale dynamic line sets with high quality and physically accurate transparency. We demonstrate that our method outperforms the state of the art in quality and performance when rendering (semi-)opaque dynamic line sets.

2025-10-10T07:28:05Z Bram Kraaijeveld Andrei C. Jalba Anna Vilanova Maxime Chamberland http://arxiv.org/abs/2505.20457v2 Learned Adaptive Mesh Generation 2026-02-12T09:33:48Z

Elliptic Partial Differential Equations (PDEs) play a central role in computing the equilibrium conditions of physical problems (heat, gravitation, electrostatics, etc.). Efficient solutions to elliptic PDEs are also relevant to computer graphics since they encode global smoothness with local control leading to stable, well-behaved solutions. The Poisson equation is a linear elliptic PDE that serves as a prototypical candidate to assess newly-proposed solvers. Solving the Poisson equation on an arbitrary 3D domain, say a 3D scan of a turbine's blade, is computationally expensive and scales quadratically with discretization. Traditional workflows in research and industry exploit variants of the finite element method (FEM), but some key benefits of using Monte Carlo (MC) methods have been identified. Our key idea is to exploit a sparse and approximate solution (via FEM or MC) to the Poisson equation towards inferring an adaptive discretization in one shot. We achieve this by training a lightweight neural network that generalizes across shapes and boundary conditions. Our algorithm, Learned Adaptive Mesh Generation (LAMG), maps from a coarse solution to a sizing field that defines a local (adaptive) spatial resolution. This output space, rather than directly predicting a high-resolution solution, is a unique aspect of our approach. We use standard methods to generate tetrahedral meshes that respect the sizing field, and obtain the solution via one FEM computation on the adaptive mesh. That is, our neural network serves as a surrogate model of a computationally expensive method that requires multiple (iterative) FEM solves. We demonstrate the versatility, controllability, robustness and efficiency of LAMG via systematic experimentation.

2025-05-26T18:52:53Z Zhiyuan Zhang Amir Vaxman Stefanos-Aldo Papanicolopulos Kartic Subr http://arxiv.org/abs/2601.05844v2 DexterCap: An Affordable and Automated System for Capturing Dexterous Hand-Object Manipulation 2026-02-12T08:29:48Z

Capturing fine-grained hand-object interactions is challenging due to severe self-occlusion from closely spaced fingers and the subtlety of in-hand manipulation motions. Existing optical motion capture systems rely on expensive camera setups and extensive manual post-processing, while low-cost vision-based methods often suffer from reduced accuracy and reliability under occlusion. To address these challenges, we present DexterCap, a low-cost optical capture system for dexterous in-hand manipulation. DexterCap uses dense, character-coded marker patches to achieve robust tracking under severe self-occlusion, together with an automated reconstruction pipeline that requires minimal manual effort. With DexterCap, we introduce DexterHand, a dataset of fine-grained hand-object interactions covering diverse manipulation behaviors and objects, from simple primitives to complex articulated objects such as a Rubik's Cube. We release the dataset and code to support future research on dexterous hand-object interaction. Project website: https://pku-mocca.github.io/Dextercap-Page/

2026-01-09T15:16:31Z 12 pages, 12 figures Yutong Liang Shiyi Xu Yulong Zhang Bowen Zhan He Zhang Libin Liu http://arxiv.org/abs/2602.11693v1 OMEGA-Avatar: One-shot Modeling of 360° Gaussian Avatars 2026-02-12T08:16:38Z

Creating high-fidelity, animatable 3D avatars from a single image remains a formidable challenge. We identified three desirable attributes of avatar generation: the method should 1) be feed-forward, 2) model a full 360° head, and 3) be animation-ready. However, current work addresses at most two of these three points simultaneously. To address these limitations, we propose OMEGA-Avatar, the first feed-forward framework that simultaneously generates a generalizable, 360°-complete, and animatable 3D Gaussian head from a single image. Starting from a feed-forward and animatable framework, we address the 360° full-head avatar generation problem with two novel components. First, to overcome poor hair modeling in full-head avatar generation, we introduce a semantic-aware mesh deformation module that integrates multi-view normals to optimize a FLAME head with hair while preserving its topology. Second, to enable effective feed-forward decoding of full-head features, we propose a multi-view feature splatting module that constructs a shared canonical UV representation from features across multiple views through differentiable bilinear splatting, hierarchical UV mapping, and visibility-aware fusion. This approach preserves both global structural coherence and local high-frequency details across all viewpoints, ensuring 360° consistency without per-instance optimization. Extensive experiments demonstrate that OMEGA-Avatar achieves state-of-the-art performance, significantly outperforming existing baselines in 360° full-head completeness while robustly preserving identity across different viewpoints.

2026-02-12T08:16:38Z Project page: https://omega-avatar.github.io/OMEGA-Avatar/ Zehao Xia Yiqun Wang Zhengda Lu Kai Liu Jun Xiao Peter Wonka http://arxiv.org/abs/2602.11577v1 LeafFit: Plant Assets Creation from 3D Gaussian Splatting 2026-02-12T04:54:41Z

We propose LeafFit, a pipeline that converts 3D Gaussian Splatting (3DGS) of individual plants into editable, instanced mesh assets. While 3DGS faithfully captures complex foliage, its high memory footprint and lack of mesh topology make it incompatible with traditional game production workflows. We address this by leveraging the repetition of leaf shapes; our method segments leaves from the unstructured 3DGS, with optional user interaction included as a fallback. A representative leaf group is selected and converted into a thin, sharp mesh to serve as a template; this template is then fitted to all other leaves via differentiable Moving Least Squares (MLS) deformation. At runtime, the deformation is evaluated efficiently on-the-fly using a vertex shader to minimize storage requirements. Experiments demonstrate that LeafFit achieves higher segmentation quality and deformation accuracy than recent baselines while significantly reducing data size and enabling parameter-level editing.
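
To give a feel for the Moving Least Squares deformation mentioned above, here is a sketch of the classic affine MLS variant, which maps a point using control points and their targets. It is an assumption that this resembles LeafFit's deformation; the abstract does not specify the MLS variant, and the function name, weighting exponent, and control points below are hypothetical.

```python
import numpy as np

def mls_affine_deform(v, p, q, alpha=1.0, eps=1e-8):
    """Deform point v given control points p (K, 3) and their targets q (K, 3).

    Each control point is weighted by inverse distance to v, and the best
    weighted affine map from p to q is applied to v.
    """
    v = np.asarray(v, dtype=np.float64)
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    w = 1.0 / (np.sum((p - v) ** 2, axis=1) ** alpha + eps)   # (K,)
    p_star = (w[:, None] * p).sum(axis=0) / w.sum()           # weighted centroids
    q_star = (w[:, None] * q).sum(axis=0) / w.sum()
    p_hat, q_hat = p - p_star, q - q_star
    # Weighted normal equations: (p_hat^T W p_hat) M = p_hat^T W q_hat
    lhs = (w[:, None] * p_hat).T @ p_hat
    rhs = (w[:, None] * p_hat).T @ q_hat
    M = np.linalg.solve(lhs, rhs)
    return (v - p_star) @ M + q_star

if __name__ == "__main__":
    p = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
    q = p + np.array([0.0, 0.0, 0.5]) * p[:, :1]   # shear: lift z in proportion to x
    print(mls_affine_deform([0.5, 0.5, 0.0], p, q))
```

Every operation here is a smooth function of the inputs, so the same computation can be differentiated with respect to the control targets or evaluated per vertex in a shader. At least four non-coplanar control points are needed in 3D for the linear system to be solvable.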

2026-02-12T04:54:41Z Our source code is publicly available at https://github.com/netbeifeng/leaf_fit Eurographics 2026 Chang Luo Nobuyuki Umetani http://arxiv.org/abs/2603.29602v1 IMAGAgent: Orchestrating Multi-Turn Image Editing via Constraint-Aware Planning and Reflection 2026-02-12T02:37:38Z

Existing multi-turn image editing paradigms are often confined to isolated single-step execution. Due to a lack of context-awareness and closed-loop feedback mechanisms, they are prone to error accumulation and semantic drift during multi-turn interactions, ultimately resulting in severe structural distortion of the generated images. To address this, we propose IMAGAgent, a multi-turn image editing agent framework based on a "plan-execute-reflect" closed-loop mechanism that achieves deep synergy among instruction parsing, tool scheduling, and adaptive correction within a unified pipeline. Specifically, we first present a constraint-aware planning module that leverages a vision-language model (VLM) to precisely decompose complex natural language instructions into a series of executable sub-tasks, governed by target singularity, semantic atomicity, and visual perceptibility. Then, the tool-chain orchestration module dynamically constructs execution paths based on the current image, the current sub-task, and the historical context, enabling adaptive scheduling and collaborative operation among heterogeneous operation models covering image retrieval, segmentation, detection, and editing. Finally, we devise a multi-expert collaborative reflection mechanism where a central large language model (LLM) receives the image to be edited and synthesizes VLM critiques into holistic feedback, simultaneously triggering fine-grained self-correction and recording feedback outcomes to optimize future decisions. Extensive experiments on our constructed MTEditBench and the MagicBrush dataset demonstrate that IMAGAgent achieves performance significantly superior to existing methods in terms of instruction consistency, editing precision, and overall quality. The code is available at https://github.com/hackermmzz/IMAGAgent.git.

2026-02-12T02:37:38Z Fei Shen Chengyu Xie Lihong Wang Zhanyi Zhang Xin Jiang Xiaoyu Du Jinhui Tang http://arxiv.org/abs/2602.11433v1 Filmsticking++: Rapid Film Sticking for Explicit Surface Reconstruction 2026-02-11T23:18:16Z

Explicit surface reconstruction aims to generate a surface mesh that exactly interpolates a given point cloud. This requirement is crucial when the points must lie exactly on the final surface to preserve sharp features and fine geometric details. However, the task becomes substantially more challenging with low-quality point clouds, due to inherent reconstruction ambiguities compounded by combinatorial complexity. A previous method addresses these issues with a filmsticking technique that iteratively computes a restricted Voronoi diagram (RVD); it is guaranteed to produce a watertight manifold and set a new benchmark as the state-of-the-art (SOTA) technique. Unfortunately, RVD-based filmsticking is unable to interpolate all points in the case of deep internal cavities, which is very likely to result in faulty topology. The cause of this issue is an inherent limitation of the Euclidean distance metric on which RVD-based filmsticking relies. In this paper, we extend the filmsticking technique with Filmsticking++, which reconstructs an explicit surface from points without normals. On the one hand, Filmsticking++ breaks through the inherent limitations of Euclidean distance by employing a weighted-distance-based Restricted Power Diagram, which guarantees that all points are interpolated. On the other hand, we observe that as the guiding surface increasingly approximates the target shape, the external medial axis is gradually expelled outside the guiding surface. Building on this observation, we propose placing virtual sites inside the guiding surface to accelerate the expulsion of the external medial axis from its interior. To summarize, compared with the SOTA method, Filmsticking++ demonstrates multiple benefits, including decreased computational cost and improved robustness and scalability.
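
The difference between a Voronoi diagram and a power diagram comes down to the distance used to assign points to sites. The sketch below shows only that assignment rule, using the standard power distance (squared Euclidean distance minus a per-site weight). It does not restrict the diagram to a guiding surface, and the sites, weights, and function name are hypothetical rather than taken from Filmsticking++.

```python
import numpy as np

def power_cell_labels(points, sites, weights):
    """Assign each point to the site with the smallest power distance.

    Power distance: ||x - s||^2 - w. With all weights equal, this reduces
    to an ordinary Voronoi assignment.
    """
    points = np.asarray(points, dtype=np.float64)
    sites = np.asarray(sites, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    d2 = ((points[:, None, :] - sites[None, :, :]) ** 2).sum(axis=-1)  # (N, S)
    return np.argmin(d2 - weights[None, :], axis=1)

if __name__ == "__main__":
    sites = [[0.0, 0.0], [2.0, 0.0]]
    query = [[0.9, 0.0]]
    print(power_cell_labels(query, sites, [0.0, 0.0]))  # Voronoi case: nearest site wins
    print(power_cell_labels(query, sites, [0.0, 1.0]))  # larger weight enlarges the second cell
```

Raising the weight of a site enlarges its cell, which is the extra degree of freedom a weighted diagram offers over a purely Euclidean one.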

2026-02-11T23:18:16Z 15 pages, 15 figures Pengfei Wang Jian Liu Qiujie Dong Shiqing Xin Yuanfeng Zhou Changhe Tu Caiming Zhang Wenping Wang http://arxiv.org/abs/2602.11314v1 Advancing Digital Twin Generation Through a Novel Simulation Framework and Quantitative Benchmarking 2026-02-11T19:38:00Z

The generation of 3D models from real-world objects has often been accomplished through photogrammetry, i.e., by taking 2D photos from a variety of perspectives and then triangulating matched point-based features to create a textured mesh. Many design choices exist within this framework for the generation of digital twins, and differences between such approaches are largely judged qualitatively. Here, we present and test a novel pipeline for generating synthetic images from high-quality 3D models and programmatically generated camera poses. This enables a wide variety of repeatable, quantifiable experiments which can compare ground-truth knowledge of virtual camera parameters and of virtual objects against the reconstructed estimations of those perspectives and subjects.
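
One simple way to generate camera poses programmatically, as described above, is to place cameras on a sphere around the object and aim each at the center. This sketch is an assumption about what such a generator might look like, not the authors' pipeline: the Fibonacci-sphere sampling, the camera convention (looking down the local -Z axis with world Z up), and all names are hypothetical.

```python
import numpy as np

def look_at_pose(eye, target=(0.0, 0.0, 0.0), up=(0.0, 0.0, 1.0)):
    """Return a 4x4 camera-to-world matrix for a camera at eye looking at target."""
    eye = np.asarray(eye, dtype=np.float64)
    forward = np.asarray(target, dtype=np.float64) - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=np.float64))
    right /= np.linalg.norm(right)
    true_up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, 0] = right
    pose[:3, 1] = true_up
    pose[:3, 2] = -forward   # the camera looks along its local -Z axis
    pose[:3, 3] = eye
    return pose

def fibonacci_sphere_poses(n, radius=2.0):
    """Spread n cameras roughly evenly over a sphere, all aimed at the origin."""
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))
    poses = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n          # stays strictly inside (-1, 1), avoiding the poles
        r = np.sqrt(1.0 - z * z)
        theta = golden_angle * i
        eye = radius * np.array([r * np.cos(theta), r * np.sin(theta), z])
        poses.append(look_at_pose(eye))
    return np.stack(poses)

if __name__ == "__main__":
    poses = fibonacci_sphere_poses(8)
    print(poses.shape)
    print(poses[0])
```

Because the poses are generated rather than estimated, they can serve as ground truth when scoring the camera parameters recovered by a reconstruction tool.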

2026-02-11T19:38:00Z 9 pages, 10 figures. Preprint Jacob Rubinstein Avi Donaty Don Engel