http://arxiv.org/abs/2603.29860v1 GENIE: Gram-Eigenmode INR Editing with Closed-Form Geometry Updates 2026-03-12T14:25:49Z

Implicit Neural Representations (INRs) provide compact models of geometry, but it is unclear when their learned shapes can be edited without retraining. We show that the Gram operator induced by the INR's penultimate features admits deformation eigenmodes that parameterize a family of realizable edits of the SDF zero level set. A key finding is that these modes are not intrinsic to the geometry alone: they are reliably recoverable only when the Gram operator is estimated from sufficiently rich sampling distributions. We derive a single closed-form update that performs geometric edits to the INR without optimization by leveraging the deformation modes. We characterize theoretically the precise set of deformations that are feasible under this one-shot update, and show that editing is well-posed exactly within the span of these deformation modes.
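
As an illustration only, the following Python sketch shows one way a closed-form edit can arise when a network's output is linear in its penultimate features. The tanh feature map, the random weights, the uniform sampling in [-1, 1]^3, and the helper names (`penultimate_features`, `gram_eigenmodes`, `closed_form_update`) are all assumptions made for this sketch and are not taken from the paper. The update is an ordinary truncated least-squares solve, which may differ from the authors' formulation.

```python
import numpy as np


def penultimate_features(x, w_hidden, b_hidden):
    # Hypothetical stand-in for an INR's penultimate layer (assumption).
    return np.tanh(x @ w_hidden + b_hidden)


def gram_eigenmodes(features):
    # Empirical Gram matrix of the features over the sampled points.
    gram = features.T @ features / features.shape[0]
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    return eigvals[order], eigvecs[:, order]


def closed_form_update(features, target_delta, rank):
    # Least-squares change of the last-layer weights, restricted to the
    # span of the top `rank` Gram eigenmodes (truncated pseudo-inverse).
    eigvals, eigvecs = gram_eigenmodes(features)
    basis = eigvecs[:, :rank]
    rhs = features.T @ target_delta / features.shape[0]
    return basis @ ((basis.T @ rhs) / eigvals[:rank])


rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(2000, 3))  # assumed sampling distribution
w_hidden = rng.normal(size=(3, 16))
b_hidden = rng.normal(size=16)
features = penultimate_features(points, w_hidden, b_hidden)

rank = 4
eigvals, eigvecs = gram_eigenmodes(features)

# Build a target field change that lies inside the span of the top modes.
delta_true = eigvecs[:, :rank] @ np.array([0.5, -0.2, 0.1, 0.3])
target_delta = features @ delta_true

delta_w = closed_form_update(features, target_delta, rank)
print("recovered edit matches:", np.allclose(delta_w, delta_true))
```

In this toy setting, a requested change that lies in the span of the retained modes is reproduced by a single linear solve, while any component outside that span is discarded by the truncation.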

2026-03-12T14:25:49Z 9 pages, 9 figures Samundra Karki Adarsh Krishnamurthy Baskar Ganapathysubramanian http://arxiv.org/abs/2510.26796v2 See4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting 2026-03-12T13:07:05Z

Immersive applications call for synthesizing spatiotemporal 4D content from casual videos without costly 3D supervision. Existing video-to-4D methods typically rely on manually annotated camera poses, which are labor-intensive and brittle for in-the-wild footage. Recent warp-then-inpaint approaches mitigate the need for pose labels by warping input frames along a novel camera trajectory and using an inpainting model to fill missing regions, thereby depicting the 4D scene from diverse viewpoints. However, this trajectory-to-trajectory formulation often entangles camera motion with scene dynamics and complicates both modeling and inference. We introduce See4D, a pose-free, trajectory-to-camera framework that replaces explicit trajectory prediction with rendering to a bank of fixed virtual cameras, thereby separating camera control from scene modeling. A view-conditional video inpainting model is trained to learn a robust geometry prior by denoising realistically synthesized warped images and to inpaint occluded or missing regions across virtual viewpoints, eliminating the need for explicit 3D annotations. Building on this inpainting core, we design a spatiotemporal autoregressive inference pipeline that traverses virtual-camera splines and extends videos with overlapping windows, enabling coherent generation at bounded per-step complexity. We validate See4D on cross-view video generation and sparse reconstruction benchmarks. Across quantitative metrics and qualitative assessments, our method achieves superior generalization and improved performance relative to pose- or trajectory-conditioned baselines, advancing practical 4D world modeling from casual videos.

2025-10-30T17:59:39Z Eurographics2026; 26 pages; 21 figures; 3 tables; project page: https://see-4d.github.io/ Dongyue Lu Ao Liang Tianxin Huang Xiao Fu Yuyang Zhao Baorui Ma Liang Pan Wei Yin Lingdong Kong Wei Tsang Ooi Ziwei Liu http://arxiv.org/abs/2512.21692v2 ShinyNeRF: Digitizing Anisotropic Appearance in Neural Radiance Fields 2026-03-12T09:40:14Z

Recent advances in digitization technologies have transformed the preservation and dissemination of cultural heritage. In this vein, Neural Radiance Fields (NeRF) have emerged as a leading technology for 3D digitization, delivering representations with exceptional realism. However, existing methods struggle to accurately model anisotropic specular surfaces, typically observed, for example, on brushed metals. In this work, we introduce ShinyNeRF, a novel framework capable of handling both isotropic and anisotropic reflections. Our method is capable of jointly estimating surface normals, tangents, specular concentration, and anisotropy magnitudes of an Anisotropic Spherical Gaussian (ASG) distribution, by learning an approximation of the outgoing radiance as an encoded mixture of isotropic von Mises-Fisher (vMF) distributions. Experimental results show that ShinyNeRF not only achieves state-of-the-art performance on digitizing anisotropic specular reflections, but also offers plausible physical interpretations and editing of material properties compared to existing methods.

2025-12-25T14:35:10Z Albert Barreiro Roger Marí Rafael Redondo Gloria Haro Carles Bosch http://arxiv.org/abs/2603.11573v1 High-Contrast Projection Mapping under Light Field Illumination with LED Display and Aperiodic Lens Array 2026-03-12T05:55:51Z

Projection Mapping (PM) is a technology that projects images onto the surfaces of physical objects, allowing multiple users to share an augmented reality experience without special devices. However, its practical use has been constrained by the need for dark environments to ensure high-quality projection. To overcome this ``dark-room constraint,'' we propose a novel target-excluding lighting method that selectively illuminates the surrounding environment while avoiding the PM target. Our system achieves light-field illumination by combining an LED display panel with an optimized aperiodic lens array. The key contributions include a compact form factor that provides a large effective light source area, reproducing natural soft shadows comparable to typical lighting, while maintaining the spatial controllability needed to precisely avoid the target. We also introduce a computational technique for optimizing aperiodic lens placement to suppress undesired dark spots caused by crosstalk, and efficient methods for computing LED luminance patterns that enable dynamic PM. Experiments with a prototype system demonstrate that our approach achieves high-contrast PM even in bright environments.

2026-03-12T05:55:51Z Kotaro Fujimura Hiroki Kusuyama Masaki Takeuchi Daisuke Iwai http://arxiv.org/abs/2505.24053v3 3DGEER: 3D Gaussian Rendering Made Exact and Efficient for Generic Cameras 2026-03-12T05:28:48Z

3D Gaussian Splatting (3DGS) achieves an appealing balance between rendering quality and efficiency, but relies on approximating 3D Gaussians as 2D projections--an assumption that degrades accuracy, especially under generic large field-of-view (FoV) cameras. Despite recent extensions, no prior work has simultaneously achieved both projective exactness and real-time efficiency for general cameras. We introduce 3DGEER, a geometrically exact and efficient Gaussian rendering framework. From first principles, we derive a closed-form expression for integrating Gaussian density along a ray, enabling precise forward rendering and differentiable optimization under arbitrary camera models. To retain efficiency, we propose the Particle Bounding Frustum (PBF), which provides tight ray-Gaussian association without BVH traversal, and the Bipolar Equiangular Projection (BEAP), which unifies FoV representations, accelerates association, and improves reconstruction quality. Experiments on both pinhole and fisheye datasets show that 3DGEER outperforms prior methods across all metrics, runs 5x faster than existing projective exact ray-based baselines, and generalizes to wider FoVs unseen during training--establishing a new state of the art in real-time radiance field rendering.
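
To make the idea of a closed-form ray integral concrete, here is a minimal Python sketch of the standard identity for integrating an unnormalized 3D Gaussian along an infinite line. It is a textbook completing-the-square result and is not claimed to be the exact expression derived in the paper. The function names, the example ray, and the example covariance are assumptions chosen for illustration.

```python
import numpy as np


def ray_gaussian_integral(origin, direction, mean, cov):
    # Integral over t in (-inf, inf) of exp(-0.5 * q(t)), where
    # q(t) = (o + t d - mu)^T P (o + t d - mu) = a t^2 - 2 b t + c.
    precision = np.linalg.inv(cov)
    delta = mean - origin
    a = direction @ precision @ direction
    b = direction @ precision @ delta
    c = delta @ precision @ delta
    # Completing the square gives a 1D Gaussian integral in t.
    return np.sqrt(2.0 * np.pi / a) * np.exp(-0.5 * (c - b * b / a))


def ray_gaussian_integral_numeric(origin, direction, mean, cov,
                                  t_max=50.0, n=200001):
    # Brute-force trapezoidal check of the closed form.
    precision = np.linalg.inv(cov)
    t = np.linspace(-t_max, t_max, n)
    diff = origin + t[:, None] * direction - mean
    q = np.einsum("ni,ij,nj->n", diff, precision, diff)
    vals = np.exp(-0.5 * q)
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(t)))


# Example values (assumptions for illustration).
origin = np.array([0.0, 0.0, -2.0])
direction = np.array([0.1, 0.2, 1.0])
direction = direction / np.linalg.norm(direction)
mean = np.array([0.3, -0.2, 1.0])
cov = np.array([[0.5, 0.1, 0.0],
                [0.1, 0.3, 0.05],
                [0.0, 0.05, 0.8]])

closed = ray_gaussian_integral(origin, direction, mean, cov)
numeric = ray_gaussian_integral_numeric(origin, direction, mean, cov)
print(closed, numeric)
```

Because the expression depends only on the ray and the Gaussian parameters, it makes no assumption about the camera model that generated the ray, which is why a formula of this kind applies equally to pinhole and fisheye rays.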

2025-05-29T22:52:51Z Published at ICLR 2026. Code is available at: https://github.com/boschresearch/3dgeer The Fourteenth International Conference on Learning Representations (ICLR 2026) Zixun Huang Cho-Ying Wu Yuliang Guo Xinyu Huang Liu Ren http://arxiv.org/abs/2603.11551v1 Shadowless Projection Mapping for Tabletop Workspaces with Synthetic Aperture Projector 2026-03-12T05:13:44Z

Projection mapping (PM) enables augmented reality (AR) experiences without requiring users to wear head-mounted displays and supports multi-user interaction. It is regarded as a promising technology for a variety of applications in which users interact with content superimposed onto augmented objects in tabletop workspaces, including remote collaboration, healthcare, industrial design, urban planning, artwork creation, and office work. However, conventional PM systems often suffer from projection shadows when users occlude the light path. Prior approaches employing multiple distributed projectors can compensate for occlusion, but suffer from latency due to computational processing, degrading the user experience. In this research, we introduce a synthetic-aperture PM system that uses a significantly larger number of projectors, arranged densely in the environment, to achieve delay-free, shadowless projection for tabletop workspaces without requiring computational compensation. To address spatial resolution degradation caused by subpixel misalignment among overlaid projections, we develop and validate an offline blur compensation method whose computation time remains independent of the number of projectors. Furthermore, we demonstrate that our shadowless PM plays a critical role in achieving a fundamental goal of PM: altering material properties without evoking projection-like impression. Specifically, we define this perceptual impression as ``sense of projection (SoP)'' and establish a PM design framework to minimize the SoP based on user studies.

2026-03-12T05:13:44Z Takahiro Okamoto Masaki Takeuchi Masataka Sawayama Daisuke Iwai http://arxiv.org/abs/2603.11047v1 LiTo: Surface Light Field Tokenization 2026-03-11T17:59:59Z

We propose a 3D latent representation that jointly models object geometry and view-dependent appearance. Most prior works focus on either reconstructing 3D geometry or predicting view-independent diffuse appearance, and thus struggle to capture realistic view-dependent effects. Our approach leverages the fact that RGB-depth images provide samples of a surface light field. By encoding random subsamples of this surface light field into a compact set of latent vectors, our model learns to represent both geometry and appearance within a unified 3D latent space. This representation reproduces view-dependent effects such as specular highlights and Fresnel reflections under complex lighting. We further train a latent flow matching model on this representation to learn its distribution conditioned on a single input image, enabling the generation of 3D objects with appearances consistent with the lighting and materials in the input. Experiments show that our approach achieves higher visual quality and better input fidelity than existing methods.

2026-03-11T17:59:59Z ICLR 2026; Project page: https://apple.github.io/ml-lito/ Jen-Hao Rick Chang Xiaoming Zhao Dorian Chan Oncel Tuzel http://arxiv.org/abs/2603.10996v1 TreeON: Reconstructing 3D Tree Point Clouds from Orthophotos and Heightmaps 2026-03-11T17:21:12Z

We present TreeON, a novel neural-based framework for reconstructing detailed 3D tree point clouds from sparse top-down geodata, using only a single orthophoto and its corresponding Digital Surface Model (DSM). Our method introduces a new training supervision strategy that combines both geometric supervision and differentiable shadow and silhouette losses to learn point cloud representations of trees without requiring species labels, procedural rules, terrestrial reconstruction data, or ground laser scans. To address the lack of ground truth data, we generate a synthetic dataset of point clouds from procedurally modeled trees and train our network on it. Quantitative and qualitative experiments demonstrate better reconstruction quality and coverage compared to existing methods, as well as strong generalization to real-world data, producing visually appealing and structurally plausible tree point cloud representations suitable for integration into interactive digital 3D maps. The codebase, synthetic dataset, and pretrained model are publicly available at https://angelikigram.github.io/treeON/.

2026-03-11T17:21:12Z Angeliki Grammatikaki Johannes Eschner Pedro Hermosilla Oscar Argudo Manuela Waldner http://arxiv.org/abs/2504.14373v3 SEGA: Drivable 3D Gaussian Head Avatar from a Single Image 2026-03-11T15:23:42Z

Creating photorealistic 3D head avatars from limited input has become increasingly important for applications in virtual reality, telepresence, and digital entertainment. While recent advances like neural rendering and 3D Gaussian splatting have enabled high-quality digital human avatar creation and animation, most methods rely on multiple images or multi-view inputs, limiting their practicality for real-world use. In this paper, we propose SEGA, a novel approach for Single-imagE-based 3D drivable Gaussian head Avatar creation that combines generalized prior models with a new hierarchical UV-space Gaussian Splatting framework. SEGA seamlessly combines priors derived from large-scale 2D datasets with 3D priors learned from multi-view, multi-expression, and multi-ID data, achieving robust generalization to unseen identities while ensuring 3D consistency across novel viewpoints and expressions. We further present a hierarchical UV-space Gaussian Splatting framework that leverages FLAME-based structural priors and employs a dual-branch architecture to disentangle dynamic and static facial components effectively. The dynamic branch encodes expression-driven fine details, while the static branch focuses on expression-invariant regions, enabling efficient parameter inference and precomputation. This design maximizes the utility of limited 3D data and achieves real-time performance for animation and rendering. Additionally, SEGA performs person-specific fine-tuning to further enhance the fidelity and realism of the generated avatars. Experiments show our method outperforms state-of-the-art approaches in generalization ability, identity preservation, and expression realism, advancing one-shot avatar creation for practical applications.

2025-04-19T18:23:31Z Chen Guo Zhuo Su Liao Wang Jian Wang Shuang Li Xu Chang Zhaohu Li Yang Zhao Guidong Wang Yebin Liu Ruqi Huang http://arxiv.org/abs/2603.10606v1 TopGen: Learning Structural Layouts and Cross-Fields for Quadrilateral Mesh Generation 2026-03-11T10:09:04Z

High-quality quadrilateral mesh generation is a fundamental challenge in computer graphics. Traditional optimization-based methods are often constrained by the topological quality of input meshes and suffer from severe efficiency bottlenecks, frequently becoming computationally prohibitive when handling high-resolution models. While emerging learning-based approaches offer greater flexibility, they primarily focus on cross-field prediction, often resulting in the loss of critical structural layouts and a lack of editability. In this paper, we propose TopGen, a robust and efficient learning-based framework that mimics professional manual modeling workflows by simultaneously predicting structural layouts and cross-fields. By processing input triangular meshes through point cloud sampling and a shape encoder, TopGen is inherently robust to non-manifold geometries and low-quality initial topologies. We introduce a dual-query decoder using edge-based and face-based sampling points as queries to perform structural line classification and cross-field regression in parallel. This integrated approach explicitly extracts the geometric skeleton while concurrently capturing orientation fields. Such synergy ensures the preservation of geometric integrity and provides an intuitive, editable foundation for subsequent quadrilateral remeshing. To support this framework, we also introduce a large-scale quadrilateral mesh dataset, TopGen-220K, featuring high-quality paired data comprising raw triangular meshes, structural layouts, cross-fields, and their corresponding quad meshes. Experimental results demonstrate that TopGen significantly outperforms existing state-of-the-art methods in both geometric fidelity and topological edge flow rationality.

2026-03-11T10:09:04Z 14 pages, 9 figures Yuguang Chen Xinhai Liu Xiangyu Zhu Yiling Zhu Zhuo Chen Dongyu Zhang Chunchao Guo http://arxiv.org/abs/2603.10590v1 Exact Interpolation under Noise: A Reproducible Comparison of Clough-Tocher and Multiquadric RBF Surfaces 2026-03-11T09:46:30Z

This paper presents a reproducible comparison of cubic and radial basis function (RBF) interpolants for multivariate surface analysis. To eliminate evaluation bias, both methods are assessed under a unified slice-wise train/test protocol on the same synthetic function family. Performance is reported using RMSE, MAE, and $R^2$ in two regimes: (i) noise-free observations and (ii) noisy observations. In the noise-free regime, both interpolants achieve high accuracy with output-dependent advantages. In the noisy regime, exact interpolation overfits noisy nodes and degrades out-of-sample performance for both methods; in our experimental setting, the cubic interpolant is comparatively more stable. All experiments are fully reproducible through a single SciPy/NumPy-based script with a fixed random seed, repeated splits, and bootstrap-based uncertainty summaries. From an environmental engineering perspective, the main practical implication is that noisy or apparently inconsistent measurements in thermodynamic process systems should not be discarded by default; instead, they can be structured and interpolated to recover physically meaningful process behavior.
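
The comparison described above can be sketched in a few lines of SciPy. This is a minimal illustration, not the paper's script: the test surface, the sample sizes, the noise level, the multiquadric shape parameter `epsilon=5.0`, and the plain random train/test split (in place of the slice-wise protocol) are all assumptions. It uses `scipy.interpolate.CloughTocher2DInterpolator` and `scipy.interpolate.RBFInterpolator`, both of which interpolate the training values exactly by default.

```python
import numpy as np
from scipy.interpolate import CloughTocher2DInterpolator, RBFInterpolator


def target_surface(p):
    # Hypothetical smooth test surface (assumption, not from the paper).
    return np.sin(3.0 * p[:, 0]) * np.cos(2.0 * p[:, 1])


def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


def compare_interpolants(noise_std, seed=0, n_train=200, n_test=500):
    rng = np.random.default_rng(seed)  # fixed seed for reproducibility
    # Include the square's corners so the convex hull of the training
    # nodes covers every test point (Clough-Tocher is NaN outside it).
    corners = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    x_train = np.vstack([corners, rng.uniform(0.0, 1.0, size=(n_train, 2))])
    noise = rng.normal(0.0, noise_std, size=len(x_train))
    y_train = target_surface(x_train) + noise
    x_test = rng.uniform(0.1, 0.9, size=(n_test, 2))
    y_test = target_surface(x_test)  # evaluate against the clean surface

    cubic = CloughTocher2DInterpolator(x_train, y_train)
    rbf = RBFInterpolator(x_train, y_train,
                          kernel="multiquadric", epsilon=5.0)
    return {
        "clough_tocher": rmse(cubic(x_test), y_test),
        "multiquadric_rbf": rmse(rbf(x_test), y_test),
    }


clean = compare_interpolants(noise_std=0.0)
noisy = compare_interpolants(noise_std=0.2)
print("noise-free RMSE:", clean)
print("noisy RMSE:     ", noisy)
```

Since both interpolants pass through every training value, noise added at the nodes is carried directly into the fitted surface, so the out-of-sample RMSE in the noisy run is expected to be larger than in the noise-free run for both methods.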

2026-03-11T09:46:30Z Mirkan Emir Sancak http://arxiv.org/abs/2504.08937v5 Rethinking Few-Shot Image Fusion: Granular Ball Priors Enable General-Purpose Deep Fusion 2026-03-11T09:38:28Z

In image fusion tasks, the absence of real fused images as supervision signals poses significant challenges for supervised learning. Existing deep learning methods typically address this issue either by designing handcrafted priors or by relying on large-scale datasets to learn model parameters. Unlike previous approaches, this paper introduces the concept of incomplete priors, which formally describe handcrafted priors at the algorithmic level and estimate their confidence. Based on this idea, we couple incomplete priors with the neural network through a sample-level adaptive loss function, enabling the network to learn and re-infer fusion rules under conditions that approximate the real fusion process. To generate incomplete priors, we propose a Granular Ball Pixel Computation (GBPC) algorithm based on the principles of granular computing. The algorithm models fused-image pixels as information units, estimating pixel weights at a fine-grained level while statistically evaluating prior reliability at a coarse-grained level. This design enables the algorithm to perceive cross-modal discrepancies and perform adaptive inference. Experimental results demonstrate that even under few-shot conditions, a lightweight neural network can still learn effective fusion rules by training only on image patches extracted from ten image pairs. Extensive experiments across multiple fusion tasks and datasets further show that the proposed method achieves superior performance in both visual quality and model compactness. The code is available at: https://github.com/DMinjie/GBFF

2025-04-11T19:33:06Z Minjie Deng Yan Wei An Wu Yuncan Ouyang Hao Zhai Qianyao Peng http://arxiv.org/abs/2602.19474v2 Structured Bitmap-to-Mesh Triangulation for Geometry-Aware Discretization of Image-Derived Domains 2026-03-11T04:51:18Z

We propose a template-driven triangulation framework that embeds raster- or segmentation-derived boundaries into a regular triangular grid for stable PDE discretization on image-derived domains. Unlike constrained Delaunay triangulation (CDT), which may trigger global connectivity updates, our method retriangulates only triangles intersected by the boundary, preserves the base mesh, and supports synchronization-free parallel execution. To ensure determinism and scalability, we classify all local boundary-intersection configurations up to discrete equivalence and triangle symmetries, yielding a finite symbolic lookup table that maps each case to a conflict-free retriangulation template. We prove that the resulting mesh is closed, has bounded angles, and is compatible with cotangent-based discretizations and standard finite element methods. Experiments on elliptic and parabolic PDEs, signal interpolation, and structural metrics show fewer sliver elements, more regular triangles, and improved geometric fidelity near complex boundaries. The framework is well suited for real-time geometric analysis and physically based simulation over image-derived domains.

2026-02-23T03:36:55Z This version updates the Gmsh baseline configuration and comparative statistics, revises the downstream heat-diffusion comparison, expands the threshold-sensitivity study in the supplementary material, and corrects minor numerical values in the star-domain results without changing any conclusions. Code: https://github.com/monge-ampere/SBMT Wei Feng Haiyong Zheng 10.1016/j.gmod.2026.101326 http://arxiv.org/abs/2510.12192v2 SDGraph: Multi-Level Sketch Representation Learning by Sparse-Dense Graph Architecture 2026-03-11T02:28:11Z

Freehand sketches exhibit unique sparsity and abstraction, necessitating learning pipelines distinct from those designed for images. For sketch learning methods, the central objective is to fully exploit the effective information embedded in sketches. However, there is limited research on what constitutes effective sketch information, which in turn constrains the performance of existing approaches. To tackle this issue, we first propose the Multi-Level Sketch Representation Scheme to identify the effective information. The scheme organizes sketch representation into three levels: sketch-level, stroke-level, and point-level. This design is based on the granularity of analytical elements, from coarse (sketch-level) to fine (point-level), thereby ensuring more comprehensive coverage of the sketch information. For each level, we conduct theoretical analyses and experimental evaluations to identify and validate the effective information. Building on these studies, we develop SDGraph, a deep learning architecture designed to exploit the identified effective information across the three levels. SDGraph comprises two complementary modules: a Sparse Graph that treats strokes as nodes for sketch-level and stroke-level representation learning, and a Dense Graph that treats points as nodes for sketch-level and point-level representation learning. Both modules employ graph convolution along with down-sampling and up-sampling operations, enabling them to function as both encoder and decoder. In addition, an information fusion module bridges the two graphs to further enhance feature extraction. SDGraph supports a wide range of sketch-related downstream tasks, achieving accuracy improvements of 1.15\% and 2.30\% over the state-of-the-art in classification and retrieval, respectively, and a 32.93\% improvement in vector sketch generation quality.

2025-10-14T06:39:06Z Xi Cheng Pingfa Feng Mingyu Fan Zhichao Liao Hang Cheng Long Zeng http://arxiv.org/abs/2603.10337v1 Landmark Guided 4D Facial Expression Generation 2026-03-11T02:15:11Z

In this paper, we propose a generative model that learns to synthesize 4D facial expressions from a neutral landmark. Existing works mainly focus on generating sequences guided by expression labels, speech, etc., but they are not robust to changes in identity. Our LM-4DGAN uses neutral landmarks to guide facial expression generation and adds an identity discriminator and a landmark autoencoder to the basic WGAN to achieve better identity robustness. Furthermore, we add a cross-attention mechanism to the existing displacement decoder so that it suits the given identity.

2026-03-11T02:15:11Z Xin Lu Zhengda Lu Yiqun Wang Jun Xiao