http://arxiv.org/abs/2604.15310v2 TokenLight: Precise Lighting Control in Images using Attribute Tokens 2026-04-17T17:17:33Z

This paper presents a method for image relighting that enables precise and continuous control over multiple illumination attributes in a photograph. We formulate relighting as a conditional image generation task and introduce attribute tokens to encode distinct lighting factors such as intensity, color, ambient illumination, diffuse level, and 3D light positions. The model is trained on a large-scale synthetic dataset with ground-truth lighting annotations, supplemented by a small set of real captures to enhance realism and generalization. We validate our approach across a variety of relighting tasks, including controlling in-scene lighting fixtures and editing environment illumination using virtual light sources, on synthetic and real images. Our method achieves state-of-the-art quantitative and qualitative performance compared to prior work. Remarkably, without explicit inverse rendering supervision, the model exhibits an inherent understanding of how light interacts with scene geometry, occlusion, and materials, yielding convincing lighting effects even in traditionally challenging scenarios such as placing lights within objects or relighting transparent materials plausibly. Project page: vrroom.github.io/tokenlight/

2026-04-16T17:59:50Z 32 pages, CVPR 2026, Project Page: https://vrroom.github.io/tokenlight/ Sumit Chaturvedi Yannick Hold-Geoffroy Mengwei Ren Jingyuan Liu He Zhang Yiqun Mei Julie Dorsey Zhixin Shu http://arxiv.org/abs/2604.16237v1 Ellipsography: Single-Shot Speckle-Free Holography via Vectorial Interference Shaping 2026-04-17T16:55:36Z

Holographic displays are widely regarded as the "ultimate" display technology, promising immersive 3D visuals with natural depth cues, continuous parallax, and perceptual realism. Realizing this potential, however, has remained elusive due to persistent image quality limitations -- most notably speckle noise, a byproduct of the random interference inherent to coherent light. This is typically further exacerbated by the hologram's phase randomness required for maintaining uniform energy distribution across the eyebox. While speckle suppression techniques like temporal multiplexing or smooth-phase heuristics exist, they often necessitate high-speed hardware and introduce visual artifacts, hindering their practical adoption. We introduce Ellipsography, a single-shot holography technique that achieves near-limit speckle suppression, reaching the image fidelity equivalent to averaging a million conventional scalar holograms -- in a single frame in simulation. By jointly modulating the phase and polarization of light, we structure optical interference and suppress speckle at its source. We present a full pipeline including a vectorial wave model, an end-to-end hologram synthesis algorithm, and a functional prototype display. Our experiments demonstrate substantial improvements in visual clarity, depth continuity, and focus cues over current state-of-the-art methods, achieving high-quality reconstructions approaching 30dB PSNR on a real holographic display for the first time -- a 10dB improvement over the best existing techniques. By pushing holographic reconstruction closer to the perceptual quality expected of modern displays, Ellipsography sets a new benchmark for practical, high-fidelity, speckle-free holography.
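
To put the quoted figures in perspective, the short sketch below works through the arithmetic behind a 10dB PSNR gain and behind averaging a million frames. It is a back-of-the-envelope illustration, not the authors' evaluation code. The function names are hypothetical, and the 1/sqrt(N) contrast model assumes statistically independent, fully developed speckle patterns, a textbook idealization that the abstract does not state.

```python
import math


def psnr_db(mse, peak=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (gain_db / 10.0)


def averaged_speckle_contrast(n_frames):
    """Speckle contrast after averaging n_frames intensity patterns.

    Assumption: the patterns are independent and fully developed, so a
    single frame has contrast 1 and the average has contrast 1/sqrt(n).
    """
    return 1.0 / math.sqrt(n_frames)


if __name__ == "__main__":
    print(psnr_db(1e-3))                         # MSE of 1e-3 on a [0, 1] image
    print(mse_ratio_for_gain(10.0))              # MSE reduction for +10 dB
    print(averaged_speckle_contrast(1_000_000))  # contrast after 1e6 frames
```

Under these assumptions, a 10dB gain corresponds to a tenfold reduction in mean squared error, and averaging a million frames lowers speckle contrast by a factor of a thousand.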

2026-04-17T16:55:36Z Anzhou Wen Praneeth Chakravarthula http://arxiv.org/abs/2604.14928v2 Hybrid Latents: Geometry-Appearance-Aware Surfel Splatting 2026-04-17T16:12:24Z

We introduce a hybrid Gaussian-hash-grid radiance representation for reconstructing 2D Gaussian scene models from multi-view images. Similar to NeST splatting, our approach reduces the entanglement between geometry and appearance common in NeRF-based models, but adds per-Gaussian latent features alongside hash-grid features to bias the optimizer toward a separation of low- and high-frequency scene components. This explicit frequency-based decomposition reduces the tendency of high-frequency texture to compensate for geometric errors. Encouraging Gaussians with hard opacity falloffs further strengthens the separation between geometry and appearance, improving both geometry reconstruction and rendering efficiency. Finally, probabilistic pruning combined with a sparsity-inducing BCE opacity loss allows redundant Gaussians to be turned off, yielding a minimal set of Gaussians sufficient to represent the scene. Using both synthetic and real-world datasets, we compare against the state of the art in Gaussian-based novel-view synthesis and demonstrate superior reconstruction fidelity with an order of magnitude fewer primitives.

2026-04-16T12:13:09Z 22 pages, 9 figures Neel Kelkar Simon Niedermayr Klaus Engel Rüdiger Westermann http://arxiv.org/abs/2604.15941v1 Neural Gabor Splatting: Enhanced Gaussian Splatting with Neural Gabor for High-frequency Surface Reconstruction 2026-04-17T11:00:19Z

Recent years have witnessed the rapid emergence of 3D Gaussian splatting (3DGS) as a powerful approach for 3D reconstruction and novel view synthesis. Its explicit representation with Gaussian primitives enables fast training, real-time rendering, and convenient post-processing such as editing and surface reconstruction. However, 3DGS suffers from a critical drawback: the number of primitives grows drastically for scenes with high-frequency appearance details, since each primitive can represent only a single color, requiring multiple primitives for every sharp color transition. To overcome this limitation, we propose neural Gabor splatting, which augments each Gaussian primitive with a lightweight multi-layer perceptron that models a wide range of color variations within a single primitive. To further control the number of primitives, we introduce a frequency-aware densification strategy that selects mismatched primitives for pruning and cloning based on frequency energy. Our method achieves accurate reconstruction of challenging high-frequency surfaces. We demonstrate its effectiveness through extensive experiments on both standard benchmarks, such as Mip-NeRF360, and High-Frequency datasets (e.g., checkered patterns), supported by comprehensive ablation studies.

2026-04-17T11:00:19Z Accepted to CVPR 2026 Haato Watanabe Nobuyuki Umetani http://arxiv.org/abs/2508.21675v3 Is this chart lying to me? Automating the detection of misleading visualizations 2026-04-17T09:52:37Z

Misleading visualizations are a potent driver of misinformation on social media and the web. By violating chart design principles, they distort data and lead readers to draw inaccurate conclusions. Prior work has shown that both humans and multimodal large language models (MLLMs) are frequently deceived by such visualizations. Automatically detecting misleading visualizations and identifying the specific design rules they violate could help protect readers and reduce the spread of misinformation. However, the training and evaluation of AI models has been limited by the absence of large, diverse, and openly available datasets. In this work, we introduce Misviz, a benchmark of 2,604 real-world visualizations annotated with 12 types of misleaders. To support model training, we also create Misviz-synth, a synthetic dataset of 57,665 visualizations generated using Matplotlib and based on real-world data tables. We perform a comprehensive evaluation on both datasets using state-of-the-art MLLMs, rule-based systems, and image-axis classifiers. Our results reveal that the task remains highly challenging. We release Misviz, Misviz-synth, and the accompanying code.

2025-08-29T14:36:45Z Camera-ready version accepted at ACL 2026 Main conference. Code and data available at: https://github.com/UKPLab/acl2026-misviz Jonathan Tonglet Jan Zimny Tinne Tuytelaars Iryna Gurevych http://arxiv.org/abs/2604.15513v1 Divide and Truncate: A Penetration and Inversion Free Framework for Coupled Multi-physics Systems 2026-04-16T20:48:24Z

We present Divide and Truncate (DAT), a unified framework for coupling multi-physics systems, including rigid bodies, volumetric soft bodies, thin shells, rods, and animated objects, through penetration-free collision handling. By partitioning the ambient space into exclusive regions and truncating displacements to remain within them, DAT guarantees penetration-free contact resolution. Our Planar-DAT variant further refines this by restricting only motion toward nearby surfaces, leaving tangential movement unconstrained, which addresses the artificial damping and deadlock problems of previous works. The framework is material-agnostic: each object responds to contact without knowledge of the opposing body's physics. Our method is also solver-agnostic; it can be integrated seamlessly with any iterative optimizer as a post-processing step, enabling robust simulation of complex multi-body interactions.
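
The idea of restricting only motion toward a nearby surface can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not say how the exclusive regions or the travel bounds are computed, so `max_approach`, `toward_surface`, and both function names are hypothetical. The ball-style truncation is included purely for contrast and is likewise an assumption.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def truncate_planar(displacement, toward_surface, max_approach):
    """Clamp only the part of the displacement that heads toward the surface.

    toward_surface: unit vector from the vertex to the nearby surface
                    (hypothetical input, assumed to be given).
    max_approach:   distance the vertex may travel along that direction
                    (hypothetical bound, assumed to be given).
    """
    approach = dot(displacement, toward_surface)
    if approach <= max_approach:
        return list(displacement)  # moving away or within bounds: unchanged
    excess = approach - max_approach
    # Remove only the excess along the approach direction; the tangential
    # component of the displacement is left untouched.
    return [d - excess * n for d, n in zip(displacement, toward_surface)]


def truncate_ball(displacement, max_length):
    """Contrast case: shrink the whole displacement to fit inside a ball."""
    length = math.sqrt(dot(displacement, displacement))
    if length <= max_length:
        return list(displacement)
    scale = max_length / length
    return [d * scale for d in displacement]


if __name__ == "__main__":
    step = [3.0, 0.0, 4.0]    # mostly sliding, partly approaching
    normal = [0.0, 0.0, 1.0]  # surface lies in the +z direction
    print(truncate_planar(step, normal, 1.0))
    print(truncate_ball(step, 1.0))
```

In this toy case the planar variant keeps the full sliding motion along x and clamps only the approach along z, whereas the ball-style truncation also shrinks the tangential motion, which is one way artificial damping can arise.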

2026-04-16T20:48:24Z Anka He Chen Jerry Hsu Youssef Ayman Miles Macklin http://arxiv.org/abs/2410.01540v4 Edge-preserving noise for diffusion models 2026-04-16T13:08:59Z

Classical diffusion models typically rely on isotropic Gaussian noise, treating all regions uniformly and overlooking structural information important for high-quality generation. We introduce an edge-preserving diffusion process that generalizes isotropic models via a hybrid noise scheme with an edge-aware scheduler that smoothly transitions from edge-preserving to isotropic noise. This enables the model to capture fine structural details while generally maintaining global performance. We evaluate the impact of structure-aware noise in both diffusion and flow-matching frameworks, and show that existing isotropic models can be efficiently fine-tuned with edge-preserving noise, making our framework practical for adapting pre-trained systems. Beyond unconditional generation, our method shows particular gains in structure-guided tasks such as stroke-to-image synthesis, improving robustness and perceptual quality, as evidenced by consistent improvements across FID, KID, and CLIP-score.

2024-10-02T13:29:52Z Jente Vandersanden Sascha Holl Xingchang Huang Gurprit Singh http://arxiv.org/abs/2604.14025v1 Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective 2026-04-15T16:07:18Z

Reconstructing 3D representations from 2D inputs is a fundamental task in computer vision and graphics, serving as a cornerstone for understanding and interacting with the physical world. While traditional methods achieve high fidelity, they are limited by slow per-scene optimization or category-specific training, which hinders their practical deployment and scalability. Hence, generalizable feed-forward 3D reconstruction has witnessed rapid development in recent years. By learning a model that maps images directly to 3D representations in a single forward pass, these methods enable efficient reconstruction and robust cross-scene generalization. Our survey is motivated by a critical observation: despite the diverse geometric output representations, ranging from implicit fields to explicit primitives, existing feed-forward approaches share similar high-level architectural patterns, such as image feature extraction backbones, multi-view information fusion mechanisms, and geometry-aware design principles. Consequently, we abstract away from these representation differences and instead focus on model design, proposing a novel taxonomy centered on model design strategies that are agnostic to the output format. Our proposed taxonomy organizes the research directions into five key problems that drive recent research development: feature enhancement, geometry awareness, model efficiency, augmentation strategies and temporal-aware models. To support this taxonomy with empirical grounding and standardized evaluation, we further comprehensively review related benchmarks and datasets, and extensively discuss and categorize real-world applications based on feed-forward 3D models. Finally, we outline future directions to address open challenges such as scalability, evaluation standards, and world modeling.

2026-04-15T16:07:18Z 67 pages, 395 references. Project page: https://ff3d-survey.github.io. Code: https://github.com/ziplab/Awesome-Feed-Forward-3D. This work has been submitted to Springer for possible publication Weijie Wang Qihang Cao Sensen Gao Donny Y. Chen Haofei Xu Wenjing Bian Songyou Peng Tat-Jen Cham Chuanxia Zheng Andreas Geiger Jianfei Cai Jia-Wang Bian Bohan Zhuang http://arxiv.org/abs/2605.20209v1 NaP-Control: Navigating Diffusion Prior for Versatile and Fast Character Control 2026-04-15T14:51:32Z

Achieving precise, versatile whole-body character control in physics-based animation remains challenging. Recent diffusion-based policies generate rich and expressive motions but typically rely on gradient-based test-time guidance to satisfy task objectives, which is slow and can reduce robustness. We introduce NaP-Control (Navigating Diffusion Prior for Versatile and Fast Character Control), abbreviated as NaP. Our method uses reinforcement learning to manipulate the latent noise of a task-agnostic diffusion policy prior, steering it toward task-specific behaviors for fast, robust control with high motion fidelity. In contrast to methods that rely solely on offline training, NaP interacts with the environment during training to correct motions and optimize task rewards, improving success rates and enabling adaptation to challenging scenarios. By directly predicting task-optimized diffusion noise, NaP eliminates iterative guidance during denoising and enables efficient inference. Experiments show that NaP attains higher success rates and faster inference while preserving natural motion across diverse tasks.

2026-04-15T14:51:32Z Chia-Wen Chen Yan Wu Korrawe Karunratanakul Siyu Tang http://arxiv.org/abs/2604.25936v1 SAND: Spatially Adaptive Network Depth for Fast Sampling of Neural Implicit Surfaces 2026-04-15T14:14:17Z

Implicit neural representations are powerful for geometric modeling, but their practical use is often limited by the high computational cost of network evaluations. We observe that implicit representations require progressively lower accuracy as query points move farther from the target surface, and that even within the same iso-surface, representation difficulty varies spatially with local geometric complexity. However, conventional neural implicit models evaluate all query points with the same network depth and computational cost, ignoring this spatial variation and thereby incurring substantial computational waste. Motivated by this observation, we propose an efficient neural implicit geometry representation framework with spatially adaptive network depth (SAND). SAND leverages a volumetric network-depth map together with a tailed multi-layer perceptron (T-MLP) to model implicit representation. The volumetric depth map records, for each spatial region, the network depth required to achieve sufficient accuracy, while the T-MLP is a modified MLP designed to learn implicit functions such as signed distance functions, where an output branch, referred to as a tail, is attached to each hidden layer. This design allows network evaluation to terminate adaptively without traversing the full network and directs computational resources to geometrically important and complex regions, improving efficiency while preserving high-fidelity representations. Extensive experimental results demonstrate that our approach can significantly improve the inference-time query speed of implicit neural representations.
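
The early-exit mechanism described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the class `TailedMLP`, the helpers `lookup_depth` and `query`, the dictionary-based depth grid, and the hand-set weights are all hypothetical, and the example uses 2D points for brevity.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def affine(weights, bias, x):
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class TailedMLP:
    """MLP with one scalar output branch (a tail) per hidden layer."""

    def __init__(self, hidden, tails):
        assert len(hidden) == len(tails)
        self.hidden = hidden  # list of (weight matrix, bias vector)
        self.tails = tails    # list of (weight vector, bias scalar)

    def evaluate(self, x, depth):
        """Run only the first `depth` hidden layers, then read that tail."""
        h = list(x)
        for weights, bias in self.hidden[:depth]:
            h = relu(affine(weights, bias, h))
        tail_w, tail_b = self.tails[depth - 1]
        return sum(w * hi for w, hi in zip(tail_w, h)) + tail_b


def lookup_depth(point, depth_grid, cell_size, default_depth):
    """Depth stored for the cell containing `point` (sparse grid as a dict)."""
    cell = tuple(math.floor(c / cell_size) for c in point)
    return depth_grid.get(cell, default_depth)


def query(net, point, depth_grid, cell_size):
    """Evaluate the network at the depth the depth map prescribes."""
    depth = lookup_depth(point, depth_grid, cell_size, len(net.hidden))
    return net.evaluate(point, depth)


# Hand-set placeholder weights, not a trained model.
net = TailedMLP(
    hidden=[([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
            ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0])],
    tails=[([1.0, 1.0], -1.0), ([0.5, 0.5], 0.5)],
)
depth_grid = {(2, 1): 1}  # cell (2, 1) is marked as needing one layer only

if __name__ == "__main__":
    print(query(net, [2.0, 1.0], depth_grid, 1.0))  # exits after layer 1
    print(query(net, [2.5, 0.5], depth_grid, 1.0))  # uses the full depth
```

Cells absent from the grid fall back to the full depth here, so only regions explicitly marked as easy are evaluated with fewer layers. That fallback is a design choice made for this sketch.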

2026-04-15T14:14:17Z Chuanxiang Yang Junhui Hou Yuan Liu Siyu Ren Guangshun Wei Taku Komura Yuanfeng Zhou Wenping Wang http://arxiv.org/abs/2605.16295v1 ANVIL: Analogies and Videos for Lecturers 2026-04-15T12:12:14Z

We present ANVIL, a multimodal generative system that automates the production of analogy-based instructional animations for computer science topics. Given a concept definition, ANVIL generates a textual analogy, compiles it into a structured visual screenplay, and produces executable manim code to render an animation, with an automated repair mechanism to improve robustness. Evaluating such systems at scale requires balancing pedagogical validity with scalability. We begin with a teacher evaluation to ground the quality assessment and use its findings to guide automated screening. For textual analogies, we introduce an LLM-based evaluator for scalable quality screening; for videos, where subjective judgments are difficult to automate, we instead assess fidelity to the intended screenplay using an automated proxy for auditing and error analysis. We further conduct a user study with educators to examine adoption requirements and risks. Our findings suggest that ANVIL can produce materials that are frequently rated as adequate, and that educators respond positively to its perceived value and usability.

2026-04-15T12:12:14Z Yuri Noviello Anastasiia Birillo Gosia Migut http://arxiv.org/abs/2410.12331v2 Ellipsoidal Density-Equalizing Map for Genus-0 Closed Surfaces 2026-04-15T09:54:30Z

Surface parameterization is a fundamental task in geometry processing and plays an important role in many science and engineering applications. In recent years, the density-equalizing map, a shape deformation technique based on the physical principle of density diffusion, has been utilized for the parameterization of simply connected and multiply connected open surfaces. More recently, a spherical density-equalizing mapping method has been developed for the parameterization of genus-0 closed surfaces. However, for genus-0 closed surfaces with extreme geometry, using a spherical domain for the parameterization may induce large geometric distortion. In this work, we develop a novel method for computing density-equalizing maps of genus-0 closed surfaces onto an ellipsoidal domain. This allows us to achieve ellipsoidal area-preserving parameterizations and ellipsoidal parameterizations with controlled area change. We further propose an energy minimization approach that combines density-equalizing maps and quasi-conformal maps, which allows us to produce ellipsoidal density-equalizing quasi-conformal maps for achieving a balance between density-equalization and quasi-conformality. Using our proposed methods, we can significantly improve the performance of surface remeshing for genus-0 closed surfaces. Experimental results on a large variety of genus-0 closed surfaces are presented to demonstrate the effectiveness of our proposed methods.

2024-10-16T07:52:32Z Advances in Computational Mathematics, 52, 30 (2026) Zhiyuan Lyu Lok Ming Lui Gary P. T. Choi 10.1007/s10444-026-10304-9 http://arxiv.org/abs/2601.17740v2 Learning Sewing Patterns via Latent Flow Matching of Implicit Fields 2026-04-15T09:00:14Z

Sewing patterns define the structural foundation of garments and are essential for applications such as fashion design, fabrication, and physical simulation. Despite progress in automated pattern generation, accurately modeling sewing patterns remains difficult due to the broad variability in panel geometry and seam arrangements. In this work, we introduce a sewing pattern modeling method based on an implicit representation. We represent each panel using a signed distance field that defines its boundary and an unsigned distance field that identifies seam endpoints, and encode these fields into a continuous latent space that enables differentiable meshing. A latent flow matching model learns distributions over panel combinations in this representation, and a stitching prediction module recovers seam relations from extracted edge segments. This formulation allows accurate modeling and generation of sewing patterns with complex structures. We further show that it can be used to estimate sewing patterns from images with improved accuracy relative to existing approaches, and supports applications such as pattern completion and refitting, providing a practical tool for digital fashion design.

2026-01-25T08:18:39Z SIGGRAPH 2026 Cong Cao Ren Li Corentin Dumery Hao Li http://arxiv.org/abs/2604.13427v1 A Unified Conditional Flow for Motion Generation, Editing, and Intra-Structural Retargeting 2026-04-15T02:53:07Z

Text-driven motion editing and intra-structural retargeting, where source and target share topology but may differ in bone lengths, are traditionally handled by fragmented pipelines with incompatible inputs and representations: editing relies on specialized generative steering, while retargeting is deferred to geometric post-processing. We present a unifying perspective where both tasks are cast as instances of conditional transport within a single generative framework. By leveraging recent advances in flow matching, we demonstrate that editing and retargeting are fundamentally the same generative task, distinguished only by which conditioning signal, semantic or structural, is modulated during inference. We implement this vision via a rectified-flow motion model jointly conditioned on text prompts and target skeletal structures. Our architecture extends a DiT-style transformer with per-joint tokenization and explicit joint self-attention to strictly enforce kinematic dependencies, while a multi-condition classifier-free guidance strategy balances text adherence with skeletal conformity. Experiments on SnapMoGen and a multi-character Mixamo subset show that a single trained model supports text-to-motion generation, zero-shot editing, and zero-shot intra-structural retargeting. This unified approach simplifies deployment and improves structural consistency compared to task-specific baselines.
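
One common way to balance two conditioning signals with classifier-free guidance is an additive combination of per-condition corrections. The sketch below shows that form on plain lists. The abstract does not give the actual guidance formula, so the additive structure, the function name, and the weights `w_text` and `w_skel` are assumptions made for illustration.

```python
def multi_condition_guidance(v_uncond, v_text, v_skel, w_text, w_skel):
    """Combine velocity predictions from three forward passes.

    v_uncond: prediction with both conditions dropped
    v_text:   prediction conditioned on the text prompt only
    v_skel:   prediction conditioned on the target skeleton only
    Assumed additive form: v = v_uncond + w_text * (v_text - v_uncond)
                                        + w_skel * (v_skel - v_uncond)
    """
    return [u + w_text * (t - u) + w_skel * (s - u)
            for u, t, s in zip(v_uncond, v_text, v_skel)]


if __name__ == "__main__":
    v_uncond = [0.0, 1.0]
    v_text = [1.0, 1.0]
    v_skel = [0.0, 3.0]
    # A larger w_text favors text adherence, a larger w_skel favors
    # conformity to the target skeleton.
    print(multi_condition_guidance(v_uncond, v_text, v_skel, 2.0, 0.5))
```

With both weights set to zero the result reduces to the unconditional prediction, so each weight can be read as the strength of the push toward its condition.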

2026-04-15T02:53:07Z 11 pages, 7 figures Junlin Li Xinhao Song Siqi Wang Haibin Huang Yili Zhao http://arxiv.org/abs/2604.13340v1 MSGS: Multispectral 3D Gaussian Splatting 2026-04-14T23:03:38Z

We present a multispectral extension to 3D Gaussian Splatting (3DGS) for wavelength-aware view synthesis. Each Gaussian is augmented with spectral radiance, represented via per-band spherical harmonics, and optimized under a dual-loss supervision scheme combining RGB and multispectral signals. To improve rendering fidelity, we perform spectral-to-RGB conversion at the pixel level, allowing richer spectral cues to be retained during optimization. Our method is evaluated on both public and self-captured real-world datasets, demonstrating consistent improvements over the RGB-only 3DGS baseline in terms of image quality and spectral consistency. Notably, it excels in challenging scenes involving translucent materials and anisotropic reflections. The proposed approach maintains the compactness and real-time efficiency of 3DGS while laying the foundation for future integration with physically based shading models.
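
A per-pixel spectral-to-RGB conversion followed by a combined loss can be sketched as below. This is a minimal illustration, not the published method: the response matrix holds made-up placeholder weights rather than real sensor curves, the L1 distance and the `spectral_weight` parameter are assumptions, and all function names are hypothetical.

```python
def spectral_to_rgb(bands, response):
    """Map one pixel's per-band radiance to RGB with a 3 x B response matrix."""
    return [sum(w * b for w, b in zip(row, bands)) for row in response]


def l1(a, b):
    """Mean absolute difference between two equally long vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def dual_loss(pred_bands, target_bands, target_rgb, response, spectral_weight):
    """RGB term on the converted pixel plus a weighted multispectral term."""
    pred_rgb = spectral_to_rgb(pred_bands, response)
    return l1(pred_rgb, target_rgb) + spectral_weight * l1(pred_bands, target_bands)


# Placeholder response for four bands: rows are R, G, B.
RESPONSE = [
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]

if __name__ == "__main__":
    pixel_bands = [0.2, 0.4, 0.6, 0.8]
    print(spectral_to_rgb(pixel_bands, RESPONSE))
    print(dual_loss(pixel_bands, pixel_bands,
                    spectral_to_rgb(pixel_bands, RESPONSE), RESPONSE, 0.5))
```

Because the conversion is applied to the rendered per-band pixel value rather than to RGB stored on each primitive, the spectral term and the RGB term both supervise the same underlying band radiances.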

2026-04-14T23:03:38Z Published in IEEE ISMAR 2025 Adjunct Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR) Adjunct, 2025 Iris Zheng Guojun Tang Alexander Doronin Paul Teal Fang-Lue Zhang 10.1109/ISMAR-Adjunct68609.2025.00011