http://arxiv.org/abs/2602.05423v1 NeVStereo: A NeRF-Driven NVS-Stereo Architecture for High-Fidelity 3D Tasks 2026-02-05T08:15:06Z

In modern dense 3D reconstruction, feed-forward systems (e.g., VGGT, pi3) focus on end-to-end matching and geometry prediction but do not explicitly produce novel view synthesis (NVS). Neural rendering-based approaches offer high-fidelity NVS and detailed geometry from posed images, yet they typically assume fixed camera poses and can be sensitive to pose errors. As a result, it remains non-trivial to obtain a single framework that offers accurate poses, reliable depth, high-quality rendering, and accurate 3D surfaces from casually captured views. We present NeVStereo, a NeRF-driven NVS-stereo architecture that aims to jointly deliver camera poses, multi-view depth, novel view synthesis, and surface reconstruction from multi-view RGB-only inputs. NeVStereo combines NeRF-based NVS for stereo-friendly renderings, confidence-guided multi-view depth estimation, NeRF-coupled bundle adjustment for pose refinement, and an iterative refinement stage that updates both depth and the radiance field to improve geometric consistency. This design mitigates common NeRF-based issues such as surface stacking, artifacts, and pose-depth coupling. Across indoor, outdoor, tabletop, and aerial benchmarks, our experiments indicate that NeVStereo achieves consistently strong zero-shot performance, with up to 36% lower depth error, 10.4% improved pose accuracy, 4.5% higher NVS fidelity, and state-of-the-art mesh quality (F1 91.93%, Chamfer 4.35 mm) compared to existing leading methods.
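The mesh-quality numbers above (F1 and Chamfer distance) are standard point-set metrics. The sketch below shows one common convention: a symmetric Chamfer distance averaging both nearest-neighbour directions, and an F-score at a distance threshold `tau`. The abstract does not state which variant or threshold the paper uses (squared vs. unsquared distances, the value of `tau`), so those choices are assumptions here.

```python
import numpy as np

def chamfer_and_fscore(pred, gt, tau):
    """Symmetric Chamfer distance and F-score between two point sets.

    pred: (N, 3) array, gt: (M, 3) array, tau: distance threshold.
    Uses brute-force nearest neighbours, which is fine for small clouds.
    """
    # Pairwise Euclidean distances, shape (N, M)
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    d_pred_to_gt = d.min(axis=1)  # accuracy direction
    d_gt_to_pred = d.min(axis=0)  # completeness direction
    chamfer = 0.5 * (d_pred_to_gt.mean() + d_gt_to_pred.mean())
    precision = (d_pred_to_gt < tau).mean()
    recall = (d_gt_to_pred < tau).mean()
    if precision + recall == 0:
        fscore = 0.0
    else:
        fscore = 2 * precision * recall / (precision + recall)
    return float(chamfer), float(fscore)
```

In practice the point sets are sampled from the predicted and reference meshes, and a KD-tree replaces the brute-force distance matrix for large clouds.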

2026-02-05T08:15:06Z Pengcheng Chen Yue Hu Wenhao Li Nicole M Gunderson Andrew Feng Zhenglong Sun Peter Beerel Eric J Seibel http://arxiv.org/abs/2602.05335v1 Boxplots and quartile plots for grouped and periodic angular data 2026-02-05T05:58:15Z

Angular observations, or observations lying on the unit circle, arise in many disciplines and require special care in their description, analysis, interpretation and visualization. We provide methods to construct concentric circular boxplot displays of distributions of groups of angular data. The use of concentric boxplots brings challenges of visual perception, so we set the boxwidths to be inversely proportional to the square root of their distance from the centre. A perception survey supports this scaled boxwidth choice. For a large number of groups, we propose circular quartile plots. A three-dimensional toroidal display is also implemented for periodic angular distributions. We illustrate our methods on datasets in (1) psychology, to display motor resonance under different conditions, (2) genomics, to understand the distribution of peak phases for ancillary clock genes, and (3) meteorology and wind turbine power generation, to study the changing and periodic distribution of wind direction over the course of a year.
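The box-width rule described above can be written as w_k = c / sqrt(r_k), where r_k is the distance of the k-th concentric ring from the centre. A minimal sketch, assuming a free scale constant `base_width` (hypothetical, not from the paper):

```python
import math

def concentric_box_widths(radii, base_width=1.0):
    """Box width for each concentric circular boxplot ring.

    Width is inversely proportional to the square root of the ring's
    distance from the centre: w_k = base_width / sqrt(r_k).
    base_width is a hypothetical scale constant chosen for illustration.
    """
    if any(r <= 0 for r in radii):
        raise ValueError("radii must be positive")
    return [base_width / math.sqrt(r) for r in radii]
```

For rings at distances 1, 4 and 9 with `base_width=0.6`, the widths are 0.6, 0.3 and about 0.2, so outer rings, which already cover more arc length, are drawn thinner.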

2026-02-05T05:58:15Z 7 pages, 8 figures Joshua D. Berlinski Fan Dai Ranjan Maitra http://arxiv.org/abs/2602.05295v1 High-Performance Moment-Encoded Lattice Boltzmann Method with Stability-Guided Quantization 2026-02-05T04:38:06Z

In this work, we present a memory-efficient, high-performance GPU framework for moment-based lattice Boltzmann methods (LBM) with fluid-solid coupling. We introduce a split-kernel scheme that decouples fluid updates from solid boundary handling, substantially reducing warp divergence and improving utilization on GPUs. We further perform the first von Neumann stability analysis of the high-order moment-encoded LBM (HOME-LBM) formulation, characterizing its spectral behavior and deriving stability bounds for individual moment components. These theoretical insights directly guide a practical 16-bit moment quantization without compromising numerical stability. Our framework achieves up to 6x speedup and reduces GPU memory footprint by up to 50% in fluid-only scenarios and 25% in scenes with complex solid boundaries compared to the state-of-the-art LBM solver, while preserving physical fidelity across a range of large-scale benchmarks and real-time demonstrations. The proposed approach enables scalable, stable, and high-resolution LBM simulation on a single GPU, bridging theoretical stability analysis with practical GPU algorithm design.
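The abstract says per-component stability bounds guide a 16-bit moment quantization but does not give the encoding. One plausible reading, assumed here, is a fixed-point int16 code whose range for each moment component is set by a known bound B_k. The bound values and the fixed-point choice are hypothetical; the paper may use a different 16-bit format.

```python
import numpy as np

INT16_MAX = 32767

def quantize_moments(m, bounds):
    """Encode moments m (..., K) as int16 using per-component bounds (K,).

    Hypothetical scheme: component k is assumed to satisfy |m_k| <= bounds[k],
    so it is mapped linearly onto [-32767, 32767] and rounded.
    """
    scale = INT16_MAX / bounds
    q = np.clip(np.rint(m * scale), -INT16_MAX, INT16_MAX)
    return q.astype(np.int16)

def dequantize_moments(q, bounds):
    """Decode int16 codes back to floating-point moments."""
    return q.astype(np.float32) * (bounds / INT16_MAX)
```

Storing int16 instead of float32 halves the per-moment memory, and the worst-case rounding error per component is half a quantization step, 0.5 * B_k / 32767, which is why a tight, stability-derived bound matters.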

2026-02-05T04:38:06Z Yixin Chen Wei Li David I. W. Levin Kui Wu http://arxiv.org/abs/2602.05190v1 PoseGaussian: Pose-Driven Novel View Synthesis for Robust 3D Human Reconstruction 2026-02-05T01:34:52Z

We propose PoseGaussian, a pose-guided Gaussian Splatting framework for high-fidelity human novel view synthesis. Human body pose serves a dual purpose in our design: as a structural prior, it is fused with a color encoder to refine depth estimation; as a temporal cue, it is processed by a dedicated pose encoder to enhance temporal consistency across frames. These components are integrated into a fully differentiable, end-to-end trainable pipeline. Unlike prior works that use pose only as a condition or for warping, PoseGaussian embeds pose signals into both geometric and temporal stages to improve robustness and generalization. It is specifically designed to address challenges inherent in dynamic human scenes, such as articulated motion and severe self-occlusion. Notably, our framework achieves real-time rendering at 100 FPS, maintaining the efficiency of standard Gaussian Splatting pipelines. We validate our approach on ZJU-MoCap, THuman2.0, and in-house datasets, demonstrating state-of-the-art performance in perceptual quality and structural accuracy (PSNR 30.86, SSIM 0.979, LPIPS 0.028).

2026-02-05T01:34:52Z Ju Shen Chen Chen Tam V. Nguyen Vijayan K. Asari http://arxiv.org/abs/2602.05081v1 Gabor Fields: Orientation-Selective Level-of-Detail for Volume Rendering 2026-02-04T21:58:03Z

Gaussian-based representations have enabled efficient physically-based volume rendering at a fraction of the memory cost of regular, discrete, voxel-based distributions. However, several remaining issues hamper their widespread use. One of the advantages of classic voxel grids is the ease of constructing hierarchical representations by either storing volumetric mipmaps or selectively pruning branches of an already hierarchical voxel grid. Such strategies reduce rendering time and eliminate aliasing when lower levels of detail are required. Constructing similar strategies for Gaussian-based volumes is not trivial. Straightforward solutions, such as prefiltering or computing mipmap-style representations, lead to increased memory requirements or expensive re-fitting of each level separately. Additionally, such solutions do not guarantee a smooth transition between different hierarchy levels. To address these limitations, we propose Gabor Fields, an orientation-selective mixture of Gabor kernels that enables continuous frequency filtering at no cost. The frequency content of the asset is reduced by selectively pruning primitives, directly benefiting rendering performance. Beyond filtering, we demonstrate that stochastically sampling from different frequencies and orientations at each ray recursion enables masking substantial portions of the volume, accelerating ray traversal time in single- and multiple-scattering settings. Furthermore, inspired by procedural volumes, we present an application for efficient design and rendering of procedural clouds as Gabor-noise-modulated Gaussians.
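A textbook Gabor kernel is a Gaussian envelope multiplied by an oriented cosine carrier; the carrier's frequency vector sets both the orientation and the frequency content of the primitive. The sketch below evaluates one such kernel and selects primitives below a frequency cutoff, illustrating the idea of lowering detail by pruning. The exact kernel parameterization used by Gabor Fields (and how it keeps density non-negative) is not given in the abstract, so this form is an assumption.

```python
import numpy as np

def gabor_kernel(x, mu, cov_inv, freq, phase=0.0):
    """Evaluate one Gabor kernel at points x (..., 3).

    Gaussian envelope exp(-0.5 d^T cov_inv d) times an oriented cosine
    cos(2 pi freq . d + phase), with d = x - mu.
    """
    d = x - mu
    envelope = np.exp(-0.5 * np.einsum("...i,ij,...j->...", d, cov_inv, d))
    carrier = np.cos(2.0 * np.pi * (d @ freq) + phase)
    return envelope * carrier

def prune_by_frequency(freqs, cutoff):
    """Indices of kernels whose carrier frequency magnitude is <= cutoff."""
    return np.nonzero(np.linalg.norm(freqs, axis=1) <= cutoff)[0]
```

Because each primitive carries an explicit frequency, a lower level of detail is obtained by dropping kernels rather than refitting a separate, prefiltered copy of the volume.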

2026-02-04T21:58:03Z 19 pages, incl Appendix and References Jorge Condor Nicolai Hermann Mehmet Ata Yurtsever Piotr Didyk http://arxiv.org/abs/2602.02907v2 VoroUDF: Meshing Unsigned Distance Fields with Voronoi Optimization 2026-02-04T20:54:18Z

We present VoroUDF, an algorithm for reconstructing high-quality triangle meshes from Unsigned Distance Fields (UDFs). Our algorithm supports non-manifold geometry, sharp features, and open boundaries without relying on error-prone inside/outside estimation, restrictive look-up tables, or topologically noisy optimization. Our Voronoi-based formulation combines an L_1 tangent minimization with feature-aware repulsion to robustly recover complex surface topology. It achieves significantly improved topological consistency and geometric fidelity compared to existing methods, while producing lightweight meshes suitable for downstream real-time and interactive applications.

2026-02-02T23:28:21Z Ningna Wang Zilong Wang Xiana Carrera Xiaohu Guo Silvia Sellán http://arxiv.org/abs/2602.05013v1 Untwisting RoPE: Frequency Control for Shared Attention in DiTs 2026-02-04T20:01:59Z

Positional encodings are essential to transformer-based generative models, yet their behavior in multimodal and attention-sharing settings is not fully understood. In this work, we present a principled analysis of Rotary Positional Embeddings (RoPE), showing that RoPE naturally decomposes into frequency components with distinct positional sensitivities. We demonstrate that this frequency structure explains why shared-attention mechanisms, where a target image is generated while attending to tokens from a reference image, can lead to reference copying, in which the model reproduces content from the reference instead of extracting only its stylistic cues. Our analysis reveals that the high-frequency components of RoPE dominate the attention computation, forcing queries to attend mainly to spatially aligned reference tokens and thereby inducing this unintended copying behavior. Building on these insights, we introduce a method for selectively modulating RoPE frequency bands so that attention reflects semantic similarity rather than strict positional alignment. Applied to modern transformer-based diffusion architectures, where all tokens share attention, this modulation restores stable and meaningful shared attention. As a result, it enables effective control over the degree of style transfer versus content copying, yielding a proper style-aligned generation process in which stylistic attributes are transferred without duplicating reference content.
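Standard RoPE rotates each consecutive pair of feature channels by an angle p * theta_i, with theta_i = base^(-2i/d), so low indices i rotate fastest (the high-frequency bands). The sketch implements that standard formulation (base 10000 is the common default, assumed here) and adds a hypothetical `damp_high_frequencies` helper that scales the fastest bands; it illustrates "modulating RoPE frequency bands" and is not the paper's exact method.

```python
import numpy as np

def rope_frequencies(dim, base=10000.0):
    # theta_i = base^(-2i/dim) for i = 0..dim/2-1; index 0 is the highest frequency
    return base ** (-np.arange(0, dim, 2) / dim)

def apply_rope(x, positions, freqs):
    """Rotate consecutive feature pairs of x (n, dim) by angle pos * theta_i."""
    angles = positions[:, None] * freqs[None, :]  # (n, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def damp_high_frequencies(freqs, num_bands, factor):
    """Hypothetical modulation: scale the num_bands fastest-rotating bands."""
    out = freqs.copy()
    out[:num_bands] *= factor
    return out
```

Because a query-key dot product after RoPE depends only on the position difference, fast bands make attention scores change sharply with small offsets, which favours spatially aligned tokens. Shrinking those bands (a `factor` below 1) makes the score depend more on content than on exact alignment.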

2026-02-04T20:01:59Z Aryan Mikaeili Or Patashnik Andrea Tagliasacchi Daniel Cohen-Or Ali Mahdavi-Amiri http://arxiv.org/abs/2510.08394v2 Spectral Prefiltering of Neural Fields 2026-02-04T19:45:09Z

Neural fields excel at representing continuous visual signals but typically operate at a single, fixed resolution. We present a simple yet powerful method to optimize neural fields that can be prefiltered in a single forward pass. Key innovations and features include: (1) We perform convolutional filtering in the input domain by analytically scaling Fourier feature embeddings with the filter's frequency response. (2) This closed-form modulation generalizes beyond Gaussian filtering and supports other parametric filters (Box and Lanczos) that are unseen at training time. (3) We train the neural field using single-sample Monte Carlo estimates of the filtered signal. Our method is fast during both training and inference, and imposes no additional constraints on the network architecture. We show quantitative and qualitative improvements over existing methods for neural-field filtering.
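Point (1) relies on a classical identity: convolving a sinusoid of frequency b with an isotropic Gaussian of standard deviation sigma multiplies it by the Gaussian's frequency response exp(-2 pi^2 sigma^2 |b|^2). The sketch applies that factor to a sin/cos Fourier feature embedding. The embedding layout (matrix `B`, sin then cos) is an assumption, and the scaling is exact only for quantities that are linear in the features, such as the first layer's pre-activations.

```python
import numpy as np

def fourier_features(x, B):
    """x: (n, d) inputs, B: (m, d) frequencies -> (n, 2m) sin/cos features."""
    proj = 2.0 * np.pi * x @ B.T
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

def gaussian_prefiltered_features(x, B, sigma):
    """Features of the input convolved with an isotropic Gaussian of std sigma.

    Each sinusoid with frequency b is attenuated by exp(-2 pi^2 sigma^2 |b|^2).
    """
    atten = np.exp(-2.0 * np.pi**2 * sigma**2 * np.sum(B**2, axis=1))
    return fourier_features(x, B) * np.concatenate([atten, atten])
```

Other filters (Box, Lanczos) would swap in their own frequency responses in place of the Gaussian factor; higher frequencies are attenuated more, and `sigma` controls the filter width at inference time.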

2025-10-09T16:15:46Z 16 pages, 10 figures, Website: https://myaldiz.info/assets/spnf Proceedings of the SIGGRAPH Asia 2025 Conference Papers, Article No. 87, pp. 1-12, 2025 Mustafa B. Yaldiz Ishit Mehta Nithin Raghavan Andreas Meuleman Tzu-Mao Li Ravi Ramamoorthi 10.1145/3757377.3763901 http://arxiv.org/abs/2602.04814v1 X2HDR: HDR Image Generation in a Perceptually Uniform Space 2026-02-04T17:59:51Z

High-dynamic-range (HDR) formats and displays are becoming increasingly prevalent, yet state-of-the-art image generators (e.g., Stable Diffusion and FLUX) typically remain limited to low-dynamic-range (LDR) output due to the lack of large-scale HDR training data. In this work, we show that existing pretrained diffusion models can be easily adapted to HDR generation without retraining from scratch. A key challenge is that HDR images are natively represented in linear RGB, whose intensity and color statistics differ substantially from those of sRGB-encoded LDR images. This gap, however, can be effectively bridged by converting HDR inputs into perceptually uniform encodings (e.g., using PU21 or PQ). Empirically, we find that LDR-pretrained variational autoencoders (VAEs) reconstruct PU21-encoded HDR inputs with fidelity comparable to LDR data, whereas linear RGB inputs cause severe degradations. Motivated by this finding, we describe an efficient adaptation strategy that freezes the VAE and finetunes only the denoiser via low-rank adaptation in a perceptually uniform space. This results in a unified computational method that supports both text-to-HDR synthesis and single-image RAW-to-HDR reconstruction. Experiments demonstrate that our perceptually encoded adaptation consistently improves perceptual fidelity, text-image alignment, and effective dynamic range, relative to previous techniques.
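Of the two perceptually uniform encodings named above, PQ has a closed form published in SMPTE ST 2084: absolute luminance up to 10,000 cd/m² is mapped to a code value in [0, 1]. A minimal sketch of PQ encoding and decoding follows; PU21 is omitted because its fitted constants are not reproduced here, and how X2HDR scales HDR pixel values to absolute luminance before encoding is not stated in the abstract.

```python
import numpy as np

# SMPTE ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_encode(luminance):
    """Absolute luminance in cd/m^2 (0..10000) -> PQ code value in [0, 1]."""
    y = np.clip(np.asarray(luminance, dtype=np.float64) / 10000.0, 0.0, 1.0)
    yp = y**M1
    return ((C1 + C2 * yp) / (1.0 + C3 * yp)) ** M2

def pq_decode(code):
    """PQ code value in [0, 1] -> absolute luminance in cd/m^2."""
    ep = np.asarray(code, dtype=np.float64) ** (1.0 / M2)
    num = np.maximum(ep - C1, 0.0)
    return 10000.0 * (num / (C2 - C3 * ep)) ** (1.0 / M1)
```

Applying `pq_encode` to linear HDR values compresses the huge luminance range into a roughly perceptually uniform [0, 1] signal, closer to the statistics of sRGB-encoded LDR images that the pretrained VAE was trained on.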

2026-02-04T17:59:51Z Project page: https://x2hdr.github.io/, Code: https://github.com/X2HDR/X2HDR Ronghuan Wu Wanchao Su Kede Ma Jing Liao Rafał K. Mantiuk http://arxiv.org/abs/2602.04805v1 Skin Tokens: A Learned Compact Representation for Unified Autoregressive Rigging 2026-02-04T17:52:17Z

The rapid proliferation of generative 3D models has created a critical bottleneck in animation pipelines: rigging. Existing automated methods are fundamentally limited by their approach to skinning, treating it as an ill-posed, high-dimensional regression task that is inefficient to optimize and is typically decoupled from skeleton generation. We posit that this is a representation problem and introduce SkinTokens: a learned, compact, and discrete representation for skinning weights. By leveraging an FSQ-CVAE to capture the intrinsic sparsity of skinning, we reframe the task from continuous regression to a more tractable token sequence prediction problem. This representation enables TokenRig, a unified autoregressive framework that models the entire rig as a single sequence of skeletal parameters and SkinTokens, learning the complex dependencies between skeletons and skin deformations. The unified model is then amenable to a reinforcement learning stage, where tailored geometric and semantic rewards improve generalization to complex, out-of-distribution assets. Quantitatively, the SkinTokens representation yields a 98% to 133% improvement in skinning accuracy over state-of-the-art methods, while the full TokenRig framework, refined with RL, improves bone prediction by 17% to 22%. Our work presents a unified, generative approach to rigging that yields higher fidelity and robustness, offering a scalable solution to a long-standing challenge in 3D content creation.
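The FSQ in "FSQ-CVAE" refers to finite scalar quantization: each latent dimension is bounded and rounded to one of L integer levels, and the tuple of levels forms a discrete token. The sketch shows the textbook forward pass for odd level counts plus a mixed-radix mapping to a single token id. The level counts are hypothetical, training would need a straight-through gradient for the rounding (omitted here), and how SkinTokens maps skinning weights into these latents is not described in the abstract.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Finite scalar quantization for odd level counts (forward pass only).

    Each latent dimension is squashed with tanh into [-(L-1)/2, (L-1)/2]
    and rounded to the nearest integer, giving L possible values.
    """
    levels = np.asarray(levels)
    if np.any(levels % 2 == 0):
        raise ValueError("this sketch only handles odd level counts")
    half = (levels - 1) / 2.0
    return np.round(np.tanh(z) * half)

def fsq_token_index(codes, levels):
    """Map per-dimension integer codes to one token id (mixed radix)."""
    levels = np.asarray(levels)
    digits = (codes + (levels - 1) // 2).astype(np.int64)  # shift to 0..L-1
    index = 0
    for d in range(len(levels)):
        index = index * levels[d] + digits[..., d]
    return index
```

With levels (5, 5, 3) the implicit codebook has 75 entries and needs no learned embedding table, which is what makes the token sequence compact enough for autoregressive prediction.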

2026-02-04T17:52:17Z 14 pages, 10 figures Jia-peng Zhang Cheng-Feng Pu Meng-Hao Guo Yan-Pei Cao Shi-Min Hu http://arxiv.org/abs/2410.17774v2 Quasi-Medial Distance Field (Q-MDF): A Robust Method for Approximating and Discretizing Neural Medial Axes 2026-02-04T15:16:48Z

The medial axis, a lower-dimensional descriptor that captures the extrinsic structure of a shape, plays an important role in digital geometry processing. Despite its importance, computing the medial axis transform robustly from diverse inputs, especially point clouds with defects, remains a challenging problem. In this paper, we propose a new implicit method that deviates from traditional explicit medial axis computation. Our key technical insight is that the difference between the signed distance field (SDF) and the medial field (MF) of a solid shape relates to the unsigned distance field (UDF) of the shape's medial axis. This observation allows us to formulate medial axis extraction as an implicit reconstruction problem. By employing a modified double covering strategy, we recover the medial axis as the zero level-set of the UDF. Extensive experiments demonstrate that our method achieves higher accuracy and robustness in learning compact medial axis transforms from challenging meshes and point clouds, outperforming existing approaches.

2024-10-23T11:23:05Z Jiayi Kong Chen Zong Jun Luo Shiqing Xin Fei Hou Hanqing Jiang Chen Qian Ying He http://arxiv.org/abs/2603.29587v1 Style-Instructed Mask-Free Virtual Try On 2026-02-04T11:32:29Z

Virtual Try-On is a promising research area with broad applications in e-commerce and everyday life, enabling users to visualize garments on themselves or others before purchase. Most existing methods depend on predefined or user-specified masks to guide garment placement; their performance is therefore highly sensitive to mask quality, which often causes misalignment or artifacts, and supplying masks adds redundant steps for users. To overcome these limitations, we propose a mask-free virtual try-on framework that requires only minimal modifications to the underlying architecture while remaining compatible with common diffusion-based pipelines. To address the increased ambiguity in the absence of masks, we integrate an attention-based guidance mechanism that explicitly directs the model to focus on the target garment region and improves correspondence between the garment and the person. Additionally, we incorporate instruction prompts, allowing users to flexibly control garment categories and wearing styles, addressing the underutilization of prompts in prior work and improving interaction flexibility. Both qualitative and quantitative evaluations across multiple datasets demonstrate that our approach consistently outperforms existing methods, producing more accurate, robust, and user-friendly try-on results.

2026-02-04T11:32:29Z Project page: https://smf-vto.github.io Mengqi Zhang Qi Li Mehmet Saygin Seyfioglu Karim Bouyarmane http://arxiv.org/abs/2602.04292v1 Event-T2M: Event-level Conditioning for Complex Text-to-Motion Synthesis 2026-02-04T07:45:21Z

Text-to-motion generation has advanced with diffusion models, yet existing systems often collapse complex multi-action prompts into a single embedding, leading to omissions, reordering, or unnatural transitions. In this work, we shift perspective by introducing a principled definition of an event as the smallest semantically self-contained action or state change in a text prompt that can be temporally aligned with a motion segment. Building on this definition, we propose Event-T2M, a diffusion-based framework that decomposes prompts into events, encodes each with a motion-aware retrieval model, and integrates them through event-based cross-attention in Conformer blocks. Existing benchmarks mix simple and multi-event prompts, making it unclear whether models that succeed on single actions generalize to multi-action cases. To address this, we construct HumanML3D-E, the first benchmark stratified by event count. Experiments on HumanML3D, KIT-ML, and HumanML3D-E show that Event-T2M matches state-of-the-art baselines on standard tests while outperforming them as event complexity increases. Human studies validate the plausibility of our event definition, the reliability of HumanML3D-E, and the superiority of Event-T2M in generating multi-event motions that preserve order and naturalness close to ground-truth. These results establish event-level conditioning as a generalizable principle for advancing text-to-motion generation beyond single-action prompts.

2026-02-04T07:45:21Z 28 pages, 7 figures. Accepted to ICLR 2026 Seong-Eun Hong JaeYoung Seon JuYeong Hwang JongHwan Shin HyeongYeop Kang http://arxiv.org/abs/2602.04271v1 SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization 2026-02-04T07:00:44Z

4D generation has made remarkable progress in synthesizing dynamic 3D objects from input text, images, or videos. However, existing methods often represent motion as an implicit deformation field, which limits direct control and editability. To address this issue, we propose SkeletonGaussian, a novel framework for generating editable dynamic 3D Gaussians from monocular video input. Our approach introduces a hierarchical articulated representation that decomposes motion into sparse rigid motion explicitly driven by a skeleton and fine-grained non-rigid motion. Concretely, we extract a robust skeleton and drive rigid motion via linear blend skinning, followed by a hexplane-based refinement for non-rigid deformations, enhancing interpretability and editability. Experimental results demonstrate that SkeletonGaussian surpasses existing methods in generation quality while enabling intuitive motion editing, establishing a new paradigm for editable 4D generation. Project page: https://wusar.github.io/projects/skeletongaussian/
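The rigid part of the motion model above uses linear blend skinning (LBS): each point is transformed by every joint's 4x4 transform, and the results are averaged with per-point skinning weights, v' = sum_j w_j T_j v. A minimal sketch of standard LBS follows; in SkeletonGaussian the points would be Gaussian centres, but the array shapes and names here are assumptions, and the hexplane non-rigid refinement is not shown.

```python
import numpy as np

def linear_blend_skinning(vertices, weights, transforms):
    """Standard LBS.

    vertices: (V, 3) rest positions, weights: (V, J) rows summing to 1,
    transforms: (J, 4, 4) joint transforms from rest pose to posed frame.
    """
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)  # (V, 4)
    per_joint = np.einsum("jab,vb->vja", transforms, homo)  # each joint's result, (V, J, 4)
    blended = np.einsum("vj,vja->va", weights, per_joint)   # weighted average, (V, 4)
    return blended[:, :3]
```

Editing the motion then amounts to changing the joint transforms, which is what makes a skeleton-driven representation more directly controllable than an implicit deformation field.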

2026-02-04T07:00:44Z Accepted by CVM 2026. Project page: https://wusar.github.io/projects/skeletongaussian Lifan Wu Ruijie Zhu Yubo Ai Tianzhu Zhang http://arxiv.org/abs/2602.04174v1 GenMRP: A Generative Multi-Route Planning Framework for Efficient and Personalized Real-Time Industrial Navigation 2026-02-04T03:21:21Z

Existing industrial-scale navigation applications contend with massive road networks and typically employ one of two categories of approaches for route planning. The first relies on precomputed road costs for optimal routing and heuristic algorithms for generating alternatives, while the second, based on generative methods, has recently gained significant attention. However, the former struggles with personalization and route diversity, while the latter fails to meet the efficiency requirements of large-scale real-time scenarios. To address these limitations, we propose GenMRP, a generative framework for multi-route planning. To ensure generation efficiency, GenMRP first introduces a skeleton-to-capillary approach that dynamically constructs a relevant sub-network significantly smaller than the full road network. Within this sub-network, routes are generated iteratively. The first iteration identifies the optimal route, while subsequent iterations generate alternatives that balance quality and diversity using the newly proposed correctional boosting approach. Each iteration incorporates road features, user historical sequences, and previously generated routes into a Link Cost Model to update road costs, followed by route generation using the Dijkstra algorithm. Extensive experiments show that GenMRP achieves state-of-the-art performance with high efficiency in both offline and online environments. To facilitate further research, we have publicly released the training and evaluation dataset. GenMRP has been fully deployed in a real-world navigation app, demonstrating its effectiveness and benefits.
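The iterative loop described above (update link costs, then run Dijkstra) can be illustrated with fixed costs and a classic stand-in: after each route is found, the costs of the links it used are multiplied by a penalty, so the next Dijkstra run is pushed toward a different route. This penalty rule is a hypothetical substitute for GenMRP's learned Link Cost Model and correctional boosting, which the abstract does not specify.

```python
import heapq

def dijkstra(graph, source, target):
    """graph: {node: {neighbor: cost}}. Returns (cost, path) or (inf, [])."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in visited:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

def alternative_routes(graph, source, target, k, penalty=1.5):
    """Hypothetical penalty method: inflate costs of links already used."""
    costs = {u: dict(nbrs) for u, nbrs in graph.items()}
    routes = []
    for _ in range(k):
        _, path = dijkstra(costs, source, target)
        if not path or path in routes:
            break  # no route, or no new alternative under current costs
        routes.append(path)
        for u, v in zip(path, path[1:]):
            costs[u][v] *= penalty
    return routes
```

The first iteration returns the plain shortest path; later iterations trade some cost for diversity, with `penalty` controlling how strongly reused links are discouraged.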

2026-02-04T03:21:21Z Chengzhang Wang Chao Chen Jun Tao Tengfei Liu He Bai Song Wang Longfei Xu Kaikui Liu Xiangxiang Chu