Feed: https://arxiv.org/api/inf64mK8bDKvGRG3hkzsH9iL3gw
Feed updated: 2026-06-17T15:39:54Z; total results: 9346; start index: 705; items per page: 15

Perceptual Sensitivity to Stereo Geometry Errors in Head-Mounted Displays
http://arxiv.org/abs/2505.23685v3 (updated 2026-03-16T16:45:48Z)

Stereoscopic head-mounted displays (HMDs) render and present binocular images to create an egocentric 3D percept for the HMD user. Errors in rendering camera and viewing positions within this rendering and presentation pipeline can cause the depth and distance a user perceives to deviate from the intended geometry. For example, rendering errors can arise when HMD render cameras are positioned incorrectly relative to the assumed centers of projection of the HMD displays, and viewing errors can arise when users view stereo geometry from the wrong location in the HMD eyebox. In this work, we present a geometric framework that predicts errors in distance perception caused by inaccurate HMD perspective geometry. We also build an HMD platform that reliably simulates rendering and viewing errors in a Quest 3 HMD with eye tracking, which we use to test these predictions experimentally. Across five experiments evaluating this framework, we show that errors in perspective geometry can induce both under- and overestimation of perceived distance. We further demonstrate that real-time visual feedback can dynamically recalibrate the visuomotor mapping so that reach distance remains accurate even when perceived visual distance is distorted by geometric error.
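
The abstract does not state the framework's equations. The sketch below uses a simplified textbook stereo model and is not necessarily the authors' framework. It assumes a single virtual image plane at distance `screen_dist`, symmetric eyes, and a point straight ahead. Perceived distance is taken to be where the viewer's two lines of sight intersect. Under these assumptions, a disparity rendered with one interpupillary distance (`render_ipd`) and viewed with another (`view_ipd`) shifts perceived distance. All names are illustrative.

```python
def screen_disparity(point_dist: float, render_ipd: float, screen_dist: float) -> float:
    """Separation between left- and right-eye projections of a point straight
    ahead, on an image plane at screen_dist. Positive = uncrossed (beyond plane)."""
    return render_ipd * (point_dist - screen_dist) / point_dist


def perceived_distance(point_dist: float, render_ipd: float, view_ipd: float,
                       screen_dist: float) -> float:
    """Distance where the viewer's lines of sight intersect when the disparity
    was rendered with render_ipd but is viewed with view_ipd."""
    s = screen_disparity(point_dist, render_ipd, screen_dist)
    if s >= view_ipd:
        return float("inf")  # rays are parallel or diverge: no finite intersection
    return view_ipd * screen_dist / (view_ipd - s)
```

For example, consider a point at 2 m, rendered with a 63 mm camera separation and viewed with a 70 mm IPD on a 1 m image plane. `perceived_distance(2.0, 0.063, 0.070, 1.0)` gives roughly 1.82 m, an underestimate. Swapping the two IPDs gives an overestimate (2.25 m).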

Published: 2025-05-29T17:24:38Z
Authors: Raffles Xingqi Zhu, Charlie S. Burlingham, Olivier Mercier, Phillip Guan

A Texture Lookup Approach to Bézier Curve Evaluation on the GPU
http://arxiv.org/abs/2603.15447v1 (updated 2026-03-16T15:47:33Z)

We present a texture-based technique for evaluating Bézier curves on the GPU that leverages fixed-function linear texture interpolation hardware. By offloading curve evaluation to the texture interpolator, this approach can improve performance in compute-bound GPU workloads. The method extends naturally to Bézier surfaces and volumes, as well as to advanced curve types such as B-splines, NURBS, and both integral and rational polynomials. We show how Seiler interpolation fits into this framework to improve efficiency, and we compare performance and accuracy against curves evaluated as polynomials in shader code.
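
The abstract does not give the texture layout. The sketch below shows why linear filtering can evaluate a Bézier curve, using one possible layout that may not be the paper's:

- Store P_{i+j} in texel (i, j) of a 2×2 texture. A bilinear sample with weights (t, t) then returns the quadratic curve.
- Store P_{i+j+k} in texel (i, j, k) of a 2×2×2 texture. A trilinear sample with weights (t, t, t) then returns the cubic.

Python emulates the filtering hardware here, and `bilinear`/`trilinear` are hypothetical stand-ins for GPU sampling. On a real GPU, weight t on a two-texel axis corresponds to the normalized coordinate (t + 0.5) / 2. Hardware weights also have reduced precision; the CUDA Programming Guide documents 8 fractional bits. This precision is one reason accuracy has to be compared against shader-side polynomial evaluation.

```python
from math import comb

def lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t

def bilinear(tex, u, v):
    """Emulated linear filtering of a 2x2 texture indexed tex[j][i]."""
    return lerp(lerp(tex[0][0], tex[0][1], u), lerp(tex[1][0], tex[1][1], u), v)

def trilinear(tex, u, v, w):
    """Emulated linear filtering of a 2x2x2 texture indexed tex[k][j][i]."""
    return lerp(bilinear(tex[0], u, v), bilinear(tex[1], u, v), w)

def quadratic_via_texture(p0, p1, p2, t):
    # Texel (i, j) holds P_{i+j}; bilinear weights multiply to the Bernstein basis.
    tex = [[p0, p1],
           [p1, p2]]
    return bilinear(tex, t, t)

def cubic_via_texture(p, t):
    # Texel (i, j, k) holds P_{i+j+k}; C(3, m) texels share index m.
    tex = [[[p[i + j + k] for i in range(2)] for j in range(2)] for k in range(2)]
    return trilinear(tex, t, t, t)

def bernstein(p, t):
    """Reference evaluation in Bernstein form."""
    n = len(p) - 1
    return sum(comb(n, i) * t**i * (1 - t)**(n - i) * p[i] for i in range(n + 1))
```

For control points on a line, `cubic_via_texture([0.0, 1.0, 2.0, 3.0], 0.5)` returns 1.5, matching `bernstein`. Vector-valued control points would use one texture channel per coordinate.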

Published: 2026-03-16T15:47:33Z
Authors: Muhammad Anas, Alan Wolfe

Diverse Text-to-Image Generation via Contrastive Noise Optimization
http://arxiv.org/abs/2510.03813v3 (updated 2026-03-16T13:07:55Z)

Text-to-image (T2I) diffusion models have demonstrated impressive performance in generating high-fidelity images, largely enabled by text-guided inference. However, this advantage often comes with a critical drawback: limited diversity, as outputs tend to collapse into similar modes under strong text guidance. Existing approaches typically optimize intermediate latents or text conditions during inference, but these methods deliver only modest gains or remain sensitive to hyperparameter tuning. In this work, we introduce Contrastive Noise Optimization, a simple yet effective method that addresses the diversity issue from a distinct perspective. Unlike prior techniques that adapt intermediate latents, our approach shapes the initial noise to promote diverse outputs. Specifically, we develop a contrastive loss defined in the Tweedie data space and optimize a batch of noise latents. Our contrastive optimization repels instances within the batch to maximize diversity while keeping them anchored to a reference sample to preserve fidelity. We further provide theoretical insights into the mechanism of this preprocessing to substantiate its effectiveness. Extensive experiments across multiple T2I backbones demonstrate that our approach achieves a superior quality-diversity Pareto frontier while remaining robust to hyperparameter choices.
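
The abstract does not give the exact loss, so the sketch below is hypothetical. It illustrates the two ingredients the abstract names, assuming a DDPM-style schedule (`alpha_bar`) and an InfoNCE-style form:

- a Tweedie estimate of the clean sample, computed from a noisy latent and a predicted noise;
- a loss that pulls each estimate toward its own reference while pushing it away from the other batch members.

Optimizing the initial noise would additionally require autodiff through the denoiser, which is omitted. All function names are illustrative.

```python
import math

def tweedie_x0(x_t, eps_hat, alpha_bar):
    """DDPM/Tweedie clean-sample estimate:
    x0 = (x_t - sqrt(1 - alpha_bar) * eps_hat) / sqrt(alpha_bar)."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(x_t, eps_hat)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def contrastive_loss(estimates, references, tau=0.5):
    """InfoNCE-style loss averaged over the batch.
    Positive pair: (estimate i, reference i) keeps fidelity.
    Negatives: (estimate i, estimate j), j != i, are repelled for diversity."""
    n = len(estimates)
    total = 0.0
    for i in range(n):
        pos = math.exp(cosine(estimates[i], references[i]) / tau)
        neg = sum(math.exp(cosine(estimates[i], estimates[j]) / tau)
                  for j in range(n) if j != i)
        total += -math.log(pos / (pos + neg))
    return total / n
```

A collapsed batch, where every estimate is identical, scores log(n). A batch of mutually orthogonal estimates that match their references scores lower, which is the direction such an objective would push the noise.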

Published: 2025-10-04T13:51:32Z
Comment: Accepted to ICLR 2026
Authors: Byungjun Kim, Soobin Um, Jong Chul Ye

Adaptive GPU Kinetic Solver for Fluid-Granular Flows
http://arxiv.org/abs/2603.14982v1 (updated 2026-03-16T08:45:46Z)

Simulating fluid-granular flows is crucial for understanding natural disasters, industrial processes, and visually realistic phenomena in computer graphics. These systems are challenging to simulate because of the strong nonlinear coupling between continuum fluids and discrete granular media, making it difficult to achieve both physical fidelity and computational efficiency at large scales. In this work, we present a unified framework for large-scale fluid-granular simulation that couples the Lattice Boltzmann Method (LBM) for fluids with the Material Point Method (MPM) for granular materials such as sand and snow. We introduce an adaptive block-based multi-level HOME-LBM solver based on solid geometric structures, enabling efficient memory usage and computational performance across multiple lattice resolutions. Consistent rescaling laws for moments allow accurate transfer of macroscopic quantities across refinement interfaces, while a GPU-based algorithm dynamically maintains the multi-level blocks in response to particle motion. By enforcing that all MPM particles reside within the finest fluid nodes, we achieve accurate two-way coupling between fluid and granular phases. Our framework supports a wide range of large-scale phenomena, including snow avalanches, sandstorms, and sand migration, demonstrating high physical fidelity and computational efficiency.
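
The abstract does not detail HOME-LBM's high-order moment scheme. As background only, the sketch below shows the standard D2Q9 BGK lattice Boltzmann equilibrium and collision; it is not the paper's solver. It illustrates the moment relations that any rescaling across refinement levels must respect: the zeroth moment of the distribution is density, the first is momentum, and collision conserves both.

```python
# Standard D2Q9 lattice velocities and weights (lattice units, c_s^2 = 1/3).
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
CS2 = 1.0 / 3.0

def equilibrium(rho, ux, uy):
    """Second-order truncated Maxwell-Boltzmann equilibrium f_i^eq."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + cu / CS2 + cu * cu / (2 * CS2 * CS2) - usq / (2 * CS2)))
    return feq

def moments(f):
    """Zeroth (density) and first (momentum) moments of a distribution."""
    rho = sum(f)
    jx = sum(fi * cx for fi, (cx, _) in zip(f, C))
    jy = sum(fi * cy for fi, (_, cy) in zip(f, C))
    return rho, jx, jy

def bgk_collide(f, tau):
    """Single-relaxation-time collision: relax f toward the equilibrium built
    from its own moments, so density and momentum are unchanged."""
    rho, jx, jy = moments(f)
    feq = equilibrium(rho, jx / rho, jy / rho)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]
```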

Published: 2026-03-16T08:45:46Z
Authors: Xingqiao Li, Kui Wu, Haozhe Su, Tianhong Gao, Mengyu Chu, Chenfanfu Jiang, Wei Li, Baoquan Chen

From Particles to Fields: Reframing Photon Mapping with Continuous Gaussian Photon Fields
http://arxiv.org/abs/2512.12459v2 (updated 2026-03-16T02:56:53Z)

Accurately modeling light transport is essential for realistic image synthesis. Photon mapping provides physically grounded estimates of complex global illumination effects such as caustics and specular-diffuse interactions, yet its per-view radiance estimation remains computationally inefficient when rendering multiple views of the same scene. The inefficiency arises from independent photon tracing and stochastic kernel estimation at each viewpoint, leading to inevitable redundant computation. To accelerate multi-view rendering, we reformulate photon mapping as a continuous and reusable radiance function. Specifically, we introduce the Gaussian Photon Field (GPF), a learnable representation that encodes photon distributions as anisotropic 3D Gaussian primitives parameterized by position, rotation, scale, and spectrum. GPF is initialized from physically traced photons in the first SPPM iteration and optimized using multi-view supervision of final radiance, distilling photon-based light transport into a continuous field. Once trained, the field enables differentiable radiance evaluation along camera rays without repeated photon tracing or iterative refinement. Extensive experiments on scenes with complex light transport, such as caustics and specular-diffuse interactions, demonstrate that GPF attains photon-level accuracy while reducing computation by orders of magnitude, unifying the physical rigor of photon-based rendering with the efficiency of neural scene representations.
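
The sketch below evaluates one anisotropic 3D Gaussian primitive parameterized by position, rotation, and scale. It assumes the covariance parameterization common in 3D Gaussian splatting, Σ = R S Sᵀ Rᵀ, with R built from a unit quaternion and S = diag(scale). The spectrum attribute and the paper's radiance decoding are omitted, and the function names are illustrative.

```python
import math

def quat_to_rot(w, x, y, z):
    """Rotation matrix from a quaternion (normalized here)."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def gaussian_weight(p, mean, quat, scale):
    """Unnormalized exp(-0.5 d^T Σ^{-1} d) with Σ = R S S^T R^T.
    Since Σ^{-1} = R S^{-2} R^T, rotate d into the local frame and divide by scale."""
    R = quat_to_rot(*quat)
    d = [p[i] - mean[i] for i in range(3)]
    local = [sum(R[r][c] * d[r] for r in range(3)) for c in range(3)]  # R^T d
    m = sum((local[c] / scale[c]) ** 2 for c in range(3))
    return math.exp(-0.5 * m)
```

A point one standard deviation from the mean along any principal axis receives weight exp(-0.5), regardless of the rotation.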

Published: 2025-12-13T21:09:09Z
Authors: Jiachen Tao, Benjamin Planche, Van Nguyen Nguyen, Junyi Wu, Yuchun Liu, Haoxuan Wang, Zhongpai Gao, Gengyu Zhang, Meng Zheng, Feiran Wang, Anwesa Choudhuri, Zhenghao Zhao, Weitai Kang, Terrence Chen, Yan Yan, Ziyan Wu

4D Synchronized Fields: Motion-Language Gaussian Splatting for Temporal Scene Understanding
http://arxiv.org/abs/2603.14301v1 (updated 2026-03-15T09:32:58Z)

Current 4D representations decouple geometry, motion, and semantics: reconstruction methods discard interpretable motion structure; language-grounded methods attach semantics after motion is learned, blind to how objects move; and motion-aware methods encode dynamics as opaque per-point residuals without object-level organization. We propose 4D Synchronized Fields, a 4D Gaussian representation that learns object-factored motion in-loop during reconstruction and synchronizes language to the resulting kinematics through a per-object conditioned field. Each Gaussian trajectory is decomposed into shared object motion plus an implicit residual, and a kinematic-conditioned ridge map predicts temporal semantic variation, yielding a single representation in which reconstruction, motion, and semantics are structurally coupled and enabling open-vocabulary temporal queries that retrieve both objects and moments. On HyperNeRF, 4D Synchronized Fields achieves 28.52 dB mean PSNR, the highest among all language-grounded and motion-aware baselines, within 1.5 dB of reconstruction-only methods. On targeted temporal-state retrieval, the kinematic-conditioned field attains 0.884 mean accuracy, 0.815 mean vIoU, and 0.733 mean tIoU, surpassing 4D LangSplat (0.620, 0.433, and 0.439 respectively) and LangSplat (0.415, 0.304, and 0.262). Ablation confirms that kinematic conditioning is the primary driver, accounting for +0.45 tIoU over a static-embedding-only baseline. 4D Synchronized Fields is the only method that jointly exposes interpretable motion primitives and temporally grounded language fields from a single trained representation. Code will be released.
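
The abstract reports tIoU without defining it. The sketch below shows temporal IoU in its common forms, assuming either single time intervals or sets of frame indices. The paper's evaluation protocol may differ.

```python
def temporal_iou(pred, gt):
    """IoU of two time intervals given as (start, end) with start <= end."""
    (ps, pe), (gs, ge) = pred, gt
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0

def temporal_iou_frames(pred_frames, gt_frames):
    """IoU of two sets of frame indices."""
    pred_frames, gt_frames = set(pred_frames), set(gt_frames)
    union = pred_frames | gt_frames
    return len(pred_frames & gt_frames) / len(union) if union else 0.0
```

For instance, a prediction spanning [0, 10] against ground truth [5, 15] overlaps for 5 units out of a 15-unit union, giving a tIoU of 1/3.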

Published: 2026-03-15T09:32:58Z
Comment: 34 pages, 3 figures, 7 tables. Includes supplementary material. Preprint
Authors: Mohamed Rayan Barhdadi, Samir Abdaljalil, Rasul Khanbayov, Erchin Serpedin, Hasan Kurban

Towards High-Fidelity Gaussian Splatting with Queried-Convolution Neural Networks
http://arxiv.org/abs/2512.12898v2 (updated 2026-03-15T01:33:21Z)

Gaussian Splatting has revolutionized Novel View Synthesis (NVS) with faster training and real-time rendering. However, its reconstruction fidelity still trails that of powerful radiance models such as Zip-NeRF. Motivated by our theoretical result that both queries (such as coordinates) and their neighborhoods are important for learning high-fidelity signals, this paper proposes Queried-Convolutions (Qonvolutions), a simple yet powerful modification that exploits the neighborhood properties of convolution. Qonvolutions convolve a low-fidelity signal with queries to output a residual, achieving high-fidelity reconstruction. We empirically demonstrate that combining Gaussian splatting with Qonvolution neural networks (QNNs) yields state-of-the-art NVS on real-world scenes, even outperforming Zip-NeRF in image fidelity. QNNs also improve performance on 1D regression, 2D regression, and 2D super-resolution tasks.

Published: 2025-12-15T00:46:09Z
Comment: 38 pages, 8 figures, Project Page: https://abhi1kumar.github.io/qonvolution/
Authors: Abhinav Kumar, Tristan Aumentado-Armstrong, Lazar Valkov, Gopal Sharma, Alex Levinshtein, Radek Grzeszczuk, Suren Kumar

M-ABD: Scalable, Efficient, and Robust Multi-Affine-Body Dynamics
http://arxiv.org/abs/2603.08079v2 (updated 2026-03-13T22:57:12Z)

Simulating large-scale articulated assemblies poses a significant challenge due to the numerical stiffness and geometric complexity of jointed structures. Conventional rigid body solvers struggle with the high nonlinearity induced by rotation parameterization. This difficulty becomes more pronounced for multiple two-way-coupled bodies. This paper introduces a novel framework that leverages the linear kinematic mapping of Affine Body Dynamics (ABD). As ABD targets near-rigid objects, the constitutive variations of different materials become negligible, which justifies a co-rotational approach to isolate geometric nonlinearities of the system. This insight enables the use of constant system matrices that can be pre-factorized throughout the simulation, even with fully implicit integration schemes. To manage the high DOF counts of large-scale systems, we map primal body coordinates onto a compact dual space defined by minimal joint degrees of freedom. By solving the resulting KKT systems, our method ensures exact constraint enforcement and physically accurate motion propagation. We provide a suite of specialized solvers tailored for diverse joint topologies, including chains, trees, closed loops, and irregular networks. Experimental results show that our approach achieves interactive rates for systems with hundreds of thousands of bodies on a single CPU core, while maintaining excellent stability at large time steps.
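
The sketch below shows the linear kinematic mapping the abstract refers to. It assumes the usual Affine Body Dynamics state q = (p, a1, a2, a3), where p is the translation and the a_i are the rows of the affine matrix A. A rest-shape point x̄ then maps to x = p + A x̄ = J(x̄) q, with a 3×12 Jacobian that depends only on x̄.

Because J is constant in time, quantities assembled from it (such as a mass matrix Σ m JᵀJ) are also constant. Constant matrices of this kind are what permit pre-factorization. Joints, co-rotation, and the KKT solve are not shown, and the function names are illustrative.

```python
def abd_jacobian(xbar):
    """3x12 Jacobian J(xbar) such that x = J(xbar) @ q,
    with q = [p (3), a1 (3), a2 (3), a3 (3)] and a_i the rows of A."""
    J = [[0.0] * 12 for _ in range(3)]
    for i in range(3):
        J[i][i] = 1.0  # translation component p_i
        for k in range(3):
            J[i][3 + 3 * i + k] = xbar[k]  # row a_i of A dotted with xbar
    return J

def apply_jacobian(J, q):
    """Matrix-vector product J @ q for the 3x12 Jacobian."""
    return [sum(J[r][c] * q[c] for c in range(12)) for r in range(3)]
```

With A = I and p = 0, every material point stays at its rest position; the mapping is linear in q even though the deformed positions can rotate and shear.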

Published: 2026-03-09T08:17:46Z
Comment: 18 pages, 22 figures
Authors: Zhiyong He (University of Utah), Dewen Guo (University of Utah), Minghao Guo (MIT), Yili Zhao (USC), Wojciech Matusik (MIT), Hao Su (UCSD), Chenfanfu Jiang (UCLA), Peter Yichen Chen (UBC), Yin Yang (University of Utah)

EgoGrasp: World-Space Hand-Object Interaction Estimation from Egocentric Videos
http://arxiv.org/abs/2601.01050v2 (updated 2026-03-13T21:58:25Z)

We propose EgoGrasp, the first method to reconstruct world-space hand-object interactions (W-HOI) from dynamic egocentric videos while supporting open-vocabulary objects. Accurate W-HOI reconstruction is critical for embodied intelligence yet remains challenging. Existing HOI methods are largely restricted to local camera coordinates or single frames, failing to capture global temporal dynamics. While some recent approaches attempt world-space hand estimation, they overlook object poses and HOI constraints. Moreover, previous HOI estimation methods either fail to handle open-set categories because they rely on object templates, or employ differentiable rendering that requires per-instance optimization, resulting in prohibitive computational costs. Finally, frequent occlusions in egocentric videos severely degrade performance. To overcome these challenges, we propose a multi-stage framework: (i) a robust pre-processing pipeline that leverages vision foundation models for initial 3D scene, hand, and object reconstruction; (ii) a body-guided diffusion model that incorporates explicit egocentric body priors for hand pose estimation; and (iii) an HOI-prior-informed diffusion model for hand-aware 6DoF pose infilling, ensuring physically plausible and temporally consistent W-HOI estimation. We experimentally demonstrate that EgoGrasp achieves state-of-the-art performance in W-HOI reconstruction, robustly handling multiple objects and open-vocabulary categories.

Published: 2026-01-03T03:08:48Z
Authors: Hongming Fu, Wenjia Wang, Xiaozhen Qiao, Rolandos Alexandros Potamias, Taku Komura, Shuo Yang, Zheng Liu, Bo Zhao

ExCellGen: Fast, Controllable, Photorealistic 3D Scene Generation from a Single Real-World Exemplar
http://arxiv.org/abs/2412.16253v2 (updated 2026-03-13T15:36:59Z)

Photorealistic 3D scene generation is challenging due to the scarcity of large-scale, high-quality real-world 3D datasets and complex workflows requiring specialized expertise for manual modeling. These constraints often result in slow iteration cycles, where each modification demands substantial effort, ultimately stifling creativity. We propose a fast, exemplar-driven framework for generating 3D scenes from a single casual input, such as handheld video or drone footage. Our method first leverages 3D Gaussian Splatting (3DGS) to robustly reconstruct input scenes with a high-quality 3D appearance model. We then train a per-scene Generative Cellular Automaton (GCA) to produce a sparse volume of featurized voxels, effectively amortizing scene generation while enabling controllability. A subsequent patch-based remapping step composites the complete scene from the exemplar's initial 3D Gaussian splats, successfully recovering the appearance statistics of the input scene. The entire pipeline can be trained in less than 10 minutes for each exemplar and generates scenes in 0.5-2 seconds. Our method enables interactive creation with full user control, and we showcase complex 3D generation results from real-world exemplars within a self-contained interactive GUI.

Published: 2024-12-20T04:39:50Z
Authors: Clément Jambon, Changwoon Choi, Dongsu Zhang, Olga Sorkine-Hornung, Young Min Kim

Variation-aware Flexible 3D Gaussian Editing
http://arxiv.org/abs/2602.11638v3 (updated 2026-03-13T13:10:45Z)

Indirect editing methods for 3D Gaussian Splatting (3DGS) have recently witnessed significant advancements. These approaches operate by first applying edits in the rendered 2D space and subsequently projecting the modifications back into 3D. However, this paradigm inevitably introduces cross-view inconsistencies and constrains both the flexibility and efficiency of the editing process. To address these challenges, we present VF-Editor, which enables native editing of Gaussian primitives by predicting attribute variations in a feedforward manner. To accurately and efficiently estimate these variations, we design a novel variation predictor distilled from 2D editing knowledge. The predictor encodes the input to generate a variation field and employs two learnable, parallel decoding functions to iteratively infer attribute changes for each 3D Gaussian. Thanks to its unified design, VF-Editor can seamlessly distill editing knowledge from diverse 2D editors and strategies into a single predictor, allowing for flexible and effective knowledge transfer into the 3D domain. Extensive experiments on both public and private datasets reveal the inherent limitations of indirect editing pipelines and validate the effectiveness and flexibility of our approach.

Published: 2026-02-12T06:43:04Z
Authors: Hao Qin, Yukai Sun, Meng Wang, Ming Kong, Mengxu Lu, Qiang Zhu

HOI-Brain: a novel multi-channel transformers framework for brain disorder diagnosis by accurately extracting signed higher-order interactions from fMRI
http://arxiv.org/abs/2507.20205v5 (updated 2026-03-13T10:15:54Z)

Accurately characterizing higher-order interactions among brain regions and extracting interpretable organizational patterns from functional magnetic resonance imaging (fMRI) data is crucial for brain disease diagnosis. Current graph-based deep learning models focus primarily on pairwise or triadic patterns while neglecting signed higher-order interactions, which limits a comprehensive understanding of brain-wide communication. We propose HOI-Brain, a novel computational framework that leverages signed higher-order interactions and organizational patterns in fMRI data for brain disease diagnosis. First, we introduce a co-fluctuation measure based on Multiplication of Temporal Derivatives to detect higher-order interactions with temporal resolution. We then distinguish positive and negative synergistic interactions, encoding them in signed weighted simplicial complexes to reveal insights into brain communication. Using Persistent Homology theory, we apply two filtration processes to these complexes to extract signed higher-dimensional neural organizations spatiotemporally. Finally, we propose a multi-channel brain Transformer to integrate heterogeneous topological features. Experiments on Alzheimer's disease, Parkinson's syndrome, and autism spectrum disorder datasets demonstrate our framework's superiority, effectiveness, and interpretability. The identified key brain regions and higher-order patterns align with the neuroscience literature, providing meaningful biological insights.
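
The sketch below is a hypothetical higher-order co-fluctuation based on Multiplication of Temporal Derivatives. It assumes the pairwise MTD form: a product of first differences, each divided by its standard deviation. It extends that form to k regions by multiplying k standardized derivatives at each time point; the sign of each product is what a positive/negative split would act on. The paper's exact normalization, windowing, and sign convention may differ, and `mtd_coflux` is an illustrative name.

```python
import statistics

def temporal_derivative(x):
    """First difference of a time series."""
    return [b - a for a, b in zip(x, x[1:])]

def mtd_coflux(signals):
    """Per-time-point product of standardized temporal derivatives across
    len(signals) regions (two regions gives the pairwise MTD form).
    Assumes no region's derivative is constant (nonzero standard deviation)."""
    derivs = [temporal_derivative(s) for s in signals]
    sds = [statistics.pstdev(d) for d in derivs]
    out = []
    for t in range(len(derivs[0])):
        prod = 1.0
        for d, sd in zip(derivs, sds):
            prod *= d[t] / sd
        out.append(prod)
    return out
```

Two regions that rise and fall together give nonnegative values at every time point; anti-correlated regions give nonpositive values.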

Published: 2025-07-27T10:05:30Z
Comment: accepted by Medical Image Analysis
Authors: Dengyi Zhao, Zhiheng Zhou, Guiying Yan, Dongxiao Yu, Xingqin Qi
DOI: 10.1016/j.media.2026.104009

Towards Interactive Intelligence for Digital Humans
http://arxiv.org/abs/2512.13674v2 (updated 2026-03-13T09:23:03Z)

We introduce Interactive Intelligence, a novel paradigm for digital humans capable of personality-aligned expression, adaptive interaction, and self-evolution. To realize this, we present Mio (Multimodal Interactive Omni-Avatar), an end-to-end framework composed of five specialized modules: Thinker, Talker, Face Animator, Body Animator, and Renderer. This unified architecture integrates cognitive reasoning with real-time multimodal embodiment to enable fluid, consistent interaction. Furthermore, we establish a new benchmark to rigorously evaluate the capabilities of interactive intelligence. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods across all evaluated dimensions. Together, these contributions move digital humans beyond superficial imitation toward intelligent interaction.

Published: 2025-12-15T18:57:35Z
Authors: Yiyi Cai, Xuangeng Chu, Xiwei Gao, Sitong Gong, Yifei Huang, Caixin Kang, Kunhang Li, Haiyang Liu, Ruicong Liu, Yun Liu, Dianwen Ng, Zixiong Su, Erwin Wu, Yuhan Wu, Dingkun Yan, Tianyu Yan, Chang Zeng, Bo Zheng, You Zhou

NeurFrame: Learning Continuous Frame Fields for Structured Mesh Generation
http://arxiv.org/abs/2603.12820v1 (updated 2026-03-13T09:20:34Z)

Structured meshes, composed of quadrilateral elements in 2D and hexahedral elements in 3D, are widely used in industrial applications and engineering simulations due to their regularity and superior accuracy in finite element analysis. Generating high-quality structured meshes, however, remains challenging, especially for complex geometries and singularities. Field-guided approaches, which construct cross fields in 2D and frame fields in 3D to encode element orientation, are promising but are typically defined on discrete meshes, limiting continuity and computational efficiency. To address these challenges, we introduce NeurFrame, a neural framework that represents frame fields continuously over the domain, supporting infinite-resolution evaluation. Trained in a self-supervised manner on discrete mesh samples, NeurFrame produces smooth, high-quality frame fields without relying on dense tetrahedral discretizations. The resulting fields simultaneously guide high-quality quadrilateral surface meshes and hexahedral volumetric meshes, with fewer and better-distributed singularities. By using a single network, NeurFrame also achieves lower computational cost compared to prior self-supervised neural methods that jointly optimize multiple fields.
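
The abstract does not say how NeurFrame encodes frames. As background, the sketch below shows the standard representation for 2D cross fields. A cross is unchanged by 90° rotations, so an angle θ is stored as (cos 4θ, sin 4θ). This makes θ and θ + π/2 identical and turns field smoothness into a simple vector difference. The 3D frame-field case is more involved and is not shown, and the function names are illustrative.

```python
import math

def encode_cross(theta):
    """Map a cross direction (radians) to a rotation-invariant 2-vector."""
    return (math.cos(4 * theta), math.sin(4 * theta))

def decode_cross(v):
    """Recover a representative angle in [0, pi/2) from an encoded cross."""
    ang = math.atan2(v[1], v[0]) / 4.0  # in (-pi/4, pi/4]
    return ang % (math.pi / 2)

def cross_smoothness(thetas):
    """Discrete smoothness energy: squared differences between consecutive
    encoded crosses along a polyline."""
    e = 0.0
    for a, b in zip(thetas, thetas[1:]):
        va, vb = encode_cross(a), encode_cross(b)
        e += (va[0] - vb[0]) ** 2 + (va[1] - vb[1]) ** 2
    return e
```

Crosses at 0.1 and 0.1 + π/2 radians are the same cross, so `cross_smoothness` assigns them (numerically) zero energy.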

Published: 2026-03-13T09:20:34Z
Authors: Xiaoyang Yu, Canjia Huang, Zhonggui Chen, Juan Cao

SPRig: Self-Supervised Pose-Invariant Rigging from Mesh Sequences
http://arxiv.org/abs/2602.12740v2 (updated 2026-03-13T07:30:57Z)

State-of-the-art rigging methods typically assume a predefined canonical rest pose. However, this assumption does not hold for dynamic mesh sequences such as DyMesh or DT4D, where no canonical T-pose is available. When applied independently frame by frame, existing methods lack pose invariance and often yield temporally inconsistent topologies. To address this limitation, we propose SPRig, a general fine-tuning framework that enforces cross-frame consistency across a sequence to learn pose-invariant rigs on top of existing models, covering both skeleton and skinning generation. For skeleton generation, we introduce novel consistency regularization in both token space and geometry space. For skinning, we improve temporal stability through an articulation-invariant consistency loss combined with consistency distillation and structural regularization. Extensive experiments show that SPRig achieves superior temporal coherence and significantly reduces the artifacts of prior methods without sacrificing, and often even enhancing, per-frame static generation quality. The code is available in the supplemental material and will be made publicly available upon publication.
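
The abstract does not define its consistency losses, so the sketch below is hypothetical and is not SPRig's articulation-invariant loss. It illustrates the general idea: if a rig is pose-invariant, the skinning weights predicted for the same vertex should not change from frame to frame. One simple penalty is therefore the mean squared deviation of per-frame weights from their cross-frame mean.

```python
def skinning_consistency(weights):
    """weights[f][v][b]: weight of bone b on vertex v predicted at frame f.
    Returns the mean squared deviation from the per-(vertex, bone) mean over frames."""
    n_frames = len(weights)
    n_verts = len(weights[0])
    n_bones = len(weights[0][0])
    total = 0.0
    for v in range(n_verts):
        for b in range(n_bones):
            mean = sum(weights[f][v][b] for f in range(n_frames)) / n_frames
            total += sum((weights[f][v][b] - mean) ** 2 for f in range(n_frames))
    return total / (n_frames * n_verts * n_bones)
```

Identical predictions across frames give zero. A vertex whose weight flips between two bones from one frame to the next is penalized.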

Published: 2026-02-13T09:08:50Z
Comment: Code: https://github.com/WANG-Ruipeng/SPRig
Authors: Ruipeng Wang, Langkun Zhong, Miaowei Wang