https://arxiv.org/api/G9GwyFNgHmKvwzhyNIHWZngFmv4 2026-03-24T11:29:25Z 8847 45 15 http://arxiv.org/abs/2603.14927v2 Masked BRep Autoencoder via Hierarchical Graph Transformer 2026-03-17T03:30:12Z We introduce a novel self-supervised learning framework that automatically learns representations from input computer-aided design (CAD) models for downstream tasks, including part classification, modeling segmentation, and machining feature recognition. To train our network, we construct a large-scale, unlabeled dataset of boundary representation (BRep) models. The success of our algorithm relies on two key components. The first is a masked graph autoencoder that reconstructs randomly masked geometries and attributes of BReps for representation learning to enhance the generalization. The second is a hierarchical graph Transformer architecture that elegantly fuses global and local learning by a cross-scale mutual attention block to model long-range geometric dependencies and a graph neural network block to aggregate local topological information. After training the autoencoder, we replace its decoder with a task-specific network trained on a small amount of labeled data for downstream tasks. We conduct experiments on various tasks and achieve high performance, even with a small amount of labeled data, demonstrating the practicality and generalizability of our model. Compared to other methods, our model performs significantly better on downstream tasks with the same amount of training data, particularly when the training data is very limited. 2026-03-16T07:30:11Z 27 pages, 11 figures. Under review Yifei Li Kang Wu Wenming Wu Xiao-Ming Fu http://arxiv.org/abs/2603.16078v1 Volumetrically Consistent Implicit Atlas Learning via Neural Diffeomorphic Flow for Placenta MRI 2026-03-17T02:55:02Z Establishing dense volumetric correspondences across anatomical shapes is essential for group-level analysis but remains challenging for implicit neural representations. 
Most existing implicit registration methods rely on supervision near the zero-level set and thus capture only surface correspondences, leaving interior deformations under-constrained. We introduce a volumetrically consistent implicit model that couples reconstruction of signed distance functions (SDFs) with neural diffeomorphic flow to learn a shared canonical template of the placenta. Volumetric regularization, including Jacobian-determinant and biharmonic penalties, suppresses local folding and promotes globally coherent deformations. In the motivating application to placenta MRI, our formulation jointly reconstructs individual placentas, aligns them to a population-derived implicit template, and enables voxel-wise intensity mapping in a unified canonical space. Experiments on in-vivo placenta MRI scans demonstrate improved geometric fidelity and volumetric alignment over surface-based implicit baseline methods, yielding anatomically interpretable and topologically consistent flattening suitable for group analysis. 2026-03-17T02:55:02Z Athena Taymourtash S. Mazdak Abulnaga Esra Abaci Turk P. Ellen Grant Polina Golland http://arxiv.org/abs/2603.16057v1 Toward Reliable Scientific Visualization Pipeline Construction with Structure-Aware Retrieval-Augmented LLMs 2026-03-17T01:52:11Z Scientific visualization pipelines encode domain-specific procedural knowledge with strict execution dependencies, making their construction sensitive to missing stages, incorrect operator usage, or improper ordering. Thus, generating executable scientific visualization pipelines from natural-language descriptions remains challenging for large language models, particularly in web-based environments where visualization authoring relies on explicit code-level pipeline assembly. In this work, we investigate the reliability of LLM-based scientific visualization pipeline generation, focusing on vtk.js as a representative web-based visualization library. 
We propose a structure-aware retrieval-augmented generation workflow that provides pipeline-aligned vtk.js code examples as contextual guidance, supporting correct module selection, parameter configuration, and execution order. We evaluate the proposed workflow across multiple multi-stage scientific visualization tasks and LLMs, measuring reliability in terms of pipeline executability and human correction effort. To this end, we introduce correction cost as a metric for the amount of manual intervention required to obtain a valid pipeline. Our results show that structured, domain-specific context substantially improves pipeline executability and reduces correction cost. We additionally provide an interactive analysis interface to support human-in-the-loop inspection and systematic evaluation of generated visualization pipelines. 2026-03-17T01:52:11Z Guanghui Zhao Zhe Wang Yu Dong Guan Li GuiHua Shan http://arxiv.org/abs/2603.15991v1 The Midas Touch in Gaze vs. Hand Pointing: Modality-Specific Failure Modes and Implications for XR Interfaces 2026-03-16T23:03:26Z Extended Reality (XR) interfaces impose both ergonomic and cognitive demands, yet current systems often force a binary choice between hand-based input, which can produce fatigue, and gaze-based input, which is vulnerable to the Midas Touch problem and precision limitations. We introduce the xr-adaptive-modality-2025 platform, a web-based open-source framework for studying whether modality-specific adaptive interventions can improve XR-relevant pointing performance and reduce workload relative to static unimodal interaction. The platform combines physiologically informed gaze simulation, an ISO 9241-9 multidirectional tapping task, and two modality-specific adaptive interventions: gaze declutter and hand target-width inflation. We evaluated the system in a 2 x 2 x 2 within-subjects design manipulating Modality (Hand vs. Gaze), UI Mode (Static vs. Adaptive), and Pressure (Yes vs. No). 
Results from N=69 participants show that hand yielded higher throughput than gaze (5.17 vs. 4.73 bits/s), lower error (1.8% vs. 19.1%), and lower NASA-TLX workload. Crucially, error profiles differed sharply by modality: gaze errors were predominantly slips (99.2%), whereas hand errors were predominantly misses (95.7%), consistent with the Midas Touch account. Of the two adaptive interventions, only gaze declutter executed in this dataset; it modestly reduced timeouts but not slips. Hand width inflation was not evaluable due to a UI integration bug. These findings reveal modality-specific failure modes with direct implications for adaptive policy design, and establish the platform as a reproducible infrastructure for future studies. 2026-03-16T23:03:26Z 25 pages, 10 figures Mohammad Dastgheib Fatemeh Pourmahdian http://arxiv.org/abs/2502.05175v2 Fillerbuster: Unified Generative Scene Completion Model for Casual Captures 2026-03-16T22:10:13Z We present Fillerbuster, a unified model that completes unknown regions of a 3D scene with a multi-view latent diffusion transformer. Casual captures are often sparse and miss surrounding content behind objects or above the scene. Existing methods are not suitable for this challenge as they focus on making known pixels look good with sparse-view priors, or on creating missing sides of objects from just one or two photos. In reality, we often have hundreds of input frames and want to complete areas that are missing and unobserved from the input frames. Our solution is to train a generative model that can consume a large context of input frames while generating unknown target views and recovering image poses when camera parameters are unknown. We show results where we complete partial captures on two existing datasets. We also present an uncalibrated scene completion task where our unified model predicts both poses and creates new content. 
We open-source our framework for integration into popular reconstruction platforms like Nerfstudio or Gsplat. We present a flexible, unified inpainting framework to predict many images and poses together, where all inputs are jointly inpainted, and it could be extended to predict more modalities such as depth. 2025-02-07T18:59:51Z Project page at https://ethanweber.me/fillerbuster/ Ethan Weber Norman Müller Yash Kant Vasu Agrawal Michael Zollhöfer Angjoo Kanazawa Christian Richardt http://arxiv.org/abs/2603.15796v1 Perceptual Requirements for Low-Latency Head-Mounted Displays 2026-03-16T18:26:55Z End-to-end (e2e) latency in head-mounted displays (HMD) is the time delay between a physical change in the world (e.g., a user's head movement) and the moment the display updates to reflect that change. Tracking, rendering, and other computation in real systems invariably introduce some amount of e2e latency to all HMDs. In modern devices this latency is usually in the range of 12-60 milliseconds and is partially addressed through pose prediction and late-stage reprojection, which means that perceptual studies and user experience evaluations cannot explore latencies below these values. Here, we introduce a video passthrough HMD, called Camsicle, which is capable of 2-millisecond e2e latency and, additionally, uses a catadioptric design to achieve perspective-correct passthrough without reprojection. This platform enables naturalistic user studies to interrogate the impacts of latency on user experience, preference, and performance. Across two user studies and 57 participants, we find that 2 and 14.3 millisecond latencies are preferred over 23 and 29 milliseconds when attempting to catch a ball. 
Additionally, we compare individual latency preferences in this naturalistic ball-catching task to psychophysical thresholds for latency detection in a reference-grade system with zero latency to investigate how psychophysical thresholds may relate to subjective evaluations in naturalistic scenarios. 2026-03-16T18:26:55Z Eric Penner Josephine D'Angelo Clinton Smith Nathan Matsuda Neethan Siva Phillip Guan http://arxiv.org/abs/2603.15780v1 Parallelised Differentiable Straightest Geodesics for 3D Meshes 2026-03-16T18:10:28Z Machine learning has been progressively generalised to operate within non-Euclidean domains, but geometrically accurate methods for learning on surfaces are still falling behind. The lack of closed-form Riemannian operators, the non-differentiability of their discrete counterparts, and poor parallelisation capabilities have been the main obstacles to the development of the field on meshes. A principled framework to compute the exponential map on Riemannian surfaces discretised as meshes is straightest geodesics, which also allows geodesics to be traced and vectors to be parallel-transported as a by-product. We provide a parallel GPU implementation and derive two different methods for differentiating through the straightest geodesics, one leveraging an extrinsic proxy function and one based upon a geodesic finite differences scheme. After proving our parallelisation performance and accuracy, we demonstrate how our differentiable exponential map can improve learning and optimisation pipelines on general geometries. In particular, to showcase the versatility of our method, we propose a new geodesic convolutional layer, a new flow matching method for learning on meshes, and a second-order optimiser that we apply to centroidal Voronoi tessellation. Our code, models, and pip-installable library (digeo) are available at: circle-group.github.io/research/DSG. 
2026-03-16T18:10:28Z Accepted to CVPR 2026 Hippolyte Verninas Caner Korkmaz Stefanos Zafeiriou Tolga Birdal Simone Foti http://arxiv.org/abs/2502.17531v2 Laplace-Beltrami Operator for Gaussian Splatting 2026-03-16T18:02:34Z With the rising popularity of 3D Gaussian splatting and the expanse of applications from rendering to 3D reconstruction, there also comes a need for geometry processing applications directly on this new representation. While considering the centers of Gaussians as a point cloud or meshing them is an option that allows existing algorithms to be applied, this might ignore information present in the data or be unnecessarily expensive. Additionally, Gaussian splatting tends to contain a large number of outliers which do not affect the rendering quality but need to be handled correctly in order not to produce noisy results in geometry processing applications. In this work, we propose a formulation to compute the Laplace-Beltrami operator, a widely used tool in geometry processing, directly on Gaussian splatting using the Mahalanobis distance. While conceptually similar to a point cloud Laplacian, our experiments show superior accuracy on the point clouds encoded in the Gaussian splatting centers and, additionally, the operator can be used to evaluate the quality of the output during optimization. 2025-02-24T14:29:33Z 10 pages Hongyu Zhou Zorah Lähner http://arxiv.org/abs/2603.15546v1 Kimodo: Scaling Controllable Human Motion Generation 2026-03-16T17:09:30Z High-quality human motion data is becoming increasingly important for applications in robotics, simulation, and entertainment. Recent generative models offer a potential data source, enabling human motion synthesis through intuitive inputs like text prompts or kinematic constraints on poses. However, the small scale of public mocap datasets has limited the motion quality, control accuracy, and generalization of these models. 
In this work, we introduce Kimodo, an expressive and controllable kinematic motion diffusion model trained on 700 hours of optical motion capture data. Our model generates high-quality motions while being easily controlled through text and a comprehensive suite of kinematic constraints including full-body keyframes, sparse joint positions/rotations, 2D waypoints, and dense 2D paths. This is enabled through a carefully designed motion representation and two-stage denoiser architecture that decomposes root and body prediction to minimize motion artifacts while allowing for flexible constraint conditioning. Experiments on the large-scale mocap dataset justify key design decisions and analyze how the scaling of dataset size and model size affects performance. 2026-03-16T17:09:30Z Project page: https://research.nvidia.com/labs/sil/projects/kimodo/ Davis Rempe Mathis Petrovich Ye Yuan Haotian Zhang Xue Bin Peng Yifeng Jiang Tingwu Wang Umar Iqbal David Minor Michael de Ruyter Jiefeng Li Chen Tessler Edy Lim Eugene Jeong Sam Wu Ehsan Hassani Michael Huang Jin-Bey Yu Chaeyeon Chung Lina Song Olivier Dionne Jan Kautz Simon Yuen Sanja Fidler http://arxiv.org/abs/2505.23685v3 Perceptual Sensitivity to Stereo Geometry Errors in Head-Mounted Displays 2026-03-16T16:45:48Z Stereoscopic head-mounted displays (HMDs) render and present binocular images to create an egocentric, 3D percept to the HMD user. Within this render and presentation pipeline there are potential rendering camera and viewing position errors that can induce deviations in the depth and distance that a user perceives compared to the underlying intended geometry. For example, rendering errors can arise when HMD render cameras are incorrectly positioned relative to the assumed centers of projections of the HMD displays and viewing errors can arise when users view stereo geometry from the incorrect location in the HMD eyebox. 
In this work we present a geometric framework that predicts errors in distance perception arising from inaccurate HMD perspective geometry and build an HMD platform to reliably simulate render and viewing error in a Quest 3 HMD with eye tracking to experimentally test these predictions. We present a series of five experiments to explore the efficacy of this geometric framework and show that errors in perspective geometry can induce both under- and over-estimations in perceived distance. We further demonstrate how real-time visual feedback can be used to dynamically recalibrate visuomotor mapping so that an accurate reach distance is achieved even if the perceived visual distance is negatively impacted by geometric error. 2025-05-29T17:24:38Z Raffles Xingqi Zhu Charlie S. Burlingham Olivier Mercier Phillip Guan http://arxiv.org/abs/2603.15447v1 A Texture Lookup Approach to Bézier Curve Evaluation on the GPU 2026-03-16T15:47:33Z We present a texture-based technique for evaluating Bézier curves on the GPU that leverages fixed-function linear texture interpolation hardware. By offloading curve evaluation to the texture interpolator, this approach can improve performance in compute-bound GPU workloads. The method can also be used naturally for Bézier surfaces and volumes and extends to advanced curve types such as B-splines, NURBS, and both integral and rational polynomials. We show how Seiler interpolation fits into this framework to improve efficiency. We also compare performance and accuracy against curves evaluated as polynomials in shader code. 2026-03-16T15:47:33Z Muhammad Anas Alan Wolfe http://arxiv.org/abs/2510.03813v3 Diverse Text-to-Image Generation via Contrastive Noise Optimization 2026-03-16T13:07:55Z Text-to-image (T2I) diffusion models have demonstrated impressive performance in generating high-fidelity images, largely enabled by text-guided inference. 
However, this advantage often comes with a critical drawback: limited diversity, as outputs tend to collapse into similar modes under strong text guidance. Existing approaches typically optimize intermediate latents or text conditions during inference, but these methods deliver only modest gains or remain sensitive to hyperparameter tuning. In this work, we introduce Contrastive Noise Optimization, a simple yet effective method that addresses the diversity issue from a distinct perspective. Unlike prior techniques that adapt intermediate latents, our approach shapes the initial noise to promote diverse outputs. Specifically, we develop a contrastive loss defined in the Tweedie data space and optimize a batch of noise latents. Our contrastive optimization repels instances within the batch to maximize diversity while keeping them anchored to a reference sample to preserve fidelity. We further provide theoretical insights into the mechanism of this preprocessing to substantiate its effectiveness. Extensive experiments across multiple T2I backbones demonstrate that our approach achieves a superior quality-diversity Pareto frontier while remaining robust to hyperparameter choices. 2025-10-04T13:51:32Z Accepted to ICLR 2026 Byungjun Kim Soobin Um Jong Chul Ye http://arxiv.org/abs/2603.14982v1 Adaptive GPU Kinetic Solver for Fluid-Granular Flows 2026-03-16T08:45:46Z Simulating fluid-granular flows is crucial for understanding natural disasters, industrial processes, and visually realistic phenomena in computer graphics. These systems are challenging to simulate because of the strong nonlinear coupling between continuum fluids and discrete granular media, making it difficult to achieve both physical fidelity and computational efficiency at large scales. 
In this work, we present a unified framework for large-scale fluid-granular simulation that couples the Lattice Boltzmann Method (LBM) for fluids with the Material Point Method (MPM) for granular materials such as sand and snow. We introduce an adaptive block-based multi-level HOME-LBM solver based on solid geometric structures, enabling efficient memory usage and computational performance across multiple lattice resolutions. Consistent rescaling laws for moments allow accurate transfer of macroscopic quantities across refinement interfaces, while a GPU-based algorithm dynamically maintains the multi-level blocks in response to particle motion. By enforcing that all MPM particles reside within the finest fluid nodes, we achieve accurate two-way coupling between fluid and granular phases. Our framework supports a wide range of large-scale phenomena, including snow avalanches, sandstorms, and sand migration, demonstrating high physical fidelity and computational efficiency. 2026-03-16T08:45:46Z Xingqiao Li Kui Wu Haozhe Su Tianhong Gao Mengyu Chu Chenfanfu Jiang Wei Li Baoquan Chen http://arxiv.org/abs/2512.12459v2 From Particles to Fields: Reframing Photon Mapping with Continuous Gaussian Photon Fields 2026-03-16T02:56:53Z Accurately modeling light transport is essential for realistic image synthesis. Photon mapping provides physically grounded estimates of complex global illumination effects such as caustics and specular-diffuse interactions, yet its per-view radiance estimation remains computationally inefficient when rendering multiple views of the same scene. The inefficiency arises from independent photon tracing and stochastic kernel estimation at each viewpoint, leading to inevitable redundant computation. To accelerate multi-view rendering, we reformulate photon mapping as a continuous and reusable radiance function. 
Specifically, we introduce the Gaussian Photon Field (GPF), a learnable representation that encodes photon distributions as anisotropic 3D Gaussian primitives parameterized by position, rotation, scale, and spectrum. GPF is initialized from physically traced photons in the first SPPM iteration and optimized using multi-view supervision of final radiance, distilling photon-based light transport into a continuous field. Once trained, the field enables differentiable radiance evaluation along camera rays without repeated photon tracing or iterative refinement. Extensive experiments on scenes with complex light transport, such as caustics and specular-diffuse interactions, demonstrate that GPF attains photon-level accuracy while reducing computation by orders of magnitude, unifying the physical rigor of photon-based rendering with the efficiency of neural scene representations. 2025-12-13T21:09:09Z Jiachen Tao Benjamin Planche Van Nguyen Nguyen Junyi Wu Yuchun Liu Haoxuan Wang Zhongpai Gao Gengyu Zhang Meng Zheng Feiran Wang Anwesa Choudhuri Zhenghao Zhao Weitai Kang Terrence Chen Yan Yan Ziyan Wu http://arxiv.org/abs/2603.14301v1 4D Synchronized Fields: Motion-Language Gaussian Splatting for Temporal Scene Understanding 2026-03-15T09:32:58Z Current 4D representations decouple geometry, motion, and semantics: reconstruction methods discard interpretable motion structure; language-grounded methods attach semantics after motion is learned, blind to how objects move; and motion-aware methods encode dynamics as opaque per-point residuals without object-level organization. We propose 4D Synchronized Fields, a 4D Gaussian representation that learns object-factored motion in-loop during reconstruction and synchronizes language to the resulting kinematics through a per-object conditioned field. 
Each Gaussian trajectory is decomposed into shared object motion plus an implicit residual, and a kinematic-conditioned ridge map predicts temporal semantic variation, yielding a single representation in which reconstruction, motion, and semantics are structurally coupled and enabling open-vocabulary temporal queries that retrieve both objects and moments. On HyperNeRF, 4D Synchronized Fields achieves 28.52 dB mean PSNR, the highest among all language-grounded and motion-aware baselines, within 1.5 dB of reconstruction-only methods. On targeted temporal-state retrieval, the kinematic-conditioned field attains 0.884 mean accuracy, 0.815 mean vIoU, and 0.733 mean tIoU, surpassing 4D LangSplat (0.620, 0.433, and 0.439 respectively) and LangSplat (0.415, 0.304, and 0.262). Ablation confirms that kinematic conditioning is the primary driver, accounting for +0.45 tIoU over a static-embedding-only baseline. 4D Synchronized Fields is the only method that jointly exposes interpretable motion primitives and temporally grounded language fields from a single trained representation. Code will be released. 2026-03-15T09:32:58Z 34 pages, 3 figures, 7 tables. Includes supplementary material. Preprint Mohamed Rayan Barhdadi Samir Abdaljalil Rasul Khanbayov Erchin Serpedin Hasan Kurban