https://arxiv.org/api/c/p8WVjOgmJvJoFeT+4Xdsu7dxY
2026-06-09T22:25:46Z
93013015

http://arxiv.org/abs/2509.22685v2
VIRTUS-FPP: Virtual Sensor Modeling for Fringe Projection Profilometry in NVIDIA Isaac Sim
2026-06-05T10:41:26Z
Fringe projection profilometry (FPP) is a high-precision structured-light sensing technique for 3D surface reconstruction, yet its practical deployment is often constrained by complex calibration procedures, sensitivity to environmental conditions, and the high cost of physical experimentation. At the same time, robotics research increasingly relies on simulation platforms such as NVIDIA Isaac Sim for scalable development and validation, but accurate virtual representations of optical metrology sensors such as FPP are not currently available. In this work, we present VIRTUS-FPP, the first end-to-end virtual sensor modeling framework for fringe projection profilometry implemented in NVIDIA Isaac Sim, enabling physically grounded simulation of the complete FPP pipeline, including structured light projection, image formation, calibration, and 3D reconstruction, without dependence on pre-calibrated physical systems. The framework leverages an inverse camera model for projector representation, ensuring geometric and photometric fidelity consistent with structured-light principles. By bridging optical metrology and robotics simulation, VIRTUS-FPP enables high-fidelity synthetic data generation, systematic evaluation of sensing pipelines, and digital twin replication of real-world FPP systems.
Experimental results demonstrate sub-millimeter reconstruction accuracy and strong correspondence between simulated and physical measurements, highlighting the framework's effectiveness and its potential to advance perception-driven robotics, simulation-to-reality transfer, and scalable optical sensor design.
2025-09-18T00:21:15Z
10 pages, 13 figures, accepted for publication in IEEE Sensors Journal
Adam Haroon, Anush Lakshman, Badrinath Balasubramaniam, Beiwen Li
10.1109/JSEN.2026.3698278

http://arxiv.org/abs/2606.07115v1
3DMorph: Single-Image-Guided Local 3D Shape Editing and Morphing
2026-06-05T10:10:33Z
Despite recent progress in 3D generation, intuitive editing of existing shapes remains limited. Unlike images, which benefit from well-established inpainting tools, general 3D objects such as meshes still lack simple and effective methods for local shape editing. Existing approaches are often global, domain-specific, require complex user interaction, or focus on appearance (color and texture) rather than geometry. We introduce 3DMorph, a training-free framework for single-image-guided local 3D shape editing and morphing. Given an edited image showing a desired shape modification, our method automatically localizes the relevant 3D region and transfers 2D modifications to 3D while preserving unmodified areas. 3DMorph also enables intermediate shape generation between the original and edited objects, facilitating design exploration. To benchmark editing quality, we introduce Delta3D, an image-guided local 3D editing benchmark with paired ground-truth edits.
Experimental results show that 3DMorph translates intuitive 2D edits into 3D, outperforming state-of-the-art generative and editing methods.
2026-06-05T10:10:33Z
Accepted to IJCNN 2026
Tobias Preintner, Yunfei Deng, Phillip Müller, Sebastian Illing, Adrian König, Thomas Bäck, Elena Raponi, Niki van Stein

http://arxiv.org/abs/2507.10924v2
OffsetCrust: Variable-Radius Offset Approximation with Power Diagrams
2026-06-05T07:26:17Z
Offset surfaces, defined as the Minkowski sum of a base surface and a rolling ball, play a crucial role in geometry processing, with applications ranging from coverage motion planning to brush modeling. While considerable progress has been made in computing constant-radius offset surfaces, computing variable-radius offset surfaces remains a challenging problem. In this paper, we present OffsetCrust, a novel framework that efficiently addresses the variable-radius offsetting problem by computing a power diagram. Let $R$ denote the radius function defined on the base surface $S$. The power diagram is constructed from contributing sites, consisting of carefully sampled base points on $S$ and their corresponding off-surface points, displaced along $R$-dependent directions. In the constant-radius case only, these displacement directions align exactly with the surface normals of $S$. Moreover, our method mitigates the misalignment issues commonly seen in crust-based approaches through a lightweight fine-tuning procedure.
We validate the accuracy and efficiency of OffsetCrust through extensive experiments, and demonstrate its practical utility in applications such as reconstructing original boundary surfaces from medial axis transform (MAT) representations.
2025-07-15T02:32:38Z
Zihan Zhao, Pengfei Wang, Minfeng Xu, Shuangmin Chen, Shiqing Xin, Changhe Tu, Wenping Wang

http://arxiv.org/abs/2606.06685v1
RigPAPR: Rig-Based Animation of Static Neural Point Clouds from a Fixed-Viewpoint Video
2026-06-04T19:59:35Z
Static neural point reconstructions capture a subject at high fidelity from posed images. Given such a reconstruction, we aim to animate it to follow a monocular fixed-viewpoint driving video of the subject, whether captured or produced by image-to-video (I2V) generation, and to recover a rigged, re-posable 3D asset. Existing methods deform Gaussian splats through direct linear blend skinning (LBS) or mesh proxies, both of which are prone to joint-boundary artifacts under articulation, even with per-primitive corrections. We trace the artifact to the representation: each splat carries an individual shape calibrated in the canonical pose to tile with its neighbours. Under rigid LBS, each splat moves with its bone but cannot bend, so the canonical tiling breaks at joint boundaries into gaps and spikes. Proximity attention point rendering (PAPR) instead carries no per-primitive shape; each pixel is recomposed at render time from the deformed primitives' positions, so the surface re-forms naturally with the articulation. We present RigPAPR, which auto-rigs a static PAPR cloud and drives it under direct LBS from a single fixed-viewpoint video, without mesh proxy, pose-dependent correction, or category template.
On synthetic subjects, RigPAPR matches the strongest baseline at the supervised view and exceeds mesh-based and Gaussian-splatting baselines at novel views by 3+ dB PSNR, with cleaner joint-boundary renderings of both synthetic and real subjects.
2026-06-04T19:59:35Z
An overview video is available at https://youtu.be/up3BwRHYWG8
Shichong Peng, Yanshu Zhang, Ke Li

http://arxiv.org/abs/2606.06565v1
AI Level of Detail: Distance-Aware ML Model Precision Selection for Real-Time Human Motion Prediction in Games
2026-06-04T17:27:00Z
Modern game engines spend significant compute animating NPCs with learned motion models. This paper proposes AI Level of Detail (AI LOD), a framework in which machine learning inference precision is adapted based on the distance between each NPC and the player camera. The core idea mirrors classical geometry LOD: substitute a cheaper approximation where the difference is imperceptible. Here, the approximation is a lower-precision quantized machine learning model rather than a lower-polygon mesh.
The contribution of this work is the AI LOD concept itself: that inference-time quantization can serve as the LOD axis for AI-driven character animation and, more broadly, for any AI-based runtime system where perceptual sensitivity varies with context. The convolutional sequence-to-sequence model of Li et al. is used as a representative example to demonstrate the concept, with its trained checkpoint exported into three ONNX Runtime variants (FP32, FP16, and INT8 per-tensor), intended to be routed by a distance-based selector at runtime. Evaluation on the CMU Mocap dataset provides initial evidence that each precision tier can be served at its assigned distance range with negligible perceptible degradation, supporting the broader premise that distance-aware ML model precision selection is a viable LOD strategy for AI-based character animation.
2026-06-04T17:27:00Z
Camera-ready for SIGGRAPH Technical Workshops 2026
Mathew Varghese

http://arxiv.org/abs/2606.06199v1
SC-MFJ: A Simple Haptic Quality Metric for Medical Image Segmentation
2026-06-04T14:04:42Z
Standard segmentation metrics such as Dice and Hausdorff distance measure geometric overlap but say nothing about whether a segmented surface is suitable for haptic rendering in surgical simulation. We propose SC-MFJ (Surface-Constrained Mean Force Jerk), a simple, inexpensive metric that samples a segmented organ surface with many short virtual stylus walks and measures how jerky the resulting contact forces are. The metric is computed from existing segmentation outputs and uses roughly one minute of CPU time per case. We evaluate three pancreas CT segmentation approaches, namely binary nnU-Net output, Gaussian-smoothed output, and learned signed distance function (SDF) regression, across 80 cases in five-fold cross-validation. SC-MFJ reveals a 147x gap in haptic quality between the raw binary baseline and simple Gaussian post-processing, a difference entirely invisible to Dice and HD95.
It also shows that learned SDF regression, despite requiring full model retraining, produces more variable haptic quality than Gaussian smoothing, with a case-level standard deviation of 168 N/s² compared with 22 N/s² for Gaussian. A second evaluation on the LiTS liver dataset (131 cases) confirms the generality of these findings: the binary-to-Gaussian gap widens to 189x, and Gaussian smoothing again produces consistently low force jerk across all folds. Our results suggest that for haptic simulation applications, a one-line post-processing step may be sufficient, and that a cheap metric like SC-MFJ can flag problems that geometric metrics miss.
2026-06-04T14:04:42Z
11 pages, 5 figures, 5 tables, http://www.wscg.eu/
Souraj Adhikary, Negar Chabi, Andre Mastmeyer

http://arxiv.org/abs/2606.06066v1
FontFusion: Enhancing Generative Text in Diffusion Models with Typographic Conditioning
2026-06-04T12:07:12Z
Typography generation in diffusion models faces a persistent trade-off: enabling precise font control typically degrades text legibility, while maintaining readability often sacrifices typographic fidelity. We present FontFusion, a plug-and-play conditioning framework for Diffusion Transformer (DiT) architectures that resolves this dilemma through three core innovations: (1) a hierarchical token representation establishing explicit text-font relationships at multiple granularities, (2) position-aware embeddings creating spatial bindings between typography and image content, and (3) a multi-level token dropping strategy improving both computational efficiency and generalization to unseen fonts. Our systematic evaluation of font embedding spaces reveals that a dual encoder combining DeepFont and DINOv2 outperforms any single encoder for typography tasks.
FontFusion demonstrates 76% relative improvement on challenging decorative fonts over single-encoder baselines and font consistency gains exceeding approximately 68-76% over unconditioned models, while integrating into existing DiT architectures without retraining.
2026-06-04T12:07:12Z
12 pages, 8 figures, accepted at ICANN 2026
Marian Lupascu, Nipun Jindal, Ionut Mironica, Zhaowen Wang

http://arxiv.org/abs/2606.05650v1
GS-NFS: Bandwidth-adaptive Streaming of Dynamic Gaussian Splats and Point Clouds
2026-06-04T03:27:56Z
Dynamic 3D Gaussian Splatting (3DGS) holds great promise as a 3D video streaming technology since it can represent complex 3D scenes with high fidelity. In this approach, every frame in a 3D video represents the environment as a collection of Gaussians with position and other attributes such as scale, rotation, opacity, and color. Frames capture fine details, permit views from any arbitrary perspective, but are an order of magnitude, or more, larger than 2D video frames. A line of recent work has explored how to compress dynamic 3DGS frames, but these approaches are often slow, in part because their compression techniques are not amenable to efficient acceleration. GS-NFS accelerates dynamic 3DGS compression and decompression on a GPU, to the point where it can encode and decode at full frame rate. It achieves this by developing novel GPU-based parallelizations of existing algorithms for encoding both positions and attributes of Gaussians. As a result, it is 1-2 orders of magnitude faster than the state-of-the-art in encoding and decoding a frame, while offering competitive compression performance and rendering quality.
2026-06-04T03:27:56Z
Rajrup Ghosh, Haodong Wang, Haoran Hong, Eduardo Pavez, Amartya Chaudhuri, Weiwu Pang, Harsha V.
Madhyastha, Antonio Ortega, Ramesh Govindan

http://arxiv.org/abs/2606.05624v1
KV-Control: Parameter-Efficient K/V Injection for Trajectory-Controlled Text-to-Motion
2026-06-04T02:50:20Z
Text-conditioned 3D human motion models now synthesize plausible motions from prompts, but practical animation and embodied-agent workflows rarely stop at text: a character may need to follow a sketched root path, hit an end-effector target, or satisfy a multi-joint trajectory while still preserving the gait, style, and intent described by language. This exposes a control trade-off. A trajectory controller should be precise without overwriting the pretrained text-conditioned motion prior, yet existing solutions either duplicate large portions of the generator to regain per-layer control access or move much of the cost to test-time optimization. We introduce KV-Control, a compact attention-side control interface for frozen masked text-to-motion transformers. The key idea is to make geometric constraints available as memory inside self-attention rather than injecting them through a global pose token or enforcing them only at the output side. To support this interface, we co-design a part-tokenized motion substrate and controller: PartVQ learns anatomy-aligned part codebooks, T-Concat exposes each frame-part token as an attention-addressable site, and KV-Control injects control-conditioned key/value memories at every self-attention layer while preserving the pretrained query stream, text cross-attention, FFN, and all backbone weights. The resulting adapter adds only trainable injection parameters atop a shared trajectory encoder, yet tracks root and multi-joint constraints with sub-centimeter accuracy under the inherited refinement protocol while retaining text-conditioned motion quality.
KV-Control reframes trajectory conditioning as lightweight memory retrieval, providing a small, precise, and transparent control interface for text-to-motion generation.
2026-06-04T02:50:20Z
Tengjiao Sun, Pengcheng Fang, Xiaoyu Zhan, Yanwen Guo, Dongjie Fu, Xiaohao Cai, Hansung Kim

http://arxiv.org/abs/2606.05581v1
Monte Carlo Steklov Operators for Large-Scale Geometry Processing in the Wild
2026-06-04T01:56:11Z
Intrinsic methods fill the default toolbox for geometry processing on meshes. Intrinsic operators, in particular the Laplacian, underlie methods that require invariance to isometry and have hence been employed in many algorithms for shape analysis, learning, and editing. However, intrinsic methods are predicated on assumptions that quickly become brittle when working with in-the-wild geometry, where (i) mesh quality is not guaranteed, and (ii) many meshes are modeled with multiple connected components. In such settings, volumetric constructions are better-defined, since restrictions on surface topology can be relaxed. This paper presents a Monte Carlo method for estimating the Dirichlet-to-Neumann (DtN) operator, a boundary-to-boundary volumetric operator, and its associated Steklov eigenmodes. We build on recent developments in Monte Carlo geometry processing by casting this boundary operator itself as the subject of estimation. The DtN operator, defined through a volumetric stochastic process, is then generalized to the exterior domain, where it couples disconnected components through the surrounding ambient space. We show that our method is orders of magnitude faster than existing boundary-element approaches for computing Steklov spectra while remaining robust to poor triangulations, high-resolution meshes, and multi-component geometry. To demonstrate this scalability, we compute interior and exterior Steklov eigenspectra for approximately 450,000 shapes from the uncurated Objaverse dataset.
We incorporate these operators into Steklov-CLIP, a mesh-based neural network that uses volumetric spectral operators for large-scale contrastive 3D representation learning. The resulting network learns semantically meaningful global and dense shape representations, illustrating that geometrically-principled volumetric operators can be made practical at the scale of modern 3D datasets.
2026-06-04T01:56:11Z
21 pages
Arman Maesumi, Tanish Makadia, Aruna Anderson, Oras Phongpanangam, Justin Solomon, Daniel Ritchie

http://arxiv.org/abs/2602.00898v2
Fast Sparse Matrix Permutation for Mesh-Based Direct Solvers
2026-06-04T01:26:29Z
We present a fast sparse matrix permutation algorithm tailored to linear systems arising from triangle meshes. Our approach produces nested-dissection-style permutations while significantly reducing permutation runtime overhead. Rather than enforcing strict balance and separator optimality, the algorithm deliberately relaxes these design decisions to favor fast partitioning and efficient elimination-tree construction. Our method decomposes permutation into patch-level local orderings and a compact quotient-graph ordering of separators, preserving the essential structure required by sparse Cholesky factorization while avoiding its most expensive components. We integrate our algorithm into vendor-maintained sparse Cholesky solvers on both CPUs and GPUs. Across a range of graphics applications, including single factorizations and repeated factorizations, our method reduces permutation time and improves the sparse Cholesky solve performance by up to 6.27x. Our code is available at https://github.com/BehroozZare/fast-permute.
2026-01-31T20:56:42Z
SIGGRAPH 2026
In Proceedings of the SIGGRAPH 2026 Conference Papers, SIGGRAPH Conference Papers '26, New York, NY, USA, July 2026
Behrooz Zarebavami, Ahmed H. Mahmoud, Ana Dodik, Changcheng Yuan, Serban D. Porumbescu, John D.
Owens, Maryam Mehri Dehnavi, Justin Solomon
10.1145/3799902.3811189

http://arxiv.org/abs/2509.00406v3
Locality-Aware Automatic Differentiation on the GPU for Mesh-Based Computations
2026-06-04T01:12:40Z
We present a GPU-based system for automatic differentiation (AD) of functions defined on triangle meshes, designed to exploit the locality and sparsity in mesh-based computation. Our system evaluates derivatives using per-element forward-mode AD, confining all computation to registers and shared memory and assembling global gradients, sparse Jacobians, and sparse Hessians directly on the GPU. By avoiding global computation graphs, intermediate buffers, and device-host synchronization, our approach minimizes memory traffic and enables efficient differentiation under both static and dynamically changing sparsity. Our programming model lets users express energy terms over mesh neighborhoods, while our system automatically manages parallel execution, derivative propagation, sparse assembly, and matrix-free operations such as Hessian-vector products. Our system supports both scalar- and vector-valued objectives, dynamic interaction-driven sparsity updates, and seamless integration with external GPU sparse linear solvers. We evaluate our system on applications including elastic and cloth simulation, surface parameterization, mesh smoothing, frame field design, ARAP deformation, and spherical manifold optimization. Across these tasks, our system consistently outperforms state-of-the-art differentiation frameworks, including PyTorch, JAX, Warp, DrJIT, EnzymeAD, and Thallo. We demonstrate speedups across a range of solver types, from Newton and Gauss-Newton for nonlinear least squares to L-BFGS and gradient descent, and across different derivative usage modes, including Hessian-vector products as well as full sparse Hessian and Jacobian construction.
Our system is available as open source at https://github.com/owensgroup/RXMesh.
2025-08-30T08:30:48Z
SIGGRAPH 2026
ACM Transactions on Graphics, 45(4), July 2026
Ahmed H. Mahmoud, Rahul Goel, Jonathan Ragan-Kelley, Justin Solomon
10.1145/3811338

http://arxiv.org/abs/2606.05552v1
Balancing Image Compression and Generation with Bootstrapped Tokenization
2026-06-04T01:06:52Z
Despite progress in image tokenization, standard methods encode redundant information by mixing all granularities within each token, thus redundancy persists between tokens. The mix of information of different granularity also complicates the training of generators. This paper introduces SelfBootTok, a method that resolves this by cleanly decomposing information into global and local token groups. Through self-bootstrapped learning, the model predicts local details exclusively from global tokens, shifting the burden of visual details from the generator to the tokenizer. Consequently, our generator is far more efficient, requiring only global tokens and reducing computation by approximately 40%, while delivering superior reconstruction and generation. Moreover, this paradigm scales elegantly: by leveraging more data or parameters to self-supervise local representation learning, SelfBootTok achieves a new state-of-the-art gFID score of 1.56 using only 64 tokens.
2026-06-04T01:06:52Z
Haozhe Chi, Jinghan Li, Hao Jiang, Wu Sheng, Yi Ma, Jing Wang, Yadong Mu

http://arxiv.org/abs/2604.14468v2
Progressive Convex Hull Simplification
2026-06-03T20:54:55Z
Convex hulls are useful as tight bounding proxies for a variety of tasks including collision detection, ray intersection, and distance computation. Unfortunately, the complexity of polyhedral convex hulls grows linearly with their input. We consider the problem of conservatively simplifying a convex hull to a specified number of half-spaces while minimizing added volume or surface area. By working in the dual representation, we propose an efficient $O(n \log n)$ greedy optimization.
In comparisons, we show that existing methods exhibit poor efficiency, tightness, or safety. We demonstrate the success of our method on a variety of input shapes and downstream application domains.
2026-04-15T23:00:35Z
accepted to be presented at Symposium on Geometry Processing 2026
Alec Jacobson

http://arxiv.org/abs/2606.05328v1
The Invisible Hand of Physics: When Video Diffusion Models Know More Than They Show
2026-06-03T18:11:51Z
Modern video diffusion models generate increasingly realistic and temporally coherent videos, motivating their use as candidate world simulators. Yet it remains unclear whether these models internally encode physical structure, or merely reproduce motion patterns seen during training. We study this question by probing video diffusion models along latent trajectories corresponding to real videos with known physical plausibility. To obtain such trajectories, we approximately invert the deterministic sampling process by integrating the learned velocity field backward from a clean video latent to noise, giving access to the model's intermediate states and attention maps. Using these recovered trajectories, we show that physical plausibility is linearly decodable from diffusion transformer states across IntPhys and InfLevel, reaching around 81.27% average accuracy and outperforming dedicated representation-learning baselines such as V-JEPA and VideoMAE. Surprisingly, this signal is absent from the VAE latent input and emerges inside the denoising transformer itself, despite the model not being trained with a self-supervised predictive objective. These findings suggest that physically meaningful representations can arise as a byproduct of generative denoising.
2026-06-03T18:11:51Z
Parsa Esmati, Somjit Nath, Katja Hofmann, Derek Nowrouzezahrai, Samira Ebrahimi Kahou, Majid Mirmehdi