http://arxiv.org/abs/2605.20460v3 HyperBones: Realtime Bone-driven Neural Garment Simulation with Hypernetwork Conditioning 2026-05-28T13:34:23Z

Recent advances in garment simulation have brought high-quality results closer to real-time performance. Physics-based simulators can produce accurate motion, but remain too computationally expensive for interactive applications. In contrast, linear blend skinning is efficient, but cannot capture the complex dynamics of loose-fitting garments, often leading to unrealistic motion and visual artifacts. Neural methods offer a promising alternative, yet they still struggle to animate loose clothing plausibly under strict runtime constraints. We present a fast and physically plausible approach for dynamic garment simulation. Our method trains a reduced-space neural dynamics simulator composed of independent coarse- and fine-level components. At the coarse level, the garment is driven by a set of virtual bones integrated with a lightweight neural network. Fine-scale wrinkle details are then recovered using a trained convolutional neural map. By decoupling identity-specific computation from real-time neural integration, our architecture maintains high performance while supporting diverse body shapes and motions. We further introduce an effective physics-supervision scheme that enables accurate results without relying on an external simulator. Experiments show that our method produces physically plausible garment dynamics, generalizes across a range of motions and body shapes, and supports a fixed set of garments. Our simulator runs at 300+ FPS on a commodity GPU, making it suitable for real-time applications.

2026-05-19T20:13:54Z Astitva Srivastava Hsiao-Yu Chen Ryan Goldade Philipp Herholz Zhongshi Jiang Gene Wei-Chin Lin Lingchen Yang Nikolaos Sarafianos Tuur Stuyck Doug Roble Avinash Sharma Egor Larionov http://arxiv.org/abs/2605.29809v1 Cert-LAS: Toward Certified Model Ownership Verification for Text-to-Image Diffusion Models via Layer-Adaptive Smoothing 2026-05-28T11:54:52Z

Large-scale text-to-image (T2I) diffusion models have enabled unprecedented creative applications, but their unauthorized use has raised serious intellectual property concerns, making model ownership verification (MOV) increasingly critical. We find that existing backdoor-based diffusion watermarking methods often (implicitly) assume a "faithful" verification process, namely, that the verifier can query a suspicious model and obtain the faithful watermark response to complete MOV. However, in practice, adversaries may intentionally or unintentionally damage potential watermark signals, significantly degrading verification reliability. To address this issue, we propose Cert-LAS, the first certified MOV method for T2I models based on layer-adaptive smoothing. In general, Cert-LAS embeds specified watermarks using diffusion classifiers and an LFS-guided layer-adaptive noise, and verifies ownership by examining whether the suspected model exhibits significantly stronger watermark responses compared to unwatermarked references through hypothesis testing. We further prove that, under certain conditions, our Cert-LAS can still achieve reliable verification even in the presence of malicious removal attacks. Extensive experiments validate the effectiveness of Cert-LAS and its resistance to adaptive attacks. Our code is available at https://github.com/Leyi-Qi/Cert-LAS.
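
The hypothesis-testing step in the abstract above can be illustrated with a one-sided permutation test on watermark response scores. This is only a minimal sketch and is not the Cert-LAS procedure: the function name `permutation_p_value`, the score lists, and the choice of a permutation test are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(suspect, reference, n_perm=2000, seed=0):
    """One-sided permutation test: is mean(suspect) larger than mean(reference)?

    `suspect` and `reference` are hypothetical watermark response scores
    (higher means a stronger response). Returns an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(suspect) - mean(reference)
    pooled = list(suspect) + list(reference)
    n = len(suspect)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n]) - mean(pooled[n:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Made-up scores for illustration only.
suspect_scores = [0.91, 0.88, 0.93, 0.86, 0.90, 0.89]
reference_scores = [0.12, 0.20, 0.15, 0.18, 0.10, 0.16]
p = permutation_p_value(suspect_scores, reference_scores)
print("reject null at 0.05:", p < 0.05)
```

A small p-value here only says that the suspect scores are unlikely to come from the same distribution as the references. The certified guarantee claimed in the abstract depends on the paper's smoothing analysis, which this sketch does not attempt.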

2026-05-28T11:54:52Z This paper has been accepted to the International Conference on Machine Learning (ICML) 2026. 26 pages Leyi Qi Yiming Li Siyuan Liang Zhengzhong Tu Dacheng Tao http://arxiv.org/abs/2605.28551v2 Resolution-free neural surrogates for geometric parameterization and mapping with spatially varying fields 2026-05-28T08:10:06Z

Many imaging problems require computing spatial transformations induced by spatially varying intensity, feature, or density fields. Canonical examples include distortion correction, deformable image registration, atlas-based segmentation, and deformation-driven image analysis. These tasks can be formulated as geometric mapping problems in which the transformation is constrained to preserve local structure, control boundary behavior, or regulate angular distortion. Such formulations typically lead to variational models, diffusion processes, or elliptic partial differential equations. However, repeatedly solving high-resolution systems becomes computationally expensive when the underlying parameter fields vary across instances. In this work, we propose a resolution-free neural surrogate for geometric parameterization and mapping problems. Given a spatially varying parameter field $p:\Omega\to\mathbb{R}^m$ and query locations $\{x_i\}_{i=1}^N\subset\Omega$, the model predicts mapped locations $\{u(x_i)\}_{i=1}^N$ on arbitrary structured or unstructured point sets. To avoid dependence on a fixed grid, we use a multi-resolution geometric encoding strategy that conditions the network on coordinate-augmented samples of the parameter field. The model is trained without labeled solution data by enforcing geometry-aware constraints derived from variational energies, diffusion-based density equalization, and quasi-conformal theory. Experimental results on quasi-conformal mapping and density-equalizing mapping problems are presented to demonstrate the effectiveness of our proposed method.
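
For readers unfamiliar with how quasi-conformal theory quantifies angular distortion, the sketch below computes the Beltrami coefficient of a planar affine map. It uses the standard textbook definition and is not taken from the paper; the function names `beltrami_coefficient` and `dilatation` are hypothetical.

```python
def beltrami_coefficient(p, q, r, s):
    """Beltrami coefficient of the affine map (x, y) -> (p*x + q*y, r*x + s*y).

    Writing f = u + i*v and z = x + i*y, mu = f_zbar / f_z with
    f_z = (f_x - i*f_y) / 2 and f_zbar = (f_x + i*f_y) / 2.
    Raises ZeroDivisionError when f_z = 0 (e.g. a pure reflection).
    """
    f_x = complex(p, r)  # derivative of f along x
    f_y = complex(q, s)  # derivative of f along y
    f_z = (f_x - 1j * f_y) / 2
    f_zbar = (f_x + 1j * f_y) / 2
    return f_zbar / f_z


def dilatation(mu):
    """Maximal dilatation K = (1 + |mu|) / (1 - |mu|), valid for |mu| < 1."""
    return (1 + abs(mu)) / (1 - abs(mu))


# A stretch by a factor of 2 along x: not conformal, so mu is non-zero.
mu = beltrami_coefficient(2.0, 0.0, 0.0, 1.0)
print(abs(mu), dilatation(mu))
```

A coefficient of zero means the map is conformal (angle preserving), and larger values of |mu| indicate stronger angular distortion.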

2026-05-27T14:41:41Z Yanwen Huang Lok Ming Lui Gary P. T. Choi http://arxiv.org/abs/2512.03010v2 SurfFill: Completion of LiDAR Point Clouds via Gaussian Surfel Splatting 2026-05-28T06:51:18Z

LiDAR-captured point clouds are often considered the gold standard in active 3D reconstruction. While their accuracy is exceptional in flat regions, LiDAR capture is prone to missing small geometric structures and may fail with dark, absorbent materials. Alternatively, capturing multiple photos of the scene and applying 3D photogrammetry can infer these details, as they often represent feature-rich regions. However, photogrammetry rarely reaches the accuracy of LiDAR in featureless regions. Therefore, we suggest combining the strengths of LiDAR and camera-based capture by introducing SurfFill: a Gaussian surfel-based LiDAR completion scheme. We analyze LiDAR captures and identify LiDAR beam divergence as a main cause of artifacts, which manifest mostly at thin structures and edges. We use this insight to introduce an ambiguity heuristic for completed scans by evaluating the change in density in the point cloud. This allows us to identify points close to missed areas, from which we can then grow additional points to complete the scan. For this point growing, we constrain Gaussian surfel reconstruction to focus optimization and densification on these ambiguous areas. Finally, Gaussian primitives of the reconstruction in ambiguous areas are extracted and sampled for points to complete the point cloud. To address the challenges of large-scale reconstruction, we extend this pipeline with a divide-and-conquer scheme for building-sized point cloud completion. We evaluate on the task of LiDAR point cloud completion of synthetic and real-world scenes and find that our method outperforms previous reconstruction methods.

2025-12-02T18:35:54Z Project page: https://lfranke.github.io/surffill Svenja Strobel Matthias Innmann Bernhard Egger Marc Stamminger Linus Franke http://arxiv.org/abs/2605.29318v1 FreeForm: Reduced-Order Deformable Simulation from Particle-Based Skinning Eigenmodes 2026-05-28T03:49:06Z

We present a novel formulation for mesh-free, reduced-order simulation of deformable hyperelastic objects. Existing work in reduced-order elastodynamic simulation represents the input geometry by either meshes, which can be difficult to obtain due to challenges in scanning and triangulating complex shapes, or by neural fields that require per-shape optimization. We propose to adopt a Reproducing Kernel Particle Method (RKPM) representation, which enables the construction of reduced-order skinning weights by solving a generalized eigensystem on the Hessian matrix of the elastic energy. We demonstrate that this formulation not only leads to a 40x training speedup compared with the per-shape optimization of neural fields, but also achieves lower simulation error when evaluated against the converged results of finite element method. We show our simulation results on a wide variety of objects in different representations including meshes and Gaussian splats, as well as the application of our method in the downstream task of robot simulation.

2026-05-28T03:49:06Z CVPR 2026, project website: https://research.nvidia.com/labs/sil/projects/freeform/ Donglai Xiang Vismay Modi Rishit Dagli Ty Trusty Gilles Daviet Anka He Chen Nicholas Sharp David I. W. Levin http://arxiv.org/abs/2605.29004v1 Auditing Training-Free 3D Shape Retrieval with Diffused Geodesic Moments 2026-05-27T19:00:41Z

Reported retrieval scores for training-free shape descriptors conflate local signal design, normalization, aggregation, codebook fitting, and metric choices, making isolated component evaluation difficult. This paper reframes descriptor evaluation as a protocol audit. We introduce Diffused Geodesic Moments (DGM), a seed-conditioned descriptor that computes sparse implicit heat responses, converts them to distance-like fields, and summarizes each vertex by low-order moments across seeds and scales. DGM is used both as a practical non-spectral baseline and as an instrument for isolating protocol effects. On the registered FAUST benchmark split (FAUST-Reg) and the TOSCA shape collection, aggregation-matched experiments show that an independent Geometric Moment Shape Descriptor baseline built on Heat Kernel Signature features (GMSD-HKS) obtains the highest scores in this implementation ($0.621/0.820$ and $0.865/0.963$ mean average precision (mAP)/top-1), Wave Kernel Signature (WKS) remains a strong classical signal, and DGM is useful mainly when sparse solves, non-spectral deployment, or symmetry-informative seed frames are priorities. The broader finding is methodological: the input field and aggregation protocol can dominate the moment formula. The paper contributes a reproducible protocol-cascade analysis, a cross-shape alignment diagnostic for functional-map compatibility, and concrete recommendations for designing and reporting training-free shape descriptors.

2026-05-27T19:00:41Z Zhicheng Du Changyue Liu Wenji Xi Zhaotian Xie Zhuo Deng Ziheng Zhang Yang Liu Lan Ma http://arxiv.org/abs/2506.11483v4 Capsule: Efficient Player Isolation for Datacenters 2026-05-27T17:57:47Z

We introduce Capsule, a mechanism for seamlessly sharing datacenter resources across multiple players. It decouples player-local and global states to achieve isolation and to maximize cross-player sharing. Our evaluations show that Capsule increases datacenter resource utilization by accommodating up to 2.25x more players without degrading the user experience. This improvement stems from Capsule consuming up to 1.43x less GPU, 3.11x less VRAM, 3.7x less CPU, and 3.87x less RAM compared to the baseline. We evaluated Capsule across four applications and various hardware configurations, including three distinct servers and a multi-server cluster. These results demonstrate that the Capsule design is portable to other game engines.

2025-06-13T06:12:31Z 4 main pages, 6 more appendix pages, 8 figures; an extended version of EUROGRAPHICS 2026 short paper Zhouheng Du Nima Davari Li Li Wei Sen Loi Nodir Kodirov 10.2312/egs.20261014 http://arxiv.org/abs/2601.10714v2 Alterbute: Editing Intrinsic Attributes of Objects in Images 2026-05-27T15:30:42Z

We introduce Alterbute, a diffusion-based method for editing an object's intrinsic attributes in an image. We allow changing color, texture, material, and even the shape of an object, while preserving its perceived identity and scene context. Existing approaches either rely on unsupervised priors that often fail to preserve identity or use overly restrictive supervision that prevents meaningful intrinsic variations. Our method relies on: (i) a relaxed training objective that allows the model to change both intrinsic and extrinsic attributes conditioned on an identity reference image, a textual prompt describing the target intrinsic attributes, and a background image and object mask defining the extrinsic context. At inference, we restrict extrinsic changes by reusing the original background and object mask, thereby ensuring that only the desired intrinsic attributes are altered; (ii) Visual Named Entities (VNEs) - fine-grained visual identity categories (e.g., "Porsche 911 Carrera") that group objects sharing identity-defining features while allowing variation in intrinsic attributes. We use a vision-language model to automatically extract VNE labels and intrinsic attribute descriptions from a large public image dataset, enabling scalable, identity-preserving supervision. Alterbute outperforms existing methods on identity-preserving object intrinsic attribute editing.

2026-01-15T18:59:53Z ICML 2026. Project page is available at https://talreiss.github.io/alterbute/ Tal Reiss Daniel Winter Matan Cohen Alex Rav-Acha Yael Pritch Ariel Shamir Yedid Hoshen http://arxiv.org/abs/2605.28394v1 Sketch2Motion: Text-driven 2D Sketch to 3D Animation via Diffusion-guided Skeleton Optimization 2026-05-27T12:32:07Z

Animation of 2D hand-drawn sketches provides an effective medium for visual communication. However, these sketches pose challenges, particularly in handling occlusions and accurately mapping motion. While 3D animation naturally addresses these challenges, estimating 3D motion remains a very complex task. Recent approaches to converting 2D sketches to 3D animations have mainly focused on specific types of motion, such as bipedal movements and facial expressions. We propose Sketch2Motion, a diffusion-guided framework for skeleton-based motion synthesis that combines classical character animation pipelines with deep generative priors. Our method represents motion using skeletal transformations, which are propagated to mesh deformations via linear blend skinning. To guide the resulting animation toward realistic and semantically meaningful motion, we integrate a text-to-video diffusion model via motion-aware score-distillation sampling (MoSDS), enabling optimization without paired motion data. Additionally, we apply physics-inspired smoothness, topological, and contact constraints to stabilize optimization and preserve motion plausibility. Further, we integrate a spring-mass simulator to introduce secondary motion effects. The proposed framework is generalized, fully differentiable, modular, and compatible with biped, quadruped, and non-living articulated characters. Experiments demonstrate that our approach produces temporally coherent, text-aligned animations that outperform baseline motion transfer methods that lack generative priors or explicit physical constraints. We will make our code and dataset publicly available.
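
Linear blend skinning, which the abstract above uses to propagate skeletal transformations to the mesh, can be sketched in a few lines. This is a generic 2D illustration rather than the Sketch2Motion implementation; the helper names `rigid_2d`, `apply_transform`, and `linear_blend_skinning` are hypothetical.

```python
import math


def rigid_2d(angle, tx, ty):
    """Return a 2D rigid bone transform as (cos, sin, tx, ty)."""
    return (math.cos(angle), math.sin(angle), tx, ty)


def apply_transform(transform, point):
    """Rotate the point by the bone's angle, then translate it."""
    c, s, tx, ty = transform
    x, y = point
    return (c * x - s * y + tx, s * x + c * y + ty)


def linear_blend_skinning(vertices, weights, bone_transforms):
    """Deform each rest-pose vertex as a weighted sum of bone-transformed copies.

    weights[i][j] is the influence of bone j on vertex i; each row should sum to 1.
    """
    skinned = []
    for vertex, row in zip(vertices, weights):
        x = y = 0.0
        for w, transform in zip(row, bone_transforms):
            px, py = apply_transform(transform, vertex)
            x += w * px
            y += w * py
        skinned.append((x, y))
    return skinned


# One vertex influenced equally by a fixed bone and a bone translated along x.
bones = [rigid_2d(0.0, 0.0, 0.0), rigid_2d(0.0, 2.0, 0.0)]
print(linear_blend_skinning([(1.0, 1.0)], [[0.5, 0.5]], bones))
```

Because the output is a linear function of the bone transforms, the operation is straightforward to differentiate, which is one reason it fits optimization-based pipelines.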

2026-05-27T12:32:07Z Gaurav Rai Ojaswa Sharma http://arxiv.org/abs/2501.04144v3 Chirpy3D: Part-Aware Multi-View Diffusion for Creative Fine-Grained Object Generation 2026-05-27T12:23:41Z

Understanding and generating the fine-grained structure of objects -- such as birds with species-specific beaks, wings, and tails -- is a long-standing challenge in computer vision. We propose Chirpy3D, a part-aware multi-view diffusion framework that learns a hierarchical part latent space from unposed 2D images, using only off-the-shelf 2D part segmentation masks as spatial guidance -- without requiring any 3D data, camera poses, or manual part annotations. This latent space enables intuitive part-level swapping, interpolation, and zero-shot composition. A self-supervised feature consistency loss further encourages structural alignment across views, allowing coherent generation even with hybrid or unseen part combinations. Our core contribution is the controllable part-aware latent space and multi-view diffusion model. Downstream 3D generation is supported via any differentiable renderer such as NeRF but is orthogonal to the main framework, making Chirpy3D a flexible foundation for creative object generation in the absence of structured 3D data. Code is released at https://github.com/kamwoh/chirpy3d.

2025-01-07T21:14:11Z 20 pages. Code at https://github.com/kamwoh/chirpy3d Kam Woh Ng Jing Yang Jia Wei Sii Chee Seng Chan Jiankang Deng Yi-Zhe Song Tao Xiang Xiatian Zhu http://arxiv.org/abs/2510.21890v2 The Principles of Diffusion Models 2026-05-27T08:48:11Z

This book presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse formulations arise from shared mathematical ideas. Diffusion modeling starts by defining a forward process that gradually corrupts data into noise, linking the data distribution to a simple prior through a continuum of intermediate distributions. The goal is to learn a reverse process that transforms noise back into data while recovering the same intermediates. We describe three complementary views. The variational view, inspired by variational autoencoders, sees diffusion as learning to remove noise step by step. The score-based view, rooted in energy-based modeling, learns the gradient of the evolving data distribution, indicating how to nudge samples toward more likely regions. The flow-based view, related to normalizing flows, treats generation as following a smooth path that moves samples from noise to data under a learned velocity field. These perspectives share a common backbone: a time-dependent velocity field whose flow transports a simple prior to the data. Sampling then amounts to solving a differential equation that evolves noise into data along a continuous trajectory. On this foundation, the book discusses guidance for controllable generation, efficient numerical solvers, and diffusion-motivated flow-map models that learn direct mappings between arbitrary times. It provides a conceptual and mathematically grounded understanding of diffusion models for readers with basic deep-learning knowledge.
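
The statement that sampling amounts to solving a differential equation can be made concrete with a one-dimensional toy case. The sketch below is not from the book: it assumes data drawn from a Gaussian N(mu, s^2) and the linear path x_t = (1 - t) * data + t * noise, for which the velocity field is known in closed form, so no network is trained. The function names `velocity` and `sample` are hypothetical.

```python
def velocity(x, t, mu, s):
    """Closed-form velocity for x_t = (1 - t) * data + t * noise,
    with data ~ N(mu, s^2) and noise ~ N(0, 1).

    The marginal at time t is Gaussian with mean (1 - t) * mu and
    variance (1 - t)^2 * s^2 + t^2; the field below transports it over time.
    """
    mean_t = (1.0 - t) * mu
    var_t = (1.0 - t) ** 2 * s ** 2 + t ** 2
    slope = (t - (1.0 - t) * s ** 2) / var_t  # (d/dt sigma_t) / sigma_t
    return -mu + slope * (x - mean_t)


def sample(noise, mu, s, steps=1000):
    """Integrate dx/dt = velocity(x, t) from t = 1 (noise) to t = 0 (data)
    with explicit Euler steps."""
    x, t = noise, 1.0
    h = 1.0 / steps
    for _ in range(steps):
        x -= h * velocity(x, t, mu, s)  # step backward in time
        t -= h
    return x


# A noise draw of 1.0 should land near mu + s * 1.0 for this toy model.
print(sample(1.0, mu=2.0, s=0.5))
```

In practice the velocity field is a learned network and the Euler loop is replaced by one of the more efficient solvers the book discusses; the structure of the sampling loop stays the same.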

2025-10-24T02:29:02Z Supplementary materials for the book are available at the book website: https://the-principles-of-diffusion-models.github.io/ Chieh-Hsin Lai Yang Song Dongjun Kim Yuki Mitsufuji Stefano Ermon http://arxiv.org/abs/2601.17354v5 PocketGS: On-Device Training of 3D Gaussian Splatting for High Perceptual Modeling 2026-05-27T08:22:56Z

While 3D Gaussian Splatting (3DGS) enables real-time rendering, its training demands workstation-level compute and memory, making mobile deployment impractical under minute-scale time budgets and limited peak memory. We present PocketGS, a mobile scene modeling paradigm that enables on-device 3DGS training under these tightly coupled constraints while preserving high-fidelity reconstruction. PocketGS resolves the fundamental tension between training efficiency, memory compactness, and modeling quality through three co-designed operators: $\mathcal{G}$ builds geometry-faithful point-cloud priors; $\mathcal{I}$ injects local surface statistics to seed anisotropic Gaussians, thereby reducing early conditioning gaps; and $\mathcal{T}$ unrolls alpha compositing with cached intermediates and index-mapped gradient scattering for stable mobile backpropagation. Extensive experiments demonstrate that PocketGS outperforms the powerful mainstream workstation 3DGS baseline under mobile budgets, delivering high-quality reconstructions and enabling a fully on-device, practical capture-to-rendering workflow.
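
To illustrate what alpha compositing with cached intermediates can look like, the sketch below implements standard front-to-back compositing for scalar colors, caches the transmittance at each sample, and reuses that cache in a hand-written backward pass. It is not the PocketGS operator $\mathcal{T}$; the function names and the scalar-color simplification are assumptions.

```python
def composite_forward(colors, alphas):
    """Front-to-back alpha compositing of scalar colors.

    Returns the composited color and the cached transmittance seen by each sample.
    """
    color, transmittance, cache = 0.0, 1.0, []
    for c, a in zip(colors, alphas):
        cache.append(transmittance)
        color += c * a * transmittance
        transmittance *= 1.0 - a
    return color, cache


def composite_backward(colors, alphas, cache):
    """Gradient of the composited color w.r.t. each alpha, reusing the cache.

    Assumes every alpha is strictly below 1.
    """
    grads = [0.0] * len(alphas)
    behind = 0.0  # contribution of all samples behind the current one
    for i in reversed(range(len(alphas))):
        grads[i] = colors[i] * cache[i] - behind / (1.0 - alphas[i])
        behind += colors[i] * alphas[i] * cache[i]
    return grads


colors = [0.2, 0.8, 0.5]
alphas = [0.3, 0.5, 0.7]
out, cache = composite_forward(colors, alphas)
print(out, composite_backward(colors, alphas, cache))
```

Caching the per-sample transmittance avoids recomputing the running product during the backward pass, at the cost of one stored value per sample.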

2026-01-24T07:58:53Z Wenzhi Guo Guangchi Fang Shu Yang Bing Wang http://arxiv.org/abs/2605.28125v1 CLEAR-NeRF: Collinearity and Local-region Enhanced Accurate 3D Reconstruction in Unbounded Scenes 2026-05-27T08:16:48Z

Many real-world 3D reconstruction applications demand photorealism and metric accuracy across unbounded, complex scenes with challenging lighting and imperfect captures that current Neural Radiance Field (NeRF) pipelines only partly satisfy. This study adapts NeRF-based 3D reconstruction to multi-region of interest unbounded scenes to improve robustness to lighting and pose variation while enforcing metric accuracy suitable for digital-twin applications. Our approach introduces (i) automated local region localization/detection and reconstruction to seamlessly prioritize areas of interest without proliferating submodules, (ii) collinearity-enforcing ray sampling to learn smooth planar and curved surfaces, (iii) depth-localized neighborhood point extraction to suppress surface artifacts, and (iv) geometry-relevant color aggregation to mitigate lighting- and pose-caused variations. Results indicate superior performance of the proposed pipeline over the baseline NeRF models and established Structure from Motion (SfM) - Multi-View Stereo (MVS) solutions.

2026-05-27T08:16:48Z Vladislav Polianskii Elijs Dima Isabel Salmerón Marazuela Gergő László Nagy Sigurdur Sverrisson Volodya Grancharov http://arxiv.org/abs/2602.23754v2 Neural Image Space Tessellation effect 2026-05-27T07:34:01Z

We present Neural Image Space Tessellation effect (NIST), a lightweight screen-space post-processing approach for reducing the faceted silhouettes of low-poly renderings. Instead of tessellating primitives, creating new geometry, or modifying the underlying mesh, NIST uses the low-poly rendering result together with simple auxiliary G-buffer attributes to learn geometry-guided smoothing of object contours in image space. At its core, NIST first deforms image-space contours implicitly and then learns to reassign appearance across the whole image space, including the deformed regions, preserving texture continuity and avoiding seam artifacts. Experiments show that NIST reduces visually apparent geometric faceting and produces smooth, coherent silhouettes close to tessellation-based smoothing references, with a nearly constant per-frame cost in our tested settings. To the best of our knowledge, NIST is the first work to move the solution of low-poly silhouette faceting from the pre-rendering geometry stage to a post-rendering screen-space stage.

2026-02-27T07:31:40Z Youyang Du (Shandong University; Mohamed bin Zayed University of Artificial Intelligence), Junqiu Zhu (Shandong University), Zheng Zeng (University of California, Santa Barbara), Lu Wang (Shandong University), Lingqi Yan (Mohamed bin Zayed University of Artificial Intelligence) http://arxiv.org/abs/2605.26391v2 Garment Particles: A 2D--3D Symmetric Garment Representation for Generation and Editing 2026-05-27T05:11:18Z

Practical garment design spans two modes: intuitive creation from high-level intent, such as a reference image or text description, and complex low-level editing across 2D sewing patterns and 3D draped geometry, which requires professional training to navigate their interdependencies. Yet existing frameworks address only part of this challenge, offering either garment generation from casual inputs or direct editing on sewing patterns. To support both ends of the spectrum, we propose Garment Particles, a 5D point-cloud representation that jointly encodes 2D sewing patterns and 3D geometry. This representation enables Garment Particles Flow (GPF), a rectified flow framework that supports intuitive generation from high-level inputs (text, images, sketches) and various editing operations on 2D sewing patterns and 3D geometries via diffusion posterior sampling. Finally, we introduce Particles-to-Pattern Flow, which converts generated garment particles into curve-based patterns for simulation. We validate our model's generation ability on multiple datasets, achieving state-of-the-art garment generation results against competitive baselines. Our model also enables many garment editing scenarios, including garment interpolation, sewing pattern editing, and point-cloud- and silhouette-conditioned garment generation. Our project website is at https://garment-particles.github.io .

2026-05-25T23:43:54Z Kiyohiro Nakayama I-Chao Shen Ruofan Liu Yiming Wang Gordon Wetzstein Takeo Igarashi