https://arxiv.org/api/y9tTyImPS28LTkdQbw13dnjgr7Q 2026-06-25T20:28:44Z 9383 1335 15 http://arxiv.org/abs/2501.09054v3 NeurOp-Diff: Continuous Remote Sensing Image Super-Resolution via Neural Operator Diffusion 2025-11-05T02:05:03Z Most publicly accessible remote sensing data suffer from low resolution, limiting their practical applications. To address this, we propose a diffusion model guided by neural operators for continuous remote sensing image super-resolution (NeurOp-Diff). Neural operators are used to learn resolution representations at arbitrary scales, encoding low-resolution (LR) images into high-dimensional features, which are then used as prior conditions to guide the diffusion model for denoising. This effectively addresses the artifacts and excessive smoothing issues present in existing super-resolution (SR) methods, enabling the generation of high-quality, continuous super-resolution images. Specifically, we adjust the super-resolution scale by a scaling factor s, allowing the model to adapt to different super-resolution magnifications. Furthermore, experiments on multiple datasets demonstrate the effectiveness of NeurOp-Diff. Our code is available at https://github.com/zerono000/NeurOp-Diff. 2025-01-15T15:53:09Z Zihao Xu Yuzhi Tang Bowen Xu Qingquan Li http://arxiv.org/abs/2510.11878v2 GS-Verse: Mesh-based Gaussian Splatting for Physics-aware Interaction in Virtual Reality 2025-11-04T18:24:59Z As the demand for immersive 3D content grows, the need for intuitive and efficient interaction methods becomes paramount. Current techniques for physically manipulating 3D content within Virtual Reality (VR) often face significant limitations, including reliance on engineering-intensive processes and simplified geometric representations, such as tetrahedral cages, which can compromise visual fidelity and physical accuracy. 
In this paper, we introduce GS-Verse (Gaussian Splatting for Virtual Environment Rendering and Scene Editing), a novel method designed to overcome these challenges by directly integrating an object's mesh with a Gaussian Splatting (GS) representation. Our approach enables more precise surface approximation, leading to highly realistic deformations and interactions. By leveraging existing 3D mesh assets, GS-Verse facilitates seamless content reuse and simplifies the development workflow. Moreover, our system is designed to be physics-engine-agnostic, granting developers robust deployment flexibility. This versatile architecture delivers a highly realistic, adaptable, and intuitive approach to interactive 3D manipulation. We rigorously validate our method against the current state-of-the-art technique that couples VR with GS in a comparative user study involving 18 participants. Specifically, we demonstrate that our approach is statistically significantly better for physics-aware stretching manipulation and is also more consistent in other physics-based manipulations like twisting and shaking. Further evaluation across various interactions and scenes confirms that our method consistently delivers high and reliable performance, showing its potential as a plausible alternative to existing methods. 2025-10-13T19:36:47Z Anastasiya Pechko Piotr Borycki Joanna Waczyńska Daniel Barczyk Agata Szymańska Sławomir Tadeja Przemysław Spurek http://arxiv.org/abs/2506.09485v2 Adv-BMT: Bidirectional Motion Transformer for Safety-Critical Traffic Scenario Generation 2025-11-04T02:15:16Z Scenario-based testing is essential for validating the performance of autonomous driving (AD) systems. However, such testing is limited by the scarcity of long-tailed, safety-critical scenarios in existing datasets collected in the real world. To tackle the data issue, we propose the Adv-BMT framework, which augments real-world scenarios with diverse and realistic adversarial traffic interactions. 
The core component of Adv-BMT is a bidirectional motion transformer (BMT) model to perform inverse traffic motion predictions, which takes agent information in the last time step of the scenario as input, and reconstructs the traffic in the inverse of chronological order until the initial time step. The Adv-BMT framework is a two-staged pipeline: it first conducts adversarial initializations and then inverse motion predictions. Different from previous work, we do not need any collision data for pretraining, and are able to generate realistic and diverse collision interactions. Our experimental results validate the quality of generated collision scenarios by Adv-BMT: training in our augmented dataset would reduce episode collision rates by 20%. Demo and code are available at: https://metadriverse.github.io/adv-bmt/. 2025-06-11T07:54:50Z Yuxin Liu Zhenghao Peng Xuanhao Cui Bolei Zhou http://arxiv.org/abs/2510.25765v2 FreeArt3D: Training-Free Articulated Object Generation using 3D Diffusion 2025-11-03T22:47:17Z Articulated 3D objects are central to many applications in robotics, AR/VR, and animation. Recent approaches to modeling such objects either rely on optimization-based reconstruction pipelines that require dense-view supervision or on feed-forward generative models that produce coarse geometric approximations and often overlook surface texture. In contrast, open-world 3D generation of static objects has achieved remarkable success, especially with the advent of native 3D diffusion models such as Trellis. However, extending these methods to articulated objects by training native 3D diffusion models poses significant challenges. In this work, we present FreeArt3D, a training-free framework for articulated 3D object generation. Instead of training a new model on limited articulated data, FreeArt3D repurposes a pre-trained static 3D diffusion model (e.g., Trellis) as a powerful shape prior. 
It extends Score Distillation Sampling (SDS) into the 3D-to-4D domain by treating articulation as an additional generative dimension. Given a few images captured in different articulation states, FreeArt3D jointly optimizes the object's geometry, texture, and articulation parameters without requiring task-specific training or access to large-scale articulated datasets. Our method generates high-fidelity geometry and textures, accurately predicts underlying kinematic structures, and generalizes well across diverse object categories. Despite following a per-instance optimization paradigm, FreeArt3D completes in minutes and significantly outperforms prior state-of-the-art approaches in both quality and versatility. Please check our website for more details: https://czzzzh.github.io/FreeArt3D 2025-10-29T17:58:14Z Project Page: https://czzzzh.github.io/FreeArt3D Code: https://github.com/CzzzzH/FreeArt3D Chuhao Chen Isabella Liu Xinyue Wei Hao Su Minghua Liu http://arxiv.org/abs/2511.01513v1 Example-Based Feature Painting on Textures 2025-11-03T12:26:50Z In this work, we propose a system that covers the complete workflow for achieving controlled authoring and editing of textures that present distinctive local characteristics. These include various effects that change the surface appearance of materials, such as stains, tears, holes, abrasions, discoloration, and more. Such alterations are ubiquitous in nature, and including them in the synthesis process is crucial for generating realistic textures. We introduce a novel approach for creating textures with such blemishes, adopting a learning-based approach that leverages unlabeled examples. Our approach does not require manual annotations by the user; instead, it detects the appearance-altering features through unsupervised anomaly detection. The various textural features are then automatically clustered into semantically coherent groups, which are used to guide the conditional generation of images. 
Our pipeline as a whole goes from a small image collection to a versatile generative model that enables the user to interactively create and paint features on textures of arbitrary size. Notably, the algorithms we introduce for diffusion-based editing and infinite stationary texture generation are generic and should prove useful in other contexts as well. Project page: https://reality.tf.fau.de/pub/ardelean2025examplebased.html 2025-11-03T12:26:50Z © 2025 Andrei-Timotei Ardelean, Tim Weyrich. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Trans. Graph., Vol. 44, No. 6, https://doi.org/10.1145/3763301 Andrei-Timotei Ardelean Tim Weyrich 10.1145/3763301 http://arxiv.org/abs/2511.01463v1 HMVLM: Human Motion-Vision-Language Model via MoE LoRA 2025-11-03T11:22:10Z The expansion of instruction-tuning data has enabled foundation language models to exhibit improved instruction adherence and superior performance across diverse downstream tasks. Semantically-rich 3D human motion is being progressively integrated with these foundation models to enhance multimodal understanding and cross-modal generation capabilities. However, the modality gap between human motion and text raises unresolved concerns about catastrophic forgetting during this integration. In addition, developing autoregressive-compatible pose representations that preserve generalizability across heterogeneous downstream tasks remains a critical technical barrier. To address these issues, we propose the Human Motion-Vision-Language Model (HMVLM), a unified framework based on the Mixture of Expert Low-Rank Adaptation (MoE LoRA) strategy. The framework leverages a gating network to dynamically allocate LoRA expert weights based on the input prompt, enabling synchronized fine-tuning of multiple tasks. 
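The gated combination of LoRA experts described in the HMVLM abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `moe_lora_forward`, the softmax gate, the use of the layer input as the gating signal, and all shapes are assumptions chosen only to show how gate weights can mix several low-rank updates on top of a frozen weight matrix.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    mx = max(z)
    e = [math.exp(t - mx) for t in z]
    s = sum(e)
    return [t / s for t in e]

def moe_lora_forward(x, w_frozen, experts, gate_w):
    """Hypothetical MoE-LoRA layer: y = W x + sum_k g_k * B_k (A_k x).

    experts is a list of (A_k, B_k) low-rank pairs; gate_w maps the input
    to one score per expert (an assumed gating design, not from the paper).
    """
    gates = softmax(matvec(gate_w, x))
    y = matvec(w_frozen, x)  # frozen pre-trained path
    for g, (a, b) in zip(gates, experts):
        delta = matvec(b, matvec(a, x))  # rank-limited update B_k A_k x
        y = [yi + g * di for yi, di in zip(y, delta)]
    return y, gates

# Toy usage: two rank-1 experts, a zero gate (uniform weights), identity W.
x = [1.0, 2.0]
w_frozen = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    ([[1.0, 0.0]], [[1.0], [0.0]]),  # expert 0 adds x[0] to output 0
    ([[0.0, 1.0]], [[0.0], [1.0]]),  # expert 1 adds x[1] to output 1
]
gate_w = [[0.0, 0.0], [0.0, 0.0]]
y, gates = moe_lora_forward(x, w_frozen, experts, gate_w)
```

With a zero gate matrix both experts receive weight 0.5, so each expert contributes half of its update on top of the frozen output.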
To mitigate catastrophic forgetting during instruction-tuning, we introduce a novel zero expert that preserves the pre-trained parameters for general linguistic tasks. For pose representation, we implement body-part-specific tokenization by partitioning the human body into different joint groups, enhancing the spatial resolution of the representation. Experiments show that our method effectively alleviates knowledge forgetting during instruction-tuning and achieves remarkable performance across diverse human motion downstream tasks. 2025-11-03T11:22:10Z 10 pages, 5 figures. The Thirty-Ninth Annual Conference on Neural Information Processing Systems Lei Hu Yongjing Ye Shihong Xia http://arxiv.org/abs/1210.0026v2 Coupled quasi-harmonic bases 2025-11-03T10:12:30Z The use of Laplacian eigenbases has been shown to be fruitful in many computer graphics applications. Today, state-of-the-art approaches to shape analysis, synthesis, and correspondence rely on these natural harmonic bases that allow using classical tools from harmonic analysis on manifolds. However, many applications involving multiple shapes are hindered by the fact that Laplacian eigenbases computed independently on different shapes are often incompatible with each other. In this paper, we propose the construction of common approximate eigenbases for multiple shapes using approximate joint diagonalization algorithms. We illustrate the benefits of the proposed approach on tasks from shape editing, pose transfer, correspondence, and similarity. 2012-09-28T20:29:37Z Symbolic withdrawal of my first PhD paper as an open call to reform peer review. Fig.7 is NOT reproducible (MSER not used, manual fix ignored). I propose implementing my S.V.E. framework (https://github.com/skovnats/SVE-Systemic-Verification-Engineering/blob/master/Papers/SVE-3.pdf) and can assist if requested A. Kovnatsky M. M. Bronstein A. M. Bronstein K. Glashoff R. 
Kimmel http://arxiv.org/abs/2511.01259v1 An Adjoint Method for Differentiable Fluid Simulation on Flow Maps 2025-11-03T06:11:02Z This paper presents a novel adjoint solver for differentiable fluid simulation based on bidirectional flow maps. Our key observation is that the forward fluid solver and its corresponding backward, adjoint solver share the same flow map as the forward simulation. In the forward pass, this map transports fluid impulse variables from the initial frame to the current frame to simulate vortical dynamics. In the backward pass, the same map propagates adjoint variables from the current frame back to the initial frame to compute gradients. This shared long-range map allows the accuracy of gradient computation to benefit directly from improvements in flow map construction. Building on this insight, we introduce a novel adjoint solver that solves the adjoint equations directly on the flow map, enabling long-range and accurate differentiation of incompressible flows without differentiating intermediate numerical steps or storing intermediate variables, as required in conventional adjoint methods. To further improve efficiency, we propose a long-short time-sparse flow map representation for evolving adjoint variables. Our approach has low memory usage, requiring only 6.53GB of data at a resolution of $192^3$ while preserving high accuracy in tracking vorticity, enabling new differentiable simulation tasks that require precise identification, prediction, and control of vortex dynamics. 2025-11-03T06:11:02Z 15 pages, 16 figures ACM SIGGRAPH Asia Conference Proceedings (2025) Zhiqi Li Jinjin He Barnabás Börcsök Taiyuan Zhang Duowen Chen Tao Du Ming C. 
Lin Greg Turk Bo Zhu 10.1145/3757377.3763903 http://arxiv.org/abs/2511.00965v1 Detecting Coverage Holes in Wireless Sensor Networks Using Connected Component Labeling and Force-Directed Algorithms 2025-11-02T15:00:23Z Contour detection in Wireless Sensor Networks (WSNs) is crucial for tasks like energy saving and network optimization, especially in security and surveillance applications. Coverage holes, where data transmission is not achievable, are a significant issue caused by factors such as energy depletion and physical damage. Traditional methods for detecting these holes often suffer from inaccuracy, low processing speed, and high energy consumption, relying heavily on physical information like node coordinates and sensing range. To address these challenges, we propose a novel, coordinate-free coverage hole detection method using Connected Component Labeling (CCL) and Force-Directed (FD) algorithms, termed FD-CCL. This method does not require node coordinates or sensing range information. We also investigate Suzuki's Contour Tracing (CT) algorithm and compare its performance with CCL on various FD graphs. Our experiments demonstrate the effectiveness of FD-CCL in terms of processing time and accuracy. Simulation results confirm the superiority of FD-CCL in detecting and locating coverage holes in WSNs. 2025-11-02T15:00:23Z Jiacheng Xu Xiongfei Zhao Hou-Wan Long Cheong Se-Hang Yain-Whar Si http://arxiv.org/abs/2507.11949v2 MOSPA: Human Motion Generation Driven by Spatial Audio 2025-11-02T14:52:57Z Enabling virtual humans to dynamically and realistically respond to diverse auditory stimuli remains a key challenge in character animation, demanding the integration of perceptual modeling and motion synthesis. Despite its significance, this task remains largely unexplored. Most previous works have primarily focused on mapping modalities like speech, audio, and music to generate human motion. 
As of yet, these models typically overlook the impact of spatial features encoded in spatial audio signals on human motion. To bridge this gap and enable high-quality modeling of human movements in response to spatial audio, we introduce the first comprehensive Spatial Audio-Driven Human Motion (SAM) dataset, which contains diverse and high-quality spatial audio and motion data. For benchmarking, we develop a simple yet effective diffusion-based generative framework for human MOtion generation driven by SPatial Audio, termed MOSPA, which faithfully captures the relationship between body motion and spatial audio through an effective fusion mechanism. Once trained, MOSPA can generate diverse, realistic human motions conditioned on varying spatial audio inputs. We perform a thorough investigation of the proposed dataset and conduct extensive experiments for benchmarking, where our method achieves state-of-the-art performance on this task. Our code and model are publicly available at https://github.com/xsy27/Mospa-Acoustic-driven-Motion-Generation 2025-07-16T06:33:11Z NeurIPS 2025 (Spotlight) Shuyang Xu Zhiyang Dou Mingyi Shi Liang Pan Leo Ho Jingbo Wang Yuan Liu Cheng Lin Yuexin Ma Wenping Wang Taku Komura http://arxiv.org/abs/2511.00702v1 Applying Medical Imaging Tractography Techniques to Painterly Rendering of Images 2025-11-01T20:51:01Z Doctors and researchers routinely use diffusion tensor imaging (DTI) and tractography to visualize the fibrous structure of tissues in the human body. This paper explores the connection of these techniques to the painterly rendering of images. Using a tractography algorithm the presented method can place brush strokes that mimic the painting process of human artists, analogously to how fibres are tracked in DTI. The analogue to the diffusion tensor for image orientation is the structural tensor, which can provide better local orientation information than the gradient alone. 
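To make the structure tensor remark concrete, the following is a minimal sketch of how a local stroke orientation could be estimated from it. It is not taken from the linked repository: the function name, the central-difference gradients, the unweighted square window, and the coherence measure are all assumptions. Summing outer products of gradients avoids the cancellation that occurs when opposite-signed gradients are averaged directly, which is one reason the tensor gives more stable orientations than the gradient alone.

```python
import math

def structure_tensor_orientation(img, y, x, radius=1):
    """Estimate a stroke direction at pixel (y, x) of a grayscale image.

    Returns (theta_stroke, coherence): theta_stroke is the angle (radians)
    perpendicular to the dominant gradient direction, and coherence in
    [0, 1] indicates how strongly oriented the neighbourhood is.
    """
    h, w = len(img), len(img[0])
    jxx = jxy = jyy = 0.0
    # Accumulate gradient outer products over a small window (assumed box filter).
    for j in range(max(1, y - radius), min(h - 1, y + radius + 1)):
        for i in range(max(1, x - radius), min(w - 1, x + radius + 1)):
            gx = (img[j][i + 1] - img[j][i - 1]) / 2.0
            gy = (img[j + 1][i] - img[j - 1][i]) / 2.0
            jxx += gx * gx
            jxy += gx * gy
            jyy += gy * gy
    # Dominant gradient direction from the tensor's principal eigenvector.
    theta_grad = 0.5 * math.atan2(2.0 * jxy, jxx - jyy)
    theta_stroke = theta_grad + math.pi / 2.0  # follow the edge, not across it
    trace = jxx + jyy
    coherence = math.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2) / trace if trace > 0 else 0.0
    return theta_stroke, coherence

# Toy usage: a horizontal intensity ramp has vertical edges,
# so the stroke direction should be vertical (pi / 2).
ramp = [[10.0 * i for i in range(5)] for _ in range(5)]
theta, coh = structure_tensor_orientation(ramp, 2, 2)
```

A tractography-style tracer could then step repeatedly along `theta_stroke` and stop where `coherence` drops, but that stopping rule is an illustration rather than something stated in the abstract.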
I demonstrate this technique in portraits and general images, and discuss the parallels between fibre tracking and brush stroke placement, and frame it in the language of tractography. This work presents an exploratory investigation into the cross-domain application of diffusion tensor imaging techniques to painterly rendering of images. All the code is available at https://github.com/tito21/st-python 2025-11-01T20:51:01Z Exploratory investigation applying medical imaging tractography techniques to painterly image rendering. Code available at https://github.com/tito21/st-python Alberto Di Biase http://arxiv.org/abs/2510.01176v2 Audio Driven Real-Time Facial Animation for Social Telepresence 2025-11-01T15:10:39Z We present an audio-driven real-time system for animating photorealistic 3D facial avatars with minimal latency, designed for social interactions in virtual reality for anyone. Central to our approach is an encoder model that transforms audio signals into latent facial expression sequences in real time, which are then decoded as photorealistic 3D facial avatars. Leveraging the generative capabilities of diffusion models, we capture the rich spectrum of facial expressions necessary for natural communication while achieving real-time performance (<15ms GPU time). Our novel architecture minimizes latency through two key innovations: an online transformer that eliminates dependency on future inputs and a distillation pipeline that accelerates iterative denoising into a single step. We further address critical design challenges in live scenarios for processing continuous audio signals frame-by-frame while maintaining consistent animation quality. The versatility of our framework extends to multimodal applications, including semantic modalities such as emotion conditions and multimodal sensors with head-mounted eye cameras on VR headsets. 
Experimental results demonstrate significant improvements in facial animation accuracy over existing offline state-of-the-art baselines, achieving 100 to 1000 times faster inference speed. We validate our approach through live VR demonstrations and across various scenarios such as multilingual speeches. 2025-10-01T17:57:05Z SIGGRAPH Asia 2025. Project page: https://jiyewise.github.io/projects/AudioRTA Jiye Lee Chenghui Li Linh Tran Shih-En Wei Jason Saragih Alexander Richard Hanbyul Joo Shaojie Bai 10.1145/3757377.3763854 http://arxiv.org/abs/2511.00548v1 Image-based ground distance detection for crop-residue-covered soil 2025-11-01T13:17:23Z Conservation agriculture features a soil surface covered with crop residues, which improves soil health and saves water. However, one significant challenge in conservation agriculture lies in precisely controlling the seeding depth on soil covered with crop residues. This is constrained by the lack of ground distance information, since current distance measurement techniques, like laser, ultrasonic, or mechanical displacement sensors, are incapable of differentiating whether the distance information comes from the residue or the soil. This paper presents an image-based method to obtain ground distance information for crop-residue-covered soil. The method uses a 3D camera and an RGB camera to capture a depth image and a color image at the same time. The color image is used to distinguish residue areas from soil areas and to generate a mask image. The mask image is applied to the depth image so that only the soil area depth information is used to calculate the ground distance, and residue areas are recognized and excluded from ground distance detection. Experiments show that this distance measurement method is feasible for real-time implementation, and the measurement error is within ±3 mm. 
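The mask-then-measure idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the brightness threshold used to separate residue from soil, the function names, the median as the aggregate, and the toy pixel values are all hypothetical and are not taken from the paper.

```python
from statistics import median

def soil_mask(color, brightness_threshold=120):
    """Return a mask that is True where a pixel is classified as soil.

    Assumption: residue is brighter than soil, so a plain brightness
    threshold separates them; the paper's classifier may differ.
    """
    return [[(r + g + b) / 3 < brightness_threshold for (r, g, b) in row]
            for row in color]

def ground_distance(depth, mask):
    """Median depth (mm) over soil pixels only; None if no soil is visible."""
    soil = [d for drow, mrow in zip(depth, mask)
            for d, m in zip(drow, mrow) if m and d > 0]  # skip invalid depths
    return median(soil) if soil else None

# Toy 2x2 frame: left column is dark soil, right column is bright residue
# sitting about 30 mm closer to the camera.
color = [[(60, 50, 40), (200, 190, 150)],
         [(70, 55, 45), (210, 200, 160)]]
depth = [[500.0, 470.0],
         [502.0, 468.0]]
mask = soil_mask(color)
dist = ground_distance(depth, mask)
```

Here the residue pixels are excluded by the mask, so the estimate comes from the two soil readings only rather than being pulled toward the residue surface.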
It can be applied in conservation agriculture machinery for precision depth seeding, as well as other depth-control-demanding applications like transplant or tillage. 2025-11-01T13:17:23Z under review at Computers and Electronics in Agriculture Baochao Wang Xingyu Zhang Qingtao Zong Alim Pulatov Shuqi Shang Dongwei Wang http://arxiv.org/abs/2506.11252v2 Anti-Aliased 2D Gaussian Splatting 2025-11-01T11:13:52Z 2D Gaussian Splatting (2DGS) has recently emerged as a promising method for novel view synthesis and surface reconstruction, offering better view-consistency and geometric accuracy than volumetric 3DGS. However, 2DGS suffers from severe aliasing artifacts when rendering at different sampling rates than those used during training, limiting its practical applications in scenarios requiring camera zoom or varying fields of view. We identify that these artifacts stem from two key limitations: the lack of frequency constraints in the representation and an ineffective screen-space clamping approach. To address these issues, we present AA-2DGS, an anti-aliased formulation of 2D Gaussian Splatting that maintains its geometric benefits while significantly enhancing rendering quality across different scales. Our method introduces a world-space flat smoothing kernel that constrains the frequency content of 2D Gaussian primitives based on the maximal sampling frequency from training views, effectively eliminating high-frequency artifacts when zooming in. Additionally, we derive a novel object-space Mip filter by leveraging an affine approximation of the ray-splat intersection mapping, which allows us to efficiently apply proper anti-aliasing directly in the local space of each splat. 2025-06-12T19:49:57Z NeurIPS 2025. 
Code will be available at https://github.com/maeyounes/AA-2DGS Mae Younes Adnane Boukhayma http://arxiv.org/abs/2510.08271v2 SViM3D: Stable Video Material Diffusion for Single Image 3D Generation 2025-11-01T11:07:33Z We present Stable Video Materials 3D (SViM3D), a framework to predict multi-view consistent physically based rendering (PBR) materials, given a single image. Recently, video diffusion models have been successfully used to reconstruct 3D objects from a single image efficiently. However, reflectance is still represented by simple material models or needs to be estimated in additional steps to enable relighting and controlled appearance edits. We extend a latent video diffusion model to output spatially varying PBR parameters and surface normals jointly with each generated view based on explicit camera control. This unique setup allows for relighting and generating a 3D asset using our model as neural prior. We introduce various mechanisms to this pipeline that improve quality in this ill-posed setting. We show state-of-the-art relighting and novel view synthesis performance on multiple object-centric datasets. Our method generalizes to diverse inputs, enabling the generation of relightable 3D assets useful in AR/VR, movies, games and other visual media. 2025-10-09T14:29:47Z Accepted by International Conference on Computer Vision (ICCV 2025). Project page: http://svim3d.aengelhardt.com Andreas Engelhardt Mark Boss Vikram Voleti Chun-Han Yao Hendrik P. A. Lensch Varun Jampani