http://arxiv.org/abs/2509.18097v1 Preconditioned Deformation Grids 2025-09-22T17:59:55Z

Dynamic surface reconstruction of objects from point cloud sequences is a challenging field in computer graphics. Existing approaches either require multiple regularization terms or extensive training data which, however, lead to compromises in reconstruction accuracy as well as over-smoothing or poor generalization to unseen objects and motions. To address these limitations, we introduce Preconditioned Deformation Grids, a novel technique for estimating coherent deformation fields directly from unstructured point cloud sequences without requiring or forming explicit correspondences. Key to our approach is the use of multi-resolution voxel grids that capture the overall motion at varying spatial scales, enabling a more flexible deformation representation. In conjunction with incorporating grid-based Sobolev preconditioning into gradient-based optimization, we show that applying a Chamfer loss between the input point clouds as well as to an evolving template mesh is sufficient to obtain accurate deformations. To ensure temporal consistency along the object surface, we include a weak isometry loss on mesh edges which complements the main objective without constraining deformation fidelity. Extensive evaluations demonstrate that our method achieves superior results, particularly for long sequences, compared to state-of-the-art techniques.
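The Chamfer loss mentioned in the abstract compares two point sets through nearest-neighbour distances, which is why no explicit correspondences are needed. The following is a minimal brute-force sketch of a symmetric, squared-distance Chamfer distance. The function name, the squared-distance variant, and the toy point sets are assumptions made for illustration; the abstract does not state the exact form the authors use.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Every point is matched to its nearest neighbour in the other set,
    so no point-to-point correspondences have to be supplied.
    Brute force: O(len(points_a) * len(points_b)).
    """
    def one_sided(src, dst):
        # Mean squared distance from each point in src to its nearest point in dst.
        total = 0.0
        for s in src:
            total += min(sum((p - q) ** 2 for p, q in zip(s, d)) for d in dst)
        return total / len(src)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)


# Two toy "frames": the second is the first shifted by 0.5 along z.
frame_t = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame_t1 = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)]
print(chamfer_distance(frame_t, frame_t1))  # 0.25 + 0.25 = 0.5
```

In an optimization setting this quantity would be evaluated on deformed points and differentiated with respect to the deformation parameters; the sketch only shows the distance itself.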

2025-09-22T17:59:55Z GitHub: https://github.com/vc-bonn/preconditioned-deformation-grids Computer Graphics Forum, Volume 44, 2025 Julian Kaltheuner Alexander Oebel Hannah Droege Patrick Stotko Reinhard Klein http://arxiv.org/abs/2509.17985v1 VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models 2025-09-22T16:28:47Z

In this paper, we propose VideoFrom3D, a novel framework for synthesizing high-quality 3D scene videos from coarse geometry, a camera trajectory, and a reference image. Our approach streamlines the 3D graphic design workflow, enabling flexible design exploration and rapid production of deliverables. A straightforward approach to synthesizing a video from coarse geometry might condition a video diffusion model on geometric structure. However, existing video diffusion models struggle to generate high-fidelity results for complex scenes due to the difficulty of jointly modeling visual quality, motion, and temporal consistency. To address this, we propose a generative framework that leverages the complementary strengths of image and video diffusion models. Specifically, our framework consists of a Sparse Anchor-view Generation (SAG) module and a Geometry-guided Generative Inbetweening (GGI) module. The SAG module generates high-quality, cross-view consistent anchor views using an image diffusion model, aided by Sparse Appearance-guided Sampling. Building on these anchor views, the GGI module faithfully interpolates intermediate frames using a video diffusion model, enhanced by flow-based camera control and structural guidance. Notably, both modules operate without any paired dataset of 3D scene models and natural images, which is extremely difficult to obtain. Comprehensive experiments show that our method produces high-quality, style-consistent scene videos under diverse and challenging scenarios, outperforming simple and extended baselines.

2025-09-22T16:28:47Z Project page: https://kimgeonung.github.io/VideoFrom3D/ Geonung Kim Janghyeok Han Sunghyun Cho http://arxiv.org/abs/2509.17979v1 Towards Seeing Bones at Radio Frequency 2025-09-22T16:24:36Z

Wireless sensing literature has long aspired to achieve X-ray-like vision at radio frequencies. However, state-of-the-art wireless sensing has yet to generate the archetypal X-ray image: one of the bones beneath flesh. In this paper, we explore MCT, a penetration-based RF-imaging system for imaging bones at mm-resolution, significantly exceeding prior penetration-based RF imaging work. Indeed, the long wavelength, significant attenuation, and complex diffraction that occur as RF propagates through flesh have long limited imaging resolution (to several centimeters at best). We address these concerns through a novel penetration-based synthetic aperture algorithm, coupled with a learning-based pipeline to correct for diffraction-induced artifacts. A detailed evaluation on meat models demonstrates a resolution improvement from sub-decimeter to sub-centimeter over prior art in RF penetrative imaging.

2025-09-22T16:24:36Z Yiwen Song Hongyang Li Kuang Yuan Ran Bi Swarun Kumar http://arxiv.org/abs/2504.00745v2 The Granule-In-Cell Method for Simulating Sand--Water Mixtures 2025-09-22T14:49:08Z

The simulation of sand--water mixtures requires capturing the stochastic behavior of individual sand particles within a uniform, continuous fluid medium, such as the characteristic migration, deposition, and plugging observed across various scenarios. In this paper, we introduce a Granule-in-Cell (GIC) method for simulating such sand--water interaction. We leverage the Discrete Element Method (DEM) to capture the fine-scale details of individual granules and the Particle-in-Cell (PIC) method for its continuous spatial representation and particle-based structure for density projection. To combine these two frameworks, we treat granules as a macroscopic transport flow rather than solid boundaries for the fluid. This bidirectional coupling allows our model to accommodate a range of interphase forces with different discretization schemes, resulting in a more realistic simulation that fully respects the mass conservation equation. Experimental results demonstrate the effectiveness of our method in simulating complex sand--water interactions, while maintaining volume consistency. Notably, in the dam-breaking experiment, our simulation uniquely captures the distinct physical properties of sand under varying degrees of infiltration within a single scenario. Our work advances the state of the art in granule--fluid simulation, offering a unified framework that bridges mesoscopic and macroscopic dynamics.
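To make the PIC-style density projection and the mass conservation property concrete, here is a minimal 1D sketch of particle-to-grid mass deposition with linear (tent) weights. It is a generic PIC building block, not the GIC coupling itself; the function name, the 1D setting, and the sample values are assumptions chosen for illustration.

```python
def deposit_mass(positions, masses, num_cells, dx):
    """Scatter particle mass onto 1D grid nodes with linear (tent) weights.

    Assumes every position lies in [0, num_cells * dx]. The two weights of
    each particle sum to 1, so total mass on the grid equals total particle mass.
    """
    grid = [0.0] * (num_cells + 1)  # one value per grid node
    for x, m in zip(positions, masses):
        i = min(int(x / dx), num_cells - 1)  # left node of the containing cell
        w = x / dx - i                       # fractional offset inside the cell
        grid[i] += (1.0 - w) * m
        grid[i + 1] += w * m
    return grid


positions = [0.25, 1.5]
masses = [2.0, 1.0]
grid = deposit_mass(positions, masses, num_cells=2, dx=1.0)
print(grid)                    # [1.5, 1.0, 0.5]
print(sum(grid), sum(masses))  # both 3.0: mass is conserved by construction
```

Because the interpolation weights form a partition of unity, conservation holds regardless of where the particles sit inside their cells.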

2025-04-01T12:56:55Z 19 pages, 15 figures, To appear in ACM Transactions on Graphics (SIGGRAPH Asia 2025) Yizao Tang Yuechen Zhu Xingyu Ni Baoquan Chen 10.1145/3763279 http://arxiv.org/abs/2509.17803v1 Effect of Appearance and Animation Realism on the Perception of Emotionally Expressive Virtual Humans 2025-09-22T13:59:14Z

3D Virtual Human technology is growing with several potential applications in health, education, business and telecommunications. Investigating the perception of these virtual humans can help guide the development of better and more effective applications. Recent developments show that the appearance of virtual humans has reached a very realistic level. However, there is not yet adequate analysis of the perception of appearance and animation realism for emotionally expressive virtual humans. In this paper, we designed a user experiment and analyzed the effect of a realistic virtual human's appearance realism and animation realism in varying emotion conditions. We found that higher appearance realism and higher animation realism lead to higher social presence and higher attractiveness ratings. We also found significant effects of animation realism on perceived realism and emotion intensity levels. Our study sheds light on how appearance and animation realism affect the perception of highly realistic virtual humans in emotionally expressive scenarios and points to future directions.

2025-09-22T13:59:14Z pre-print, 8 pages, accepted at ACM International Conference on Intelligent Virtual Agents 2023 (IVA 2023) Nabila Amadou Kazi Injamamul Haque Zerrin Yumak 10.1145/3570945.360730 http://arxiv.org/abs/2509.17755v1 Learning Neural Antiderivatives 2025-09-22T13:19:07Z

Neural fields offer continuous, learnable representations that extend beyond traditional discrete formats in visual computing. We study the problem of learning neural representations of repeated antiderivatives directly from a function, a continuous analogue of summed-area tables. Although widely used in discrete domains, such cumulative schemes rely on grids, which prevents their applicability in continuous neural contexts. We introduce and analyze a range of neural methods for repeated integration, including both adaptations of prior work and novel designs. Our evaluation spans multiple input dimensionalities and integration orders, assessing both reconstruction quality and performance in downstream tasks such as filtering and rendering. These results enable integrating classical cumulative operators into modern neural systems and offer insights into learning tasks involving differential and integral operators.
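For readers unfamiliar with the discrete scheme the abstract refers to, the sketch below builds a classical summed-area table on a grid and uses it to evaluate a box sum with four lookups. This is the grid-based baseline, not the neural method of the paper; the function names and the sample image are illustrative assumptions.

```python
def summed_area_table(image):
    """Return table S with S[y][x] = sum of image[0:y][0:x].

    The table has one extra row and column of zeros so that box queries
    need no special cases at the image border.
    """
    h, w = len(image), len(image[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            sat[y + 1][x + 1] = (image[y][x] + sat[y][x + 1]
                                 + sat[y + 1][x] - sat[y][x])
    return sat


def box_sum(sat, y0, x0, y1, x1):
    """Sum of image rows y0..y1-1 and columns x0..x1-1 from four lookups."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
sat = summed_area_table(image)
print(box_sum(sat, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
```

The query cost is constant in the box size, which is the property a continuous antiderivative representation would aim to preserve.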

2025-09-22T13:19:07Z Fizza Rubab Ntumba Elie Nsampi Martin Balint Felix Mujkanovic Hans-Peter Seidel Tobias Ritschel Thomas Leimkühler http://arxiv.org/abs/2509.17748v1 "I don't like my avatar": Investigating Human Digital Doubles 2025-09-22T13:11:28Z

Creating human digital doubles is becoming easier and much more accessible to everyone using consumer-grade devices. In this work, we investigate how avatar style (realistic vs. cartoon) and avatar familiarity (self, acquaintance, unknown person) affect self/other-identification, perceived realism, affinity and social presence in a controlled offline experiment. We created two styles of avatars (realistic-looking MetaHumans and cartoon-looking ReadyPlayerMe avatars) and facial animation stimuli for them using performance capture. Questionnaire responses demonstrate that higher appearance realism leads to a higher level of identification, perceived realism and social presence. However, avatars with familiar faces, especially those with high appearance realism, lead to a lower level of identification, perceived realism, and affinity. Although participants identified their digital doubles as their own, they consistently disliked their avatars, especially those with a realistic appearance. However, they were less critical and more forgiving of an acquaintance's or an unknown person's digital double.

2025-09-22T13:11:28Z pre-print, 12 pages, accepted at ACM SIGGRAPH Motion, Interaction and Games 2025 (MIG 2025) conference Siyi Liu Kazi Injamamul Haque Zerrin Yumak 10.1145/3769047.3769061 http://arxiv.org/abs/2509.11003v2 AD-GS: Alternating Densification for Sparse-Input 3D Gaussian Splatting 2025-09-22T12:25:56Z

3D Gaussian Splatting (3DGS) has shown impressive results in real-time novel view synthesis. However, it often struggles under sparse-view settings, producing undesirable artifacts such as floaters, inaccurate geometry, and overfitting due to limited observations. We find that a key contributing factor is uncontrolled densification, where adding Gaussian primitives rapidly without guidance can harm geometry and cause artifacts. We propose AD-GS, a novel alternating densification framework that interleaves high and low densification phases. During high densification, the model densifies aggressively, followed by photometric-loss-based training to capture fine-grained scene details. Low densification then primarily involves aggressive opacity pruning of Gaussians, followed by regularizing their geometry through pseudo-view consistency and edge-aware depth smoothness. This alternating approach helps reduce overfitting by carefully controlling model capacity growth while progressively refining the scene representation. Extensive experiments on challenging datasets demonstrate that AD-GS significantly improves rendering quality and geometric consistency compared to existing methods. The source code for our model can be found on our project page: https://gurutvapatle.github.io/publications/2025/ADGS.html .

2025-09-13T23:05:49Z SIGGRAPH Asia 2025 Gurutva Patle Nilay Girgaonkar Nagabhushan Somraj Rajiv Soundararajan 10.1145/3757377.3763993 http://arxiv.org/abs/2509.08947v2 CameraVDP: Perceptual Display Assessment with Uncertainty Estimation via Camera and Visual Difference Prediction 2025-09-21T21:34:01Z

Accurate measurement of images produced by electronic displays is critical for the evaluation of both traditional and computational displays. Traditional display measurement methods based on sparse radiometric sampling and fitting a model are inadequate for capturing spatially varying display artifacts, as they fail to capture high-frequency and pixel-level distortions. While cameras offer sufficient spatial resolution, they introduce optical, sampling, and photometric distortions. Furthermore, the physical measurement must be combined with a model of a visual system to assess whether the distortions are going to be visible. To enable perceptual assessment of displays, we propose a combination of a camera-based reconstruction pipeline with a visual difference predictor, which accounts for both the inaccuracy of camera measurements and visual difference prediction. The reconstruction pipeline combines HDR image stacking, MTF inversion, vignetting correction, geometric undistortion, homography transformation, and color correction, enabling cameras to function as precise display measurement instruments. By incorporating a Visual Difference Predictor (VDP), our system models the visibility of various stimuli under different viewing conditions for the human visual system. We validate the proposed CameraVDP framework through three applications: defective pixel detection, color fringing awareness, and display non-uniformity evaluation. Our uncertainty analysis framework enables the estimation of the theoretical upper bound for defective pixel detection performance and provides confidence intervals for VDP quality scores.

2025-09-10T19:13:14Z Accepted by SIGGRAPH Asia 2025 Yancheng Cai Robert Wanat Rafal Mantiuk http://arxiv.org/abs/2412.03526v3 Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos 2025-09-21T19:13:53Z

Recent advancements in static feed-forward scene reconstruction have demonstrated significant progress in high-quality novel view synthesis. However, these models often struggle with generalizability across diverse environments and fail to effectively handle dynamic content. We present BTimer (short for BulletTimer), the first motion-aware feed-forward model for real-time reconstruction and novel view synthesis of dynamic scenes. Our approach reconstructs the full scene in a 3D Gaussian Splatting representation at a given target ('bullet') timestamp by aggregating information from all the context frames. Such a formulation allows BTimer to gain scalability and generalization by leveraging both static and dynamic scene datasets. Given a casual monocular dynamic video, BTimer reconstructs a bullet-time scene within 150ms while reaching state-of-the-art performance on both static and dynamic scene datasets, even compared with optimization-based approaches.

2024-12-04T18:15:06Z Project website: https://research.nvidia.com/labs/toronto-ai/bullet-timer/ Hanxue Liang Jiawei Ren Ashkan Mirzaei Antonio Torralba Ziwei Liu Igor Gilitschenski Sanja Fidler Cengiz Oztireli Huan Ling Zan Gojcic Jiahui Huang http://arxiv.org/abs/2509.16960v1 SemanticGarment: Semantic-Controlled Generation and Editing of 3D Gaussian Garments 2025-09-21T07:46:01Z

3D digital garment generation and editing play a pivotal role in fashion design, virtual try-on, and gaming. Traditional methods struggle to meet the growing demand due to technical complexity and high resource costs. Learning-based approaches offer faster, more diverse garment synthesis based on specific requirements and reduce human effort and time costs. However, they still face challenges such as inconsistent multi-view geometry or textures and heavy reliance on detailed garment topology and manual rigging. We propose SemanticGarment, a 3D Gaussian-based method that realizes high-fidelity 3D garment generation from text or image prompts and supports semantic-based interactive editing for flexible user customization. To ensure multi-view consistency and garment fitting, we propose to leverage structural human priors for the generative model by introducing a 3D semantic clothing model, which initializes the geometry structure and lays the groundwork for view-consistent garment generation and editing. Without the need to regenerate or rely on existing mesh templates, our approach allows for rapid and diverse modifications to existing Gaussians, either globally or within a local region. To address the artifacts caused by self-occlusion in garment reconstruction from a single image, we develop a self-occlusion optimization strategy to mitigate holes and artifacts that arise when directly animating self-occluded garments. Extensive experiments demonstrate our superior performance in 3D garment generation and editing.

2025-09-21T07:46:01Z Ruiyan Wang Zhengxue Cheng Zonghao Lin Jun Ling Yuzhou Liu Yanru An Rong Xie Li Song 10.1145/3746027.3755136 http://arxiv.org/abs/2509.16869v1 PhysHDR: When Lighting Meets Materials and Scene Geometry in HDR Reconstruction 2025-09-21T01:41:40Z

Low Dynamic Range (LDR) to High Dynamic Range (HDR) image translation is a fundamental task in many computational vision problems. Numerous data-driven methods have been proposed to address this problem; however, they lack explicit modeling of illumination, lighting, and scene geometry in images. This limits the quality of the reconstructed HDR images. Since lighting and shadows interact differently with different materials (e.g., specular surfaces such as glass and metal, and Lambertian or diffuse surfaces such as wood and stone), modeling material-specific properties (e.g., specular and diffuse reflectance) has the potential to improve the quality of HDR image reconstruction. This paper presents PhysHDR, a simple yet powerful latent diffusion-based generative model for HDR image reconstruction. The denoising process is conditioned on lighting and depth information and guided by a novel loss to incorporate material properties of surfaces in the scene. The experimental results establish the efficacy of PhysHDR in comparison to a number of recent state-of-the-art methods.

2025-09-21T01:41:40Z Submitted to IEEE Hrishav Bakul Barua Kalin Stefanov Ganesh Krishnasamy KokSheik Wong Abhinav Dhall http://arxiv.org/abs/2509.16773v1 Improve bounding box in Carla Simulator 2025-09-20T18:44:18Z

The CARLA simulator (Car Learning to Act) serves as a robust platform for testing algorithms and generating datasets in the field of Autonomous Driving (AD). It provides control over various environmental parameters, enabling thorough evaluation. Bounding boxes are commonly utilized tools in deep learning and play a crucial role in AD applications. The predominant method for data generation in the CARLA Simulator involves identifying and delineating objects of interest, such as vehicles, using bounding boxes. The operation in CARLA entails capturing the coordinates of all objects on the map, which are subsequently aligned with the sensor's coordinate system at the ego vehicle and then enclosed within bounding boxes relative to the ego vehicle's perspective. However, this primary approach encounters challenges associated with object detection and bounding box annotation, such as ghost boxes. Although these procedures are generally effective at detecting vehicles and other objects within their direct line of sight, they may also produce false positives by identifying objects that are obscured by obstructions. We have enhanced the primary approach with the objective of filtering out unwanted boxes. Performance analysis indicates that the improved approach has achieved high accuracy.
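One common way to suppress boxes for occluded objects is to compare an object's distance against a depth image from the same viewpoint. The sketch below illustrates that idea only; the abstract does not describe the authors' filter, so this should not be read as their method. It deliberately avoids the CARLA Python API: the pinhole model, the camera-frame convention (x right, y down, z forward), and all function and parameter names are hypothetical.

```python
def project_to_pixel(point_cam, focal, cx, cy):
    """Pinhole projection of a camera-frame point (x right, y down, z forward)."""
    x, y, z = point_cam
    return int(round(focal * x / z + cx)), int(round(focal * y / z + cy))


def is_visible(point_cam, depth_image, focal, cx, cy, tolerance=0.5):
    """Return True if the point is in view and not hidden behind closer geometry.

    depth_image[v][u] is assumed to hold the distance (same units as the
    point coordinates) of the nearest surface seen through pixel (u, v).
    """
    z = point_cam[2]
    if z <= 0:  # behind the camera
        return False
    u, v = project_to_pixel(point_cam, focal, cx, cy)
    height, width = len(depth_image), len(depth_image[0])
    if not (0 <= u < width and 0 <= v < height):  # outside the image
        return False
    # A surface much closer than the object means the object is occluded.
    return z <= depth_image[v][u] + tolerance


# Toy 4x4 depth image: a wall 10 units away fills the whole view.
depth = [[10.0] * 4 for _ in range(4)]
in_front = (0.0, 0.0, 5.0)      # in front of the wall
behind_wall = (0.0, 0.0, 30.0)  # hidden behind the wall ("ghost box" candidate)
print(is_visible(in_front, depth, focal=2.0, cx=2.0, cy=2.0))     # True
print(is_visible(behind_wall, depth, focal=2.0, cx=2.0, cy=2.0))  # False
```

A practical filter would test several points per box (for example its corners and center) rather than a single point, since partially occluded objects should usually be kept.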

2025-09-20T18:44:18Z 9 pages, 12 figures, VEHITS Conference 2024 Mohamad Mofeed Chaar Jamal Raiyn Galia Weidl 10.5220/0012600500003702 http://arxiv.org/abs/2509.16735v1 Brain Connectivity Network Structure Learning For Brain Disorder Diagnosis 2025-09-20T15:59:54Z

Recent studies in neuroscience highlight the significant potential of brain connectivity networks, which are commonly constructed from functional magnetic resonance imaging (fMRI) data for brain disorder diagnosis. Traditional brain connectivity networks are typically obtained using predefined methods that incorporate manually-set thresholds to estimate inter-regional relationships. However, such approaches often introduce redundant connections or overlook essential interactions, compromising the value of the constructed networks. In addition, the insufficiency of labeled data further increases the difficulty of learning generalized representations of intrinsic brain characteristics. To mitigate those issues, we propose a self-supervised framework to learn an optimal structure and representation for brain connectivity networks, focusing on individualized generation and optimization in an unsupervised manner. We first employ two existing whole-brain connectomes to adaptively construct their complementary brain network structure learner, and then introduce a multi-state graph-based encoder with a joint iterative learning strategy to simultaneously optimize both the generated network structure and its representation. By leveraging self-supervised pretraining on large-scale unlabeled brain connectivity data, our framework enables the brain connectivity network learner to generalize effectively to unseen disorders, while requiring only minimal finetuning of the encoder for adaptation to new diagnostic tasks. Extensive experiments on cross-dataset brain disorder diagnosis demonstrate that our method consistently outperforms state-of-the-art approaches, validating its effectiveness and generalizability. The code is publicly available at https://github.com/neochen1/BCNSL.
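The traditional construction that the abstract criticizes can be sketched as follows: compute the Pearson correlation between regional time series and keep an edge wherever its magnitude passes a hand-picked threshold. This illustrates the baseline only, not the proposed learner; the function names, the toy signals, and the threshold values are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def connectivity_network(series, threshold):
    """Binary adjacency matrix with an edge wherever |correlation| >= threshold."""
    n = len(series)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(series[i], series[j])) >= threshold:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


# Three toy "regions": the first two are perfectly correlated, the third
# correlates with each of them at about -0.447.
regions = [[1.0, 2.0, 3.0, 4.0],
           [2.0, 4.0, 6.0, 8.0],
           [1.0, -1.0, 1.0, -1.0]]
print(connectivity_network(regions, threshold=0.5))  # only the 0-1 edge survives
print(connectivity_network(regions, threshold=0.4))  # every pair is connected
```

The two calls show how a small change in the manually set threshold changes the graph, which is the sensitivity the abstract points to.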

2025-09-20T15:59:54Z Dongdong Chen Linlin Yao Mengjun Liu Zhenrong Shen Yuqi Hu Zhiyun Song Shengyu Lu Qian Wang Dinggang Shen Lichi Zhang http://arxiv.org/abs/2508.16024v3 Wavelet-Space Representations for Neural Super-Resolution in Rendering Pipelines 2025-09-20T13:56:49Z

We investigate the use of wavelet-space feature decomposition in neural super-resolution for rendering pipelines. Building on recent neural upscaling frameworks, we introduce a formulation that predicts stationary wavelet coefficients rather than directly regressing RGB values. This frequency-aware decomposition separates low- and high-frequency components, enabling sharper texture recovery and reducing blur in challenging regions. Unlike conventional wavelet transforms, our use of the stationary wavelet transform (SWT) preserves spatial alignment across subbands, allowing the network to integrate G-buffer attributes and temporally warped history frames in a shift-invariant manner. The predicted coefficients are recombined through inverse wavelet synthesis, producing resolution-consistent reconstructions across arbitrary scale factors. We conduct extensive evaluations and ablations, showing that incorporating SWT improves both fidelity and perceptual quality with only modest overhead, while remaining compatible with standard rendering architectures. Taken together, our results suggest that wavelet-domain neural super-resolution provides a principled and efficient path toward higher-quality real-time rendering, with broader implications for neural rendering and graphics applications.
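To show why the stationary wavelet transform keeps subbands spatially aligned, here is a minimal one-level, 1D undecimated Haar transform with periodic boundaries and its inverse. Both subbands have the same length as the input because no downsampling takes place. The Haar filter, the 1D setting, the boundary handling, and the function names are assumptions for illustration; the abstract does not specify the wavelet or the implementation used in the paper.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def haar_swt(signal):
    """One level of an undecimated Haar transform with periodic wrap-around.

    Returns (approx, detail); each has the same length as the input, so
    coefficient i stays aligned with sample i.
    """
    n = len(signal)
    approx = [SCALE * (signal[i] + signal[(i + 1) % n]) for i in range(n)]
    detail = [SCALE * (signal[i] - signal[(i + 1) % n]) for i in range(n)]
    return approx, detail


def haar_iswt(approx, detail):
    """Inverse of haar_swt: average the two redundant estimates of each sample."""
    out = []
    for i in range(len(approx)):
        # Index i - 1 wraps to the last element for i == 0 (periodic boundary).
        from_current = SCALE * (approx[i] + detail[i])           # pair (i, i+1)
        from_previous = SCALE * (approx[i - 1] - detail[i - 1])  # pair (i-1, i)
        out.append(0.5 * (from_current + from_previous))
    return out


signal = [4.0, 2.0, 5.0, 7.0]
approx, detail = haar_swt(signal)
restored = haar_iswt(approx, detail)
print(len(approx), len(detail))  # 4 4: no downsampling
print(max(abs(a - b) for a, b in zip(signal, restored)) < 1e-12)  # True
```

A decimated transform would halve the length of each subband, so coefficients would no longer line up one-to-one with pixels or with per-pixel auxiliary buffers.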

2025-08-22T01:01:44Z Prateek Poudel Prashant Aryal Kirtan Kunwar Navin Nepal Dinesh Baniya Kshatri