https://arxiv.org/api/otMtgv8oJBYqjRxmMAu4f/aElA4 2026-06-28T14:48:20Z 9390 1890 15 http://arxiv.org/abs/2506.23957v2 GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering 2025-07-18T09:34:33Z

Video stabilization is pivotal for video processing, as it removes unwanted shakiness while preserving the original user motion intent. Existing approaches, depending on the domain in which they operate, suffer from several issues (e.g., geometric distortions, excessive cropping, poor generalization) that degrade the user experience. To address these issues, we introduce GaVS, a novel 3D-grounded approach that reformulates video stabilization as a temporally-consistent "local reconstruction and rendering" paradigm. Given 3D camera poses, we augment a reconstruction model to predict Gaussian Splatting primitives, and finetune it at test-time, with multi-view dynamics-aware photometric supervision and cross-frame regularization, to produce temporally-consistent local reconstructions. The model is then used to render each stabilized frame. We utilize a scene extrapolation module to avoid frame cropping. Our method is evaluated on a repurposed dataset, instilled with 3D-grounded information, covering samples with diverse camera motions and scene dynamics. Quantitatively, our method is competitive with or superior to state-of-the-art 2D and 2.5D approaches in terms of conventional task metrics and new geometry consistency. Qualitatively, our method produces noticeably better results compared to alternatives, validated by the user study.

2025-06-30T15:24:27Z siggraph 2025, project website: https://sinoyou.github.io/gavs. version 2, update discussion Zinuo You Stamatios Georgoulis Anpei Chen Siyu Tang Dengxin Dai 10.1145/3721238.3730757 http://arxiv.org/abs/2507.12168v2 Shape Adaptation for 3D Hairstyle Retargeting 2025-07-18T07:47:59Z

There is strong demand for adapting existing hairstyles to novel characters in games and VR applications. However, it is a non-trivial task for artists due to the complicated hair geometries and spatial interactions to preserve. In this paper, we present an automatic shape adaptation method to retarget 3D hairstyles. We formulate the adaptation process as a constrained optimization problem, where all the shape properties and spatial relationships are converted into individual objectives and constraints. To make such an optimization on high-resolution hairstyles tractable, we adopt a multi-scale strategy to compute the target positions of the hair strands in a coarse-to-fine manner. The global solving for the inter-strand coupling is restricted to the coarse level, and the solving for fine details is made local and parallel. In addition, we present a novel hairline editing tool to allow for user customization during retargeting. We achieve this by solving physics-based deformations of an embedded membrane to redistribute the hair roots with minimal distortion. We demonstrate the efficacy of our method through quantitative and qualitative experiments on various hairstyles and characters.

2025-07-16T11:55:11Z Lu Yu Zhong Ren Youyi Zheng Xiang Chen Kun Zhou http://arxiv.org/abs/2507.13660v1 Managing level of detail through peripheral degradation: Effects on search performance with a head-mounted display 2025-07-18T05:07:02Z

Two user studies were performed to evaluate the effect of level-of-detail (LOD) degradation in the periphery of head-mounted displays on visual search performance. In the first study, spatial detail was degraded by reducing resolution. In the second study, detail was degraded in the color domain by using grayscale in the periphery. In each study, 10 subjects were given a complex search task that required users to indicate whether or not a target object was present among distracters. Subjects used several different displays varying in the amount of detail presented. Frame rate, object location, subject input method, and order of display use were all controlled. The primary dependent measures were search time on correctly performed trials and the percentage of all trials correctly performed. Results indicated that peripheral LOD degradation can be used to reduce color or spatial visual complexity by almost half in some search tasks without significantly reducing performance.
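
The color-domain degradation used in the second study can be illustrated with a minimal sketch. The circular high-detail inset, the pixel representation, the function name `degrade_periphery`, and the Rec. 601 luma weights are all assumptions made for illustration; the abstract does not specify how the displays were implemented.

```python
from math import hypot


def degrade_periphery(image, center, radius):
    """Return a copy of `image` with pixels outside `radius` of `center` in grayscale.

    `image` is a list of rows; each pixel is an (r, g, b) tuple of 0-255 ints.
    The circular inset and the luma weights are illustrative assumptions,
    not details taken from the study.
    """
    cx, cy = center
    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, (r, g, b) in enumerate(row):
            if hypot(x - cx, y - cy) <= radius:
                # High-detail inset: keep the original color.
                new_row.append((r, g, b))
            else:
                # Periphery: replace color with a single luminance value.
                gray = round(0.299 * r + 0.587 * g + 0.114 * b)
                new_row.append((gray, gray, gray))
        out.append(new_row)
    return out


# Hypothetical 5x5 pure-red image with the inset centered on the middle pixel.
image = [[(255, 0, 0)] * 5 for _ in range(5)]
result = degrade_periphery(image, center=(2, 2), radius=1)
```

With these inputs, only the center pixel and its four direct neighbors keep their color; every other pixel is reduced to one luminance channel.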

2025-07-18T05:07:02Z ACM Transactions on Computer-Human Interaction (TOCHI) Volume 4 Issue 4 Pages 323-346. (1997) Benjamin Watson Neff Walker Larry F Hodges Aileen Worden 10.1145/267135.267137 http://arxiv.org/abs/2507.13586v1 TexGS-VolVis: Expressive Scene Editing for Volume Visualization via Textured Gaussian Splatting 2025-07-18T00:14:27Z

Advancements in volume visualization (VolVis) focus on extracting insights from 3D volumetric data by generating visually compelling renderings that reveal complex internal structures. Existing VolVis approaches have explored non-photorealistic rendering techniques to enhance the clarity, expressiveness, and informativeness of visual communication. While effective, these methods often rely on complex predefined rules and are limited to transferring a single style, restricting their flexibility. To overcome these limitations, we advocate the representation of VolVis scenes using differentiable Gaussian primitives combined with pretrained large models to enable arbitrary style transfer and real-time rendering. However, conventional 3D Gaussian primitives tightly couple geometry and appearance, leading to suboptimal stylization results. To address this, we introduce TexGS-VolVis, a textured Gaussian splatting framework for VolVis. TexGS-VolVis employs 2D Gaussian primitives, extending each Gaussian with additional texture and shading attributes, resulting in higher-quality, geometry-consistent stylization and enhanced lighting control during inference. Despite these improvements, achieving flexible and controllable scene editing remains challenging. To further enhance stylization, we develop image- and text-driven non-photorealistic scene editing tailored for TexGS-VolVis and 2D-lift-3D segmentation to enable partial editing with fine-grained control. We evaluate TexGS-VolVis both qualitatively and quantitatively across various volume rendering scenes, demonstrating its superiority over existing methods in terms of efficiency, visual quality, and editing flexibility.

2025-07-18T00:14:27Z Accepted by IEEE VIS 2025 Kaiyuan Tang Kuangshi Ai Jun Han Chaoli Wang http://arxiv.org/abs/2504.15028v2 A Controllable Appearance Representation for Flexible Transfer and Editing 2025-07-17T14:09:31Z

We present a method that computes an interpretable representation of material appearance within a highly compact, disentangled latent space. This representation is learned in a self-supervised fashion using an adapted FactorVAE. We train our model with a carefully designed unlabeled dataset, avoiding possible biases induced by human-generated labels. Our model demonstrates strong disentanglement and interpretability by effectively encoding material appearance and illumination, despite the absence of explicit supervision. Then, we use our representation as guidance for training a lightweight IP-Adapter to condition a diffusion pipeline that transfers the appearance of one or more images onto a target geometry, and allows the user to further edit the resulting appearance. Our approach offers fine-grained control over the generated results: thanks to the well-structured compact latent space, users can intuitively manipulate attributes such as hue or glossiness in image space to achieve the desired final appearance.

2025-04-21T11:29:06Z EGSR 2025 - Symposium Track Santiago Jimenez-Navarro Julia Guerrero-Viu Belen Masia 10.2312/sr.20251187 http://arxiv.org/abs/2507.13419v1 Lab-Scale Gantry Crane Digital Twin Exemplar 2025-07-17T13:54:57Z

The research topic of digital twins has attracted a large amount of interest over the past decade. However, publicly available exemplars remain scarce. In the interest of open and reproducible science, in this exemplar paper we present a lab-scale gantry crane and its digital twin. The exemplar comprises both the physical and digital side of the twin system. The physical side consists of the physical crane and its controller. The digital side covers the CAD models and kinematic model of the crane, and provides services for optimal control, historical data logging, data visualization and continuous validation. We used this setup as a use case in several previous publications where its functionality was validated. It is publicly available and relies only on other freely available and commonly used software; in this way, we hope it can be used for future research or education on the topic of digital twins.

2025-07-17T13:54:57Z 6 pages, 8 figures, associated GitHub repository: https://github.com/Cosys-Lab/lab-scale-gantry-crane Joost Mertens Joachim Denil http://arxiv.org/abs/2507.12667v1 VolSegGS: Segmentation and Tracking in Dynamic Volumetric Scenes via Deformable 3D Gaussians 2025-07-16T22:57:03Z

Visualization of large-scale time-dependent simulation data is crucial for domain scientists to analyze complex phenomena, but it demands significant I/O bandwidth, storage, and computational resources. To enable effective visualization on local, low-end machines, recent advances in view synthesis techniques, such as neural radiance fields, utilize neural networks to generate novel visualizations for volumetric scenes. However, these methods focus on reconstruction quality rather than facilitating interactive visualization exploration, such as feature extraction and tracking. We introduce VolSegGS, a novel Gaussian splatting framework that supports interactive segmentation and tracking in dynamic volumetric scenes for exploratory visualization and analysis. Our approach utilizes deformable 3D Gaussians to represent a dynamic volumetric scene, allowing for real-time novel view synthesis. For accurate segmentation, we leverage the view-independent colors of Gaussians for coarse-level segmentation and refine the results with an affinity field network for fine-level segmentation. Additionally, by embedding segmentation results within the Gaussians, we ensure that their deformation enables continuous tracking of segmented regions over time. We demonstrate the effectiveness of VolSegGS with several time-varying datasets and compare our solutions against state-of-the-art methods. With the ability to interact with a dynamic scene in real time and provide flexible segmentation and tracking capabilities, VolSegGS offers a powerful solution under low computational demands. This framework unlocks exciting new possibilities for time-varying volumetric data analysis and visualization.

2025-07-16T22:57:03Z Siyuan Yao Chaoli Wang http://arxiv.org/abs/2507.12621v1 NLI4VolVis: Natural Language Interaction for Volume Visualization via LLM Multi-Agents and Editable 3D Gaussian Splatting 2025-07-16T20:35:46Z

Traditional volume visualization (VolVis) methods, like direct volume rendering, suffer from rigid transfer function designs and high computational costs. Although novel view synthesis approaches enhance rendering efficiency, they require additional learning effort for non-experts and lack support for semantic-level interaction. To bridge this gap, we propose NLI4VolVis, an interactive system that enables users to explore, query, and edit volumetric scenes using natural language. NLI4VolVis integrates multi-view semantic segmentation and vision-language models to extract and understand semantic components in a scene. We introduce a multi-agent large language model architecture equipped with extensive function-calling tools to interpret user intents and execute visualization tasks. The agents leverage external tools and declarative VolVis commands to interact with the VolVis engine powered by 3D editable Gaussians, enabling open-vocabulary object querying, real-time scene editing, best-view selection, and 2D stylization. We validate our system through case studies and a user study, highlighting its improved accessibility and usability in volumetric data exploration. We strongly recommend readers check our case studies, demo video, and source code at https://nli4volvis.github.io/.

2025-07-16T20:35:46Z IEEE VIS 2025. Project Page: https://nli4volvis.github.io/ IEEE Transactions on Visualization and Computer Graphics (TVCG), vol. 32, no. 1, 2026 Kuangshi Ai Kaiyuan Tang Chaoli Wang http://arxiv.org/abs/2507.12600v1 HairFormer: Transformer-Based Dynamic Neural Hair Simulation 2025-07-16T19:42:08Z

Simulating hair dynamics that generalize across arbitrary hairstyles, body shapes, and motions is a critical challenge. Our novel two-stage neural solution is the first to leverage Transformer-based architectures for such a broad generalization. We propose a Transformer-powered static network that predicts static draped shapes for any hairstyle, effectively resolving hair-body penetrations and preserving hair fidelity. Subsequently, a dynamic network with a novel cross-attention mechanism fuses static hair features with kinematic input to generate expressive dynamics and complex secondary motions. This dynamic network also allows for efficient fine-tuning of challenging motion sequences, such as abrupt head movements. Our method offers real-time inference for both static single-frame drapes and dynamic drapes over pose sequences. Our method demonstrates high-fidelity and generalizable dynamic hair across various styles, guided by physics-informed losses, and can resolve penetrations even for complex, unseen long hairstyles, highlighting its broad generalization.

2025-07-16T19:42:08Z Joy Xiaoji Zhang Jingsen Zhu Hanyu Chen Steve Marschner http://arxiv.org/abs/2506.11133v2 Monocular 3D Hand Pose Estimation with Implicit Camera Alignment 2025-07-16T18:17:59Z

Estimating the 3D hand articulation from a single color image is an important problem with applications in Augmented Reality (AR), Virtual Reality (VR), Human-Computer Interaction (HCI), and robotics. Apart from the absence of depth information, occlusions, articulation complexity, and the need for knowledge of the camera parameters pose additional challenges. In this work, we propose an optimization pipeline for estimating the 3D hand articulation from 2D keypoint input, which includes a keypoint alignment step and a fingertip loss to overcome the need to know or estimate the camera parameters. We evaluate our approach on the EgoDexter and Dexter+Object benchmarks to showcase that it performs competitively with the state-of-the-art, while also demonstrating its robustness when processing "in-the-wild" images without any prior camera knowledge. Our quantitative analysis highlights the sensitivity of the 2D keypoint estimation accuracy, despite the use of hand priors. Code is available at the project page https://cpantazop.github.io/HandRepo/
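
One generic way to compare model keypoints with detected 2D keypoints without camera parameters is to fit a scale and translation between the two point sets in closed form. The sketch below shows that idea only; it is an assumption about what an alignment step could look like, not the authors' method, and the function name `align_keypoints` and the sample coordinates are hypothetical.

```python
def align_keypoints(source, target):
    """Least-squares scale and translation mapping `source` onto `target`.

    Both arguments are equal-length lists of (x, y) pairs. Returns
    (s, (tx, ty)) minimising sum ||s * p + t - q||^2 over corresponding pairs.
    This is a generic similarity fit without rotation, used here purely
    as an illustration.
    """
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("need at least two corresponding keypoints")
    n = len(source)
    # Centroids of both point sets.
    pcx = sum(x for x, _ in source) / n
    pcy = sum(y for _, y in source) / n
    qcx = sum(x for x, _ in target) / n
    qcy = sum(y for _, y in target) / n
    # Closed-form scale: cross-covariance of centered points over source variance.
    num = sum(
        (px - pcx) * (qx - qcx) + (py - pcy) * (qy - qcy)
        for (px, py), (qx, qy) in zip(source, target)
    )
    den = sum((px - pcx) ** 2 + (py - pcy) ** 2 for px, py in source)
    if den == 0:
        raise ValueError("source keypoints are all identical")
    s = num / den
    # Translation moves the scaled source centroid onto the target centroid.
    return s, (qcx - s * pcx, qcy - s * pcy)


# Hypothetical data: the detections equal the model points scaled by 2
# and shifted by (3, -1).
model_2d = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
detected = [(3.0, -1.0), (5.0, -1.0), (3.0, 3.0)]
scale, shift = align_keypoints(model_2d, detected)
```

Once the two sets are aligned this way, a residual between them no longer depends on an unknown image-space offset or scale, which is the property an alignment step is meant to provide.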

2025-06-10T18:45:22Z Code is available at the project page https://cpantazop.github.io/HandRepo/ Christos Pantazopoulos Spyridon Thermos Gerasimos Potamianos http://arxiv.org/abs/2411.02179v3 CleAR: Robust Context-Guided Generative Lighting Estimation for Mobile Augmented Reality 2025-07-16T16:15:30Z

High-quality environment lighting is essential for creating immersive mobile augmented reality (AR) experiences. However, achieving visually coherent estimation for mobile AR is challenging due to several key limitations in AR device sensing capabilities, including low camera FoV and limited pixel dynamic ranges. Recent advancements in generative AI, which can generate high-quality images from different types of prompts, including texts and images, present a potential solution for high-quality lighting estimation. Still, to effectively use generative image diffusion models, we must address two key limitations: content quality and slow inference. In this work, we design and implement a generative lighting estimation system called CleAR that can produce high-quality, diverse environment maps in the format of 360° HDR images. Specifically, we design a two-step generation pipeline guided by AR environment context data to ensure the output aligns with the physical environment's visual context and color appearance. To improve the estimation robustness under different lighting conditions, we design a real-time refinement component to adjust lighting estimation results on AR devices. Through a combination of quantitative and qualitative evaluations, we show that CleAR outperforms state-of-the-art lighting estimation methods in estimation accuracy, latency, and robustness, and is rated by 31 participants as producing better renderings for most virtual objects. For example, CleAR achieves 51% to 56% accuracy improvement on virtual object renderings across objects of three distinctive types of materials and reflective properties. CleAR produces lighting estimates of comparable or better quality in just 3.2 seconds -- over 110X faster than state-of-the-art methods.

2024-11-04T15:37:18Z Yiqin Zhao Mallesham Dasari Tian Guo 10.1145/3749535 http://arxiv.org/abs/2506.05935v2 SurGSplat: Progressive Geometry-Constrained Gaussian Splatting for Surgical Scene Reconstruction 2025-07-16T10:02:27Z

Intraoperative navigation relies heavily on precise 3D reconstruction to ensure accuracy and safety during surgical procedures. However, endoscopic scenarios present unique challenges, including sparse features and inconsistent lighting, which render many existing Structure-from-Motion (SfM)-based methods inadequate and prone to reconstruction failure. To mitigate these constraints, we propose SurGSplat, a novel paradigm designed to progressively refine 3D Gaussian Splatting (3DGS) through the integration of geometric constraints. By enabling the detailed reconstruction of vascular structures and other critical features, SurGSplat provides surgeons with enhanced visual clarity, facilitating precise intraoperative decision-making. Experimental evaluations demonstrate that SurGSplat achieves superior performance in both novel view synthesis (NVS) and pose estimation accuracy, establishing it as a high-fidelity and efficient solution for surgical scene reconstruction. More information and results can be found on the page https://surgsplat.github.io/.

2025-06-06T10:02:11Z Yuchao Zheng Jianing Zhang Guochen Ning Hongen Liao http://arxiv.org/abs/2507.11971v1 HPR3D: Hierarchical Proxy Representation for High-Fidelity 3D Reconstruction and Controllable Editing 2025-07-16T07:09:05Z

Current 3D representations like meshes, voxels, point clouds, and NeRF-based neural implicit fields exhibit significant limitations: they are often task-specific, lacking universal applicability across reconstruction, generation, editing, and driving. While meshes offer high precision, their dense vertex data complicates editing; NeRFs deliver excellent rendering but suffer from structural ambiguity, hindering animation and manipulation; all representations inherently struggle with the trade-off between data complexity and fidelity. To overcome these issues, we introduce a novel 3D Hierarchical Proxy Node representation. Its core innovation lies in representing an object's shape and texture via a sparse set of hierarchically organized (tree-structured) proxy nodes distributed on its surface and interior. Each node stores local shape and texture information (implicitly encoded by a small MLP) within its neighborhood. Querying any 3D coordinate's properties involves efficient neural interpolation and lightweight decoding from relevant nearby and parent nodes. This framework yields a highly compact representation where nodes align with local semantics, enabling direct drag-and-edit manipulation, and offers scalable quality-complexity control. Extensive experiments across 3D reconstruction and editing demonstrate our method's expressive efficiency, high-fidelity rendering quality, and superior editability.

2025-07-16T07:09:05Z Tielong Wang Yuxuan Xiong Jinfan Liu Zhifan Zhang Ye Chen Yue Shi Bingbing Ni http://arxiv.org/abs/2507.13388v1 DLSF: Dual-Layer Synergistic Fusion for High-Fidelity Image Synthesis 2025-07-16T03:22:43Z

With the rapid advancement of diffusion-based generative models, Stable Diffusion (SD) has emerged as a state-of-the-art framework for high-fidelity image synthesis. However, existing SD models suffer from suboptimal feature aggregation, leading to incomplete semantic alignment and loss of fine-grained details, especially in highly textured and complex scenes. To address these limitations, we propose a novel dual-latent integration framework that enhances feature interactions between the base latent and refined latent representations. Our approach employs a feature concatenation strategy followed by an adaptive fusion module, which can be instantiated as either (i) an Adaptive Global Fusion (AGF) for hierarchical feature harmonization, or (ii) a Dynamic Spatial Fusion (DSF) for spatially-aware refinement. This design enables more effective cross-latent communication, preserving both global coherence and local texture fidelity. Our GitHub page: https://anonymous.4open.science/r/MVA2025-22 .

2025-07-16T03:22:43Z Zhen-Qi Chen Yuan-Fu Yang http://arxiv.org/abs/2507.11857v1 Measuring and predicting visual fidelity 2025-07-16T02:52:20Z

This paper is a study of techniques for measuring and predicting visual fidelity. As visual stimuli we use polygonal models, and vary their fidelity with two different model simplification algorithms. We also group the stimuli into two object types: animals and man-made artifacts. We examine three different experimental techniques for measuring these fidelity changes: naming times, ratings, and preferences. All the measures were sensitive to the type of simplification and level of simplification. However, the measures differed from one another in their response to object type. We also examine several automatic techniques for predicting these experimental measures, including techniques based on images and on the models themselves. Automatic measures of fidelity were successful at predicting experimental ratings, less successful at predicting preferences, and largely unsuccessful at predicting naming times. We conclude with suggestions for use and improvement of the experimental and automatic measures of visual fidelity.
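
As an illustration of what an image-based automatic fidelity measure can look like, the sketch below computes the root-mean-square error between a rendering of an original model and a rendering of its simplified version. The abstract does not name the specific image metrics examined, so the choice of RMSE, the function name `image_rmse`, and the grayscale list-of-rows image format are assumptions.

```python
from math import sqrt


def image_rmse(reference, simplified):
    """Root-mean-square error between two equally sized grayscale images.

    Each image is a list of rows of numeric pixel intensities. A lower value
    means the simplified rendering is closer to the reference rendering.
    """
    if len(reference) != len(simplified) or any(
        len(a) != len(b) for a, b in zip(reference, simplified)
    ):
        raise ValueError("images must have identical dimensions")
    total = 0.0
    count = 0
    for row_a, row_b in zip(reference, simplified):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2  # accumulate squared pixel differences
            count += 1
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    return sqrt(total / count)


# Hypothetical 2x2 renderings: two pixels differ, by 3 and by 4.
reference = [[0, 0], [0, 0]]
simplified = [[3, 4], [0, 0]]
error = image_rmse(reference, simplified)
```

A pixel-wise score like this is cheap to compute, but it is only a candidate predictor; whether it tracks ratings, preferences, or naming times is exactly the kind of question the study evaluates experimentally.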

2025-07-16T02:52:20Z SIGGRAPH '01: Proceedings of the 28th annual conference on Computer graphics and interactive techniques Pages 213 - 220. 2001 Benjamin Watson Alinda Friedman Aaron McGaffey 10.1145/383259.383283