http://arxiv.org/abs/2503.11054v2 LUSD: Localized Update Score Distillation for Text-Guided Image Editing 2025-07-02T06:00:36Z

While diffusion models show promising results in image editing given a target prompt, achieving both prompt fidelity and background preservation remains difficult. Recent works have introduced score distillation techniques that leverage the rich generative prior of text-to-image diffusion models to solve this task without additional fine-tuning. However, these methods often struggle with tasks such as object insertion. Our investigation of these failures reveals significant variations in gradient magnitude and spatial distribution, making hyperparameter tuning highly input-specific or unsuccessful. To address this, we propose two simple yet effective modifications: attention-based spatial regularization and gradient filtering-normalization, both aimed at reducing these variations during gradient updates. Experimental results show our method outperforms state-of-the-art score distillation techniques in prompt fidelity, improving successful edits while preserving the background. Users also preferred our method over state-of-the-art techniques across three metrics, and by 58-64% overall.
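The abstract above does not spell out how the two modifications are computed, so the following is only a generic sketch of the underlying idea: restricting a gradient to a spatial region and rescaling it so that its magnitude no longer depends on the input. The function name `filter_and_normalize`, the `mask` argument (standing in for an attention map), the threshold, and the unit-norm rescaling are all assumptions made for illustration and are not taken from the paper.

```python
import math


def filter_and_normalize(gradient, mask, threshold=0.5, eps=1e-8):
    """Zero out gradient entries outside the mask, then rescale to unit L2 norm.

    Hypothetical illustration only: `mask` stands in for an attention map and
    the unit-norm rescaling stands in for a normalization step.
    """
    if len(gradient) != len(mask):
        raise ValueError("gradient and mask must have the same length")
    # Spatial restriction: keep only entries whose mask value passes the threshold.
    kept = [g if m >= threshold else 0.0 for g, m in zip(gradient, mask)]
    norm = math.sqrt(sum(g * g for g in kept))
    if norm < eps:
        # Nothing meaningful survived the mask; return a zero update.
        return [0.0] * len(kept)
    # Rescaling makes the update size independent of the raw gradient magnitude.
    return [g / norm for g in kept]


small = filter_and_normalize([3.0, 4.0, 100.0], [1.0, 1.0, 0.0])
large = filter_and_normalize([300.0, 400.0, 1.0], [1.0, 1.0, 0.0])
```

In this toy setting `small` and `large` are the same update even though the raw gradients differ by a factor of 100, which is the kind of input-specific variation that would otherwise force per-input step-size tuning.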

2025-03-14T03:45:29Z ICCV 2025. Project page: https://github.com/sincostanx/LUSD Worameth Chinchuthakun Tossaporn Saengja Nontawat Tritrong Pitchaporn Rewatbowornwong Pramook Khungurn Supasorn Suwajanakorn http://arxiv.org/abs/2507.01140v1 Multi-Focus Probes for Context-Preserving Network Exploration and Interaction in Immersive Analytics 2025-07-01T18:59:07Z

Immersive visualization of network data enables users to physically navigate and interact with complex structures, but managing transitions between detailed local (egocentric) views and global (exocentric) overviews remains a major challenge. We present a multifocus probe technique for immersive environments that allows users to instantiate multiple egocentric subgraph views while maintaining persistent links to the global network context. Each probe acts as a portable local focus, enabling fine-grained inspection and editing of distant or occluded regions. Visual and haptic guidance mechanisms ensure context preservation during multi-scale interaction. We demonstrate and discuss the usability of our technique for the editing of network data.

2025-07-01T18:59:07Z 5 pages, 3 figures, IEEE Vis 2025 Eric Zimmermann Stefan Bruckner http://arxiv.org/abs/2507.01116v1 Semiautomatic Simplification 2025-07-01T18:23:19Z

We present semisimp, a tool for semiautomatic simplification of three dimensional polygonal models. Existing automatic simplification technology is quite mature, but is not sensitive to the heightened importance of distinct semantic model regions such as faces and limbs, nor to simplification constraints imposed by model usage such as animation. semisimp allows users to preserve such regions by intervening in the simplification process. Users can manipulate the order in which basic simplifications are applied to redistribute model detail, improve the simplified models themselves by repositioning vertices with propagation to neighboring levels of detail, and adjust the hierarchical partitioning of the model surface to segment simplification and improve control of reordering and position propagation.

2025-07-01T18:23:19Z Proceedings of the ACM 2001 symposium on Interactive 3D graphics, pages 43-48 Gong Li Benjamin Watson 10.1145/364338.364344 http://arxiv.org/abs/2506.15815v2 GratNet: A Photorealistic Neural Shader for Diffractive Surfaces 2025-07-01T18:16:01Z

Structural coloration is commonly modeled using wave optics for reliable and photorealistic rendering of natural, quasi-periodic and complex nanostructures. Such models often rely on dense, preliminary or preprocessed data to accurately capture the nuanced variations in diffractive surface reflectances. This heavy data dependency warrants an implicit neural representation, which has not been addressed comprehensively in the current literature. In this paper, we present a multi-layer perceptron (MLP) based method for data-driven rendering of diffractive surfaces with high accuracy and efficiency. We primarily approach this problem from a data compression perspective to devise a nuanced training and modeling method which is attuned to the domain and range characteristics of diffractive reflectance datasets. Importantly, our approach avoids over-fitting and has robust resampling behavior. Using Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM) and a flipping difference evaluator (FLIP) as evaluation metrics, we demonstrate the high-quality reconstruction of the ground-truth. In comparison to a recent state-of-the-art offline, wave-optical, forward modeling approach, our method reproduces subjectively similar results with significant performance gains. We reduce the memory footprint of the raw datasets by two orders of magnitude in general. Lastly, we depict the working of our method with actual surface renderings.
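Of the three metrics named above, PSNR has a simple closed form, 10 * log10(MAX^2 / MSE), and can be sketched in a few lines. The function below is a minimal illustration over flat sample sequences; the name `psnr` and the default peak value of 1.0 are assumptions, and the abstract does not say which peak value or data layout the authors use.

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences.

    `max_value` is the assumed peak signal value (1.0 for normalized data).
    """
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between reference and reconstruction.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0.0:
        # Identical signals: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For example, a reconstruction that is off by 0.1 at every sample of a normalized signal has an MSE of 0.01 and therefore a PSNR of about 20 dB.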

2025-06-18T18:58:00Z Narayan Kandel Daljit Singh J. S. Dhillon http://arxiv.org/abs/2507.00725v1 Analyzing Time-Varying Scalar Fields using Piecewise-Linear Morse-Cerf Theory 2025-07-01T13:08:07Z

Morse-Cerf theory considers a one-parameter family of smooth functions defined on a manifold and studies the evolution of their critical points with the parameter. This paper presents an adaptation of Morse-Cerf theory to a family of piecewise-linear (PL) functions. The vertex diagram and Cerf diagram are introduced as representations of the evolution of critical points of the PL function. The characterization of a crossing in the vertex diagram based on the homology of the lower links of vertices leads to the definition of a topological descriptor for time-varying scalar fields. An algorithm for computing the Cerf diagram and a measure for comparing two Cerf diagrams are also described together with experimental results on time-varying scalar fields.
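As background for the lower-link characterization mentioned above, the standard static classification of a vertex of a PL function on a triangulated surface can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes an interior vertex of a 2-manifold whose link is a single cycle, assumes all function values are distinct, and uses the hypothetical name `classify_vertex`. It says nothing about how critical points evolve over the parameter.

```python
def classify_vertex(value, link_values):
    """Classify a vertex of a PL function on a triangulated surface.

    `link_values` holds the function values of the neighboring vertices in
    cyclic order around the vertex. Values are assumed distinct from `value`
    (hypothetical simplification; ties would need a consistent tie-break).
    """
    lower = [w < value for w in link_values]
    if not any(lower):
        return "minimum"   # empty lower link
    if all(lower):
        return "maximum"   # lower link is the whole cycle
    # Count connected components of the lower link: each component starts
    # where a lower neighbor follows an upper neighbor in cyclic order.
    # Index -1 wraps around to the last neighbor, closing the cycle.
    components = sum(
        1 for i in range(len(lower)) if lower[i] and not lower[i - 1]
    )
    return "regular" if components == 1 else "saddle"
```

A vertex with value 2.5 and link values `[1, 4, 2, 5]` has two separate lower arcs and is classified as a saddle, while link values `[1, 2, 3, 4]` give a single lower arc and a regular vertex.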

2025-07-01T13:08:07Z Amritendu Dhar Apratim Chakraborty Vijay Natarajan http://arxiv.org/abs/2506.23777v2 Synthetically Expressive: Evaluating gesture and voice for emotion and empathy in VR and 2D scenarios 2025-07-01T09:56:02Z

The creation of virtual humans increasingly leverages automated synthesis of speech and gestures, enabling expressive, adaptable agents that effectively engage users. However, the independent development of voice and gesture generation technologies, alongside the growing popularity of virtual reality (VR), presents significant questions about the integration of these signals and their ability to convey emotional detail in immersive environments. In this paper, we evaluate the influence of real and synthetic gestures and speech, alongside varying levels of immersion (VR vs. 2D displays) and emotional contexts (positive, neutral, negative) on user perceptions. We investigate how immersion affects the perceived match between gestures and speech and the impact on key aspects of user experience, including emotional and empathetic responses and the sense of co-presence. Our findings indicate that while VR enhances the perception of natural gesture-voice pairings, it does not similarly improve synthetic ones - amplifying the perceptual gap between them. These results highlight the need to reassess gesture appropriateness and refine AI-driven synthesis for immersive environments. Supplementary video: https://youtu.be/WMfjIB1X-dc

2025-06-30T12:18:52Z Haoyang Du Kiran Chhatre Christopher Peters Brian Keegan Rachel McDonnell Cathy Ennis http://arxiv.org/abs/2505.04203v2 ELGAR: Expressive Cello Performance Motion Generation for Audio Rendition 2025-07-01T07:35:51Z

The art of instrument performance stands as a vivid manifestation of human creativity and emotion. Nonetheless, generating instrument performance motions is a highly challenging task, as it requires not only capturing intricate movements but also reconstructing the complex dynamics of the performer-instrument interaction. While existing works primarily focus on modeling partial body motions, we propose Expressive ceLlo performance motion Generation for Audio Rendition (ELGAR), a state-of-the-art diffusion-based framework for whole-body fine-grained instrument performance motion generation solely from audio. To emphasize the interactive nature of the instrument performance, we introduce Hand Interactive Contact Loss (HICL) and Bow Interactive Contact Loss (BICL), which effectively guarantee the authenticity of the interplay. Moreover, to better evaluate whether the generated motions align with the semantic context of the music audio, we design novel metrics specifically for string instrument performance motion generation, including finger-contact distance, bow-string distance, and bowing score. Extensive evaluations and ablation studies are conducted to validate the efficacy of the proposed methods. In addition, we put forward a motion generation dataset SPD-GEN, collated and normalized from the MoCap dataset SPD. As demonstrated, ELGAR has shown great potential in generating instrument performance motions with complicated and fast interactions, which will promote further development in areas such as animation, music education, interactive art creation, etc.

2025-05-07T07:57:08Z SIGGRAPH 2025 Zhiping Qiu Yitong Jin Yuan Wang Yi Shi Chongwu Wang Chao Tan Xiaobing Li Feng Yu Tao Yu Qionghai Dai 10.1145/3721238.3730756 http://arxiv.org/abs/2507.00476v1 FreNBRDF: A Frequency-Rectified Neural Material Representation 2025-07-01T06:48:50Z

Accurate material modeling is crucial for achieving photorealistic rendering, bridging the gap between computer-generated imagery and real-world photographs. While traditional approaches rely on tabulated BRDF data, recent work has shifted towards implicit neural representations, which offer compact and flexible frameworks for a range of tasks. However, their behavior in the frequency domain remains poorly understood. To address this, we introduce FreNBRDF, a frequency-rectified neural material representation. By leveraging spherical harmonics, we integrate frequency-domain considerations into neural BRDF modeling. We propose a novel frequency-rectified loss, derived from a frequency analysis of neural materials, and incorporate it into a generalizable and adaptive reconstruction and editing pipeline. This framework enhances fidelity, adaptability, and efficiency. Extensive experiments demonstrate that FreNBRDF improves the accuracy and robustness of material appearance reconstruction and editing compared to state-of-the-art baselines, enabling more structured and interpretable downstream tasks and applications.

2025-07-01T06:48:50Z Chenliang Zhou Zheyuan Hu Cengiz Oztireli http://arxiv.org/abs/2507.00333v1 Scope Meets Screen: Lessons Learned in Designing Composite Visualizations for Marksmanship Training Across Skill Levels 2025-07-01T00:16:41Z

Marksmanship practice is required in various professions and sports, including police, military personnel, and hunters, as well as sport shooters in disciplines such as Olympic shooting, biathlon, and modern pentathlon. The current form of training and coaching is mostly based on repetition, where the coach does not see through the eyes of the shooter, and analysis is limited to stance and accuracy post-session. In this study, we present a shooting visualization system and evaluate its perceived effectiveness for both novice and expert shooters. To achieve this, five composite visualizations were developed using first-person shooting video recordings enriched with overlaid metrics and graphical summaries. These views were evaluated with 10 participants (5 expert marksmen, 5 novices) through a mixed-methods study including shot-count and aiming interpretation tasks, pairwise preference comparisons, and semi-structured interviews. The results show that a dashboard-style composite view, combining raw video with a polar plot and selected graphs, was preferred in 9 of 10 cases and supported understanding across skill levels. The insights gained from this design study point to the broader value of integrating first-person video with visual analytics for coaching, and we suggest directions for applying this approach to other precision-based sports.

2025-07-01T00:16:41Z 5 pages, accepted at IEEE VIS 2025 Emin Zerman Jonas Carlsson Mårten Sjöström http://arxiv.org/abs/2507.00261v1 VirtualFencer: Generating Fencing Bouts based on Strategies Extracted from In-the-Wild Videos 2025-06-30T20:55:22Z

Fencing is a sport where athletes engage in diverse yet strategically logical motions. While most motions fall into a few high-level actions (e.g. step, lunge, parry), the execution can vary widely: fast vs. slow, large vs. small, offensive vs. defensive. Moreover, a fencer's actions are informed by a strategy that often comes in response to the opponent's behavior. This combination of motion diversity with underlying two-player strategy motivates the application of data-driven modeling to fencing. We present VirtualFencer, a system capable of extracting 3D fencing motion and strategy from in-the-wild video without supervision, and then using that extracted knowledge to generate realistic fencing behavior. We demonstrate the versatile capabilities of our system by having it (i) fence against itself (self-play), (ii) fence against a real fencer's motion from online video, and (iii) fence interactively against a professional fencer.

2025-06-30T20:55:22Z Zhiyin Lin Purvi Goel Joy Yun C. Karen Liu Joao Pedro Araujo http://arxiv.org/abs/2302.14368v5 Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods 2025-06-30T19:44:40Z

As Diffusion Models have shown promising performance, much effort has been devoted to improving the controllability of Diffusion Models. However, how to train Diffusion Models to have disentangled latent spaces and how to naturally incorporate the disentangled conditions during the sampling process have been underexplored. In this paper, we present a training framework for feature disentanglement of Diffusion Models (FDiff). We further propose two sampling methods that can boost the realism of our Diffusion Models and also enhance the controllability. Concisely, we train Diffusion Models conditioned on two latent features, a spatial content mask, and a flattened style embedding. We rely on the inductive bias of the denoising process of Diffusion Models to encode pose/layout information in the content feature and semantic/style information in the style feature. Regarding the sampling methods, we first generalize Composable Diffusion Models (GCDM) by breaking the conditional independence assumption to allow for some dependence between conditional inputs, which is shown to be effective in realistic generation in our experiments. Second, we propose timestep-dependent weight scheduling for content and style features to further improve the performance. We also observe better controllability of our proposed methods compared to existing methods in image manipulation and image translation.

2023-02-28T07:43:00Z ECCV 2024; Code is available at https://github.com/WonwoongCho/Towards-Enhanced-Controllability-of-Diffusion-Models Wonwoong Cho Hareesh Ravi Midhun Harikumar Vinh Khuc Krishna Kumar Singh Jingwan Lu David I. Inouye Ajinkya Kale http://arxiv.org/abs/2506.23854v1 HiNeuS: High-fidelity Neural Surface Mitigating Low-texture and Reflective Ambiguity 2025-06-30T13:45:25Z

Neural surface reconstruction faces persistent challenges in reconciling geometric fidelity with photometric consistency under complex scene conditions. We present HiNeuS, a unified framework that holistically addresses three core limitations in existing approaches: multi-view radiance inconsistency, missing keypoints in textureless regions, and structural degradation from over-enforced Eikonal constraints during joint optimization. To resolve these issues through a unified pipeline, we introduce: 1) Differential visibility verification through SDF-guided ray tracing, resolving reflection ambiguities via continuous occlusion modeling; 2) Planar-conformal regularization via ray-aligned geometry patches that enforce local surface coherence while preserving sharp edges through adaptive appearance weighting; and 3) Physically-grounded Eikonal relaxation that dynamically modulates geometric constraints based on local radiance gradients, enabling detail preservation without sacrificing global regularity. Unlike prior methods that handle these aspects through sequential optimizations or isolated modules, our approach achieves cohesive integration where appearance-geometry constraints evolve synergistically throughout training. Comprehensive evaluations across synthetic and real-world datasets demonstrate state-of-the-art performance, including a 21.4% reduction in Chamfer distance over reflection-aware baselines and 2.32 dB PSNR improvement against neural rendering counterparts. Qualitative analyses reveal superior capability in recovering specular instruments, urban layouts with centimeter-scale infrastructure, and low-textured surfaces without local patch collapse. The method's generalizability is further validated through successful application to inverse rendering tasks, including material decomposition and view-consistent relighting.
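The Chamfer distance cited above compares two point sets by nearest-neighbor distances in both directions. The sketch below shows one common convention (mean unsquared Euclidean distance, summed over the two directions); papers differ on squaring and averaging, and the abstract does not state which variant is used, so the exact form here is an assumption. The name `chamfer_distance` is likewise illustrative, and the brute-force search is meant for small examples only.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed convention: mean nearest-neighbor Euclidean distance from A to B
    plus the same from B to A. Brute force, O(len(A) * len(B)).
    """
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def one_sided(src, dst):
        # Average distance from each source point to its closest target point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)
```

With `points_a = [(0, 0, 0), (1, 0, 0)]` and `points_b = [(0, 0, 0)]`, the A-to-B term is 0.5 and the B-to-A term is 0, so the result is 0.5.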

2025-06-30T13:45:25Z Published in International Conference on Computer Vision (ICCV) 2025 Yida Wang Xueyang Zhang Kun Zhan Peng Jia Xianpeng Lang http://arxiv.org/abs/2506.10507v2 Edit360: 2D Image Edits to 3D Assets from Any Angle 2025-06-30T07:13:23Z

Recent advances in diffusion models have significantly improved image generation and editing, but extending these capabilities to 3D assets remains challenging, especially for fine-grained edits that require multi-view consistency. Existing methods typically restrict editing to predetermined viewing angles, severely limiting their flexibility and practical applications. We introduce Edit360, a tuning-free framework that extends 2D modifications to multi-view consistent 3D editing. Built upon video diffusion models, Edit360 enables user-specific editing from arbitrary viewpoints while ensuring structural coherence across all views. The framework selects anchor views for 2D modifications and propagates edits across the entire 360-degree range. To achieve this, Edit360 introduces a novel Anchor-View Editing Propagation mechanism, which effectively aligns and merges multi-view information within the latent and attention spaces of diffusion models. The resulting edited multi-view sequences facilitate the reconstruction of high-quality 3D assets, enabling customizable 3D content creation.

2025-06-12T09:09:28Z 11 pages, 9 figures Junchao Huang Xinting Hu Shaoshuai Shi Zhuotao Tian Li Jiang http://arxiv.org/abs/2506.23406v1 Uncertain Mode Surfaces in 3D Symmetric Second-Order Tensor Field Ensembles 2025-06-29T21:58:24Z

The analysis of 3D symmetric second-order tensor fields often relies on topological features such as degenerate tensor lines, neutral surfaces, and their generalization to mode surfaces, which reveal important structural insights into the data. However, uncertainty in such fields is typically visualized using derived scalar attributes or tensor glyph representations, which often fail to capture the global behavior. Recent advances have introduced uncertain topological features for tensor field ensembles by focusing on degenerate tensor locations. Yet, mode surfaces, including neutral surfaces and arbitrary mode surfaces, are essential to a comprehensive understanding of tensor field topology. In this work, we present a generalization of uncertain degenerate tensor features to uncertain mode surfaces of arbitrary mode values, encompassing uncertain degenerate tensor lines as a special case. Our approach supports both surface and line geometries, forming a unified framework for analyzing uncertain mode-based topological features in tensor field ensembles. We demonstrate the effectiveness of our method on several real-world simulation datasets from engineering and materials science.

2025-06-29T21:58:24Z 4 + 1 pages, 4 figures, IEEE VIS 2025 Tim Gerrits http://arxiv.org/abs/2506.23388v1 Escher Tile Deformation via Closed-Form Solution 2025-06-29T20:03:47Z

We present a real-time deformation method for Escher tiles -- interlocking organic forms that seamlessly tessellate the plane following symmetry rules. We formulate the problem as determining a periodic displacement field. The goal is to deform Escher tiles without introducing gaps or overlaps. The resulting displacement field is obtained in closed form by an analytical solution. Our method processes tiles of 17 wallpaper groups across various representations such as images and meshes. Rather than treating tiles as mere boundaries, we consider them as textured shapes, ensuring that both the boundary and interior deform simultaneously. To enable fine-grained artistic input, our interactive tool features a user-controllable adaptive fall-off parameter, allowing precise adjustment of locality and supporting deformations with meaningful semantic control. We demonstrate the effectiveness of our method through various examples, including photo editing and shape sculpting, showing its use in applications such as fabrication and animation.
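To see why a periodic displacement field keeps a tiling free of gaps and overlaps, consider the simplest case of a translation-only tiling on the unit square lattice. If the displacement u satisfies u(p + t) = u(p) for every lattice translation t, then every deformed tile copy is just a translate of the deformed original. The sketch below checks this property numerically for a hand-picked sinusoidal field. It is not the paper's closed-form solution: the field, the amplitude, and the function names are assumptions for illustration, and wallpaper groups with rotations or reflections would need further symmetry constraints.

```python
import math


def displacement(x, y, amplitude=0.05):
    """Hypothetical displacement field, periodic with period 1 in x and y."""
    return (
        amplitude * math.sin(2.0 * math.pi * y),
        amplitude * math.sin(2.0 * math.pi * x),
    )


def deform(x, y, amplitude=0.05):
    """Move a point by the periodic displacement field."""
    dx, dy = displacement(x, y, amplitude)
    return (x + dx, y + dy)


# A point in the base tile and the matching point in the tile shifted by (1, 2).
p = (0.3, 0.7)
t = (1.0, 2.0)
base = deform(*p)
copy = deform(p[0] + t[0], p[1] + t[1])

# Up to floating-point rounding, the deformed copy is the deformed base point
# shifted by t, so neighboring tiles still fit together.
offset = (copy[0] - base[0], copy[1] - base[1])
```

The amplitude is kept small so that the Jacobian determinant of the map stays positive, which keeps the deformation locally invertible.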

2025-06-29T20:03:47Z SIGGRAPH 2025 Crane He Chen Vladimir G. Kim 10.1145/3721238.3730681