http://arxiv.org/abs/2503.11054v2
LUSD: Localized Update Score Distillation for Text-Guided Image Editing
Updated: 2025-07-02T06:00:36Z
While diffusion models show promising results in image editing given a target prompt, achieving both prompt fidelity and background preservation remains difficult. Recent works have introduced score distillation techniques that leverage the rich generative prior of text-to-image diffusion models to solve this task without additional fine-tuning. However, these methods often struggle with tasks such as object insertion. Our investigation of these failures reveals significant variations in gradient magnitude and spatial distribution, making hyperparameter tuning highly input-specific or unsuccessful. To address this, we propose two simple yet effective modifications: attention-based spatial regularization and gradient filtering-normalization, both aimed at reducing these variations during gradient updates. Experimental results show our method outperforms state-of-the-art score distillation techniques in prompt fidelity, improving successful edits while preserving the background. Users also preferred our method over state-of-the-art techniques across three metrics, and by 58-64% overall.
Published: 2025-03-14T03:45:29Z
Comment: ICCV 2025. Project page: https://github.com/sincostanx/LUSD
Authors: Worameth Chinchuthakun, Tossaporn Saengja, Nontawat Tritrong, Pitchaporn Rewatbowornwong, Pramook Khungurn, Supasorn Suwajanakorn

http://arxiv.org/abs/2507.01140v1
Multi-Focus Probes for Context-Preserving Network Exploration and Interaction in Immersive Analytics
Updated: 2025-07-01T18:59:07Z
Immersive visualization of network data enables users to physically navigate and interact with complex structures, but managing transitions between detailed local (egocentric) views and global (exocentric) overviews remains a major challenge.
We present a multifocus probe technique for immersive environments that allows users to instantiate multiple egocentric subgraph views while maintaining persistent links to the global network context. Each probe acts as a portable local focus, enabling fine-grained inspection and editing of distant or occluded regions. Visual and haptic guidance mechanisms ensure context preservation during multi-scale interaction. We demonstrate and discuss the usability of our technique for the editing of network data.
Published: 2025-07-01T18:59:07Z
Comment: 5 pages, 3 figures, IEEE VIS 2025
Authors: Eric Zimmermann, Stefan Bruckner

http://arxiv.org/abs/2507.01116v1
Semiautomatic Simplification
Updated: 2025-07-01T18:23:19Z
We present semisimp, a tool for semiautomatic simplification of three-dimensional polygonal models. Existing automatic simplification technology is quite mature, but is not sensitive to the heightened importance of distinct semantic model regions such as faces and limbs, nor to simplification constraints imposed by model usage such as animation. semisimp allows users to preserve such regions by intervening in the simplification process. Users can manipulate the order in which basic simplifications are applied to redistribute model detail, improve the simplified models themselves by repositioning vertices with propagation to neighboring levels of detail, and adjust the hierarchical partitioning of the model surface to segment simplification and improve control of reordering and position propagation.
Published: 2025-07-01T18:23:19Z
Comment: Proceedings of the ACM 2001 symposium on Interactive 3D graphics, pages 43-48
Authors: Gong Li, Benjamin Watson
DOI: 10.1145/364338.364344

http://arxiv.org/abs/2506.15815v2
GratNet: A Photorealistic Neural Shader for Diffractive Surfaces
Updated: 2025-07-01T18:16:01Z
Structural coloration is commonly modeled using wave optics for reliable and photorealistic rendering of natural, quasi-periodic and complex nanostructures.
Such models often rely on dense, preliminary or preprocessed data to accurately capture the nuanced variations in diffractive surface reflectances. This heavy data dependency warrants an implicit neural representation, which has not been addressed comprehensively in the current literature. In this paper, we present a multi-layer perceptron (MLP) based method for data-driven rendering of diffractive surfaces with high accuracy and efficiency. We primarily approach this problem from a data compression perspective to devise a nuanced training and modeling method which is attuned to the domain and range characteristics of diffractive reflectance datasets. Importantly, our approach avoids over-fitting and has robust resampling behavior. Using Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM) and a flipping difference evaluator (FLIP) as evaluation metrics, we demonstrate the high-quality reconstruction of the ground-truth. In comparison to a recent state-of-the-art offline, wave-optical, forward modeling approach, our method reproduces subjectively similar results with significant performance gains. We reduce the memory footprint of the raw datasets by two orders of magnitude in general. Lastly, we depict the working of our method with actual surface renderings.
Published: 2025-06-18T18:58:00Z
Authors: Narayan Kandel, Daljit Singh J. S. Dhillon

http://arxiv.org/abs/2507.00725v1
Analyzing Time-Varying Scalar Fields using Piecewise-Linear Morse-Cerf Theory
Updated: 2025-07-01T13:08:07Z
Morse-Cerf theory considers a one-parameter family of smooth functions defined on a manifold and studies the evolution of their critical points with the parameter. This paper presents an adaptation of Morse-Cerf theory to a family of piecewise-linear (PL) functions. The vertex diagram and Cerf diagram are introduced as representations of the evolution of critical points of the PL function.
The characterization of a crossing in the vertex diagram based on the homology of the lower links of vertices leads to the definition of a topological descriptor for time-varying scalar fields. An algorithm for computing the Cerf diagram and a measure for comparing two Cerf diagrams are also described together with experimental results on time-varying scalar fields.
Published: 2025-07-01T13:08:07Z
Authors: Amritendu Dhar, Apratim Chakraborty, Vijay Natarajan

http://arxiv.org/abs/2506.23777v2
Synthetically Expressive: Evaluating gesture and voice for emotion and empathy in VR and 2D scenarios
Updated: 2025-07-01T09:56:02Z
The creation of virtual humans increasingly leverages automated synthesis of speech and gestures, enabling expressive, adaptable agents that effectively engage users. However, the independent development of voice and gesture generation technologies, alongside the growing popularity of virtual reality (VR), presents significant questions about the integration of these signals and their ability to convey emotional detail in immersive environments. In this paper, we evaluate the influence of real and synthetic gestures and speech, alongside varying levels of immersion (VR vs. 2D displays) and emotional contexts (positive, neutral, negative) on user perceptions. We investigate how immersion affects the perceived match between gestures and speech and the impact on key aspects of user experience, including emotional and empathetic responses and the sense of co-presence. Our findings indicate that while VR enhances the perception of natural gesture-voice pairings, it does not similarly improve synthetic ones, amplifying the perceptual gap between them. These results highlight the need to reassess gesture appropriateness and refine AI-driven synthesis for immersive environments.
Supplementary video: https://youtu.be/WMfjIB1X-dc
Published: 2025-06-30T12:18:52Z
Authors: Haoyang Du, Kiran Chhatre, Christopher Peters, Brian Keegan, Rachel McDonnell, Cathy Ennis

http://arxiv.org/abs/2505.04203v2
ELGAR: Expressive Cello Performance Motion Generation for Audio Rendition
Updated: 2025-07-01T07:35:51Z
The art of instrument performance stands as a vivid manifestation of human creativity and emotion. Nonetheless, generating instrument performance motions is a highly challenging task, as it requires not only capturing intricate movements but also reconstructing the complex dynamics of the performer-instrument interaction. While existing works primarily focus on modeling partial body motions, we propose Expressive ceLlo performance motion Generation for Audio Rendition (ELGAR), a state-of-the-art diffusion-based framework for whole-body fine-grained instrument performance motion generation solely from audio. To emphasize the interactive nature of the instrument performance, we introduce Hand Interactive Contact Loss (HICL) and Bow Interactive Contact Loss (BICL), which effectively guarantee the authenticity of the interplay. Moreover, to better evaluate whether the generated motions align with the semantic context of the music audio, we design novel metrics specifically for string instrument performance motion generation, including finger-contact distance, bow-string distance, and bowing score. Extensive evaluations and ablation studies are conducted to validate the efficacy of the proposed methods. In addition, we put forward a motion generation dataset SPD-GEN, collated and normalized from the MoCap dataset SPD.
As demonstrated, ELGAR has shown great potential in generating instrument performance motions with complicated and fast interactions, which will promote further development in areas such as animation, music education, interactive art creation, etc.
Published: 2025-05-07T07:57:08Z
Comment: SIGGRAPH 2025
Authors: Zhiping Qiu, Yitong Jin, Yuan Wang, Yi Shi, Chongwu Wang, Chao Tan, Xiaobing Li, Feng Yu, Tao Yu, Qionghai Dai
DOI: 10.1145/3721238.3730756

http://arxiv.org/abs/2507.00476v1
FreNBRDF: A Frequency-Rectified Neural Material Representation
Updated: 2025-07-01T06:48:50Z
Accurate material modeling is crucial for achieving photorealistic rendering, bridging the gap between computer-generated imagery and real-world photographs. While traditional approaches rely on tabulated BRDF data, recent work has shifted towards implicit neural representations, which offer compact and flexible frameworks for a range of tasks. However, their behavior in the frequency domain remains poorly understood. To address this, we introduce FreNBRDF, a frequency-rectified neural material representation. By leveraging spherical harmonics, we integrate frequency-domain considerations into neural BRDF modeling. We propose a novel frequency-rectified loss, derived from a frequency analysis of neural materials, and incorporate it into a generalizable and adaptive reconstruction and editing pipeline. This framework enhances fidelity, adaptability, and efficiency.
Extensive experiments demonstrate that FreNBRDF improves the accuracy and robustness of material appearance reconstruction and editing compared to state-of-the-art baselines, enabling more structured and interpretable downstream tasks and applications.
Published: 2025-07-01T06:48:50Z
Authors: Chenliang Zhou, Zheyuan Hu, Cengiz Oztireli

http://arxiv.org/abs/2507.00333v1
Scope Meets Screen: Lessons Learned in Designing Composite Visualizations for Marksmanship Training Across Skill Levels
Updated: 2025-07-01T00:16:41Z
Marksmanship practice is required in various professions, including police officers, military personnel, and hunters, as well as sports shooters in disciplines such as Olympic shooting, biathlon, and modern pentathlon. The current form of training and coaching is mostly based on repetition, where the coach does not see through the eyes of the shooter, and analysis is limited to stance and accuracy post-session. In this study, we present a shooting visualization system and evaluate its perceived effectiveness for both novice and expert shooters. To achieve this, five composite visualizations were developed using first-person shooting video recordings enriched with overlaid metrics and graphical summaries. These views were evaluated with 10 participants (5 expert marksmen, 5 novices) through a mixed-methods study including shot-count and aiming interpretation tasks, pairwise preference comparisons, and semi-structured interviews. The results show that a dashboard-style composite view, combining raw video with a polar plot and selected graphs, was preferred in 9 of 10 cases and supported understanding across skill levels.
The insights gained from this design study point to the broader value of integrating first-person video with visual analytics for coaching, and we suggest directions for applying this approach to other precision-based sports.
Published: 2025-07-01T00:16:41Z
Comment: 5 pages, accepted at IEEE VIS 2025
Authors: Emin Zerman, Jonas Carlsson, Mårten Sjöström

http://arxiv.org/abs/2507.00261v1
VirtualFencer: Generating Fencing Bouts based on Strategies Extracted from In-the-Wild Videos
Updated: 2025-06-30T20:55:22Z
Fencing is a sport where athletes engage in diverse yet strategically logical motions. While most motions fall into a few high-level actions (e.g. step, lunge, parry), the execution can vary widely: fast vs. slow, large vs. small, offensive vs. defensive. Moreover, a fencer's actions are informed by a strategy that often comes in response to the opponent's behavior. This combination of motion diversity with underlying two-player strategy motivates the application of data-driven modeling to fencing. We present VirtualFencer, a system capable of extracting 3D fencing motion and strategy from in-the-wild video without supervision, and then using that extracted knowledge to generate realistic fencing behavior. We demonstrate the versatile capabilities of our system by having it (i) fence against itself (self-play), (ii) fence against a real fencer's motion from online video, and (iii) fence interactively against a professional fencer.
Published: 2025-06-30T20:55:22Z
Authors: Zhiyin Lin, Purvi Goel, Joy Yun, C. Karen Liu, Joao Pedro Araujo

http://arxiv.org/abs/2302.14368v5
Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods
Updated: 2025-06-30T19:44:40Z
As Diffusion Models have shown promising performance, considerable effort has been made to improve the controllability of Diffusion Models. However, how to train Diffusion Models to have disentangled latent spaces and how to naturally incorporate the disentangled conditions during the sampling process have been underexplored.
In this paper, we present a training framework for feature disentanglement of Diffusion Models (FDiff). We further propose two sampling methods that can boost the realism of our Diffusion Models and also enhance the controllability. Concisely, we train Diffusion Models conditioned on two latent features: a spatial content mask and a flattened style embedding. We rely on the inductive bias of the denoising process of Diffusion Models to encode pose/layout information in the content feature and semantic/style information in the style feature. Regarding the sampling methods, we first propose Generalized Composable Diffusion Models (GCDM), which generalize Composable Diffusion Models by breaking the conditional independence assumption to allow for some dependence between conditional inputs; this is shown to be effective in realistic generation in our experiments. Second, we propose timestep-dependent weight scheduling for content and style features to further improve the performance. We also observe better controllability of our proposed methods compared to existing methods in image manipulation and image translation.
Published: 2023-02-28T07:43:00Z
Comment: ECCV 2024; Code is available at https://github.com/WonwoongCho/Towards-Enhanced-Controllability-of-Diffusion-Models
Authors: Wonwoong Cho, Hareesh Ravi, Midhun Harikumar, Vinh Khuc, Krishna Kumar Singh, Jingwan Lu, David I. Inouye, Ajinkya Kale

http://arxiv.org/abs/2506.23854v1
HiNeuS: High-fidelity Neural Surface Mitigating Low-texture and Reflective Ambiguity
Updated: 2025-06-30T13:45:25Z
Neural surface reconstruction faces persistent challenges in reconciling geometric fidelity with photometric consistency under complex scene conditions. We present HiNeuS, a unified framework that holistically addresses three core limitations in existing approaches: multi-view radiance inconsistency, missing keypoints in textureless regions, and structural degradation from over-enforced Eikonal constraints during joint optimization.
To resolve these issues through a unified pipeline, we introduce: 1) Differential visibility verification through SDF-guided ray tracing, resolving reflection ambiguities via continuous occlusion modeling; 2) Planar-conformal regularization via ray-aligned geometry patches that enforce local surface coherence while preserving sharp edges through adaptive appearance weighting; and 3) Physically-grounded Eikonal relaxation that dynamically modulates geometric constraints based on local radiance gradients, enabling detail preservation without sacrificing global regularity. Unlike prior methods that handle these aspects through sequential optimizations or isolated modules, our approach achieves cohesive integration where appearance-geometry constraints evolve synergistically throughout training. Comprehensive evaluations across synthetic and real-world datasets demonstrate state-of-the-art performance, including a 21.4% reduction in Chamfer distance over reflection-aware baselines and 2.32 dB PSNR improvement against neural rendering counterparts. Qualitative analyses reveal superior capability in recovering specular instruments, urban layouts with centimeter-scale infrastructure, and low-textured surfaces without local patch collapse. The method's generalizability is further validated through successful application to inverse rendering tasks, including material decomposition and view-consistent relighting.
Published: 2025-06-30T13:45:25Z
Comment: Published in International Conference on Computer Vision (ICCV) 2025
Authors: Yida Wang, Xueyang Zhang, Kun Zhan, Peng Jia, Xianpeng Lang

http://arxiv.org/abs/2506.10507v2
Edit360: 2D Image Edits to 3D Assets from Any Angle
Updated: 2025-06-30T07:13:23Z
Recent advances in diffusion models have significantly improved image generation and editing, but extending these capabilities to 3D assets remains challenging, especially for fine-grained edits that require multi-view consistency.
Existing methods typically restrict editing to predetermined viewing angles, severely limiting their flexibility and practical applications. We introduce Edit360, a tuning-free framework that extends 2D modifications to multi-view consistent 3D editing. Built upon video diffusion models, Edit360 enables user-specific editing from arbitrary viewpoints while ensuring structural coherence across all views. The framework selects anchor views for 2D modifications and propagates edits across the entire 360-degree range. To achieve this, Edit360 introduces a novel Anchor-View Editing Propagation mechanism, which effectively aligns and merges multi-view information within the latent and attention spaces of diffusion models. The resulting edited multi-view sequences facilitate the reconstruction of high-quality 3D assets, enabling customizable 3D content creation.
Published: 2025-06-12T09:09:28Z
Comment: 11 pages, 9 figures
Authors: Junchao Huang, Xinting Hu, Shaoshuai Shi, Zhuotao Tian, Li Jiang

http://arxiv.org/abs/2506.23406v1
Uncertain Mode Surfaces in 3D Symmetric Second-Order Tensor Field Ensembles
Updated: 2025-06-29T21:58:24Z
The analysis of 3D symmetric second-order tensor fields often relies on topological features such as degenerate tensor lines, neutral surfaces, and their generalization to mode surfaces, which reveal important structural insights into the data. However, uncertainty in such fields is typically visualized using derived scalar attributes or tensor glyph representations, which often fail to capture the global behavior. Recent advances have introduced uncertain topological features for tensor field ensembles by focusing on degenerate tensor locations. Yet, mode surfaces, including neutral surfaces and arbitrary mode surfaces, are essential to a comprehensive understanding of tensor field topology. In this work, we present a generalization of uncertain degenerate tensor features to uncertain mode surfaces of arbitrary mode values, encompassing uncertain degenerate tensor lines as a special case.
Our approach supports both surface and line geometries, forming a unified framework for analyzing uncertain mode-based topological features in tensor field ensembles. We demonstrate the effectiveness of our method on several real-world simulation datasets from engineering and materials science.
Published: 2025-06-29T21:58:24Z
Comment: 4 + 1 pages, 4 figures, IEEE VIS 2025
Authors: Tim Gerrits

http://arxiv.org/abs/2506.23388v1
Escher Tile Deformation via Closed-Form Solution
Updated: 2025-06-29T20:03:47Z
We present a real-time deformation method for Escher tiles -- interlocking organic forms that seamlessly tessellate the plane following symmetry rules. We formulate the problem as determining a periodic displacement field. The goal is to deform Escher tiles without introducing gaps or overlaps. The resulting displacement field is obtained in closed form by an analytical solution. Our method processes tiles of 17 wallpaper groups across various representations such as images and meshes. Rather than treating tiles as mere boundaries, we consider them as textured shapes, ensuring that both the boundary and interior deform simultaneously. To enable fine-grained artistic input, our interactive tool features a user-controllable adaptive fall-off parameter, allowing precise adjustment of locality and supporting deformations with meaningful semantic control. We demonstrate the effectiveness of our method through various examples, including photo editing and shape sculpting, showing its use in applications such as fabrication and animation.
Published: 2025-06-29T20:03:47Z
Comment: SIGGRAPH 2025
Authors: Crane He Chen, Vladimir G. Kim
DOI: 10.1145/3721238.3730681