http://arxiv.org/abs/2507.09140v1 Interactive Drawing Guidance for Anime Illustrations with Diffusion Model 2025-07-12T05:03:09Z

Creating high-quality anime illustrations presents notable challenges, particularly for beginners, due to the intricate styles and fine details inherent in anime art. We present an interactive drawing guidance system specifically designed for anime illustrations to address this issue. It offers real-time guidance to help users refine their work and streamline the creative process. Our system is built upon the StreamDiffusion pipeline to deliver real-time drawing assistance. We fine-tune Stable Diffusion with LoRA to synthesize anime-style RGB images from user-provided hand-drawn sketches and prompts. Leveraging the Informative Drawings model, we transform these RGB images into rough sketches, which are further refined into structured guidance sketches using a custom-designed optimizer. The proposed system offers precise, real-time guidance aligned with the creative intent of the user, significantly enhancing both the efficiency and accuracy of the drawing process. To assess the effectiveness of our approach, we conducted a user study, gathering empirical feedback on both system performance and interface usability.

2025-07-12T05:03:09Z 9 pages, 7 figures. In proceedings of NICOGRAPH International 2025 Chuang Chen Xiaoxuan Xie Yongming Zhang Tianyu Zhang Haoran Xie http://arxiv.org/abs/2412.14371v3 SEREP: Semantic Facial Expression Representation for Robust In-the-Wild Capture and Retargeting 2025-07-11T15:57:22Z

Monocular facial performance capture in-the-wild is challenging due to varied capture conditions, face shapes, and expressions. Most current methods rely on linear 3D Morphable Models, which represent facial expressions independently of identity at the vertex displacement level. We propose SEREP (Semantic Expression Representation), a model that disentangles expression from identity at the semantic level. We start by learning an expression representation from high-quality 3D data of unpaired facial expressions. Then, we train a model to predict expression from monocular images, relying on a novel semi-supervised scheme that uses low-quality synthetic data. In addition, we introduce MultiREX, a benchmark addressing the lack of evaluation resources for the expression capture task. Our experiments show that SEREP outperforms state-of-the-art methods, capturing challenging expressions and transferring them to new identities.

2024-12-18T22:12:28Z For our project page, see https://ubisoft-laforge.github.io/character/serep/ Arthur Josi Luiz Gustavo Hafemann Abdallah Dib Emeline Got Rafael M. O. Cruz Marc-Andre Carbonneau http://arxiv.org/abs/2507.08285v1 FlowDrag: 3D-aware Drag-based Image Editing with Mesh-guided Deformation Vector Flow Fields 2025-07-11T03:18:52Z

Drag-based editing allows precise object manipulation through point-based control, offering user convenience. However, current methods often suffer from a geometric inconsistency problem by focusing exclusively on matching user-defined points, neglecting the broader geometry and leading to artifacts or unstable edits. We propose FlowDrag, which leverages geometric information for more accurate and coherent transformations. Our approach constructs a 3D mesh from the image, using an energy function to guide mesh deformation based on user-defined drag points. The resulting mesh displacements are projected into 2D and incorporated into a UNet denoising process, enabling precise handle-to-target point alignment while preserving structural integrity. Additionally, existing drag-editing benchmarks provide no ground truth, making it difficult to assess how accurately the edits match the intended transformations. To address this, we present VFD (VidFrameDrag) benchmark dataset, which provides ground-truth frames using consecutive shots in a video dataset. FlowDrag outperforms existing drag-based editing methods on both VFD Bench and DragBench.

2025-07-11T03:18:52Z ICML 2025 Spotlight Gwanhyeong Koo Sunjae Yoon Younghwan Lee Ji Woo Hong Chang D. Yoo http://arxiv.org/abs/2507.07890v1 Hi-d maps: An interactive visualization technique for multi-dimensional categorical data 2025-07-10T16:20:37Z

In this paper, we present Hi-D maps, a novel method for the visualization of multi-dimensional categorical data. Our work addresses the scarcity of techniques for visualizing a large number of data-dimensions in an effective and space-efficient manner. We have mapped the full data-space onto a 2D regular polygonal region. The polygon is cut hierarchically with lines parallel to a user-controlled, ordered sequence of sides, each representing a dimension. We have used multiple visual cues such as orientation, thickness, color, countable glyphs, and text to depict cross-dimensional information. We have added interactivity and hierarchical browsing to facilitate flexible exploration of the display: small areas can be scrutinized for details. Thus, our method is also easily extendable to visualize hierarchical information. Our glyph animations add an engaging aesthetic during interaction. Like many visualizations, Hi-D maps become less effective when a large number of dimensions stresses perceptual limits, but Hi-D maps may add clarity before those limits are reached.

2025-07-10T16:20:37Z 2019 IEEE Visualization Conference (VIS), pages 216-220 Radi Muhammad Reza Benjamin A Watson 10.1109/VISUAL.2019.8933709 http://arxiv.org/abs/2507.08884v1 Agent-based visualization of streaming text 2025-07-10T16:01:57Z

We present a visualization infrastructure that maps data elements to agents, which have behaviors parameterized by those elements. Dynamic visualizations emerge as the agents change position, alter appearance and respond to one another. Agents move to minimize the difference between displayed agent-to-agent distances and an input matrix of ideal distances. Our current application is visualization of streaming text. Each agent represents a significant word, visualizing it by displaying the word itself, centered in a circle sized by the frequency of word occurrence. We derive the ideal distance matrix from word co-occurrence, mapping higher co-occurrence to lower distance. To depict co-occurrence in its textual context, the ratio of intersection to circle area approximates the ratio of word co-occurrence to frequency. A networked backend process gathers articles from news feeds, blogs, Digg or Twitter, exploiting online search APIs to focus on user-chosen topics. Resulting visuals reveal the primary topics in text streams as clusters, with agent-based layout moving without instability as data streams change dynamically.
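The layout rule described in this abstract (agents moving to reduce the mismatch between displayed and ideal distances) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give the mapping from co-occurrence to distance, the objective, or the update rule, so the sketch uses a normalized `1 - count/max` mapping, a squared-error stress, and plain gradient descent. The names `ideal_distances`, `stress`, and `step` are hypothetical.

```python
import numpy as np

def ideal_distances(cooccurrence, eps=1e-9):
    """Assumed mapping: higher co-occurrence gives lower ideal distance."""
    c = np.asarray(cooccurrence, dtype=float)
    d = 1.0 - c / (c.max() + eps)
    np.fill_diagonal(d, 0.0)  # an agent is at distance 0 from itself
    return d

def stress(pos, ideal):
    """Sum over agent pairs of (displayed distance - ideal distance)^2."""
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # The full matrix counts every pair twice, hence the factor 0.5.
    return 0.5 * float(np.sum((dist - ideal) ** 2))

def step(pos, ideal, lr=0.05, eps=1e-9):
    """Move every agent one small gradient-descent step on the stress."""
    diff = pos[:, None, :] - pos[None, :, :]      # (n, n, 2) offsets
    dist = np.linalg.norm(diff, axis=-1)          # (n, n) displayed distances
    coef = (dist - ideal) / (dist + eps)          # zero on the diagonal
    grad = 2.0 * np.sum(coef[:, :, None] * diff, axis=1)
    return pos - lr * grad

# Three hypothetical words; words 0 and 1 co-occur most often.
counts = [[0, 8, 1], [8, 0, 2], [1, 2, 0]]
target = ideal_distances(counts)
agents = np.random.default_rng(0).normal(size=(3, 2))
for _ in range(200):
    agents = step(agents, target)
```

After the loop, the agents for words 0 and 1 sit close together, which is the clustering behavior the abstract attributes to frequently co-occurring words.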

2025-07-10T16:01:57Z IEEE Information Visualization Conference Posters (2008) Jordan Riley Benson David Crist Phil Lafleur Benjamin Watson http://arxiv.org/abs/2507.07465v1 SD-GS: Structured Deformable 3D Gaussians for Efficient Dynamic Scene Reconstruction 2025-07-10T06:35:03Z

Current 4D Gaussian frameworks for dynamic scene reconstruction deliver impressive visual fidelity and rendering speed; however, the inherent trade-off between storage costs and the ability to characterize complex physical motions significantly limits the practical application of these methods. To tackle these problems, we propose SD-GS, a compact and efficient dynamic Gaussian splatting framework for complex dynamic scene reconstruction, featuring two key contributions. First, we introduce a deformable anchor grid, a hierarchical and memory-efficient scene representation where each anchor point derives multiple 3D Gaussians in its local spatiotemporal region and serves as the geometric backbone of the 3D scene. Second, to enhance modeling capability for complex motions, we present a deformation-aware densification strategy that adaptively grows anchors in under-reconstructed high-dynamic regions while reducing redundancy in static areas, achieving superior visual quality with fewer anchors. Experimental results demonstrate that, compared to state-of-the-art methods, SD-GS achieves an average 60% reduction in model size and an average 100% improvement in FPS, significantly enhancing computational efficiency while maintaining or even surpassing visual quality.

2025-07-10T06:35:03Z Wei Yao Shuzhao Xie Letian Li Weixiang Zhang Zhixin Lai Shiqi Dai Ke Zhang Zhi Wang http://arxiv.org/abs/2507.07440v1 Self-supervised Learning of Latent Space Dynamics 2025-07-10T05:30:02Z

Modeling the dynamic behavior of deformable objects is crucial for creating realistic digital worlds. While conventional simulations produce high-quality motions, their computational costs are often prohibitive. Subspace simulation techniques address this challenge by restricting deformations to a lower-dimensional space, improving performance while maintaining visually compelling results. However, even subspace methods struggle to meet the stringent performance demands of portable devices such as virtual reality headsets and mobile platforms. To overcome this limitation, we introduce a novel subspace simulation framework powered by a neural latent-space integrator. Our approach leverages self-supervised learning to enhance inference stability and generalization. By operating entirely within latent space, our method eliminates the need for full-space computations, resulting in a highly efficient method well-suited for deployment on portable devices. We demonstrate the effectiveness of our approach on challenging examples involving rods, shells, and solids, showcasing its versatility and potential for widespread adoption.

2025-07-10T05:30:02Z Yue Li Gene Wei-Chin Lin Egor Larionov Aljaz Bozic Doug Roble Ladislav Kavan Stelian Coros Bernhard Thomaszewski Tuur Stuyck Hsiao-yu Chen 10.1145/3747854 http://arxiv.org/abs/2507.07387v1 Digital Salon: An AI and Physics-Driven Tool for 3D Hair Grooming and Simulation 2025-07-10T02:58:51Z

We introduce Digital Salon, a comprehensive hair authoring system that supports real-time 3D hair generation, simulation, and rendering. Unlike existing methods that focus on isolated parts of 3D hair modeling and involve a heavy computation process or network training, Digital Salon offers a holistic and interactive system that lowers the technical barriers of 3D hair modeling through natural language-based interaction. The system guides users through four key stages: text-guided hair retrieval, real-time hair simulation, interactive hair refinement, and hair-conditioned image generation. This cohesive workflow makes advanced hair design accessible to users of varying skill levels and dramatically streamlines the creative process in digital media with an intuitive, versatile, and efficient solution for hair modeling. User studies show that our system can outperform traditional hair modeling workflows for rapid prototyping. Furthermore, we provide insights into the benefits of our system with future potential of deploying our system in real salon environments. More details can be found on our project page: https://digital-salon.github.io/.

2025-07-10T02:58:51Z Chengan He Jorge Alejandro Amador Herrera Zhixin Shu Xin Sun Yao Feng Sören Pirk Dominik L. Michels Meng Zhang Tuanfeng Y. Wang Julie Dorsey Holly Rushmeier Yi Zhou http://arxiv.org/abs/2411.16446v3 VQ-SGen: A Vector Quantized Stroke Representation for Creative Sketch Generation 2025-07-09T17:32:37Z

This paper presents VQ-SGen, a novel algorithm for high-quality creative sketch generation. Recent approaches have framed the task as pixel-based generation either as a whole or part-by-part, neglecting the intrinsic and contextual relationships among individual strokes, such as the shape and spatial positioning of both proximal and distant strokes. To overcome these limitations, we propose treating each stroke within a sketch as an entity and introducing a vector-quantized (VQ) stroke representation for fine-grained sketch generation. Our method follows a two-stage framework. In stage one, we decouple each stroke's shape and location information to ensure the VQ representation prioritizes stroke shape learning. In stage two, we feed the precise and compact representation into an auto-decoding Transformer to incorporate stroke semantics, positions, and shapes into the generation process. By utilizing tokenized stroke representation, our approach generates strokes with high fidelity and facilitates novel applications, such as text or class label conditioned generation and sketch completion. Comprehensive experiments demonstrate our method surpasses existing state-of-the-art techniques on the CreativeSketch dataset, underscoring its effectiveness.

2024-11-25T14:51:22Z Project Page: https://enigma-li.github.io/projects/VQ-SGen/VQ-SGen.html Jiawei Wang Zhiming Cui Changjian Li http://arxiv.org/abs/2507.07000v1 Enhancing non-Rigid 3D Model Deformations Using Mesh-based Gaussian Splatting 2025-07-09T16:26:04Z

We propose a novel framework that enhances non-rigid 3D model deformations by bridging mesh representations with 3D Gaussian splatting. While traditional Gaussian splatting delivers fast, real-time radiance-field rendering, its post-editing capabilities and support for large-scale, non-rigid deformations remain limited. Our method addresses these challenges by embedding Gaussian kernels directly onto explicit mesh surfaces. This allows the mesh's inherent topological and geometric priors to guide intuitive editing operations -- such as moving, scaling, and rotating individual 3D components -- and enables complex deformations like bending and stretching. This work paves the way for more flexible 3D content-creation workflows in applications spanning virtual reality, character animation, and interactive design.

2025-07-09T16:26:04Z Wijayathunga W. M. R. D. B http://arxiv.org/abs/2507.06790v1 Better frame rates or better visuals? An early report of Esports player practice in Dota 2 2025-07-09T12:28:05Z

Esports athletes often reduce visual quality to improve latency and frame rate, and increase their in-game performance. Little research has examined the effects of this visuo-spatial tradeoff on performance, and we could find no work studying how players manage this tradeoff in practice. This paper is an initial examination of this question in the game Dota 2. First, we gather the game configuration data of Dota 2 players in a small survey. We learn that players do limit visual detail, particularly by turning off VSYNC, which removes rendering/display synchronization delay but permits visual "tearing". Second, we survey the intent of those same players with a few subjective questions. Player intent matches configuration practice. While our sampling of Dota 2 players may not be representative, our survey does reveal suggestive trends that lay the groundwork for future, more rigorous and larger surveys. Such surveys can help new players adapt to the game more quickly, encourage researchers to investigate the relative importance of temporal and visual detail, and justify design effort by developers in "low visual" game configurations.

2025-07-09T12:28:05Z Extended Abstracts of the 2021 ACM Annual Symposium on Computer-Human Interaction in Play (CHI Play) Arjun Madhusudan Benjamin Watson 10.1145/3450337.3483484 http://arxiv.org/abs/2503.08724v2 Direct Flow Simulations with Implicit Neural Representation of Complex Geometry 2025-07-09T11:28:46Z

Implicit neural representations (INRs) have emerged as a powerful approach for encoding complex geometries as continuous functions. These implicit models are widely used in computer vision and 3D content creation, but their integration into scientific computing workflows, such as finite element or finite volume simulations, remains limited. One reason is that conventional simulation pipelines require explicit geometric inputs (meshes), forcing INR-based shapes to be converted to meshes, a step that introduces approximation errors, computational overhead, and significant manual effort. Immersed boundary methods partially alleviate this issue by allowing simulations on background grids without body-fitted meshes. However, they still require an explicit boundary description and can suffer from numerical artifacts, such as sliver cut cells. The shifted boundary method (SBM) eliminates the need for explicit geometry by using grid-aligned surrogate boundaries, making it inherently compatible with implicit shape representations. Here, we present a framework that directly couples neural implicit geometries with SBM to perform high-fidelity fluid flow simulations without any intermediate mesh generation. By leveraging neural network inference, our approach computes the surrogate boundary and distance vectors required by SBM on-the-fly directly from the INR, thus completely bypassing traditional geometry processing. We demonstrate this approach on canonical 2D and 3D flow benchmarks (lid-driven cavity flows) and complex geometries (gyroids, the Stanford bunny, and AI-generated shapes), achieving simulation accuracy comparable to conventional mesh-based methods. This work highlights a novel pathway for integrating AI-driven geometric representations into computational physics, establishing INRs as a versatile and scalable tool for simulations and removing a long-standing bottleneck in geometry handling.
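To make the "distance vectors required by SBM" concrete, here is a minimal sketch of how a vector from a grid point to the true boundary could be obtained from an implicit function alone. It rests on assumptions the abstract does not state: the implicit function is treated as a signed distance field, an analytic circle (`sdf_circle`) stands in for a trained network, and the gradient is taken by central finite differences rather than by differentiating a network. It uses the standard closest-point identity d = -phi * grad(phi) / |grad(phi)| and does not show the paper's surrogate-boundary construction.

```python
import numpy as np

def sdf_circle(p, center=(0.0, 0.0), radius=1.0):
    """Stand-in for a trained INR: signed distance to a circle (negative inside)."""
    p = np.asarray(p, dtype=float)
    return np.linalg.norm(p - np.asarray(center, dtype=float), axis=-1) - radius

def distance_vector(sdf, p, h=1e-5):
    """Vector from each point in p (shape (n, dim)) to its closest boundary point."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for k in range(p.shape[-1]):
        e = np.zeros(p.shape[-1])
        e[k] = h
        # Central finite difference along coordinate axis k.
        grad[..., k] = (sdf(p + e) - sdf(p - e)) / (2.0 * h)
    norm = np.linalg.norm(grad, axis=-1, keepdims=True)
    unit = grad / np.maximum(norm, 1e-12)
    # Step against the gradient by the signed distance to land on phi = 0.
    return -np.asarray(sdf(p))[..., None] * unit

# Hypothetical grid nodes: one outside the circle, one inside.
nodes = np.array([[2.0, 0.0], [0.5, 0.0]])
d = distance_vector(sdf_circle, nodes)
closest = nodes + d  # both nodes map to the boundary point (1, 0)
```

Only evaluations of the implicit function are needed, which is the sense in which no explicit mesh enters the computation.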

2025-03-10T23:54:41Z 32 pages, 29 figures, Supplement at end Samundra Karki Mehdi Shadkah Cheng-Hau Yang Aditya Balu Guglielmo Scovazzi Adarsh Krishnamurthy Baskar Ganapathysubramanian http://arxiv.org/abs/2405.18133v2 Gaussian Fluids: A Grid-Free Fluid Solver based on Gaussian Spatial Representation 2025-07-09T08:19:28Z

We present a grid-free fluid solver featuring a novel Gaussian Spatial Representation (GSR). Drawing inspiration from the expressive capabilities of 3D Gaussian Splatting in multi-view image reconstruction, we model the continuous flow velocity as a weighted sum of multiple Gaussian functions. This representation is continuously differentiable, which enables us to derive spatial differentials directly and solve the time-dependent PDE via a custom first-order optimization tailored to fluid dynamics. Compared to traditional discretizations, which typically adopt Eulerian, Lagrangian, or hybrid perspectives, our approach is inherently memory-efficient and spatially adaptive, enabling it to preserve fine-scale structures and vortices with high fidelity. While these advantages are also sought by implicit neural representations, GSR offers enhanced robustness, accuracy, and generality across diverse fluid phenomena, with improved computational efficiency during temporal evolution. Though our first-order solver does not yet match the speed of fluid solvers using explicit representations, its continuous nature substantially reduces spatial discretization error and opens a new avenue for high-fidelity simulation. We evaluate the proposed solver across a broad range of 2D and 3D fluid phenomena, demonstrating its ability to preserve intricate vortex dynamics, accurately capture boundary-induced effects such as Kármán vortex streets, and remain robust across long time horizons, all without additional parameter tuning. Our results suggest that GSR offers a compelling direction for future research in fluid simulation.
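One way to write down the "weighted sum of multiple Gaussian functions" is sketched below. The notation is an assumption for illustration: the abstract does not specify the parameterization, so the vector weights $\mathbf{w}_i$, centers $\boldsymbol{\mu}_i$, and covariances $\mathbf{C}_i$ are hypothetical symbols. The second line shows why spatial differentials can be derived directly: the derivative of each Gaussian term is available in closed form.

```latex
\begin{aligned}
\mathbf{u}(\mathbf{x}) &= \sum_{i=1}^{N} \mathbf{w}_i \, G_i(\mathbf{x}),
\qquad
G_i(\mathbf{x}) = \exp\!\Big(-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu}_i)^{\top}\mathbf{C}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\Big), \\
\nabla \mathbf{u}(\mathbf{x}) &= -\sum_{i=1}^{N} G_i(\mathbf{x})\, \mathbf{w}_i \,\big(\mathbf{C}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\big)^{\top}.
\end{aligned}
```

Here $\nabla\mathbf{u}$ is the Jacobian with entries $\partial u_a / \partial x_b$, assuming each $\mathbf{C}_i$ is symmetric positive definite.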

2024-05-28T12:47:49Z Jingrui Xing Bin Wang Mengyu Chu Baoquan Chen http://arxiv.org/abs/2507.06523v1 FIFA: Unified Faithfulness Evaluation Framework for Text-to-Video and Video-to-Text Generation 2025-07-09T03:51:27Z

Video Multimodal Large Language Models (VideoMLLMs) have achieved remarkable progress in both Video-to-Text and Text-to-Video tasks. However, they often suffer from hallucinations, generating content that contradicts the visual input. Existing evaluation methods are limited to one task (e.g., V2T) and also fail to assess hallucinations in open-ended, free-form responses. To address this gap, we propose FIFA, a unified FaIthFulness evAluation framework that extracts comprehensive descriptive facts, models their semantic dependencies via a Spatio-Temporal Semantic Dependency Graph, and verifies them using VideoQA models. We further introduce Post-Correction, a tool-based correction framework that revises hallucinated content. Extensive experiments demonstrate that FIFA aligns more closely with human judgment than existing evaluation methods, and that Post-Correction effectively improves factual consistency in both text and video generation.

2025-07-09T03:51:27Z Liqiang Jing Viet Lai Seunghyun Yoon Trung Bui Xinya Du http://arxiv.org/abs/2507.07133v1 Generative Panoramic Image Stitching 2025-07-08T22:07:12Z

We introduce the task of generative panoramic image stitching, which aims to synthesize seamless panoramas that are faithful to the content of multiple reference images containing parallax effects and strong variations in lighting, camera capture settings, or style. In this challenging setting, traditional image stitching pipelines fail, producing outputs with ghosting and other artifacts. While recent generative models are capable of outpainting content consistent with multiple reference images, they fail when tasked with synthesizing large, coherent regions of a panorama. To address these limitations, we propose a method that fine-tunes a diffusion-based inpainting model to preserve a scene's content and layout based on multiple reference images. Once fine-tuned, the model outpaints a full panorama from a single reference image, producing a seamless and visually coherent result that faithfully integrates content from all reference images. Our approach significantly outperforms baselines for this task in terms of image quality and the consistency of image structure and scene layout when evaluated on captured datasets.

2025-07-08T22:07:12Z Mathieu Tuli Kaveh Kamali David B. Lindell