http://arxiv.org/abs/2507.18155v1 GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar 2025-07-24T07:41:40Z

Despite recent progress in 3D head avatar generation, balancing identity preservation, i.e., reconstruction, with novel poses and expressions, i.e., animation, remains a challenge. Existing methods struggle to adapt Gaussians to varying geometrical deviations across facial regions, resulting in suboptimal quality. To address this, we propose GeoAvatar, a framework for adaptive geometrical Gaussian Splatting. GeoAvatar leverages the Adaptive Pre-allocation Stage (APS), an unsupervised method that segments Gaussians into rigid and flexible sets for adaptive offset regularization. Then, based on mouth anatomy and dynamics, we introduce a novel mouth structure and a part-wise deformation strategy to enhance the animation fidelity of the mouth. Finally, we propose a regularization loss for precise rigging between Gaussians and 3DMM faces. Moreover, we release DynamicFace, a video dataset with highly expressive facial motions. Extensive experiments show the superiority of GeoAvatar compared to state-of-the-art methods in reconstruction and novel animation scenarios.

2025-07-24T07:41:40Z ICCV 2025, Project page: https://hahminlew.github.io/geoavatar/ SeungJun Moon Hah Min Lew Seungeun Lee Ji-Su Kang Gyeong-Moon Park http://arxiv.org/abs/2507.17248v2 Reality Proxy: Fluid Interactions with Real-World Objects in MR via Abstract Representations 2025-07-24T07:13:36Z

Interacting with real-world objects in Mixed Reality (MR) often proves difficult when they are crowded, distant, or partially occluded, hindering straightforward selection and manipulation. We observe that these difficulties stem from performing interaction directly on physical objects, where input is tightly coupled to their physical constraints. Our key insight is to decouple interaction from these constraints by introducing proxies: abstract representations of real-world objects. We embody this concept in Reality Proxy, a system that seamlessly shifts interaction targets from physical objects to their proxies during selection. Beyond facilitating basic selection, Reality Proxy uses AI to enrich proxies with semantic attributes and hierarchical spatial relationships of their corresponding physical objects, enabling novel and previously cumbersome interactions in MR - such as skimming, attribute-based filtering, navigating nested groups, and complex multi-object selections - all without requiring new gestures or menu systems. We demonstrate Reality Proxy's versatility across diverse scenarios, including office information retrieval, large-scale spatial navigation, and multi-drone control. An expert evaluation indicates the system's utility and usability, suggesting that proxy-based abstractions offer a powerful and generalizable interaction paradigm for future MR systems.

2025-07-23T06:34:58Z 16 pages, 9 figures. Accepted for publication in UIST'25 (The 38th Annual ACM Symposium on User Interface Software and Technology), Busan, Republic of Korea, 28 Sep - 1 Oct 2025 Xiaoan Liu Difan Jia Xianhao Carton Liu Mar Gonzalez-Franco Chen Zhu-Tian 10.1145/3746059.3747709 http://arxiv.org/abs/2507.18052v1 DanceGraph: A Complementary Architecture for Synchronous Dancing Online 2025-07-24T02:56:30Z

DanceGraph is an architecture for synchronized online dancing overcoming the latency of networked body pose sharing. We break down this challenge by developing a real-time bandwidth-efficient architecture to minimize lag and reduce the timeframe of required motion prediction for synchronization with the music's rhythm. In addition, we show an interactive method for the parameterized stylization of dance motions for rhythmic dance using online dance correctives.

2025-07-24T02:56:30Z 36th International Conference on Computer Animation and Social Agents David Sinclair Ademyemi Ademola Babis Koniaris Kenny Mitchell http://arxiv.org/abs/2507.18664v1 Generating real-time detailed ground visualisations from sparse aerial point clouds 2025-07-24T02:34:39Z

Building realistic wide-scale outdoor 3D content with sufficient visual quality to observe at walking eye level or from driven vehicles is often carried out by large teams of artists skilled in modelling, texturing, material shading and lighting, which typically leads to both prohibitive costs and reduced accuracy in honoring the variety of real-world ground-truth landscapes. In our proposed method, we define a process to automatically amplify real-world scanned data and render it in real time in animated 3D, so that it can be explored at close range with high quality for training, simulation, video game and visualisation applications.

2025-07-24T02:34:39Z CVMP Short Paper. 1 page, 3 figures, CVMP 2022: The 19th ACM SIGGRAPH European Conference on Visual Media Production, London. This work was supported by the European Union's Horizon 2020 research and innovation programme under Grant 101017779 Aidan Murray Eddie Waite Caleb Ross Scarlet Mitchell Alexander Bradley Joanna Jamrozy Kenny Mitchell http://arxiv.org/abs/2507.08513v2 Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation 2025-07-23T22:34:55Z

Multimodal Large Language Models (MLLMs) struggle with accurately capturing camera-object relations, especially for object orientation, camera viewpoint, and camera shots. This stems from the fact that existing MLLMs are trained on images with a limited diversity of camera-object relations and corresponding textual descriptions. To address this, we propose a synthetic generation pipeline to create large-scale 3D visual instruction datasets. Our framework takes 3D assets as input and uses rendering and diffusion-based image generation models to create photorealistic images preserving precise camera-object relations. Additionally, large language models (LLMs) are used to generate text prompts for guiding visual instruction tuning and controlling image generation. We create Ultimate3D, a dataset of 240K VQAs with precise camera-object annotations, and a corresponding benchmark. MLLMs fine-tuned on our proposed dataset outperform commercial models by a large margin, achieving an average accuracy improvement of 33.4% on camera-object relation recognition tasks. Our code, dataset, and benchmark will contribute to broad MLLM applications.

2025-07-11T12:00:10Z Liu He Xiao Zeng Yizhi Song Albert Y. C. Chen Lu Xia Shashwat Verma Sankalp Dayal Min Sun Cheng-Hao Kuo Daniel Aliaga http://arxiv.org/abs/2507.17963v1 Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA 2025-07-23T22:09:38Z

Recent advances in text-to-video generation have enabled high-quality synthesis from text and image prompts. While the personalization of dynamic concepts, which capture subject-specific appearance and motion from a single video, is now feasible, most existing methods require per-instance fine-tuning, limiting scalability. We introduce a fully zero-shot framework for dynamic concept personalization in text-to-video models. Our method leverages structured 2x2 video grids that spatially organize input and output pairs, enabling the training of lightweight Grid-LoRA adapters for editing and composition within these grids. At inference, a dedicated Grid Fill module completes partially observed layouts, producing temporally coherent and identity-preserving outputs. Once trained, the entire system operates in a single forward pass, generalizing to previously unseen dynamic concepts without any test-time optimization. Extensive experiments demonstrate high-quality and consistent results across a wide range of subjects beyond trained concepts and editing scenarios.
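
As a rough illustration of the 2x2 video grid layout mentioned in this abstract, the sketch below tiles four equally sized clips into a single clip with NumPy. The function name `make_2x2_grid`, the `(T, H, W, C)` array layout, and the assignment of clips to cells are assumptions made for illustration; the abstract does not specify how inputs and outputs are arranged within the grid.

```python
import numpy as np


def make_2x2_grid(top_left, top_right, bottom_left, bottom_right):
    """Tile four clips of shape (T, H, W, C) into one clip of shape (T, 2H, 2W, C).

    Hypothetical helper: which cell holds an input and which holds an output
    is an assumption, not something stated in the abstract.
    """
    clips = [top_left, top_right, bottom_left, bottom_right]
    if len({clip.shape for clip in clips}) != 1:
        raise ValueError("all four clips must share the same shape")
    # Join the left and right clips along the width axis, per row.
    top = np.concatenate([top_left, top_right], axis=2)
    bottom = np.concatenate([bottom_left, bottom_right], axis=2)
    # Stack the two rows along the height axis.
    return np.concatenate([top, bottom], axis=1)


# Four dummy clips: 4 frames, 8x6 pixels, 3 channels, each filled with its index.
cells = [np.full((4, 8, 6, 3), i, dtype=np.uint8) for i in range(4)]
grid = make_2x2_grid(*cells)
print(grid.shape)
```

Every frame of the tiled clip contains all four cells, so a model processing the grid sees the paired clips side by side at each time step.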

2025-07-23T22:09:38Z Project Page and Video : https://snap-research.github.io/zero-shot-dynamic-concepts/ Rameen Abdal Or Patashnik Ekaterina Deyneka Hao Chen Aliaksandr Siarohin Sergey Tulyakov Daniel Cohen-Or Kfir Aberman http://arxiv.org/abs/2507.17931v1 Quantum Machine Learning Playground 2025-07-23T21:08:29Z

This article introduces an innovative interactive visualization tool designed to demystify quantum machine learning (QML) algorithms. Our work is inspired by the success of classical machine learning visualization tools, such as TensorFlow Playground, and aims to bridge the gap in visualization resources specifically for the field of QML. The article includes a comprehensive overview of relevant visualization metaphors from both quantum computing and classical machine learning, the development of an algorithm visualization concept, and the design of a concrete implementation as an interactive web application. By combining common visualization metaphors for the so-called data re-uploading universal quantum classifier as a representative QML model, this article aims to lower the entry barrier to quantum computing and encourage further innovation in the field. The accompanying interactive application is a proposal for the first version of a quantum machine learning playground for learning and exploring QML models.

2025-07-23T21:08:29Z Accepted to IEEE Computer Graphics and Applications. Final version: https://doi.org/10.1109/MCG.2024.3456288 IEEE Computer Graphics and Applications, vol. 44, no. 5, pp. 40-53, Sept.-Oct. 2024, Pascal Debus Sebastian Issel Kilian Tscharke 10.1109/MCG.2024.3456288 http://arxiv.org/abs/2507.17440v1 Parametric Integration with Neural Integral Operators 2025-07-23T12:02:01Z

Real-time rendering imposes strict limitations on the sampling budget for light transport simulation, often resulting in noisy images. However, denoisers have demonstrated that it is possible to produce noise-free images through filtering. We enhance image quality by removing noise before material shading, rather than filtering already shaded noisy images. This approach allows for material-agnostic denoising (MAD) and leverages machine learning by approximating the light transport integral operator with a neural network, effectively performing parametric integration with neural operators. Our method operates in real-time, requires data from only a single frame, seamlessly integrates with existing denoisers and temporal anti-aliasing techniques, and is efficient to train. Additionally, it is straightforward to incorporate with physically based rendering algorithms.

2025-07-23T12:02:01Z Christoph Schied Alexander Keller http://arxiv.org/abs/2507.17265v1 Visualization-Driven Illumination for Density Plots 2025-07-23T07:02:13Z

We present a novel visualization-driven illumination model for density plots, a new technique to enhance density plots by effectively revealing the detailed structures in high- and medium-density regions and outliers in low-density regions, while avoiding artifacts in the density field's colors. When visualizing large and dense discrete point samples, scatterplots and dot density maps often suffer from overplotting, and density plots are commonly employed to provide aggregated views while revealing underlying structures. Yet, in such density plots, existing illumination models may produce color distortion and hide details in low-density regions, making it challenging to look up density values, compare them, and find outliers. The key novelty in this work includes (i) a visualization-driven illumination model that inherently supports density-plot-specific analysis tasks and (ii) a new image composition technique to reduce the interference between the image shading and the color-encoded density values. To demonstrate the effectiveness of our technique, we conducted a quantitative study, an empirical evaluation of our technique in a controlled study, and two case studies, exploring twelve datasets with up to two million data point samples.

2025-07-23T07:02:13Z Xin Chen Yunhai Wang Huaiwei Bao Kecheng Lu Jaemin Jo Chi-Wing Fu Jean-Daniel Fekete http://arxiv.org/abs/2507.17184v1 A Scientist Question: Research on the Impact of Super Structured Quadrilateral Meshes on Convergence and Accuracy of Finite Element Analysis 2025-07-23T04:16:15Z

In the current practices of both industry and academia, the convergence and accuracy of finite element calculations are closely related to the methods and quality of mesh generation. For years, the research on high-quality mesh generation in the domestic academic field has mainly referred to the local quality of quadrilaterals and hexahedrons approximating that of squares and cubes. The main contribution of this paper is to propose a brand-new research direction and content: it is necessary to explore and study the influence of the overall global arrangement structure and pattern of super structured quadrilateral meshes on the convergence and calculation accuracy of finite element calculations. Research in this new field can help remedy the non-rigorous, heavy reliance on "experience" in the mesh generation stage of simulation in current industry and academia, and support clear judgments on which global arrangements of mesh generation can ensure the convergence of finite element calculations. In order to generate and design super-structured quadrilateral meshes with controllable overall arrangement structures, a large number of modern two-dimensional and three-dimensional geometric topology theories are required, such as moduli space, Teichmüller space, harmonic foliations, dynamical systems, surface mappings, meromorphic quadratic differentials, etc.

2025-07-23T04:16:15Z in Chinese and English Hui Zhao http://arxiv.org/abs/2507.17174v1 GhostUMAP2: Measuring and Analyzing (r,d)-Stability of UMAP 2025-07-23T03:40:53Z

Despite the widespread use of Uniform Manifold Approximation and Projection (UMAP), the impact of its stochastic optimization process on the results remains underexplored. We observed that it often produces unstable results where the projections of data points are determined mostly by chance rather than reflecting neighboring structures. To address this limitation, we introduce (r,d)-stability to UMAP: a framework that analyzes the stochastic positioning of data points in the projection space. To assess how stochastic elements, specifically initial projection positions and negative sampling, impact UMAP results, we introduce "ghosts", or duplicates of data points representing potential positional variations due to stochasticity. We define a data point's projection as (r,d)-stable if its ghosts perturbed within a circle of radius r in the initial projection remain confined within a circle of radius d for their final positions. To efficiently compute the ghost projections, we develop an adaptive dropping scheme that reduces the runtime by up to 60% compared to an unoptimized baseline while maintaining approximately 90% of unstable points. We also present a visualization tool that supports the interactive exploration of the (r,d)-stability of data points. Finally, we demonstrate the effectiveness of our framework by examining the stability of projections of real-world datasets and present usage guidelines for the effective use of our framework.
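
The (r,d)-stability definition in this abstract can be sketched as a simple distance check. The snippet below is a minimal illustration, not the authors' implementation: the function name `is_rd_stable` and its arguments are hypothetical, and it assumes both circles are centered on the original point's own position (initial and final, respectively), which the abstract does not state explicitly.

```python
import numpy as np


def is_rd_stable(init_point, init_ghosts, final_point, final_ghosts, r, d):
    """Check (r,d)-stability of one projected point from its ghosts.

    Assumption: the radius-r circle is centered on the point's initial
    position and the radius-d circle on its final position.
    """
    init_ghosts = np.asarray(init_ghosts, dtype=float)
    final_ghosts = np.asarray(final_ghosts, dtype=float)
    init_dist = np.linalg.norm(init_ghosts - np.asarray(init_point, dtype=float), axis=1)
    final_dist = np.linalg.norm(final_ghosts - np.asarray(final_point, dtype=float), axis=1)
    if np.any(init_dist > r):
        raise ValueError("every ghost must start within radius r of the point")
    # Stable only if all ghosts end up within radius d of the final position.
    return bool(np.all(final_dist <= d))


# Two ghosts start within r = 0.2 of the point at the origin.
init_ghosts = [(0.1, 0.0), (0.0, -0.1)]
stable = is_rd_stable((0.0, 0.0), init_ghosts, (5.0, 5.0),
                      [(5.2, 5.0), (4.9, 5.1)], r=0.2, d=0.5)
unstable = is_rd_stable((0.0, 0.0), init_ghosts, (5.0, 5.0),
                        [(5.2, 5.0), (8.0, 5.0)], r=0.2, d=0.5)
print(stable, unstable)
```

In the second call one ghost ends three units away from the final position, so the point fails the check for d = 0.5.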

2025-07-23T03:40:53Z Myeongwon Jung Takanori Fujiwara Jaemin Jo http://arxiv.org/abs/2507.17029v1 StreamME: Simplify 3D Gaussian Avatar within Live Stream 2025-07-22T21:33:30Z

We propose StreamME, a method that focuses on fast 3D avatar reconstruction. StreamME synchronously records and reconstructs a head avatar from live video streams without any pre-cached data, enabling seamless integration of the reconstructed appearance into downstream applications. This exceptionally fast training strategy, which we refer to as on-the-fly training, is central to our approach. Our method is built upon 3D Gaussian Splatting (3DGS), eliminating the reliance on MLPs in deformable 3DGS and relying solely on geometry, which significantly improves the adaptation speed to facial expressions. To further ensure high efficiency in on-the-fly training, we introduce a simplification strategy based on primary points, which distributes the point clouds more sparsely across the facial surface, optimizing the number of points while maintaining rendering quality. Leveraging the on-the-fly training capabilities, our method protects facial privacy and reduces communication bandwidth in VR systems or online conferences. Additionally, it can be directly applied to downstream applications such as animation, toonification, and relighting. Please refer to our project page for more details: https://songluchuan.github.io/StreamME/.

2025-07-22T21:33:30Z 12 pages, 15 Figures Luchuan Song Yang Zhou Zhan Xu Yi Zhou Deepali Aneja Chenliang Xu http://arxiv.org/abs/2506.17032v2 Toward Understanding Similarity of Visualization Techniques 2025-07-22T13:03:31Z

The literature describes many visualization techniques for different types of data, tasks, and application contexts, and new techniques are proposed on a regular basis. Visualization surveys try to capture the immense space of techniques and structure it with meaningful categorizations. Yet, it remains difficult to understand the similarity of visualization techniques in general. We approach this open research question from two angles. First, we follow a model-driven approach that is based on defining the signature of visualization techniques and interpreting the similarity of signatures as the similarity of their associated techniques. Second, following an expert-driven approach, we asked visualization experts in a small online study for their ad-hoc intuitive assessment of the similarity of pairs of visualization techniques. From both approaches, we gain insight into the similarity of a set of 13 basic and advanced visualizations for different types of data. While our results are so far preliminary and academic, they are first steps toward better understanding the similarity of visualization techniques.

2025-06-20T14:42:16Z Abdulhaq Adetunji Salako Christian Tominski http://arxiv.org/abs/2507.16463v1 MMS Player: an open source software for parametric data-driven animation of Sign Language avatars 2025-07-22T11:06:13Z

This paper describes the MMS-Player, open-source software that synthesises sign language animations from a novel sign language representation format called MMS (MultiModal Signstream). The MMS enhances gloss-based representations by adding information on parallel execution of signs, timing, and inflections. The implementation consists of Python scripts for the popular Blender 3D authoring tool and can be invoked via command line or HTTP API. Animations can be rendered as videos or exported in other popular 3D animation exchange formats. The software is freely available under the GPL-3.0 license at https://github.com/DFKI-SignLanguage/MMS-Player.

2025-07-22T11:06:13Z Fabrizio Nunnari Shailesh Mishra Patrick Gebhard http://arxiv.org/abs/2410.13613v3 MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes 2025-07-22T10:55:59Z

4D Gaussian Splatting (4DGS) has recently emerged as a promising technique for capturing complex dynamic 3D scenes with high fidelity. It utilizes a 4D Gaussian representation and a GPU-friendly rasterizer, enabling rapid rendering speeds. Despite its advantages, 4DGS faces significant challenges, notably the requirement of millions of 4D Gaussians, each with extensive associated attributes, leading to substantial memory and storage cost. This paper introduces a memory-efficient framework for 4DGS. We streamline the color attribute by decomposing it into a per-Gaussian direct color component with only 3 parameters and a shared lightweight alternating current color predictor. This approach eliminates the need for spherical harmonics coefficients, which typically involve up to 144 parameters in classic 4DGS, thereby creating a memory-efficient 4D Gaussian representation. Furthermore, we introduce an entropy-constrained Gaussian deformation technique that uses a deformation field to expand the action range of each Gaussian and integrates an opacity-based entropy loss to limit the number of Gaussians, thus forcing our model to use as few Gaussians as possible to fit a dynamic scene well. With simple half-precision storage and zip compression, our framework achieves a storage reduction by approximately 190$\times$ and 125$\times$ on the Technicolor and Neural 3D Video datasets, respectively, compared to the original 4DGS. Meanwhile, it maintains comparable rendering speeds and scene representation quality, setting a new standard in the field. Code is available at https://github.com/Xinjie-Q/MEGA.
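
To put the per-Gaussian color saving in perspective, the back-of-the-envelope sketch below compares the storage for 144 color parameters against 3 color parameters per Gaussian. The function name `color_bytes` and the Gaussian count of one million are hypothetical, chosen only for illustration; 2 bytes per parameter corresponds to the half-precision storage mentioned in the abstract. The calculation ignores the shared color predictor and all non-color attributes.

```python
def color_bytes(num_gaussians, params_per_gaussian, bytes_per_param=2):
    """Bytes needed for per-Gaussian color parameters (half precision by default)."""
    return num_gaussians * params_per_gaussian * bytes_per_param


num_gaussians = 1_000_000  # hypothetical scene size, not taken from the paper

sh_bytes = color_bytes(num_gaussians, 144)  # spherical-harmonics color parameters
dc_bytes = color_bytes(num_gaussians, 3)    # per-Gaussian direct color component
ratio = sh_bytes / dc_bytes

print(sh_bytes, dc_bytes, ratio)
```

The color parameters alone shrink by a factor of 144 / 3 = 48 regardless of the Gaussian count. The overall 190x and 125x figures quoted in the abstract are end-to-end storage numbers, so they are not directly comparable to this per-attribute ratio.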

2024-10-17T14:47:08Z Accepted by ICCV 2025 Xinjie Zhang Zhening Liu Yifan Zhang Xingtong Ge Dailan He Tongda Xu Yan Wang Zehong Lin Shuicheng Yan Jun Zhang