https://arxiv.org/api/vfXWQKhz5JmjAc2O8RMBStg2yQs
2026-06-28T12:08:36Z
9390186015

http://arxiv.org/abs/2507.18155v1
GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar
2025-07-24T07:41:40Z
Despite recent progress in 3D head avatar generation, balancing identity preservation, i.e., reconstruction, with novel poses and expressions, i.e., animation, remains a challenge. Existing methods struggle to adapt Gaussians to varying geometrical deviations across facial regions, resulting in suboptimal quality. To address this, we propose GeoAvatar, a framework for adaptive geometrical Gaussian Splatting. GeoAvatar leverages the Adaptive Pre-allocation Stage (APS), an unsupervised method that segments Gaussians into rigid and flexible sets for adaptive offset regularization. Then, based on mouth anatomy and dynamics, we introduce a novel mouth structure and a part-wise deformation strategy to enhance the animation fidelity of the mouth. Finally, we propose a regularization loss for precise rigging between Gaussians and 3DMM faces. Moreover, we release DynamicFace, a video dataset with highly expressive facial motions. Extensive experiments show the superiority of GeoAvatar compared to state-of-the-art methods in reconstruction and novel animation scenarios.
2025-07-24T07:41:40Z
ICCV 2025, Project page: https://hahminlew.github.io/geoavatar/
SeungJun Moon, Hah Min Lew, Seungeun Lee, Ji-Su Kang, Gyeong-Moon Park

http://arxiv.org/abs/2507.17248v2
Reality Proxy: Fluid Interactions with Real-World Objects in MR via Abstract Representations
2025-07-24T07:13:36Z
Interacting with real-world objects in Mixed Reality (MR) often proves difficult when they are crowded, distant, or partially occluded, hindering straightforward selection and manipulation. We observe that these difficulties stem from performing interaction directly on physical objects, where input is tightly coupled to their physical constraints.
Our key insight is to decouple interaction from these constraints by introducing proxies: abstract representations of real-world objects. We embody this concept in Reality Proxy, a system that seamlessly shifts interaction targets from physical objects to their proxies during selection. Beyond facilitating basic selection, Reality Proxy uses AI to enrich proxies with semantic attributes and hierarchical spatial relationships of their corresponding physical objects, enabling novel and previously cumbersome interactions in MR, such as skimming, attribute-based filtering, navigating nested groups, and complex multi-object selections, all without requiring new gestures or menu systems. We demonstrate Reality Proxy's versatility across diverse scenarios, including office information retrieval, large-scale spatial navigation, and multi-drone control. An expert evaluation indicates the system's utility and usability, suggesting that proxy-based abstractions offer a powerful and generalizable interaction paradigm for future MR systems.
2025-07-23T06:34:58Z
16 pages, 9 figures. Accepted for publication in UIST'25 (The 38th Annual ACM Symposium on User Interface Software and Technology), Busan, Republic of Korea, 28 Sep - 1 Oct 2025
Xiaoan Liu, Difan Jia, Xianhao Carton Liu, Mar Gonzalez-Franco, Chen Zhu-Tian
10.1145/3746059.3747709

http://arxiv.org/abs/2507.18052v1
DanceGraph: A Complementary Architecture for Synchronous Dancing Online
2025-07-24T02:56:30Z
DanceGraph is an architecture for synchronized online dancing that overcomes the latency of networked body pose sharing. We break down this challenge by developing a real-time, bandwidth-efficient architecture to minimize lag and reduce the timeframe of required motion prediction for synchronization with the music's rhythm.
In addition, we show an interactive method for the parameterized stylization of dance motions for rhythmic dance using online dance correctives.
2025-07-24T02:56:30Z
36th International Conference on Computer Animation and Social Agents
David Sinclair, Ademyemi Ademola, Babis Koniaris, Kenny Mitchell

http://arxiv.org/abs/2507.18664v1
Generating real-time detailed ground visualisations from sparse aerial point clouds
2025-07-24T02:34:39Z
Building realistic wide-scale outdoor 3D content with sufficient visual quality to observe at walking eye level or from driven vehicles is often carried out by large teams of artists skilled in modelling, texturing, material shading and lighting, which typically leads to both prohibitive costs and reduced accuracy in honoring the variety of real-world ground truth landscapes. In our proposed method, we define a process to automatically amplify real-world scanned data and render it in real time in animated 3D, so that it can be explored at close range with high quality for training, simulation, video game and visualisation applications.
2025-07-24T02:34:39Z
CVMP Short Paper. 1 page, 3 figures, CVMP 2022: The 19th ACM SIGGRAPH European Conference on Visual Media Production, London. This work was supported by the European Union's Horizon 2020 research and innovation programme under Grant 101017779
Aidan Murray, Eddie Waite, Caleb Ross, Scarlet Mitchell, Alexander Bradley, Joanna Jamrozy, Kenny Mitchell

http://arxiv.org/abs/2507.08513v2
Advancing Multimodal LLMs by Large-Scale 3D Visual Instruction Dataset Generation
2025-07-23T22:34:55Z
Multimodal Large Language Models (MLLMs) struggle with accurately capturing camera-object relations, especially for object orientation, camera viewpoint, and camera shots. This stems from the fact that existing MLLMs are trained on images with a limited diversity of camera-object relations and corresponding textual descriptions. To address this, we propose a synthetic generation pipeline to create large-scale 3D visual instruction datasets.
Our framework takes 3D assets as input and uses rendering and diffusion-based image generation models to create photorealistic images preserving precise camera-object relations. Additionally, large language models (LLMs) are used to generate text prompts for guiding visual instruction tuning and controlling image generation. We create Ultimate3D, a dataset of 240K VQAs with precise camera-object annotations, and a corresponding benchmark. MLLMs fine-tuned on our proposed dataset outperform commercial models by a large margin, achieving an average accuracy improvement of 33.4% on camera-object relation recognition tasks. Our code, dataset, and benchmark will contribute to broad MLLM applications.
2025-07-11T12:00:10Z
Liu He, Xiao Zeng, Yizhi Song, Albert Y. C. Chen, Lu Xia, Shashwat Verma, Sankalp Dayal, Min Sun, Cheng-Hao Kuo, Daniel Aliaga

http://arxiv.org/abs/2507.17963v1
Zero-Shot Dynamic Concept Personalization with Grid-Based LoRA
2025-07-23T22:09:38Z
Recent advances in text-to-video generation have enabled high-quality synthesis from text and image prompts. While the personalization of dynamic concepts, which capture subject-specific appearance and motion from a single video, is now feasible, most existing methods require per-instance fine-tuning, limiting scalability. We introduce a fully zero-shot framework for dynamic concept personalization in text-to-video models. Our method leverages structured 2x2 video grids that spatially organize input and output pairs, enabling the training of lightweight Grid-LoRA adapters for editing and composition within these grids. At inference, a dedicated Grid Fill module completes partially observed layouts, producing temporally coherent and identity-preserving outputs. Once trained, the entire system operates in a single forward pass, generalizing to previously unseen dynamic concepts without any test-time optimization.
Extensive experiments demonstrate high-quality and consistent results across a wide range of subjects beyond trained concepts and editing scenarios.
2025-07-23T22:09:38Z
Project Page and Video: https://snap-research.github.io/zero-shot-dynamic-concepts/
Rameen Abdal, Or Patashnik, Ekaterina Deyneka, Hao Chen, Aliaksandr Siarohin, Sergey Tulyakov, Daniel Cohen-Or, Kfir Aberman

http://arxiv.org/abs/2507.17931v1
Quantum Machine Learning Playground
2025-07-23T21:08:29Z
This article introduces an innovative interactive visualization tool designed to demystify quantum machine learning (QML) algorithms. Our work is inspired by the success of classical machine learning visualization tools, such as TensorFlow Playground, and aims to bridge the gap in visualization resources specifically for the field of QML. The article includes a comprehensive overview of relevant visualization metaphors from both quantum computing and classical machine learning, the development of an algorithm visualization concept, and the design of a concrete implementation as an interactive web application. By combining common visualization metaphors for the so-called data re-uploading universal quantum classifier as a representative QML model, this article aims to lower the entry barrier to quantum computing and encourage further innovation in the field. The accompanying interactive application is a proposal for the first version of a quantum machine learning playground for learning and exploring QML models.
2025-07-23T21:08:29Z
Accepted to IEEE Computer Graphics and Applications. Final version: https://doi.org/10.1109/MCG.2024.3456288
IEEE Computer Graphics and Applications, vol. 44, no. 5, pp. 40-53, Sept.-Oct. 2024
Pascal Debus, Sebastian Issel, Kilian Tscharke
10.1109/MCG.2024.3456288

http://arxiv.org/abs/2507.17440v1
Parametric Integration with Neural Integral Operators
2025-07-23T12:02:01Z
Real-time rendering imposes strict limitations on the sampling budget for light transport simulation, often resulting in noisy images.
However, denoisers have demonstrated that it is possible to produce noise-free images through filtering. We enhance image quality by removing noise before material shading, rather than filtering already shaded noisy images. This approach allows for material-agnostic denoising (MAD) and leverages machine learning by approximating the light transport integral operator with a neural network, effectively performing parametric integration with neural operators. Our method operates in real-time, requires data from only a single frame, seamlessly integrates with existing denoisers and temporal anti-aliasing techniques, and is efficient to train. Additionally, it is straightforward to incorporate with physically based rendering algorithms.
2025-07-23T12:02:01Z
Christoph Schied, Alexander Keller

http://arxiv.org/abs/2507.17265v1
Visualization-Driven Illumination for Density Plots
2025-07-23T07:02:13Z
We present a novel visualization-driven illumination model for density plots, a new technique to enhance density plots by effectively revealing the detailed structures in high- and medium-density regions and outliers in low-density regions, while avoiding artifacts in the density field's colors. When visualizing large and dense discrete point samples, scatterplots and dot density maps often suffer from overplotting, and density plots are commonly employed to provide aggregated views while revealing underlying structures. Yet, in such density plots, existing illumination models may produce color distortion and hide details in low-density regions, making it challenging to look up density values, compare them, and find outliers. The key novelty in this work includes (i) a visualization-driven illumination model that inherently supports density-plot-specific analysis tasks and (ii) a new image composition technique to reduce the interference between the image shading and the color-encoded density values.
To demonstrate the effectiveness of our technique, we conducted a quantitative study, an empirical evaluation of our technique in a controlled study, and two case studies, exploring twelve datasets with up to two million data point samples.
2025-07-23T07:02:13Z
Xin Chen, Yunhai Wang, Huaiwei Bao, Kecheng Lu, Jaemin Jo, Chi-Wing Fu, Jean-Daniel Fekete

http://arxiv.org/abs/2507.17184v1
A Scientist Question: Research on the Impact of Super Structured Quadrilateral Meshes on Convergence and Accuracy of Finite Element Analysis
2025-07-23T04:16:15Z
In the current practices of both industry and academia, the convergence and accuracy of finite element calculations are closely related to the methods and quality of mesh generation. For years, research on high-quality mesh generation in the domestic academic field has mainly focused on the local quality of quadrilaterals and hexahedrons, that is, on how closely they approximate squares and cubes. The main contribution of this paper is to propose a brand-new research direction: exploring and studying the influence of the overall global arrangement structure and pattern of super structured quadrilateral meshes on the convergence and calculation accuracy of finite element calculations. Research in this new field can help resolve the non-rigorous, heavy reliance on "experience" in the mesh generation stage of simulation in current industry and academia, and enable clear judgments on which global mesh arrangements ensure the convergence of finite element calculations.
In order to generate and design super-structured quadrilateral meshes with controllable overall arrangement structures, a large number of modern two-dimensional and three-dimensional geometric topology theories are required, such as moduli space, Teichmüller space, harmonic foliations, dynamical systems, surface mappings, and meromorphic quadratic differentials.
2025-07-23T04:16:15Z
in Chinese and English
Hui Zhao

http://arxiv.org/abs/2507.17174v1
GhostUMAP2: Measuring and Analyzing (r,d)-Stability of UMAP
2025-07-23T03:40:53Z
Despite the widespread use of Uniform Manifold Approximation and Projection (UMAP), the impact of its stochastic optimization process on the results remains underexplored. We observed that it often produces unstable results where the projections of data points are determined mostly by chance rather than reflecting neighboring structures. To address this limitation, we introduce (r,d)-stability to UMAP: a framework that analyzes the stochastic positioning of data points in the projection space. To assess how stochastic elements, specifically initial projection positions and negative sampling, impact UMAP results, we introduce "ghosts", or duplicates of data points representing potential positional variations due to stochasticity. We define a data point's projection as (r,d)-stable if its ghosts perturbed within a circle of radius r in the initial projection remain confined within a circle of radius d for their final positions. To efficiently compute the ghost projections, we develop an adaptive dropping scheme that reduces runtime by up to 60% compared to an unoptimized baseline while maintaining approximately 90% of unstable points. We also present a visualization tool that supports the interactive exploration of the (r,d)-stability of data points.
Finally, we demonstrate the effectiveness of our framework by examining the stability of projections of real-world datasets and present usage guidelines for the effective use of our framework.
2025-07-23T03:40:53Z
Myeongwon Jung, Takanori Fujiwara, Jaemin Jo

http://arxiv.org/abs/2507.17029v1
StreamME: Simplify 3D Gaussian Avatar within Live Stream
2025-07-22T21:33:30Z
We propose StreamME, a method that focuses on fast 3D avatar reconstruction. StreamME synchronously records and reconstructs a head avatar from live video streams without any pre-cached data, enabling seamless integration of the reconstructed appearance into downstream applications. This exceptionally fast training strategy, which we refer to as on-the-fly training, is central to our approach. Our method is built upon 3D Gaussian Splatting (3DGS), eliminating the reliance on MLPs in deformable 3DGS and relying solely on geometry, which significantly improves the adaptation speed to facial expressions. To further ensure high efficiency in on-the-fly training, we introduce a simplification strategy based on primary points, which distributes the point clouds more sparsely across the facial surface, optimizing the number of points while maintaining rendering quality. Leveraging its on-the-fly training capabilities, our method protects facial privacy and reduces communication bandwidth in VR systems or online conferences. Additionally, it can be directly applied to downstream applications such as animation, toonification, and relighting. Please refer to our project page for more details: https://songluchuan.github.io/StreamME/.
2025-07-22T21:33:30Z
12 pages, 15 Figures
Luchuan Song, Yang Zhou, Zhan Xu, Yi Zhou, Deepali Aneja, Chenliang Xu

http://arxiv.org/abs/2506.17032v2
Toward Understanding Similarity of Visualization Techniques
2025-07-22T13:03:31Z
The literature describes many visualization techniques for different types of data, tasks, and application contexts, and new techniques are proposed on a regular basis.
Visualization surveys try to capture the immense space of techniques and structure it with meaningful categorizations. Yet, it remains difficult to understand the similarity of visualization techniques in general. We approach this open research question from two angles. First, we follow a model-driven approach that is based on defining the signature of visualization techniques and interpreting the similarity of signatures as the similarity of their associated techniques. Second, following an expert-driven approach, we asked visualization experts in a small online study for their ad-hoc intuitive assessment of the similarity of pairs of visualization techniques. From both approaches, we gain insight into the similarity of a set of 13 basic and advanced visualizations for different types of data. While our results are so far preliminary and academic, they are first steps toward better understanding the similarity of visualization techniques.
2025-06-20T14:42:16Z
Abdulhaq Adetunji Salako, Christian Tominski

http://arxiv.org/abs/2507.16463v1
MMS Player: an open source software for parametric data-driven animation of Sign Language avatars
2025-07-22T11:06:13Z
This paper describes the MMS-Player, an open source software able to synthesise sign language animations from a novel sign language representation format called MMS (MultiModal Signstream). The MMS enhances gloss-based representations by adding information on parallel execution of signs, timing, and inflections. The implementation consists of Python scripts for the popular Blender 3D authoring tool and can be invoked via command line or HTTP API. Animations can be rendered as videos or exported in other popular 3D animation exchange formats.
The software is freely available under the GPL-3.0 license at https://github.com/DFKI-SignLanguage/MMS-Player.
2025-07-22T11:06:13Z
Fabrizio Nunnari, Shailesh Mishra, Patrick Gebhard

http://arxiv.org/abs/2410.13613v3
MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes
2025-07-22T10:55:59Z
4D Gaussian Splatting (4DGS) has recently emerged as a promising technique for capturing complex dynamic 3D scenes with high fidelity. It utilizes a 4D Gaussian representation and a GPU-friendly rasterizer, enabling rapid rendering speeds. Despite its advantages, 4DGS faces significant challenges, notably the requirement of millions of 4D Gaussians, each with extensive associated attributes, leading to substantial memory and storage costs. This paper introduces a memory-efficient framework for 4DGS. We streamline the color attribute by decomposing it into a per-Gaussian direct color component with only 3 parameters and a shared lightweight alternating current color predictor. This approach eliminates the need for spherical harmonics coefficients, which typically involve up to 144 parameters in classic 4DGS, thereby creating a memory-efficient 4D Gaussian representation. Furthermore, we introduce an entropy-constrained Gaussian deformation technique that uses a deformation field to expand the action range of each Gaussian and integrates an opacity-based entropy loss to limit the number of Gaussians, thus forcing our model to use as few Gaussians as possible to fit a dynamic scene well. With simple half-precision storage and zip compression, our framework achieves a storage reduction of approximately 190$\times$ and 125$\times$ on the Technicolor and Neural 3D Video datasets, respectively, compared to the original 4DGS. Meanwhile, it maintains comparable rendering speeds and scene representation quality, setting a new standard in the field.
Code is available at https://github.com/Xinjie-Q/MEGA.
2024-10-17T14:47:08Z
Accepted by ICCV 2025
Xinjie Zhang, Zhening Liu, Yifan Zhang, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Zehong Lin, Shuicheng Yan, Jun Zhang
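
As a rough illustration of the color-attribute saving described in the MEGA abstract, the arithmetic can be sketched as below. Only the two parameter counts (up to 144 spherical harmonics parameters versus 3 direct color parameters) come from the abstract. The function names, the 2-byte parameter size, and the example Gaussian count are assumptions made for illustration. The sketch covers per-Gaussian color storage only and ignores the shared color predictor, the reduction in the number of Gaussians, and zip compression, so it is not expected to reproduce the reported overall 190x and 125x figures.

```python
# Back-of-the-envelope sketch of per-Gaussian color storage only.
# Parameter counts are taken from the abstract; everything else is hypothetical.
SH_PARAMS_CLASSIC_4DGS = 144  # upper bound quoted in the abstract
DC_PARAMS_MEGA = 3            # per-Gaussian direct color component
BYTES_PER_PARAM = 2           # assumption: half-precision storage for both cases


def color_bytes(num_gaussians: int, params_per_gaussian: int,
                bytes_per_param: int = BYTES_PER_PARAM) -> int:
    """Bytes needed to store the per-Gaussian color parameters."""
    return num_gaussians * params_per_gaussian * bytes_per_param


def color_reduction_factor() -> float:
    """Ratio of classic to MEGA per-Gaussian color parameter counts."""
    return SH_PARAMS_CLASSIC_4DGS / DC_PARAMS_MEGA


if __name__ == "__main__":
    n = 1_000_000  # hypothetical scene size
    print("classic color bytes:", color_bytes(n, SH_PARAMS_CLASSIC_4DGS))
    print("MEGA color bytes:   ", color_bytes(n, DC_PARAMS_MEGA))
    print("reduction factor:   ", color_reduction_factor())
```

Under these assumptions the color attribute alone shrinks by a factor of 144 / 3 = 48, regardless of the byte width chosen, since the same width is applied to both representations.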