https://arxiv.org/api/AUaeaVD+GH58dtsDHM1Rg/6HKHA 2026-06-28T08:43:26Z 9390 1815 15 http://arxiv.org/abs/2407.05436v2 Adaptive Video Streaming over 6G Networks: Buffer Control and User Behavior Analysis 2025-08-01T19:20:13Z

This paper delves into the synergistic potential of adaptive video streaming over emerging 6G wireless networks, emphasizing innovative buffer control techniques and detailed analysis of user viewing behaviors. As 6G technology heralds a new era with significantly enhanced capabilities including higher bandwidths, lower latencies, and increased connection densities, it is poised to fundamentally transform video streaming services. This study explores the integration of these technological advancements to optimize video streaming processes, ensuring seamless service delivery and superior Quality of Experience (QoE) for users. We propose novel buffer management strategies that leverage the ultra-reliable and low-latency communication features of 6G networks to mitigate issues related to video streaming such as rebuffering and quality fluctuations. Additionally, we examine how insights into viewing behaviors can inform adaptive streaming algorithms, allowing for real-time adjustments that align with user preferences and viewing conditions. The implications of our findings are demonstrated through rigorous simulation studies, which validate the effectiveness of our proposed solutions across diverse scenarios. This research not only highlights the challenges faced in deploying adaptive streaming solutions over 6G but also outlines future directions for research and development in this fast-evolving field.

2024-07-07T16:38:56Z arXiv admin note: This paper has been withdrawn by arXiv due to disputed and unverifiable authorship and affiliation Kevin Nassisid Teef David Kassi Muhammad http://arxiv.org/abs/2406.17112v2 Integrating Generative AI with Network Digital Twins for Enhanced Network Operations 2025-08-01T19:19:51Z

As telecommunications networks become increasingly complex, the integration of advanced technologies such as network digital twins and generative artificial intelligence (AI) emerges as a pivotal solution to enhance network operations and resilience. This paper explores the synergy between network digital twins, which provide a dynamic virtual representation of physical networks, and generative AI, particularly focusing on Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). We propose a novel architectural framework that incorporates these technologies to significantly improve predictive maintenance, network scenario simulation, and real-time data-driven decision-making. Through extensive simulations, we demonstrate how generative AI can enhance the accuracy and operational efficiency of network digital twins, effectively handling real-world complexities such as unpredictable traffic loads and network failures. The findings suggest that this integration not only boosts the capability of digital twins in scenario forecasting and anomaly detection but also facilitates a more adaptive and intelligent network management system.

2024-06-24T19:54:58Z arXiv admin note: This paper has been withdrawn by arXiv due to disputed and unverifiable authorship and affiliation Kassi Muhammad Teef David Giulia Nassisid Tina Farus http://arxiv.org/abs/2508.00782v1 SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation 2025-08-01T17:05:04Z

Audio-driven video generation aims to synthesize realistic videos that align with input audio recordings, akin to the human ability to visualize scenes from auditory input. However, existing approaches predominantly focus on exploring semantic information, such as the classes of sounding sources present in the audio, limiting their ability to generate videos with accurate content and spatial composition. In contrast, we humans can not only naturally identify the semantic categories of sounding sources but also determine their deeply encoded spatial attributes, including locations and movement directions. This useful information can be elucidated by considering specific spatial indicators derived from the inherent physical properties of sound, such as loudness or frequency. As prior methods largely ignore this factor, we present SpA2V, the first framework that explicitly exploits these spatial auditory cues from audio to generate videos with high semantic and spatial correspondence. SpA2V decomposes the generation process into two stages: 1) Audio-guided Video Planning: We meticulously adapt a state-of-the-art MLLM for a novel task of harnessing spatial and semantic cues from input audio to construct Video Scene Layouts (VSLs). This serves as an intermediate representation to bridge the gap between the audio and video modalities. 2) Layout-grounded Video Generation: We develop an efficient and effective approach to seamlessly integrate VSLs as conditional guidance into pre-trained diffusion models, enabling VSL-grounded video generation in a training-free manner. Extensive experiments demonstrate that SpA2V excels in generating realistic videos with semantic and spatial alignment to the input audio.

2025-08-01T17:05:04Z The 33rd ACM Multimedia Conference (MM '25) Kien T. Pham Yingqing He Yazhou Xing Qifeng Chen Long Chen 10.1145/3746027.3755705 http://arxiv.org/abs/2508.00428v1 Sel3DCraft: Interactive Visual Prompts for User-Friendly Text-to-3D Generation 2025-08-01T08:36:15Z

Text-to-3D (T23D) generation has transformed digital content creation, yet remains bottlenecked by blind trial-and-error prompting processes that yield unpredictable results. While visual prompt engineering has advanced in text-to-image domains, its application to 3D generation presents unique challenges requiring multi-view consistency evaluation and spatial understanding. We present Sel3DCraft, a visual prompt engineering system for T23D that transforms unstructured exploration into a guided visual process. Our approach introduces three key innovations: a dual-branch structure combining retrieval and generation for diverse candidate exploration; a multi-view hybrid scoring approach that leverages MLLMs with innovative high-level metrics to assess 3D models with human-expert consistency; and a prompt-driven visual analytics suite that enables intuitive defect identification and refinement. Extensive testing and user studies demonstrate that Sel3DCraft surpasses other T23D systems in supporting creativity for designers.

2025-08-01T08:36:15Z IEEE VIS VAST 2025 ACM 2012 CCS - Human-centered computing, Visualization, Visualization design and evaluation methods Nan Xiang Tianyi Liang Haiwen Huang Shiqi Jiang Hao Huang Yifei Huang Liangyu Chen Changbo Wang Chenhui Li http://arxiv.org/abs/2508.00424v1 CrossSet: Unveiling the Complex Interplay of Two Set-typed Dimensions in Multivariate Data 2025-08-01T08:25:53Z

The interactive visual analysis of set-typed data, i.e., data with attributes that are of type set, is a rewarding area of research and applications. Valuable prior work has contributed solutions that enable the study of such data with individual set-typed dimensions. In this paper, we present CrossSet, a novel method for the joint study of two set-typed dimensions and their interplay. Based on a task analysis, we describe a new, multi-scale approach to the interactive visual exploration and analysis of such data. Two set-typed data dimensions are jointly visualized using a hierarchical matrix layout, enabling the analysis of the interactions between two set-typed attributes at several levels, in addition to the analysis of individual such dimensions. CrossSet is anchored at a compact, large-scale overview that is complemented by drill-down opportunities to study the relations between and within the set-typed dimensions, enabling an interactive visual multi-scale exploration and analysis of bivariate set-typed data. Such an interactive approach makes it possible to study single set-typed dimensions in detail, to gain an overview of the interaction and association between two such dimensions, to refine one of the dimensions to gain additional details at several levels, and to drill down to the specific interactions of individual set-elements from the set-typed dimensions. To demonstrate the effectiveness and efficiency of CrossSet, we have evaluated the new method in the context of several application scenarios.

2025-08-01T08:25:53Z Will be published in TVCG and presented at IEEE VIS Kresimir Matkovic Rainer Splechtna Denis Gracanin Helwig Hauser http://arxiv.org/abs/2504.12811v2 AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering 2025-08-01T08:12:07Z

Although 3D Gaussian Splatting (3DGS) has revolutionized 3D reconstruction, it still faces challenges such as aliasing, projection artifacts, and view inconsistencies, primarily due to the simplification of treating splats as 2D entities. We argue that incorporating full 3D evaluation of Gaussians throughout the 3DGS pipeline can effectively address these issues while preserving rasterization efficiency. Specifically, we introduce an adaptive 3D smoothing filter to mitigate aliasing and present a stable view-space bounding method that eliminates popping artifacts when Gaussians extend beyond the view frustum. Furthermore, we promote tile-based culling to 3D with screen-space planes, accelerating rendering and reducing sorting costs for hierarchical rasterization. Our method achieves state-of-the-art quality on in-distribution evaluation sets and significantly outperforms other approaches for out-of-distribution views. Our qualitative evaluations further demonstrate the effective removal of aliasing, distortions, and popping artifacts, ensuring real-time, artifact-free rendering.

2025-04-17T10:16:47Z Michael Steiner Thomas Köhler Lukas Radl Felix Windisch Dieter Schmalstieg Markus Steinberger http://arxiv.org/abs/2508.00398v1 Occlusion-robust Stylization for Drawing-based 3D Animation 2025-08-01T07:52:07Z

3D animation aims to generate a 3D animated video from an input image and a target 3D motion sequence. Recent advances in image-to-3D models enable the creation of animations directly from users' hand drawings. Unlike conventional 3D animation, drawing-based 3D animation must preserve the artist's unique style properties, such as rough contours and distinct stroke patterns. However, recent methods still exhibit quality deterioration in style properties, especially under occlusions caused by overlapping body parts, leading to contour flickering and stroke blurring. This occurs due to a `stylization pose gap' between training and inference in stylization networks designed to preserve drawing styles in drawing-based 3D animation systems. The stylization pose gap refers to the fact that the target poses used to train the stylization network are always occlusion-free, while the target poses encountered at inference include diverse occlusions under dynamic motions. To this end, we propose the Occlusion-robust Stylization Framework (OSF) for drawing-based 3D animation. We found that while an object's edges can be an effective input prior for guiding stylization, they become notably inaccurate when occlusions occur at inference. Thus, our proposed OSF provides occlusion-robust edge guidance for the stylization network using optical flow, ensuring consistent stylization even under occlusions. Furthermore, OSF operates in a single run instead of the previous two-stage method, achieving 2.4x faster inference while using 2.1x less memory.

2025-08-01T07:52:07Z 11 pages, 13 figures, ICCV 2025 Sunjae Yoon Gwanhyeong Koo Younghwan Lee Ji Woo Hong Chang D. Yoo http://arxiv.org/abs/2507.23454v2 Breaking the mould of Social Mixed Reality - State-of-the-Art and Glossary 2025-08-01T06:05:12Z

This article explores a critical gap in Mixed Reality (MR) technology: while advances have been made, MR still struggles to authentically replicate human embodiment and socio-motor interaction. For MR to enable truly meaningful social experiences, it needs to incorporate multi-modal data streams and multi-agent interaction capabilities. To address this challenge, we present a comprehensive glossary covering key topics such as Virtual Characters and Autonomisation, Responsible AI, Ethics by Design, and the Scientific Challenges of Social MR within Neuroscience, Embodiment, and Technology. Our aim is to drive the transformative evolution of MR technologies that prioritize human-centric innovation, fostering richer digital connections. We advocate for MR systems that enhance social interaction and collaboration between humans and virtual autonomous agents, ensuring inclusivity, ethical design and psychological safety in the process.

2025-07-31T11:31:12Z pre-print Marta Bieńkiewicz Julia Ayache Panayiotis Charalambous Cristina Becchio Marco Corragio Bertram Taetz Francesco De Lellis Antonio Grotta Anna Server Daniel Rammer Richard Kulpa Franck Multon Azucena Garcia-Palacios Jessica Sutherland Kathleen Bryson Stéphane Donikian Didier Stricker Benoît Bardy http://arxiv.org/abs/2412.17812v2 FaceLift: Learning Generalizable Single Image 3D Face Reconstruction from Synthetic Heads 2025-08-01T03:09:31Z

We present FaceLift, a novel feed-forward approach for generalizable high-quality 360-degree 3D head reconstruction from a single image. Our pipeline first employs a multi-view latent diffusion model to generate consistent side and back views from a single facial input, which then feeds into a transformer-based reconstructor that produces a comprehensive 3D Gaussian splats representation. Previous methods for monocular 3D face reconstruction often lack full view coverage or view consistency due to insufficient multi-view supervision. We address this by creating a high-quality synthetic head dataset that enables consistent supervision across viewpoints. To bridge the domain gap between synthetic training data and real-world images, we propose a simple yet effective technique that ensures the view generation process maintains fidelity to the input by learning to reconstruct the input image alongside the view generation. Despite being trained exclusively on synthetic data, our method demonstrates remarkable generalization to real-world images. Through extensive qualitative and quantitative evaluations, we show that FaceLift outperforms state-of-the-art 3D face reconstruction methods on identity preservation, detail recovery, and rendering quality.

2024-12-23T18:59:49Z ICCV 2025 Camera-Ready Version. Project Page: https://weijielyu.github.io/FaceLift Weijie Lyu Yi Zhou Ming-Hsuan Yang Zhixin Shu http://arxiv.org/abs/2508.00950v1 Investigating Crossing Perception in 3D Graph Visualisation 2025-07-31T21:57:07Z

Human perception of graph drawings is influenced by a variety of impact factors for which quality measures are used as a proxy indicator. The investigation of those impact factors and their effects is important to evaluate and improve quality measures and drawing algorithms. The number of edge crossings in a 2D graph drawing has long been a main quality measure for drawing evaluation. The use of stereoscopic 3D graph visualisations has gained traction over the last years, and results from several studies indicate that they can improve analysis efficiency for a range of analysis scenarios. While edge crossings can also occur in 3D, there are edge configurations in space that are not crossings but might be perceived as such from a specific viewpoint. Such configurations create crossings when projected on the corresponding 2D image plane and could impact readability similar to 2D crossings. In 3D drawings, the additional depth aspect and the subsequent impact factors of edge distance and relative edge direction in space might further influence the importance of those configurations for readability. We investigate the impact of such factors in an empirical study and report on the differences found between major factor categories.

2025-07-31T21:57:07Z Ying Zhang Niklas Groene Karsten Klein Giuseppe Liotta Falk Schreiber http://arxiv.org/abs/2401.13639v2 Winding Clearness for Differentiable Point Cloud Optimization 2025-07-31T16:22:50Z

We propose to explore the properties of raw point clouds through the \emph{winding clearness}, a concept we first introduce for measuring the clarity of the interior/exterior relationships represented by the winding number field of the point cloud. In geometric modeling, the winding number is a powerful tool for distinguishing the interior and exterior of a given surface $\partial \Omega$, and it has been previously used for point normal orientation and surface reconstruction. In this work, we introduce a novel approach to evaluate and optimize the quality of point clouds based on the winding clearness. We observe that point clouds with less noise generally exhibit better winding clearness. Accordingly, we propose an objective function that quantifies the error in winding clearness, solely utilizing the coordinates of the point clouds. Moreover, we demonstrate that the winding clearness error is differentiable and can serve as a loss function in point cloud processing. We present this observation from two aspects: 1) We update the coordinates of the points by back-propagating the loss function for individual point clouds, resulting in an overall improvement without involving a neural network. 2) We incorporate winding clearness as a geometric constraint in the diffusion-based 3D generative model and update the network parameters to generate point clouds with less noise. Experimental results demonstrate the effectiveness of optimizing the winding clearness in enhancing the point cloud quality. Notably, our method exhibits superior performance in handling noisy point clouds with thin structures, highlighting the benefits of the global perspective enabled by the winding number.

2024-01-24T18:09:16Z Accepted by Computer-Aided Design through SPM 2025 Dong Xiao Yueji Ma Zuoqiang Shi Shiqing Xin Wenping Wang Bailin Deng Bin Wang 10.1016/j.cad.2025.103930 http://arxiv.org/abs/2507.01631v2 Tile and Slide : A New Framework for Scaling NeRF from Local to Global 3D Earth Observation 2025-07-31T13:32:03Z

Neural Radiance Fields (NeRF) have recently emerged as a paradigm for 3D reconstruction from multiview satellite imagery. However, state-of-the-art NeRF methods are typically constrained to small scenes due to the memory footprint during training, which we study in this paper. Previous work on large-scale NeRFs palliates this by dividing the scene into NeRFs. This paper introduces Snake-NeRF, a framework that scales to large scenes. Our out-of-core method eliminates the need to load all images and networks simultaneously, and operates on a single device. We achieve this by dividing the region of interest into NeRFs that 3D tile without overlap. Importantly, we crop the images with overlap to ensure each NeRF is trained with all the necessary pixels. We introduce a novel $2\times 2$ 3D tile progression strategy and segmented sampler, which together prevent 3D reconstruction errors along the tile edges. Our experiments conclude that large satellite images can effectively be processed with linear time complexity, on a single GPU, and without compromise in quality.

2025-07-02T11:59:36Z Accepted at ICCV 2025 Workshop 3D-VAST (From street to space: 3D Vision Across Altitudes). Our code will be made public after the conference at https://github.com/Ellimac0/Snake-NeRF Camille Billouard Dawa Derksen Alexandre Constantin Bruno Vallet http://arxiv.org/abs/2507.15230v3 GALE: Leveraging Heterogeneous Systems for Efficient Unstructured Mesh Data Analysis 2025-07-30T20:08:13Z

Unstructured meshes present challenges in scientific data analysis due to irregular distribution and complex connectivity. Computing and storing connectivity information is a major bottleneck for visualization algorithms, affecting both time and memory performance. Recent task-parallel data structures address this by precomputing connectivity information at runtime while the analysis algorithm executes, effectively hiding computation costs and improving performance. However, existing approaches are CPU-bound, forcing the data structure and analysis algorithm to compete for the same computational resources, limiting potential speedups. To overcome this limitation, we introduce a novel task-parallel approach optimized for heterogeneous CPU-GPU systems. Specifically, we offload the computation of mesh connectivity information to GPU threads, enabling CPU threads to focus on executing the visualization algorithm. Following this paradigm, we propose GALE (GPU-Aided Localized data structurE), the first open-source CUDA-based data structure designed for heterogeneous task parallelism. Experiments on two 20-core CPUs and an NVIDIA V100 GPU show that GALE achieves up to 2.7x speedup over state-of-the-art localized data structures while maintaining memory efficiency.

2025-07-21T04:20:12Z Accepted at IEEE VIS 2025 Guoxi Liu Thomas Randall Rong Ge Federico Iuricich http://arxiv.org/abs/2507.23002v1 Noise-Coded Illumination for Forensic and Photometric Video Analysis 2025-07-30T18:08:34Z

The proliferation of advanced tools for manipulating video has led to an arms race, pitting those who wish to sow disinformation against those who want to detect and expose it. Unfortunately, time favors the ill-intentioned in this race, with fake videos growing increasingly difficult to distinguish from real ones. At the root of this trend is a fundamental advantage held by those manipulating media: equal access to a distribution of what we consider authentic (i.e., "natural") video. In this paper, we show how coding very subtle, noise-like modulations into the illumination of a scene can help combat this advantage by creating an information asymmetry that favors verification. Our approach effectively adds a temporal watermark to any video recorded under coded illumination. However, rather than encoding a specific message, this watermark encodes an image of the unmanipulated scene as it would appear lit only by the coded illumination. We show that even when an adversary knows that our technique is being used, creating a plausible coded fake video amounts to solving a second, more difficult version of the original adversarial content creation problem at an information disadvantage. This is a promising avenue for protecting high-stakes settings like public events and interviews, where the content on display is a likely target for manipulation, and while the illumination can be controlled, the cameras capturing video cannot.

2025-07-30T18:08:34Z ACM Transactions on Graphics (2025), presented at SIGGRAPH 2025 ACM Trans. Graph. 44, 5, Article 165 (October 2025), 16 pages Peter F. Michael Zekun Hao Serge Belongie Abe Davis 10.1145/3742892 http://arxiv.org/abs/2402.00186v3 Distance and Collision Probability Estimation from Gaussian Surface Models 2025-07-30T17:10:59Z

This paper describes continuous-space methodologies to estimate the collision probability, Euclidean distance and gradient between an ellipsoidal robot model and an environment surface modeled as a set of Gaussian distributions. Continuous-space collision probability estimation is critical for uncertainty-aware motion planning. Most collision detection and avoidance approaches assume the robot is modeled as a sphere, but ellipsoidal representations provide tighter approximations and enable navigation in cluttered and narrow spaces. State-of-the-art methods derive the Euclidean distance and gradient by processing raw point clouds, which is computationally expensive for large workspaces. Recent advances in Gaussian surface modeling (e.g. mixture models, splatting) enable compressed and high-fidelity surface representations. Few methods exist to estimate continuous-space occupancy from such models. They require Gaussians to model free space and are unable to estimate the collision probability, Euclidean distance and gradient for an ellipsoidal robot. The proposed methods bridge this gap by extending prior work in ellipsoid-to-ellipsoid Euclidean distance and collision probability estimation to Gaussian surface models. A geometric blending approach is also proposed to improve collision probability estimation. The approaches are evaluated with numerical 2D and 3D experiments using real-world point cloud data. Methods for efficient calculation of these quantities are demonstrated to execute within a few microseconds per ellipsoid pair using a single thread on low-power CPUs of modern embedded computers.

2024-01-31T21:28:40Z Accepted at IROS 2025 Kshitij Goel Wennie Tabib