The ultimate display: Where will all the pixels come from?
http://arxiv.org/abs/2506.23001v1 · updated 2025-06-28T20:27:00Z

Could the answer be to compute fewer pixels? Renderers that break traditional framed patterns and opt for temporally adaptive sampling might be the key to printer-resolution wall displays that update hundreds of times per second.
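The following is a minimal sketch of "computing fewer pixels". It assumes a fixed per-step sample budget and a hypothetical priority equal to staleness multiplied by an estimated rate of change. It illustrates temporally adaptive sampling in general and is not the renderer described in the article.

```python
import heapq


def pick_pixels_to_update(last_update, change_rate, now, budget):
    """Return the indices of the `budget` highest-priority pixels, in ascending order.

    Assumed priority: time since the pixel was last sampled multiplied by an
    estimate of how fast its content changes. Stale pixels in dynamic regions
    are resampled first, and static regions are left alone.
    """
    priority = [
        ((now - t) * r, i)
        for i, (t, r) in enumerate(zip(last_update, change_rate))
    ]
    # nlargest keeps only the top `budget` (priority, index) pairs
    return sorted(i for _, i in heapq.nlargest(budget, priority))


def frameless_step(last_update, change_rate, now, budget):
    """Resample only the chosen pixels and record when each one was updated."""
    chosen = pick_pixels_to_update(last_update, change_rate, now, budget)
    for i in chosen:
        last_update[i] = now  # a real renderer would shade pixel i here
    return chosen


# Four pixels, with a budget of two samples per step
last = [0, 0, 0, 0]
rate = [1.0, 5.0, 0.1, 2.0]
print(frameless_step(last, rate, now=1, budget=2))  # [1, 3]
```

Because there is no frame boundary, each step refreshes only the pixels whose staleness is most likely to be visible.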

Published 2025-06-28T20:27:00Z · IEEE Computer (2005), Volume 38, Issue 8, Pages 54-61 · Benjamin Watson, David Luebke · DOI 10.1109/MC.2005.274

Confident Splatting: Confidence-Based Compression of 3D Gaussian Splatting via Learnable Beta Distributions
http://arxiv.org/abs/2506.22973v1 · updated 2025-06-28T18:11:30Z

3D Gaussian Splatting enables high-quality real-time rendering but often produces millions of splats, resulting in excessive storage and computational overhead. We propose a novel lossy compression method based on learnable confidence scores modeled as Beta distributions. Each splat's confidence is optimized through reconstruction-aware losses, enabling pruning of low-confidence splats while preserving visual fidelity. The proposed approach is architecture-agnostic and can be applied to any Gaussian Splatting variant. In addition, the average confidence values serve as a new metric to assess the quality of the scene. Extensive experiments demonstrate favorable trade-offs between compression and fidelity compared to prior work. Our code and data are publicly available at https://github.com/amirhossein-razlighi/Confident-Splatting
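This sketch covers only the pruning step, under stated assumptions. Each splat is assumed to already carry learned Beta parameters (alpha, beta). Confidence is read as the Beta mean alpha / (alpha + beta), and splats below a hypothetical threshold are dropped. The reconstruction-aware losses that train these parameters are not shown, and the threshold value is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Splat:
    alpha: float  # Beta parameter a > 0 (assumed already learned)
    beta: float   # Beta parameter b > 0 (assumed already learned)

    def confidence(self) -> float:
        # Expected value of Beta(a, b), used here as a point estimate of confidence
        return self.alpha / (self.alpha + self.beta)


def prune(splats, threshold=0.5):
    """Keep only the splats whose expected confidence meets the threshold."""
    return [s for s in splats if s.confidence() >= threshold]


def scene_quality(splats):
    """Average confidence over the scene, one reading of the abstract's quality metric."""
    return sum(s.confidence() for s in splats) / len(splats)


splats = [Splat(9.0, 1.0), Splat(1.0, 9.0), Splat(5.0, 5.0)]
kept = prune(splats, threshold=0.5)
print(len(kept))  # 2
```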

Published 2025-06-28T18:11:30Z · AmirHossein Naghi Razlighi, Elaheh Badali Golezani, Shohreh Kasaei

AirSketch: Generative Motion to Sketch
http://arxiv.org/abs/2407.08906v3 · updated 2025-06-28T17:16:06Z

Illustration is a fundamental mode of human expression and communication. Certain types of motion that accompany speech can provide this illustrative mode of communication. While Augmented and Virtual Reality technologies (AR/VR) have introduced tools for producing drawings with hand motions (air drawing), they typically require costly hardware and additional digital markers, thereby limiting their accessibility and portability. Furthermore, air drawing demands considerable skill to achieve aesthetic results. To address these challenges, we introduce the concept of AirSketch, aimed at generating faithful and visually coherent sketches directly from hand motions, eliminating the need for complicated headsets or markers. We devise a simple augmentation-based self-supervised training procedure, enabling a controllable image diffusion model to learn to translate from highly noisy hand tracking images to clean, aesthetically pleasing sketches, while preserving the essential visual cues from the original tracking data. We present two air drawing datasets to study this problem. Our findings demonstrate that beyond producing photo-realistic images from precise spatial inputs, controllable image diffusion can effectively produce a refined, clear sketch from a noisy input. Our work serves as an initial step towards marker-less air drawing and reveals distinct applications of controllable diffusion models to AirSketch and AR/VR in general.

Published 2024-07-12T00:52:04Z · Hui Xian Grace Lim, Xuanming Cui, Yogesh S Rawat, Ser-Nam Lim

Coordinated 2D-3D Visualization of Volumetric Medical Data in XR with Multimodal Interactions
http://arxiv.org/abs/2506.22926v1 · updated 2025-06-28T15:23:13Z

Volumetric medical imaging technologies produce detailed 3D representations of anatomical structures. However, effective medical data visualization and exploration pose significant challenges, especially for individuals with limited medical expertise. We introduce a novel XR-based system with two key innovations: (1) a coordinated visualization module integrating Multi-layered Multi-planar Reconstruction with 3D mesh models and (2) a multimodal interaction framework combining hand gestures with LLM-enabled voice commands. We conduct preliminary evaluations, including a 15-participant user study and expert interviews, to demonstrate the system's abilities to enhance spatial understanding and reduce cognitive load. Experimental results show notable improvements in task completion times, usability metrics, and interaction effectiveness enhanced by LLM-driven voice control. While identifying areas for future refinement, our findings highlight the potential of this immersive visualization system to advance medical training and clinical practice. Our demo application and supplemental materials are available for download at: https://osf.io/bpjq5/.
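Multi-planar reconstruction (MPR) resamples a volume along planes. Its simplest form extracts the three orthogonal slices shown below. The sketch assumes a `vol[z][y][x]` layout with z as the axial (slice) axis, which is a common convention rather than anything stated in the abstract. The system's multi-layered variant and any oblique planes are not modeled.

```python
def make_volume(nz, ny, nx):
    """Toy volume indexed vol[z][y][x]; each voxel stores its own (z, y, x) index."""
    return [[[(z, y, x) for x in range(nx)] for y in range(ny)] for z in range(nz)]


def axial(vol, z):
    """Plane of constant z: rows run over y, columns over x."""
    return [row[:] for row in vol[z]]


def coronal(vol, y):
    """Plane of constant y: rows run over z, columns over x."""
    return [vol[z][y][:] for z in range(len(vol))]


def sagittal(vol, x):
    """Plane of constant x: rows run over z, columns over y."""
    return [[vol[z][y][x] for y in range(len(vol[z]))] for z in range(len(vol))]


vol = make_volume(2, 3, 4)
print(sagittal(vol, 3)[1][2])  # (1, 2, 3)
```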

Published 2025-06-28T15:23:13Z · IEEE VIS 2025 Short Paper · Qixuan Liu, Shi Qiu, Yinqiao Wang, Xiwen Wu, Kenneth Siu Ho Chok, Chi-Wing Fu, Pheng-Ann Heng

MagShield: Towards Better Robustness in Sparse Inertial Motion Capture Under Magnetic Disturbances
http://arxiv.org/abs/2506.22907v1 · updated 2025-06-28T14:42:59Z

This paper proposes a novel method called MagShield, designed to address the issue of magnetic interference in sparse inertial motion capture (MoCap) systems. Existing Inertial Measurement Unit (IMU) systems are prone to orientation estimation errors in magnetically disturbed environments, limiting their practical application in real-world scenarios. To address this problem, MagShield employs a "detect-then-correct" strategy, first detecting magnetic disturbances through multi-IMU joint analysis, and then correcting orientation errors using human motion priors. MagShield can be integrated with most existing sparse inertial MoCap systems, improving their performance in magnetically disturbed environments. Experimental results demonstrate that MagShield significantly enhances the accuracy of motion capture under magnetic interference and exhibits good compatibility across different sparse inertial MoCap systems.
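The "detect-then-correct" idea can be illustrated with a deliberately simpler, swapped-in technique. Detection here flags any IMU whose magnetometer magnitude departs from an assumed local field strength. Correction falls back to gyroscope dead-reckoning for flagged sensors. MagShield itself uses multi-IMU joint analysis and human motion priors, neither of which is reproduced here. The tolerance and reference field are hypothetical values.

```python
import math


def field_norm(m):
    """Magnitude of a 3-axis magnetometer reading."""
    return math.sqrt(sum(c * c for c in m))


def detect_disturbed(mags, reference_norm, tol=0.15):
    """Flag each IMU whose field magnitude deviates from the expected local
    field strength by more than `tol` (relative deviation)."""
    return [abs(field_norm(m) - reference_norm) / reference_norm > tol for m in mags]


def fuse_heading(prev_heading, gyro_rate, dt, mag_heading, disturbed):
    """Trust the magnetometer heading only when no disturbance was flagged;
    otherwise integrate the gyroscope rate (a stand-in for MagShield's
    motion-prior correction)."""
    return prev_heading + gyro_rate * dt if disturbed else mag_heading


readings = [(30.0, 40.0, 0.0), (60.0, 0.0, 0.0), (0.0, 0.0, 52.0)]  # microtesla
print(detect_disturbed(readings, reference_norm=50.0))  # [False, True, False]
```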

Published 2025-06-28T14:42:59Z · Yunzhe Shao, Xinyu Yi, Lu Yin, Shihui Guo, Junhai Yong, Feng Xu

DOBB-BVH: Efficient Ray Traversal by Transforming Wide BVHs into Oriented Bounding Box Trees using Discrete Rotations
http://arxiv.org/abs/2506.22849v1 · updated 2025-06-28T11:12:35Z

Oriented bounding box (OBB) bounding volume hierarchies offer a more precise fit than axis-aligned bounding box hierarchies in scenarios with thin, elongated, and arbitrarily rotated geometry, enhancing intersection test performance in ray tracing. However, determining optimally oriented bounding boxes can be computationally expensive and memory-intensive. Recent research has shown that pre-built hierarchies can be efficiently converted to OBB hierarchies on the GPU in a bottom-up pass, yielding significant ray tracing traversal improvements. In this paper, we introduce a novel OBB construction technique where all internal node children share a consistent OBB transform, chosen from a fixed set of discrete quantized rotations. This allows for efficient encoding and reduces the computational complexity of OBB transformations. We further extend our approach to hierarchies with multiple children per node by leveraging Discrete Orientation Polytopes (k-DOPs), demonstrating improvements in traversal performance while limiting the build time impact for real-time applications. Our method is applied as a post-processing step, integrating seamlessly into existing hierarchy construction pipelines. Despite a 12.6% increase in build time, our experimental results demonstrate average improvements of 18.5% for primary rays and 32.4% for secondary rays, and a maximum gain of 65% in ray intersection performance, highlighting its potential for advancing real-time applications.
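The sketch below shows the core idea of one shared, quantized rotation per node. It is a 2D simplification under stated assumptions. The candidate angles are k·(π/2)/n for k in [0, n), and the cost is the summed area of the children's boxes in the shared frame. Only the small integer k would need to be encoded. The paper's actual 3D rotation set, k-DOP extension, and cost function are not reproduced.

```python
import math


def to_frame(p, theta):
    """Express 2D point p in a frame rotated by theta (i.e. apply R(-theta))."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = p
    return (c * x + s * y, -s * x + c * y)


def box_area(points, theta):
    """Area of the tightest box around `points` aligned with the rotated frame."""
    q = [to_frame(p, theta) for p in points]
    xs = [a for a, _ in q]
    ys = [b for _, b in q]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def best_shared_rotation(children, n_angles=16):
    """Choose one quantized rotation index k shared by every child of a node.

    Candidate angles are k * (pi / 2) / n_angles for k in [0, n_angles).
    In 2D a quarter turn maps a box onto itself, so this range covers all
    orientations. The cost is the summed area of the children's boxes in
    the shared frame (an assumed cost, chosen for illustration).
    """
    def cost(k):
        theta = k * (math.pi / 2) / n_angles
        return sum(box_area(child, theta) for child in children)

    return min(range(n_angles), key=cost)


# Two thin children lying along the 45-degree diagonal
kids = [[(0.0, 0.0), (1.0, 1.0)], [(2.0, 2.0), (3.0, 3.0)]]
print(best_shared_rotation(kids))  # 8, i.e. 8 * 90 / 16 = 45 degrees
```

Sharing a single transform per node means the rotation is stored once, as a few bits of index, instead of once per child.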

Published 2025-06-28T11:12:35Z · 10 pages main content, 3 pages appendix · Michael A. Kern, Alain Galvan, David Oldcorn, Daniel Skinner, Rohan Mehalwal, Leo Reyes Lozano, Matthäus G. Chajdas

VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding
http://arxiv.org/abs/2506.22799v1 · updated 2025-06-28T08:02:43Z

3D Gaussian Splatting (3DGS) has become a workhorse for high-quality, real-time rendering in novel view synthesis of 3D scenes. However, existing methods focus primarily on geometric and appearance modeling and lack deeper scene understanding. They also incur high training costs that complicate the originally streamlined differentiable rendering pipeline. To this end, we propose VoteSplat, a novel 3D scene understanding framework that integrates Hough voting with 3DGS. Specifically, the Segment Anything Model (SAM) is used for instance segmentation, extracting objects and generating 2D vote maps. We then embed spatial offset vectors into Gaussian primitives. These offsets construct 3D spatial votes by associating them with 2D image votes, while depth distortion constraints refine localization along the depth axis. For open-vocabulary object localization, VoteSplat maps 2D image semantics to 3D point clouds via voting points, reducing the training costs associated with high-dimensional CLIP features while keeping semantics unambiguous. Extensive experiments, including ablation studies, demonstrate the effectiveness of VoteSplat in open-vocabulary 3D instance localization, 3D point cloud understanding, click-based 3D object localization, and hierarchical segmentation. Our code is available at https://sy-ja.github.io/votesplat/
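Hough voting in its generic form works as follows. Each primitive casts a vote at its position plus a predicted offset. Votes are quantized into an accumulator grid, and well-supported cells mark instance centers. In VoteSplat the offsets are learned and tied to SAM-derived 2D votes. In this sketch they are simply given, and the cell size and `min_votes` are hypothetical parameters.

```python
from collections import Counter


def cast_votes(points, offsets):
    """Each primitive votes for its instance center: position + predicted offset."""
    return [tuple(p[i] + o[i] for i in range(3)) for p, o in zip(points, offsets)]


def accumulate(votes, cell=0.5):
    """Quantize votes into a voxel grid and count them (the Hough accumulator)."""
    return Counter(tuple(int(v[i] // cell) for i in range(3)) for v in votes)


def peak_cells(points, offsets, cell=0.5, min_votes=2):
    """Return accumulator cells holding at least `min_votes` votes, most-voted first."""
    acc = accumulate(cast_votes(points, offsets), cell)
    return [c for c, n in acc.most_common() if n >= min_votes]


points = [(1.0, 1.0, 1.0), (0.6, 1.6, 1.1), (3.0, 0.0, 0.0), (3.4, 0.3, 0.1), (9.0, 9.0, 9.0)]
offsets = [(0.1, 0.1, 0.1), (0.5, -0.5, 0.0), (0.2, 0.1, 0.1), (-0.2, -0.2, 0.0), (0.0, 0.0, 0.0)]
print(peak_cells(points, offsets))  # two instance cells; the lone vote at (9, 9, 9) is ignored
```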

Published 2025-06-28T08:02:43Z · Accepted to ICCV 2025 · Minchao Jiang, Shunyu Jia, Jiaming Gu, Xiaoyuan Lu, Guangming Zhu, Anqi Dong, Liang Zhang

Supra-threshold control of peripheral LOD
http://arxiv.org/abs/2506.22583v1 · updated 2025-06-27T19:01:45Z

Level of detail (LOD) is widely used to control visual feedback in interactive applications. LOD control is typically based on perception at threshold - the conditions in which a stimulus first becomes perceivable. Yet most LOD manipulations are quite perceivable and occur well above threshold. Moreover, research shows that supra-threshold perception differs drastically from perception at threshold. In that case, should supra-threshold LOD control also differ from LOD control at threshold? In two experiments, we examine supra-threshold LOD control in the visual periphery and find that indeed, it should differ drastically from LOD control at threshold. Specifically, we find that LOD must support a task-dependent level of reliable perceptibility. Above that level, perceptibility of LOD control manipulations should be minimized, and detail contrast is a better predictor of perceptibility than detail size. Below that level, perceptibility must be maximized, and LOD should be improved as eccentricity rises or contrast drops. This directly contradicts prevailing threshold-based LOD control schemes, and strongly suggests a reexamination of LOD control for foveal display.

Published 2025-06-27T19:01:45Z · ACM Transactions on Graphics (TOG) (2004), Volume 23, Issue 3, Pages 750-759 · Benjamin Watson, Neff Walker, Larry F Hodges · DOI 10.1145/1015706.1015796

Asymptotic analysis and design of shell-based thermal lattice metamaterials
http://arxiv.org/abs/2506.22319v1 · updated 2025-06-27T15:34:13Z

We present a rigorous asymptotic analysis framework for investigating the thermal conductivity of shell lattice metamaterials, extending prior work from mechanical stiffness to heat transfer. Central to our analysis is a new metric, the asymptotic directional conductivity (ADC), which captures the leading-order influence of the middle surface geometry on the effective thermal conductivity in the vanishing-thickness limit. A convergence theorem is established for evaluating ADC, along with a sharp upper bound and the necessary and sufficient condition for achieving this bound. These results provide the first theoretical justification for the optimal thermal conductivity of triply periodic minimal surfaces. Furthermore, we show that ADC yields a third-order approximation to the effective conductivity of shell lattices at low volume fractions. To support practical design applications, we develop a discrete algorithm for computing and optimizing ADC over arbitrary periodic surfaces. Numerical results confirm the theoretical predictions and demonstrate the robustness and effectiveness of the proposed optimization algorithm.

Published 2025-06-27T15:34:13Z · Di Zhang, Ligang Liu

A Design Space for Visualization Transitions of 3D Spatial Data in Hybrid AR-Desktop Environments
http://arxiv.org/abs/2506.22250v1 · updated 2025-06-27T14:16:07Z

We present a design space for animated transitions of the appearance of 3D spatial datasets in a hybrid Augmented Reality (AR)-desktop context. Such hybrid interfaces combine both traditional and immersive displays to facilitate the exploration of 2D and 3D data representations in the environment in which they are best displayed. One key aspect is to introduce transitional animations that change between the different dimensionalities to illustrate the connection between the different representations and to reduce the potential cognitive load on the user. The specific transitions to be used depend on the type of data, the needs of the application domain, and other factors. We summarize these as a transition design space to simplify the decision-making process and provide inspiration for future designs. First, we discuss 3D visualizations from a spatial perspective: a spatial encoding pipeline, where 3D data sampled from the physical world goes through various transformations, being mapped to visual representations, and then being integrated into a hybrid AR-desktop environment. The transition design then focuses on interpolating between two spatial encoding pipelines to provide a smooth experience. To illustrate the use of our design space, we apply it to three case studies that focus on applications in astronomy, radiology, and chemistry; we then discuss lessons learned from these applications.
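Interpolating between two spatial encodings can be illustrated with a simple case. Each data item has a flat desktop position (z = 0) and a 3D position in the AR view, and the transition blends between them with smoothstep easing. The easing curve and the two-endpoint setup are assumptions chosen for illustration. They are not one of the paper's catalogued transitions.

```python
def smoothstep(t):
    """Ease-in/ease-out curve on [0, 1]; t is clamped to that range first."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def transition(pos_2d, pos_3d, t):
    """Blend each item from its flat 2D layout (z = 0) to its 3D position."""
    s = smoothstep(t)
    out = []
    for (x0, y0), (x1, y1, z1) in zip(pos_2d, pos_3d):
        out.append((x0 + s * (x1 - x0), y0 + s * (y1 - y0), s * z1))
    return out


print(transition([(0.0, 0.0)], [(2.0, 4.0, 6.0)], 0.5))  # [(1.0, 2.0, 3.0)]
```

Easing keeps the start and end of the animation gentle, which supports the abstract's goal of reducing cognitive load during the change of dimensionality.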

Published 2025-06-27T14:16:07Z · 14 pages, 6 figures · Computer Graphics Forum, vol. 45, article no. e70305, 14 pages, 2026 · Yucheng Lu, Tobias Rau, Benjamin Lee, Andreas Köhn, Michael Sedlmair, Christian Sandor, Tobias Isenberg · DOI 10.1111/cgf.70305

Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis
http://arxiv.org/abs/2503.22605v2 · updated 2025-06-27T02:42:26Z

Talking head synthesis has emerged as a prominent research topic in computer graphics and multimedia, yet existing methods often struggle to balance generation quality and computational efficiency, particularly under real-time constraints. In this paper, we propose a novel framework that integrates Gaussian Splatting with a structured Audio Factorization Plane (Audio-Plane) to enable high-quality, audio-synchronized, and real-time talking head generation. Modeling a dynamic talking head typically requires a 4D volume representation consisting of three spatial axes and one temporal axis aligned with audio progression. However, directly storing and processing a dense 4D grid is impractical due to its high memory and computation cost and its poor scalability to longer durations. We address this challenge by decomposing the 4D volume representation into a set of audio-independent spatial planes and audio-dependent planes, forming a compact and interpretable representation for talking head modeling that we refer to as the Audio-Plane. This factorized design allows for efficient and fine-grained audio-aware spatial encoding, and significantly enhances the model's ability to capture complex lip dynamics driven by speech signals. To further improve region-specific motion modeling, we introduce an audio-guided saliency splatting mechanism based on region-aware modulation, which adaptively emphasizes highly dynamic regions such as the mouth area. This allows the model to focus its learning capacity where it matters most for accurate speech-driven animation. Extensive experiments in both self-driven and cross-driven settings demonstrate that our method achieves state-of-the-art visual quality, precise audio-lip synchronization, and real-time performance, outperforming prior approaches across both 2D- and 3D-based paradigms.
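A quick storage count shows why factorizing the 4D volume helps. The sketch assumes a plane split in the style of HexPlane/K-Planes: three audio-independent planes (xy, xz, yz) plus three audio-dependent planes pairing each spatial axis with the audio axis. The real Audio-Plane layout may differ. The resolutions (n = 128, t = 256, c = 32 channels) are hypothetical.

```python
def dense_4d_params(n, t, c):
    """Feature count of a dense grid: n^3 spatial cells x t audio steps x c channels."""
    return n ** 3 * t * c


def factorized_params(n, t, c):
    """Assumed split: xy, xz, yz spatial planes plus x-audio, y-audio, z-audio planes."""
    spatial = 3 * n * n
    audio = 3 * n * t
    return (spatial + audio) * c


dense = dense_4d_params(128, 256, 32)     # 17,179,869,184 features
planes = factorized_params(128, 256, 32)  # 4,718,592 features
print(dense // planes)  # 3640, i.e. roughly 3,600x fewer features
```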

Published 2025-03-28T16:50:27Z · Demo video at https://sstzal.github.io/Audio-Plane/ · Shuai Shen, Wanhua Li, Yunpeng Zhang, Yap-Peng Tan, Jiwen Lu

3Description: An Intuitive Human-AI Collaborative 3D Modeling Approach
http://arxiv.org/abs/2506.21845v1 · updated 2025-06-27T01:33:46Z

This paper presents 3Description, an experimental human-AI collaborative approach for intuitive 3D modeling. 3Description aims to address accessibility and usability challenges in traditional 3D modeling by enabling non-professional individuals to co-create 3D models using verbal and gestural descriptions. Through a combination of qualitative research, product analysis, and user testing, 3Description integrates AI technologies such as Natural Language Processing and Computer Vision, powered by OpenAI and MediaPipe. Because the web offers broad cross-platform support, 3Description is web-based, allowing users to describe the desired model and then adjust its components using verbal and gestural inputs. In the era of AI and emerging media, 3Description not only contributes to a more inclusive and user-friendly design process, empowering more people to participate in building the future 3D world, but also strives to increase human engagement in co-creation with AI, thereby avoiding undue surrender to technology and preserving human creativity.

Published 2025-06-27T01:33:46Z · 5 pages, 2 figures, 3 tables (containing 21 subfigures) · ARTECH '23: Proceedings of the 11th International Conference on Digital and Interactive Arts, ACM, 2024 · Zhuodi Cai · DOI 10.1145/3632776.3632785

FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
http://arxiv.org/abs/2506.21272v2 · updated 2025-06-27T01:04:39Z

We propose FairyGen, an automatic system for generating story-driven cartoon videos from a single child's drawing while faithfully preserving its unique artistic style. Unlike previous storytelling methods that primarily focus on character consistency and basic motion, FairyGen explicitly disentangles character modeling from stylized background generation and incorporates cinematic shot design to support expressive and coherent storytelling. Given a single character sketch, we first employ an MLLM to generate a structured storyboard with shot-level descriptions that specify environment settings, character actions, and camera perspectives. To ensure visual consistency, we introduce a style propagation adapter that captures the character's visual style and applies it to the background, faithfully retaining the character's full visual identity while synthesizing style-consistent scenes. A shot design module further enhances visual diversity and cinematic quality through frame cropping and multi-view synthesis based on the storyboard. To animate the story, we reconstruct a 3D proxy of the character to derive physically plausible motion sequences, which are then used to fine-tune an MMDiT-based image-to-video diffusion model. We further propose a two-stage motion customization adapter: the first stage learns appearance features from temporally unordered frames, disentangling identity from motion; the second stage models temporal dynamics using a timestep-shift strategy with frozen identity weights. Once trained, FairyGen directly renders diverse and coherent video scenes aligned with the storyboard. Extensive experiments demonstrate that our system produces stylistically faithful, narratively structured animations with natural motion, highlighting its potential for personalized and engaging story animation. The code will be available at https://github.com/GVCLab/FairyGen

Published 2025-06-26T13:58:16Z · Project Page: https://jayleejia.github.io/FairyGen/; Code: https://github.com/GVCLab/FairyGen · Jiayi Zheng, Xiaodong Cun

CanFields: Consolidating Diffeomorphic Flows for Non-Rigid 4D Interpolation from Arbitrary-Length Sequences
http://arxiv.org/abs/2406.18582v3 · updated 2025-06-26T17:53:33Z

We introduce Canonical Consolidation Fields (CanFields), a novel method that interpolates arbitrary-length sequences of independently sampled 3D point clouds into a unified, continuous, and coherent deforming shape. Unlike prior methods that oversmooth geometry or produce topological and geometric artifacts, CanFields jointly optimizes fine-detailed geometry and deformation in an unsupervised fitting process with two novel, bespoke modules. First, we introduce a dynamic consolidator module that adjusts the input and assigns confidence scores, balancing the optimization of the canonical shape and its motion. Second, we represent the motion as a diffeomorphic flow parameterized by a smooth velocity field. We validated the method's robustness and accuracy on more than 50 diverse sequences, demonstrating superior performance even with missing regions, noisy raw scans, and sparse data. Our project page is at: https://wangmiaowei.github.io/CanFields.github.io/.
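A smooth velocity field defines a diffeomorphic flow. Integrating dp/dt = v(p) forward deforms a point, and integrating with -v undoes the deformation. The sketch uses a hand-written rotational field and fourth-order Runge-Kutta (RK4), both of which are our assumptions. CanFields learns its velocity field, and its integrator is not specified in the abstract.

```python
import math


def velocity(p):
    """Hypothetical smooth, divergence-free field: rigid rotation about the origin."""
    x, y = p
    return (-y, x)


def rk4_step(p, h, sign=1.0):
    """One RK4 step of dp/dt = sign * v(p)."""
    def v(q):
        vx, vy = velocity(q)
        return (sign * vx, sign * vy)

    def add(a, b, s):
        return (a[0] + s * b[0], a[1] + s * b[1])

    k1 = v(p)
    k2 = v(add(p, k1, h / 2))
    k3 = v(add(p, k2, h / 2))
    k4 = v(add(p, k3, h))
    return (p[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            p[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))


def flow(p, t=1.0, steps=100, inverse=False):
    """Integrate the field for time t; inverse=True integrates -v to undo the map."""
    h = t / steps
    sign = -1.0 if inverse else 1.0
    for _ in range(steps):
        p = rk4_step(p, h, sign)
    return p


q = flow((1.0, 0.0), t=math.pi / 2)   # a quarter turn: approximately (0.0, 1.0)
back = flow(q, t=math.pi / 2, inverse=True)
print(back)  # approximately (1.0, 0.0)
```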

Published 2024-06-05T17:07:55Z · Accepted to ICCV 2025 · Miaowei Wang, Changjian Li, Amir Vaxman

Managing level of detail through head-tracked peripheral degradation: a model and resulting design principles
http://arxiv.org/abs/2506.21456v1 · updated 2025-06-26T16:35:38Z

Previous work has demonstrated the utility of reductions in the level of detail (LOD) in the periphery of head-tracked, large field of view displays. This paper provides a psychophysically based model, centered around an eye/head movement tradeoff, that explains the effectiveness of peripheral degradation and suggests how peripherally degraded displays should be designed. An experiment was performed to evaluate how the shape and area of the high-detail central area (inset) affect search performance in peripherally degraded displays. Results indicated that inset shape is not a significant factor in performance. Inset area, however, was significant: performance with displays whose inset subtended at least 30 degrees of horizontal and vertical angle was not significantly different from performance with an undegraded display. These results agreed with the proposed model.
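The following sketches how a head-tracked renderer might apply the 30-degree finding. A point is drawn at full detail if it lies within an inset spanning 30 degrees horizontally and vertically around the head's forward direction, and at reduced detail otherwise. The head-coordinate convention (x right, y up, z forward), the per-axis angle test, and the peripheral detail level of 0.25 are all hypothetical choices, not part of the paper's model.

```python
import math


def in_high_detail_inset(point_head, inset_deg=30.0):
    """True if a point in head coordinates (x right, y up, z forward) lies inside
    an inset subtending `inset_deg` both horizontally and vertically."""
    x, y, z = point_head
    if z <= 0:
        return False  # at or behind the viewer's frontal plane
    half = inset_deg / 2.0
    azimuth = math.degrees(math.atan2(x, z))
    elevation = math.degrees(math.atan2(y, z))
    return abs(azimuth) <= half and abs(elevation) <= half


def detail_level(point_head, inset_deg=30.0, periphery_level=0.25):
    """Full detail inside the inset and a reduced, hypothetical level outside it."""
    return 1.0 if in_high_detail_inset(point_head, inset_deg) else periphery_level


print(detail_level((0.2, 0.2, 1.0)))  # 1.0 (about 11 degrees off-axis)
print(detail_level((1.0, 0.0, 1.0)))  # 0.25 (45 degrees off-axis)
```

Because the study found inset shape to be insignificant, the rectangular-in-angle test used here is one reasonable choice among several.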

Published 2025-06-26T16:35:38Z · Proceedings of the ACM Symposium on Virtual Reality Software and Technology (1997), Pages 59-63, ACM · Benjamin Watson, Neff Walker, Larry F Hodges · DOI 10.1145/261135.261148