http://arxiv.org/abs/2506.23001v1
The ultimate display: Where will all the pixels come from?
2025-06-28T20:27:00Z
Could the answer be to compute fewer pixels? Renderers that break traditional framed patterns and opt for temporally adaptive sampling might be the key to printer-resolution wall displays that update hundreds of times per second.
2025-06-28T20:27:00Z
IEEE Computer (2005). Volume 38, Issue 8, Pages 54-61
Benjamin Watson, David Luebke
10.1109/MC.2005.274

http://arxiv.org/abs/2506.22973v1
Confident Splatting: Confidence-Based Compression of 3D Gaussian Splatting via Learnable Beta Distributions
2025-06-28T18:11:30Z
3D Gaussian Splatting enables high-quality real-time rendering but often produces millions of splats, resulting in excessive storage and computational overhead. We propose a novel lossy compression method based on learnable confidence scores modeled as Beta distributions. Each splat's confidence is optimized through reconstruction-aware losses, enabling pruning of low-confidence splats while preserving visual fidelity. The proposed approach is architecture-agnostic and can be applied to any Gaussian Splatting variant. In addition, the average confidence values serve as a new metric to assess the quality of the scene. Extensive experiments demonstrate favorable trade-offs between compression and fidelity compared to prior work. Our code and data are publicly available at https://github.com/amirhossein-razlighi/Confident-Splatting
2025-06-28T18:11:30Z
AmirHossein Naghi Razlighi, Elaheh Badali Golezani, Shohreh Kasaei

http://arxiv.org/abs/2407.08906v3
AirSketch: Generative Motion to Sketch
2025-06-28T17:16:06Z
Illustration is a fundamental mode of human expression and communication. Certain types of motion that accompany speech can provide this illustrative mode of communication.
While Augmented and Virtual Reality technologies (AR/VR) have introduced tools for producing drawings with hand motions (air drawing), they typically require costly hardware and additional digital markers, thereby limiting their accessibility and portability. Furthermore, air drawing demands considerable skill to achieve aesthetic results. To address these challenges, we introduce the concept of AirSketch, aimed at generating faithful and visually coherent sketches directly from hand motions, eliminating the need for complicated headsets or markers. We devise a simple augmentation-based self-supervised training procedure, enabling a controllable image diffusion model to learn to translate from highly noisy hand tracking images to clean, aesthetically pleasing sketches, while preserving the essential visual cues from the original tracking data. We present two air drawing datasets to study this problem. Our findings demonstrate that beyond producing photo-realistic images from precise spatial inputs, controllable image diffusion can effectively produce a refined, clear sketch from a noisy input. Our work serves as an initial step towards marker-less air drawing and reveals distinct applications of controllable diffusion models to AirSketch and AR/VR in general.
2024-07-12T00:52:04Z
Hui Xian Grace Lim, Xuanming Cui, Yogesh S Rawat, Ser-Nam Lim

http://arxiv.org/abs/2506.22926v1
Coordinated 2D-3D Visualization of Volumetric Medical Data in XR with Multimodal Interactions
2025-06-28T15:23:13Z
Volumetric medical imaging technologies produce detailed 3D representations of anatomical structures. However, effective medical data visualization and exploration pose significant challenges, especially for individuals with limited medical expertise.
We introduce a novel XR-based system with two key innovations: (1) a coordinated visualization module integrating Multi-layered Multi-planar Reconstruction with 3D mesh models and (2) a multimodal interaction framework combining hand gestures with LLM-enabled voice commands. We conduct preliminary evaluations, including a 15-participant user study and expert interviews, to demonstrate the system's abilities to enhance spatial understanding and reduce cognitive load. Experimental results show notable improvements in task completion times, usability metrics, and interaction effectiveness enhanced by LLM-driven voice control. While identifying areas for future refinement, our findings highlight the potential of this immersive visualization system to advance medical training and clinical practice. Our demo application and supplemental materials are available for download at: https://osf.io/bpjq5/.
2025-06-28T15:23:13Z
IEEE VIS 2025 Short Paper
Qixuan Liu, Shi Qiu, Yinqiao Wang, Xiwen Wu, Kenneth Siu Ho Chok, Chi-Wing Fu, Pheng-Ann Heng

http://arxiv.org/abs/2506.22907v1
MagShield: Towards Better Robustness in Sparse Inertial Motion Capture Under Magnetic Disturbances
2025-06-28T14:42:59Z
This paper proposes a novel method called MagShield, designed to address the issue of magnetic interference in sparse inertial motion capture (MoCap) systems. Existing Inertial Measurement Unit (IMU) systems are prone to orientation estimation errors in magnetically disturbed environments, limiting their practical application in real-world scenarios. To address this problem, MagShield employs a "detect-then-correct" strategy, first detecting magnetic disturbances through multi-IMU joint analysis, and then correcting orientation errors using human motion priors. MagShield can be integrated with most existing sparse inertial MoCap systems, improving their performance in magnetically disturbed environments.
Experimental results demonstrate that MagShield significantly enhances the accuracy of motion capture under magnetic interference and exhibits good compatibility across different sparse inertial MoCap systems.
2025-06-28T14:42:59Z
Yunzhe Shao, Xinyu Yi, Lu Yin, Shihui Guo, Junhai Yong, Feng Xu

http://arxiv.org/abs/2506.22849v1
DOBB-BVH: Efficient Ray Traversal by Transforming Wide BVHs into Oriented Bounding Box Trees using Discrete Rotations
2025-06-28T11:12:35Z
Oriented bounding box (OBB) bounding volume hierarchies offer a more precise fit than axis-aligned bounding box hierarchies in scenarios with thin elongated and arbitrarily rotated geometry, enhancing intersection test performance in ray tracing. However, determining optimally oriented bounding boxes can be computationally expensive and have high memory requirements. Recent research has shown that pre-built hierarchies can be efficiently converted to OBB hierarchies on the GPU in a bottom-up pass, yielding significant ray tracing traversal improvements. In this paper, we introduce a novel OBB construction technique where all internal node children share a consistent OBB transform, chosen from a fixed set of discrete quantized rotations. This allows for efficient encoding and reduces the computational complexity of OBB transformations. We further extend our approach to hierarchies with multiple children per node by leveraging Discrete Orientation Polytopes (k-DOPs), demonstrating improvements in traversal performance while limiting the build time impact for real-time applications. Our method is applied as a post-processing step, integrating seamlessly into existing hierarchy construction pipelines.
Despite a 12.6% increase in build time, our experimental results demonstrate an average improvement of 18.5% in primary, 32.4% in secondary rays, and maximum gain of 65% in ray intersection performance, highlighting its potential for advancing real-time applications.
2025-06-28T11:12:35Z
10 pages main content, 3 pages appendix
Michael A. Kern, Alain Galvan, David Oldcorn, Daniel Skinner, Rohan Mehalwal, Leo Reyes Lozano, Matthäus G. Chajdas

http://arxiv.org/abs/2506.22799v1
VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding
2025-06-28T08:02:43Z
3D Gaussian Splatting (3DGS) has become a workhorse for high-quality, real-time rendering in novel view synthesis of 3D scenes. However, existing methods focus primarily on geometric and appearance modeling, lacking deeper scene understanding while also incurring high training costs that complicate the originally streamlined differentiable rendering pipeline. To this end, we propose VoteSplat, a novel 3D scene understanding framework that integrates Hough voting with 3DGS. Specifically, the Segment Anything Model (SAM) is used for instance segmentation, extracting objects, and generating 2D vote maps. We then embed spatial offset vectors into Gaussian primitives. These offsets construct 3D spatial votes by associating them with 2D image votes, while depth distortion constraints refine localization along the depth axis. For open-vocabulary object localization, VoteSplat maps 2D image semantics to 3D point clouds via voting points, reducing training costs associated with high-dimensional CLIP features while preserving semantic unambiguity. Extensive experiments demonstrate the effectiveness of VoteSplat in open-vocabulary 3D instance localization, 3D point cloud understanding, click-based 3D object localization, hierarchical segmentation, and ablation studies.
Our code is available at https://sy-ja.github.io/votesplat/
2025-06-28T08:02:43Z
Accepted to ICCV 2025
Minchao Jiang, Shunyu Jia, Jiaming Gu, Xiaoyuan Lu, Guangming Zhu, Anqi Dong, Liang Zhang

http://arxiv.org/abs/2506.22583v1
Supra-threshold control of peripheral LOD
2025-06-27T19:01:45Z
Level of detail (LOD) is widely used to control visual feedback in interactive applications. LOD control is typically based on perception at threshold - the conditions in which a stimulus first becomes perceivable. Yet most LOD manipulations are quite perceivable and occur well above threshold. Moreover, research shows that supra-threshold perception differs drastically from perception at threshold. In that case, should supra-threshold LOD control also differ from LOD control at threshold?
In two experiments, we examine supra-threshold LOD control in the visual periphery and find that indeed, it should differ drastically from LOD control at threshold. Specifically, we find that LOD must support a task-dependent level of reliable perceptibility. Above that level, perceptibility of LOD control manipulations should be minimized, and detail contrast is a better predictor of perceptibility than detail size. Below that level, perceptibility must be maximized, and LOD should be improved as eccentricity rises or contrast drops. This directly contradicts prevailing threshold-based LOD control schemes, and strongly suggests a reexamination of LOD control for foveal display.
2025-06-27T19:01:45Z
ACM Transactions on Graphics (TOG) (2004), Volume 23, Issue 3, Pages 750-759
Benjamin Watson, Neff Walker, Larry F Hodges
10.1145/1015706.1015796

http://arxiv.org/abs/2506.22319v1
Asymptotic analysis and design of shell-based thermal lattice metamaterials
2025-06-27T15:34:13Z
We present a rigorous asymptotic analysis framework for investigating the thermal conductivity of shell lattice metamaterials, extending prior work from mechanical stiffness to heat transfer. Central to our analysis is a new metric, the asymptotic directional conductivity (ADC), which captures the leading-order influence of the middle surface geometry on the effective thermal conductivity in the vanishing-thickness limit. A convergence theorem is established for evaluating ADC, along with a sharp upper bound and the necessary and sufficient condition for achieving this bound. These results provide the first theoretical justification for the optimal thermal conductivity of triply periodic minimal surfaces. Furthermore, we show that ADC yields a third-order approximation to the effective conductivity of shell lattices at low volume fractions. To support practical design applications, we develop a discrete algorithm for computing and optimizing ADC over arbitrary periodic surfaces.
Numerical results confirm the theoretical predictions and demonstrate the robustness and effectiveness of the proposed optimization algorithm.
2025-06-27T15:34:13Z
Di Zhang, Ligang Liu

http://arxiv.org/abs/2506.22250v1
A Design Space for Visualization Transitions of 3D Spatial Data in Hybrid AR-Desktop Environments
2025-06-27T14:16:07Z
We present a design space for animated transitions of the appearance of 3D spatial datasets in a hybrid Augmented Reality (AR)-desktop context. Such hybrid interfaces combine both traditional and immersive displays to facilitate the exploration of 2D and 3D data representations in the environment in which they are best displayed. One key aspect is to introduce transitional animations that change between the different dimensionalities to illustrate the connection between the different representations and to reduce the potential cognitive load on the user. The specific transitions to be used depend on the type of data, the needs of the application domain, and other factors. We summarize these as a transition design space to simplify the decision-making process and provide inspiration for future designs. First, we discuss 3D visualizations from a spatial perspective: a spatial encoding pipeline, where 3D data sampled from the physical world goes through various transformations, being mapped to visual representations, and then being integrated into a hybrid AR-desktop environment. The transition design then focuses on interpolating between two spatial encoding pipelines to provide a smooth experience. To illustrate the use of our design space, we apply it to three case studies that focus on applications in astronomy, radiology, and chemistry; we then discuss lessons learned from these applications.
2025-06-27T14:16:07Z
14 pages, 6 figures
Computer Graphics Forum, vol. 45, article no.
e70305, 14 pages, 2026
Yucheng Lu, Tobias Rau, Benjamin Lee, Andreas Köhn, Michael Sedlmair, Christian Sandor, Tobias Isenberg
10.1111/cgf.70305

http://arxiv.org/abs/2503.22605v2
Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis
2025-06-27T02:42:26Z
Talking head synthesis has emerged as a prominent research topic in computer graphics and multimedia, yet most existing methods often struggle to strike a balance between generation quality and computational efficiency, particularly under real-time constraints. In this paper, we propose a novel framework that integrates Gaussian Splatting with a structured Audio Factorization Plane (Audio-Plane) to enable high-quality, audio-synchronized, and real-time talking head generation. For modeling a dynamic talking head, a 4D volume representation, which consists of three axes in 3D space and one temporal axis aligned with audio progression, is typically required. However, directly storing and processing a dense 4D grid is impractical due to the high memory and computation cost, and lack of scalability for longer durations. We address this challenge by decomposing the 4D volume representation into a set of audio-independent spatial planes and audio-dependent planes, forming a compact and interpretable representation for talking head modeling that we refer to as the Audio-Plane. This factorized design allows for efficient and fine-grained audio-aware spatial encoding, and significantly enhances the model's ability to capture complex lip dynamics driven by speech signals. To further improve region-specific motion modeling, we introduce an audio-guided saliency splatting mechanism based on region-aware modulation, which adaptively emphasizes highly dynamic regions such as the mouth area. This allows the model to focus its learning capacity on where it matters most for accurate speech-driven animation.
Extensive experiments on both the self-driven and the cross-driven settings demonstrate that our method achieves state-of-the-art visual quality, precise audio-lip synchronization, and real-time performance, outperforming prior approaches across both 2D- and 3D-based paradigms.
2025-03-28T16:50:27Z
Demo video at https://sstzal.github.io/Audio-Plane/
Shuai Shen, Wanhua Li, Yunpeng Zhang, Yap-Peng Tan, Jiwen Lu

http://arxiv.org/abs/2506.21845v1
3Description: An Intuitive Human-AI Collaborative 3D Modeling Approach
2025-06-27T01:33:46Z
This paper presents 3Description, an experimental human-AI collaborative approach for intuitive 3D modeling. 3Description aims to address accessibility and usability challenges in traditional 3D modeling by enabling non-professional individuals to co-create 3D models using verbal and gesture descriptions. Through a combination of qualitative research, product analysis, and user testing, 3Description integrates AI technologies such as Natural Language Processing and Computer Vision, powered by OpenAI and MediaPipe. Recognizing the web has wide cross-platform capabilities, 3Description is web-based, allowing users to describe the desired model and subsequently adjust its components using verbal and gestural inputs.
In the era of AI and emerging media, 3Description not only contributes to a more inclusive and user-friendly design process, empowering more people to participate in the construction of the future 3D world, but also strives to increase human engagement in co-creation with AI, thereby avoiding undue surrender to technology and preserving human creativity.
2025-06-27T01:33:46Z
5 pages, 2 figures, 3 tables (containing 21 subfigures)
ARTECH '23: Proceedings of the 11th International Conference on Digital and Interactive Arts, ACM, 2024
Zhuodi Cai
10.1145/3632776.3632785

http://arxiv.org/abs/2506.21272v2
FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
2025-06-27T01:04:39Z
We propose FairyGen, an automatic system for generating story-driven cartoon videos from a single child's drawing, while faithfully preserving its unique artistic style. Unlike previous storytelling methods that primarily focus on character consistency and basic motion, FairyGen explicitly disentangles character modeling from stylized background generation and incorporates cinematic shot design to support expressive and coherent storytelling. Given a single character sketch, we first employ an MLLM to generate a structured storyboard with shot-level descriptions that specify environment settings, character actions, and camera perspectives. To ensure visual consistency, we introduce a style propagation adapter that captures the character's visual style and applies it to the background, faithfully retaining the character's full visual identity while synthesizing style-consistent scenes. A shot design module further enhances visual diversity and cinematic quality through frame cropping and multi-view synthesis based on the storyboard. To animate the story, we reconstruct a 3D proxy of the character to derive physically plausible motion sequences, which are then used to fine-tune an MMDiT-based image-to-video diffusion model.
We further propose a two-stage motion customization adapter: the first stage learns appearance features from temporally unordered frames, disentangling identity from motion; the second stage models temporal dynamics using a timestep-shift strategy with frozen identity weights. Once trained, FairyGen directly renders diverse and coherent video scenes aligned with the storyboard. Extensive experiments demonstrate that our system produces animations that are stylistically faithful and narratively structured, with natural motion, highlighting its potential for personalized and engaging story animation. The code will be available at https://github.com/GVCLab/FairyGen
2025-06-26T13:58:16Z
Project Page: https://jayleejia.github.io/FairyGen/ ; Code: https://github.com/GVCLab/FairyGen
Jiayi Zheng, Xiaodong Cun

http://arxiv.org/abs/2406.18582v3
CanFields: Consolidating Diffeomorphic Flows for Non-Rigid 4D Interpolation from Arbitrary-Length Sequences
2025-06-26T17:53:33Z
We introduce Canonical Consolidation Fields (CanFields). This novel method interpolates arbitrary-length sequences of independently sampled 3D point clouds into a unified, continuous, and coherent deforming shape. Unlike prior methods that oversmooth geometry or produce topological and geometric artifacts, CanFields optimizes fine-detailed geometry and deformation jointly in an unsupervised fitting with two novel bespoke modules. First, we introduce a dynamic consolidator module that adjusts the input and assigns confidence scores, balancing the optimization of the canonical shape and its motion. Second, we represent the motion as a diffeomorphic flow parameterized by a smooth velocity field. We have validated the method's robustness and accuracy on more than 50 diverse sequences, demonstrating its superior performance even with missing regions, noisy raw scans, and sparse data.
Our project page is at: https://wangmiaowei.github.io/CanFields.github.io/.
2024-06-05T17:07:55Z
ICCV2025 Accepted
Miaowei Wang, Changjian Li, Amir Vaxman

http://arxiv.org/abs/2506.21456v1
Managing level of detail through head-tracked peripheral degradation: a model and resulting design principles
2025-06-26T16:35:38Z
Previous work has demonstrated the utility of reductions in the level of detail (LOD) in the periphery of head-tracked, large field of view displays. This paper provides a psychophysically based model, centered around an eye/head movement tradeoff, that explains the effectiveness of peripheral degradation and suggests how peripherally degraded displays should be designed. An experiment was performed to evaluate how the shape and area of the high-detail central area (inset) in peripherally degraded displays affect search performance; results indicated that inset shape is not a significant factor in performance. Inset area, however, was significant: performance with displays subtending at least 30 degrees of horizontal and vertical angle was not significantly different from performance with an undegraded display. These results agreed with the proposed model.
2025-06-26T16:35:38Z
Proceedings of the ACM symposium on Virtual reality software and technology (1997). Pages 59-63. ACM
Benjamin Watson, Neff Walker, Larry F Hodges
10.1145/261135.261148