http://arxiv.org/abs/2506.21441v1 An evaluation of level of detail degradation in head-mounted display peripheries 2025-06-26T16:26:36Z

A paradigm for the design of systems that manage level of detail in virtual environments is proposed. As an example of the prototyping step in this paradigm, a user study was performed to evaluate the effectiveness of high detail insets used with head-mounted displays. Ten subjects were given a simple search task that required the location and identification of a single target object. All subjects used seven different displays (the independent variable), varying in inset size and peripheral detail, to perform this task. Frame rate, target location, subject input method, and order of display use were all controlled. Primary dependent measures were search time on trials with correct identification, and the percentage of all trials correctly identified. ANOVAs of the results showed that insetless, high detail displays did not lead to significantly different search times or accuracies than displays with insets. In fact, only the insetless, low detail display returned significantly different results. Further research is being performed to examine the effect of varying task complexity, inset size, and level of detail.

2025-06-26T16:26:36Z Presence: Teleoperators & Virtual Environments (1997). Volume 6, Issue 6, Pages 630-637. MIT Press Benjamin Watson Neff Walker Larry F Hodges Martin Reddy 10.1162/pres.1997.6.6.630 http://arxiv.org/abs/2506.21425v1 IDGraphs: Intrusion Detection and Analysis Using Stream Compositing 2025-06-26T16:08:20Z

Traffic anomalies and attacks are commonplace in today's networks, and identifying them rapidly and accurately is critical for large network operators. For a statistical intrusion detection system (IDS), operating at the flow level is crucial for accurate detection and mitigation. However, existing IDSs offer only limited support for 1) interactively examining detected intrusions and anomalies, 2) analyzing worm propagation patterns, and 3) discovering correlated attacks. These problems are becoming even more acute as the traffic on today's high-speed routers continues to grow. IDGraphs is an interactive visualization system for intrusion detection that addresses these challenges. The central visualization in the system is a flow-level trace plotted with time on the horizontal axis and aggregated number of unsuccessful connections on the vertical axis. We then summarize a stack of tens or hundreds of thousands of these traces using the Histographs [RW05] technique, which maps data frequency at each pixel to brightness. Users may then interactively query the summary view, performing analysis by highlighting subsets of the traces. For example, brushing a linked correlation matrix view highlights traces with similar patterns, revealing distributed attacks that are difficult to detect using standard statistical analysis. We apply the IDGraphs system to a real network router dataset with 179M flow-level records representing a total traffic of 1.16TB. The system successfully detects and analyzes a variety of attacks and anomalies, including port scanning, worm outbreaks, stealthy TCP SYN floods, and some distributed attacks.
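
The compositing idea in the abstract above (stack many traces, then map per-pixel frequency to brightness) can be illustrated with a minimal sketch. This is not the IDGraphs or Histographs implementation: the function name `composite_traces`, the linear normalisation by the busiest pixel, and the toy traces are all assumptions made for illustration.

```python
def composite_traces(traces, height, y_max):
    """Stack equal-length traces into a brightness grid (hypothetical helper).

    Each trace is a list of values, one per time step (one pixel column).
    Returns a height x width grid of brightness values in [0, 1].
    Row 0 holds the smallest values, so a display would draw it at the bottom.
    """
    width = len(traces[0])
    counts = [[0] * width for _ in range(height)]
    for trace in traces:
        for x, value in enumerate(trace):
            # Clamp the value into [0, y_max] and pick its pixel row.
            clamped = min(max(value, 0), y_max)
            row = min(int(clamped * height / y_max), height - 1)
            counts[row][x] += 1
    # Assumed mapping: brightness is the hit count divided by the peak count.
    peak = max(max(row) for row in counts) or 1
    return [[count / peak for count in row] for row in counts]


# Toy data: three traces that agree early on and diverge later.
traces = [
    [0, 1, 2, 3],
    [0, 1, 2, 9],
    [0, 1, 5, 9],
]
brightness = composite_traces(traces, height=10, y_max=10)
```

Pixels crossed by all three traces come out fully bright, while pixels crossed by a single trace are dim, so dense common behaviour and sparse outliers remain visible in the same image. A real system would likely use a non-linear (for example logarithmic) mapping for very large stacks; that choice is left open here.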

2025-06-26T16:08:20Z IEEE Computer Graphics and Applications (2006). Volume 26, Issue 2, March-April, Pages 28 - 39 Pin Ren Yan Gao Zhichun Li Yan Chen Benjamin Watson 10.1109/MCG.2006.36 http://arxiv.org/abs/2504.16606v2 HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction for Large-Scale Aerial Scenes 2025-06-26T06:12:14Z

3DGS is an emerging and increasingly popular technology in the field of novel view synthesis. Its highly realistic rendering quality and real-time rendering capabilities make it promising for various applications. However, when applied to large-scale aerial urban scenes, 3DGS methods suffer from issues such as excessive memory consumption, slow training times, prolonged partitioning processes, and significant degradation in rendering quality due to the increased data volume. To tackle these challenges, we introduce HUG, a novel approach that enhances data partitioning and reconstruction quality by leveraging a hierarchical neural Gaussian representation. We first propose a visibility-based data partitioning method that is simple yet highly efficient, significantly outperforming existing methods in speed. Then, we introduce a novel hierarchical weighted training approach, combined with other optimization strategies, to substantially improve reconstruction quality. Our method achieves state-of-the-art results on one synthetic dataset and four real-world datasets.

2025-04-23T10:40:40Z An improved version has recently been accepted to ICCV, manuscript, not camera-ready Mai Su Zhongtao Wang Huishan Au Yilong Li Xizhe Cao Chengwei Pan Yisong Chen Guoping Wang http://arxiv.org/abs/2412.03934v2 InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models 2025-06-26T03:10:09Z

We present InfiniCube, a scalable method for generating unbounded dynamic 3D driving scenes with high fidelity and controllability. Previous methods for scene generation either suffer from limited scales or lack geometric and appearance consistency along generated sequences. In contrast, we leverage the recent advancements in scalable 3D representation and video models to achieve large dynamic scene generation that allows flexible controls through HD maps, vehicle bounding boxes, and text descriptions. First, we construct a map-conditioned sparse-voxel-based 3D generative model to unleash its power for unbounded voxel world generation. Then, we re-purpose a video model and ground it on the voxel world through a set of carefully designed pixel-aligned guidance buffers, synthesizing a consistent appearance. Finally, we propose a fast feed-forward approach that employs both voxel and pixel branches to lift the dynamic videos to dynamic 3D Gaussians with controllable objects. Our method can generate controllable and realistic 3D driving scenes, and extensive experiments validate the effectiveness and superiority of our model.

2024-12-05T07:32:20Z ICCV 2025. Project Page: https://research.nvidia.com/labs/toronto-ai/infinicube/ Yifan Lu Xuanchi Ren Jiawei Yang Tianchang Shen Zhangjie Wu Jun Gao Yue Wang Siheng Chen Mike Chen Sanja Fidler Jiahui Huang http://arxiv.org/abs/2506.17450v2 BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing 2025-06-26T02:46:40Z

We present BlenderFusion, a generative visual compositing framework that synthesizes new scenes by recomposing objects, camera, and background. It follows a layering-editing-compositing pipeline: (i) segmenting and converting visual inputs into editable 3D entities (layering), (ii) editing them in Blender with 3D-grounded control (editing), and (iii) fusing them into a coherent scene using a generative compositor (compositing). Our generative compositor extends a pre-trained diffusion model to process both the original (source) and edited (target) scenes in parallel. It is fine-tuned on video frames with two key training strategies: (i) source masking, enabling flexible modifications like background replacement; (ii) simulated object jittering, facilitating disentangled control over objects and camera. BlenderFusion significantly outperforms prior methods in complex compositional scene editing tasks.

2025-06-20T19:38:34Z Project page: https://blenderfusion.github.io Jiacheng Chen Ramin Mehran Xuhui Jia Saining Xie Sanghyun Woo http://arxiv.org/abs/2506.20946v1 Consistent Zero-shot 3D Texture Synthesis Using Geometry-aware Diffusion and Temporal Video Models 2025-06-26T02:25:16Z

Current texture synthesis methods, which generate textures from fixed viewpoints, suffer from inconsistencies due to the lack of global context and geometric understanding. Meanwhile, recent advancements in video generation models have demonstrated remarkable success in achieving temporally consistent videos. In this paper, we introduce VideoTex, a novel framework for seamless texture synthesis that leverages video generation models to address both spatial and temporal inconsistencies in 3D textures. Our approach incorporates geometry-aware conditions, enabling precise utilization of 3D mesh structures. Additionally, we propose a structure-wise UV diffusion strategy, which enhances the generation of occluded areas by preserving semantic information, resulting in smoother and more coherent textures. VideoTex not only achieves smoother transitions across UV boundaries but also ensures high-quality, temporally stable textures across video frames. Extensive experiments demonstrate that VideoTex outperforms existing methods in texture fidelity, seam blending, and stability, paving the way for dynamic real-time applications that demand both visual quality and temporal coherence.

2025-06-26T02:25:16Z Donggoo Kang Jangyeong Kim Dasol Jeong Junyoung Choi Jeonga Wi Hyunmin Lee Joonho Gwon Joonki Paik http://arxiv.org/abs/2506.20901v1 Data Visualization for Improving Financial Literacy: A Systematic Review 2025-06-26T00:13:52Z

Financial literacy empowers individuals to make informed and effective financial decisions, improving their overall financial well-being and security. However, for many people understanding financial concepts can be daunting and only half of US adults are considered financially literate. Data visualization simplifies these concepts, making them accessible and engaging for learners of all ages. This systematic review analyzes 37 research papers exploring the use of data visualization and visual analytics in financial education and literacy enhancement. We classify these studies into five key areas: (1) the evolution of visualization use across time and space, (2) motivations for using visualization tools, (3) the financial topics addressed and instructional approaches used, (4) the types of tools and technologies applied, and (5) how the effectiveness of teaching interventions was evaluated. Furthermore, we identify research gaps and highlight opportunities for advancing financial literacy. Our findings offer practical insights for educators and professionals to effectively utilize or design visual tools for financial literacy.

2025-06-26T00:13:52Z Meng Du Robert Amor Kwan-Liu Ma Burkhard C. Wünsche http://arxiv.org/abs/2506.20875v1 3DGH: 3D Head Generation with Composable Hair and Face 2025-06-25T22:53:52Z

We present 3DGH, an unconditional generative model for 3D human heads with composable hair and face components. Unlike previous work that entangles the modeling of hair and face, we propose to separate them using a novel data representation with template-based 3D Gaussian Splatting, in which deformable hair geometry is introduced to capture the geometric variations across different hairstyles. Based on this data representation, we design a 3D GAN-based architecture with dual generators and employ a cross-attention mechanism to model the inherent correlation between hair and face. The model is trained on synthetic renderings using carefully designed objectives to stabilize training and facilitate hair-face separation. We conduct extensive experiments to validate the design choice of 3DGH, and evaluate it both qualitatively and quantitatively by comparing with several state-of-the-art 3D GAN methods, demonstrating its effectiveness in unconditional full-head image synthesis and composable 3D hairstyle editing. More details will be available on our project page: https://c-he.github.io/projects/3dgh/.

2025-06-25T22:53:52Z Accepted to SIGGRAPH 2025. Project page: https://c-he.github.io/projects/3dgh/ Chengan He Junxuan Li Tobias Kirschstein Artem Sevastopolsky Shunsuke Saito Qingyang Tan Javier Romero Chen Cao Holly Rushmeier Giljoo Nam http://arxiv.org/abs/2506.20652v1 EditP23: 3D Editing via Propagation of Image Prompts to Multi-View 2025-06-25T17:50:20Z

We present EditP23, a method for mask-free 3D editing that propagates 2D image edits to multi-view representations in a 3D-consistent manner. In contrast to traditional approaches that rely on text-based prompting or explicit spatial masks, EditP23 enables intuitive edits by conditioning on a pair of images: an original view and its user-edited counterpart. These image prompts are used to guide an edit-aware flow in the latent space of a pre-trained multi-view diffusion model, allowing the edit to be coherently propagated across views. Our method operates in a feed-forward manner, without optimization, and preserves the identity of the original object, in both structure and appearance. We demonstrate its effectiveness across a range of object categories and editing scenarios, achieving high fidelity to the source while requiring no manual masks.

2025-06-25T17:50:20Z Code, supplementary videos, interactive 3D visualizations, and additional results are available at https://editp23.github.io/ Roi Bar-On Dana Cohen-Bar Daniel Cohen-Or http://arxiv.org/abs/2502.07784v2 MatSwap: Light-aware material transfers in images 2025-06-25T14:52:25Z

We present MatSwap, a method to transfer materials to designated surfaces in an image photorealistically. Such a task is non-trivial due to the large entanglement of material appearance, geometry, and lighting in a photograph. In the literature, material editing methods typically rely on either cumbersome text engineering or extensive manual annotations requiring artist knowledge and 3D scene properties that are impractical to obtain. In contrast, we propose to directly learn the relationship between the input material -- as observed on a flat surface -- and its appearance within the scene, without the need for explicit UV mapping. To achieve this, we rely on a custom light- and geometry-aware diffusion model. We fine-tune a large-scale pre-trained text-to-image model for material transfer using our synthetic dataset, preserving its strong priors to ensure effective generalization to real images. As a result, our method seamlessly integrates a desired material into the target location in the photograph while retaining the identity of the scene. We evaluate our method on synthetic and real images and show that it compares favorably to recent work both qualitatively and quantitatively. We release our code and data on https://github.com/astra-vision/MatSwap

2025-02-11T18:59:59Z Accepted to EGSR, journal track to appear in Computer Graphics Forum Ivan Lopes Valentin Deschaintre Yannick Hold-Geoffroy Raoul de Charette http://arxiv.org/abs/2506.20267v1 X-SiT: Inherently Interpretable Surface Vision Transformers for Dementia Diagnosis 2025-06-25T09:24:07Z

Interpretable models are crucial for supporting clinical decision-making, driving advances in their development and application for medical images. However, the nature of 3D volumetric data makes it inherently challenging to visualize and interpret intricate and complex structures like the cerebral cortex. Cortical surface renderings, on the other hand, provide a more accessible and understandable 3D representation of brain anatomy, facilitating visualization and interactive exploration. Motivated by this advantage and the widespread use of surface data for studying neurological disorders, we present the eXplainable Surface Vision Transformer (X-SiT). This is the first inherently interpretable neural network that offers human-understandable predictions based on interpretable cortical features. As part of X-SiT, we introduce a prototypical surface patch decoder for classifying surface patch embeddings, incorporating case-based reasoning with spatially corresponding cortical prototypes. The results demonstrate state-of-the-art performance in detecting Alzheimer's disease and frontotemporal dementia while additionally providing informative prototypes that align with known disease patterns and reveal classification errors.

2025-06-25T09:24:07Z MICCAI 2025 Fabian Bongratz Tom Nuno Wolf Jaume Gual Ramon Christian Wachinger http://arxiv.org/abs/2506.20202v1 RaRa Clipper: A Clipper for Gaussian Splatting Based on Ray Tracer and Rasterizer 2025-06-25T07:45:56Z

With the advancement of Gaussian Splatting techniques, a growing number of datasets based on this representation have been developed. However, performing accurate and efficient clipping for Gaussian Splatting remains a challenging and unresolved problem, primarily due to the volumetric nature of Gaussian primitives, which makes hard clipping incapable of precisely localizing their pixel-level contributions. In this paper, we propose a hybrid rendering framework that combines rasterization and ray tracing to achieve efficient and high-fidelity clipping of Gaussian Splatting data. At the core of our method is the RaRa strategy, which first leverages rasterization to quickly identify Gaussians intersected by the clipping plane, followed by ray tracing to compute attenuation weights based on their partial occlusion. These weights are then used to accurately estimate each Gaussian's contribution to the final image, enabling smooth and continuous clipping effects. We validate our approach on diverse datasets, including general Gaussians, hair strand Gaussians, and multi-layer Gaussians, and conduct user studies to evaluate both perceptual quality and quantitative performance. Experimental results demonstrate that our method delivers visually superior results while maintaining real-time rendering performance and preserving high fidelity in the unclipped regions.

2025-06-25T07:45:56Z Da Li Donggang Jia Yousef Rajeh Dominik Engel Ivan Viola http://arxiv.org/abs/2408.16767v4 ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model 2025-06-25T07:19:44Z

Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. Despite great success in dense-view reconstruction scenarios, rendering a detailed scene from insufficient captured views is still an ill-posed optimization problem, often resulting in artifacts and distortions in unseen areas. In this paper, we propose ReconX, a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task. The key insight is to unleash the strong generative prior of large pre-trained video diffusion models for sparse-view reconstruction. However, video frames generated directly by pre-trained models struggle to preserve accurate 3D view consistency. To address this, given limited input views, the proposed ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition. Guided by the condition, the video diffusion model then synthesizes video frames that both preserve detail and exhibit a high degree of 3D consistency, ensuring the coherence of the scene from various perspectives. Finally, we recover the 3D scene from the generated video through a confidence-aware 3D Gaussian Splatting optimization scheme. Extensive experiments on various real-world datasets show the superiority of our ReconX over state-of-the-art methods in terms of quality and generalizability.

2024-08-29T17:59:40Z Project page: https://liuff19.github.io/ReconX Fangfu Liu Wenqiang Sun Hanyang Wang Yikai Wang Haowen Sun Junliang Ye Jun Zhang Yueqi Duan http://arxiv.org/abs/2506.21632v1 SkinningGS: Editable Dynamic Human Scene Reconstruction Using Gaussian Splatting Based on a Skinning Model 2025-06-25T05:06:44Z

Reconstructing an interactive human avatar and the background from a monocular video of a dynamic human scene is highly challenging. In this work, we adopt a strategy of point cloud decoupling and joint optimization to achieve the decoupled reconstruction of backgrounds and human bodies while preserving the interactivity of human motion. We introduce a position texture to subdivide the Skinned Multi-Person Linear (SMPL) body model's surface and grow the human point cloud. To capture fine details of human dynamics and deformations, we incorporate a convolutional neural network structure to predict human body point cloud features based on texture. This strategy makes our approach free of hyperparameter tuning for densification and efficiently represents human points with half the point cloud of HUGS. This approach ensures high-quality human reconstruction and reduces GPU resource consumption during training. As a result, our method surpasses the previous state-of-the-art HUGS in reconstruction metrics while maintaining the ability to generalize to novel poses and views. Furthermore, our technique achieves real-time rendering at over 100 FPS, about 6× the speed of HUGS, using only Linear Blend Skinning (LBS) weights for human transformation. Additionally, this work demonstrates that this framework can be extended to animal scene reconstruction when an accurately posed model of an animal is available.
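
For readers unfamiliar with Linear Blend Skinning, the standard formulation moves each point by a weighted average of its bones' rigid transforms. The sketch below shows that textbook formula only; it is not the SkinningGS code, and the helper names, the two-bone setup, and the example weights are assumptions chosen for illustration.

```python
def apply_rigid(rotation, translation, point):
    """Apply a 3x3 rotation (nested lists) and a translation to a 3D point."""
    return [
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]


def linear_blend_skinning(point, weights, bone_transforms):
    """Blend the per-bone transformed positions of one point.

    weights: one skinning weight per bone, expected to sum to 1.
    bone_transforms: list of (rotation, translation) pairs, one per bone.
    """
    blended = [0.0, 0.0, 0.0]
    for weight, (rotation, translation) in zip(weights, bone_transforms):
        moved = apply_rigid(rotation, translation, point)
        for axis in range(3):
            blended[axis] += weight * moved[axis]
    return blended


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# 90-degree rotation about the z axis.
ROT_Z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical setup: one bone stays put, the other rotates by 90 degrees.
bones = [(IDENTITY, [0.0, 0.0, 0.0]), (ROT_Z_90, [0.0, 0.0, 0.0])]
skinned = linear_blend_skinning([1.0, 0.0, 0.0], [0.5, 0.5], bones)
```

With equal weights the point lands halfway between its two per-bone positions. Because the blend is a plain weighted sum with no per-point optimization, it is cheap to evaluate for every point in a cloud.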

2025-06-25T05:06:44Z Da Li Donggang Jia Markus Hadwiger Ivan Viola http://arxiv.org/abs/2506.18251v2 Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models 2025-06-25T03:25:37Z

In this paper, we present Morse, a simple dual-sampling framework for accelerating diffusion models losslessly. The key insight of Morse is to reformulate the iterative generation (from noise to data) process by taking advantage of fast jump sampling and adaptive residual feedback strategies. Specifically, Morse involves two models called Dash and Dot that interact with each other. The Dash model is just the pre-trained diffusion model of any type, but operates in a jump sampling regime, creating sufficient space for sampling efficiency improvement. The Dot model is significantly faster than the Dash model and is trained to generate residual feedback conditioned on the observations at the current jump sampling point on the trajectory of the Dash model, lifting the noise estimate to easily match the next-step estimate of the Dash model without jump sampling. By chaining the outputs of the Dash and Dot models run in a time-interleaved fashion, Morse exhibits the merit of flexibly attaining desired image generation performance while improving overall runtime efficiency. With our proposed weight sharing strategy between the Dash and Dot models, Morse is efficient for training and inference. Our method shows a lossless speedup of 1.78X to 3.31X on average over a wide range of sampling step budgets relative to 9 baseline diffusion models on 6 image generation tasks. Furthermore, we show that our method can also be generalized to improve the Latent Consistency Model (LCM-SDXL, which is already accelerated with a consistency distillation technique) tailored for few-step text-to-image synthesis. The code and models are available at https://github.com/deep-optimization/Morse.

2025-06-23T02:43:21Z Fixed a prompt typo in Figure 18 of the Appendix. This work is accepted to ICML 2025. The project page: https://github.com/deep-optimization/Morse Chao Li Jiawei Fan Anbang Yao