http://arxiv.org/abs/2512.07988v3 HOLE: Homological Observation of Latent Embeddings for Neural Network Interpretability 2026-04-06T19:32:45Z

Deep learning models have achieved remarkable success across various domains, yet their learned representations and decision-making processes remain largely opaque and hard to interpret. This work introduces HOLE (Homological Observation of Latent Embeddings), a method for analyzing and interpreting discriminative neural networks through persistent homology. HOLE extracts topological features from intermediate activations and presents them using a suite of visualization techniques, including cluster flow diagrams, blob graphs, and heatmap dendrograms. These tools facilitate the examination of representation structure and quality across layers. We evaluate HOLE using a range of discriminative models, focusing on representation quality, interpretability across layers, and robustness to input perturbations and model compression. The results indicate that topological analysis reveals patterns associated with class separation, feature disentanglement, and model robustness, providing a complementary perspective for understanding and improving deep learning systems.
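
As a rough illustration of the kind of topological feature persistent homology extracts, the following sketch computes 0-dimensional persistence (connected components) for a small point cloud using a union-find over sorted pairwise distances. It is not the HOLE implementation, which the abstract does not describe; the function name `zero_dim_persistence`, the toy `activations` data, and the convention of measuring scale by edge length are all assumptions made for this example.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Return the scales at which connected components merge (H0 death times).

    Every point is born as its own component at scale 0. Edges are added in
    order of increasing length; each edge that joins two different components
    kills one of them at that length. One component never dies and is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths


# Hypothetical 2D "activations": two tight pairs that are far apart.
activations = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
deaths = zero_dim_persistence(activations)
print(deaths)
```

In this toy case the two short-lived components die at scale 1 and the two clusters merge only at scale 5; a long gap of this kind is the sort of signal one might read as well-separated classes.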

2025-12-08T19:20:05Z Sudhanva Manjunath Athreya Paul Rosen http://arxiv.org/abs/2512.04832v2 Tokenizing Buildings: A Transformer for Layout Synthesis 2026-04-06T18:33:53Z

We introduce Small Building Model (SBM), a Transformer-based architecture for layout synthesis in Building Information Modeling (BIM) scenes. We address the question of how to tokenize buildings by unifying heterogeneous feature sets of architectural elements into sequences while preserving compositional structure. Such feature sets are represented as a sparse attribute-feature matrix that captures room properties. We then design a unified embedding module that learns joint representations of categorical and possibly correlated continuous feature groups. Lastly, we train a single Transformer backbone in two modes: an encoder-only pathway that yields high-fidelity room embeddings, and an encoder-decoder pipeline for autoregressive prediction of residential room entities, referred to as Data-Driven Entity Prediction (DDEP). Experiments across retrieval and generative layout synthesis show that SBM learns compact room embeddings that reliably cluster by type and topology, enabling strong semantic retrieval. In DDEP mode, SBM produces functionally sound layouts with fewer collisions and boundary violations, and improved navigability, outperforming general-purpose LLM/VLM baselines and recent domain-specific methods.

2025-12-04T14:16:09Z 14 pages, 3 page References, 4 figures Manuel Ladron de Guevara Jinmo Rhee Ardavan Bidgoli Vaidas Razgaitis Michael Bergin http://arxiv.org/abs/2604.04905v1 ClickAIXR: On-Device Multimodal Vision-Language Interaction with Real-World Objects in Extended Reality 2026-04-06T17:50:47Z

We present ClickAIXR, a novel on-device framework for multimodal vision-language interaction with objects in extended reality (XR). Unlike prior systems that rely on cloud-based AI (e.g., ChatGPT) or gaze-based selection (e.g., GazePointAR), ClickAIXR integrates an on-device vision-language model (VLM) with a controller-based object selection paradigm, enabling users to precisely click on real-world objects in XR. Once selected, the object image is processed locally by the VLM to answer natural language questions through both text and speech. This object-centered interaction reduces ambiguity inherent in gaze- or voice-only interfaces and improves transparency by performing all inference on-device, addressing concerns around privacy and latency. We implemented ClickAIXR in the Magic Leap SDK (C API) with ONNX-based local VLM inference. We conducted a user study comparing ClickAIXR with Gemini 2.5 Flash and ChatGPT 5, evaluating usability, trust, and user satisfaction. Results show that latency is moderate and user experience is acceptable. Our findings demonstrate the potential of click-based object selection combined with on-device AI to advance trustworthy, privacy-preserving XR interactions. The source code and supplementary materials are available at: nanovis.org/ClickAIXR.html

2026-04-06T17:50:47Z Dawar Khan Alexandre Kouyoumdjian Xinyu Liu Omar Mena Dominik Engel Ivan Viola http://arxiv.org/abs/2601.11109v3 Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning 2026-04-06T13:27:26Z

Vision-as-inverse-graphics, the concept of reconstructing images into editable programs, remains challenging for Vision-Language Models (VLMs), which inherently lack fine-grained spatial grounding in one-shot settings. To address this, we introduce VIGA (Vision-as-Inverse-Graphics Agent), an interleaved multimodal reasoning framework where symbolic logic and visual perception actively cross-verify each other. VIGA operates through a tightly coupled code-render-inspect loop: synthesizing symbolic programs, projecting them into visual states, and inspecting discrepancies to guide iterative edits. Equipped with high-level semantic skills and an evolving multimodal memory, VIGA sustains evidence-based modifications over long horizons. This training-free, task-agnostic framework seamlessly supports 2D document generation, 3D reconstruction, multi-step 3D editing, and 4D physical interaction. Finally, we introduce BlenderBench, a challenging visual-to-code benchmark. Empirically, VIGA substantially improves accuracy compared with one-shot baselines in BlenderGym (35.32%), SlideBench (117.17%) and our proposed BlenderBench (124.70%).

2026-01-16T09:11:55Z Project page: https://fugtemypt123.github.io/VIGA-website/ Shaofeng Yin Jiaxin Ge Zora Zhiruo Wang Chenyang Wang Xiuyu Li Michael J. Black Trevor Darrell Angjoo Kanazawa Haiwen Feng http://arxiv.org/abs/2604.02120v2 GEMM-GS: Accelerating 3D Gaussian Splatting on Tensor Cores with GEMM-Compatible Blending 2026-04-06T06:53:41Z

Neural Radiance Fields (NeRF) enables 3D scene reconstruction from several 2D images but incurs high rendering latency via its point-sampling design. 3D Gaussian Splatting (3DGS) improves on NeRF with explicit scene representation and an optimized pipeline yet still fails to meet practical real-time demands. Existing acceleration works overlook the evolving Tensor Cores of modern GPUs because the 3DGS pipeline lacks General Matrix Multiplication (GEMM) operations. This paper proposes GEMM-GS, an acceleration approach that utilizes Tensor Cores on GPUs via a GEMM-friendly blending transformation. It equivalently reformulates the 3DGS blending process into a GEMM-compatible form to utilize Tensor Cores. A high-performance CUDA kernel is designed, integrating a three-stage double-buffered pipeline that overlaps computation and memory access. Extensive experiments show that GEMM-GS achieves $1.42\times$ speedup over vanilla 3DGS and provides an additional $1.47\times$ speedup on average when combined with existing acceleration approaches. Code is released at https://github.com/shieldforever/GEMM-GS.
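
For context, the blending step that 3DGS performs per pixel is the standard front-to-back alpha compositing of depth-sorted Gaussians, C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j). The sketch below shows that sequential loop only; it does not show the paper's GEMM-compatible reformulation, which the abstract does not spell out. The function name `blend_front_to_back` and the sample values are assumptions for illustration.

```python
def blend_front_to_back(colors, alphas):
    """Composite depth-sorted RGB contributions for one pixel.

    colors: list of (r, g, b) tuples, nearest first.
    alphas: list of opacities in [0, 1], same order.
    Returns the blended pixel and the remaining transmittance.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


# Hypothetical example: a red splat in front of a green one, both half opaque.
pixel, remaining = blend_front_to_back(
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [0.5, 0.5],
)
print(pixel, remaining)
```

Each contribution depends on the running transmittance of everything in front of it, which is why this loop does not map directly onto a matrix multiplication.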

2026-04-02T14:56:06Z Accepted by the 63rd Design Automation Conference (DAC 2026) Haomin Li Bowen Zhu Fangxin Liu Zongwu Wang Xinran Liang Li Jiang Haibing Guan http://arxiv.org/abs/2506.02794v3 PhysGaia: A Physics-Aware Benchmark with Multi-Body Interactions for Dynamic Novel View Synthesis 2026-04-06T05:39:25Z

We introduce PhysGaia, a novel physics-aware benchmark for Dynamic Novel View Synthesis (DyNVS) that encompasses both structured objects and unstructured physical phenomena. While existing datasets primarily focus on photorealistic appearance, PhysGaia is specifically designed to support physics-consistent dynamic reconstruction. Our benchmark features complex scenarios with rich multi-body interactions, where objects realistically collide and exchange forces. Furthermore, it incorporates a diverse range of materials, including liquid, gas, textile, and rheological substance, moving beyond the rigid-body assumptions prevalent in prior work. To ensure physical fidelity, all scenes in PhysGaia are generated using material-specific physics solvers that strictly adhere to fundamental physical laws. We provide comprehensive ground-truth information, including 3D particle trajectories and physical parameters (e.g., viscosity), enabling the quantitative evaluation of physical modeling. To facilitate research adoption, we also provide integration pipelines for recent 4D Gaussian Splatting models along with our dataset and their results. By addressing the critical shortage of physics-aware benchmarks, PhysGaia can significantly advance research in dynamic view synthesis, physics-based scene understanding, and the integration of deep learning with physical simulation, ultimately enabling more faithful reconstruction and interpretation of complex dynamic scenes.

2025-06-03T12:19:18Z Accepted at CVPR 2026 Project page: http://cvlab.snu.ac.kr/research/PhysGaia Dataset: https://huggingface.co/datasets/mijeongkim/PhysGaia/tree/main Mijeong Kim Gunhee Kim Jungyoon Choi Wonjae Roh Bohyung Han http://arxiv.org/abs/2604.04244v1 VisACD: Visibility-Based GPU-Accelerated Approximate Convex Decomposition 2026-04-05T20:03:23Z

Physics-based simulation involves trade-offs between performance and accuracy. In collision detection, one trade-off is the granularity of collider geometry. Primitive-based colliders such as bounding boxes are efficient, while using the original mesh is more accurate but often computationally expensive. Approximate Convex Decomposition (ACD) methods strive for a balance of efficiency and accuracy. Prior works can produce high-quality decompositions but require large numbers of convex parts and are sensitive to the orientation of the input mesh. We address these weaknesses with VisACD, a visibility-based, rotation-equivariant, and intersection-free ACD algorithm with GPU acceleration. Our approach produces high-quality decompositions with fewer convex parts, is not sensitive to shape orientation, and is more efficient than prior work.

2026-04-05T20:03:23Z Egor Fokin Manolis Savva http://arxiv.org/abs/2604.03748v1 Real-time Neural Six-way Lightmaps 2026-04-04T14:44:08Z

Participating media are a pervasive and intriguing visual effect in virtual environments. Unfortunately, rendering such phenomena in real-time is notoriously difficult due to the computational expense of estimating the volume rendering equation. While the six-way lightmaps technique has been widely used in video games to render smoke with a camera-oriented billboard and approximate lighting effects using six precomputed lightmaps, achieving a balance between realism and efficiency, it is limited to pre-simulated animation sequences and is ignorant of camera movement. In this work, we propose a neural six-way lightmaps method to strike a long-sought balance between dynamics and visual realism. Our approach first generates a guiding map from the camera view using ray marching with a large sampling distance to approximate smoke scattering and silhouette. Then, given a guiding map, we train a neural network to predict the corresponding six-way lightmaps. The resulting lightmaps can be seamlessly used in existing game engine pipelines. This approach supports visually appealing rendering effects while enabling real-time user interactivity, including smoke-obstacle interaction, camera movement, and light change. By conducting a series of comprehensive benchmarks, we demonstrate that our method is well-suited for real-time applications, such as games and VR/AR.

2026-04-04T14:44:08Z 11 Pages, 16 Figures Wei Li Hanxiao Sun Tao Huang Haoxiang Wang Tongtong Wang Zherong Pan Kui Wu http://arxiv.org/abs/2512.01501v4 A Unified Architecture for N-Dimensional Visualization and Simulation: 4D Implementation and Evaluation including Boolean Operations 2026-04-04T12:55:58Z

This paper proposes a unified software architecture for visualization and simulation based on a design targeting an N-dimensional space. The contributions of this study are twofold. First, it presents an architectural configuration that integrates multiple processes into a single software architecture: Quickhull-based convex hull mesh generation, Boolean operations, coordinate transformations for high-dimensional exploration (pose transformation and view transformation), and hyperplane slicing for visualization. Second, it defines "Plex" (.plex) as a file format intended for the exchange of N-dimensional mesh data. The proposed approach adopts an approximate implementation that tolerates numerical errors and prioritizes implementation transparency over guarantees of numerical rigor. The experimental results and evaluations presented in this paper are limited to a 4D implementation; no evaluation is conducted for N > 4, and the discussion is restricted to stating that the architecture itself has a dimension-independent structure. This paper also proposes an interaction design for high-dimensional exploration based on FPS navigation. As an input example involving shape changes over time, a non-rigid body simulation based on XPBD (Extended Position Based Dynamics) is integrated into the 4D implementation. Experimental results confirm that the 4D implementation runs on a single PC.
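
Hyperplane slicing can be illustrated on a single mesh edge: an edge crosses the hyperplane n·x = d when its endpoints lie on opposite sides, and the crossing point follows by linear interpolation. The sketch below is a minimal, dimension-independent version of that idea under stated assumptions; it is not taken from the paper, and the name `slice_edge` and the sample coordinates are hypothetical.

```python
def slice_edge(p, q, normal, offset):
    """Intersect segment p-q with the hyperplane normal . x = offset.

    Works for points of any dimension (4D in the example below).
    Returns the intersection point, or None if the segment does not
    cross the hyperplane or lies entirely within it.
    """
    side_p = sum(n * x for n, x in zip(normal, p)) - offset
    side_q = sum(n * x for n, x in zip(normal, q)) - offset
    if side_p * side_q > 0:
        return None  # both endpoints strictly on the same side
    if side_p == side_q:
        return None  # both zero: the edge lies in the hyperplane
    t = side_p / (side_p - side_q)  # interpolation parameter in [0, 1]
    return tuple(a + t * (b - a) for a, b in zip(p, q))


# Slice a 4D edge with the hyperplane w = 0 (normal along the fourth axis).
point = slice_edge(
    (0.0, 0.0, 0.0, -1.0),
    (2.0, 4.0, 6.0, 1.0),
    normal=(0.0, 0.0, 0.0, 1.0),
    offset=0.0,
)
print(point)
```

Applying this to every edge of a 4D cell and dropping the sliced coordinate would give the vertices of a 3D cross-section; how the paper assembles those vertices into a renderable mesh is not stated in the abstract.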

2025-12-01T10:26:23Z 18 pages, 9 figures, 5 tables v4: Clarified scope (4D evaluation) and fixed some expression. Under review at IEEE Access Hirohito Arai http://arxiv.org/abs/2604.03716v1 CGHair: Compact Gaussian Hair Reconstruction with Card Clustering 2026-04-04T12:44:08Z

We present a compact pipeline for high-fidelity hair reconstruction from multi-view images. While recent 3D Gaussian Splatting (3DGS) methods achieve realistic results, they often require millions of primitives, leading to high storage and rendering costs. Observing that hair exhibits structural and visual similarities across a hairstyle, we cluster strands into representative hair cards and group these into shared texture codebooks. Our approach integrates this structure with 3DGS rendering, significantly reducing reconstruction time and storage while maintaining comparable visual quality. In addition, we propose a method, accelerated by a generative prior, to reconstruct the initial strand geometry from a set of images. Our experiments demonstrate a 4-fold reduction in strand reconstruction time and comparable rendering performance with an over 200x lower memory footprint.

2026-04-04T12:44:08Z Accepted to CVPR 2026. This arXiv version is not the final published version Haimin Luo Srinjay Sarkar Albert Mosella-Montoro Francisco Vicente Carrasco Fernando De la Torre http://arxiv.org/abs/2007.00308v4 Polar Stroking: New Theory and Methods for Stroking Paths 2026-04-04T03:23:24Z

Stroking and filling are the two basic rendering operations on paths in vector graphics. The theory of filling a path is well-understood in terms of contour integrals and winding numbers, but when path rendering standards specify stroking, they resort to the analogy of painting pixels with a brush that traces the outline of the path. This means important standards such as PDF, SVG, and PostScript lack a rigorous way to say what samples are inside or outside a stroked path. Our work fills this gap with a principled theory of stroking. Guided by our theory, we develop a novel polar stroking method to render stroked paths robustly with an intuitive way to bound the tessellation error without needing recursion. Because polar stroking guarantees small uniform steps in tangent angle, it provides an efficient way to accumulate arc length along a path for texturing or dashing. While this paper focuses on developing the theory of our polar stroking method, we have successfully implemented our methods on modern programmable GPUs.

2020-07-01T08:03:09Z 15 pages, 19 figures, ACM Trans. on Graphics (Proceedings of SIGGRAPH 2020); corrected Fig. 8, Eq. 6, and Eq. 12; grammar/typo fixes; uses updated acmart template ACM Transactions on Graphics, Vol. 39, No. 4 (2020) 145:1-15 Mark J. Kilgard 10.1145/3386569.3392458 http://arxiv.org/abs/2604.03462v1 SpectralSplat: Appearance-Disentangled Feed-Forward Gaussian Splatting for Driving Scenes 2026-04-03T21:12:25Z

Feed-forward 3D Gaussian Splatting methods have achieved impressive reconstruction quality for autonomous driving scenes, yet they entangle scene geometry with transient appearance properties such as lighting, weather, and time of day. This coupling prevents relighting, appearance transfer, and consistent rendering across multi-traversal data captured under varying environmental conditions. We present SpectralSplat, a method that disentangles appearance from geometry within a feed-forward Gaussian Splatting framework. Our key insight is to factor color prediction into an appearance-agnostic base stream and an appearance-conditioned adapted stream, both produced by a shared MLP conditioned on a global appearance embedding derived from DINOv2 features. To enforce disentanglement, we train with paired observations generated by a hybrid relighting pipeline that combines physics-based intrinsic decomposition with diffusion-based generative refinement, and supervise with complementary consistency, reconstruction, cross-appearance, and base color losses. We further introduce an appearance-adaptable temporal history that stores appearance-agnostic features, enabling accumulated Gaussians to be re-rendered under arbitrary target appearances. Experiments demonstrate that SpectralSplat preserves the reconstruction quality of the underlying backbone while enabling controllable appearance transfer and temporally consistent relighting across driving sequences.

2026-04-03T21:12:25Z Under review Quentin Herau Tianshuo Xu Depu Meng Jiezhi Yang Chensheng Peng Spencer Sherk Yihan Hu Wei Zhan http://arxiv.org/abs/2604.03406v1 SASAV: Self-Directed Agent for Scientific Analysis and Visualization 2026-04-03T19:09:38Z

With recent advances in frontier multimodal large language models (MLLMs) for data understanding and visual reasoning, the role of LLMs has evolved from passive LLM-as-an-interface to proactive LLM-as-a-judge, enabling deeper integration into the scientific data analysis and visualization pipelines. However, existing scientific visualization agents still rely on domain experts to provide prior knowledge for specific datasets or visualization-oriented objective functions to guide the workflow through iterative feedback. This reactive, data-dependent, human-in-the-loop (HITL) paradigm is time-consuming and does not scale effectively to large-scale scientific data. In this work, we propose a Self-Directed Agent for Scientific Analysis and Visualization (SASAV), the first fully autonomous AI agent to perform scientific data analysis and generate insightful visualizations without any external prompting or HITL feedback. SASAV is a multi-agent system that automatically orchestrates data exploration workflows through our proposed components, including automated data profiling, context-aware knowledge retrieval, and reasoning-driven visualization parameter exploration, while supporting downstream interactive visualization tasks. This work establishes a foundational building block for the future AI for Science to accelerate scientific discovery and innovation at scale.

2026-04-03T19:09:38Z Jianxin Sun David Lenz Tom Peterka Hongfeng Yu http://arxiv.org/abs/2604.02851v1 Streaming Real-Time Rendered Scenes as 3D Gaussians 2026-04-03T08:11:27Z

Cloud rendering is widely used in gaming and XR to overcome limited client-side GPU resources and to support heterogeneous devices. Existing systems typically deliver the rendered scene as a 2D video stream, which tightly couples the transmitted content to the server-rendered viewpoint and limits latency compensation to image-space reprojection or warping. In this paper, we investigate an alternative approach based on streaming a live 3D Gaussian Splatting (3DGS) scene representation instead of only rendered video. We present a Unity-based prototype in which a server constructs and continuously optimizes a 3DGS model from real-time rendered reference views, while streaming the evolving representation to remote clients using full model snapshots and incremental updates supporting relighting and rigid object dynamics. The clients reconstruct the streamed Gaussian model locally and render their current viewpoint from the received representation. This approach aims to improve viewpoint flexibility for latency compensation and to better amortize server-side scene modeling across multiple users than per-user rendering and video streaming. We describe the system design, evaluate it, and compare it with conventional image warping.

2026-04-03T08:11:27Z Matti Siekkinen Teemu Kämäräinen http://arxiv.org/abs/2604.02586v1 TrackerSplat: Exploiting Point Tracking for Fast and Robust Dynamic 3D Gaussians Reconstruction 2026-04-02T23:43:55Z

Recent advancements in 3D Gaussian Splatting (3DGS) have demonstrated its potential for efficient and photorealistic 3D reconstruction, which is crucial for diverse applications such as robotics and immersive media. However, current Gaussian-based methods for dynamic scene reconstruction struggle with large inter-frame displacements, leading to artifacts and temporal inconsistencies under fast object motions. To address this, we introduce TrackerSplat, a novel method that integrates advanced point tracking methods to enhance the robustness and scalability of 3DGS for dynamic scene reconstruction. TrackerSplat utilizes off-the-shelf point tracking models to extract pixel trajectories and triangulate per-view pixel trajectories onto 3D Gaussians to guide the relocation, rotation, and scaling of Gaussians before training. This strategy effectively handles large displacements between frames, dramatically reducing the fading and recoloring artifacts prevalent in prior methods. By accurately positioning Gaussians prior to gradient-based optimization, TrackerSplat overcomes the quality degradation associated with large frame gaps when processing multiple adjacent frames in parallel across multiple devices, thereby boosting reconstruction throughput while preserving rendering quality. Experiments on real-world datasets confirm the robustness of TrackerSplat in challenging scenarios with significant displacements, achieving superior throughput under parallel settings and maintaining visual quality compared to baselines. The code is available at https://github.com/yindaheng98/TrackerSplat.

2026-04-02T23:43:55Z 11 pages, 6 figures SA Conference Papers '25: Proceedings of the SIGGRAPH Asia 2025 Conference Papers Article No.: 71, Pages 1 - 11 Daheng Yin Isaac Ding Yili Jin Jianxin Shi Jiangchuan Liu 10.1145/3757377.3763829