http://arxiv.org/abs/2603.19234v1 Matryoshka Gaussian Splatting 2026-03-19T17:59:56Z

The ability to render scenes at adjustable fidelity from a single model, known as level of detail (LoD), is crucial for practical deployment of 3D Gaussian Splatting (3DGS). Existing discrete LoD methods expose only a limited set of operating points, while concurrent continuous LoD approaches enable smoother scaling but often suffer noticeable quality degradation at full capacity, making LoD a costly design decision. We introduce Matryoshka Gaussian Splatting (MGS), a training framework that enables continuous LoD for standard 3DGS pipelines without sacrificing full-capacity rendering quality. MGS learns a single ordered set of Gaussians such that rendering any prefix, the first k splats, produces a coherent reconstruction whose fidelity improves smoothly with increasing budget. Our key idea is stochastic budget training: each iteration samples a random splat budget and optimises both the corresponding prefix and the full set. This strategy requires only two forward passes and introduces no architectural modifications. Experiments across four benchmarks and six baselines show that MGS matches the full-capacity performance of its backbone while enabling a continuous speed-quality trade-off from a single model. Extensive ablations on ordering strategies, training objectives, and model capacity further validate the designs.
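
The stochastic budget training idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the "renderer" here is just a weighted sum of the first k rows of a basis matrix, and the names `render_prefix`, `stochastic_budget_loss`, and `min_k` are hypothetical. It only shows the shape of one training step, in which a random budget k is sampled and the loss combines the prefix rendering with the full-set rendering (two forward passes).

```python
import numpy as np

def render_prefix(weights, basis, k):
    """Toy stand-in for a renderer: sum the contributions of the first k ordered primitives."""
    return weights[:k] @ basis[:k]

def stochastic_budget_loss(weights, basis, target, rng, min_k=1):
    """Loss for one step: error of a randomly sized prefix plus error of the full set.

    Assumption: both terms are plain mean squared errors with equal weight.
    """
    n = weights.shape[0]
    k = int(rng.integers(min_k, n + 1))  # random budget in [min_k, n]
    prefix_err = np.mean((render_prefix(weights, basis, k) - target) ** 2)  # forward pass 1
    full_err = np.mean((render_prefix(weights, basis, n) - target) ** 2)    # forward pass 2
    return prefix_err + full_err, k

rng = np.random.default_rng(0)
basis = np.eye(4)
weights = np.array([1.0, 2.0, 3.0, 4.0])
target = weights @ basis
loss, k = stochastic_budget_loss(weights, basis, target, rng)
print(f"budget k={k}, loss={loss:.4f}")
```

In this toy setting the full set reproduces the target exactly, so the loss comes entirely from the primitives that the sampled prefix leaves out.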

2026-03-19T17:59:56Z project page: https://zhilinguo.github.io/MGS Zhilin Guo Boqiao Zhang Hakan Aktas Kyle Fogarty Jeffrey Hu Nursena Koprucu Aslan Wenzhao Li Canberk Baykal Albert Miao Josef Bengtson Chenliang Zhou Weihao Xia Cristina Nader Vasconcelos Cengiz Oztireli http://arxiv.org/abs/2603.19063v1 Fire as a Service: Augmenting Robot Simulators with Thermally and Visually Accurate Fire Dynamics 2026-03-19T15:55:02Z

Most existing robot simulators prioritize rigid-body dynamics and photorealistic rendering, but largely neglect the thermally and optically complex phenomena that characterize real-world fire environments. For robots envisioned as future firefighters, this limitation hinders both reliable capability evaluation and the generation of representative training data prior to deployment in hazardous scenarios. To address these challenges, we introduce Fire as a Service (FaaS), a novel, asynchronous co-simulation framework that augments existing robot simulators with high-fidelity and computationally efficient fire simulations. Our pipeline enables robots to experience accurate, multi-species thermodynamic heat transfer and visually consistent volumetric smoke without disrupting high-frequency rigid-body control loops. We demonstrate that our framework can be integrated with diverse robot simulators to generate physically accurate fire behavior, benchmark thermal hazards encountered by robotic platforms, and collect realistic multimodal perceptual data. Crucially, its real-time performance supports human-in-the-loop teleoperation, enabling the successful training of reactive, multimodal policies via Behavioral Cloning. By adding fire dynamics to robot simulations, FaaS provides a scalable pathway toward safer, more reliable deployment of robots in fire scenarios.

2026-03-19T15:55:02Z Anton R. Wagner Madhan Balaji Rao Helge Wrede Sören Pirk Xuesu Xiao http://arxiv.org/abs/2603.19053v1 SwiftTailor: Efficient 3D Garment Generation with Geometry Image Representation 2026-03-19T15:47:43Z

Realistic and efficient 3D garment generation remains a longstanding challenge in computer vision and digital fashion. Existing methods typically rely on large vision-language models to produce serialized representations of 2D sewing patterns, which are then transformed into simulation-ready 3D meshes using a garment modeling framework such as GarmentCode. Although these approaches yield high-quality results, they often suffer from slow inference times, ranging from 30 seconds to a minute. In this work, we introduce SwiftTailor, a novel two-stage framework that unifies sewing-pattern reasoning and geometry-based mesh synthesis through a compact geometry image representation. SwiftTailor comprises two lightweight modules: PatternMaker, an efficient vision-language model that predicts sewing patterns from diverse input modalities, and GarmentSewer, an efficient dense prediction transformer that converts these patterns into a novel Garment Geometry Image, encoding the 3D surface of all garment panels in a unified UV space. The final 3D mesh is reconstructed through an efficient inverse mapping process that incorporates remeshing and dynamic stitching algorithms to directly assemble the garment, thereby amortizing the cost of physical simulation. Extensive experiments on the Multimodal GarmentCodeData demonstrate that SwiftTailor achieves state-of-the-art accuracy and visual fidelity while significantly reducing inference time. This work offers a scalable, interpretable, and high-performance solution for next-generation 3D garment generation.

2026-03-19T15:47:43Z CVPR 2026 Phuc Pham Uy Dieu Tran Binh-Son Hua Phong Nguyen http://arxiv.org/abs/2510.20558v2 From Far and Near: Perceptual Evaluation of Crowd Representations Across Levels of Detail 2026-03-19T15:04:19Z

In this paper, we investigate how users perceive the visual quality of crowd character representations at different levels of detail (LoD) and viewing distances. Each representation, including geometric meshes, image-based impostors, Neural Radiance Fields (NeRFs), and 3D Gaussians, exhibits distinct trade-offs between visual fidelity and computational performance. Our qualitative and quantitative results provide insights to guide the design of perceptually optimized LoD strategies for crowd rendering.

2025-10-23T13:39:18Z Xiaohan Sun Carol O'Sullivan http://arxiv.org/abs/2512.09162v3 GTAvatar: Bridging Gaussian Splatting and Texture Mapping for Relightable and Editable Gaussian Avatars 2026-03-19T12:23:13Z

Recent advancements in Gaussian Splatting have enabled increasingly accurate reconstruction of photorealistic head avatars, opening the door to numerous applications in visual effects, videoconferencing, and virtual reality. This, however, comes at the cost of the intuitive editability offered by traditional triangle mesh-based methods. In contrast, we propose a method that combines the accuracy and fidelity of 2D Gaussian Splatting with the intuitiveness of UV texture mapping. By embedding each canonical Gaussian primitive's local frame into a patch in the UV space of a template mesh in a computationally efficient manner, we reconstruct continuous editable material head textures from a single monocular video on a conventional UV domain. Furthermore, we leverage an efficient physically based reflectance model to enable relighting and editing of these intrinsic material maps. Through extensive comparisons with state-of-the-art methods, we demonstrate the accuracy of our reconstructions, the quality of our relighting results, and the ability to provide intuitive controls for modifying an avatar's appearance and geometry via texture mapping without additional optimization.

2025-12-09T22:19:28Z Accepted to Eurographics 2026. Project page: https://kelianb.github.io/GTAvatar/ Kelian Baert Mae Younes Francois Bourel Marc Christie Adnane Boukhayma 10.1111/cgf.70351 http://arxiv.org/abs/2603.18707v1 From ex(p) to poly: Gaussian Splatting with Polynomial Kernels 2026-03-19T10:05:38Z

Recent advancements in Gaussian Splatting (3DGS) have introduced various modifications to the original kernel, resulting in significant performance improvements. However, many of these kernel changes are incompatible with existing datasets optimized for the original Gaussian kernel, presenting a challenge for widespread adoption. In this work, we address this challenge by proposing an alternative kernel that maintains compatibility with existing datasets while improving computational efficiency. Specifically, we replace the original exponential kernel with a polynomial approximation combined with a ReLU function. This modification allows for more aggressive culling of Gaussians, leading to enhanced performance across different 3DGS implementations. Our results show a notable performance improvement of 4 to 15% with negligible impact on image quality. We also provide a detailed mathematical analysis of the new kernel and discuss its potential benefits for 3DGS implementations on NPU hardware.
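
To illustrate why a polynomial kernel combined with a ReLU permits more aggressive culling, here is a small sketch. The abstract does not give the paper's actual polynomial, so the form used here, max(0, 1 - x/n)^n, is an assumption chosen only because it is a well-known polynomial approximation of exp(-x); `poly_relu_kernel` and the degree `n` are hypothetical names. The point is that the clamped polynomial is exactly zero for x >= n, whereas the exponential never reaches zero.

```python
import numpy as np

def exp_kernel(x):
    """Exponential falloff exp(-x), x >= 0 (e.g. half the squared Mahalanobis distance)."""
    return np.exp(-x)

def poly_relu_kernel(x, n=4):
    """Assumed stand-in kernel: max(0, 1 - x/n)^n.

    The inner max(0, .) is a ReLU; beyond x = n the kernel is exactly zero,
    so a primitive contributes nothing there and can be culled.
    """
    return np.maximum(0.0, 1.0 - x / n) ** n

x = np.linspace(0.0, 8.0, 9)
for xi, e, p in zip(x, exp_kernel(x), poly_relu_kernel(x)):
    print(f"x={xi:.1f}  exp={e:.5f}  poly={p:.5f}")
```

With this assumed form the support radius is set directly by `n`, which trades approximation accuracy near the centre against how early the kernel cuts off.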

2026-03-19T10:05:38Z Joerg H. Mueller Martin Winter Markus Steinberger http://arxiv.org/abs/2510.02691v3 FSFSplatter: Build Surface and Novel Views with Sparse-Views within 2min 2026-03-19T08:27:28Z

Gaussian Splatting has become a leading reconstruction technique, known for its high-quality novel view synthesis and detailed reconstruction. However, most existing methods require dense, calibrated views. Reconstructing from free sparse images often leads to poor surfaces due to limited overlap and overfitting. We introduce FSFSplatter, a new approach for fast surface reconstruction from free sparse images. Our method integrates end-to-end dense Gaussian initialization, camera parameter estimation, and geometry-enhanced scene optimization. Specifically, FSFSplatter employs a large Transformer to encode multi-view images and generates a dense and geometrically consistent Gaussian scene initialization via a self-splitting Gaussian head. It eliminates local floaters through contribution-based pruning and mitigates overfitting to limited views by leveraging depth and multi-view feature supervision with differentiable camera parameters during rapid optimization. FSFSplatter outperforms current state-of-the-art methods on widely used DTU, Replica, and BlendedMVS datasets.

2025-10-03T03:17:00Z Yibin Zhao Yihan Pan Jun Nan Liwei Chen Jianjun Yi http://arxiv.org/abs/2412.10488v4 SVGBuilder: Component-Based Colored SVG Generation with Text-Guided Autoregressive Transformers 2026-03-19T07:29:34Z

Scalable Vector Graphics (SVG) are essential XML-based formats for versatile graphics, offering resolution independence and scalability. Unlike raster images, SVGs use geometric shapes and support interactivity, animation, and manipulation via CSS and JavaScript. Current SVG generation methods face challenges related to high computational costs and complexity. In contrast, human designers use component-based tools for efficient SVG creation. Inspired by this, SVGBuilder introduces a component-based, autoregressive model for generating high-quality colored SVGs from textual input. It significantly reduces computational overhead and improves efficiency compared to traditional methods. Our model generates SVGs up to 604 times faster than optimization-based approaches. To address the limitations of existing SVG datasets and support our research, we introduce ColorSVG-100K, the first large-scale dataset of colored SVGs, comprising 100,000 graphics. This dataset fills the gap in color information for SVG generation models and enhances diversity in model training. Evaluation against state-of-the-art models demonstrates SVGBuilder's superior performance in practical applications, highlighting its efficiency and quality in generating complex SVG graphics.

2024-12-13T15:24:11Z Accepted by AAAI 2025. Project: https://svgbuilder.github.io Proceedings of the AAAI Conference on Artificial Intelligence, 2025, 39(3), 2358-2366 Zehao Chen Rong Pan 10.1609/aaai.v39i3.32236 http://arxiv.org/abs/2507.02861v3 LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans 2026-03-19T02:04:27Z

We propose LiteReality, a novel pipeline that converts RGB-D scans of indoor environments into compact, realistic, and interactive 3D virtual replicas. LiteReality not only reconstructs scenes that visually resemble reality but also supports key features essential for graphics pipelines -- such as object individuality, articulation, high-quality physically based rendering materials, and physically based interaction. At its core, LiteReality first performs scene understanding and parses the results into a coherent 3D layout and objects with the help of a structured scene graph. It then reconstructs the scene by retrieving the most visually similar 3D artist-crafted models from a curated asset database. Next, the Material Painting module enhances realism by recovering high-quality, spatially varying materials. Finally, the reconstructed scene is integrated into a simulation engine with basic physical properties to enable interactive behavior. The resulting scenes are compact, editable, and fully compatible with standard graphics pipelines, making them suitable for applications in AR/VR, gaming, robotics, and digital twins. In addition, LiteReality introduces a training-free object retrieval module that achieves state-of-the-art similarity performance on the Scan2CAD benchmark, along with a robust material painting module capable of transferring appearances from images of any style to 3D assets -- even under severe misalignment, occlusion, and poor lighting. We demonstrate the effectiveness of LiteReality on both real-life scans and public datasets. Project page: https://litereality.github.io; Video: https://www.youtube.com/watch?v=ecK9m3LXg2c

2025-07-03T17:59:55Z Project Page: https://litereality.github.io; Video: https://www.youtube.com/watch?v=ecK9m3LXg2c&feature=youtu.be Camera-Ready Version Zhening Huang Xiaoyang Wu Fangcheng Zhong Hengshuang Zhao Matthias Nießner Joan Lasenby http://arxiv.org/abs/2603.17995v1 LoST: Level of Semantics Tokenization for 3D Shapes 2026-03-18T17:56:06Z

Tokenization is a fundamental technique in the generative modeling of various modalities. In particular, it plays a critical role in autoregressive (AR) models, which have recently emerged as a compelling option for 3D generation. However, optimal tokenization of 3D shapes remains an open question. State-of-the-art (SOTA) methods primarily rely on geometric level-of-detail (LoD) hierarchies, originally designed for rendering and compression. These spatial hierarchies are often token-inefficient and lack semantic coherence for AR modeling. We propose Level-of-Semantics Tokenization (LoST), which orders tokens by semantic salience, such that early prefixes decode into complete, plausible shapes that possess principal semantics, while subsequent tokens refine instance-specific geometric and semantic details. To train LoST, we introduce Relational Inter-Distance Alignment (RIDA), a novel 3D semantic alignment loss that aligns the relational structure of the 3D shape latent space with that of the semantic DINO feature space. Experiments show that LoST achieves SOTA reconstruction, surpassing previous LoD-based 3D shape tokenizers by large margins on both geometric and semantic reconstruction metrics. Moreover, LoST achieves efficient, high-quality AR 3D generation and enables downstream tasks like semantic retrieval, while using only 0.1%-10% of the tokens needed by prior AR models.

2026-03-18T17:56:06Z CVPR 2026; Project website: https://lost3d.github.io Niladri Shekhar Dutt Zifan Shi Paul Guerrero Chun-Hao Paul Huang Duygu Ceylan Niloy J. Mitra Xuelin Chen http://arxiv.org/abs/2506.13212v3 Volumetric Functional Maps 2026-03-18T14:20:18Z

Computing volumetric correspondences between 3D shapes is a prominent tool for medical and industrial applications. In this work, we pave the way for spectral volume mapping, extending for the first time the surface-based functional maps framework. We show that the eigenfunctions of the volumetric Laplace operator define a functional space that is suitable for high-quality signal transfer. We also experiment with various techniques that edit this functional space, porting them to volume domains. We validate our method on novel volumetric datasets and on tetrahedralizations of well established surface datasets, also showcasing practical applications involving both discrete and continuous signal mapping, for segmentation transfer, mesh connectivity transfer and solid texturing. Finally, we show that the volumetric spectrum greatly improves the accuracy for classical shape matching tasks among surfaces, consistently outperforming surface-only spectral methods.

2025-06-16T08:13:57Z Filippo Maggioli Simone Melzi Marco Livesu http://arxiv.org/abs/2603.17704v1 DancingBox: A Lightweight MoCap System for Character Animation from Physical Proxies 2026-03-18T13:23:20Z

Creating compelling 3D character animations typically requires either expert use of professional software or expensive motion capture systems operated by skilled actors. We present DancingBox, a lightweight, vision-based system that makes motion capture accessible to novices by reimagining the process as digital puppetry. Instead of tracking precise human motions, DancingBox captures the approximate movements of everyday objects manipulated by users with a single webcam. These coarse proxy motions are then refined into realistic character animations by conditioning a generative motion model on bounding-box representations, enriched with human motion priors learned from large-scale datasets. To overcome the lack of paired proxy-animation data, we synthesize training pairs by converting existing motion capture sequences into proxy representations. A user study demonstrates that DancingBox enables intuitive and creative character animation using diverse proxies, from plush toys to bananas, lowering the barrier to entry for novice animators.

2026-03-18T13:23:20Z Accepted to CHI2026 Haocheng Yuan Adrien Bousseau Hao Pan Lei Zhong Changjian Li http://arxiv.org/abs/2603.14925v2 Workflow-Aware Structured Layer Decomposition for Illustration Production 2026-03-18T13:18:12Z

Recent generative image editing methods adopt layered representations to mitigate the entangled nature of raster images and improve controllability, typically relying on object-based segmentation. However, such strategies may fail to capture the structural and stylized properties of human-created images, such as anime illustrations. To solve this issue, we propose a workflow-aware structured layer decomposition framework tailored to the illustration production of anime artwork. Inspired by the creation pipeline of anime production, our method decomposes the illustration into semantically meaningful production layers, including line art, flat color, shadow, and highlight. To decouple all these layers, we introduce lightweight layer semantic embeddings to provide specific task guidance for each layer. Furthermore, a set of layer-wise losses is incorporated to supervise the training process of individual layers. To overcome the lack of ground-truth layered data, we construct a high-quality illustration dataset that simulates the standard anime production workflow. Experiments demonstrate that our method achieves accurate and visually coherent layer decompositions. We believe that the resulting layered representation further enables downstream tasks such as recoloring and embedding texture, supporting content creation and illustration editing. Code is available at: https://github.com/zty0304/Anime-layer-decomposition

2026-03-16T07:28:37Z 17 pages, 15 figures Tianyu Zhang Dongchi Li Keiichi Sawada Haoran Xie http://arxiv.org/abs/2509.25857v2 Vector sketch animation generation with differentiable motion trajectories 2026-03-18T10:19:37Z

Sketching is a direct and inexpensive means of visual expression. Though image-based sketching has been well studied, video-based sketch animation generation is still very challenging due to the temporal coherence requirement. In this paper, we propose a novel end-to-end automatic generation approach for vector sketch animation. To solve the flickering issue, we introduce a Differentiable Motion Trajectory (DMT) representation that describes the frame-wise movement of stroke control points using differentiable polynomial-based trajectories. DMT enables global semantic gradient propagation across multiple frames, significantly improving the semantic consistency and temporal coherence, and producing high-framerate output. DMT employs a Bernstein basis to balance the sensitivity of polynomial parameters, thus achieving more stable optimization. Instead of implicit fields, we introduce sparse track points for explicit spatial modeling, which improves efficiency and supports long-duration video processing. Evaluations on DAVIS and LVOS datasets demonstrate the superiority of our approach over SOTA methods. Cross-domain validation on 3D models and text-to-video data confirms the robustness and compatibility of our approach.
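
As a rough illustration of a polynomial trajectory expressed in a Bernstein basis, the sketch below evaluates the path of one stroke control point over normalized time t in [0, 1]. It is not the paper's DMT implementation: the function names, the cubic degree, and the coefficient values are all assumptions. It only shows the standard Bernstein form, in which every coefficient has a bounded, comparable influence on the curve and the trajectory starts and ends exactly at the first and last coefficients.

```python
import numpy as np
from math import comb

def bernstein_basis(degree, t):
    """Return a (T, degree+1) matrix of Bernstein polynomials evaluated at times t in [0, 1]."""
    t = np.asarray(t, dtype=float)
    return np.stack(
        [comb(degree, i) * t**i * (1.0 - t) ** (degree - i) for i in range(degree + 1)],
        axis=-1,
    )

def trajectory(coeffs, t):
    """Positions of one control point over time.

    coeffs: (degree+1, 2) array of 2D trajectory coefficients (the learnable parameters).
    t:      (T,) array of normalized frame times.
    """
    degree = coeffs.shape[0] - 1
    return bernstein_basis(degree, t) @ coeffs

# Hypothetical cubic trajectory sampled at 5 frames.
coeffs = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0], [4.0, 0.0]])
frames = np.linspace(0.0, 1.0, 5)
print(trajectory(coeffs, frames))
```

Because the mapping from coefficients to positions is a fixed matrix product, gradients from any frame flow back to all coefficients, which is the property a differentiable trajectory representation relies on.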

2025-09-30T06:53:04Z 14 pages, 12 figures Xinding Zhu Xinye Yang Shuyang Zheng Zhexin Zhang Fei Gao Jing Huang Jiazhou Chen http://arxiv.org/abs/2603.17337v1 Scale-Aware Navigation of Astronomical Survey Imagery Data on High Resolution Immersive Displays 2026-03-18T04:04:39Z

Upcoming astronomical surveys produce imagery that spans many orders of magnitude in spatial scale, requiring scientists to reason fluidly between global structure and local detail. Data from the Vera C. Rubin Observatory exemplifies this challenge, as traditional desktop-based workflows often rely on discrete views or static cutouts that fragment context during exploration. This paper presents a design-oriented framework for scale-aware navigation of astronomical survey imagery in high-resolution immersive display environments. We illustrate these principles through representative usage scenarios using Vera Rubin Observatory and Milky Way survey imagery deployed in room-scale immersive environments, including tiled high-resolution displays and curved immersive systems. Our goal is to contribute design insights that inform the development of immersive interaction paradigms for exploratory analysis of extreme-scale scientific imagery.

2026-03-18T04:04:39Z 4 pages, 2 figures, to appear in IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (IEEE VRW 2026) Ava Nederlander Stony Brook University Zainab Aamir Stony Brook University Arie E. Kaufman Stony Brook University