https://arxiv.org/api/c/p8WVjOgmJvJoFeT+4Xdsu7dxY 2026-03-24T09:58:00Z 8847 30 15 http://arxiv.org/abs/2603.14925v2 Workflow-Aware Structured Layer Decomposition for Illustration Production 2026-03-18T13:18:12Z Recent generative image editing methods adopt layered representations to mitigate the entangled nature of raster images and improve controllability, typically relying on object-based segmentation. However, such strategies may fail to capture the structural and stylized properties of human-created images, such as anime illustrations. To solve this issue, we propose a workflow-aware structured layer decomposition framework tailored to the illustration production of anime artwork. Inspired by the creation pipeline of anime production, our method decomposes the illustration into semantically meaningful production layers, including line art, flat color, shadow, and highlight. To decouple all these layers, we introduce lightweight layer semantic embeddings to provide specific task guidance for each layer. Furthermore, a set of layer-wise losses is incorporated to supervise the training process of individual layers. To overcome the lack of ground-truth layered data, we construct a high-quality illustration dataset that simulates the standard anime production workflow. Experiments demonstrate that our method achieves accurate and visually coherent layer decompositions. We believe that the resulting layered representation further enables downstream tasks such as recoloring and texture embedding, supporting content creation and illustration editing. Code is available at: https://github.com/zty0304/Anime-layer-decomposition 2026-03-16T07:28:37Z 17 pages, 15 figures Tianyu Zhang Dongchi Li Keiichi Sawada Haoran Xie http://arxiv.org/abs/2509.25857v2 Vector sketch animation generation with differentiable motion trajectories 2026-03-18T10:19:37Z Sketching is a direct and inexpensive means of visual expression. 
Though image-based sketching has been well studied, video-based sketch animation generation is still very challenging due to the temporal coherence requirement. In this paper, we propose a novel end-to-end automatic generation approach for vector sketch animation. To solve the flickering issue, we introduce a Differentiable Motion Trajectory (DMT) representation that describes the frame-wise movement of stroke control points using differentiable polynomial-based trajectories. DMT enables global semantic gradient propagation across multiple frames, significantly improving the semantic consistency and temporal coherence, and producing high-framerate output. DMT employs a Bernstein basis to balance the sensitivity of polynomial parameters, thus achieving more stable optimization. Instead of implicit fields, we introduce sparse track points for explicit spatial modeling, which improves efficiency and supports long-duration video processing. Evaluations on DAVIS and LVOS datasets demonstrate the superiority of our approach over SOTA methods. Cross-domain validation on 3D models and text-to-video data confirms the robustness and compatibility of our approach. 2025-09-30T06:53:04Z 14 pages, 12 figures Xinding Zhu Xinye Yang Shuyang Zheng Zhexin Zhang Fei Gao Jing Huang Jiazhou Chen http://arxiv.org/abs/2603.20284v1 STAC: Plug-and-Play Spatio-Temporal Aware Cache Compression for Streaming 3D Reconstruction 2026-03-18T06:36:46Z Online 3D reconstruction from streaming inputs requires both long-term temporal consistency and efficient memory usage. Although causal VGGT transformers address this challenge through a key-value (KV) cache mechanism, the cache grows linearly with the stream length, creating a major memory bottleneck. Under limited memory budgets, early cache eviction significantly degrades reconstruction quality and temporal consistency. In this work, we observe that attention in causal transformers for 3D reconstruction exhibits intrinsic spatio-temporal sparsity. 
Based on this insight, we propose STAC, a Spatio-Temporally Aware Cache Compression framework for streaming 3D reconstruction with large causal transformers. STAC consists of three key components: (1) a Working Temporal Token Caching mechanism that preserves long-term informative tokens using decayed cumulative attention scores; (2) a Long-term Spatial Token Caching scheme that compresses spatially redundant tokens into voxel-aligned representations for memory-efficient storage; and (3) a Chunk-based Multi-frame Optimization strategy that jointly processes consecutive frames to improve temporal coherence and GPU efficiency. Extensive experiments show that STAC achieves state-of-the-art reconstruction quality while reducing memory consumption by nearly 10x and accelerating inference by 4x, substantially improving the scalability of real-time 3D reconstruction in streaming settings. 2026-03-18T06:36:46Z 10 pages, 6 figures. Accepted by CVPR 2026 Runze Wang Yuxuan Song Youcheng Cai Ligang Liu http://arxiv.org/abs/2603.17337v1 Scale-Aware Navigation of Astronomical Survey Imagery Data on High Resolution Immersive Displays 2026-03-18T04:04:39Z Upcoming astronomical surveys produce imagery that spans many orders of magnitude in spatial scale, requiring scientists to reason fluidly between global structure and local detail. Data from the Vera C. Rubin Observatory exemplifies this challenge, as traditional desktop-based workflows often rely on discrete views or static cutouts that fragment context during exploration. This paper presents a design-oriented framework for scale-aware navigation of astronomical survey imagery in high-resolution immersive display environments. We illustrate these principles through representative usage scenarios using Vera Rubin Observatory and Milky Way survey imagery deployed in room-scale immersive environments, including tiled high-resolution displays and curved immersive systems. 
Our goal is to contribute design insights that inform the development of immersive interaction paradigms for exploratory analysis of extreme-scale scientific imagery. 2026-03-18T04:04:39Z 4 pages, 2 figures, to appear in IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (IEEE VRW 2026) Ava Nederlander Stony Brook University Zainab Aamir Stony Brook University Arie E. Kaufman Stony Brook University http://arxiv.org/abs/2603.16866v1 ManiTwin: Scaling Data-Generation-Ready Digital Object Dataset to 100K 2026-03-17T17:59:49Z Learning in simulation provides a useful foundation for scaling robotic manipulation capabilities. However, this paradigm often suffers from a lack of data-generation-ready digital assets, in both scale and diversity. In this work, we present ManiTwin, an automated and efficient pipeline for generating data-generation-ready digital object twins. Our pipeline transforms a single image into a simulation-ready, semantically annotated 3D asset, enabling large-scale robotic manipulation data generation. Using this pipeline, we construct ManiTwin-100K, a dataset containing 100K high-quality annotated 3D assets. Each asset is equipped with physical properties, language descriptions, functional annotations, and verified manipulation proposals. Experiments demonstrate that ManiTwin provides an efficient asset synthesis and annotation workflow, and that ManiTwin-100K offers high-quality and diverse assets for manipulation data generation, random scene synthesis, and VQA data generation, establishing a strong foundation for scalable simulation data synthesis and policy learning. Our webpage is available at https://manitwin.github.io/. 
2026-03-17T17:59:49Z Website: https://manitwin.github.io/ Kaixuan Wang Tianxing Chen Jiawei Liu Honghao Su Shaolong Zhu Minxuan Wang Zixuan Li Yue Chen Huan-ang Gao Yusen Qin Jiawei Wang Qixuan Zhang Lan Xu Jingyi Yu Yao Mu Ping Luo http://arxiv.org/abs/2603.16853v1 BrickSim: A Physics-Based Simulator for Manipulating Interlocking Brick Assemblies 2026-03-17T17:56:53Z Interlocking brick assemblies provide a standardized yet challenging testbed for contact-rich and long-horizon robotic manipulation, but existing rigid-body simulators do not faithfully capture snap-fit mechanics. We present BrickSim, the first real-time physics-based simulator for interlocking brick assemblies. BrickSim introduces a compact force-based mechanics model for snap-fit connections and solves the resulting internal force distribution using a structured convex quadratic program. Combined with a hybrid architecture that delegates rigid-body dynamics to the underlying physics engine while handling snap-fit mechanics separately, BrickSim enables real-time, high-fidelity simulation of assembly, disassembly, and structural collapse. On 150 real-world assemblies, BrickSim achieves 100% accuracy in static stability prediction with an average solve time of 5 ms. In dynamic drop tests, it also faithfully reproduces real-world structural collapse, precisely mirroring both the occurrence of breakage and the specific breakage locations. Built on Isaac Sim, BrickSim further supports seamless integration with a wide variety of robots and existing pipelines. We demonstrate robotic construction of brick assemblies using BrickSim, highlighting its potential as a foundation for research in dexterous, long-horizon robotic manipulation. BrickSim is open-source, and the code is available at https://github.com/intelligent-control-lab/BrickSim. 
2026-03-17T17:56:53Z 9 pages, 9 figures Haowei Wen Ruixuan Liu Weiyi Piao Siyu Li Changliu Liu http://arxiv.org/abs/2603.16801v1 A low-data, low-cost, and open-source workflow for 3D printing lithographs for digital accessibility of microscopy images 2026-03-17T17:05:28Z Describe an animal without using the verb look. Can you effectively provide an alternative method for interpreting complex microscopy images while preserving the length scale? The world is filled with features too small for our eyes to see: the setae on a gecko's feet, the cuticles covering a rat's whisker, or the fuzziness of a bat's wing. Furthermore, these structures are non-homogeneous, often shifting from stiff to soft. We provide a workflow for producing low-data, low-cost, and open-source lithograph files, allowing tactile accessibility in microscopy images. The lithographs made with this workflow can be printed on a 350 USD 3D printer using 3D files under 100 Mb, for a total cost per print of 0.75 USD. This work seeks to leverage advanced 3D printing to create tactile graphics and art that make science more accessible and enable tactile exploration of biological structures. The framework in this text is aligned with a GitHub repository that will be constantly updated, allowing tactile media to be created as 3D printing and lithography become more streamlined in the years to come. 2026-03-17T17:05:28Z 3 figures, Abigale Stangl and Andrew K. Schulz are co-corresponding authors Robert Faulkner Natalia Gonzalez-Vazquez Victoria Gamez Karly E. Cohen Gunther Richter Abigale Stangl Andrew K. Schulz http://arxiv.org/abs/2603.16612v1 Retrieval-Augmented Sketch-Guided 3D Building Generation 2026-03-17T14:51:52Z In the early design stage of Japanese detached houses, the lack of a unified design representation among clients, sales representatives, and designers leads to design drift and inefficient feedback. 
Usually, sketches handed off by sales representatives may lose details because they are drawn quickly, which reduces the fidelity of subsequent 3D generation using generative AI models. The generated 3D model typically takes the form of a single unified mesh, preventing component-level editing. To solve these issues, we propose a multi-stage 3D generative design framework capable of producing architectural models from rough design sketches. The framework combines generative and retrieval-based methods to enable component-level editing and personalized customization. It adopts a multimodal representation for 3D model generation, applies component segmentation to localize architectural components such as windows and doors, and uses retrieval to support targeted replacement of components. Experiments show that the work enables modular customization, which is thought to be suitable for personalized architectural design. This work introduces a multi-stage sketch-to-3D framework for Japanese detached houses, provides facade and component datasets, and shows effectiveness through quantitative and expert evaluations. 2026-03-17T14:51:52Z 10 pages, 4 figures, Proceeding of CAADRIA 2026 Zhengyang Wang Nuttapong Rochanavibhata Yuxiao Ren Xusheng Du Ye Zhang Haoran Xie http://arxiv.org/abs/2603.16566v1 VideoMatGen: PBR Materials through Joint Generative Modeling 2026-03-17T14:24:20Z We present a method for generating physically-based materials for 3D shapes based on a video diffusion transformer architecture. Our method is conditioned on input geometry and a text description, and jointly models multiple material properties (base color, roughness, metallicity, height map) to form physically plausible materials. We further introduce a custom variational auto-encoder which encodes multiple material modalities into a compact latent space, which enables joint generation of multiple modalities without increasing the number of tokens. 
Our pipeline generates high-quality materials for 3D shapes given a text prompt, compatible with common content creation tools. 2026-03-17T14:24:20Z Jon Hasselgren Zheng Zeng Milos Hasan Jacob Munkberg http://arxiv.org/abs/2512.11237v2 WildCap: Facial Albedo Capture in the Wild via Hybrid Inverse Rendering 2026-03-17T13:12:15Z Existing methods achieve high-quality facial albedo capture under controllable lighting, which increases capture cost and limits usability. We propose WildCap, a novel method for high-quality facial albedo capture from a smartphone video recorded in the wild. To disentangle high-quality albedo from complex lighting effects in in-the-wild captures, we propose a novel hybrid inverse rendering framework. We first apply a data-driven method, i.e., SwitchLight, to convert the captured images into more constrained conditions and then adopt model-based inverse rendering. However, unavoidable local artifacts in network predictions, such as shadow-baking, are non-physical and thus hinder accurate inverse rendering of lighting and material. To address this, we propose a novel texel grid lighting model to explain non-physical effects as clean albedo illuminated by local physical lighting. During optimization, we jointly sample a diffusion prior for the albedo map and optimize the lighting, effectively resolving scale ambiguity between local lights and albedo. Other reflectance maps are then predicted from the albedo. Our method achieves significantly better results than prior art in the same capture setup, closing the quality gap between in-the-wild and controllable recordings by a large margin. 2025-12-12T02:37:03Z CVPR 2026. 
project page: https://yxuhan.github.io/WildCap/index.html; code: https://github.com/yxuhan/WildCap Yuxuan Han Xin Ming Tianxiao Li Zhuofan Shen Qixuan Zhang Lan Xu Feng Xu http://arxiv.org/abs/2603.16478v1 Fast and Reliable Gradients for Deformables Across Frictional Contact Regimes 2026-03-17T13:04:23Z Differentiable simulation establishes the mathematical foundation for solving challenging inverse problems in computer graphics and robotics, such as physical system identification and inverse dynamics control. However, rigor in frictional contact remains the "elephant in the room." Current frameworks often avoid contact singularities via non-Markovian position approximations or heuristic gradients. This lack of mathematical consistency distorts gradients, causing optimization stagnation or failure in complex frictional contact and large-deformation scenarios. We introduce our unified fully GPU-accelerated differentiable simulator, which establishes a rigorous theoretical paradigm through: Long-Horizon Consistency: enforcing strict Markovian dynamics on a coupled position-velocity manifold to prevent gradient collapse; Unified Contact Stability: employing a mass-aligned preconditioner and soft Fischer-Burmeister operator for smooth frictional optimization; Robust Material Identification: resolving FEM singularities via a derived "Within-block Commutation" condition. Our experiments demonstrate our solver's efficacy in bridging the Sim-to-Real gap, delivering precise, low-noise gradients in contact-rich tasks like dexterous manipulation and cloth folding. By mitigating the gradient instability issues common in conventional approaches, our framework significantly enhances the fidelity of physical system identification and control. 
2026-03-17T13:04:23Z Ziqiu Zeng Gang Yang Zhenhao Huang Yulin Li Jason Pho Siyuan Luo Fan Shi http://arxiv.org/abs/2603.16447v1 ProgressiveAvatars: Progressive Animatable 3D Gaussian Avatars 2026-03-17T12:30:27Z In practical real-time XR and telepresence applications, network and computing resources fluctuate frequently. Therefore, a progressive 3D representation is needed. To this end, we propose ProgressiveAvatars, a progressive avatar representation built on a hierarchy of 3D Gaussians grown by adaptive implicit subdivision on a template mesh. 3D Gaussians are defined in face-local coordinates to remain animatable under varying expressions and head motion across multiple detail levels. The hierarchy expands when screen-space signals indicate a lack of detail, allocating resources to important areas. Leveraging importance ranking, ProgressiveAvatars supports incremental loading and rendering, adding new Gaussians as they arrive while preserving previous content, thus achieving smooth quality improvements across varying bandwidths. ProgressiveAvatars enables progressive delivery and progressive rendering under fluctuating network bandwidth and varying compute and memory resources. 2026-03-17T12:30:27Z Accepted to CVPR 2026, Project page: https://ustc3dv.github.io/ProgressiveAvatars/ Kaiwen Song Jinkai Cui Juyong Zhang http://arxiv.org/abs/2504.05296v3 Let it Snow! Animating 3D Gaussian Scenes with Dynamic Weather Effects via Physics-Guided Score Distillation 2026-03-17T12:14:12Z 3D Gaussian Splatting has recently enabled fast and photorealistic reconstruction of static 3D scenes. However, dynamic editing of such scenes remains a significant challenge. 
We introduce a novel framework, Physics-Guided Score Distillation, to address a fundamental conflict: physics simulation provides a strong motion prior that is insufficient for photorealism, while video-based Score Distillation Sampling (SDS) alone cannot generate coherent motion for complex, multi-particle scenarios. We resolve this through a unified optimization framework where physics simulation guides Score Distillation to jointly refine the motion prior for photorealism while simultaneously optimizing appearance. Specifically, we learn a neural dynamics model that predicts particle motion and appearance, optimized end-to-end via a combined loss integrating Video-SDS for photorealism with our physics-guidance prior. This allows for photorealistic refinements while ensuring the dynamics remain plausible. Our framework enables scene-wide dynamic weather effects, including snowfall, rainfall, fog, and sandstorms, with physically plausible motion. Experiments demonstrate our physics-guided approach significantly outperforms baselines, with ablations confirming this joint refinement is essential for generating coherent, high-fidelity dynamics. 2025-04-07T17:51:21Z Accepted to CVPR 2026. Project webpage: https://galfiebelman.github.io/let-it-snow/ Gal Fiebelman Hadar Averbuch-Elor Sagie Benaim http://arxiv.org/abs/2511.02580v2 TAUE: Training-free Noise Transplant and Cultivation Diffusion Model 2026-03-17T10:21:32Z Despite the remarkable success of text-to-image diffusion models, their output of a single, flattened image remains a critical bottleneck for professional applications requiring layer-wise control. Existing solutions either rely on fine-tuning with large, inaccessible datasets or are training-free yet limited to generating isolated foreground elements, failing to produce a complete and coherent scene. 
To address this, we introduce the Training-free Noise Transplantation and Cultivation Diffusion Model (TAUE), a novel framework for layer-wise image generation that requires neither fine-tuning nor additional data. TAUE embeds global structural information from intermediate denoising latents into the initial noise to preserve spatial coherence, and integrates semantic cues through cross-layer attention sharing to maintain contextual and visual consistency across layers. Extensive experiments demonstrate that TAUE achieves state-of-the-art performance among training-free methods, delivering image quality comparable to fine-tuned models while improving inter-layer consistency. Moreover, it enables new applications, such as layout-aware editing, multi-object composition, and background replacement, indicating potential for interactive, layer-separated generation systems in real-world creative workflows. 2025-11-04T13:56:39Z Accepted to CVPR 2026 Findings. The first two authors contributed equally. Project Page: https://iyatomilab.github.io/TAUE Daichi Nagai Ryugo Morita Shunsuke Kitada Hitoshi Iyatomi http://arxiv.org/abs/2603.16103v1 NanoGS: Training-Free Gaussian Splat Simplification 2026-03-17T03:58:02Z 3D Gaussian Splat (3DGS) enables high-fidelity, real-time novel view synthesis by representing scenes with large sets of anisotropic primitives, but often requires millions of Splats, incurring significant storage and transmission costs. Most existing compression methods rely on GPU-intensive post-training optimization with calibrated images, limiting practical deployment. We introduce NanoGS, a training-free and lightweight framework for Gaussian Splat simplification. Instead of relying on image-based rendering supervision, NanoGS formulates simplification as local pairwise merging over a sparse spatial graph. 
The method approximates a pair of Gaussians with a single primitive using mass-preserving moment matching and evaluates merge quality through a principled merge cost between the original mixture and its approximation. By restricting merge candidates to local neighborhoods and selecting compatible pairs efficiently, NanoGS produces compact Gaussian representations while preserving scene structure and appearance. NanoGS operates directly on existing Gaussian Splat models, runs efficiently on CPU, and preserves the standard 3DGS parameterization, enabling seamless integration with existing rendering pipelines. Experiments demonstrate that NanoGS substantially reduces primitive count while maintaining high rendering fidelity, providing an efficient and practical solution for Gaussian Splat simplification. Our project website is available at https://saliteta.github.io/NanoGS/. 2026-03-17T03:58:02Z Butian Xiong Rong Liu Tiantian Zhou Meida Chen Zhiwen Fan Andrew Feng