http://arxiv.org/abs/2603.24086v1 LGTM: Training-Free Light-Guided Text-to-Image Diffusion Model via Initial Noise Manipulation 2026-03-25T08:46:31Z

Diffusion models have demonstrated high-quality performance in conditional text-to-image generation, particularly with structural cues such as edges, layouts, and depth. However, lighting conditions have received limited attention and remain difficult to control within the generative process. Existing methods handle lighting through a two-stage pipeline that relights images after generation, which is inefficient. Moreover, they rely on fine-tuning with large datasets and heavy computation, limiting their adaptability to new models and tasks. To address this, we propose a novel Training-Free Light-Guided Text-to-Image Diffusion Model via Initial Noise Manipulation (LGTM), which manipulates the initial latent noise of the diffusion process to guide image generation with text prompts and user-specified light directions. Through a channel-wise analysis of the latent space, we find that selectively manipulating latent channels enables fine-grained lighting control without fine-tuning or modifying the pre-trained model. Extensive experiments show that our method surpasses prompt-based baselines in lighting consistency, while preserving image quality and text alignment. This approach introduces new possibilities for dynamic, user-guided light control. Furthermore, it integrates seamlessly with models like ControlNet, demonstrating adaptability across diverse scenarios.
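The abstract does not say how the initial noise is altered, so the following is only a minimal sketch of the general idea of biasing selected channels of the initial latent noise along a light direction. The linear ramp, the `strength` value, the choice of channel 0, and all function names are assumptions made for illustration; this is not the LGTM procedure.

```python
import math
import random


def make_noise(channels, height, width, seed=0):
    """Draw i.i.d. standard-normal latent noise as nested lists [C][H][W]."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(width)]
             for _ in range(height)]
            for _ in range(channels)]


def light_ramp(height, width, direction):
    """Linear ramp that increases along the unit vector of `direction` = (dx, dy)."""
    dx, dy = direction
    norm = math.hypot(dx, dy) or 1.0
    dx, dy = dx / norm, dy / norm
    ramp = []
    for y in range(height):
        row = []
        for x in range(width):
            # Pixel coordinates rescaled to [-1, 1] on each axis.
            u = 2.0 * x / (width - 1) - 1.0 if width > 1 else 0.0
            v = 2.0 * y / (height - 1) - 1.0 if height > 1 else 0.0
            row.append(u * dx + v * dy)
        ramp.append(row)
    return ramp


def bias_noise(noise, direction, channels, strength=0.5):
    """Add the ramp to the selected channels only (hypothetical manipulation).

    Channels not listed in `channels` are copied unchanged.
    """
    height, width = len(noise[0]), len(noise[0][0])
    ramp = light_ramp(height, width, direction)
    out = [[row[:] for row in channel] for channel in noise]
    for c in channels:
        for y in range(height):
            for x in range(width):
                out[c][y][x] += strength * ramp[y][x]
    return out


noise = make_noise(channels=4, height=8, width=8)
# Assumption: channel 0 is the one to bias, and +x is the lit side.
biased = bias_noise(noise, direction=(1.0, 0.0), channels=[0])
```

In a real pipeline the biased tensor would replace the sampler's starting latent; which channels respond to lighting is the kind of question the channel-wise analysis in the abstract addresses.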

2026-03-25T08:46:31Z Accepted to IJCNN2026 Ryugo Morita Stanislav Frolov Brian Bernhard Moser Ko Watanabe Riku Takahashi Andreas Dengel http://arxiv.org/abs/2603.24039v1 SemLayer: Semantic-aware Generative Segmentation and Layer Construction for Abstract Icons 2026-03-25T07:51:04Z

Graphic icons are a cornerstone of modern design workflows, yet they are often distributed as flattened single-path or compound-path graphics, where the original semantic layering is lost. This absence of semantic decomposition hinders downstream tasks such as editing, restyling, and animation. We formalize this problem as semantic layer construction for flattened vector art and introduce SemLayer, a visual generation empowered pipeline that restores editable layered structures. Given an abstract icon, SemLayer first generates a chromatically differentiated representation in which distinct semantic components become visually separable. To recover the complete geometry of each part, including occluded regions, we then perform a semantic completion step that reconstructs coherent object-level shapes. Finally, the recovered parts are assembled into a layered vector representation with inferred occlusion relationships. Extensive qualitative comparisons and quantitative evaluations demonstrate the effectiveness of SemLayer, enabling editing workflows previously inapplicable to flattened vector graphics and establishing semantic layer reconstruction as a practical and valuable task. Project page: https://xxuhaiyang.github.io/SemLayer/

2026-03-25T07:51:04Z Accepted to CVPR 2026 Haiyang Xu Ronghuan Wu Li-Yi Wei Nanxuan Zhao Chenxi Liu Cuong Nguyen Zhuowen Tu Zhaowen Wang http://arxiv.org/abs/2512.14187v3 Establishing Stochastic Object Models from Noisy Data via Ambient Measurement-Integrated Diffusion 2026-03-25T07:43:46Z

Task-based measures of image quality (IQ) are critical for evaluating medical imaging systems, which must account for randomness including anatomical variability. Stochastic object models (SOMs) provide a statistical description of such variability, but conventional mathematical SOMs fail to capture realistic anatomy, while data-driven approaches typically require clean data rarely available in clinical tasks. To address this challenge, we propose AMID, an unsupervised Ambient Measurement-Integrated Diffusion with noise decoupling, which establishes clean SOMs directly from noisy measurements. AMID introduces a measurement-integrated strategy that aligns measurement noise with the diffusion trajectory and explicitly models the coupling between measurement and diffusion noise across steps; an ambient loss is then designed on this basis to learn clean SOMs. Experiments on real CT and mammography datasets show that AMID outperforms existing methods in generation fidelity and yields more reliable task-based IQ evaluation, demonstrating its potential for unsupervised medical imaging analysis.

2025-12-16T08:33:08Z Xiaoning Lei Jianwei Sun Wenhao Cai Xichen Xu Yanshu Wang Hu Gao http://arxiv.org/abs/2603.23973v1 SLAT-Phys: Fast Material Property Field Prediction from Structured 3D Latents 2026-03-25T06:14:03Z

Estimating the material property field of 3D assets is critical for physics-based simulation, robotics, and digital twin generation. Existing vision-based approaches are either too expensive and slow or rely on 3D information. We present SLAT-Phys, an end-to-end method that predicts spatially varying material property fields of 3D assets directly from a single RGB image without explicit 3D reconstruction. Our approach leverages spatially organised latent features from a pretrained 3D asset generation model that encodes rich geometric and semantic priors, and trains a lightweight neural decoder to estimate Young's modulus, density, and Poisson's ratio. The coarse volumetric layout and semantic cues of the latent representation about object geometry and appearance enable accurate material estimation. Our experiments demonstrate that our method provides competitive accuracy in predicting continuous material parameters when compared against prior approaches, while significantly reducing computation time. In particular, SLAT-Phys requires only 9.9 seconds per object on an NVIDIA RTX A5000 GPU and avoids reconstruction and voxelization preprocessing. This results in a 120x speedup compared to prior methods and enables faster material property estimation from a single image.

2026-03-25T06:14:03Z 8 pages, 4 figures Rocktim Jyoti Das Dinesh Manocha http://arxiv.org/abs/2601.12527v2 Deep Feature Deformation Weights 2026-03-25T05:20:14Z

Handle-based mesh deformation is a classic paradigm in computer graphics which enables intuitive edits from sparse controls. Classical techniques are fast and precise, but require users to know ideal handle placement a priori, which can be unintuitive and inconsistent. Handle sets cannot be adjusted easily, as weights are typically optimized through energies defined by the handles. Modern data-driven methods, on the other hand, provide semantic edits but sacrifice fine-grained control and speed. We propose a technique that achieves the best of both worlds: deep feature proximity yields smooth, visual-aware deformation weights with no additional regularization. Importantly, these weights are computed in real time for any surface point, unlike prior methods which require expensive optimization. We introduce barycentric feature distillation, an improved feature distillation pipeline which leverages the full visual signal from shape renders to make distillation complexity robust to mesh resolution. This enables high-resolution meshes to be processed in minutes versus potentially hours for prior methods. We preserve and extend classical properties through feature space constraints and locality weighting. Our field representation enables automatic visual symmetry detection, which we use to produce symmetry-preserving deformations. We show a proof-of-concept application which can produce deformations for meshes up to 1 million faces in real time on a consumer-grade machine. Project page at https://threedle.github.io/dfd.
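To illustrate how proximity in a feature space can give deformation weights without any per-handle optimization, here is a minimal sketch. The softmax over negative squared feature distances, the `temperature` parameter, the translation-only blending, and all function names are assumptions for illustration; the abstract does not specify the paper's actual weighting function.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def deformation_weights(point_feature, handle_features, temperature=0.1):
    """Softmax over negative squared feature distances to each handle.

    The weights are positive and sum to one, so a point whose feature
    matches a handle's feature follows that handle almost exclusively.
    """
    logits = [-squared_distance(point_feature, h) / temperature
              for h in handle_features]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deform_point(position, weights, handle_translations):
    """Blend per-handle translations with the weights (translation-only)."""
    return [p + sum(w * t[i] for w, t in zip(weights, handle_translations))
            for i, p in enumerate(position)]


# Two handles with hypothetical 2D features; the query point matches handle 0.
handles = [[1.0, 0.0], [0.0, 1.0]]
weights = deformation_weights([1.0, 0.0], handles)
moved = deform_point([0.0, 0.0, 0.0], weights,
                     [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
```

Because the weights depend only on the features of the query point and the handles, they can be evaluated independently for any surface point, and changing the handle set only changes the list passed in.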

2026-01-18T18:23:03Z Project page at https://threedle.github.io/dfd Richard Liu Itai Lang Rana Hanocka http://arxiv.org/abs/2511.18370v2 MimiCAT: Mimic with Correspondence-Aware Cascade-Transformer for Category-Free 3D Pose Transfer 2026-03-25T05:03:23Z

3D pose transfer aims to transfer the pose-style of a source mesh to a target character while preserving both the target's geometry and the source's pose characteristic. Existing methods are largely restricted to characters with similar structures and fail to generalize to category-free settings (e.g., transferring a humanoid's pose to a quadruped). The key challenge lies in the structural and transformation diversity inherent in distinct character types, which often leads to mismatched regions and poor transfer quality. To address these issues, we first construct a million-scale pose dataset across hundreds of distinct characters. We further propose MimiCAT, a cascade-transformer model designed for category-free 3D pose transfer. Instead of relying on strict one-to-one correspondence mappings, MimiCAT leverages semantic keypoint labels to learn a novel soft correspondence that enables flexible many-to-many matching across characters. The pose transfer is then formulated as a conditional generation process, in which the source transformations are first projected onto the target through soft correspondence matching and subsequently refined using shape-conditioned representations. Extensive qualitative and quantitative experiments demonstrate that MimiCAT generalizes plausible poses across diverse character morphologies, surpassing prior approaches restricted to narrow-category transfer (e.g., humanoid-to-humanoid).
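As a rough illustration of soft, many-to-many matching, the sketch below builds a row-stochastic correspondence matrix from keypoint features and uses it to project per-keypoint source values onto the target. The dot-product similarity, the `temperature`, the use of plain vectors in place of full transformations, and all names are assumptions; MimiCAT learns its correspondence with a transformer, which is not reproduced here.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def soft_correspondence(target_feats, source_feats, temperature=0.1):
    """Row i is a probability distribution over source keypoints for target keypoint i."""
    return [softmax([dot(t, s) / temperature for s in source_feats])
            for t in target_feats]


def project(correspondence, source_values):
    """Transfer per-keypoint source values to the target as weighted averages."""
    dim = len(source_values[0])
    return [[sum(w * v[d] for w, v in zip(row, source_values))
             for d in range(dim)]
            for row in correspondence]


# Two source keypoints, three target keypoints (hypothetical features).
source_feats = [[1.0, 0.0], [0.0, 1.0]]
target_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
matrix = soft_correspondence(target_feats, source_feats)
projected = project(matrix, [[1.0], [3.0]])
```

The third target keypoint is equally similar to both source keypoints, so it receives an even blend of their values, which is the behaviour a strict one-to-one mapping cannot express.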

2025-11-23T09:28:57Z Accepted to CVPR 2026. Project page: https://mimicat3d.github.io/ Zenghao Chai Chen Tang Yongkang Wong Xulei Yang Mohan Kankanhalli http://arxiv.org/abs/2603.23933v1 ORACLE: Orchestrate NPC Daily Activities using Contrastive Learning with Transformer-CVAE 2026-03-25T04:46:01Z

The integration of non-player characters (NPCs) within digital environments has been increasingly recognized for its potential to augment user immersion and cognitive engagement. The sophisticated orchestration of their daily activities, reflecting the nuances of human daily routines, contributes significantly to the realism of digital environments. Nevertheless, conventional approaches often produce monotonous repetition, falling short of capturing the intricacies of real human activity plans. In response to this, we introduce ORACLE, a novel generative model for the synthesis of realistic indoor daily activity plans, ensuring NPCs' authentic presence in digital habitats. Exploiting the CASAS smart home dataset's 24-hour indoor activity sequences, ORACLE addresses challenges in the dataset, including its imbalanced sequential data, the scarcity of training samples, and the absence of pre-trained models encapsulating human daily activity patterns. ORACLE's training leverages the sequential data processing prowess of Transformers, the generative controllability of Conditional Variational Autoencoders (CVAE), and the discriminative refinement of contrastive learning. Our experimental results validate the superiority of ORACLE in generating NPC activity plans and the efficacy of our design strategies over existing methods.

2026-03-25T04:46:01Z 17 pages, 7 figures. Accepted to CVM 2026 Seong-Eun Hong JuYeong Hwang RyunHa Lee HyeongYeop Kang http://arxiv.org/abs/2603.23639v1 Augmented Reality Visualization for Musical Instrument Learning 2026-03-24T18:28:08Z

We contribute two design studies for augmented reality visualizations that support learning musical instruments. First, we designed simple, glanceable encodings for drum kits, which we display through a projector. As a second instrument, we chose guitar and designed visualizations to be displayed either on a screen as an augmented mirror or on an optical see-through AR headset. These modalities allow us to also show information around the instrument and in 3D. We evaluated our prototypes through case studies; our results demonstrate their general effectiveness and reveal design-related and technical limitations.

2026-03-24T18:28:08Z Presented at the ISMIR 2022 Late-Breaking Demo Session, see https://ismir2022program.ismir.net/lbd_376.html Frank Heyen Michael Sedlmair http://arxiv.org/abs/2603.23631v1 Supporting Music Education through Visualizations of MIDI Recordings 2026-03-24T18:15:58Z

Musicians mostly have to rely on their ears when they want to analyze what they play, for example to detect errors. Since hearing is sequential, it is not possible to quickly grasp an overview of one or multiple recordings of a whole piece of music at once. We therefore propose various visualizations that allow analyzing errors and stylistic variance. Our current approach focuses on rhythm and uses MIDI data for simplicity.

2026-03-24T18:15:58Z Presented at the IEEE VIS 2020 Poster Session Frank Heyen Michael Sedlmair http://arxiv.org/abs/2603.23386v1 SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM 2026-03-24T16:16:52Z

High-quality articulated 3D assets are indispensable for embodied AI and physical simulation, yet 3D generation still focuses on static meshes, leaving a gap in "sim-ready" interactive objects. Most recent articulated object creation methods rely on multi-stage pipelines that accumulate errors across decoupled modules. Alternatively, unified MLLMs offer a single-stage path to joint static asset understanding and sim-ready asset generation. However, dense voxel-based 3D tokenization yields long 3D token sequences and high memory overhead, limiting scalability to complex articulated objects. To address this, we propose SIMART, a unified MLLM framework that jointly performs part-level decomposition and kinematic prediction. By introducing a Sparse 3D VQ-VAE, SIMART reduces token counts by 70% vs. dense voxel tokens, enabling high-fidelity multi-part assemblies. SIMART achieves state-of-the-art performance on PartNet-Mobility and in-the-wild AIGC datasets, and enables physics-based robotic simulation.

2026-03-24T16:16:52Z Chuanrui Zhang Minghan Qin Yuang Wang Baifeng Xie Hang Li Ziwei Wang http://arxiv.org/abs/2605.16266v1 Patchwork: A compact representation for 3D polygonal shapes 2026-03-24T15:20:32Z

We introduce Patchwork, a new general-purpose shape representation capable of modeling 2D and 3D geometry with a small number of parameters. Patchwork is grounded in a rigorous mathematical framework, providing provable complexity bounds and the ability to approximate arbitrary shapes with arbitrary precision in any dimension. We propose an efficient gradient-based optimization scheme to fit Patchwork representations to 2D and 3D data, along with a novel regularization loss that progressively prunes redundant elements, yielding high compactness after convergence. Our approach offers fast fitting performance, a fraction of the required parameters compared to existing alternatives, and native support for inside-outside classification, making it a versatile and compact representation for geometric learning and reconstruction tasks, with future potential for 3D generation. Our implementation is available at: https://github.com/Ankbzpx/patchwork-experiment.

2026-03-24T15:20:32Z Ruichen Zheng Biao Zhang Michael Birsak Mikhail Skopenkov Peter Wonka http://arxiv.org/abs/2506.14315v3 ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies 2026-03-24T14:20:07Z

Automating immersive VR scene creation remains a primary research challenge. Existing methods typically rely on complex geometry with post-simplification, resulting in inefficient pipelines or limited realism. In this paper, we introduce ImmerseGen, a novel agent-guided framework for compact and photorealistic world generation that decouples realism from exhaustive geometric modeling. ImmerseGen represents scenes as hierarchical compositions of lightweight geometric proxies with synthesized RGBA textures, facilitating real-time rendering on mobile VR headsets. We propose terrain-conditioned texturing for base world generation, combined with context-aware texturing for scenery, to produce diverse and visually coherent worlds. VLM-based agents employ semantic grid-based analysis for precise asset placement and enrich scenes with multimodal enhancements such as visual dynamics and ambient sound. Experiments and real-time VR applications demonstrate that ImmerseGen achieves superior photorealism, spatial coherence, and rendering efficiency compared to existing methods.

2025-06-17T08:50:05Z Accepted by IEEE VR 2026 and TVCG Special Issue. Project webpage: https://immersegen.github.io Jinyan Yuan Bangbang Yang Keke Wang Panwang Pan Lin Ma Xuehai Zhang Xiao Liu Zhaopeng Cui Yuewen Ma http://arxiv.org/abs/2603.23192v1 GTLR-GS: Geometry-Texture Aware LiDAR-Regularized 3D Gaussian Splatting for Realistic Scene Reconstruction 2026-03-24T13:37:52Z

Recent advances in 3D Gaussian Splatting (3DGS) have enabled real-time, photorealistic scene reconstruction. However, conventional 3DGS frameworks typically rely on sparse point clouds derived from Structure-from-Motion (SfM), which inherently suffer from scale ambiguity, limited geometric consistency, and strong view dependency due to the lack of geometric priors. In this work, a LiDAR-centric 3D Gaussian Splatting framework is proposed that explicitly incorporates metric geometric priors into the entire Gaussian optimization process. Instead of treating LiDAR data as a passive initialization source, 3DGS optimization is reformulated as a geometry-conditioned allocation and refinement problem under a fixed representational budget. Specifically, this work introduces (i) a geometry-texture-aware allocation strategy that selectively assigns Gaussian primitives to regions with high structural or appearance complexity, (ii) a curvature-adaptive refinement mechanism that dynamically guides Gaussian splitting toward geometrically complex areas during training, and (iii) a confidence-aware metric depth regularization that anchors the reconstructed geometry to absolute scale using LiDAR measurements while maintaining optimization stability. Extensive experiments on the ScanNet++ dataset and a custom real-world dataset validate the proposed approach. The results demonstrate state-of-the-art performance in metric-scale reconstruction with high geometric fidelity.
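Item (iii) above can be pictured as a per-pixel depth penalty scaled by a confidence value. The sketch below shows one possible form, a confidence-weighted mean absolute error over pixels that have a LiDAR return. The L1 penalty, the normalisation by total confidence, the use of `None` for missing returns, and the function name are all assumptions; the abstract does not give the loss used in GTLR-GS.

```python
def depth_regularization(rendered, lidar, confidence):
    """Confidence-weighted mean absolute depth error (illustrative form only).

    rendered:   depths rendered from the Gaussians, one value per pixel
    lidar:      metric LiDAR depths, with None where there is no return
    confidence: non-negative weight per pixel
    """
    weighted_error = 0.0
    total_confidence = 0.0
    for r, l, c in zip(rendered, lidar, confidence):
        if l is None:
            continue  # pixels without a LiDAR measurement are ignored
        weighted_error += c * abs(r - l)
        total_confidence += c
    if total_confidence == 0.0:
        return 0.0  # nothing to anchor against
    return weighted_error / total_confidence


# Three pixels: one trusted, one without LiDAR, one with zero confidence.
loss = depth_regularization(
    rendered=[2.0, 3.0, 5.0],
    lidar=[2.5, None, 4.0],
    confidence=[1.0, 1.0, 0.0],
)
```

Down-weighting low-confidence pixels keeps noisy or misaligned LiDAR returns from dominating the gradient, which is one way to read the abstract's remark about maintaining optimization stability.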

2026-03-24T13:37:52Z Yan Fang Jianfei Ge Jiangjian Xiao http://arxiv.org/abs/2602.22625v2 DiffBMP: Differentiable Rendering with Bitmap Primitives 2026-03-24T11:52:47Z

We introduce DiffBMP, a scalable and efficient differentiable rendering engine for a collection of bitmap images. Our work addresses a limitation that traditional differentiable renderers are constrained to vector graphics, given that most images in the world are bitmaps. Our core contribution is a highly parallelized rendering pipeline, featuring a custom CUDA implementation for calculating gradients. This system can, for example, optimize the position, rotation, scale, color, and opacity of thousands of bitmap primitives all in under 1 min using a consumer GPU. We employ and validate several techniques to facilitate the optimization: soft rasterization via Gaussian blur, structure-aware initialization, noisy canvas, and specialized losses/heuristics for videos or spatially constrained images. We demonstrate DiffBMP is not just an isolated tool, but a practical one designed to integrate into creative workflows. It supports exporting compositions to a native, layered file format, and the entire framework is publicly accessible via an easy-to-hack Python package.
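Among the techniques listed above, soft rasterization via Gaussian blur is easy to show in isolation: blurring a hard-edged mask spreads its values across the boundary, so nearby pixels carry a nonzero signal. The sketch below is a plain-Python separable blur with clamped borders; the kernel radius, `sigma`, and function names are illustrative assumptions and do not reflect DiffBMP's CUDA implementation.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalised 1D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    blurred = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        blurred.append(new_row)
    return blurred


def transpose(image):
    return [list(column) for column in zip(*image)]


def soft_mask(mask, sigma=1.0):
    """Separable Gaussian blur: rows first, then columns."""
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    return transpose(blur_rows(transpose(blur_rows(mask, kernel)), kernel))


# A 7x7 canvas with a hard-edged 3x3 square in the middle.
mask = [[1.0 if 2 <= x <= 4 and 2 <= y <= 4 else 0.0 for x in range(7)]
        for y in range(7)]
soft = soft_mask(mask, sigma=1.0)
```

After blurring, pixels outside the square hold small positive values that fall off with distance, which is what lets a position or scale parameter receive a usable gradient instead of a step function.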

2026-02-26T04:56:05Z Accepted to CVPR 2026, https://diffbmp.com Seongmin Hong Junghun James Kim Daehyeop Kim Insoo Chung Se Young Chun http://arxiv.org/abs/2603.22780v1 Curve resampling based high-quality high-order unstructured quadrilateral mesh generation 2026-03-24T04:17:03Z

High-order quadrilateral meshes offer superior accuracy and computational efficiency in numerical simulations. However, existing methods struggle to simultaneously preserve boundary/interface features, ensure high quality, and achieve efficient generation, particularly for complex geometries where degenerate and inverted elements frequently occur. To address this issue, this paper proposes a high-quality high-order unstructured quadrilateral mesh generation method based on geometric error-bounded curve reconstruction, which employs an indirect approach to enforce interface consistency. Through optimization-based curve reconstruction strategies, our method improves mesh quality while maintaining the validity of high-order elements. Compared to direct high-order mesh optimization techniques, our approach reduces the optimization problem to a curve reconstruction problem, significantly lowering computational complexity and enhancing efficiency. Experimental results demonstrate that the proposed method efficiently generates high-quality high-order quadrilateral meshes while preserving boundary/interface geometric features, offering improved adaptability and numerical stability in complex geometries.

2026-03-24T04:17:03Z Yongjia Weng Lufeng Liu Zhonggui Chen Xuan Zhou Juan Cao