http://arxiv.org/abs/2603.21978v1 GeoFusion-CAD: Structure-Aware Diffusion with Geometric State Space for Parametric 3D Design 2026-03-23T13:41:39Z

Parametric Computer-Aided Design (CAD) is fundamental to modern 3D modeling, yet existing methods struggle to generate long command sequences, especially under complex geometric and topological dependencies. Transformer-based architectures dominate CAD sequence generation due to their strong dependency modeling, but their quadratic attention cost and limited context windows hinder scalability to long programs. We propose GeoFusion-CAD, an end-to-end diffusion framework for scalable and structure-aware generation. Our method encodes CAD programs as hierarchical trees, jointly capturing geometry and topology within a state-space diffusion process. Specifically, a lightweight C-Mamba block models long-range structural dependencies through selective state transitions, enabling coherent generation across extended command sequences. To support long-sequence evaluation, we introduce DeepCAD-240, an extended benchmark that increases sequence lengths to a range of 40 to 240 while preserving sketch-extrusion semantics from the ABC dataset. Extensive experiments demonstrate that GeoFusion-CAD achieves superior performance on both short and long command ranges, maintaining high geometric fidelity and topological consistency where Transformer-based models degrade. Our approach sets new state-of-the-art scores for long-sequence parametric CAD generation, establishing a scalable foundation for next-generation CAD modeling systems. Code and datasets are available at GitHub.
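As a rough illustration of why a state-space recurrence scales linearly with sequence length, the sketch below runs a scalar, input-gated scan in a single pass. It is a simplified stand-in, not the C-Mamba block: the sigmoid gate, the parameter names `w_gate` and `b_gate`, and the toy input are all assumptions made for illustration.

```python
import math

def selective_scan(xs, w_gate=1.0, b_gate=0.0):
    """Scalar input-gated state-space recurrence (illustrative only).

    h_t = a_t * h_{t-1} + (1 - a_t) * x_t, where the decay a_t is computed
    from the current input. One pass over the sequence: O(len(xs)) work.
    """
    h = 0.0
    outputs = []
    for x in xs:
        # Input-dependent decay in (0, 1); this is the "selective" part.
        a = 1.0 / (1.0 + math.exp(-(w_gate * x + b_gate)))
        h = a * h + (1.0 - a) * x
        outputs.append(h)
    return outputs

# Hypothetical toy sequence standing in for embedded CAD commands.
ys = selective_scan([0.5, -1.0, 2.0, 0.0])
```

Each step touches only the previous state, so doubling the sequence length doubles the work, in contrast to the pairwise comparisons of full self-attention.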

2026-03-23T13:41:39Z Accepted to CVPR 2026 (Findings). Includes supplementary material Xiaolei Zhou Chuangjie Fang Jie Wu Jingyi Yang Boyi Lin Jianwei Zheng http://arxiv.org/abs/2601.05162v2 GenAI-DrawIO-Creator: A Framework for Automated Diagram Generation 2026-03-23T09:24:55Z

Diagrams are crucial for communicating complex information, yet creating and modifying them remains a labor-intensive task. We present GenAI-DrawIO-Creator, a novel framework that leverages Large Language Models (LLMs) to automate diagram generation and manipulation in the structured XML format used by draw.io. Our system integrates Claude 3.7 to reason about structured visual data and produce valid diagram representations. Key contributions include a high-level system design enabling real-time diagram updates, as well as specialized prompt engineering and error checking to ensure well-formed XML outputs. We demonstrate a working prototype capable of generating accurate diagrams (such as network architectures and flowcharts) from natural language or code, and even replicating diagrams from images. Simulated evaluations show that our approach significantly reduces diagram creation time and produces outputs with high structural fidelity. Our results highlight the promise of Claude 3.7 in handling structured visual reasoning tasks and lay the groundwork for future research in AI-assisted diagramming applications.
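The error-checking step could look something like the following sketch, which validates a candidate model string before it is shown to the user. It assumes the output is an uncompressed `mxGraphModel` element with `mxCell` children under `root`; the function name `check_drawio_xml` and the specific checks are illustrative, not the framework's actual validator.

```python
import xml.etree.ElementTree as ET

def check_drawio_xml(text):
    """Return (ok, message) for a candidate draw.io graph model string."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return False, f"not well-formed XML: {exc}"
    if root.tag != "mxGraphModel":
        return False, f"unexpected root element: {root.tag}"
    cells = root.findall("./root/mxCell")
    ids = [c.get("id") for c in cells]
    if len(ids) != len(set(ids)):
        return False, "duplicate mxCell ids"
    # Every parent/source/target reference should point at an existing cell.
    known = set(ids)
    for c in cells:
        for attr in ("parent", "source", "target"):
            ref = c.get(attr)
            if ref is not None and ref not in known:
                return False, f"cell {c.get('id')} has dangling {attr}={ref}"
    return True, "ok"

# Hypothetical two-node diagram with one connecting edge.
GOOD = """<mxGraphModel><root>
<mxCell id="0"/>
<mxCell id="1" parent="0"/>
<mxCell id="2" value="Server" vertex="1" parent="1"/>
<mxCell id="3" value="Client" vertex="1" parent="1"/>
<mxCell id="4" edge="1" source="3" target="2" parent="1"/>
</root></mxGraphModel>"""

# Same diagram with the edge pointing at a cell that does not exist.
BAD = GOOD.replace('target="2"', 'target="9"')
```

A failed check returns a message that could be fed back to the model as a repair prompt, which is one plausible way to close the loop between generation and validation.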

2026-01-08T17:51:35Z Jinze Yu Dayuan Jiang http://arxiv.org/abs/2603.21695v1 RefracGS: Novel View Synthesis Through Refractive Water Surfaces with 3D Gaussian Ray Tracing 2026-03-23T08:31:08Z

Novel view synthesis (NVS) through non-planar refractive surfaces presents fundamental challenges due to severe, spatially varying optical distortions. While recent representations like NeRF and 3D Gaussian Splatting (3DGS) excel at NVS, their assumption of straight-line ray propagation fails under these conditions, leading to significant artifacts. To overcome this limitation, we introduce RefracGS, a framework that jointly reconstructs the refractive water surface and the scene beneath the interface. Our key insight is to explicitly decouple the refractive boundary from the target objects: the refractive surface is modeled via a neural height field, capturing wave geometry, while the underlying scene is represented as a 3D Gaussian field. We formulate a refraction-aware Gaussian ray tracing approach that accurately computes non-linear ray trajectories using Snell's law and efficiently renders the underlying Gaussian field while backpropagating the loss gradients to the parameterized refractive surface. Through end-to-end joint optimization of both representations, our method ensures high-fidelity NVS and view-consistent surface recovery. Experiments on both synthetic and real-world scenes with complex waves demonstrate that RefracGS outperforms prior refractive methods in visual quality, while achieving 15x faster training and real-time rendering at 200 FPS. The project page for RefracGS is available at https://yimgshao.github.io/refracgs/.
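The ray bending at the water surface follows the vector form of Snell's law. The sketch below is a minimal, non-differentiable version of that single step; the function name `refract`, the tuple-based vectors, and the refractive index of 1.33 for water are assumptions for illustration, and the paper's ray tracer additionally backpropagates through this computation.

```python
import math

def refract(d, n, eta):
    """Refract unit direction d at a surface with unit normal n.

    n must point against the incoming ray (dot(d, n) < 0) and
    eta = n1 / n2 is the ratio of refractive indices.
    Returns the refracted unit direction, or None on total internal reflection.
    """
    cos_i = -sum(di * ni for di, ni in zip(d, n))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # total internal reflection, no transmitted ray
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * di + (eta * cos_i - cos_t) * ni for di, ni in zip(d, n))

# A ray hitting a flat water surface (normal +z) from air at 45 degrees.
theta = math.radians(45.0)
incoming = (math.sin(theta), 0.0, -math.cos(theta))
bent = refract(incoming, (0.0, 0.0, 1.0), 1.0 / 1.33)
```

With a height-field surface, the normal `n` would vary per hit point, which is what makes the ray paths spatially non-uniform.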

2026-03-23T08:31:08Z Yiming Shao Qiyu Dai Chong Gao Guanbin Li Yeqiang Wang He Sun Qiong Zeng Baoquan Chen Wenzheng Chen http://arxiv.org/abs/2603.21661v1 Cross-Scenario Deraining Adaptation with Unpaired Data: Superpixel Structural Priors and Multi-Stage Pseudo-Rain Synthesis 2026-03-23T07:38:55Z

Image deraining plays a pivotal role in low-level computer vision, serving as a prerequisite for robust outdoor surveillance and autonomous driving systems. While deep learning paradigms have achieved remarkable success in closely aligned settings, they often suffer from severe performance degradation when generalized to unseen Out-of-Distribution (OOD) scenarios. This failure stems primarily from the significant domain discrepancy between synthetic training datasets and the complex physical dynamics of real-world rain. To address these challenges, this paper proposes a pioneering cross-scenario deraining adaptation framework. Diverging from conventional approaches, our method removes the need for paired rainy observations in the target domain, leveraging exclusively rain-free background images. We design a Superpixel Generation (Sup-Gen) module to extract stable structural priors from the source domain using Simple Linear Iterative Clustering. Subsequently, a Resolution-adaptive Fusion strategy is introduced to align these source structures with target backgrounds through texture similarity, ensuring the synthesis of diverse and realistic pseudo-data. Finally, we implement a pseudo-label re-Synthesize mechanism that employs multi-stage noise generation to simulate realistic rain streaks. This framework functions as a versatile plug-and-play module capable of seamless integration into arbitrary deraining architectures. Extensive experiments on state-of-the-art models demonstrate that our approach yields remarkable PSNR gains of 32% to 59% in OOD domains while significantly accelerating training convergence.
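For readers unfamiliar with the metric, PSNR and a relative gain in percent can be computed as in the sketch below. The flat pixel lists, the 8-bit peak value of 255, and the helper name `relative_gain_percent` are assumptions for illustration; the abstract does not say exactly how its percentages were derived.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def relative_gain_percent(baseline_db, improved_db):
    """Relative PSNR improvement over a baseline, in percent."""
    return 100.0 * (improved_db - baseline_db) / baseline_db
```

Because PSNR is logarithmic, a relative gain in dB is not the same as a proportional reduction in error, so percentage gains are best read alongside the absolute dB values.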

2026-03-23T07:38:55Z We aim at addressing the cross-scenario (i.e., O.O.D) de-rain challenge, which has been neglected for a long period Kangbo Zhao Miaoxin Guan Xiang Chen Yukai Shi Jinshan Pan http://arxiv.org/abs/2511.16068v2 Time-Critical Adversarial Influence Blocking Maximization 2026-03-22T10:02:16Z

Adversarial Influence Blocking Maximization (AIBM) aims to select a set of positive seed nodes that propagate synchronously with the known negative seed nodes to counteract their negative influence. The time factor plays a particularly vital role in many AIBM application scenarios. However, the AIBM problem with a time constraint remains unexplored. More importantly, existing AIBM studies have not thoroughly investigated the submodularity of the objective function, thereby failing to establish a theoretical approximation guarantee. To address these challenges, we first formulate the Time-Critical Adversarial Influence Blocking Maximization (TC-AIBM) problem, which explicitly incorporates a time constraint. Then, we provide a theoretical proof of the submodularity of the TC-AIBM objective function under three different tie-breaking rules. Finally, a Bidirectional Influence Sampling (BIS) algorithm is proposed to solve the TC-AIBM problem. Leveraging the monotonicity and submodularity of the objective function, BIS achieves an approximation guarantee of $(1-1/e-ε)(1-ψ)$. Comprehensive experiments on four real-world datasets demonstrate that the proposed BIS algorithm exhibits excellent robustness across various negative seeds, time constraints, and tie-breaking rules, outperforming state-of-the-art baselines. In addition, BIS is up to three orders of magnitude faster than the Greedy algorithm.
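The $1-1/e$ factor in the guarantee is the classical bound for greedy selection on a monotone submodular objective under a cardinality constraint. The sketch below shows that greedy pattern on a toy set-coverage objective; it is not the BIS algorithm, and the candidate sets and function name are hypothetical.

```python
def greedy_max_coverage(candidates, k):
    """Pick up to k candidate sets greedily by marginal coverage gain."""
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for name, items in candidates.items():
            if name in chosen:
                continue
            gain = len(items - covered)  # marginal gain of adding this set
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining candidate adds anything
            break
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered

# Hypothetical seeds and the nodes each one would protect.
candidates = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 6},
}
chosen, covered = greedy_max_coverage(candidates, 2)
```

Here the second pick is "c" rather than "b": once "a" is chosen, "b" contributes only one new node, which is the diminishing-returns property that submodularity formalizes.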

2025-11-20T05:59:21Z Jilong Shi Qiangpeng Fang Xiaobin Rui Jian Zhang Zhixiao Wang http://arxiv.org/abs/2512.12984v2 VoroLight: Learning Voronoi Surface Meshes via Sphere Intersection 2026-03-22T08:00:16Z

Voronoi diagrams naturally produce convex, watertight, and topologically consistent cells, making them an appealing representation for 3D shape reconstruction. However, standard differentiable Voronoi approaches typically optimize generator positions in stable configurations, which can lead to locally uneven surface geometry. We present VoroLight, a differentiable framework that promotes controlled Voronoi degeneracy for smooth surface reconstruction. Instead of optimizing generator positions alone, VoroLight associates each Voronoi surface vertex with a trainable sphere and introduces a sphere-intersection loss that encourages higher-order equidistance among face-incident generators. This formulation improves surface regularity while preserving intrinsic Voronoi properties such as watertightness and convexity. Because losses are defined directly on surface vertices, VoroLight supports multimodal shape supervision from implicit fields, point clouds, meshes, and multi-view images. By introducing additional interior generators optimized under a centroidal Voronoi tessellation objective, the framework naturally extends to volumetric Voronoi meshes with consistent surface-interior topology. Across diverse input modalities, VoroLight achieves competitive reconstruction fidelity while producing smoother and more geometrically regular Voronoi surfaces. Project page: https://jiayinlu19960224.github.io/vorolight/
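One way to picture an equidistance-encouraging loss is to penalize how far each generator sits from the surface of a sphere attached to a vertex. The sketch below is a guess at the general shape of such a term, not the paper's sphere-intersection loss: the function name, the mean-squared form, and the choice of centering the sphere at the vertex are all assumptions.

```python
import math

def sphere_equidistance_loss(center, radius, generators):
    """Mean squared deviation of generator distances from a sphere's radius.

    The loss is zero exactly when every generator lies on the sphere,
    i.e. when all generators are equidistant from the center.
    """
    total = 0.0
    for g in generators:
        total += (math.dist(center, g) - radius) ** 2
    return total / len(generators)

# Four hypothetical generators on the unit sphere around the origin.
on_sphere = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0)]
loss = sphere_equidistance_loss((0.0, 0.0, 0.0), 1.0, on_sphere)
```

In a differentiable setting, both the sphere parameters and the generator positions would receive gradients from a term like this.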

2025-12-15T05:01:59Z Jiayin Lu Ying Jiang Yumeng He Yin Yang Chenfanfu Jiang http://arxiv.org/abs/2509.15130v3 Taming Video Models for 3D and 4D Generation via Zero-Shot Camera Control 2026-03-21T16:47:47Z

Video diffusion models have rich world priors, but their use in spatial tasks is limited by poor control, spatio-temporally inconsistent results, and entangled scene-camera dynamics. Current approaches, such as per-task fine-tuning or post-process warping, often introduce visual artifacts, fail to generalize, or incur high computational costs. We introduce WorldForge, a novel, training-free framework that operates purely at inference time to resolve these issues. Our method comprises three synergistic components. First, an intra-step refinement loop injects fine-grained motion guidance during the denoising process, iteratively correcting the output to ensure strict adherence to the target camera path. Second, an optical flow-based analysis identifies and isolates motion-related channels within the latent space. This allows our framework to selectively apply guidance, thereby decoupling motion from appearance and preserving visual fidelity. Third, a dual-path guidance strategy adaptively corrects for drift by comparing the guided generation against an unguided, reference denoising path, effectively neutralizing artifacts caused by misaligned structural inputs. Together, these components inject precise, trajectory-aligned control without model retraining, achieving accurate motion guidance and photorealistic synthesis. As a plug-and-play, model-agnostic solution, WorldForge demonstrates highly versatile generalizability. Beyond robust zero-shot 3D/4D generation, it readily empowers over a dozen diverse downstream applications, seamlessly enabling tasks like video editing, stabilization, and virtual try-on. Extensive experiments confirm state-of-the-art performance in trajectory adherence and perceptual quality, outperforming both training-dependent and inference-only baselines.

2025-09-18T16:40:47Z Accepted to CVPR 2026. Project Webpage: https://worldforge-agi.github.io/ Chenxi Song Yanming Yang Tong Zhao Ruibo Li Chi Zhang http://arxiv.org/abs/2603.20857v1 Fast and Robust Deformable 3D Gaussian Splatting 2026-03-21T15:24:54Z

3D Gaussian Splatting has demonstrated remarkable real-time rendering capabilities and superior visual quality in novel view synthesis for static scenes. Building upon these advantages, researchers have progressively extended 3D Gaussians to dynamic scene reconstruction. Deformation field-based methods have emerged as a promising approach among various techniques. These methods maintain 3D Gaussian attributes in a canonical field and employ the deformation field to transform this field across temporal sequences. Nevertheless, these approaches frequently encounter challenges such as suboptimal rendering speeds, significant dependence on initial point clouds, and vulnerability to local optima in dim scenes. To overcome these limitations, we present FRoG, an efficient and robust framework for high-quality dynamic scene reconstruction. FRoG integrates per-Gaussian embedding with a coarse-to-fine temporal embedding strategy, accelerating rendering through the early fusion of temporal embeddings. Moreover, to enhance robustness against sparse initializations, we introduce a novel depth- and error-guided sampling strategy. This strategy populates the canonical field with new 3D Gaussians at low-deviation initial positions, significantly reducing the optimization burden on the deformation field and improving detail reconstruction in both static and dynamic regions. Furthermore, by modulating opacity variations, we mitigate the local optima problem in dim scenes, improving color fidelity. Comprehensive experimental results validate that our method achieves accelerated rendering speeds while maintaining state-of-the-art visual quality.

2026-03-21T15:24:54Z Han Jiao Jiakai Sun Lei Zhao Zhanjie Zhang Wei Xing Huaizhong Lin http://arxiv.org/abs/2512.19402v2 Real2Edit2Real: Generating Robotic Demonstrations via a 3D Control Interface 2026-03-21T09:08:03Z

Recent progress in robot learning has been driven by large-scale datasets and powerful visuomotor policy architectures, yet policy robustness remains limited by the substantial cost of collecting diverse demonstrations, particularly for spatial generalization in manipulation tasks. To reduce repetitive data collection, we present Real2Edit2Real, a framework that generates new demonstrations by bridging 3D editability with 2D visual data through a 3D control interface. Our approach first reconstructs scene geometry from multi-view RGB observations with a metric-scale 3D reconstruction model. Based on the reconstructed geometry, we perform depth-reliable 3D editing on point clouds to generate new manipulation trajectories while geometrically correcting the robot poses to recover physically consistent depth, which serves as a reliable condition for synthesizing new demonstrations. Finally, we propose a multi-conditional video generation model guided by depth as the primary control signal, together with action, edge, and ray maps, to synthesize spatially augmented multi-view manipulation videos. Experiments on four real-world manipulation tasks demonstrate that policies trained on data generated from only 1-5 source demonstrations can match or outperform those trained on 50 real-world demonstrations, improving data efficiency by 10-50x. Moreover, experimental results on height and texture editing demonstrate the framework's flexibility and extensibility, indicating its potential to serve as a unified data generation framework. Project website is https://real2edit2real.github.io/.

2025-12-22T13:53:25Z Accepted to CVPR 2026 Yujie Zhao Hongwei Fan Di Chen Shengcong Chen Liliang Chen Xiaoqi Li Guanghui Ren Hao Dong http://arxiv.org/abs/2603.20560v1 Nevis Digital Twin: Photogrammetry and Immersive Visualization of Historical Sites 2026-03-20T23:25:51Z

In this work, we present a multimodal data acquisition workflow for the digital preservation and virtual reconstruction of at-risk historical sites on the island of Nevis. Facing threats from coastal erosion, rising sea levels, and aggressive vegetation, the archaeological heritage of Nevis requires documentation strategies that bridge the gap between high-cost professional surveying and consumer accessibility. Experimental tests compared acquisition variables, specifically camera height (1m vs. 3m) and operator trajectory, against high-resolution control data. Moreover, we compare mesh reconstruction and 3D Gaussian splatting as different modalities for virtual reconstruction and documentation. The resulting data is fused into immersive virtual reality (VR) environments, offering a scalable, non-proprietary model for democratizing digital heritage in the Caribbean.

2026-03-20T23:25:51Z ARCHERIX Workshop - IEEE VR 2026 Alex Apffel Huy Tran Vuthea Chheang http://arxiv.org/abs/2603.20556v1 Towards Extended Reality Intelligence for Monitoring and Predicting Patient Readmission Risks 2026-03-20T23:16:34Z

Hospital readmissions remain a challenge for healthcare systems, especially among patients with chronic conditions such as diabetes. Unplanned readmissions within 30 days are costly, strain hospital resources, and can indicate poor care coordination or discharge planning. In this work, we explore the use of machine learning to predict readmission risk for diabetic inpatients and propose a mixed reality (MR) interface to provide effective visualization and insights. We trained an XGBoost classifier after data cleaning, encoding, and feature engineering. The model achieved an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.72 and an Area Under the Precision-Recall Curve (AUPRC) of 0.11. Key predictive factors included prior inpatient visits, discharge disposition, and glycemic control indicators such as A1C (blood sugar test) results and medication adjustments. Additionally, we developed an MR prototype that visualizes patient records and predictions containing risk level, major contributing factors, and a concise summary of care. Together, the predictive model and the MR interface aim to improve clinician awareness and communication around readmission risk in real-time clinical settings.
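The two reported metrics behave differently under class imbalance, which the following sketch makes concrete. It computes AUROC as a pairwise ranking probability and AUPRC as step-wise average precision; the function names and the eight-patient toy data are made up for illustration and have no connection to the study's dataset.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Assumes at least one positive label.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / hits

# Hypothetical predictions for eight patients, two of them readmitted.
labels = [0, 0, 1, 0, 1, 0, 0, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.3, 0.05]
roc = auroc(labels, scores)             # 0.75
ap = average_precision(labels, scores)  # 0.5
```

A random classifier scores about 0.5 on AUROC but only about the positive-class prevalence on AUPRC (0.25 in this toy data), so an AUPRC value is best interpreted against the readmission rate of the cohort.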

2026-03-20T23:16:34Z XR Health Workshop, IEEE VR 2026 Martin Sanchez Nick Tran Vuthea Chheang http://arxiv.org/abs/2603.19234v2 Matryoshka Gaussian Splatting 2026-03-20T10:58:51Z

The ability to render scenes at adjustable fidelity from a single model, known as level of detail (LoD), is crucial for practical deployment of 3D Gaussian Splatting (3DGS). Existing discrete LoD methods expose only a limited set of operating points, while concurrent continuous LoD approaches enable smoother scaling but often suffer noticeable quality degradation at full capacity, making LoD a costly design decision. We introduce Matryoshka Gaussian Splatting (MGS), a training framework that enables continuous LoD for standard 3DGS pipelines without sacrificing full-capacity rendering quality. MGS learns a single ordered set of Gaussians such that rendering any prefix, the first k splats, produces a coherent reconstruction whose fidelity improves smoothly with increasing budget. Our key idea is stochastic budget training: each iteration samples a random splat budget and optimises both the corresponding prefix and the full set. This strategy requires only two forward passes and introduces no architectural modifications. Experiments across four benchmarks and six baselines show that MGS matches the full-capacity performance of its backbone while enabling a continuous speed-quality trade-off from a single model. Extensive ablations on ordering strategies, training objectives, and model capacity further validate the designs.
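The stochastic budget idea can be sketched in a few lines: sample a prefix length, then score both the prefix and the full set. The version below uses a toy "renderer" that just sums scalar splat values; the `render` stand-in, the squared-error loss, and the unweighted sum of the two terms are assumptions for illustration rather than the paper's training objective.

```python
import random

def render(splats):
    """Toy stand-in for a renderer: the 'image' is the sum of splat values."""
    return sum(splats)

def stochastic_budget_loss(splats, target, rng):
    """Loss for one training step: a random prefix plus the full ordered set."""
    k = rng.randint(1, len(splats))                    # random splat budget
    prefix_loss = (render(splats[:k]) - target) ** 2   # first forward pass
    full_loss = (render(splats) - target) ** 2         # second forward pass
    return k, prefix_loss + full_loss

# Hypothetical ordered splats whose prefixes approach the target of 1.0.
splats = [0.5, 0.25, 0.125, 0.125]
k, loss = stochastic_budget_loss(splats, 1.0, random.Random(0))
```

Because every prefix length is sampled over the course of training, earlier splats are pushed to carry the coarse structure while later ones refine it, which is what makes truncation at any k meaningful.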

2026-03-19T17:59:56Z project page: https://zhilinguo.github.io/MGS Zhilin Guo Boqiao Zhang Hakan Aktas Kyle Fogarty Jeffrey Hu Nursena Koprucu Aslan Wenzhao Li Canberk Baykal Albert Miao Josef Bengtson Chenliang Zhou Weihao Xia Cristina Nader Vasconcelos Cengiz Oztireli http://arxiv.org/abs/2603.19753v1 ReLi3D: Relightable Multi-view 3D Reconstruction with Disentangled Illumination 2026-03-20T08:37:39Z

Reconstructing 3D assets from images has long required separate pipelines for geometry reconstruction, material estimation, and illumination recovery, each with distinct limitations and computational overhead. We present ReLi3D, the first unified end-to-end pipeline that simultaneously reconstructs complete 3D geometry, spatially-varying physically-based materials, and environment illumination from sparse multi-view images in under one second. Our key insight is that multi-view constraints can dramatically improve material and illumination disentanglement, a problem that remains fundamentally ill-posed for single-image methods. Key to our approach is the fusion of the multi-view input via a transformer cross-conditioning architecture, followed by a novel unified two-path prediction strategy. The first path predicts the object's structure and appearance, while the second path predicts the environment illumination from image background or object reflections. This, combined with a differentiable Monte Carlo multiple importance sampling renderer, creates an optimal illumination disentanglement training pipeline. In addition, with our mixed domain training protocol, which combines synthetic PBR datasets with real-world RGB captures, we establish generalizable results in geometry, material accuracy, and illumination quality. By unifying previously separate reconstruction tasks into a single feed-forward pass, we enable near-instantaneous generation of complete, relightable 3D assets. Project Page: https://reli3d.jdihlmann.com/
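Multiple importance sampling combines samples drawn from several distributions so that each covers the others' weak spots. The sketch below shows the balance-heuristic estimator on a one-dimensional integral with two sampling strategies; the integrand, the two densities, and the function names are toy assumptions, and the paper's renderer applies the same principle to light transport in a differentiable way.

```python
import math
import random

def mis_sample(f, rng):
    """One balance-heuristic MIS estimate of the integral of f over [0, 1].

    Strategy 1 samples uniformly (pdf p1(x) = 1).
    Strategy 2 samples with pdf p2(x) = 2x via the inverse CDF x = sqrt(u).
    With one sample per strategy, the balance-heuristic weight cancels each
    pdf, leaving f(x) / (p1(x) + p2(x)) per sample.
    """
    x1 = rng.random()
    x2 = math.sqrt(rng.random())
    return f(x1) / (1.0 + 2.0 * x1) + f(x2) / (1.0 + 2.0 * x2)

def estimate_integral(f, n, seed=0):
    """Average n independent MIS estimates."""
    rng = random.Random(seed)
    return sum(mis_sample(f, rng) for _ in range(n)) / n

# The integral of x^2 over [0, 1] is 1/3.
approx = estimate_integral(lambda x: x * x, 20000)
```

In a renderer the two strategies are typically sampling the light source and sampling the material's reflectance lobe, with the same weighting scheme deciding how much to trust each.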

2026-03-20T08:37:39Z Project Page: https://reli3d.jdihlmann.com/ ICLR 2026 Jan-Niklas Dihlmann Mark Boss Simon Donne Andreas Engelhardt Hendrik P. A. Lensch Varun Jampani http://arxiv.org/abs/2506.20703v2 Generative Blocks World: Moving Things Around in Pictures 2026-03-20T03:36:38Z

We describe Generative Blocks World, a method for interacting with the scene of a generated image by manipulating simple geometric abstractions. Our method represents scenes as assemblies of convex 3D primitives, and the same scene can be represented by different numbers of primitives, allowing an editor to move either whole structures or small details. Once the scene geometry has been edited, the image is generated by a flow-based method, which is conditioned on depth and a texture hint. Our texture hint takes into account the modified 3D primitives, exceeding the texture-consistency provided by existing techniques. These texture hints (a) allow accurate object and camera moves and (b) preserve the identity of objects. Our experiments demonstrate that our approach outperforms prior work in visual fidelity, editability, and compositional generalization.

2025-06-25T17:59:55Z ICLR 2026 34 pages, 25 figures, 4 tables Vaibhav Vavilala Seemandhar Jain Rahul Vasanth D. A. Forsyth Anand Bhattad http://arxiv.org/abs/2603.20310v1 GraphiContact: Pose-aware Human-Scene Robust Contact Perception for Interactive Systems 2026-03-19T17:17:04Z

Monocular vertex-level human-scene contact prediction is a fundamental capability for interactive systems such as assistive monitoring, embodied AI, and rehabilitation analysis. In this work, we study this task jointly with single-image 3D human mesh reconstruction, using reconstructed body geometry as a scaffold for contact reasoning. Existing approaches either focus on contact prediction without sufficiently exploiting explicit 3D human priors, or emphasize pose/mesh reconstruction without directly optimizing robust vertex-level contact inference under occlusion and perceptual noise. To address this gap, we propose GraphiContact, a pose-aware framework that transfers complementary human priors from two pretrained Transformer encoders and predicts per-vertex human-scene contact on the reconstructed mesh. To improve robustness in real-world scenarios, we further introduce a Single-Image Multi-Infer Uncertainty (SIMU) training strategy with token-level adaptive routing, which simulates occlusion and noisy observations during training while preserving efficient single-branch inference at test time. Experiments on five benchmark datasets show that GraphiContact achieves consistent gains on both contact prediction and 3D human reconstruction. Our code, based on the GraphiContact method, provides comprehensive 3D human reconstruction and interaction analysis, and will be publicly available at https://github.com/Aveiro-Lin/GraphiContact.

2026-03-19T17:17:04Z 15 pages, 9 figures, Accepted at ICME 2026 Xiaojian Lin Yaomin Shen Junyuan Ma Yujie Sun Chengqing Bu Wenxin Zhang Zongzheng Zhang Hao Fei Lei Jin Hao Zhao