https://arxiv.org/api/mgaSCqRgQAcRaBBfF3b/9Zlykh0 2026-06-17T22:01:01Z 9346 795 15 http://arxiv.org/abs/2601.14207v2 Copy-Trasform-Paste: Zero-Shot Object-Object Alignment Guided by Vision-Language and Geometric Constraints 2026-02-28T06:46:28Z We study zero-shot 3D alignment of two given meshes, using a text prompt describing their spatial relation -- an essential capability for content creation and scene assembly. Earlier approaches primarily rely on geometric alignment procedures, while recent work leverages pretrained 2D diffusion models to model language-conditioned object-object spatial relationships. In contrast, we directly optimize the relative pose at test time, updating translation, rotation, and isotropic scale with CLIP-driven gradients via a differentiable renderer, without training a new model. Our framework augments language supervision with geometry-aware objectives: a variant of soft-Iterative Closest Point (ICP) term to encourage surface attachment and a penetration loss to discourage interpenetration. A phased schedule strengthens contact constraints over time, and camera control concentrates the optimization on the interaction region. To enable evaluation, we curate a benchmark containing diverse categories and relations, and compare against baselines. Our method outperforms all alternatives, yielding semantically faithful and physically plausible alignments. 2026-01-20T18:12:55Z GitHub Page: https://rotemgat.github.io/CopyTransformPaste/ Rotem Gatenyo Ohad Fried http://arxiv.org/abs/2603.00413v1 DiffTrans: Differentiable Geometry-Materials Decomposition for Reconstructing Transparent Objects 2026-02-28T02:21:31Z Reconstructing transparent objects from a set of multi-view images is a challenging task due to the complicated nature and indeterminate behavior of light propagation. 
Typical methods are primarily tailored to specific scenarios, such as objects following a uniform topology, exhibiting ideal transparency and surface specular reflections, or with only surface materials, which substantially constrains their practical applicability in real-world settings. In this work, we propose a differentiable rendering framework for transparent objects, dubbed DiffTrans, which allows for efficient decomposition and reconstruction of the geometry and materials of transparent objects, thereby reconstructing transparent objects accurately in intricate scenes with diverse topology and complex texture. Specifically, we first utilize FlexiCubes with dilation and smoothness regularization as the iso-surface representation to reconstruct an initial geometry efficiently from the multi-view object silhouette. Meanwhile, we employ the environment light radiance field to recover the environment of the scene. Then we devise a recursive differentiable ray tracer to further optimize the geometry, index of refraction and absorption rate simultaneously in a unified and end-to-end manner, leading to high-quality reconstruction of transparent objects in intricate scenes. A prominent advantage of the designed ray tracer is that it can be implemented in CUDA, enabling a significantly reduced computational cost. Extensive experiments on multiple benchmarks demonstrate the superior reconstruction performance of our DiffTrans compared with other methods, especially in intricate scenes involving transparent objects with diverse topology and complex texture. The code is available at https://github.com/lcp29/DiffTrans. 2026-02-28T02:21:31Z Changpu Li Shuang Wu Songlin Tang Guangming Lu Jun Yu Wenjie Pei http://arxiv.org/abs/2505.12734v2 SounDiT: Geo-Contextual Soundscape-to-Landscape Generation 2026-02-27T23:27:56Z Recent audio-to-image models have shown impressive performance in generating images of specific objects conditioned on their corresponding sounds. 
However, these models fail to reconstruct real-world landscapes conditioned on environmental soundscapes. To address this gap, we present Geo-contextual Soundscape-to-Landscape (GeoS2L) generation, a novel and practically significant task that aims to synthesize geographically realistic landscape images from environmental soundscapes. To support this task, we construct two large-scale geo-contextual multi-modal datasets, SoundingSVI and SonicUrban, which pair diverse environmental soundscapes with real-world landscape images. We propose SounDiT, a diffusion transformer (DiT)-based model that incorporates environmental soundscapes and geo-contextual scene conditioning to synthesize geographically coherent landscape images. Furthermore, we propose the Place Similarity Score (PSS), a practically-informed geo-contextual evaluation framework to measure consistency between input soundscapes and generated landscape images. Extensive experiments demonstrate that SounDiT outperforms existing baselines on the GeoS2L task, while the PSS effectively captures multi-level generation consistency across element, scene, and human perception. Project page: https://gisense.github.io/SounDiT-Page/ 2025-05-19T05:47:13Z 12 pages, 4 figures Junbo Wang Haofeng Tan Bowen Liao Albert Jiang Teng Fei Qixing Huang Bing Zhou Zhengzhong Tu Shan Ye Yuhao Kang http://arxiv.org/abs/2603.00292v1 Ray Tracing using HIP 2026-02-27T20:21:08Z In this technical report, we introduce the basics of ray tracing and explain how to accelerate the computation of the rendering algorithm in HIP. We also show how to use a HIP ray tracing framework - HIPRT, leveraging hardware ray tracing features of AMD GPUs. We conclude this technical report with a list of references for further reading. 
2026-02-27T20:21:08Z Atsushi Yoshimura Kenta Eto Daniel Meister Takahiro Harada http://arxiv.org/abs/2509.25094v3 Unsupervised Representation Learning for 3D Mesh Parameterization with Semantic and Visibility Objectives 2026-02-27T18:42:50Z Recent 3D generative models produce high-quality textures for 3D mesh objects. However, they commonly rely on the heavy assumption that input 3D meshes are accompanied by manual mesh parameterization (UV mapping), a manual task that requires both technical precision and artistic judgment. Industry surveys show that this process often accounts for a significant share of asset creation, creating a major bottleneck for 3D content creators. Moreover, existing automatic methods often ignore two perceptually important criteria: (1) semantic awareness (UV charts should align semantically similar 3D parts across shapes) and (2) visibility awareness (cutting seams should lie in regions unlikely to be seen). To overcome these shortcomings and to automate the mesh parameterization process, we present an unsupervised differentiable framework that augments standard geometry-preserving UV learning with semantic- and visibility-aware objectives. For semantic-awareness, our pipeline (i) segments the mesh into semantic 3D parts, (ii) applies an unsupervised learned per-part UV-parameterization backbone, and (iii) aggregates per-part charts into a unified UV atlas. For visibility-awareness, we use ambient occlusion (AO) as an exposure proxy and back-propagate a soft differentiable AO-weighted seam objective to steer cutting seams toward occluded regions. By conducting qualitative and quantitative evaluations against state-of-the-art methods, we show that the proposed method produces UV atlases that better support texture generation and reduce perceptible seam artifacts compared to recent baselines. Our implementation code is publicly available at: https://github.com/AHHHZ975/Semantic-Visibility-UV-Param. 
2025-09-29T17:28:58Z AmirHossein Zamani Bruno Roy Arianna Rampini http://arxiv.org/abs/2602.24224v1 Random-Forest-Induced Graph Neural Networks for Tabular Learning 2026-02-27T17:51:18Z Graphs are essential for modeling complex relationships and capturing structured interactions in data. Graph Neural Networks (GNNs) are particularly effective when such relational structure is explicitly available, but many real-world datasets, most notably tabular data, lack an inherent graph representation. To address this limitation, we propose RF-GNN, a framework that constructs instance-level graphs from tabular data using proximity measures induced by random forests. These proximities capture nonlinear feature interactions and data-adaptive similarity without imposing restrictive assumptions on feature geometry. The resulting graphs enable the direct application of GNNs to tabular learning problems. Extensive experiments on 36 benchmark datasets demonstrate that RF-GNN consistently outperforms strong classical baselines and recent graph-construction methods in terms of weighted F1-score. Additional ablation studies highlight the impact of proximity design choices and graph construction settings. 2026-02-27T17:51:18Z Haozhe Chen Soheila Farokhi Kelvyn Bladen Hamid Karimi Kevin R. Moon http://arxiv.org/abs/2508.06316v2 The Beauty of Anisotropic Mesh Refinement: Omnitrees for Efficient Dyadic Discretizations 2026-02-27T06:27:22Z Structured adaptive mesh refinement (AMR), commonly implemented via quadtrees and octrees, underpins a wide range of applications including databases, computer graphics, physics simulations, and machine learning. However, octrees enforce isotropic refinement in regions of interest, which can be especially inefficient for problems that are intrinsically anisotropic--much resolution is spent where little information is gained. This paper presents omnitrees as an anisotropic generalization of octrees and related data structures. 
Omnitrees allow refinement of only the locally most important dimensions, providing tree structures that are less deep than bintrees and less wide than octrees. As a result, the convergence of the AMR schemes can be increased by up to a factor of the dimensionality d for very anisotropic problems, quickly offsetting their modest increase in storage overhead. We validate this finding on the problem of binary shape representation across 4,166 three-dimensional objects: Omnitrees increase the mean convergence rate by 1.5x, require less storage to achieve equivalent error bounds, and maximize the information density of the stored function faster than octrees. These advantages are projected to be even stronger for higher-dimensional problems. We provide a first validation by introducing a time-dependent rotation to create four-dimensional representations, and discuss the properties of their 4-d octree and omnitree approximations. Overall, omnitree discretizations can make existing AMR approaches more efficient, and open up new possibilities for high-dimensional applications. 2025-08-08T13:42:59Z contains pdf animations; we recommend Okular or Firefox for viewing Theresa Pollinger Masado Ishii Jens Domke http://arxiv.org/abs/2602.23660v1 Assessment of Display Performance and Comparative Evaluation of Web Map Libraries for Extensive 3D Geospatial Data 2026-02-27T04:01:18Z Large-scale 3D geospatial data visualization has become increasingly critical for the development of the digital society infrastructure in Japan. This study conducted a comprehensive performance evaluation of two major WebGL-based web mapping libraries, CesiumJS and MapLibre GL JS, using large-scale 3D point-cloud data from the VIRTUAL SHIZUOKA and PLATEAU building models. 
The research employs the standardized 3D Tiles 1.1 and Mapbox Vector Tiles (MVT) formats, comparing performance across different data scales (2nd and 3rd grid levels) using Core Web Vitals metrics, including First Contentful Paint (FCP), Largest Contentful Paint (LCP), Speed Index, Total Blocking Time (TBT), and Cumulative Layout Shift (CLS). The results demonstrate that MVT-based building visualization with MapLibre GL JS achieves optimal performance (FCP 0.8s, TBT 0ms), whereas MapLibre GL JS combined with deck.gl shows superior performance for large-scale point cloud processing (TBT: 3ms, versus 21,357ms for CesiumJS). This study provides data-driven selection guidelines for appropriate technology choices according to use cases, establishing reproducible performance evaluation frameworks for 3D web mapping technologies during the WebGPU and OGC 3D Tiles 1.1 standardization era. 2026-02-27T04:01:18Z 6 pages, 5 figures, 1 table Toshikazu Seto Yohei Shiwaku Takayuki Miyauchi Daisuke Yoshida Yuichiro Nishimura http://arxiv.org/abs/2510.12768v3 Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction 2026-02-27T02:48:13Z Reconstructing dynamic 3D scenes from monocular input is fundamentally under-constrained, with ambiguities arising from occlusion and extreme novel views. While dynamic Gaussian Splatting offers an efficient representation, vanilla models optimize all Gaussian primitives uniformly, ignoring whether they are well or poorly observed. This limitation leads to motion drifts under occlusion and degraded synthesis when extrapolating to unseen views. We argue that uncertainty matters: Gaussians with recurring observations across views and time act as reliable anchors to guide motion, whereas those with limited visibility are treated as less reliable. To this end, we introduce USplat4D, a novel Uncertainty-aware dynamic Gaussian Splatting framework that propagates reliable motion cues to enhance 4D reconstruction. 
Our approach estimates time-varying per-Gaussian uncertainty and leverages it to construct a spatio-temporal graph for uncertainty-aware optimization. Experiments on diverse real and synthetic datasets show that explicitly modeling uncertainty consistently improves dynamic Gaussian Splatting models, yielding more stable geometry under occlusion and high-quality synthesis at extreme viewpoints. 2025-10-14T17:47:11Z Accepted to ICLR 2026. Project page: https://tamu-visual-ai.github.io/usplat4d/ Fengzhi Guo Chih-Chuan Hsu Sihao Ding Cheng Zhang http://arxiv.org/abs/2510.03312v3 Universal Beta Splatting 2026-02-26T22:20:36Z We introduce Universal Beta Splatting (UBS), a unified framework that generalizes 3D Gaussian Splatting to N-dimensional anisotropic Beta kernels for explicit radiance field rendering. Unlike fixed Gaussian primitives, Beta kernels enable controllable dependency modeling across spatial, angular, and temporal dimensions within a single representation. Our unified approach captures complex light transport effects, handles anisotropic view-dependent appearance, and models scene dynamics without requiring auxiliary networks or specific color encodings. UBS maintains backward compatibility by approximating Gaussian Splatting as a special case, guaranteeing plug-in usability and lower performance bounds. The learned Beta parameters naturally decompose scene properties into interpretable components without explicit supervision: spatial (surface vs. texture), angular (diffuse vs. specular), and temporal (static vs. dynamic). Our CUDA-accelerated implementation achieves real-time rendering while consistently outperforming existing methods across static, view-dependent, and dynamic benchmarks, establishing Beta kernels as a scalable universal primitive for radiance field rendering. Our project website is available at https://rongliu-leo.github.io/universal-beta-splatting/. 
2025-09-30T22:03:22Z ICLR 2026 Rong Liu Zhongpai Gao Benjamin Planche Meida Chen Van Nguyen Nguyen Meng Zheng Anwesa Choudhuri Terrence Chen Yue Wang Andrew Feng Ziyan Wu http://arxiv.org/abs/2409.02108v3 Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era 2026-02-26T12:11:30Z Shadows, formed by the occlusion of light, play an essential role in visual perception and directly influence scene understanding, image quality, and visual realism. This paper presents a unified survey and benchmark of deep-learning-based shadow detection, removal, and generation across images and videos. We introduce consistent taxonomies for architectures, supervision strategies, and learning paradigms; review major datasets and evaluation protocols; and re-train representative methods under standardized settings to enable fair comparison. Our benchmark reveals key findings, including inconsistencies in prior reports, strong dependence on model design and resolution, and limited cross-dataset generalization due to dataset bias. By synthesizing insights across the three tasks, we highlight shared illumination cues and priors that connect detection, removal, and generation. We further outline future directions involving unified all-in-one frameworks, semantics- and geometry-aware reasoning, shadow-based AIGC authenticity analysis, and the integration of physics-guided priors into multimodal foundation models. Corrected datasets, trained models, and evaluation tools are released to support reproducible research. 2024-09-03T17:59:05Z Accepted by International Journal of Computer Vision (IJCV). 
Publicly available results, trained models, and evaluation metrics at https://github.com/xw-hu/Unveiling-Deep-Shadows International Journal of Computer Vision (IJCV), vol.134, article 158, 2026 Xiaowei Hu Zhenghao Xing Tianyu Wang Chi-Wing Fu Pheng-Ann Heng 10.1007/s11263-026-02744-z http://arxiv.org/abs/2406.09293v4 StableMaterials: Enhancing Diversity in Material Generation via Semi-Supervised Learning 2026-02-26T08:31:53Z We introduce StableMaterials, a novel approach for generating photorealistic physically based rendering (PBR) materials that integrates semi-supervised learning with Latent Diffusion Models (LDMs). Our method employs adversarial training to distill knowledge from existing large-scale image generation models, minimizing the reliance on annotated data and enhancing the diversity in generation. This distillation approach aligns the distribution of the generated materials with that of image textures from an SDXL model, enabling the generation of novel materials that are not present in the initial training dataset. Furthermore, we employ a diffusion-based refiner model to improve the visual quality of the samples and achieve high-resolution generation. Finally, we distill a latent consistency model for fast generation in just four steps and propose a new tileability technique that removes visual artifacts typically associated with fewer diffusion steps. We detail the architecture and training process of StableMaterials and the integration of semi-supervised training within existing LDM frameworks, and show the advantages of our approach. Comparative evaluations with state-of-the-art methods show the effectiveness of StableMaterials, highlighting its potential applications in computer graphics and beyond. StableMaterials is publicly available at https://gvecchio.com/stablematerials. 
2024-06-13T16:29:46Z Giuseppe Vecchio http://arxiv.org/abs/2508.05115v2 RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer 2026-02-26T08:08:56Z Audio-driven portrait animation aims to synthesize realistic and natural talking head videos from an input audio signal and a single reference image. While existing methods achieve high-quality results by leveraging high-dimensional intermediate representations and explicitly modeling motion dynamics, their computational complexity renders them unsuitable for real-time deployment. Real-time inference imposes stringent latency and memory constraints, often necessitating the use of highly compressed latent representations. However, operating in such compact spaces hinders the preservation of fine-grained spatiotemporal details, thereby complicating audio-visual synchronization. We present RAP (Real-time Audio-driven Portrait animation), a unified framework for generating high-quality talking portraits under real-time constraints. Specifically, RAP introduces a hybrid attention mechanism for fine-grained audio control, and a static-dynamic training-inference paradigm that avoids explicit motion supervision. Through these techniques, RAP achieves precise audio-driven control, mitigates long-term temporal drift, and maintains high visual fidelity. Extensive experiments demonstrate that RAP achieves state-of-the-art performance while operating under real-time constraints. 2025-08-07T07:47:16Z 11 pages, 9 figures Fangyu Du Taiqing Li Qian Qiao Tan Yu Ziwei Zhang Dingcheng Zhen Xu Jia Yang Yang Shunshun Yin Siyuan Liu http://arxiv.org/abs/2602.22701v1 BRepMAE: Self-Supervised Masked BRep Autoencoders for Machining Feature Recognition 2026-02-26T07:22:32Z We propose a masked self-supervised learning framework, called BRepMAE, for automatically extracting a valuable representation of the input computer-aided design (CAD) model to recognize its machining features. 
Representation learning is conducted on a large-scale, unlabeled CAD model dataset using the geometric Attributed Adjacency Graph (gAAG) representation, derived from the boundary representation (BRep). The self-supervised network is a masked graph autoencoder (MAE) that focuses on reconstructing geometries and attributes of BRep facets, rather than graph structures. After pre-training, we fine-tune a network that contains both the encoder and a task-specific classification network for machining feature recognition (MFR). In the experiments, our fine-tuned network achieves high recognition rates with only a small amount of data (e.g., 0.1% of the training data), significantly enhancing its practicality in real-world (or private) scenarios where only limited data is available. Compared with other MFR methods, our fine-tuned network achieves a significant improvement in recognition rate with the same amount of training data, especially when the number of training samples is limited. 2026-02-26T07:22:32Z 16 pages Can Yao Kang Wu Zuheng Zheng Siyuan Xing Xiao-Ming Fu http://arxiv.org/abs/2411.15468v2 SplatSDF: Boosting SDF-NeRF via Architecture-Level Fusion with Gaussian Splats 2026-02-26T05:29:03Z Signed distance-radiance field (SDF-NeRF) is a promising environment representation that offers both photo-realistic rendering and geometric reasoning such as proximity queries for collision avoidance. However, the slow training speed and convergence of SDF-NeRF hinder their use in practical robotic systems. We propose SplatSDF, a novel SDF-NeRF architecture that accelerates convergence using 3D Gaussian splats (3DGS), which can be quickly pre-trained. Unlike prior approaches that introduce a consistency loss between separate 3DGS and SDF-NeRF models, SplatSDF directly fuses 3DGS at an architectural level by consuming it as an input to SDF-NeRF during training. 
This is achieved using a novel sparse 3DGS fusion strategy that injects neural embeddings of 3DGS into SDF-NeRF around the object surface, while also permitting inference without 3DGS for minimal operation. Experimental results show SplatSDF achieves 3X faster convergence to the same geometric accuracy than the best baseline, and outperforms state-of-the-art SDF-NeRF methods in terms of Chamfer distance and peak signal-to-noise ratio, unlike consistency loss-based approaches that in fact provide limited gains. We also present computational techniques for accelerating gradient and Hessian steps by 3X. We expect these improvements will contribute to deploying SDF-NeRF on practical systems. 2024-11-23T06:35:19Z Runfa Blark Li Keito Suzuki Bang Du Ki Myung Brian Lee Nikolay Atanasov Truong Nguyen