http://arxiv.org/abs/2503.18682v2 Hardware-Rasterized Ray-Based Gaussian Splatting 2025-06-17T09:31:20Z We present a novel, hardware-rasterized rendering approach for ray-based 3D Gaussian Splatting (RayGS), obtaining both fast and high-quality results for novel view synthesis. Our work contains a mathematically rigorous and geometrically intuitive derivation of how to efficiently estimate all relevant quantities for rendering RayGS models, structured with respect to standard hardware rasterization shaders. Our solution is the first to enable rendering RayGS models at sufficiently high frame rates to support quality-sensitive applications like Virtual and Mixed Reality. Our second contribution enables alias-free rendering for RayGS, by addressing MIP-related issues arising when rendering diverging scales during training and testing. We demonstrate significant performance gains, across different benchmark scenes, while retaining state-of-the-art appearance quality of RayGS. 2025-03-24T13:53:30Z Samuel Rota Bulò Nemanja Bartolovic Lorenzo Porzi Peter Kontschieder http://arxiv.org/abs/2503.12553v2 Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View 2025-06-17T05:47:02Z Recent advances in single-view 3D scene reconstruction have highlighted the challenges in capturing fine geometric details and ensuring structural consistency, particularly in high-fidelity outdoor scene modeling. This paper presents Niagara, a new single-view 3D scene reconstruction framework that can faithfully reconstruct challenging outdoor scenes from a single input image for the first time. Our approach integrates monocular depth and normal estimation as input, which substantially improves its ability to capture fine details, mitigating common issues like geometric detail loss and deformation.
Additionally, we introduce a geometric affine field (GAF) and 3D self-attention as geometric constraints, which combine the structural properties of explicit geometry with the adaptability of implicit feature fields, striking a balance between efficient rendering and high-fidelity reconstruction. Finally, our framework employs a specialized encoder-decoder architecture, in which a depth-based 3D Gaussian decoder predicts 3D Gaussian parameters that can be used for novel view synthesis. Extensive results and analyses suggest that Niagara surpasses prior SoTA approaches such as Flash3D in both single-view and dual-view settings, significantly enhancing geometric accuracy and visual fidelity, especially in outdoor scenes. 2025-03-16T15:50:18Z Xianzu Wu Zhenxin Ai Harry Yang Ser-Nam Lim Jun Liu Huan Wang http://arxiv.org/abs/2506.14104v1 Innovating China's Intangible Cultural Heritage with DeepSeek + MidJourney: The Case of Yangliuqing theme Woodblock Prints 2025-06-17T01:47:17Z Yangliuqing woodblock prints, a cornerstone of China's intangible cultural heritage, are celebrated for their intricate designs and vibrant colors. However, preserving these traditional art forms while fostering innovation presents significant challenges. This study explores the DeepSeek + MidJourney approach to generating creative, themed Yangliuqing woodblock prints focused on the fight against COVID-19 and depicting joyous winners. Using Fréchet Inception Distance (FID) scores for evaluation, the method that combined DeepSeek-generated thematic prompts, MidJourney-generated thematic images, original Yangliuqing prints, and DeepSeek-generated key prompts in MidJourney-generated outputs achieved the lowest mean FID score (150.2) with minimal variability (σ = 4.9). Additionally, feedback from 62 participants, collected via questionnaires, confirmed that this hybrid approach produced the most representative results.
Moreover, the questionnaire data revealed that participants demonstrated the highest willingness to promote traditional culture and the strongest interest in consuming the AI-generated images produced through this method. These findings underscore the effectiveness of an innovative approach that seamlessly blends traditional artistic elements with modern AI-driven creativity, ensuring both cultural preservation and contemporary relevance. 2025-06-17T01:47:17Z RuiKun Yang ZhongLiang Wei Longdi Xian http://arxiv.org/abs/2506.09665v2 VideoMat: Extracting PBR Materials from Video Diffusion Models 2025-06-16T12:02:05Z We leverage finetuned video diffusion models, intrinsic decomposition of videos, and physically-based differentiable rendering to generate high quality materials for 3D models given a text prompt or a single image. First, we condition a video diffusion model to respect the input geometry and lighting condition. This model produces multiple views of a given 3D model with coherent material properties. Second, we use a recent model to extract intrinsics (base color, roughness, metallic) from the generated video. Finally, we use the intrinsics alongside the generated video in a differentiable path tracer to robustly extract PBR materials directly compatible with common content creation tools. 2025-06-11T12:36:49Z Project website: https://nvlabs.github.io/videomat/ Jacob Munkberg Zian Wang Ruofan Liang Tianchang Shen Jon Hasselgren http://arxiv.org/abs/2506.13827v1 Balancing Preservation and Modification: A Region and Semantic Aware Metric for Instruction-Based Image Editing 2025-06-15T17:12:57Z Instruction-based image editing, which aims to modify the image faithfully according to the instruction while leaving irrelevant content unchanged, has made significant progress. However, a comprehensive metric for assessing editing quality is still lacking.
Existing metrics either require high human evaluation costs, which hinder large-scale evaluation, or are adapted from other tasks and lose task-specific concerns, failing to comprehensively evaluate both instruction-based modification and preservation of irrelevant regions, resulting in biased evaluation. To tackle this, we introduce a new metric called Balancing Preservation and Modification (BPM), tailored for instruction-based image editing by explicitly disentangling the image into editing-relevant and irrelevant regions for specific consideration. We first identify and locate editing-relevant regions, followed by a two-tier process to assess editing quality: Region-Aware Judge evaluates whether the position and size of the edited region align with the instruction, and Semantic-Aware Judge further assesses the instruction content compliance within editing-relevant regions as well as content preservation within irrelevant regions, yielding comprehensive and interpretable quality assessment. Moreover, the editing-relevant region localization in BPM can be integrated into image editing approaches to improve editing quality, demonstrating its broad applicability. We verify the effectiveness of the BPM metric on comprehensive instruction-editing data, and the results show the highest alignment with human evaluation compared to existing metrics, indicating its efficacy. Code is available at: https://joyli-x.github.io/BPM/ 2025-06-15T17:12:57Z Zhuoying Li Zhu Xu Yuxin Peng Yang Liu http://arxiv.org/abs/2506.12847v1 iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer 2025-06-15T13:41:43Z Digital human video generation is gaining traction in fields like education and e-commerce, driven by advancements in head-body animation and lip-syncing technologies. However, realistic Hand-Object Interaction (HOI) - the complex dynamics between human hands and objects - continues to pose challenges. 
Generating natural and believable HOI reenactments is difficult due to issues such as occlusion between hands and objects, variations in object shapes and orientations, the necessity for precise physical interactions, and, importantly, the need to generalize to unseen humans and objects. This paper presents a novel framework, iDiT-HOI, that enables in-the-wild HOI reenactment generation. Specifically, we propose a unified inpainting-based token process method, called Inp-TPU, with a two-stage video diffusion transformer (DiT) model. The first stage generates a key frame by inserting the designated object into the hand region, providing a reference for subsequent frames. The second stage ensures temporal coherence and fluidity in hand-object interactions. The key contribution of our method is to reuse the pretrained model's context perception capabilities without introducing additional parameters, enabling strong generalization to unseen objects and scenarios. Our proposed paradigm also naturally supports long video generation. Comprehensive evaluations demonstrate that our approach outperforms existing methods, particularly in challenging real-world scenes, offering enhanced realism and more seamless hand-object interactions. 2025-06-15T13:41:43Z Technical report, 12 pages Zhelun Shen Chenming Wu Junsheng Zhou Chen Zhao Kaisiyuan Wang Hang Zhou Yingying Li Haocheng Feng Wei He Jingdong Wang http://arxiv.org/abs/2506.13814v1 ReFrame: Layer Caching for Accelerated Inference in Real-Time Rendering 2025-06-14T20:17:43Z Graphics rendering applications increasingly leverage neural networks in tasks such as denoising, supersampling, and frame extrapolation to improve image quality while maintaining frame rates. The temporal coherence inherent in these tasks presents an opportunity to reuse intermediate results from previous frames and avoid redundant computations.
Recent work has shown that caching intermediate features to be reused in subsequent inferences is an effective method to reduce latency in diffusion models. We extend this idea to real-time rendering and present ReFrame, which explores different caching policies to optimize trade-offs between quality and performance in rendering workloads. ReFrame can be applied to a variety of encoder-decoder style networks commonly found in rendering pipelines. Experimental results show that we achieve 1.4x speedup on average with negligible quality loss in three real-time rendering tasks. Code available: https://ubc-aamodt-group.github.io/reframe-layer-caching/ 2025-06-14T20:17:43Z Published at ICML 2025 Lufei Liu Tor M. Aamodt http://arxiv.org/abs/2506.05268v2 Uniform Sampling of Surfaces by Casting Rays 2025-06-13T16:21:09Z Randomly sampling points on surfaces is an essential operation in geometry processing. This sampling is computationally straightforward on explicit meshes, but it is much more difficult on other shape representations, such as widely-used implicit surfaces. This work studies a simple and general scheme for sampling points on a surface, which is derived from a connection to the intersections of random rays with the surface. Concretely, given a subroutine to cast a ray against a surface and find all intersections, we can use that subroutine to uniformly sample white noise points on the surface. This approach is particularly effective in the context of implicit signed distance functions, where sphere marching allows us to efficiently cast rays and sample points, without needing to extract an intermediate mesh. We analyze the basic method to show that it guarantees uniformity, and find experimentally that it is significantly more efficient than alternative strategies on a variety of representations. Furthermore, we show extensions to blue noise sampling and stratified sampling, and applications to deform neural implicit surfaces as well as moment estimation. 
2025-06-05T17:26:48Z 15 pages, 17 figures, Symposium on Geometry Processing 2025 Selena Ling Abhishek Madan Nicholas Sharp Alec Jacobson http://arxiv.org/abs/2506.15821v1 VEIGAR: View-consistent Explicit Inpainting and Geometry Alignment for 3D object Removal 2025-06-13T11:31:44Z Recent advances in Novel View Synthesis (NVS) and 3D generation have significantly improved editing tasks, with a primary emphasis on maintaining cross-view consistency throughout the generative process. Contemporary methods typically address this challenge using a dual-strategy framework: performing consistent 2D inpainting across all views guided by embedded priors either explicitly in pixel space or implicitly in latent space; and conducting 3D reconstruction with additional consistency guidance. Previous strategies, in particular, often require an initial 3D reconstruction phase to establish geometric structure, introducing considerable computational overhead. Even with the added cost, the resulting reconstruction quality often remains suboptimal. In this paper, we present VEIGAR, a computationally efficient framework that outperforms existing methods without relying on an initial reconstruction phase. VEIGAR leverages a lightweight foundation model to reliably align priors explicitly in the pixel space. In addition, we introduce a novel supervision strategy based on scale-invariant depth loss, which removes the need for traditional scale-and-shift operations in monocular depth regularization. Through extensive experimentation, VEIGAR establishes a new state-of-the-art benchmark in reconstruction quality and cross-view consistency, while achieving a threefold reduction in training time compared to the fastest existing method, highlighting its superior balance of efficiency and effectiveness. 
2025-06-13T11:31:44Z Pham Khai Nguyen Do Bao Nguyen Tran Nam Nguyen Duc Dung Nguyen http://arxiv.org/abs/2506.11546v1 CGVQM+D: Computer Graphics Video Quality Metric and Dataset 2025-06-13T07:59:55Z While existing video and image quality datasets have extensively studied natural videos and traditional distortions, the perception of synthetic content and modern rendering artifacts remains underexplored. We present a novel video quality dataset focused on distortions introduced by advanced rendering techniques, including neural supersampling, novel-view synthesis, path tracing, neural denoising, frame interpolation, and variable rate shading. Our evaluations show that existing full-reference quality metrics perform sub-optimally on these distortions, with a maximum Pearson correlation of 0.78. Additionally, we find that the feature space of pre-trained 3D CNNs aligns strongly with human perception of visual quality. We propose CGVQM, a full-reference video quality metric that significantly outperforms existing metrics while generating both per-pixel error maps and global quality scores. Our dataset and metric implementation are available at https://github.com/IntelLabs/CGVQM. 2025-06-13T07:59:55Z Akshay Jindal Nabil Sadaka Manu Mathew Thomas Anton Sochenov Anton Kaplanyan http://arxiv.org/abs/2506.11510v1 Adaptive Tetrahedral Grids for Volumetric Path-Tracing 2025-06-13T07:10:41Z We advocate the use of tetrahedral grids constructed via the longest edge bisection algorithm for rendering volumetric data with path tracing. The key benefits of such grids are twofold. First, they provide a highly adaptive space-partitioning representation that limits the memory footprint of volumetric assets. Second, each (tetrahedral) cell has exactly 4 neighbors within the volume (one per face of each tetrahedron) or fewer at boundaries. We leverage these properties to devise optimized algorithms and data-structures to compute and path-trace adaptive tetrahedral grids on the GPU.
In practice, our GPU implementation outperforms regular grids by up to 30x and renders production assets in real time at 32 samples per pixel. 2025-06-13T07:10:41Z Anis Benyoub Jonathan Dupuy 10.1145/3721239.3734093 http://arxiv.org/abs/2506.11273v1 On Ray Reordering Techniques for Faster GPU Ray Tracing 2025-06-12T20:28:46Z We study ray reordering as a tool for increasing the performance of existing GPU ray tracing implementations. We focus on ray reordering that is fully agnostic to the particular trace kernel. We summarize the existing methods for computing the ray sorting keys and discuss their properties. We propose a novel modification of a previously proposed method using the termination point estimation that is well-suited to tracing secondary rays. We evaluate the ray reordering techniques in the context of the wavefront path tracing using the RTX trace kernels. We show that ray reordering yields significantly higher trace speed on recent GPUs (1.3-2.0x), but recovering the reordering overhead in the hardware-accelerated trace phase is problematic. 2025-06-12T20:28:46Z Daniel Meister Jakub Bokšanský Michael Guthe Jiří Bittner 10.1145/3384382.3384534 http://arxiv.org/abs/2402.00652v4 Robust Construction of Polycube Segmentations via Dual Loops 2025-06-12T15:01:20Z Polycube segmentations for 3D models effectively support a wide variety of applications such as seamless texture mapping, spline fitting, structured multi-block grid generation, and hexahedral mesh construction. However, the automated construction of valid polycube segmentations suffers from robustness issues: state-of-the-art methods are not guaranteed to find a valid solution. In this paper we present DualCube: an iterative algorithm which is guaranteed to return a valid polycube segmentation for 3D models of any genus. Our algorithm is based on a dual representation of polycubes.
Starting from an initial simple polycube of the correct genus, together with the corresponding dual loop structure and polycube segmentation, we iteratively refine the polycube, loop structure, and segmentation, while maintaining the correctness of the solution. DualCube is robust by construction: at any point during the iterative process the current segmentation is valid. Its iterative nature furthermore facilitates a seamless trade-off between quality and complexity of the solution. DualCube can be implemented using comparatively simple algorithmic building blocks; our experimental evaluation establishes that the quality of our polycube segmentations is on par with, or exceeding, the state-of-the-art. 2024-02-01T15:13:14Z Computer Graphics Forum (Symposium on Geometry Processing 2025) Maxim Snoep Bettina Speckmann Kevin Verbeek 10.1111/cgf.70195 http://arxiv.org/abs/2506.10580v1 Transformer IMU Calibrator: Dynamic On-body IMU Calibration for Inertial Motion Capture 2025-06-12T11:18:40Z In this paper, we propose a novel dynamic calibration method for sparse inertial motion capture systems, which is the first to break the restrictive absolute static assumption in IMU calibration, i.e., the coordinate drift $R_{G'G}$ and measurement offset $R_{BS}$ remain constant during the entire motion, thereby significantly expanding their application scenarios. Specifically, we achieve real-time estimation of $R_{G'G}$ and $R_{BS}$ under two relaxed assumptions: i) the matrices change negligibly in a short time window; ii) the human movements/IMU readings are diverse in such a time window. Intuitively, the first assumption reduces the number of candidate matrices, and the second assumption provides diverse constraints, which greatly reduces the solution space and allows for accurate estimation of $R_{G'G}$ and $R_{BS}$ from a short history of IMU readings in real time.
To achieve this, we created synthetic datasets of paired $R_{G'G}$, $R_{BS}$ matrices and IMU readings, and learned their mappings using a Transformer-based model. We also designed a calibration trigger based on the diversity of IMU readings to ensure that assumption ii) is met before applying our method. To our knowledge, we are the first to achieve implicit IMU calibration (i.e., seamlessly putting IMUs into use without the need for an explicit calibration process), as well as the first to enable long-term and accurate motion capture using sparse IMUs. The code and dataset are available at https://github.com/ZuoCX1996/TIC. 2025-06-12T11:18:40Z Accepted by SIGGRAPH 2025 (TOG) Chengxu Zuo Jiawei Huang Xiao Jiang Yuan Yao Xiangren Shi Rui Cao Xinyu Yi Feng Xu Shihui Guo Yipeng Qin http://arxiv.org/abs/2506.10468v1 Low-Barrier Dataset Collection with Real Human Body for Interactive Per-Garment Virtual Try-On 2025-06-12T08:18:49Z Existing image-based virtual try-on methods are often limited to the front view and lack real-time performance. While per-garment virtual try-on methods have tackled these issues by capturing per-garment datasets and training per-garment neural networks, they still encounter practical limitations: (1) the robotic mannequin used to capture per-garment datasets is prohibitively expensive for widespread adoption and fails to accurately replicate natural human body deformation; (2) the synthesized garments often misalign with the human body. To address these challenges, we propose a low-barrier approach for collecting per-garment datasets using real human bodies, eliminating the necessity for a customized robotic mannequin. We also introduce a hybrid person representation that enhances the existing intermediate representation with a simplified DensePose map. This ensures accurate alignment of synthesized garment images with the human body and enables human-garment interaction without the need for customized wearable devices.
We performed qualitative and quantitative evaluations against other state-of-the-art image-based virtual try-on methods and conducted ablation studies to demonstrate the superiority of our method regarding image quality and temporal consistency. Finally, our user study results indicated that most participants found our virtual try-on system helpful for making garment purchasing decisions. 2025-06-12T08:18:49Z Zaiqiang Wu Yechen Li Jingyuan Liu Yuki Shibata Takayuki Hori I-Chao Shen Takeo Igarashi