https://arxiv.org/api/6o3vA+q3ZsuvfoAQYlYE+kf28qw
2026-06-29T00:39:43Z

http://arxiv.org/abs/2506.17636v1
3D Gaussian Splatting for Fine-Detailed Surface Reconstruction in Large-Scale Scene
2025-06-21T08:41:28Z
Recent developments in 3D Gaussian Splatting have made significant advances in surface reconstruction. However, scaling these methods to large-scale scenes remains challenging due to high computational demands and the complex dynamic appearances typical of outdoor environments. These challenges hinder the application in aerial surveying and autonomous driving. This paper proposes a novel solution to reconstruct large-scale surfaces with fine details, supervised by full-sized images. Firstly, we introduce a coarse-to-fine strategy to reconstruct a coarse model efficiently, followed by adaptive scene partitioning and sub-scene refining from image segments. Additionally, we integrate a decoupling appearance model to capture global appearance variations and a transient mask model to mitigate interference from moving objects. Finally, we expand the multi-view constraint and introduce a single-view regularization for texture-less areas. Our experiments were conducted on the publicly available dataset GauU-Scene V2, which was captured using unmanned aerial vehicles. To the best of our knowledge, our method outperforms existing NeRF-based and Gaussian-based methods, achieving high-fidelity visual results and accurate surface from full-size image optimization. Open-source code will be available on GitHub.
2025-06-21T08:41:28Z
IROS 2025
Shihan Chen, Zhaojin Li, Zeyu Chen, Qingsong Yan, Gaoyang Shen, Ran Duan

http://arxiv.org/abs/2506.17206v1
DreamCube: 3D Panorama Generation via Multi-plane Synchronization
2025-06-20T17:55:06Z
3D panorama synthesis is a promising yet challenging task that demands high-quality and diverse visual appearance and geometry of the generated omnidirectional content.
Existing methods leverage rich image priors from pre-trained 2D foundation models to circumvent the scarcity of 3D panoramic data, but the incompatibility between 3D panoramas and 2D single views limits their effectiveness. In this work, we demonstrate that by applying multi-plane synchronization to the operators from 2D foundation models, their capabilities can be seamlessly extended to the omnidirectional domain. Based on this design, we further introduce DreamCube, a multi-plane RGB-D diffusion model for 3D panorama generation, which maximizes the reuse of 2D foundation model priors to achieve diverse appearances and accurate geometry while maintaining multi-view consistency. Extensive experiments demonstrate the effectiveness of our approach in panoramic image generation, panoramic depth estimation, and 3D scene generation.
2025-06-20T17:55:06Z
Project page: https://yukun-huang.github.io/DreamCube/
Yukun Huang, Yanning Zhou, Jianan Wang, Kaiyi Huang, Xihui Liu

http://arxiv.org/abs/2506.17025v1
Volumetric Parameterization for 3-Dimensional Simply-Connected Manifolds
2025-06-20T14:21:34Z
With advances in technology, there has been growing interest in developing effective mapping methods for 3-dimensional objects in recent years. Volumetric parameterization for 3D solid manifolds plays an important role in processing 3D data. However, the conventional approaches cannot control the bijectivity and local geometric distortions of the resulting mappings due to the complex structure of the solid manifolds. Moreover, prior methods mainly focus on one property instead of balancing different properties during the mapping process. In this paper, we propose several novel methods for computing volumetric parameterizations for 3D simply-connected manifolds. Analogous to surface parameterization, our framework incorporates several models designed to preserve geometric structure, achieve density equalization, and optimally balance geometric and density distortions.
With these methods, various 3D manifold parameterizations with different desired properties can be achieved. These methods are tested on different examples and manifold remeshing applications, demonstrating their effectiveness and accuracy.
2025-06-20T14:21:34Z
Zhiyuan Lyu, Qiguang Chen, Gary P. T. Choi, Lok Ming Lui

http://arxiv.org/abs/2506.16054v1
PAROAttention: Pattern-Aware ReOrdering for Efficient Sparse and Quantized Attention in Visual Generation Models
2025-06-19T06:25:02Z
In visual generation, the quadratic complexity of attention mechanisms results in high memory and computational costs, especially for longer token sequences required in high-resolution image or multi-frame video generation. To address this, prior research has explored techniques such as sparsification and quantization. However, these techniques face significant challenges under low density and reduced bitwidths. Through systematic analysis, we identify that the core difficulty stems from the dispersed and irregular characteristics of visual attention patterns. Therefore, instead of introducing specialized sparsification and quantization designs to accommodate such patterns, we propose an alternative strategy: *reorganizing* the attention pattern to alleviate the challenges. Inspired by the local aggregation nature of visual feature extraction, we design a novel **Pattern-Aware token ReOrdering (PARO)** technique, which unifies the diverse attention patterns into a hardware-friendly block-wise pattern. This unification substantially simplifies and enhances both sparsification and quantization. We evaluate the performance-efficiency trade-offs of various design choices and finalize a methodology tailored for the unified pattern.
Our approach, **PAROAttention**, achieves video and image generation with lossless metrics, and nearly identical results to full-precision (FP) baselines, while operating at notably lower density (~20%-30%) and bitwidth (**INT8/INT4**), achieving a **1.9x** to **2.7x** end-to-end latency speedup.
2025-06-19T06:25:02Z
project page: https://a-suozhang.xyz/paroattn.github.io
Tianchen Zhao, Ke Hong, Xinhao Yang, Xuefeng Xiao, Huixia Li, Feng Ling, Ruiqi Xie, Siqi Chen, Hongyu Zhu, Yichong Zhang, Yu Wang

http://arxiv.org/abs/2506.15860v1
User-Guided Force-Directed Graph Layout
2025-06-18T20:11:46Z
Visual analysis of relational data is essential for many real-world analytics tasks, with layout quality being key to interpretability. However, existing layout algorithms often require users to navigate complex parameters to express their intent. We present a user-guided force-directed layout approach that enables intuitive control through freehand sketching. Our method uses classical image analysis techniques to extract structural information from sketches, which is then used to generate positional constraints that guide the layout process. We evaluate the approach on various real and synthetic graphs ranging from small to medium scale, demonstrating its ability to produce layouts aligned with user expectations. An implementation of our method along with documentation and a demo page is freely available on GitHub at https://github.com/sciluna/uggly.
2025-06-18T20:11:46Z
Hasan Balci, Augustin Luna

http://arxiv.org/abs/2506.15786v1
Graphics4Science: Computer Graphics for Scientific Impacts
2025-06-18T18:06:58Z
Computer graphics, often associated with films, games, and visual effects, has long been a powerful tool for addressing scientific challenges--from its origins in 3D visualization for medical imaging to its role in modern computational modeling and simulation.
This course explores the deep and evolving relationship between computer graphics and science, highlighting past achievements, ongoing contributions, and open questions that remain. We show how core methods, such as geometric reasoning and physical modeling, provide inductive biases that help address challenges in both fields, especially in data-scarce settings. To that end, we aim to reframe graphics as a modeling language for science by bridging vocabulary gaps between the two communities. Designed for both newcomers and experts, Graphics4Science invites the graphics community to engage with science, tackle high-impact problems where graphics expertise can make a difference, and contribute to the future of scientific discovery. Additional details are available on the course website: https://graphics4science.github.io
2025-06-18T18:06:58Z
Peter Yichen Chen, Minghao Guo, Hanspeter Pfister, Ming Lin, William Freeman, Qixing Huang, Han-Wei Shen, Wojciech Matusik

http://arxiv.org/abs/2506.15684v1
Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards
2025-06-18T17:59:59Z
Generating high-quality and photorealistic 3D assets remains a longstanding challenge in 3D vision and computer graphics. Although state-of-the-art generative models, such as diffusion models, have made significant progress in 3D generation, they often fall short of human-designed content due to limited ability to follow instructions, align with human preferences, or produce realistic textures, geometries, and physical attributes. In this paper, we introduce Nabla-R2D3, a highly effective and sample-efficient reinforcement learning alignment framework for 3D-native diffusion models using 2D rewards. Built upon the recently proposed Nabla-GFlowNet method, which matches the score function to reward gradients in a principled manner for reward finetuning, our Nabla-R2D3 enables effective adaptation of 3D diffusion models using only 2D reward signals.
Extensive experiments show that, unlike vanilla finetuning baselines which either struggle to converge or suffer from reward hacking, Nabla-R2D3 consistently achieves higher rewards and reduced prior forgetting within a few finetuning steps.
2025-06-18T17:59:59Z
Technical Report (21 pages, 21 figures)
Qingming Liu, Zhen Liu, Dinghuai Zhang, Kui Jia

http://arxiv.org/abs/2506.15571v1
MicroRicci: A Greedy and Local Ricci Flow Solver for Self-Tuning Mesh Smoothing
2025-06-18T15:48:30Z
Real-time mesh smoothing at scale remains a formidable challenge: classical Ricci-flow solvers demand costly global updates, while greedy heuristics suffer from slow convergence or brittle tuning. We present MicroRicci, the first truly self-tuning, local Ricci-flow solver that borrows ideas from coding theory and packs them into just 1K + 200 parameters. Its primary core is a greedy syndrome-decoding step that pinpoints and corrects the largest curvature error in O(E) time, augmented by two tiny neural modules that adaptively choose vertices and step sizes on the fly. On a diverse set of 110 SJTU-TMQA meshes, MicroRicci slashes iteration counts from 950±140 to 400±80 (2.4x speedup), tightens curvature spread from 0.19 to 0.185, and achieves a remarkable UV-distortion-to-MOS correlation of r = -0.93. It adds only 0.25 ms per iteration (0.80 to 1.05 ms), yielding an end-to-end 1.8x runtime acceleration over state-of-the-art methods. MicroRicci's combination of linear-time updates, automatic hyperparameter adaptation, and high-quality geometric and perceptual results makes it well suited for real-time, resource-limited applications in graphics, simulation, and related fields.
2025-06-18T15:48:30Z
9 pages, 8 figures, 4 tables
Le Vu Anh, Nguyen Viet Anh, Mehmet Dik, Tu Nguyen Thi Ngoc

http://arxiv.org/abs/2506.15312v1
One-shot Face Sketch Synthesis in the Wild via Generative Diffusion Prior and Instruction Tuning
2025-06-18T09:41:30Z
Face sketch synthesis is a technique aimed at converting face photos into sketches.
Existing face sketch synthesis research mainly relies on training with numerous photo-sketch sample pairs from existing datasets. However, these large-scale discriminative learning methods face problems such as data scarcity and high human labor costs. Once the training data becomes scarce, their generative performance significantly degrades. In this paper, we propose a one-shot face sketch synthesis method based on diffusion models. We optimize text instructions on a diffusion model using face photo-sketch image pairs. Then, the instructions derived through gradient-based optimization are used for inference. To simulate real-world scenarios more accurately and evaluate method effectiveness more comprehensively, we introduce a new benchmark named One-shot Face Sketch Dataset (OS-Sketch). The benchmark consists of 400 pairs of face photo-sketch images, including sketches with different styles and photos with different backgrounds, ages, sexes, expressions, illumination, etc. For a solid out-of-distribution evaluation, we select only one pair of images for training each time, with the rest used for inference. Extensive experiments demonstrate that the proposed method can convert various photos into realistic and highly consistent sketches in a one-shot context. Compared to other methods, our approach offers greater convenience and broader applicability. The dataset will be available at: https://github.com/HanWu3125/OS-Sketch
2025-06-18T09:41:30Z
We propose a novel framework for face sketch synthesis, where merely a single pair of samples suffices to enable in-the-wild face sketch synthesis
Han Wu, Junyao Li, Kangbo Zhao, Sen Zhang, Yukai Shi, Liang Lin

http://arxiv.org/abs/2506.15183v1
You Only Render Once: Enhancing Energy and Computation Efficiency of Mobile Virtual Reality
2025-06-18T06:59:33Z
Mobile Virtual Reality (VR) is essential to achieving convenient and immersive human-computer interaction and realizing emerging applications such as Metaverse.
However, existing VR technologies require two separate renderings of binocular images, causing a significant bottleneck for mobile devices with limited computing capability and power supply. This paper proposes an approach to rendering optimization for mobile VR called EffVR. By utilizing the per-pixel attribute, EffVR can generate binocular VR images from the monocular image through a single rendering, saving half the computation over conventional approaches. Our evaluation indicates that, compared with the state-of-the-art, EffVR can save 27% power consumption on average while achieving high binocular image quality (0.9679 SSIM and 34.09 PSNR) in mobile VR applications. Additionally, EffVR can increase the frame rate by 115.2%. These results corroborate EffVR's superior computation/energy-saving performance, paving the road to a sustainable mobile VR. The source code, demo video, Android app, and more are released anonymously at https://yoro-vr.github.io/
2025-06-18T06:59:33Z
Xingyu Chen, Xinmin Fang, Shuting Zhang, Xinyu Zhang, Liang He, Zhengxiong Li

http://arxiv.org/abs/2507.02877v1
AuraGenome: An LLM-Powered Framework for On-the-Fly Reusable and Scalable Circular Genome Visualizations
2025-06-18T03:29:30Z
Circular genome visualizations are essential for exploring structural variants and gene regulation. However, existing tools often require complex scripting and manual configuration, making the process time-consuming, error-prone, and difficult to learn. To address these challenges, we introduce AuraGenome, an LLM-powered framework for rapid, reusable, and scalable generation of multi-layered circular genome visualizations. AuraGenome combines a semantic-driven multi-agent workflow with an interactive visual analytics system. The workflow employs seven specialized LLM-driven agents, each assigned distinct roles such as intent recognition, layout planning, and code generation, to transform raw genomic data into tailored visualizations.
The system supports multiple coordinated views tailored for genomic data, offering ring, radial, and chord-based layouts to represent multi-layered circular genome visualizations. In addition to enabling interactions and configuration reuse, the system supports real-time refinement and high-quality report export. We validate its effectiveness through two case studies and a comprehensive user study. AuraGenome is available at: https://github.com/Darius18/AuraGenome.
2025-06-18T03:29:30Z
Chi Zhang, Yu Dong, Yang Wang, Yuetong Han, Guihua Shan, Bixia Tang

http://arxiv.org/abs/2506.14414v1
GHAR: GeoPose-based Handheld Augmented Reality for Architectural Positioning, Manipulation and Visual Exploration
2025-06-17T11:17:26Z
Handheld Augmented Reality (HAR) is revolutionizing the civil infrastructure application domain. The current trend in HAR relies on marker tracking technology. However, marker-based systems have several limitations, such as difficulty in use and installation, sensitivity to light, and marker design. In this paper, we propose a markerless HAR framework with GeoPose-based tracking. We use different gestures for manipulation and achieve 7 DOF (3 DOF each for translation and rotation, and 1 DOF for scaling). The proposed framework, called GHAR, is implemented for architectural building models. It augments virtual CAD models of buildings on the ground, enabling users to manipulate and visualize an architectural model before actual construction. The system offers a quick view of the building infrastructure, playing a vital role in requirement analysis and planning in construction technology. We evaluated the usability, manipulability, and comprehensibility of the proposed system using a standard user study with the System Usability Scale (SUS) and Handheld Augmented Reality User Study (HARUS).
We compared our GeoPose-based markerless HAR framework with a marker-based HAR framework, finding significant improvement in the aforementioned three parameters with the markerless framework.
2025-06-17T11:17:26Z
Sabahat Israr, Dawar Khan, Zhanglin Cheng, Mukhtaj Khan, Kiyoshi Kiyokawa

http://arxiv.org/abs/2503.18682v2
Hardware-Rasterized Ray-Based Gaussian Splatting
2025-06-17T09:31:20Z
We present a novel, hardware rasterized rendering approach for ray-based 3D Gaussian Splatting (RayGS), obtaining both fast and high-quality results for novel view synthesis. Our work contains a mathematically rigorous and geometrically intuitive derivation of how to efficiently estimate all relevant quantities for rendering RayGS models, structured with respect to standard hardware rasterization shaders. Our solution is the first to enable rendering RayGS models at sufficiently high frame rates to support quality-sensitive applications like Virtual and Mixed Reality. Our second contribution enables alias-free rendering for RayGS, by addressing MIP-related issues arising when rendering diverging scales during training and testing. We demonstrate significant performance gains, across different benchmark scenes, while retaining state-of-the-art appearance quality of RayGS.
2025-03-24T13:53:30Z
Samuel Rota Bulò, Nemanja Bartolovic, Lorenzo Porzi, Peter Kontschieder

http://arxiv.org/abs/2503.12553v2
Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View
2025-06-17T05:47:02Z
Recent advances in single-view 3D scene reconstruction have highlighted the challenges in capturing fine geometric details and ensuring structural consistency, particularly in high-fidelity outdoor scene modeling. This paper presents Niagara, a new single-view 3D scene reconstruction framework that can faithfully reconstruct challenging outdoor scenes from a single input image for the first time.
Our approach integrates monocular depth and normal estimation as input, which substantially improves its ability to capture fine details, mitigating common issues like geometric detail loss and deformation.
Additionally, we introduce a geometric affine field (GAF) and 3D self-attention as geometric constraints, combining the structural properties of explicit geometry with the adaptability of implicit feature fields and striking a balance between efficient rendering and high-fidelity reconstruction.
Finally, our framework uses a specialized encoder-decoder architecture in which a depth-based 3D Gaussian decoder predicts 3D Gaussian parameters, which can be used for novel view synthesis. Extensive results and analyses suggest that Niagara surpasses prior SoTA approaches such as Flash3D in both single-view and dual-view settings, significantly enhancing the geometric accuracy and visual fidelity, especially in outdoor scenes.
2025-03-16T15:50:18Z
Xianzu Wu, Zhenxin Ai, Harry Yang, Ser-Nam Lim, Jun Liu, Huan Wang

http://arxiv.org/abs/2506.14104v1
Innovating China's Intangible Cultural Heritage with DeepSeek + MidJourney: The Case of Yangliuqing theme Woodblock Prints
2025-06-17T01:47:17Z
Yangliuqing woodblock prints, a cornerstone of China's intangible cultural heritage, are celebrated for their intricate designs and vibrant colors. However, preserving these traditional art forms while fostering innovation presents significant challenges. This study explores the DeepSeek + MidJourney approach to generating creative, themed Yangliuqing woodblock prints focused on the fight against COVID-19 and depicting joyous winners. Using Fréchet Inception Distance (FID) scores for evaluation, the method that combined DeepSeek-generated thematic prompts, MidJourney-generated thematic images, original Yangliuqing prints, and DeepSeek-generated key prompts in MidJourney-generated outputs achieved the lowest mean FID score (150.2) with minimal variability (σ = 4.9). Additionally, feedback from 62 participants, collected via questionnaires, confirmed that this hybrid approach produced the most representative results. Moreover, the questionnaire data revealed that participants demonstrated the highest willingness to promote traditional culture and the strongest interest in consuming the AI-generated images produced through this method.
These findings underscore the effectiveness of an innovative approach that seamlessly blends traditional artistic elements with modern AI-driven creativity, ensuring both cultural preservation and contemporary relevance.
2025-06-17T01:47:17Z
RuiKun Yang, ZhongLiang Wei, Longdi Xian