http://arxiv.org/abs/2601.03869v2 Bayesian Monocular Depth Refinement via Neural Radiance Fields 2026-01-15T01:46:55Z

Monocular depth estimation has applications in many fields, such as autonomous navigation and extended reality, making it an essential computer vision task. However, current methods often produce smooth depth maps that lack the fine geometric detail needed for accurate scene understanding. We propose MDENeRF, an iterative framework that refines monocular depth estimates using depth information from Neural Radiance Fields (NeRFs). MDENeRF consists of three components: (1) an initial monocular estimate for global structure, (2) a NeRF trained on perturbed viewpoints, with per-pixel uncertainty, and (3) Bayesian fusion of the noisy monocular and NeRF depths. We derive NeRF uncertainty from the volume rendering process to iteratively inject high-frequency fine details. Meanwhile, our monocular prior maintains global structure. We demonstrate improvements on key metrics in experiments on indoor scenes from the SUN RGB-D dataset.
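The fusion step in component (3) can be illustrated with a minimal sketch. The abstract does not state the exact update rule, so the code below assumes the standard product of two independent per-pixel Gaussian estimates (inverse-variance weighting). The function names `fuse_depth` and `fuse_depth_maps` and all variance values are hypothetical.

```python
def fuse_depth(d_mono, var_mono, d_nerf, var_nerf):
    """Fuse two independent Gaussian depth estimates for a single pixel.

    Assumption: both estimates are Gaussian with known variances, so the
    posterior is Gaussian with precision equal to the sum of precisions.
    """
    precision = 1.0 / var_mono + 1.0 / var_nerf
    fused_var = 1.0 / precision
    fused_depth = fused_var * (d_mono / var_mono + d_nerf / var_nerf)
    return fused_depth, fused_var


def fuse_depth_maps(mono, mono_var, nerf, nerf_var):
    """Apply fuse_depth per pixel to depth maps given as nested lists."""
    fused = []
    for row_m, row_mv, row_n, row_nv in zip(mono, mono_var, nerf, nerf_var):
        fused.append([fuse_depth(m, mv, n, nv)[0]
                      for m, mv, n, nv in zip(row_m, row_mv, row_n, row_nv)])
    return fused
```

Under this assumption, a pixel where the NeRF depth has low variance is pulled toward the NeRF value, while a pixel with high NeRF variance stays close to the monocular prior.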

2026-01-07T12:32:39Z IEEE 8th International Conference on Algorithms, Computing and Artificial Intelligence (ACAI 2025) Proc. IEEE 8th International Conference on Algorithms, Computing and Artificial Intelligence (ACAI), pp. 488-492, 2025 Arun Muthukkumar 10.1109/ACAI68217.2025.11406626 http://arxiv.org/abs/2512.23696v2 OpenPBR: Novel Features and Implementation Details 2026-01-14T13:18:48Z

OpenPBR is a physically based, standardized uber-shader developed for interoperable material authoring and rendering across VFX, animation, and design visualization workflows. This document serves as a companion to the official specification, offering deeper insight into the model's development and more detailed implementation guidance, including code examples and mathematical derivations. We begin with a description of the model's formal structure and theoretical foundations - covering slab-based layering, statistical mixing, and microfacet theory - before turning to its physical components. These include metallic, dielectric, subsurface, and glossy-diffuse base substrates, followed by thin-film iridescence, coat, and fuzz layers. A special-case mode for rendering thin-walled objects is also described. Additional sections explore technical topics in greater depth, such as the decoupling of specular reflectivity from transmission, the choice of parameterization for subsurface scattering, and the detailed physics of coat darkening and thin-film interference. We also discuss planned extensions, including hazy specular reflection and retroreflection.

2025-12-29T18:53:00Z Part of Physically Based Shading in Theory and Practice, SIGGRAPH 2025 Course Jamie Portsmouth Peter Kutz Stephen Hill 10.1145/3721241.3733991 http://arxiv.org/abs/2601.09428v1 Draw it like Euclid: Teaching transformer models to generate CAD profiles using ruler and compass construction steps 2026-01-14T12:17:34Z

We introduce a new method of generating Computer Aided Design (CAD) profiles via a sequence of simple geometric constructions including curve offsetting, rotations and intersections. These sequences start with geometry provided by a designer and build up the points and curves of the final profile step by step. We demonstrate that adding construction steps between the designer's input geometry and the final profile improves generation quality in a similar way to the introduction of a chain of thought in language models. Similar to the constraints in a parametric CAD model, the construction sequences reduce the degrees of freedom in the modeled shape to a small set of parameter values which can be adjusted by the designer, allowing parametric editing with the constructed geometry evaluated to floating point precision. In addition we show that applying reinforcement learning to the construction sequences gives further improvements over a wide range of metrics, including some which were not explicitly optimized.
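As an illustration of a single construction step of this kind, the sketch below intersects two circles and uses the result to build the apex of an equilateral triangle on a given segment (Euclid's Proposition I.1). It is not taken from the paper: the function names are hypothetical, and the paper's actual set of construction operations and their encoding are not specified in the abstract.

```python
import math


def circle_circle_intersections(c0, r0, c1, r1):
    """Return the intersection points of two circles (empty list if none)."""
    x0, y0 = c0
    x1, y1 = c1
    d = math.hypot(x1 - x0, y1 - y0)
    # No solution: concentric, too far apart, or one circle inside the other.
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []
    # Distance from c0 to the foot of the chord along the centre line.
    a = (r0 * r0 - r1 * r1 + d * d) / (2 * d)
    h = math.sqrt(max(r0 * r0 - a * a, 0.0))
    mx = x0 + a * (x1 - x0) / d
    my = y0 + a * (y1 - y0) / d
    ox = -h * (y1 - y0) / d
    oy = h * (x1 - x0) / d
    return [(mx + ox, my + oy), (mx - ox, my - oy)]


def equilateral_apex(p, q):
    """Apex of the equilateral triangle on segment pq (left of p -> q)."""
    side = math.hypot(q[0] - p[0], q[1] - p[1])
    return circle_circle_intersections(p, side, q, side)[0]
```

Because the apex is computed from the two input points alone, moving either point re-evaluates the whole construction exactly, which is the sense in which a construction sequence leaves only a few adjustable parameters.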

2026-01-14T12:17:34Z Siyi Li Joseph G. Lambourne Longfei Zhang Pradeep Kumar Jayaraman Karl. D. D. Willis http://arxiv.org/abs/2601.09417v1 Variable Basis Mapping for Real-Time Volumetric Visualization 2026-01-14T12:11:14Z

Real-time visualization of large-scale volumetric data remains challenging, as direct volume rendering and voxel-based methods suffer from prohibitively high computational cost. We propose Variable Basis Mapping (VBM), a framework that transforms volumetric fields into 3D Gaussian Splatting (3DGS) representations through wavelet-domain analysis. First, we precompute a compact Wavelet-to-Gaussian Transition Bank that provides optimal Gaussian surrogates for canonical wavelet atoms across multiple scales. Second, we perform analytical Gaussian construction that maps discrete wavelet coefficients directly to 3DGS parameters using a closed-form, mathematically principled rule. Finally, a lightweight image-space fine-tuning stage further refines the representation to improve rendering fidelity. Experiments on diverse datasets demonstrate that VBM significantly accelerates convergence and enhances rendering quality, enabling real-time volumetric visualization.

2026-01-14T12:11:14Z 11 pages. Under review Qibiao Li Yuxuan Wang Youcheng Cai Huangsheng Du Ligang Liu http://arxiv.org/abs/2603.29569v1 AdaptDiff: Adaptive Guidance in Diffusion Models for Diverse and Identity-Consistent Face Synthesis (Student Abstract) 2026-01-14T11:03:51Z

Diffusion models conditioned on identity embeddings enable the generation of synthetic face images that consistently preserve identity across multiple samples. Recent work has shown that introducing an additional negative condition through classifier-free guidance during sampling provides a mechanism to suppress undesired attributes, thus improving inter-class separability. Building on this insight, we propose a dynamic weighting scheme for the negative condition that adapts throughout the sampling trajectory. This strategy leverages the complementary strengths of positive and negative conditions at different stages of generation, leading to more diverse yet identity-consistent synthetic data.
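A rough sketch of guidance with a time-varying negative weight follows. The abstract gives neither the combination rule nor the schedule, so both are assumptions here: the combination is one common form of classifier-free guidance with a negative condition, and the linear schedule in `negative_weight` is purely hypothetical. Noise predictions are plain lists of floats for simplicity.

```python
def negative_weight(step, num_steps, w_start=0.0, w_end=1.0):
    """Hypothetical linear schedule for the negative-condition weight."""
    if num_steps <= 1:
        return w_end
    frac = step / (num_steps - 1)
    return w_start + frac * (w_end - w_start)


def guided_noise(eps_uncond, eps_pos, eps_neg, w_pos, w_neg):
    """Combine noise predictions: push toward the positive condition and
    away from the negative one, both relative to the unconditional output."""
    return [u + w_pos * (p - u) - w_neg * (n - u)
            for u, p, n in zip(eps_uncond, eps_pos, eps_neg)]
```

In a sampler loop, `w_neg` would be recomputed with `negative_weight(step, num_steps)` at every denoising step before calling `guided_noise`.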

2026-01-14T11:03:51Z Accepted at AAAI 2026 Student Abstract and Poster Program Eduarda Caldeira Tahar Chettaoui Naser Damer Fadi Boutros http://arxiv.org/abs/2508.13990v2 Uncertainty-Aware PCA for Arbitrarily Distributed Data Modeled by Gaussian Mixture Models 2026-01-14T09:53:51Z

Multidimensional data is often associated with uncertainties that are not well-described by normal distributions. In this work, we describe how such distributions can be projected to a low-dimensional space using uncertainty-aware principal component analysis (UAPCA). We propose to model multidimensional distributions using Gaussian mixture models (GMMs) and derive the projection from a general formulation that allows projecting arbitrary probability density functions. The low-dimensional projections of the densities exhibit more details about the distributions and represent them more faithfully compared to UAPCA mappings. Further, we support including user-defined weights between the different distributions, which allows for varying the importance of the multidimensional distributions. We evaluate our approach by comparing the distributions in low-dimensional space obtained by our method and UAPCA to those obtained by sample-based projections.
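The sketch below shows only the linear-projection step for a GMM, using the standard identity that a Gaussian N(mu, Sigma) maps under a linear map P to N(P mu, P Sigma P^T), with mixture weights unchanged. How the projection axes are chosen is not covered, and `project_gmm` and its argument layout are hypothetical names rather than the authors' API.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Transpose a matrix given as nested lists."""
    return [list(row) for row in zip(*m)]


def project_gmm(weights, means, covs, proj):
    """Project each GMM component with the k x d matrix `proj`.

    Returns a list of (weight, projected_mean, projected_covariance).
    """
    proj_t = transpose(proj)
    projected = []
    for w, mu, cov in zip(weights, means, covs):
        mu_low = [sum(p * x for p, x in zip(row, mu)) for row in proj]
        cov_low = matmul(matmul(proj, cov), proj_t)
        projected.append((w, mu_low, cov_low))
    return projected
```

The projected mixture is again a GMM, so multimodal or skewed uncertainty survives the projection instead of being collapsed into a single Gaussian.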

2025-08-19T16:31:41Z 10 pages, 6 figures Daniel Klötzl Ozan Tastekin David Hägele Marina Evers Daniel Weiskopf 10.1109/UncertaintyVisualization68947.2025.00010 http://arxiv.org/abs/2506.10035v3 FastFLUX: Pruning FLUX with Block-wise Replacement and Sandwich Training 2026-01-13T18:20:18Z

Recent advancements in text-to-image (T2I) generation have led to the emergence of highly expressive models such as diffusion transformers (DiTs), exemplified by FLUX. However, their massive parameter sizes lead to slow inference, high memory usage, and poor deployability. Existing acceleration methods (e.g., single-step distillation and attention pruning) often suffer from significant performance degradation and incur substantial training costs. To address these limitations, we propose FastFLUX, an architecture-level pruning framework designed to enhance the inference efficiency of FLUX. At its core is the Block-wise Replacement with Linear Layers (BRLL) method, which replaces structurally complex residual branches in ResBlocks with lightweight linear layers while preserving the original shortcut connections for stability. Furthermore, we introduce Sandwich Training (ST), a localized fine-tuning strategy that leverages LoRA to supervise neighboring blocks, mitigating performance drops caused by structural replacement. Experiments show that our FastFLUX maintains high image quality under both qualitative and quantitative evaluations, while significantly improving inference speed, even with 20% of the hierarchy pruned. Our code will be available soon.

2025-06-10T20:48:30Z 14 pages Fuhan Cai Yong Guo Jie Li Wenbo Li Jian Chen Xiangzhong Fang http://arxiv.org/abs/2507.17336v3 Temporal Smoothness-Aware Rate-Distortion Optimized 4D Gaussian Splatting 2026-01-13T17:15:56Z

Dynamic 4D Gaussian Splatting (4DGS) effectively extends the high-speed rendering capabilities of 3D Gaussian Splatting (3DGS) to represent volumetric videos. However, the large number of Gaussians, substantial temporal redundancies, and especially the absence of an entropy-aware compression framework result in large storage requirements. Consequently, this poses significant challenges for practical deployment, efficient edge-device processing, and data transmission. In this paper, we introduce a novel end-to-end RD-optimized compression framework tailored for 4DGS, aiming to enable flexible, high-fidelity rendering across varied computational platforms. Leveraging Fully Explicit Dynamic Gaussian Splatting (Ex4DGS), one of the state-of-the-art 4DGS methods, as our baseline, we start from the existing 3DGS compression methods for compatibility while effectively addressing additional challenges introduced by the temporal axis. In particular, instead of storing motion trajectories independently per point, we employ a wavelet transform to reflect the real-world smoothness prior, significantly enhancing storage efficiency. This approach yields significantly improved compression ratios and provides a user-controlled balance between compression efficiency and rendering quality. Extensive experiments demonstrate the effectiveness of our method, achieving up to 91× compression compared to the original Ex4DGS model while maintaining high visual fidelity. These results highlight the applicability of our framework for real-time dynamic scene rendering in diverse scenarios, from resource-constrained edge devices to high-performance environments. The source code is available at https://github.com/HyeongminLEE/RD4DGS.
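To make the wavelet idea concrete, the sketch below transforms a one-dimensional motion trajectory with an orthonormal Haar wavelet and keeps only the largest coefficients. The abstract does not say which wavelet or which coefficient selection the method uses, so the Haar basis, the `keep_largest` rule, and all function names are assumptions for illustration only.

```python
import math


def haar_forward(signal):
    """Multi-level orthonormal Haar transform (length must be a power of 2)."""
    coeffs = list(signal)
    n = len(coeffs)
    s = math.sqrt(2.0)
    while n > 1:
        half = n // 2
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / s for i in range(half)]
        det = [(coeffs[2 * i] - coeffs[2 * i + 1]) / s for i in range(half)]
        coeffs[:n] = avg + det  # averages first, details after
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Invert haar_forward."""
    out = list(coeffs)
    s = math.sqrt(2.0)
    n = 1
    while n < len(out):
        avg, det = out[:n], out[n:2 * n]
        merged = []
        for a, d in zip(avg, det):
            merged += [(a + d) / s, (a - d) / s]
        out[:2 * n] = merged
        n *= 2
    return out


def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients (lossy step)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]),
                   reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]
```

For a smooth trajectory most detail coefficients are small, so `haar_inverse(keep_largest(haar_forward(traj), k))` can approximate `traj` with far fewer stored values than one position per frame.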

2025-07-23T09:05:13Z 24 pages, 10 figures, NeurIPS 2025 Hyeongmin Lee Kyungjune Baek http://arxiv.org/abs/2601.08429v1 Deep Learning Based Facial Retargeting Using Local Patches 2026-01-13T10:56:15Z

In the era of digital animation, the quest to produce lifelike facial animations for virtual characters has led to the development of various retargeting methods. While retargeting facial motion between models of similar shapes has been very successful, challenges arise when the retargeting is performed on stylized or exaggerated 3D characters that deviate significantly from human facial structures. In this scenario, it is important to consider the target character's facial structure and possible range of motion to preserve the semantics assumed by the original facial motions after the retargeting. To achieve this, we propose a local patch-based retargeting method that transfers facial animations captured in a source performance video to a target stylized 3D character. Our method consists of three modules. The Automatic Patch Extraction Module extracts local patches from the source video frame. These patches are processed through the Reenactment Module to generate correspondingly re-enacted target local patches. The Weight Estimation Module calculates the animation parameters for the target character at every frame for the creation of a complete facial animation sequence. Extensive experiments demonstrate that our method can successfully transfer the semantic meaning of source facial expressions to stylized characters with considerable variations in facial feature proportion.

2026-01-13T10:56:15Z Eurographics 25 Computer Graphics Forum 2024 Yeonsoo Choi Inyup Lee Sihun Cha Seonghyeon Kim Sunjin Jung Junyong Noh 10.1111/cgf.15263 http://arxiv.org/abs/2601.08371v1 Geo-NVS-w: Geometry-Aware Novel View Synthesis In-the-Wild with an SDF Renderer 2026-01-13T09:34:01Z

We introduce Geo-NVS-w, a geometry-aware framework for high-fidelity novel view synthesis from unstructured, in-the-wild image collections. While existing in-the-wild methods already excel at novel view synthesis, they often lack geometric grounding on complex surfaces, sometimes producing results that contain inconsistencies. Geo-NVS-w addresses this limitation by leveraging an underlying geometric representation based on a Signed Distance Function (SDF) to guide the rendering process. This is complemented by a novel Geometry-Preservation Loss which ensures that fine structural details are preserved. Our framework achieves competitive rendering performance, while demonstrating a 4-5x reduction in energy consumption compared to similar methods. We demonstrate that Geo-NVS-w is a robust method for in-the-wild NVS, yielding photorealistic results with sharp, geometrically coherent details.

2026-01-13T09:34:01Z Presented at the ICCV 2025 Workshop on Large Scale Cross Device Localization Anastasios Tsalakopoulos Angelos Kanlis Evangelos Chatzis Antonis Karakottas Dimitrios Zarpalas http://arxiv.org/abs/2601.08256v1 Data-Induced Groupings and How To Find Them 2026-01-13T06:28:36Z

Making sense of a visualization requires the reader to consider both the visualization design and the underlying data values. Existing work in the visualization community has largely considered affordances driven by visualization design elements, such as color or chart type, but how visual design interacts with data values to impact interpretation and reasoning has remained under-explored. Dot plots and bar graphs are commonly used to help users identify groups of points that form trends and clusters, but are liable to manifest groupings that are artifacts of spatial arrangement rather than inherent patterns in the data itself. These "Data-induced Groups" can drive suboptimal data comparisons and potentially lead the user to incorrect conclusions. We conduct two user studies using dot plots as a case study to understand the prevalence of data-induced groupings. We find that users rely on data-induced groupings in both conditions despite the fact that trend-based groupings are irrelevant in nominal data. Based on the study results, we build a model to predict whether users are likely to perceive a given set of dot plot points as a group. We discuss two use cases illustrating how the model can assist visualization designers by both diagnosing potential user-perceived groupings in dot plots and offering redesigns that better accentuate desired groupings through data rearrangement.

2026-01-13T06:28:36Z Yilan Jiang Cindy Xiong Bearfield Steven Franconeri Eugene Wu http://arxiv.org/abs/2601.08179v1 Instruction-Driven 3D Facial Expression Generation and Transition 2026-01-13T03:12:48Z

A 3D avatar typically has one of six cardinal facial expressions. To simulate realistic emotional variation, we should be able to render a facial transition between two arbitrary expressions. This study presents a new framework for instruction-driven facial expression generation that produces a 3D face and, starting from an image of the face, transforms the facial expression from one designated facial expression to another. The Instruction-driven Facial Expression Decomposer (IFED) module is introduced to facilitate multimodal data learning and capture the correlation between textual descriptions and facial expression features. Subsequently, we propose the Instruction to Facial Expression Transition (I2FET) method, which leverages IFED and a vertex reconstruction loss function to refine the semantic comprehension of latent vectors, thus generating a facial expression sequence according to the given instruction. Lastly, we present the Facial Expression Transition model to generate smooth transitions between facial expressions. Extensive evaluation suggests that the proposed model outperforms state-of-the-art methods on the CK+ and CelebV-HQ datasets. The results show that our framework can generate facial expression trajectories according to text instruction. Considering that text prompts allow us to make diverse descriptions of human emotional states, the repertoire of facial expressions and the transitions between them can be expanded greatly. We expect our framework to find various practical applications. More information about our project can be found at https://vohoanganh.github.io/tg3dfet/

2026-01-13T03:12:48Z IEEE Transactions on Multimedia, 2025 Anh H. Vo Tae-Seok Kim Hulin Jin Soo-Mi Choi Yong-Guk Kim 10.1109/TMM.2025.3565929 http://arxiv.org/abs/1001.4002v4 Aplicación Gráfica para el estudio de un Modelo de Celda Electrolítica usando Técnicas de Visualización de Campos Vectoriales 2026-01-13T02:50:09Z

The use of floating bipolar electrodes in copper electro-winning cells represents an emerging technology that promises economic and operational impacts. This thesis presents EWCellCAD, a computational tool designed for the simulation and analysis of these electrochemical systems. Based on the generalization and optimization of an existing 2D finite difference model for calculating electrical variables in rectangular cells, EWCellCAD implements a new 3D model capable of processing complex geometries, not necessarily rectangular, which also accelerates calculations by several orders of magnitude. At the same time, a new analytical method for estimating potentials in floating electrodes is introduced, overcoming the inaccuracies of previous heuristic approaches. The analysis of the results is supported by an interactive visualization technique of three-dimensional vector fields as flow lines.

2010-01-22T18:23:27Z BSc Thesis in Electronic Engineering (part of the research project FONDECYT 1970955), Universidad de Concepción, 2000, 105 pages, 22 figures, in Spanish. Related publication: arXiv:1001.3974 [cs.GR]. Metadata-only update: Author name standardized (maternal surname removed; paternal surname as sole last name). Title orthography corrected with TeX accents. Abstract refined César Mena http://arxiv.org/abs/2504.13339v2 Volume Encoding Gaussians: Transfer Function-Agnostic 3D Gaussians for Volume Rendering 2026-01-12T21:43:06Z

Visualizing the large-scale datasets output by HPC resources presents a difficult challenge, as the memory and compute power required become prohibitively expensive for end user systems. Novel view synthesis techniques can address this by producing a small, interactive model of the data, requiring only a set of training images to learn from. While these models allow accessible visualization of large data and complex scenes, they do not provide the interactions needed for scientific volumes, as they do not support interactive selection of transfer functions and lighting parameters. To address this, we introduce Volume Encoding Gaussians (VEG), a 3D Gaussian-based representation for volume visualization that supports arbitrary color and opacity mappings. Unlike prior 3D Gaussian Splatting (3DGS) methods that store color and opacity for each Gaussian, VEG decouples the visual appearance from the data representation by encoding only scalar values, enabling transfer function-agnostic rendering of 3DGS models. To ensure complete scalar field coverage, we introduce an opacity-guided training strategy, using differentiable rendering with multiple transfer functions to optimize our data representation. This allows VEG to preserve fine features across the full scalar range of a dataset while remaining independent of any specific transfer function. Across a diverse set of volume datasets, we demonstrate that our method outperforms the state-of-the-art on transfer functions unseen during training, while requiring a fraction of the memory and training time.
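The decoupling of data from appearance can be illustrated with a small sketch: if each primitive stores only a scalar value, color and opacity are looked up at render time from whatever transfer function the user selects. The piecewise-linear lookup below is a generic illustration, not the paper's renderer; `apply_transfer_function` and the control-point format are hypothetical, and control-point scalars are assumed to be strictly increasing.

```python
def apply_transfer_function(scalar, control_points):
    """Map a scalar to (r, g, b, a) by piecewise-linear interpolation.

    control_points: list of (scalar_value, (r, g, b, a)) pairs sorted by
    strictly increasing scalar_value. Values outside the range are clamped.
    """
    if scalar <= control_points[0][0]:
        return tuple(control_points[0][1])
    if scalar >= control_points[-1][0]:
        return tuple(control_points[-1][1])
    for (s0, c0), (s1, c1) in zip(control_points, control_points[1:]):
        if s0 <= scalar <= s1:
            t = (scalar - s0) / (s1 - s0)
            return tuple(a + t * (b - a) for a, b in zip(c0, c1))


# The same stored scalars can be re-colored without touching the data.
stored_scalars = [0.1, 0.5, 0.9]
tf_red = [(0.0, (0.0, 0.0, 0.0, 0.0)), (1.0, (1.0, 0.0, 0.0, 1.0))]
tf_blue = [(0.0, (0.0, 0.0, 0.0, 0.0)), (1.0, (0.0, 0.0, 1.0, 1.0))]
red_rgba = [apply_transfer_function(s, tf_red) for s in stored_scalars]
blue_rgba = [apply_transfer_function(s, tf_blue) for s in stored_scalars]
```

Swapping `tf_red` for `tf_blue` changes only the lookup, which is why a representation that stores scalars does not have to be retrained for each new transfer function.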

2025-04-17T21:17:54Z Landon Dyken Andres Sewell Will Usher Nathan Debardeleben Steve Petruzza Sidharth Kumar http://arxiv.org/abs/2601.05394v2 Sketch&Patch++: Efficient Structure-Aware 3D Gaussian Representation 2026-01-12T15:16:39Z

We observe that Gaussians exhibit distinct roles and characteristics analogous to traditional artistic techniques -- like how artists first sketch outlines before filling in broader areas with color, some Gaussians capture high-frequency features such as edges and contours, while others represent broader, smoother regions analogous to brush strokes that add volume and depth. Based on this observation, we propose a hybrid representation that categorizes Gaussians into (i) Sketch Gaussians, which represent high-frequency, boundary-defining features, and (ii) Patch Gaussians, which cover low-frequency, smooth regions. This semantic separation naturally enables layered progressive streaming, where the compact Sketch Gaussians establish the structural skeleton before Patch Gaussians incrementally refine volumetric detail. In this work, we extend our previous method to arbitrary 3D scenes by proposing a novel hierarchical adaptive categorization framework that operates directly on the 3DGS representation. Our approach employs multi-criteria density-based clustering, combined with adaptive quality-driven refinement. This method eliminates dependency on external 3D line primitives while ensuring optimal parametric encoding effectiveness. Our comprehensive evaluation across diverse scenes, including both man-made and natural environments, demonstrates that our method achieves up to 1.74 dB improvement in PSNR, 6.7% in SSIM, and 41.4% in LPIPS at equivalent model sizes compared to uniform pruning baselines. For indoor scenes, our method can maintain visual quality with only 0.5% of the original model size. This structure-aware representation enables efficient storage, adaptive streaming, and rendering of high-fidelity 3D content across bandwidth-constrained networks and resource-limited devices.

2026-01-08T21:32:54Z Yuang Shi Géraldine Morin Simone Gasparini Wei Tsang Ooi