https://arxiv.org/api/1UE/k5TYwApMA592s0g5O1rnM4w2026-06-22T16:29:30Z9354102015http://arxiv.org/abs/2601.08371v1Geo-NVS-w: Geometry-Aware Novel View Synthesis In-the-Wild with an SDF Renderer2026-01-13T09:34:01ZWe introduce Geo-NVS-w, a geometry-aware framework for high-fidelity novel view synthesis from unstructured, in-the-wild image collections. While existing in-the-wild methods already excel at novel view synthesis, they often lack geometric grounding on complex surfaces, sometimes producing results that contain inconsistencies. Geo-NVS-w addresses this limitation by leveraging an underlying geometric representation based on a Signed Distance Function (SDF) to guide the rendering process. This is complemented by a novel Geometry-Preservation Loss which ensures that fine structural details are preserved. Our framework achieves competitive rendering performance, while demonstrating a 4-5x reduction in energy consumption compared to similar methods. We demonstrate that Geo-NVS-w is a robust method for in-the-wild NVS, yielding photorealistic results with sharp, geometrically coherent details.2026-01-13T09:34:01ZPresented at the ICCV 2025 Workshop on Large Scale Cross Device LocalizationAnastasios TsalakopoulosAngelos KanlisEvangelos ChatzisAntonis KarakottasDimitrios Zarpalashttp://arxiv.org/abs/2601.08256v1Data-Induced Groupings and How To Find Them2026-01-13T06:28:36ZMaking sense of a visualization requires the reader to consider both the visualization design and the underlying data values. Existing work in the visualization community has largely considered affordances driven by visualization design elements, such as color or chart type, but how visual design interacts with data values to impact interpretation and reasoning has remained under-explored. 
Dot plots and bar graphs are commonly used to help users identify groups of points that form trends and clusters, but are liable to manifest groupings that are artifacts of spatial arrangement rather than inherent patterns in the data itself. These ``Data-induced Groups'' can drive suboptimal data comparisons and potentially lead the user to incorrect conclusions. We conduct two user studies using dot plots as a case study to understand the prevalence of data-induced groupings. We find that users rely on data-induced groupings in both conditions despite the fact that trend-based groupings are irrelevant in nominal data. Based on the study results, we build a model to predict whether users are likely to perceive a given set of dot plot points as a group. We discuss two use cases illustrating how the model can assist visualization designers by both diagnosing potential user-perceived groupings in dot plots and offering redesigns that better accentuate desired groupings through data rearrangement.2026-01-13T06:28:36ZYilan JiangCindy Xiong BearfieldSteven FranconeriEugene Wuhttp://arxiv.org/abs/2601.08179v1Instruction-Driven 3D Facial Expression Generation and Transition2026-01-13T03:12:48ZA 3D avatar typically has one of six cardinal facial expressions. To simulate realistic emotional variation, we should be able to render a facial transition between two arbitrary expressions. This study presents a new framework for instruction-driven facial expression generation that produces a 3D face and, starting from an image of the face, transforms the facial expression from one designated facial expression to another. The Instruction-driven Facial Expression Decomposer (IFED) module is introduced to facilitate multimodal data learning and capture the correlation between textual descriptions and facial expression features. 
Subsequently, we propose the Instruction to Facial Expression Transition (I2FET) method, which leverages IFED and a vertex reconstruction loss function to refine the semantic comprehension of latent vectors, thus generating a facial expression sequence according to the given instruction. Lastly, we present the Facial Expression Transition model to generate smooth transitions between facial expressions. Extensive evaluation suggests that the proposed model outperforms state-of-the-art methods on the CK+ and CelebV-HQ datasets. The results show that our framework can generate facial expression trajectories according to text instruction. Considering that text prompts allow us to make diverse descriptions of human emotional states, the repertoire of facial expressions and the transitions between them can be expanded greatly. We expect our framework to find various practical applications. More information about our project can be found at https://vohoanganh.github.io/tg3dfet/2026-01-13T03:12:48ZIEEE Transactions on Multimedia, 2025Anh H. VoTae-Seok KimHulin JinSoo-Mi ChoiYong-Guk Kim10.1109/TMM.2025.3565929http://arxiv.org/abs/1001.4002v4Aplicación Gráfica para el estudio de un Modelo de Celda Electrolítica usando Técnicas de Visualización de Campos Vectoriales2026-01-13T02:50:09ZThe use of floating bipolar electrodes in copper electro-winning cells represents an emerging technology that promises economic and operational impacts. This thesis presents EWCellCAD, a computational tool designed for the simulation and analysis of these electrochemical systems. Based on the generalization and optimization of an existing 2D finite difference model for calculating electrical variables in rectangular cells, EWCellCAD implements a new 3D model capable of processing complex geometries, not necessarily rectangular, which also accelerates calculations by several orders of magnitude. 
At the same time, a new analytical method for estimating potentials in floating electrodes is introduced, overcoming the inaccuracies of previous heuristic approaches. The analysis of the results is supported by an interactive visualization technique of three-dimensional vector fields as flow lines.2010-01-22T18:23:27ZBSc Thesis in Electronic Engineering (part of the research project FONDECYT 1970955), Universidad de Concepción, 2000, 105 pages, 22 figures, in Spanish. Related publication: arXiv:1001.3974 [cs.GR]. Metadata-only update: Author name standardized (maternal surname removed; paternal surname as sole last name). Title orthography corrected with TeX accents. Abstract refinedCésar Menahttp://arxiv.org/abs/2504.13339v2Volume Encoding Gaussians: Transfer Function-Agnostic 3D Gaussians for Volume Rendering2026-01-12T21:43:06ZVisualizing the large-scale datasets output by HPC resources presents a difficult challenge, as the memory and compute power required become prohibitively expensive for end user systems. Novel view synthesis techniques can address this by producing a small, interactive model of the data, requiring only a set of training images to learn from. While these models allow accessible visualization of large data and complex scenes, they do not provide the interactions needed for scientific volumes, as they do not support interactive selection of transfer functions and lighting parameters. To address this, we introduce Volume Encoding Gaussians (VEG), a 3D Gaussian-based representation for volume visualization that supports arbitrary color and opacity mappings. Unlike prior 3D Gaussian Splatting (3DGS) methods that store color and opacity for each Gaussian, VEG decouples the visual appearance from the data representation by encoding only scalar values, enabling transfer function-agnostic rendering of 3DGS models. 
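As an illustration of what "transfer function-agnostic" means in the VEG abstract above, the sketch below stores only scalar values and assigns color and opacity at render time. It is a minimal NumPy example under stated assumptions, not the VEG implementation: the function name `apply_transfer_function`, the control points, and the two example transfer functions are all hypothetical.

```python
import numpy as np

def apply_transfer_function(scalars, control_points, rgba_values):
    """Map normalized scalar samples to RGBA by piecewise-linear interpolation."""
    scalars = np.asarray(scalars, dtype=float)
    rgba_values = np.asarray(rgba_values, dtype=float)
    # Interpolate each of the four channels (R, G, B, A) independently.
    channels = [
        np.interp(scalars, control_points, rgba_values[:, c]) for c in range(4)
    ]
    return np.stack(channels, axis=-1)

# Hypothetical per-primitive scalar values in [0, 1]; no color is stored with them.
scalars = np.array([0.0, 0.25, 0.5, 1.0])

# Two illustrative transfer functions defined over the same control points.
points = [0.0, 0.5, 1.0]
tf_warm = [[0, 0, 0, 0.0], [1, 0.5, 0, 0.5], [1, 1, 0, 1.0]]
tf_cool = [[0, 0, 0, 0.0], [0, 0.5, 1, 0.2], [0, 1, 1, 1.0]]

# The same scalar data yields different appearances without re-encoding.
rgba_warm = apply_transfer_function(scalars, points, tf_warm)
rgba_cool = apply_transfer_function(scalars, points, tf_cool)
```

Because the appearance is computed from the scalar at lookup time, swapping `tf_warm` for `tf_cool` changes the rendering without touching the stored data, which is the property the abstract attributes to encoding only scalar values.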
To ensure complete scalar field coverage, we introduce an opacity-guided training strategy, using differentiable rendering with multiple transfer functions to optimize our data representation. This allows VEG to preserve fine features across the full scalar range of a dataset while remaining independent of any specific transfer function. Across a diverse set of volume datasets, we demonstrate that our method outperforms the state-of-the-art on transfer functions unseen during training, while requiring a fraction of the memory and training time.2025-04-17T21:17:54ZLandon DykenAndres SewellWill UsherNathan DebardelebenSteve PetruzzaSidharth Kumarhttp://arxiv.org/abs/2601.05394v2Sketch&Patch++: Efficient Structure-Aware 3D Gaussian Representation2026-01-12T15:16:39ZWe observe that Gaussians exhibit distinct roles and characteristics analogous to traditional artistic techniques -- like how artists first sketch outlines before filling in broader areas with color, some Gaussians capture high-frequency features such as edges and contours, while others represent broader, smoother regions analogous to brush strokes that add volume and depth. Based on this observation, we propose a hybrid representation that categorizes Gaussians into (i) Sketch Gaussians, which represent high-frequency, boundary-defining features, and (ii) Patch Gaussians, which cover low-frequency, smooth regions. This semantic separation naturally enables layered progressive streaming, where the compact Sketch Gaussians establish the structural skeleton before Patch Gaussians incrementally refine volumetric detail.
In this work, we extend our previous method to arbitrary 3D scenes by proposing a novel hierarchical adaptive categorization framework that operates directly on the 3DGS representation. Our approach employs multi-criteria density-based clustering, combined with adaptive quality-driven refinement. This method eliminates dependency on external 3D line primitives while ensuring optimal parametric encoding effectiveness. Our comprehensive evaluation across diverse scenes, including both man-made and natural environments, demonstrates that our method achieves up to 1.74 dB improvement in PSNR, 6.7% in SSIM, and 41.4% in LPIPS at equivalent model sizes compared to uniform pruning baselines. For indoor scenes, our method can maintain visual quality with only 0.5\% of the original model size. This structure-aware representation enables efficient storage, adaptive streaming, and rendering of high-fidelity 3D content across bandwidth-constrained networks and resource-limited devices.2026-01-08T21:32:54ZYuang ShiGéraldine MorinSimone GaspariniWei Tsang Ooihttp://arxiv.org/abs/2601.07484v1R3-RECON: Radiance-Field-Free Active Reconstruction via Renderability2026-01-12T12:37:26ZIn active reconstruction, an embodied agent must decide where to look next to efficiently acquire views that support high-quality novel-view rendering. Recent work on active view planning for neural rendering largely derives next-best-view (NBV) criteria by backpropagating through radiance fields or estimating information entropy over 3D Gaussian primitives. While effective, these strategies tightly couple view selection to heavy, representation-specific mechanisms and fail to account for the computational and resource constraints required for lightweight online deployment. In this paper, we revisit active reconstruction from a renderability-centric perspective. 
We propose $\mathbb{R}^{3}$-RECON, a radiance-fields-free active reconstruction framework that induces an implicit, pose-conditioned renderability field over SE(3) from a lightweight voxel map. Our formulation aggregates per-voxel online observation statistics into a unified scalar renderability score that is cheap to update and can be queried in closed form at arbitrary candidate viewpoints in milliseconds, without requiring gradients or radiance-field training. This renderability field is strongly correlated with image-space reconstruction error, naturally guiding NBV selection. We further introduce a panoramic extension that estimates omnidirectional (360$^\circ$) view utility to accelerate candidate evaluation. In the standard indoor Replica dataset, $\mathbb{R}^{3}$-RECON achieves more uniform novel-view quality and higher 3D Gaussian splatting (3DGS) reconstruction accuracy than recent active GS baselines with matched view and time budgets.2026-01-12T12:37:26Z18 pages, 11 figuresXiaofeng JinMatteo FrosiYiran GuoMatteo Matteuccihttp://arxiv.org/abs/2601.08234v1Statistical Blendshape Calculation and Analysis for Graphics Applications2026-01-12T05:12:36ZWith the development of virtualization and AI, real-time facial avatar animation is widely used in entertainment, office, business and other fields. Against this background, blendshapes have become a common industry animation solution because of their relative simplicity and ease of interpretation. Aiming for real-time performance and low computing resource dependence, we independently developed an accurate blendshape prediction system for low-power VR applications using a standard webcam. First, blendshape feature vectors are extracted through affine transformation and segmentation. Through further transformation and regression analysis, we were able to identify models for most blendshapes with significant predictive power. 
Post-processing was used to further improve response stability, including smoothing filtering and nonlinear transformations to minimize error. Experiments showed the system achieved accuracy similar to ARKit 6. Our model has low sensor/hardware requirements and realtime response with a consistent, accurate and smooth visual experience.2026-01-12T05:12:36Z12 figuresShuxian LiTianyue WangChris Twomblyhttp://arxiv.org/abs/2510.03434v2Paris: A Decentralized Trained Open-Weight Diffusion Model2026-01-12T03:17:26ZWe present Paris, the first publicly released diffusion model pre-trained entirely through decentralized computation. Paris demonstrates that high-quality text-to-image generation can be achieved without centrally coordinated infrastructure. Paris is open for research and commercial use. Paris required implementing our Distributed Diffusion Training framework from scratch. The model consists of 8 expert diffusion models (129M-605M parameters each) trained in complete isolation with no gradient, parameter, or intermediate activation synchronization. Rather than requiring synchronized gradient updates across thousands of GPUs, we partition data into semantically coherent clusters where each expert independently optimizes its subset while collectively approximating the full distribution. A lightweight transformer router dynamically selects appropriate experts at inference, achieving generation quality comparable to centrally coordinated baselines. Eliminating synchronization enables training on heterogeneous hardware without specialized interconnects. Empirical validation confirms that Paris's decentralized training maintains generation quality while removing the dedicated GPU cluster requirement for large-scale diffusion models. 
Paris achieves this using 14$\times$ less training data and 16$\times$ less compute than the prior decentralized baseline.2025-10-03T18:53:12ZZhiying JiangRaihan SerajMarcos VillagraBidhan Royhttp://arxiv.org/abs/2601.04382v2Radiant Foam Rendering on a Graph Processor2026-01-11T16:43:29ZMany emerging many-core accelerators replace a single large device memory with hundreds to thousands of lightweight cores, each owning only a small local SRAM and exchanging data via explicit on-chip communication. This organization offers high aggregate bandwidth, but it breaks a key assumption behind many volumetric rendering techniques: that rays can randomly access a large, unified scene representation. Rendering efficiently on such hardware therefore requires distributing both data and computation, keeping ray traversal mostly local, and structuring communication into predictable routes.
We present a fully in-SRAM, distributed renderer for the Radiant Foam Voronoi-cell volumetric representation on the Graphcore Mk2 IPU (Intelligence Processing Unit), a many-core accelerator with tile-local SRAM and explicit inter-tile communication. Our system shards the scene across tiles and forwards rays between shards through a hierarchical routing overlay, enabling ray marching entirely from on-chip SRAM with predictable communication. On Mip-NeRF 360 scenes, the system attains near-interactive throughput of approximately 1 fps at 640x480 with image and depth map quality close to the original GPU-based Radiant Foam implementation, while keeping all scene data and ray state in on-chip SRAM. Beyond demonstrating feasibility, we analyze routing, memory, and scheduling bottlenecks that inform how future distributed-memory accelerators can better support irregular, data-movement-heavy rendering workloads.2026-01-07T20:44:04Z24 pages, 26 figuresZulkhuu TuyaIgnacio AlzugarayNicholas FryAndrew J. Davisonhttp://arxiv.org/abs/2601.06980v1A New Perspective on Drawing Venn Diagrams for Data Visualization2026-01-11T16:31:41ZWe introduce VennFan, a method for generating $n$-set Venn diagrams based on the polar coordinate projection of trigonometric boundaries, resulting in Venn diagrams that resemble a set of fan blades. Unlike most classical constructions, our method emphasizes readability and customizability by using shaped sinusoids and amplitude scaling. We describe both sine- and cosine-based variants of VennFan and propose an automatic label placement heuristic tailored to these fan-like layouts. 
VennFan is available as a Python package (https://pypi.org/project/vennfan/).2026-01-11T16:31:41Z15 pages, 19 figuresBálint Csanádyhttp://arxiv.org/abs/2601.07870v1HOSC: A Periodic Activation with Saturation Control for High-Fidelity Implicit Neural Representations2026-01-10T22:24:28ZPeriodic activations such as sine preserve high-frequency information in implicit neural representations (INRs) through their oscillatory structure, but often suffer from gradient instability and limited control over multi-scale behavior. We introduce the Hyperbolic Oscillator with Saturation Control (HOSC) activation, $\text{HOSC}(x) = \tanh\bigl(β\sin(ω_0 x)\bigr)$, which exposes an explicit parameter $β$ that controls the Lipschitz bound of the activation by $βω_0$. This provides a direct mechanism to tune gradient magnitudes while retaining a periodic carrier. We provide a mathematical analysis and conduct a comprehensive empirical study across images, audio, video, NeRFs, and SDFs using standardized training protocols. Comparative analysis against SIREN, FINER, and related methods shows where HOSC provides substantial benefits and where it achieves competitive parity. Results establish HOSC as a practical periodic activation for INR applications, with domain-specific guidance on hyperparameter selection. For code visit the project page https://hosc-nn.github.io/ .2026-01-10T22:24:28Z16 pages including appendices, 12 figures, 15 tablesMichal Jan WlodarczykDanzel SerranoPrzemyslaw Musialskihttp://arxiv.org/abs/2601.11617v1PointSLAM++: Robust Dense Neural Gaussian Point Cloud-based SLAM2026-01-10T04:12:13ZReal-time 3D reconstruction is crucial for robotics and augmented reality, yet current simultaneous localization and mapping (SLAM) approaches often struggle to maintain structural consistency and robust pose estimation in the presence of depth noise. 
This work introduces PointSLAM++, a novel RGB-D SLAM system that leverages a hierarchically constrained neural Gaussian representation to preserve structural relationships while generating Gaussian primitives for scene mapping. It also employs progressive pose optimization to mitigate depth sensor noise, significantly enhancing localization accuracy. Furthermore, it utilizes a dynamic neural representation graph that adjusts the distribution of Gaussian nodes based on local geometric complexity, enabling the map to adapt to intricate scene details in real time. This combination yields high-precision 3D mapping and photorealistic scene rendering. Experimental results show PointSLAM++ outperforms existing 3DGS-based SLAM methods in reconstruction accuracy and rendering quality, demonstrating its advantages for large-scale AR and robotics.2026-01-10T04:12:13ZXu WangBoyao HanXiaojun ChenYing LiuRuihui Lihttp://arxiv.org/abs/2601.06378v1RigMo: Unifying Rig and Motion Learning for Generative Animation2026-01-10T01:26:28ZDespite significant progress in 4D generation, rig and motion, the core structural and dynamic components of animation, are typically modeled as separate problems. Existing pipelines rely on ground-truth skeletons and skinning weights for motion generation and treat auto-rigging as an independent process, undermining scalability and interpretability. We present RigMo, a unified generative framework that jointly learns rig and motion directly from raw mesh sequences, without any human-provided rig annotations. RigMo encodes per-vertex deformations into two compact latent spaces: a rig latent that decodes into explicit Gaussian bones and skinning weights, and a motion latent that produces time-varying SE(3) transformations. Together, these outputs define an animatable mesh with explicit structure and coherent motion, enabling feed-forward rig and motion inference for deformable objects. 
Beyond unified rig-motion discovery, we introduce a Motion-DiT model operating in RigMo's latent space and demonstrate that these structure-aware latents can naturally support downstream motion generation tasks. Experiments on DeformingThings4D, Objaverse-XL, and TrueBones demonstrate that RigMo learns smooth, interpretable, and physically plausible rigs, while achieving superior reconstruction and category-level generalization compared to existing auto-rigging and deformation baselines. RigMo establishes a new paradigm for unified, structure-aware, and scalable dynamic 3D modeling.2026-01-10T01:26:28ZProject Page: https://RigMo-Page.github.ioHao ZhangJiahao LuoBohui WanYizhou ZhaoZongrui LiMichael VasilkovskyChaoyang WangJian WangNarendra AhujaBing Zhouhttp://arxiv.org/abs/2601.06239v1A survey of facial recognition techniques2026-01-09T16:24:44ZAs multimedia content grows quickly, facial recognition has become one of the major research fields, particularly in recent years. The most problematic area for researchers in image processing and computer vision is the human face, a complex object with myriads of distinctive features that can be used to identify it. This survey focuses on the most challenging facial characteristics, including differences in lighting, ageing, variation in pose, partial occlusion, and facial expression, and presents methodological solutions. These factors are therefore unavoidable when building effective recognition mechanisms for facial images. This paper reviews the most sophisticated methods of facial detection, which are Hidden Markov Models, Principal Component Analysis (PCA), Elastic Cluster Plot Matching, Support Vector Machine (SVM), Gabor Waves, Artificial Neural Networks (ANN), Eigenfaces, Independent Component Analysis (ICA), and 3D Morphable Model. 
Alongside the works mentioned above, we have also analyzed images from a number of facial databases, namely JAFFE, FEI, Yale, LFW, AT&T (formerly called ORL), and AR (created by Martinez and Benavente), to analyze the results. This survey aims to give a thorough literature review of face recognition and its applications, and some experimental results are provided at the end after a detailed discussion.2026-01-09T16:24:44Z12 pages, 12 figures, articleInternational Journal of Communication and Information Technology 2025; 6(2): 214-225Aya Kaysan Bahjat10.33545/2707661X.2025.v6.i2c.167