https://arxiv.org/api/RhiILocnixg1Pr17AfWs638bEn42026-06-23T20:21:44Z9374105015http://arxiv.org/abs/2601.07484v1R3-RECON: Radiance-Field-Free Active Reconstruction via Renderability2026-01-12T12:37:26ZIn active reconstruction, an embodied agent must decide where to look next to efficiently acquire views that support high-quality novel-view rendering. Recent work on active view planning for neural rendering largely derives next-best-view (NBV) criteria by backpropagating through radiance fields or estimating information entropy over 3D Gaussian primitives. While effective, these strategies tightly couple view selection to heavy, representation-specific mechanisms and fail to account for the computational and resource constraints required for lightweight online deployment. In this paper, we revisit active reconstruction from a renderability-centric perspective. We propose $\mathbb{R}^{3}$-RECON, a radiance-fields-free active reconstruction framework that induces an implicit, pose-conditioned renderability field over SE(3) from a lightweight voxel map. Our formulation aggregates per-voxel online observation statistics into a unified scalar renderability score that is cheap to update and can be queried in closed form at arbitrary candidate viewpoints in milliseconds, without requiring gradients or radiance-field training. This renderability field is strongly correlated with image-space reconstruction error, naturally guiding NBV selection. We further introduce a panoramic extension that estimates omnidirectional (360$^\circ$) view utility to accelerate candidate evaluation. In the standard indoor Replica dataset, $\mathbb{R}^{3}$-RECON achieves more uniform novel-view quality and higher 3D Gaussian splatting (3DGS) reconstruction accuracy than recent active GS baselines with matched view and time budgets.2026-01-12T12:37:26Z18 pages, 11 figuresXiaofeng JinMatteo FrosiYiran GuoMatteo Matteuccihttp://arxiv.org/abs/2601.08234v1Statistical Blendshape Calculation and Analysis for Graphics Applications2026-01-12T05:12:36ZWith the development of virtualization and AI, real-time facial avatar animation is widely used in entertainment, office, business and other fields. Against this background, blendshapes have become a common industry animation solution because of their relative simplicity and ease of interpretation. Aiming for real-time performance and low computing resource dependence, we independently developed an accurate blendshape prediction system for low-power VR applications using a standard webcam. First, blendshape feature vectors are extracted through affine transformation and segmentation. Through further transformation and regression analysis, we were able to identify models for most blendshapes with significant predictive power. Post-processing was used to further improve response stability, including smoothing filtering and nonlinear transformations to minimize error. Experiments showed the system achieved accuracy similar to ARKit 6. Our model has low sensor/hardware requirements and realtime response with a consistent, accurate and smooth visual experience.2026-01-12T05:12:36Z12 figuresShuxian LiTianyue WangChris Twomblyhttp://arxiv.org/abs/2510.03434v2Paris: A Decentralized Trained Open-Weight Diffusion Model2026-01-12T03:17:26ZWe present Paris, the first publicly released diffusion model pre-trained entirely through decentralized computation. Paris demonstrates that high-quality text-to-image generation can be achieved without centrally coordinated infrastructure. Paris is open for research and commercial use. Paris required implementing our Distributed Diffusion Training framework from scratch. The model consists of 8 expert diffusion models (129M-605M parameters each) trained in complete isolation with no gradient, parameter, or intermediate activation synchronization. Rather than requiring synchronized gradient updates across thousands of GPUs, we partition data into semantically coherent clusters where each expert independently optimizes its subset while collectively approximating the full distribution. A lightweight transformer router dynamically selects appropriate experts at inference, achieving generation quality comparable to centrally coordinated baselines. Eliminating synchronization enables training on heterogeneous hardware without specialized interconnects. Empirical validation confirms that Paris's decentralized training maintains generation quality while removing the dedicated GPU cluster requirement for large-scale diffusion models. Paris achieves this using 14$\times$ less training data and 16$\times$ less compute than the prior decentralized baseline.2025-10-03T18:53:12ZZhiying JiangRaihan SerajMarcos VillagraBidhan Royhttp://arxiv.org/abs/2601.04382v2Radiant Foam Rendering on a Graph Processor2026-01-11T16:43:29ZMany emerging many-core accelerators replace a single large device memory with hundreds to thousands of lightweight cores, each owning only a small local SRAM and exchanging data via explicit on-chip communication. This organization offers high aggregate bandwidth, but it breaks a key assumption behind many volumetric rendering techniques: that rays can randomly access a large, unified scene representation. Rendering efficiently on such hardware therefore requires distributing both data and computation, keeping ray traversal mostly local, and structuring communication into predictable routes.
We present a fully in-SRAM, distributed renderer for the Radiant Foam Voronoi-cell volumetric representation on the Graphcore Mk2 IPU(Intelligence Processing Unit), a many-core accelerator with tile-local SRAM and explicit inter-tile communication. Our system shards the scene across tiles and forwards rays between shards through a hierarchical routing overlay, enabling ray marching entirely from on-chip SRAM with predictable communication. On Mip-NeRF~360 scenes, the system attains near-interactive throughput of approximately 1 fps at 640x480 with image and depth map quality close to the original GPU-based Radiant Foam implementation, while keeping all scene data and ray state in on-chip SRAM. Beyond demonstrating feasibility, we analyze routing, memory, and scheduling bottlenecks that inform how future distributed-memory accelerators can better support irregular, data-movement-heavy rendering workloads.2026-01-07T20:44:04Z24 pages, 26 figuresZulkhuu TuyaIgnacio AlzugarayNicholas FryAndrew J. Davisonhttp://arxiv.org/abs/2601.06980v1A New Perspective on Drawing Venn Diagrams for Data Visualization2026-01-11T16:31:41ZWe introduce VennFan, a method for generating $n$-set Venn diagrams based on the polar coordinate projection of trigonometric boundaries, resulting in Venn diagrams that resemble a set of fan blades. Unlike most classical constructions, our method emphasizes readability and customizability by using shaped sinusoids and amplitude scaling. We describe both sine- and cosine-based variants of VennFan and propose an automatic label placement heuristic tailored to these fan-like layouts. VennFan is available as a Python package (https://pypi.org/project/vennfan/).2026-01-11T16:31:41Z15 pages, 19 figuresBálint Csanádyhttp://arxiv.org/abs/2601.07870v1HOSC: A Periodic Activation with Saturation Control for High-Fidelity Implicit Neural Representations2026-01-10T22:24:28ZPeriodic activations such as sine preserve high-frequency information in implicit neural representations (INRs) through their oscillatory structure, but often suffer from gradient instability and limited control over multi-scale behavior. We introduce the Hyperbolic Oscillator with Saturation Control (HOSC) activation, $\text{HOSC}(x) = \tanh\bigl(β\sin(ω_0 x)\bigr)$, which exposes an explicit parameter $β$ that controls the Lipschitz bound of the activation by $βω_0$. This provides a direct mechanism to tune gradient magnitudes while retaining a periodic carrier. We provide a mathematical analysis and conduct a comprehensive empirical study across images, audio, video, NeRFs, and SDFs using standardized training protocols. Comparative analysis against SIREN, FINER, and related methods shows where HOSC provides substantial benefits and where it achieves competitive parity. Results establish HOSC as a practical periodic activation for INR applications, with domain-specific guidance on hyperparameter selection. For code visit the project page https://hosc-nn.github.io/ .2026-01-10T22:24:28Z16 pages including appendices, 12 figures, 15 tablesMichal Jan WlodarczykDanzel SerranoPrzemyslaw Musialskihttp://arxiv.org/abs/2601.11617v1PointSLAM++: Robust Dense Neural Gaussian Point Cloud-based SLAM2026-01-10T04:12:13ZReal-time 3D reconstruction is crucial for robotics and augmented reality, yet current simultaneous localization and mapping(SLAM) approaches often struggle to maintain structural consistency and robust pose estimation in the presence of depth noise. This work introduces PointSLAM++, a novel RGB-D SLAM system that leverages a hierarchically constrained neural Gaussian representation to preserve structural relationships while generating Gaussian primitives for scene mapping. It also employs progressive pose optimization to mitigate depth sensor noise, significantly enhancing localization accuracy. Furthermore, it utilizes a dynamic neural representation graph that adjusts the distribution of Gaussian nodes based on local geometric complexity, enabling the map to adapt to intricate scene details in real time. This combination yields high-precision 3D mapping and photorealistic scene rendering. Experimental results show PointSLAM++ outperforms existing 3DGS-based SLAM methods in reconstruction accuracy and rendering quality, demonstrating its advantages for large-scale AR and robotics.2026-01-10T04:12:13ZXu WangBoyao HanXiaojun ChenYing LiuRuihui Lihttp://arxiv.org/abs/2601.06378v1RigMo: Unifying Rig and Motion Learning for Generative Animation2026-01-10T01:26:28ZDespite significant progress in 4D generation, rig and motion, the core structural and dynamic components of animation are typically modeled as separate problems. Existing pipelines rely on ground-truth skeletons and skinning weights for motion generation and treat auto-rigging as an independent process, undermining scalability and interpretability. We present RigMo, a unified generative framework that jointly learns rig and motion directly from raw mesh sequences, without any human-provided rig annotations. RigMo encodes per-vertex deformations into two compact latent spaces: a rig latent that decodes into explicit Gaussian bones and skinning weights, and a motion latent that produces time-varying SE(3) transformations. Together, these outputs define an animatable mesh with explicit structure and coherent motion, enabling feed-forward rig and motion inference for deformable objects. Beyond unified rig-motion discovery, we introduce a Motion-DiT model operating in RigMo's latent space and demonstrate that these structure-aware latents can naturally support downstream motion generation tasks. Experiments on DeformingThings4D, Objaverse-XL, and TrueBones demonstrate that RigMo learns smooth, interpretable, and physically plausible rigs, while achieving superior reconstruction and category-level generalization compared to existing auto-rigging and deformation baselines. RigMo establishes a new paradigm for unified, structure-aware, and scalable dynamic 3D modeling.2026-01-10T01:26:28ZProject Page: https://RigMo-Page.github.ioHao ZhangJiahao LuoBohui WanYizhou ZhaoZongrui LiMichael VasilkovskyChaoyang WangJian WangNarendra AhujaBing Zhouhttp://arxiv.org/abs/2601.06239v1A survey of facial recognition techniques2026-01-09T16:24:44ZAs multimedia content is quickly growing, the field of facial recognition has become one of the major research fields, particularly in the recent years. The most problematic area to researchers in image processing and computer vision is the human face which is a complex object with myriads of distinctive features that can be used to identify the face. The survey of this survey is particularly focused on most challenging facial characteristics, including differences in the light, ageing, variation in poses, partial occlusion, and facial expression and presents methodological solutions. The factors, therefore, are inevitable in the creation of effective facial recognition mechanisms used on facial images. This paper reviews the most sophisticated methods of facial detection which are Hidden Markov Models, Principal Component Analysis (PCA), Elastic Cluster Plot Matching, Support Vector Machine (SVM), Gabor Waves, Artificial Neural Networks (ANN), Eigenfaces, Independent Component Analysis (ICA), and 3D Morphable Model. Alongside the works mentioned above, we have also analyzed the images of a number of facial databases, namely JAFEE, FEI, Yale, LFW, AT&T (then called ORL), and AR (created by Martinez and Benavente), to analyze the results. However, this survey is aimed at giving a thorough literature review of face recognition, and its applications, and some experimental results are provided at the end after a detailed discussion.2026-01-09T16:24:44Z12 pages, 12 figures, articleInternational Journal of Communication and Information Technology 2025; 6(2): 214-225Aya Kaysan Bahjat10.33545/2707661X.2025.v6.i2c.167http://arxiv.org/abs/2601.05853v1LayerGS: Decomposition and Inpainting of Layered 3D Human Avatars via 2D Gaussian Splatting2026-01-09T15:30:12ZWe propose a novel framework for decomposing arbitrarily posed humans into animatable multi-layered 3D human avatars, separating the body and garments. Conventional single-layer reconstruction methods lock clothing to one identity, while prior multi-layer approaches struggle with occluded regions. We overcome both limitations by encoding each layer as a set of 2D Gaussians for accurate geometry and photorealistic rendering, and inpainting hidden regions with a pretrained 2D diffusion model via score-distillation sampling (SDS). Our three-stage training strategy first reconstructs the coarse canonical garment via single-layer reconstruction, followed by multi-layer training to jointly recover the inner-layer body and outer-layer garment details. Experiments on two 3D human benchmark datasets (4D-Dress, Thuman2.0) show that our approach achieves better rendering quality and layer decomposition and recomposition than the previous state-of-the-art, enabling realistic virtual try-on under novel viewpoints and poses, and advancing practical creation of high-fidelity 3D human assets for immersive applications. Our code is available at https://github.com/RockyXu66/LayerGS2026-01-09T15:30:12ZYinghan XuJohn Dinglianahttp://arxiv.org/abs/2601.04494v2Differential Locally Injective Grid Deformation and Optimization2026-01-09T07:16:16ZGrids are a general representation for capturing regularly-spaced information, but since they are uniform in space, they cannot dynamically allocate resolution to regions with varying levels of detail. There has been some exploration of indirect grid adaptivity by replacing uniform grids with tetrahedral meshes or locally subdivided grids, as inversion-free deformation of grids is difficult. This work develops an inversion-free grid deformation method that optimizes differential weight to adaptively compress space. The method is the first to optimize grid vertices as differential elements using vertex-colorings, decomposing a dense input linear system into many independent sets of vertices which can be optimized concurrently. This method is then also extended to optimize UV meshes with convex boundaries. Experimentally, this differential representation leads to a smoother optimization manifold than updating extrinsic vertex coordinates. By optimizing each sets of vertices in a coloring separately, local injectivity checks are straightforward since the valid region for each vertex is fixed. This enables the use of optimizers such as Adam, as each vertex can be optimized independently of other vertices. We demonstrate the generality and efficacy of this approach through applications in isosurface extraction for inverse rendering, image compaction, and mesh parameterization.2026-01-08T01:58:20ZJulian KnodtSeung-Hwan Baekhttp://arxiv.org/abs/2506.23364v2Data-Driven Compute Overlays for Interactive Geographic Simulation and Visualization2026-01-08T20:02:47ZWe present interactive data-driven compute overlays for native and web-based 3D geographic map applications based on WebGPU. Our data-driven overlays are generated in a multi-step compute workflow from multiple data sources on the GPU. We demonstrate their potential by showing results from snow cover and avalanche simulations, where simulation parameters can be adjusted interactively and results are visualized instantly. Benchmarks show that our approach can compute large-scale avalanche simulations in milliseconds to seconds, depending on the size of the terrain and the simulation parameters, which is multiple orders of magnitude faster than a state-of-the-art Python implementation.2025-06-29T18:52:19ZPatrick KomonGerald KimmersdorferAdam CelarekManuela Waldner10.1109/VIS60296.2025.00043http://arxiv.org/abs/2601.05181v1Spacecube: A fast inverse hyperspectral georectification system2026-01-08T18:04:09ZHyperspectral cameras provide numerous advantages in terms of the utility of the data captured. They capture hundreds of data points per sample (pixel) instead of only the few of RGB or multispectral camera systems. Aerial systems sense such data remotely, but the data must be georectified to produce consistent images before analysis. We find the traditional direct georectification method to be slow, and it is prone to artifacts. To address its downsides, we propose Spacecube, a program that implements a complete hyperspectral georectification pipeline, including our own fast inverse georectification technique, using OpenGL graphics programming technologies. Spacecube operates substantially faster than real-time and eliminates pixel coverage artifacts. It facilitates high quality interactive viewing, data exploration, and export of final products. We release Spacecube's source code publicly for the community to use.2026-01-08T18:04:09Z9 pages, 16 figures. source code available after peer-reviewed publicationThomas P. WatsonEddie L. Jacobshttp://arxiv.org/abs/2601.05016v1From Idea to Co-Creation: A Planner-Actor-Critic Framework for Agent Augmented 3D Modeling2026-01-08T15:18:12ZWe present a framework that extends the Actor-Critic architecture to creative 3D modeling through multi-agent self-reflection and human-in-the-loop supervision. While existing approaches rely on single-prompt agents that directly execute modeling commands via tools like Blender MCP, our approach introduces a Planner-Actor-Critic architecture. In this design, the Planner coordinates modeling steps, the Actor executes them, and the Critic provides iterative feedback, while human users act as supervisors and advisors throughout the process. Through systematic comparison between single-prompt modeling and our reflective multi-agent approach, we demonstrate improvements in geometric accuracy, aesthetic quality, and task completion rates across diverse 3D modeling scenarios. Our evaluation reveals that critic-guided reflection, combined with human supervisory input, reduces modeling errors and increases complexity and quality of the result compared to direct single-prompt execution. This work establishes that structured agent self-reflection, when augmented by human oversight and advisory guidance, produces higher-quality 3D models while maintaining efficient workflow integration through real-time Blender synchronization.2026-01-08T15:18:12ZJin GaoSaichandu Julurihttp://arxiv.org/abs/2507.09792v3CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design2026-01-08T03:47:27ZComputer-aided design (CAD) is the digital construction of 2D and 3D objects, and is central to a wide range of engineering and manufacturing applications like automobile and aviation. Despite its importance, CAD modeling remains largely a time-intensive, manual task. Recent works have attempted to automate this process with small transformer-based models and handcrafted CAD sequence representations. However, there has been little effort to leverage the potential of large language models (LLMs) for sequential CAD design. In this work, we introduce a new large-scale dataset of more than 170k CAD models annotated with high-quality, human-like descriptions generated with our pipeline based on GPT-4.1. Using this dataset, we fine-tune powerful code-LLMs to generate CAD sequences represented in a JSON-based format from natural language descriptions, demonstrating the viability and effectiveness of this approach for text-conditioned CAD generation. Because simple metrics often fail to reflect the quality of generated objects, we introduce geometric and topological metrics based on sphericity, mean curvature, and Euler characteristic to provide richer structural insights. Our experiments and ablation studies on both synthetic and human-annotated data demonstrate that CADmium is able to automate CAD design, drastically speeding up the design of new objects. The dataset, code, and fine-tuned models are available online.2025-07-13T21:11:53ZPublished in Transactions on Machine Learning Research (TMLR) 01/2026Transactions on Machine Learning Research (TMLR) 01/2026; https://openreview.net/forum?id=lExqWvQht8Prashant GovindarajanDavide BaldelliJay PathakQuentin FournierSarath Chandar