Feed: https://arxiv.org/api/lF7B/wPFEwqkUhrw1eKXCqF0TCc
Updated: 2026-06-23T21:27:40Z
9374106515

http://arxiv.org/abs/2601.04370v1
End-to-end differentiable design of geometric waveguide displays
Updated: 2026-01-07T20:19:11Z
Geometric waveguides are a promising architecture for optical see-through augmented reality displays, but their performance is severely bottlenecked by the difficulty of jointly optimizing non-sequential light transport and polarization-dependent multilayer thin-film coatings. Here we present the first end-to-end differentiable optimization framework for geometric waveguides that couples non-sequential Monte Carlo polarization ray tracing with a differentiable transfer-matrix thin-film solver. A differentiable Monte Carlo ray tracer avoids the exponential growth of deterministic ray splitting while enabling gradient backpropagation from eyebox metrics to design parameters. With memory-saving strategies, we optimize more than one thousand layer-thickness parameters and billions of non-sequential ray-surface intersections on a single multi-GPU workstation. Automated layer pruning is achieved by starting from over-parameterized stacks and driving redundant layers to zero thickness under discrete manufacturability constraints, effectively performing topology optimization to discover optimal coating structures. On a representative design, starting from random initialization within thickness bounds, our method increases light efficiency from 4.1% to 33.5% and improves eyebox and FoV uniformity by ~17× and ~11×, respectively. Furthermore, we jointly optimize the waveguide and an image preprocessing network to improve perceived image quality.
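The transfer-matrix thin-film calculation mentioned in this entry can be illustrated with a minimal, non-differentiable sketch. It assumes normal incidence (where s and p polarizations coincide), lossless real refractive indices, and a single wavelength. It is not the authors' solver, which handles polarization-dependent incidence inside the waveguide and provides gradients. The function names are hypothetical. A layer of zero thickness reduces to the identity matrix, which is why driving a layer's thickness to zero effectively prunes it from the stack.

```python
import cmath
import math

def layer_matrix(n, d, wavelength):
    """Characteristic matrix of one lossless layer at normal incidence."""
    delta = 2 * math.pi * n * d / wavelength  # phase thickness of the layer
    c, s = cmath.cos(delta), cmath.sin(delta)
    return ((c, 1j * s / n), (1j * n * s, c))

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

def stack_reflectance(layers, n_in, n_sub, wavelength):
    """Power reflectance of a thin-film stack.

    layers: [(refractive_index, thickness), ...] ordered from the incident
    medium toward the substrate; thickness and wavelength share units.
    """
    m = ((1, 0), (0, 1))
    for n, d in layers:
        m = matmul2(m, layer_matrix(n, d, wavelength))
    # Map the substrate-side field vector (1, n_sub) back to the front surface.
    b = m[0][0] + m[0][1] * n_sub
    c = m[1][0] + m[1][1] * n_sub
    r = (n_in * b - c) / (n_in * b + c)  # amplitude reflection coefficient
    return abs(r) ** 2

if __name__ == "__main__":
    wl = 550e-9
    n1 = math.sqrt(1.5)
    print(stack_reflectance([], 1.0, 1.5, wl))                     # bare glass
    print(stack_reflectance([(n1, wl / (4 * n1))], 1.0, 1.5, wl))  # quarter-wave AR layer
    print(stack_reflectance([(2.0, 0.0)], 1.0, 1.5, wl))           # zero-thickness layer
```

The three example cases give different results. Bare glass reflects about 4%. A quarter-wave layer of index √1.5 suppresses reflection almost entirely. The zero-thickness layer reproduces the bare-glass value. In an actual optimizer, the same computation would be written in an autodiff framework, so that gradients with respect to each thickness come for free.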
Our framework not only enables system-level, high-dimensional coating optimization inside the waveguide, but also expands the scope of differentiable optics for next-generation optical design.
Published: 2026-01-07T20:19:11Z
Authors: Xinge Yang, Zhaocheng Liu, Zhaoyu Nie, Qingyuan Fan, Zhimin Shi, Jim Bonar, Wolfgang Heidrich

http://arxiv.org/abs/2601.04348v1
SCAR-GS: Spatial Context Attention for Residuals in Progressive Gaussian Splatting
Updated: 2026-01-07T19:34:51Z
Recent advances in 3D Gaussian Splatting have allowed for real-time, high-fidelity novel view synthesis. Nonetheless, these models have significant storage requirements for large and medium-sized scenes, hindering their deployment over cloud and streaming services. Some of the most recent progressive compression techniques for these models rely on progressive masking and scalar quantization techniques to reduce the bitrate of Gaussian attributes using spatial context models. While effective, scalar quantization may not optimally capture the correlations of high-dimensional feature vectors, which can potentially limit the rate-distortion performance.
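This entry goes on to replace scalar quantization with residual vector quantization (RVQ). RVQ encodes a vector in stages, with each stage quantizing the residual left by the previous ones. Decoding only the first indices therefore gives a coarse reconstruction, and each later index refines it. The sketch below assumes small hand-written codebooks and plain nearest-neighbour encoding. It omits codebook training and the paper's hash-grid-guided entropy model, and all names are hypothetical.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))

def rvq_encode(v, codebooks):
    """Greedy RVQ: one index per stage, each stage quantizing the current residual."""
    residual = list(v)
    indices = []
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices

def rvq_decode(indices, codebooks, stages=None):
    """Sum the selected codewords; decoding fewer stages gives a coarser result."""
    stages = len(indices) if stages is None else stages
    out = [0.0] * len(codebooks[0][0])
    for i, cb in zip(indices[:stages], codebooks[:stages]):
        out = [o + c for o, c in zip(out, cb[i])]
    return out

# Hypothetical two-stage codebooks for 2-D features.
COARSE = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
FINE = [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25], [0.25, 0.25]]

if __name__ == "__main__":
    idx = rvq_encode([1.2, 0.8], [COARSE, FINE])
    print(idx, rvq_decode(idx, [COARSE, FINE], stages=1), rvq_decode(idx, [COARSE, FINE]))
```

For the vector (1.2, 0.8), the coarse stage alone decodes to (1.0, 1.0). Adding the refinement stage gives (1.25, 0.75), which is closer to the input. This mirrors the coarse and refinement layers a progressive codec transmits.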
In this work, we introduce a novel progressive codec for 3D Gaussian Splatting that replaces traditional methods with a more powerful Residual Vector Quantization approach to compress the primitive features. Our key contribution is an auto-regressive entropy model, guided by a multi-resolution hash grid, that accurately predicts the conditional probability of each successive transmitted index, allowing coarse and refinement layers to be compressed with high efficiency.
Published: 2026-01-07T19:34:51Z
Authors: Diego Revilla, Pooja Suresh, Anand Bhojan, Ooi Wei Tsang

http://arxiv.org/abs/2601.04194v1
Choreographing a World of Dynamic Objects
Updated: 2026-01-07T18:59:40Z
Dynamic objects in our physical 4D (3D + time) world are constantly evolving, deforming, and interacting with other objects, leading to diverse 4D scene dynamics. In this paper, we present a universal generative pipeline, CHORD, for CHOReographing Dynamic objects and scenes and synthesizing this type of phenomena. Traditional rule-based graphics pipelines for creating these dynamics rely on category-specific heuristics and are labor-intensive and not scalable. Recent learning-based methods typically demand large-scale datasets, which may not cover all object categories of interest. Our approach instead inherits its universality from video generative models through a distillation-based pipeline that extracts the rich Lagrangian motion information hidden in the Eulerian representations of 2D videos. Our method is universal, versatile, and category-agnostic. We demonstrate its effectiveness by conducting experiments to generate a diverse range of multi-body 4D dynamics, show its advantage compared to existing methods, and demonstrate its applicability in generating robotics manipulation policies.
Project page: https://yanzhelyu.github.io/chord
Published: 2026-01-07T18:59:40Z
Authors: Yanzhe Lyu, Chen Geng, Karthik Dharmarajan, Yunzhi Zhang, Hadi Alzayer, Shangzhe Wu, Jiajun Wu

http://arxiv.org/abs/1001.3974v3
Modelación y Visualización Tridimensional Interactiva de Variables Eléctricas en Celdas de Electro-Obtención con Electrodos Bipolares (Interactive Three-Dimensional Modeling and Visualization of Electrical Variables in Electrowinning Cells with Bipolar Electrodes)
Updated: 2026-01-07T17:25:45Z
The use of floating bipolar electrodes in copper electrowinning cells is an emerging technology that promises economic and operational impacts. This article presents a computational tool designed for the simulation and analysis of these electrochemical systems. By generalizing and optimizing an existing 2D finite difference model for calculating electrical variables in rectangular cells, we developed a new 3D model capable of processing complex geometries that are not necessarily rectangular. We also introduce a new analytical method for estimating potentials in floating electrodes, which overcomes the inaccuracies of previous heuristic approaches. The analysis of the results is supported by an interactive visualization technique that renders three-dimensional vector fields as flow lines.
Published: 2010-01-22T12:57:59Z
6 pages, 3 figures, in Spanish. For more details, see arXiv:1001.4002 [cs.GR]. Metadata-only update: Authors' names standardized (maternal surnames removed; paternal surnames as sole last name). Title orthography corrected with TeX accents. Abstract refined.
Anales del XIV Congreso de la Asociacion Chilena de Control Automatico, ACCA, 2000, pp. 362-367
Authors: César Mena, Ricardo Sánchez, Lautaro Salazar

http://arxiv.org/abs/2508.08930v2
How Does a Virtual Agent Decide Where to Look? Symbolic Cognitive Reasoning for Embodied Head Rotation
Updated: 2026-01-07T03:50:59Z
Natural head rotation is critical for believable embodied virtual agents, yet this micro-level behavior remains largely underexplored.
While head-rotation prediction algorithms could, in principle, reproduce this behavior, they typically focus on visually salient stimuli and overlook the cognitive motives that guide head rotation. The resulting agents look at conspicuous objects while overlooking obstacles or task-relevant cues, which diminishes realism in a virtual environment. We introduce SCORE (Symbolic Cognitive Reasoning framework for Embodied Head Rotation), a data-agnostic framework that produces context-aware head movements without task-specific training or hand-tuned heuristics. A controlled VR study (N=20) identifies five motivational drivers of human head movements: Interest, Information Seeking, Safety, Social Schema, and Habit. SCORE encodes these drivers as symbolic predicates, perceives the scene with a Vision-Language Model (VLM), and plans head poses with a Large Language Model (LLM). The framework employs a hybrid workflow: the VLM-LLM reasoning is executed offline, after which a lightweight FastVLM performs online validation to suppress hallucinations while maintaining responsiveness to scene dynamics. The result is an agent that predicts not only where to look but also why, generalizing to unseen scenes and multi-agent crowds while retaining behavioral plausibility.
Published: 2025-08-12T13:32:18Z
13 pages, 8 figures. Accepted to SIGGRAPH Asia Conference Papers '25
SIGGRAPH Asia Conference Papers '25, December 15-18, 2025, Hongkong
Authors: Juyeong Hwang, Seong-Eun Hong, JaeYoung Seon, Hyeongyeop Kang
DOI: 10.1145/3757377.3763849

http://arxiv.org/abs/2601.03114v1
Stroke Patches: Customizable Artistic Image Styling Using Regression
Updated: 2026-01-06T15:44:18Z
We present a novel, regression-based method for artistically styling images. Unlike recent neural style transfer or diffusion-based approaches, our method allows for explicit control over the stroke composition and level of detail in the rendered image through the use of an extensible set of stroke patches.
The stroke patch sets are procedurally generated by small programs that control the shape, size, orientation, density, color, and noise level of the strokes in the individual patches. Once trained on a set of stroke patches, a U-Net-based regression model can render any input image in a variety of distinct, evocative, and customizable styles.
Published: 2026-01-06T15:44:18Z
39th Conference on Neural Information Processing Systems (NeurIPS 2025) Creative AI Track
Authors: Ian Jaffray, John Bronskill

http://arxiv.org/abs/2601.03319v1
CaricatureGS: Exaggerating 3D Gaussian Splatting Faces With Gaussian Curvature
Updated: 2026-01-06T13:56:28Z
A photorealistic and controllable 3D caricaturization framework for faces is introduced. We start with an intrinsic Gaussian curvature-based surface exaggeration technique, which, when coupled with texture, tends to produce over-smoothed renders. To address this, we resort to 3D Gaussian Splatting (3DGS), which has recently been shown to produce realistic free-viewpoint avatars. Given a multiview sequence, we extract a FLAME mesh, solve a curvature-weighted Poisson equation, and obtain its exaggerated form. However, directly deforming the Gaussians yields poor results. We therefore synthesize pseudo-ground-truth caricature images by warping each frame to its exaggerated 2D representation using local affine transformations. We then devise a training scheme that alternates real and synthesized supervision, enabling a single Gaussian collection to represent both natural and exaggerated avatars. This scheme improves fidelity, supports local edits, and allows continuous control over the intensity of the caricature. To achieve real-time deformations, we introduce an efficient interpolation between the original and exaggerated surfaces. We further show analytically that this interpolation has a bounded deviation from closed-form solutions.
In both quantitative and qualitative evaluations, our results outperform prior work, delivering photorealistic, geometry-controlled caricature avatars.
Published: 2026-01-06T13:56:28Z
Authors: Eldad Matmon, Amit Bracha, Noam Rotstein, Ron Kimmel

http://arxiv.org/abs/2601.02829v1
Resolution deficits drive simulator sickness and compromise reading performance in virtual environments
Updated: 2026-01-06T09:01:16Z
Extended reality (XR) is evolving into a general-purpose computing platform, yet its adoption for productivity is hindered by visual fatigue and simulator sickness. These symptoms are often attributed to latency or motion conflicts, but the precise impact of textual clarity on physiological comfort remains undefined. Here we show that sub-optimal effective resolution, the clarity that reaches the eye after the full display-optics-rendering pipeline, is a primary driver of simulator sickness during reading tasks in both virtual reality and video see-through environments. By systematically manipulating end-to-end effective resolution on a unified logMAR scale, we measured reading psychophysics and sickness symptoms in a controlled within-subjects study. We find that reading performance and user comfort degrade exponentially as resolution becomes poorer than 0 logMAR (normal visual acuity). Notably, our results reveal 0 logMAR as a key physiological tipping point: resolutions better than this threshold yield naked-eye-level performance with minimal sickness, whereas poorer resolutions trigger rapid, non-linear increases in nausea and oculomotor strain.
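The logMAR scale used in this study is the base-10 logarithm of the minimum angle of resolution (MAR) in arcminutes. On this scale, 0 logMAR corresponds to a 1-arcminute MAR (Snellen 20/20), and larger values mean poorer acuity. The sketch below converts Snellen fractions to logMAR. It also gives a rough display conversion, which rests on a loudly stated assumption: the smallest resolvable detail is one pixel, so MAR = 60 / PPD arcminutes. That simplification is not the study's procedure, which measures effective resolution through the full display-optics-rendering pipeline. The function names are hypothetical.

```python
import math

def logmar_from_snellen(numerator, denominator):
    """Snellen fraction such as 20/40 to logMAR; MAR (arcmin) = denominator / numerator."""
    return math.log10(denominator / numerator)

def logmar_from_ppd(pixels_per_degree):
    """Hypothetical display mapping: assume one pixel is the smallest resolvable
    detail, so MAR (arcmin) = 60 / PPD. Optics and rendering blur are ignored."""
    return math.log10(60.0 / pixels_per_degree)

def acuity_label(logmar):
    """0 logMAR is normal acuity; positive values are poorer, negative values better."""
    if logmar > 0:
        return "poorer than normal"
    if logmar < 0:
        return "better than normal"
    return "normal"

if __name__ == "__main__":
    for ppd in (25, 60, 90):
        value = logmar_from_ppd(ppd)
        print(ppd, round(value, 2), acuity_label(value))
```

Under this one-pixel assumption, a headset delivering about 25 pixels per degree maps to roughly 0.38 logMAR. That is on the poorer side of the 0 logMAR tipping point the study reports.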
These findings suggest that the cognitive and perceptual effort required to resolve blurry text directly compromises user comfort, establishing human-eye resolution as a critical baseline for the design of future ergonomic XR systems.
Published: 2026-01-06T09:01:16Z
18 pages, 7 figures, 7 tables
Authors: Jialin Wang, Xinru Cheng, Boyong Hou, Hai-Ning Liang

http://arxiv.org/abs/2601.02805v1
The perceptual gap between video see-through displays and natural human vision
Updated: 2026-01-06T08:28:23Z
Video see-through (VST) technology aims to seamlessly blend virtual and physical worlds by reconstructing reality through cameras. While manufacturers promise perceptual fidelity, it remains unclear how close these systems come to replicating natural human vision across varying environmental conditions. In this work, we quantify the perceptual gap between the human eye and several popular VST headsets (Apple Vision Pro, Meta Quest 3, Quest Pro) using psychophysical measures of visual acuity, contrast sensitivity, and color vision. We show that despite hardware advancements, all tested VST systems fail to match the dynamic range and adaptability of the naked eye. While high-end devices approach human performance in ideal lighting, they exhibit significant degradation in low-light conditions, particularly in contrast sensitivity and acuity. Our results map the physiological limitations of digital reality reconstruction, establishing a specific perceptual gap that defines the roadmap for achieving indistinguishable VST experiences.
Published: 2026-01-06T08:28:23Z
19 pages, 9 figures, 4 tables
Authors: Jialin Wang, Songming Ping, Kemu Xu, Yue Li, Hai-Ning Liang

http://arxiv.org/abs/2601.02096v1
Dancing Points: Synthesizing Ballroom Dancing with Three-Point Inputs
Updated: 2026-01-05T13:24:12Z
Ballroom dancing is a structured yet expressive motion category. Its highly diverse movements and the complex interactions between leader and follower dancers make understanding and synthesis challenging.
We demonstrate that the three-point trajectory available from a virtual reality (VR) device can effectively serve as a dancer's motion descriptor. This reduces the modeling and synthesis of interplay between dancers' full-body motions to sparse trajectories. Thanks to the low dimensionality, we can employ an efficient MLP network to predict the follower's three-point trajectory directly from the leader's three-point input for certain types of ballroom dancing, addressing the challenge of modeling high-dimensional full-body interaction. This compact yet explicit representation also keeps our method from overfitting. By leveraging the inherent structure of the movements and carefully planning the autoregressive procedure, we show that a deterministic neural network can translate three-point trajectories into a virtual embodied avatar. This task is typically considered under-constrained and usually handled with generative models for common motions. In addition, we demonstrate that this deterministic approach generalizes beyond small, structured datasets like ballroom dancing and performs robustly on larger, more diverse datasets such as LaFAN. Our method provides a computationally and data-efficient solution, opening new possibilities for immersive paired dancing applications. Code and pre-trained models for this paper are available at https://peizhuoli.github.io/dancing-points.
Published: 2026-01-05T13:24:12Z
Authors: Peizhuo Li, Sebastian Starke, Yuting Ye, Olga Sorkine-Hornung

http://arxiv.org/abs/2511.11618v3
On The Topology of Polygonal Meshes
Updated: 2026-01-05T13:07:58Z
This paper is an introductory and informal exposition on the topology of polygonal meshes. We begin with a broad overview of topological notions and discuss how homeomorphisms, homotopy, and homology can be used to characterise topology. We move on to define polygonal meshes and distinguish between intrinsic topology and extrinsic topology, which depends on the space in which the mesh is immersed.
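This entry goes on to discuss the Euler formula, which can be checked directly on mesh statistics. For a closed, connected, orientable polygonal mesh, V - E + F = 2 - 2g, where g is the genus. The sketch below counts V, E, and F from a face list and solves for g. It assumes faces are given as vertex-index tuples, every vertex is used by some face, and the mesh is closed and manifold. Meshes with boundaries or several components need the more general Euler-Poincaré form, which the sketch does not handle. The helper names and test meshes are hypothetical.

```python
def euler_characteristic(faces):
    """V - E + F for a polygonal mesh given as vertex-index tuples."""
    vertices = {v for face in faces for v in face}
    edges = set()
    for face in faces:
        # Pair each vertex with the next one, wrapping around the polygon.
        for a, b in zip(face, face[1:] + face[:1]):
            edges.add((min(a, b), max(a, b)))  # undirected edge, counted once
    return len(vertices) - len(edges) + len(faces)

def genus_closed_orientable(faces):
    """Genus g from V - E + F = 2 - 2g; valid only for closed, connected, orientable meshes."""
    return (2 - euler_characteristic(faces)) // 2

def torus_faces(n, m):
    """Hypothetical test mesh: an n-by-m quad grid wrapped in both directions (n, m >= 3)."""
    def vid(i, j):
        return (i % n) * m + (j % m)
    return [(vid(i, j), vid(i + 1, j), vid(i + 1, j + 1), vid(i, j + 1))
            for i in range(n) for j in range(m)]

# A cube as six quads: V = 8, E = 12, F = 6.
CUBE = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 5, 4),
        (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]

if __name__ == "__main__":
    print(euler_characteristic(CUBE), genus_closed_orientable(CUBE))
    torus = torus_faces(4, 4)
    print(euler_characteristic(torus), genus_closed_orientable(torus))
```

The cube has χ = 2 and therefore genus 0 (a sphere). The wrapped 4×4 grid has V = 16, E = 32, F = 16, so χ = 0 and its genus is 1 (a torus).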
A distinction is also made between quantitative topological properties and qualitative properties. Next, we outline proofs of the Euler and the Euler-Poincaré formulas. The Betti numbers are then defined in terms of the Euler-Poincaré formula and other mesh statistics rather than as ranks of the homology groups, which allows us to avoid abstract algebra. Finally, we discuss how a polygonal mesh can be cut so that it becomes a topological disc.
Published: 2025-11-05T13:56:24Z
26 pages, 22 figures (including nine in the margin)
Authors: Andreas Bærentzen

http://arxiv.org/abs/2601.02072v1
SketchRodGS: Sketch-based Extraction of Slender Geometries for Animating Gaussian Splatting Scenes
Updated: 2026-01-05T12:51:12Z
Physics simulation of slender elastic objects often requires discretization as a polyline. However, constructing a polyline from Gaussian splatting is challenging, because Gaussian splatting lacks connectivity information and the configuration of Gaussian primitives is noisy. This paper presents a method to extract a polyline representation of the slender parts of objects in a Gaussian splatting scene from the user's sketch input. Our method robustly constructs a polyline mesh representing the slender parts using a screen-space shortest path analysis that can be solved efficiently with dynamic programming. We demonstrate the effectiveness of our approach on several in-the-wild examples.
Published: 2026-01-05T12:51:12Z
Presented at SIGGRAPH Asia 2025 (Technical Communications). Best Technical Communications Award
Proceedings of the SIGGRAPH Asia 2025 Technical Communications, Article No. 29, pp. 1 - 4
Authors: Haato Watanabe, Nobuyuki Umetani
DOI: 10.1145/3757376.3771403

http://arxiv.org/abs/2412.10977v2
Point Cloud to Mesh Reconstruction: Methods, Trade-offs, and Implementation Guide
Updated: 2026-01-05T09:49:06Z
Reconstructing meshes from point clouds is a fundamental task in computer vision with applications spanning robotics, autonomous systems, and medical imaging.
Selecting an appropriate learning-based method requires understanding trade-offs between computational efficiency, geometric accuracy, and output constraints. This paper categorizes over fifteen methods into five paradigms (PointNet family, autoencoder architectures, deformation-based methods, point-move techniques, and primitive-based approaches) and provides practical guidance for method selection. We contribute: (1) a decision framework mapping input/output requirements to suitable paradigms, (2) a failure mode analysis to assist practitioners in debugging implementations, (3) standardized comparisons on ShapeNet benchmarks, and (4) a curated list of maintained codebases with implementation resources. By synthesizing both theoretical foundations and practical considerations, this work serves as an entry point for practitioners and researchers new to learning-based 3D mesh reconstruction.
Published: 2024-12-14T21:39:43Z
Authors: Fatima Zahra Iguenfer, Achraf Hsain, Hiba Amissa, Yousra Chtouki

http://arxiv.org/abs/2506.21811v2
Revisiting Graph Analytics Benchmark
Updated: 2026-01-04T06:07:09Z
The rise of graph analytics platforms has led to the development of various benchmarks for evaluating and comparing platform performance. However, existing benchmarks often fall short of fully assessing performance. They are limited in core algorithm selection and in data generation processes (and the corresponding synthetic datasets), and they neglect API usability evaluation. To address these shortcomings, we propose a novel graph analytics benchmark. First, we select eight core algorithms by extensively reviewing both academic and industrial settings. Second, we design an efficient and flexible data generator and produce eight new synthetic datasets as the default datasets for our benchmark. Lastly, we introduce a multi-level large language model (LLM)-based framework for API usability evaluation, the first of its kind in graph analytics benchmarks.
We conduct comprehensive experimental evaluations on existing platforms (GraphX, PowerGraph, Flash, Grape, Pregel+, Ligra, and G-thinker). The experimental results demonstrate the superiority of our proposed benchmark.
Published: 2025-03-04T08:11:27Z
Authors: Lingkai Meng, Yu Shao, Long Yuan, Longbin Lai, Peng Cheng, Xue Li, Wenyuan Yu, Wenjie Zhang, Xuemin Lin, Jingren Zhou

http://arxiv.org/abs/2601.01361v1
VARTS: A Tool for the Visualization and Analysis of Representative Time Series Data
Updated: 2026-01-04T04:18:22Z
Large-scale time series visualization often suffers from excessive visual clutter and redundant patterns, making it difficult for users to understand the main temporal trends. To address this challenge, we present VARTS, an interactive visual analytics tool for representative time series selection and visualization. Building upon our previous work M4-Greedy, VARTS integrates M4-based sampling, DTW-based similarity computation, and greedy selection into a unified workflow for the identification and visualization of representative series. The tool provides a responsive graphical interface that allows users to import time series datasets, perform representative selection, and visualize both raw and reduced data through multiple coordinated views. By reducing redundancy while preserving essential data patterns, VARTS effectively enhances visual clarity and interpretability for large-scale time series analysis. The demo video is available at https://youtu.be/mS9f12Rf0jo.
Published: 2026-01-04T04:18:22Z
Authors: Duosi Jin, Jianqiu Xu, Guidong Zhang
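The VARTS workflow chains M4 sampling, DTW similarity, and greedy selection. The sketch below gives textbook versions of the first two. M4 downsampling keeps the first, last, minimum, and maximum point of each bucket. Dynamic time warping (DTW) is the classic dynamic-programming alignment distance. The third step is a simple greedy representative selection, and its rule is an assumption for illustration, not necessarily the M4-Greedy algorithm. The rule repeatedly picks the series that lies within a DTW threshold of the most not-yet-covered series. Buckets are equal-width index ranges rather than pixel columns. The bucket count, threshold, and function names are hypothetical.

```python
def m4(series, buckets):
    """M4 downsampling: in each equal-width index bucket keep the first, last,
    minimum and maximum points. Returns sorted (index, value) pairs."""
    n = len(series)
    keep = set()
    for b in range(buckets):
        lo, hi = b * n // buckets, (b + 1) * n // buckets
        if lo >= hi:
            continue  # empty bucket when buckets > len(series)
        idx = range(lo, hi)
        keep.update({lo, hi - 1,
                     min(idx, key=lambda i: series[i]),
                     max(idx, key=lambda i: series[i])})
    return [(i, series[i]) for i in sorted(keep)]

def dtw(a, b):
    """Classic O(len(a) * len(b)) DTW distance with absolute-difference cost."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

def greedy_representatives(series_list, threshold, buckets=4):
    """Hypothetical greedy cover: M4-reduce every series, then repeatedly pick the
    uncovered series within `threshold` (DTW) of the most uncovered series."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    reduced = [[v for _, v in m4(s, buckets)] for s in series_list]
    n = len(reduced)
    dist = [[dtw(reduced[i], reduced[j]) for j in range(n)] for i in range(n)]
    uncovered = set(range(n))
    reps = []
    while uncovered:
        # A candidate always covers itself (DTW to itself is 0), so the loop terminates.
        best = max(sorted(uncovered),
                   key=lambda i: sum(1 for j in uncovered if dist[i][j] <= threshold))
        reps.append(best)
        uncovered -= {j for j in uncovered if dist[best][j] <= threshold}
    return reps

if __name__ == "__main__":
    data = [[0, 1, 2, 3], [0, 1, 2, 3.1], [10, 11, 12, 13]]
    print(greedy_representatives(data, threshold=1.0, buckets=2))
```

In the example, the first two series are nearly identical and collapse onto one representative. The third, distant series is kept as its own representative.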