http://arxiv.org/abs/2601.01288v1 PyBatchRender: A Python Library for Batched 3D Rendering at Up to One Million FPS 2026-01-03T21:19:57Z

Reinforcement learning from pixels is often bottlenecked by the performance and complexity of 3D rendered environments. Researchers face a trade-off between high-speed, low-level engines and slower, more accessible Python frameworks. To address this, we introduce PyBatchRender, a Python library for high-throughput, batched 3D rendering that achieves over 1 million FPS on simple scenes. Built on the Panda3D game engine, it leverages Panda3D's mature ecosystem while improving performance through optimized batched rendering, yielding speedups of up to 1000x. Designed as a physics-agnostic renderer for reinforcement learning from pixels, PyBatchRender offers greater flexibility than dedicated libraries, simpler setup than typical game-engine wrappers, and speeds rivaling state-of-the-art C++ engines like Madrona. Users can create custom scenes entirely in Python with tens of lines of code, enabling rapid prototyping for scalable AI training. Open-source and easy to integrate, it serves to democratize high-performance 3D simulation for researchers and developers. The library is available at https://github.com/dolphin-in-a-coma/PyBatchRender.

2026-01-03T21:19:57Z Evgenii Rudakov Jonathan Shock Benjamin Ultan Cowley http://arxiv.org/abs/2601.01257v1 Seamlessly Natural: Image Stitching with Natural Appearance Preservation 2026-01-03T18:40:35Z

This paper introduces SENA (SEamlessly NAtural), a geometry-driven image stitching approach that prioritizes structural fidelity in challenging real-world scenes characterized by parallax and depth variation. Conventional image stitching relies on homographic alignment, but this rigid planar assumption often fails in dual-camera setups with significant scene depth, leading to distortions such as visible warps and spherical bulging. SENA addresses these fundamental limitations through three key contributions. First, we propose a hierarchical affine-based warping strategy, combining global affine initialization with local affine refinement and smooth free-form deformation. This design preserves local shape, parallelism, and aspect ratios, thereby avoiding the hallucinated structural distortions commonly introduced by homography-based models. Second, we introduce a geometry-driven adequate zone detection mechanism that identifies parallax-minimized regions directly from the disparity consistency of RANSAC-filtered feature correspondences, without relying on semantic segmentation. Third, building upon this adequate zone, we perform anchor-based seamline cutting and segmentation, enforcing a one-to-one geometric correspondence across image pairs by construction, which effectively eliminates ghosting, duplication, and smearing artifacts in the final panorama. Extensive experiments conducted on challenging datasets demonstrate that SENA achieves alignment accuracy comparable to leading homography-based methods, while significantly outperforming them in critical visual metrics such as shape preservation, texture integrity, and overall visual realism.
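The claim that affine warps preserve parallelism while homographies generally do not can be checked numerically. The sketch below is not SENA's implementation; the matrices, segments, and helper names (`apply_transform`, `parallel_defect`) are made up for illustration. It maps two parallel segments through a 3x3 transform and reports the 2D cross product of their directions, which is zero exactly when the mapped segments remain parallel.

```python
def apply_transform(H, point):
    """Apply a 3x3 projective transform (nested lists) to a 2D point."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def parallel_defect(H, seg_a, seg_b):
    """2D cross product of the mapped segment directions (0 means parallel)."""
    a0, a1 = (apply_transform(H, p) for p in seg_a)
    b0, b1 = (apply_transform(H, p) for p in seg_b)
    da = (a1[0] - a0[0], a1[1] - a0[1])
    db = (b1[0] - b0[0], b1[1] - b0[1])
    return da[0] * db[1] - da[1] * db[0]


# Illustrative transforms: identical except for the last row.
affine = [[1.2, 0.3, 5.0], [-0.1, 0.9, -2.0], [0.0, 0.0, 1.0]]
homography = [[1.2, 0.3, 5.0], [-0.1, 0.9, -2.0], [0.001, 0.002, 1.0]]

# Two parallel horizontal segments in the source image.
top = ((0.0, 0.0), (100.0, 0.0))
bottom = ((0.0, 50.0), (100.0, 50.0))

affine_defect = parallel_defect(affine, top, bottom)
homography_defect = parallel_defect(homography, top, bottom)
print(affine_defect, homography_defect)
```

The affine defect is zero up to rounding, while the projective last row of the homography makes the two mapped segments converge.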

2026-01-03T18:40:35Z Gaetane Lorna N. Tchana Damaris Belle M. Fotso Antonio Hendricks Christophe Bobda http://arxiv.org/abs/2601.01027v1 A Platform for Interactive AI Character Experiences 2026-01-03T01:27:19Z

From movie characters to modern science fiction, bringing characters into interactive, story-driven conversations has captured imaginations across generations. Achieving this vision is highly challenging and requires much more than just language modeling. It involves numerous complex AI challenges, such as conversational AI, maintaining character integrity, managing personality and emotions, handling knowledge and memory, synthesizing voice, generating animations, enabling real-world interactions, and integrating with physical environments. Recent advancements in the development of foundation models, prompt engineering, and fine-tuning for downstream tasks have enabled researchers to address these individual challenges. However, combining these technologies for interactive characters remains an open problem. We present a system and platform for conveniently designing believable digital characters, enabling a conversational and story-driven experience while providing solutions to all of the technical challenges. As a proof-of-concept, we introduce Digital Einstein, which allows users to engage in conversations with a digital representation of Albert Einstein about his life, research, and persona. While Digital Einstein exemplifies our methods for a specific character, our system is flexible and generalizes to any story-driven or conversational character. By unifying these diverse AI components into a single, easy-to-adapt platform, our work paves the way for immersive character experiences, turning the dream of lifelike, story-based interactions into a reality.

2026-01-03T01:27:19Z SIGGRAPH Conference Papers '25, August 10-14, 2025, Vancouver, BC, Canada Rafael Wampfler Chen Yang Dillon Elste Nikola Kovacevic Philine Witzig Markus Gross 10.1145/3721238.3730762 http://arxiv.org/abs/2601.00775v1 Spatiotemporal Detection and Uncertainty Visualization of Atmospheric Blocking Events 2026-01-02T18:12:12Z

Atmospheric blocking events are quasi-stationary high-pressure systems that disrupt the typical paths of polar and subtropical air currents, often producing prolonged extreme weather events such as summer heat waves or winter cold spells. Despite their critical role in shaping mid-latitude weather, accurately modeling and analyzing blocking events in long meteorological records remains a significant challenge. To address this challenge, we present an uncertainty visualization framework for detecting and characterizing atmospheric blocking events. First, we introduce a geometry-based detection and tracking method, evaluated on both pre-industrial climate model simulations (UKESM) and reanalysis data (ERA5), which represent historical Earth observations assimilated from satellite and station measurements onto regular numerical grids using weather models. Second, we propose a suite of uncertainty-aware summaries: contour boxplots that capture representative boundaries and their variability, frequency heatmaps that encode occurrences, and 3D temporal stacks that situate these patterns in time. Third, we demonstrate our framework in a case study of the 2003 European heatwave, mapping the spatiotemporal occurrences of blocking events using these summaries. Collectively, these uncertainty visualizations reveal where blocking events are most likely to occur and how their spatial footprints evolve over time. We envision our framework as a valuable tool for climate scientists and meteorologists: by analyzing how blocking frequency, duration, and intensity vary across regions and climate scenarios, it supports both the study of historical blocking events and the assessment of scenario-dependent climate risks associated with changes in extreme weather linked to blocking.

2026-01-02T18:12:12Z in IEEE Transactions on Visualization and Computer Graphics, 2026 Mingzhe Li Peer Nowack Bei Wang http://arxiv.org/abs/2502.10872v7 Corotational Hinge-based Thin Plates/Shells 2026-01-02T09:54:06Z

We present six thin plate/shell models, derived from three distinct types of curvature operators formulated within the corotational frame, for simulating both rest-flat and rest-curved triangular meshes. Each curvature operator yields a curvature expression for both a plate model and a shell model. The corotational edge-based hinge model uses an edge-based stencil to compute directional curvature, while the corotational FVM hinge model utilizes a triangle-centered stencil, applying the finite volume method (FVM) to superpose directional curvatures across edges, yielding a generalized curvature. The corotational smoothed hinge model also employs a triangle-centered stencil but transforms directional curvatures into a generalized curvature based on a quadratic surface fit. All models assume small strain and small curvature, leading to constant bending energy Hessians, which benefit implicit integrators. Through quantitative benchmarks and qualitative elastodynamic simulations with large time steps, we demonstrate the accuracy, efficiency, and stability of these models. Our contributions enhance the thin plate/shell library for use in both computer graphics and engineering applications.

2025-02-15T18:09:10Z Accepted at Eurographics 2025 Comput. Graph. Forum 44(2), e70022 (Proc. Eurographics 2025) Qixin Liang 10.1111/cgf.70022 http://arxiv.org/abs/2601.02410v1 The Vibe-Check Protocol: Quantifying Cognitive Offloading in AI Programming 2026-01-02T06:13:41Z

The integration of Large Language Models (LLMs) into software engineering education has driven the emergence of "Vibe Coding," a paradigm where developers articulate high-level intent through natural language and delegate implementation to AI agents. While proponents argue this approach modernizes pedagogy by emphasizing conceptual design over syntactic memorization, accumulating empirical evidence raises concerns regarding skill retention and deep conceptual understanding. This paper proposes a theoretical framework to investigate the research question: Is Vibe Coding a better way to learn software engineering? We posit a divergence in student outcomes between those leveraging AI for acceleration and those using it for cognitive offloading. To evaluate these educational trade-offs, we propose the Vibe-Check Protocol (VCP), a systematic benchmarking framework incorporating three quantitative metrics: the Cold Start Refactor ($M_{CSR}$) for modeling skill decay; Hallucination Trap Detection ($M_{HT}$), based on signal detection theory, to evaluate error identification; and the Explainability Gap ($E_{gap}$) for quantifying the divergence between code complexity and conceptual comprehension. Through controlled comparisons, VCP aims to provide a quantitative basis for educators to determine the optimal pedagogical boundary: identifying contexts where Vibe Coding fosters genuine mastery and contexts where it introduces hidden technical debt and superficial competence.
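The abstract does not define $M_{HT}$ beyond saying it is based on signal detection theory. As a point of reference, the standard sensitivity index from that theory is d' = z(hit rate) - z(false-alarm rate). The sketch below computes it under the assumption that a "hit" means a student flags a planted hallucination and a "false alarm" means a student flags correct code; this mapping, the function name `d_prime`, and the log-linear correction are illustrative choices, not the paper's definition.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Adds 0.5 to each count (log-linear correction) so that rates of
    exactly 0 or 1 do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 10 planted errors and 10 correct snippets per student.
guessing = d_prime(hits=5, misses=5, false_alarms=5, correct_rejections=5)
careful = d_prime(hits=9, misses=1, false_alarms=1, correct_rejections=9)
print(guessing, careful)
```

A value near zero indicates the reviewer cannot distinguish planted errors from correct code; larger values indicate better discrimination independent of how often the reviewer raises a flag.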

2026-01-02T06:13:41Z Aizierjiang Aiersilan http://arxiv.org/abs/2601.00569v1 Modeling and Simulating Origami Structures using Bilinear Solid-Shell Element 2026-01-02T05:13:43Z

We propose a novel computational framework for modeling and simulating origami structures. In this framework, bilinear solid-shell elements are employed to model the origami panels, while crease folding is captured through the angle between the director vectors of adjacent panels. The director vector is the vector normal to the mid-surface in the undeformed configuration. To mitigate locking issues in the solid-shell element, we introduce the assumed natural strain method. To validate the effectiveness of our framework, we conduct origami simulations involving both straight and curved creases. The accuracy and efficacy of the framework are demonstrated through quantitative and qualitative analyses.
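The idea of measuring a fold by the angle between the normals of two adjacent panels can be sketched for flat triangular panels. This is a simplified geometric illustration, not the paper's solid-shell formulation: the helper names (`panel_normal`, `fold_angle`) and the example coordinates are assumptions, and the angle returned is unsigned, so it does not distinguish mountain from valley folds.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def panel_normal(p0, p1, p2):
    """Unit normal of a flat triangular panel with counter-clockwise vertices."""
    n = cross(sub(p1, p0), sub(p2, p0))
    length = math.sqrt(dot(n, n))
    return (n[0] / length, n[1] / length, n[2] / length)


def fold_angle(n_a, n_b):
    """Unsigned angle in radians between two unit normals.

    atan2(|n_a x n_b|, n_a . n_b) stays accurate near 0 and pi,
    where acos of the dot product loses precision.
    """
    c = cross(n_a, n_b)
    return math.atan2(math.sqrt(dot(c, c)), dot(n_a, n_b))


# Two panels sharing the crease from (0,0,0) to (1,0,0).
panel_a = panel_normal((0, 0, 0), (1, 0, 0), (0, 1, 0))
flat_b = panel_normal((1, 0, 0), (0, 0, 0), (0, -1, 0))
folded_b = panel_normal((1, 0, 0), (0, 0, 0), (0, 0, 1))

flat_angle = fold_angle(panel_a, flat_b)
right_angle = fold_angle(panel_a, folded_b)
print(flat_angle, right_angle)
```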

2026-01-02T05:13:43Z Qixin Liang http://arxiv.org/abs/2601.00504v1 MotionPhysics: Learnable Motion Distillation for Text-Guided Simulation 2026-01-01T22:56:37Z

Accurately simulating existing 3D objects and a wide variety of materials often demands expert knowledge and time-consuming physical parameter tuning to achieve the desired dynamic behavior. We introduce MotionPhysics, an end-to-end differentiable framework that infers plausible physical parameters from a user-provided natural language prompt for a chosen 3D scene of interest, removing the need for guidance from ground-truth trajectories or annotated videos. Our approach first utilizes a multimodal large language model to estimate material parameter values, which are constrained to lie within plausible ranges. We further propose a learnable motion distillation loss that extracts robust motion priors from pretrained video diffusion models while minimizing appearance and geometry inductive biases to guide the simulation. We evaluate MotionPhysics across more than thirty scenarios, including real-world, human-designed, and AI-generated 3D objects, spanning a wide range of materials such as elastic solids, metals, foams, sand, and both Newtonian and non-Newtonian fluids. We demonstrate that MotionPhysics produces visually realistic dynamic simulations guided by natural language, surpassing the state of the art while automatically determining physically plausible parameters. The code and project page are available at: https://wangmiaowei.github.io/MotionPhysics.github.io/.

2026-01-01T22:56:37Z AAAI2026 Accepted Miaowei Wang Jakub Zadrożny Oisin Mac Aodha Amir Vaxman http://arxiv.org/abs/2509.17168v2 StyGazeTalk: Learning Stylized Generation of Gaze and Head Dynamics 2026-01-01T13:27:57Z

Gaze and head movements play a central role in expressive 3D media, human-agent interaction, and immersive communication. Existing works often model facial components in isolation and lack mechanisms for generating personalized, style-aware gaze behaviors. We propose StyGazeTalk, a multimodal framework that synthesizes synchronized gaze-head dynamics with controllable styles. To support high-fidelity training, we construct HAGE, a high-precision multimodal dataset containing eye-tracking data, audio, head pose, and 3D facial parameters. Experiments show that our method produces temporally coherent, style-consistent gaze-head motions, enhancing realism in 3D face generation.

2025-09-21T17:27:57Z arXiv submission Chengwei Shi Chong Cao http://arxiv.org/abs/2509.04481v2 Narrative-to-Scene Generation: An LLM-Driven Pipeline for 2D Game Environments 2025-12-31T23:32:38Z

Recent advances in large language models (LLMs) enable compelling story generation, but connecting narrative text to playable visual environments remains an open challenge in procedural content generation (PCG). We present a lightweight pipeline that transforms short narrative prompts into a sequence of 2D tile-based game scenes, reflecting the temporal structure of stories. Given an LLM-generated narrative, our system identifies three key time frames, extracts spatial predicates in the form of "Object-Relation-Object" triples, and retrieves visual assets using affordance-aware semantic embeddings from the GameTileNet dataset. A layered terrain is generated using Cellular Automata, and objects are placed using spatial rules grounded in the predicate structure. We evaluated our system on ten diverse stories, analyzing tile-object matching, affordance-layer alignment, and spatial constraint satisfaction across frames. This prototype offers a scalable approach to narrative-driven scene generation and lays the foundation for future work on multi-frame continuity, symbolic tracking, and multi-agent coordination in story-centered PCG.
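The abstract states only that terrain is generated with Cellular Automata, without giving the rule. A common choice for tile maps is majority smoothing of random noise, sketched below for a single land/water layer. The fill ratio, step count, neighbor threshold, and the name `ca_terrain` are all assumptions for illustration and are not taken from the paper.

```python
import random


def ca_terrain(width, height, fill=0.45, steps=4, seed=0):
    """Generate a land (True) / water (False) grid by smoothing random noise."""
    rng = random.Random(seed)
    grid = [[rng.random() < fill for _ in range(width)] for _ in range(height)]

    def land_neighbours(g, r, c):
        count = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                rr, cc = r + dr, c + dc
                # Cells outside the map count as land, which closes the borders.
                outside = not (0 <= rr < height and 0 <= cc < width)
                if outside or g[rr][cc]:
                    count += 1
        return count

    for _ in range(steps):
        # A cell is land in the next step if at least 5 of its 8 neighbors are land.
        grid = [[land_neighbours(grid, r, c) >= 5 for c in range(width)]
                for r in range(height)]
    return grid


terrain = ca_terrain(24, 12)
for row in terrain:
    print("".join("#" if cell else "." for cell in row))
```

Each smoothing step removes isolated tiles and merges nearby ones into contiguous regions; running the same rule with different thresholds per layer is one possible way to obtain layered terrain.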

2025-08-31T01:45:56Z Camera-ready version of a paper accepted at the AIIDE 2025 Workshop on Experimental AI in Games (EXAG) Yi-Chun Chen Arnav Jhala http://arxiv.org/abs/2601.00114v1 DiffTetVR: Differentiable Tetrahedral Volume Rendering 2025-12-31T21:18:05Z

Differentiable rendering is a technique that aims to invert the rendering process to enable optimizing rendering parameters from a set of images. In this article, we present a differentiable volume rendering solution called DiffTetVR for tetrahedral meshes. Unlike previous works based on regular grids, this enables the optimization of vertex positions and the local subdivision of the mesh without relying on multigrid methods. We present an efficient implementation of the forward rendering process, deduce the derivatives for the backwards pass and regularization terms for avoiding degenerate tetrahedra, and finally show how the tetrahedral mesh can be subdivided locally to enable a coarse-to-fine optimization process. The source code is made publicly available on GitHub at https://github.com/chrismile/DiffTetVR.

2025-12-31T21:18:05Z Christoph Neuhauser http://arxiv.org/abs/2406.06146v4 Empirical Study on the Representation of 3D Scatterplots as 2D Figures 2025-12-31T20:49:46Z

3D scatterplots are a well-established plotting technique that can be used to represent data with three or more dimensions. On paper and computer monitors they are essentially two-dimensional projections of the three-dimensional Cartesian coordinate system. This transition from 3D space to two dimensions is not done consistently among scientific software, as there is currently limited quantifiable evidence on the effectiveness of each approach. Notably, the frequent lack of visual cues, such as those for depth perception, is equivalent to reducing the dimensionality by one. Hence, their use in manuscripts is less common and less straightforward. In this empirical study, an online survey is conducted within an academic institution to identify and quantify the effectiveness of features or feature combinations on 3D scatterplots in terms of reading time and accuracy.

2024-06-10T10:08:50Z Philippos Papaphilippou Lucy Hederman http://arxiv.org/abs/2512.24986v1 PhysTalk: Language-driven Real-time Physics in 3D Gaussian Scenes 2025-12-31T17:32:31Z

Realistic visual simulations are omnipresent, yet their creation requires computing time, rendering, and expert animation knowledge. Open-vocabulary visual effects generation from text inputs emerges as a promising solution that can unlock immense creative potential. However, current pipelines lack both physical realism and effective language interfaces, requiring slow offline optimization. In contrast, PhysTalk takes a 3D Gaussian Splatting (3DGS) scene as input and translates arbitrary user prompts into real-time, physics-based, interactive 4D animations. A large language model (LLM) generates executable code that directly modifies 3DGS parameters through lightweight proxies and particle dynamics. Notably, PhysTalk is the first framework to couple 3DGS directly with a physics simulator without relying on time-consuming mesh extraction. While remaining open-vocabulary, this design enables interactive 3D Gaussian animation via collision-aware, physics-based manipulation of arbitrary, multi-material objects. Finally, PhysTalk is training-free and computationally lightweight: this makes 4D animation broadly accessible and shifts these workflows from a "render and wait" paradigm toward an interactive dialogue with a modern, physics-informed pipeline.

2025-12-31T17:32:31Z Luca Collorone Mert Kiray Indro Spinelli Fabio Galasso Benjamin Busam http://arxiv.org/abs/2512.24794v1 Nonlinear Noise2Noise for Efficient Monte Carlo Denoiser Training 2025-12-31T11:30:38Z

The Noise2Noise method allows for training machine learning-based denoisers with pairs of input and target images where both the input and target can be noisy. This removes the need for training with clean target images, which can be difficult to obtain. However, Noise2Noise training has a major limitation: nonlinear functions applied to the noisy targets will skew the results. This bias occurs because the nonlinearity makes the expected value of the noisy targets different from the clean target image. Since nonlinear functions are common in image processing, avoiding them limits the types of preprocessing that can be performed on the noisy targets. Our main insight is that certain nonlinear functions can be applied to the noisy targets without adding significant bias to the results. We develop a theoretical framework for analyzing the effects of these nonlinearities, and describe a class of nonlinear functions with minimal bias. We demonstrate our method on the denoising of high dynamic range (HDR) images produced by Monte Carlo rendering. Noise2Noise training can have trouble with HDR images, where the training process is overwhelmed by outliers and performs poorly. We consider a commonly used method of addressing these training issues: applying a nonlinear tone mapping function to the model output and target images to reduce their dynamic range. This method was previously thought to be incompatible with Noise2Noise training because of the nonlinearities involved. We show that certain combinations of loss functions and tone mapping functions can reduce the effect of outliers while introducing minimal bias. We apply our method to an existing machine learning-based Monte Carlo denoiser, where the original implementation was trained with high-sample-count reference images. Our results approach those of the original implementation, but are produced using only noisy training data.
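The bias described above comes from the fact that, for a nonlinear function f, the mean of f over noisy samples generally differs from f of the mean. A small numeric sketch makes this concrete. The pixel samples and the Reinhard-style tone map x / (1 + x) are illustrative assumptions; the abstract does not say which tone mapping or loss functions the paper analyzes.

```python
from statistics import fmean


def reinhard(x):
    """Simple tone mapping curve that compresses [0, inf) into [0, 1)."""
    return x / (1.0 + x)


# Ten unbiased but noisy estimates of one HDR pixel: nine zeros and one
# bright outlier, a pattern typical of low-sample Monte Carlo rendering.
noisy = [0.0] * 9 + [10.0]
clean = fmean(noisy)  # the value the estimates converge to on average

# A linear function commutes with the mean, so it adds no bias.
linear_gap = fmean([2.0 * x for x in noisy]) - 2.0 * clean

# The tone map does not commute with the mean: the averaged tone-mapped
# targets land far below the tone-mapped clean value.
tonemap_gap = fmean([reinhard(x) for x in noisy]) - reinhard(clean)

print(linear_gap, tonemap_gap)
```

A denoiser trained to match the mean of tone-mapped noisy targets would therefore be pulled toward a darker value than the tone-mapped clean pixel, which is the skew that the paper's class of low-bias nonlinearities is meant to keep small.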

2025-12-31T11:30:38Z 15 pages, 7 figures, 2 tables SIGGRAPH Asia 2025 Conference Papers, Article 49, 1-11 Andrew Tinits Stephen Mann 10.1145/3757377.3763931 http://arxiv.org/abs/2303.02156v2 SymX: Energy-based Simulation from Symbolic Expressions 2025-12-30T17:49:46Z

Optimization time integrators are effective at solving complex multi-physics problems including deformable solids with non-linear material models, contact with friction, strain limiting, etc. For challenging problems, Newton-type optimizers are often used, which necessitates first- and second-order derivatives of the global non-linear objective function. Manually differentiating, implementing, testing, optimizing, and maintaining the resulting code is extremely time-consuming, error-prone, and precludes quick changes to the model, even when using tools that assist with parts of such a pipeline. We present SymX, an open source framework that computes the required derivatives of the different energy contributions by symbolic differentiation, generates optimized code, compiles it on-the-fly, and performs the global assembly. The user only has to provide the symbolic expression of each energy for a single representative element in its corresponding discretization and our system will determine the assembled derivatives for the whole simulation. We demonstrate the versatility of SymX in complex simulations featuring different non-linear materials, high-order finite elements, rigid body systems, adaptive discretizations, frictional contact, and coupling of multiple interacting physical systems. SymX's derivatives offer performance on par with SymPy, an established off-the-shelf symbolic engine, and SymX produces simulations at least one order of magnitude faster than TinyAD, an alternative state-of-the-art integral solution.

2023-02-22T13:53:34Z Accepted to ACM TOG. Author version ACM Trans. Graph., Vol. 45, No. 1, Article 5. Pages 1 - 19. Publication date: October 2025 José Antonio Fernández-Fernández Fabian Löschner Lukas Westhofen Andreas Longva Jan Bender 10.1145/3764928