http://arxiv.org/abs/2604.10263v1 Infernux: A Python-Native Game Engine with JIT-Accelerated Scripting 2026-04-11T16:01:38Z

This report describes Infernux, an open-source game engine that pairs a C++17/Vulkan real-time core with a Python production layer connected through a single pybind11 boundary. To close the throughput gap between Python scripting and native-code engines, Infernux combines two established techniques - batch-oriented data transfer and JIT compilation - into a cohesive engine-level integration: (i) a batch data bridge that transfers per-frame state into contiguous NumPy arrays in one boundary crossing, and (ii) an optional JIT path via Numba that compiles annotated update functions to LLVM machine code with automatic loop parallelization. We compare against Unity 6 as a reference on three workloads; readers should note differences in shading complexity, draw-call batching, and editor tooling maturity between the two engines. Infernux is MIT-licensed and available at https://chenlizheme.github.io/Infernux/.
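
As a rough illustration of the two techniques the abstract names, the sketch below keeps per-object state in contiguous NumPy arrays and updates every object in a single call, optionally compiled by Numba. The function name `integrate_positions`, the array layout, and the pure-Python fallback are assumptions made for this example; they are not Infernux's actual API.

```python
import numpy as np

try:
    from numba import njit, prange  # optional JIT path
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba installed.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]  # used as a bare @njit

        def wrap(fn):
            return fn  # used as @njit(...)
        return wrap

    prange = range


@njit(parallel=True)
def integrate_positions(positions, velocities, dt):
    """Advance every object by one time step, in place (hypothetical update)."""
    n = positions.shape[0]
    for i in prange(n):  # iterations are independent, so they can run in parallel
        for axis in range(3):
            positions[i, axis] += velocities[i, axis] * dt


# One contiguous (N, 3) array per attribute stands in for the batched frame state.
positions = np.zeros((4, 3), dtype=np.float64)
velocities = np.ones((4, 3), dtype=np.float64)
integrate_positions(positions, velocities, 0.5)
```

The point of the layout is that the script touches the engine boundary once per frame per attribute, rather than once per object.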

2026-04-11T16:01:38Z 9 pages, 6 figures, 4 tables Lizhe Chen http://arxiv.org/abs/2604.10259v1 Real-Time Human Reconstruction and Animation using Feed-Forward Gaussian Splatting 2026-04-11T15:52:58Z

We present a generalizable feed-forward Gaussian splatting framework for human 3D reconstruction and real-time animation that operates directly on multi-view RGB images and their associated SMPL-X poses. Unlike prior methods that rely on depth supervision, fixed input views, UV map, or repeated feed-forward inference for each target view or pose, our approach predicts, in a canonical pose, a set of 3D Gaussian primitives associated with each SMPL-X vertex. One Gaussian is regularized to remain close to the SMPL-X surface, providing a strong geometric prior and stable correspondence to the parametric body model, while an additional small set of unconstrained Gaussians per vertex allows the representation to capture geometric structures that deviate from the parametric surface, such as clothing and hair. In contrast to recent approaches such as HumanRAM, which require repeated network inference to synthesize novel poses, our method produces an animatable human representation from a single forward pass; by explicitly associating Gaussian primitives with SMPL-X vertices, the reconstructed model can be efficiently animated via linear blend skinning without further network evaluation. We evaluate our method on the THuman 2.1, AvatarReX and THuman 4.0 datasets, where it achieves reconstruction quality comparable to state-of-the-art methods while uniquely supporting real-time animation and interactive applications. Code and pre-trained models are available at https://github.com/Devdoot57/HumanGS .
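
The animation step mentioned above, linear blend skinning, can be sketched in a few lines of NumPy. This is a generic formulation applied to point positions only; the function name `skin_points` and the toy two-joint setup are assumptions, and a full Gaussian-splatting pipeline would also need to transform each primitive's orientation and covariance.

```python
import numpy as np


def skin_points(points, weights, transforms):
    """Linear blend skinning of canonical points.

    points:     (N, 3) canonical-pose positions (e.g. Gaussian centers)
    weights:    (N, J) skinning weights, each row summing to 1
    transforms: (J, 4, 4) per-joint rigid transforms for the target pose
    """
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.concatenate([points, ones], axis=1)          # (N, 4)
    # Blend the joint transforms per point, then apply the blended matrix.
    blended = np.einsum("nj,jab->nab", weights, transforms)      # (N, 4, 4)
    posed = np.einsum("nab,nb->na", blended, homogeneous)        # (N, 4)
    return posed[:, :3]


# Toy example: joint 0 stays fixed, joint 1 translates by 2 along y.
identity = np.eye(4)
shifted = np.eye(4)
shifted[1, 3] = 2.0
transforms = np.stack([identity, shifted])

points = np.zeros((1, 3))
weights = np.array([[0.5, 0.5]])
posed = skin_points(points, weights, transforms)
```

With equal weights on the two joints, the point at the origin moves halfway along the translation, which is the blending behavior that lets a pose change be applied without re-running a network.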

2026-04-11T15:52:58Z Devdoot Chatterjee Zakaria Laskar C. V. Jawahar http://arxiv.org/abs/2502.19056v2 Fatigue-PINN: Physics-Informed Fatigue-Driven Motion Modulation and Synthesis 2026-04-11T14:39:40Z

Fatigue modeling is essential both for motion synthesis tasks that model human motion under fatigued conditions and for biomechanical engineering applications, such as investigating the variations in movement patterns and posture due to fatigue, defining injury risk mitigation and prevention strategies, formulating fatigue minimization schemes, and creating improved ergonomic designs. Nevertheless, data-driven methods for synthesizing the impact of fatigue on motion have received little to no attention in the literature. In this work, we present Fatigue-PINN, a deep learning framework based on Physics-Informed Neural Networks for modeling fatigued human movements, which provides joint-specific fatigue configurations for adapting motion and mitigating motion artifacts at the joint level, resulting in smoother and hence more physically plausible animations. To account for muscle fatigue, we simulate the fatigue-induced fluctuations in the maximum exerted joint torques by leveraging a PINN adaptation of the Three-Compartment Controller model, exploiting physics-domain knowledge to improve accuracy. This model also introduces parametric motion alignment with respect to joint-specific fatigue, hence avoiding sharp frame transitions. Our results indicate that Fatigue-PINN accurately simulates the effects of externally perceived fatigue on open-type human movements, consistent with findings from real-world experimental fatigue studies. Since fatigue is incorporated in torque space, Fatigue-PINN provides an end-to-end encoder-decoder-like architecture that transforms joint angles to joint torques and vice versa, and is thus compatible with motion synthesis frameworks operating on joint angles.
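
For orientation, the Three-Compartment Controller model referred to above is usually stated as three coupled ODEs over resting, active, and fatigued muscle compartments. The sketch below integrates that classical form with explicit Euler steps; it is not the PINN adaptation described in the abstract, and the function name `simulate_3cc` and all parameter values are illustrative assumptions.

```python
def simulate_3cc(target_load, steps, dt=0.01, F=0.01, R=0.002, LD=10.0, LR=10.0):
    """Euler integration of the classical three-compartment fatigue model.

    Compartments are percentages of total capacity: resting (m_r),
    active (m_a), fatigued (m_f). F and R are fatigue and recovery rates;
    LD and LR are controller gains. All values here are illustrative.
    """
    m_r, m_a, m_f = 100.0, 0.0, 0.0
    history = []
    for _ in range(steps):
        # Controller: recruit resting units toward the target load,
        # limited by what is still available, or release surplus units.
        if m_a < target_load:
            if m_r > target_load - m_a:
                c = LD * (target_load - m_a)
            else:
                c = LD * m_r
        else:
            c = LR * (target_load - m_a)

        d_r = -c + R * m_f
        d_a = c - F * m_a
        d_f = F * m_a - R * m_f

        m_r += dt * d_r
        m_a += dt * d_a
        m_f += dt * d_f
        history.append((m_r, m_a, m_f))
    return history


trajectory = simulate_3cc(target_load=50.0, steps=1000)
final_r, final_a, final_f = trajectory[-1]
residual_capacity = 100.0 - final_f  # share of capacity not yet fatigued
```

The three derivatives sum to zero, so total capacity is conserved while the fatigued share grows under sustained load; the residual capacity is the quantity that would bound the maximum exertable joint torque.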

2025-02-26T11:14:48Z 21 pages, 10 pages. This work has been submitted to the IEEE for possible publication in IEEE Access, vol. 13, pp. 109378-109398, 2025 Iliana Loi Konstantinos Moustakas 10.1109/ACCESS.2025.3582731 http://arxiv.org/abs/2604.10223v1 A 129FPS Full HD Real-Time Accelerator for 3D Gaussian Splatting 2026-04-11T14:12:45Z

Rendering large-scale, unbounded scenes on AR/VR-class devices is constrained by the computation, bandwidth, and storage cost of 3D Gaussian Splatting (3DGS). We propose a low-power, low-cost 3DGS hardware accelerator that renders full-HD images in real time, together with a hardware-friendly compression pipeline that combines iterative Gaussian pruning and fine-tuning, progressive spherical harmonics (SH) degree reduction, and vector quantization of all SH coefficients and colors. The scheme achieves a $51.6\times$ model-size reduction with a 0.743 dB PSNR loss. The accelerator uses a frame-level pipeline that integrates point-based culling and projection with tile-based sorting and rasterization, skips zero-Jacobian matrix multiplications (reducing processing elements by 63\% and computation by 53\%), and adopts comparison-free tile-based sorting with deterministic latency. Implemented in a TSMC 28-nm process at 800 MHz, the design occupies $0.66~\text{mm}^2$ with 1.1438 M gates and 120 kB SRAM, consumes 0.219 W, and delivers 1219 Mpixels/J at 267.5 Mpixels/s, enabling 1080p at 129 FPS. Overall, it is $5.98\times$ smaller in area and delivers $5.94\times$ higher throughput and $7.5\times$ higher energy efficiency than prior 3DGS accelerators.
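
The headline figures can be cross-checked with simple arithmetic. The snippet below uses only numbers quoted in the abstract and assumes that "full HD" means 1920 x 1080 pixels per frame.

```python
# Figures quoted in the abstract.
throughput_pixels_per_s = 267.5e6
power_w = 0.219

# Assumption: one full-HD frame is 1920 x 1080 pixels.
pixels_per_frame = 1920 * 1080

fps = throughput_pixels_per_s / pixels_per_frame
mpixels_per_joule = (throughput_pixels_per_s / 1e6) / power_w

print(f"frame rate: {fps:.1f} FPS")
print(f"energy efficiency: {mpixels_per_joule:.0f} Mpixels/J")
```

The frame rate works out to about 129 FPS, matching the title. The energy efficiency computed from the rounded power and throughput values lands within a fraction of a percent of the reported 1219 Mpixels/J; the small gap is presumably due to rounding in the published figures.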

2026-04-11T14:12:45Z IEEE Transactions on Visualization and Computer Graphics, 2026 Fang-Chi Chang Tian-Sheuan Chang 10.1109/TVCG.2026.3683714 http://arxiv.org/abs/2604.10199v1 FatigueFusion: Latent Space Fusion for Fatigue-Driven Motion Synthesis 2026-04-11T13:12:20Z

Investigating the impact of fatigue on human physiological function and motor behavior is crucial for developing biomechanics and medical applications aimed at mitigating fatigue, reducing injury risk, and creating sophisticated ergonomic designs, as well as for producing physically plausible 3D animation sequences. While the former has a prominent position in state-of-the-art literature, fatigue-driven motion generation is still an underexplored area. In this study, we present FatigueFusion, a deep-learning architecture for the fusion of fatigue features within a latent representation space, enabling the creation of a variety of novel fatigued movements, intermediate fatigued states, and progressively fatigued motions. Unlike existing approaches that focus on imitating the effects of fatigue accumulation in motion patterns, our framework incorporates algorithmic and data-driven modules to impose subject-specific temporal and spatial fatigue features on non-fatigued motions, while leveraging PINN-based techniques to simulate fatigue intensity. Since all motion modulation tasks take place in latent space, FatigueFusion offers an end-to-end architecture that operates directly on non-fatigued joint angle sequences and control parameters, allowing seamless integration into any motion synthesis pipeline without relying on fatigue input data. Overall, our framework can be employed for various fatigue-driven synthesis tasks, such as fatigue profile transfer and fusion, and it also provides a solution for accurate rendering of the human fatigue state in both animation and simulation pipelines.

2026-04-11T13:12:20Z 13 pages, 9 figures. This work has been submitted to the IEEE for possible publication Iliana Loi Konstantinos Moustakas http://arxiv.org/abs/2509.20128v2 KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation 2026-04-11T06:13:45Z

Audio-driven facial animation has made significant progress in multimedia applications, with diffusion models showing strong potential for talking-face synthesis. However, most existing works treat speech features as a monolithic representation and fail to capture their fine-grained roles in driving different facial motions, while also overlooking the importance of modeling keyframes with intense dynamics. To address these limitations, we propose KSDiff, a Keyframe-Augmented Speech-Aware Dual-Path Diffusion framework. Specifically, the raw audio and transcript are processed by a Dual-Path Speech Encoder (DPSE) to disentangle expression-related and head-pose-related features, while an autoregressive Keyframe Establishment Learning (KEL) module predicts the most salient motion frames. These components are integrated into a Dual-path Motion generator to synthesize coherent and realistic facial motions. Extensive experiments on HDTF and VoxCeleb demonstrate that KSDiff achieves state-of-the-art performance, with improvements in both lip synchronization accuracy and head-pose naturalness. Our results highlight the effectiveness of combining speech disentanglement with keyframe-aware diffusion for talking-head generation. The demo page is available at: https://kincin.github.io/KSDiff/.

2025-09-24T13:54:52Z Paper accepted at ICASSP 2026, 5 pages, 3 figures, 3 tables Tianle Lyu Junchuan Zhao Ye Wang http://arxiv.org/abs/2603.11346v2 Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning 2026-04-10T22:40:59Z

Humanoid robotics has strong potential to transform daily service and caregiving applications. Although recent advances in general motion tracking (GMT) within physics engines have enabled virtual characters and humanoid robots to reproduce a broad range of human motions, these behaviors are primarily limited to contact-less social interactions or isolated movements. Assistive scenarios, by contrast, require continuous awareness of a human partner and rapid adaptation to their evolving posture and dynamics. In this paper, we formulate the imitation of closely interacting, force-exchanging human-human motion sequences as a multi-agent reinforcement learning problem. We jointly train partner-aware policies for both the supporter (assistant) agent and the recipient agent in a physics simulator to track assistive motion references. To make this problem tractable, we introduce a partner-policy initialization scheme that transfers priors from single-human motion-tracking controllers, greatly improving exploration. We further propose dynamic reference retargeting and a contact-promoting reward, which adapt the assistant's reference motion to the recipient's real-time pose and encourage physically meaningful support. We show that AssistMimic is the first method capable of successfully tracking assistive interaction motions on established benchmarks, demonstrating the benefits of a multi-agent RL formulation for physically grounded and socially aware humanoid control.

2026-03-11T22:25:44Z Accepted at CVPR 2026 (main). Project page: https://yutoshibata07.github.io/AssistMimic/ Yuto Shibata Kashu Yamazaki Lalit Jayanti Yoshimitsu Aoki Mariko Isogawa Katerina Fragkiadaki http://arxiv.org/abs/2604.14216v1 Neuro-Oracle: A Trajectory-Aware Agentic RAG Framework for Interpretable Epilepsy Surgical Prognosis 2026-04-10T21:47:25Z

Predicting post-surgical seizure outcomes in pharmacoresistant epilepsy is a clinical challenge. Conventional deep-learning approaches operate on static, single-timepoint pre-operative scans, omitting longitudinal morphological changes. We propose \emph{Neuro-Oracle}, a three-stage framework that: (i) distils pre-to-post-operative MRI changes into a compact 512-dimensional trajectory vector using a 3D Siamese contrastive encoder; (ii) retrieves historically similar surgical trajectories from a population archive via nearest-neighbour search; and (iii) synthesises a natural-language prognosis grounded in the retrieved evidence using a quantized Llama-3-8B reasoning agent. Evaluations are conducted on the public EPISURG dataset ($N{=}268$ longitudinally paired cases) using five-fold stratified cross-validation. Since ground-truth seizure-freedom scores are unavailable, we utilize a clinical proxy label based on the resection type. We acknowledge that the network representations may potentially learn the anatomical features of the resection cavities (i.e., temporal versus non-temporal locations) rather than true prognostic morphometry. Our current evaluation thus serves mainly as a proof-of-concept for the trajectory-aware retrieval architecture. Trajectory-based classifiers achieve AUC values between 0.834 and 0.905, compared with 0.793 for a single-timepoint ResNet-50 baseline. The Neuro-Oracle agent (M5) matches the AUC of purely discriminative trajectory classifiers (0.867) while producing structured justifications with zero observed hallucinations under our audit protocol. A Siamese Diversity Ensemble (M6) of trajectory-space classifiers attains an AUC of 0.905 without language-model overhead.
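
The retrieval stage (ii) amounts to a nearest-neighbour lookup over fixed-length vectors. A minimal sketch follows; the abstract does not state the distance metric, so cosine similarity, the function name `retrieve_top_k`, and the random stand-in archive are all assumptions.

```python
import numpy as np


def retrieve_top_k(query, archive, k=3):
    """Return indices and cosine similarities of the k closest archive rows.

    query:   (D,) trajectory vector for the new case
    archive: (M, D) trajectory vectors of historical cases
    """
    q = query / np.linalg.norm(query)
    a = archive / np.linalg.norm(archive, axis=1, keepdims=True)
    similarities = a @ q
    order = np.argsort(-similarities)[:k]  # highest similarity first
    return order, similarities[order]


# Random 512-dimensional vectors stand in for encoder outputs.
rng = np.random.default_rng(0)
archive = rng.normal(size=(100, 512))
query = archive[42] + 0.01 * rng.normal(size=512)  # a slightly perturbed known case

indices, scores = retrieve_top_k(query, archive, k=3)
```

In this toy setup the perturbed query retrieves its source case first; in a pipeline like the one described, the retrieved cases would then be passed to the language model as grounding evidence.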

2026-04-10T21:47:25Z Aizierjiang Aiersilan Mohamad Koubeissi http://arxiv.org/abs/2601.22160v2 Screen, Cache, and Match: A Training-Free Causality-Consistent Reference Frame Framework for Human Animation 2026-04-10T13:20:41Z

Human animation aims to generate temporally coherent and visually consistent videos over long sequences, yet modeling long-range dependencies while preserving frame quality remains challenging. Inspired by the human ability to leverage past observations for interpreting ongoing actions, we propose FrameCache, a training-free, causality-consistent reference frame framework. FrameCache explicitly converts historical generation results into causal guidance through two complementary mechanisms. First, at the reference level, a novel Screen-Cache-Match (SCM) strategy constructs a dynamic, high-quality reference memory, ensuring motion-consistent appearance guidance to reduce identity drift. Second, at the generative level, a Trajectory-Aware Autoregressive Generation (TAAG) mechanism aligns denoising trajectories across adjacent video chunks. This is achieved through an overlap-aware latent propagation and a dual-domain fusion strategy that seamlessly blends low-frequency structural layouts with high-frequency textural details. Extensive experiments on standard benchmarks demonstrate that FrameCache consistently improves temporal coherence and visual stability while integrating seamlessly with diverse diffusion baselines. Code will be made publicly available.
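
One simple way to blend low-frequency layout from one source with high-frequency detail from another is a hard mask in the Fourier domain. The sketch below illustrates that general idea only; the abstract does not specify how FrameCache's dual-domain fusion is implemented, and the function name `frequency_blend`, the radial mask, and the `cutoff` value are assumptions.

```python
import numpy as np


def frequency_blend(structure_src, detail_src, cutoff=0.1):
    """Combine low frequencies of one 2D array with high frequencies of another.

    cutoff is a radius in normalized frequency units (cycles per sample).
    """
    fy = np.fft.fftfreq(structure_src.shape[0])[:, None]
    fx = np.fft.fftfreq(structure_src.shape[1])[None, :]
    low_mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff  # radially symmetric low-pass

    spectrum = np.where(
        low_mask,
        np.fft.fft2(structure_src),  # coarse layout
        np.fft.fft2(detail_src),     # fine texture
    )
    # The mask is symmetric, so the result is real up to rounding error.
    return np.fft.ifft2(spectrum).real


rng = np.random.default_rng(0)
layout = rng.normal(size=(32, 32)) + 5.0   # has a non-zero mean (DC component)
texture = rng.normal(size=(32, 32))
fused = frequency_blend(layout, texture)
```

Because the zero-frequency term always falls inside the low-pass mask, the fused array inherits its mean from the structure source, while everything above the cutoff comes from the detail source.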

2025-12-13T08:45:03Z Jianan Wang Nailei Hei Li He Huanzhen Wang Aoxing Li Yingkai Zhao Yuxuan Lin Haofen Wang Chunyang Wang Yan Wang Wenqiang Zhang http://arxiv.org/abs/2604.09260v1 Beyond Segmentation: Structurally Informed Facade Parsing from Imperfect Images 2026-04-10T12:20:07Z

Standard object detectors typically treat architectural elements independently, often resulting in facade parsings that lack the structural coherence required for downstream procedural reconstruction. We address this limitation by augmenting the YOLOv8 training objective with a custom lightweight alignment loss. This regularization encourages grid-consistent arrangements of bounding boxes during training, effectively injecting geometric priors without altering the standard inference pipeline. Experiments on the CMP dataset demonstrate that our method successfully improves structural regularity, correcting alignment errors caused by perspective and occlusion while maintaining a controllable trade-off with standard detection accuracy.
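
To make the idea of a grid-consistency regularizer concrete, the sketch below scores how far each box's left and top edges are from the nearest matching edge of another box. The paper's actual loss is not given in the abstract; the function name `alignment_penalty`, the clipping threshold `tau`, and the choice of edges are assumptions, and a training implementation would be written in a differentiable framework rather than NumPy.

```python
import numpy as np


def alignment_penalty(boxes, tau=0.05):
    """Hypothetical alignment penalty for axis-aligned boxes.

    boxes: (N, 4) array of (x_min, y_min, x_max, y_max) in normalized coordinates.
    For left edges and top edges, each box pays the distance to the nearest
    other box's edge, clipped at tau so distant boxes are not pulled together.
    """
    boxes = np.asarray(boxes, dtype=float)
    total = 0.0
    for column in (0, 1):  # 0: left edges, 1: top edges
        values = boxes[:, column]
        distances = np.abs(values[:, None] - values[None, :])
        np.fill_diagonal(distances, np.inf)  # ignore self-matches
        nearest = distances.min(axis=1)
        total += np.minimum(nearest, tau).mean()
    return total


# Four windows on a perfect 2 x 2 grid, then the same grid with one box nudged.
grid = np.array([
    [0.1, 0.1, 0.3, 0.3],
    [0.5, 0.1, 0.7, 0.3],
    [0.1, 0.5, 0.3, 0.7],
    [0.5, 0.5, 0.7, 0.7],
])
jittered = grid.copy()
jittered[0, 0] = 0.12

aligned_score = alignment_penalty(grid)
jittered_score = alignment_penalty(jittered)
```

A perfectly aligned grid scores zero and any misalignment raises the score, so adding a term of this kind to the detection loss nudges predictions toward shared rows and columns without changing inference.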

2026-04-10T12:20:07Z 4 pages, 4 figures, EUROGRAPHICS 2026 Short Paper Maciej Janicki Aleksander Plocharski Przemyslaw Musialski http://arxiv.org/abs/2604.09134v1 Enhance Comprehension of Over-the-Counter Drug Instructions for the General Public and Medical Professionals through Visualization Design 2026-04-10T09:14:51Z

Drug instructions are crucial for guiding the rational use of medication. We conduct a visualization design study to enhance the comprehension of over-the-counter (OTC) drug instructions, targeting both the general public and medical professionals. We devise two tailored drug instruction designs for different audience groups through an iterative design process. A controlled user study reveals that our design outperforms traditional text-based instructions in terms of response time and usability, and the availability of two versions is also found to be beneficial. This study also motivates a taxonomy based on a systematic classification of OTC drug instructions sampled from an official drug database, which received positive expert feedback. Finally, this study summarizes a workflow for a visualization design strategy based on our design exploration and user study feedback, which can be generalized to other OTC drug instructions.

2026-04-10T09:14:51Z Computers & Graphics, Volume 136, May 2026, 104587 Mengjie Fan Katrin Angerbauer Yinchu Cheng Yingying Yan Xiaohan Xu Tianfu Wang Michael Sedlmair Yu Yang Liang Zhou 10.1016/j.cag.2026.104587 http://arxiv.org/abs/2604.06161v2 DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models 2026-04-10T00:14:18Z

Most digital videos are stored in 8-bit low dynamic range (LDR) formats, where much of the original high dynamic range (HDR) scene radiance is lost due to saturation and quantization. This loss of highlight and shadow detail precludes mapping accurate luminance to HDR displays and limits meaningful re-exposure in post-production workflows. Although techniques have been proposed to convert LDR images to HDR through dynamic range expansion, they struggle to restore realistic detail in the over- and underexposed regions. To address this, we present DiffHDR, a framework that formulates LDR-to-HDR conversion as a generative radiance inpainting task within the latent space of a video diffusion model. By operating in Log-Gamma color space, DiffHDR leverages spatio-temporal generative priors from a pretrained video diffusion model to synthesize plausible HDR radiance in over- and underexposed regions while recovering the continuous scene radiance of the quantized pixels. Our framework further enables controllable LDR-to-HDR video conversion guided by text prompts or reference images. To address the scarcity of paired HDR video data, we develop a pipeline that synthesizes high-quality HDR video training data from static HDRI maps. Extensive experiments demonstrate that DiffHDR significantly outperforms state-of-the-art approaches in radiance fidelity and temporal stability, producing realistic HDR videos with considerable latitude for re-exposure.

2026-04-07T17:56:18Z 28 pages, 13 figures Zhengming Yu Li Ma Mingming He Leo Isikdogan Yuancheng Xu Dmitriy Smirnov Pablo Salamanca Dao Mi Pablo Delgado Ning Yu Julien Philip Xin Li Wenping Wang Paul Debevec http://arxiv.org/abs/2604.08799v1 MeshOn: Intersection-Free Mesh-to-Mesh Composition 2026-04-09T22:14:56Z

We propose MeshOn, a method that finds physically and semantically realistic compositions of two input meshes. Given an accessory, a base mesh with a user-defined target region, and optional text strings for both meshes, MeshOn uses a multi-step optimization framework to realistically fit the meshes onto each other while preventing intersections. We initialize the shapes' rigid configuration via a structured alignment scheme using Vision-to-Language Models, which we then optimize using a combination of attractive geometric losses and a physics-inspired barrier loss that prevents surface intersections. We then obtain a final deformation of the object, assisted by a diffusion prior. Our method successfully fits accessories of various materials over a breadth of target regions and is designed to fit directly into existing digital artist workflows. We demonstrate the robustness and accuracy of our pipeline by comparing it with generative approaches and traditional registration algorithms.

2026-04-09T22:14:56Z Project page: https://threedle.github.io/MeshOn/ Hyunwoo Kim Itai Lang Hadar Averbuch-Elor Silvia Sellán Rana Hanocka http://arxiv.org/abs/2604.08547v1 GaussiAnimate: Reconstruct and Rig Animatable Categories with Level of Dynamics 2026-04-09T17:59:59Z

Free-form bones that conform closely to the surface can effectively capture non-rigid deformations, but they lack the kinematic structure necessary for intuitive control. Thus, we propose a Scaffold-Skin Rigging System, termed "Skelebones", with three key steps: (1) Bones: compress temporally-consistent deformable Gaussians into free-form bones, approximating non-rigid surface deformations; (2) Skeleton: extract a Mean Curvature Skeleton from canonical Gaussians and refine it temporally, ensuring a category-agnostic, motion-adaptive, and topology-correct kinematic structure; (3) Binding: bind the skeleton and bones via non-parametric partwise motion matching (PartMM), synthesizing novel bone motions by matching, retrieving, and blending existing ones. Collectively, these three steps enable us to compress the Level of Dynamics of 4D shapes into compact skelebones that are both controllable and expressive. We validate our approach on both synthetic and real-world datasets, achieving significant improvements in reanimation performance across unseen poses (17.3% PSNR gains over Linear Blend Skinning (LBS) and 21.7% over Bag-of-Bones (BoB)) while maintaining excellent reconstruction fidelity, particularly for characters exhibiting complex non-rigid surface dynamics. Our Partwise Motion Matching algorithm demonstrates strong generalization to both Gaussian and mesh representations, especially in the low-data regime (~1000 frames), achieving 48.4% RMSE improvement over robust LBS and outperforming GRU- and MLP-based learning methods by >20%. Code will be made publicly available for research purposes at cookmaker.cn/gaussianimate.

2026-04-09T17:59:59Z Page: https://cookmaker.cn/gaussianimate Jiaxin Wang Dongxin Lyu Zeyu Cai Zhiyang Dou Cheng Lin Anpei Chen Yuliang Xiu http://arxiv.org/abs/2604.08526v1 FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On 2026-04-09T17:57:50Z

Given a person and a garment image, virtual try-on (VTO) aims to synthesize a realistic image of the person wearing the garment, while preserving their original pose and identity. Although recent VTO methods excel at visualizing garment appearance, they largely overlook a crucial aspect of the try-on experience: the accuracy of garment fit -- for example, depicting how an extra-large shirt looks on an extra-small person. A key obstacle is the absence of datasets that provide precise garment and body size information, particularly for "ill-fit" cases, where garments are significantly too large or too small. Consequently, current VTO methods default to generating well-fitted results regardless of the garment or person size. In this paper, we take the first steps towards solving this open problem. We introduce FIT (Fit-Inclusive Try-on), a large-scale VTO dataset comprising over 1.13M try-on image triplets accompanied by precise body and garment measurements. We overcome the challenges of data collection via a scalable synthetic strategy: (1) We programmatically generate 3D garments using GarmentCode and drape them via physics simulation to capture realistic garment fit. (2) We employ a novel re-texturing framework to transform synthetic renderings into photorealistic images while strictly preserving geometry. (3) We introduce person identity preservation into our re-texturing model to generate paired person images (same person, different garments) for supervised training. Finally, we leverage our FIT dataset to train a baseline fit-aware virtual try-on model. Our data and results set a new state of the art for fit-aware virtual try-on and offer a robust benchmark for future research. We will make all data and code publicly available on our project page: https://johannakarras.github.io/FIT.

2026-04-09T17:57:50Z SIGGRAPH 2026 Johanna Karras Yuanhao Wang Yingwei Li Ira Kemelmacher-Shlizerman