http://arxiv.org/abs/2604.13333v1 SSD-GS: Scattering and Shadow Decomposition for Relightable 3D Gaussian Splatting 2026-04-14T22:47:04Z

We present SSD-GS, a physically-based relighting framework built upon 3D Gaussian Splatting (3DGS) that achieves high-quality reconstruction and photorealistic relighting under novel lighting conditions. In physically-based relighting, accurately modeling light-material interactions is essential for faithful appearance reproduction. However, existing 3DGS-based relighting methods adopt coarse shading decompositions, either modeling only diffuse and specular reflections or relying on neural networks to approximate shadows and scattering. This leads to limited fidelity and poor physical interpretability, particularly for anisotropic metals and translucent materials. To address these limitations, SSD-GS decomposes reflectance into four components: diffuse, specular, shadow, and subsurface scattering. We introduce a learnable dipole-based scattering module for subsurface transport, an occlusion-aware shadow formulation that integrates visibility estimates with a refinement network, and an enhanced specular component with an anisotropic Fresnel-based model. Through progressive integration of all components during training, SSD-GS effectively disentangles lighting and material properties, even for unseen illumination conditions, as demonstrated on the challenging OLAT dataset. Experiments demonstrate superior quantitative and perceptual relighting quality compared to prior methods and pave the way for downstream tasks, including controllable light source editing and interactive scene relighting. The source code is available at: https://github.com/irisfreesiri/SSD-GS.

2026-04-14T22:47:04Z Accepted to ICLR 2026. Code available at: https://github.com/irisfreesiri/SSD-GS Iris Zheng Guojun Tang Alexander Doronin Paul Teal Fang-Lue Zhang http://arxiv.org/abs/2604.13256v1 Counterfactual Peptide Editing for Causal TCR--pMHC Binding Inference 2026-04-14T19:40:46Z

Neural models for TCR-pMHC binding prediction are susceptible to shortcut learning: they exploit spurious correlations in training data -- such as peptide length bias or V-gene co-occurrence -- rather than the physical binding interface. This renders predictions brittle under family-held-out and distance-aware evaluation, where such shortcuts do not transfer. We introduce \emph{Counterfactual Invariant Prediction} (CIP), a training framework that generates biologically constrained counterfactual peptide edits and enforces invariance to edits at non-anchor positions while amplifying sensitivity at MHC anchor residues. CIP augments the base classifier with two auxiliary objectives: (1) an invariance loss penalizing prediction changes under conservative non-anchor substitutions, and (2) a contrastive loss encouraging large prediction changes under anchor-position disruptions. Evaluated on a curated VDJdb-IEDB benchmark under family-held-out, distance-aware, and random splits, CIP achieves AUROC 0.831 and counterfactual consistency (CFC) 0.724 under the challenging family-held-out protocol -- a 39.7\% reduction in shortcut index relative to the unconstrained baseline. Ablations confirm that anchor-aware edit generation is the dominant driver of OOD gains, providing a practical recipe for causally-grounded TCR specificity modeling.
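The two auxiliary objectives described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the functional form of either loss, so the squared-difference invariance term, the hinge-style contrastive term, the `margin`, and the weights `lam_inv` and `lam_con` are all assumptions, and every function name is hypothetical.

```python
import numpy as np

def invariance_loss(p_orig, p_nonanchor):
    """Mean squared change in predicted binding probability under
    conservative non-anchor substitutions (assumed form)."""
    p_orig = np.asarray(p_orig, dtype=float)
    p_nonanchor = np.asarray(p_nonanchor, dtype=float)
    return float(np.mean((p_orig - p_nonanchor) ** 2))

def contrastive_loss(p_orig, p_anchor, margin=0.5):
    """Hinge penalty applied when an anchor-position disruption moves the
    prediction by less than `margin` (assumed form)."""
    p_orig = np.asarray(p_orig, dtype=float)
    p_anchor = np.asarray(p_anchor, dtype=float)
    return float(np.mean(np.maximum(0.0, margin - np.abs(p_orig - p_anchor))))

def cip_objective(base_loss, p_orig, p_nonanchor, p_anchor,
                  lam_inv=1.0, lam_con=1.0, margin=0.5):
    """Base classification loss plus the two weighted auxiliary terms.
    The weights are hypothetical hyperparameters."""
    return (base_loss
            + lam_inv * invariance_loss(p_orig, p_nonanchor)
            + lam_con * contrastive_loss(p_orig, p_anchor, margin))
```

Under these assumptions, a model whose prediction barely moves after an anchor disruption pays the contrastive penalty, while one whose prediction drifts after a harmless non-anchor substitution pays the invariance penalty.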

2026-04-14T19:40:46Z Sanjar Khudoyberdiev Arman Bekov http://arxiv.org/abs/2604.13254v1 Calibrated Abstention for Reliable TCR--pMHC Binding Prediction under Epitope Shift 2026-04-14T19:38:54Z

Predicting T-cell receptor (TCR)--peptide-MHC (pMHC) binding is central to vaccine design and T-cell therapy, yet deployed models frequently encounter epitopes unseen during training, causing silent overconfidence and unreliable prioritization. We address this by framing TCR--pMHC prediction as a \emph{selective prediction} problem: a calibrated model should either output a trustworthy confidence score or explicitly abstain. Concretely, we (1) introduce a dual-encoder architecture encoding both CDR3$α$/CDR3$β$ and peptide sequences via a pre-trained protein language model; (2) apply temperature scaling to correct systematic probability miscalibration; and (3) impose a conformal abstention rule that provides finite-sample coverage guarantees at a user-specified target error rate. Evaluated under three split strategies -- random, epitope-held-out, and distance-aware -- our method achieves AUROC 0.813 and ECE 0.043 under the challenging epitope-held-out protocol, reducing ECE by 69.7\% relative to an uncalibrated baseline. At 80\% coverage, the selective model further reduces error rate from 18.7\% to 10.9\%, demonstrating that calibrated abstention enables principled coverage-risk trade-offs aligned with practical screening budgets.
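Steps (2) and (3) of the abstract can be illustrated with a short sketch. It assumes a binary classifier that outputs one logit per TCR-pMHC pair, fits the temperature by grid search on held-out data, and uses a standard split-conformal rule that abstains whenever the prediction set is not a single label. The paper's exact abstention rule is not given in the abstract, so this rule and all function names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_temperature(logits, labels):
    """Choose the temperature T that minimizes binary negative
    log-likelihood on held-out data (simple grid search)."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    grid = np.linspace(0.25, 5.0, 96)

    def nll(T):
        p = np.clip(sigmoid(logits / T), 1e-12, 1 - 1e-12)
        return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

    return float(min(grid, key=nll))

def conformal_threshold(probs, labels, alpha):
    """Split-conformal quantile of the nonconformity scores
    1 - p(true label) on a calibration set, for target error alpha."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    scores = np.where(labels == 1, 1 - probs, probs)
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:  # too few calibration points: every label stays in the set
        return 1.0
    return float(np.sort(scores)[k - 1])

def predict_or_abstain(prob, q):
    """Return 1 or 0 if exactly one label has score <= q, else None (abstain)."""
    in_set = [y for y, s in ((1, 1 - prob), (0, prob)) if s <= q]
    return in_set[0] if len(in_set) == 1 else None
```

In use, `fit_temperature` would be run on validation logits, the rescaled probabilities passed to `conformal_threshold` on a separate calibration split, and `predict_or_abstain` applied to each test pair. Lowering `alpha` makes the model abstain more often in exchange for a lower error rate on the predictions it does make.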

2026-04-14T19:38:54Z Arman Bekov Timur Bekzhanov Bekzat Sadykov http://arxiv.org/abs/2604.13191v1 Fast Voxelization and Level of Detail for Microgeometry Rendering 2026-04-14T18:16:15Z

Many materials show anisotropic light scattering patterns due to the shape and local alignment of their underlying microstructures: surfaces with small elements such as fibers, or the ridges of a brushed metal, are very sparse and require a high spatial resolution to be properly represented as a volume. The acquisition of voxel data from such objects is a time- and memory-intensive task, and most rendering approaches require an additional Level-of-Detail (LoD) data structure to aggregate the visual appearance, as observed from multiple distances, in order to reduce the number of samples computed per pixel (e.g., MIP mapping). In this work we introduce, first, an efficient parallel voxelization method designed to facilitate fast data aggregation at multiple resolution levels and, second, a novel representation based on hierarchical SGGX clustering that provides better accuracy than baseline methods. We validate our approach with a CUDA-based implementation of the voxelizer, tested both on triangle meshes and volumetric fabrics modeled with explicit fibers. Finally, we show the results generated with a path tracer based on the proposed LoD rendering model.

2026-04-14T18:16:15Z Accepted for publication in The Visual Computer. 16 pages, 7 figures, 3 tables. Supplementary material: https://javierfabre.com/projects/voxel-lod/supp.pdf Javier Fabre Carlos Castillo Carlos Rodriguez-Pardo Jorge Lopez-Moreno http://arxiv.org/abs/2604.08746v2 AniGen: Unified $S^3$ Fields for Animatable 3D Asset Generation 2026-04-14T17:33:59Z

Animatable 3D assets, defined as geometry equipped with an articulated skeleton and skinning weights, are fundamental to interactive graphics, embodied agents, and animation production. While recent 3D generative models can synthesize visually plausible shapes from images, the results are typically static. Obtaining usable rigs via post-hoc auto-rigging is brittle and often produces skeletons that are topologically inconsistent with the generated geometry. We present AniGen, a unified framework that directly generates animate-ready 3D assets conditioned on a single image. Our key insight is to represent shape, skeleton, and skinning as mutually consistent $S^3$ Fields (Shape, Skeleton, Skin) defined over a shared spatial domain. To enable the robust learning of these fields, we introduce two technical innovations: (i) a confidence-decaying skeleton field that explicitly handles the geometric ambiguity of bone prediction at Voronoi boundaries, and (ii) a dual skin feature field that decouples skinning weights from specific joint counts, allowing a fixed-architecture network to predict rigs of arbitrary complexity. Built upon a two-stage flow-matching pipeline, AniGen first synthesizes a sparse structural scaffold and then generates dense geometry and articulation in a structured latent space. Extensive experiments demonstrate that AniGen substantially outperforms state-of-the-art sequential baselines in rig validity and animation quality, generalizing effectively to in-the-wild images across diverse categories including animals, humanoids, and machinery. Homepage: https://yihua7.github.io/AniGen-web/

2026-04-09T20:22:06Z 16 pages, 12 figures Yi-Hua Huang Zi-Xin Zou Yuting He Chirui Chang Cheng-Feng Pu Ziyi Yang Yuan-Chen Guo Yan-Pei Cao Xiaojuan Qi http://arxiv.org/abs/2604.12765v1 A Dataset and Evaluation for Complex 4D Markerless Human Motion Capture 2026-04-14T14:06:43Z

Marker-based motion capture (MoCap) systems have long been the gold standard for accurate 4D human modeling, yet their reliance on specialized hardware and markers limits scalability and real-world deployment. Advancing reliable markerless 4D human motion capture requires datasets that reflect the complexity of real-world human interactions. Yet, existing benchmarks often lack realistic multi-person dynamics, severe occlusions, and challenging interaction patterns, leading to a persistent domain gap. In this work, we present a new dataset and evaluation for complex 4D markerless human motion capture. Our proposed MoCap dataset captures both single and multi-person scenarios with intricate motions, frequent inter-person occlusions, rapid position exchanges between similarly dressed subjects, and varying subject distances. It includes synchronized multi-view RGB and depth sequences, accurate camera calibration, ground-truth 3D motion capture from a Vicon system, and corresponding SMPL/SMPL-X parameters. This setup ensures precise alignment between visual observations and motion ground truth. Benchmarking state-of-the-art markerless MoCap models reveals substantial performance degradation under these realistic conditions, highlighting limitations of current approaches. We further demonstrate that targeted fine-tuning improves generalization, validating the dataset's realism and value for model development. Our evaluation exposes critical gaps in existing models and provides a rigorous foundation for advancing robust markerless 4D human motion capture.

2026-04-14T14:06:43Z 14 pages, 11 figures, 4 tables. Accepted for publication at CVPR 2026 4D World Models Workshop Yeeun Park Miqdad Naduthodi Suryansh Kumar http://arxiv.org/abs/2405.20330v4 OmniHands: Towards Robust 4D Hand Mesh Recovery via A Versatile Transformer 2026-04-14T03:42:20Z

In this paper, we introduce OmniHands, a universal approach to recovering interactive hand meshes and their relative movement from monocular or multi-view inputs. Our approach addresses two major limitations of previous methods: the lack of a unified solution for handling various hand image inputs and the neglect of the positional relationship between two hands within images. To overcome these challenges, we develop a universal architecture with novel tokenization and contextual feature fusion strategies, capable of adapting to a variety of tasks. Specifically, we propose a Relation-aware Two-Hand Tokenization (RAT) method to embed positional relation information into the hand tokens. In this way, our network can handle both single-hand and two-hand inputs and explicitly leverage relative hand positions, facilitating the reconstruction of intricate hand interactions in real-world scenarios. As such tokenization indicates the relative relationship of two hands, it also supports more effective feature fusion. To this end, we further develop a 4D Interaction Reasoning (FIR) module to fuse hand tokens in 4D with attention and decode them into 3D hand meshes and relative temporal movements. The efficacy of our approach is validated on several benchmark datasets. The results on in-the-wild videos and real-world scenarios demonstrate the superior performance of our approach for interactive hand reconstruction. More video results can be found on the project page: https://OmniHand.github.io.

2024-05-30T17:59:02Z An extended journal version of 4DHands, featuring a versatile module that adapts to temporal and multi-view tasks. Additional detailed comparison experiments and results have been added. More demo videos can be seen at our project page: https://OmniHand.github.io Dixuan Lin Yuxiang Zhang Mengcheng Li Wei Jing Qi Yan Qianying Wang Yebin Liu Hongwen Zhang http://arxiv.org/abs/2604.12217v1 VVGT: Visual Volume-Grounded Transformer 2026-04-14T02:51:41Z

Volumetric visualization has long been dominated by Direct Volume Rendering (DVR), which operates on dense voxel grids and suffers from limited scalability as resolution and interactivity demands increase. Recent advances in 3D Gaussian Splatting (3DGS) offer a representation-centric alternative; however, existing volumetric extensions still depend on costly per-scene optimization, limiting scalability and interactivity. We present VVGT (Visual Volume-Grounded Transformer), a feed-forward, representation-first framework that directly maps volumetric data to a 3D Gaussian Splatting representation, advancing a new paradigm for volumetric visualization beyond DVR. Unlike prior feed-forward 3DGS methods designed for surface-centric reconstruction, VVGT explicitly accounts for volumetric rendering, where each pixel aggregates contributions along a ray. VVGT employs a dual-transformer network and introduces Volume Geometry Forcing, an epipolar cross-attention mechanism that integrates multi-view observations into distributed 3D Gaussian primitives without surface assumptions. This design eliminates per-scene optimization while enabling accurate volumetric representations. Extensive experiments show that VVGT achieves high-quality visualization with orders-of-magnitude faster conversion, improved geometric consistency, and strong zero-shot generalization across diverse datasets, enabling truly interactive and scalable volumetric visualization. The code will be publicly released upon acceptance.

2026-04-14T02:51:41Z Yuxuan Wang Qibiao Li Youcheng Cai http://arxiv.org/abs/2604.11723v1 Predicting User Satisfaction in Online Education Platforms: A Large Language Model Based Multi-Modal Review Mining Framework 2026-04-13T16:58:02Z

Online education platforms have experienced explosive growth over the past decade, generating massive volumes of user-generated content in the form of reviews, ratings, and behavioral logs. These heterogeneous signals provide unprecedented opportunities for understanding learner satisfaction, which is a critical determinant of course retention, engagement, and long-term learning outcomes. However, accurately predicting satisfaction remains challenging due to the short length, noise, contextual dependency, and multi-dimensional nature of online reviews. In this paper, we propose a unified \textbf{Large Language Model (LLM)-based multi-modal framework} for predicting both platform-level and course-level learner satisfaction. The proposed framework integrates three complementary information sources: (1) short-text topic distributions that capture latent thematic structures, (2) contextualized sentiment representations learned from pretrained Transformer-based language models, and (3) behavioral interaction features derived from learner activity logs. These heterogeneous representations are fused within a hybrid regression architecture to produce accurate satisfaction predictions. We conduct extensive experiments on large-scale MOOC review datasets collected from multiple public platforms. The experimental results demonstrate that the proposed LLM-based multi-modal framework consistently outperforms traditional text-only models, shallow sentiment baselines, and single-modality regression approaches. Comprehensive ablation studies further validate the necessity of jointly modeling topic semantics, deep sentiment representations, and behavioral analytics. Our findings highlight the critical role of large-scale contextual language representations in advancing learning analytics and provide actionable insights for platform design, course improvement, and personalized recommendation.

2026-04-13T16:58:02Z Arman Bekov Azamat Nurgali http://arxiv.org/abs/2603.19240v2 Beltrami coefficient and angular distortion of discrete geometric mappings 2026-04-13T08:36:56Z

Over the past several decades, geometric mapping methods have been extensively developed and utilized for many practical problems in science and engineering. To assess the quality of geometric mappings, one common consideration is their conformality. In particular, it is well-known that conformal mappings preserve angles and hence the local geometry, which is beneficial in many applications. Therefore, many existing works have focused on the angular distortion as a measure of the conformality of mappings. More recently, quasi-conformal theory has attracted increasing attention in the development of geometric mapping methods, in which the Beltrami coefficient has also been considered as a representation of the conformal distortion. However, the precise connection between these two concepts has not been analyzed. In this work, we study the connection between the two concepts and establish a series of theoretical results. In particular, we discover a simple relationship between the norm of the Beltrami coefficient of a mapping and the absolute angular distortion of triangle elements under the mapping. We can further estimate the maximal angular distortion using a simple formula in terms of the Beltrami coefficient. We verify the developed theoretical results and estimates using numerical experiments on multiple geometric mapping methods, covering conformal mapping, quasi-conformal mapping, and area-preserving mapping algorithms, for a variety of surface meshes in biology and engineering. Altogether, by establishing the theoretical foundation for the relationship between the angular distortion and Beltrami coefficient, our work opens up new avenues for the quantification and analysis of surface mapping algorithms.
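For readers unfamiliar with the two quantities being related, the sketch below computes both for a single triangle under a piecewise-linear map. It uses the standard definition of the Beltrami coefficient, mu = f_zbar / f_z, evaluated on the affine map between a source and a target triangle, and measures the interior angles directly. It does not reproduce the paper's relationship or estimates, which the abstract does not state; the function names are hypothetical, and the map is assumed to be orientation-preserving so that f_z is nonzero.

```python
import numpy as np

def beltrami_coefficient(src, dst):
    """Beltrami coefficient of the affine map taking triangle `src` to
    triangle `dst` (each given as three 2D vertices).
    Assumes a non-degenerate, orientation-preserving map."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    S = np.column_stack([src[1] - src[0], src[2] - src[0]])
    T = np.column_stack([dst[1] - dst[0], dst[2] - dst[0]])
    J = T @ np.linalg.inv(S)          # Jacobian [[u_x, u_y], [v_x, v_y]]
    ux, uy = J[0]
    vx, vy = J[1]
    f_z = 0.5 * complex(ux + vy, vx - uy)
    f_zbar = 0.5 * complex(ux - vy, vx + uy)
    return f_zbar / f_z

def triangle_angles(tri):
    """Interior angles (radians) at the three vertices of a 2D triangle."""
    tri = np.asarray(tri, dtype=float)
    angles = []
    for i in range(3):
        a = tri[(i + 1) % 3] - tri[i]
        b = tri[(i + 2) % 3] - tri[i]
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        angles.append(float(np.arccos(np.clip(cos, -1.0, 1.0))))
    return angles
```

A conformal map (rotation plus uniform scaling) gives mu = 0 and leaves all three angles unchanged, while a stretch by a factor of 2 along one axis gives |mu| = 1/3. Comparing `triangle_angles(src)` with `triangle_angles(dst)` gives the per-angle distortion for the same element.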

2026-02-08T03:10:15Z Zhiyuan Lyu Gary P. T. Choi http://arxiv.org/abs/2604.11172v1 NeuVolEx: Implicit Neural Features for Volume Exploration 2026-04-13T08:30:39Z

Direct volume rendering (DVR) aims to help users identify and examine regions of interest (ROIs) within volumetric data, and feature representations that support effective ROI classification and clustering play a fundamental role in volume exploration. Existing approaches typically rely on either explicit local feature representations or implicit convolutional feature representations learned from raw volumes. However, explicit local feature representations are limited in capturing broader geometric patterns and spatial correlations, while implicit convolutional feature representations do not necessarily ensure robust performance in practice, where user supervision is typically limited. Meanwhile, implicit neural representations (INRs) have recently shown strong promise in DVR for volume compression, owing to their ability to compactly parameterize continuous volumetric fields. In this work, we propose NeuVolEx, a neural volume exploration approach that extends the role of INRs beyond volume compression. Unlike prior compression methods that focus on INR outputs, NeuVolEx leverages feature representations learned during INR training as a robust basis for volume exploration. To better adapt these feature representations to exploration tasks, we augment a base INR with a structural encoder and a multi-task learning scheme that improve spatial coherence for ROI characterization. We validate NeuVolEx on two fundamental volume exploration tasks: image-based transfer function (TF) design and viewpoint recommendation. NeuVolEx enables accurate ROI classification under sparse user supervision for image-based TF design and supports unsupervised clustering to identify compact complementary viewpoints that reveal different ROI clusters. Experiments on diverse volume datasets with varying modalities and ROI complexities demonstrate that NeuVolEx improves both effectiveness and usability over prior methods.

2026-04-13T08:30:39Z 11 pages, 9 figures. Under review Haill An Suhyeon Kim Donghyuk Choo Younhyun Jung http://arxiv.org/abs/2604.10885v1 Product Review Based on Optimized Facial Expression Detection 2026-04-13T01:20:23Z

This paper proposes a method for gauging public acceptance of products, by brand, from the facial expressions of customers intending to buy them in a supermarket or hypermarket. In such settings, facial expression recognition plays a significant role in product review. Facial expression detection is performed by extracting feature points using a modified Harris algorithm, which reduces the time complexity of the existing Harris feature extraction algorithm. The time complexity of the proposed algorithm is compared with that of existing algorithms. The proposed algorithm proved significantly faster and nearly as accurate for the target application, owing to the reduced time complexity of corner point detection.

2026-04-13T01:20:23Z 9 pages, 11 figures, Published in the 2016 Ninth International Conference on Contemporary Computing (IC3), August 11-13, 2016, Noida, India. This is a pre-print version of the paper 2016 Ninth International Conference on Contemporary Computing (IC3), Noida, India, 2016 Vikrant Chaugule Abhishek D Aadheeshwar Vijayakumar Pravin Bhaskar Ramteke Shashidhar G. Koolagudi 10.1109/IC3.2016.7880213 http://arxiv.org/abs/2507.12156v3 SmokeSVD: Smoke Reconstruction from A Single View via Progressive Novel View Synthesis and Refinement with Diffusion Models 2026-04-12T04:43:31Z

Reconstructing dynamic fluids from sparse views is a long-standing and challenging problem, due to the severe lack of 3D information from insufficient view coverage. While several pioneering approaches have attempted to address this issue using differentiable rendering or novel view synthesis, they are often limited by time-consuming optimization under ill-posed conditions. We propose SmokeSVD, an efficient and effective framework to progressively reconstruct dynamic smoke from a single video by integrating the generative capabilities of diffusion models with physically guided consistency optimization. Specifically, we first propose a physically guided side-view synthesizer based on diffusion models, which explicitly incorporates velocity field constraints to generate spatio-temporally consistent side-view images frame by frame, significantly alleviating the ill-posedness of single-view reconstruction. Subsequently, we iteratively refine novel-view images and reconstruct 3D density fields through a progressive multi-stage process that renders and enhances images from increasing viewing angles, generating high-quality multi-view sequences. Finally, we estimate fine-grained density and velocity fields via differentiable advection by leveraging the Navier-Stokes equations. Our approach supports re-simulation and downstream applications while achieving superior reconstruction quality and computational efficiency compared to state-of-the-art methods.

2025-07-16T11:37:04Z Chen Li Shanshan Dong Sheng Qiu Jianmin Han Yibo Zhao Zan Gao Taku Komura Kemeng Huang http://arxiv.org/abs/2604.10393v1 CV-HoloSR: Hologram to hologram super-resolution through volume-upsampling three-dimensional scenes 2026-04-12T00:55:17Z

Existing hologram super-resolution (HSR) methods primarily focus on angle-of-view expansion. Adapting them for volumetric spatial up-sampling introduces severe quadratic depth distortion, degrading 3D focal accuracy. We propose CV-HoloSR, a complex-valued HSR framework specifically designed to preserve physically consistent linear depth scaling during volume up-sampling. Built upon a Complex-Valued Residual Dense Network (CV-RDN) and optimized with a novel depth-aware perceptual reconstruction loss, our model effectively suppresses over-smoothing to recover sharp, high-frequency interference patterns. To support this, we introduce a comprehensive large-depth-range dataset with resolutions up to 4K. Furthermore, to overcome the inherent depth bias of pre-trained encoders when scaling to massive target volumes, we integrate a parameter-efficient fine-tuning strategy utilizing complex-valued Low-Rank Adaptation (LoRA). Extensive numerical and physical optical experiments demonstrate our method's superiority. CV-HoloSR achieves a 32% improvement in perceptual realism (LPIPS of 0.2001) over state-of-the-art baselines. Additionally, our tailored LoRA strategy requires merely 200 samples, reducing training time by over 75% (from 22.5 to 5.2 hours) while successfully adapting the pre-trained backbone to unseen depth ranges and novel display configurations.

2026-04-12T00:55:17Z 33 pages, 11 figures Youchan No Jaehong Lee Daejun Choi Dae Youl Park Duksu Kim http://arxiv.org/abs/2604.10356v1 A Minimal Mathematical Model for Conducting Patterns 2026-04-11T21:47:36Z

We present a minimal mathematical model for conducting patterns that separates geometric trajectory from temporal parametrization. The model is based on a cyclic sequence of preparation and ictus points connected by cubic Hermite segments with constrained horizontal tangents, combined with a quintic timing law controlling acceleration and deceleration. A single parameter governs the balance between uniform motion and expressive emphasis. The model provides a compact yet expressive representation of conducting gestures. It is implemented as the interactive Wolfram Demonstration "Conducting Patterns" and is used in the Crusis web app.
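The separation of geometry from timing can be sketched for a single segment between two pattern points. The cubic Hermite basis is standard. The timing law is an assumption: the abstract says only that it is quintic and controlled by a single parameter, so the sketch blends uniform motion with the quintic ease 6t^5 - 15t^4 + 10t^3 through a hypothetical parameter `a` in [0, 1]. All function names are hypothetical.

```python
import numpy as np

def hermite(p0, p1, m0, m1, s):
    """Cubic Hermite interpolation between points p0 and p1 with
    tangents m0 and m1, at curve parameter s in [0, 1]."""
    s = np.asarray(s, dtype=float)[..., None]
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1

def timing(t, a):
    """Assumed timing law: a = 0 gives uniform motion, a = 1 gives a
    quintic ease with zero velocity and acceleration at both ends."""
    t = np.asarray(t, dtype=float)
    ease = 6 * t**5 - 15 * t**4 + 10 * t**3
    return (1 - a) * t + a * ease

def segment_position(p0, p1, tx0, tx1, t, a):
    """Position at normalized time t on one segment whose end tangents
    are constrained to be horizontal: (tx0, 0) and (tx1, 0)."""
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    m0 = np.array([tx0, 0.0])
    m1 = np.array([tx1, 0.0])
    return hermite(p0, p1, m0, m1, timing(t, a))
```

Because the path depends only on the curve parameter and the timing law only remaps time onto that parameter, changing `a` alters how the hand accelerates and decelerates without changing the drawn shape. A full pattern would chain such segments cyclically through the preparation and ictus points.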

2026-04-11T21:47:36Z 11 pages, 5 figures Tom Verhoeff