http://arxiv.org/abs/2603.05152v1 SSR-GS: Separating Specular Reflection in Gaussian Splatting for Glossy Surface Reconstruction 2026-03-05T13:24:13Z

In recent years, 3D Gaussian splatting (3DGS) has achieved remarkable progress in novel view synthesis. However, accurately reconstructing glossy surfaces under complex illumination remains challenging, particularly in scenes with strong specular reflections and multi-surface interreflections. To address this issue, we propose SSR-GS, a specular reflection modeling framework for glossy surface reconstruction. Specifically, we introduce a prefiltered Mip-Cubemap to model direct specular reflections efficiently, and propose an IndiASG module to capture indirect specular reflections. Furthermore, we design Visual Geometry Priors (VGP) that couple a reflection-aware visual prior via a reflection score (RS) to downweight the photometric loss contribution of reflection-dominated regions, with geometry priors derived from VGGT, including progressively decayed depth supervision and transformed normal constraints. Extensive experiments on both synthetic and real-world datasets demonstrate that SSR-GS achieves state-of-the-art performance in glossy surface reconstruction.

2026-03-05T13:24:13Z Project page: https://gsflyer.github.io/SSR-GS/ Ningjing Fan Yiqun Wang http://arxiv.org/abs/2603.05079v1 Beyond Positional Encoding: A 5D Spatio-Directional Hash Encoding 2026-03-05T11:52:07Z

In this work, we propose a new spatio-directional neural encoding that is compact and efficient, and supports all-frequency signals in both space and direction. Current learnable encodings focus on Cartesian orthonormal spaces, which have been shown to be useful for representing high-frequency signals in the spatial domain. However, directly applying these encodings in the directional domain results in distortions, singularities, and discontinuities. As a result, most related works have used more traditional encodings for the directional domain, which lack the expressivity of learnable neural encodings. We address this by proposing a new angular encoding that generalizes the hash-grid approach from Müller et al. [2022] to the directional domain by encoding directions using a hierarchical geodesic grid. Each vertex in the geodesic grid stores a learnable latent parameter, which is used to feed a neural network. Armed with this directional encoding, we propose a five-dimensional encoding for spatio-directional signals. We demonstrate that both encodings significantly outperform other hash-based alternatives. We apply our five-dimensional encoding in the context of neural path guiding, outperforming the state of the art by up to a factor of 2 in terms of variance reduction for the same number of samples.

2026-03-05T11:52:07Z Philippe Weier Lukas Bode Philipp Slusallek Adrián Jarabo Sébastien Speierer http://arxiv.org/abs/2603.04958v1 Revisiting an Old Perspective Projection for Monocular 3D Morphable Models Regression 2026-03-05T08:52:20Z

We introduce a novel camera model for monocular 3D Morphable Model (3DMM) regression methods that effectively captures the perspective distortion effect commonly seen in close-up facial images. Fitting 3D morphable models to video is a key technique in content creation. In particular, regression-based approaches have produced fast and accurate results by matching the rendered output of the morphable model to the target image. These methods typically achieve stable performance with orthographic projection, which eliminates the ambiguity between focal length and object distance. However, this simplification makes them unsuitable for close-up footage, such as that captured with head-mounted cameras. We extend orthographic projection with a new shrinkage parameter, incorporating a pseudo-perspective effect while preserving the stability of the original projection. We present several techniques that allow finetuning of existing models, and demonstrate the effectiveness of our modification through both quantitative and qualitative comparisons using a custom dataset recorded with head-mounted cameras.

2026-03-05T08:52:20Z WACV 2026, https://zukunfcs.github.io/RevisitingAnOldPerspective/ Toby Chong Ryota Nakajima http://arxiv.org/abs/2512.05106v3 NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation 2026-03-05T06:46:06Z

Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corrupting phase components destroys spatial structure, making it ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves input phase while randomizing magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. φ-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, φ-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, φ-PD significantly improves sim-to-real planner transfer performance. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our project page: https://yuzeng-at-tri.github.io/ppd-page/
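
The core idea stated above, keeping the Fourier phase of the input while drawing the magnitude from noise, can be illustrated with a small NumPy sketch. This is only an illustration under assumptions: the function name `phase_preserving_noise` is hypothetical, the magnitude is simply taken from one Gaussian noise sample, and nothing here reproduces the paper's diffusion schedule or its FSS noise.

```python
import numpy as np

def phase_preserving_noise(image, rng):
    """Return a real image that has the Fourier phase of `image` and the
    Fourier magnitude of a Gaussian noise sample (hypothetical helper)."""
    spectrum = np.fft.fft2(image)
    phase = np.angle(spectrum)                      # structure lives here
    noise = rng.standard_normal(image.shape)
    noise_magnitude = np.abs(np.fft.fft2(noise))    # random magnitudes
    mixed = noise_magnitude * np.exp(1j * phase)
    # For a real input the mixed spectrum stays conjugate-symmetric, so the
    # imaginary part of the inverse transform is only rounding error.
    return np.real(np.fft.ifft2(mixed))
```

Because both the phase of a real image and the magnitude of real noise have the symmetry required of a real signal's spectrum, the result is again a real image whose phase matches the input.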

2025-12-04T18:59:18Z Yu Zeng Charles Ochoa Mingyuan Zhou Vishal M. Patel Vitor Guizilini Rowan McAllister http://arxiv.org/abs/2603.04847v1 GloSplat: Joint Pose-Appearance Optimization for Faster and More Accurate 3D Reconstruction 2026-03-05T06:02:50Z

Feature extraction, matching, structure from motion (SfM), and novel view synthesis (NVS) have traditionally been treated as separate problems with independent optimization objectives. We present GloSplat, a framework that performs joint pose-appearance optimization during 3D Gaussian Splatting training. Unlike prior joint optimization methods (BARF, NeRF--, 3RGS) that rely purely on photometric gradients for pose refinement, GloSplat preserves explicit SfM feature tracks as first-class entities throughout training: track 3D points are maintained as optimizable parameters separate from the Gaussian primitives, providing persistent geometric anchors via a reprojection loss that operates alongside photometric supervision. This architectural choice prevents early-stage pose drift while enabling fine-grained refinement, a capability absent in photometric-only approaches. We introduce two pipeline variants: (1) GloSplat-F, a COLMAP-free variant using retrieval-based pair selection for efficient reconstruction, and (2) GloSplat-A, an exhaustive matching variant for maximum quality. Both employ global SfM initialization followed by joint photometric-geometric optimization during 3DGS training. Experiments demonstrate that GloSplat-F achieves state-of-the-art results among COLMAP-free methods while GloSplat-A surpasses all COLMAP-based baselines.

2026-03-05T06:02:50Z Tianyu Xiong Rui Li Linjie Li Jiaqi Yang http://arxiv.org/abs/2603.15648v1 Improving Generative Adversarial Network Generalization for Facial Expression Synthesis 2026-03-04T20:20:58Z

Facial expression synthesis aims to generate realistic facial expressions while preserving identity. Existing conditional generative adversarial networks (GANs) achieve excellent image-to-image translation results, but their performance often degrades when test images differ from the training dataset. We present Regression GAN (RegGAN), a model that learns an intermediate representation to improve generalization beyond the training distribution. RegGAN consists of two components: a regression layer with local receptive fields that learns expression details by minimizing the reconstruction error through a ridge regression loss, and a refinement network trained adversarially to enhance the realism of generated images. We train RegGAN on the CFEE dataset and evaluate its generalization performance both on CFEE and challenging out-of-distribution images, including celebrity photos, portraits, statues, and avatar renderings. For evaluation, we employ four widely used metrics: Expression Classification Score (ECS) for expression quality, Face Similarity Score (FSS) for identity preservation, QualiCLIP for perceptual realism, and Fréchet Inception Distance (FID) for assessing both expression quality and realism. RegGAN outperforms six state-of-the-art models in ECS, FID, and QualiCLIP, while ranking second in FSS. Human evaluations indicate that RegGAN surpasses the best competing model by 25% in expression quality, 26% in identity preservation, and 30% in realism.
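
For readers unfamiliar with the ridge regression loss mentioned above, the following sketch shows the standard closed-form ridge solution in NumPy. It is a generic dense ridge fit, not RegGAN's regression layer: the local receptive fields described in the abstract are not modeled, and the name `ridge_fit` and its parameters are assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, Y, lam):
    """Closed-form minimizer of ||X W - Y||^2 + lam * ||W||^2.

    X: (n_samples, n_features), Y: (n_samples, n_outputs), lam >= 0.
    """
    n_features = X.shape[1]
    gram = X.T @ X + lam * np.eye(n_features)
    # Solve the regularized normal equations instead of forming an inverse.
    return np.linalg.solve(gram, X.T @ Y)
```

Increasing `lam` shrinks the weights toward zero, which trades reconstruction error on the training set for smoother, less dataset-specific mappings.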

2026-03-04T20:20:58Z Multimedia Tools and Applications (2026) Arbish Akram Nazar Khan Arif Mahmood 10.1007/s11042-026-21213-w http://arxiv.org/abs/2603.05542v1 Human-Data Interaction, Exploration, and Visualization in the AI Era: Challenges and Opportunities 2026-03-04T14:18:17Z

The rapid advancement of AI is transforming human-centered systems, with profound implications for human-AI interaction, human-data interaction, and visual analytics. In the AI era, data analysis increasingly involves large-scale, heterogeneous, and multimodal data that is predominantly unstructured, as well as foundation models such as LLMs and VLMs, which introduce additional uncertainty into analytical processes. These shifts expose persistent challenges for human-data interactive systems, including perceptually misaligned latency, scalability constraints, limitations of existing interaction and exploration paradigms, and growing uncertainty regarding the reliability and interpretability of AI-generated insights. Responding to these challenges requires moving beyond conventional efficiency and scalability metrics, redefining the roles of humans and machines in analytical workflows, and incorporating cognitive, perceptual, and design principles into every level of the human-data interaction stack. This paper investigates the challenges introduced by recent advances in AI and examines how these developments are reshaping the ways users engage with data, while outlining limitations and open research directions for building human-centered AI systems for interactive data analysis in the AI era.

2026-03-04T14:18:17Z Jean-Daniel Fekete Yifan Hu Dominik Moritz Arnab Nandi Senjuti Basu Roy Eugene Wu Nikos Bikakis George Papastefanatos Panos K. Chrysanthis Guoliang Li Lingyun Yu http://arxiv.org/abs/2603.04090v1 EgoPoseFormer v2: Accurate Egocentric Human Motion Estimation for AR/VR 2026-03-04T14:01:16Z

Egocentric human motion estimation is essential for AR/VR experiences, yet remains challenging due to limited body coverage from the egocentric viewpoint, frequent occlusions, and scarce labeled data. We present EgoPoseFormer v2, a method that addresses these challenges through two key contributions: (1) a transformer-based model for temporally consistent and spatially grounded body pose estimation, and (2) an auto-labeling system that enables the use of large unlabeled datasets for training. Our model is fully differentiable, introduces identity-conditioned queries, multi-view spatial refinement, and causal temporal attention, and supports both keypoints and parametric body representations under a constant compute budget. The auto-labeling system scales learning to tens of millions of unlabeled frames via uncertainty-aware semi-supervised training. The system follows a teacher-student schema to generate pseudo-labels and guide training with uncertainty distillation, enabling the model to generalize to different environments. On the EgoBody3M benchmark, with a 0.8 ms latency on GPU, our model outperforms two state-of-the-art methods by 12.2% and 19.4% in accuracy, and reduces temporal jitter by 22.2% and 51.7%. Furthermore, our auto-labeling system improves the wrist MPJPE by 13.1%.

2026-03-04T14:01:16Z Accepted to CVPR 2026 Zhenyu Li Sai Kumar Dwivedi Filip Maric Carlos Chacon Nadine Bertsch Filippo Arcadu Tomas Hodan Michael Ramamonjisoa Peter Wonka Amy Zhao Robin Kips Cem Keskin Anastasia Tkach Chenhongyi Yang http://arxiv.org/abs/2603.02887v2 Generalized non-exponential Gaussian splatting 2026-03-04T12:56:06Z

In this work we generalize 3D Gaussian splatting (3DGS) to a wider family of physically-based alpha-blending operators. 3DGS has become the de facto standard for radiance field rendering and reconstruction, given its flexibility and efficiency. At its core, it is based on alpha-blending sorted semitransparent primitives, which in the limit converges to the classic radiative transfer function with exponential transmittance. Inspired by recent research on non-exponential radiative transfer, we generalize the image formation model of 3DGS to non-exponential regimes. Based on this generalization, we use a quadratic transmittance to define sub-linear, linear, and super-linear versions of 3DGS, which exhibit faster-than-exponential decay. We demonstrate that these new non-exponential variants achieve quality similar to that of the original 3DGS but significantly reduce the number of overdraws, which results in speed-ups of up to $4\times$ in complex real-world captures on a ray-tracing-based renderer.
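
As background for the alpha-blending model referred to above, here is a minimal sketch of standard front-to-back compositing of sorted primitives, with early termination once the remaining transmittance is negligible. It covers only the conventional exponential baseline; the paper's quadratic transmittance is not reproduced. The names `composite_front_to_back`, `alpha_from_optical_depth`, and the threshold `t_min` are assumptions for illustration.

```python
import numpy as np

def alpha_from_optical_depth(tau):
    """Opacity of a primitive with optical depth tau under exponential
    transmittance: alpha = 1 - exp(-tau)."""
    return 1.0 - np.exp(-np.asarray(tau, dtype=float))

def composite_front_to_back(colors, alphas, t_min=1e-4):
    """Blend depth-sorted primitives front to back.

    Returns the blended RGB color and the number of primitives that were
    actually blended before transmittance dropped below t_min.
    """
    color = np.zeros(3)
    transmittance = 1.0
    blended = 0
    for c, a in zip(colors, alphas):
        if transmittance < t_min:
            break  # remaining primitives cannot contribute visibly
        color += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= 1.0 - a
        blended += 1
    return color, blended
```

The count returned alongside the color is a rough per-pixel proxy for overdraw: the faster transmittance decays along the ray, the earlier the loop stops.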

2026-03-03T11:36:13Z 13 pages, 6 figures, 4 tables Sébastien Speierer Adrian Jarabo http://arxiv.org/abs/2603.03978v1 Map-Agnostic And Interactive Safety-Critical Scenario Generation via Multi-Objective Tree Search 2026-03-04T12:19:02Z

Generating safety-critical scenarios is essential for validating the robustness of autonomous driving systems, yet existing methods often struggle to produce collisions that are both realistic and diverse while ensuring explicit interaction logic among traffic participants. This paper presents a novel framework for traffic-flow level safety-critical scenario generation via multi-objective Monte Carlo Tree Search (MCTS). We reframe trajectory feasibility and naturalistic behavior as optimization objectives within a unified evaluation function, enabling the discovery of diverse collision events without compromising realism. A hybrid Upper Confidence Bound (UCB) and Lower Confidence Bound (LCB) search strategy is introduced to balance exploratory efficiency with risk-averse decision-making. Furthermore, our method is map-agnostic and supports interactive scenario generation with each vehicle individually powered by SUMO's microscopic traffic models, enabling realistic agent behaviors in arbitrary geographic locations imported from OpenStreetMap. We validate our approach across four high-risk accident zones in Hong Kong's complex urban environments. Experimental results demonstrate that our framework achieves an 85% collision failure rate while generating trajectories with superior feasibility and comfort metrics. The resulting scenarios exhibit greater complexity, as evidenced by increased vehicle mileage and CO$_2$ emissions. Our work provides a principled solution for stress testing autonomous vehicles through the generation of realistic yet infrequent corner cases at traffic-flow level.
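
To make the UCB and LCB terms above concrete, the sketch below computes the textbook UCB1 bound and its mirrored lower bound for tree-search children. How the paper combines the two is not stated in the abstract, so the split used here (UCB to pick what to explore, LCB to pick a conservative final choice) is an assumption, as are all function names and the constant `c`.

```python
import math

def ucb(mean, visits, parent_visits, c=1.4):
    """Optimistic value estimate (UCB1): rarely visited children get a bonus."""
    return mean + c * math.sqrt(math.log(parent_visits) / visits)

def lcb(mean, visits, parent_visits, c=1.4):
    """Pessimistic value estimate: rarely visited children get a penalty."""
    return mean - c * math.sqrt(math.log(parent_visits) / visits)

def select(children, bound):
    """Return the name of the child maximizing `bound`.

    children: dict mapping name -> (mean_value, visit_count), visits >= 1.
    """
    parent_visits = sum(visits for _, visits in children.values())
    return max(
        children,
        key=lambda name: bound(children[name][0], children[name][1], parent_visits),
    )
```

For example, with `children = {"A": (0.6, 50), "B": (0.7, 2)}`, `select(children, ucb)` favors the barely explored `"B"`, while `select(children, lcb)` favors the well-sampled `"A"`.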

2026-03-04T12:19:02Z Wenyun Li Zejian Deng Chen Sun http://arxiv.org/abs/2601.08554v4 Maintaining Leiden Communities in Large Dynamic Graphs 2026-03-04T03:12:04Z

Community detection is a foundational capability in large-scale industrial graph analytics, powering applications such as fraud-ring discovery, recommendation systems, and hierarchical indexing for retrieval-augmented generation. Among modularity-based methods, the Leiden algorithm has been widely adopted in production because it delivers high-quality communities with connectivity guarantees. However, real-world graphs evolve continuously, and timely community updates are needed to keep downstream features and retrieval indices fresh. Meanwhile, existing dynamic Leiden approaches recompute the communities whenever their vertices and edges change, thereby degrading to near-full recomputation under frequent updates. To address this inefficiency, we study the efficient maintenance of Leiden communities in large dynamic graphs and present a novel algorithm, called Hierarchical Incremental Tree Leiden (HIT-Leiden). We first provide a boundedness analysis showing that prior incremental Leiden methods can incur essentially unbounded work even for small updates. Guided by this analysis, we propose HIT-Leiden, which effectively reduces the range of affected vertices by maintaining connected components and hierarchical community structures. Extensive experiments on large real-world dynamic graphs demonstrate that HIT-Leiden achieves community quality comparable to the state-of-the-art competitors while delivering speedups of up to five orders of magnitude over existing solutions. The production deployment results show that HIT-Leiden meets stringent latency requirements under high-rate updates at scale.
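
Since the abstract refers to modularity-based methods, a short sketch of the modularity score for a fixed partition may help. It evaluates the standard formula for an unweighted, undirected graph with at least one edge; it is neither the Leiden algorithm nor HIT-Leiden, and the function name and input layout are assumptions for illustration.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every vertex to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the community
    degree = defaultdict(int)    # total vertex degree inside the community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Per community: observed internal edge fraction minus the fraction
    # expected if edges were rewired at random with the same degrees.
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

Two disjoint triangles placed in separate communities score 0.5, while putting all six vertices in one community scores 0. An incremental method aims to keep such a score high after edge updates without rescanning the whole graph.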

2026-01-13T13:39:22Z Chunxu Lin Yumao Xie Yixiang Fang Yongmin Hu Yingqian Hu Chen Cheng http://arxiv.org/abs/2603.03231v1 Quadratic-Order Geodesics on Meshes 2026-03-03T18:23:44Z

We introduce a novel representation and optimization framework for discrete geodesics on triangle meshes that reduces artifacts of linear methods on uneven and coarse discretizations. Our method computes squared geodesic distances from point and curve sources using piecewise-quadratic elements, exactly reproducing flat distances regardless of mesh quality while improving accuracy over existing approaches on curved meshes. The formulation naturally supports sources placed anywhere on the mesh, not just at vertices.

2026-03-03T18:23:44Z Yue Ruan Albert Chern Tzu-Mao Li Kartic Subr Amir Vaxman http://arxiv.org/abs/2506.24108v3 Navigating with Annealing Guidance Scale in Diffusion Space 2026-03-03T16:19:23Z

Denoising diffusion models excel at generating high-quality images conditioned on text prompts, yet their effectiveness heavily relies on careful guidance during the sampling process. Classifier-Free Guidance (CFG) provides a widely used mechanism for steering generation by setting the guidance scale, which balances image quality and prompt alignment. However, the choice of the guidance scale has a critical impact on the convergence toward a visually appealing and prompt-adherent image. In this work, we propose an annealing guidance scheduler which dynamically adjusts the guidance scale over time based on the conditional noisy signal. By learning a scheduling policy, our method addresses the temperamental behavior of CFG. Empirical results demonstrate that our guidance scheduler significantly enhances image quality and alignment with the text prompt, advancing the performance of text-to-image generation. Notably, our novel scheduler requires no additional activations or memory consumption, and can seamlessly replace the common classifier-free guidance, offering an improved trade-off between prompt alignment and quality.
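
The role of the guidance scale can be seen in the standard classifier-free guidance combination, sketched below in NumPy together with a time-varying scale. The linear ramp is only a hand-written stand-in: the paper learns its scheduling policy from the conditional noisy signal, which is not reproduced here, and the names `cfg_combine` and `linear_scale` are assumptions.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward (and, for scale > 1, beyond) the conditional prediction."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

def linear_scale(step, num_steps, w_start, w_end):
    """Hypothetical fixed schedule: interpolate the guidance scale linearly
    from w_start at step 0 to w_end at the last step (num_steps >= 2)."""
    t = step / (num_steps - 1)
    return w_start + t * (w_end - w_start)

def guided_prediction(eps_uncond, eps_cond, step, num_steps,
                      w_start=8.0, w_end=2.0):
    """Combine both predictions using the scheduled scale for this step."""
    scale = linear_scale(step, num_steps, w_start, w_end)
    return cfg_combine(eps_uncond, eps_cond, scale)
```

A scale of 0 returns the unconditional prediction and a scale of 1 returns the conditional one. A scheduler replaces the single constant with a per-step value at no extra network evaluations.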

2025-06-30T17:55:00Z SIGGRAPH Asia, 2025. Project page: https://annealing-guidance.github.io/annealing-guidance/ ACM Trans. Graph., Vol. 44, No. 6, Article 5. Publication date: December 2025 Shai Yehezkel Omer Dahary Andrey Voynov Daniel Cohen-Or 10.1145/3757377.3763830 http://arxiv.org/abs/2412.09646v2 RealOSR: Latent Guidance Boosts Diffusion-based Real-world Omnidirectional Image Super-Resolutions 2026-03-03T14:23:45Z

Omnidirectional image super-resolution (ODISR) aims to upscale low-resolution (LR) omnidirectional images (ODIs) to high-resolution (HR), catering to the growing demand for detailed visual content across a $180^{\circ}\times360^{\circ}$ viewport. Existing ODISR methods are limited by simplified degradation assumptions (e.g., bicubic downsampling), failing to model and exploit real-world degradation information. Recent latent-based diffusion approaches using condition guidance suffer from slow inference due to their hundreds of update steps and frequent use of the VAE. To tackle these challenges, we propose RealOSR, a diffusion-based framework tailored for real-world ODISR, featuring efficient latent-based condition guidance within a one-step denoising paradigm. Central to efficient latent-based condition guidance is the proposed Latent Gradient Alignment Routing (LaGAR), a lightweight module that enables effective pixel-latent space interactions and simulates gradient descent directly in the latent space, thereby leveraging the semantic richness and multi-scale features captured by the denoising UNet. Compared to the recent diffusion-based ODISR method OmniSSR, RealOSR achieves significant improvements in visual quality and over $200\times$ inference acceleration. Our code and models will be released upon acceptance.

2024-12-11T06:23:14Z Xuhan Sheng Runyi Li Bin Chen Weiqi Li Xu Jiang Jian Zhang http://arxiv.org/abs/2603.02986v1 VIRGi: View-dependent Instant Recoloring of 3D Gaussians Splats 2026-03-03T13:41:17Z

3D Gaussian Splatting (3DGS) has recently transformed the fields of novel view synthesis and 3D reconstruction due to its ability to accurately model complex 3D scenes and its unprecedented rendering performance. However, a significant challenge persists: the absence of an efficient and photorealistic method for editing the appearance of the scene's content. In this paper we introduce VIRGi, a novel approach for rapidly editing the color of scenes modeled by 3DGS while preserving view-dependent effects such as specular highlights. Key to our method are a novel architecture that separates color into diffuse and view-dependent components, and a multi-view training strategy that integrates image patches from multiple viewpoints. Improving over the conventional single-view batch training, our 3DGS representation provides more accurate reconstruction and serves as a solid representation for the recoloring task. For 3DGS recoloring, we then introduce a rapid scheme requiring only one manually edited image of the scene from the end-user. By fine-tuning the weights of a single MLP, alongside a module for single-shot segmentation of the editable area, the color edits are seamlessly propagated to the entire scene in just two seconds, facilitating real-time interaction and providing control over the strength of the view-dependent effects. An exhaustive validation on diverse datasets demonstrates significant quantitative and qualitative advancements over competitors based on Neural Radiance Fields representations.

2026-03-03T13:41:17Z IEEE Transactions on Pattern Analysis and Machine Intelligence. 2026 Feb 24 Alessio Mazzucchelli Ivan Ojeda-Martin Fernando Rivas-Manzaneque Elena Garces Adrian Penate-Sanchez Francesc Moreno-Noguer 10.1109/TPAMI.2026.3665650