http://arxiv.org/abs/2605.17252v1 Monocular Depth Perception Enhancement Based on Joint Shading/Contrast Model and Motion Parallax (JSM) 2026-05-17T04:19:31Z

Stereoscopic 3D displays rely on binocular depth cues to provide depth perception. However, users must be equipped with expensive special devices to perceive depth from binocular cues, and the visual fatigue induced by stereoscopic displays remains a challenging open problem. To overcome these limitations, this paper proposes a novel framework, JSM, that enhances monocular depth perception, significantly improving both depth volume perception and depth range perception. The proposed framework not only provides enhanced depth perception on any conventional 2D display device but is also applicable to 3D display devices, since it is complementary to binocular depth cues. The qualitative evaluation, ablation study, and subjective user evaluation demonstrate the advantages and practicality of the proposed framework.

2026-05-17T04:19:31Z Seungchul Ryu Hyunjin Yoo Tara Akhavan http://arxiv.org/abs/2605.17102v1 VoxScene: Anchor-Conditioned Voxel Diffusion for Indoor Scene Arrangement 2026-05-16T18:10:48Z

We present VoxScene, a novel anchor-conditioned voxel diffusion framework tailored for 3D scene synthesis. Current data-driven layout generation techniques typically rely on bounding proxies or implicit representations, which overlook volumetric structures. This geometric blindness inevitably leads to severe physical collisions and structural entanglement, particularly in densely populated environments. To overcome these limitations, we shift the paradigm to an explicit, object-centric voxel representation. Our pipeline sequentially synthesizes discrete volumetric occupancies conditioned on prior anchors and local context. By exploiting the mutually exclusive nature of discrete voxels, our approach eliminates spatial ambiguities and guarantees collision-free arrangements, even in highly complex environments. Furthermore, the synthesized high-fidelity voxel grids serve as discriminative geometric queries for downstream asset retrieval. Extensive experiments demonstrate the universality of our method, achieving state-of-the-art physical plausibility and unlocking shape diversity compared to existing layout planners.

2026-05-16T18:10:48Z Haotian Mao Yuhan Huang Jiatao Lin Yang Zhao Hui Wang Yiheng Zhang Yuwang Wang Chenliang Zhou Yan Zhang Fangcheng Zhong Xubo Yang http://arxiv.org/abs/2605.17011v1 Topo-GS: Continuous Volumetric Embedding of High-Dimensional Data via Topological Gaussian Splatting 2026-05-16T14:21:08Z

Dimensionality reduction algorithms map high-dimensional data into visualizable 2D or 3D spaces, but traditionally rely on a discrete point-cloud paradigm. This discrete abstraction is susceptible to visual occlusion and artificial discontinuities, often failing to represent the continuous density of the underlying manifold. To address these limitations, we introduce Topo-GS, a framework that repurposes 3D Gaussian Splatting (3DGS) to cast multidimensional projection as a meshless volumetric reconstruction process. Instead of standard photometric losses, Topo-GS is driven by local geometric constraints. By solving orthogonal Procrustes targets, the optimization enforces an As-Rigid-As-Possible prior while explicitly aligning the spatial covariance of each Gaussian to the local tangent space. Recognizing that unrolling data of varying intrinsic dimensionalities requires distinct spatial treatments, we utilize a topology-aware strategy that tailors the loss formulation to preserve either continuous 1D trajectories or cohesive 2D surfaces. Quantitative and visual evaluations demonstrate that Topo-GS successfully transforms discrete scatter plots into continuous volumetric representations, where inherent projection distortions explicitly manifest as observable geometric variations, while preserving local topological fidelity comparable to discrete baselines.
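The Procrustes step mentioned above can be illustrated with a minimal sketch. It is not the Topo-GS implementation: it covers only the rotation-only 2D special case, where the best-fitting rotation between two centered point neighborhoods has a closed form. The function names and the sample points are hypothetical.

```python
import math

def best_rotation_2d(source, target):
    """Angle of the rotation that best maps source onto target (least squares).

    Assumes both point lists are already centered on their centroids and
    are in one-to-one correspondence. Rotation-only 2D special case.
    """
    cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(source, target))
    dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(source, target))
    return math.atan2(cross, dot)

def rotate(points, theta):
    """Rotate 2D points counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

# Hypothetical local neighborhood and its rotated copy.
neighborhood = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
rotated = rotate(neighborhood, math.pi / 6)
recovered = best_rotation_2d(neighborhood, rotated)
```

In an As-Rigid-As-Possible style loss, a rotation like this would typically be estimated per neighborhood and the residual after applying it would be penalized.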

2026-05-16T14:21:08Z 7 pages, 2 figures João Paulo Gois Luis Gustavo Nonato http://arxiv.org/abs/2605.17002v1 A Single Atlas is All You Need: Decoder-Side Gaussian Splatting for Immersive Video 2026-05-16T13:54:36Z

Immersive video delivery is bottlenecked by pixel-rate constraints, making the transmission of high-resolution depth maps or explicit 3D volumetric data expensive. Decoder-Side Depth Estimation (DSDE) shifts depth computation to the client, but struggles with complex geometries, inter-view flickering, and non-Lambertian reflections. Conversely, 3D Gaussian Splatting (3DGS) offers state-of-the-art view synthesis, but transmitting splats (or their projected 2D maps) incurs prohibitive bandwidth costs and is poorly aligned with standard video codecs. We propose Decoder-Side Gaussian Splatting (DSGS), a framework that natively replaces the depth-estimation stage of DSDE with feed-forward 3DGS inference, optimizing volumetric scenes entirely on the decoder side from compressed textures and metadata. A central, counterintuitive finding is that lossy compression acts as an implicit low-pass filter stabilizing feed-forward splat prediction: compressed bitstreams exceed lossless quality while shrinking tenfold. Under extreme view sparsity (one 2D atlas comprising 4 input views), DSGS achieves a +5.79 dB BD-PSNR and +0.054 BD-SSIM gain over the DSDE anchor while reducing maximum inter-view Delta IV-PSNR from 17.2 dB to 6.4 dB, minimizing the domain shift between transmitted and virtual viewports.

2026-05-16T13:54:36Z Dawid Mieloch Stuart Perry http://arxiv.org/abs/2606.06500v1 Cubic Hermite Lattice Structures 2026-05-16T09:06:11Z

Lattice structures are of growing importance in additive manufacturing, where complex internal geometries are increasingly required for lightweight, high surface-to-volume ratios, multifunctionality, and other superior mechanical properties. Conventional lattice modeling methods typically represent struts with simple primitives, such as cylinders or cones, limiting geometric diversity and the design space. Although recent efforts have increased strut-shape complexity to address this issue, they often do so at the expense of computational efficiency and modeling robustness. As a result, achieving both rich geometric expressiveness and efficient computation remains a challenging problem. In this paper, we present an implicit modeling method that expands the design and optimization space of lattice structures while preserving the modeling robustness and efficiency of implicit representations. In our method, each strut is defined as a convolution surface over a skeletal graph, and its profile shape is controlled by a cubic Hermite curve. By exploiting the polynomial structure of both the convolution kernel and the cubic Hermite curve-controlled profile, we derive analytical expressions for efficient field evaluation, avoiding costly and unstable numerical computation. Four case studies have been conducted to validate the proposed method in terms of profile shape diversity, graded lattice modeling, as well as slicing robustness and efficiency.
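As a rough illustration of a profile controlled by a cubic Hermite curve, the sketch below evaluates the standard cubic Hermite basis to obtain a strut radius along its length. The radius parameterization, the function names, and the sample values are assumptions for illustration; the abstract does not specify the paper's actual profile definition or its convolution kernel.

```python
def hermite(p0, p1, m0, m1, t):
    """Cubic Hermite interpolation for t in [0, 1].

    p0, p1 are the end values; m0, m1 are the end tangents.
    """
    t2, t3 = t * t, t * t * t
    h00 = 2 * t3 - 3 * t2 + 1
    h10 = t3 - 2 * t2 + t
    h01 = -2 * t3 + 3 * t2
    h11 = t3 - t2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1

def strut_radius_profile(r0, r1, m0, m1, samples=5):
    """Sample a hypothetical radius profile along a strut (samples >= 2)."""
    return [hermite(r0, r1, m0, m1, i / (samples - 1)) for i in range(samples)]

# Radius grows from 1.0 to 2.0 with flat tangents at both ends.
profile = strut_radius_profile(1.0, 2.0, 0.0, 0.0)
```

Because the basis functions are polynomials in t, quantities built from them stay polynomial, which is the kind of structure that makes closed-form evaluation possible.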

2026-05-16T09:06:11Z Accepted by ASME IDETC/CIE 2026 Yaonaiming Zhao Yuntao Ma Guoyue Luo Qiang Zou http://arxiv.org/abs/2605.16795v1 3DPhysVideo: Consistency-Guided Flow SDE for Video Generation via 3D Scene Reconstruction and Physical Simulation 2026-05-16T03:56:52Z

Video generative models have made remarkable progress, yet they often yield visual artifacts that violate physical dynamics. Recent works such as PhysGen3D tackle single image-to-3D physics through mesh reconstruction and Physically-Based Rendering, but challenges remain in modeling fluid dynamics, multi-object interactions, and photorealism. This work introduces 3DPhysVideo, a novel training-free pipeline that generates physically realistic videos from a single image. We repurpose an off-the-shelf video model for two stages. First, we use it as a novel view synthesizer to reconstruct complete 360-degree 3D scene geometry by guiding the image-to-video (I2V) flow model with rendered point clouds. Second, after applying physics solvers to this geometry, the physically simulated point cloud is used to guide the same I2V flow model to synthesize final, high-quality videos. Consistency-Guided Flow SDE, which decomposes the predicted velocity of the I2V flow model into denoising and consistency bias, enforces consistency with the conditional inputs, allowing us to effectively repurpose the model for both 3D reconstruction and simulation-guided video generation. In diverse experiments, including multi-object and fluid-interaction scenes, our method bridges the gap from single images to physically plausible videos while remaining efficient enough to run on a single consumer GPU. It outperforms state-of-the-art baselines on GPT-based scores, the VideoPhy benchmark, and human evaluation.

2026-05-16T03:56:52Z Project page: https://hwidong-kim.github.io/projects/3DPhysVideo Hwidong Kim Yunho Kim Tae-Kyun Kim http://arxiv.org/abs/2605.16748v1 Genflow Ad Studio: A Compound AI Architecture for Brand-Aligned, Self-Correcting Video Generation 2026-05-16T02:03:53Z

Recent advancements in generative video models demonstrate high visual fidelity, yet their integration into enterprise environments is restricted by temporal inconsistencies and severe brand misalignment. Current monolithic architectures struggle to enforce rigid brand constraints, frequently hallucinating unapproved visual assets. We introduce Genflow, a Compound AI System designed to enforce brand consistency in generative media production. Our architecture integrates a retrieval-based 'Brand DNA' extraction module to parameterize generation according to established corporate identity guidelines. Furthermore, we implement an Adversarial Multi-Agent Quality Control (QC) loop. Instead of a single-pass generation, this pipeline employs evaluator agents to iteratively critique generated frames against the extracted parameters, prompting generator models to refine outputs until a deterministic consensus is reached. By transitioning to a multi-stage, self-correcting pipeline, Genflow improved the yield of brand-compliant video generations from 42% to 89%, establishing a robust framework for scalable, enterprise-grade generative systems.

2026-05-16T02:03:53Z 6 pages, 2 figures, 2 tables. Accepted to the ACM Conference on AI and Agentic Systems (CAIS '26). Includes demo video and code repository links ACM Conference on AI and Agentic Systems (CAIS '26), May 26-29, 2026, San Jose, CA, USA Debanshu Das Lavi Nigam Sunil Kumar Jang Bahadur Gopala Dhar 10.1145/3786335.3813213 http://arxiv.org/abs/2605.16697v1 Ordered Front-to-back Any-Hit Traversal in RTX 2026-05-15T23:21:04Z

We look at the problem of Ordered Front-To-Back Any-Hit Traversal (FTB); i.e., a traversal that iterates through successive hits along a ray in a guaranteed front-to-back sorted order, without skipping any intersections even if they occur at the same distance. We describe multiple ways of solving this problem within the constraints of the existing ray tracing pipeline, and evaluate the different realizations.
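The ordering requirement can be stated as a small CPU reference, sketched below. This is not one of the paper's RTX realizations, which the abstract does not detail. It only shows the intended semantics: hits are visited by increasing distance, and ties at the same distance are broken by a primitive ID so that none is skipped. The `(t, prim_id)` tuple layout and the assumption that each primitive is hit at most once per ray are hypothetical.

```python
def next_hit(hits, last=None):
    """Return the smallest (t, prim_id) strictly after `last`, or None.

    Tuples compare lexicographically, so equal distances are ordered
    by primitive ID instead of being dropped.
    """
    candidates = [h for h in hits if last is None or h > last]
    return min(candidates) if candidates else None

def traverse_front_to_back(hits):
    """Visit all hits in front-to-back order, one 'closest hit' at a time."""
    ordered, last = [], None
    while (hit := next_hit(hits, last)) is not None:
        ordered.append(hit)
        last = hit
    return ordered

# Two hits share distance 2.0; both must be reported.
unordered = [(2.0, 7), (1.0, 3), (2.0, 1)]
ordered = traverse_front_to_back(unordered)
```

Each call to `next_hit` plays the role of one restarted query that rejects everything at or before the previously reported hit.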

2026-05-15T23:21:04Z Ingo Wald http://arxiv.org/abs/2605.16661v1 Beyond One-Size-Fits-All: User Strategies for Simplification Technique and Level Selection in Responsive Line Charts 2026-05-15T21:58:05Z

Simplifying line charts for responsive displays typically applies a single algorithm uniformly across devices, despite the availability of multiple techniques that preserve different signal characteristics (e.g., peaks, trends, periodicity). We investigate whether users benefit from algorithmic choice when adapting charts across screen sizes. In a within-subjects study (N=30), participants simplified nine datasets under three conditions: single pre-assigned technique (C1), multiple techniques (C2), and multiple techniques with manual point selection (C3), each with control over simplification level. We found that users adapted technique selections across datasets rather than devices, leveraging dataset-level strategies rather than per-device optimization. Additionally, interaction complexity did not always increase engagement uniformly, suggesting that responsive simplification tools should balance algorithmic flexibility with progressive disclosure and strong defaults. Supplemental materials are available at https://osf.io/yjp76/?view_only=b77b5e97f0cc4f689fbf48ad0d965af3.

2026-05-15T21:58:05Z Rifat Ara Proma Paul Rosen http://arxiv.org/abs/2602.07272v2 VideoNeuMat: Neural Material Extraction from Generative Video Models 2026-05-15T19:27:27Z

Creating photorealistic materials for 3D rendering requires exceptional artistic skill. Generative models for materials could help, but are currently limited by the lack of high-quality training data. While recent video generative models effortlessly produce realistic material appearances, this knowledge remains entangled with geometry and lighting. We present VideoNeuMat, a two-stage pipeline that extracts reusable neural material assets from video diffusion models. First, we finetune a large video model (Wan 2.1 14B) to generate material sample videos under controlled camera and lighting trajectories, effectively creating a "virtual gonioreflectometer" that preserves the model's material realism while learning a structured measurement pattern. Second, we reconstruct compact neural materials from these videos through a Large Reconstruction Model (LRM) finetuned from a smaller Wan 1.3B video backbone. From 17 generated video frames, our LRM performs single-pass inference to predict neural material parameters that generalize to novel viewing and lighting conditions. The resulting materials exhibit realism and diversity far exceeding the limited synthetic training data, demonstrating that material knowledge can be successfully transferred from internet-scale video models into standalone, reusable neural 3D assets.

2026-02-06T23:49:10Z Bowen Xue Saeed Hadadan Zheng Zeng Fabrice Rousselle Zahra Montazeri Milos Hasan http://arxiv.org/abs/2605.16544v1 TARIPlay: A Test Framework for AR Applications based on Interactive Area Tracking in Playback Videos 2026-05-15T18:39:02Z

As Augmented Reality (AR) becomes more and more embedded in daily life, ensuring the quality, safety, and reliability of AR applications is increasingly important. However, AR apps present unique challenges for automated testing. Unlike static GUI layouts in traditional mobile apps, AR apps acquire their interaction interface from the surrounding environment, which is volatile and non-deterministic. Recent advancements like ARCore Playback and ARKit Replay allow developers to reuse real-world scenarios by recording and playing back enriched videos, enabling more feasible automated AR testing. However, using playback videos introduces two major challenges: test inputs must be timed precisely, and interactive areas in the video are dynamic, irregular, and difficult to identify. To address these challenges, we propose TARIPlay, a framework that analyzes playback videos to detect, track, and filter proper interactive areas over time for automated testing. In particular, TARIPlay identifies viable test opportunities based on criteria like stability and visibility, then feeds this information to an automated testing engine to simulate user interactions. We perform an experiment with four open-source AR apps and nine playback videos. Evaluation results show that TARIPlay significantly outperforms the existing tool Monkey in test coverage (55.8% over 41.98% on branch coverage) of AR-related code, and can also be used to assess the quality of playback videos for testing suitability.

2026-05-15T18:39:02Z 13 pages, 7 figures, 3 tables. Accepted at ICSE 2026 In Proceedings of the 2026 IEEE/ACM 48th International Conference on Software Engineering (ICSE '26), April 12-18, 2026, Rio de Janeiro, Brazil Seyed Amir Mousavi Xiaoyin Wang 10.1145/3744916.3787817 http://arxiv.org/abs/2605.16158v1 Smart target point control for Gaussian Splatting methods 2026-05-15T16:38:45Z

Standard Gaussian splatting methods rely on heuristic densification and pruning to adaptively allocate primitives during training, and the resulting Gaussian count strongly influences both reconstruction quality and runtime. This makes comparisons across methods fragile: improvements can stem from higher representational capacity rather than algorithmic design. A common and naive workaround for this is hard-stopping or budgeting densification/pruning once a target count is reached, which biases training because different methods hit the cap at different times, yielding non-uniform densify/prune exposure across views and uneven point distributions. We propose a target point control scheme that preserves the standard densification window and cadence, but adjusts only the existing densification and opacity-culling hyper-parameters to track a quadratic target count trajectory. This quota-governor reaches the desired count by 15k iterations without abrupt cutoffs, ensuring that all methods and views receive equal densification and pruning cycles, enabling fairer, capacity-matched evaluation.
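A quota governor of this kind might look roughly like the sketch below. The abstract states only that the target trajectory is quadratic and that the count is reached by 15k iterations. The specific ease-out shape, the multiplicative threshold update, and all names and constants here are assumptions.

```python
def target_count(it, n_start, n_target, end_iter=15_000):
    """Quadratic (ease-out) target trajectory from n_start to n_target."""
    s = min(max(it / end_iter, 0.0), 1.0)  # normalized progress in [0, 1]
    return round(n_start + (n_target - n_start) * (1.0 - (1.0 - s) ** 2))

def adjust_threshold(threshold, current, target, gain=0.5, lo=0.5, hi=2.0):
    """Nudge a densification threshold toward the target count.

    Too many Gaussians -> raise the threshold (densify less);
    too few -> lower it. The factor is clamped to avoid abrupt jumps.
    """
    ratio = current / max(target, 1)
    factor = min(max(ratio ** gain, lo), hi)
    return threshold * factor

# Hypothetical run: 100k initial points, 1M target.
midway = target_count(7_500, 100_000, 1_000_000)
lowered = adjust_threshold(0.0002, current=50, target=100)
```

The densification schedule itself is left untouched. Only the threshold changes, which matches the stated goal of keeping the standard window and cadence.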

2026-05-15T16:38:45Z Pratik Singh Bisht Andreas Kolb http://arxiv.org/abs/2605.15875v1 Distributed Affine Body Dynamics with Adaptive Consensus 2026-05-15T11:53:22Z

Affine Body Dynamics (ABD) within the Incremental Potential Contact (IPC) framework provides accurate simulation of extremely stiff solids exhibiting near-rigid behavior, with strict non-penetration guarantees. However, IPC's globally coupled barrier constraints hinder scalable execution across multiple GPUs and compute nodes. We propose a distributed formulation of ABD using a consensus-based ADMM scheme. Each compute node solves its local ABD subproblem in parallel, followed by a global consensus step that enforces consistency among shared boundary bodies. The proposed method preserves IPC-level robustness and global consistency under distributed execution. Experiments demonstrate stable convergence, non-penetration, and efficient scaling on large-scale scenes across multiple nodes.
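The local-solve and global-consensus pattern can be illustrated on a toy problem. The sketch below is not ABD or IPC: each "node" holds a scalar quadratic energy 0.5 * a_i * (x - b_i)^2, and scaled-form consensus ADMM drives the local copies to agree on the shared minimizer. The function name, the penalty `rho`, and the iteration count are assumptions.

```python
def consensus_admm(weights, targets, rho=1.0, iters=200):
    """Minimize sum_i 0.5 * a_i * (x - b_i)^2 with consensus ADMM.

    Each node i keeps a local copy x_i and a scaled dual u_i;
    z is the shared consensus value.
    """
    n = len(weights)
    z = 0.0
    u = [0.0] * n
    for _ in range(iters):
        # Local step: each node solves its own subproblem (parallelizable).
        x = [(a * b + rho * (z - ui)) / (a + rho)
             for a, b, ui in zip(weights, targets, u)]
        # Global step: average local solutions plus duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each node's disagreement with consensus.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z

# Three nodes; the exact minimizer is the weighted mean 14/6.
solution = consensus_admm([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

In a distributed simulation, the local step would be a full per-node solve and the consensus step would touch only the bodies shared across node boundaries.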

2026-05-15T11:53:22Z Jiafeng Liu Wenhui Zhou Xinming Pei Yifan Peng Huamin Wang Yin Yang Lei Lan Weiwei Xu http://arxiv.org/abs/2605.15816v1 StippleDiffusion: Capacity-Constrained Stippling using Controlled Diffusion 2026-05-15T10:12:42Z

Stipple patterns, point sets whose local density tracks a target image, are traditionally produced by per-density iterative optimizers, which are slow, non-differentiable, and must be re-run from scratch for each new target. Learned alternatives have so far addressed only unconditional point generation; capacity-constrained, image-conditioned stippling has remained out of reach. We present the first diffusion-based sampler that simultaneously satisfies a learned local point-distribution prior and a continuous, image-defined capacity constraint at inference. The method is a ControlNet branch built on top of an optimal-transport-grid point-set diffusion baseline, conditioned on the target density map and a high-resolution image. Two design choices make the combination tractable: training and inference are restricted to the late-stage denoising regime, initialized from a density-weighted rejection sample, and the standard zero-convolution injection is replaced with a sigmoid-gated 1x1 projection that preserves the base model's blue-noise structure under hard density signals. A single trained checkpoint accepts arbitrary target densities at inference, generalizes to point budgets that were not seen during training, and produces stipples in time nearly independent of the output point count. On the Icons-50 benchmark, our learned sampler reaches parity with per-density-optimized baselines on every reported metric while remaining differentiable end-to-end.
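The density-weighted rejection sample used for initialization can be sketched as follows. This covers only that initialization idea, not the diffusion model or the ControlNet branch. The grid layout (a list of rows with values in [0, 1], where larger means denser), the function name, and the seed handling are assumptions.

```python
import random

def rejection_sample(density, n_points, seed=0):
    """Draw n_points in the unit square with probability proportional to density.

    `density` is a 2D list (rows of equal length) of non-negative values.
    """
    rng = random.Random(seed)
    h, w = len(density), len(density[0])
    d_max = max(max(row) for row in density)
    if d_max <= 0:
        raise ValueError("density must contain a positive value")
    points = []
    while len(points) < n_points:
        x, y = rng.random(), rng.random()
        col = min(int(x * w), w - 1)
        row = min(int(y * h), h - 1)
        # Accept the candidate with probability density / d_max.
        if rng.random() * d_max < density[row][col]:
            points.append((x, y))
    return points

# Hypothetical 2x2 density: empty left half, full right half.
points = rejection_sample([[0.0, 1.0], [0.0, 1.0]], 200)
```

A sample like this matches the target density on average but lacks blue-noise spacing, which is presumably what the late-stage denoising then restores.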

2026-05-15T10:12:42Z 12 pages, 10 figures Ofir Gilad Aleksander Plocharski Przemyslaw Musialski Andrei Sharf http://arxiv.org/abs/2605.15681v1 DealMaTe: Multi-Dimensional Material Transfer via Diffusion Transformer 2026-05-15T07:06:39Z

Recent diffusion-based material transfer methods rely on image fine-tuning or complex architectures with auxiliary networks, but they face challenges such as text dependency, additional computational costs, and feature misalignment. To address these limitations, we propose DealMaTe, which uses depth, normal, and lighting images for material transfer. DealMaTe is a simplified diffusion framework that eliminates text guidance and reference networks. We design a lightweight 3D information injection method, Multi-Dim 3D Shader LoRA, which, without modifying the base model weights, enables compatible control conditions and achieves harmonious and stable results. Additionally, we optimize the attention mechanism with Shader Causal Mutual Attention and key-value (KV) caching to reduce inference latency caused by multiple conditions, improve computational efficiency, and achieve high-quality material transfer results with low architectural complexity. Extensive experiments covering a wide variety of objects and lighting conditions consistently demonstrate that DealMaTe achieves remarkably high-fidelity material transfer under arbitrary input materials. The code is available at https://github.com/haha-lisa/DealMaTe.

2026-05-15T07:06:39Z Nisha Huang Yizhou Lin Jie Guo Xiu Li Tong-Yee Lee Zitong Yu