http://arxiv.org/abs/2510.21682v1 WorldGrow: Generating Infinite 3D World 2025-10-24T17:39:52Z

We tackle the challenge of generating an infinitely extendable 3D world -- large, continuous environments with coherent geometry and realistic appearance. Existing methods face key challenges: 2D-lifting approaches suffer from geometric and appearance inconsistencies across views, 3D implicit representations are hard to scale up, and current 3D foundation models are mostly object-centric, limiting their applicability to scene-level generation. Our key insight is to leverage strong generation priors from pre-trained 3D models for structured scene block generation. To this end, we propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity. Evaluated on the large-scale 3D-FRONT dataset, WorldGrow achieves SOTA performance in geometry reconstruction, while uniquely supporting infinite scene generation with photorealistic and structurally consistent outputs. These results highlight its capability for constructing large-scale virtual environments and potential for building future world models.

2025-10-24T17:39:52Z Project page: https://world-grow.github.io/ Code: https://github.com/world-grow/WorldGrow Sikuang Li Chen Yang Jiemin Fang Taoran Yi Jia Lu Jiazhong Cen Lingxi Xie Wei Shen Qi Tian http://arxiv.org/abs/2510.21654v1 Group Inertial Poser: Multi-Person Pose and Global Translation from Sparse Inertial Sensors and Ultra-Wideband Ranging 2025-10-24T17:11:50Z

Tracking human full-body motion using sparse wearable inertial measurement units (IMUs) overcomes the limitations of occlusion and instrumentation of the environment inherent in vision-based approaches. However, purely IMU-based tracking compromises translation estimates and accurate relative positioning between individuals, as inertial cues are inherently self-referential and provide no direct spatial reference for others. In this paper, we present a novel approach for robustly estimating body poses and global translation for multiple individuals by leveraging the distances between sparse wearable sensors, both on each individual and across multiple individuals. Our method, Group Inertial Poser, estimates these absolute distances between pairs of sensors from ultra-wideband (UWB) ranging and fuses them with inertial observations as input into structured state-space models to integrate temporal motion patterns for precise 3D pose estimation. Our novel two-step optimization further leverages the estimated distances for accurately tracking people's global trajectories through the world. We also introduce GIP-DB, the first IMU+UWB dataset for two-person tracking, which comprises 200 minutes of motion recordings from 14 participants. In our evaluation, Group Inertial Poser outperforms previous state-of-the-art methods in accuracy and robustness across synthetic and real-world data, showing the promise of IMU+UWB-based multi-human motion capture in the wild. Code, models, dataset: https://github.com/eth-siplab/GroupInertialPoser

2025-10-24T17:11:50Z Accepted by ICCV 2025, Code: https://github.com/eth-siplab/GroupInertialPoser Ying Xue Jiaxi Jiang Rayan Armani Dominik Hollidt Yi-Chi Liao Christian Holz http://arxiv.org/abs/2510.21432v1 ArtiLatent: Realistic Articulated 3D Object Generation via Structured Latents 2025-10-24T13:08:15Z

We propose ArtiLatent, a generative framework that synthesizes human-made 3D objects with fine-grained geometry, accurate articulation, and realistic appearance. Our approach jointly models part geometry and articulation dynamics by embedding sparse voxel representations and associated articulation properties, including joint type, axis, origin, range, and part category, into a unified latent space via a variational autoencoder. A latent diffusion model is then trained over this space to enable diverse yet physically plausible sampling. To reconstruct photorealistic 3D shapes, we introduce an articulation-aware Gaussian decoder that accounts for articulation-dependent visibility changes (e.g., revealing the interior of a drawer when opened). By conditioning appearance decoding on articulation state, our method assigns plausible texture features to regions that are typically occluded in static poses, significantly improving visual realism across articulation configurations. Extensive experiments on furniture-like objects from PartNet-Mobility and ACD datasets demonstrate that ArtiLatent outperforms existing approaches in geometric consistency and appearance fidelity. Our framework provides a scalable solution for articulated 3D object synthesis and manipulation.

2025-10-24T13:08:15Z accepted to SIGGRAPH Asia; Project page: https://chenhonghua.github.io/MyProjects/ArtiLatent/ Honghua Chen Yushi Lan Yongwei Chen Xingang Pan http://arxiv.org/abs/2510.21404v1 PC-NCLaws: Physics-Embedded Conditional Neural Constitutive Laws for Elastoplastic Materials 2025-10-24T12:47:56Z

While data-driven methods offer significant promise for modeling complex materials, they often face challenges in generalizing across diverse physical scenarios and maintaining physical consistency. To address these limitations, we propose a generalizable framework called Physics-Embedded Conditional Neural Constitutive Laws for Elastoplastic Materials, which combines partial differential equations with neural networks. Specifically, the model employs two separate neural networks to model elastic and plastic constitutive laws. Simultaneously, the model incorporates physical parameters as conditional inputs and is trained on comprehensive datasets encompassing multiple scenarios with varying physical parameters, thereby enabling generalization across different properties without requiring retraining for each individual case. Furthermore, the differentiable architecture of our model, combined with its explicit parameter inputs, enables the inverse estimation of physical parameters from observed motion sequences. This capability extends our framework to objects with unknown or unmeasured properties. Experimental results demonstrate state-of-the-art performance in motion reconstruction, robust long-term prediction, geometry generalization, and precise parameter estimation for elastoplastic materials, highlighting its versatility as a unified simulator and inverse analysis tool.

2025-10-24T12:47:56Z 11 pages Pacific Graphics 2025 Conference Papers Xueguang Xie Shu Yan Shiwen Jia Siyu Yang Aimin Hao Yang Gao Peng Yu 10.2312/pg.20251266 http://arxiv.org/abs/2412.11224v4 GenLit: Reformulating Single-Image Relighting as Video Generation 2025-10-23T17:11:15Z

Manipulating the illumination of a 3D scene within a single image represents a fundamental challenge in computer vision and graphics. This problem has traditionally been addressed using inverse rendering techniques, which involve explicit 3D asset reconstruction and costly ray-tracing simulations. Meanwhile, recent advancements in visual foundation models suggest that a new paradigm could soon be possible -- one that replaces explicit physical models with networks that are trained on large amounts of image and video data. In this paper, we exploit the implicit scene understanding of a video diffusion model, particularly Stable Video Diffusion, to relight a single image. We introduce GenLit, a framework that distills the ability of a graphics engine to perform light manipulation into a video-generation model, enabling users to directly insert and manipulate a point light in the 3D world within a given image and generate results directly as a video sequence. We find that a model fine-tuned on only a small synthetic dataset generalizes to real-world scenes, enabling single-image relighting with plausible and convincing shadows and inter-reflections. Our results highlight the ability of video foundation models to capture rich information about lighting, material, and shape, and our findings indicate that such models, with minimal training, can be used to perform relighting without explicit asset reconstruction or ray-tracing. Project page: https://genlit.is.tue.mpg.de/

2024-12-15T15:40:40Z Shrisha Bharadwaj Haiwen Feng Giorgio Becherini Victoria Fernandez Abrevaya Michael J. Black http://arxiv.org/abs/2510.20738v1 Optimizing Feature Ordering in Radar Charts for Multi-Profile Comparison 2025-10-23T16:56:32Z

Radar charts are widely used to visualize multivariate data and compare multiple profiles across features. However, the visual clarity of radar charts can be severely compromised when feature values alternate drastically in magnitude around the circle, causing areas to collapse, which misrepresents relative differences. In the present work we introduce a permutation optimization strategy that reorders features to minimize polygon "spikiness" across multiple profiles simultaneously. The method is combinatorial (exhaustive search) for moderate numbers of features and uses a lexicographic minimax criterion that first considers overall smoothness (mean jump) and then the largest single jump as a tie-breaker. This preserves more global information and produces visually balanced arrangements. We discuss complexity, practical bounds, and relations to existing approaches that either change the visualization (e.g., OrigamiPlot) or learn orderings (e.g., Versatile Ordering Network). An example with two profiles and $p=6$ features (before/after ordering) illustrates the qualitative improvement. Keywords: data visualization, radar charts, combinatorial optimization, minimax optimization, feature ordering
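
The ordering criterion described in this abstract can be illustrated with a small sketch. This is only one reading of the abstract, not the author's code: the function names `jump_scores` and `best_order` are hypothetical, a "jump" is assumed to be the absolute difference between circularly adjacent feature values, and feature 0 is pinned to the first slot on the assumption that rotations of the same circular order are equivalent.

```python
from itertools import permutations


def jump_scores(profiles, order):
    """Return (mean jump, largest jump) over all profiles for a circular order.

    `profiles` is a list of equal-length value lists; `order` is a tuple of
    feature indices. A jump is |value - next value| around the closed polygon.
    """
    p = len(order)
    jumps = []
    for row in profiles:
        for i in range(p):
            a = row[order[i]]
            b = row[order[(i + 1) % p]]  # wrap around to close the polygon
            jumps.append(abs(a - b))
    return (sum(jumps) / len(jumps), max(jumps))


def best_order(profiles):
    """Exhaustively search feature orders, comparing scores lexicographically.

    Python compares tuples element by element, so the mean jump decides first
    and the largest single jump only breaks ties.
    """
    p = len(profiles[0])
    best_score, best = None, None
    # Assumption: pin feature 0 in place, since rotating a circular order
    # does not change the polygon. This leaves (p - 1)! candidates.
    for rest in permutations(range(1, p)):
        order = (0,) + rest
        score = jump_scores(profiles, order)
        if best_score is None or score < best_score:
            best_score, best = score, order
    return best, best_score


if __name__ == "__main__":
    profiles = [[1, 9, 2, 8, 3, 7]]  # made-up values that alternate sharply
    print(jump_scores(profiles, tuple(range(6))))
    print(best_order(profiles))
```

With real-valued data, exact ties on the mean jump are rare, so a practical version would probably round or use a tolerance before falling back to the tie-breaker. The search cost grows factorially, which is consistent with the abstract restricting exhaustive search to moderate numbers of features.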

2025-10-23T16:56:32Z Albert Dorador http://arxiv.org/abs/2510.24764v1 Comparative Analysis of Procedural Planet Generators 2025-10-23T15:59:57Z

This paper presents the development of two distinct real-time procedural planet generators within the Godot engine: one employing Fractal Brownian Motion (FBM) with Perlin Noise, and another adapting Minecraft-inspired layered noise techniques. We detail their implementation, including a quadtree-based Level of Detail (LOD) system and solutions for planetary mesh generation. A comparative user study (N=15) was conducted where participants explored unique instances generated by our two algorithms alongside two existing procedural planet projects.

2025-10-23T15:59:57Z Published in the Proceedings of GAME-ON 2025 GAMEON 2025, 2025, pp. 15-19 Manuel Zechmann Helmut Hlavacs http://arxiv.org/abs/2510.21864v1 LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representation 2025-10-23T10:09:24Z

Speech-driven 3D facial animation has attracted increasing interest owing to its potential to generate expressive and temporally synchronized digital humans. While recent works have begun to explore emotion-aware animation, they still depend on explicit one-hot encodings to represent identity and emotion with given emotion and identity labels, which limits their ability to generalize to unseen speakers. Moreover, the emotional cues inherently present in speech are often neglected, limiting the naturalness and adaptability of generated animations. In this work, we propose LSF-Animation, a novel framework that eliminates the reliance on explicit emotion and identity feature representations. Specifically, LSF-Animation implicitly extracts emotion information from speech and captures the identity features from a neutral facial mesh, enabling improved generalization to unseen speakers and emotional states without requiring manual labels. Furthermore, we introduce a Hierarchical Interaction Fusion Block (HIFB), which employs a fusion token to integrate dual transformer features and more effectively combine emotional, motion-related and identity-related cues. Extensive experiments conducted on the 3DMEAD dataset demonstrate that our method surpasses recent state-of-the-art approaches in terms of emotional expressiveness, identity generalization, and animation realism. The source code will be released at: https://github.com/Dogter521/LSF-Animation.

2025-10-23T10:09:24Z Xin Lu Chuanqing Zhuang Chenxi Jin Zhengda Lu Yiqun Wang Wu Liu Jun Xiao http://arxiv.org/abs/2510.20050v1 Interactive Hypergraph Visual Analytics for Exploring Large and Complex Image Collections 2025-10-22T21:59:04Z

Analyzing large complex image collections in domains like forensics, accident investigation, or social media analysis involves interpreting intricate, overlapping relationships among images. Traditional clustering and classification methods fail to adequately represent these complex relationships, particularly when labeled data or suitable pre-trained models are unavailable. Hypergraphs effectively capture overlapping relationships, but visualization is essential to translate their complexity into information and insights for domain experts. We propose an interactive visual analytics approach specifically designed for the construction, exploration, and analysis of hypergraphs on large-scale complex image collections. Our core contributions include: (1) a scalable pipeline for constructing hypergraphs directly from raw image data, including a similarity measure to evaluate constructed hypergraphs against a ground truth, (2) interactive visualization techniques that integrate spatial hypergraph representations, interactive grids, and matrix visualizations, enabling users to dynamically explore and interpret relationships without becoming overwhelmed and disoriented, and (3) practical insights on how domain experts can effectively use the application, based on evaluation with real-life image collections. Our results demonstrate that our visual analytics approach facilitates iterative exploration, enabling domain experts to efficiently derive insights from image collections containing tens of thousands of images.

2025-10-22T21:59:04Z Floris Gisolf Zeno J. M. H. Geradts Marcel Worring http://arxiv.org/abs/2510.20027v1 Extreme Views: 3DGS Filter for Novel View Synthesis from Out-of-Distribution Camera Poses 2025-10-22T21:09:16Z

When viewing a 3D Gaussian Splatting (3DGS) model from camera positions significantly outside the training data distribution, substantial visual noise commonly occurs. These artifacts result from the lack of training data in these extrapolated regions, leading to uncertain density, color, and geometry predictions from the model. To address this issue, we propose a novel real-time render-aware filtering method. Our approach leverages sensitivity scores derived from intermediate gradients, explicitly targeting instabilities caused by anisotropic orientations rather than isotropic variance. This filtering method directly addresses the core issue of generative uncertainty, allowing 3D reconstruction systems to maintain high visual fidelity even when users freely navigate outside the original training viewpoints. Experimental evaluation demonstrates that our method substantially improves visual quality, realism, and consistency compared to existing Neural Radiance Field (NeRF)-based approaches such as BayesRays. Critically, our filter seamlessly integrates into existing 3DGS rendering pipelines in real-time, unlike methods that require extensive post-hoc retraining or fine-tuning. Code and results at https://damian-bowness.github.io/EV3DGS

2025-10-22T21:09:16Z Damian Bowness Charalambos Poullis http://arxiv.org/abs/2503.14475v2 Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation 2025-10-22T16:31:43Z

The field of Novel View Synthesis has been revolutionized by 3D Gaussian Splatting (3DGS), which enables high-quality scene reconstruction that can be rendered in real time. 3DGS-based techniques typically suffer from high GPU memory and disk storage requirements, which limit their practical application on consumer-grade devices. We propose Opti3DGS, a novel frequency-modulated coarse-to-fine optimization framework that aims to minimize the number of Gaussian primitives used to represent a scene, thus reducing memory and storage demands. Opti3DGS leverages image frequency modulation, initially enforcing a coarse scene representation and progressively refining it by modulating frequency details in the training images. On the baseline 3DGS, we demonstrate an average reduction of 62% in Gaussians, a 40% reduction in the training GPU memory requirements and a 20% reduction in optimization time without sacrificing the visual quality. Furthermore, we show that our method integrates seamlessly with many 3DGS-based techniques, consistently reducing the number of Gaussian primitives while maintaining, and often improving, visual quality. Additionally, Opti3DGS inherently produces a level-of-detail scene representation at no extra cost, a natural byproduct of the optimization pipeline. Results and code will be made publicly available.
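
The abstract does not say how the training images are frequency-modulated, so the following is only a generic coarse-to-fine sketch under stated assumptions: a Gaussian low-pass filter stands in for the modulation, and its strength decays linearly to zero over the first half of training. The names `sigma_schedule` and `low_pass`, the decay fraction, and the maximum sigma are all hypothetical and are not taken from Opti3DGS.

```python
import math


def sigma_schedule(step, total_steps, sigma_max=4.0, decay_fraction=0.5):
    """Blur strength for a training step: sigma_max at step 0, 0 once decayed.

    Assumption: linear decay that finishes after `decay_fraction` of training.
    """
    progress = step / (decay_fraction * total_steps)
    return sigma_max * max(0.0, 1.0 - progress)


def gaussian_kernel(sigma):
    """Normalised 1D Gaussian kernel; sigma <= 0 yields the identity kernel."""
    if sigma <= 0:
        return [1.0]
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_row(row, kernel):
    """Convolve one row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += weight * row[j]
        out.append(acc)
    return out


def low_pass(image, sigma):
    """Separable Gaussian blur of a 2D list of floats (rows, then columns)."""
    kernel = gaussian_kernel(sigma)
    rows = [blur_row(r, kernel) for r in image]
    cols = [blur_row(list(c), kernel) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    image = [[float((i + j) % 2) for j in range(8)] for i in range(8)]
    for step in (0, 25, 50):
        target = low_pass(image, sigma_schedule(step, total_steps=100))
        print(step, [round(v, 2) for v in target[4][:4]])
```

In such a scheme the optimizer would be supervised with `low_pass(image, sigma)` early on, so it only needs to explain coarse structure, and with the unfiltered image once sigma reaches zero.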

2025-03-18T17:49:01Z Umar Farooq Jean-Yves Guillemaut Adrian Hilton Marco Volino 10.1145/3756863.3769707 http://arxiv.org/abs/2510.21840v1 Improving the Physics of Video Generation with VJEPA-2 Reward Signal 2025-10-22T13:40:38Z

This is a short technical report describing the winning entry of the PhysicsIQ Challenge, presented at the Perception Test Workshop at ICCV 2025. State-of-the-art video generative models exhibit severely limited physical understanding, and often produce implausible videos. The Physics IQ benchmark has shown that visual realism does not imply physics understanding. Yet, intuitive physics understanding has been shown to emerge from SSL pretraining on natural videos. In this report, we investigate whether we can leverage SSL-based video world models to improve the physics plausibility of video generative models. In particular, we build on top of the state-of-the-art video generative model MAGI-1 and couple it with the recently introduced Video Joint Embedding Predictive Architecture 2 (VJEPA-2) to guide the generation process. We show that by leveraging VJEPA-2 as a reward signal, we can improve the physics plausibility of state-of-the-art video generative models by ~6%.

2025-10-22T13:40:38Z 2 pages Winning entry of the ICCV 2025 Physics IQ Challenge Jianhao Yuan Xiaofeng Zhang Felix Friedrich Nicolas Beltran-Velez Melissa Hall Reyhane Askari-Hemmat Xiaochuang Han Nicolas Ballas Michal Drozdzal Adriana Romero-Soriano http://arxiv.org/abs/2405.14882v2 LookUp3D: Data-Driven 3D Scanning 2025-10-22T13:34:05Z

High-speed, high-resolution, and accurate 3D scanning would open doors to many new applications in graphics, robotics, science, and medicine by enabling the accurate scanning of deformable objects during interactions. Past attempts to use structured light, time-of-flight, and stereo in high-speed settings have usually required tradeoffs in resolution or accuracy. In this paper, we introduce a method that enables, for the first time, 3D scanning at 450 frames per second at 1 Megapixel, or 1,450 frames per second at 0.4 Megapixel in an environment with controlled lighting. The key idea is to use a per-pixel lookup table that maps colors to depths, which is built using a linear stage. Imperfections, such as lens distortion and sensor defects, are baked into the calibration. We describe our method and test it on a novel hardware prototype. We compare the system with both ground-truth geometry and commercially available dynamic sensors like the Microsoft Kinect and Intel Realsense. Our results show the system acquiring geometry of objects undergoing high-speed deformations and oscillations and demonstrate the ability to recover physical properties from the reconstructions.
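
The per-pixel lookup idea can be sketched in a few lines. This is a toy illustration, not the paper's pipeline: it assumes calibration frames are captured at known depths (for example by moving a target on a linear stage), stores every observed color per pixel, and answers queries with the nearest stored color. The names `build_lut` and `lookup_depth`, the data layout, and the nearest-neighbour matching rule are assumptions made for this example.

```python
def build_lut(calibration_frames):
    """Build a per-pixel table of (color, depth) samples.

    `calibration_frames` is a list of (depth, image) pairs, where
    image[y][x] is an (r, g, b) tuple observed at that known depth.
    """
    lut = {}
    for depth, image in calibration_frames:
        for y, row in enumerate(image):
            for x, color in enumerate(row):
                lut.setdefault((x, y), []).append((color, depth))
    return lut


def lookup_depth(lut, x, y, color):
    """Return the calibrated depth whose stored color is closest to `color`.

    Each pixel has its own table, so per-pixel quirks of the optics and
    sensor are absorbed by the calibration rather than modelled explicitly.
    """
    def squared_distance(entry):
        stored_color, _ = entry
        return sum((a - b) ** 2 for a, b in zip(stored_color, color))

    _, depth = min(lut[(x, y)], key=squared_distance)
    return depth


if __name__ == "__main__":
    # Made-up calibration data: a 1x2 image captured at three stage positions.
    frames = [
        (100.0, [[(255, 0, 0), (0, 255, 0)]]),
        (110.0, [[(200, 50, 0), (0, 200, 50)]]),
        (120.0, [[(150, 100, 0), (0, 150, 100)]]),
    ]
    lut = build_lut(frames)
    print(lookup_depth(lut, 0, 0, (205, 45, 0)))
```

A real system would need a much denser table and probably interpolation between neighbouring entries; the sketch only shows why a pure table lookup is cheap enough to run at high frame rates.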

2024-04-05T07:08:20Z Giancarlo Pereira, Yidan Gao, and Yurii Piadyk are joint first authors with equal contribution. 11 pages of main paper, 9 pages of supplemental text (all combined into a single document) Giancarlo Pereira Yidan Gao Yurii Piadyk David Fouhey Claudio T Silva Daniele Panozzo http://arxiv.org/abs/2510.19347v1 A New Type of Adversarial Examples 2025-10-22T08:14:11Z

Most machine learning models are vulnerable to adversarial examples, which raises security concerns about these models. Adversarial examples are crafted by applying subtle but intentionally worst-case modifications to examples from the dataset, leading the model to output a different answer from the original example. In this paper, adversarial examples are formed in exactly the opposite manner: they are significantly different from the original examples but result in the same answer. We propose a novel set of algorithms to produce such adversarial examples, including the negative iterative fast gradient sign method (NI-FGSM) and the negative iterative fast gradient method (NI-FGM), along with their momentum variants: the negative momentum iterative fast gradient sign method (NMI-FGSM) and the negative momentum iterative fast gradient method (NMI-FGM). Adversarial examples constructed by these methods could be used to perform an attack on machine learning systems on certain occasions. Moreover, our results show that the adversarial examples are not merely distributed in the neighbourhood of the examples from the dataset; instead, they are distributed extensively in the sample space.

2025-10-22T08:14:11Z Xingyang Nie Guojie Xiao Su Pan Biao Wang Huilin Ge Tao Fang http://arxiv.org/abs/2510.19009v1 Visually Comparing Graph Vertex Ordering Algorithms through Geometrical and Topological Approaches 2025-10-21T18:48:27Z

Graph vertex ordering is widely employed in spatial data analysis, especially in urban analytics, where street graphs serve as spatial discretization for modeling and simulation. It is also crucial for visualization, as many methods require vertices to be arranged in a well-defined order to reveal non-trivial patterns. The goal of vertex ordering methods is to preserve neighborhood relations, but the structural complexity of real-world graphs often introduces distortions. Comparing different ordering methods is therefore essential to identify the most suitable one for each application. Existing metrics for assessing spatial vertex ordering typically focus on global quality, which hinders the identification of localized distortions. Visual evaluation is particularly valuable, as it allows analysts to compare methods within a single visualization, assess distortions, identify anomalous regions, and, in urban contexts, explain spatial inconsistencies. This work presents a visualization-assisted tool for assessing vertex ordering techniques, with a focus on urban analytics. We evaluate geometric and topological ordering approaches using urban street graphs. The visual tool integrates existing and newly proposed metrics, validated through experiments on data from multiple cities. Results demonstrate that the proposed methodology effectively supports users in selecting suitable vertex ordering techniques, tuning hyperparameters, and identifying regions with high ordering distortions.
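
To make "preserving neighborhood relations" concrete, the sketch below computes one simple, generic measure: how far apart the endpoints of each edge land in the linear order, summarised both globally and per vertex. It is not one of the metrics from this paper; the function names and the choice of the worst incident edge as a local distortion score are assumptions made for illustration.

```python
def edge_stretch(order, edges):
    """Distance in the linear order between the endpoints of every edge."""
    position = {vertex: index for index, vertex in enumerate(order)}
    return [abs(position[u] - position[v]) for u, v in edges]


def mean_stretch(order, edges):
    """Global summary: average stretch over all edges (1.0 is ideal for a path)."""
    stretches = edge_stretch(order, edges)
    return sum(stretches) / len(stretches)


def per_vertex_distortion(order, edges):
    """Local view: for each vertex, the largest stretch among its incident edges.

    A global mean can hide a few badly placed vertices; a per-vertex score
    is the kind of value that could be mapped back onto the street graph.
    """
    position = {vertex: index for index, vertex in enumerate(order)}
    worst = {vertex: 0 for vertex in order}
    for u, v in edges:
        stretch = abs(position[u] - position[v])
        worst[u] = max(worst[u], stretch)
        worst[v] = max(worst[v], stretch)
    return worst


if __name__ == "__main__":
    edges = [("a", "b"), ("b", "c"), ("c", "d")]  # a small path graph
    print(mean_stretch(["a", "b", "c", "d"], edges))
    print(per_vertex_distortion(["a", "c", "b", "d"], edges))
```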

2025-10-21T18:48:27Z Karelia Salinas Victor Barella Thales Viera Luis Gustavo Nonato