https://arxiv.org/api/EAMdmv0pKVfJlV+znyNoacDP+2c
2026-06-25T23:27:39Z
9383138015

http://arxiv.org/abs/2510.21682v1
WorldGrow: Generating Infinite 3D World
2025-10-24T17:39:52Z
We tackle the challenge of generating the infinitely extendable 3D world -- large, continuous environments with coherent geometry and realistic appearance. Existing methods face key challenges: 2D-lifting approaches suffer from geometric and appearance inconsistencies across views, 3D implicit representations are hard to scale up, and current 3D foundation models are mostly object-centric, limiting their applicability to scene-level generation. Our key insight is leveraging strong generation priors from pre-trained 3D models for structured scene block generation. To this end, we propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity. Evaluated on the large-scale 3D-FRONT dataset, WorldGrow achieves SOTA performance in geometry reconstruction, while uniquely supporting infinite scene generation with photorealistic and structurally consistent outputs.
These results highlight its capability for constructing large-scale virtual environments and potential for building future world models.
2025-10-24T17:39:52Z
Project page: https://world-grow.github.io/ Code: https://github.com/world-grow/WorldGrow
Sikuang Li, Chen Yang, Jiemin Fang, Taoran Yi, Jia Lu, Jiazhong Cen, Lingxi Xie, Wei Shen, Qi Tian

http://arxiv.org/abs/2510.21654v1
Group Inertial Poser: Multi-Person Pose and Global Translation from Sparse Inertial Sensors and Ultra-Wideband Ranging
2025-10-24T17:11:50Z
Tracking human full-body motion using sparse wearable inertial measurement units (IMUs) overcomes the limitations of occlusion and instrumentation of the environment inherent in vision-based approaches. However, purely IMU-based tracking compromises translation estimates and accurate relative positioning between individuals, as inertial cues are inherently self-referential and provide no direct spatial reference for others. In this paper, we present a novel approach for robustly estimating body poses and global translation for multiple individuals by leveraging the distances between sparse wearable sensors - both on each individual and across multiple individuals. Our method Group Inertial Poser estimates these absolute distances between pairs of sensors from ultra-wideband ranging (UWB) and fuses them with inertial observations as input into structured state-space models to integrate temporal motion patterns for precise 3D pose estimation. Our novel two-step optimization further leverages the estimated distances for accurately tracking people's global trajectories through the world. We also introduce GIP-DB, the first IMU+UWB dataset for two-person tracking, which comprises 200 minutes of motion recordings from 14 participants. In our evaluation, Group Inertial Poser outperforms previous state-of-the-art methods in accuracy and robustness across synthetic and real-world data, showing the promise of IMU+UWB-based multi-human motion capture in the wild.
Code, models, dataset: https://github.com/eth-siplab/GroupInertialPoser
2025-10-24T17:11:50Z
Accepted by ICCV 2025, Code: https://github.com/eth-siplab/GroupInertialPoser
Ying Xue, Jiaxi Jiang, Rayan Armani, Dominik Hollidt, Yi-Chi Liao, Christian Holz

http://arxiv.org/abs/2510.21432v1
ArtiLatent: Realistic Articulated 3D Object Generation via Structured Latents
2025-10-24T13:08:15Z
We propose ArtiLatent, a generative framework that synthesizes human-made 3D objects with fine-grained geometry, accurate articulation, and realistic appearance. Our approach jointly models part geometry and articulation dynamics by embedding sparse voxel representations and associated articulation properties, including joint type, axis, origin, range, and part category, into a unified latent space via a variational autoencoder. A latent diffusion model is then trained over this space to enable diverse yet physically plausible sampling. To reconstruct photorealistic 3D shapes, we introduce an articulation-aware Gaussian decoder that accounts for articulation-dependent visibility changes (e.g., revealing the interior of a drawer when opened). By conditioning appearance decoding on articulation state, our method assigns plausible texture features to regions that are typically occluded in static poses, significantly improving visual realism across articulation configurations. Extensive experiments on furniture-like objects from PartNet-Mobility and ACD datasets demonstrate that ArtiLatent outperforms existing approaches in geometric consistency and appearance fidelity.
Our framework provides a scalable solution for articulated 3D object synthesis and manipulation.
2025-10-24T13:08:15Z
accepted to SIGGRAPH Asia; Project page: https://chenhonghua.github.io/MyProjects/ArtiLatent/
Honghua Chen, Yushi Lan, Yongwei Chen, Xingang Pan

http://arxiv.org/abs/2510.21404v1
PC-NCLaws: Physics-Embedded Conditional Neural Constitutive Laws for Elastoplastic Materials
2025-10-24T12:47:56Z
While data-driven methods offer significant promise for modeling complex materials, they often face challenges in generalizing across diverse physical scenarios and maintaining physical consistency. To address these limitations, we propose a generalizable framework called Physics-Embedded Conditional Neural Constitutive Laws for Elastoplastic Materials, which combines the partial differential equations with neural networks. Specifically, the model employs two separate neural networks to model elastic and plastic constitutive laws. Simultaneously, the model incorporates physical parameters as conditional inputs and is trained on comprehensive datasets encompassing multiple scenarios with varying physical parameters, thereby enabling generalization across different properties without requiring retraining for each individual case. Furthermore, the differentiable architecture of our model, combined with its explicit parameter inputs, enables the inverse estimation of physical parameters from observed motion sequences. This capability extends our framework to objects with unknown or unmeasured properties.
Experimental results demonstrate state-of-the-art performance in motion reconstruction, robust long-term prediction, geometry generalization, and precise parameter estimation for elastoplastic materials, highlighting its versatility as a unified simulator and inverse analysis tool.
2025-10-24T12:47:56Z
11 pages
Pacific Graphics 2025 Conference Papers
Xueguang Xie, Shu Yan, Shiwen Jia, Siyu Yang, Aimin Hao, Yang Gao, Peng Yu
10.2312/pg.20251266

http://arxiv.org/abs/2412.11224v4
GenLit: Reformulating Single-Image Relighting as Video Generation
2025-10-23T17:11:15Z
Manipulating the illumination of a 3D scene within a single image represents a fundamental challenge in computer vision and graphics. This problem has traditionally been addressed using inverse rendering techniques, which involve explicit 3D asset reconstruction and costly ray-tracing simulations. Meanwhile, recent advancements in visual foundation models suggest that a new paradigm could soon be possible -- one that replaces explicit physical models with networks that are trained on large amounts of image and video data. In this paper, we exploit the implicit scene understanding of a video diffusion model, particularly Stable Video Diffusion, to relight a single image. We introduce GenLit, a framework that distills the ability of a graphics engine to perform light manipulation into a video-generation model, enabling users to directly insert and manipulate a point light in the 3D world within a given image and generate results directly as a video sequence. We find that a model fine-tuned on only a small synthetic dataset generalizes to real-world scenes, enabling single-image relighting with plausible and convincing shadows and inter-reflections. Our results highlight the ability of video foundation models to capture rich information about lighting, material, and shape, and our findings indicate that such models, with minimal training, can be used to perform relighting without explicit asset reconstruction or ray-tracing.
Project page: https://genlit.is.tue.mpg.de/.
2024-12-15T15:40:40Z
Shrisha Bharadwaj, Haiwen Feng, Giorgio Becherini, Victoria Fernandez Abrevaya, Michael J. Black

http://arxiv.org/abs/2510.20738v1
Optimizing Feature Ordering in Radar Charts for Multi-Profile Comparison
2025-10-23T16:56:32Z
Radar charts are widely used to visualize multivariate data and compare multiple profiles across features. However, the visual clarity of radar charts can be severely compromised when feature values alternate drastically in magnitude around the circle, causing areas to collapse, which misrepresents relative differences. In the present work we introduce a permutation optimization strategy that reorders features to minimize polygon "spikiness" across multiple profiles simultaneously. The method is combinatorial (exhaustive search) for moderate numbers of features and uses a lexicographic minimax criterion that first considers overall smoothness (mean jump) and then the largest single jump as a tie-breaker. This preserves more global information and produces visually balanced arrangements. We discuss complexity, practical bounds, and relations to existing approaches that either change the visualization (e.g., OrigamiPlot) or learn orderings (e.g., Versatile Ordering Network). An example with two profiles and $p=6$ features (before/after ordering) illustrates the qualitative improvement.
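The exhaustive search with a lexicographic (mean jump, largest jump) criterion described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the definition of a "jump" (absolute difference between cyclically adjacent feature values, pooled over all profiles), the function names `jump_stats` and `best_order`, and the sample profiles are all assumptions made for the example.

```python
from itertools import permutations


def jump_stats(profiles, order):
    """Return (mean jump, largest jump) for a cyclic feature order.

    A "jump" is assumed here to be the absolute difference between two
    features that sit next to each other around the radar chart.
    """
    p = len(order)
    jumps = []
    for row in profiles:
        for i in range(p):
            a = row[order[i]]
            b = row[order[(i + 1) % p]]  # wrap around the circle
            jumps.append(abs(a - b))
    return (sum(jumps) / len(jumps), max(jumps))


def best_order(profiles):
    """Exhaustively search feature orders; feature 0 is pinned to drop rotations."""
    p = len(profiles[0])
    best_score, best_perm = None, None
    for rest in permutations(range(1, p)):
        order = (0,) + rest
        score = jump_stats(profiles, order)
        # Tuple comparison is lexicographic: mean jump first, largest jump as tie-breaker.
        if best_score is None or score < best_score:
            best_score, best_perm = score, order
    return best_perm, best_score


# Hypothetical data: two profiles, p = 6 features, values alternating high/low.
profiles = [
    [0.9, 0.1, 0.8, 0.2, 0.7, 0.3],
    [0.8, 0.2, 0.9, 0.1, 0.6, 0.4],
]
order, (mean_jump, max_jump) = best_order(profiles)
print(order, mean_jump, max_jump)
```

Pinning one feature leaves (p-1)! candidate orders, which is 120 for p = 6; this factorial growth is why an exhaustive search of this kind is only practical for moderate numbers of features.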
Keywords: data visualization, radar charts, combinatorial optimization, minimax optimization, feature ordering
2025-10-23T16:56:32Z
Albert Dorador

http://arxiv.org/abs/2510.24764v1
Comparative Analysis of Procedural Planet Generators
2025-10-23T15:59:57Z
This paper presents the development of two distinct real-time procedural planet generators within the Godot engine: one employing Fractal Brownian Motion (FBM) with Perlin Noise, and another adapting Minecraft-inspired layered noise techniques. We detail their implementation, including a quadtree-based Level of Detail (LOD) system and solutions for planetary mesh generation. A comparative user study (N=15) was conducted where participants explored unique instances generated by our two algorithms alongside two existing procedural planet projects.
2025-10-23T15:59:57Z
Published in the Proceedings of GAME-ON 2025
GAMEON 2025, 2025, pp. 15-19
Manuel Zechmann, Helmut Hlavacs

http://arxiv.org/abs/2510.21864v1
LSF-Animation: Label-Free Speech-Driven Facial Animation via Implicit Feature Representation
2025-10-23T10:09:24Z
Speech-driven 3D facial animation has attracted increasing interest owing to its potential to generate expressive and temporally synchronized digital humans. While recent works have begun to explore emotion-aware animation, they still depend on explicit one-hot encodings to represent identity and emotion with given emotion and identity labels, which limits their ability to generalize to unseen speakers. Moreover, the emotional cues inherently present in speech are often neglected, limiting the naturalness and adaptability of generated animations. In this work, we propose LSF-Animation, a novel framework that eliminates the reliance on explicit emotion and identity feature representations.
Specifically, LSF-Animation implicitly extracts emotion information from speech and captures the identity features from a neutral facial mesh, enabling improved generalization to unseen speakers and emotional states without requiring manual labels. Furthermore, we introduce a Hierarchical Interaction Fusion Block (HIFB), which employs a fusion token to integrate dual transformer features and more effectively integrate emotional, motion-related and identity-related cues. Extensive experiments conducted on the 3DMEAD dataset demonstrate that our method surpasses recent state-of-the-art approaches in terms of emotional expressiveness, identity generalization, and animation realism. The source code will be released at: https://github.com/Dogter521/LSF-Animation.
2025-10-23T10:09:24Z
Xin Lu, Chuanqing Zhuang, Chenxi Jin, Zhengda Lu, Yiqun Wang, Wu Liu, Jun Xiao

http://arxiv.org/abs/2510.20050v1
Interactive Hypergraph Visual Analytics for Exploring Large and Complex Image Collections
2025-10-22T21:59:04Z
Analyzing large complex image collections in domains like forensics, accident investigation, or social media analysis involves interpreting intricate, overlapping relationships among images. Traditional clustering and classification methods fail to adequately represent these complex relationships, particularly when labeled data or suitable pre-trained models are unavailable. Hypergraphs effectively capture overlapping relationships, but to translate their complexity into information and insights for domain expert users, visualization is essential. We propose an interactive visual analytics approach specifically designed for the construction, exploration, and analysis of hypergraphs on large-scale complex image collections.
Our core contributions include: (1) a scalable pipeline for constructing hypergraphs directly from raw image data, including a similarity measure to evaluate constructed hypergraphs against a ground truth, (2) interactive visualization techniques that integrate spatial hypergraph representations, interactive grids, and matrix visualizations, enabling users to dynamically explore and interpret relationships without becoming overwhelmed and disoriented, and (3) practical insights on how domain experts can effectively use the application, based on evaluation with real-life image collections. Our results demonstrate that our visual analytics approach facilitates iterative exploration, enabling domain experts to efficiently derive insights from image collections containing tens of thousands of images.
2025-10-22T21:59:04Z
Floris Gisolf, Zeno J. M. H. Geradts, Marcel Worring

http://arxiv.org/abs/2510.20027v1
Extreme Views: 3DGS Filter for Novel View Synthesis from Out-of-Distribution Camera Poses
2025-10-22T21:09:16Z
When viewing a 3D Gaussian Splatting (3DGS) model from camera positions significantly outside the training data distribution, substantial visual noise commonly occurs. These artifacts result from the lack of training data in these extrapolated regions, leading to uncertain density, color, and geometry predictions from the model.
To address this issue, we propose a novel real-time render-aware filtering method. Our approach leverages sensitivity scores derived from intermediate gradients, explicitly targeting instabilities caused by anisotropic orientations rather than isotropic variance. This filtering method directly addresses the core issue of generative uncertainty, allowing 3D reconstruction systems to maintain high visual fidelity even when users freely navigate outside the original training viewpoints.
Experimental evaluation demonstrates that our method substantially improves visual quality, realism, and consistency compared to existing Neural Radiance Field (NeRF)-based approaches such as BayesRays. Critically, our filter seamlessly integrates into existing 3DGS rendering pipelines in real-time, unlike methods that require extensive post-hoc retraining or fine-tuning.
Code and results at https://damian-bowness.github.io/EV3DGS
2025-10-22T21:09:16Z
Damian Bowness, Charalambos Poullis

http://arxiv.org/abs/2503.14475v2
Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation
2025-10-22T16:31:43Z
The field of Novel View Synthesis has been revolutionized by 3D Gaussian Splatting (3DGS), which enables high-quality scene reconstruction that can be rendered in real-time. 3DGS-based techniques typically suffer from high GPU memory and disk storage requirements, which limits their practical application on consumer-grade devices. We propose Opti3DGS, a novel frequency-modulated coarse-to-fine optimization framework that aims to minimize the number of Gaussian primitives used to represent a scene, thus reducing memory and storage demands. Opti3DGS leverages image frequency modulation, initially enforcing a coarse scene representation and progressively refining it by modulating frequency details in the training images. On the baseline 3DGS, we demonstrate an average reduction of 62% in Gaussians, a 40% reduction in the training GPU memory requirements and a 20% reduction in optimization time without sacrificing the visual quality. Furthermore, we show that our method integrates seamlessly with many 3DGS-based techniques, consistently reducing the number of Gaussian primitives while maintaining, and often improving, visual quality. Additionally, Opti3DGS inherently produces a level-of-detail scene representation at no extra cost, a natural byproduct of the optimization pipeline. Results and code will be made publicly available.
2025-03-18T17:49:01Z
Umar Farooq, Jean-Yves Guillemaut, Adrian Hilton, Marco Volino
10.1145/3756863.3769707

http://arxiv.org/abs/2510.21840v1
Improving the Physics of Video Generation with VJEPA-2 Reward Signal
2025-10-22T13:40:38Z
This is a short technical report describing the winning entry of the PhysicsIQ Challenge, presented at the Perception Test Workshop at ICCV 2025.
State-of-the-art video generative models exhibit severely limited physical understanding, and often produce implausible videos. The Physics IQ benchmark has shown that visual realism does not imply physics understanding. Yet, intuitive physics understanding has been shown to emerge from SSL pretraining on natural videos. In this report, we investigate whether we can leverage SSL-based video world models to improve the physics plausibility of video generative models. In particular, we build on top of the state-of-the-art video generative model MAGI-1 and couple it with the recently introduced Video Joint Embedding Predictive Architecture 2 (VJEPA-2) to guide the generation process. We show that by leveraging VJEPA-2 as a reward signal, we can improve the physics plausibility of state-of-the-art video generative models by ~6%.
2025-10-22T13:40:38Z
2 pages
Winning entry of the ICCV 2025 Physics IQ Challenge
Jianhao Yuan, Xiaofeng Zhang, Felix Friedrich, Nicolas Beltran-Velez, Melissa Hall, Reyhane Askari-Hemmat, Xiaochuang Han, Nicolas Ballas, Michal Drozdzal, Adriana Romero-Soriano

http://arxiv.org/abs/2405.14882v2
LookUp3D: Data-Driven 3D Scanning
2025-10-22T13:34:05Z
High speed, high-resolution, and accurate 3D scanning would open doors to many new applications in graphics, robotics, science, and medicine by enabling the accurate scanning of deformable objects during interactions. Past attempts to use structured light, time-of-flight, and stereo in high-speed settings have usually required tradeoffs in resolution or accuracy. In this paper, we introduce a method that enables, for the first time, 3D scanning at 450 frames per second at 1 Megapixel, or 1,450 frames per second at 0.4 Megapixel in an environment with controlled lighting. The key idea is to use a per-pixel lookup table that maps colors to depths, which is built using a linear stage. Imperfections, such as lens distortion and sensor defects, are baked into the calibration.
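The per-pixel lookup idea can be illustrated with a small sketch. It is not the authors' pipeline: it assumes calibration frames captured at known linear-stage positions, a plain nearest-colour match at each pixel with no interpolation, and hypothetical names (`build_lut`, `lookup_depth`) and toy data chosen only for this example.

```python
def build_lut(calib_frames, depths):
    """Build a per-pixel table of (colour, depth) pairs.

    calib_frames[k][y][x] is the RGB colour observed at pixel (y, x)
    while the calibration target sits at depths[k] on the linear stage.
    """
    height = len(calib_frames[0])
    width = len(calib_frames[0][0])
    return [
        [
            [(calib_frames[k][y][x], depths[k]) for k in range(len(depths))]
            for x in range(width)
        ]
        for y in range(height)
    ]


def lookup_depth(lut, image):
    """Map each pixel colour to the depth of its nearest calibrated colour."""
    depth_map = []
    for y, row in enumerate(image):
        out_row = []
        for x, colour in enumerate(row):
            # Squared RGB distance to every calibrated colour at this pixel.
            _, depth = min(
                lut[y][x],
                key=lambda entry: sum((a - b) ** 2 for a, b in zip(entry[0], colour)),
            )
            out_row.append(depth)
        depth_map.append(out_row)
    return depth_map


# Toy 1x2 "sensor" calibrated at three stage positions (units are arbitrary).
depths = [10.0, 20.0, 30.0]
calib_frames = [
    [[(255, 0, 0), (250, 10, 0)]],   # stage at depth 10
    [[(0, 255, 0), (10, 240, 0)]],   # stage at depth 20
    [[(0, 0, 255), (0, 10, 250)]],   # stage at depth 30
]
lut = build_lut(calib_frames, depths)
image = [[(10, 250, 5), (5, 5, 240)]]
depth_map = lookup_depth(lut, image)
print(depth_map)  # [[20.0, 30.0]]
```

Because every pixel keeps its own table, per-pixel effects such as lens distortion are absorbed by the calibration data rather than modelled explicitly, which matches the abstract's remark that imperfections are baked into the calibration.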
We describe our method and test it on a novel hardware prototype. We compare the system with both ground-truth geometry as well as commercially available dynamic sensors like the Microsoft Kinect and Intel Realsense. Our results show the system acquiring geometry of objects undergoing high-speed deformations and oscillations and demonstrate the ability to recover physical properties from the reconstructions.
2024-04-05T07:08:20Z
Giancarlo Pereira, Yidan Gao, and Yurii Piadyk are joint first authors with equal contribution. 11 pages of main paper, 9 pages of supplemental text (all combined into a single document)
Giancarlo Pereira, Yidan Gao, Yurii Piadyk, David Fouhey, Claudio T Silva, Daniele Panozzo

http://arxiv.org/abs/2510.19347v1
A New Type of Adversarial Examples
2025-10-22T08:14:11Z
Most machine learning models are vulnerable to adversarial examples, which raises security concerns about these models. Adversarial examples are crafted by applying subtle but intentionally worst-case modifications to examples from the dataset, leading the model to output a different answer from the original example. In this paper, adversarial examples are formed in exactly the opposite manner: they differ significantly from the original examples but result in the same answer. We propose a novel set of algorithms to produce such adversarial examples, including the negative iterative fast gradient sign method (NI-FGSM) and the negative iterative fast gradient method (NI-FGM), along with their momentum variants: the negative momentum iterative fast gradient sign method (NMI-FGSM) and the negative momentum iterative fast gradient method (NMI-FGM). Adversarial examples constructed by these methods could be used to perform an attack on machine learning systems on certain occasions.
Moreover, our results show that the adversarial examples are not merely distributed in the neighbourhood of the examples from the dataset; instead, they are distributed extensively in the sample space.
2025-10-22T08:14:11Z
Xingyang Nie, Guojie Xiao, Su Pan, Biao Wang, Huilin Ge, Tao Fang

http://arxiv.org/abs/2510.19009v1
Visually Comparing Graph Vertex Ordering Algorithms through Geometrical and Topological Approaches
2025-10-21T18:48:27Z
Graph vertex ordering is widely employed in spatial data analysis, especially in urban analytics, where street graphs serve as spatial discretization for modeling and simulation. It is also crucial for visualization, as many methods require vertices to be arranged in a well-defined order to reveal non-trivial patterns. The goal of vertex ordering methods is to preserve neighborhood relations, but the structural complexity of real-world graphs often introduces distortions. Comparing different ordering methods is therefore essential to identify the most suitable one for each application. Existing metrics for assessing spatial vertex ordering typically focus on global quality, which hinders the identification of localized distortions. Visual evaluation is particularly valuable, as it allows analysts to compare methods within a single visualization, assess distortions, identify anomalous regions, and, in urban contexts, explain spatial inconsistencies. This work presents a visualization-assisted tool for assessing vertex ordering techniques, with a focus on urban analytics. We evaluate geometric and topological ordering approaches using urban street graphs. The visual tool integrates existing and newly proposed metrics, validated through experiments on data from multiple cities.
Results demonstrate that the proposed methodology effectively supports users in selecting suitable vertex ordering techniques, tuning hyperparameters, and identifying regions with high ordering distortions.
2025-10-21T18:48:27Z
Karelia Salinas, Victor Barella, Thales Viera, Luis Gustavo Nonato