https://arxiv.org/api/QL+YB3U17WSgZCC93FATd9SkYSs 2026-06-25T08:46:07Z 9383 1185 15 http://arxiv.org/abs/2512.09164v1 WonderZoom: Multi-Scale 3D World Generation 2025-12-09T22:21:07Z

We present WonderZoom, a novel approach to generating 3D scenes with contents across multiple spatial scales from a single image. Existing 3D world generation models remain limited to single-scale synthesis and cannot produce coherent scene contents at varying granularities. The fundamental challenge is the lack of a scale-aware 3D representation capable of generating and rendering content with largely different spatial sizes. WonderZoom addresses this through two key innovations: (1) scale-adaptive Gaussian surfels for generating and real-time rendering of multi-scale 3D scenes, and (2) a progressive detail synthesizer that iteratively generates finer-scale 3D contents. Our approach enables users to "zoom into" a 3D region and auto-regressively synthesize previously non-existent fine details from landscapes to microscopic features. Experiments demonstrate that WonderZoom significantly outperforms state-of-the-art video and 3D models in both quality and alignment, enabling multi-scale 3D world creation from a single image. We show video results and an interactive viewer of generated multi-scale 3D worlds in https://wonderzoom.github.io/

2025-12-09T22:21:07Z Project website: https://wonderzoom.github.io/ The first two authors contributed equally Jin Cao Hong-Xing Yu Jiajun Wu http://arxiv.org/abs/2507.23485v3 Rational complex Bezier curves 2025-12-09T16:08:30Z

In this paper we develop the formalism of rational complex Bezier curves. This framework is a simple extension of the CAD paradigm, since it describes arc of curves in terms of control polygons and weights, which are extended to complex values. One of the major advantages of this extension is that we may make use of two different groups of projective transformations. Besides the group of projective transformations of the real plane, we have the group of complex projective transformations. This allows us to apply useful transformations like the geometric inversion to curves in design. In addition to this, the use of the complex formulation allows to lower the degree of the curves in some cases. This can be checked using the resultant of two polynomials and provides a simple formula for determining whether a rational cubic curve is a conic or not. Examples of application of the formalism to classical curves are included.

2025-07-31T12:08:50Z 11 pages, 6 figures. Accepted for publication in Journal of Computational and Applied Mathematics A. Canton L. Fernandez-Jambrina M. J. Vazquez-Gallo http://arxiv.org/abs/2512.08500v1 Learning to Control Physically-simulated 3D Characters via Generating and Mimicking 2D Motions 2025-12-09T11:30:56Z

Video data is more cost-effective than motion capture data for learning 3D character motion controllers, yet synthesizing realistic and diverse behaviors directly from videos remains challenging. Previous approaches typically rely on off-the-shelf motion reconstruction techniques to obtain 3D trajectories for physics-based imitation. These reconstruction methods struggle with generalizability, as they either require 3D training data (potentially scarce) or fail to produce physically plausible poses, hindering their application to challenging scenarios like human-object interaction (HOI) or non-human characters. We tackle this challenge by introducing Mimic2DM, a novel motion imitation framework that learns the control policy directly and solely from widely available 2D keypoint trajectories extracted from videos. By minimizing the reprojection error, we train a general single-view 2D motion tracking policy capable of following arbitrary 2D reference motions in physics simulation, using only 2D motion data. The policy, when trained on diverse 2D motions captured from different or slightly different viewpoints, can further acquire 3D motion tracking capabilities by aggregating multiple views. Moreover, we develop a transformer-based autoregressive 2D motion generator and integrate it into a hierarchical control framework, where the generator produces high-quality 2D reference trajectories to guide the tracking policy. We show that the proposed approach is versatile and can effectively learn to synthesize physically plausible and diverse motions across a range of domains, including dancing, soccer dribbling, and animal movements, without any reliance on explicit 3D motion data. Project Website: https://jiann-li.github.io/mimic2dm/

2025-12-09T11:30:56Z Jianan Li Xiao Chen Tao Huang Tien-Tsin Wong http://arxiv.org/abs/2512.08478v1 Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform 2025-12-09T10:54:58Z

Neural rendering, particularly 3D Gaussian Splatting (3DGS), has evolved rapidly and become a key component for building world models. However, existing viewer solutions remain fragmented, heavy, or constrained by legacy pipelines, resulting in high deployment friction and limited support for dynamic content and generative models. In this work, we present Visionary, an open, web-native platform for real-time various Gaussian Splatting and meshes rendering. Built on an efficient WebGPU renderer with per-frame ONNX inference, Visionary enables dynamic neural processing while maintaining a lightweight, "click-to-run" browser experience. It introduces a standardized Gaussian Generator contract, which not only supports standard 3DGS rendering but also allows plug-and-play algorithms to generate or update Gaussians each frame. Such inference also enables us to apply feedforward generative post-processing. The platform further offers a plug in three.js library with a concise TypeScript API for seamless integration into existing web applications. Experiments show that, under identical 3DGS assets, Visionary achieves superior rendering efficiency compared to current Web viewers due to GPU-based primitive sorting. It already supports multiple variants, including MLP-based 3DGS, 4DGS, neural avatars, and style transformation or enhancement networks. By unifying inference and rendering directly in the browser, Visionary significantly lowers the barrier to reproduction, comparison, and deployment of 3DGS-family methods, serving as a unified World Model Carrier for both reconstructive and generative paradigms.

2025-12-09T10:54:58Z Project page: https://visionary-laboratory.github.io/visionary Yuning Gong Yifei Liu Yifan Zhan Muyao Niu Xueying Li Yuanjun Liao Jiaming Chen Yuanyuan Gao Jiaqi Chen Minming Chen Li Zhou Yuning Zhang Wei Wang Xiaoqing Hou Huaxi Huang Shixiang Tang Le Ma Dingwen Zhang Xue Yang Junchi Yan Yanchi Zhang Yinqiang Zheng Xiao Sun Zhihang Zhong http://arxiv.org/abs/2508.21058v3 Mixture of Contexts for Long Video Generation 2025-12-09T10:22:57Z

Long video generation is fundamentally a long context memory problem: models must retain and retrieve salient events across a long range without collapsing or drifting. However, scaling diffusion transformers to generate long-context videos is fundamentally limited by the quadratic cost of self-attention, which makes memory and computation intractable and difficult to optimize for long sequences. We recast long-context video generation as an internal information retrieval task and propose a simple, learnable sparse attention routing module, Mixture of Contexts (MoC), as an effective long-term memory retrieval engine. In MoC, each query dynamically selects a few informative chunks plus mandatory anchors (caption, local windows) to attend to, with causal routing that prevents loop closures. As we scale the data and gradually sparsify the routing, the model allocates compute to salient history, preserving identities, actions, and scenes over minutes of content. Efficiency follows as a byproduct of retrieval (near-linear scaling), which enables practical training and synthesis, and the emergence of memory and consistency at the scale of minutes.

2025-08-28T17:57:55Z Project page: https://primecai.github.io/moc/ Shengqu Cai Ceyuan Yang Lvmin Zhang Yuwei Guo Junfei Xiao Ziyan Yang Yinghao Xu Zhenheng Yang Alan Yuille Leonidas Guibas Maneesh Agrawala Lu Jiang Gordon Wetzstein http://arxiv.org/abs/2508.00937v3 A General Approach to Visualizing Uncertainty in Statistical Graphics 2025-12-09T09:07:57Z

We present a general approach to visualizing uncertainty in static 2-D statistical graphics. If we treat a visualization as a function of its underlying quantities, uncertainty in those quantities induces a distribution over images. We show how to aggregate these images into a single visualization that represents the uncertainty. The approach can be viewed as a generalization of sample-based approaches that use overlay. Notably, standard representations, such as confidence intervals and bands, emerge with their usual coverage guarantees without being explicitly quantified or visualized. As a proof of concept, we implement our approach in the IID setting using resampling, provided as an open-source Python library. Because the approach operates directly on images, the user needs only to supply the data and the code for visualizing the quantities of interest without uncertainty. Through several examples, we show how both familiar and novel forms of uncertainty visualization can be created. The implementation is not only a practical validation of the underlying theory but also an immediately usable tool that can complement existing uncertainty-visualization libraries.

2025-07-31T09:19:05Z Bernarda Petek David Nabergoj Erik Štrumbelj http://arxiv.org/abs/2410.20304v2 Deep Learning, Machine Learning -- Digital Signal and Image Processing: From Theory to Application 2025-12-09T03:48:02Z

Digital Signal Processing (DSP) and Digital Image Processing (DIP) with Machine Learning (ML) and Deep Learning (DL) are popular research areas in Computer Vision and related fields. We highlight transformative applications in image enhancement, filtering techniques, and pattern recognition. By integrating frameworks like the Discrete Fourier Transform (DFT), Z-Transform, and Fourier Transform methods, we enable robust data manipulation and feature extraction essential for AI-driven tasks. Using Python, we implement algorithms that optimize real-time data processing, forming a foundation for scalable, high-performance solutions in computer vision. This work illustrates the potential of ML and DL to advance DSP and DIP methodologies, contributing to artificial intelligence, automated feature extraction, and applications across diverse domains.

2024-10-27T02:00:33Z 293 pages Weiche Hsieh Ziqian Bi Junyu Liu Benji Peng Sen Zhang Xuanhe Pan Jiawei Xu Jinlang Wang Keyu Chen Caitlyn Heqi Yin Pohsun Feng Yizhu Wen Tianyang Wang Ming Li Jintao Ren Xinyuan Song Qian Niu Silin Chen Ming Liu http://arxiv.org/abs/2501.13104v2 Neural Radiance Fields for the Real World: A Survey 2025-12-09T00:47:29Z

Neural Radiance Fields (NeRFs) have remodeled 3D scene representation since release. NeRFs can effectively reconstruct complex 3D scenes from 2D images, advancing different fields and applications such as scene understanding, 3D content generation, and robotics. Despite significant research progress, a thorough review of recent innovations, applications, and challenges is lacking. This survey compiles key theoretical advancements and alternative representations and investigates emerging challenges. It further explores applications on reconstruction, highlights NeRFs' impact on computer vision and robotics, and reviews essential datasets and toolkits. By identifying gaps in the literature, this survey discusses open challenges and offers directions for future research.

2025-01-22T18:59:10Z Revised version Wenhui Xiao Remi Chierchia Rodrigo Santa Cruz Xuesong Li David Ahmedt-Aristizabal Olivier Salvado Clinton Fookes Leo Lebrat http://arxiv.org/abs/2503.16424v4 Bezier Splatting for Fast and Differentiable Vector Graphics Rendering 2025-12-09T00:42:25Z

Differentiable vector graphics (VGs) are widely used in image vectorization and vector synthesis, while existing representations are costly to optimize and struggle to achieve high-quality rendering results for high-resolution images. This work introduces a new differentiable VG representation, dubbed Bézier Splatting, that enables fast yet high-fidelity VG rasterization. Bézier Splatting samples 2D Gaussians along Bézier curves, which naturally provide positional gradients at object boundaries. Thanks to the efficient splatting-based differentiable rasterizer, Bézier Splatting achieves 30x and 150x faster per forward and backward rasterization step for open curves compared to DiffVG. Additionally, we introduce an adaptive pruning and densification strategy that dynamically adjusts the spatial distribution of curves to escape local minima, further improving VG quality. Furthermore, our new VG representation supports conversion to standard XML-based SVG format, enhancing interoperability with existing VG tools and pipelines. Experimental results show that Bézier Splatting significantly outperforms existing methods with better visual fidelity and significant optimization speedup.

2025-03-20T17:59:50Z NeurIPS 2025. Project page: https://xiliu8006.github.io/Bezier_splatting_project/ Xi Liu Chaoyi Zhou Nanxuan Zhao Siyu Huang http://arxiv.org/abs/2512.07807v1 Lang3D-XL: Language Embedded 3D Gaussians for Large-scale Scenes 2025-12-08T18:39:58Z

Embedding a language field in a 3D representation enables richer semantic understanding of spatial environments by linking geometry with descriptive meaning. This allows for a more intuitive human-computer interaction, enabling querying or editing scenes using natural language, and could potentially improve tasks like scene retrieval, navigation, and multimodal reasoning. While such capabilities could be transformative, in particular for large-scale scenes, we find that recent feature distillation approaches cannot effectively learn over massive Internet data due to challenges in semantic feature misalignment and inefficiency in memory and runtime. To this end, we propose a novel approach to address these challenges. First, we introduce extremely low-dimensional semantic bottleneck features as part of the underlying 3D Gaussian representation. These are processed by rendering and passing them through a multi-resolution, feature-based, hash encoder. This significantly improves efficiency both in runtime and GPU memory. Second, we introduce an Attenuated Downsampler module and propose several regularizations addressing the semantic misalignment of ground truth 2D features. We evaluate our method on the in-the-wild HolyScenes dataset and demonstrate that it surpasses existing approaches in both performance and efficiency.

2025-12-08T18:39:58Z Accepted to SIGGRAPH Asia 2025. Project webpage: https://tau-vailab.github.io/Lang3D-XL Shai Krakovsky Gal Fiebelman Sagie Benaim Hadar Averbuch-Elor http://arxiv.org/abs/2512.07459v1 Human Geometry Distribution for 3D Animation Generation 2025-12-08T11:35:16Z

Generating realistic human geometry animations remains a challenging task, as it requires modeling natural clothing dynamics with fine-grained geometric details under limited data. To address these challenges, we propose two novel designs. First, we propose a compact distribution-based latent representation that enables efficient and high-quality geometry generation. We improve upon previous work by establishing a more uniform mapping between SMPL and avatar geometries. Second, we introduce a generative animation model that fully exploits the diversity of limited motion data. We focus on short-term transitions while maintaining long-term consistency through an identity-conditioned design. These two designs formulate our method as a two-stage framework: the first stage learns a latent space, while the second learns to generate animations within this latent space. We conducted experiments on both our latent space and animation model. We demonstrate that our latent space produces high-fidelity human geometry surpassing previous methods ($90\%$ lower Chamfer Dist.). The animation model synthesizes diverse animations with detailed and natural dynamics ($2.2 \times$ higher user study score), achieving the best results across all evaluation metrics.

2025-12-08T11:35:16Z Xiangjun Tang Biao Zhang Peter Wonka http://arxiv.org/abs/2512.07359v1 Multi-Rigid-Body Approximation of Human Hands with Application to Digital Twin 2025-12-08T09:59:41Z

Human hand simulation plays a critical role in digital twin applications, requiring models that balance anatomical fidelity with computational efficiency. We present a complete pipeline for constructing multi-rigid-body approximations of human hands that preserve realistic appearance while enabling real-time physics simulation. Starting from optical motion capture of a specific human hand, we construct a personalized MANO (Multi-Abstracted hand model with Neural Operations) model and convert it to a URDF (Unified Robot Description Format) representation with anatomically consistent joint axes. The key technical challenge is projecting MANO's unconstrained SO(3) joint rotations onto the kinematically constrained joints of the rigid-body model. We derive closed-form solutions for single degree-of-freedom joints and introduce a Baker-Campbell-Hausdorff (BCH)-corrected iterative method for two degree-of-freedom joints that properly handles the non-commutativity of rotations. We validate our approach through digital twin experiments where reinforcement learning policies control the multi-rigid-body hand to replay captured human demonstrations. Quantitative evaluation shows sub-centimeter reconstruction error and successful grasp execution across diverse manipulation tasks.

2025-12-08T09:59:41Z 10 pages, 4 figures. Accepted at ICBSR'25 (International Conference on Biomechanical Systems and Robotics) Bin Zhao Yiwen Lu Haohua Zhu Xiao Li Sheng Yi http://arxiv.org/abs/2512.06830v1 Inverse Discrete Elastic Rod 2025-12-07T13:00:10Z

Inverse design of slender elastic structures underlies a wide range of applications in computer graphics, flexible electronics, biomedical devices, and soft robotics. Traditional optimization-based approaches, however, are often orders of magnitude slower than forward dynamic simulations and typically impose restrictive boundary conditions. In this work, we present an inverse discrete elastic rods (inverse-DER) method that enables efficient and accurate inverse design under general loading and boundary conditions. By reformulating the inverse problem as a static equilibrium in the reference configuration, our method attains computational efficiency comparable to forward simulations while preserving high fidelity. This framework allows rapid determination of undeformed geometries for elastic fabrication structures that naturally deform into desired target shapes upon actuation or loading. We validate the approach through both physical prototypes and forward simulations, demonstrating its accuracy, robustness, and potential for real-world design applications.

2025-12-07T13:00:10Z 13 pages, 11 figures Jiahao Li Mingchao Liu Haiyi Liang HengAn Wu Weicheng Huang http://arxiv.org/abs/2512.01306v2 Gaussian Swaying: Surface-Based Framework for Aerodynamic Simulation with 3D Gaussians 2025-12-07T07:38:46Z

Branches swaying in the breeze, flags rippling in the wind, and boats rocking on the water all show how aerodynamics shape natural motion -- an effect crucial for realism in vision and graphics. In this paper, we present Gaussian Swaying, a surface-based framework for aerodynamic simulation using 3D Gaussians. Unlike mesh-based methods that require costly meshing, or particle-based approaches that rely on discrete positional data, Gaussian Swaying models surfaces continuously with 3D Gaussians, enabling efficient and fine-grained aerodynamic interaction. Our framework unifies simulation and rendering on the same representation: Gaussian patches, which support force computation for dynamics while simultaneously providing normals for lightweight shading. Comprehensive experiments on both synthetic and real-world datasets across multiple metrics demonstrate that Gaussian Swaying achieves state-of-the-art performance and efficiency, offering a scalable approach for realistic aerodynamic scene simulation.

2025-12-01T06:03:47Z Accepted to WACV 2026 Hongru Yan Xiang Zhang Zeyuan Chen Fangyin Wei Zhuowen Tu http://arxiv.org/abs/2510.23880v2 TRELLISWorld: Training-Free World Generation from Object Generators 2025-12-07T03:16:29Z

Text-driven 3D scene generation holds promise for a wide range of applications, from virtual prototyping to AR/VR and simulation. However, existing methods are often constrained to single-object generation, require domain-specific training, or lack support for full 360-degree viewability. In this work, we present a training-free approach to 3D scene synthesis by repurposing general-purpose text-to-3D object diffusion models as modular tile generators. We reformulate scene generation as a multi-tile denoising problem, where overlapping 3D regions are independently generated and seamlessly blended via weighted averaging. This enables scalable synthesis of large, coherent scenes while preserving local semantic control. Our method eliminates the need for scene-level datasets or retraining, relies on minimal heuristics, and inherits the generalization capabilities of object-level priors. We demonstrate that our approach supports diverse scene layouts, efficient generation, and flexible editing, establishing a simple yet powerful foundation for general-purpose, language-driven 3D scene construction.

2025-10-27T21:40:31Z Hanke Chen Yuan Liu Minchen Li