https://arxiv.org/api/du4fjS6+LAqlsT+A++3qCTy6JjM 2026-06-25T12:05:03Z 9383 1230 15 http://arxiv.org/abs/2512.00413v1 SplatFont3D: Structure-Aware Text-to-3D Artistic Font Generation with Part-Level Style Control 2025-11-29T09:46:37Z

Artistic font generation (AFG) can assist human designers in creating innovative artistic fonts. However, most previous studies primarily focus on 2D artistic fonts in flat design, leaving personalized 3D-AFG largely underexplored. 3D-AFG not only enables applications in immersive 3D environments such as video games and animations, but also may enhance 2D-AFG by rendering 2D fonts of novel views. Moreover, unlike general 3D objects, 3D fonts exhibit precise semantics with strong structural constraints and also demand fine-grained part-level style control. To address these challenges, we propose SplatFont3D, a novel structure-aware text-to-3D AFG framework with 3D Gaussian splatting, which enables the creation of 3D artistic fonts from diverse style text prompts with precise part-level style control. Specifically, we first introduce a Glyph2Cloud module, which progressively enhances both the shapes and styles of 2D glyphs (or components) and produces their corresponding 3D point clouds for Gaussian initialization. The initialized 3D Gaussians are further optimized through interaction with a pretrained 2D diffusion model using score distillation sampling. To enable part-level control, we present a dynamic component assignment strategy that exploits the geometric priors of 3D Gaussians to partition components, while alleviating drift-induced entanglement during 3D Gaussian optimization. Our SplatFont3D provides more explicit and effective part-level style control than NeRF, attaining faster rendering efficiency. Experiments show that our SplatFont3D outperforms existing 3D models for 3D-AFG in style-text consistency, visual quality, and rendering efficiency.

2025-11-29T09:46:37Z Ji Gan Lingxu Chen Jiaxu Leng Xinbo Gao http://arxiv.org/abs/2503.16177v2 OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering 2025-11-29T05:31:00Z

In large-scale scene reconstruction using 3D Gaussian splatting, it is common to partition the scene into multiple smaller regions and reconstruct them individually. However, existing division methods are occlusion-agnostic, meaning that each region may contain areas with severe occlusions. As a result, the cameras within those regions are less correlated, leading to a low average contribution to the overall reconstruction. In this paper, we propose an occlusion-aware scene division strategy that clusters training cameras based on their positions and co-visibilities to acquire multiple regions. Cameras in such regions exhibit stronger correlations and a higher average contribution, facilitating high-quality scene reconstruction. We further propose a region-based rendering technique to accelerate large scene rendering, which culls Gaussians invisible to the region where the viewpoint is located. Such a technique significantly speeds up the rendering without compromising quality. Extensive experiments on multiple large scenes show that our method achieves superior reconstruction results with faster rendering speed compared to existing state-of-the-art approaches. Project page: https://occlugaussian.github.io.

2025-03-20T14:18:52Z Accepted to ICCV 2025. Project website: https://occlugaussian.github.io Shiyong Liu Xiao Tang Zhihao Li Yingfan He Chongjie Ye Jianzhuang Liu Binxiao Huang Shunbo Zhou Xiaofei Wu http://arxiv.org/abs/2505.18151v2 WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions 2025-11-29T04:26:22Z

WonderPlay is a novel framework integrating physics simulation with video generation for generating action-conditioned dynamic 3D scenes from a single image. While prior works are restricted to rigid body or simple elastic dynamics, WonderPlay features a hybrid generative simulator to synthesize a wide range of 3D dynamics. The hybrid generative simulator first uses a physics solver to simulate coarse 3D dynamics, which subsequently conditions a video generator to produce a video with finer, more realistic motion. The generated video is then used to update the simulated dynamic 3D scene, closing the loop between the physics solver and the video generator. This approach enables intuitive user control to be combined with the accurate dynamics of physics-based simulators and the expressivity of diffusion-based video generators. Experimental results demonstrate that WonderPlay enables users to interact with various scenes of diverse content, including cloth, sand, snow, liquid, smoke, elastic, and rigid bodies -- all using a single image input. Code will be made public. Project website: https://kyleleey.github.io/WonderPlay/

2025-05-23T17:59:24Z ICCV 2025 (Highlight). The first two authors contributed equally. Project website: https://kyleleey.github.io/WonderPlay/ Zizhang Li Hong-Xing Yu Wei Liu Yin Yang Charles Herrmann Gordon Wetzstein Jiajun Wu http://arxiv.org/abs/2504.00983v2 WorldScore: A Unified Evaluation Benchmark for World Generation 2025-11-29T04:20:04Z

We introduce the WorldScore benchmark, the first unified benchmark for world generation. We decompose world generation into a sequence of next-scene generation tasks with explicit camera trajectory-based layout specifications, enabling unified evaluation of diverse approaches from 3D and 4D scene generation to video generation models. The WorldScore benchmark encompasses a curated dataset of 3,000 test examples that span diverse worlds: static and dynamic, indoor and outdoor, photorealistic and stylized. The WorldScore metrics evaluate generated worlds through three key aspects: controllability, quality, and dynamics. Through extensive evaluation of 19 representative models, including both open-source and closed-source ones, we reveal key insights and challenges for each category of models. Our dataset, evaluation code, and leaderboard can be found at https://haoyi-duan.github.io/WorldScore/

2025-04-01T17:20:23Z ICCV 2025. Project website: https://haoyi-duan.github.io/WorldScore/ The first two authors contributed equally Haoyi Duan Hong-Xing Yu Sirui Chen Li Fei-Fei Jiajun Wu http://arxiv.org/abs/2511.23290v1 Machine Learning for Scientific Visualization: Ensemble Data Analysis 2025-11-28T15:45:54Z

Scientific simulations and experimental measurements produce vast amounts of spatio-temporal data, yet extracting meaningful insights remains challenging due to high dimensionality, complex structures, and missing information. Traditional analysis methods often struggle with these issues, motivating the need for more robust, data-driven approaches. This dissertation explores deep learning methodologies to improve the analysis and visualization of spatio-temporal scientific ensembles, focusing on dimensionality reduction, flow estimation, and temporal interpolation. First, we address high-dimensional data representation through autoencoder-based dimensionality reduction for scientific ensembles. We evaluate the stability of projection metrics under partial labeling and introduce a Pareto-efficient selection strategy to identify optimal autoencoder variants, ensuring expressive and reliable low-dimensional embeddings. Next, we present FLINT, a deep learning model for high-quality flow estimation and temporal interpolation in both flow-supervised and flow-unsupervised settings. FLINT reconstructs missing velocity fields and generates high-fidelity temporal interpolants for scalar fields across 2D+time and 3D+time ensembles without domain-specific assumptions or extensive finetuning. To further improve adaptability and generalization, we introduce HyperFLINT, a hypernetwork-based approach that conditions on simulation parameters to estimate flow fields and interpolate scalar data. This parameter-aware adaptation yields more accurate reconstructions across diverse scientific domains, even with sparse or incomplete data. Overall, this dissertation advances deep learning techniques for scientific visualization, providing scalable, adaptable, and high-quality solutions for interpreting complex spatio-temporal ensembles.

2025-11-28T15:45:54Z PhD thesis, University of Groningen, 2025 Hamid Gadirov 10.33612/diss.1402847307 http://arxiv.org/abs/2511.23131v1 Towards Generalized Position-Based Dynamics 2025-11-28T12:25:33Z

The position-based dynamics (PBD) algorithm is a popular and versatile technique for real-time simulation of deformable bodies, but is only applicable to forces that can be expressed as linearly compliant constraints. In this work, we explore a generalization of PBD that is applicable to arbitrary nonlinear force models. We do this by reformulating the implicit time integration equations in terms of the individual forces in the system, to which applying Gauss-Seidel iterations naturally leads to a PBD-type algorithm. As we demonstrate, our method allows simulation of data-driven cloth models [Sperl et al. 2020] that cannot be represented by existing variations of position-based dynamics, enabling performance improvements over the baseline Newton-based solver for high mesh resolutions. We also show our method's applicability to volumetric neo-Hookean elasticity with an inversion barrier.

2025-11-28T12:25:33Z Manas Chaudhary Chandradeep Pokhariya Rahul Narain http://arxiv.org/abs/2503.09464v2 Hybrid Rendering for Multimodal Autonomous Driving: Merging Neural and Physics-Based Simulation 2025-11-28T11:17:36Z

Neural reconstruction models for autonomous driving simulation have made significant strides in recent years, with dynamic models becoming increasingly prevalent. However, these models are typically limited to handling in-domain objects closely following their original trajectories. We introduce a hybrid approach that combines the strengths of neural reconstruction with physics-based rendering. This method enables the virtual placement of traditional mesh-based dynamic agents at arbitrary locations, adjustments to environmental conditions, and rendering from novel camera viewpoints. Our approach significantly enhances novel view synthesis quality -- especially for road surfaces and lane markings -- while maintaining interactive frame rates through our novel training method, NeRF2GS. This technique leverages the superior generalization capabilities of NeRF-based methods and the real-time rendering speed of 3D Gaussian Splatting (3DGS). We achieve this by training a customized NeRF model on the original images with depth regularization derived from a noisy LiDAR point cloud, then using it as a teacher model for 3DGS training. This process ensures accurate depth, surface normals, and camera appearance modeling as supervision. With our block-based training parallelization, the method can handle large-scale reconstructions (greater than or equal to 100,000 square meters) and predict segmentation masks, surface normals, and depth maps. During simulation, it supports a rasterization-based rendering backend with depth-based composition and multiple camera models for real-time camera simulation, as well as a ray-traced backend for precise LiDAR simulation.

2025-03-12T15:18:50Z Máté Tóth Péter Kovács Réka Bencses Zoltán Bendefy Zoltán Hortsin Balázs Teréki Tamás Matuszka http://arxiv.org/abs/2504.11212v3 SDFs from Unoriented Point Clouds using Neural Variational Heat Distances 2025-11-28T10:41:47Z

We propose a novel variational approach for computing neural Signed Distance Fields (SDF) from unoriented point clouds. To this end, we replace the commonly used eikonal equation with the heat method, carrying over to the neural domain what has long been standard practice for computing distances on discrete surfaces. This yields two convex optimization problems for whose solution we employ neural networks: We first compute a neural approximation of the gradients of the unsigned distance field through a small time step of heat flow with weighted point cloud densities as initial data. Then we use it to compute a neural approximation of the SDF. We prove that the underlying variational problems are well-posed. Through numerical experiments, we demonstrate that our method provides state-of-the-art surface reconstruction and consistent SDF gradients. Furthermore, we show in a proof-of-concept that it is accurate enough for solving a PDE on the zero-level set.

2025-04-15T14:13:54Z 16 pages, 19 figures, 4 tables Samuel Weidemaier Florine Hartwig Josua Sassen Sergio Conti Mirela Ben-Chen Martin Rumpf 10.1111/cgf.70296 http://arxiv.org/abs/2511.23029v1 Geodiffussr: Generative Terrain Texturing with Elevation Fidelity 2025-11-28T09:52:44Z

Large-scale terrain generation remains a labor-intensive task in computer graphics. We introduce Geodiffussr, a flow-matching pipeline that synthesizes text-guided texture maps while strictly adhering to a supplied Digital Elevation Map (DEM). The core mechanism is multi-scale content aggregation (MCA): DEM features from a pretrained encoder are injected into UNet blocks at multiple resolutions to enforce global-to-local elevation consistency. Compared with a non-MCA baseline, MCA markedly improves visual fidelity and strengthens height-appearance coupling (FID $\downarrow$ 49.16%, LPIPS $\downarrow$ 32.33%, $Δ$dCor $\downarrow$ to 0.0016). To train and evaluate Geodiffussr, we assemble a globally distributed, biome- and climate-stratified corpus of triplets pairing SRTM-derived DEMs with Sentinel-2 imagery and vision-grounded natural-language captions that describe visible land cover. We position Geodiffussr as a strong baseline and step toward controllable 2.5D landscape generation for coarse-scale ideation and previz, complementary to physically based terrain and ecosystem simulators.

2025-11-28T09:52:44Z Tai Inui Alexander Matsumura Edgar Simo-Serra http://arxiv.org/abs/2410.20789v2 LoDAvatar: Hierarchical Embedding and Selective Detail Enhancement for Adaptive Levels of Detail Gaussian Avatars 2025-11-28T09:03:10Z

With the advancement of virtual reality, the demand for 3D human avatars is increasing. The emergence of Gaussian Splatting technology has enabled the rendering of Gaussian avatars with superior visual quality and reduced computational costs. Despite numerous methods researchers propose for implementing drivable Gaussian avatars, limited attention has been given to balancing visual quality and computational costs. In this paper, we introduce LoDAvatar, a method that introduces levels of detail into Gaussian avatars through hierarchical embedding and selective detail enhancement methods. The key steps of LoDAvatar encompass data preparation, Gaussian embedding, Gaussian optimization, and selective detail enhancement. We conducted experiments involving Gaussian avatars at various levels of detail, employing both objective assessments and subjective evaluations. The outcomes indicate that incorporating levels of detail into Gaussian avatars can decrease computational costs during rendering while upholding commendable visual quality, thereby enhancing runtime frame rates. We advocate adopting LoDAvatar to render multiple dynamic Gaussian avatars or extensive Gaussian scenes to balance visual quality and computational costs.

2024-10-28T07:11:20Z 21 pages, 7 figures, Published in Virtual Reality Virtual Reality 29, 178 (2025) Xiaonuo Dongye Hanzhi Guo Le Luo Haiyan Jiang Yihua Bao Jie Guo Zeyu Tian Dongdong Weng 10.1007/s10055-025-01249-3 http://arxiv.org/abs/2503.09864v2 Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models 2025-11-28T00:27:26Z

Recent advances in text-to-image (T2I) diffusion models have enabled remarkable control over various attributes, yet precise color specification remains a fundamental challenge. Existing approaches, such as ColorPeel, rely on model personalization, requiring additional optimization and limiting flexibility in specifying arbitrary colors. In this work, we introduce ColorWave, a novel training-free approach that achieves exact RGB-level color control in diffusion models without fine-tuning. By systematically analyzing the cross-attention mechanisms within IP-Adapter, we uncover an implicit binding between textual color descriptors and reference image features. Leveraging this insight, our method rewires these bindings to enforce precise color attribution while preserving the generative capabilities of pretrained models. Our approach maintains generation quality and diversity, outperforming prior methods in accuracy and applicability across diverse object categories. Through extensive evaluations, we demonstrate that ColorWave establishes a new paradigm for structured, color-consistent diffusion-based image synthesis.

2025-03-12T21:49:52Z WACV 2026. Project page: https://hecoding.github.io/colorwave-page Héctor Laria Alexandra Gomez-Villa Jiang Qin Muhammad Atif Butt Bogdan Raducanu Javier Vazquez-Corral Joost van de Weijer Kai Wang http://arxiv.org/abs/2512.08951v1 AI Co-Artist: A LLM-Powered Framework for Interactive GLSL Shader Animation Evolution 2025-11-27T18:55:32Z

Creative coding and real-time shader programming are at the forefront of interactive digital art, enabling artists, designers, and enthusiasts to produce mesmerizing, complex visual effects that respond to real-time stimuli such as sound or user interaction. However, despite the rich potential of tools like GLSL, the steep learning curve and requirement for programming fluency pose substantial barriers for newcomers and even experienced artists who may not have a technical background. In this paper, we present AI Co-Artist, a novel interactive system that harnesses the capabilities of large language models (LLMs), specifically GPT-4, to support the iterative evolution and refinement of GLSL shaders through a user-friendly, visually-driven interface. Drawing inspiration from the user-guided evolutionary principles pioneered by the Picbreeder platform, our system empowers users to evolve shader art using intuitive interactions, without needing to write or understand code. AI Co-Artist serves as both a creative companion and a technical assistant, allowing users to explore a vast generative design space of real-time visual art. Through comprehensive evaluations, including structured user studies and qualitative feedback, we demonstrate that AI Co-Artist significantly reduces the technical threshold for shader creation, enhances creative outcomes, and supports a wide range of users in producing professional-quality visual effects. Furthermore, we argue that this paradigm is broadly generalizable. By leveraging the dual strengths of LLMs-semantic understanding and program synthesis, our method can be applied to diverse creative domains, including website layout generation, architectural visualizations, product prototyping, and infographics.

2025-11-27T18:55:32Z Kamer Ali Yuksel Hassan Sawaf http://arxiv.org/abs/2511.22288v1 Improving Sparse IMU-based Motion Capture with Motion Label Smoothing 2025-11-27T10:11:48Z

Sparse Inertial Measurement Units (IMUs) based human motion capture has gained significant momentum, driven by the adaptation of fundamental AI tools such as recurrent neural networks (RNNs) and transformers that are tailored for temporal and spatial modeling. Despite these achievements, current research predominantly focuses on pipeline and architectural designs, with comparatively little attention given to regularization methods, highlighting a critical gap in developing a comprehensive AI toolkit for this task. To bridge this gap, we propose motion label smoothing, a novel method that adapts the classic label smoothing strategy from classification to the sparse IMU-based motion capture task. Specifically, we first demonstrate that a naive adaptation of label smoothing, including simply blending a uniform vector or a ``uniform'' motion representation (e.g., dataset-average motion or a canonical T-pose), is suboptimal; and argue that a proper adaptation requires increasing the entropy of the smoothed labels. Second, we conduct a thorough analysis of human motion labels, identifying three critical properties: 1) Temporal Smoothness, 2) Joint Correlation, and 3) Low-Frequency Dominance, and show that conventional approaches to entropy enhancement (e.g., blending Gaussian noise) are ineffective as they disrupt these properties. Finally, we propose the blend of a novel skeleton-based Perlin noise for motion label smoothing, designed to raise label entropy while satisfying motion properties. Extensive experiments applying our motion label smoothing to three state-of-the-art methods across four real-world IMU datasets demonstrate its effectiveness and robust generalization (plug-and-play) capability.

2025-11-27T10:11:48Z Accepted by AAAI 2026 Zhaorui Meng Lu Yin Yangqing Hou Anjun Chen Shihui Guo Yipeng Qin http://arxiv.org/abs/2511.22171v1 BrepGPT: Autoregressive B-rep Generation with Voronoi Half-Patch 2025-11-27T07:16:53Z

Boundary representation (B-rep) is the de facto standard for CAD model representation in modern industrial design. The intricate coupling between geometric and topological elements in B-rep structures has forced existing generative methods to rely on cascaded multi-stage networks, resulting in error accumulation and computational inefficiency. We present BrepGPT, a single-stage autoregressive framework for B-rep generation. Our key innovation lies in the Voronoi Half-Patch (VHP) representation, which decomposes B-reps into unified local units by assigning geometry to nearest half-edges and sampling their next pointers. Unlike hierarchical representations that require multiple distinct encodings for different structural levels, our VHP representation facilitates unifying geometric attributes and topological relations in a single, coherent format. We further leverage dual VQ-VAEs to encode both vertex topology and Voronoi Half-Patches into vertex-based tokens, achieving a more compact sequential encoding. A decoder-only Transformer is then trained to autoregressively predict these tokens, which are subsequently mapped to vertex-based features and decoded into complete B-rep models. Experiments demonstrate that BrepGPT achieves state-of-the-art performance in unconditional B-rep generation. The framework also exhibits versatility in various applications, including conditional generation from category labels, point clouds, text descriptions, and images, as well as B-rep autocompletion and interpolation.

2025-11-27T07:16:53Z Pu Li Wenhao Zhang Weize Quan Biao Zhang Peter Wonka Dong-Ming Yan http://arxiv.org/abs/2408.00303v2 Neural Octahedral Field: Octahedral prior for simultaneous smoothing and sharp edge regularization 2025-11-27T00:50:56Z

Neural implicit representation, the parameterization of a continuous distance function as a Multi-Layer Perceptron (MLP), has emerged as a promising lead in tackling surface reconstruction from unoriented point clouds. In the presence of noise, however, its lack of explicit neighborhood connectivity makes sharp edges identification particularly challenging, hence preventing the separation of smoothing and sharpening operations, as is achievable with its discrete counterparts. In this work, we propose to tackle this challenge with an auxiliary field, the \emph{octahedral field}. We observe that both smoothness and sharp features in the distance field can be equivalently described by the smoothness in octahedral space. Therefore, by aligning and smoothing an octahedral field alongside the implicit geometry, our method behaves analogously to bilateral filtering, resulting in a smooth reconstruction while preserving sharp edges. Despite being operated purely pointwise, our method outperforms various traditional and neural implicit fitting approaches across extensive experiments, and is very competitive with methods that require normals and data priors. Code and data of our work are available at: https://github.com/Ankbzpx/frame-field.

2024-08-01T06:02:59Z project page: https://github.com/Ankbzpx/frame-field Ruichen Zheng Tao Yu Ruizhen Hu