http://arxiv.org/abs/2604.08411v1 What a Comfortable World: Ergonomic Principles Guided Apartment Layout Generation 2026-04-09T16:11:01Z

Current data-driven floor plan generation methods often reproduce the ergonomic inefficiencies found in real-world training datasets. To address this, we propose a novel approach that integrates architectural design principles directly into a transformer-based generative process. We formulate differentiable loss functions based on established architectural standards from the literature to optimize room adjacency and proximity. By guiding the model with these ergonomic priors during training, our method produces layouts with significantly improved livability metrics. Comparative evaluations show that our approach outperforms baselines in ergonomic compliance while maintaining high structural validity.
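The abstract does not spell out its loss functions. As a rough illustration only, one common way to turn a proximity rule ("these two rooms should be close") into a differentiable penalty is a squared hinge on the distance between room centroids. The function name, the centroid representation, the room pairs and the 3 m threshold below are all hypothetical and are not the authors' formulation.

```python
import numpy as np


def proximity_loss(centroids, pairs, max_dist):
    """Hypothetical ergonomic prior: rooms listed in `pairs` should lie within
    `max_dist` of each other. Returns the squared-hinge penalty
    sum(max(0, ||c_i - c_j|| - max_dist)^2) and its gradient w.r.t. centroids."""
    loss = 0.0
    grad = np.zeros_like(centroids)
    for i, j in pairs:
        diff = centroids[i] - centroids[j]
        dist = np.linalg.norm(diff)
        excess = dist - max_dist
        if excess > 0.0:  # only rooms that are too far apart are penalized
            loss += excess ** 2
            g = 2.0 * excess * diff / dist  # d(loss)/d(c_i); c_j gets the negative
            grad[i] += g
            grad[j] -= g
    return loss, grad


# Usage: pull a "kitchen" (room 0) toward a "dining room" (room 1) by gradient descent.
centroids = np.array([[0.0, 0.0], [6.0, 0.0], [10.0, 5.0]])  # made-up layout, metres
pairs = [(0, 1)]
for _ in range(50):
    loss, grad = proximity_loss(centroids, pairs, max_dist=3.0)
    centroids = centroids - 0.1 * grad
```

Because the penalty is zero once a pair is close enough, rooms that already satisfy the rule receive no gradient. Room 2, which appears in no pair, is never moved.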

2026-04-09T16:11:01Z 4 pages, 2 figures, EUROGRAPHICS 2026 Short Paper Piotr Nieciecki Aleksander Plocharski Przemyslaw Musialski

http://arxiv.org/abs/2604.08378v1 Investigating Performance and Practices with Univariate Distribution Charts 2026-04-09T15:39:15Z

A range of charts with different strengths and weaknesses exists to support the visual analysis of univariate distributions, yet there is limited understanding of which charts best support which tasks and users, and of how practitioners use these charts. We categorize the available charts for univariate distributions into four groups and present the results of a mixed-methods comparison (n=215) of participants' perception and preferences across boxplots, violinplots, jittered stripplots, and histograms as representatives of their respective categories. The click-to-select approach in our study, combined with data on participants' subjective experiences and preferences, allows us both to measure accuracy on benchmark tasks and to discuss participants' choices qualitatively. Our analysis reveals differences between charts in task accuracy, common misunderstandings, and preferences across various low-level tasks, and indicates that chart preference and familiarity do not necessarily align with participants' task performance. Interviews with five visualization practitioners further reveal that charts widely preferred by general audiences (such as histograms) or commonly used in scientific domains (such as boxplots) are not inherently the most effective for all tasks.

2026-04-09T15:39:15Z Laura Lotteraner Anna Kurtenkova Torsten Möller Daniel Pahr 10.1111/cgf.70482

http://arxiv.org/abs/2604.07984v1 Physics-Based Motion Tracking of Contact-Rich Interacting Characters 2026-04-09T08:55:16Z

Motion tracking has been an important technique for imitating human-like movement from large-scale datasets in physics-based motion synthesis. However, existing approaches focus on tracking either a single character or a particular type of interaction, limiting their ability to handle contact-rich interactions. Extending single-character tracking approaches suffers from instability caused by the forces transferred through contacts. Contact-rich interactions also require multiple levels of control, which places much greater demands on model capacity. To this end, we propose a robust tracking method based on a progressive neural network (PNN) in which multiple experts specialize in learning skills of varying difficulty. Our method learns to assign training samples to experts automatically, without requiring manual scheduling. Both qualitative and quantitative results show that our method delivers more stable motion tracking in densely interactive movements while enabling more efficient model training.

2026-04-09T08:55:16Z Xiaotang Zhang Ziyi Chang Qianhui Men Hubert P. H. Shum

http://arxiv.org/abs/2401.03890v9 A Survey on 3D Gaussian Splatting 2026-04-09T05:33:48Z

3D Gaussian splatting (GS) has emerged as a transformative technique in radiance fields. Unlike mainstream implicit neural models, 3D GS uses millions of learnable 3D Gaussians for an explicit scene representation. Paired with a differentiable rendering algorithm, this approach achieves real-time rendering and unprecedented editability, making it a potential game-changer for 3D reconstruction and representation. In the present paper, we provide the first systematic overview of the recent developments and critical contributions in 3D GS. We begin with a detailed exploration of the underlying principles and the driving forces behind the emergence of 3D GS, laying the groundwork for understanding its significance. A focal point of our discussion is the practical applicability of 3D GS. By enabling unprecedented rendering speed, 3D GS opens up a plethora of applications, ranging from virtual reality to interactive media and beyond. This is complemented by a comparative analysis of leading 3D GS models, evaluated across various benchmark tasks to highlight their performance and practical utility. The survey concludes by identifying current challenges and suggesting potential avenues for future research. Through this survey, we aim to provide a valuable resource for both newcomers and seasoned researchers, fostering further exploration and advancement in explicit radiance fields.
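In the original 3D GS formulation, the differentiable rendering step mentioned above is front-to-back alpha compositing of depth-sorted, projected Gaussians: C = Σ_i c_i α_i Π_{j<i} (1 − α_j). Each α_i is the Gaussian's learned opacity times its 2D Gaussian falloff at the pixel. The sketch below evaluates that sum for a single pixel in NumPy. The tuple layout and function names are illustrative assumptions, and real implementations run this per screen tile on the GPU.

```python
import numpy as np


def splat_alpha(pixel, mean2d, cov2d, opacity):
    """Opacity of one projected 2D Gaussian at `pixel`: o * exp(-0.5 d^T S^-1 d)."""
    d = pixel - mean2d
    return opacity * np.exp(-0.5 * d @ np.linalg.solve(cov2d, d))


def composite_pixel(pixel, splats):
    """Front-to-back alpha compositing for one pixel.

    `splats` holds (depth, mean2d, cov2d, opacity, rgb) tuples in any order;
    they are sorted near-to-far here. Returns (colour, remaining transmittance)."""
    color = np.zeros(3)
    transmittance = 1.0
    for _, mean2d, cov2d, opacity, rgb in sorted(splats, key=lambda s: s[0]):
        alpha = splat_alpha(pixel, mean2d, cov2d, opacity)
        color += transmittance * alpha * rgb  # c_i * alpha_i * prod_{j<i}(1 - alpha_j)
        transmittance *= 1.0 - alpha
    return color, transmittance
```

For example, a half-transparent red splat in front of an opaque blue one, both centred on the pixel, blends to (0.5, 0, 0.5) and leaves zero transmittance. Every operation here is differentiable with respect to the means, covariances, opacities and colours, which is what enables gradient-based optimization of the Gaussians.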

2024-01-08T13:42:59Z Accepted by ACM Computing Surveys; Paper list: https://github.com/guikunchen/Awesome3DGS ; Benchmark: https://github.com/guikunchen/3DGS-Benchmarks Guikun Chen Wenguan Wang

http://arxiv.org/abs/2604.07795v1 Image-Guided Geometric Stylization of 3D Meshes 2026-04-09T04:44:42Z

Recent generative models can create visually plausible 3D representations of objects. However, the generation process often relies on implicit control signals, such as contextual descriptions, and rarely supports bold geometric distortions beyond existing data distributions. We propose a geometric stylization framework that deforms a 3D mesh, allowing it to express the style of an image. While style is inherently ambiguous, we utilize pre-trained diffusion models to extract an abstract representation of the provided image. Our coarse-to-fine stylization pipeline can drastically deform the input 3D model to express a diverse range of geometric variations while retaining the valid topology of the original mesh and part-level semantics. We also propose an approximate VAE encoder that provides efficient and reliable gradients from mesh renderings. Extensive experiments demonstrate that our method can create stylized 3D meshes that reflect unique geometric features of the pictured assets, such as expressive poses and silhouettes, thereby supporting the creation of distinctive artistic 3D content. Project page: https://changwoonchoi.github.io/GeoStyle

2026-04-09T04:44:42Z Changwoon Choi Hyunsoo Lee Clément Jambon Yael Vinker Young Min Kim

http://arxiv.org/abs/2412.10437v3 SVGFusion: A VAE-Diffusion Transformer for Vector Graphic Generation 2026-04-09T03:46:50Z

Generating high-quality Scalable Vector Graphics (SVGs) from text remains a significant challenge. Existing LLM-based models that generate SVG code as a flat token sequence struggle with poor structural understanding and error accumulation, while optimization-based methods are slow and yield uneditable outputs. To address these limitations, we introduce SVGFusion, a unified framework that adapts the VAE-diffusion architecture to bridge the dual code-visual nature of SVGs. Our model features two core components: a Vector-Pixel Fusion Variational Autoencoder (VP-VAE) that learns a perceptually rich latent space by jointly encoding SVG code and its rendered image, and a Vector Space Diffusion Transformer (VS-DiT) that achieves globally coherent compositions through iterative refinement. Furthermore, this architecture is enhanced by a Rendering Sequence Modeling strategy, which ensures accurate object layering and occlusion. Evaluated on our novel SVGX-Dataset comprising 240k human-designed SVGs, SVGFusion establishes a new state of the art, generating high-quality, editable SVGs that are strictly semantically aligned with the input text.

2024-12-11T09:02:25Z project page: https://ximinng.github.io/SVGFusionProject/ Ximing Xing Juncheng Hu Ziteng Xue Jing Zhang Buyu Li Sheng Wang Dong Xu Qian Yu

http://arxiv.org/abs/2504.13378v2 SMPL-GPTexture: Dual-View 3D Human Texture Estimation using Text-to-Image Generation Models 2026-04-09T03:17:26Z

Generating high-quality, photorealistic textures for 3D human avatars remains a fundamental yet challenging task in computer vision and multimedia. However, real paired front and back images of human subjects are rarely available because of privacy, ethical, and acquisition-cost constraints, which limits data scalability. Additionally, learning priors from image inputs using deep generative models, such as GANs or diffusion models, to infer unseen regions such as the human back often leads to artifacts, structural inconsistencies, or loss of fine-grained detail. To address these issues, we present SMPL-GPTexture (skinned multi-person linear model - general purpose Texture), a novel pipeline that takes natural language prompts as input and leverages a state-of-the-art text-to-image generation model to produce paired high-resolution front and back images of a human subject as the starting point for texture estimation. Using the generated paired dual-view images, we first employ a human mesh recovery model to obtain a robust 2D-to-3D SMPL alignment between image pixels and the 3D model's UV coordinates for each view. Second, we use an inverted rasterization technique that explicitly projects the observed colour from the input images into the UV space, thereby producing accurate, complete texture maps. Finally, we apply a diffusion-based inpainting module to fill in the missing regions, and a fusion mechanism then combines these results into a unified full texture map. Extensive experiments show that SMPL-GPTexture can generate high-resolution textures aligned with users' prompts.

2025-04-17T23:28:38Z Mingxiao Tu Shuchang Ye Hoijoon Jung Jinman Kim

http://arxiv.org/abs/2503.09640v2 Physically Plausible Human-Object Rendering from Sparse Views via 3D Gaussian Splatting 2026-04-09T03:04:43Z

Rendering realistic human-object interactions (HOIs) from sparse-view inputs is a challenging yet crucial task for various real-world applications. Existing methods often struggle to simultaneously achieve high rendering quality, physical plausibility, and computational efficiency. To address these limitations, we propose HOGS (Human-Object Rendering via 3D Gaussian Splatting), a novel framework for efficient HOI rendering with physically plausible geometric constraints from sparse views. HOGS represents both humans and objects as dynamic 3D Gaussians. Central to HOGS is a novel optimization process that operates directly on these Gaussians to enforce geometric consistency (i.e., preventing inter-penetration or floating contacts) to achieve physical plausibility. To support this core optimization under sparse-view ambiguity, our framework incorporates two pre-trained modules: an optimization-guided Human Pose Refiner for robust estimation under sparse-view occlusions, and a Human-Object Contact Predictor that efficiently identifies interaction regions to guide our novel contact and separation losses. Extensive experiments on both human-object and hand-object interaction datasets demonstrate that HOGS achieves state-of-the-art rendering quality and maintains high computational efficiency.

2025-03-12T04:19:21Z 16 pages, 14 figures, accepted by IEEE Transactions on Image Processing (TIP) Weiquan Wang Jun Xiao Yi Yang Yueting Zhuang Long Chen

http://arxiv.org/abs/2604.07728v1 GEAR: GEometry-motion Alternating Refinement for Articulated Object Modeling with Gaussian Splatting 2026-04-09T02:24:39Z

High-fidelity interactive digital assets are essential for embodied intelligence and robotic interaction, yet articulated objects remain challenging to reconstruct due to their complex structures and coupled geometry-motion relationships. Existing methods suffer from instability in geometry-motion joint optimization, while their generalization remains limited on complex multi-joint or out-of-distribution objects. To address these challenges, we propose GEAR, an EM-style alternating optimization framework that jointly models geometry and motion as interdependent components within a Gaussian Splatting representation. GEAR treats part segmentation as a latent variable and joint motion parameters as explicit variables, alternately refining them for improved convergence and geometric-motion consistency. To enhance part segmentation quality without sacrificing generalization, we leverage a vanilla 2D segmentation model to provide multi-view part priors, and employ a weakly supervised constraint to regularize the latent variable. Experiments on multiple benchmarks and our newly constructed dataset GEAR-Multi demonstrate that GEAR achieves state-of-the-art results in geometric reconstruction and motion parameter estimation, particularly on complex articulated objects with multiple movable parts.
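The abstract does not detail GEAR's actual E- and M-steps, which operate on Gaussians with articulated joint parameters and 2D segmentation priors. The toy below only illustrates the alternating pattern it describes. The part label of each point is a latent variable, assigned to whichever motion best explains that point (a hard E-step). The per-part motion is then re-fitted to its assigned points (an M-step). Motions are simplified to pure 2D translations, and all names are hypothetical.

```python
import numpy as np


def alternate_parts_and_motion(p0, p1, init_motions, iters=10):
    """Toy EM-style alternation for two observed states p0, p1 of shape (N, 2).

    Latent variable: part label per point. Explicit variable: one translation
    per part (a stand-in for real articulated joint parameters)."""
    motions = np.array(init_motions, dtype=float)
    disp = p1 - p0
    labels = np.zeros(len(p0), dtype=int)
    for _ in range(iters):
        # E-step (hard assignment): pick the part whose motion best explains each point.
        resid = np.linalg.norm(disp[:, None, :] - motions[None, :, :], axis=2)
        labels = resid.argmin(axis=1)
        # M-step: re-fit each part's translation to the points assigned to it.
        for k in range(len(motions)):
            members = labels == k
            if members.any():
                motions[k] = disp[members].mean(axis=0)
    return labels, motions
```

With a rough initial guess, two groups of points moving by (1, 0) and (0, 2) are separated in the first E-step, and the M-step then recovers both translations. Poor initializations can converge to wrong local optima, which is one reason GEAR adds segmentation priors.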

2026-04-09T02:24:39Z Accepted to CVPRF2026 Jialin Li Bin Fu Ruiping Wang Xilin Chen

http://arxiv.org/abs/2604.07350v1 Fast Spatial Memory with Elastic Test-Time Training 2026-04-08T17:59:48Z

Large Chunk Test-Time Training (LaCT) has shown strong performance on long-context 3D reconstruction, but its fully plastic inference-time updates remain vulnerable to catastrophic forgetting and overfitting. As a result, LaCT is typically instantiated with a single large chunk spanning the full input sequence, falling short of the broader goal of handling arbitrarily long sequences in a single pass. We propose Elastic Test-Time Training, inspired by elastic weight consolidation, which stabilizes LaCT fast-weight updates with a Fisher-weighted elastic prior around a maintained anchor state. The anchor evolves as an exponential moving average of past fast weights to balance stability and plasticity. Building on this updated architecture, we introduce Fast Spatial Memory (FSM), an efficient and scalable model for 4D reconstruction that learns spatiotemporal representations from long observation sequences and renders novel view-time combinations. We pre-train FSM on large-scale curated 3D/4D data to capture the dynamics and semantics of complex spatial environments. Extensive experiments show that FSM supports fast adaptation over long sequences and delivers high-quality 3D/4D reconstruction with smaller chunks while mitigating the camera-interpolation shortcut. Overall, we hope to advance LaCT beyond the bounded single-chunk setting toward robust multi-chunk adaptation, a necessary step for generalization to genuinely longer sequences, while substantially alleviating the activation-memory bottleneck.
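The elastic prior can be read as an EWC-style quadratic penalty added to the test-time loss, (λ/2) Σ_i F_i (w_i − a_i)². Here F is a diagonal Fisher estimate and the anchor a is tracked as an exponential moving average of past fast weights. The sketch below makes three simplifications. It replaces the real reconstruction objective with a toy quadratic task loss 0.5·||w − t||². It uses a fixed Fisher vector. It assumes values for λ, the step size and β. All function names are hypothetical.

The effect of the penalty is easiest to see at convergence. Each coordinate settles at (t_i + λF_i a_i) / (1 + λF_i), so coordinates with a large Fisher weight stay near the anchor while the others adapt freely to the new chunk.

```python
import numpy as np


def adapt_chunk(w, anchor, fisher, target, lam=1.0, lr=0.1, steps=500):
    """Gradient descent on 0.5*||w - target||^2 + (lam/2)*sum(fisher*(w - anchor)^2).

    The quadratic task loss is a stand-in for the real test-time objective."""
    w = w.copy()
    for _ in range(steps):
        grad = (w - target) + lam * fisher * (w - anchor)
        w -= lr * grad
    return w


def update_anchor(anchor, w, beta=0.9):
    """Anchor follows an exponential moving average of past fast weights."""
    return beta * anchor + (1.0 - beta) * w


# Usage: adapt to one chunk, then consolidate the result into the anchor.
fisher = np.array([3.0, 0.0])  # coordinate 0 treated as important, coordinate 1 not
w, anchor = np.zeros(2), np.zeros(2)
w = adapt_chunk(w, anchor, fisher, target=np.array([4.0, 4.0]))
anchor = update_anchor(anchor, w)
```

Here coordinate 0 stops at 4 / (1 + 3) = 1 while coordinate 1 reaches the target 4. A larger β makes the anchor move more slowly, trading plasticity for stability.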

2026-04-08T17:59:48Z Project Page: https://fast-spatial-memory.github.io/ Ziqiao Ma Xueyang Yu Haoyu Zhen Yuncong Yang Joyce Chai Chuang Gan

http://arxiv.org/abs/2604.07348v1 MoRight: Motion Control Done Right 2026-04-08T17:59:22Z

Generating motion-controlled videos, in which user-specified actions drive physically plausible scene dynamics under freely chosen viewpoints, demands two capabilities: (1) disentangled motion control, allowing users to separately control the object motion and adjust camera viewpoint; and (2) motion causality, ensuring that user-driven actions trigger coherent reactions from other objects rather than merely displacing pixels. Existing methods fall short on both fronts: they entangle camera and object motion into a single tracking signal and treat motion as kinematic displacement without modeling causal relationships between the motions of different objects. We introduce MoRight, a unified framework that addresses both limitations through disentangled motion modeling. Object motion is specified in a canonical static view and transferred to an arbitrary target camera viewpoint via temporal cross-view attention, enabling disentangled camera and object control. We further decompose motion into active (user-driven) and passive (consequence) components, training the model to learn motion causality from data. At inference, users can either supply active motion and MoRight predicts consequences (forward reasoning), or specify desired passive outcomes and MoRight recovers plausible driving actions (inverse reasoning), all while freely adjusting the camera viewpoint. Experiments on three benchmarks demonstrate state-of-the-art performance in generation quality, motion controllability, and interaction awareness.

2026-04-08T17:59:22Z Project Page: https://research.nvidia.com/labs/sil/projects/moright Shaowei Liu Xuanchi Ren Tianchang Shen Huan Ling Saurabh Gupta Shenlong Wang Sanja Fidler Jun Gao

http://arxiv.org/abs/2604.01204v2 Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction 2026-04-08T17:59:08Z

Primitive-based methods such as 3D Gaussian Splatting have recently become the state of the art for novel-view synthesis and related reconstruction tasks. Compared to neural fields, these representations are more flexible, adaptive, and scale better to large scenes. However, the limited expressivity of individual primitives makes modeling high-frequency detail challenging. We introduce Neural Harmonic Textures, a neural representation approach that anchors latent feature vectors on a virtual scaffold surrounding each primitive. These features are interpolated within the primitive at ray intersection points. Inspired by Fourier analysis, we apply periodic activations to the interpolated features, turning alpha blending into a weighted sum of harmonic components. The resulting signal is then decoded in a single deferred pass using a small neural network, significantly reducing computational cost. Neural Harmonic Textures yield state-of-the-art results in real-time novel view synthesis while bridging the gap between primitive- and neural-field-based reconstruction. Our method integrates seamlessly into existing primitive-based pipelines such as 3DGUT, Triangle Splatting, and 2DGS. We further demonstrate its generality with applications to 2D image fitting and semantic reconstruction.
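The abstract specifies an ordering for one ray. A periodic activation is applied per primitive hit, the activated features are alpha-blended, and the blended signal is decoded once per pixel. A minimal sketch follows. It omits the scaffold interpolation and supplies the per-hit feature vectors directly, uses sin as the periodic activation, and substitutes a hypothetical two-layer decoder with random weights. None of these specifics come from the paper.

```python
import numpy as np


def blend_harmonic_features(features, alphas):
    """Alpha-blend sin(features) front to back along one ray.

    features: (N, D) interpolated latent vectors at N depth-sorted hits.
    alphas:   (N,) per-hit opacities.
    Because sin is applied before blending, the result is a weighted sum of
    harmonic components rather than a blend of already-decoded colours."""
    blended = np.zeros(features.shape[1])
    transmittance = 1.0
    for f, a in zip(features, alphas):
        blended += transmittance * a * np.sin(f)
        transmittance *= 1.0 - a
    return blended


def decode(x, w1, b1, w2, b2):
    """Hypothetical small MLP, evaluated once per pixel (deferred decoding)."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    return hidden @ w2 + b2


rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))  # three hits, 8-dim features (arbitrary sizes)
alphas = np.array([0.5, 0.5, 1.0])
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 3)), np.zeros(3)
rgb = decode(blend_harmonic_features(feats, alphas), w1, b1, w2, b2)
```

The decoder cost is independent of the number of primitives hit along the ray. That independence is the saving a single deferred pass provides, compared with running the network once per hit.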

2026-04-01T17:48:22Z Jorge Condor Nicolas Moenne-Loccoz Merlin Nimier-David Piotr Didyk Zan Gojcic Qi Wu

http://arxiv.org/abs/2409.17346v3 Preserving Discrete Morse-Smale Complexes in Error-Bounded Lossy Compression 2026-04-08T16:55:08Z

Scientific applications are generating unprecedented volumes of data that overwhelm storage and transmission systems, posing significant challenges for the design of data management tools and scientific databases. Lossy compression has emerged as a promising strategy to address this problem, but most existing compressors fail to preserve the topology of scientific data, leading to inaccuracies in downstream analyses and potentially erroneous scientific conclusions. In this work, we present a methodology for fully preserving the topology, specifically, Morse-Smale complexes (MSCs), in lossy-compressed 2D and 3D scalar field data from scientific simulations. We generalize the edit-based strategy introduced in MSz (a previous method that preserves only segmentations and cannot preserve saddles or separatrices) by extending the framework to the full MSCs, including all critical points and separatrices. Our approach corrects the MSCs in the decompressed output of any error-bounded lossy compressor (e.g., SZ3 or ZFP), referred to as the base compressor, using an iterative editing strategy that preserves all critical points and their connectivity via separatrices. During compression, we generate a sequence of quantized edits that are applied to the decompressed output, ensuring accurate preservation of topological features while maintaining the error within prescribed bounds. The strategy iteratively fixes critical points and separatrices in alternating steps until convergence is achieved in a finite number of iterations. To meet diverse application needs, our method offers flexible options that balance compression efficiency with feature preservation. To reduce computation time, we leverage GPU parallelism to accelerate each component of the workflow. Experiments on multiple datasets demonstrate that our method achieves 100% preservation of Morse-Smale complexes.
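The abstract describes a loop. Data are compressed with any error-bounded compressor, the topological features of the decompressed output are compared with the original's, bounded edits are applied, and the process repeats until nothing changes. That loop can be illustrated on a 1D signal, where the only critical points are local minima and maxima and there are no separatrices.

This is a simplified sketch, not the MSz-based algorithm. A uniform quantizer stands in for SZ3 or ZFP. The edit rule is an assumption chosen so that the error bound ξ can never be violated: each affected sample moves toward its original value by at most ξ/4 and never past it. In the real method, edits are quantized and stored alongside the compressed stream; here they are simply applied.

```python
import numpy as np


def critical_labels(f):
    """Label samples of a 1D field: +1 local max, -1 local min, 0 otherwise."""
    labels = np.zeros(len(f), dtype=int)
    left, mid, right = f[:-2], f[1:-1], f[2:]
    labels[1:-1][(mid > left) & (mid > right)] = 1
    labels[1:-1][(mid < left) & (mid < right)] = -1
    return labels


def base_compress(f, xi):
    """Stand-in error-bounded compressor: uniform quantization, |f - g| <= xi."""
    return np.round(f / (2.0 * xi)) * (2.0 * xi)


def preserve_extrema(f, g, xi, n_levels=4):
    """Iteratively edit decompressed data g until its extrema match those of f."""
    g = g.copy()
    step = xi / n_levels
    target = critical_labels(f)
    # Each iteration with a mismatch changes at least one sample, and each sample
    # changes at most n_levels + 1 times, so this bound guarantees termination.
    for _ in range(len(f) * (n_levels + 1) + 1):
        bad = np.flatnonzero(critical_labels(g) != target)
        if bad.size == 0:
            break
        # A label depends on a sample and its two neighbours, so edit all three.
        touched = np.unique(np.clip(np.concatenate([bad - 1, bad, bad + 1]), 0, len(f) - 1))
        for i in touched:
            diff = f[i] - g[i]
            if abs(diff) <= step:
                g[i] = f[i]  # snap exactly to avoid floating-point creep
            else:
                g[i] += np.sign(diff) * step
    return g


x = np.linspace(0.0, 6.0 * np.pi, 200)
f = np.sin(x)
xi = 0.05
g = base_compress(f, xi)           # quantization flattens peaks into plateaus
g_fixed = preserve_extrema(f, g, xi)
```

Edits only ever move a sample toward its original value, so the pointwise error can only shrink, and the result still satisfies |f − g| ≤ ξ.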

2024-09-25T20:46:40Z Yuxiao Li Mingze Xia Xin Liang Bei Wang Hanqi Guo

http://arxiv.org/abs/2605.13857v1 MoZoo:Unleashing Video Diffusion power in animal fur and muscle simulation 2026-04-08T15:42:16Z

The creation of cinematic-quality animal effects necessitates the precise modeling of muscle and fur dynamics, a process that remains both labor-intensive and computationally expensive within traditional production workflows. While generative diffusion models have shown promise in diverse artistic workflows, their capacity for high-fidelity animal simulation remains largely unexploited. We present MoZoo, a generative dynamics solver that bypasses conventional refinement to synthesize high-fidelity animal videos from coarse meshes under multimodal guidance. We propose Role-Aware RoPE (RAR-RoPE) which employs role-based index remapping to synchronize motion alignment while decoupling reference information via fixed temporal offsets. Complementing this, Asymmetric Decoupled Attention partitions the latent sequence to enforce a unidirectional information flow, effectively preventing feature interference and improving computational efficiency. To address the scarcity of high-quality training data, we introduce MoZoo-Data, a synthetic-to-real pipeline that leverages a rendering engine and an inverse mapping approach to construct a large-scale dataset of paired sequences. Furthermore, we establish MoZooBench, a comprehensive benchmark with 120 mesh-video pairs. Experimental results demonstrate that MoZoo achieves high-fidelity fur simulation across diverse animal skeletons and layouts, preserving superior temporal and structural consistency.

2026-04-08T15:42:16Z GitHub Page: https://dongxialiu15.github.io/MoZoo/ Dongxia Liu Jie Ma Xiaochen Yang Jiancheng Zhang Bin Xia Zhehan Kan Nisha Huang Jun Liang Wenming Yang Jin Li

http://arxiv.org/abs/2604.07177v1 Splats under Pressure: Exploring Performance-Energy Trade-offs in Real-Time 3D Gaussian Splatting under Constrained GPU Budgets 2026-04-08T15:05:29Z

We investigate the feasibility of real-time 3D Gaussian Splatting (3DGS) rasterisation on edge clients with varying Gaussian splat counts and GPU computational budgets. Instead of evaluating multiple physical devices, we adopt an emulation-based approach that approximates different GPU capability tiers on a single high-end GPU. By systematically under-clocking the GPU core frequency and applying power caps, we emulate a controlled range of floating-point performance levels. At each point in this range, we measure frame rate, runtime behaviour, and power consumption across scenes of varying complexity, pipelines, and optimisations, enabling analysis of power-performance relationships such as FPS-power curves, energy per frame, and performance per watt. This method allows us to approximate the performance envelope of a diverse class of GPUs, from embedded and mobile-class devices to high-end consumer-grade systems. Our objective is to explore the practical lower bounds of client-side 3DGS rasterisation and assess its potential for deployment in energy-constrained environments, including standalone headsets and thin clients. Through this analysis, we provide early insights into the performance-energy trade-offs that govern the viability of edge-deployed 3DGS systems.
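The derived metrics named above follow directly from measured average power and frame rate. Because a watt is a joule per second, energy per frame (J) = power (W) / FPS, and performance per watt = FPS / power. The helper below computes both. It also selects the lowest-power emulated tier that still meets a frame-rate target, which is one way to read off a "practical lower bound". The record type, field names and selection rule are illustrative assumptions, not the authors' tooling.

```python
from dataclasses import dataclass


@dataclass
class TierMeasurement:
    name: str       # label for a core-clock / power-cap setting
    fps: float      # measured frames per second
    power_w: float  # measured average GPU power in watts

    @property
    def joules_per_frame(self) -> float:
        # (J/s) / (frames/s) = J/frame
        return self.power_w / self.fps

    @property
    def fps_per_watt(self) -> float:
        return self.fps / self.power_w


def cheapest_tier_meeting(tiers, target_fps):
    """Lowest-power tier whose frame rate meets target_fps, or None if none does."""
    ok = [t for t in tiers if t.fps >= target_fps]
    return min(ok, key=lambda t: t.power_w) if ok else None
```

For example, a tier drawing 30 W at 60 FPS spends 0.5 J per frame and delivers 2 FPS/W. Minimizing power subject to an FPS floor is equivalent to minimizing energy per frame only when frame rates are held equal, so both views are useful.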

2026-04-08T15:05:29Z Muhammad Fahim Tajwar Arthur Wuhrlin Bhojan Anand