http://arxiv.org/abs/2601.10075v2 Thinking Like Van Gogh: Structure-Aware Style Transfer via Flow-Guided 3D Gaussian Splatting 2026-04-01T07:37:19Z

In 1888, Vincent van Gogh wrote, "I am seeking exaggeration in the essential." This principle, amplifying structural form while suppressing photographic detail, lies at the core of Post-Impressionist art. However, most existing 3D style transfer methods invert this philosophy, treating geometry as a rigid substrate for surface-level texture projection. To authentically reproduce Post-Impressionist stylization, geometric abstraction must be embraced as the primary vehicle of expression. We propose a flow-guided geometric advection framework for 3D Gaussian Splatting (3DGS) that operationalizes this principle in a mesh-free setting. Our method extracts directional flow fields from 2D paintings and back-propagates them into 3D space, rectifying Gaussian primitives to form flow-aligned brushstrokes that conform to scene topology without relying on explicit mesh priors. This enables expressive structural deformation driven directly by painterly motion rather than photometric constraints. Our contributions are threefold: (1) a projection-based, mesh-free flow guidance mechanism that transfers 2D artistic motion into 3D Gaussian geometry; (2) a luminance-structure decoupling strategy that isolates geometric deformation from color optimization, mitigating artifacts during aggressive structural abstraction; and (3) a VLM-as-a-Judge evaluation framework that assesses artistic authenticity through aesthetic judgment instead of conventional pixel-level metrics, explicitly addressing the subjective nature of artistic stylization.

2026-01-15T05:00:02Z 7 pages, 8 figures Lebin Zhou Jingchuan Xiao Zhendong Wang Jinhao Wang Rongduo Han Nam Ling Cihan Ruan http://arxiv.org/abs/2505.14222v3 MATHDance: Mamba-Transformer Architecture with Uniform Tokenization for High-Quality 3D Dance Generation 2026-04-01T06:38:44Z

Music-to-dance generation represents a challenging yet pivotal task at the intersection of choreography, virtual reality, and creative content generation. Despite its significance, existing methods face substantial limitations in achieving choreographic consistency. To address this challenge, we propose MatchDance, a novel framework for music-to-dance generation that constructs a latent representation to enhance choreographic consistency. MatchDance employs a two-stage design: (1) a Kinematic-Dynamic-based Quantization Stage (KDQS), which encodes dance motions into a latent representation by Finite Scalar Quantization (FSQ) with kinematic-dynamic constraints and reconstructs them with high fidelity, and (2) a Hybrid Music-to-Dance Generation Stage (HMDGS), which uses a Mamba-Transformer hybrid architecture to map music into the latent representation, followed by the KDQS decoder to generate 3D dance motions. Additionally, a music-dance retrieval framework and comprehensive metrics are introduced for evaluation. Extensive experiments on the FineDance dataset demonstrate state-of-the-art performance.
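
The quantization step named in stage (1) can be illustrated with a minimal sketch of Finite Scalar Quantization in its basic form: each latent channel is bounded and rounded to a small fixed set of integer levels. The function names `fsq_quantize` and `fsq_index` and the level counts are assumptions chosen for illustration. The sketch handles odd level counts only and leaves out the kinematic-dynamic constraints and the straight-through gradient that training would need, so it should not be read as the authors' implementation.

```python
import numpy as np


def fsq_quantize(z, levels):
    """Round each latent channel to one of `levels[c]` integer codes.

    Sketch only: supports odd level counts, so codes are symmetric around 0.
    """
    z = np.asarray(z, dtype=np.float64)
    levels = np.asarray(levels, dtype=np.int64)
    if z.shape[-1] != levels.shape[0]:
        raise ValueError("last axis of z must match the number of level entries")
    if np.any(levels % 2 == 0):
        raise ValueError("this sketch only supports odd level counts")
    half = (levels - 1) / 2.0
    # tanh bounds each channel to (-1, 1); scaling maps it to (-half, half).
    bounded = np.tanh(z) * half
    # Rounding snaps every channel to an integer code in [-half, half].
    return np.round(bounded).astype(np.int64)


def fsq_index(codes, levels):
    """Map per-channel codes to one codebook index in [0, prod(levels))."""
    levels = np.asarray(levels, dtype=np.int64)
    shifted = np.asarray(codes, dtype=np.int64) + (levels - 1) // 2
    # Mixed-radix basis: channel 0 varies fastest.
    basis = np.concatenate(([1], np.cumprod(levels[:-1])))
    return (shifted * basis).sum(axis=-1)


if __name__ == "__main__":
    latents = np.array([[-10.0, 10.0], [0.0, 0.0]])
    codes = fsq_quantize(latents, [3, 5])
    print(codes.tolist(), fsq_index(codes, [3, 5]).tolist())
```

Because the codebook is implicit (the product of the per-channel level counts), nothing has to be learned for the quantizer itself; only the encoder and decoder around it are trained.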

2025-05-20T11:30:28Z Kaixing Yang Xulong Tang Ziqiao Peng Yuxuan Hu Xiangyue Zhang Puwei Wang Hongyan Liu Jun He Zhaoxin Fan http://arxiv.org/abs/2604.00509v1 RT-GS: Gaussian Splatting with Reflection and Transmittance Primitives 2026-04-01T05:50:03Z

Gaussian Splatting is a powerful tool for reconstructing diffuse scenes, but it struggles to simultaneously model specular reflections and the appearance of objects behind semi-transparent surfaces. These specular reflections and transmittance are essential for realistic novel view synthesis, and existing methods do not properly incorporate the underlying physical processes to simulate them. To address this issue, we propose RT-GS, a unified framework that integrates a microfacet material model and ray tracing to jointly model specular reflection and transmittance in Gaussian Splatting. We accomplish this by using separate Gaussian primitives for reflections and transmittance, which allow modeling distant reflections and reconstructing objects behind transparent surfaces concurrently. We utilize a differentiable ray tracing framework to obtain the specular reflection and transmittance appearance. Our experiments demonstrate that our method successfully produces reflections and recovers objects behind transparent surfaces in complex environments, achieving significant qualitative improvements over prior methods where these specular light interactions are prominent.

2026-04-01T05:50:03Z Kunnong Zeng Chensheng Peng Yichen Xie Masayoshi Tomizuka Cem Yuksel http://arxiv.org/abs/2603.29939v1 XR is XR: Rethinking MR and XR as Neutral Umbrella Terms 2026-03-31T16:12:21Z

The term XR is currently widely used as an expression encompassing Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR). However, there is no clear consensus regarding its origin or meaning. XR is sometimes explained as an abbreviation for Extended Reality, but multiple interpretations exist regarding its etymology and formation process. This paper organizes the historical formation of terminology related to VR, AR, MR, and XR, and reexamines the context in which the term XR emerged and how it has spread. In particular, by presenting a timeline that distinguishes between the coinage of terms and the drivers of their adoption, we suggest that XR, as an umbrella term, functions not as an abbreviation of Extended Reality, but rather as a neutral symbolic label that encompasses multiple "reality"-related terms. Furthermore, we argue that stable usage of terminology, including XR, requires governance through collaboration among academia, industry, and standardization organizations.

2026-03-31T16:12:21Z 4 pages, 2 figures Takeshi Kurata http://arxiv.org/abs/2404.13497v3 Histropy: A Computer Program for Quantifications of Histograms of 2D Gray-scale Images 2026-03-31T14:02:29Z

The computer program "Histropy" is an interactive Python program for the quantification of selected features of two-dimensional (2D) images/patterns (in either JPG/JPEG, PNG, GIF, BMP, or baseline TIF/TIFF formats) using calculations based on the pixel intensities in this data, their histograms, and user-selected sections of those histograms. The histograms of these images display pixel-intensity values along the x-axis (of a 2D Cartesian plot), with the frequency of each intensity value within the image represented along the y-axis. The images need to be of 8-bit or 16-bit information depth and can be of arbitrary size. Histropy generates an image's histogram surrounded by a graphical user interface that allows one to select any range of image-pixel intensity levels, i.e. sections along the histograms' x-axis, using either the computer mouse or numerical text entries. The program subsequently calculates the (so-called Monkey Model) Shannon entropy and root-mean-square contrast for the selected section and displays them as part of what we call a "histogram-workspace-plot." To support the visual identification of small peaks in the histograms, the user can switch between a linear and log-base-10 display scale for the y-axis of the histograms. Pixel intensity data from different images can be overlaid onto the same histogram-workspace-plot for visual comparisons. The visual outputs of the program can be saved as histogram-workspace-plots in the PNG format for future usage. The source code of the program and a brief user manual are published in the supporting materials as well as on GitHub. Instead of taking only 2D images as inputs, the program's functionality could be extended by a few lines of code to other potential uses employing data tables with one or two dimensions in the CSV format.
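
The two quantities reported for a selected histogram section can be sketched as follows. This is not Histropy's source code: the function name `section_stats` is hypothetical, the entropy is the first-order, histogram-based Shannon entropy in bits, and RMS contrast is taken as the standard deviation of the selected intensities, which is one common definition. Whether Histropy normalizes the intensities first is not stated in the abstract.

```python
import numpy as np


def section_stats(pixels, lo, hi):
    """Shannon entropy (bits) and RMS contrast of pixels with lo <= value <= hi."""
    pixels = np.asarray(pixels).ravel()
    selected = pixels[(pixels >= lo) & (pixels <= hi)].astype(np.int64)
    if selected.size == 0:
        raise ValueError("no pixels fall inside the selected intensity range")
    # Histogram of the selected section, one bin per integer intensity level.
    counts = np.bincount(selected - lo, minlength=hi - lo + 1)
    # Probabilities of the occupied bins only (empty bins contribute nothing).
    p = counts[counts > 0] / selected.size
    entropy = float(-np.sum(p * np.log2(p)))
    # Assumption: RMS contrast = standard deviation of the raw intensities.
    rms_contrast = float(np.std(selected.astype(np.float64)))
    return entropy, rms_contrast


if __name__ == "__main__":
    image = np.array([[0, 0], [255, 255]], dtype=np.uint8)
    print(section_stats(image, 0, 255))
```

Restricting `lo` and `hi` mirrors the interactive range selection along the histogram's x-axis: only the pixels inside the section enter both calculations.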

2024-04-21T01:03:07Z Sagarika Menon Peter Moeck http://arxiv.org/abs/2603.29618v1 ARCOL: Aspect Ratio Constrained Orthogonal Layout 2026-03-31T11:40:23Z

Orthogonal graph layout algorithms aim to produce clear, compact, and readable network diagrams by arranging nodes and edges along horizontal and vertical lines, while minimizing bends and crossings. Most existing orthogonal layout methods focus primarily on quality criteria such as area usage, total edge length, and bend minimization. Explicitly controlling the global aspect ratio (AR) of the resulting layout remains unexplored. Existing orthogonal layout methods offer no control over the resulting AR, and their rigid geometric constraints make adaptation of finished layouts difficult. With the increasing variety of aspect ratios encountered in daily life, from wide monitors to tall mobile devices or fixed-size interface panels, there is a clear need for aspect ratio control in orthogonal layout methods. To tackle this issue, we introduce Aspect Ratio-Constrained Orthogonal Layout (ARCOL). Building upon the Human-like Orthogonal Layout Algorithm (HOLA)~\cite{Kieffer2016}, we integrate aspect ratio at two different stages: (1) into the stress minimization phase, as a soft constraint, allowing the layout algorithm to gently guide node positions toward a specified target AR, while preserving visual clarity and topological faithfulness; and (2) into the tree reattachment phase, where we modify the cost function to favor placements that improve the AR. We evaluate our approach through quantitative evaluation, a user study, and expert interviews. Our evaluations show that ARCOL produces balanced and space-efficient orthogonal layouts across diverse aspect ratios.

2026-03-31T11:40:23Z Zainab Alsuwaykit Yousef Rajeh Alexandre Kouyoumdjian Steve Kieffer Dominik Engel Sara Di Bartolomeo Martin Nöllenburg Ivan Viola http://arxiv.org/abs/2603.29139v1 SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents 2026-03-31T01:41:28Z

Recent advances in large language models (LLMs) have enabled agentic systems that translate natural language intent into executable scientific visualization (SciVis) tasks. Despite rapid progress, the community lacks a principled and reproducible benchmark for evaluating these emerging SciVis agents in realistic, multi-step analysis settings. We present SciVisAgentBench, a comprehensive and extensible benchmark for evaluating scientific data analysis and visualization agents. Our benchmark is grounded in a structured taxonomy spanning four dimensions: application domain, data type, complexity level, and visualization operation. It currently comprises 108 expert-crafted cases covering diverse SciVis scenarios. To enable reliable assessment, we introduce a multimodal outcome-centric evaluation pipeline that combines LLM-based judging with deterministic evaluators, including image-based metrics, code checkers, rule-based verifiers, and case-specific evaluators. We also conduct a validity study with 12 SciVis experts to examine the agreement between human and LLM judges. Using this framework, we evaluate representative SciVis agents and general-purpose coding agents to establish initial baselines and reveal capability gaps. SciVisAgentBench is designed as a living benchmark to support systematic comparison, diagnose failure modes, and drive progress in agentic SciVis. The benchmark is available at https://scivisagentbench.github.io/.

2026-03-31T01:41:28Z Kuangshi Ai Haichao Miao Kaiyuan Tang Nathaniel Gorski Jianxin Sun Guoxi Liu Helgi I. Ingolfsson David Lenz Hanqi Guo Hongfeng Yu Teja Leburu Michael Molash Bei Wang Tom Peterka Chaoli Wang Shusen Liu http://arxiv.org/abs/2603.29089v1 WorldFlow3D: Flowing Through 3D Distributions for Unbounded World Generation 2026-03-31T00:08:17Z

Unbounded 3D world generation is emerging as a foundational task for scene modeling in computer vision, graphics, and robotics. In this work, we present WorldFlow3D, a novel method capable of generating unbounded 3D worlds. Building upon a foundational property of flow matching - namely, defining a path of transport between two data distributions - we model 3D generation more generally as a problem of flowing through 3D data distributions, not limited to conditional denoising. We find that our latent-free flow approach generates causal and accurate 3D structure, and can use this as an intermediate distribution to guide the generation of more complex structure and high-quality texture - all while converging more rapidly than existing methods. We enable controllability over generated scenes with vectorized scene layout conditions for geometric structure control and visual texture control through scene attributes. We confirm the effectiveness of WorldFlow3D on both real outdoor driving scenes and synthetic indoor scenes, validating cross-domain generalizability and high-quality generation on real data distributions. We confirm favorable scene generation fidelity over existing approaches in all tested settings for unbounded scene generation. For more, see https://light.princeton.edu/worldflow3d.

2026-03-31T00:08:17Z Amogh Joshi Julian Ost Felix Heide http://arxiv.org/abs/2603.06679v2 MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines 2026-03-30T23:01:43Z

Video world models have shown immense promise for interactive simulation and entertainment, but current systems still struggle with two important aspects of interactivity: user control over the environment for reproducible, editable experiences, and shared inference where players hold influence over a common world. To address these limitations, we introduce an explicit external memory into the system, a persistent state operating independent of the model's context window, that is continually updated by user actions and queried throughout the generation roll-out. Unlike conventional diffusion game engines that operate as next-frame predictors, our approach decomposes generation into Memory, Observation, and Dynamics modules. This design gives users direct, editable control over environment structure via an editable memory representation, and it naturally extends to real-time multiplayer rollouts with coherent viewpoints and consistent cross-player interactions.

2026-03-03T18:58:17Z Project page here: https://ryanpo.com/multigen/ Ryan Po David Junhao Zhang Amir Hertz Gordon Wetzstein Neal Wadhwa Nataniel Ruiz http://arxiv.org/abs/2407.19097v2 NARVis: Neural Accelerated Rendering for Real-Time Scientific Point Cloud Visualization 2026-03-30T17:22:16Z

Exploring scientific datasets with billions of samples in real-time visualization presents a challenge - balancing high-fidelity rendering with speed. This work introduces a neural accelerated renderer, NARVis, that uses the neural deferred rendering framework to visualize large-scale scientific point cloud data. NARVis augments a real-time point cloud rendering pipeline with high-quality neural post-processing, making the approach ideal for interactive visualization at scale. Specifically, we render the multi-attribute point cloud using a high-performance multi-attribute rasterizer and train a neural renderer to capture the desired post-processing effects from a conventional high-quality renderer. NARVis is effective in visualizing complex multidimensional Lagrangian flow fields and photometric scans of a large terrain as compared to the state-of-the-art high-quality renderers. Extensive evaluations demonstrate that NARVis prioritizes speed and scalability while retaining high visual fidelity. We achieve competitive frame rates of >126 fps for interactive rendering of >350M points (i.e., an effective throughput of >44 billion points per second) using ~12 GB of memory on an RTX 2080 Ti GPU. Furthermore, NARVis is generalizable across different point clouds with similar visualization needs, and the desired post-processing effects can be obtained with substantially high quality even at lower resolutions of the original point cloud, further reducing the memory requirements.
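
The quoted throughput follows directly from the frame rate and the point count. A quick check of that arithmetic, using the lower bounds given in the abstract (the helper name `points_per_second` is illustrative):

```python
def points_per_second(frames_per_second, points_per_frame):
    """Effective throughput: every frame renders the full point cloud once."""
    return frames_per_second * points_per_frame


# Lower bounds quoted in the abstract: >126 fps and >350M points.
throughput = points_per_second(126, 350_000_000)
print(f"{throughput / 1e9:.1f} billion points per second")  # prints "44.1 billion points per second"
```

Since both inputs are lower bounds, 44.1 billion points per second is itself a lower bound, consistent with the ">44 billion" figure.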

2024-07-26T21:21:13Z Srinidhi Hegde Kaur Kullman Thomas Grubb Leslie Lait Stephen Guimond Matthias Zwicker http://arxiv.org/abs/2603.28365v1 The Rise of AI-Generated Anime Avatars: Trends, Challenges, and Opportunities 2026-03-30T12:34:47Z

The rise of 3D anime-style avatars in gaming, virtual reality, and other digital media has driven significant interest in automated generation methods capable of capturing their distinctive visual characteristics. These include stylized proportions, expressive features, and non-photorealistic rendering. This paper reviews the advancements and challenges in using deep learning in 3D anime-style avatar generation. We analyze the strengths and limitations of these methods in capturing the aesthetics of anime characters and supporting customization and animation. Additionally, we identify and discuss open problems in the field, such as difficulties in resolution and detail preservation, and constraints regarding the animation of hair and loose clothing. This article aims to provide a comprehensive overview of the current state-of-the-art and identify promising research directions for advancing 3D anime-style avatar generation.

2026-03-30T12:34:47Z IEEE Computer Graphics and Applications, vol. 46, no. 02, pp. 112-119, 2026 Fernanda Miyuki Yamada João Paulo Gois Hiroki Takahashi 10.1109/MCG.2025.3627323 http://arxiv.org/abs/2505.08137v2 Large Language Models for Computer-Aided Design: A Survey 2026-03-30T03:52:18Z

Large Language Models (LLMs) have seen rapid advancements in recent years, with models like ChatGPT and DeepSeek showcasing their remarkable capabilities across diverse domains. While substantial research has been conducted on LLMs in various fields, a comprehensive review focusing on their integration with Computer-Aided Design (CAD) remains notably absent. CAD is the industry standard for 3D modeling and plays a vital role in the design and development of products across different industries. As the complexity of modern designs increases, the potential for LLMs to enhance and streamline CAD workflows presents an exciting frontier. This article presents the first systematic survey exploring the intersection of LLMs and CAD. We begin by outlining the industrial significance of CAD, highlighting the need for AI-driven innovation. Next, we provide a detailed overview of the foundation of LLMs. We also examine both closed-source LLMs and publicly available models. The core of this review focuses on the various applications of LLMs in CAD, providing a taxonomy of six key areas where these models are making considerable impact. Finally, we propose several promising future directions for further advancements, which offer vast opportunities for innovation and are poised to shape the future of CAD technology. GitHub: https://github.com/lichengzhanguom/LLMs-CAD-Survey-Taxonomy

2025-05-13T00:19:04Z Licheng Zhang Bach Le Naveed Akhtar Siew-Kei Lam Tuan Ngo http://arxiv.org/abs/2603.27862v1 ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks 2026-03-29T20:42:05Z

Advances in diffusion, autoregressive, and hybrid models have enabled high-quality image synthesis for tasks such as text-to-image, editing, and reference-guided composition. Yet existing benchmarks remain limited: they either focus on isolated tasks, cover only narrow domains, or provide opaque scores without explaining failure modes. We introduce ImagenWorld, a benchmark of 3.6K condition sets spanning six core tasks (generation and editing, with single or multiple references) and six topical domains (artworks, photorealistic images, information graphics, textual graphics, computer graphics, and screenshots). The benchmark is supported by 20K fine-grained human annotations and an explainable evaluation schema that tags localized object-level and segment-level errors, complementing automated VLM-based metrics. Our large-scale evaluation of 14 models yields several insights: (1) models typically struggle more in editing tasks than in generation tasks, especially in local edits; (2) models excel in artistic and photorealistic settings but struggle with symbolic and text-heavy domains such as screenshots and information graphics; (3) closed-source systems lead overall, while targeted data curation (e.g., Qwen-Image) narrows the gap in text-heavy cases; and (4) modern VLM-based metrics achieve Kendall accuracies up to 0.79, approximating human ranking, but fall short of fine-grained, explainable error attribution. ImagenWorld provides both a rigorous benchmark and a diagnostic tool to advance robust image generation.
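
The abstract does not define "Kendall accuracy". One plausible reading, assumed here purely for illustration, is pairwise ranking agreement: the fraction of item pairs that an automated metric orders the same way as the human scores. Without ties this equals (tau + 1) / 2, where tau is Kendall's rank correlation. The function name `pairwise_accuracy` and the decision to skip tied pairs are choices made for this sketch, not details taken from the paper.

```python
from itertools import combinations


def pairwise_accuracy(human, metric):
    """Fraction of non-tied pairs ranked in the same order by both score lists."""
    if len(human) != len(metric):
        raise ValueError("score lists must have the same length")
    concordant = 0
    comparable = 0
    for i, j in combinations(range(len(human)), 2):
        dh = human[i] - human[j]
        dm = metric[i] - metric[j]
        if dh == 0 or dm == 0:
            continue  # assumption: pairs tied in either list are skipped
        comparable += 1
        if (dh > 0) == (dm > 0):
            concordant += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # One swapped pair out of six: five concordant pairs.
    print(pairwise_accuracy([1, 2, 3, 4], [1, 3, 2, 4]))
```

Under this reading, a value of 0.79 would mean the metric agrees with the human ordering on roughly four out of five comparable pairs.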

2026-03-29T20:42:05Z Published in ICLR 2026 Samin Mahdizadeh Sani Max Ku Nima Jamali Matina Mahdizadeh Sani Paria Khoshtab Wei-Chieh Sun Parnian Fazel Zhi Rui Tam Thomas Chong Edisy Kin Wai Chan Donald Wai Tong Tsang Chiao-Wei Hsu Ting Wai Lam Ho Yin Sam Ng Chiafeng Chu Chak-Wing Mak Keming Wu Hiu Tung Wong Yik Chun Ho Chi Ruan Zhuofeng Li I-Sheng Fang Shih-Ying Yeh Ho Kei Cheng Ping Nie Wenhu Chen http://arxiv.org/abs/2603.27801v1 Engineering Mythology: A Digital-Physical Framework for Culturally-Inspired Public Art 2026-03-29T18:31:33Z

Navagunjara Reborn: The Phoenix of Odisha was built for Burning Man 2025 as both a sculpture and an experiment: a fusion of myth, craft, and computation. This paper describes the digital-physical workflow developed for the project: a pipeline that linked digital sculpting, distributed fabrication by artisans in Odisha (India), modular structural optimization in the U.S., iterative feedback through photogrammetry and digital twins, and finally, one-shot full assembly at the art site in Black Rock Desert, Nevada. The desert installation tested not just materials, but also systems of collaboration: between artisans and engineers, between myth and technology, between cultural specificity and global experimentation. We share the lessons learned in design, fabrication, and deployment and offer a framework for future interdisciplinary projects at the intersection of cultural heritage, STEAM education, and public art. In retrospect, this workflow can be read as a convergence of many knowledge systems (artisan practice, structural engineering, mythic narrative, and environmental constraint) rather than as the execution of a single fixed blueprint.

2026-03-29T18:31:33Z 19 pages, 28 figures, 4 tables Jnaneshwar Das Christopher Filkins Rajesh Moharana Ekadashi Barik Bishweshwar Das David Ayers Christopher Skiba Rodney Staggers Mark Dill Swig Miller Daniel Tulberg Patrick Smith Seth Brink Kyle Breen Harish Anand Ramon Arrowsmith http://arxiv.org/abs/2502.07754v2 MeshSplats: Mesh-Based Rendering with Gaussian Splatting Initialization 2026-03-29T14:20:44Z

Gaussian Splatting (GS) is a recent and pivotal technique in 3D computer graphics. GS-based algorithms almost always bypass classical methods such as ray tracing, which offer numerous inherent advantages for rendering. For example, ray tracing can handle incoherent rays for advanced lighting effects, including shadows and reflections. To address this limitation, we introduce MeshSplats, a method which converts GS to a mesh-like format. Following the completion of training, MeshSplats transforms Gaussian elements into mesh faces, enabling rendering using ray tracing methods with all their associated benefits. Our model can be utilized immediately following transformation, yielding a mesh of slightly reduced reconstruction quality without additional training. Furthermore, we can enhance the quality by applying a dedicated optimization algorithm that operates on mesh faces rather than Gaussian components. Importantly, MeshSplats acts as a wrapper, converting pre-trained GS models into a ray-traceable format. The efficacy of our method is substantiated by experimental results, underscoring its extensive applications in computer graphics and image processing.

2025-02-11T18:27:39Z Rafał Tobiasz Grzegorz Wilczyński Marcin Mazur Sławomir Tadeja Weronika Smolak-Dyżewska Przemysław Spurek