http://arxiv.org/abs/2503.17897v2 Real-time Global Illumination for Dynamic 3D Gaussian Scenes 2026-04-29T10:39:38Z

We present a real-time global illumination approach, along with a pipeline, for dynamic 3D Gaussian models and meshes. Building on a surface light transport model formulated for 3D Gaussians, we address key performance challenges with a fast compound stochastic ray-tracing algorithm and an optimized 3D Gaussian rasterizer. Our pipeline integrates multiple real-time techniques to accelerate performance and achieve high-quality lighting effects. Our approach enables real-time rendering of dynamic scenes with interactively editable materials and dynamic lighting in diverse multi-light settings, capturing mutual multi-bounce light transport (indirect illumination) between 3D Gaussians and meshes. Additionally, we present a real-time renderer with an interactive user interface, validating our approach and demonstrating its practicality and high efficiency at over 40 fps in scenes containing both 3D Gaussians and meshes. Furthermore, our work highlights the potential of 3D Gaussians in real-time applications with dynamic lighting, offering insights into performance and optimization.

2025-03-23T01:51:36Z accepted by IEEE Transactions on Visualization and Computer Graphics Chenxiao Hu Meng Gai Guoping Wang Sheng Li

http://arxiv.org/abs/2604.26518v1 GMT: A Geometric Multigrid Transformer Solver for Microstructure Homogenization 2026-04-29T10:35:31Z

Lattice metamaterials enable lightweight, multifunctional structures, yet homogenization-based evaluation of their effective properties remains computationally expensive. Neural surrogates offer speed but often lack the accuracy and stability required for engineering-grade simulations. We introduce GMT, a Geometric Multigrid Transformer -- a neural solver with high numerical fidelity for fast and reliable lattice homogenization. GMT achieves architectural alignment with Geometric Multigrid (GMG) by restructuring Point Transformer V3 to operate across sparse GMG hierarchies, capturing long-range dependencies and cross-level interactions essential for multigrid convergence. To enforce physical consistency, GMT incorporates physics-aware positional encoding for strict enforcement of periodicity and predicts both the finest-level solution and multi-level residual corrections. These predictions deliver a spectrally-aligned initialization, enabling end-to-end training under physics-informed and solver-aware losses and requiring only a single GMG V-cycle refinement to reach convergence. This fusion of neural prediction and numerical rigor achieves relative residual errors of $10^{-5}$ with a $160\times$ speedup over state-of-the-art GPU-based solvers at equivalent accuracy -- particularly at high resolutions (e.g. $512^3$), where traditional methods become most costly. We validate GMT across mechanical and thermal domains, demonstrate robust generalization to unseen geometries and non-periodic settings, and showcase scalability to high resolutions -- enabling real-time design iteration, multi-scale simulations, high-throughput material discovery, and inverse design.
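
As a reference for the single refinement step mentioned above, the following minimal sketch shows what one geometric multigrid V-cycle does. It is a textbook solver for the 1D Poisson problem -u'' = f with zero Dirichlet boundaries, not the paper's 3D periodic homogenization solver; the 1D setting, the weighted-Jacobi smoother, full-weighting restriction and linear interpolation are all assumptions chosen for brevity.

```python
import numpy as np

def residual(u, f, h):
    """r = f - A u, with A = tridiag(-1, 2, -1) / h^2 on interior nodes; u[0] = u[-1] = 0."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted-Jacobi sweeps that damp high-frequency error on the current grid."""
    for _ in range(sweeps):
        jacobi = 0.5 * (u[:-2] + u[2:] + h**2 * f[1:-1])
        u = u.copy()
        u[1:-1] = (1.0 - omega) * u[1:-1] + omega * jacobi
    return u

def restrict(r):
    """Full weighting from 2m + 1 fine nodes to m + 1 coarse nodes."""
    rc = np.zeros(len(r) // 2 + 1)
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    return rc

def prolong(ec, n_fine):
    """Linear interpolation of a coarse correction back to the fine grid."""
    e = np.zeros(n_fine)
    e[::2] = ec                          # coincident nodes copy directly
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])   # in-between nodes take the average
    return e

def v_cycle(u, f, h):
    """One V(3,3)-cycle for -u'' = f on a grid with 2^k + 1 nodes."""
    if len(u) == 3:
        # Coarsest grid: one interior unknown, solved exactly (2u/h^2 = f).
        u = u.copy()
        u[1] = 0.5 * h**2 * f[1]
        return u
    u = smooth(u, f, h)                              # pre-smoothing
    r_coarse = restrict(residual(u, f, h))           # residual moved one level down
    e_coarse = v_cycle(np.zeros_like(r_coarse), r_coarse, 2.0 * h)
    u = u + prolong(e_coarse, len(u))                # coarse-grid correction
    return smooth(u, f, h)                           # post-smoothing

n = 64
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
f = np.pi**2 * np.sin(np.pi * x)   # exact continuous solution: u = sin(pi x)
u = np.zeros(n + 1)
history = []
for _ in range(10):
    u = v_cycle(u, f, h)
    history.append(np.linalg.norm(residual(u, f, h)) / np.linalg.norm(f))
```

Each cycle smooths high-frequency error on the fine grid and removes smooth error through the recursive coarse-grid correction, so the relative residual drops by a roughly constant factor per cycle. The abstract's claim is that a sufficiently good neural initialization makes a single such cycle enough.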

2026-04-29T10:35:31Z SIGGRAPH 2026 journal track Yu Xing Yang Liu Tianyang Xue Lin Lu

http://arxiv.org/abs/2508.07852v2 Vertex Features for Neural Global Illumination 2026-04-29T09:29:23Z

Learnable neural representations have been widely adopted in 3D scene reconstruction and neural rendering. However, traditional feature-grid representations often suffer from a substantial memory footprint, which poses a significant bottleneck for modern parallel computing hardware. In this paper, we present neural vertex features, a generalized formulation of learnable representations for neural rendering tasks involving explicit mesh surfaces. Instead of distributing neural features uniformly throughout 3D space, our method stores learnable features directly at mesh vertices, leveraging the underlying geometry as a compact and structured representation for neural processing. This not only improves memory efficiency but also enhances the feature representation, since the features align compactly with the surface and exploit task-specific geometric priors. We validate our neural representation across diverse neural rendering tasks, with a specific emphasis on neural radiosity. Experimental results demonstrate that our method reduces memory consumption to one-fifth (or less) of that of grid-based representations, while maintaining comparable rendering quality and lowering inference overhead.
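
A minimal sketch of the core lookup, assuming features are blended with barycentric weights inside the triangle containing the query point (the abstract does not state the interpolation scheme or the decoder, so both are assumptions here). The memory argument is simple arithmetic: a dense grid at resolution R with C channels stores R^3 * C values, whereas vertex features store V * C values for a mesh with V vertices.

```python
import numpy as np

def barycentric(p, a, b, c):
    """Barycentric weights (wa, wb, wc) of point p with respect to triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    wb = (d11 * d20 - d01 * d21) / denom
    wc = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - wb - wc, wb, wc])

def query_vertex_features(p, tri_id, vertices, faces, features):
    """Interpolate the learnable per-vertex features at surface point p on face tri_id."""
    ids = faces[tri_id]
    w = barycentric(p, *vertices[ids])
    return w @ features[ids]  # (C,) vector for a downstream decoder

# Toy mesh: a unit square split into two triangles, 8 feature channels per vertex.
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2], [1, 3, 2]])
rng = np.random.default_rng(0)
features = rng.standard_normal((len(vertices), 8))
centroid = vertices[faces[0]].mean(axis=0)
f = query_vertex_features(centroid, 0, vertices, faces, features)
```

The interpolated vector would then be fed to a task-specific decoder (hypothetically, a small MLP); that decoder is not sketched here.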

2025-08-11T11:10:19Z Accepted by ACM SIGGRAPH Asia'2025 Rui Su Honghao Dong Haojie Jin Yisong Chen Guoping Wang Sheng Li

http://arxiv.org/abs/2405.13729v3 ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models 2026-04-29T09:10:05Z

In this paper, we study an under-explored but important factor in diffusion generative models: combinatorial complexity. Data samples are generally high-dimensional, and in various structured generation tasks, additional attributes are associated with each data sample. We show that the space spanned by combinations of dimensions and attributes can be insufficiently covered by existing training schemes for diffusion generative models, potentially limiting test-time performance. We present a simple fix to this problem by constructing stochastic processes that fully exploit the combinatorial structure, hence the name ComboStoc. Using this simple strategy, we show that network training is significantly accelerated across diverse data modalities, including images and 3D structured shapes. Moreover, ComboStoc enables a new way of test-time generation that uses asynchronous time steps for different dimensions and attributes, allowing varying degrees of control over them. Our code is available at: https://github.com/Xrvitd/ComboStoc
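
The abstract does not give ComboStoc's exact stochastic process, so the following is only a sketch under stated assumptions: a linear (rectified-flow-style) path x_t = (1 - t) * x0 + t * eps between data and Gaussian noise, and one independent time step per dimension as our reading of "asynchronous time steps". It contrasts that with the usual scheme in which a single t is shared by the whole sample.

```python
import numpy as np

def make_training_pair(x0, rng, asynchronous=True):
    """Build a noisy input x_t, its time map t, and a velocity target (assumed linear path).

    asynchronous=False: one t shared by every entry of the sample (the usual scheme).
    asynchronous=True:  an independent t per dimension/attribute, so training also
                        visits combinations of entries at different noise levels.
    """
    eps = rng.standard_normal(x0.shape)
    if asynchronous:
        t = rng.uniform(size=x0.shape)
    else:
        t = np.full(x0.shape, rng.uniform())
    x_t = (1.0 - t) * x0 + t * eps
    target = eps - x0  # d x_t / d t along the linear path
    return x_t, t, target

rng = np.random.default_rng(0)
x0 = np.arange(6.0).reshape(2, 3)  # toy sample: 2 attributes x 3 dimensions
x_sync, t_sync, _ = make_training_pair(x0, rng, asynchronous=False)
x_async, t_async, v_async = make_training_pair(x0, rng, asynchronous=True)
```

A network trained this way would need to be conditioned on the full per-entry time map t rather than a scalar, which is also what would allow different entries to be denoised at different rates at test time.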

2024-05-22T15:23:10Z ACM Transactions on Graphics, SIGGRAPH 2026 Rui Xu Jiepeng Wang Hao Pan Yang Liu Xin Tong Shiqing Xin Changhe Tu Taku Komura Wenping Wang

http://arxiv.org/abs/2604.26097v1 Momentum-Conserving Graph Neural Networks for Deformable Objects 2026-04-28T20:19:33Z

Graph neural networks (GNNs) have emerged as a versatile and efficient option for modeling the dynamic behavior of deformable materials. While GNNs generalize readily to arbitrary shapes, mesh topologies, and material parameters, existing architectures struggle to correctly predict the temporal evolution of key physical quantities such as linear and angular momentum. In this work, we propose MomentumGNN -- a novel architecture designed to accurately track momentum by construction. Unlike existing GNNs that output unconstrained nodal accelerations, our model predicts per-edge stretching and bending impulses which guarantee the preservation of linear and angular momentum. We train our network in an unsupervised fashion using a physics-based loss, and we show that our method outperforms baselines in a number of common scenarios where momentum plays a pivotal role.
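
A minimal sketch of why per-edge impulses conserve momentum by construction, assuming stretching impulses act along the edge direction (the paper's bending impulses are not modeled, and the impulse magnitudes, random here, would come from the network). Node i receives λd and node j receives -λd, with d the unit vector from x_i to x_j, so linear momentum changes cancel pairwise; the angular momentum change x_i × λd - x_j × λd = λ(x_i - x_j) × d vanishes because d is parallel to x_j - x_i.

```python
import numpy as np

def apply_edge_impulses(x, v, m, edges, lam):
    """Update velocities with equal-and-opposite impulses along each edge.

    x, v: (N, 3) positions and velocities; m: (N,) masses;
    edges: (E, 2) node index pairs; lam: (E,) signed impulse magnitudes.
    """
    i, j = edges[:, 0], edges[:, 1]
    d = x[j] - x[i]
    d = d / np.linalg.norm(d, axis=1, keepdims=True)  # unit edge directions
    impulse = lam[:, None] * d                          # impulse on node i
    dp = np.zeros_like(x)
    np.add.at(dp, i, impulse)    # node i receives +impulse
    np.add.at(dp, j, -impulse)   # node j receives -impulse (Newton's third law)
    return v + dp / m[:, None]

def total_momenta(x, v, m):
    """Total linear momentum and angular momentum about the origin."""
    p = m[:, None] * v
    return p.sum(axis=0), np.cross(x, p).sum(axis=0)

rng = np.random.default_rng(1)
N = 8
x, v = rng.standard_normal((N, 3)), rng.standard_normal((N, 3))
m = rng.uniform(0.5, 2.0, size=N)
edges = np.stack(np.triu_indices(N, k=1), axis=1)  # every node pair as an edge
lam = rng.standard_normal(len(edges))
v_new = apply_edge_impulses(x, v, m, edges, lam)
P0, L0 = total_momenta(x, v, m)
P1, L1 = total_momenta(x, v_new, m)
```

Whatever values the network outputs for lam, P1 equals P0 and L1 equals L0 up to floating-point rounding, which is the "by construction" property the abstract describes.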

2026-04-28T20:19:33Z Accepted to 3DV 2026 Jiahong Wang Logan Numerow Stelian Coros Christian Theobalt Vahid Babaei Bernhard Thomaszewski

http://arxiv.org/abs/2412.10441v2 Novel 3D Binary Indexed Tree for Volume Computation of 3D Reconstructed Models from Volumetric Data 2026-04-28T15:44:22Z

In the burgeoning field of medical imaging, precise computation of 3D volume is of significant importance for subsequent qualitative analysis of 3D reconstructed objects. Combining multivariate calculus, the marching cubes algorithm, and the binary indexed tree data structure, we developed an algorithm for efficiently computing the intrinsic volume of any volumetric data recovered from computed tomography (CT) or magnetic resonance (MR) imaging. We propose 30 volume-value configurations based on the polygonal mesh generation method. Our algorithm processes the data in scan-line order, concurrently with the reconstruction algorithm, to build a Fenwick tree, which makes queries much faster and supports users who edit the model by slicing or transforming it. We tested the algorithm's accuracy on objects ranging from simple 3D shapes (e.g., sphere, cylinder) to complicated structures (e.g., lungs, cardiac chambers). The results deviated by no more than $\pm 0.004\,\text{cm}^3$, and there is still room for further improvement.
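
The paper's 30 volume configurations are not reproduced here. The sketch below assumes each marching-cubes cell has already been assigned a volume value and shows only the 3D binary indexed (Fenwick) tree that stores those per-cell values, giving O(log nx · log ny · log nz) point updates and prefix or box volume queries.

```python
class Fenwick3D:
    """3D binary indexed (Fenwick) tree over an nx x ny x nz grid of cell values."""

    def __init__(self, nx, ny, nz):
        self.nx, self.ny, self.nz = nx, ny, nz
        # 1-based internal indexing; index 0 is unused padding.
        self.tree = [[[0.0] * (nz + 1) for _ in range(ny + 1)] for _ in range(nx + 1)]

    def add(self, x, y, z, value):
        """Add value to cell (x, y, z) (0-based), e.g. a newly reconstructed cell's volume."""
        i = x + 1
        while i <= self.nx:
            j = y + 1
            while j <= self.ny:
                k = z + 1
                while k <= self.nz:
                    self.tree[i][j][k] += value
                    k += k & -k
                j += j & -j
            i += i & -i

    def prefix(self, x, y, z):
        """Sum of cells [0..x] x [0..y] x [0..z] inclusive; 0 if any index is -1."""
        total = 0.0
        i = x + 1
        while i > 0:
            j = y + 1
            while j > 0:
                k = z + 1
                while k > 0:
                    total += self.tree[i][j][k]
                    k -= k & -k
                j -= j & -j
            i -= i & -i
        return total

    def box(self, x0, y0, z0, x1, y1, z1):
        """Sum over the inclusive box [x0..x1] x [y0..y1] x [z0..z1] by inclusion-exclusion."""
        p = self.prefix
        return (p(x1, y1, z1)
                - p(x0 - 1, y1, z1) - p(x1, y0 - 1, z1) - p(x1, y1, z0 - 1)
                + p(x0 - 1, y0 - 1, z1) + p(x0 - 1, y1, z0 - 1) + p(x1, y0 - 1, z0 - 1)
                - p(x0 - 1, y0 - 1, z0 - 1))

# Hypothetical per-cell volumes: every cell of a 4x4x4 grid fully inside (1.0 each).
tree = Fenwick3D(4, 4, 4)
for x in range(4):
    for y in range(4):
        for z in range(4):
            tree.add(x, y, z, 1.0)
total = tree.prefix(3, 3, 3)        # whole model
lower_half = tree.prefix(3, 3, 1)   # a "slice" keeping z in {0, 1}
inner = tree.box(1, 1, 1, 2, 2, 2)  # an interior 2x2x2 block
```

Because cells can be added one at a time, the tree can be filled in scan-line order while reconstruction proceeds, and a slicing edit becomes a single prefix query instead of a re-summation.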

2024-12-11T11:29:53Z This paper has been withdrawn by the author. After further review, the author believes that the current version does not meet the desired standards and plans to revise the work before any potential resubmission. Quoc-Bao Nguyen-Le Tuan-Hy Le Anh-Triet Do

http://arxiv.org/abs/2504.11349v3 Representation Paradigms in AI-based 3D Radiological Image Reconstruction: A Systematic Review 2026-04-28T14:48:00Z

The demand for high-quality medical imaging in clinical practice and assisted diagnosis has made 3D image reconstruction in radiological imaging a key research focus. Artificial intelligence (AI) has emerged as a promising approach for improving reconstruction accuracy while reducing acquisition and processing time, thereby minimizing patient radiation exposure and discomfort and ultimately benefiting clinical diagnosis. This review surveys state-of-the-art AI-based 3D reconstruction algorithms in radiological imaging and organizes them into four representation families according to how the reconstructed target is parameterized: discrete grid representations, explicit basis expansion representations, explicit primitive representations, and implicit neural representations. In particular, the review clarifies the relationships among these representation forms and highlights radiance field methods as a specialized subtype of implicit neural representation. In addition, we summarize commonly used evaluation metrics and benchmark datasets for radiological image reconstruction. Finally, we discuss the current state of development, major challenges, and future research directions in this rapidly evolving field. Our project is available at: https://github.com/Bean-Young/AI4Radiology.

2025-04-15T16:21:47Z 58 pages, Under Review Yuezhe Yang Lei Bi Boyu Yang Yaqian Wang Yang He Yige Peng Zhe Jin Xingbo Dong Jinman Kim

http://arxiv.org/abs/2605.16308v1 Conformal Geometric Algebra as a Symbolic Interface for LLM-Driven 3D Scene Editing 2026-04-28T09:30:45Z

What symbolic format should an LLM emit for reliable 3D scene editing from natural language, and does algebraic structure help beyond compact syntax? We evaluate Conformal Geometric Algebra (CGA) as a compact symbolic interface against a verbose Euclidean 4$\times$4 matrix baseline and a non-CGA Compact SE3 control in a natural-language 3D editing pipeline with controlled prompting and deterministic geometric execution. Our primary result is compositional fidelity under sequential instruction chains. In a sequence-stress protocol (20 templates, 6 trials each; $\texttt{n=120}$ outputs per method), Simple CGA and Compact SE3 both achieve 100% parse validity, but Simple CGA preserves exact ordered operation chains more reliably (97.5% vs 90.0%, two-proportion $\texttt{p=0.016}$) with lower completion-token cost (112.6 vs 133.6 tokens). This pattern is consistent with algebraic expression form supporting compositional faithfulness beyond compactness alone. A second result is confirmatory in the powered hard semantic suite ($\texttt{n=100}$ per method): compact representations (Simple CGA 45.0%, Compact SE3 42.0%, Shenlong 44.0%) all exceed the Euclidean 4$\times$4 baseline (24.0%). Simple CGA vs Euclidean is +21 pp ($\texttt{p=0.0028}$) and Compact SE3 vs Euclidean is +18 pp ($\texttt{p=0.0103}$), while Simple CGA vs Compact SE3 is statistically close ($\texttt{p=0.7755}$). Separating parse validity from geometric correctness reveals substantial optimization headroom invisible to syntax-only metrics. Overall, compact symbolic interfaces appear to drive reliability-cost gains, with CGA motor composition providing an additional advantage on ordered instruction chains. These findings inform real-time natural-language editing in immersive and interactive 3D environments.
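
The abstract does not name the statistical test used. Assuming a pooled two-proportion z-test without continuity correction, the sequence-stress comparison can be checked from the reported rates: 97.5% and 90.0% of n = 120 correspond to 117 and 108 exact ordered chains. Only this one comparison is checked here.

```python
import math

def two_proportion_z_test(k1, n1, k2, n2):
    """Pooled two-proportion z-test (no continuity correction); returns (z, two-sided p)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Simple CGA vs Compact SE3, exact ordered chains out of 120 outputs each.
z, p = two_proportion_z_test(117, 120, 108, 120)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these counts the pooled test gives z = 2.40 and a two-sided p of about 0.0164, consistent with the reported p = 0.016.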

2026-04-28T09:30:45Z 34 pages, 15 figures, 28 tables Manos Kamarianakis Pandelis Sofianos George Papagiannakis

http://arxiv.org/abs/2604.25318v1 Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation 2026-04-28T07:28:14Z

Cutscenes are carefully choreographed cinematic sequences embedded in video games and interactive media, serving as the primary vehicle for narrative delivery, character development, and emotional engagement. Producing cutscenes is inherently complex: it demands seamless coordination across screenwriting, cinematography, character animation, voice acting, and technical direction, often requiring days to weeks of collaborative effort from multidisciplinary teams to produce minutes of polished content. In this work, we present Cutscene Agent, an LLM agent framework for automated end-to-end cutscene generation. The framework makes three contributions: (1) a Cutscene Toolkit built on the Model Context Protocol (MCP) that establishes bidirectional integration between LLM agents and the game engine -- agents not only invoke engine operations but continuously observe real-time scene state, enabling closed-loop generation of editable engine-native cinematic assets; (2) a multi-agent system where a director agent orchestrates specialist subagents for animation, cinematography, and sound design, augmented by a visual reasoning feedback loop for perception-driven refinement; and (3) CutsceneBench, a hierarchical evaluation benchmark for cutscene generation. Unlike typical tool-use benchmarks that evaluate short, isolated function calls, cutscene generation requires long-horizon, multi-step orchestration of dozens of interdependent tool invocations with strict ordering constraints -- a capability dimension that existing benchmarks do not cover. We evaluate a range of LLMs on CutsceneBench and analyze their performance across this challenging task.

2026-04-28T07:28:14Z 27 pages excluding appendix Lanshan He Haozhou Pang Qi Gan Xin Shen Ziwei Zhang Yibo Liu Gang Fang Bo Liu Kai Sheng Shengfeng Zeng Chaofan Li Zhen Hui Keer Zhou Lan Zhou Shujun Dai

http://arxiv.org/abs/2604.25129v1 8DNA: 8D Neural Asset Light Transport by Distribution Learning 2026-04-28T02:07:42Z

High-fidelity 3D assets exhibit intriguing global illumination effects such as subsurface scattering, glossy interreflections, and fine-scale fiber scattering, which often involve long scattering paths that are expensive to simulate. We introduce 8D neural assets (8DNA) to pre-bake these light transport effects into neural representations. Unlike prior methods that assume far-field lighting and precompute light transport into 6D functions, 8DNA learns the full 8D light transport, enabling accurate rendering under near-field illumination. Our training leverages a distribution-learning formulation that learns light transport from forward path-traced samples, which yields lower optimization variance with a smaller training budget than prior regression-based approaches. Experiments show that our 8DNA rendering closely matches path-traced results under various scene configurations, while achieving improved variance reduction and fast inference on challenging assets.

2026-04-28T02:07:42Z Liwen Wu Haolin Lu Bing Xu Miloš Hašan Ravi Ramamoorthi

http://arxiv.org/abs/2605.08115v1 Alice v1: Distillation-Enhanced Video Generation Surpassing Closed-Source Models 2026-04-27T23:37:33Z

We present Alice v1, a 14-billion-parameter open-source video generation model that achieves state-of-the-art quality through consistency distillation with score regularization (rCM). Contrary to conventional distillation, which trades quality for speed, we demonstrate that rCM-based distillation can exceed teacher model quality. We attribute this to three mechanisms: (1) the score regularization term acts as a mode-seeking objective that concentrates probability mass on high-quality outputs rather than covering the full teacher distribution, (2) our targeted synthetic data pipeline with hard example mining provides training signal specifically for failure modes (physics, hands, faces) that the teacher handles inconsistently, and (3) consistency enforcement acts as implicit regularization, eliminating "lucky path" dependence on specific noise samples. Alice v1 generates 5-second 720p videos at 24fps in 4 denoising steps (~8 seconds on H100), a 7x speedup over the 50-step teacher, while improving the VBench score from 84.0 (Wan2.2) to 91.2. This surpasses both the teacher and closed-source systems including Veo3 (~90) and Sora2 (~88) on automated benchmarks, with competitive results in human preference studies. We release all model weights, training code, synthetic data pipelines, and evaluation scripts to advance open research in video generation.

2026-04-27T23:37:33Z Wang Xiaoyu Phong Nguyen Chen Zhao

http://arxiv.org/abs/2604.24994v1 Power Foam: Unifying Real-Time Differentiable Ray Tracing and Rasterization 2026-04-27T20:58:36Z

We introduce a differentiable 3D representation that unifies the capabilities of foam-based ray tracing with the efficiency of modern rasterization pipelines. While prior foam representations enable constant-time ray traversal through an explicit volumetric partition of space, their potentially unbounded cells hinder efficient tile-based rasterization. We address this limitation by generalizing Voronoi foams to bounded power diagrams with controllable cell extents, enabling spatially bounded primitives without requiring expensive Delaunay triangulations during training. We further introduce an oriented surface formulation that explicitly models interfaces between interior and exterior regions, and decouple geometry from appearance by embedding differentiable texture directly on these surfaces. Together, these contributions yield a representation that preserves state-of-the-art ray tracing efficiency while achieving rasterization performance competitive with current-generation 3DGS, providing a practical path toward unified real-time differentiable rendering.
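
Power diagrams generalize Voronoi diagrams by giving each site a weight. The sketch below shows only the defining assignment rule; Power Foam's bounded cell extents, oriented surfaces and textures are not reproduced. A point x belongs to the site c_i that minimizes the power distance |x - c_i|^2 - w_i, and with equal weights this reduces to nearest-site (Voronoi) assignment.

```python
import numpy as np

def power_cell(points, sites, weights):
    """Index of the power-diagram cell containing each point.

    points: (P, D); sites: (S, D); weights: (S,).
    Each point goes to the site minimizing |x - c_i|^2 - w_i.
    """
    d2 = ((points[:, None, :] - sites[None, :, :]) ** 2).sum(axis=-1)  # (P, S) squared distances
    return np.argmin(d2 - weights[None, :], axis=1)

sites = np.array([[0.0, 0.0], [1.0, 0.0]])
points = np.array([[0.6, 0.0], [0.8, 0.0]])
voronoi = power_cell(points, sites, np.array([0.0, 0.0]))  # equal weights: boundary at x = 0.5
shifted = power_cell(points, sites, np.array([0.5, 0.0]))  # heavier site 0: boundary at x = 0.75
```

For two sites at 0 and 1 on a line, equating the power distances gives the boundary x = (1 + w0 - w1) / 2, so adjusting weights moves cell walls while keeping them planar, which is why power cells remain convex polytopes.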

2026-04-27T20:58:36Z Shrisudhan Govindarajan Daniel Rebain Dor Verbin Kwang Moo Yi Anish Prabhu Andrea Tagliasacchi

http://arxiv.org/abs/2604.24833v1 MotionBricks: Scalable Real-Time Motions with Modular Latent Generative Model and Smart Primitives 2026-04-27T17:58:23Z

Despite transformative advances in generative motion synthesis, real-time interactive motion control remains dominated by traditional techniques. In this work, we identify two key challenges in bridging research and production: 1) Real-time scalability: Industry applications demand real-time generation of a vast repertoire of motion skills, while generative methods exhibit significant degradation in quality and scalability under real-time computation constraints, and 2) Integration: Industry applications demand fine-grained multi-modal control involving velocity commands, style selection, and precise keyframes, a need largely unmet by existing text- or tag-driven models. To overcome these limitations, we introduce MotionBricks: a large-scale, real-time generative framework with a two-fold solution. First, we propose a large-scale modular latent generative backbone tailored for robust real-time motion generation, effectively modeling a dataset of over 350,000 motion clips with a single model. Second, we introduce smart primitives that provide a unified, robust, and intuitive interface for authoring both navigation and object interaction. Applications can be designed in a plug-and-play manner like assembling bricks without expert animation knowledge. Quantitatively, we show that MotionBricks produces state-of-the-art motion quality on open-source and proprietary datasets of various scales, while also achieving a real-time throughput of 15,000 FPS with 2ms latency. We demonstrate the flexibility and robustness of MotionBricks in a complete production-level animation demo, covering navigation and object-scene interaction across various styles with a unified model. To showcase our framework's application beyond animation, we deploy MotionBricks on the Unitree G1 humanoid robot to demonstrate its flexibility and generalization for real-time robotic control.

2026-04-27T17:58:23Z ACM Transactions on Graphics; SIGGRAPH 2026. Project page: https://nvlabs.github.io/motionbricks/ Tingwu Wang Olivier Dionne Michael De Ruyter David Minor Davis Rempe Kaifeng Zhao Mathis Petrovich Ye Yuan Chenran Li Zhengyi Luo Brian Robison Xavier Blackwell Bernardo Antoniazzi Xue Bin Peng Yuke Zhu Simon Yuen

http://arxiv.org/abs/2604.24666v1 Voxel Deformation-Aware Neural Intersection Function 2026-04-27T16:26:43Z

We extend the Locally-Subdivided Neural Intersection Function (LSNIF) to support parameterized deformable and animated geometry. Our approach introduces a rest-space and deformed-space formulation inspired by meshless rendering, allowing ray samples to be mapped back to a canonical space where a single neural network represents geometry consistently across poses without retraining. To maintain accuracy under deformation-aware training, we incorporate scale-invariant distance regression, uncertainty-weighted multi-task learning, and a hybrid positional-grid encoding. The resulting method preserves the compactness and efficiency of LSNIF while enabling robust neural intersection prediction for dynamic geometry.
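
The rest-space idea can be illustrated with an analytic shape standing in for the neural intersection function. Assuming, hypothetically, that the deformation is one affine map p = A q + b (the paper's parameterized deformations and network are not modeled), a deformed-space ray o + t d maps to the rest-space ray A^-1(o - b) + t A^-1 d. Because the direction is mapped linearly and not renormalized, the hit parameter t is the same in both spaces, so a rest-space intersector can answer deformed-space queries.

```python
import numpy as np

def ray_unit_sphere(o, d):
    """Smallest t > 0 with |o + t d| = 1 (the rest-space shape is a unit sphere), or None."""
    a = d @ d
    b = 2.0 * (o @ d)
    c = o @ o - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    sq = np.sqrt(disc)
    for t in ((-b - sq) / (2.0 * a), (-b + sq) / (2.0 * a)):
        if t > 0.0:
            return t
    return None

def intersect_deformed(o, d, A, b):
    """Intersect a deformed-space ray with the unit sphere deformed by p = A q + b.

    The ray is pulled back to rest space; its direction is mapped by A^-1 but not
    renormalized, so the returned t is also valid for the deformed-space ray o + t d.
    """
    A_inv = np.linalg.inv(A)
    return ray_unit_sphere(A_inv @ (o - b), A_inv @ d)

A = np.diag([2.0, 1.0, 0.5])  # stretch x, squash z: the sphere becomes an ellipsoid
b = np.zeros(3)
t = intersect_deformed(np.array([-5.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]), A, b)
print(t)  # 3.0: the ray hits the ellipsoid at x = -2
```

A single rest-space intersector therefore serves every pose; only the per-pose mapping changes, which mirrors the "no retraining across poses" property the abstract describes.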

2026-04-27T16:26:43Z Chih-Chen Kao Grzegorz Makowski Shin Fujieda Takahiro Harada

http://arxiv.org/abs/2605.16306v1 UVTran: Accurate Hole-Filling Parameterization with Transformers 2026-04-27T14:12:03Z

In industrial design, N-sided hole filling is typically formulated as the construction of a single trimmed B-spline surface by minimizing a fairness energy subject to geometric boundary constraints. This formulation requires an accurate parameter-space representation of the trimming curve on the filling surface. Most existing methods project the hole boundary onto a nearby plane or polygon to establish correspondence; however, they often neglect boundary heterogeneity, which can yield biased mappings, degrade fairness, and even cause filling failures. We propose UVTran, a transformer-based framework that predicts an auxiliary projection surface to better capture the geometric characteristics of the hole boundary. Exploiting B-spline locality, we design a cross-attention mechanism that biases each surface control point toward the nearby hole boundary, preserving local geometric detail. We voxelize control-point coordinates and formulate the fitting problem as a classification task, which reduces the model's sensitivity to small numerical perturbations and noise. We adopt a progressive-resolution training strategy that injects controlled discretization errors at coarse resolutions to mimic distribution shifts, thereby mitigating overfitting and improving generalization at high resolution. On our benchmark, UVTran outperforms both industrial and academic baselines: the tolerance-satisfaction rate improves by $12\%$, and it consistently produces fair filled surfaces even under complex hole boundary conditions. These results suggest that UVTran yields more faithful correspondences and fairer trimmed surfaces across a wide range of N-sided holes.
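
The abstract does not specify the coordinate bounds or grid resolutions, so the following is a sketch with assumed values (coordinates in [-1, 1], bin counts 32 to 256). It shows how continuous control-point coordinates can be turned into class labels and decoded back to bin centers, and that the round-trip error is bounded by half a bin width, which is why coarser resolutions inject larger but controlled discretization errors.

```python
import numpy as np

def coords_to_classes(coords, lo, hi, bins):
    """Map continuous coordinates in [lo, hi] to integer bin (class) indices."""
    idx = np.floor((coords - lo) / (hi - lo) * bins).astype(int)
    return np.clip(idx, 0, bins - 1)  # a coordinate equal to hi falls into the last bin

def classes_to_coords(idx, lo, hi, bins):
    """Decode class indices back to bin-center coordinates."""
    return lo + (idx + 0.5) * (hi - lo) / bins

rng = np.random.default_rng(0)
lo, hi = -1.0, 1.0
ctrl = rng.uniform(lo, hi, size=(16, 3))  # hypothetical control-point coordinates
errors = {}
for bins in (32, 64, 128, 256):           # hypothetical progressive resolutions
    decoded = classes_to_coords(coords_to_classes(ctrl, lo, hi, bins), lo, hi, bins)
    errors[bins] = np.abs(decoded - ctrl).max()
```

Framing each coordinate as a class also means a perturbation smaller than the distance to the nearest bin edge leaves the label unchanged, which is one way to read the abstract's claim of reduced sensitivity to small numerical perturbations.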

2026-04-27T14:12:03Z JunFeng Zhang