http://arxiv.org/abs/2511.19202v2
NVGS: Neural Visibility for Occlusion Culling in 3D Gaussian Splatting
Updated: 2026-04-20T15:36:30Z
3D Gaussian Splatting can exploit frustum culling and level-of-detail strategies to accelerate rendering of scenes containing a large number of primitives. However, the semi-transparent nature of Gaussians prevents the application of another highly effective technique: occlusion culling. We address this limitation by proposing a novel method to learn the viewpoint-dependent visibility function of all Gaussians in a trained model using a small, shared MLP across instances of an asset in a scene. By querying it for Gaussians within the viewing frustum prior to rasterization, our method can discard occluded primitives during rendering. Leveraging Tensor Cores for efficient computation, we integrate these neural queries directly into a novel instanced software rasterizer. Our approach outperforms the current state of the art for composed scenes in terms of VRAM usage and image quality, utilizing a combination of our instanced rasterizer and occlusion culling MLP, and exhibits complementary properties to existing LoD techniques.
Published: 2025-11-24T15:11:12Z
Comment: 17 pages, 15 figures
Authors: Brent Zoomers, Florian Hahlbohm, Joni Vanherck, Lode Jorissen, Marcus Magnor, Nick Michiels

http://arxiv.org/abs/2501.12119v3
ENTIRE: Learning-based Volume Rendering Time Prediction
Updated: 2026-04-20T15:34:10Z
We introduce ENTIRE, a novel deep learning-based approach for fast and accurate volume rendering time prediction. Predicting rendering time is inherently challenging due to its dependence on multiple factors, including volume data characteristics, image resolution, camera configuration, and transfer function settings. Our method addresses this by first extracting a feature vector that encodes structural volume properties relevant to rendering performance.
This feature vector is then integrated with additional rendering parameters, such as image resolution, camera setup, and transfer function settings, to produce the final prediction. We evaluate ENTIRE across multiple rendering frameworks (CPU- and GPU-based) and configurations (with and without single-scattering) on diverse datasets. The results demonstrate that our model achieves high prediction accuracy with fast inference speed and can be efficiently adapted to new scenarios by fine-tuning the pretrained model with few samples. Furthermore, we showcase ENTIRE's effectiveness in two case studies, where it enables dynamic parameter adaptation for stable frame rates and load balancing.
Published: 2025-01-21T13:30:16Z
Authors: Zikai Yin, Hamid Gadirov, Jiri Kosinka, Steffen Frey

http://arxiv.org/abs/2604.18364v1
Training and Agentic Inference Strategies for LLM-based Manim Animation Generation
Updated: 2026-04-20T14:54:06Z
Generating programmatic animation using libraries such as Manim presents unique challenges for Large Language Models (LLMs), requiring spatial reasoning, temporal sequencing, and familiarity with domain-specific APIs that are underrepresented in general pre-training data. A systematic study of how training and inference strategies interact in this setting is lacking in current research. This study introduces ManimTrainer, a training pipeline that combines Supervised Fine-tuning (SFT) with Reinforcement Learning (RL) based Group Relative Policy Optimisation (GRPO) using a unified reward signal that fuses code and visual assessment signals, and ManimAgent, an inference pipeline featuring Renderer-in-the-loop (RITL) and API documentation-augmented RITL (RITL-DOC) strategies. Using these techniques, this study presents the first unified training and inference study for text-to-code-to-video transformation with Manim. It evaluates 17 open-source sub-30B LLMs across nine combinations of training and inference strategies using ManimBench.
Results show that SFT generally improves code quality, while GRPO enhances visual outputs and increases the models' responsiveness to extrinsic signals during self-correction at inference time. The Qwen 3 Coder 30B model with GRPO and RITL-DOC achieved the highest overall performance, with a 94% Render Success Rate (RSR) and 85.7% Visual Similarity (VS) to reference videos, surpassing the baseline GPT-4.1 model by +3 percentage points in VS. Additionally, the analysis shows that the correlation between code and visual metrics strengthens with SFT and GRPO but weakens with inference-time enhancements, highlighting the complementary roles of training and agentic inference strategies in Manim animation generation.
Published: 2026-04-20T14:54:06Z
Authors: Ravidu Suien Rammuni Silva, Ahmad Lotfi, Isibor Kennedy Ihianle, Golnaz Shahtahmassebi, Jordan J. Bird

http://arxiv.org/abs/2511.05152v2
Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges
Updated: 2026-04-20T10:48:10Z
Deformable Gaussian Splatting (GS) accomplishes photorealistic dynamic 3-D reconstruction from dense multi-view video (MVV) by learning to deform a canonical GS representation. However, in filmmaking, tight budgets can result in sparse camera configurations, which limits state-of-the-art (SotA) methods when capturing complex dynamic features. To address this issue, we introduce an approach that splits the canonical Gaussians and deformation field into foreground and background components using a sparse set of masks for frames at t=0. Each representation is separately trained on different loss functions during canonical pre-training. Then, during dynamic training, different parameters are modeled for each deformation field following common filmmaking practices. The foreground stage contains diverse dynamic features, so changes in color, position and rotation are learned. The background, which contains film crew and equipment, is typically dimmer and less dynamic, so only changes in point position are learned.
Experiments on 3-D and 2.5-D entertainment datasets show that our method produces SotA qualitative and quantitative results; up to 3 PSNR higher with half the model size on 3-D scenes. Unlike the SotA and without the need for dense mask supervision, our method also produces segmented dynamic reconstructions including transparent and dynamic textures. Code and video comparisons are available online: https://azzarelli.github.io/splatographypage/index.html
Published: 2025-11-07T11:07:34Z
Comment: Accepted to IEEE International Conference on 3DV (2026)
Authors: Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull

http://arxiv.org/abs/2604.17959v1
Chatting about Upper-Body Expressive Human Pose and Shape Estimation
Updated: 2026-04-20T08:40:54Z
Expressive Human Pose and Shape Estimation (EHPS) plays a crucial role in various AR/VR applications and has witnessed significant progress in recent years. However, current state-of-the-art methods still struggle with accurate parameter estimation for facial and hand regions and exhibit limited generalization to wild images. To address these challenges, we present CoEvoer, a novel one-stage synergistic cross-dependency transformer framework tailored for upper-body EHPS. CoEvoer enables explicit feature-level interaction across different body parts, allowing for mutual enhancement through contextual information exchange. Specifically, larger and more easily estimated regions such as the torso provide global semantics and positional priors to guide the estimation of finer, more complex regions like the face and hands. Conversely, the localized details captured in facial and hand regions help refine and calibrate adjacent body parts. To the best of our knowledge, CoEvoer is the first framework designed specifically for upper-body EHPS, with the goal of capturing the strong coupling and semantic dependencies among the face, hands, and torso through joint parameter regression.
Extensive experiments demonstrate that CoEvoer achieves state-of-the-art performance on upper-body benchmarks and exhibits strong generalization capability even on unseen wild images.
Published: 2026-04-20T08:40:54Z
Authors: Yuxiang Zhao, Wei Huang, Yujie Song, Liu Wang, Huan Zhao

http://arxiv.org/abs/2604.17831v1
PCM-NeRF: Probabilistic Camera Modeling for Neural Radiance Fields under Pose Uncertainty
Updated: 2026-04-20T05:34:09Z
Neural surface reconstruction methods typically treat camera poses as fixed values, assuming perfect accuracy from Structure-from-Motion (SfM) systems. This assumption breaks down with imperfect pose estimates, leading to distorted or incomplete reconstructions. We present PCM-NeRF, a probabilistic framework that augments neural surface reconstruction with per-camera learnable uncertainty, built on top of SG-NeRF. Rather than treating all cameras equally throughout optimization, we represent each pose as a distribution with a learnable mean and variance, initialized from SfM correspondence quality. An uncertainty regularization loss couples the learned variance to view confidence, and the resulting uncertainty directly modulates the effective pose learning rate: uncertain cameras receive damped gradient updates, preventing poorly initialized views from corrupting the reconstruction. This lightweight mechanism requires no changes to the rendering pipeline and adds negligible overhead.
Experiments on challenging scenes with severe pose outliers demonstrate that PCM-NeRF consistently outperforms state-of-the-art methods in both Chamfer Distance and F-Score, particularly for geometrically complex structures, without requiring foreground masks.
Published: 2026-04-20T05:34:09Z
Comment: CVPR-W 2026 (GenRec3D)
Authors: Shravan Venkatraman, Rakesh Raj Madavan, Pavan Kumar Sathya Venkatesh

http://arxiv.org/abs/2312.17181v2
Geometric Guidance for Globally Synchronized Deployment of Elastic Geodesic Grids
Updated: 2026-04-20T04:06:07Z
Elastic geodesic grids deploy from flat to spatial configurations via complex nonlinear motion that is difficult to represent robustly for simulation. We present a geometric guidance framework that discretizes deployment as synchronized, time-coupled deformation trajectories. Starting from inverse tracing (collapsing the deployed structure with a lightweight rod model while recording node paths under a shared parameter), we obtain feasible node paths and formulate a polyline approximation problem that selects globally synchronized time steps and minimizes a robust tail-aggregated deviation measure under monotonicity constraints. We solve the resulting non-smooth optimization problem via global optimization to obtain compact, synchronized displacement sequences for all paths simultaneously.
We evaluate the method using geometry-centric metrics (deviation versus step count, scaling with trajectory count) and demonstrate its utility by driving finite element deployment simulations that avoid intermediate buckling and capture deployment-induced prestress.
Published: 2023-12-28T18:14:17Z
Comment: Computer Aided Geometric Design / International Conference on Geometric Modeling and Processing (GMP 2026), journal preprint, 14 pages including appendices, 13 figures
Authors: Stefan Pillwein, Alexander Hentschel, Markus Lukacevic, Przemyslaw Musialski
DOI: 10.1016/j.cagd.2026.102565

http://arxiv.org/abs/2412.19446v2
Stimpack: An Adaptive Rendering Optimization System for Scalable Cloud Gaming
Updated: 2026-04-19T20:29:36Z
In distributed multimedia applications, content is often delivered to users in a degraded form due to network-induced lossy compression. Real-time and interactive use cases like cloud gaming, which render content on the fly, require low latency and are hosted at resource-constrained edge servers. We present a new insight: when rendered content is delivered over a network with lossy compression, high-quality rendering can be ineffective in improving user-perceived quality, leading to a poor return on computing resources. Leveraging this observation, we built Stimpack, a novel system that adaptively optimizes game rendering quality by balancing server-side rendering costs against user-perceived quality. The system uses a mechanism that quantifies the efficiency of resource usage to maximize overall system utility in multi-user scenarios. Our open-sourced implementation and extensive evaluations show that Stimpack achieves up to 24% higher service quality and serves twice as many users with the same resources compared to baselines.
A user study further validates that Stimpack provides a measurably better user experience.
Published: 2024-12-27T04:25:32Z
Comment: 12 pages, 18 figures, 4 tables
Authors: Jin Heo, Vic Wang, Ketan Bhardwaj, Ada Gavrilovska

http://arxiv.org/abs/2604.17331v1
Evaluation of Gauss-Legendre curves
Updated: 2026-04-19T08:53:25Z
We present new representations of Gauss-Legendre polynomials and their derivatives in the shifted power basis and in bases related to symmetric orthogonal Jacobi polynomials. Using these representations and certain recurrence relations, we propose efficient $O(n^2+dn)$ methods for evaluating a Gauss-Legendre curve of degree $n$ in $\mathbb E^d$. We also propose algorithms for multipoint evaluation with computational complexity $O(Mdn+dn^2)$, where $M$ is the number of evaluation points.
Published: 2026-04-19T08:53:25Z
Authors: Filip Chudy, Paweł Woźny

http://arxiv.org/abs/2508.08775v2
SonicRadiation: A Hybrid Numerical Solution for Sound Radiation without Ghost Cells
Updated: 2026-04-19T04:17:52Z
Interactive synthesis of physical sound effects is crucial in digital media production. Sound radiation simulation, a key component of physically based sound synthesis, has posed challenges in the context of complex object boundaries. Previous methods, such as ghost cell-based finite-difference time-domain (FDTD) wave solvers, have struggled to address these challenges, leading to large errors and failures at complex boundaries because of the limitations of ghost cells. We present SonicRadiation, a hybrid numerical solution capable of handling complex and dynamic object boundaries in sound radiation simulation without relying on ghost cells. We derive a consistent formulation to connect the physical quantities on grid cells in FDTD with the boundary elements in the time-domain boundary element method (TDBEM). Based on this formulation, we propose a boundary grid synchronization strategy to seamlessly integrate TDBEM with FDTD while maintaining high numerical accuracy.
Our method combines the accuracy of TDBEM in the near field with the efficiency of FDTD in the far field. Experimental results demonstrate the superiority of our method in sound radiation simulation over previous approaches in terms of accuracy and efficiency, particularly in complex scenes, further validating its effectiveness.
Published: 2025-08-12T09:21:02Z
Comment: 11 pages
Authors: Xutong Jin, Fei Zhu, Guoping Wang, Sheng Li

http://arxiv.org/abs/2604.17155v1
Instant Colorization of Gaussian Splats
Updated: 2026-04-18T21:56:52Z
Gaussian Splatting has recently become one of the most popular frameworks for photorealistic 3D scene reconstruction and rendering. While current rasterizers allow for efficient mappings of 3D Gaussian splats onto 2D camera views, this work focuses on mapping 2D image information (e.g. color, neural features or segmentation masks) efficiently back onto an existing scene of Gaussian splats. This 'opposite' direction enables applications ranging from scene relighting and stylization to 3D semantic segmentation, but also introduces challenges, such as view-dependent colorization and occlusion handling.
Our approach tackles these challenges using the normal equation to solve a visibility-weighted least squares problem for every Gaussian and can be implemented efficiently with existing differentiable rasterizers. We demonstrate the effectiveness of our approach on scene relighting, feature enrichment and 3D semantic segmentation tasks, achieving up to an order of magnitude speedup compared to gradient descent-based baselines.
Published: 2026-04-18T21:56:52Z
Authors: Daniel Lieber, Alexander Mock, Nils Wandel

http://arxiv.org/abs/2604.16976v1
UGD: An Unsupervised Geometric Distance for Evaluating Real-world Noisy Point Cloud Denoising
Updated: 2026-04-18T12:09:29Z
Point cloud denoising is a fundamental and crucial challenge in real-world point cloud applications. Existing quantitative evaluation metrics for point cloud denoising methods are implemented in a supervised manner, which requires both the denoised point cloud and the corresponding ground-truth clean point cloud to compute a representative geometric distance. This requirement is highly problematic in real-world scenarios, where ground-truth clean point clouds are often unavailable. In this paper, we propose a simple yet effective unsupervised geometric distance (UGD) for real-world noisy point cloud denoising, calculated solely from noisy point clouds. The core idea of UGD is to learn a patch-wise prior model from a set of clean point clouds and then employ this prior model as the ground-truth to quantify the degradation by measuring the geometric variations of the denoised point cloud. To this end, we first learn a pristine Gaussian Mixture Model (GMM) with extracted patch-wise quality-aware features from a set of pristine clean point clouds by a patch-wise feature extraction network, which serves as the ground-truth for the quantitative evaluation. Then, the UGD is defined as the weighted sum of distances between each patch of the denoised point cloud and the learned pristine GMM model in the patch space.
To train the employed patch-wise feature extraction network, we propose a self-supervised training framework through multi-task learning, which includes pair-wise quality ranking, distortion classification, and distortion distribution prediction. Quantitative experiments with synthetic noise confirm that the proposed UGD achieves comparable performance to supervised full-reference metrics. Moreover, experimental results on real-world data demonstrate that the proposed UGD enables unsupervised evaluation of point cloud denoising methods based exclusively on noisy point clouds.
Published: 2026-04-18T12:09:29Z
Comment: To be published in IEEE Transactions on Visualization and Computer Graphics
Authors: Zhiyong Su, Jincan Wu, Yonghui Liu, Zheng Li, Weiqing Li
DOI: 10.1109/TVCG.2026.3685664

http://arxiv.org/abs/2604.14927v2
STEP-Parts: Geometric Partitioning of Boundary Representations for Large-Scale CAD Processing
Updated: 2026-04-17T22:10:44Z
Many CAD learning pipelines discretize Boundary Representations (B-Reps) into triangle meshes, discarding analytic surface structure and topological adjacency and thereby weakening consistent instance-level analysis. We present STEP-Parts, a deterministic CAD-to-supervision toolchain that extracts geometric instance partitions directly from raw STEP B-Reps and transfers them to tessellated carriers through retained source-face correspondence, yielding instance labels and metadata for downstream learning and evaluation. The construction merges adjacent B-Rep faces only when they share the same analytic primitive type and satisfy a near-tangent continuity criterion. On ABC, same-primitive dihedral angles are strongly bimodal, yielding a threshold-insensitive low-angle regime for part extraction. Because the partition is defined on intrinsic B-Rep topology rather than on a particular triangulation, the resulting boundaries remain stable under changes in tessellation.
Applied to the DeepCAD subset of ABC, the pipeline processes approximately 180,000 models in under six hours on a consumer CPU. We release code and precomputed labels, and show that STEP-Parts serves both as a tessellation-robust geometric reference and as a useful supervision source in two downstream probes: an implicit reconstruction-segmentation network and a dataset-level point-based backbone.
Published: 2026-04-16T12:12:27Z
Authors: Shen Fan, Mikołaj Kida, Przemyslaw Musialski

http://arxiv.org/abs/2601.22858v2
Learning to Build Shapes by Extrusion
Updated: 2026-04-17T18:45:48Z
We introduce Text Encoded Extrusions (TEE), a text-based representation that expresses mesh construction as sequences of face extrusions rather than polygon lists, and a method for generating 3D meshes from TEE using a large language model (LLM). By learning extrusion sequences that assemble a mesh, similar to the way artists create meshes, our approach naturally supports arbitrary output face counts and produces manifold meshes by design, in contrast to recent transformer-based mesh generative models. The learnt extrusion sequences can also be applied to existing meshes, enabling editing in addition to generation. To train our model, we decompose a library of quadrilateral meshes with non-self-intersecting face loops into constituent loops, which can be viewed as their building blocks, and finetune an LLM on the steps for reassembling the quadrilateral meshes by performing a sequence of extrusions. We demonstrate that our representation enables reconstruction, novel shape synthesis, and the addition of new features to existing meshes.
Published: 2026-01-30T11:32:34Z
Comment: A preprint
Authors: Thor Vestergaard Christiansen, Karran Pandey, Alba Reinders, Karan Singh, Morten Rieger Hannemose, J. Andreas Bærentzen

http://arxiv.org/abs/2604.16629v1
Amortized Inverse Kinematics via Graph Attention for Real-Time Human Avatar Animation
Updated: 2026-04-17T18:30:20Z
Inverse kinematics (IK) is a core operation in animation, robotics, and biomechanics: given Cartesian constraints, recover joint rotations under a known kinematic tree. In many real-time human avatar pipelines, the available signal per frame is a sparse set of tracked 3D joint positions, whereas animation systems require joint orientations to drive skinning. Recovering full orientations from positions is underconstrained, most notably because twist about bone axes is ambiguous, and classical IK solvers typically rely on iterative optimization that can be slow and sensitive to noisy inputs. We introduce IK-GAT, a lightweight graph-attention network that reconstructs full-body joint orientations from 3D joint positions in a single forward pass. The model performs message passing over the skeletal parent-child graph to exploit kinematic structure during rotation inference. To simplify learning, IK-GAT predicts rotations in a bone-aligned world-frame representation anchored to rest-pose bone frames. This parameterization makes the twist axis explicit and is exactly invertible to standard parent-relative local rotations given the kinematic tree and rest pose. The network uses a continuous 6D rotation representation and is trained with a geodesic loss on SO(3) together with an optional forward-kinematics consistency regularizer. IK-GAT produces animation-ready local rotations that can directly drive a rigged avatar or be converted to pose parameters of SMPL-like body models for real-time and online applications.
With 374K parameters and over 650 FPS on CPU, IK-GAT outperforms VPoser-based per-frame iterative optimization without warm-start at significantly lower cost, and is robust to initial pose and input noise.
Published: 2026-04-17T18:30:20Z
Authors: Muhammad Saif Ullah Khan, Chen-Yu Wang, Tim Prokosch, Michael Lorenz, Bertram Taetz, Didier Stricker
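
The IK-GAT abstract mentions a continuous 6D rotation representation and a geodesic loss on SO(3) without giving details. The sketch below shows one common way these two pieces are built. It assumes the 6D vector holds two 3D vectors that are orthonormalized by Gram-Schmidt and stored as matrix columns, and that the loss is the rotation angle between prediction and target. The function names `rotation_from_6d` and `geodesic_distance` are hypothetical, and the authors' actual formulation may differ.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cross(a, b):
    """Cross product of two 3D vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_6d(d6):
    """Map a 6D vector to a 3x3 rotation matrix (assumed Gram-Schmidt construction).

    The first three numbers give the first column direction, the last three
    give a second direction that is made orthogonal to the first.
    """
    a1, a2 = d6[:3], d6[3:]
    b1 = normalize(a1)
    dot = sum(x * y for x, y in zip(b1, a2))
    b2 = normalize([y - dot * x for x, y in zip(b1, a2)])
    b3 = cross(b1, b2)  # completes a right-handed orthonormal frame
    # b1, b2, b3 are the columns of the rotation matrix.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


def geodesic_distance(r1, r2):
    """Rotation angle (radians) between two rotation matrices.

    Uses trace(r1^T r2) = sum of elementwise products, then
    angle = arccos((trace - 1) / 2), clamped for numerical safety.
    """
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)


identity = rotation_from_6d([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
quarter_turn_z = rotation_from_6d([0.0, 1.0, 0.0, -1.0, 0.0, 0.0])
angle = geodesic_distance(identity, quarter_turn_z)  # a 90-degree turn about z
```

A training loss would typically average this angle over joints and batch elements. Because the mapping accepts any pair of non-parallel vectors, a network can output six unconstrained numbers per joint.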