http://arxiv.org/abs/2509.24677v1 NeuralPVS: Learned Estimation of Potentially Visible Sets 2025-09-29T12:15:40Z

Real-time visibility determination in expansive or dynamically changing environments has long posed a significant challenge in computer graphics. Existing techniques are computationally expensive and are often applied as a precomputation step on a static scene. We present NeuralPVS, the first deep-learning approach for visibility computation that efficiently determines from-region visibility in a large scene, running at approximately 100 Hz with less than $1\%$ missing geometry. This is made possible by a neural network operating on a voxelized representation of the scene. The network's performance is achieved by combining sparse convolution with a 3D volume-preserving interleaving for data compression. Moreover, we introduce a novel repulsive visibility loss that effectively guides the network to converge to the correct data distribution. This loss provides enhanced robustness and generalization to unseen scenes. Our results demonstrate that NeuralPVS outperforms existing methods in terms of both accuracy and efficiency, making it a promising solution for real-time visibility computation.
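The abstract does not define the "3D volume-preserving interleaving". The sketch below assumes it behaves like a 3D space-to-depth (pixel-unshuffle) rearrangement. Under that assumption, each r x r x r block of voxels is folded into r^3 channels of one coarse cell, so no voxel is dropped and the step can be inverted exactly. The function names, the dense nested-list grid, and the block factor `r` are all hypothetical. This is not the authors' code, and the sparse convolution is not shown.

```python
from typing import List

Grid3 = List[List[List[int]]]         # grid[x][y][z], e.g. voxel occupancy
Packed = List[List[List[List[int]]]]  # packed[x][y][z][channel]


def _channel(x: int, y: int, z: int, r: int) -> int:
    """Channel index encoding a voxel's offset inside its r*r*r block."""
    return (x % r) * r * r + (y % r) * r + (z % r)


def space_to_depth_3d(grid: Grid3, r: int) -> Packed:
    """Fold each r*r*r block of voxels into r**3 channels of one coarse cell."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    if nx % r or ny % r or nz % r:
        raise ValueError("grid dimensions must be divisible by r")
    out = [[[[0] * r**3 for _ in range(nz // r)] for _ in range(ny // r)] for _ in range(nx // r)]
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                out[x // r][y // r][z // r][_channel(x, y, z, r)] = grid[x][y][z]
    return out


def depth_to_space_3d(packed: Packed, r: int) -> Grid3:
    """Exact inverse of space_to_depth_3d."""
    mx, my, mz = len(packed), len(packed[0]), len(packed[0][0])
    grid = [[[0] * (mz * r) for _ in range(my * r)] for _ in range(mx * r)]
    for x in range(mx * r):
        for y in range(my * r):
            for z in range(mz * r):
                grid[x][y][z] = packed[x // r][y // r][z // r][_channel(x, y, z, r)]
    return grid


grid = [[[(x + y + z) % 2 for z in range(4)] for y in range(4)] for x in range(4)]
packed = space_to_depth_3d(grid, 2)
print(len(packed), len(packed[0]), len(packed[0][0]), len(packed[0][0][0]))  # 2 2 2 8
assert depth_to_space_3d(packed, 2) == grid  # lossless round trip
```

Trading spatial resolution for channels like this shrinks the grid the network must traverse while keeping every occupancy bit.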

2025-09-29T12:15:40Z SIGGRAPH Asia 2025 Xiangyu Wang Thomas Köhler Jun Lin Qiu Shohei Mori Markus Steinberger Dieter Schmalstieg http://arxiv.org/abs/2502.18309v3 GCDance: Genre-Controlled Music-Driven 3D Full Body Dance Generation 2025-09-29T11:12:31Z

Music-driven dance generation is a challenging task because it requires strict adherence to genre-specific choreography while producing dance sequences that are physically realistic and precisely synchronized with the music's beats and rhythm. Although significant progress has been made in music-conditioned dance generation, most existing methods struggle to convey specific stylistic attributes in the generated dance. To bridge this gap, we propose a diffusion-based framework for genre-specific 3D full-body dance generation, conditioned on both music and descriptive text. To effectively incorporate genre information, we develop a text-based control mechanism that maps input prompts, either explicit genre labels or free-form descriptive text, into genre-specific control signals, enabling precise and controllable text-guided generation of genre-consistent dance motions. Furthermore, to enhance the alignment between music and textual conditions, we leverage the features of a music foundation model, facilitating coherent and semantically aligned dance synthesis. Finally, to balance the objectives of extracting text-genre information and maintaining high-quality generation results, we propose a novel multi-task optimization strategy. This strategy effectively balances competing factors such as physical realism, spatial accuracy, and text classification, significantly improving the overall quality of the generated sequences. Extensive experiments on the FineDance and AIST++ datasets demonstrate the superiority of GCDance over existing state-of-the-art approaches.

2025-02-25T15:53:18Z IEEE Transactions on Multimedia, 2026 Xinran Liu Xu Dong Shenbin Qian Diptesh Kanojia Wenwu Wang Zhenhua Feng http://arxiv.org/abs/2509.21541v2 ControlHair: Physically-based Video Diffusion for Controllable Dynamic Hair Rendering 2025-09-29T10:41:47Z

Hair simulation and rendering are challenging due to complex strand dynamics, diverse material properties, and intricate light-hair interactions. Recent video diffusion models can generate high-quality videos, but they lack fine-grained control over hair dynamics. We present ControlHair, a hybrid framework that integrates a physics simulator with conditional video diffusion to enable controllable dynamic hair rendering. ControlHair adopts a three-stage pipeline: it first encodes physics parameters (e.g., hair stiffness, wind) into per-frame geometry using a simulator, then extracts per-frame control signals, and finally feeds the control signals into a video diffusion model to generate videos with the desired hair dynamics. This cascaded design decouples physics reasoning from video generation, supports diverse physics, and makes the video diffusion model easy to train. Trained on a curated dataset of 10K videos, ControlHair outperforms text- and pose-conditioned baselines, delivering precisely controlled hair dynamics. We further demonstrate three use cases of ControlHair: dynamic hairstyle try-on, bullet-time effects, and cinemagraphs. ControlHair introduces the first physics-informed video diffusion framework for controllable dynamics. We provide a teaser video and experimental results on our website.

2025-09-25T20:29:05Z 9 pages, Project website: https://ctrlhair-arxiv.netlify.app/ Weikai Lin Haoxiang Li Yuhao Zhu http://arxiv.org/abs/2405.12895v3 Implicit-ARAP: Efficient Handle-Guided Neural Field Deformation via Local Patch Meshing 2025-09-29T10:40:46Z

Neural fields have emerged as a powerful representation for 3D geometry, enabling compact and continuous modeling of complex shapes. Despite their expressive power, manipulating neural fields in a controlled and accurate manner -- particularly under spatial constraints -- remains an open challenge, as existing approaches struggle to balance surface quality, robustness, and efficiency. We address this by introducing a novel method for handle-guided neural field deformation, which leverages discrete local surface representations to optimize the As-Rigid-As-Possible deformation energy. To this end, we propose the local patch mesh representation, which discretizes level sets of a neural signed distance field by projecting and deforming flat mesh patches guided solely by the SDF and its gradient. We conduct a comprehensive evaluation showing that our method consistently outperforms baselines in deformation quality, robustness, and computational efficiency. We also present experiments that motivate our choice of discretization over marching cubes. By bridging classical geometry processing and neural representations through local patch meshing, our work enables scalable, high-quality deformation of neural fields and paves the way for extending other geometric tasks to neural domains.
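The abstract says flat mesh patches are projected onto a level set "guided solely by the SDF and its gradient". The sketch below covers only that projection step. It assumes a Newton-style closest-point update, x <- x - (f(x) - c) * grad f / |grad f|^2. An analytic sphere SDF stands in for the neural SDF, and central differences stand in for autodiff. The patch layout, iteration count, and helper names are hypothetical. The ARAP deformation itself is not shown.

```python
import math
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
SDF = Callable[[Vec3], float]


def sphere_sdf(p: Vec3, radius: float = 1.0) -> float:
    """Analytic signed distance to an origin-centred sphere (stand-in for a neural SDF)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def sdf_gradient(f: SDF, p: Vec3, h: float = 1e-5) -> Vec3:
    """Central-difference gradient; a neural SDF would use autodiff instead."""
    g = []
    for i in range(3):
        plus = list(p)
        minus = list(p)
        plus[i] += h
        minus[i] -= h
        g.append((f(tuple(plus)) - f(tuple(minus))) / (2.0 * h))
    return (g[0], g[1], g[2])


def project_to_level_set(f: SDF, p: Vec3, level: float = 0.0, iters: int = 5) -> Vec3:
    """Repeated Newton-style step x <- x - (f(x) - level) * grad / |grad|^2."""
    x = p
    for _ in range(iters):
        g = sdf_gradient(f, x)
        g2 = g[0] ** 2 + g[1] ** 2 + g[2] ** 2
        if g2 < 1e-12:
            break  # gradient vanishes: leave the vertex where it is
        step = (f(x) - level) / g2
        x = (x[0] - step * g[0], x[1] - step * g[1], x[2] - step * g[2])
    return x


def flat_patch(center: Vec3, half_size: float, n: int) -> List[Vec3]:
    """n x n grid of vertices in the plane z = center.z (the patch before projection)."""
    cx, cy, cz = center
    step = 2.0 * half_size / (n - 1)
    return [(cx - half_size + i * step, cy - half_size + j * step, cz)
            for i in range(n) for j in range(n)]


patch = flat_patch((0.0, 0.0, 1.0), half_size=0.3, n=5)
projected = [project_to_level_set(sphere_sdf, v) for v in patch]
print(max(abs(sphere_sdf(v)) for v in projected))  # close to 0
```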

2024-05-21T16:04:32Z 24 pages, 19 figures Daniele Baieri Filippo Maggioli Emanuele Rodolà Simone Melzi Zorah Lähner http://arxiv.org/abs/2504.15782v2 Model-based Metric 3D Shape and Motion Reconstruction of Wild Bottlenose Dolphins in Drone-Shot Videos 2025-09-29T09:01:20Z

We address the problem of estimating the metric 3D shape and motion of wild dolphins from monocular video, with the aim of assessing their body condition. While considerable progress has been made in reconstructing 3D models of terrestrial quadrupeds, aquatic animals remain unexplored due to the difficulty of observing them in their natural underwater environment. To address this, we propose a model-based approach that incorporates a transmission model to account for water-induced occlusion. We apply our method to videos captured under different sea conditions. We estimate mass and volume and compare our results to a manual method based on 2D measurements. Additionally, we apply our method to videos of captive animals with known ground-truth mass. While the manual approach is often more accurate in our experiments, our method shows a distinct advantage when applied to larger specimens. These findings highlight the potential of our method as a scalable and automated alternative for estimating the mass and volume of dolphins from monocular video.

2025-04-22T10:47:29Z 9 pages, 9 figures Daniele Baieri Riccardo Cicciarella Michael Krützen Emanuele Rodolà Silvia Zuffi http://arxiv.org/abs/2509.18497v2 Differentiable Light Transport with Gaussian Surfels via Adapted Radiosity for Efficient Relighting and Geometry Reconstruction 2025-09-29T04:51:54Z

Radiance fields have gained tremendous success with applications ranging from novel view synthesis to geometry reconstruction, especially with the advent of Gaussian splatting. However, they sacrifice modeling of material reflective properties and lighting conditions, leading to significant geometric ambiguities and the inability to easily perform relighting. One way to address these limitations is to incorporate physically-based rendering, but it has been prohibitively expensive to include full global illumination within the inner loop of the optimization. Therefore, previous works adopt simplifications that make the whole optimization with global illumination effects efficient but less accurate. In this work, we adopt Gaussian surfels as the primitives and build an efficient framework for differentiable light transport, inspired by classic radiosity theory. The whole framework operates in the coefficient space of spherical harmonics, enabling both diffuse and specular materials. We extend classic radiosity to non-binary visibility and semi-opaque primitives, propose novel solvers to efficiently solve the light transport, and derive the backward pass for gradient-based optimization, which is more efficient than automatic differentiation. During inference, we achieve view-independent rendering where light transport need not be recomputed under viewpoint changes, enabling hundreds of FPS for global illumination effects, including view-dependent reflections using a spherical harmonics representation. Through extensive qualitative and quantitative experiments, we demonstrate better geometry reconstruction, view synthesis, and relighting than previous inverse rendering baselines, or than data-driven baselines given relatively sparse datasets with known or unknown lighting conditions.
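For reference, classic radiosity, which this work builds on, solves B_i = E_i + rho_i * sum_j F_ij B_j. In that equation, B is radiosity, E is emission, rho is diffuse reflectance, and F holds the form factors. The sketch below is a scalar, purely diffuse Jacobi solve. The fractional visibility factor V_ij in [0, 1] only gestures at the paper's "non-binary visibility" and is an assumption. The paper's spherical-harmonic coefficients, semi-opaque surfels, custom solvers, and backward pass are not reproduced.

```python
from typing import List, Optional, Sequence


def solve_radiosity(
    emission: Sequence[float],
    reflectance: Sequence[float],
    form_factors: Sequence[Sequence[float]],
    visibility: Optional[Sequence[Sequence[float]]] = None,
    max_iters: int = 500,
    tol: float = 1e-12,
) -> List[float]:
    """Jacobi iteration for B_i = E_i + rho_i * sum_j V_ij * F_ij * B_j (scalar, diffuse)."""
    n = len(emission)
    if visibility is None:
        visibility = [[1.0] * n for _ in range(n)]  # fully unoccluded
    b = list(emission)
    for _ in range(max_iters):
        new_b = [
            emission[i]
            + reflectance[i] * sum(visibility[i][j] * form_factors[i][j] * b[j] for j in range(n))
            for i in range(n)
        ]
        converged = max(abs(x - y) for x, y in zip(new_b, b)) < tol
        b = new_b
        if converged:
            break
    return b


# Two facing patches: only patch 0 emits, and each sees half of the other.
B = solve_radiosity([1.0, 0.0], [0.5, 0.5], [[0.0, 0.5], [0.5, 0.0]])
print(B)  # approx [1.0667, 0.2667]
```

Jacobi iteration converges here because every reflectance is below 1 and each row of F sums to at most 1. That makes the spectral radius of rho * F smaller than 1.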

2025-09-23T01:02:31Z Kaiwen Jiang Jia-Mu Sun Zilu Li Dan Wang Tzu-Mao Li Ravi Ramamoorthi http://arxiv.org/abs/2509.00066v2 T-MLP: Tailed Multi-Layer Perceptron for Level-of-Detail Signal Representation 2025-09-29T04:33:20Z

Level-of-detail (LoD) representation is critical for efficiently modeling and transmitting various types of signals, such as images and 3D shapes. In this work, we propose a novel network architecture that enables LoD signal representation. Our approach modifies the Multi-Layer Perceptron (MLP), which inherently operates at a single scale and thus lacks native LoD support. Specifically, we introduce the Tailed Multi-Layer Perceptron (T-MLP), which extends the MLP by attaching an output branch, also called a tail, to each hidden layer. Each tail refines the residual between the current prediction and the ground-truth signal, so that the accumulated outputs across layers correspond to the target signals at different LoDs, enabling multi-scale modeling with supervision from only a single-resolution signal. Extensive experiments demonstrate that our T-MLP outperforms existing neural LoD baselines across diverse signal representation tasks.
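Below is a minimal forward-pass sketch of the tail idea, with hand-set weights. Each hidden layer gets a linear tail, and the running sum of tail outputs gives one prediction per LoD, coarsest first. The layer sizes, the ReLU activation, and the weights are illustrative assumptions. Training is not shown. In training, each accumulated output would be compared against the single-resolution target, so each tail fits the residual left by the tails before it.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], bias[out])


def linear(layer: Layer, x: Sequence[float]) -> List[float]:
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def t_mlp_forward(x: Sequence[float], hidden: Sequence[Layer], tails: Sequence[Layer]) -> List[List[float]]:
    """Return one prediction per LoD: the running sum of tail outputs, coarsest first."""
    if len(hidden) != len(tails):
        raise ValueError("need exactly one tail per hidden layer")
    h = list(x)
    acc = None
    lods = []
    for layer, tail in zip(hidden, tails):
        h = relu(linear(layer, h))
        out = linear(tail, h)  # each tail contributes a residual correction
        acc = out if acc is None else [a + o for a, o in zip(acc, out)]
        lods.append(acc)
    return lods


hidden = [([[1.0]], [0.0]), ([[1.0]], [-1.0])]
tails = [([[0.5]], [0.0]), ([[0.25]], [0.0])]
print(t_mlp_forward([2.0], hidden, tails))  # [[1.0], [1.25]]
```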

2025-08-26T08:16:13Z Chuanxiang Yang Yuanfeng Zhou Guangshun Wei Siyu Ren Yuan Liu Junhui Hou Wenping Wang http://arxiv.org/abs/2509.24150v1 Neural Visibility of Point Sets 2025-09-29T00:54:00Z

Point clouds are widely used representations of 3D data, but determining the visibility of points from a given viewpoint remains a challenging problem due to their sparse nature and lack of explicit connectivity. Traditional methods, such as Hidden Point Removal (HPR), face limitations in computational efficiency, robustness to noise, and handling concave regions or low-density point clouds. In this paper, we propose a novel approach to visibility determination in point clouds by formulating it as a binary classification task. The core of our network consists of a 3D U-Net that extracts view-independent point-wise features and a shared multi-layer perceptron (MLP) that predicts point visibility using the extracted features and view direction as inputs. The network is trained end-to-end with ground-truth visibility labels generated from rendered 3D models. Our method significantly outperforms HPR in both accuracy and computational efficiency, achieving a speedup of up to 126 times on large point clouds. Additionally, our network demonstrates robustness to noise and varying point cloud densities and generalizes well to unseen shapes. We validate the effectiveness of our approach through extensive experiments on the ShapeNet, ABC, and real-world datasets, showing substantial improvements in visibility accuracy. We also demonstrate the versatility of our method in various applications, including point cloud visualization, surface reconstruction, normal estimation, shadow rendering, and viewpoint optimization. Our code and models are available at https://github.com/octree-nn/neural-visibility.
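The classification head can be sketched as follows: the same weights are applied to every point, with the point's feature concatenated with the unit view direction as input. This is a minimal sketch under stated assumptions. The shared MLP is reduced to a single linear layer for brevity. The 1-D features and hand-set weights are hypothetical, and the 3D U-Net that would produce the features is not shown. The loss is the standard binary cross-entropy against 0/1 visibility labels.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(a * a for a in v))
    if n == 0.0:
        raise ValueError("view direction must be non-zero")
    return [a / n for a in v]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def predict_visibility(point_features: Sequence[Sequence[float]], view_dir: Sequence[float],
                       weights: Sequence[float], bias: float,
                       threshold: float = 0.5) -> Tuple[List[float], List[bool]]:
    """Shared head: the same weights score [feature, unit view direction] for every point."""
    d = normalize(view_dir)
    probs = []
    for f in point_features:
        z = list(f) + d
        probs.append(sigmoid(sum(w * zi for w, zi in zip(weights, z)) + bias))
    return probs, [p >= threshold for p in probs]


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean BCE against 0/1 ground-truth visibility labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


probs, visible = predict_visibility([[1.0], [-1.0]], [0.0, 0.0, 2.0], [2.0, 0.0, 0.0, 1.0], 0.0)
print(visible)  # [True, False]
```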

2025-09-29T00:54:00Z Accepted to SIGGRAPH Asia 2025 Jun-Hao Wang Yi-Yang Tian Baoquan Chen Peng-Shuai Wang 10.1145/3757377.3763869 http://arxiv.org/abs/2509.24083v1 WireBend-kit: A Computational Design and Fabrication Toolkit for Wirebending Custom 3D Wireframe Structures 2025-09-28T21:39:51Z

This paper introduces WireBend-kit, a desktop wirebending machine and computational design tool for creating 3D wireframe structures. Combined, they allow users to rapidly and inexpensively create custom 3D wireframe structures from aluminum wire. Our design tool is implemented in freely available software and allows users to generate virtual wireframe designs and assess their fabricability. A path-planning procedure automatically converts the wireframe design into fabrication instructions for our machine while accounting for material elasticity and kinematic error sources. The custom machine costs $293 in parts and can form aluminum wire into 3D wireframe structures through an ordered sequence of feed, bend, and rotate instructions. Our technical evaluation shows that our system overcomes the odometrically accumulating errors inherent to wirebending to produce accurate 3D structures from inexpensive hardware. Finally, we provide application examples demonstrating the design space enabled by WireBend-kit.
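To see how an ordered feed/bend/rotate sequence determines a 3D shape, here is an idealized forward simulation. The instruction semantics are a purely hypothetical reading:

- "feed" pushes wire straight out along the current heading.
- "bend" turns the heading about a bend axis.
- "rotate" spins the wire about its own axis, which tilts the bend plane.

The wire is treated as rigid with no springback. That means the sketch ignores exactly the elasticity and kinematic errors the paper's path planner accounts for.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rodrigues(v: Vec3, axis: Vec3, angle: float) -> Vec3:
    """Rotate v about the unit vector `axis` by `angle` radians (Rodrigues' formula)."""
    c, s = math.cos(angle), math.sin(angle)
    kx, ky, kz = axis
    vx, vy, vz = v
    dot = kx * vx + ky * vy + kz * vz
    cross = (ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx)
    return tuple(v[i] * c + cross[i] * s + axis[i] * dot * (1.0 - c) for i in range(3))


def trace_wire(instructions: Sequence[Tuple[str, float]]) -> List[Vec3]:
    """Idealized (rigid, no springback) polyline traced by feed/bend/rotate commands."""
    pos: Vec3 = (0.0, 0.0, 0.0)
    heading: Vec3 = (1.0, 0.0, 0.0)    # direction the wire leaves the bend head
    bend_axis: Vec3 = (0.0, 0.0, 1.0)  # bends turn the heading about this axis
    points = [pos]
    for op, value in instructions:
        if op == "feed":  # push `value` length units of wire straight out
            pos = tuple(pos[i] + value * heading[i] for i in range(3))
            points.append(pos)
        elif op == "bend":  # bend by `value` degrees within the current bend plane
            heading = rodrigues(heading, bend_axis, math.radians(value))
        elif op == "rotate":  # spin the wire about its own axis, tilting the bend plane
            bend_axis = rodrigues(bend_axis, heading, math.radians(value))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return points


square = [("feed", 10), ("bend", 90)] * 3 + [("feed", 10)]
print(trace_wire(square)[-1])  # approximately (0, 0, 0): the square closes
```

Because every command is applied relative to the current wire state, a small angular error early in the sequence would displace every later point. That is one way to see why such errors accumulate along the wire.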

2025-09-28T21:39:51Z Faraz Faruqi Josha Paonaskar Riley Schuler Aiden Prevey Carson Taylor Anika Tak Anthony Guinto Eeshani Shilamkar Natarith Cheenaruenthong Martin Nisser 10.1145/3745778.3766662 http://arxiv.org/abs/2509.23769v1 ReLumix: Extending Image Relighting to Video via Video Diffusion Models 2025-09-28T09:35:33Z

Controlling illumination during video post-production is a crucial yet elusive goal in computational photography. Existing methods often lack flexibility, restricting users to certain relighting models. This paper introduces ReLumix, a novel framework that decouples the relighting algorithm from temporal synthesis, thereby enabling any image relighting technique to be seamlessly applied to video. Our approach reformulates video relighting into a simple yet effective two-stage process: (1) an artist relights a single reference frame using any preferred image-based technique (e.g., diffusion models or physics-based renderers); and (2) a fine-tuned stable video diffusion (SVD) model seamlessly propagates this target illumination throughout the sequence. To ensure temporal coherence and prevent artifacts, we introduce a gated cross-attention mechanism for smooth feature blending and a temporal bootstrapping strategy that harnesses SVD's powerful motion priors. Although trained on synthetic data, ReLumix shows competitive generalization to real-world videos. The method demonstrates significant improvements in visual fidelity, offering a scalable and versatile solution for dynamic lighting control.

2025-09-28T09:35:33Z Project page: https://lez-s.github.io/Relumix_project/ Lezhong Wang Shutong Jin Ruiqi Cui Anders Bjorholm Dahl Jeppe Revall Frisvad Siavash Bigdeli http://arxiv.org/abs/2509.23718v1 Diff-3DCap: Shape Captioning with Diffusion Models 2025-09-28T07:59:22Z

3D shape captioning occupies a significant place in computer graphics and has garnered considerable interest in recent years. Traditional approaches frequently rely on costly voxel representations or object detection techniques, yet often fail to deliver satisfactory results. To address these challenges, we introduce Diff-3DCap, which represents a 3D object by a sequence of projected views and uses a continuous diffusion model to perform captioning. More precisely, the continuous diffusion model perturbs the embedded captions with Gaussian noise during the forward phase and then predicts the reconstructed annotation during the reverse phase. Within the diffusion framework, a visual embedding obtained from a pre-trained visual-language model naturally serves as a guiding signal, eliminating the need for an additional classifier. Extensive experiments indicate that Diff-3DCap achieves performance comparable to that of current state-of-the-art methods.
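The forward phase described above, perturbing embedded captions with Gaussian noise, can be illustrated with the standard DDPM-style closed form. That form is x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The sketch below assumes a linear beta schedule and a toy 3-D caption embedding. Both are illustrative, and the abstract does not state the schedule the paper uses. The reverse phase, guided by the visual embedding, is not shown.

```python
import math
import random
from typing import List, Optional, Sequence


def linear_beta_schedule(steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """DDPM-style linearly increasing noise variances beta_1..beta_T."""
    if steps < 2:
        return [beta_start] * steps
    return [beta_start + (beta_end - beta_start) * i / (steps - 1) for i in range(steps)]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0: Sequence[float], t: int, alpha_bars: Sequence[float],
             noise: Optional[Sequence[float]] = None,
             rng: Optional[random.Random] = None) -> List[float]:
    """Noised embedding x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    ab = alpha_bars[t]
    if noise is None:
        rng = rng or random.Random()
        noise = [rng.gauss(0.0, 1.0) for _ in x0]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, noise)]


caption_embedding = [0.3, -1.2, 0.8]  # hypothetical embedded caption
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x_t = q_sample(caption_embedding, t=500, alpha_bars=alpha_bars, rng=random.Random(0))
```

Because alpha_bar_t shrinks toward 0 as t grows, late timesteps are dominated by noise. The reverse model learns to recover the caption embedding from such inputs.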

2025-09-28T07:59:22Z IEEE Transactions on Visualization and Computer Graphics. 2025 Zhenyu Shu Jiawei Wen Shiyang Li Shiqing Xin Ligang Liu 10.1109/TVCG.2025.3564664 http://arxiv.org/abs/2509.23709v1 StrucADT: Generating Structure-controlled 3D Point Clouds with Adjacency Diffusion Transformer 2025-09-28T07:45:51Z

In the field of 3D point cloud generation, numerous 3D generative models have demonstrated the ability to generate diverse and realistic 3D shapes. However, the majority of these approaches struggle to generate controllable 3D point cloud shapes that meet user-specific requirements, hindering the large-scale application of 3D point cloud generation. To address this lack of control in 3D point cloud generation, we are the first to propose controlling the generation of point clouds by shape structures that comprise part existences and part adjacency relationships. We manually annotate the adjacency relationships between the segmented parts of point cloud shapes, thereby constructing a StructureGraph representation. Based on this StructureGraph representation, we introduce StrucADT, a novel structure-controllable point cloud generation model. It consists of a StructureGraphNet module that extracts structure-aware latent features, a cCNF Prior module that learns the distribution of the latent features controlled by the part adjacency, and a Diffusion Transformer module, conditioned on the latent features and part adjacency, that generates structure-consistent point cloud shapes. Experimental results demonstrate that our structure-controllable 3D point cloud generation method produces high-quality and diverse point cloud shapes, enabling the generation of controllable point clouds based on user-specified shape structures and achieving state-of-the-art performance in controllable point cloud generation on the ShapeNet dataset.
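The abstract defines a shape structure only as part existences plus part adjacency relationships. The sketch below shows one way such a structure could be encoded as a fixed-length conditioning vector: a binary existence vector followed by the upper triangle of a symmetric adjacency matrix. The chair part vocabulary and the flattening order are hypothetical. StructureGraphNet, the cCNF prior, and the Diffusion Transformer are not shown.

```python
from typing import List, Sequence, Set, Tuple

PARTS = ["back", "seat", "leg", "arm"]  # hypothetical part vocabulary for one category


def structure_graph(present: Set[str], adjacent: Sequence[Tuple[str, str]],
                    parts: Sequence[str] = PARTS) -> Tuple[List[int], List[List[int]]]:
    """Return (existence vector, symmetric adjacency matrix) over a fixed part vocabulary."""
    index = {name: i for i, name in enumerate(parts)}
    existence = [1 if name in present else 0 for name in parts]
    adjacency = [[0] * len(parts) for _ in parts]
    for a, b in adjacent:
        if a not in present or b not in present:
            raise ValueError(f"adjacency {a}-{b} refers to a missing part")
        i, j = index[a], index[b]
        adjacency[i][j] = adjacency[j][i] = 1
    return existence, adjacency


def condition_vector(existence: List[int], adjacency: List[List[int]]) -> List[int]:
    """Flatten existence plus the upper triangle of the adjacency matrix (no self-loops)."""
    n = len(existence)
    upper = [adjacency[i][j] for i in range(n) for j in range(i + 1, n)]
    return existence + upper


existence, adjacency = structure_graph({"back", "seat", "leg"}, [("back", "seat"), ("seat", "leg")])
print(condition_vector(existence, adjacency))  # [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
```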

2025-09-28T07:45:51Z IEEE Transactions on Visualization and Computer Graphics. 2025 Zhenyu Shu Jiajun Shen Zhongui Chen Xiaoguang Han Shiqing Xin 10.1109/TVCG.2025.3600392 http://arxiv.org/abs/2509.23703v1 DFG-PCN: Point Cloud Completion with Degree-Flexible Point Graph 2025-09-28T07:28:42Z

Point cloud completion is a vital task focused on reconstructing complete point clouds and addressing the incompleteness caused by occlusion and limited sensor resolution. Traditional methods rely on fixed local region partitioning, such as k-nearest neighbors, which fails to account for the highly uneven distribution of geometric complexity across different regions of a shape. This limitation leads to inefficient representation and suboptimal reconstruction, especially in areas with fine-grained details or structural discontinuities. This paper proposes a point cloud completion framework called Degree-Flexible Point Graph Completion Network (DFG-PCN). It adaptively assigns node degrees using a detail-aware metric that combines feature variation and curvature, focusing on structurally important regions. We further introduce a geometry-aware graph integration module that uses Manhattan distance for edge aggregation and detail-guided fusion of local and global features to enhance representation. Extensive experiments on multiple benchmark datasets demonstrate that our method consistently outperforms state-of-the-art approaches.
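To contrast degree-flexible graphs with fixed k-nearest neighbors, the sketch below makes the following assumptions:

- Per-point scores for feature variation and curvature are given.
- A point's detail score is their weighted mix, with `alpha` = 0.5.
- Each point's degree is interpolated linearly between `k_min` and `k_max`.
- Manhattan (L1) distance is used both to pick neighbors and to weight the aggregated features.

The abstract does not specify the exact metric, the neighbor search, or the aggregation, so every weighting here is a hypothetical stand-in.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def assign_degrees(feature_variation: Sequence[float], curvature: Sequence[float],
                   k_min: int, k_max: int, alpha: float = 0.5) -> List[int]:
    """Map a detail score (mix of feature variation and curvature) to a per-point degree."""
    scores = [alpha * f + (1.0 - alpha) * c for f, c in zip(feature_variation, curvature)]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    degrees = []
    for s in scores:
        t = 0.0 if span == 0 else (s - lo) / span  # normalize to [0, 1]
        degrees.append(k_min + round(t * (k_max - k_min)))
    return degrees


def build_graph(points: Sequence[Point], degrees: Sequence[int]) -> List[List[int]]:
    """Connect each point to its degrees[i] nearest neighbors under L1 distance."""
    edges = []
    for i, p in enumerate(points):
        ranked = sorted((manhattan(p, q), j) for j, q in enumerate(points) if j != i)
        edges.append([j for _, j in ranked[: degrees[i]]])
    return edges


def aggregate(points: Sequence[Point], features: Sequence[float], edges: List[List[int]]) -> List[float]:
    """Average neighbor features with weights 1 / (1 + L1 distance)."""
    out = []
    for i, nbrs in enumerate(edges):
        weights = [1.0 / (1.0 + manhattan(points[i], points[j])) for j in nbrs]
        out.append(sum(w * features[j] for w, j in zip(weights, nbrs)) / sum(weights))
    return out


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
degrees = assign_degrees([0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0], k_min=1, k_max=3)
edges = build_graph(points, degrees)
print(degrees, edges)  # [1, 1, 3, 1] [[1], [0], [1, 0, 3], [2]]
agg = aggregate(points, [10.0, 20.0, 30.0, 40.0], edges)
```

The detailed point (index 2) receives three neighbors while flat regions keep one. This is the uneven allocation a fixed k cannot express.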

2025-09-28T07:28:42Z IEEE Transactions on Visualization and Computer Graphics, 2025 Zhenyu Shu Jian Yao Shiqing Xin 10.1109/TVCG.2025.3612379 http://arxiv.org/abs/2509.23572v1 Automated design of compound lenses with discrete-continuous optimization 2025-09-28T02:08:23Z

We introduce a method that automatically and jointly updates both continuous and discrete parameters of a compound lens design, to improve its performance in terms of sharpness, speed, or both. Previous methods for compound lens design use gradient-based optimization to update continuous parameters (e.g., curvature of individual lens elements) of a given lens topology, requiring extensive expert intervention to realize topology changes. By contrast, our method can additionally optimize discrete parameters such as number and type (e.g., singlet or doublet) of lens elements. Our method achieves this capability by combining gradient-based optimization with a tailored Markov chain Monte Carlo sampling algorithm, using transdimensional mutation and paraxial projection operations for efficient global exploration. We show experimentally on a variety of lens design tasks that our method effectively explores an expanded design space of compound lenses, producing better designs than previous methods and pushing the envelope of speed-sharpness tradeoffs achievable by automated lens design.
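The abstract mentions paraxial projection operations without defining them. As background only, here is a minimal sketch of first-order (paraxial) ray-transfer-matrix analysis, one standard cheap model of a lens system. Each element acts on (ray height, ray angle) as a 2x2 matrix. The effective focal length is -1/C, where C is the lower-left entry of the system matrix. Thin lenses, millimetre units, and the two-lens example are assumptions. This is neither the paper's projection step nor its MCMC sampler.

```python
from typing import Sequence, Tuple

Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]


def matmul(a: Mat2, b: Mat2) -> Mat2:
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def thin_lens(f: float) -> Mat2:
    """Paraxial thin lens of focal length f acting on (height, angle)."""
    return ((1.0, 0.0), (-1.0 / f, 1.0))


def gap(d: float) -> Mat2:
    """Free-space propagation over axial distance d."""
    return ((1.0, d), (0.0, 1.0))


def system_matrix(elements: Sequence[Mat2]) -> Mat2:
    """Compose elements in the order light meets them (later ones multiply on the left)."""
    m: Mat2 = ((1.0, 0.0), (0.0, 1.0))
    for e in elements:
        m = matmul(e, m)
    return m


def effective_focal_length(m: Mat2) -> float:
    c = m[1][0]
    if c == 0.0:
        raise ValueError("afocal system: no finite focal length")
    return -1.0 / c


# Two f = 100 mm thin lenses 50 mm apart: 1/f = 1/100 + 1/100 - 50/(100*100).
efl = effective_focal_length(system_matrix([thin_lens(100.0), gap(50.0), thin_lens(100.0)]))
print(round(efl, 3))  # 66.667
```

Changing the number of elements, as the paper's transdimensional moves do, just changes the list passed to `system_matrix`. That is why paraxial quantities stay cheap to recompute across topologies.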

2025-09-28T02:08:23Z SIGGRAPH Asia 2025, project website: https://imaging.cs.cmu.edu/automated_lens_design/ Arjun Teh Delio Vicini Bernd Bickel Ioannis Gkioulekas Matthew O'Toole 10.1145/3757377.3763850 http://arxiv.org/abs/2509.23489v1 Modeling and Exploiting the Time Course of Chromatic Adaptation for Display Power Optimizations in Virtual Reality 2025-09-27T20:30:15Z

We introduce a gaze-tracking-free method to reduce OLED display power consumption in VR with minimal perceptual impact. This technique exploits the time course of chromatic adaptation, the human visual system's ability to maintain stable color perception under changing illumination. To that end, we propose a novel psychophysical paradigm that models how the human adaptation state changes with the scene illuminant. We exploit this model to compute an optimal illuminant shift trajectory, controlling the rate and extent of illumination change, to reduce display power under a given perceptual loss budget. Our technique significantly improves perceptual quality over prior work that applies illumination shifts instantaneously. It can also be combined with prior work on luminance dimming to reduce display power by 31% with no statistically significant loss of perceptual quality.

2025-09-27T20:30:15Z To appear in Transactions on Graphics and SIGGRAPH ASIA 2025 Ethan Chen Sushant Kondguli Carl Marshall Yuhao Zhu 10.1145/3763294