https://arxiv.org/api/GeOojPric6Ler7q33QVWZAPDCP02026-07-01T07:48:08Z9421208515http://arxiv.org/abs/2506.11163v1Vector Representations of Vessel Trees2025-06-11T20:34:08ZWe introduce a novel framework for learning vector representations of tree-structured geometric data focusing on 3D vascular networks. Our approach employs two sequentially trained Transformer-based autoencoders. In the first stage, the Vessel Autoencoder captures continuous geometric details of individual vessel segments by learning embeddings from sampled points along each curve. In the second stage, the Vessel Tree Autoencoder encodes the topology of the vascular network as a single vector representation, leveraging the segment-level embeddings from the first model. A recursive decoding process ensures that the reconstructed topology is a valid tree structure. Compared to 3D convolutional models, this proposed approach substantially lowers GPU memory requirements, facilitating large-scale training. Experimental results on a 2D synthetic tree dataset and a 3D coronary artery dataset demonstrate superior reconstruction fidelity, accurate topology preservation, and realistic interpolations in latent space. Our scalable framework, named VeTTA, offers precise, flexible, and topologically consistent modeling of anatomical tree structures in medical imaging.2025-06-11T20:34:08ZJames BattenMichiel SchaapMatthew SinclairYing BaiBen Glockerhttp://arxiv.org/abs/2506.09997v1DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos2025-06-11T17:59:58ZWe introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. Feed-forward scene reconstruction has gained significant attention for its ability to rapidly create digital replicas of real-world environments. However, most existing models are limited to static scenes and fail to reconstruct the motion of moving objects. Developing a feed-forward model for dynamic scene reconstruction poses significant challenges, including the scarcity of training data and the need for appropriate 3D representations and training paradigms. To address these challenges, we introduce several key technical contributions: an enhanced large-scale synthetic dataset with ground-truth multi-view videos and dense 3D scene flow supervision; a per-pixel deformable 3D Gaussian representation that is easy to learn, supports high-quality dynamic view synthesis, and enables long-range 3D tracking; and a large transformer network that achieves real-time, generalizable dynamic scene reconstruction. Extensive qualitative and quantitative experiments demonstrate that DGS-LRM achieves dynamic scene reconstruction quality comparable to optimization-based methods, while significantly outperforming the state-of-the-art predictive dynamic reconstruction method on real-world examples. Its predicted physically grounded 3D deformation is accurate and can readily adapt for long-range 3D tracking tasks, achieving performance on par with state-of-the-art monocular video 3D tracking methods.2025-06-11T17:59:58ZProject page: https://hubert0527.github.io/dgslrm/Chieh Hubert LinZhaoyang LvSongyin WuZhen XuThu Nguyen-PhuocHung-Yu TsengJulian StraubNumair KhanLei XiaoMing-Hsuan YangYuheng RenRichard NewcombeZhao DongZhengqin Lihttp://arxiv.org/abs/2506.09023v2Fine-Grained Spatially Varying Material Selection in Images2025-06-11T17:49:53ZSelection is the first step in many image editing processes, enabling faster and simpler modifications of all pixels sharing a common modality. In this work, we present a method for material selection in images, robust to lighting and reflectance variations, which can be used for downstream editing tasks. We rely on vision transformer (ViT) models and leverage their features for selection, proposing a multi-resolution processing strategy that yields finer and more stable selection results than prior methods. Furthermore, we enable selection at two levels: texture and subtexture, leveraging a new two-level material selection (DuMaS) dataset which includes dense annotations for over 800,000 synthetic images, both on the texture and subtexture levels.2025-06-10T17:50:05ZJulia Guerrero-ViuMichael FischerIliyan GeorgievElena GarcesDiego GutierrezBelen MasiaValentin Deschaintrehttp://arxiv.org/abs/2506.09909v1TransGI: Real-Time Dynamic Global Illumination With Object-Centric Neural Transfer Model2025-06-11T16:23:06ZNeural rendering algorithms have revolutionized computer graphics, yet their impact on real-time rendering under arbitrary lighting conditions remains limited due to strict latency constraints in practical applications. The key challenge lies in formulating a compact yet expressive material representation. To address this, we propose TransGI, a novel neural rendering method for real-time, high-fidelity global illumination. It comprises an object-centric neural transfer model for material representation and a radiance-sharing lighting system for efficient illumination. Traditional BSDF representations and spatial neural material representations lack expressiveness, requiring thousands of ray evaluations to converge to noise-free colors. Conversely, real-time methods trade quality for efficiency by supporting only diffuse materials. In contrast, our object-centric neural transfer model achieves compactness and expressiveness through an MLP-based decoder and vertex-attached latent features, supporting glossy effects with low memory overhead. For dynamic, varying lighting conditions, we introduce local light probes capturing scene radiance, coupled with an across-probe radiance-sharing strategy for efficient probe generation. We implemented our method in a real-time rendering engine, combining compute shaders and CUDA-based neural networks. Experimental results demonstrate that our method achieves real-time performance of less than 10 ms to render a frame and significantly improved rendering quality compared to baseline methods.2025-06-11T16:23:06ZYijie DengLei HanLu Fanghttp://arxiv.org/abs/2507.00006v1MVGBench: Comprehensive Benchmark for Multi-view Generation Models2025-06-11T08:02:40ZWe propose MVGBench, a comprehensive benchmark for multi-view image generation models (MVGs) that evaluates 3D consistency in geometry and texture, image quality, and semantics (using vision language models). Recently, MVGs have been the main driving force in 3D object creation. However, existing metrics compare generated images against ground truth target views, which is not suitable for generative tasks where multiple solutions exist while differing from ground truth. Furthermore, different MVGs are trained on different view angles, synthetic data and specific lightings -- robustness to these factors and generalization to real data are rarely evaluated thoroughly. Without a rigorous evaluation protocol, it is also unclear what design choices contribute to the progress of MVGs. MVGBench evaluates three different aspects: best setup performance, generalization to real data and robustness. Instead of comparing against ground truth, we introduce a novel 3D self-consistency metric which compares 3D reconstructions from disjoint generated multi-views. We systematically compare 12 existing MVGs on 4 different curated real and synthetic datasets. With our analysis, we identify important limitations of existing methods specially in terms of robustness and generalization, and we find the most critical design choices. Using the discovered best practices, we propose ViFiGen, a method that outperforms all evaluated MVGs on 3D consistency. Our code, model, and benchmark suite will be publicly released.2025-06-11T08:02:40Z17 pages, 11 figures, 9 tables, project page: https://virtualhumans.mpi-inf.mpg.de/MVGBench/Xianghui XieChuhang ZouMeher Gitika KarumuriJan Eric LenssenGerard Pons-Mollhttp://arxiv.org/abs/2311.09652v2Event-based Motion-Robust Accurate Shape Estimation for Mixed Reflectance Scenes2025-06-10T23:34:04ZEvent-based structured light systems have recently been introduced as an exciting alternative to conventional frame-based triangulation systems for the 3D measurements of diffuse surfaces. Important benefits include the fast capture speed and the high dynamic range provided by the event camera - albeit at the cost of lower data quality. So far, both low-accuracy event-based and high-accuracy frame-based 3D imaging systems are tailored to a specific surface type, such as diffuse or specular, and can not be used for a broader class of object surfaces ("mixed reflectance scenes"). In this work, we present a novel event-based structured light system that enables fast 3D imaging of mixed reflectance scenes with high accuracy. On the captured events, we use epipolar constraints that intrinsically enable decomposing the measured reflections into diffuse, two-bounce specular, and other multi-bounce reflections. The diffuse surfaces in the scene are reconstructed using triangulation. Then, the reconstructed diffuse scene parts are leveraged as a "display" to evaluate the specular scene parts via deflectometry. This novel procedure allows us to use the entire scene as a virtual screen, using only a scanning laser and an event camera. The resulting system achieves fast and motion-robust (14Hz) reconstructions of mixed reflectance scenes with < 600 $μm$ depth error. Moreover, we introduce an "ultrafast" capture mode (250Hz) for the 3D measurement of diffuse scenes.2023-11-16T08:12:10ZAniket DashputeJiazhang WangJames TaylorOliver CossairtAshok VeeraraghavanFlorian Willomitzerhttp://arxiv.org/abs/2506.10038v1Ambient Diffusion Omni: Training Good Models with Bad Data2025-06-10T22:37:39ZWe show how to use low-quality, synthetic, and out-of-distribution images to improve the quality of a diffusion model. Typically, diffusion models are trained on curated datasets that emerge from highly filtered data pools from the Web and other sources. We show that there is immense value in the lower-quality images that are often discarded. We present Ambient Diffusion Omni, a simple, principled framework to train diffusion models that can extract signal from all available images during training. Our framework exploits two properties of natural images -- spectral power law decay and locality. We first validate our framework by successfully training diffusion models with images synthetically corrupted by Gaussian blur, JPEG compression, and motion blur. We then use our framework to achieve state-of-the-art ImageNet FID, and we show significant improvements in both image quality and diversity for text-to-image generative modeling. The core insight is that noise dampens the initial skew between the desired high-quality distribution and the mixed distribution we actually observe. We provide rigorous theoretical justification for our approach by analyzing the trade-off between learning from biased data versus limited unbiased data across diffusion times.2025-06-10T22:37:39ZPreprint, work in progressGiannis DarasAdrian Rodriguez-MunozAdam KlivansAntonio TorralbaConstantinos Daskalakishttp://arxiv.org/abs/2506.10027v1Learning-based density-equalizing map2025-06-10T02:54:39ZDensity-equalizing map (DEM) serves as a powerful technique for creating shape deformations with the area changes reflecting an underlying density function. In recent decades, DEM has found widespread applications in fields such as data visualization, geometry processing, and medical imaging. Traditional approaches to DEM primarily rely on iterative numerical solvers for diffusion equations or optimization-based methods that minimize handcrafted energy functionals. However, these conventional techniques often face several challenges: they may suffer from limited accuracy, produce overlapping artifacts in extreme cases, and require substantial algorithmic redesign when extended from 2D to 3D, due to the derivative-dependent nature of their energy formulations. In this work, we propose a novel learning-based density-equalizing mapping framework (LDEM) using deep neural networks. Specifically, we introduce a loss function that enforces density uniformity and geometric regularity, and utilize a hierarchical approach to predict the transformations at both the coarse and dense levels. Our method demonstrates superior density-equalizing and bijectivity properties compared to prior methods for a wide range of simple and complex density distributions, and can be easily applied to surface remeshing with different effects. Also, it generalizes seamlessly from 2D to 3D domains without structural changes to the model architecture or loss formulation. Altogether, our work opens up new possibilities for scalable and robust computation of density-equalizing maps for practical applications.2025-06-10T02:54:39ZAIMS Mathematics, 10(11), 25756-25790 (2025)Yanwen HuangLok Ming LuiGary P. T. Choi10.3934/math.20251140http://arxiv.org/abs/2506.08237v1Solving partial differential equations in participating media2025-06-09T21:12:22ZWe consider the problem of solving partial differential equations (PDEs) in domains with complex microparticle geometry that is impractical, or intractable, to model explicitly. Drawing inspiration from volume rendering, we propose tackling this problem by treating the domain as a participating medium that models microparticle geometry stochastically, through aggregate statistical properties (e.g., particle density). We first introduce the problem setting of PDE simulation in participating media. We then specialize to exponential media and describe the properties that make them an attractive model of microparticle geometry for PDE simulation problems. We use these properties to develop two new algorithms, volumetric walk on spheres and volumetric walk on stars, that generalize previous Monte Carlo algorithms to enable efficient and discretization-free simulation of linear elliptic PDEs (e.g., Laplace) in participating media. We demonstrate experimentally that our algorithms can solve Laplace boundary value problems with complex microparticle geometry more accurately and more efficiently than previous approaches, such as ensemble averaging and homogenization.2025-06-09T21:12:22ZSIGGRAPH 2025. Project page https://imaging.cs.cmu.edu/volumetric_walk_on_spheresBailey MillerRohan SawhneyKeenan CraneIoannis Gkioulekas10.1145/3731152http://arxiv.org/abs/2506.09075v1SILK: Smooth InterpoLation frameworK for motion in-betweening A Simplified Computational Approach2025-06-09T19:26:27ZMotion in-betweening is a crucial tool for animators, enabling intricate control over pose-level details in each keyframe. Recent machine learning solutions for motion in-betweening rely on complex models, incorporating skeleton-aware architectures or requiring multiple modules and training steps. In this work, we introduce a simple yet effective Transformer-based framework, employing a single Transformer encoder to synthesize realistic motions for motion in-betweening tasks. We find that data modeling choices play a significant role in improving in-betweening performance. Among others, we show that increasing data volume can yield equivalent or improved motion transitions, that the choice of pose representation is vital for achieving high-quality results, and that incorporating velocity input features enhances animation performance. These findings challenge the assumption that model complexity is the primary determinant of animation quality and provide insights into a more data-centric approach to motion interpolation. Additional videos and supplementary material are available at https://silk-paper.github.io.2025-06-09T19:26:27ZAccepted to CVPR 2025 Human Motion Generation Workshop. 10 pages, 3 figures, 5 Tables, and 40 ReferencesElly AkhoundiHung Yu LingAnup Anand DeshmukhJudith Butepagehttp://arxiv.org/abs/2504.06598v3Stochastic Ray Tracing of Transparent 3D Gaussians2025-06-09T18:28:35Z3D Gaussian splatting has been widely adopted as a 3D representation for novel-view synthesis, relighting, and 3D generation tasks. It delivers realistic and detailed results through a collection of explicit 3D Gaussian primitives, each carrying opacity and view-dependent color. However, efficient rendering of many transparent primitives remains a significant challenge. Existing approaches either rasterize the Gaussians with approximate per-view sorting or rely on high-end RTX GPUs. This paper proposes a stochastic ray-tracing method to render 3D clouds of transparent primitives. Instead of processing all ray-Gaussian intersections in sequential order, each ray traverses the acceleration structure only once, randomly accepting and shading a single intersection (or $N$ intersections, using a simple extension). This approach minimizes shading time and avoids primitive sorting along the ray, thereby minimizing register usage and maximizing parallelism even on low-end GPUs. The cost of rays through the Gaussian asset is comparable to that of standard mesh-intersection rays. The shading is unbiased and has low variance, as our stochastic acceptance achieves importance sampling based on accumulated weight. The alignment with Monte Carlo philosophy simplifies implementation and integration into a conventional path-tracing framework.2025-04-09T05:49:05Z10 pages, 7 figures, 5 tablesXin SunIliyan GeorgievYun FeiMiloš Hašanhttp://arxiv.org/abs/2506.07932v1Squeeze3D: Your 3D Generation Model is Secretly an Extreme Neural Compressor2025-06-09T16:52:10ZWe propose Squeeze3D, a novel framework that leverages implicit prior knowledge learnt by existing pre-trained 3D generative models to compress 3D data at extremely high compression ratios. Our approach bridges the latent spaces between a pre-trained encoder and a pre-trained generation model through trainable mapping networks. Any 3D model represented as a mesh, point cloud, or a radiance field is first encoded by the pre-trained encoder and then transformed (i.e. compressed) into a highly compact latent code. This latent code can effectively be used as an extremely compressed representation of the mesh or point cloud. A mapping network transforms the compressed latent code into the latent space of a powerful generative model, which is then conditioned to recreate the original 3D model (i.e. decompression). Squeeze3D is trained entirely on generated synthetic data and does not require any 3D datasets. The Squeeze3D architecture can be flexibly used with existing pre-trained 3D encoders and existing generative models. It can flexibly support different formats, including meshes, point clouds, and radiance fields. Our experiments demonstrate that Squeeze3D achieves compression ratios of up to 2187x for textured meshes, 55x for point clouds, and 619x for radiance fields while maintaining visual quality comparable to many existing methods. Squeeze3D only incurs a small compression and decompression latency since it does not involve training object-specific networks to compress an object.2025-06-09T16:52:10ZRishit DagliYushi GuanSankeerth DurvasulaMohammadreza MofayeziNandita Vijaykumarhttp://arxiv.org/abs/2506.07897v1GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution2025-06-09T16:13:12ZWe present a novel approach for enhancing the resolution and geometric fidelity of 3D Gaussian Splatting (3DGS) beyond native training resolution. Current 3DGS methods are fundamentally limited by their input resolution, producing reconstructions that cannot extrapolate finer details than are present in the training views. Our work breaks this limitation through a lightweight generative model that predicts and refines additional 3D Gaussians where needed most. The key innovation is our Hessian-assisted sampling strategy, which intelligently identifies regions that are likely to benefit from densification, ensuring computational efficiency. Unlike computationally intensive GANs or diffusion approaches, our method operates in real-time (0.015s per inference on a single consumer-grade GPU), making it practical for interactive applications. Comprehensive experiments demonstrate significant improvements in both geometric accuracy and rendering quality compared to state-of-the-art methods, establishing a new paradigm for resolution-free 3D scene enhancement.2025-06-09T16:13:12ZThe Conference on Computer Vision and Pattern Recognition (CVPR) 2025 - Second Workshop on Visual ConceptsShuja KhalidMohamed IbrahimYang Liuhttp://arxiv.org/abs/2506.07781v1SMaRCSim: Maritime Robotics Simulation Modules2025-06-09T13:57:32ZDeveloping new functionality for underwater robots and testing them in the real world is time-consuming and resource-intensive. Simulation environments allow for rapid testing before field deployment. However, existing tools lack certain functionality for use cases in our project: i) developing learning-based methods for underwater vehicles; ii) creating teams of autonomous underwater, surface, and aerial vehicles; iii) integrating the simulation with mission planning for field experiments. A holistic solution to these problems presents great potential for bringing novel functionality into the underwater domain. In this paper we present SMaRCSim, a set of simulation packages that we have developed to help us address these issues.2025-06-09T13:57:32ZMart KartaševDavid DörnerÖzer ÖzkahramanPetter ÖgrenIvan SteniusJohn Folkessonhttp://arxiv.org/abs/2506.04120v2Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data2025-06-09T11:42:49ZCreating accurate, physical simulations directly from real-world robot motion holds great value for safe, scalable, and affordable robot learning, yet remains exceptionally challenging. Real robot data suffers from occlusions, noisy camera poses, dynamic scene elements, which hinder the creation of geometrically accurate and photorealistic digital twins of unseen objects. We introduce a novel real-to-sim framework tackling all these challenges at once. Our key insight is a hybrid scene representation merging the photorealistic rendering of 3D Gaussian Splatting with explicit object meshes suitable for physics simulation within a single representation. We propose an end-to-end optimization pipeline that leverages differentiable rendering and differentiable physics within MuJoCo to jointly refine all scene components - from object geometry and appearance to robot poses and physical parameters - directly from raw and imprecise robot trajectories. This unified optimization allows us to simultaneously achieve high-fidelity object mesh reconstruction, generate photorealistic novel views, and perform annotation-free robot pose calibration. We demonstrate the effectiveness of our approach both in simulation and on challenging real-world sequences using an ALOHA 2 bi-manual manipulator, enabling more practical and robust real-to-simulation pipelines.2025-06-04T16:14:31ZUpdated version correcting inadvertent omission in author listBen MoranMauro ComiArunkumar ByravanSteven BohezTom ErezZhibin LiLeonard Hasenclever