http://arxiv.org/abs/2501.17895v2
ProcTex: Consistent and Interactive Text-to-texture Synthesis for Part-based Procedural Models
Updated: 2025-10-03T22:27:45Z
Recent advances in generative modeling have driven significant progress in text-guided texture synthesis. However, current methods focus on synthesizing texture for a single static 3D object, and struggle to handle entire families of shapes, such as those produced by procedural programs. Applying existing methods naively to each procedural shape is too slow to support exploring different parameter configurations at interactive rates, and also results in inconsistent textures across the procedural shapes. To this end, we introduce ProcTex, the first text-to-texture system designed for part-based procedural models. ProcTex enables consistent and real-time text-guided texture synthesis for families of shapes, which integrates seamlessly with the interactive design flow of procedural modeling. To ensure consistency, our core approach is to synthesize texture for a template shape from the procedural model, followed by a texture transfer stage to apply the texture to other procedural shapes via solving dense correspondence. To ensure interactiveness, we propose a novel correspondence network and show that dense correspondence can be effectively learned by a neural network for procedural models. We also develop several techniques, including a retexturing pipeline to support structural variation from procedural parameters, and part-level UV texture map generation for local appearance editing.
Extensive experiments on a diverse set of procedural models validate ProcTex's ability to produce high-quality, visually consistent textures while supporting interactive applications.
Published: 2025-01-28T22:38:55Z
Authors: Ruiqi Xu, Zihan Zhu, Ben Ahlbrand, Srinath Sridhar, Daniel Ritchie

http://arxiv.org/abs/2510.03433v1
Style Brush: Guided Style Transfer for 3D Objects
Updated: 2025-10-03T18:50:51Z
We introduce Style Brush, a novel style transfer method for textured meshes designed to empower artists with fine-grained control over the stylization process. Our approach extends traditional 3D style transfer methods by introducing a novel loss function that captures style directionality, supports multiple style images or portions thereof, and enables smooth transitions between styles in the synthesized texture. The use of easily generated guiding textures streamlines user interaction, making our approach accessible to a broad audience. Extensive evaluations with various meshes, style images, and contour shapes demonstrate the flexibility of our method and showcase the visual appeal of the generated textures.
Published: 2025-10-03T18:50:51Z
Authors: Áron Samuel Kovács, Pedro Hermosilla, Renata G. Raidou

http://arxiv.org/abs/2502.20215v2
Topological Autoencoders++: Fast and Accurate Cycle-Aware Dimensionality Reduction
Updated: 2025-10-03T14:34:00Z
This paper presents a novel topology-aware dimensionality reduction approach aiming at accurately visualizing the cyclic patterns present in high dimensional data. To that end, we build on the Topological Autoencoders (TopoAE) formulation. First, we provide a novel theoretical analysis of its associated loss and show that a zero loss indeed induces identical persistence pairs (in high and low dimensions) for the $0$-dimensional persistent homology (PH$^0$) of the Rips filtration. We also provide a counter example showing that this property no longer holds for a naive extension of TopoAE to PH$^d$ for $d\ge 1$.
Based on this observation, we introduce a novel generalization of TopoAE to $1$-dimensional persistent homology (PH$^1$), called TopoAE++, for the accurate generation of cycle-aware planar embeddings, addressing the above failure case. This generalization is based on the notion of cascade distortion, a new penalty term favoring an isometric embedding of the $2$-chains filling persistent $1$-cycles, hence resulting in more faithful geometrical reconstructions of the $1$-cycles in the plane. We further introduce a novel, fast algorithm for the exact computation of PH for Rips filtrations in the plane, yielding improved runtimes over previously documented topology-aware methods. Our method also achieves a better balance between the topological accuracy, as measured by the Wasserstein distance, and the visual preservation of the cycles in low dimensions. Our C++ implementation is available at https://github.com/MClemot/TopologicalAutoencodersPlusPlus.
Published: 2025-02-27T15:55:23Z
Authors: Mattéo Clémot, Julie Digne, Julien Tierny

http://arxiv.org/abs/2510.02884v1
GS-Share: Enabling High-fidelity Map Sharing with Incremental Gaussian Splatting
Updated: 2025-10-03T10:40:54Z
Constructing and sharing 3D maps is essential for many applications, including autonomous driving and augmented reality. Recently, 3D Gaussian splatting has emerged as a promising approach for accurate 3D reconstruction. However, a practical map-sharing system that features high-fidelity, continuous updates, and network efficiency remains elusive. To address these challenges, we introduce GS-Share, a photorealistic map-sharing system with a compact representation. The core of GS-Share includes anchor-based global map construction, virtual-image-based map enhancement, and incremental map update. We evaluate GS-Share against state-of-the-art methods, demonstrating that our system achieves higher fidelity, particularly for extrapolated views, with improvements of 11%, 22%, and 74% in PSNR, LPIPS, and Depth L1, respectively.
Furthermore, GS-Share is significantly more compact, reducing map transmission overhead by 36%.
Published: 2025-10-03T10:40:54Z
Comment: 11 pages, 11 figures
Authors: Xinran Zhang, Hanqi Zhu, Yifan Duan, Yanyong Zhang

http://arxiv.org/abs/2510.02651v1
Visualizing Spatial Point Clouds: A Task-Oriented Taxonomy
Updated: 2025-10-03T01:07:11Z
The visualization of 3D point cloud data is essential in fields such as autonomous navigation, environmental monitoring, and disaster response, where tasks like object recognition, structural analysis, and spatiotemporal exploration rely on clear and effective visual representation. Despite advancements in AI-driven processing, visualization remains a critical tool for interpreting complex spatial datasets. However, designing effective point cloud visualizations presents significant challenges due to the sparsity, density variations, and scale of the data. In this work, we analyze the design space of spatial point cloud visualization, highlighting a gap in systematically mapping visualization techniques to analytical objectives. We introduce a taxonomy that categorizes four decades of visualization design choices, linking them to fundamental challenges in modern applications. By structuring visualization strategies based on data types, user objectives, and visualization techniques, our framework provides a foundation for advancing more effective, interpretable, and user-centered visualization techniques.
Published: 2025-10-03T01:07:11Z
Comment: 12 pages, 3 figures, 1 table
Authors: Mahsa Partovi, Federico Iuricich

http://arxiv.org/abs/2507.00412v2
ViscoReg: Neural Signed Distance Functions via Viscosity Solutions
Updated: 2025-10-02T16:53:28Z
Implicit Neural Representations (INRs) that learn Signed Distance Functions (SDFs) from point cloud data represent the state-of-the-art for geometrically accurate 3D scene reconstruction. However, training these Neural SDFs often requires enforcing the Eikonal equation, an ill-posed equation that also leads to unstable gradient flows.
Numerical Eikonal solvers have relied on viscosity approaches for regularization and stability. Motivated by this well-established theory, we introduce ViscoReg, a novel regularizer that provably stabilizes Neural SDF training. Empirically, ViscoReg outperforms state-of-the-art approaches such as SIREN, DiGS, and StEik on ShapeNet, the Surface Reconstruction Benchmark, and 3D scene reconstruction datasets. Additionally, we establish novel generalization error estimates for Neural SDFs in terms of the training error, using the theory of viscosity solutions.
Published: 2025-07-01T03:55:13Z
Comment: 21 pages, 7 figures
Authors: Meenakshi Krishnan, Ramani Duraiswami

http://arxiv.org/abs/2506.17087v2
PCG-Informed Neural Solvers for High-Resolution Homogenization of Periodic Microstructures
Updated: 2025-10-02T14:58:55Z
The mechanical properties of periodic microstructures are pivotal in various engineering applications. Homogenization theory is a powerful tool for predicting these properties by averaging the behavior of complex microstructures over a representative volume element. However, traditional numerical solvers for homogenization problems can be computationally expensive, especially for high-resolution and complicated topology and geometry. Existing learning-based methods, while promising, often struggle with accuracy and generalization in such scenarios. To address these challenges, we present CGINS, a preconditioned-conjugate-gradient-solver-informed neural network for solving homogenization problems. CGINS leverages sparse and periodic 3D convolution to enable high-resolution learning while ensuring structural periodicity. It features a multi-level network architecture that facilitates effective learning across different scales and employs minimum potential energy as label-free loss functions for self-supervised learning. The integrated preconditioned conjugate gradient iterations ensure that the network provides PCG-friendly initial solutions for fast convergence and high accuracy.
Additionally, CGINS imposes a global displacement constraint to ensure physical consistency, addressing a key limitation in prior methods that rely on Dirichlet anchors. Evaluated on large-scale datasets with diverse topologies and material configurations, CGINS achieves state-of-the-art accuracy (relative error below 1%) and outperforms both learning-based baselines and GPU-accelerated numerical solvers. Notably, it delivers 2 times to 10 times speedups over traditional methods while maintaining physically reliable predictions at resolutions up to $512^3$.
Published: 2025-06-20T15:44:17Z
Authors: Yu Xing, Yang Liu, Lipeng Chen, Huiping Tang, Lin Lu

http://arxiv.org/abs/2504.06735v2
Interactive Expressive Motion Generation Using Dynamic Movement Primitives
Updated: 2025-10-02T12:51:30Z
Our goal is to enable social robots to interact autonomously with humans in a realistic, engaging, and expressive manner. The 12 Principles of Animation are a well-established framework animators use to create movements that make characters appear convincing, dynamic, and emotionally expressive. This paper proposes a novel approach that leverages Dynamic Movement Primitives (DMPs) to implement key animation principles, providing a learnable, explainable, modulable, online adaptable and composable model for automatic expressive motion generation. DMPs, originally developed for general imitation learning in robotics and grounded in a spring-damper system design, offer mathematical properties that make them particularly suitable for this task. Specifically, they enable modulation of the intensities of individual principles and facilitate the decomposition of complex, expressive motion sequences into learnable and parametrizable primitives. We present the mathematical formulation of the parameterized animation principles and demonstrate the effectiveness of our framework through experiments and application on three robotic platforms with different kinematic configurations, in simulation, on actual robots and in a user study.
Our results show that the approach allows for creating diverse and nuanced expressions using a single base model.
Published: 2025-04-09T09:46:50Z
Comment: This paper has been accepted for publication at the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Authors: Till Hielscher, Andreas Bulling, Kai O. Arras

http://arxiv.org/abs/2302.01820v2
Automatic inference of a anatomically meaningful solid wood texture from a single photograph
Updated: 2025-10-02T07:55:51Z
Wood is a volumetric material with a very large appearance gamut that is further enlarged by numerous finishing techniques. Computer graphics has made considerable progress in creating sophisticated and flexible appearance models that allow convincing renderings of wooden materials.
However, these do not yet allow fully automatic appearance matching to a concrete exemplar piece of wood, and have to be fine-tuned by hand. More general appearance matching strategies are incapable of reconstructing anatomically meaningful volumetric information. This is essential for applications where the internal structure of wood is significant, such as non-planar furniture parts machined from a solid block of wood, translucent appearance of thin wooden layers, or in the field of dendrochronology.
In this paper, we provide the two key ingredients for automatic matching of a procedural wood appearance model to exemplar photographs: a good initialization, built on detecting and modelling the ring structure, and a phase-based loss function that allows accurate recovery of growth ring deformations and gives anatomically meaningful results.
Our ring-detection technique is based on curved Gabor filters, and robustly works for a considerable range of wood types.
Published: 2023-02-03T15:54:24Z
Authors: Thomas K. Nindel, Mohcen Hafidi, Tomáš Iser, Alexander Wilkie

http://arxiv.org/abs/2510.01743v1
MIRAGE: Patient-Specific Mixed Reality Coaching for MRI via Depth-Only Markerless Registration and Immersive VR
Updated: 2025-10-02T07:31:10Z
Magnetic resonance imaging (MRI) is an indispensable diagnostic tool, yet the confined bore and acoustic noise can evoke considerable anxiety and claustrophobic reactions. High anxiety leads to motion artifacts, incomplete scans and reliance on pharmacological sedation. MIRAGE (Mixed Reality Anxiety Guidance Environment) harnesses the latest mixed reality (MR) hardware to prepare patients for MRI through immersive virtual reality (VR) and markerless augmented reality (AR) registration. In this paper, we extend our previous work by providing a comprehensive review of related research, detailing the system architecture, and exploring metrics for patient and clinician experience. We also present considerations for clinical deployment of MR systems within hospital workflows. Our results indicate that depth-based registration achieves sub-centimeter accuracy with minimal setup, while the immersive coaching environment reduces patient anxiety and yields favourable usability scores.
Published: 2025-10-02T07:31:10Z
Authors: Daniel Brooks, Emily Carter, Hu Guo, Rajesh Nair

http://arxiv.org/abs/2510.01690v1
Multimodal Feedback for Task Guidance in Augmented Reality
Updated: 2025-10-02T05:36:31Z
Optical see-through augmented reality (OST-AR) overlays digital targets and annotations on the physical world, offering promising guidance for hands-on tasks such as medical needle insertion or assembly. Recent work on OST-AR depth perception shows that target opacity and tool visualization significantly affect accuracy and usability; opaque targets and rendering the real instrument reduce depth errors, whereas transparent targets and absent tools impair performance.
However, reliance on visual overlays may overload attention and leave little room for depth cues when occlusion or lighting hampers perception. To address these limitations, we explore multimodal feedback that combines OST-AR with wrist-based vibrotactile haptics. The past two years have seen rapid advances in haptic technology. Researchers have investigated skin-stretch and vibrotactile cues for conveying spatial information to blind users, wearable ring actuators that support precise pinching in AR, cross-modal audio-haptic cursors that enable eyes-free object selection, and wrist-worn feedback for teleoperated surgery that improves force awareness at the cost of longer task times. Studies comparing pull versus push vibrotactile metaphors found that pull cues yield faster gesture completion and lower cognitive load. These findings motivate revisiting OST-AR guidance with a fresh perspective on wrist-based haptics. We design a custom wristband with six vibromotors delivering directional and state cues, integrate it with a handheld tool and OST-AR, and assess its impact on cue recognition and depth guidance. Through a formative study and two experiments (N=21 and N=27), we show that participants accurately identify haptic patterns under cognitive load and that multimodal feedback improves spatial precision and usability compared with visual-only or haptic-only conditions.
Published: 2025-10-02T05:36:31Z
Authors: Hu Guo, Lily Patel, Rohan Gupt

http://arxiv.org/abs/2510.01619v1
MPMAvatar: Learning 3D Gaussian Avatars with Accurate and Robust Physics-Based Dynamics
Updated: 2025-10-02T02:51:45Z
While there has been significant progress in the field of 3D avatar creation from visual observations, modeling physically plausible dynamics of humans with loose garments remains a challenging problem. Although a few existing works address this problem by leveraging physical simulation, they suffer from limited accuracy or robustness to novel animation inputs.
In this work, we present MPMAvatar, a framework for creating 3D human avatars from multi-view videos that supports highly realistic, robust animation, as well as photorealistic rendering from free viewpoints. For accurate and robust dynamics modeling, our key idea is to use a Material Point Method-based simulator, which we carefully tailor to model garments with complex deformations and contact with the underlying body by incorporating an anisotropic constitutive model and a novel collision handling algorithm. We combine this dynamics modeling scheme with our canonical avatar that can be rendered using 3D Gaussian Splatting with quasi-shadowing, enabling high-fidelity rendering for physically realistic animations. In our experiments, we demonstrate that MPMAvatar significantly outperforms the existing state-of-the-art physics-based avatar in terms of (1) dynamics modeling accuracy, (2) rendering accuracy, and (3) robustness and efficiency. Additionally, we present a novel application in which our avatar generalizes to unseen interactions in a zero-shot manner, which was not achievable with previous learning-based methods due to their limited simulation generalizability. Our project page is at: https://KAISTChangmin.github.io/MPMAvatar/
Published: 2025-10-02T02:51:45Z
Comment: Accepted to NeurIPS 2025
Authors: Changmin Lee, Jihyun Lee, Tae-Kyun Kim

http://arxiv.org/abs/2510.01061v1
ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction
Updated: 2025-10-01T16:01:17Z
Distribution matching is central to many vision and graphics tasks, where the widely used Wasserstein distance is too costly to compute for high dimensional distributions. The Sliced Wasserstein Distance (SWD) offers a scalable alternative, yet its Monte Carlo estimator suffers from high variance, resulting in noisy gradients and slow convergence.
We introduce Reservoir SWD (ReSWD), which integrates Weighted Reservoir Sampling into SWD to adaptively retain informative projection directions in optimization steps, resulting in stable gradients while remaining unbiased. Experiments on synthetic benchmarks and real-world tasks such as color correction and diffusion guidance show that ReSWD consistently outperforms standard SWD and other variance reduction baselines. Project page: https://reservoirswd.github.io/
Published: 2025-10-01T16:01:17Z
Authors: Mark Boss, Andreas Engelhardt, Simon Donné, Varun Jampani

http://arxiv.org/abs/2510.00314v1
Motion In-Betweening for Densely Interacting Characters
Updated: 2025-09-30T22:11:39Z
Motion in-betweening is the problem of synthesizing movement between keyposes. Traditional research focused primarily on single characters. Extending these methods to densely interacting characters is highly challenging, as it demands precise spatial-temporal correspondence between the characters to maintain the interaction, while creating natural transitions towards predefined keyposes. In this research, we present a method for long-horizon interaction in-betweening that enables two characters to engage and respond to one another naturally. To effectively represent and synthesize interactions, we propose a novel solution called Cross-Space In-Betweening, which models the interactions of each character across different conditioning representation spaces. We further observe that the significantly increased constraints in interacting characters heavily limit the solution space, leading to degraded motion quality and diminished interaction over time. To enable long-horizon synthesis, we present two solutions to maintain long-term interaction and motion quality, thereby keeping synthesis in the stable region of the solution space. We first sustain interaction quality by identifying periodic interaction patterns through adversarial learning.
We further maintain the motion quality by learning to refine the drifted latent space and prevent pose error accumulation. We demonstrate that our approach produces realistic, controllable, and long-horizon in-between motions of two characters with dynamic boxing and dancing actions across multiple keyposes, supported by extensive quantitative evaluations and user studies.
Published: 2025-09-30T22:11:39Z
Authors: Xiaotang Zhang, Ziyi Chang, Qianhui Men, Hubert P. H. Shum
DOI: 10.1145/3757377.3763950

http://arxiv.org/abs/2509.26233v1
3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation
Updated: 2025-09-30T13:30:01Z
Creating personalized 3D animations with precise control and realistic head motions remains challenging for current speech-driven 3D facial animation methods. Editing these animations is especially complex and time-consuming, requires precise control, and is typically handled by highly skilled animators. Most existing works focus on controlling style or emotion of the synthesized animation and cannot edit/regenerate parts of an input animation. They also overlook the fact that multiple plausible lip and head movements can match the same audio input. To address these challenges, we present 3DiFACE, a novel method for holistic speech-driven 3D facial animation. Our approach produces diverse plausible lip and head motions for a single audio input and allows for editing via keyframing and interpolation. Specifically, we propose a fully-convolutional diffusion model that can leverage the viseme-level diversity in our training corpus. Additionally, we employ a speaking-style personalization and a novel sparsely-guided motion diffusion to enable precise control and editing. Through quantitative and qualitative evaluations, we demonstrate that our method is capable of generating and editing diverse holistic 3D facial animations given a single audio input, with control between high fidelity and diversity.
Code and models are available here: https://balamuruganthambiraja.github.io/3DiFACE
Published: 2025-09-30T13:30:01Z
Authors: Balamurugan Thambiraja, Malte Prinzler, Sadegh Aliakbarian, Darren Cosker, Justus Thies