https://arxiv.org/api/5UaaywVH0x+fe4q3U/h47CjzYb8 2026-06-26T04:01:54Z 9390 1440 15 http://arxiv.org/abs/2504.16564v2 SAIP-Net: Enhancing Remote Sensing Image Segmentation via Spectral Adaptive Information Propagation 2025-10-14T16:50:12Z

Semantic segmentation of remote sensing imagery demands precise spatial boundaries and robust intra-class consistency, challenging conventional hierarchical models. To address limitations arising from spatial domain feature fusion and insufficient receptive fields, this paper introduces SAIP-Net, a novel frequency-aware segmentation framework that leverages Spectral Adaptive Information Propagation. SAIP-Net employs adaptive frequency filtering and multi-scale receptive field enhancement to effectively suppress intra-class feature inconsistencies and sharpen boundary lines. Comprehensive experiments demonstrate significant performance improvements over state-of-the-art methods, highlighting the effectiveness of spectral-adaptive strategies combined with expanded receptive fields for remote sensing image segmentation.

2025-04-23T09:43:58Z Zhongtao Wang Xizhe Cao Yisong Chen Guoping Wang http://arxiv.org/abs/2412.17144v3 Augmented Mass-Spring model for Real-Time Dense Hair Simulation 2025-10-14T13:22:37Z

We propose a novel Augmented Mass-Spring (AMS) model for real-time simulation of dense hair at strand level. Our approach considers the traditional edge, bending, and torsional degrees of freedom in mass-spring systems, but incorporates an additional one-way biphasic coupling with a ghost rest-shape configuration. Trough multiple evaluation experiments with varied dynamical settings, we show that AMS improves the stability of the simulation in comparison to mass-spring discretizations, preserves global features, and enables the simulation of non-Hookean effects. Using an heptadiagonal decomposition of the resulting matrix, our approach provides the efficiency advantages of mass-spring systems over more complex constitutive hair models, while enabling a more robust simulation of multiple strand configurations. Finally, our results demonstrate that our framework enables the generation, complex interactivity, and editing of simulation-ready dense hair assets in real-time. More details can be found on our project page: https://agrosamad.github.io/AMS/.

2024-12-22T19:49:39Z Jorge Alejandro Amador Herrera Yi Zhou Xin Sun Zhixin Shu Chengan He Sören Pirk Dominik L. Michels http://arxiv.org/abs/2505.15058v2 AsynFusion: Towards Asynchronous Latent Consistency Models for Decoupled Whole-Body Audio-Driven Avatars 2025-10-14T07:40:58Z

Whole-body audio-driven avatar pose and expression generation is a critical task for creating lifelike digital humans and enhancing the capabilities of interactive virtual agents, with wide-ranging applications in virtual reality, digital entertainment, and remote communication. Existing approaches often generate audio-driven facial expressions and gestures independently, which introduces a significant limitation: the lack of seamless coordination between facial and gestural elements, resulting in less natural and cohesive animations. To address this limitation, we propose AsynFusion, a novel framework that leverages diffusion transformers to achieve harmonious expression and gesture synthesis. The proposed method is built upon a dual-branch DiT architecture, which enables the parallel generation of facial expressions and gestures. Within the model, we introduce a Cooperative Synchronization Module to facilitate bidirectional feature interaction between the two modalities, and an Asynchronous LCM Sampling strategy to reduce computational overhead while maintaining high-quality outputs. Extensive experiments demonstrate that AsynFusion achieves state-of-the-art performance in generating real-time, synchronized whole-body animations, consistently outperforming existing methods in both quantitative and qualitative evaluations.

2025-05-21T03:28:53Z 15pages, conference Tianbao Zhang Jian Zhao Yuer Li Zheng Zhu Ping Hu Zhaoxin Fan Wenjun Wu Xuelong Li http://arxiv.org/abs/2510.12094v1 H4G: Unlocking Faithful Inference for Zero-Shot Graph Learning in Hyperbolic Space 2025-10-14T02:58:57Z

Text-attributed graphs are widely used across domains, offering rich opportunities for zero-shot learning via graph-text alignment. However, existing methods struggle with tasks requiring fine-grained pattern recognition, particularly on heterophilic graphs. Through empirical and theoretical analysis, we identify an \textbf{over-abstraction problem}: current approaches operate at excessively large hyperbolic radii, compressing multi-scale structural information into uniform high-level abstractions. This abstraction-induced information loss obscures critical local patterns essential for accurate predictions. By analyzing embeddings in hyperbolic space, we demonstrate that optimal graph learning requires \textbf{faithful preservation} of fine-grained structural details, better retained by representations positioned closer to the origin. To address this, we propose \textbf{H4G}, a framework that systematically reduces embedding radii using learnable block-diagonal scaling matrices and Möbius matrix multiplication. This approach restores access to fine-grained patterns while maintaining global receptive ability with minimal computational overhead. Experiments show H4G achieves state-of-the-art zero-shot performance with \textbf{12.8\%} improvement on heterophilic graphs and \textbf{8.4\%} on homophilic graphs, confirming that radius reduction enables faithful multi-scale representation for advancing zero-shot graph learning.

2025-10-14T02:58:57Z Heng Zhang Tianyi Zhang Zijun Liu Yuling Shi Yaomin Shen Haochen You Haichuan Hu Lubin Gan Jin Huang http://arxiv.org/abs/2510.12087v1 Can Representation Gaps Be the Key to Enhancing Robustness in Graph-Text Alignment? 2025-10-14T02:49:00Z

Representation learning on text-attributed graphs (TAGs) integrates structural connectivity with rich textual semantics, enabling applications in diverse domains. Current methods largely rely on contrastive learning to maximize cross-modal similarity, assuming tighter coupling between graph and text representations improves transfer performance. However, our empirical analysis reveals that both natural gap expansion and forced gap reduction result in performance degradation by disrupting pre-trained knowledge structures and impairing generalization. This arises from the geometric incompatibility between encoders, where graph encoders capture topological patterns, while text encoders capture semantic structures. Over-alignment compresses these distinct spaces into shared subspaces, causing structure collapse that diminishes both topological reasoning and semantic understanding. We propose \textbf{LLM4GTA}, a gap-aware alignment framework that preserves representation gaps as geometric necessities for maintaining modality-specific knowledge and improving transfer performance. LLM4GTA includes an adaptive gap preservation module to prevent over-alignment by monitoring similarity evolution and an intra-modal compensation mechanism that boosts discriminative power using auxiliary classifiers in graph space. Extensive experiments show significant improvements over existing methods in zero-shot and few-shot scenarios.

2025-10-14T02:49:00Z Heng Zhang Tianyi Zhang Yuling Shi Xiaodong Gu Yaomin Shen Zijian Zhang Yilei Yuan Hao Zhang Jin Huang http://arxiv.org/abs/2504.01483v4 GarmageNet: A Multimodal Generative Framework for Sewing Pattern Design and Generic Garment Modeling 2025-10-14T02:37:25Z

Realistic digital garment modeling remains a labor-intensive task due to the intricate process of translating 2D sewing patterns into high-fidelity, simulation-ready 3D garments. We introduce GarmageNet, a unified generative framework that automates the creation of 2D sewing patterns, the construction of sewing relationships, and the synthesis of 3D garment initializations compatible with physics-based simulation. Central to our approach is Garmage, a novel garment representation that encodes each panel as a structured geometry image, effectively bridging the semantic and geometric gap between 2D structural patterns and 3D garment geometries. Followed by GarmageNet, a latent diffusion transformer to synthesize panel-wise geometry images and GarmageJigsaw, a neural module for predicting point-to-point sewing connections along panel contours. To support training and evaluation, we build GarmageSet, a large-scale dataset comprising 14,801 professionally designed garments with detailed structural and style annotations. Our method demonstrates versatility and efficacy across multiple application scenarios, including scalable garment generation from multi-modal design concepts (text prompts, sketches, photographs), automatic modeling from raw flat sewing patterns, pattern recovery from unstructured point clouds, and progressive garment editing using conventional instructions, laying the foundation for fully automated, production-ready pipelines in digital fashion. Project page: https://style3d.github.io/garmagenet/.

2025-04-02T08:37:32Z 23 pages,20 figures ACM Trans. Graph. 44, 6, Article 216 (2025) Siran Li Ruiyang Liu Chen Liu Zhendong Wang Gaofeng He Yong-Lu Li Xiaogang Jin Huamin Wang 10.1145/3763271 http://arxiv.org/abs/2510.12053v1 Coordinate Condensation: Subspace-Accelerated Coordinate Descent for Physics-Based Simulation 2025-10-14T01:36:48Z

We introduce Coordinate Condensation, a variant of coordinate descent that accelerates physics-based simulation by augmenting local coordinate updates with a Schur-complement-based subspace correction. Recent work by Lan et al. 2025 (JGS2) uses perturbation subspaces to augment local solves to account for global coupling, but their approach introduces damping that can degrade convergence. We reuse this subspace but solve for local and subspace displacements independently, eliminating this damping. For problems where the subspace adequately captures global coupling, our method achieves near-Newton convergence while retaining the efficiency and parallelism of coordinate descent. Through experiments across varying material stiffnesses and mesh resolutions, we show substantially faster convergence than both standard coordinate descent and JGS2. We also characterize when subspace-based coordinate methods succeed or fail, offering insights for future solver design.

2025-10-14T01:36:48Z Ty Trusty http://arxiv.org/abs/2502.05339v2 Fast Subspace Fluid Simulation with a Temporally-Aware Basis 2025-10-13T22:55:15Z

We present a novel reduced-order fluid simulation technique leveraging Dynamic Mode Decomposition (DMD) to achieve fast, memory-efficient, and user-controllable subspace simulation. We demonstrate that our approach combines the strengths of both spatial reduced order models (ROMs) as well as spectral decompositions. By optimizing for the operator that evolves a system state from one timestep to the next, rather than the system state itself, we gain both the compressive power of spatial ROMs as well as the intuitive physical dynamics of spectral methods. The latter property is of particular interest in graphics applications, where user control of fluid phenomena is of high demand. We demonstrate this in various applications including spatial and temporal modulation tools and fluid upscaling with added turbulence. We adapt DMD for graphics applications by reducing computational overhead, incorporating user-defined force inputs, and optimizing memory usage with randomized SVD. The integration of OptDMD and DMD with Control (DMDc) facilitates noise-robust reconstruction and real-time user interaction. We demonstrate the technique's robustness across diverse simulation scenarios, including artistic editing, time-reversal, and super-resolution. Through experimental validation on challenging scenarios, such as colliding vortex rings and boundary-interacting plumes, our method also exhibits superior performance and fidelity with significantly fewer basis functions compared to existing spatial ROMs. The inherent linearity of the DMD operator enables unique application modes, such as time-reversible fluid simulation. This work establishes another avenue for developing real-time, high-quality fluid simulations, enriching the space of fluid simulation techniques in interactive graphics and animation.

2025-02-07T21:21:41Z ACM Trans. Graph. 44 (4) (2025) 1-18 Siyuan Chen Yixin Chen Jonathan Panuelos Otman Benchekroun Yue Chang Eitan Grinspun Zhecheng Wang 10.1145/3730826 http://arxiv.org/abs/2510.03597v3 Neon: Negative Extrapolation From Self-Training Improves Image Generation 2025-10-13T22:13:35Z

Scaling generative AI models is bottlenecked by the scarcity of high-quality training data. The ease of synthesizing from a generative model suggests using (unverified) synthetic data to augment a limited corpus of real data for the purpose of fine-tuning in the hope of improving performance. Unfortunately, however, the resulting positive feedback loop leads to model autophagy disorder (MAD, aka model collapse) that results in a rapid degradation in sample quality and/or diversity. In this paper, we introduce Neon (for Negative Extrapolation frOm self-traiNing), a new learning method that turns the degradation from self-training into a powerful signal for self-improvement. Given a base model, Neon first fine-tunes it on its own self-synthesized data but then, counterintuitively, reverses its gradient updates to extrapolate away from the degraded weights. We prove that Neon works because typical inference samplers that favor high-probability regions create a predictable anti-alignment between the synthetic and real data population gradients, which negative extrapolation corrects to better align the model with the true data distribution. Neon is remarkably easy to implement via a simple post-hoc merge that requires no new real data, works effectively with as few as 1k synthetic samples, and typically uses less than 1% additional training compute. We demonstrate Neon's universality across a range of architectures (diffusion, flow matching, autoregressive, and inductive moment matching models) and datasets (ImageNet, CIFAR-10, and FFHQ). In particular, on ImageNet 256x256, Neon elevates the xAR-L model to a new state-of-the-art FID of 1.02 with only 0.36% additional training compute. Code is available at https://github.com/VITA-Group/Neon

2025-10-04T01:20:30Z Sina Alemohammad Zhangyang Wang Richard G. Baraniuk http://arxiv.org/abs/2510.11927v1 Visual Stenography: Feature Recreation and Preservation in Sketches of Noisy Line Charts 2025-10-13T20:47:18Z

Line charts surface many features in time series data, from trends to periodicity to peaks and valleys. However, not every potentially important feature in the data may correspond to a visual feature which readers can detect or prioritize. In this study, we conducted a visual stenography task, where participants re-drew line charts to solicit information about the visual features they believed to be important. We systematically varied noise levels (SNR ~5-30 dB) across line charts to observe how visual clutter influences which features people prioritize in their sketches. We identified three key strategies that correlated with the noise present in the stimuli: the Replicator attempted to retain all major features of the line chart including noise; the Trend Keeper prioritized trends disregarding periodicity and peaks; and the De-noiser filtered out noise while preserving other features. Further, we found that participants tended to faithfully retain trends and peaks and valleys when these features were present, while periodicity and noise were represented in more qualitative or gestural ways: semantically rather than accurately. These results suggest a need to consider more flexible and human-centric ways of presenting, summarizing, pre-processing, or clustering time series data.

2025-10-13T20:47:18Z Rifat Ara Proma Michael Correll Ghulam Jilani Quadri Paul Rosen http://arxiv.org/abs/2510.11912v1 Evaluating Line Chart Strategies for Mitigating Density of Temporal Data: The Impact on Trend, Prediction, and Decision-Making 2025-10-13T20:24:18Z

Overplotted line charts can obscure trends in temporal data and hinder prediction. We conduct a user study comparing three alternatives-aggregated, trellis, and spiral line charts against standard line charts on tasks involving trend identification, making predictions, and decision-making. We found aggregated charts performed similarly to standard charts and support more accurate trend recognition and prediction; trellis and spiral charts generally lag. We also examined the impact on decision-making via a trust game. The results showed similar trust in standard and aggregated charts, varied trust in spiral charts, and a lean toward distrust in trellis charts. These findings provide guidance for practitioners choosing visualization strategies for dense temporal data.

2025-10-13T20:24:18Z Rifat Ara Proma Ghulam Jilani Quadri Paul Rosen http://arxiv.org/abs/2510.11567v1 A Framework for Low-Effort Training Data Generation for Urban Semantic Segmentation 2025-10-13T16:12:29Z

Synthetic datasets are widely used for training urban scene recognition models, but even highly realistic renderings show a noticeable gap to real imagery. This gap is particularly pronounced when adapting to a specific target domain, such as Cityscapes, where differences in architecture, vegetation, object appearance, and camera characteristics limit downstream performance. Closing this gap with more detailed 3D modelling would require expensive asset and scene design, defeating the purpose of low-cost labelled data. To address this, we present a new framework that adapts an off-the-shelf diffusion model to a target domain using only imperfect pseudo-labels. Once trained, it generates high-fidelity, target-aligned images from semantic maps of any synthetic dataset, including low-effort sources created in hours rather than months. The method filters suboptimal generations, rectifies image-label misalignments, and standardises semantics across datasets, transforming weak synthetic data into competitive real-domain training sets. Experiments on five synthetic datasets and two real target datasets show segmentation gains of up to +8.0%pt. mIoU over state-of-the-art translation methods, making rapidly constructed synthetic datasets as effective as high-effort, time-intensive synthetic datasets requiring extensive manual design. This work highlights a valuable collaborative paradigm where fast semantic prototyping, combined with generative models, enables scalable, high-quality training data creation for urban scene understanding.

2025-10-13T16:12:29Z Denis Zavadski Damjan Kalšan Tim Küchler Haebom Lee Stefan Roth Carsten Rother http://arxiv.org/abs/2512.05121v1 PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional Styles 2025-10-13T13:21:38Z

PESTalk is a novel method for generating 3D facial animations with personalized emotional styles directly from speech. It overcomes key limitations of existing approaches by introducing a Dual-Stream Emotion Extractor (DSEE) that captures both time and frequency-domain audio features for fine-grained emotion analysis, and an Emotional Style Modeling Module (ESMM) that models individual expression patterns based on voiceprint characteristics. To address data scarcity, the method leverages a newly constructed 3D-EmoStyle dataset. Evaluations demonstrate that PESTalk outperforms state-of-the-art methods in producing realistic and personalized facial animations.

2025-10-13T13:21:38Z Tianshun Han Benjia Zhou Ajian Liu Yanyan Liang Du Zhang Zhen Lei Jun Wan http://arxiv.org/abs/2503.18254v2 Surface-Aware Distilled 3D Semantic Features 2025-10-13T12:58:01Z

Many 3D tasks such as pose alignment, animation, motion transfer, and 3D reconstruction rely on establishing correspondences between 3D shapes. This challenge has recently been approached by pairwise matching of semantic features from pre-trained vision models. However, despite their power, these features struggle to differentiate instances of the same semantic class such as ``left hand'' versus ``right hand'' which leads to substantial mapping errors. To solve this, we learn a surface-aware embedding space that is robust to these ambiguities while facilitating shared mapping for an entire family of 3D shapes. Importantly, our approach is self-supervised and requires only a small number of unpaired training meshes to infer features for new possibly imperfect 3D shapes at test time. We achieve this by introducing a contrastive loss that preserves the semantic content of the features distilled from foundational models while disambiguating features located far apart on the shape's surface. We observe superior performance in correspondence matching benchmarks and enable downstream applications including 2D-to-3D and 3D-to-3D texture transfer, in-part segmentation, pose alignment, and motion transfer in low-data regimes. Unlike previous pairwise approaches, our solution constructs a joint embedding space, where both seen and unseen 3D shapes are implicitly aligned without further optimization. The code is available at https://graphics.tudelft.nl/SurfaceAware3DFeatures.

2025-03-24T00:36:16Z Lukas Uzolas Elmar Eisemann Petr Kellnhofer http://arxiv.org/abs/2412.14005v2 Real-Time Position-Aware View Synthesis from Single-View Input 2025-10-13T12:18:29Z

Recent advancements in view synthesis have significantly enhanced immersive experiences across various computer graphics and multimedia applications, including telepresence and entertainment. By enabling the generation of new perspectives from a single input view, view synthesis allows users to better perceive and interact with their environment. However, many state-of-the-art methods, while achieving high visual quality, face limitations in real-time performance, which makes them less suitable for live applications where low latency is critical. In this paper, we present a lightweight, position-aware network designed for real-time view synthesis from a single input image and a target camera pose. The proposed framework consists of a Position Aware Embedding, which efficiently maps positional information from the target pose to generate high dimensional feature maps. These feature maps, along with the input image, are fed into a Rendering Network that merges features from dual encoder branches to resolve both high and low level details, producing a realistic new view of the scene. Experimental results demonstrate that our method achieves superior efficiency and visual quality compared to existing approaches, particularly in handling complex translational movements without explicit geometric operations like warping. This work marks a step toward enabling real-time live and interactive telepresence applications.

2024-12-18T16:20:21Z Manu Gond Emin Zerman Sebastian Knorr Mårten Sjöström