http://arxiv.org/abs/2602.22565v1 SwiftNDC: Fast Neural Depth Correction for High-Fidelity 3D Reconstruction 2026-02-26T03:07:53Z

Depth-guided 3D reconstruction has gained popularity as a fast alternative to optimization-heavy approaches, yet existing methods still suffer from scale drift, multi-view inconsistencies, and the need for substantial refinement to achieve high-fidelity geometry. Here, we propose SwiftNDC, a fast and general framework built around a Neural Depth Correction field that produces cross-view consistent depth maps. From these refined depths, we generate a dense point cloud through back-projection and robust reprojection-error filtering, obtaining a clean and uniformly distributed geometric initialization for downstream reconstruction. This reliable dense geometry substantially accelerates 3D Gaussian Splatting (3DGS) for mesh reconstruction, enabling high-quality surfaces with significantly fewer optimization iterations. For novel-view synthesis, SwiftNDC can also improve 3DGS rendering quality, highlighting the benefits of strong geometric initialization. We conduct a comprehensive study across five datasets, including two for mesh reconstruction, as well as three for novel-view synthesis. SwiftNDC consistently reduces running time for accurate mesh reconstruction and boosts rendering fidelity for view synthesis, demonstrating the effectiveness of combining neural depth refinement with robust geometric initialization for high-fidelity and efficient 3D reconstruction.
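The back-projection and reprojection-error filtering step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes a pinhole camera with intrinsics `K` and 4x4 pose matrices; the function names, the nearest-pixel lookup, and the relative tolerance `rel_tol` are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def backproject(depth, K, cam_to_world):
    """Lift an (H, W) depth map to world-space points of shape (H*W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T            # camera-space rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)      # scale each ray by its depth
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t

def reprojection_mask(points, depth_other, K, world_to_cam_other, rel_tol=0.01):
    """Keep points whose depth in another view agrees with that view's depth map."""
    R, t = world_to_cam_other[:3, :3], world_to_cam_other[:3, 3]
    pts_cam = points @ R.T + t
    z = pts_cam[:, 2]
    safe_z = np.where(z > 1e-8, z, 1.0)        # avoid dividing by zero behind the camera
    proj = pts_cam @ K.T
    u = np.round(proj[:, 0] / safe_z).astype(int)
    v = np.round(proj[:, 1] / safe_z).astype(int)
    h, w = depth_other.shape
    inside = (z > 1e-8) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    mask = np.zeros(len(points), dtype=bool)
    d = depth_other[v[inside], u[inside]]      # nearest-pixel depth in the other view
    mask[inside] = np.abs(d - z[inside]) <= rel_tol * d
    return mask
```

In a multi-view setting, a point would typically be kept only if it passes this check in several neighbouring views; how many views and which threshold to use are design choices this sketch leaves open.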

2026-02-26T03:07:53Z Kang Han Wei Xiang Lu Yu Mathew Wyatt Gaowen Liu Ramana Rao Kompella http://arxiv.org/abs/2508.12691v2 Adaptive Hybrid Caching for Efficient Text-to-Video Diffusion Model Acceleration 2026-02-26T02:21:05Z

Efficient video generation models are increasingly vital for multimedia synthetic content generation. Leveraging the Transformer architecture and the diffusion process, video DiT models have emerged as a dominant approach for high-quality video generation. However, their multi-step iterative denoising process incurs high computational cost and inference latency. Caching, a widely adopted optimization method in DiT models, leverages the redundancy in the diffusion process to skip computations at different granularities (e.g., step, cfg, block). Nevertheless, existing caching methods are limited to single-granularity strategies, struggling to balance generation quality and inference speed in a flexible manner. In this work, we propose MixCache, a training-free caching-based framework for efficient video DiT inference. It first distinguishes the interference and boundary between different caching strategies, and then introduces a context-aware cache triggering strategy to determine when caching should be enabled, along with an adaptive hybrid cache decision strategy for dynamically selecting the optimal caching granularity. Extensive experiments on diverse models demonstrate that MixCache can significantly accelerate video generation (e.g., 1.94$\times$ speedup on Wan 14B, 1.97$\times$ speedup on HunyuanVideo) while delivering both superior generation quality and inference efficiency compared to baseline methods.

2025-08-18T07:49:33Z 9 pages, 12 figures Yuanxin Wei Lansong Diao Bujiao Chen Shenggan Cheng Zhengping Qian Wenyuan Yu Nong Xiao Wei Lin Jiangsu Du http://arxiv.org/abs/2510.10611v3 HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication 2026-02-26T02:07:45Z

Recent advances in large language model-powered multi-agent systems have demonstrated remarkable collective intelligence through effective communication. However, existing approaches face two primary challenges: (i) \textit{Ineffective group collaboration modeling}, as they rely on pairwise edge representations in graph structures, limiting their ability to capture relationships among multiple agents; and (ii) \textit{Limited task-adaptiveness in communication topology design}, leading to excessive communication cost for simple tasks and insufficient coordination for complex scenarios. These issues restrict the scalability and practical deployment of adaptive collaboration frameworks. To address these challenges, we propose \textbf{HyperAgent}, a hypergraph-based framework that optimizes communication topologies and effectively captures group collaboration patterns using direct hyperedge representations. Unlike edge-based approaches, HyperAgent uses hyperedges to link multiple agents within the same subtask and employs hypergraph convolutional layers to achieve one-step information aggregation in collaboration groups. Additionally, it incorporates a variational autoencoder framework with sparsity regularization to dynamically adjust hypergraph topologies based on task complexity. Experiments highlight the superiority of HyperAgent in both performance and efficiency. For instance, on GSM8K, HyperAgent achieves 95.07\% accuracy while reducing token consumption by 25.33\%, demonstrating the potential of hypergraph-based optimization for multi-agent communication.
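To make the "one-step information aggregation" idea concrete, here is a minimal sketch of a commonly used hypergraph convolution, $X' = D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2} X \Theta$, where $H$ is the node-by-hyperedge incidence matrix. This is the generic HGNN-style layer, shown as an assumption; the abstract does not specify the exact layer HyperAgent uses, and the function name and arguments are hypothetical.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_weights=None):
    """One hypergraph convolution step without a nonlinearity.

    X: (n_nodes, d_in) agent features, H: (n_nodes, n_edges) 0/1 incidence matrix,
    Theta: (d_in, d_out) weights. Assumes every node belongs to at least one
    hyperedge and every hyperedge is non-empty.
    """
    n_nodes, n_edges = H.shape
    w = np.ones(n_edges) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    d_v = H @ w                       # weighted node degrees
    d_e = H.sum(axis=0)               # hyperedge sizes
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    De_inv = np.diag(1.0 / d_e)
    A = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
    return A @ X @ Theta
```

With three agents sharing one hyperedge, a single call already mixes all three feature vectors equally, whereas a pairwise graph would need either a fully connected clique or several propagation steps.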

2025-10-12T13:47:42Z This submission has been withdrawn by the authors due to a fundamental error in the methodology that affects the validity of the main results Heng Zhang Yuling Shi Xiaodong Gu Zijian Zhang Haochen You Lubin Gan Yilei Yuan Jin Huang http://arxiv.org/abs/2510.10585v3 D3MAS: Decompose, Deduce, and Distribute for Enhanced Knowledge Sharing in Multi-Agent Systems 2026-02-26T02:06:51Z

Multi-agent systems powered by large language models exhibit strong capabilities in collaborative problem-solving. However, these systems suffer from substantial knowledge redundancy. Agents duplicate efforts in retrieval and reasoning processes. This inefficiency stems from a deeper issue: current architectures lack mechanisms to ensure agents share minimal sufficient information at each operational stage. Empirical analysis reveals an average knowledge duplication rate of 47.3\% across agent communications. We propose D3MAS (Decompose, Deduce, and Distribute), a hierarchical coordination framework addressing redundancy through structural design rather than explicit optimization. The framework organizes collaboration across three coordinated layers. Task decomposition filters irrelevant sub-problems early. Collaborative reasoning captures complementary inference paths across agents. Distributed memory provides access to non-redundant knowledge. These layers coordinate through structured message passing in a unified heterogeneous graph. This cross-layer alignment ensures information remains aligned with actual task needs. Experiments on four challenging datasets show that D3MAS consistently improves reasoning accuracy by 8.7\% to 15.6\% and reduces knowledge redundancy by 46\% on average.

2025-10-12T13:01:41Z This submission has been withdrawn by the authors due to a fundamental error in the methodology that affects the validity of the main results Heng Zhang Yuling Shi Xiaodong Gu Haochen You Zijian Zhang Lubin Gan Yilei Yuan Jin Huang http://arxiv.org/abs/2602.22430v1 TopoEdit: Fast Post-Optimization Editing of Topology Optimized Structures 2026-02-25T21:41:44Z

Despite topology optimization producing high-performance structures, late-stage localized revisions remain brittle: direct density-space edits (e.g., warping pixels, inserting holes, swapping infill) can sever load paths and sharply degrade compliance, while re-running optimization is slow and may drift toward a qualitatively different design. We present TopoEdit, a fast post-optimization editor that demonstrates how structured latent embeddings from a pre-trained topology foundation model (OAT) can be repurposed as an interface for physics-aware engineering edits. Given an optimized topology, TopoEdit encodes it into OAT's spatial latent, applies partial noising to preserve instance identity while increasing editability, and injects user intent through an edit-then-denoise diffusion pipeline. We instantiate three edit operators: drag-based topology warping with boundary-condition-consistent conditioning updates, shell-infill lattice replacement using a lattice-anchored reference latent with updated volume-fraction conditioning, and late-stage no-design region enforcement via masked latent overwrite followed by diffusion-based recovery. A consistency-preserving guided DDIM procedure localizes changes while allowing global structural adaptation; multiple candidates can be sampled and selected using a compliance-aware criterion, with optional short SIMP refinement for warps. Across diverse case studies and large edit sweeps, TopoEdit produces intention-aligned modifications that better preserve mechanical performance and avoid catastrophic failure modes compared to direct density-space edits, while generating edited candidates in sub-second diffusion time per sample.

2026-02-25T21:41:44Z Hongrui Chen Josephine V. Carstensen Faez Ahmed http://arxiv.org/abs/2510.10218v2 Sketch Animation: State-of-the-art Report 2026-02-25T20:02:03Z

Sketch animation has emerged as a transformative technology, bridging art and science to create dynamic visual narratives across various fields such as entertainment, education, healthcare, and virtual reality. This survey explores recent trends and innovations in sketch animation, with a focus on methods that have advanced the state of the art. The paper categorizes and evaluates key methodologies, including keyframe interpolation, physics-based animation, data-driven, motion capture, and deep learning approaches. We examine the integration of artificial intelligence, real-time rendering, and cloud-based solutions, highlighting their impact on enhancing realism, scalability, and interactivity. Additionally, the survey delves into the challenges of computational complexity, scalability, and user-friendly interfaces, as well as emerging opportunities within metaverse applications and human-machine interaction. By synthesizing insights from a wide array of research, this survey aims to provide a comprehensive understanding of the current landscape and future directions of sketch animation, serving as a resource for both academics and industry professionals seeking to innovate in this dynamic field.

2025-10-11T13:48:27Z Gaurav Rai Ojaswa Sharma http://arxiv.org/abs/2504.11435v3 Robust Containment Queries over Collections of Trimmed NURBS Surfaces via Generalized Winding Numbers 2026-02-25T19:53:24Z

We propose a containment query that is robust to the watertightness of regions bound by trimmed NURBS surfaces, as this property is difficult to guarantee for in-the-wild CAD models. Containment is determined through the generalized winding number (GWN), a mathematical construction that is indifferent to the arrangement of surfaces in the shape. Applying contemporary techniques for the 3D GWN to trimmed NURBS surfaces requires some form of geometric discretization, introducing computational inefficiency to the algorithm and even risking containment misclassifications near the surface. In contrast, our proposed method leverages properties of the 3D solid angle to solve the relevant surface integral using a boundary formulation with rapidly converging adaptive quadrature. Batches of queries are further accelerated by \textit{memoizing} (i.e. caching and reusing) quadrature node positions and tangents as they are evaluated. We demonstrate that our GWN method is robust to complex trimming geometry in a CAD model, and is accurate up to arbitrary precision at arbitrary distances from the surface. The derived containment query is therefore robust to model non-watertightness while respecting all curved features of the input shape.
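For context, the discretized baseline that the abstract contrasts against can be sketched as follows: the generalized winding number of a triangle mesh is the sum of signed solid angles of its triangles divided by $4\pi$, with each solid angle evaluated by the Van Oosterom and Strackee formula. This sketch is not the paper's NURBS boundary-quadrature method; the function name and array layout are assumptions.

```python
import numpy as np

def winding_number(query, vertices, faces):
    """Generalized winding number of a triangle soup at a 3D query point.

    vertices: (n, 3) float array, faces: (m, 3) int array with outward-facing
    (counter-clockwise from outside) orientation.
    """
    a = vertices[faces[:, 0]] - query
    b = vertices[faces[:, 1]] - query
    c = vertices[faces[:, 2]] - query
    la, lb, lc = (np.linalg.norm(x, axis=1) for x in (a, b, c))
    num = np.einsum("ij,ij->i", a, np.cross(b, c))           # triple product a . (b x c)
    den = (la * lb * lc
           + np.einsum("ij,ij->i", a, b) * lc
           + np.einsum("ij,ij->i", b, c) * la
           + np.einsum("ij,ij->i", c, a) * lb)
    omega = 2.0 * np.arctan2(num, den)                       # signed solid angle per triangle
    return omega.sum() / (4.0 * np.pi)
```

For a closed, consistently oriented mesh the result is close to 1 inside and 0 outside; for a mesh with gaps it varies smoothly, so thresholding at 0.5 gives a containment test that tolerates non-watertight input.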

2025-04-15T17:51:39Z 18 Pages, 16 Figures, 1 Table ACM Transactions on Graphics, Volume 45, Issue 3, Article No.: 26, Pages 1 - 21 (2026) Jacob Spainhour Kenneth Weiss 10.1145/3797957 http://arxiv.org/abs/2602.18741v2 Compact Hadamard Latent Codes for Efficient Spectral Rendering 2026-02-25T19:48:14Z

Spectral rendering accurately reproduces wavelength-dependent appearance but is computationally expensive, as shading must be evaluated at many wavelength samples and scales roughly linearly with the number of samples. It also requires spectral textures and lights throughout the rendering pipeline. We propose Hadamard spectral codes, a compact latent representation that enables spectral rendering using standard RGB rendering operations. Spectral images are approximated with a small number of RGB rendering passes, followed by a decoding step. Our key requirement is latent linearity: scaling and addition in spectral space correspond to scaling and addition of codes, and the element-wise product of spectra (for example reflectance times illumination) is approximated by the element-wise product of their latent codes. We show that an exact low-dimensional algebra-preserving representation cannot exist for arbitrary spectra when the latent dimension k is smaller than the number of spectral samples n. We therefore introduce a learned non-negative linear encoder and decoder architecture that preserves scaling and addition exactly while encouraging approximate multiplicativity under the Hadamard product. With k = 6, we render k/3 = 2 RGB images per frame using an unmodified RGB renderer, reconstruct the latent image, and decode to high-resolution spectra or XYZ or RGB. Experiments on 3D scenes demonstrate that k = 6 significantly reduces color error compared to RGB baselines while being substantially faster than naive n-sample spectral rendering. Using k = 9 provides higher-quality reference results. We further introduce a lightweight neural upsampling network that maps RGB assets directly to latent codes, enabling integration of legacy RGB content into the spectral pipeline while maintaining perceptually accurate colors in rendered images.
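The latent-linearity requirement can be illustrated with a small numerical check. The sketch below uses a random non-negative matrix as a stand-in encoder, which is an assumption made purely for illustration (the paper learns its encoder and decoder); `n = 31` spectral samples is likewise assumed. It shows that any linear encoder preserves scaling and addition exactly, while the element-wise product of spectra is generally not preserved, which is why multiplicativity has to be encouraged during training.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 31, 6                        # n spectral samples (assumed), latent size k = 6
E = rng.uniform(0.0, 1.0, (k, n))   # hypothetical non-negative linear encoder

def encode(spectrum):
    """Map an n-sample spectrum to a k-dimensional latent code."""
    return E @ spectrum

s = rng.uniform(0.0, 1.0, n)        # e.g. a reflectance spectrum
t = rng.uniform(0.0, 1.0, n)        # e.g. an illuminant spectrum

# Scaling and addition commute with a linear encoder.
linear_ok = np.allclose(encode(2.0 * s + 0.5 * t), 2.0 * encode(s) + 0.5 * encode(t))

# The Hadamard product does not: an untrained encoder leaves a clear gap.
product_gap = np.max(np.abs(encode(s * t) - encode(s) * encode(t)))
```

Here `linear_ok` holds by construction, while `product_gap` measures how far this untrained encoder is from the approximate multiplicativity the abstract describes.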

2026-02-21T07:30:09Z Jiaqi Yu Dar'ya Guarnera Giuseppe Claudio Guarnera http://arxiv.org/abs/2506.22685v3 Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment 2026-02-25T12:55:13Z

In this paper, we investigate the semantic collapsing problem in generative personalization, an under-explored topic where the learned visual concept ($V$) gradually shifts from its original textual meaning and comes to dominate other concepts in multi-concept input prompts. This issue not only reduces the semantic richness of complex input prompts like "a photo of $V$ wearing glasses and playing guitar" into simpler, less contextually rich forms such as "a photo of $V$" but also leads to simplified output images that fail to capture the intended concept. We identify the root cause as unconstrained optimisation, which allows the learned embedding $V$ to drift arbitrarily in the embedding space, both in direction and magnitude. To address this, we propose a simple yet effective training-free method that adjusts the magnitude and direction of the pre-trained embedding at inference time, effectively mitigating the semantic collapsing problem. Our method is broadly applicable across different personalization methods and demonstrates significant improvements in text-image alignment in diverse use cases. Our code is anonymously published at https://github.com/tuananhbui89/Embedding-Adjustment

2025-06-27T23:40:27Z Anh Bui Trang Vu Trung Le Junae Kim Tamas Abraham Rollin Omari Amar Kaur Dinh Phung http://arxiv.org/abs/2602.21864v1 DynamicGTR: Leveraging Graph Topology Representation Preferences to Boost VLM Capabilities on Graph QAs 2026-02-25T12:45:45Z

Vision-Language Models (VLMs) have emerged as versatile solutions for zero-shot question answering (QA) across various domains. However, enabling VLMs to effectively comprehend structured graphs and perform accurate, efficient QA remains challenging. Existing approaches typically rely on one single graph topology representation (GTR), such as fixed-style visual images or unified text descriptions. This ``one-size-fits-all'' strategy often neglects model-specific and task-specific preferences, resulting in inaccurate or over-lengthy responses to graph-related queries. To address this, we propose the $\mbox{DynamicGTR}$ framework, which dynamically selects the optimal GTR for each query during inference, thereby enhancing the zero-shot graph QA capabilities of VLMs with a customizable accuracy and brevity trade-off. Extensive experiments show that DynamicGTR not only improves VLM-based graph algorithm QA performance but also successfully transfers the experience trained from synthetic graph algorithm tasks to real-world applications like link prediction and node classification, without any additional training. Additionally, DynamicGTR demonstrates strong transferability across tasks, domains, and models, suggesting its potential as a flexible solution for broad graph scenarios.

2026-02-25T12:45:45Z CVPR 2026 Yanbin Wei Jiangyue Yan Chun Kang Yang Chen Hua Liu James Kwok Yu Zhang http://arxiv.org/abs/2602.22275v1 Deep Accurate Solver for the Geodesic Problem 2026-02-25T09:39:49Z

A common approach to computing distances on continuous surfaces is to consider a discretized polygonal mesh approximating the surface and estimate distances on the polygon. We show that exact geodesic distances restricted to the polygon are at most second-order accurate with respect to the distances on the corresponding continuous surface. By order of accuracy we refer to the convergence rate as a function of the average distance between sampled points. Next, a higher-order accurate deep learning method for computing geodesic distances on surfaces is introduced. Traditionally, one considers two main components when computing distances on surfaces: a numerical solver that locally approximates the distance function, and an efficient causal ordering scheme by which surface points are updated. Classical minimal path methods often exploit a dynamic programming principle with quasi-linear computational complexity in the number of sampled points. The quality of the distance approximation is determined by the local solver that is revisited in this paper. To improve state-of-the-art accuracy, we consider a neural network-based local solver which implicitly approximates the structure of the continuous surface. We supply numerical evidence that the proposed learned update scheme provides better accuracy compared to the best possible polyhedral approximations and previous learning-based methods. The result is a third-order accurate solver with a bootstrapping recipe for further improvement.

2026-02-25T09:39:49Z Extended version of Deep Accurate Solver for the Geodesic Problem originally published in Scale Space and Variational Methods in Computer Vision (SSVM 2023), Lecture Notes in Computer Science, Springer. This version includes additional experiments and detailed analysis Scale Space and Variational Methods in Computer Vision (SSVM 2023), Lecture Notes in Computer Science, vol. 14009, Springer Saar Huberman Amit Bracha Ron Kimmel 10.1007/978-3-031-31975-4_22 http://arxiv.org/abs/2602.21702v1 Half Pound Filter for Real-Time Animation Blending 2026-02-25T09:05:48Z

This paper introduces the Half Pound Filter (HPF) as a modification of the 1 Euro Filter (1EF) and algorithms for automatic data-driven tuning and for filter triggering based on motion derivative boundary checks. An application of the filter is presented in the context of human animation replay for real-time simulations, where switches in animation clips cause discontinuities that must be hidden by filtering the bone trajectory without introducing noticeable artifacts. The quality of the filtering will be compared with other common animation filtering techniques using an example case drawn from the LaFAN1 dataset, showing that the resulting animation is replayed with higher fidelity by evaluating the Mean Squared Error (MSE) and Normalized Power Spectrum Similarity (NPSS) for each setup. Performance will be evaluated using both a standard predetermined trigger point and blending distance and the automatic blending trigger and recovery system.
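As background, the baseline 1 Euro Filter that the HPF modifies can be sketched in a few lines: it is an exponential low-pass filter whose cutoff frequency grows with the smoothed signal speed, so slow motion is filtered heavily and fast motion lags less. The sketch below implements only this baseline for a scalar signal; the HPF's modifications, automatic tuning, and triggering logic are not described in enough detail in the abstract to reproduce, and the default parameter values here are placeholders.

```python
import math

class OneEuroFilter:
    """Baseline 1 Euro Filter for a scalar signal sampled every dt seconds."""

    def __init__(self, min_cutoff=1.0, beta=0.0, d_cutoff=1.0):
        self.min_cutoff = min_cutoff   # cutoff (Hz) used when the signal is still
        self.beta = beta               # how quickly the cutoff grows with speed
        self.d_cutoff = d_cutoff       # cutoff (Hz) for smoothing the derivative
        self.x_prev = None
        self.dx_prev = 0.0

    @staticmethod
    def _alpha(cutoff, dt):
        # Smoothing factor of a first-order low-pass filter.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def __call__(self, x, dt):
        if self.x_prev is None:        # first sample passes through unchanged
            self.x_prev = x
            return x
        dx = (x - self.x_prev) / dt
        a_d = self._alpha(self.d_cutoff, dt)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, dt)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat
```

For bone trajectories one would run one such filter per channel (for example per position component), calling it once per frame with the frame's time step.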

2026-02-25T09:05:48Z 12 pages, 3 figures Riccardo Lasagno http://arxiv.org/abs/2603.15627v1 Physics-Informed Video Diffusion For Shallow Water Equations 2026-02-24T08:33:53Z

Traditional fluid dynamics simulation pipelines combine numerical solvers with rendering, producing highly realistic results but at considerable computational cost. Diffusion-based generative video models offer a faster alternative, yet often ignore physical laws and thus fail to capture consistent dynamics. We propose a physics-informed video diffusion framework that jointly generates visual outputs and physical states. Unlike prior two-stage approaches that first simulate the physical variables and then render, we directly integrate physics constraints into the generative process, enabling simultaneous prediction of physical states and realistic videos without a separate rendering step. Built on the two-dimensional shallow water equations with terrain topography, our method produces temporally coherent water flow while maintaining physical plausibility. Experiments show that it outperforms purely data-driven video diffusion baselines in both realism and physical fidelity, while generating videos significantly faster than traditional simulation-plus-rendering pipelines.
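For reference, a standard conservative form of the two-dimensional shallow water equations with bottom topography is written out below. The notation is an assumption: $h$ is water depth, $(u, v)$ the depth-averaged velocity, $b(x, y)$ the terrain elevation, and $g$ gravitational acceleration; friction and other source terms are omitted, and the paper may use a different variant.

```latex
\begin{aligned}
\frac{\partial h}{\partial t} + \frac{\partial (hu)}{\partial x} + \frac{\partial (hv)}{\partial y} &= 0, \\
\frac{\partial (hu)}{\partial t} + \frac{\partial}{\partial x}\!\left(hu^2 + \tfrac{1}{2} g h^2\right) + \frac{\partial (huv)}{\partial y} &= -g h \frac{\partial b}{\partial x}, \\
\frac{\partial (hv)}{\partial t} + \frac{\partial (huv)}{\partial x} + \frac{\partial}{\partial y}\!\left(hv^2 + \tfrac{1}{2} g h^2\right) &= -g h \frac{\partial b}{\partial y}.
\end{aligned}
```

The first equation expresses mass conservation and the other two momentum conservation; a physics-informed loss would typically penalize the residuals of these equations on the generated physical states.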

2026-02-24T08:33:53Z 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing Yang Bai George Eskandar Ziyuan Liu Gitta Kutyniok http://arxiv.org/abs/2602.20377v1 Directly from Alpha to Omega: Controllable End-to-End Vector Floor Plan Generation 2026-02-23T21:31:08Z

Automated floor plan generation aims to create residential layouts by arranging rooms within a given boundary, balancing topological, geometric, and aesthetic considerations. The existing methods typically use a multi-step pipeline with intermediate representations to decompose the prediction process into several sub-tasks, limiting model flexibility and imposing predefined solution paths. This often results in unreasonable outputs when applied to data unsuitable for these predefined paths, making it challenging for these methods to match human designers, who do not restrict themselves to a specific set of design workflows. To address these limitations, we introduce CE2EPlan, a controllable end-to-end topology- and geometry-enhanced diffusion model that removes restrictions on the generative process of AI design tools. Instead, it enables the model to learn how to design floor plans directly from data, capturing a wide range of solution paths from input boundaries to complete layouts. Extensive experiments demonstrate that our method surpasses all existing approaches using the multi-step pipeline, delivering higher-quality results with enhanced user control and greater diversity in output, bringing AI design tools closer to the versatility of human designers.

2026-02-23T21:31:08Z accepted to IEEE Transactions on Visualization and Computer Graphics Shidong Wang Renato Pajarola 10.1109/TVCG.2026.3665422 http://arxiv.org/abs/2602.20063v1 Spherical Hermite Maps 2026-02-23T17:22:19Z

Spherical functions appear throughout computer graphics, from spherical harmonic lighting and precomputed radiance transfer to neural radiance fields and procedural planet rendering. Efficient evaluation is critical for real-time applications, yet existing approaches face a quality-performance trade-off: bilinear LUT sampling is fast but produces faceting, while bicubic filtering requires 16 texture samples. Most implementations use finite differences for normals, requiring extra samples and introducing noise. This paper presents Spherical Hermite Maps, a derivative-augmented LUT representation that resolves this trade-off. By storing function values alongside scaled partial derivatives at each texel of a padded cubemap, bicubic-Hermite reconstruction is enabled from only four texture samples (a 2x2 footprint) while providing continuous gradients from the same samples. The key insight is that Hermite interpolation reconstructs smooth derivatives as a byproduct of value reconstruction, making surface normals effectively free. In controlled experiments, Spherical Hermite Maps improve PSNR by 8-41 dB over bilinear interpolation and match 16-tap bicubic quality at one-quarter the cost. Analytic normals reduce mean angular error by 9-13% on complex surfaces while yielding stable specular highlights. Three applications demonstrate versatility: spherical harmonic glyph visualization, radial depth-map impostors for mesh level-of-detail, and procedural planet/asteroid rendering with spherical heightfields.
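The claim that Hermite interpolation yields smooth derivatives as a byproduct can be seen in a one-dimensional analogue. The sketch below evaluates a cubic Hermite segment and its derivative from the same four stored numbers (two values, two derivatives); the function name and the 1D setting are illustrative assumptions, and the paper's actual reconstruction is a 2D bicubic-Hermite scheme on a padded cubemap.

```python
def hermite(p0, p1, m0, m1, t, dx=1.0):
    """Cubic Hermite segment on an interval of width dx.

    p0, p1: function values at the endpoints; m0, m1: derivatives with respect
    to x at the endpoints; t in [0, 1] is the normalized position.
    Returns (value, derivative with respect to x).
    """
    t2, t3 = t * t, t * t * t
    value = ((2 * t3 - 3 * t2 + 1) * p0 + (t3 - 2 * t2 + t) * dx * m0
             + (-2 * t3 + 3 * t2) * p1 + (t3 - t2) * dx * m1)
    dvalue_dt = ((6 * t2 - 6 * t) * p0 + (3 * t2 - 4 * t + 1) * dx * m0
                 + (-6 * t2 + 6 * t) * p1 + (3 * t2 - 2 * t) * dx * m1)
    return value, dvalue_dt / dx   # chain rule: d/dx = (1/dx) d/dt
```

Because the segment reproduces any cubic exactly, sampling $f(x) = x^3$ on $[0, 1]$ with `hermite(0.0, 1.0, 0.0, 3.0, 0.5)` returns the exact value $0.125$ and derivative $0.75$ at the midpoint, with no finite-difference samples needed for the derivative.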

2026-02-23T17:22:19Z 17 pages, 13 figures Mohamed Abouagour Eleftherios Garyfallidis