https://arxiv.org/api/13ABucgi5E6LB9gZYFA5279Gawo 2026-06-27T22:05:49Z 9390 1665 15 http://arxiv.org/abs/2509.00541v1 LatentEdit: Adaptive Latent Control for Consistent Semantic Editing 2025-08-30T15:47:03Z

Diffusion-based Image Editing has achieved significant success in recent years. However, it remains challenging to achieve high-quality image editing while maintaining the background similarity without sacrificing speed or memory efficiency. In this work, we introduce LatentEdit, an adaptive latent fusion framework that dynamically combines the current latent code with a reference latent code inverted from the source image. By selectively preserving source features in high-similarity, semantically important regions while generating target content in other regions guided by the target prompt, LatentEdit enables fine-grained, controllable editing. Critically, the method requires no internal model modifications or complex attention mechanisms, offering a lightweight, plug-and-play solution compatible with both UNet-based and DiT-based architectures. Extensive experiments on the PIE-Bench dataset demonstrate that our proposed LatentEdit achieves an optimal balance between fidelity and editability, outperforming the state-of-the-art method even in 8-15 steps. Additionally, its inversion-free variant further halves the number of neural function evaluations and eliminates the need for storing any intermediate variables, substantially enhancing real-time deployment efficiency.
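The fusion idea in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the abstract does not give the fusion rule, so the sketch uses per-position cosine similarity between the current and reference latents, a hard threshold, and the hypothetical names `fuse_latents` and `threshold`.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse_latents(current, reference, threshold=0.9):
    """Blend two latents position by position (hypothetical rule, not the paper's).

    `current` and `reference` are lists of per-position feature vectors.
    Where the two agree closely, the reference (source) features are kept;
    elsewhere the current (target-guided) features are kept.
    """
    fused = []
    for cur, ref in zip(current, reference):
        keep_source = cosine_similarity(cur, ref) >= threshold
        fused.append(list(ref) if keep_source else list(cur))
    return fused


# Two spatial positions with 2-channel features.
current = [[1.0, 0.0], [0.0, 1.0]]
reference = [[2.0, 0.1], [1.0, 0.0]]
print(fuse_latents(current, reference))
```

A soft, similarity-weighted blend would be an equally plausible reading of "adaptive latent fusion"; the hard threshold is used here only because it is the simplest rule to inspect.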

2025-08-30T15:47:03Z Accepted by PRCV 2025 Siyi Liu Weiming Chen Yushun Tang Zhihai He http://arxiv.org/abs/2501.04648v2 FlairGPT: Repurposing LLMs for Interior Designs 2025-08-30T11:32:28Z

Interior design involves the careful selection and arrangement of objects to create an aesthetically pleasing, functional, and harmonized space that aligns with the client's design brief. This task is particularly challenging, as a successful design must not only incorporate all the necessary objects in a cohesive style, but also ensure they are arranged in a way that maximizes accessibility, while adhering to a variety of affordability and usage considerations. Data-driven solutions have been proposed, but these are typically room- or domain-specific and lack explainability in the design considerations used to produce the final layout. In this paper, we investigate if large language models (LLMs) can be directly utilized for interior design. While we find that LLMs are not yet capable of generating complete layouts, they can be effectively leveraged in a structured manner, inspired by the workflow of interior designers. By systematically probing LLMs, we can reliably generate a list of objects along with relevant constraints that guide their placement. We translate this information into a design layout graph, which is then solved using an off-the-shelf constrained optimization setup to generate the final layouts. We benchmark our algorithm in various design configurations against existing LLM-based methods and human designs, and evaluate the results using a variety of quantitative and qualitative metrics along with user studies. In summary, we demonstrate that LLMs, when used in a structured manner, can effectively generate diverse high-quality layouts, making them a viable solution for creating large-scale virtual scenes. Project webpage at https://flairgpt.github.io/

2025-01-08T18:01:49Z EUROGRAPHICS 2025 Gabrielle Littlefair Niladri Shekhar Dutt Niloy J. Mitra 10.1111/cgf.70036 http://arxiv.org/abs/2509.00180v1 Evaluate Neighbor Search for Curve-based Vector Field Processing 2025-08-29T18:28:50Z

Curve-based representations, particularly integral curves, are often used to represent large-scale computational fluid dynamic simulations. Processing and analyzing curve-based vector field data sets often involves searching for neighboring segments given a query point or curve segment. However, because the original flow behavior may not be fully represented by the set of integral curves and the input integral curves may not be evenly distributed in space, popular neighbor search strategies often return skewed and redundant neighboring segments. Yet, there is a lack of systematic and comprehensive research on how different configurations of neighboring segments returned by specific neighbor search strategies affect subsequent tasks. To fill this gap, this study evaluates the performance of two popular neighbor search strategies combined with different distance metrics on a point-based vector field reconstruction task and a segment saliency estimation using input integral curves. A large number of reconstruction tests and saliency calculations are conducted for the study. To characterize the configurations of neighboring segments for an effective comparison of different search strategies, a number of measures, like average neighbor distance and uniformity, are proposed. Our study leads to a few observations that partially confirm our expectations about the ideal configurations of a neighborhood while revealing additional findings that were overlooked by the community.

2025-08-29T18:28:50Z 12 pages, 17 figures Nguyen Phan Guoning Chen http://arxiv.org/abs/2508.21736v1 MicroLabVR: Interactive 3D Visualization of Simulated Spatiotemporal Microbiome Data in Virtual Reality 2025-08-29T16:03:04Z

Microbiomes are a vital part of the human body, engaging in tasks like food digestion and immune defense. Their structure and function must be understood in order to promote host health and facilitate swift recovery during disease. Due to the difficulties in experimentally studying these systems in situ, more research is being conducted in the field of mathematical modeling. Visualizing spatiotemporal data is challenging, and current tools that simulate microbial communities' spatial and temporal development often only provide limited functionalities, often requiring expert knowledge to generate useful results. To overcome these limitations, we provide a user-friendly tool to interactively explore spatiotemporal simulation data, called MicroLabVR, which transfers spatial data into virtual reality (VR) while following guidelines to enhance user experience (UX). With MicroLabVR, users can import CSV datasets containing population growth, substance concentration development, and metabolic flux distribution data. The implemented visualization methods allow users to evaluate the dataset in a VR environment interactively. MicroLabVR aims to improve data analysis for the user by allowing the exploration of microbiome data in their spatial context.

2025-08-29T16:03:04Z Simon Burbach Maria Maleshkova Florian Centler Tanja Joan Schmidt http://arxiv.org/abs/2509.07993v1 Revisiting Deepfake Detection: Chronological Continual Learning and the Limits of Generalization 2025-08-29T13:34:21Z

The rapid evolution of deepfake generation technologies poses critical challenges for detection systems, as non-continual learning methods demand frequent and expensive retraining. We reframe deepfake detection (DFD) as a Continual Learning (CL) problem, proposing an efficient framework that incrementally adapts to emerging visual manipulation techniques while retaining knowledge of past generators. Our framework, unlike prior approaches that rely on unrealistic simulation sequences, simulates the real-world chronological evolution of deepfake technologies over an extended period spanning 7 years. Simultaneously, our framework builds upon lightweight visual backbones to allow for the real-time performance of DFD systems. Additionally, we contribute two novel metrics: Continual AUC (C-AUC) for historical performance and Forward Transfer AUC (FWT-AUC) for future generalization. Through extensive experimentation (over 600 simulations), we empirically demonstrate that while efficient adaptation (+155 times faster than full retraining) and robust retention of historical knowledge are possible, the generalization of current approaches to future generators without additional training remains near-random (FWT-AUC ≈ 0.5) due to the unique imprint characterizing each existing generator. Such observations are the foundation of our newly proposed Non-Universal Deepfake Distribution Hypothesis. Code will be released upon acceptance.
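To make the "near-random" reading of an AUC around 0.5 concrete, here is a minimal sketch of the plain ROC AUC computed as a pairwise ranking statistic. It is not the paper's C-AUC or FWT-AUC, whose definitions the abstract does not give; the function name `roc_auc` and the label convention (1 for fake, 0 for real) are assumptions for illustration.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    Labels are 1 for the positive class and 0 for the negative class.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 0]
print(roc_auc([0.9, 0.8, 0.2, 0.1], labels))  # detector separates the classes
print(roc_auc([0.5, 0.5, 0.5, 0.5], labels))  # detector carries no information
```

A detector that assigns the same score to every image ranks no pair correctly and no pair incorrectly, which is what a value of 0.5 expresses.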

2025-08-29T13:34:21Z Federico Fontana Anxhelo Diko Romeo Lanzino Marco Raoul Marini Bachir Kaddar Gian Luca Foresti Luigi Cinque http://arxiv.org/abs/2506.05930v2 Neural Visibility Cache for Real-Time Light Sampling 2025-08-29T09:58:17Z

Direct illumination with many lights is an inherent component of physically-based rendering and remains challenging, especially in real-time scenarios. We propose an online-trained neural cache that stores visibility between lights and 3D positions. We feed light visibility to weighted reservoir sampling (WRS) to sample a light source. The cache is implemented as a fully-fused multilayer perceptron (MLP) with multi-resolution hash-grid encoding, enabling online training and efficient inference on modern GPUs at real-time frame rates. The cache can be seamlessly integrated into existing rendering frameworks and can be used in combination with other real-time techniques such as spatiotemporal reservoir sampling (ReSTIR).
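The sampling step named in this abstract can be sketched with the standard single-sample form of weighted reservoir sampling. The weight used below (light intensity times a visibility value) and the field names `intensity` and `visibility` are assumptions for illustration; in the paper the visibility would come from the neural cache, which is not modeled here.

```python
import random


def weighted_reservoir_sample(candidates, weight_fn, rng=random):
    """Stream over candidates and keep one with probability proportional to its weight.

    Returns (chosen, weight_sum). `chosen` is None if every weight is <= 0.
    """
    chosen = None
    weight_sum = 0.0
    for candidate in candidates:
        w = weight_fn(candidate)
        if w <= 0.0:
            continue  # zero-weight candidates (e.g. fully occluded lights) are never picked
        weight_sum += w
        # Replace the current pick with probability w / weight_sum.
        if rng.random() * weight_sum < w:
            chosen = candidate
    return chosen, weight_sum


def light_weight(light):
    """Hypothetical weight: unshadowed intensity scaled by estimated visibility."""
    return light["intensity"] * light["visibility"]


lights = [
    {"id": 0, "intensity": 5.0, "visibility": 0.0},  # bright but occluded
    {"id": 1, "intensity": 1.0, "visibility": 1.0},
    {"id": 2, "intensity": 3.0, "visibility": 1.0},
]
light, total = weighted_reservoir_sample(lights, light_weight, random.Random(0))
print(light["id"], total)
```

Because the candidates are visited once and only a running weight sum is stored, the routine needs constant memory regardless of the number of lights, which is the property that makes this family of samplers attractive for many-light rendering.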

2025-06-06T09:55:59Z Jakub Bokšanský Daniel Meister http://arxiv.org/abs/2504.07134v2 Bringing Attention to CAD: Boundary Representation Learning via Transformer 2025-08-29T04:28:36Z

The recent rise of generative artificial intelligence (AI), powered by Transformer networks, has achieved remarkable success in natural language processing, computer vision, and graphics. However, the application of Transformers in computer-aided design (CAD), particularly for processing boundary representation (B-rep) models, remains largely unexplored. To bridge this gap, we propose a novel approach for adapting Transformers to B-rep learning, called the Boundary Representation Transformer (BRT). B-rep models pose unique challenges due to their irregular topology and continuous geometric definitions, which are fundamentally different from the structured and discrete data Transformers are designed for. To address this, BRT proposes a continuous geometric embedding method that encodes B-rep surfaces (trimmed and untrimmed) into Bezier triangles, preserving their shape and continuity without discretization. Additionally, BRT employs a topology-aware embedding method that organizes these geometric embeddings into a sequence of discrete tokens suitable for Transformers, capturing both geometric and topological characteristics within B-rep models. This enables the Transformer's attention mechanism to effectively learn shape patterns and contextual semantics of boundary elements in a B-rep model. Extensive experiments demonstrate that BRT achieves state-of-the-art performance in part classification and feature recognition tasks.

2025-04-07T07:04:02Z Computer-Aided Design. 2025 Aug 26:103940 Qiang Zou Lizhen Zhu 10.1016/j.cad.2025.103940 http://arxiv.org/abs/2508.21256v1 CrossTL: A Universal Programming Language Translator with Unified Intermediate Representation 2025-08-28T23:00:08Z

We present CrossTL, a universal programming language translator enabling bidirectional translation between multiple languages through a unified intermediate representation called CrossGL. Traditional approaches require separate translators for each language pair, leading to exponential complexity growth. CrossTL uses a single universal IR to facilitate translations between CUDA, HIP, Metal, DirectX HLSL, OpenGL GLSL, Vulkan SPIR-V, Rust, and Mojo, with Slang support in development. Our system consists of: language-specific lexers/parsers converting source code to ASTs, bidirectional CrossGL translation modules implementing ToCrossGLConverter classes for importing code and CodeGen classes for target generation, and comprehensive backend implementations handling full translation pipelines. We demonstrate effectiveness through comprehensive evaluation across programming domains, achieving successful compilation and execution across all supported backends. The universal IR design enables adding new languages with minimal effort, requiring only language-specific frontend/backend components. Our contributions include: (1) a unified IR capturing semantics of multiple programming paradigms, (2) a modular architecture enabling extensibility, (3) a comprehensive framework supporting GPU compute, graphics programming, and systems languages, and (4) empirical validation demonstrating practical viability of universal code translation. CrossTL represents a significant step toward language-agnostic programming, enabling write-once, deploy-everywhere development.
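The scaling argument for a shared intermediate representation can be checked with simple counting. This sketch assumes one dedicated translator per ordered language pair in the direct approach, and one importer plus one code generator per language in the IR approach; under that counting the direct approach grows quadratically with the number of languages. The function names are hypothetical.

```python
def pairwise_translator_count(num_languages: int) -> int:
    """Translators needed when every ordered (source, target) pair gets its own."""
    return num_languages * (num_languages - 1)


def hub_translator_count(num_languages: int) -> int:
    """Components needed with a shared IR: one importer and one generator per language."""
    return 2 * num_languages


for n in (2, 4, 8, 16):
    print(n, pairwise_translator_count(n), hub_translator_count(n))
```

For the eight languages listed in the abstract, this counting gives 56 direct translators versus 16 IR-facing components, and each additional language adds two components rather than two per existing language.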

2025-08-28T23:00:08Z 15 Pages, 5 Figures, 1 Table. Introduces CrossTL, a universal programming language translator enabling bidirectional translation between 8 programming languages (CUDA, HIP, Metal, DirectX HLSL, OpenGL GLSL, Vulkan SPIR-V, Rust, Mojo) through a unified intermediate representation called CrossGL. Includes comprehensive evaluation with complex real-world examples Nripesh Niketan Vaatsalya Shrivastva 10.5281/zenodo.15826975 http://arxiv.org/abs/2509.00114v1 The Living Library of Trees: Mapping Knowledge Ecology in the Arnold Arboretum 2025-08-28T15:37:09Z

As biodiversity loss and climate change accelerate, botanical gardens serve as vital infrastructures for research, education, and conservation. This project focuses on the Arnold Arboretum of Harvard University, a 281-acre living museum founded in 1872 in Boston. Drawing on more than a century of curatorial data, the research combines historical analysis with computational methods to visualize the biographies of plants and people. The resulting platform reveals patterns of care and scientific observations, along with the collective dimensions embedded in botanical data. Using techniques from artificial intelligence, geospatial mapping, and information design, the project frames the arboretum as a system of shared agency--an active archive of more-than-human affinities that records the layered memory of curatorial labor, the situated nature of knowledge production, and the potential of design to bridge archival record and future care.

2025-08-28T15:37:09Z Johan Malmstedt Giacomo Nanni Dario Rodighiero http://arxiv.org/abs/2508.17645v2 Generating Human-AI Collaborative Design Sequence for 3D Assets via Differentiable Operation Graph 2025-08-28T14:16:28Z

The emergence of 3D artificial intelligence-generated content (3D-AIGC) has enabled rapid synthesis of intricate geometries. However, a fundamental disconnect persists between AI-generated content and human-centric design paradigms, rooted in representational incompatibilities: conventional AI frameworks predominantly manipulate meshes or neural representations (e.g., NeRF, Gaussian Splatting), while designers operate within parametric modeling tools. This disconnection diminishes the practical value of AI for the 3D industry, undermining the efficiency of human-AI collaboration. To resolve this disparity, we focus on generating design operation sequences, which are structured modeling histories that comprehensively capture the step-by-step construction process of 3D assets and align with designers' typical workflows in modern 3D software. We first reformulate fundamental modeling operations (e.g., Extrude, Boolean) into differentiable units, enabling joint optimization of continuous (e.g., Extrude height) and discrete (e.g., Boolean type) parameters via gradient-based learning. Based on these differentiable operations, a hierarchical graph with a gating mechanism is constructed and optimized end-to-end by minimizing Chamfer Distance to target geometries. A multi-stage sequence length constraint and domain rule penalties enable unsupervised learning of compact design sequences without ground-truth sequence supervision. Extensive validation demonstrates that the generated operation sequences achieve high geometric fidelity, smooth mesh wiring, rational step composition and flexible editing capacity, with full compatibility within the design industry.
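The optimization target named in this abstract, Chamfer Distance, has several variants in the literature. The sketch below shows one common symmetric form (mean squared nearest-neighbor distance in each direction, summed) in plain Python; which variant the paper uses is not stated in the abstract, so this choice and the function names are assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer Distance between two non-empty point sets.

    For each point, find the squared distance to its nearest neighbor in the
    other set; average per set, then add the two directions.
    """
    def one_sided(source, target):
        return sum(min(squared_distance(p, q) for q in target) for p in source) / len(source)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)


generated = [(0.0, 0.0, 0.0)]
target = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
print(chamfer_distance(generated, target))
```

This brute-force version is quadratic in the number of points and is meant only to show the definition; a gradient-based pipeline would compute the same quantity on tensors inside an automatic differentiation framework.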

2025-08-25T04:18:35Z Xiaoyang Huang Bingbing Ni Wenjun Zhang http://arxiv.org/abs/2406.07981v3 Foveated Path Tracing with Configurable Sampling and Block-Based Rendering 2025-08-28T13:20:03Z

Path tracing offers high-fidelity rendering but remains impractical for real-time applications due to slow convergence and noise. We present a dynamic foveated path tracing technique that leverages visual perception by reducing sampling towards peripheral regions. Our system achieves up to 25-fold performance gains on complex scenes at 4K resolution with minimal perceptual degradation. We validate its effectiveness using structured error maps across varying sampling rates and foveated region sizes, establishing a foundation for future research in perceptual photorealistic rendering.

2024-06-12T08:06:46Z The paper has been accepted to GI VR/AR Workshop 2025 Bipul Mohanto Sven Kluge Martin Weier Oliver Staadt http://arxiv.org/abs/2508.20664v1 Task-Oriented Edge-Assisted Cross-System Design for Real-Time Human-Robot Interaction in Industrial Metaverse 2025-08-28T11:10:41Z

Real-time human-device interaction in the industrial Metaverse faces challenges such as high computational load, limited bandwidth, and strict latency requirements. This paper proposes a task-oriented edge-assisted cross-system framework using digital twins (DTs) to enable responsive interactions. By predicting operator motions, the system supports: 1) proactive Metaverse rendering for visual feedback, and 2) preemptive control of remote devices. The DTs are decoupled into two virtual functions, visual display and robotic control, optimizing both performance and adaptability. To enhance generalizability, we introduce the Human-In-The-Loop Model-Agnostic Meta-Learning (HITL-MAML) algorithm, which dynamically adjusts prediction horizons. Evaluation on two tasks demonstrates the framework's effectiveness: in a Trajectory-Based Drawing Control task, it reduces weighted RMSE from 0.0712 m to 0.0101 m; in a real-time 3D scene representation task for nuclear decommissioning, it achieves a PSNR of 22.11, SSIM of 0.8729, and LPIPS of 0.1298. These results show the framework's capability to ensure spatial precision and visual fidelity in real-time, high-risk industrial environments.

2025-08-28T11:10:41Z This paper has been submitted to IEEE Transactions on Mobile Computing Kan Chen Zhen Meng Xiangmin Xu Jiaming Yang Emma Li Philip G. Zhao http://arxiv.org/abs/2508.21095v1 ScanMove: Motion Prediction and Transfer for Unregistered Body Meshes 2025-08-27T19:41:32Z

Unregistered surface meshes, especially raw 3D scans, present significant challenges for automatic computation of plausible deformations due to the lack of established point-wise correspondences and the presence of noise in the data. In this paper, we propose a new, rig-free, data-driven framework for motion prediction and transfer on such body meshes. Our method couples a robust motion embedding network with a learned per-vertex feature field to generate a spatio-temporal deformation field, which drives the mesh deformation. Extensive evaluations, including quantitative benchmarks and qualitative visuals on tasks such as walking and running, demonstrate the effectiveness and versatility of our approach on challenging unregistered meshes.

2025-08-27T19:41:32Z Thomas Besnier Sylvain Arguillère Mohamed Daoudi http://arxiv.org/abs/2508.20080v1 Seam360GS: Seamless 360° Gaussian Splatting from Real-World Omnidirectional Images 2025-08-27T17:46:46Z

360-degree visual content is widely shared on platforms such as YouTube and plays a central role in virtual reality, robotics, and autonomous navigation. However, consumer-grade dual-fisheye systems consistently yield imperfect panoramas due to inherent lens separation and angular distortions. In this work, we introduce a novel calibration framework that incorporates a dual-fisheye camera model into the 3D Gaussian splatting pipeline. Our approach not only simulates the realistic visual artifacts produced by dual-fisheye cameras but also enables the synthesis of seamlessly rendered 360-degree images. By jointly optimizing 3D Gaussian parameters alongside calibration variables that emulate lens gaps and angular distortions, our framework transforms imperfect omnidirectional inputs into flawless novel view synthesis. Extensive evaluations on real-world datasets confirm that our method produces seamless renderings, even from imperfect images, and outperforms existing 360-degree rendering models.

2025-08-27T17:46:46Z Accepted to ICCV 2025. 10 pages main text, 4 figures, 4 tables, supplementary material included Changha Shin Woong Oh Cho Seon Joo Kim http://arxiv.org/abs/2501.05828v3 UltraRay: Introducing Full-Path Ray Tracing in Physics-Based Ultrasound Simulation 2025-08-27T15:03:30Z

Traditional ultrasound simulators solve the wave equation to model pressure distribution fields, achieving high accuracy but requiring significant computational time and resources. To address this, ray tracing approaches have been introduced, modeling wave propagation as rays interacting with boundaries and scatterers. However, existing models simplify ray propagation, generating echoes at interaction points without considering return paths to the sensor. This can result in unrealistic artifacts and necessitates careful scene tuning for plausible results. We propose a novel ultrasound simulation pipeline that utilizes a ray tracing algorithm to generate echo data, tracing each ray from the transducer through the scene and back to the sensor. To replicate advanced ultrasound imaging, we introduce a ray emission scheme optimized for plane wave imaging, incorporating delay and steering capabilities. Furthermore, we integrate a standard signal processing pipeline to simulate end-to-end ultrasound image formation. We showcase the efficacy of the proposed pipeline by modeling synthetic scenes featuring highly reflective objects, such as bones. In doing so, our proposed approach, UltraRay, not only enhances the overall visual quality but also improves the realism of the simulated images by accurately capturing secondary reflections and reducing unnatural artifacts. By building on top of a differentiable framework, the proposed pipeline lays the groundwork for a fast and differentiable ultrasound simulation tool necessary for gradient-based optimization, enabling advanced ultrasound beamforming strategies, neural network integration, and accurate inverse scene reconstruction.

2025-01-10T10:07:41Z Felix Duelmer Mohammad Farid Azampour Magdalena Wysocki Nassir Navab