https://arxiv.org/api/x6HJ7J5p9JCu4IyVzRhU6KIanLc2026-06-28T23:28:33Z9390201015http://arxiv.org/abs/2506.21629v1ICP-3DGS: SfM-free 3D Gaussian Splatting for Large-scale Unbounded Scenes2025-06-24T21:10:06ZIn recent years, neural rendering methods such as NeRFs and 3D Gaussian Splatting (3DGS) have made significant progress in scene reconstruction and novel view synthesis. However, they heavily rely on preprocessed camera poses and 3D structural priors from structure-from-motion (SfM), which are challenging to obtain in outdoor scenarios. To address this challenge, we propose to incorporate Iterative Closest Point (ICP) with optimization-based refinement to achieve accurate camera pose estimation under large camera movements. Additionally, we introduce a voxel-based scene densification approach to guide the reconstruction in large-scale scenes. Experiments demonstrate that our approach ICP-3DGS outperforms existing methods in both camera pose estimation and novel view synthesis across indoor and outdoor scenes of various scales. Source code is available at https://github.com/Chenhao-Z/ICP-3DGS.2025-06-24T21:10:06Z6 pages, Source code is available at https://github.com/Chenhao-Z/ICP-3DGS. To appear at ICIP 2025Chenhao ZhangYezhi ShenFengqing Zhuhttp://arxiv.org/abs/2412.00814v5VR-Doh: Hands-on 3D Modeling in Virtual Reality2025-06-24T16:24:09ZWe introduce VR-Doh, an open-source, hands-on 3D modeling system that enables intuitive creation and manipulation of elastoplastic objects in Virtual Reality (VR). By customizing the Material Point Method (MPM) for real-time simulation of hand-induced large deformations and enhancing 3D Gaussian Splatting for seamless rendering, VR-Doh provides an interactive and immersive 3D modeling experience. Users can naturally sculpt, deform, and edit objects through both contact- and gesture-based hand-object interactions. To achieve real-time performance, our system incorporates localized simulation techniques, particle-level collision handling, and the decoupling of physical and appearance representations, ensuring smooth and responsive interactions. VR-Doh supports both object creation and editing, enabling diverse modeling tasks such as designing food items, characters, and interlocking structures, all resulting in simulation-ready assets. User studies with both novice and experienced participants highlight the system's intuitive design, immersive feedback, and creative potential. Compared to existing geometric modeling tools, VR-Doh offers enhanced accessibility and natural interaction, making it a powerful tool for creative exploration in VR.2024-12-01T14:04:59Z12 pagesZhaofeng LuoZhitong CuiShijian LuoMengyu ChuMinchen Li10.1145/3731154http://arxiv.org/abs/2506.19708v1Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders2025-06-24T15:15:15ZDespite their impressive performance, generative image models trained on large-scale datasets frequently fail to produce images with seemingly simple concepts -- e.g., human hands or objects appearing in groups of four -- that are reasonably expected to appear in the training data. These failure modes have largely been documented anecdotally, leaving open the question of whether they reflect idiosyncratic anomalies or more structural limitations of these models. To address this, we introduce a systematic approach for identifying and characterizing "conceptual blindspots" -- concepts present in the training data but absent or misrepresented in a model's generations. Our method leverages sparse autoencoders (SAEs) to extract interpretable concept embeddings, enabling a quantitative comparison of concept prevalence between real and generated images. We train an archetypal SAE (RA-SAE) on DINOv2 features with 32,000 concepts -- the largest such SAE to date -- enabling fine-grained analysis of conceptual disparities. Applied to four popular generative models (Stable Diffusion 1.5/2.1, PixArt, and Kandinsky), our approach reveals specific suppressed blindspots (e.g., bird feeders, DVD discs, and whitespaces on documents) and exaggerated blindspots (e.g., wood background texture and palm trees). At the individual datapoint level, we further isolate memorization artifacts -- instances where models reproduce highly specific visual templates seen during training. Overall, we propose a theoretically grounded framework for systematically identifying conceptual blindspots in generative models by assessing their conceptual fidelity with respect to the underlying data-generating process.2025-06-24T15:15:15ZMatyas BohacekThomas FelManeesh AgrawalaEkdeep Singh Lubanahttp://arxiv.org/abs/2506.19415v1Virtual Memory for 3D Gaussian Splatting2025-06-24T08:31:33Z3D Gaussian Splatting represents a breakthrough in the field of novel view synthesis. It establishes Gaussians as core rendering primitives for highly accurate real-world environment reconstruction. Recent advances have drastically increased the size of scenes that can be created. In this work, we present a method for rendering large and complex 3D Gaussian Splatting scenes using virtual memory. By leveraging well-established virtual memory and virtual texturing techniques, our approach efficiently identifies visible Gaussians and dynamically streams them to the GPU just in time for real-time rendering. Selecting only the necessary Gaussians for both storage and rendering results in reduced memory usage and effectively accelerates rendering, especially for highly complex scenes. Furthermore, we demonstrate how level of detail can be integrated into our proposed method to further enhance rendering speed for large-scale scenes. With an optimized implementation, we highlight key practical considerations and thoroughly evaluate the proposed technique and its impact on desktop and mobile devices.2025-06-24T08:31:33ZBased on the Master Thesis from Jonathan Haberl from 2024, Submitted to TVCG in Feb. 2025;Jonathan HaberlPhilipp FleckClemens Arthhttp://arxiv.org/abs/2506.19400v1Continuous Indexed Points for Multivariate Volume Visualization2025-06-24T08:08:19ZWe introduce continuous indexed points for improved multivariate volume visualization. Indexed points represent linear structures in parallel coordinates and can be used to encode local correlation of multivariate (including multifield, multifaceted, and multiattribute) volume data. First, we perform local linear fitting in the spatial neighborhood of each volume sample using principal component analysis, accelerated by hierarchical spatial data structures. This local linear information is then visualized as continuous indexed points in parallel coordinates: a density representation of indexed points in a continuous domain. With our new method, multivariate volume data can be analyzed using the eigenvector information from local spatial embeddings. We utilize both 1-flat and 2-flat indexed points, allowing us to identify correlations between two variables and even three variables, respectively. An interactive occlusion shading model facilitates good spatial perception of the volume rendering of volumetric correlation characteristics. Interactive exploration is supported by specifically designed multivariate transfer function widgets working in the image plane of parallel coordinates. We show that our generic technique works for multi-attribute datasets. The effectiveness and usefulness of our new method is demonstrated through a case study, an expert user study, and domain expert feedback.2025-06-24T08:08:19ZPeer reviewed and accepted by Computational Visual MediaLiang ZhouXinyi GouDaniel Weiskopfhttp://arxiv.org/abs/2506.15742v2FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space2025-06-24T05:31:03ZWe present evaluation results for FLUX.1 Kontext, a generative flow matching model that unifies image generation and editing. The model generates novel output views by incorporating semantic context from text and image inputs. Using a simple sequence concatenation approach, FLUX.1 Kontext handles both local editing and generative in-context tasks within a single unified architecture. Compared to current editing models that exhibit degradation in character consistency and stability across multiple turns, we observe that FLUX.1 Kontext improved preservation of objects and characters, leading to greater robustness in iterative workflows. The model achieves competitive performance with current state-of-the-art systems while delivering significantly faster generation times, enabling interactive applications and rapid prototyping workflows. To validate these improvements, we introduce KontextBench, a comprehensive benchmark with 1026 image-prompt pairs covering five task categories: local editing, global editing, character reference, style reference and text editing. Detailed evaluations show the superior performance of FLUX.1 Kontext in terms of both single-turn quality and multi-turn consistency, setting new standards for unified image processing models.2025-06-17T20:18:23ZBlack Forest LabsStephen BatifolAndreas BlattmannFrederic BoeselSaksham ConsulCyril DiagneTim DockhornJack EnglishZion EnglishPatrick EsserSumith KulalKyle LaceyYam LeviCheng LiDominik LorenzJonas MüllerDustin PodellRobin RombachHarry SainiAxel SauerLuke Smithhttp://arxiv.org/abs/2506.19282v1A Batch-Insensitive Dynamic GNN Approach to Address Temporal Discontinuity in Graph Streams2025-06-24T03:31:43ZIn dynamic graphs, preserving temporal continuity is critical. However, Memory-based Dynamic Graph Neural Networks (MDGNNs) trained with large batches often disrupt event sequences, leading to temporal information loss. This discontinuity not only deteriorates temporal modeling but also hinders optimization by increasing the difficulty of parameter convergence. Our theoretical study quantifies this through a Lipschitz upper bound, showing that large batch sizes enlarge the parameter search space. In response, we propose BADGNN, a novel batch-agnostic framework consisting of two core components: (1) Temporal Lipschitz Regularization (TLR) to control parameter search space expansion, and (2) Adaptive Attention Adjustment (A3) to alleviate attention distortion induced by both regularization and batching. Empirical results on three benchmark datasets show that BADGNN maintains strong performance while enabling significantly larger batch sizes and faster training compared to TGN. Our code is available at Code: https://anonymous.4open.science/r/TGN_Lipichitz-C033/.2025-06-24T03:31:43Z8pages, 5figuresYang ZhouXiaoning Renhttp://arxiv.org/abs/2506.19278v1Style Transfer: A Decade Survey2025-06-24T03:17:18ZThe revolutionary advancement of Artificial Intelligence Generated Content (AIGC) has fundamentally transformed the landscape of visual content creation and artistic expression. While remarkable progress has been made in image generation and style transfer, the underlying mechanisms and aesthetic implications of these technologies remain insufficiently understood. This paper presents a comprehensive survey of AIGC technologies in visual arts, tracing their evolution from early algorithmic frameworks to contemporary deep generative models. We identify three pivotal paradigms: Variational Autoencoders (VAE), Generative Adversarial Networks (GANs), and Diffusion Models, and examine their roles in bridging the gap between human creativity and machine synthesis. To support our analysis, we systematically review over 500 research papers published in the past decade, spanning both foundational developments and state-of-the-art innovations. Furthermore, we propose a multidimensional evaluation framework that incorporates Technical Innovation, Artistic Merit, Visual Quality, Computational Efficiency, and Creative Potential. Our findings reveal both the transformative capacities and current limitations of AIGC systems, emphasizing their profound impact on the future of creative practices. Through this extensive synthesis, we offer a unified perspective on the convergence of artificial intelligence and artistic expression, while outlining key challenges and promising directions for future research in this rapidly evolving field.2025-06-24T03:17:18Z32 pagesTianshan ZhangHao Tanghttp://arxiv.org/abs/2506.19084v1Am I Playing Better Now? The Effects of G-SYNC in 60Hz Gameplay2025-06-23T20:00:04ZG-SYNC technology matches formerly regular display refreshes to irregular frame updates, improving frame rates and interactive latency. In a previous study of gaming at the 30Hz frame rates common on consoles, players of Battlefield 4 were unable to discern when G-SYNC was in use, but scored higher with G-SYNC and were affected emotionally. We build on that study with the first examination of G-SYNC's effects at the 60Hz frame rate more common in PC gaming and on emerging consoles. Though G-SYNC's effects are less at 60Hz than they were at 30Hz, G-SYNC can still improve the performance of veteran players, particularly when games are challenging. G-SYNC's effects on emotion and experience were limited.2025-06-23T20:00:04ZProceedings of the ACM on Computer Graphics and Interactive Techniques (2001), Volume 4, Issue 1 Article No.: 6, Pages 1 - 17Maryam RiahiBenjamin Watson10.1145/3451269http://arxiv.org/abs/2506.18680v1DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling2025-06-23T14:22:50ZWe present DuetGen, a novel framework for generating interactive two-person dances from music. The key challenge of this task lies in the inherent complexities of two-person dance interactions, where the partners need to synchronize both with each other and with the music. Inspired by the recent advances in motion synthesis, we propose a two-stage solution: encoding two-person motions into discrete tokens and then generating these tokens from music. To effectively capture intricate interactions, we represent both dancers' motions as a unified whole to learn the necessary motion tokens, and adopt a coarse-to-fine learning strategy in both the stages. Our first stage utilizes a VQ-VAE that hierarchically separates high-level semantic features at a coarse temporal resolution from low-level details at a finer resolution, producing two discrete token sequences at different abstraction levels. Subsequently, in the second stage, two generative masked transformers learn to map music signals to these dance tokens: the first producing high-level semantic tokens, and the second, conditioned on music and these semantic tokens, producing the low-level tokens. We train both transformers to learn to predict randomly masked tokens within the sequence, enabling them to iteratively generate motion tokens by filling an empty token sequence during inference. Through the hierarchical masked modeling and dedicated interaction representation, DuetGen achieves the generation of synchronized and interactive two-person dances across various genres. Extensive experiments and user studies on a benchmark duet dance dataset demonstrate state-of-the-art performance of DuetGen in motion realism, music-dance alignment, and partner coordination.2025-06-23T14:22:50Z11 pages, 7 figures, 2 tables, accepted in ACM Siggraph 2025 conference trackAnindita GhoshBing ZhouRishabh DabralJian WangVladislav GolyanikChristian TheobaltPhilipp SlusallekChuan Guohttp://arxiv.org/abs/2507.02900v1Advancing Talking Head Generation: A Comprehensive Survey of Multi-Modal Methodologies, Datasets, Evaluation Metrics, and Loss Functions2025-06-23T06:49:42ZTalking Head Generation (THG) has emerged as a transformative technology in computer vision, enabling the synthesis of realistic human faces synchronized with image, audio, text, or video inputs. This paper provides a comprehensive review of methodologies and frameworks for talking head generation, categorizing approaches into 2D--based, 3D--based, Neural Radiance Fields (NeRF)--based, diffusion--based, parameter-driven techniques and many other techniques. It evaluates algorithms, datasets, and evaluation metrics while highlighting advancements in perceptual realism and technical efficiency critical for applications such as digital avatars, video dubbing, ultra-low bitrate video conferencing, and online education. The study identifies challenges such as reliance on pre--trained models, extreme pose handling, multilingual synthesis, and temporal consistency. Future directions include modular architectures, multilingual datasets, hybrid models blending pre--trained and task-specific layers, and innovative loss functions. By synthesizing existing research and exploring emerging trends, this paper aims to provide actionable insights for researchers and practitioners in the field of talking head generation. For the complete survey, code, and curated resource list, visit our GitHub repository: https://github.com/VineetKumarRakesh/thg.2025-06-23T06:49:42ZVineet Kumar RakeshSoumya MazumdarResearch Pratim MaitySarbajit PalAmitabha DasTapas Samanta10.1007/s00371-025-04232-whttp://arxiv.org/abs/2412.19663v2CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs2025-06-23T02:49:04ZComputer-aided design (CAD) significantly enhances the efficiency, accuracy, and innovation of design processes by enabling precise 2D and 3D modeling, extensive analysis, and optimization. Existing methods for creating CAD models rely on latent vectors or point clouds, which are difficult to obtain, and storage costs are substantial. Recent advances in Multimodal Large Language Models (MLLMs) have inspired researchers to use natural language instructions and images for CAD model construction. However, these models still struggle with inferring accurate 3D spatial location and orientation, leading to inaccuracies in determining the spatial 3D starting points and extrusion directions for constructing geometries. This work introduces CAD-GPT, a CAD synthesis method with spatial reasoning-enhanced MLLM that takes either a single image or a textual description as input. To achieve precise spatial inference, our approach introduces a 3D Modeling Spatial Mechanism. This method maps 3D spatial positions and 3D sketch plane rotation angles into a 1D linguistic feature space using a specialized spatial unfolding mechanism, while discretizing 2D sketch coordinates into an appropriate planar space to enable precise determination of spatial starting position, sketch orientation, and 2D sketch coordinate translations. Extensive experiments demonstrate that CAD-GPT consistently outperforms existing state-of-the-art methods in CAD model synthesis, both quantitatively and qualitatively.2024-12-27T14:19:36ZAccepted at AAAI 2025 (Vol. 39, No. 8), pages 7880-7888. DOI: 10.1609/aaai.v39i8.32849Proc. of the AAAI Conf. on Artificial Intelligence, 39(8):7880-7888, 2025Siyu WangCailian ChenXinyi LeQimin XuLei XuYanzhou ZhangJie Yang10.1609/aaai.v39i8.32849http://arxiv.org/abs/2412.00526v2Human Action CLIPs: Detecting AI-generated Human Motion2025-06-22T18:13:18ZAI-generated video generation continues its journey through the uncanny valley to produce content that is increasingly perceptually indistinguishable from reality. To better protect individuals, organizations, and societies from its malicious applications, we describe an effective and robust technique for distinguishing real from AI-generated human motion using multi-modal semantic embeddings. Our method is robust to the types of laundering that typically confound more low- to mid-level approaches, including resolution and compression attacks. This method is evaluated against DeepAction, a custom-built, open-sourced dataset of video clips with human actions generated by seven text-to-video AI models and matching real footage. The dataset is available under an academic license at https://www.huggingface.co/datasets/faridlab/deepaction_v1.2024-11-30T16:20:58ZWorkshop on Deepfake Detection, Localization and Interpretability @ IJCAI 2025Matyas BohacekHany Faridhttp://arxiv.org/abs/2506.18017v1Auto-Regressive Surface Cutting2025-06-22T12:53:07ZSurface cutting is a fundamental task in computer graphics, with applications in UV parameterization, texture mapping, and mesh decomposition. However, existing methods often produce technically valid but overly fragmented atlases that lack semantic coherence. We introduce SeamGPT, an auto-regressive model that generates cutting seams by mimicking professional workflows. Our key technical innovation lies in formulating surface cutting as a next token prediction task: sample point clouds on mesh vertices and edges, encode them as shape conditions, and employ a GPT-style transformer to sequentially predict seam segments with quantized 3D coordinates. Our approach achieves exceptional performance on UV unwrapping benchmarks containing both manifold and non-manifold meshes, including artist-created, and 3D-scanned models. In addition, it enhances existing 3D segmentation tools by providing clean boundaries for part decomposition.2025-06-22T12:53:07ZTech. report. https://victorcheung12.github.io/seamgptYang LiVictor CheungXinhai LiuYuguang ChenZhongjin LuoBiwen LeiHaohan WengZibo ZhaoJingwei HuangZhuo ChenChunchao Guohttp://arxiv.org/abs/2506.17770v1Collaborative Texture Filtering2025-06-21T17:46:57ZRecent advances in texture compression provide major improvements in compression ratios, but cannot use the GPU's texture units for decompression and filtering. This has led to the development of stochastic texture filtering (STF) techniques to avoid the high cost of multiple texel evaluations with such formats. Unfortunately, those methods can give undesirable visual appearance changes under magnification and may contain visible noise and flicker despite the use of spatiotemporal denoisers. Recent work substantially improves the quality of magnification filtering with STF by sharing decoded texel values between nearby pixels (Wronski 2025). Using GPU wave communication intrinsics, this sharing can be performed inside actively executing shaders without memory traffic overhead. We take this idea further and present novel algorithms that use wave communication between lanes to avoid repeated texel decompression prior to filtering. By distributing unique work across lanes, we can achieve zero-error filtering using <=1 texel evaluations per pixel given a sufficiently large magnification factor. For the remaining cases, we propose novel filtering fallback methods that also achieve higher quality than prior approaches.2025-06-21T17:46:57ZAccepted to ACM/EG Symposium on High Performance Graphics (HPG), 2025Tomas Akenine-MöllerPontus EbelinMatt PharrBartlomiej Wronski10.2312/hpg.20251174