http://arxiv.org/abs/2502.01720v2 Generating Multi-Image Synthetic Data for Text-to-Image Customization 2025-10-13T00:38:43Z

Customization of text-to-image models enables users to insert new concepts or objects and generate them in unseen settings. Existing methods either rely on comparatively expensive test-time optimization or train encoders on single-image datasets without multi-image supervision, which can limit image quality. We propose a simple approach to address these challenges. We first leverage existing text-to-image models and 3D datasets to create a high-quality Synthetic Customization Dataset (SynCD) consisting of multiple images of the same object under different lighting, backgrounds, and poses. Using this dataset, we train an encoder-based model that incorporates fine-grained visual details from reference images via a shared attention mechanism. Finally, we propose an inference technique that normalizes text and image guidance vectors to mitigate overexposure in sampled images. Through extensive experiments, we show that our encoder-based model, trained on SynCD and sampled with the proposed inference algorithm, improves upon existing encoder-based methods on standard customization benchmarks.
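
The abstract does not spell out the normalization rule. The sketch below makes two assumptions. First, guidance is split into an image term and a text term, as in two-term classifier-free guidance. Second, each guidance vector is clipped to a maximum L2 norm. `guided_noise`, `clip_to_norm`, and `max_norm` are hypothetical names, and norm clipping stands in for whatever normalization SynCD actually applies. Noise predictions are plain Python lists so the example runs without a deep learning framework.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def clip_to_norm(v, max_norm):
    # Shrink v onto the ball of radius max_norm; shorter vectors pass through unchanged.
    n = l2_norm(v)
    if n <= max_norm:
        return list(v)
    return [x * (max_norm / n) for x in v]

def guided_noise(eps_uncond, eps_img, eps_full, w_img, w_txt, max_norm):
    """Two-term classifier-free guidance with per-term norm clipping (hypothetical rule).

    eps_uncond: noise prediction with no conditioning
    eps_img:    prediction conditioned on the reference images only
    eps_full:   prediction conditioned on the reference images and the text
    max_norm:   hypothetical cap on the L2 norm of each guidance vector
    """
    g_img = [b - a for a, b in zip(eps_uncond, eps_img)]   # image guidance direction
    g_txt = [c - b for b, c in zip(eps_img, eps_full)]     # text guidance direction
    g_img = clip_to_norm(g_img, max_norm)
    g_txt = clip_to_norm(g_txt, max_norm)
    return [u + w_img * gi + w_txt * gt
            for u, gi, gt in zip(eps_uncond, g_img, g_txt)]
```

Capping each term stops a large guidance weight from inflating the combined update, which is one plausible cause of overexposed samples. In a real pipeline the vectors would be per-step latent tensors.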

2025-02-03T18:59:41Z ICCV 2025. Project webpage: https://www.cs.cmu.edu/~syncd-project/
Nupur Kumari, Xi Yin, Jun-Yan Zhu, Ishan Misra, Samaneh Azadi

http://arxiv.org/abs/2510.10841v1 The Fire We Share 2025-10-12T23:07:48Z

The Fire We Share proposes a care-centered, consequence-aware visualization framework for engaging with wildfire data not as static metrics but as living archives of ecological and social entanglement. By combining plant-inspired data forms, event-based mapping, and narrative layering, the project foregrounds fire as a shared temporal condition, one that cuts across natural cycles and human systems. Rather than simplifying wildfire data into digestible visuals, The Fire We Share reimagines it as a textured, wounded archive: embodied, relational, and radically ethical.

2025-10-12T23:07:48Z Accepted to VISAP'25 (IEEE VIS Arts Program), held alongside IEEE VIS 2025
Chen Wang, Mengtan Lin

http://arxiv.org/abs/2510.10751v1 MATStruct: High-Quality Medial Mesh Computation via Structure-aware Variational Optimization 2025-10-12T18:39:36Z

We propose a novel optimization framework for computing the medial axis transform that simultaneously preserves the medial structure and ensures high medial mesh quality. The medial structure, consisting of interconnected sheets, seams, and junctions, provides a natural volumetric decomposition of a 3D shape. Our method introduces a structure-aware, particle-based optimization pipeline guided by the restricted power diagram (RPD), which partitions the input volume into convex cells whose dual encodes the connectivity of the medial mesh. Structure-awareness is enforced through a spherical quadratic error metric (SQEM) projection that constrains the movement of medial spheres, while a Gaussian kernel energy encourages an even spatial distribution. Compared to feature-preserving methods such as MATFP and MATTopo, our approach produces cleaner and more accurate medial structures with significantly improved mesh quality. In contrast to voxel-based, point-cloud-based, and variational methods, our framework is the first to integrate structural awareness into the optimization process, yielding medial meshes with superior geometric fidelity, topological correctness, and explicit structural decomposition.
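
The Gaussian kernel energy term can be illustrated with a short sketch. It minimizes a pairwise Gaussian repulsion energy over a set of sphere centers by plain gradient descent, so lower energy means the centers are spread more evenly. The kernel width `sigma`, the step size, and the function names are assumptions. The sketch omits the restricted power diagram and the SQEM projection that MATStruct uses to keep spheres on the medial structure.

```python
import math

def kernel_energy(points, sigma):
    """E = sum over pairs i < j of exp(-|x_i - x_j|^2 / (2 sigma^2))."""
    e = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            e += math.exp(-d2 / (2.0 * sigma ** 2))
    return e

def kernel_gradient(points, sigma):
    """dE/dx_i = sum_{j != i} -(x_i - x_j) / sigma^2 * exp(-|x_i - x_j|^2 / (2 sigma^2))."""
    grads = [[0.0] * len(p) for p in points]
    for i, pi in enumerate(points):
        for j, pj in enumerate(points):
            if i == j:
                continue
            diff = [a - b for a, b in zip(pi, pj)]
            w = math.exp(-sum(d * d for d in diff) / (2.0 * sigma ** 2))
            for k, d in enumerate(diff):
                grads[i][k] -= d / sigma ** 2 * w
    return grads

def relax(points, sigma, step, iters):
    # Plain gradient descent on the energy. A structure-aware pipeline would
    # re-project each sphere (e.g., via its SQEM constraint) after every step.
    pts = [list(p) for p in points]
    for _ in range(iters):
        g = kernel_gradient(pts, sigma)
        pts = [[x - step * gx for x, gx in zip(p, gp)] for p, gp in zip(pts, g)]
    return pts
```

Each descent step pushes nearby centers apart, and the push weakens with distance at a rate set by `sigma`.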

2025-10-12T18:39:36Z
Ningna Wang, Rui Xu, Yibo Yin, Zichun Zhong, Taku Komura, Wenping Wang, Xiaohu Guo
DOI: 10.1145/3757377.3763840

http://arxiv.org/abs/2510.10715v1 VLM-Guided Adaptive Negative Prompting for Creative Generation 2025-10-12T17:34:59Z

Creative generation is the synthesis of new, surprising, and valuable samples that reflect user intent yet cannot be envisioned in advance. This task aims to extend human imagination, enabling the discovery of visual concepts that exist in the unexplored spaces between familiar domains. While text-to-image diffusion models excel at rendering photorealistic scenes that faithfully match user prompts, they still struggle to generate genuinely novel content. Existing approaches to enhance generative creativity either rely on interpolation of image features, which restricts exploration to predefined categories, or require time-intensive procedures such as embedding optimization or model fine-tuning. We propose VLM-Guided Adaptive Negative-Prompting, a training-free, inference-time method that promotes creative image generation while preserving the validity of the generated object. Our approach utilizes a vision-language model (VLM) that analyzes intermediate outputs of the generation process and adaptively steers it away from conventional visual concepts, encouraging the emergence of novel and surprising outputs. We evaluate creativity through both novelty and validity, using statistical metrics in the CLIP embedding space. Through extensive experiments, we show consistent gains in creative novelty with negligible computational overhead. Moreover, unlike existing methods that primarily generate single objects, our approach extends to complex scenarios, such as generating coherent sets of creative objects and preserving creativity within elaborate compositional prompts. Our method integrates seamlessly into existing diffusion pipelines, offering a practical route to producing creative outputs that venture beyond the constraints of textual descriptions.
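
Here is a skeleton of the adaptive negative-prompting loop, under stated assumptions. At each step a vision-language model names the conventional concept it recognizes in the intermediate result. That name is appended to an accumulating negative prompt, and classifier-free guidance uses the negative prompt in place of the unconditional branch. `predict_noise`, `ask_vlm`, and the plain-list "latents" are hypothetical stand-ins. The abstract does not specify the method's actual VLM queries, their timing, or the sampler.

```python
def negative_cfg(eps_neg, eps_pos, scale):
    # Classifier-free guidance with the negative prompt in place of the empty prompt:
    # eps = eps_neg + scale * (eps_pos - eps_neg)
    return [n + scale * (p - n) for n, p in zip(eps_neg, eps_pos)]

def sample_with_adaptive_negatives(x, prompt, steps, predict_noise, ask_vlm,
                                   scale=7.5, step_size=0.1):
    """Toy denoising loop that grows a negative prompt from VLM feedback.

    predict_noise(x, t, text) -> list[float]   hypothetical denoiser wrapper
    ask_vlm(x, t) -> str                       hypothetical VLM that names the
                                               conventional concept it sees in x
    """
    negatives = []
    for t in range(steps):
        concept = ask_vlm(x, t)
        if concept and concept not in negatives:
            negatives.append(concept)  # keep earlier concepts suppressed as well
        eps_pos = predict_noise(x, t, prompt)
        eps_neg = predict_noise(x, t, ", ".join(negatives))
        eps = negative_cfg(eps_neg, eps_pos, scale)
        x = [xi - step_size * e for xi, e in zip(x, eps)]  # stand-in for a real scheduler step
    return x, negatives
```

Accumulating concepts, rather than replacing the negative prompt each step, keeps the sampler from drifting back to a concept it was already steered away from.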

2025-10-12T17:34:59Z Project page: https://shelley-golan.github.io/VLM-Guided-Creative-Generation/
Shelly Golan, Yotam Nitzan, Zongze Wu, Or Patashnik

http://arxiv.org/abs/2509.17974v2 A Comparative Study of Different Edit Distance-Based Methods for Feature Tracking using Merge Trees on Time-Varying Scalar Fields 2025-10-12T14:01:55Z

Feature tracking in time-varying scalar fields is a fundamental task in scientific computing. Topological descriptors, which summarize important features of data, have proved to be viable tools for this task. The merge tree is a topological descriptor that captures how the connected components of the sub- or superlevel sets of a scalar field appear and merge. Edit distances between merge trees play a vital role in effective temporal data tracking. Existing methods to compute them fall into two main classes: those that depend on the branch decomposition and those that do not. These two classes represent the most prominent approaches for producing tracking results. In this paper, we compare four different merge tree edit distance-based methods for feature tracking. We demonstrate that these methods yield distinct results on both analytical and real-world data sets. Furthermore, we investigate how these results vary and identify the factors that influence them. Our experiments reveal significant differences in tracked features over time, even among techniques within the same category.
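
A minimal sketch makes the merge tree concrete. It uses a 1D scalar field because that keeps the neighborhood structure trivial; real fields live on 2D or 3D meshes. The sketch sweeps sublevel sets with union-find and records each branch as a (birth, death) pair under the elder rule, one common way to obtain a branch decomposition. The function name is hypothetical. Computing an edit distance between two such trees, which is what the compared tracking methods do, is beyond this sketch.

```python
def merge_tree_branches(values):
    """Branches of the sublevel-set merge tree of a 1D scalar field.

    Vertices are swept in increasing order of value. Each local minimum starts a
    component; when two components meet at a vertex, the one with the higher
    minimum (the younger branch) ends there (the elder rule). Returns
    (birth, death) pairs; the global minimum's branch never dies.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: (values[i], i))
    rank = {i: r for r, i in enumerate(order)}   # tie-safe "older than" comparison
    parent, birth = {}, {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    branches = []
    for i in order:
        parent[i], birth[i] = i, i
        for j in (i - 1, i + 1):
            if j not in parent:           # out of range or not swept yet
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            old, young = (ri, rj) if rank[birth[ri]] < rank[birth[rj]] else (rj, ri)
            if birth[young] != i:         # skip the zero-length branch of a regular vertex
                branches.append((values[birth[young]], values[i]))
            parent[young] = old
    root = find(order[0])
    branches.append((values[birth[root]], float("inf")))
    return branches
```

For `[3, 1, 4, 0, 2]`, the component born at value 1 merges into the component of the global minimum 0 at the saddle value 4. This gives the branches `(1, 4)` and `(0, inf)`.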

2025-09-22T16:22:47Z
Son Le Thanh, Tino Weinkauf

http://arxiv.org/abs/2411.14384v5 Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction 2025-10-11T17:53:18Z

Existing feedforward image-to-3D methods mainly rely on 2D multi-view diffusion models that cannot guarantee 3D consistency. These methods easily collapse when the prompt view direction changes and mainly handle object-centric cases. In this paper, we propose a novel single-stage 3D diffusion model, DiffusionGS, for object generation and scene reconstruction from a single view. DiffusionGS directly outputs 3D Gaussian point clouds at each timestep to enforce view consistency, which allows the model to generate robustly given prompt views from any direction, beyond object-centric inputs. In addition, to improve the capability and generality of DiffusionGS, we scale up 3D training data by developing a scene-object mixed training strategy. Experiments show that, without a depth estimator, DiffusionGS improves PSNR/FID over state-of-the-art methods by 2.20 dB/23.25 on objects and by 1.34 dB/19.16 on scenes. Our method is also over 5$\times$ faster ($\sim$6 s on an A100 GPU). Our project page at https://caiyuanhao1998.github.io/project/DiffusionGS/ shows video and interactive results. The code and models are publicly available at https://github.com/caiyuanhao1998/Open-DiffusionGS
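
PSNR, the first metric quoted above, is a logarithmic function of the mean squared error (MSE). A gain of Δ dB therefore means the MSE was divided by 10^(Δ/10). The following is a minimal reference computation, assuming pixel intensities normalized to [0, 1] and given as flat lists.

```python
import math

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    if not pred or len(pred) != len(target):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

def mse_ratio_for_gain(delta_db):
    # Factor by which the MSE is divided for a PSNR gain of delta_db decibels.
    return 10.0 ** (delta_db / 10.0)
```

By this relation, the reported 2.20 dB gain on objects corresponds to an MSE about 1.66 times lower (roughly 40% less). The 1.34 dB gain on scenes corresponds to an MSE about 1.36 times lower (roughly 27% less). FID has no such closed-form link to per-pixel error.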

2024-11-21T18:21:24Z ICCV 2025; a novel one-stage 3DGS-based diffusion model for 3D object generation and scene reconstruction from a single view in ~6 seconds
Yuanhao Cai, He Zhang, Kai Zhang, Yixun Liang, Mengwei Ren, Fujun Luan, Qing Liu, Soo Ye Kim, Jianming Zhang, Zhifei Zhang, Yuqian Zhou, Yulun Zhang, Xiaokang Yang, Zhe Lin, Alan Yuille

http://arxiv.org/abs/2510.10256v1 Unlocking Thickness Modeling for Codimensional Contact Simulation 2025-10-11T15:25:05Z

In this work, we analyze and address a fundamental restriction that blocks the reliable application of codimensional yarn-level and shell models with thickness to simulating real-world woven and knit fabrics. As discretizations are refined toward practical and accurate physical modeling, such models can generate non-physical contact forces with stencil-neighboring elements in the simulation mesh, leading to severe locking artifacts. Although this restriction is not well documented in the literature, it has so far been addressed with two alternatives, each with undesirable tradeoffs. One option is to restrict the mesh to coarse resolutions; however, this eliminates the possibility of simulations at accurate (and consistent) resolutions across real-world material variations. A second alternative instead culls the contact pairs that can create such locking forces in the first place. This relaxes resolution restrictions but compromises robustness: culling can and will generate unacceptable and unpredictable geometric intersections and tunneling that destroy weaving and knitting structures and cause unrecoverable pull-throughs. We address these challenges to simulating real-world materials with a new and practical contact-processing model for thickened codimensional simulation that removes resolution restrictions while guaranteeing contact-locking-free, non-intersecting simulations. We demonstrate the application of our model across a wide range of previously unavailable simulation scenarios, with real-world yarn and fabric material parameters and patterns, challenging simulation conditions and mesh resolutions, and both rod and shell models, integrated with the IPC barrier.
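
The locking problem can be seen with the standard log barrier from Incremental Potential Contact (IPC), offset by material thickness. This is a minimal sketch of that well-known barrier, not the paper's new contact-processing model. The thickness offset applied to centerline distance and the function names are illustrative assumptions.

```python
import math

def ipc_barrier(d, dhat):
    """Log barrier from Incremental Potential Contact (IPC):
    b(d) = -(d - dhat)^2 * ln(d / dhat) for 0 < d < dhat, and 0 for d >= dhat."""
    if d >= dhat:
        return 0.0
    if d <= 0.0:
        return math.inf  # contact gap closed: configuration is infeasible
    return -((d - dhat) ** 2) * math.log(d / dhat)

def thickened_barrier(centerline_distance, thickness, dhat):
    # Measure the gap between the thickened surfaces rather than the centerlines.
    return ipc_barrier(centerline_distance - thickness, dhat)
```

Two adjacent segments of the same yarn share a vertex, so their centerline distance is 0 at rest. `thickened_barrier(0.0, thickness, dhat)` is therefore already infinite, which is the stencil-neighbor locking described above. Culling such pairs removes the locking, but it also removes protection against genuine intersections.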

2025-10-11T15:25:05Z
Gonzalo Gomez-Nogales, Zhen Chen, Rosalie Martin, Elena Garces, Danny M. Kaufman

http://arxiv.org/abs/2505.22313v2 Large-Area Fabrication-Aware Computational Diffractive Optics 2025-10-11T14:23:07Z

Differentiable optics, an emerging paradigm that jointly optimizes optics and (optionally) image processing algorithms, has made innovative optical designs possible across a broad range of applications. Many of these systems use diffractive optical elements (DOEs) for holography, PSF engineering, or wavefront shaping. Existing approaches have, however, mostly remained limited to laboratory prototypes, owing to a large quality gap between simulated and manufactured devices. We aim to lift the fundamental technical barriers to the practical use of learned diffractive optical systems. To this end, we propose a fabrication-aware design pipeline for diffractive optics fabricated by direct-write grayscale lithography followed by nano-imprinting replication, which is directly suited to inexpensive mass production of large-area designs. We propose a super-resolved neural lithography model that can accurately predict the 3D geometry generated by the fabrication process. This model can be seamlessly integrated into existing differentiable optics frameworks, enabling fabrication-aware, end-to-end optimization of computational optical systems. To tackle the computational challenges, we also devise a tensor-parallel compute framework centered on distributing large-scale FFT computation across many GPUs. As a result, we demonstrate large-scale diffractive optics designs up to 32.16 mm $\times$ 21.44 mm, simulated on grids of up to 128,640 by 85,760 feature points. We find adequate agreement between simulation and fabricated prototypes for applications such as holography and PSF engineering. We also achieve high image quality from an imaging system consisting of only a single DOE, with images processed only by a Wiener filter using the simulated PSF. We believe our findings lift the fabrication limitations for real-world applications of diffractive optics and differentiable optical design.
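
The quoted design size and grid size imply the sampling pitch and memory footprint that motivate a multi-GPU FFT. The sketch below does that arithmetic. It assumes one field is stored as single-precision complex numbers (8 bytes per sample) and that rows are split evenly across GPUs, a slab decomposition that is a common way to distribute a 2D FFT. Neither detail is taken from the paper, and `grid_budget` is a hypothetical name.

```python
def grid_budget(width_mm, height_mm, nx, ny, bytes_per_sample=8, num_gpus=1):
    """Sample pitch and memory for one complex field sampled on an nx-by-ny grid.

    bytes_per_sample=8 assumes single-precision complex values (complex64).
    The per-GPU figure assumes rows are divided evenly across GPUs (slabs).
    """
    pitch_x_um = width_mm * 1000.0 / nx
    pitch_y_um = height_mm * 1000.0 / ny
    samples = nx * ny
    total_gb = samples * bytes_per_sample / 1e9
    rows_per_gpu = (ny + num_gpus - 1) // num_gpus      # ceiling division
    per_gpu_gb = rows_per_gpu * nx * bytes_per_sample / 1e9
    return pitch_x_um, pitch_y_um, samples, total_gb, per_gpu_gb
```

`grid_budget(32.16, 21.44, 128_640, 85_760, num_gpus=8)` gives a pitch of 0.25 µm along both axes and about 1.1 × 10^10 samples. That is roughly 88 GB for a single complex64 field, more than one 80 GB GPU holds; split across eight GPUs, each slab needs about 11 GB. A slab-decomposed 2D FFT runs 1D FFTs along each GPU's rows, transposes the data with an all-to-all exchange, and finishes with 1D FFTs along the columns.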

2025-05-28T12:56:46Z To appear in SIGGRAPH Asia 2025 and ACM Transactions on Graphics. Code is available at https://github.com/Vandermode/LAFA
Kaixuan Wei, Hector A. Jimenez-Romero, Hadi Amata, Jipeng Sun, Qiang Fu, Felix Heide, Wolfgang Heidrich
DOI: 10.1145/3763358

http://arxiv.org/abs/2505.24796v2 TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores 2025-10-11T09:58:24Z

3D Gaussian Splatting (3DGS) renders pixels by rasterizing Gaussian primitives, and conditional alpha-blending dominates the computational cost of the rendering pipeline. This paper proposes TC-GS, an algorithm-independent universal module that expands the applicability of Tensor Core Units (TCUs) to 3DGS, leading to substantial speedups and seamless integration into existing 3DGS optimization frameworks. The key innovation lies in mapping alpha computation to matrix multiplication, fully utilizing TCUs that otherwise sit idle in existing 3DGS implementations. TC-GS provides plug-and-play acceleration for existing top-tier acceleration algorithms and integrates seamlessly with rendering pipeline designs such as Gaussian compression and redundancy elimination algorithms. Additionally, we introduce a global-to-local coordinate transformation to mitigate rounding errors from the quadratic terms of pixel coordinates caused by half-precision Tensor Core computation. Extensive experiments demonstrate that our method maintains rendering quality while providing an additional 2.18x speedup over existing Gaussian acceleration algorithms, thereby achieving a total acceleration of up to 5.6x.
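
Expanding the Gaussian exponent shows why alpha computation maps to matrix multiplication. The exponent is a quadratic in the pixel coordinates, so it equals a dot product between six pixel monomials and six per-Gaussian coefficients. All pixel-Gaussian exponents of a tile then come out of one (pixels × 6) by (6 × Gaussians) product. The sketch below uses plain Python floats in place of half-precision Tensor Core instructions. The feature ordering and function names are assumptions rather than TC-GS's exact layout, and the 0.99 alpha clamp follows the reference 3DGS rasterizer.

```python
import math

def pixel_features(x, y):
    # Quadratic monomials of the pixel position; one row of the pixel matrix.
    return [x * x, x * y, y * y, x, y, 1.0]

def gaussian_coeffs(mx, my, a, b, c):
    """Coefficients k such that dot(pixel_features(x, y), k) equals the 3DGS exponent
    -0.5 * (a*dx^2 + 2*b*dx*dy + c*dy^2), with dx = x - mx, dy = y - my and
    (a, b, c) the entries of the inverse 2D covariance (the "conic")."""
    return [-0.5 * a, -b, -0.5 * c,
            a * mx + b * my,
            b * mx + c * my,
            -0.5 * (a * mx * mx + 2.0 * b * mx * my + c * my * my)]

def alpha_matrix(pixels, gaussians, origin=(0.0, 0.0)):
    """Alpha for every (pixel, Gaussian) pair from a single matrix product.

    pixels: list of (x, y); gaussians: list of (mx, my, a, b, c, opacity).
    Both are shifted to a local origin (e.g., the tile corner) first.
    """
    ox, oy = origin
    P = [pixel_features(x - ox, y - oy) for x, y in pixels]          # N x 6
    K = [gaussian_coeffs(mx - ox, my - oy, a, b, c)                  # M x 6
         for mx, my, a, b, c, _ in gaussians]
    # power[n][m] = P[n] . K[m], i.e. the product P (N x 6) times K^T (6 x M)
    power = [[sum(p * k for p, k in zip(prow, krow)) for krow in K] for prow in P]
    return [[min(0.99, g[5] * math.exp(e)) for e, g in zip(row, gaussians)]
            for row in power]
```

Subtracting the tile origin does not change the result in exact arithmetic. It does keep the x*x, x*y, and y*y entries on the order of the tile size squared instead of the image size squared, which is what limits rounding error in half precision.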

2025-05-30T16:58:18Z 15 pages, 6 figures
Zimu Liao, Jifeng Ding, Siwei Cui, Ruixuan Gong, Boni Hu, Yi Wang, Hengjie Li, Xingcheng Zhang, Hui Wang, Rong Fu

http://arxiv.org/abs/2409.07148v6 Jump Restore Light Transport 2025-10-10T19:09:02Z

Markov chain Monte Carlo (MCMC) algorithms are indispensable when sampling from a complex, high-dimensional distribution with conventional methods is intractable. Even though MCMC is a powerful tool, it is also hard to control and tune in practice. Simultaneously achieving rapid local exploration of the state space and efficient global discovery of the target distribution is a challenging task. In this work, we introduce a novel continuous-time MCMC formulation to the computer science community. Generalizing existing work from the statistics community, we propose a novel framework for adjusting an arbitrary family of Markov processes, used only for local exploration of the state space, into an overall process that is invariant with respect to a target distribution. To demonstrate the potential of our framework, we focus on a simple yet insightful application in light transport simulation. As a by-product, we introduce continuous-time MCMC sampling to the computer graphics community. We show how any existing MCMC-based light transport algorithm can be seamlessly integrated into our framework. We show empirically and prove theoretically that the integrated version is superior to the ordinary algorithm. In fact, our approach converts any existing algorithm into a highly parallelizable variant with shorter running time, smaller error, and less variance.
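
For contrast with the continuous-time framework, here is the kind of ordinary discrete-time baseline the abstract alludes to. It is a Metropolis sampler on the unit interval that mixes global proposals (independent uniform samples) with local ones (small perturbations), in the spirit of primary-sample-space Metropolis light transport. This is a minimal sketch under those assumptions, not the Jump Restore process. The target density, the mixing probability `p_large`, and the step size are illustrative choices.

```python
import math
import random

def bimodal(x):
    # Two narrow bumps at 0.2 and 0.8: hard to cross between using local moves alone.
    return math.exp(-((x - 0.2) / 0.05) ** 2) + math.exp(-((x - 0.8) / 0.05) ** 2)

def metropolis(target, n_samples, p_large=0.3, small_sigma=0.05, seed=0):
    """Metropolis sampling of an unnormalized density `target` on [0, 1).

    With probability p_large the proposal is a fresh uniform sample (global
    discovery); otherwise it is a small wrapped Gaussian step (local
    exploration). Both proposals are symmetric, so the acceptance
    probability is min(1, target(y) / target(x)).
    """
    rng = random.Random(seed)
    x = rng.random()
    fx = target(x)
    samples, accepted = [], 0
    for _ in range(n_samples):
        if rng.random() < p_large:
            y = rng.random()
        else:
            y = (x + rng.gauss(0.0, small_sigma)) % 1.0
        fy = target(y)
        if fx == 0.0 or rng.random() < fy / fx:
            x, fx = y, fy
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples
```

Setting `p_large` to 0 leaves only local moves, and the chain then rarely crosses between the two bumps. This is the local-versus-global tension the abstract describes.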

2024-09-11T09:51:21Z
Sascha Holl, Gurprit Singh, Hans-Peter Seidel
DOI: 10.1145/3763286

http://arxiv.org/abs/2510.09570v1 Differential Analysis of Pseudo Haptic Feedback: Novel Comparative Study of Visual and Auditory Cue Integration for Psychophysical Evaluation 2025-10-10T17:22:41Z

Pseudo-haptics exploit carefully crafted visual or auditory cues to trick the brain into "feeling" forces that are never physically applied, offering a low-cost alternative to traditional haptic hardware. Here, we present a comparative psychophysical study that quantifies how visual and auditory stimuli combine to evoke pseudo-haptic pressure sensations on a commodity tablet. Using a Unity-based Rollball game, participants (n = 4) guided a virtual ball across three textured terrains while their finger forces were captured in real time with a Robotous RFT40 force-torque sensor. Each terrain was paired with a distinct rolling-sound profile spanning 440 Hz to 4.7 kHz, 440 Hz to 13.1 kHz, or 440 Hz to 8.9 kHz; crevice collisions triggered additional "knocking" bursts to heighten realism. Average tactile forces increased systematically with cue intensity: 0.40 N, 0.79 N, and 0.88 N for visual-only trials and 0.41 N, 0.81 N, and 0.90 N for audio-only trials on Terrains 1 to 3, respectively. Higher audio frequencies and denser visual textures both elicited stronger muscle activation, and their combination further reduced the force needed to perceive surface changes, confirming multisensory integration. These results demonstrate that consumer-grade isometric devices can reliably induce and measure graded pseudo-haptic feedback without specialized actuators, opening a path toward affordable rehabilitation tools, training simulators, and assistive interfaces.

2025-10-10T17:22:41Z 17 pages, 9 figures
Nishant Gautam, Somya Sharma, Peter Corcoran, Kaspar Althoefer

http://arxiv.org/abs/2510.09489v1 Two-Stage Gaussian Splatting Optimization for Outdoor Scene Reconstruction 2025-10-10T15:52:23Z

Outdoor scene reconstruction remains challenging due to the stark contrast between well-textured, nearby regions and distant backgrounds dominated by low detail, uneven illumination, and sky effects. We introduce a two-stage Gaussian Splatting framework that explicitly separates and optimizes these regions, yielding higher-fidelity novel view synthesis. In stage one, background primitives are initialized within a spherical shell and optimized using a loss that combines a background-only photometric term with two geometric regularizers: one constraining Gaussians to remain inside the shell, and another aligning them with local tangential planes. In stage two, foreground Gaussians are initialized from a Structure-from-Motion reconstruction, added and refined using the standard rendering loss, while the background set remains fixed but contributes to the final image formation. Experiments on diverse outdoor datasets show that our method reduces background artifacts and improves perceptual quality compared to state-of-the-art baselines. Moreover, the explicit background separation enables automatic, object-free environment map estimation, opening new possibilities for photorealistic outdoor rendering and mixed-reality applications.
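
The two geometric regularizers of stage one can be sketched as per-Gaussian penalties. The sketch assumes the shell is centered at the origin with radii `r_min` and `r_max`, and that each Gaussian's normal is its shortest scale axis. The quadratic shell penalty, the `1 - |cos|` alignment term, the weights, and all names are hypothetical choices; the paper's exact loss terms may differ.

```python
import math

def shell_penalty(center, r_min, r_max):
    # Zero inside the shell, growing quadratically with distance outside it.
    r = math.sqrt(sum(c * c for c in center))
    return max(0.0, r_min - r) ** 2 + max(0.0, r - r_max) ** 2

def tangent_penalty(center, normal):
    """1 - |cos| between the Gaussian's normal and the radial direction; zero when
    the Gaussian lies flat in the sphere's tangent plane (center must be nonzero)."""
    r = math.sqrt(sum(c * c for c in center))
    n = math.sqrt(sum(v * v for v in normal))
    cos = sum(c * v for c, v in zip(center, normal)) / (r * n)
    return 1.0 - abs(cos)

def background_regularizer(gaussians, r_min, r_max, w_shell=1.0, w_tan=0.1):
    # gaussians: list of (center, normal) pairs; returns the mean weighted penalty.
    total = sum(w_shell * shell_penalty(c, r_min, r_max) + w_tan * tangent_penalty(c, n)
                for c, n in gaussians)
    return total / len(gaussians)
```

In training, this term would be added to the background-only photometric loss during stage one, with the weights tuned per dataset.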

2025-10-10T15:52:23Z
Deborah Pintani, Ariel Caputo, Noah Lewis, Marc Stamminger, Fabio Pellacini, Andrea Giachetti

http://arxiv.org/abs/2504.04831v2 SMF: Template-free and Rig-free Animation Transfer using Kinetic Codes 2025-10-10T15:19:38Z

Animation retargeting applies a sparse motion description (e.g., keypoint sequences) to a character mesh to produce a semantically plausible and temporally coherent full-body mesh sequence. Existing approaches come with restrictions: they require access to template-based shape priors or artist-designed deformation rigs, suffer from limited generalization to unseen motions and/or shapes, or exhibit motion jitter. We propose Self-supervised Motion Fields (SMF), a self-supervised framework that is trained with only sparse motion representations, without requiring dataset-specific annotations, templates, or rigs. At the heart of our method are Kinetic Codes, a novel autoencoder-based sparse motion encoding that exposes a semantically rich latent space, simplifying large-scale training. Our architecture comprises dedicated spatial and temporal gradient predictors, which are jointly trained in an end-to-end fashion. The combined network, regularized by the Kinetic Codes' latent space, generalizes well to both unseen shapes and new motions. We evaluated our method on unseen motions sampled from AMASS, D4D, Mixamo, and raw monocular video for animation transfer on various characters with varying shapes and topology. We report a new state of the art on the AMASS dataset for generalization to unseen motion. Code, weights, and supplementary material are available on the project webpage at https://motionfields.github.io/

2025-04-07T08:42:52Z SIGGRAPH Asia 2025 [ACM Transactions on Graphics] | Project website: https://motionfields.github.io/
Sanjeev Muralikrishnan, Niladri Shekhar Dutt, Niloy J. Mitra
DOI: 10.1145/3763309

http://arxiv.org/abs/2510.09336v1 Quantum Trigonometric Bézier Curves 2025-10-10T12:42:37Z

To construct quantum trigonometric Bézier curves with a shape parameter, we introduce a one-parameter family of trigonometric Bernstein basis functions. We study the total positivity of the basis functions to analyze the shape-preserving properties of the quantum trigonometric Bézier curves. We also show that quantum trigonometric Bézier curves can be evaluated by two different recursive evaluation algorithms. Finally, we define the rational counterpart of quantum trigonometric Bézier curves and show that rational quantum trigonometric Bézier curves possess desirable shape-preserving properties.
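
The abstract does not give the trigonometric basis functions. The following sketch therefore shows the classical polynomial q-Bernstein basis of Phillips, the standard "quantum" (q-analogue) generalization of the Bernstein basis, to illustrate how a q-parameter enters a Bézier-type construction. It is not the one-parameter trigonometric family introduced in the paper, and the function names are hypothetical. For q = 1 it reduces to the ordinary Bernstein basis.

```python
def q_int(k, q):
    # q-integer [k]_q = 1 + q + ... + q^(k-1); equals k when q == 1.
    return sum(q ** j for j in range(k))

def q_binom(n, k, q):
    # Gaussian (q-)binomial coefficient [n choose k]_q = prod_{j=1..k} [n-k+j]_q / [j]_q.
    num = den = 1.0
    for j in range(1, k + 1):
        num *= q_int(n - k + j, q)
        den *= q_int(j, q)
    return num / den

def q_bernstein(n, i, t, q):
    """Phillips q-Bernstein basis: [n choose i]_q * t^i * prod_{s=0}^{n-i-1} (1 - q^s t)."""
    value = q_binom(n, i, q) * t ** i
    for s in range(n - i):
        value *= 1.0 - q ** s * t
    return value

def q_bezier_point(ctrl, t, q):
    # Point on the q-Bezier curve with control points ctrl (tuples of equal length).
    n = len(ctrl) - 1
    weights = [q_bernstein(n, i, t, q) for i in range(n + 1)]
    return tuple(sum(w * p[d] for w, p in zip(weights, ctrl)) for d in range(len(ctrl[0])))
```

For 0 < q ≤ 1 and t in [0, 1], these functions are nonnegative and sum to one. The curve therefore stays in the convex hull of its control points and interpolates its first and last control points.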

2025-10-10T12:42:37Z
Çetin Dişibüyük

http://arxiv.org/abs/2510.08166v2 Variable-Rate Texture Compression: Real-Time Rendering with JPEG 2025-10-10T12:05:42Z

Although variable-rate compressed image formats such as JPEG are widely used to encode images efficiently, they have not found their way into real-time rendering because of special requirements such as random access to individual texels. In this paper, we investigate the feasibility of variable-rate texture compression on modern GPUs using the JPEG format, and how it compares to the GPU-friendly fixed-rate compression approaches BC1 and ASTC. Using a deferred rendering pipeline, we identify the subset of blocks needed for a given frame, decode these, and colorize the framebuffer's pixels. Despite the additional $\sim$0.17 bits per pixel that our approach requires, JPEG achieves significantly better quality and compression rates than BC1 and, depending on the type of image, outperforms or competes with ASTC. The JPEG rendering pipeline increases rendering time by less than 0.3 ms on an RTX 4090, demonstrating that sophisticated variable-rate compression schemes are feasible on modern GPUs, even in VR. Source code and data sets are available at: https://github.com/elias1518693/jpeg_textures
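
The block-identification step can be sketched as a pass over the G-buffer. Each visible pixel's texture coordinate is mapped to a texel, and the set of 8×8 JPEG blocks containing those texels is collected, so only those blocks need decoding for the frame. The sketch makes several simplifying assumptions: nearest-texel sampling, a single mip level, 8×8 blocks without chroma subsampling, and hypothetical function names. Bilinear filtering or 4:2:0 subsampling (16×16 MCUs) would enlarge the footprint.

```python
def needed_blocks(uvs, tex_w, tex_h, block=8):
    """Set of (block_x, block_y) indices touched by nearest-texel lookups.

    uvs: iterable of (u, v) texture coordinates in [0, 1], one per visible pixel.
    """
    blocks = set()
    for u, v in uvs:
        tx = min(max(int(u * tex_w), 0), tex_w - 1)   # clamp to the texture edge
        ty = min(max(int(v * tex_h), 0), tex_h - 1)
        blocks.add((tx // block, ty // block))
    return blocks

def fraction_decoded(blocks, tex_w, tex_h, block=8):
    # Share of the texture's blocks that must be decoded this frame.
    blocks_x = (tex_w + block - 1) // block
    blocks_y = (tex_h + block - 1) // block
    return len(blocks) / (blocks_x * blocks_y)
```

For reference, BC1 always spends 64 bits per 4×4 block (4 bits per pixel), whether or not a block is visible. The JPEG approach pays its variable bit rate plus the roughly 0.17 bits per pixel of overhead reported above, and decodes only the blocks that visible pixels touch.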

2025-10-09T12:51:40Z Removed an incorrect affiliation caused by Overleaf-to-arXiv recompilation configuration issues
Elias Kristmann, Markus Schütz, Michael Wimmer