https://arxiv.org/api/N2Jnw3tOkL5kxfyapVsdPisJLSg 2026-06-25T18:01:56Z 9383 1305 15 http://arxiv.org/abs/2602.17044v1 InstantRetouch: Personalized Image Retouching without Test-time Fine-tuning Using an Asymmetric Auto-Encoder 2025-11-12T15:20:04Z

Personalized image retouching aims to adapt retouching style of individual users from reference examples, but existing methods often require user-specific fine-tuning or fail to generalize effectively. To address these challenges, we introduce $\textbf{InstantRetouch}$, a general framework for personalized image retouching that instantly adapts to user retouching styles without any test-time fine-tuning. It employs an $\textit{asymmetric auto-encoder}$ to encode the retouching style from paired examples into a content disentangled latent representation that enables faithful transfer of the retouching style to new images. To adaptively apply the encoded retouching style to new images, we further propose $\textit{retrieval-augmented retouching}$ (RAR), which retrieves and aggregates style latents from reference pairs most similar in content to the query image. With these components, $\textbf{InstantRetouch}$ enables superior and generic content-aware retouching personalization across diverse scenarios, including single-reference, multi-reference, and mixed-style setups, while also generalizing out of the box to photorealistic style transfer.

2025-11-12T15:20:04Z 19 pages, 11 figures Temesgen Muruts Weldengus Binnan Liu Fei Kou Youwei Lyu Jinwei Chen Qingnan Fan Changqing Zou http://arxiv.org/abs/2511.04029v3 Faithful Contouring: Near-Lossless 3D Voxel Representation Free from Iso-surface 2025-11-12T13:09:40Z

Accurate and efficient voxelized representations of 3D meshes are the foundation of 3D reconstruction and generation. However, existing representations based on iso-surface heavily rely on water-tightening or rendering optimization, which inevitably compromise geometric fidelity. We propose Faithful Contouring, a sparse voxelized representation that supports 2048+ resolutions for arbitrary meshes, requiring neither converting meshes to field functions nor extracting the isosurface during remeshing. It achieves near-lossless fidelity by preserving sharpness and internal structures, even for challenging cases with complex geometry and topology. The proposed method also shows flexibility for texturing, manipulation, and editing. Beyond representation, we design a dual-mode autoencoder for Faithful Contouring, enabling scalable and detail-preserving shape reconstruction. Extensive experiments show that Faithful Contouring surpasses existing methods in accuracy and efficiency for both representation and reconstruction. For direct representation, it achieves distance errors at the $10^{-5}$ level; for mesh reconstruction, it yields a 93\% reduction in Chamfer Distance and a 35\% improvement in F-score over strong baselines, confirming superior fidelity as a representation for 3D learning tasks.

2025-11-06T03:56:12Z Yihao Luo Xianglong He Chuanyu Pan Yiwen Chen Jiaqi Wu Yangguang Li Wanli Ouyang Yuanming Hu Guang Yang ChoonHwai Yap http://arxiv.org/abs/2411.16602v2 Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models 2025-11-12T11:28:10Z

Scalable Vector Graphics (SVG) has become the de facto standard for vector graphics in digital design, offering resolution independence and precise control over individual elements. Despite their advantages, creating high-quality SVG content remains challenging, as it demands technical expertise with professional editing software and a considerable time investment to craft complex shapes. Recent text-to-SVG generation methods aim to make vector graphics creation more accessible, but they still encounter limitations in shape regularity, generalization ability, and expressiveness. To address these challenges, we introduce Chat2SVG, a hybrid framework that combines the strengths of Large Language Models (LLMs) and image diffusion models for text-to-SVG generation. Our approach first uses an LLM to generate semantically meaningful SVG templates from basic geometric primitives. Guided by image diffusion models, a dual-stage optimization pipeline refines paths in latent space and adjusts point coordinates to enhance geometric complexity. Extensive experiments show that Chat2SVG outperforms existing methods in visual fidelity, path regularity, and semantic alignment. Additionally, our system enables intuitive editing through natural language instructions, making professional vector graphics creation accessible to all users.

2024-11-25T17:31:57Z Project Page: https://chat2svg.github.io/ Ronghuan Wu Wanchao Su Jing Liao http://arxiv.org/abs/2602.17040v1 Fuse3D: Generating 3D Assets Controlled by Multi-Image Fusion 2025-11-12T08:21:38Z

Recently, generating 3D assets with the control of condition images has achieved impressive quality. However, existing 3D generation methods are limited to handling a single control objective and lack the ability to utilize multiple images to independently control different regions of a 3D asset, which hinders their flexibility in applications. We propose Fuse3D, a novel method that enables generating 3D assets under the control of multiple images, allowing for the seamless fusion of multi-level regional controls from global views to intricate local details. First, we introduce a Multi-Condition Fusion Module to integrate the visual features from multiple image regions. Then, we propose a method to automatically align user-selected 2D image regions with their associated 3D regions based on semantic cues. Finally, to resolve control conflicts and enhance local control features from multi-condition images, we introduce a Local Attention Enhancement Strategy that flexibly balances region-specific feature fusion. Overall, we introduce the first method capable of controllable 3D asset generation from multiple condition images. The experimental results indicate that Fuse3D can flexibly fuse multiple 2D image regions into coherent 3D structures, resulting in high-quality 3D assets. Code and data for this paper are at https://jinnmnm.github.io/Fuse3d.github.io/.

2025-11-12T08:21:38Z Xuancheng Jin Rengan Xie Wenting Zheng Rui Wang Hujun Bao Yuchi Huo http://arxiv.org/abs/2511.08980v1 A Finite Difference Approximation of Second Order Regularization of Neural-SDFs 2025-11-12T04:56:08Z

We introduce a finite-difference framework for curvature regularization in neural signed distance field (SDF) learning. Existing approaches enforce curvature priors using full Hessian information obtained via second-order automatic differentiation, which is accurate but computationally expensive. Others reduced this overhead by avoiding explicit Hessian assembly, but still required higher-order differentiation. In contrast, our method replaces these operations with lightweight finite-difference stencils that approximate second derivatives using the well known Taylor expansion with a truncation error of O(h^2), and can serve as drop-in replacements for Gaussian curvature and rank-deficiency losses. Experiments demonstrate that our finite-difference variants achieve reconstruction fidelity comparable to their automatic-differentiation counterparts, while reducing GPU memory usage and training time by up to a factor of two. Additional tests on sparse, incomplete, and non-CAD data confirm that the proposed formulation is robust and general, offering an efficient and scalable alternative for curvature-aware SDF learning.

2025-11-12T04:56:08Z SIGGRAPH Asia Technical Communications, 6 pages, 6 figures, preprint Haotian Yin Aleksander Plocharski Michal Jan Wlodarczyk Przemyslaw Musialski 10.1145/3757376.3771413. http://arxiv.org/abs/2511.08079v1 Deep Inverse Shading: Consistent Albedo and Surface Detail Recovery via Generative Refinement 2025-11-11T10:29:40Z

Reconstructing human avatars using generative priors is essential for achieving versatile and realistic avatar models. Traditional approaches often rely on volumetric representations guided by generative models, but these methods require extensive volumetric rendering queries, leading to slow training. Alternatively, surface-based representations offer faster optimization through differentiable rasterization, yet they are typically limited by vertex count, restricting mesh resolution and scalability when combined with generative priors. Moreover, integrating generative priors into physically based human avatar modeling remains largely unexplored. To address these challenges, we introduce DIS (Deep Inverse Shading), a unified framework for high-fidelity, relightable avatar reconstruction that incorporates generative priors into a coherent surface representation. DIS centers on a mesh-based model that serves as the target for optimizing both surface and material details. The framework fuses multi-view 2D generative surface normal predictions, rich in detail but often inconsistent, into the central mesh using a normal conversion module. This module converts generative normal outputs into per-triangle surface offsets via differentiable rasterization, enabling the capture of fine geometric details beyond sparse vertex limitations. Additionally, DIS integrates a de-shading module to recover accurate material properties. This module refines albedo predictions by removing baked-in shading and back-propagates reconstruction errors to optimize the geometry. Through joint optimization of geometry and material appearance, DIS achieves physically consistent, high-quality reconstructions suitable for accurate relighting. Our experiments show that DIS delivers SOTA relighting quality, enhanced rendering efficiency, lower memory consumption, and detailed surface reconstruction.

2025-11-11T10:29:40Z AAAI2026 Jiacheng Wu Ruiqi Zhang Jie Chen http://arxiv.org/abs/2405.15056v3 ElastoGen: 4D Generative Elastodynamics 2025-11-11T07:54:07Z

We present ElastoGen, a knowledge-driven AI model that generates physically accurate 4D elastodynamics. Unlike deep models that learn from video- or image-based observations, ElastoGen leverages the principles of physics and learns from established mathematical and optimization procedures. The core idea of ElastoGen is converting the differential equation, corresponding to the nonlinear force equilibrium, into a series of iterative local convolution-like operations, which naturally fit deep architectures. We carefully build our network module following this overarching design philosophy. ElastoGen is much more lightweight in terms of both training requirements and network scale than deep generative models. Because of its alignment with actual physical procedures, ElastoGen efficiently generates accurate dynamics for a wide range of hyperelastic materials and can be easily integrated with upstream and downstream deep modules to enable end-to-end 4D generation.

2024-05-23T21:09:36Z Yutao Feng Yintong Shang Xiang Feng Lei Lan Shandian Zhe Tianjia Shao Hongzhi Wu Kun Zhou Chenfanfu Jiang Yin Yang http://arxiv.org/abs/2511.07860v1 TouchWalker: Real-Time Avatar Locomotion from Touchscreen Finger Walking 2025-11-11T05:51:28Z

We present TouchWalker, a real-time system for controlling full-body avatar locomotion using finger-walking gestures on a touchscreen. The system comprises two main components: TouchWalker-MotionNet, a neural motion generator that synthesizes full-body avatar motion on a per-frame basis from temporally sparse two-finger input, and TouchWalker-UI, a compact touch interface that interprets user touch input to avatar-relative foot positions. Unlike prior systems that rely on symbolic gesture triggers or predefined motion sequences, TouchWalker uses its neural component to generate continuous, context-aware full-body motion on a per-frame basis-including airborne phases such as running, even without input during mid-air steps-enabling more expressive and immediate interaction. To ensure accurate alignment between finger contacts and avatar motion, it employs a MoE-GRU architecture with a dedicated foot-alignment loss. We evaluate TouchWalker in a user study comparing it to a virtual joystick baseline with predefined motion across diverse locomotion tasks. Results show that TouchWalker improves users' sense of embodiment, enjoyment, and immersion.

2025-11-11T05:51:28Z Accepted to ISMAR 2025 Geuntae Park Jiwon Yi Taehyun Rhee Kwanguk Kim Yoonsang Lee http://arxiv.org/abs/2511.07418v1 Lightning Grasp: High Performance Procedural Grasp Synthesis with Contact Fields 2025-11-10T18:59:44Z

Despite years of research, real-time diverse grasp synthesis for dexterous hands remains an unsolved core challenge in robotics and computer graphics. We present Lightning Grasp, a novel high-performance procedural grasp synthesis algorithm that achieves orders-of-magnitude speedups over state-of-the-art approaches, while enabling unsupervised grasp generation for irregular, tool-like objects. The method avoids many limitations of prior approaches, such as the need for carefully tuned energy functions and sensitive initialization. This breakthrough is driven by a key insight: decoupling complex geometric computation from the search process via a simple, efficient data structure - the Contact Field. This abstraction collapses the problem complexity, enabling a procedural search at unprecedented speeds. We open-source our system to propel further innovation in robotic manipulation.

2025-11-10T18:59:44Z Code: https://github.com/zhaohengyin/lightning-grasp Zhao-Heng Yin Pieter Abbeel http://arxiv.org/abs/2411.12015v2 M^3ashy: Multi-Modal Material Synthesis via Hyperdiffusion 2025-11-10T16:16:02Z

High-quality material synthesis is essential for replicating complex surface properties to create realistic scenes. Despite advances in the generation of material appearance based on analytic models, the synthesis of real-world measured BRDFs remains largely unexplored. To address this challenge, we propose M^3ashy, a novel multi-modal material synthesis framework based on hyperdiffusion. M^3ashy enables high-quality reconstruction of complex real-world materials by leveraging neural fields as a compact continuous representation of BRDFs. Furthermore, our multi-modal conditional hyperdiffusion model allows for flexible material synthesis conditioned on material type, natural language descriptions, or reference images, providing greater user control over material generation. To support future research, we contribute two new material datasets and introduce two BRDF distributional metrics for more rigorous evaluation. We demonstrate the effectiveness of Mashy through extensive experiments, including a novel statistics-based constrained synthesis, which enables the generation of materials of desired categories.

2024-11-18T19:56:54Z Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 40, No. 16, 2026 Chenliang Zhou Zheyuan Hu Alejandro Sztrajman Yancheng Cai Yaru Liu Cengiz Oztireli 10.1609/aaai.v40i16.38363 http://arxiv.org/abs/2510.14270v3 GauSSmart: Enhanced 3D Reconstruction through 2D Foundation Models and Geometric Filtering 2025-11-10T16:16:00Z

Scene reconstruction has emerged as a central challenge in computer vision, with approaches such as Neural Radiance Fields (NeRF) and Gaussian Splatting achieving remarkable progress. While Gaussian Splatting demonstrates strong performance on large-scale datasets, it often struggles to capture fine details or maintain realism in regions with sparse coverage, largely due to the inherent limitations of sparse 3D training data. In this work, we propose GauSSmart, a hybrid method that effectively bridges 2D foundational models and 3D Gaussian Splatting reconstruction. Our approach integrates established 2D computer vision techniques, including convex filtering and semantic feature supervision from foundational models such as DINO, to enhance Gaussian-based scene reconstruction. By leveraging 2D segmentation priors and high-dimensional feature embeddings, our method guides the densification and refinement of Gaussian splats, improving coverage in underrepresented areas and preserving intricate structural details. We validate our approach across three datasets, where GauSSmart consistently outperforms existing Gaussian Splatting in the majority of evaluated scenes. Our results demonstrate the significant potential of hybrid 2D-3D approaches, highlighting how the thoughtful combination of 2D foundational models with 3D reconstruction pipelines can overcome the limitations inherent in either approach alone.

2025-10-16T03:38:26Z Alexander Valverde Brian Xu Yuyin Zhou Meng Xu Hongyun Wang http://arxiv.org/abs/2503.10637v4 Distilling Diversity and Control in Diffusion Models 2025-11-10T15:36:25Z

Distilled diffusion models generate images in far fewer timesteps but suffer from reduced sample diversity when generating multiple outputs from the same prompt. To understand this phenomenon, we first investigate whether distillation damages concept representations by examining if the required diversity is properly learned. Surprisingly, distilled models retain the base model's representational structure: control mechanisms like Concept Sliders and LoRAs transfer seamlessly without retraining, and SliderSpace analysis reveals distilled models possess variational directions needed for diversity yet fail to activate them. This redirects our investigation to understanding how the generation dynamics differ between base and distilled models. Using $\hat{\mathbf{x}}_{0}$ trajectory visualization, we discover distilled models commit to their final image structure almost immediately at the first timestep, while base models distribute structural decisions across many steps. To test whether this first-step commitment causes the diversity loss, we introduce diversity distillation, a hybrid approach using the base model for only the first critical timestep before switching to the distilled model. This single intervention restores sample diversity while maintaining computational efficiency. We provide both causal validation and theoretical support showing why the very first timestep concentrates the diversity bottleneck in distilled models. Our code and data are available at https://distillation.baulab.info/

2025-03-13T17:59:56Z Project Page: https://distillation.baulab.info/ Rohit Gandikota David Bau http://arxiv.org/abs/2511.07206v1 Geometric implicit neural representations for signed distance functions 2025-11-10T15:33:02Z

\textit{Implicit neural representations} (INRs) have emerged as a promising framework for representing signals in low-dimensional spaces. This survey reviews the existing literature on the specialized INR problem of approximating \textit{signed distance functions} (SDFs) for surface scenes, using either oriented point clouds or a set of posed images. We refer to neural SDFs that incorporate differential geometry tools, such as normals and curvatures, in their loss functions as \textit{geometric} INRs. The key idea behind this 3D reconstruction approach is to include additional \textit{regularization} terms in the loss function, ensuring that the INR satisfies certain global properties that the function should hold -- such as having unit gradient in the case of SDFs. We explore key methodological components, including the definition of INR, the construction of geometric loss functions, and sampling schemes from a differential geometry perspective. Our review highlights the significant advancements enabled by geometric INRs in surface reconstruction from oriented point clouds and posed images.

2025-11-10T15:33:02Z Luiz Schirmer Tiago Novello Vinícius da Silva Guilherme Schardong Daniel Perazzo Hélio Lopes Nuno Gonçalves Luiz Velho 10.1016/j.cag.2024.104085 http://arxiv.org/abs/2511.06765v1 Robust and High-Fidelity 3D Gaussian Splatting: Fusing Pose Priors and Geometry Constraints for Texture-Deficient Outdoor Scenes 2025-11-10T06:45:08Z

3D Gaussian Splatting (3DGS) has emerged as a key rendering pipeline for digital asset creation due to its balance between efficiency and visual quality. To address the issues of unstable pose estimation and scene representation distortion caused by geometric texture inconsistency in large outdoor scenes with weak or repetitive textures, we approach the problem from two aspects: pose estimation and scene representation. For pose estimation, we leverage LiDAR-IMU Odometry to provide prior poses for cameras in large-scale environments. These prior pose constraints are incorporated into COLMAP's triangulation process, with pose optimization performed via bundle adjustment. Ensuring consistency between pixel data association and prior poses helps maintain both robustness and accuracy. For scene representation, we introduce normal vector constraints and effective rank regularization to enforce consistency in the direction and shape of Gaussian primitives. These constraints are jointly optimized with the existing photometric loss to enhance the map quality. We evaluate our approach using both public and self-collected datasets. In terms of pose optimization, our method requires only one-third of the time while maintaining accuracy and robustness across both datasets. In terms of scene representation, the results show that our method significantly outperforms conventional 3DGS pipelines. Notably, on self-collected datasets characterized by weak or repetitive textures, our approach demonstrates enhanced visualization capabilities and achieves superior overall performance. Codes and data will be publicly available at https://github.com/justinyeah/normal_shape.git.

2025-11-10T06:45:08Z 7 pages, 3 figures. Accepted by IROS 2025 Meijun Guo Yongliang Shi Caiyun Liu Yixiao Feng Ming Ma Tinghai Yan Weining Lu Bin Liang http://arxiv.org/abs/2511.06674v1 Modeling and Topology Estimation of Low Rank Dynamical Networks 2025-11-10T03:42:35Z

Conventional topology learning methods for dynamical networks become inapplicable to processes exhibiting low-rank characteristics. To address this, we propose the low rank dynamical network model which ensures identifiability. By employing causal Wiener filtering, we establish a necessary and sufficient condition that links the sparsity pattern of the filter to conditional Granger causality. Building on this theoretical result, we develop a consistent method for estimating all network edges. Simulation results demonstrate the parsimony of the proposed framework and consistency of the topology estimation approach.

2025-11-10T03:42:35Z Wenqi Cao Aming Li