https://arxiv.org/api//VKLRDADMa7Lq8PV6e4/Dryz6i8
2026-06-14T13:43:49Z
932343515

http://arxiv.org/abs/2604.20336v1
Stability-Driven Motion Generation for Object-Guided Human-Human Co-Manipulation
2026-04-22T08:31:40Z
Co-manipulation requires multiple humans to synchronize their motions with a shared object while ensuring reasonable interactions, maintaining natural poses, and preserving stable states. However, most existing motion generation approaches are designed for single-character scenarios or fail to account for payload-induced dynamics. In this work, we propose a flow-matching framework that ensures the generated co-manipulation motions align with the intended goals while maintaining naturalness and effectiveness. Specifically, we first introduce a generative model that derives explicit manipulation strategies from the object's affordance and spatial configuration, which guide the motion flow toward successful manipulation. To improve motion quality, we then design an adversarial interaction prior that promotes natural individual poses and realistic inter-person interactions during co-manipulation. In addition, we also incorporate a stability-driven simulation into the flow matching process, which refines unstable interaction states through sampling-based optimization and directly adjusts the vector field regression to promote more effective manipulation. The experimental results demonstrate that our method achieves higher contact accuracy, lower penetration, and better distributional fidelity compared to state-of-the-art human-object interaction baselines.
The code is available at https://github.com/boycehbz/StaCOM.
2026-04-22T08:31:40Z
CVPR 2026
Jiahao Xu, Xiaohan Yuan, Xingchen Wu, Chongyang Xu, Kun Li, Buzhen Huang

http://arxiv.org/abs/2510.18263v2
From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation
2026-04-22T07:39:13Z
Subject-driven image generation models face a fundamental trade-off between identity preservation (fidelity) and prompt adherence (editability). While online reinforcement learning (RL), specifically GRPO, offers a promising solution, we find that a naive application of GRPO leads to competitive degradation, as the simple linear aggregation of rewards with static weights causes conflicting gradient signals and a misalignment with the temporal dynamics of the diffusion process. To overcome these limitations, we propose Customized-GRPO, a novel framework featuring two key innovations: (i) Synergy-Aware Reward Shaping (SARS), a non-linear mechanism that explicitly penalizes conflicted reward signals and amplifies synergistic ones, providing a sharper and more decisive gradient; and (ii) Time-Aware Dynamic Weighting (TDW), which aligns the optimization pressure with the model's temporal dynamics by prioritizing prompt-following in the early stages and identity preservation in the later ones. Extensive experiments demonstrate that our method significantly outperforms naive GRPO baselines, successfully mitigating competitive degradation.
Our model achieves a superior balance, generating images that both preserve key identity features and accurately adhere to complex textual prompts.
2025-10-21T03:32:26Z
Ziwei Huang, Ying Shu, Hao Fang, Quanyu Long, Wenya Wang, Qiushi Guo, Tiezheng Ge, Leilei Gan

http://arxiv.org/abs/2511.01233v4
Towards Reliable Human Evaluations in Gesture Generation: Insights from a Community-Driven State-of-the-Art Benchmark
2026-04-22T07:05:26Z
We review human evaluation practices in automatic, speech-driven 3D gesture generation and find a lack of standardisation and frequent use of flawed experimental setups. This leads to a situation where it is impossible to know how different methods compare, or what the state of the art is. In order to address common shortcomings of evaluation design, and to standardise future user studies in gesture-generation works, we introduce a detailed human evaluation protocol for the widely-used BEAT2 motion-capture dataset. Using this protocol, we conduct large-scale crowdsourced evaluation to rank six recent gesture-generation models -- each trained by its original authors -- across two key evaluation dimensions: motion realism and speech-gesture alignment. Our results show that 1) motion realism has become a saturated evaluation measure on the BEAT2 dataset, with older models performing on par with more recent approaches; 2) previous findings of high speech-gesture alignment do not hold up under rigorous evaluation, even for specialised models; and 3) the field must adopt disentangled assessments of motion quality and multimodal alignment for accurate benchmarking in order to make progress.
To drive standardisation and enable new evaluation research, we release five hours of synthetic motion from the benchmarked models; over 750 rendered video stimuli from the user studies -- enabling new evaluations without requiring model reimplementation -- alongside our open-source rendering script, and 16,000 pairwise human preference votes collected for our benchmark.
2025-11-03T05:17:28Z
Accepted to CVPR 2026, Findings Track. 23 pages, 10 figures. The last two authors made equal contributions
Rajmund Nagy (KTH Royal Institute of Technology)
Hendric Voss (Bielefeld University)
Thanh Hoang-Minh (University of Science -- VNUHCM)
Mihail Tsakov (Independent Researcher)
Teodor Nikolov (Motorica AB)
Zeyi Zhang (Peking University)
Tenglong Ao (Peking University)
Sicheng Yang (Huawei Technologies Ltd)
Shaoli Huang (Astribot)
Yongkang Cheng (Astribot)
M. Hamza Mughal (Max-Planck Institute for Informatics, SIC)
Rishabh Dabral (Max-Planck Institute for Informatics, SIC)
Kiran Chhatre (KTH Royal Institute of Technology)
Christian Theobalt (Max-Planck Institute for Informatics, SIC)
Libin Liu (Peking University)
Stefan Kopp (Bielefeld University)
Rachel McDonnell (Trinity College Dublin)
Michael Neff (University of California, Davis)
Taras Kucherenko (SEED -- Electronic Arts)
Youngwoo Yoon (Electronics and Telecommunications Research Institute)
Gustav Eje Henter (KTH Royal Institute of Technology; Motorica AB)

http://arxiv.org/abs/2604.21717v1
Monte Carlo PDE Solvers for Nonlinear Radiative Boundary Conditions
2026-04-22T04:26:49Z
Monte Carlo PDE solvers have become increasingly popular for solving heat-related partial differential equations in geometry processing and computer graphics due to their robustness in handling complex geometries. While existing methods can handle Dirichlet, Neumann, and linear Robin boundary conditions, nonlinear boundary conditions arising from thermal radiation remain largely unexplored.
In this paper, we introduce a Picard-style fixed-point iteration framework that enables Monte Carlo PDE solvers to handle nonlinear radiative boundary conditions. While strict theoretical convergence is not generally guaranteed, our method remains stable and empirically convergent with a properly chosen relaxation coefficient. Even with imprecise initial boundary estimates, it progressively approaches the correct solution. Compared to standard linearization strategies, the proposed approach achieves significantly higher accuracy.
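The relaxed fixed-point idea in the paragraph above can be sketched on a toy problem. The example below is only an illustration under stated assumptions, not the authors' solver: it uses a hypothetical dimensionless 1D rod with a fixed temperature at one end and a radiating other end, and it replaces the Monte Carlo linear solve with a closed-form one. All names and constants (`T_LEFT`, `T_ENV`, `C_RAD`, `relaxed_picard`) are made up for the illustration.

```python
# Toy 1D illustration of relaxed Picard iteration for a radiative boundary
# condition. Everything here (names, constants, geometry) is a hypothetical
# stand-in; the inner linear solve is done in closed form, whereas the paper
# uses a Monte Carlo PDE solver for that step.

T_LEFT = 1.0   # fixed (Dirichlet) temperature at x = 0, dimensionless
T_ENV = 0.5    # environment temperature seen by the radiating end
C_RAD = 0.5    # lumped, dimensionless radiation coefficient (assumed value)


def radiative_flux(t_boundary: float) -> float:
    """Nonlinear outgoing flux at x = 1 for a given boundary temperature."""
    return C_RAD * (t_boundary**4 - T_ENV**4)


def solve_linear_problem(flux: float) -> float:
    """Solve u'' = 0 on [0, 1] with u(0) = T_LEFT and -u'(1) = flux.

    The solution is linear in x, so the boundary value is u(1) = T_LEFT - flux.
    """
    return T_LEFT - flux


def relaxed_picard(t_init: float, omega: float, tol: float = 1e-12,
                   max_iter: int = 200) -> tuple[float, int]:
    """Iterate T <- (1 - omega) * T + omega * G(T) until the update is below tol."""
    t = t_init
    for iteration in range(1, max_iter + 1):
        # Freeze the nonlinear flux at the current estimate, then solve the
        # resulting linear (Neumann) problem for a new boundary temperature.
        t_linear = solve_linear_problem(radiative_flux(t))
        # Under-relaxation: blend the old estimate with the new one.
        t_next = (1.0 - omega) * t + omega * t_linear
        if abs(t_next - t) < tol:
            return t_next, iteration
        t = t_next
    return t, max_iter


if __name__ == "__main__":
    # Deliberately poor initial guess for the boundary temperature.
    t_star, n_iter = relaxed_picard(t_init=0.2, omega=0.5)
    residual = abs(t_star - solve_linear_problem(radiative_flux(t_star)))
    print(f"boundary temperature {t_star:.6f} after {n_iter} iterations, "
          f"residual {residual:.2e}")
```

For these toy constants the unrelaxed map has slope of magnitude slightly above 1 at the fixed point, so `omega = 1.0` is not locally contractive, while `omega = 0.5` is. This mirrors, in a much simpler setting, the abstract's remark that convergence depends on a properly chosen relaxation coefficient.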
To further address the high variance inherent in Monte Carlo estimators, we propose a heteroscedastic regression-based denoising technique specifically designed for on-boundary solution estimates, filling a gap left by prior variance reduction methods that focus solely on interior points. We validate our approach through extensive evaluations on synthetic benchmarks and demonstrate its effectiveness on practical heat radiation simulations with complex geometries.
2026-04-22T04:26:49Z
Anchang Bao, Enya Shen, Jianmin Wang

http://arxiv.org/abs/2604.19892v1
An Efficient Multilevel Preconditioned Nonlinear Conjugate Gradient Method for Incremental Potential Contact
2026-04-21T18:13:52Z
Incremental Potential Contact (IPC) guarantees intersection-free simulation but suffers from high computational costs due to the expensive Hessian assembly and linear solves required by Newton's method. While Preconditioned Nonlinear Conjugate Gradient (PNCG) avoids Hessian assembly, it has historically struggled with poor convergence in stiff, contact-rich scenarios due to the lack of effective preconditioners; simple Jacobi preconditioners fail to capture the global coupling, while advanced hierarchy-based preconditioners like Multilevel Additive Schwarz (MAS) are computationally prohibitive to rebuild at every nonlinear iteration. We present MAS-PNCG, a method that unlocks the power of hierarchical preconditioning for nonlinear optimization. Our key technical innovation is a Sparse-Input Woodbury update algorithm that incrementally adapts the fine-level MAS components to rapidly evolving contact sets. This bypasses the need for full preconditioner rebuilds, reducing maintenance cost to near-zero while capturing the complex spectral properties of the contact system. Furthermore, we replace heuristic PNCG search directions with a Hessian-aware 2D subspace minimization that optimally combines the preconditioned gradient and previous direction.
We also apply a fast per-subdomain conservative CCD method that ensures penetration-free trajectories while avoiding overly restrictive global step sizes. Experiments demonstrate that our MAS-PNCG outperforms the state-of-the-art Newton-PCG solvers GIPC and StiffGIPC, both preconditioned with MAS, by up to 5.66$\times$ and 2.07$\times$, respectively.
2026-04-21T18:13:52Z
Yu Zhang, Xing Shen, Kemeng Huang, Wei Chen, Yin Yang, Taku Komura, Tiantian Liu, Xingang Pan

http://arxiv.org/abs/2604.17390v2
MESA: A Training-Free Multi-Exemplar Deep Framework for Restoring Ancient Inscription Textures
2026-04-21T13:03:32Z
Ancient inscriptions frequently suffer missing or corrupted regions from fragmentation, erosion, or other damage, hindering reading and analysis. We review prior image restoration methods and their applicability to inscription image recovery, then introduce MESA (Multi-Exemplar, Style-Aware), an image-level restoration method that uses well-preserved exemplar inscriptions (from the same epigraphic monument, material, or similar letterforms) to guide reconstruction of damaged text. MESA encodes VGG19 convolutional features as Gram matrices to capture exemplar texture, style, and stroke structure; for each neural network layer it selects the exemplar minimizing Mean-Squared Displacement (MSD) to the damaged input. Layer-wise contribution weights are derived from Optical Character Recognition-estimated character widths in the exemplar set to bias filters toward scales matching letter geometry, and a training mask preserves intact regions so synthesis is restricted to damaged areas. We also summarize prior network architectures and exemplar and single-image synthesis, inpainting, and Generative Adversarial Network (GAN) approaches, highlighting limitations that MESA addresses. Comparative experiments demonstrate the advantages of MESA.
Finally, we provide a practical roadmap for choosing restoration strategies given available exemplars and metadata.
2026-04-19T11:38:03Z
Vasileios Toulatzis, Sofia Theodoridou, Ioannis Fudos

http://arxiv.org/abs/2601.22755v2
Synthetic Abundance Maps for Unsupervised Super-Resolution of Hyperspectral Remote Sensing Images
2026-04-21T11:39:58Z
Hyperspectral single image super-resolution (HS-SISR) aims to enhance the spatial resolution of hyperspectral images to fully exploit their spectral information. While considerable progress has been made in this field, most existing methods are supervised and require ground truth data for training, data that is often unavailable in practice. To overcome this limitation, we propose a novel unsupervised training framework for HS-SISR, based on synthetic abundance data, where no high-resolution ground-truth reference is required for training. The approach begins by unmixing the hyperspectral image into endmembers and abundances. A neural network is then trained to perform abundance super-resolution using synthetic abundances only. These synthetic abundance maps are generated from a dead leaves model whose characteristics are inherited from the low-resolution image to be super-resolved and from the known point spread function (PSF) of the hyperspectral sensor. This trained network is subsequently used to enhance the spatial resolution of the original image's abundances, and the final super-resolution hyperspectral image is reconstructed by combining them with the endmembers. Experimental results demonstrate both the training value of the synthetic data and the effectiveness of the proposed method across 3 datasets, 3 scaling factors, and several evaluation metrics. The code is available at https://github.com/xinxinxu99/SISR-DL.git
2026-01-30T09:31:46Z
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2026, pp. 1-14
Xinxin Xu (LTCI, IDS, IP Paris, IMAGES)
Yann Gousseau (LTCI, IMAGES)
Christophe Kervazo (IDS, IMAGES)
Saïd Ladjal (IMAGES, LTCI)
10.1109/JSTARS.2026.3682469

http://arxiv.org/abs/2604.19202v1
SketchFaceGS: Real-Time Sketch-Driven Face Editing and Generation with Gaussian Splatting
2026-04-21T08:09:13Z
3D Gaussian representations have emerged as a powerful paradigm for digital head modeling, achieving photorealistic quality with real-time rendering. However, intuitive and interactive creation or editing of 3D Gaussian head models remains challenging. Although 2D sketches provide an ideal interaction modality for fast, intuitive conceptual design, they are sparse, depth-ambiguous, and lack high-frequency appearance cues, making it difficult to infer dense, geometrically consistent 3D Gaussian structures from strokes, especially under real-time constraints. To address these challenges, we propose SketchFaceGS, the first sketch-driven framework for real-time generation and editing of photorealistic 3D Gaussian head models from 2D sketches. Our method uses a feed-forward, coarse-to-fine architecture. A Transformer-based UV feature-prediction module first reconstructs a coarse but geometrically consistent UV feature map from the input sketch, and then a 3D UV feature enhancement module refines it with high-frequency, photorealistic detail to produce a high-fidelity 3D head. For editing, we introduce a UV Mask Fusion technique combined with a layer-by-layer feature-fusion strategy, enabling precise, real-time, free-viewpoint modifications. Extensive experiments show that SketchFaceGS outperforms existing methods in both generation fidelity and editing flexibility, producing high-quality, editable 3D heads from sketches in a single forward pass.
2026-04-21T08:09:13Z
Accepted to CVPR 2026 as a Highlight. Jittor implementation: https://github.com/gogoneural/SketchFaceGS_jittor. (C) 2026 IEEE.
Personal use of this material is permitted
Bo Li, Jiahao Kang, Yubo Ma, Feng-Lin Liu, Bin Liu, Fang-Lue Zhang, Lin Gao

http://arxiv.org/abs/2604.19194v1
sumo3Dviz: A three dimensional traffic visualisation
2026-04-21T08:04:14Z
Traffic microsimulation software such as SUMO generates rich spatio-temporal data describing individual vehicle movements and interactions, and supports the development of control strategies. While numerical outputs and 2D visualisations are sufficient for many technical analyses, they are often inadequate for applications that require intuitive interpretation, effective communication, or human-centred evaluation. In particular, user studies in mobility psychology, acceptance research, and virtual experience stated-preference experiments require realistic visualisations that reflect how traffic scenarios are perceived from a human perspective. This paper introduces sumo3Dviz, a lightweight, open-source 3D visualisation pipeline for SUMO traffic simulations. It converts standard SUMO simulation outputs, such as vehicle trajectories and signal states, into high-quality 3D renderings using a Python-based framework. In contrast to heavyweight game-engine-based approaches or tightly coupled co-simulation frameworks, sumo3Dviz is designed to be simple, scriptable, and reproducible. The tool is installable through the pip package manager, runs across operating systems, and works independently of any proprietary software or licenses. sumo3Dviz supports both external camera views and first-person perspectives, enabling cinematic overviews as well as driver-level experiences. The rendering process is optimized for batch video generation, making it suitable for large-scale scenario visualisation, educational demonstrations, and automated experiment pipelines. A key technical challenge addressed by the tool is trajectory interpolation and orientation smoothing, enabling visually coherent motion from discrete simulation outputs.
Source Code on project's GitHub page: https://github.com/DerKevinRiehl/sumo3dviz/.
2026-04-21T08:04:14Z
Kevin Riehl, Julius Schlapbach, Anastasios Kouvelas, Michail A. Makridis

http://arxiv.org/abs/2509.18831v2
Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters
2026-04-21T07:20:23Z
Recent advances in diffusion models have significantly improved image and video synthesis. In addition, several concept control methods have been proposed to enable fine-grained, continuous, and flexible control over free-form text prompts. However, these methods not only require intensive training time and GPU memory usage to learn the sliders or embeddings but also need to be retrained for different diffusion backbones, limiting their scalability and adaptability. To address these limitations, we introduce Text Slider, a lightweight, efficient and plug-and-play framework that identifies low-rank directions within a pre-trained text encoder, enabling continuous control of visual concepts while significantly reducing training time, GPU memory consumption, and the number of trainable parameters. Furthermore, Text Slider supports multi-concept composition and continuous control, enabling fine-grained and flexible manipulation in both image and video synthesis. We show that Text Slider enables smooth and continuous modulation of specific attributes while preserving the original spatial layout and structure of the input. Text Slider achieves significantly better efficiency: 5$\times$ faster training than Concept Slider and 47$\times$ faster than Attribute Control, while reducing GPU memory usage by nearly 2$\times$ and 4$\times$, respectively.
2025-09-23T09:17:18Z
Accepted by WACV 2026. We provide more experimental results on the train-free version of our algorithm.
Project page: https://textslider.github.io/ Code: https://github.com/aiiu-lab/TextSlider
Pin-Yen Chiu, I-Sheng Fang, Jun-Cheng Chen

http://arxiv.org/abs/2604.19127v1
OT-UVGS: Revisiting UV Mapping for Gaussian Splatting as a Capacity Allocation Problem
2026-04-21T06:19:22Z
UV-parameterized Gaussian Splatting (UVGS) maps an unstructured set of 3D Gaussians to a regular UV tensor, enabling compact storage and explicit control of representation capacity. Existing UVGS, however, uses a deterministic spherical projection to assign Gaussians to UV locations. Because this mapping ignores the global Gaussian distribution, it often leaves many UV slots empty while causing frequent collisions in dense regions. We reinterpret UV mapping as a capacity-allocation problem under a fixed UV budget and propose OT-UVGS, a lightweight, separable one-dimensional optimal-transport-inspired mapping that globally couples assignments while preserving the original UVGS representation. The method is implemented with rank-based sorting, has O(N log N) complexity for N Gaussians, and can be used as a drop-in replacement for spherical UVGS. Across 184 object-centric scenes and the Mip-NeRF dataset, OT-UVGS consistently improves peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS) under the same UV resolution and per-slot capacity (K=1). These gains are accompanied by substantially better UV utilization, including higher non-empty slot ratios, fewer collisions, and higher Gaussian retention.
Our results show that revisiting the mapping alone can unlock a significant fraction of the latent capacity of UVGS.
2026-04-21T06:19:22Z
Accepted to Eurographics 2026 Short Papers
Byunghyun Kim

http://arxiv.org/abs/2604.18886v1
Matrix-Free Multigrid with Algebraically Consistent Coarsening on Adaptive Octrees
2026-04-20T22:08:19Z
We present a matrix-free GPU multigrid preconditioner with algebraically consistent coarsening for solving Poisson equations on adaptive octree grids with irregular domains. Within uniform-resolution regions, the coarsening satisfies the Galerkin principle. At T-junctions between refinement levels, we propose a flux-consistent coarse-grid correction that restores cross-level consistency while preserving the compact matrix-free representation. The coarse operators are stored in a compact matrix-free form suitable for parallel execution on GPUs. Numerical experiments demonstrate second-order accuracy, grid-independent convergence when used with PCG, and robust performance on cut-cell problems arising in fluid simulation. On a single NVIDIA RTX 4090 GPU, the solver achieves full-solve throughputs above 200 million cells per second on analytical Poisson tests and above 70 million cells per second on pressure projection problems in fluid simulation.
2026-04-20T22:08:19Z
Submitted to Journal of Computational Physics on Apr 20, 2026
Mengdi Wang, Yuchen Sun, Bo Zhu

http://arxiv.org/abs/2511.16988v2
PhysMorph-GS: Render-Guided Volumetric Morphing with Differentiable Physics
2026-04-20T18:47:57Z
Differentiable particle-based simulation can produce physically plausible motion, but target-driven volumetric shape morphing remains underconstrained: physics-only mass matching captures coarse global structure yet struggles with fine geometric detail, while naive image-space coupling destabilizes elastic dynamics. We present PhysMorph-GS, a render-guided morphing framework that couples material point method simulation with differentiable 3D Gaussian splatting.
The key idea is to inject visual supervision through the deformation gradient $\mathbf{F}$ rather than particle positions, so render gradients act as control-space guidance while trajectories remain governed by physics. We further introduce phased Chamfer-guided plasticity that delays rest-state migration until coarse structure has formed; in practice, rendering is evaluated on a surface-focused particle subset for efficiency and gradient concentration. Relative to a physics-only baseline, our method reduces silhouette error by 25.8\%, 10.8\%, and 49.9\% on representative examples, with the largest gains on models with thin features. These results suggest that the main challenge in render-guided differentiable morphing is not simply adding stronger image losses, but injecting visual guidance in a way that remains compatible with elastic simulation. We further observe that plasticity-driven rest-state migration drives different sources toward a shared target-determined attractor, distinguishing physics-based morphing from interpolation between registered shape pairs.
2025-11-21T06:51:39Z
12 pages, 11 figures
Chang-Yong Song, David Hyde

http://arxiv.org/abs/2604.18557v1
SynAgent: Generalizable Cooperative Humanoid Manipulation via Solo-to-Cooperative Agent Synergy
2026-04-20T17:46:20Z
Controllable cooperative humanoid manipulation is a fundamental yet challenging problem for embodied intelligence, due to severe data scarcity, complexities in multi-agent coordination, and limited generalization across objects. In this paper, we present SynAgent, a unified framework that enables scalable and physically plausible cooperative manipulation by leveraging Solo-to-Cooperative Agent Synergy to transfer skills from single-agent human-object interaction to multi-agent human-object-human scenarios.
To maintain semantic integrity during motion transfer, we introduce an interaction-preserving retargeting method based on an Interact Mesh constructed via Delaunay tetrahedralization, which faithfully maintains spatial relationships among humans and objects. Building upon this refined data, we propose a single-agent pretraining and adaptation paradigm that bootstraps synergistic collaborative behaviors from abundant single-human data through decentralized training and multi-agent PPO. Finally, we develop a trajectory-conditioned generative policy using a conditional VAE, trained via multi-teacher distillation from motion imitation priors to achieve stable and controllable object-level trajectory execution. Extensive experiments demonstrate that SynAgent significantly outperforms existing baselines in both cooperative imitation and trajectory-conditioned control, while generalizing across diverse object geometries. Code and data will be available after publication. Project Page: http://yw0208.github.io/synagent
2026-04-20T17:46:20Z
Wei Yao, Haohan Ma, Hongwen Zhang, Yunlian Sun, Liangjun Xing, Zhile Yang, Yuanjun Guo, Yebin Liu, Jinhui Tang

http://arxiv.org/abs/2604.18468v1
Asset Harvester: Extracting 3D Assets from Autonomous Driving Logs for Simulation
2026-04-20T16:20:57Z
Closed-loop simulation is a core component of autonomous vehicle (AV) development, enabling scalable testing, training, and safety validation before real-world deployment. Neural scene reconstruction converts driving logs into interactive 3D environments for simulation, but it does not produce complete 3D object assets required for agent manipulation and large-viewpoint novel-view synthesis. To address this challenge, we present Asset Harvester, an image-to-3D model and end-to-end pipeline that converts sparse, in-the-wild object observations from real driving logs into complete, simulation-ready assets.
Rather than relying on a single model component, we developed a system-level design for real-world AV data that combines large-scale curation of object-centric training tuples, geometry-aware preprocessing across heterogeneous sensors, and a robust training recipe that couples sparse-view-conditioned multiview generation with 3D Gaussian lifting. Within this system, SparseViewDiT is explicitly designed to address limited-angle views and other real-world data challenges. Together with hybrid data curation, augmentation, and self-distillation, this system enables scalable conversion of sparse AV object observations into reusable 3D assets.
2026-04-20T16:20:57Z
NVIDIA white paper. The project page: https://research.nvidia.com/labs/sil/projects/asset-harvester/
Tianshi Cao, Jiawei Ren, Yuxuan Zhang, Jaewoo Seo, Jiahui Huang, Shikhar Solanki, Haotian Zhang, Mingfei Guo, Haithem Turki, Muxingzi Li, Yue Zhu, Sipeng Zhang, Zan Gojcic, Sanja Fidler, Kangxue Yin