https://arxiv.org/api/7Z/KTDSUqEfFzZfQPOQVczCcx882026-04-11T16:17:31Z17145033015http://arxiv.org/abs/2604.07486v1Private Seeds, Public LLMs: Realistic and Privacy-Preserving Synthetic Data Generation2026-04-08T18:26:34ZLarge language models (LLMs) have emerged as a powerful tool for synthetic data generation. A particularly important use case is producing synthetic replicas of private text, which requires carefully balancing privacy and utility. We propose Realistic and Privacy-Preserving Synthetic Data Generation (RPSG), which leverages privacy-preserving mechanisms, including formal differential privacy (DP); and private seeds, in particular text containing personal information, to generate realistic synthetic data. Comprehensive experiments against state-of-the-art private synthetic data generation methods demonstrate that RPSG achieves high fidelity to private data while providing strong privacy protection.2026-04-08T18:26:34Z23 pages, 7 figures, 18 tablesQian MaSarah Rajtmajerhttp://arxiv.org/abs/2604.07484v1ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training2026-04-08T18:25:02ZGenerative reward models (GRMs) have emerged as a promising approach for aligning Large Language Models (LLMs) with human preferences by offering greater representational capacity and flexibility than traditional scalar reward models. However, GRMs face two major challenges: reliance on costly human-annotated data restricts scalability, and self-training approaches often suffer from instability and vulnerability to reward hacking. To address these issues, we propose ConsistRM, a self-training framework that enables effective and stable GRM training without human annotations. ConsistRM incorporates the Consistency-Aware Answer Reward, which produces reliable pseudo-labels with temporal consistency, thereby providing more stable model optimization. Moreover, the Consistency-Aware Critique Reward is introduced to assess semantic consistency across multiple critiques and allocates fine-grained and differentiated rewards. Experiments on five benchmark datasets across four base models demonstrate that ConsistRM outperforms vanilla Reinforcement Fine-Tuning (RFT) by an average of 1.5%. Further analysis shows that ConsistRM enhances output consistency and mitigates position bias caused by input order, highlighting the effectiveness of consistency-aware rewards in improving GRMs.2026-04-08T18:25:02ZPreprintYu LiangLiangxin LiuLongzheng WangYan WangYueyang ZhangLong XiaZhiyuan SunDaiting Shihttp://arxiv.org/abs/2406.13086v2NaviSplit: Dynamic Multi-Branch Split DNNs for Efficient Distributed Autonomous Navigation2026-04-08T18:23:59ZLightweight autonomous unmanned aerial vehicles (UAV) are emerging as a central component of a broad range of applications. However, autonomous navigation necessitates the implementation of perception algorithms, often deep neural networks (DNN), that process the input of sensor observations, such as that from cameras and LiDARs, for control logic. The complexity of such algorithms clashes with the severe constraints of these devices in terms of computing power, energy, memory, and execution time. In this paper, we propose NaviSplit, the first instance of a lightweight navigation framework embedding a distributed and dynamic multi-branched neural model. At its core is a DNN split at a compression point, resulting in two model parts: (1) the head model, that is executed at the vehicle, which partially processes and compacts perception from sensors; and (2) the tail model, that is executed at an interconnected compute-capable device, which processes the remainder of the compacted perception and infers navigation commands. Different from prior work, the NaviSplit framework includes a neural gate that dynamically selects a specific head model to minimize channel usage while efficiently supporting the navigation network. In our implementation, the perception model extracts a 2D depth map from a monocular RGB image captured by the drone using the robust simulator Microsoft AirSim. Our results demonstrate that the NaviSplit depth model achieves an extraction accuracy of 72-81% while transmitting an extremely small amount of data (1.2-18 KB) to the edge server. When using the neural gate, as utilized by NaviSplit, we obtain a slightly higher navigation accuracy as compared to a larger static network by 0.3% while significantly reducing the data rate by 95%. To the best of our knowledge, this is the first exemplar of dynamic multi-branched model based on split DNNs for autonomous navigation.2024-06-18T22:25:09Z6 pages, 3 figures2024 IEEE 25th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM)Timothy K JohnsenIan HarshbargerZixia XiaMarco Levorato10.1109/WoWMoM60985.2024.00041http://arxiv.org/abs/2504.17069v2Distilling Specialized Orders for Visual Generation2026-04-08T18:21:06ZAutoregressive (AR) image generators are becoming increasingly popular due to their ability to produce high-quality images and their scalability. Typical AR models are locked onto a specific generation order, often a raster-scan from top-left to bottom-right; this prohibits multi-task flexibility (inpainting, editing, outpainting) without retraining. Any-order AR models address this by learning to generate under arbitrary patch orderings, but at the cost of increased complexity and lower performance. In this paper, we present Ordered Autoregressive (OAR) generation, a self-distillation pipeline that first trains an any-order AR model, then extracts specialized generation orders from the model's own confidence scores, and fine-tunes on these orders. This achieves two goals: 1) improved generation quality by redirecting capacity from learning all $N!$ orderings to a single specialized path, and 2) preserved flexibility of any-order models. On ImageNet $256\times 256$, OAR improves FID from 2.39 to 2.17 over the any-order baseline, with consistent gains on Fashion Products and CelebA-HQ. OAR supports zero-shot inpainting and outpainting without retraining, and human evaluation shows 64% preference over the baseline. The pipeline requires only lightweight fine-tuning on a pretrained any-order model, with no architectural changes or additional annotations.2025-04-23T19:33:58ZRishav PramanikAmin SghaierMasih AminbeidokhtiJuan A. RodriguezAntoine PouponDavid VazquezChristopher PalZhaozheng YinMarco Pedersolihttp://arxiv.org/abs/2604.07480v1Active Reward Machine Inference From Raw State Trajectories2026-04-08T18:19:55ZReward machines are automaton-like structures that capture the memory required to accomplish a multi-stage task. When combined with reinforcement learning or optimal control methods, they can be used to synthesize robot policies to achieve such tasks. However, specifying a reward machine by hand, including a labeling function capturing high-level features that the decisions are based on, can be a daunting task. This paper deals with the problem of learning reward machines directly from raw state and policy information. As opposed to existing works, we assume no access to observations of rewards, labels, or machine nodes, and show what trajectory data is sufficient for learning the reward machine in this information-scarce regime. We then extend the result to an active learning setting where we incrementally query trajectory extensions to improve data (and indirectly computational) efficiency. Results are demonstrated with several grid world examples.2026-04-08T18:19:55ZMohamad Louai ShehabAntoine AspeelNecmiye Ozayhttp://arxiv.org/abs/2407.01563v2NaviSlim: Adaptive Context-Aware Navigation and Sensing via Dynamic Slimmable Networks2026-04-08T18:19:36ZSmall-scale autonomous airborne vehicles, such as micro-drones, are expected to be a central component of a broad spectrum of applications ranging from exploration to surveillance and delivery. This class of vehicles is characterized by severe constraints in computing power and energy reservoir, which impairs their ability to support the complex state-of-the-art neural models needed for autonomous operations. The main contribution of this paper is a new class of neural navigation models -- NaviSlim -- capable of adapting the amount of resources spent on computing and sensing in response to the current context (i.e., difficulty of the environment, current trajectory, and navigation goals). Specifically, NaviSlim is designed as a gated slimmable neural network architecture that, different from existing slimmable networks, can dynamically select a slimming factor to autonomously scale model complexity, which consequently optimizes execution time and energy consumption. Moreover, different from existing sensor fusion approaches, NaviSlim can dynamically select power levels of onboard sensors to autonomously reduce power and time spent during sensor acquisition, without the need to switch between different neural networks. By means of extensive training and testing on the robust simulation environment Microsoft AirSim, we evaluate our NaviSlim models on scenarios with varying difficulty and a test set that showed a dynamic reduced model complexity on average between 57-92%, and between 61-80% sensor utilization, as compared to static neural networks designed to match computing and sensing of that required by the most difficult scenario.2024-05-16T01:18:52Z13 pages, 12 figures2024 IEEE/ACM Ninth International Conference on Internet-of-Things Design and Implementation (IoTDI)Tim JohnsenMarco Levorato10.1109/IoTDI61053.2024.00014http://arxiv.org/abs/2604.07473v1When Switching Algorithms Helps: A Theoretical Study of Online Algorithm Selection2026-04-08T18:11:36ZOnline algorithm selection (OAS) aims to adapt the optimization process to changes in the fitness landscape and is expected to outperform any single algorithm from a given portfolio. Although this expectation is supported by numerous empirical studies, there are currently no theoretical results proving that OAS can yield asymptotic speedups (apart from some artificial examples for hyper-heuristics). Moreover, theory-based guidelines for when and how to switch between algorithms are largely missing.
In this paper, we present the first theoretical example in which switching between two algorithms -- the $(1+λ)$ EA and the $(1+(λ,λ))$ GA -- solves the OneMax problem asymptotically faster than either algorithm used in isolation. We show that an appropriate choice of population sizes for the two algorithms allows the optimum to be reached in $O(n\log\log n)$ expected time, faster than the $Θ(n\sqrt{\frac{\log n \log\log\log n}{\log\log n}})$ runtime of the best of these two algorithms with optimally tuned parameters.
We first establish this bound under an idealized switching rule that changes from the $(1+λ)$ to the $(1+(λ,λ))$ GA at the optimal time. We then propose a realistic switching strategy that achieves the same performance. Our analysis combines fixed-start and fixed-target perspectives, illustrating how different algorithms dominate at different stages of the optimization process. This approach offers a promising path toward a deeper theoretical understanding of OAS.2026-04-08T18:11:36ZDenis AntipovCarola Doerrhttp://arxiv.org/abs/2604.07468v1M-ArtAgent: Evidence-Based Multimodal Agent for Implicit Art Influence Discovery2026-04-08T18:07:56ZImplicit artistic influence, although visually plausible, is often undocumented and thus poses a historically constrained attribution problem: resemblance is necessary but not sufficient evidence. Most prior systems reduce influence discovery to embedding similarity or label-driven graph completion, while recent multimodal large language models (LLMs) remain vulnerable to temporal inconsistency and unverified attributions. This paper introduces M-ArtAgent, an evidence-based multimodal agent that reframes implicit influence discovery as probabilistic adjudication. It follows a four-phase protocol consisting of Investigation, Corroboration, Falsification, and Verdict governed by a Reasoning and Acting (ReAct)-style controller that assembles verifiable evidence chains from images and biographies, enforces art-historical axioms, and subjects each hypothesis to adversarial falsification via a prompt-isolated critic. Two theory-grounded operators, StyleComparator for Wolfflin formal analysis and ConceptRetriever for ICONCLASS-based iconographic grounding, ensure that intermediate claims are formally auditable. On the balanced WikiArt Influence Benchmark-100 (WIB-100) of 100 artists and 2,000 directed pairs, M-ArtAgent achieves 83.7% positive-class F1, 0.666 Matthews correlation coefficient (MCC), and 0.910 area under the receiver operating characteristic curve (ROC-AUC), with leakage-control and robustness checks confirming that the gains persist when explicit influence phrases are masked. By coupling multimodal perception with domain-constrained falsification, M-ArtAgent demonstrates that implicit influence analysis benefits from historically grounded adjudication rather than pattern matching alone.2026-04-08T18:07:56Z13 pages, 5 figures, submitted to IEEE AccessHanyi LiuZhonghao JiuMinghao WangYuhang XieHeran Yanghttp://arxiv.org/abs/2604.07457v1CMP: Robust Whole-Body Tracking for Loco-Manipulation via Competence Manifold Projection2026-04-08T18:00:39ZWhile decoupled control schemes for legged mobile manipulators have shown robustness, learning holistic whole-body control policies for tracking global end-effector poses remains fragile against Out-of-Distribution (OOD) inputs induced by sensor noise or infeasible user commands. To improve robustness against these perturbations without sacrificing task performance and continuity, we propose Competence Manifold Projection (CMP). Specifically, we utilize a Frame-Wise Safety Scheme that transforms the infinite-horizon safety constraint into a computationally efficient single-step manifold inclusion. To instantiate this competence manifold, we employ a Lower-Bounded Safety Estimator that distinguishes unmastered intentions from the training distribution. We then introduce an Isomorphic Latent Space (ILS) that aligns manifold geometry with safety probability, enabling efficient O(1) seamless defense against arbitrary OOD intents. Experiments demonstrate that CMP achieves up to a 10-fold survival rate improvement in typical OOD scenarios where baselines suffer catastrophic failure, incurring under 10% tracking degradation. Notably, the system exhibits emergent ``best-effort'' generalization behaviors to progressively accomplish OOD goals by adhering to the competence boundaries. Result videos are available at: https://shepherd1226.github.io/CMP.2026-04-08T18:00:39Z14 pages, 8 figures. Under review. Project page and videos: https://shepherd1226.github.io/CMPZiyang ChengHaoyu WeiHang YinXiuwei XuBingyao YuJie ZhouJiwen Luhttp://arxiv.org/abs/2604.07455v1Munkres' General Topology Autoformalized in Isabelle/HOL2026-04-08T18:00:36ZWe describe an experiment in LLM-assisted autoformalization that produced over 85,000 lines of Isabelle/HOL code covering all 39 sections of Munkres' Topology (general topology, Chapters 2--8), from topological spaces through dimension theory. The LLM-based coding agents (initially ChatGPT 5.2 and then Claude Opus 4.6) used 24 active days for that. The formalization is complete: all 806 formal results are fully proved with zero sorry's. Proved results include the Tychonoff theorem, the Baire category theorem, the Nagata--Smirnov and Smirnov metrization theorems, the Stone--Čech compactification, Ascoli's theorem, the space-filling curve, and others. The methodology is based on a "sorry-first" declarative proof workflow combined with bulk use of sledgehammer - two of Isabelle major strengths. This leads to relatively fast autoformalization progress. We analyze the resulting formalization in detail, analyze the human--LLM interaction patterns from the session log, and briefly compare with related autoformalization efforts in Megalodon, HOL Light, and Naproche. The results indicate that LLM-assisted formalization of standard mathematical textbooks in Isabelle/HOL is quite feasible, cheap and fast, even if some human supervision is useful.2026-04-08T18:00:36ZDustin BryantJonathan Julián Huerta y MuniveCezary KaliszykJosef Urbanhttp://arxiv.org/abs/2604.07349v1Toward a Tractability Frontier for Exact Relevance Certification2026-04-08T17:59:47ZExact relevance certification asks which coordinates are necessary to determine the optimal action in a coordinate-structured decision problem. The tractable families treated here admit a finite primitive basis, but optimizer-quotient realizability is maximal, so quotient shape alone cannot characterize the frontier.
We prove a meta-impossibility theorem for efficiently checkable structural predicates invariant under the theorem-forced closure laws of exact certification. Structural convergence with zero-distortion summaries, quotient entropy bounds, and support-counting arguments explains why those closure laws are canonical. We establish the theorem by constructing same-orbit disagreements for four obstruction families, namely dominant-pair concentration, margin masking, ghost-action concentration, and additive/statewise offset concentration, using action-independent, pair-targeted affine witnesses. Consequently no correct tractability classifier on a closure-closed domain yields an exact characterization over these families. Here closure-orbit agreement is forced by correctness rather than assumed as an invariance axiom. The result therefore applies to correct classifiers on closure-closed domains, not only to classifiers presented through a designated admissibility package.2026-04-08T17:59:47Z23 pages. 2 tables. Lean 4 formalization available https://doi.org/10.5281/zenodo.19457896Tristan Simashttp://arxiv.org/abs/2604.07348v1MoRight: Motion Control Done Right2026-04-08T17:59:22ZGenerating motion-controlled videos--where user-specified actions drive physically plausible scene dynamics under freely chosen viewpoints--demands two capabilities: (1) disentangled motion control, allowing users to separately control the object motion and adjust camera viewpoint; and (2) motion causality, ensuring that user-driven actions trigger coherent reactions from other objects rather than merely displacing pixels. Existing methods fall short on both fronts: they entangle camera and object motion into a single tracking signal and treat motion as kinematic displacement without modeling causal relationships between object motion. We introduce MoRight, a unified framework that addresses both limitations through disentangled motion modeling. Object motion is specified in a canonical static-view and transferred to an arbitrary target camera viewpoint via temporal cross-view attention, enabling disentangled camera and object control. We further decompose motion into active (user-driven) and passive (consequence) components, training the model to learn motion causality from data. At inference, users can either supply active motion and MoRight predicts consequences (forward reasoning), or specify desired passive outcomes and MoRight recovers plausible driving actions (inverse reasoning), all while freely adjusting the camera viewpoint. Experiments on three benchmarks demonstrate state-of-the-art performance in generation quality, motion controllability, and interaction awareness.2026-04-08T17:59:22ZProject Page: https://research.nvidia.com/labs/sil/projects/morightShaowei LiuXuanchi RenTianchang ShenHuan LingSaurabh GuptaShenlong WangSanja FidlerJun Gaohttp://arxiv.org/abs/2604.01204v2Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction2026-04-08T17:59:08ZPrimitive-based methods such as 3D Gaussian Splatting have recently become the state-of-the-art for novel-view synthesis and related reconstruction tasks. Compared to neural fields, these representations are more flexible, adaptive, and scale better to large scenes. However, the limited expressivity of individual primitives makes modeling high-frequency detail challenging. We introduce Neural Harmonic Textures, a neural representation approach that anchors latent feature vectors on a virtual scaffold surrounding each primitive. These features are interpolated within the primitive at ray intersection points. Inspired by Fourier analysis, we apply periodic activations to the interpolated features, turning alpha blending into a weighted sum of harmonic components. The resulting signal is then decoded in a single deferred pass using a small neural network, significantly reducing computational cost. Neural Harmonic Textures yield state-of-the-art results in real-time novel view synthesis while bridging the gap between primitive- and neural-field-based reconstruction. Our method integrates seamlessly into existing primitive-based pipelines such as 3DGUT, Triangle Splatting, and 2DGS. We further demonstrate its generality with applications to 2D image fitting and semantic reconstruction.2026-04-01T17:48:22ZJorge CondorNicolas Moenne-LoccozMerlin Nimier-DavidPiotr DidykZan GojcicQi Wuhttp://arxiv.org/abs/2604.07429v1GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents2026-04-08T17:49:03ZTowards an embodied generalist for real-world interaction, Multimodal Large Language Model (MLLM) agents still suffer from challenging latency, sparse feedback, and irreversible mistakes. Video games offer an ideal testbed with rich visual observations and closed-loop interaction, demanding fine-grained perception, long-horizon planning, and precise control. However, systematically evaluating these capabilities is currently hindered by heterogeneous action interfaces and heuristic verification. To this end, we introduce GameWorld, a benchmark designed for standardized and verifiable evaluation of MLLMs as generalist game agents in browser environments. Two game agent interfaces are studied: (i) computer-use agents that directly emit keyboard and mouse controls, and (ii) generalist multimodal agents that act in a semantic action space via deterministic Semantic Action Parsing. GameWorld contains 34 diverse games and 170 tasks, each paired with state-verifiable metrics for outcome-based evaluation. The results across 18 model-interface pairs suggest that even the best performing agent is far from achieving human capabilities on video games. Extensive experiments of repeated full-benchmark reruns demonstrate the robustness of the benchmark, while further studies on real-time interaction, context-memory sensitivity, and action validity expose more challenges ahead for game agents. Together, by offering a standardized, verifiable, and reproducible evaluation framework, GameWorld lays a robust foundation for advancing research on multimodal game agents and beyond. The project page is at https://gameworld-bench.github.io.2026-04-08T17:49:03Z23 pages, 8 figuresMingyu OuyangSiyuan HuKevin Qinghong LinHwee Tou NgMike Zheng Shouhttp://arxiv.org/abs/2604.07331v1RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild2026-04-08T17:48:46ZScaling up robot learning will likely require human data containing rich and long-horizon interactions in the wild. Existing approaches for collecting such data trade off portability, robustness to occlusion, and global consistency. We introduce RoSHI, a hybrid wearable that fuses low-cost sparse IMUs with the Project Aria glasses to estimate the full 3D pose and body shape of the wearer in a metric global coordinate frame from egocentric perception. This system is motivated by the complementarity of the two sensors: IMUs provide robustness to occlusions and high-speed motions, while egocentric SLAM anchors long-horizon motion and stabilizes upper body pose. We collect a dataset of agile activities to evaluate RoSHI. On this dataset, we generally outperform other egocentric baselines and perform comparably to a state-of-the-art exocentric baseline (SAM3D). Finally, we demonstrate that the motion data recorded from our system are suitable for real-world humanoid policy learning. For videos, data and more, visit the project webpage: https://roshi-mocap.github.io/2026-04-08T17:48:46Z8 pages, 4 figures. *Equal contribution by first three authors. Project webpage: https://roshi-mocap.github.io/Wenjing Margaret MaoJefferson NgLuyang HuDaniel GehrigAntonio Loquercio