https://arxiv.org/api/6TVi04dPixL0Tlm0OP4nGOylhNQ
Updated: 2026-03-24T11:33:10Z

http://arxiv.org/abs/2603.21669v1
PRM-as-a-Judge: A Dense Evaluation Paradigm for Fine-Grained Robotic Auditing
Updated: 2026-03-23T07:48:42Z
Current robotic evaluation is still largely dominated by binary success rates, which collapse rich execution processes into a single outcome and obscure critical qualities such as progress, efficiency, and stability. To address this limitation, we propose PRM-as-a-Judge, a dense evaluation paradigm that leverages Process Reward Models (PRMs) to audit policy execution directly from trajectory videos by estimating task progress from observation sequences. Central to this paradigm is the OPD (Outcome-Process-Diagnosis) metric system, which explicitly formalizes execution quality via a task-aligned progress potential. We characterize dense robotic evaluation through two axiomatic properties: macro-consistency, which requires additive and path-consistent aggregation, and micro-resolution, which requires sensitivity to fine-grained physical evolution. Under this formulation, potential-based PRM judges provide a natural instantiation of dense evaluation, with macro-consistency following directly from the induced scalar potential. We empirically validate the micro-resolution property using RoboPulse, a diagnostic benchmark specifically designed for probing micro-scale progress discrimination, where several trajectory-trained PRM judges outperform discriminative similarity-based methods and general-purpose foundation-model judges.
Finally, leveraging PRM-as-a-Judge and the OPD metric system, we conduct a structured audit of mainstream policy paradigms across long-horizon tasks, revealing behavioral signatures and failure modes that are invisible to outcome-only metrics.
Published: 2026-03-23T07:48:42Z
Authors: Yuheng Ji, Yuyang Liu, Huajie Tan, Xuchuan Huang, Fanding Huang, Yijie Xu, Cheng Chi, Yuting Zhao, Huaihai Lyu, Peterson Co, Mingyu Cao, Qiongyu Zhang, Zhe Li, Enshen Zhou, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang, Xiaolong Zheng

http://arxiv.org/abs/2509.24313v2
Learning to Sample: Reinforcement Learning-Guided Sampling for Autonomous Vehicle Motion Planning
Updated: 2026-03-23T07:46:46Z
Sampling-based motion planning is a well-established approach in autonomous driving, valued for its modularity and analytical tractability. In complex urban scenarios, however, uniform or heuristic sampling often produces many infeasible or irrelevant trajectories. We address this limitation with a hybrid framework that learns where to sample while keeping trajectory generation and evaluation fully analytical and verifiable. A reinforcement learning (RL) agent guides the sampling process toward regions of the action space likely to yield feasible trajectories, while evaluation and final selection remain governed by deterministic feasibility checks and cost functions. We couple the RL sampler with a world model (WM) based on a decodable deep set encoder, enabling both variable numbers of traffic participants and reconstructable latent representations. The approach is evaluated in the CommonRoad (CR) simulation environment and compared against uniform-sampling baselines, showing up to 99% fewer required samples and a runtime reduction of up to 84% while maintaining planning quality in terms of success and collision-free rates.
These improvements lead to faster, more reliable decision-making for autonomous vehicles in urban environments.
Published: 2025-09-29T05:51:14Z
Comment: 8 pages, submitted to the IEEE for possible publication
Authors: Korbinian Moller, Roland Stroop, Mattia Piccinini, Alexander Langmann, Johannes Betz

http://arxiv.org/abs/2603.21635v1
RTD-RAX: Fast, Safe Trajectory Planning for Systems under Unknown Disturbances
Updated: 2026-03-23T06:59:13Z
Reachability-based Trajectory Design (RTD) is a provably safe, real-time trajectory planning framework that combines offline reachable-set computation with online trajectory optimization. However, standard RTD implementations suffer from two key limitations: conservatism induced by worst-case reachable-set overapproximations, and an inability to account for real-time disturbances during execution. This paper presents RTD-RAX, a runtime-assurance extension of RTD that utilizes a non-conservative RTD formulation to rapidly generate goal-directed candidate trajectories, and utilizes mixed monotone reachability for fast, disturbance-aware online safety certification. When proposed trajectories fail safety certification under real-time uncertainty, a repair procedure finds nearby safe trajectories that preserve progress toward the goal while guaranteeing safety under real-time disturbances.
Published: 2026-03-23T06:59:13Z
Authors: Evanns Morales-Cuadrado, Long Kiu Chung, Shreyas Kousik, Samuel Coogan

http://arxiv.org/abs/2509.16963v2
A Tactile-based Interactive Motion Planner for Robots in Unknown Cluttered Environments
Updated: 2026-03-23T06:52:21Z
In unknown cluttered environments with densely stacked objects, the free-motion space is extremely barren, posing significant challenges to motion planners. Collision-free planning methods often suffer from catastrophic failures due to unexpected collisions and motion obstructions. To address this issue, this paper proposes an interactive motion planning framework (I-MP), based on a perception-motion loop.
This framework empowers robots to autonomously model and reason about contact models, which in turn enables safe expansion of the free-motion space. Specifically, the robot utilizes multimodal tactile perception to acquire stimulus-response signal pairs. This enables real-time identification of objects' mechanical properties and the subsequent construction of contact models. These models are integrated as computational constraints into a reactive planner. Based on fixed-point theorems, the planner computes the spatial state toward the target in real time, thus avoiding the computational burden associated with extrapolating on high-dimensional interaction models. Furthermore, high-dimensional interaction features are linearly superposed in Cartesian space in the form of energy, and the controller achieves trajectory tracking by solving the energy gradient from the current state to the planned state. The experimental results showed that at cruising speeds ranging from 0.01 to 0.07 m/s, the robot's initial contact force with objects remained stable at 1.0 ± 0.7 N. In the cabinet scenario test where collision-free trajectories were unavailable, I-MP expanded the free-motion space by 37.5% through active interaction, successfully completing the environmental exploration task.
Published: 2025-09-21T07:50:40Z
Authors: Chengjin Wang, Yanmin Zhou, Zheng Yan, Feng Luan, Runjie Shen, Hongrui Sang, Zhipeng Wang, Bin He

http://arxiv.org/abs/2602.19107v2
A User-driven Design Framework for Robotaxi
Updated: 2026-03-23T06:46:09Z
Robotaxis are emerging as a promising form of urban mobility, but removing human drivers fundamentally reshapes passenger-vehicle interaction and raises new design challenges. To inform robotaxi design based on real-world experience, we conducted 18 semi-structured interviews and autoethnographic ride experiences to examine users' perceptions, experiences, and expectations for robotaxi design. We found that users valued benefits such as increased agency and consistent driving.
However, they also encountered challenges such as limited flexibility, insufficient transparency, and emergency handling concerns. Notably, users perceived robotaxis not merely as a mode of transportation, but as autonomous, semi-private transitional spaces in which engaging in personal activities felt less socially intrusive. Safety perceptions were polarized: some felt anxiety about reduced control, while others viewed robotaxis as safer than human drivers due to their cautious, law-abiding nature. Based on the findings, we propose a user-driven design framework spanning hailing, pick-up, traveling, and drop-off phases to support trustworthy, transparent, and accountable robotaxi design.
Published: 2026-02-22T09:33:18Z
Authors: Yue Deng, Changyang He

http://arxiv.org/abs/2603.07499v2
Inverse-dynamics observer design for a linear single-track vehicle model with distributed tire dynamics
Updated: 2026-03-23T06:25:27Z
Accurate estimation of the vehicle's sideslip angle and tire forces is essential for enhancing safety and handling performance in unknown driving scenarios. To this end, the present paper proposes an innovative observer that combines a linear single-track model with a distributed representation of the tires and information collected from standard sensors. In particular, by adopting a comprehensive representation of the tires in terms of hyperbolic partial differential equations (PDEs), the proposed estimation strategy exploits dynamical inversion to reconstruct the lumped and distributed vehicle states solely from yaw rate and lateral acceleration measurements. Simulation results demonstrate the effectiveness of the observer in estimating the sideslip angle and tire forces even in the presence of noise and model uncertainties.
Published: 2026-03-08T07:05:16Z
Comment: 6 pages, 5 figures.
Accepted at ECC 2026
Authors: Luigi Romano, Ole Morten Aamo, Jan Åslund, Erik Frisk

http://arxiv.org/abs/2603.21580v1
Conformal Koopman for Embedded Nonlinear Control with Statistical Robustness: Theory and Real-World Validation
Updated: 2026-03-23T05:04:32Z
We propose a fully data-driven, Koopman-based framework for statistically robust control of discrete-time nonlinear systems with linear embeddings. Establishing a connection between the Koopman operator and contraction theory, it offers distribution-free probabilistic bounds on the state tracking error under Koopman modeling uncertainty. Conformal prediction is employed here to rigorously derive a bound on the state-dependent modeling uncertainty throughout the trajectory, ensuring safety and robustness without assuming a specific error prediction structure or distribution. Unlike prior approaches that merely combine conformal prediction with Koopman-based control in an open-loop setting, our method establishes a closed-loop control architecture with formal guarantees that explicitly account for both forward and inverse modeling errors. Also, by expressing the tracking error bound in terms of the control parameters and the modeling errors, our framework offers a quantitative means to formally enhance the performance of arbitrary Koopman-based control. We validate our method both in numerical simulations with the Dubins car and in real-world experiments with a highly nonlinear flapping-wing drone. The results demonstrate that our method indeed provides formal safety guarantees while maintaining accurate tracking performance under Koopman modeling uncertainty.
Published: 2026-03-23T05:04:32Z
Comment: 8 pages, 6 figures. Accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA).
The final published version will be available via IEEE Xplore
Authors: Koki Hirano, Hiroyasu Tsukamoto

http://arxiv.org/abs/2603.21566v1
CataractSAM-2: A Domain-Adapted Model for Anterior Segment Surgery Segmentation and Scalable Ground-Truth Annotation
Updated: 2026-03-23T04:40:35Z
We present CataractSAM-2, a domain-adapted extension of Meta's Segment Anything Model 2, designed for real-time semantic segmentation of cataract ophthalmic surgery videos with high accuracy. Positioned at the intersection of computer vision and medical robotics, CataractSAM-2 enables precise intraoperative perception crucial for robotic-assisted and computer-guided surgical systems. Furthermore, to alleviate the burden of manual labeling, we introduce an interactive annotation framework that combines sparse prompts with video-based mask propagation. This tool significantly reduces annotation time and facilitates the scalable creation of high-quality ground-truth masks, accelerating dataset development for ocular anterior segment surgeries. We also demonstrate the model's strong zero-shot generalization to glaucoma trabeculectomy procedures, confirming its cross-procedural utility and potential for broader surgical applications. The trained model and annotation toolkit are released as open-source resources, establishing CataractSAM-2 as a foundation for expanding anterior ophthalmic surgical datasets and advancing real-time AI-driven solutions in medical robotics, as well as surgical video understanding.
Published: 2026-03-23T04:40:35Z
Authors: Mohammad Eslami, Dhanvinkumar Ganeshkumar, Saber Kazeminasab, Michael G. Morley, Michael V. Boland, Michael M. Lin, John B. Miller, David S. Friedman, Nazlee Zebardast, Lucia Sobrin, Tobias Elze

http://arxiv.org/abs/2603.21545v1
Auction-Based Task Allocation with Energy-Conscientious Trajectory Optimization for AMR Fleets
Updated: 2026-03-23T03:58:25Z
This paper presents a hierarchical two-stage framework for multi-robot task allocation and trajectory optimization in asymmetric task spaces: (1) a sequential auction allocates tasks using closed-form bid functions, and (2) each robot independently solves an optimal control problem for energy-minimal trajectories with a physics-based battery model, followed by a collision avoidance refinement step using pairwise proximity penalties. Event-triggered warm-start rescheduling with bounded trigger frequency handles robot faults, priority arrivals, and energy deviations. Across 505 scenarios with 2-20 robots and up to 100 tasks on three factory layouts, both energy- and distance-based auction variants achieve 11.8% average energy savings over nearest-task allocation, with rescheduling latency under 10 ms. The central finding is that bid-metric performance is regime-dependent: in uniform workspaces, distance bids outperform energy bids by 3.5% (p < 0.05, Wilcoxon) because a 15.7% closed-form approximation error degrades bid ranking accuracy to 87%; however, when workspace friction heterogeneity is sufficient (r < 0.85 energy-distance correlation), a zone-aware energy bid outperforms distance bids by 2-2.4%.
These results provide practitioner guidance: use distance bids in near-uniform terrain and energy-aware bids when friction variation is significant.
Published: 2026-03-23T03:58:25Z
Authors: Jiachen Li, Soovadeep Bakshi, Jian Chu, Shihao Li, Dongmei Chen

http://arxiv.org/abs/2603.21523v1
SafePilot: A Framework for Assuring LLM-enabled Cyber-Physical Systems
Updated: 2026-03-23T03:31:51Z
Large Language Models (LLMs), deep learning architectures with typically over 10 billion parameters, have recently begun to be integrated into various cyber-physical systems (CPS) such as robotics, industrial automation, and autopilot systems. The abstract knowledge and reasoning capabilities of LLMs are employed for tasks like planning and navigation. However, a significant challenge arises from the tendency of LLMs to produce "hallucinations", outputs that are coherent yet factually incorrect or contextually unsuitable. This characteristic can lead to undesirable or unsafe actions in the CPS. Therefore, our research focuses on assuring the LLM-enabled CPS by enhancing their critical properties. We propose SafePilot, a novel hierarchical neuro-symbolic framework that provides end-to-end assurance for LLM-enabled CPS according to attribute-based and temporal specifications. Given a task and its specification, SafePilot first invokes a hierarchical planner with a discriminator that assesses task complexity. If the task is deemed manageable, it is passed directly to an LLM-based task planner with built-in verification. Otherwise, the hierarchical planner applies a divide-and-conquer strategy, decomposing the task into sub-tasks, each of which is individually planned and later merged into a final solution. The LLM-based task planner translates natural language constraints into formal specifications and verifies the LLM's output against them. If violations are detected, it identifies the flaw, adjusts the prompt accordingly, and re-invokes the LLM.
This iterative process continues until a valid plan is produced or a predefined limit is reached. Our framework supports LLM-enabled CPS with both attribute-based and temporal constraints. Its effectiveness and adaptability are demonstrated through two illustrative case studies.
Published: 2026-03-23T03:31:51Z
Comment: 12 pages, 8 figures
Authors: Weizhe Xu, Mengyu Liu, Fanxin Kong

http://arxiv.org/abs/2603.21496v1
A Framework for Closed-Loop Robotic Assembly, Alignment and Self-Recovery of Precision Optical Systems
Updated: 2026-03-23T02:36:34Z
Robotic automation has transformed scientific workflows in domains such as chemistry and materials science, yet free-space optics, which is a high precision domain, remains largely manual. Optical systems impose strict spatial and angular tolerances, and their performance is governed by tightly coupled physical parameters, making generalizable automation particularly challenging. In this work, we present a robotics framework for the autonomous construction, alignment, and maintenance of precision optical systems. Our approach integrates hierarchical computer vision systems, optimization routines, and custom-built tools to achieve this functionality. As a representative demonstration, we perform the fully autonomous construction of a tabletop laser cavity from randomly distributed components. The system performs several tasks such as laser beam centering, spatial alignment of multiple beams, resonator alignment, laser mode selection, and self-recovery from induced misalignment and disturbances.
By achieving closed-loop autonomy for highly sensitive optical systems, this work establishes a foundation for autonomous optical experiments for applications across technical domains.
Published: 2026-03-23T02:36:34Z
Authors: Seou Choi, Sachin Vaidya, Caio Silva, Shiekh Zia Uddin, Sajib Biswas Shuvo, Shrish Choudhary, Marin Soljačić

http://arxiv.org/abs/2510.07028v2
Efficient View Planning Guided by Previous-Session Reconstruction for Repeated Plant Monitoring
Updated: 2026-03-23T02:35:18Z
Repeated plant monitoring is essential for tracking crop growth, and 3D reconstruction enables consistent comparison across monitoring sessions. However, rebuilding a 3D model from scratch in every session is costly and overlooks informative geometry already observed previously. We propose efficient view planning guided by a previous-session reconstruction, which reuses a 3D model from the previous session to improve active perception in the current session. Based on this previous-session reconstruction, our method replaces iterative next-best-view planning with one-shot view planning that selects an informative set of views and computes the globally shortest execution path connecting them. Experiments on real multi-session datasets, including public single-plant scans and a newly collected greenhouse crop-row dataset, show that our method achieves comparable or higher surface coverage with fewer executed views and shorter robot paths than iterative and one-shot baselines.
Published: 2025-10-08T13:57:29Z
Comment: Submitted for review
Authors: Sicong Pan, Luca Lobefaro, Moein Taherkhani, Xuying Huang, Rohit Menon, Cyrill Stachniss, Maren Bennewitz

http://arxiv.org/abs/2603.21487v1
GaussianSSC: Triplane-Guided Directional Gaussian Fields for 3D Semantic Completion
Updated: 2026-03-23T02:21:22Z
We present GaussianSSC, a two-stage, grid-native and triplane-guided approach to semantic scene completion (SSC) that injects the benefits of Gaussians without replacing the voxel grid or maintaining a separate Gaussian set.
We introduce Gaussian Anchoring, a sub-pixel, Gaussian-weighted image aggregation over fused FPN features that tightens voxel-image alignment and improves monocular occupancy estimation. We further convert point-like voxel features into a learned per-voxel Gaussian field and refine triplane features via a triplane-aligned Gaussian-Triplane Refinement module that combines local gathering (target-centric) and global aggregation (source-centric). This directional, anisotropic support captures surface tangency, scale, and occlusion-aware asymmetry while preserving the efficiency of triplane representations. On SemanticKITTI (Behley et al., 2019), GaussianSSC improves Stage 1 occupancy by +1.0% Recall, +2.0% Precision, and +1.8% IoU over state-of-the-art baselines, and improves Stage 2 semantic prediction by +1.8% IoU and +0.8% mIoU.
Published: 2026-03-23T02:21:22Z
Authors: Ruiqi Xian, Jing Liang, He Yin, Xuewei Qi, Dinesh Manocha

http://arxiv.org/abs/2509.16136v3
Reward Evolution with Graph-of-Thoughts: A Bi-Level Language Model Framework for Reinforcement Learning
Updated: 2026-03-23T02:11:21Z
Designing effective reward functions remains a major challenge in reinforcement learning (RL), often requiring considerable human expertise and iterative refinement. Recent advances leverage Large Language Models (LLMs) for automated reward design, but these approaches are limited by hallucinations, reliance on human feedback, and challenges with handling complex, multi-step tasks. In this work, we introduce Reward Evolution with Graph-of-Thoughts (RE-GoT), a novel bi-level framework that enhances LLMs with structured graph-based reasoning and integrates Visual Language Models (VLMs) for automated rollout evaluation. RE-GoT first decomposes tasks into text-attributed graphs, enabling comprehensive analysis and reward function generation, and then iteratively refines rewards using visual feedback from VLMs without human intervention.
Extensive experiments on 10 RoboGen and 4 ManiSkill2 tasks demonstrate that RE-GoT consistently outperforms existing LLM-based baselines. On RoboGen, our method improves average task success rates by 32.25%, with notable gains on complex multi-step tasks. On ManiSkill2, RE-GoT achieves an average success rate of 93.73% across four diverse manipulation tasks, significantly surpassing prior LLM-based approaches and even exceeding expert-designed rewards. Our results indicate that combining LLMs and VLMs with graph-of-thoughts reasoning provides a scalable and effective solution for autonomous reward evolution in RL.
Published: 2025-09-19T16:35:27Z
Authors: Changwei Yao, Xinzi Liu, Chen Li, Marios Savvides

http://arxiv.org/abs/2602.20323v3
PhysMem: Self-Evolving Physical Memory for Robot Manipulation
Updated: 2026-03-23T00:23:00Z
Reliable object manipulation requires understanding physical properties that vary across objects and environments. Vision-language model (VLM) planners can reason about friction and stability in general terms; however, they often cannot predict how a specific ball will roll on a particular surface or which stone will provide a stable foundation without direct experience. We present PhysMem, a memory framework that enables VLM robot planners to learn physical principles from interaction at test time, without updating model parameters. The system records experiences, generates candidate hypotheses, and verifies them through targeted interaction before promoting validated knowledge to guide future decisions. A central design choice is verification before application: the system tests hypotheses against new observations rather than applying retrieved experience directly, reducing rigid reliance on prior experience when physical conditions change. We evaluate PhysMem on three real-world manipulation tasks and simulation benchmarks across four VLM backbones.
On a controlled brick insertion task, principled abstraction achieves 76% success compared to 23% for direct experience retrieval, and real-world experiments show consistent improvement over 30-minute deployment sessions.
Published: 2026-02-23T20:18:35Z
Authors: Haoyang Li, Yang You, Hao Su, Leonidas Guibas