http://arxiv.org/abs/2606.10832v1 GUIDE: Goal-Initialized Directional Understanding for End-to-End Visual Navigation 2026-06-09T13:19:30Z

Learning-based visual navigation for legged robots typically relies on continuous goal updates from hierarchical state estimation to provide a persistent directional reference. This reliance incurs additional sensory and computational overhead and deviates from fully end-to-end mobile autonomy. Furthermore, under partial observability, policies are prone to learn myopic behaviors, easily becoming trapped in dead ends and complex structural layouts. To address these limitations, we investigate a goal-initialized navigation setting, where the target is provided only once at the beginning of an episode, requiring the robot to operate based on intrinsic spatial memory without subsequent goal updates from external modules. In this work, we propose GUIDE, a fully end-to-end reinforcement learning framework designed to cultivate internal directional awareness. Specifically, GUIDE incorporates a spatial anchor predictor that leverages multi-frequency proprioceptive history to extract egomotion representations, thereby maintaining a persistent long-horizon spatial context for navigation. Concurrently, it utilizes raw depth streams to perceive local environmental geometry. We evaluate the proposed framework across both simulation and real-world scenarios on a quadruped robot. Experiments show that GUIDE learns reliable egomotion and directional awareness, enabling a fully end-to-end deployed policy to safely navigate through dense clutter and structured mazes without subsequent goal guidance or prior maps.
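To make the goal-initialized setting concrete, the sketch below shows the quantity a policy has to keep track of once the goal is no longer refreshed: the goal position expressed in the robot's own moving frame. This is explicit planar dead reckoning, written only for illustration. GUIDE is described as learning this awareness implicitly from proprioceptive history, so the function name `update_goal_in_body_frame` and its arguments are assumptions, not the authors' method.

```python
import math


def update_goal_in_body_frame(goal_xy, delta_xy, delta_yaw):
    """Re-express a fixed goal in the robot's new body frame (hypothetical helper).

    goal_xy:   goal position in the previous body frame, in metres
    delta_xy:  robot translation during the step, in the previous body frame
    delta_yaw: robot heading change during the step, in radians (CCW positive)
    """
    # Subtract the robot's own displacement from the goal vector.
    dx = goal_xy[0] - delta_xy[0]
    dy = goal_xy[1] - delta_xy[1]
    # Rotate by -delta_yaw to express the result in the new body frame.
    c, s = math.cos(delta_yaw), math.sin(delta_yaw)
    return (c * dx + s * dy, -s * dx + c * dy)


# The goal is given once, 2 m straight ahead; afterwards only egomotion is used.
goal = (2.0, 0.0)
goal = update_goal_in_body_frame(goal, (1.0, 0.0), 0.0)          # walk 1 m forward
goal = update_goal_in_body_frame(goal, (0.0, 0.0), math.pi / 2)  # turn left 90 degrees
```

After the two steps the goal lies 1 m to the robot's right. Any drift in the egomotion estimate accumulates in this vector, which is why a reliable egomotion representation matters in this setting.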

2026-06-09T13:19:30Z https://guide-navigation.github.io/ Liang Wang Jin Jin KanZhong Yao YiBin Wu Fangqiang Ding Jin Wang Jun Wu Zhe Sun Qiuguo Zhu http://arxiv.org/abs/2203.03018v3 RAPTOR: Rapid Aerial Pickup and Transport of Objects by Robots 2026-06-09T13:11:32Z

Rapid aerial grasping through robots can lead to many applications that utilize fast and dynamic picking and placing of objects. Rigid grippers traditionally used in aerial manipulators require high precision and specific object geometries for successful grasping. We propose RAPTOR, a quadcopter platform combined with a custom Fin Ray gripper to enable more flexible grasping of objects with different geometries, leveraging the properties of soft materials to increase the contact surface between the gripper and the objects. To reduce the communication latency, we present a new lightweight middleware solution based on Fast DDS (Data Distribution Service) as an alternative to ROS (Robot Operating System). We show that RAPTOR achieves an average of 83% grasping efficacy in a real-world setting for four different object geometries while moving at an average velocity of 1 m/s during grasping. In a high-velocity setting, RAPTOR supports up to four times the payload compared to previous works. Our results highlight the potential of aerial drones in automated warehouses and other manipulation applications where speed, swiftness, and robustness are essential while operating in hard-to-reach places.

2022-03-06T18:05:35Z 7 pages, 10 figures, accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022. Video: https://youtu.be/KHkBlBABsC8 Project page: https://srl-ethz.github.io/RAPTOR Aurel Appius Erik Bauer Marc Blöchlinger Aashi Kalra Robin Oberson Arman Raayatsanati Pascal Strauch Sarath Suresh Marco von Salis Robert K. Katzschmann http://arxiv.org/abs/2606.10818v1 IMPACT: Learning Internal-Model Predictive Control for Forceful Robotic Manipulation 2026-06-09T13:00:56Z

Real-world robotic manipulation tasks often involve forceful interactions with the environment, such as using tools of varying weights, transporting objects with different masses, and performing contact-rich tasks like table wiping. Previous learning-based approaches typically employ imitation learning policies that output target end-effector poses tracked by low-level impedance controllers. In these systems, forceful interactions are either implicitly realized through steady-state tracking errors or explicitly commanded using wrist force/torque or tactile sensors. However, implicit approaches generalize poorly across object weights, while explicit approaches require specialized hardware and increase system complexity. In this work, we propose IMPACT, a framework that decouples these forceful tasks into task-planning and internal-model-based predictive control. Extensive simulation and real-world experiments demonstrate that the proposed framework achieves higher success rates and improved generalization to unseen object weights, as well as better safety and energy efficiency.

2026-06-09T13:00:56Z Project website: https://gao-jiawei.com/IMPACT/ Jiawei Gao Chaoqi Liu Peilin Wu Haonan Chen Yilun Du http://arxiv.org/abs/2606.10808v1 Bridging Semantics and Physical Execution: A Neuro-Symbolic Framework for Multi-Pair Robotic Assembly 2026-06-09T12:53:25Z

Multi-pair robotic assembly in unstructured environments faces spatial interference and contact uncertainties. Existing paradigms fail to bridge cognitive decision-making and physical execution, as they either encounter state-space explosion and knowledge bottlenecks or suffer from logical hallucinations and topological conflicts. We propose an end-to-end neuro-symbolic framework that solves the challenge hierarchically: generating optimal subgraphs for each pair, decoupling generality from edge cases, and then resolving cross-pair interferences. Given an eye-on-hand RGB-D assembly scene, the framework extracts semantic instance identity and state while quantifying the scene for divergence calculation. For each pair, an optimal subgraph is generated by an LLM using only basic actions to mitigate hallucinations. Supportive actions for edge cases are inferred and inserted by a lightweight discriminator. Because the discriminator is driven by the divergence between the quantified baseline and the current scene, it is easily extensible at low cost. Augmented subgraphs are topologically coordinated into global sequences while preserving internal behavioral coherence. Dynamic behavior trees embedding atomic skills close the force-aware execution loop. Offline evaluation on 100 real-world scenes achieves 97.00% global executability, outperforming classical and state-of-the-art planners. Real-robot deployment on a UR3 arm attains a 90% success rate with 0.5 mm tolerance under strong interference, demonstrating a unified and verifiable solution for complex autonomous assembly.

2026-06-09T12:53:25Z Corresponding author: Aiguo Song (a.g.song@seu.edu.cn) Xinyi Li Aiguo Song Linhu Wei Huijun Li http://arxiv.org/abs/2606.10771v1 On-sky demonstration of reinforcement learning for adaptive optics control 2026-06-09T12:26:06Z

Reinforcement learning (RL)-based algorithms have recently emerged as a promising approach for adaptive optics (AO) control. In simulations and laboratory experiments, they have demonstrated robustness to real-world effects such as photon and detector noise, misregistration, vibrations, and rapid variations in seeing conditions. However, their performance has not yet been validated on sky. We report the first on-sky demonstration of a reinforcement learning controller for adaptive optics, named Policy Optimization for AO (PO4AO). We further analyze its on-sky behavior and identify directions for improving the algorithm and its implementation. PO4AO was implemented and deployed on the Papyrus adaptive optics system installed at the Coudé focus of the 1.52 m telescope (T152) at the OHP. A Python-based implementation was interfaced with the existing real-time controller (DAO RTC) via shared-memory buffers. The performance of PO4AO was compared to that of a standard integrator controller over several nights, covering a range of flux levels and atmospheric conditions. PO4AO consistently outperformed the standard integrator in all tested configurations. The controller successfully learned and compensated for vibration patterns and demonstrated strong robustness to measurement noise. Once tuned for Papyrus, PO4AO operated in a turnkey fashion, using a single set of hyperparameters across varying observing conditions and science targets. These performance gains were achieved despite a non-optimized Python implementation introducing approximately $750\,\mu\text{s}$ of additional latency, along with control jitter and occasional frame drops. When properly implemented and optimized, PO4AO constitutes a robust and high-performance turnkey controller for single-conjugate adaptive optics systems, paving the way for broader adoption of reinforcement learning strategies in on-sky AO operations.
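For readers unfamiliar with the baseline, a standard integrator can be sketched as below. This is a generic textbook form, not the Papyrus or DAO RTC implementation: the names `integrator_step`, `recon`, `gain`, and `leak`, the default values, and the sign convention (residual measurements are subtracted from the running command) are all assumptions made for illustration.

```python
def integrator_step(command, slopes, recon, gain=0.4, leak=0.0):
    """One update of a (leaky) integrator controller. Illustrative sketch only.

    command: current actuator commands, length n_act
    slopes:  residual wavefront-sensor measurements, length n_meas
    recon:   reconstruction matrix as a list of n_act rows, each of length n_meas
    gain:    loop gain (hypothetical default)
    leak:    leak factor, 0.0 gives a pure integrator
    """
    # Map the residual measurements to actuator space.
    correction = [sum(r * s for r, s in zip(row, slopes)) for row in recon]
    # Accumulate the scaled correction onto the (optionally leaked) command.
    return [(1.0 - leak) * c - gain * d for c, d in zip(command, correction)]


# Two actuators, two measurements, identity reconstructor.
identity = [[1.0, 0.0], [0.0, 1.0]]
new_command = integrator_step([0.0, 0.0], [1.0, 2.0], identity, gain=0.5)
```

The controller has a single scalar gain and no model of temporal structure, which is the kind of limitation a learned controller that compensates for vibration patterns is meant to address.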

2026-06-09T12:26:06Z 11 pages, 12 figures accepted by A&A Jalo Nousiainen Vincent Chambouleyron Benoit Neichel Sylvain Cetre Jean-Francois Sauvage Angelie Alagao Markus Kasper Jonathan Dray Romain Fetick Byron Engler http://arxiv.org/abs/2606.10746v1 ros2probe: Non-intrusive, Kernel-selective Observability for Robot Operating System 2 Middleware 2026-06-09T11:55:25Z

Robot Operating System 2 (ROS 2), the de facto standard middleware framework for robots, runs each robot as a graph of nodes communicating over the Data Distribution Service (DDS), a publish/subscribe substrate. Observing this inter-node communication in real time is essential to robot development, yet it has a price. A tool can receive data only by joining the DDS domain as a subscriber that discovery has matched to the publisher, so observing folds the tool into the system it measures and perturbs it. We define this protocol-inherent perturbation as the observer's probe effect. It inflates the discovery plane, adds deserialization cost on the observer, makes the loss it reports diverge from what the subscriber actually received, and near saturation displaces the subscriber's messages. The only escape, capturing all wire traffic passively, discards ROS 2 message semantics and scales with total traffic, not what is observed. We present ros2probe, a non-intrusive observation framework that removes the probe effect. It reconstructs the full ROS 2 communication state from the domain's discovery packets at no bandwidth cost, then drives an in-kernel filter restricted to the topics the user asks for, lifting only those packets at minimal cost and observing what the real subscriber receives. Its interfaces and recordings match the standard ROS 2 tools. Across three hardware platforms (laptop, Jetson, and Raspberry Pi), two DDS implementations, and seven robot-operation workloads, ros2probe holds the discovery graph within 0.5% of an unobserved system, whereas domain-joining tools inflate discovery up to 2.6$\times$ and drop 38.5% of the subscriber's messages at saturation while ros2probe drops none. It reports loss with a recall of 1.0, cuts observer CPU and memory by up to 7$\times$ and 28$\times$, and stays practical on the embedded robots where existing tools overload the system.

2026-06-09T11:55:25Z 13 pages, 8 figures, 7 tables Jisang Yu Sanghoon Lee Yeonwoo Choi Kyung-Joon Park http://arxiv.org/abs/2606.11278v1 Model-based Optimization of Anguilliform Swimming Gaits for Soft Robotic Applications 2026-06-09T11:54:44Z

In this paper, we introduce the Soft Lamprey-Inspired Dual Environment Robot (SLIDER) and a proper modeling and optimization procedure employed to design the robot. We represent the primary fluid environment actions - inertial effects, vortex forces, and viscous dissipation - using Lighthill's theory for large-amplitude elongated bodies. For structural design parameters such as internal pressure, tail size, and body stiffness, a fast, geometrically and materially nonlinear model is developed and validated. The fluid-structure interaction equations are solved implicitly with an efficient second-order box method. A pneumatic manifold robotic system is employed to actuate SLIDER in a quiescent water tank environment, allowing cross-comparison of computational and experimental results. We find that low-frequency swimming is dominated by resistant environmental forces, whereas higher-frequency swimming is primarily affected by inertial fluid forces. Using our efficient model alongside a genetic algorithm, we co-optimize a swimming control pattern and caudal fin design (subject to SLIDER's climbing morphology) to achieve a tethered swimming speed of 21.7 +/- 0.4 cm/s (0.59 Bl/s). Furthermore, we investigate the optimization procedure for a multimodal robot performing both swimming and climbing tasks.

2026-06-09T11:54:44Z Brian Van Stratum James Gallentine Caleb Rucker Eric Barth Jonathan E. Clark Kourosh Shoele http://arxiv.org/abs/2606.10743v1 Hand-centric Human-to-Robot Trajectory Transfer from Video Demonstrations via Open-World Contact Localization 2026-06-09T11:53:29Z

Learning from human video demonstrations remains challenging due to noisy hand-object interactions, unseen objects with partial observation, and cross-embodiment discrepancy. To address these challenges, we present \textit{HOWTransfer} (\emph{H}and-\emph{O}bject \emph{O}pen-\emph{W}orld Transfer), a hand-centric framework that distills human demonstrations into contact-aware, taxonomy-informed, and diverse robotic trajectories. Instead of relying on object-specific descriptions, vision-language queries, or explicit object-state tracking, \emph{HOWTransfer} recovers temporally consistent 3D hand motion and localizes temporal contact intervals by reasoning over observed hand-object interaction cues. The localized contact onsets are then used to retarget human grasp intent into multi-modal parallel-jaw grasp hypotheses, which are propagated along the recovered wrist trajectory to generate robot-executable motions. Finally, a trajectory editing stage refines contact alignment and produces diverse executable variants from a single demonstration. Experiments across diverse manipulation tasks show that \emph{HOWTransfer} enables accurate contact localization and high-quality robot motion retargeting with $86\%$ success, which is preferred over teleoperated trajectories in a blinded preference study.

2026-06-09T11:53:29Z Yitian Shi Di Wen Zhengqi Han Zicheng Guo Yu Hu Edgar Welte Kunyu Peng Rainer Stiefelhagen Rania Rayyes http://arxiv.org/abs/2606.10733v1 Pushing the Performance Limits in Autonomous Racing: Continuous Stability-Aware Adaptive Velocity Planning in Formula Student Driverless 2026-06-09T11:40:30Z

In autonomous racing, especially in competitions such as Formula Student Driverless, precise planning of the target velocity of a race car is crucial for competitive lap times and stable driving behavior. Especially at high speeds, Velocity Planning (VP) is a significant challenge as it has to be performed in real time, taking into account track layouts, environmental influences, mechanical tolerances, and the resulting control inaccuracies. In this paper, we present a novel approach to VP that dynamically adapts to such changing conditions. Instead of estimating the physical Tire-Road Friction Coefficient (TRFC), a continuous scaling factor is inferred indirectly from vehicle stability. This factor not only reflects the effective tire-road interaction but also captures effects of control inaccuracies. From this, we generate a continuous friction map, which serves as a robust, adaptive basis for computing the optimal target speed, accounting for both vehicle and environmental limits. Our proposed approach was evaluated on a real Formula Student race car, showing a lap time improvement of 35 % over ten laps and an average increase of 8 % compared to a non-adaptive approach.
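The role of a friction scaling factor in velocity planning can be illustrated with the usual point-mass cornering limit, where lateral acceleration `v**2 * |kappa|` must stay below the available grip. The sketch below is only that generic relation with a scaling factor applied. The function name, the nominal lateral acceleration, and the speed cap are hypothetical, and the abstract does not state how the authors compute the target speed from their friction map.

```python
import math


def curvature_speed_limit(kappa, scale, a_lat_nominal=10.0, v_cap=30.0):
    """Cornering speed limit for a point mass. Illustrative sketch only.

    kappa:         path curvature in 1/m
    scale:         dimensionless grip scaling factor (1.0 means nominal grip)
    a_lat_nominal: nominal maximum lateral acceleration in m/s^2 (assumed value)
    v_cap:         upper speed bound in m/s, used on straights (assumed value)
    """
    if abs(kappa) < 1e-9:
        return v_cap  # no curvature constraint on a straight
    v = math.sqrt(scale * a_lat_nominal / abs(kappa))
    return min(v, v_cap)


# Same corner (radius 10 m) at nominal grip and at reduced grip.
v_nominal = curvature_speed_limit(0.1, 1.0)
v_reduced = curvature_speed_limit(0.1, 0.81)
```

Because the limit scales with the square root of the factor, a 19% drop in the factor lowers the cornering speed by 10% in this simplified model.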

2026-06-09T11:40:30Z Accepted as a conference paper in IEEE Intelligent Vehicles Symposium (IV) 2026, Detroit, MI, United States Tamara Bergerhoff Sebastian Baader Pascal Meißner Frank Deinzer http://arxiv.org/abs/2606.10732v1 Vehicle Prediction Model for Enhanced MPC Path Tracking in Formula Student Driverless 2026-06-09T11:40:30Z

Autonomous race cars, such as in Formula Student Driverless, operate close to their physical handling limits. The resulting highly nonlinear vehicle behavior increases the path tracking complexity, especially on narrow tracks. Model Predictive Control (MPC) is commonly used to address this issue, a method whose performance is closely tied to the accuracy of the underlying prediction model. This paper presents a novel, real-time capable prediction model for autonomous race cars that adjusts to changing conditions by combining information from past runs and the current driving situation. Our model is divided into three consecutive submodels: a nominal Kinematic Bicycle Model, an offline Bayesian Linear Regression (BLR) model, and an online Sparse Gaussian Process Regression (SGPR) model. The proposed approach enables efficient integration of all available data without significantly increasing computational cost, ensuring high prediction accuracy and a quantitative uncertainty assessment right from the start of the run. Compared to existing approaches, an improvement in prediction accuracy of up to 57% was achieved. Further, we successfully demonstrated the practical applicability of the model within an MPC-based path tracking controller on a real Formula Student race car.
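The nominal stage of such a model stack can be sketched with the standard kinematic bicycle equations, integrated here with a forward Euler step. This is the textbook rear-axle formulation, not the authors' code: the wheelbase value, the function name, and the state layout are assumptions, and the learned BLR and SGPR stages described in the abstract are not reproduced.

```python
import math


def kinematic_bicycle_step(state, steer, accel, dt, wheelbase=1.53):
    """One forward-Euler step of a kinematic bicycle model. Illustrative sketch.

    state:     (x, y, yaw, v) with the reference point at the rear axle
    steer:     front steering angle in radians
    accel:     longitudinal acceleration in m/s^2
    dt:        step size in seconds
    wheelbase: distance between axles in metres (assumed value)
    """
    x, y, yaw, v = state
    x_next = x + v * math.cos(yaw) * dt
    y_next = y + v * math.sin(yaw) * dt
    yaw_next = yaw + v / wheelbase * math.tan(steer) * dt
    v_next = v + accel * dt
    return (x_next, y_next, yaw_next, v_next)


# Straight-line driving at 10 m/s for one 0.1 s step.
next_state = kinematic_bicycle_step((0.0, 0.0, 0.0, 10.0), 0.0, 0.0, 0.1)
```

The kinematic model ignores tire slip, so its error grows near the handling limits. That residual is the kind of mismatch the regression stages are described as correcting.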

2026-06-09T11:40:30Z Accepted as a conference paper in IEEE Intelligent Vehicles Symposium (IV) 2026, Detroit, MI, United States Sebastian Baader Tamara Bergerhoff Pascal Meißner Frank Deinzer http://arxiv.org/abs/2510.14836v3 QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models 2026-06-09T10:47:44Z

Spatial perception and reasoning are crucial for Vision-Language-Action (VLA) models to accomplish fine-grained manipulation tasks. However, existing approaches often lack the ability to understand and reason over the essential 3D structures necessary for precise control. To address this limitation, we propose QDepth-VLA, a general framework that augments VLA models with an auxiliary depth prediction task. A dedicated depth expert is designed to predict quantized latent tokens of depth maps obtained from a VQ-VAE encoder, enabling the model to learn depth-aware representations that capture critical geometric cues. Experimental results on the simulation benchmarks and real-world tasks demonstrate that QDepth-VLA yields strong spatial reasoning and competitive performance on manipulation tasks.
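The "quantized latent tokens" that serve as prediction targets come from the vector-quantization step of a VQ-VAE: each continuous latent vector is replaced by the index of its nearest codebook entry. The sketch below shows only that lookup in plain Python. The toy codebook, the latent values, and the name `quantize` are made up for illustration and say nothing about the codebook size or architecture used in QDepth-VLA.

```python
def quantize(latents, codebook):
    """Return, for each latent vector, the index of its nearest codebook entry.

    latents:  list of latent vectors (lists of floats)
    codebook: list of code vectors with the same dimensionality
    """
    indices = []
    for z in latents:
        # Squared Euclidean distance from z to every code vector.
        dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
        indices.append(dists.index(min(dists)))
    return indices


# Toy example: a two-entry codebook and two latent vectors.
toy_codebook = [[0.0, 0.0], [1.0, 1.0]]
tokens = quantize([[0.1, -0.1], [0.9, 1.2]], toy_codebook)
```

Predicting these discrete indices turns depth supervision into a classification problem over codebook entries rather than per-pixel regression.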

2025-10-16T16:11:18Z Yixuan Li Yuhui Chen Mingcai Zhou Haoran Li Zhengtao Zhang Dongbin Zhao http://arxiv.org/abs/2606.10688v1 Self-Supervised Relevance Modelling in Autonomous Driving via Counterfactual Analysis 2026-06-09T10:47:14Z

Autonomous driving relies on computationally intensive perception pipelines to continuously detect and track objects in the surrounding environment. While some objects are key to plan safe and effective maneuvers, others may not be relevant and have no impact on the autonomous vehicle's driving decisions. Focusing on relevant objects allows a more efficient usage of available computational resources, reduces processing latencies, and limits the downstream propagation of perception noise. In this work, we propose a novel self-supervised approach based on counterfactual analysis to develop a relevance model - an AI-based tool that quantifies the relevance of objects for an autonomous vehicle. To demonstrate the potential of the proposed approach, we train a relevance model on a synthetic causal dataset generated in a selected urban scenario. Results show that the relevance model is able to accurately estimate the objects' relevance with millisecond-level latency, enabling real-time relevance estimation also in high-density scenarios. We also show that the relevance model can be used to build relevance heatmaps that offer valuable insights into the autonomous vehicle's driving policy and can be used to proactively inform perception and planning tasks. We openly release both the relevance model and the causal dataset.

2026-06-09T10:47:14Z Luca Lusvarghi Javier Gozalvez Pablo Urbano Hidalgo http://arxiv.org/abs/2407.05886v3 Rod models in continuum and soft robot control: a review 2026-06-09T09:56:08Z

Continuum and soft robots can transform automation tasks requiring compliant interaction in constrained or unstructured environments, including healthcare, agriculture, marine, and space applications. However, their complex mechanics introduce significant challenges in modeling and control. Low-dimensional continuum mechanical models, such as rod theories, effectively capture the large deformations of slender bodies in contact-rich scenarios while balancing accuracy and computational efficiency. This paper presents a vertical survey of rod models for continuum and soft robots, spanning their mathematical foundations, robot modeling, and control applications. We review the main rod theories adopted in soft robotics and introduce a deformation-based classification of rod models for continuum and soft robots. Furthermore, we survey recent model-based and learning-based control strategies leveraging rod models, highlighting their role in manipulation and physical interaction tasks. Finally, we discuss advantages, limitations, research gaps, and emerging directions of rod-based approaches. This paper aims to serve as a reference for developing models and control strategies for continuum and soft robots.

2024-07-08T12:46:19Z Carlo Alessi Camilla Agabiti Daniele Caradonna Cecilia Laschi Federico Renda Egidio Falotico http://arxiv.org/abs/2606.04746v2 CADENCE: Predicting Realized MAPF Execution Time Beyond Sum of Costs 2026-06-09T09:33:04Z

Multi-Agent Path Finding (MAPF) algorithms are increasingly used to plan motion for robot teams in industrial warehouses and robotic shared workspaces, but standard MAPF algorithm evaluation metrics, such as Sum of Costs (SoC), makespan, and planner runtime, can obscure how planner choices translate into realistic execution performance. We present CADENCE (Coordination and Action-Driven Estimation for Networked Continuous Execution), a hardware study of this evaluation gap on a fixed 7 by 7 workcell with seven differential drive robots, asking which features available before execution can best predict final wall-clock completion time. We compare SoC, total planned travel cost, primitive motion burden (how much basic motion the plan requires, such as makespan, turns, consecutive moves, and start-stop transitions), and interaction-aware coordination structure (how much inter-robot coordination the plan induces, such as dependency links, interacting robot pairs, dependency depth, and crowding exposure). To test this, we generate 120 plans across 15 scenarios -- 5 Empty, 5 Medium Random, and 5 Bottleneck -- and execute each plan four times, yielding a 480-trial hardware corpus. Using both a scenario-held-out ridge model and a trial-level mixed-effects model, we find that SoC alone is informative but incomplete, while primitive motion burden gives the strongest improvement, reducing held-out error by about 48.6%-59.8% in MAE and 44.2%-61.4% in RMSE relative to SoC-only models. Interaction-aware coordination features add smaller, less uniform gains, most clearly in the mixed-effects analysis. Across both models and uncertainty checks, primitive motion burden is the most reliable additional signal beyond SoC, suggesting that much of the execution time gap is already visible in the offline plan before any robot starts moving.
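The reported percentages are relative error reductions against an SoC-only baseline. The sketch below shows how such a figure is computed from held-out predictions. The completion times and predictions are invented purely to exercise the functions and are not data from the study, and the helper names are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error improves on baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Made-up wall-clock times (seconds) and two sets of made-up predictions.
actual = [10.0, 20.0, 30.0]
soc_only = [14.0, 16.0, 34.0]       # hypothetical baseline model output
with_features = [12.0, 18.0, 32.0]  # hypothetical richer model output

mae_gain = relative_reduction(mae(actual, soc_only), mae(actual, with_features))
rmse_gain = relative_reduction(rmse(actual, soc_only), rmse(actual, with_features))
```

In a scenario-held-out evaluation, `actual` and the predictions would come only from scenarios excluded from model fitting, so the reduction measures generalization to unseen layouts rather than fit to the training plans.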

2026-06-03T11:28:09Z 7 pages, 4 figures, 3 tables. Accepted at Multi-Agent Robotic Systems: Real-World Collaboration and Interaction, a workshop at the International Conference on Robotics and Automation (ICRA 2026) Abhishek S Badrikanath Praharaj Sreeram MV http://arxiv.org/abs/2505.01458v2 A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI 2026-06-09T09:31:25Z

Navigation and manipulation are core capabilities in Embodied AI, but training agents to perform them directly in the real world is costly, time-consuming, and unsafe. Therefore, sim-to-real transfer has emerged as a key approach, yet the sim-to-real gap persists. This survey examines how physics simulators address this gap by analyzing properties that have received limited attention in prior surveys. We also analyze their features for navigation and manipulation tasks, as well as their hardware requirements. Additionally, we offer a resource with benchmark datasets, metrics, simulation platforms, and methods to help researchers select suitable tools while accounting for hardware constraints.

2025-05-01T09:22:23Z Under Review Lik Hang Kenny Wong Xueyang Kang Kaixin Bai Jianwei Zhang