https://arxiv.org/api/7CfUGvUzaOU17RlY911EYERpyvI 2026-06-22T12:29:44Z 54510 315 15 http://arxiv.org/abs/2606.16413v1 An Augmented Reality Brain-Robot Interface for Generalist Robot Arm Manipulation 2026-06-15T08:50:43Z

The integration of augmented reality (AR) and EEG-based brain-computer interfaces (BCIs) offers a promising path for enabling intuitive control of robots for assistive purposes. However, existing AR brain-robot interface (BRI) systems are often constrained to task-specific structures, limiting their utility in real-world environments. We present an AR BRI designed for generalist robot arm manipulation that combines gaze-based object selection with motor imagery action control. Our system uses eye-tracking for intuitive object targeting and context-aware visual overlays ("Place" and "Use") to guide the user through tasks within a shared autonomy framework. We evaluated the interface through a feasibility study with 18 healthy participants performing three multi-step activities of daily living: drinking, using a drawer, and operating an oven. Our results demonstrate that this interaction paradigm enables effective sequential task execution and high user engagement, achieving a "Good" usability rating (SUS > 70). These findings support the feasibility of the proposed interaction paradigm for complex BCI-driven robotic assistance, and motivate future evaluation with the intended target population. Project website: https://ar-bri-manip.github.io/.
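The usability result above is reported as a System Usability Scale (SUS) score. SUS is a standard 10-item questionnaire answered on a 1 to 5 scale, and its scoring rule is fixed: odd (positively worded) items contribute `r - 1`, even (negatively worded) items contribute `5 - r`, and the 0 to 40 sum is multiplied by 2.5. The sketch below implements that standard rule. The example responses are hypothetical, not study data. A study-level figure is commonly reported as the mean of per-participant scores, although the abstract does not state the aggregation used.

```python
def sus_score(responses):
    """Standard SUS score (0-100) from ten Likert responses on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("responses must be on a 1-5 scale")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5

# Hypothetical questionnaire from a single participant.
example = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
print(sus_score(example))  # 75.0
```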

2026-06-15T08:50:43Z Accepted at the 2026 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) Shangkai Zhang Rousslan Fernand Julien Dossa Luca Nunziante Marina Di Vincenzo Kai Arulkumaran http://arxiv.org/abs/2606.16400v1 SemGeoNav: A Safety-Guided Visual Navigation Approach with Semantic Reasoning and Geometric Planning 2026-06-15T08:38:46Z

Learning-based visual navigation has enhanced semantic goal-reaching capabilities. However, due to their black-box nature, purely end-to-end models often lack explicit geometric constraints, leading to unpredictable and unreliable obstacle avoidance in open environments. Conversely, traditional geometric planners ensure safety but struggle with high-dimensional visual targets. To address these limitations, we propose SemGeoNav, a novel hierarchical visual navigation framework. It tightly integrates the high-level semantic reasoning of end-to-end models with the reliable local planning ability of geometry-based methods, achieving robust image-based navigation while significantly improving obstacle avoidance. Furthermore, we introduce a temporal trajectory smoothing mechanism to ensure continuous and stable robot motion. We evaluated SemGeoNav on a Unitree Go2 quadruped robot in real-world environments. The results demonstrate that SemGeoNav outperforms existing representative methods, including ViNT and NoMaD, achieving higher success rates and shorter navigation times.

2026-06-15T08:38:46Z The paper has been accepted by ICGNC 2026 Yu Liu Zongyang Chen Yan Guo Chao Liu Xianfei Pan http://arxiv.org/abs/2606.19383v1 3D Scene Graphs: Open Challenges and Future Directions 2026-06-15T08:14:08Z

3D Scene Graphs (3DSGs) have emerged as a powerful representation for spatial AI by combining geometric grounding with semantic and relational abstractions of the environment. Their expressiveness has made them relevant to a broad range of problems in robotics and computer vision, including manipulation, navigation, task planning, scene understanding, and many others. However, the field remains fragmented: different communities adopt distinct formulations, construction pipelines, and evaluation protocols, making it difficult to compare methods, identify common assumptions, and assess remaining challenges for robust real-world deployment. This survey provides a unified and critical review of 3DSGs, with particular emphasis on open challenges and future directions. We first formalize 3DSGs under a common definition and analyze the principal modeling choices that characterize existing formulations, including node and edge attributes, hierarchical structure, dynamic scene representations, and affordance-aware extensions. We then review how 3DSGs are built from raw sensory observations, discussing the most common terminologies, conventions, and techniques. Finally, we examine downstream applications and evaluation strategies, from intrinsic graph quality to task-level performance. To support the community, we also provide a dedicated website that organizes and extends the surveyed content, accessible at https://3dscenegraphs.com/.
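As a concrete picture of the ingredients named above (geometrically grounded nodes, semantic attributes, relational edges, and a hierarchy), here is one minimal in-memory layout. It is a sketch under assumptions, not the survey's formal definition: the layer names (`room`, `object`), the relation names (`contains`, `on_top_of`), and the attribute keys are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    layer: str        # hierarchy level, e.g. "room" or "object" (hypothetical names)
    position: tuple   # geometric grounding: (x, y, z) in metres
    attributes: dict = field(default_factory=dict)  # semantic label, affordances, ...

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)   # (source_id, relation, target_id)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, src, relation, dst):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((src, relation, dst))

    def children(self, node_id, relation="contains"):
        """Targets of outgoing edges with the given relation (one level down)."""
        return [d for s, r, d in self.edges if s == node_id and r == relation]

g = SceneGraph()
g.add_node(Node("kitchen", "room", (2.0, 1.0, 0.0)))
g.add_node(Node("mug_1", "object", (2.3, 1.1, 0.9),
                {"label": "mug", "affordances": ["grasp"]}))
g.add_node(Node("table_1", "object", (2.2, 1.0, 0.0), {"label": "table"}))
g.add_edge("kitchen", "contains", "mug_1")
g.add_edge("kitchen", "contains", "table_1")
g.add_edge("mug_1", "on_top_of", "table_1")
print(g.children("kitchen"))  # ['mug_1', 'table_1']
```

Keeping hierarchy links (`contains`) and spatial relations (`on_top_of`) in one typed edge list is only one design choice; many formulations store inter-layer and intra-layer edges separately.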

2026-06-15T08:14:08Z Invited article for the Annual Review of Control, Robotics, and Autonomous Systems Volume 10 Dennis Rotondi Francesco Argenziano Sebastian Koch Nathan Hughes Martin Buechner Johanna Wald Lukas Rosenberger Schmid Daniele Nardi Abhinav Valada Liam Paull Federico Tombari Luca Carlone Kai O. Arras http://arxiv.org/abs/2606.16370v1 ART-Glove: Articulated Tactile Glove for Contact-Grounded Dexterous Interaction Capture 2026-06-15T08:07:34Z

We present ART-Glove, an articulated tactile glove designed to capture contact-grounded dexterous demonstrations while preserving human dexterity. ART-Glove makes hand-side contact geometry explicit with 16 rigid functional surfaces covering the fingers, thumb, and palm. Twenty-two anatomically aligned joints connect these surfaces and allow them to follow human hand motion during dexterous manipulation. Encoder-based sensing tracks surface motion, while dense piezoresistive tactile sensing records contact over the same surfaces. The complete system captures synchronized 22-DoF joint measurements and 2048-taxel tactile measurements at 120 Hz. We evaluate ART-Glove across experiments on motion freedom, joint sensing, tactile sensing, and contact-rich interaction capture, demonstrating its ability to preserve human dexterity while recording contact-grounded information that can support downstream dexterous robot learning.
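The channel counts and rate stated above fix the size of the raw stream: (22 + 2048) × 120 = 248,400 values per second. The small calculator below reproduces that arithmetic. The 16-bit sample width used for the byte estimate is an assumption, since the abstract does not specify the encoding.

```python
JOINT_CHANNELS = 22    # encoder-based joint measurements (from the abstract)
TAXEL_CHANNELS = 2048  # piezoresistive taxels (from the abstract)
RATE_HZ = 120          # synchronized capture rate (from the abstract)

def values_per_second(joints=JOINT_CHANNELS, taxels=TAXEL_CHANNELS, rate_hz=RATE_HZ):
    """Scalar measurements produced per second across all channels."""
    return (joints + taxels) * rate_hz

def bytes_per_second(bytes_per_value=2):
    # bytes_per_value=2 assumes 16-bit samples; the real encoding is not stated.
    return values_per_second() * bytes_per_value

print(values_per_second())  # 248400
print(bytes_per_second())   # 496800
```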

2026-06-15T08:07:34Z Changyi Lin Ding Zhao http://arxiv.org/abs/2602.05608v3 HiCrowd: Hierarchical Crowd Flow Alignment for Dense Human Environments 2026-06-15T07:24:13Z

Navigating through dense human crowds remains a significant challenge for mobile robots. A key issue is the freezing robot problem, where the robot struggles to find safe motions and becomes stuck within the crowd. To address this, we propose HiCrowd, a hierarchical framework that integrates reinforcement learning (RL) with model predictive control (MPC). HiCrowd leverages surrounding pedestrian motion as guidance, enabling the robot to align with compatible crowd flows. A high-level RL policy generates a follow point to align the robot with a suitable pedestrian group, while a low-level MPC safely tracks this guidance with short-horizon planning. The method combines long-term, crowd-aware decision-making with safe short-term execution. We evaluate HiCrowd against reactive and learning-based baselines in an offline setting (replaying recorded human trajectories) and an online setting (human trajectories are updated in simulation to react to the robot). Experiments on a real-world dataset and a synthetic crowd dataset show that our method outperforms the baselines in navigation efficiency and safety while reducing freezing behaviors. We further validate HiCrowd through real-world deployment in a public museum and at Expo 2025 Osaka, where it navigates dense pedestrian flows without retraining, demonstrating robust and socially aware behavior. Our results suggest that leveraging human motion as guidance, rather than treating humans solely as dynamic obstacles, provides a powerful principle for safe and efficient robot navigation in crowds. Project code and demos are available at https://github.com/test-bai-cpu/HiCrowd.
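To make the division of labour in the hierarchy concrete, the sketch below shows the interface between the two levels: the high level outputs a follow point tied to a pedestrian group, and the low level moves toward it. Both levels are swapped-in stand-ins. HiCrowd learns the high level with RL and tracks with MPC, whereas here the follow point comes from a hand-written heuristic (pick the group whose flow best aligns with the goal direction, then trail it by `lead_dist`), and tracking is a saturated straight-line step. The functions, parameters, and scene are hypothetical.

```python
import math

def unit(v):
    """Normalise a 2D vector; returns (0, 0) for a near-zero vector."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n) if n > 1e-9 else (0.0, 0.0)

def select_follow_point(robot_pos, goal_pos, groups, lead_dist=1.0):
    """Heuristic stand-in for the learned high-level policy.

    groups: list of (centroid, mean_velocity) pairs for nearby pedestrian groups.
    Picks the group whose flow direction has the highest positive cosine
    similarity with the robot-to-goal direction and returns a point lead_dist
    metres behind its centroid. Falls back to the goal if no group moves toward it.
    """
    to_goal = unit((goal_pos[0] - robot_pos[0], goal_pos[1] - robot_pos[1]))
    best, best_score = None, 0.0
    for centroid, velocity in groups:
        d = unit(velocity)
        score = d[0] * to_goal[0] + d[1] * to_goal[1]  # cosine similarity
        if score > best_score:
            best, best_score = (centroid, d), score
    if best is None:
        return goal_pos
    (cx, cy), (dx, dy) = best
    return (cx - lead_dist * dx, cy - lead_dist * dy)

def track_step(robot_pos, target, max_step=0.1):
    """Move at most max_step toward the target (placeholder for the short-horizon
    MPC, which would also enforce collision and dynamics constraints)."""
    dx, dy = target[0] - robot_pos[0], target[1] - robot_pos[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return target
    return (robot_pos[0] + max_step * dx / dist, robot_pos[1] + max_step * dy / dist)

# Hypothetical scene: one group walks toward the goal, one walks away from it.
groups = [((3.0, 1.0), (1.0, 0.0)), ((2.0, -1.0), (-1.0, 0.0))]
follow = select_follow_point((0.0, 0.0), (10.0, 0.0), groups)
print(follow)  # (2.0, 1.0)
print(track_step((0.0, 0.0), follow))
```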

2026-02-05T12:46:37Z 2026 IEEE International Conference on Robotics and Automation (ICRA) Yufei Zhu Shih-Min Yang Martin Magnusson Allan Wang http://arxiv.org/abs/2606.16313v1 Is Your Trajectory Displacement Safe in Long-tail? 2026-06-15T07:19:38Z

Long-tail scenarios remain a major bottleneck for autonomous driving evaluation, even as datasets grow by orders of magnitude. Existing evaluation pipelines are rarely human-aligned, safety-aware, verifiable, and explainable at the same time: closed-loop metrics often saturate among strong planners, while unstructured human ratings can be noisy without a carefully designed protocol. We formulate planning evaluation as additional-threat detection: given a planner trajectory and an expert reference, does the planner's displacement introduce new unsafe driving behavior? We propose FluidTest, an evaluation pipeline with three components: a pairwise WebUI protocol for reliable human annotation; a taxonomy of 32 semantic threats with evidence-grounded decision graphs; and a three-agent verification system with reflection for precision and auditability. Experiments on the WOD-E2E dataset show that FluidTest produces consistent labels among trained annotators and identifies additional threats in 65% of Poutine trajectories and 51% of RAP trajectories. These results show that state-of-the-art planners can still exhibit substantial safety-relevant failures despite high Rater Feedback Scores (RFS) and low Average Displacement Error (ADE). Additional details, guidance, and code are available at https://fluidtest.web.app.
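Average Displacement Error, one of the metrics the abstract contrasts with its threat-based evaluation, is the mean Euclidean distance between time-aligned planned and reference positions. The sketch below computes it on hypothetical trajectories. Because ADE only measures geometric closeness, it cannot tell whether a small displacement moves the vehicle into an unsafe region, which is the gap FluidTest's additional-threat formulation targets.

```python
import math

def average_displacement_error(pred, ref):
    """Mean Euclidean distance between time-aligned predicted and reference points."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("trajectories must be non-empty and the same length")
    return sum(math.dist(p, r) for p, r in zip(pred, ref)) / len(pred)

ref = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]  # 1 m lateral offset at the last step
print(average_displacement_error(pred, ref))  # 0.3333333333333333
```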

2026-06-15T07:19:38Z 20 pages, 15 figures Qiao Sun Weicheng Zheng Yixin Huang Hang Zhao http://arxiv.org/abs/2606.16286v1 FlowMPC: Improving Flow Matching policies with World Models 2026-06-15T06:50:11Z

Flow Matching (FM) is a powerful approach for behavior cloning in multimodal action spaces [Jiang et al., 2025], but because it is not trained to directly maximize expected return, there is still room to improve how FM policies act at test time. This work investigates whether a learned world model can improve FM policies by enabling Model Predictive Path Integral (MPPI) planning over candidate action sequences proposed by the policy. Building on TD-MPC2 [Hansen et al., 2024], I introduce FlowMPC, a framework that combines an imitation-learned FM policy with a learned world model for test-time planning in ManiSkill manipulation tasks [Tao et al., 2025]. Across PickCube and PickSingleYCB, adding the world model improved performance over the FM policy alone, with especially clear gains in end-of-episode success. These results suggest that world-model-based planning can effectively complement flow-based imitation policies without modifying the FM training objective.
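The planning step described above, scoring policy-proposed action sequences in a world model and combining them MPPI-style, can be sketched as a single reweighting pass. This is a simplified illustration under stated assumptions, not FlowMPC itself. A Gaussian sampler stands in for the FM policy, a 1D additive model stands in for the learned world model, and there is no terminal value estimate or iterative refit as in TD-MPC2. All function names and the toy task are hypothetical.

```python
import math
import random

def mppi_first_action(state, propose, dynamics, reward, num_samples=64,
                      horizon=5, temperature=1.0, rng=None):
    """One MPPI-style reweighting step over proposed action sequences.

    propose(state, horizon, rng) -> list of scalar actions (stand-in for the FM policy)
    dynamics(state, action) -> next state                  (stand-in for the world model)
    reward(state, action) -> float
    Returns the return-weighted average of the sampled sequences' first actions.
    """
    rng = rng if rng is not None else random.Random(0)
    first_actions, returns = [], []
    for _ in range(num_samples):
        seq = propose(state, horizon, rng)
        s, total = state, 0.0
        for a in seq:  # roll the candidate sequence out in the model
            total += reward(s, a)
            s = dynamics(s, a)
        first_actions.append(seq[0])
        returns.append(total)
    best = max(returns)
    # Softmax over returns; subtracting the max keeps exp() from overflowing.
    weights = [math.exp((r - best) / temperature) for r in returns]
    return sum(w * a for w, a in zip(weights, first_actions)) / sum(weights)

# Toy 1D task (hypothetical): drive the state from 0 toward 1.
def gaussian_proposal(state, horizon, rng):
    return [rng.gauss(0.0, 0.5) for _ in range(horizon)]

def toy_dynamics(s, a):
    return s + a

def toy_reward(s, a):
    return -(toy_dynamics(s, a) - 1.0) ** 2

action = mppi_first_action(0.0, gaussian_proposal, toy_dynamics, toy_reward,
                           temperature=0.1)
print(action)
```

Lower `temperature` concentrates the weights on the best-scoring proposals; higher values average more broadly over what the policy proposed.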

2026-06-15T06:50:11Z Chandon Hamel http://arxiv.org/abs/2602.13197v2 Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos 2026-06-15T06:47:24Z

The ability to learn manipulation skills by watching videos of humans has the potential to unlock a new source of highly scalable data for robot learning. Here, we tackle prehensile manipulation, in which tasks involve grasping an object before performing various post-grasp motions. Human videos offer strong signals for learning the post-grasp motions, but they are less useful for learning the prerequisite grasping behaviors, especially for robots without human-like hands. A promising way forward is to use a modular policy design, leveraging a dedicated grasp generator to produce stable grasps. However, arbitrary stable grasps are often not task-compatible, hindering the robot's ability to perform the desired downstream motion. To address this challenge, we present Perceive-Simulate-Imitate (PSI), a framework for training a modular manipulation policy using human video motion data processed by paired grasp-trajectory filtering in simulation. This simulation step extends the trajectory data with grasp suitability labels, which allows for supervised learning of task-oriented grasping capabilities. We show through real-world experiments that our framework can be used to learn precise manipulation skills efficiently without any robot data, resulting in significantly more robust performance than using a grasp generator naively.

2026-02-13T18:59:10Z Transactions on Machine Learning Research (TMLR) Albert J. Zhai Kuo-Hao Zeng Jiasen Lu Ali Farhadi Shenlong Wang Wei-Chiu Ma http://arxiv.org/abs/2601.19612v3 Safe Exploration via Policy Priors 2026-06-15T06:27:26Z

Safe exploration is a key requirement for reinforcement learning (RL) agents to learn and adapt online beyond controlled (e.g., simulated) environments. In this work, we tackle this challenge by using suboptimal yet conservative policies (e.g., obtained from offline data or simulators) as priors. Our approach, SOOPER, uses probabilistic dynamics models to explore optimistically, yet falls back pessimistically to the conservative policy prior when needed. We prove that SOOPER guarantees safety throughout learning, and establish convergence to an optimal policy by bounding its cumulative regret. Extensive experiments on key safe RL benchmarks and real-world hardware demonstrate that SOOPER is scalable and outperforms the state of the art, and validate our theoretical guarantees in practice.
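The optimistic-exploration, pessimistic-fallback pattern described above can be illustrated with a one-step safety check. This is a hypothetical sketch, not SOOPER's algorithm: an ensemble of toy cost models stands in for the probabilistic dynamics model, pessimism is an upper confidence bound (mean plus `beta` times the ensemble spread), and the check looks at a single step rather than the trajectory-level reasoning behind the paper's guarantee. `budget`, `beta`, and all function names are assumptions.

```python
import statistics

def pessimistic_cost(ensemble, state, action, beta=2.0):
    """Upper confidence bound on predicted safety cost:
    ensemble mean + beta * ensemble standard deviation."""
    preds = [model(state, action) for model in ensemble]
    return statistics.mean(preds) + beta * statistics.pstdev(preds)

def choose_action(ensemble, state, optimistic_action, prior_action, budget, beta=2.0):
    """Take the exploratory action only if its pessimistic cost fits the budget;
    otherwise fall back to the conservative prior policy's action."""
    if pessimistic_cost(ensemble, state, optimistic_action, beta) <= budget:
        return optimistic_action
    return prior_action

def make_model(gain):
    # Toy cost model: cost grows with action magnitude and ignores the state.
    return lambda state, action: gain * abs(action)

ensemble = [make_model(g) for g in (0.9, 1.0, 1.1)]
print(choose_action(ensemble, 0.0, 0.5, 0.0, budget=1.0))  # 0.5 (explore)
print(choose_action(ensemble, 0.0, 2.0, 0.0, budget=1.0))  # 0.0 (fall back)
```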

2026-01-27T13:45:28Z Manuel Wendl Yarden As Manish Prajapat Anton Pollak Stelian Coros Andreas Krause http://arxiv.org/abs/2606.16272v1 TopoRetarget: Interaction-Preserving Retargeting for Dexterous Manipulation 2026-06-15T06:20:46Z

Human hand-object demonstrations provide dense reference motions for training dexterous manipulation reinforcement learning (RL) policies through reference tracking. However, to use such demonstrations for RL policy learning, retargeting must preserve hand pose and task-relevant hand-object contact structure. Otherwise, contact and feasibility artifacts can degrade downstream RL policy performance. We introduce TopoRetarget, an interaction-preserving retargeting framework that uses a single set of parameters across diverse retargeting conditions while maintaining task-relevant hand-object interaction and adapting human demonstrations to dexterous robot hands. The method constructs a sparse interaction graph over hand and object keypoints and optimizes distance-weighted Laplacian deformation with directional consistency, kinematic constraints, and penetration handling. Evaluations show that the generated references improve both interaction fidelity and policy learning: TopoRetarget achieves the best contact precision and alignment over all baselines on the ContactPose Dataset, improves Pen-Spin training success by 40.6 percentage points over the existing baseline methods, and enables zero-shot transfer to Wuji Hand hardware on cube reorientation and pen spinning.
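The abstract names distance-weighted Laplacian deformation over a sparse hand-object interaction graph. The core quantity in Laplacian deformation is each node's Laplacian coordinate: its offset from a weighted average of its neighbours. Matching these coordinates between the human and robot keypoint sets preserves local relative structure. The sketch below computes them with an assumed inverse-distance weighting (`1 / (d + eps)`) and a plain squared-mismatch energy. TopoRetarget's directional-consistency, kinematic, and penetration terms are not reproduced, and the keypoints are hypothetical.

```python
import math

def laplacian_coordinates(points, edges, eps=1e-6):
    """delta_i = p_i - sum_j w_ij p_j / sum_j w_ij, with w_ij = 1 / (|p_i - p_j| + eps).
    Closer neighbours receive larger weights (assumed weighting scheme)."""
    nbrs = {i: [] for i in range(len(points))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    deltas = []
    for i, p in enumerate(points):
        if not nbrs[i]:
            deltas.append(tuple(0.0 for _ in p))
            continue
        ws = [1.0 / (math.dist(p, points[j]) + eps) for j in nbrs[i]]
        total = sum(ws)
        avg = [sum(w * points[j][k] for w, j in zip(ws, nbrs[i])) / total
               for k in range(len(p))]
        deltas.append(tuple(p[k] - avg[k] for k in range(len(p))))
    return deltas

def laplacian_energy(source_pts, target_pts, edges):
    """Squared mismatch of Laplacian coordinates; a retargeting optimiser would
    minimise this over the robot-hand configuration producing target_pts."""
    ds = laplacian_coordinates(source_pts, edges)
    dt = laplacian_coordinates(target_pts, edges)
    return sum(sum((a - b) ** 2 for a, b in zip(u, v)) for u, v in zip(ds, dt))

pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
edges = [(0, 1), (0, 2), (1, 2)]
shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in pts]
scaled = [(2.0 * x, 2.0 * y, 2.0 * z) for x, y, z in pts]
print(laplacian_energy(pts, shifted, edges))  # close to 0: translation keeps local structure
print(laplacian_energy(pts, scaled, edges))   # positive: scaling changes it
```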

2026-06-15T06:20:46Z Project page: https://toporetarget2026.github.io/TopoRetarget/ Jielin Wu Shenzhe Yao Guanqi He Xiaohan Liu Zhaoqing Zeng Xiangrui Jiang Han Yang Wentao Zhang Hang Zhao http://arxiv.org/abs/2606.13769v2 $μ_0$: A Scalable 3D Interaction-Trace World Model 2026-06-15T06:13:42Z

World models that capture how actions induce physical change enable scalable robot learning without reliance on embodiment-specific action labels. Pixel-space video models provide broad visual priors but expend model capacity on dense appearance reconstruction, while direct action models require embodiment-specific labels that hinder scalability. We present $μ_0$, a scalable world model based on 3D traces. Rather than predicting dense pixels or directly modeling actions, $μ_0$ forecasts smooth 3D trajectories for salient interaction points such as objects, tools, hands, and contact regions, yielding a compact, embodiment-agnostic motion interface. To enable training from diverse video sources, our TraceExtract system automatically extracts 3D supervision by selecting keypoints, constructing globally aligned traces, and associating motion segments with hierarchical language captions. This TraceExtract supervision pretrains $μ_0$ by combining a pretrained vision-language backbone with a modular trace expert, which represents each query via B-spline control points and predicts future traces. Experiments show that $μ_0$ outperforms baselines in both 2D and 3D trace prediction, including trace prediction models and tokenized VLM methods. Because $μ_0$ is frozen and reusable, it can be paired with action experts for downstream robot embodiments. Despite action-free pretraining, the resulting trace-conditioned policies achieve performance competitive with VLA models pretrained with action supervision, such as $π_0$. These results establish 3D traces as a scalable and transferable representation for cross-embodiment manipulation.

2026-06-11T17:59:56Z Seungjae Lee Yoonkyo Jung Jusuk Lee Jonghun Shin Amir Hossein Shahidzadeh Yao-Chih Lee H. Jin Kim Jia-Bin Huang Furong Huang http://arxiv.org/abs/2508.08706v3 OmniVTLA: Vision-Tactile-Language-Action Models with Semantic-Aligned Tactile Sensing 2026-06-15T06:01:10Z

Recent vision-language-action (VLA) models build upon vision-language foundations and have achieved promising results, showing potential for task generalization in robot manipulation. However, due to the heterogeneity of tactile sensors and the difficulty of acquiring tactile data, current VLA models largely overlook the importance of tactile perception and fail in contact-rich tasks. To address this issue, this paper proposes OmniVTLA, a novel architecture that incorporates tactile sensing. Specifically, our contributions are threefold. First, OmniVTLA features a dual-path tactile encoder framework, which enhances tactile perception across diverse vision-based and force-based tactile sensors by using a pretrained vision transformer (ViT) and a semantically-aligned tactile ViT (SA-ViT). Second, we introduce ObjTac, a comprehensive force-based tactile dataset capturing textual, visual, and tactile information for 56 objects across 10 categories. With 135K tri-modal samples, ObjTac supplements existing visuo-tactile datasets. Third, leveraging this dataset, we train a semantically-aligned tactile encoder to learn a unified tactile representation, which serves as a better initialization for OmniVTLA. Real-world experiments demonstrate substantial improvements over state-of-the-art VLA baselines in pick-and-place tasks, achieving a 96.9% success rate with grippers (21.9% higher than the baseline) and a 100% success rate with dexterous hands (6.2% higher than the baseline). In addition, OmniVTLA significantly reduces task completion time and generates smoother trajectories through tactile sensing compared to existing VLA models. Our ObjTac dataset can be found at https://readerek.github.io/Objtac.github.io

2025-08-12T07:53:36Z Accepted by IEEE Robotics and Automation Letters (RA-L). ObjTac dataset: https://readerek.github.io/Objtac.github.io Zhengxue Cheng Yiqian Zhang Anni Tang Keyu Wang Wenkang Zhang Haoyu Li Hengdi Zhang Li Song http://arxiv.org/abs/2512.13090v2 Multi-Robot Motion Planning from Vision and Language using Heat-Inspired Diffusion 2026-06-15T05:58:16Z

Diffusion models have recently emerged as powerful tools for robot motion planning by capturing the multi-modal distribution of feasible trajectories. However, their extension to multi-robot settings with flexible, language-conditioned task specifications remains limited. Furthermore, current diffusion-based approaches incur high computational cost during inference and struggle with generalization because they require explicit construction of environment representations and lack mechanisms for reasoning about geometric reachability. To address these limitations, we present Language-conditioned Heat-inspired Diffusion (LHD), an end-to-end vision-based framework that generates language-conditioned, collision-free trajectories. LHD integrates semantic priors from CLIP, a vision-language model (VLM), with a collision-avoiding diffusion kernel serving as a physical inductive bias that enables the planner to interpret language commands strictly within the reachable workspace. This naturally handles out-of-distribution (OOD) scenarios -- in terms of reachability -- by guiding robots toward accessible alternatives that match the semantic intent, while eliminating the need for explicit obstacle information at inference time. Extensive evaluations on diverse real-world-inspired maps, along with real-robot experiments, show that LHD consistently outperforms prior diffusion-based planners in success rate, while reducing planning latency. Project page is available at: https://jebeom.github.io/lhd_project_page/

2025-12-15T08:43:13Z 8 pages, 6 figures, accepted by IEEE Robotics and Automation Letters (RA-L) IEEE Robotics and Automation Letters, vol. 11, no. 6, pp. 7118-7125, June 2026 Jebeom Chae Junwoo Chang Seungho Yeom Yujin Kim Jongeun Choi http://arxiv.org/abs/2606.16232v1 PolyMerge: Compressing 3D Gaussian Splats with Polytope Coverings for Provably Safe Resource-Constrained Navigation 2026-06-15T05:30:14Z

Obstacle avoidance is essential for safe navigation and motion planning. Recent radiance field reconstruction methods enable object detection and modeling with high fidelity, but remain too memory- and compute-intensive for on-board perception-based path planning. To address these limitations, we propose PolyMerge to convert a large, photorealistic 3D Gaussian Splatting (3DGS) model of a scene into a lightweight representation of convex polytopes whose union provably over-approximates all obstacles in the original 3DGS model. PolyMerge tunes the polytope count to trade off conservativeness and compute cost, and integrates with control barrier functions (CBFs) to plan collision-free paths. We showcase PolyMerge in simulation and hardware experiments on a Crazyflie drone, which uses PolyMerge to compute and follow safe trajectories in real time under severe onboard compute constraints, outperforming baselines in speed while guaranteeing safety. For our code and videos, visit https://athlon76.github.io/PolyMerge-website/.
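The over-approximation idea above can be illustrated with the simplest convex polytope, an axis-aligned box. A Gaussian has unbounded support, so treating a splat as an obstacle requires a truncation level `k` (an assumption here). The k-sigma ellipsoid has half-extent exactly `k * sqrt(cov[i][i])` along axis i, so that box contains it. Merging two boxes into their bounding box only grows coverage, which means a greedy merge loop trades polytope count against conservativeness while keeping the over-approximation. This is a hypothetical sketch, not PolyMerge's algorithm: it uses boxes rather than general polytopes, a least-added-volume merge heuristic, and no CBF integration.

```python
import math

def gaussian_aabb(mean, cov, k=3.0):
    """Axis-aligned box around the k-sigma ellipsoid
    {x : (x - mu)^T cov^-1 (x - mu) <= k^2}, whose half-extent along
    axis i is exactly k * sqrt(cov[i][i])."""
    half = [k * math.sqrt(cov[i][i]) for i in range(len(mean))]
    return (tuple(m - h for m, h in zip(mean, half)),
            tuple(m + h for m, h in zip(mean, half)))

def merge_boxes(a, b):
    """Smallest axis-aligned box containing boxes a and b."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))

def volume(box):
    return math.prod(hi - lo for lo, hi in zip(*box))

def contains(outer, inner):
    return (all(o <= i for o, i in zip(outer[0], inner[0]))
            and all(i <= o for o, i in zip(outer[1], inner[1])))

def greedy_reduce(boxes, target_count):
    """Merge pairs until target_count boxes remain, each time choosing the pair
    whose merged box adds the least volume; the union never shrinks."""
    boxes = list(boxes)
    while len(boxes) > max(target_count, 1):
        best = None
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                merged = merge_boxes(boxes[i], boxes[j])
                extra = volume(merged) - volume(boxes[i]) - volume(boxes[j])
                if best is None or extra < best[0]:
                    best = (extra, i, j, merged)
        _, i, j, merged = best
        boxes = [b for n, b in enumerate(boxes) if n not in (i, j)] + [merged]
    return boxes

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
means = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (10.0, 0.0, 0.0)]  # hypothetical splats
boxes = [gaussian_aabb(m, identity, k=1.0) for m in means]
print(greedy_reduce(boxes, 2))
```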

2026-06-15T05:30:14Z IEEE Robotics and Automation Letters, vol. 11, no. 7, pp. 8512-8519, July 2026 Jihoon Hong Chih-Yuan Chiu Sara Fridovich-Keil Glen Chou 10.1109/LRA.2026.3692083 http://arxiv.org/abs/2604.03386v2 Activity-Dependent Plasticity in Morphogenetically-Grown Recurrent Networks 2026-06-15T05:29:32Z

Developmental approaches to neural architecture search grow functional networks from compact genomes through self-organisation, but the resulting networks operate with fixed post-growth weights. We characterise Hebbian and anti-Hebbian plasticity across 50,000 morphogenetically grown recurrent controllers (5M+ configurations on CartPole and Acrobot), then test whether co-evolutionary experiments, where plasticity parameters are encoded in the genome and evolved alongside the developmental architecture, recover these patterns independently. Our characterisation reveals that (1) anti-Hebbian plasticity significantly outperforms Hebbian plasticity for competent networks (Cohen's d = 0.53-0.64), (2) regret (the fraction of oracle improvement lost under the best fixed setting) reaches 52-100%, and (3) plasticity's role shifts from fine-tuning to genuine adaptation under non-stationarity. Co-evolution independently discovers these patterns: on CartPole, 70% of runs evolve anti-Hebbian plasticity (p = 0.043); on Acrobot, evolution finds near-zero plasticity rates (eta) with mixed signs, exactly matching the characterisation. A random-RNN control shows that anti-Hebbian dominance is generic to small recurrent networks, but the degree of topology-dependence is developmental-specific: regret is 2-6x higher for morphogenetically grown networks than for random graphs with matched topology statistics.
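The Hebbian versus anti-Hebbian distinction above comes down to the sign of the plasticity rate eta. In the textbook rate-based rule, ΔW_ij = eta · post_i · pre_j: with eta > 0, correlated activity strengthens a connection (Hebbian); with eta < 0, it weakens it (anti-Hebbian). The sketch below implements that generic rule. The exact rule used in the paper is not given in the abstract, and the weight clipping to `[-w_max, w_max]` is an added assumption to keep the recurrent dynamics bounded.

```python
def plastic_update(weights, pre, post, eta, w_max=1.0):
    """One step of dW_ij = eta * post_i * pre_j for a weight matrix W (rows = post,
    columns = pre). eta > 0: Hebbian; eta < 0: anti-Hebbian. Weights are clipped
    to [-w_max, w_max] (an assumption, not from the source)."""
    return [
        [max(-w_max, min(w_max, w + eta * post[i] * pre[j])) for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]

W = [[0.0, 0.2], [0.0, 0.0]]
pre = [1.0, 0.5]    # presynaptic activity
post = [1.0, -1.0]  # postsynaptic activity
print(plastic_update(W, pre, post, eta=0.1))   # Hebbian step
print(plastic_update(W, pre, post, eta=-0.1))  # anti-Hebbian step
```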

2026-04-03T18:35:13Z 8 pages, 6 figures. Camera-ready version; accepted at GECCO 2026 Companion (EvoSelf workshop) Sergii Medvid Andrii Valenia Mykola Glybovets 10.1145/3795101.3814700