https://arxiv.org/api/La1Szm5/4m8FTP3qMeiuLfu7cAY 2026-06-21T10:20:28Z 12695 540 15 http://arxiv.org/abs/2508.11401v5 FACET: Teacher-Centred LLM-Based Multi-Agent Systems-Towards Personalized Educational Worksheets 2026-05-19T18:00:42Z

The increasing heterogeneity of student populations poses significant challenges for teachers, particularly in mathematics education, where cognitive, motivational, and emotional differences strongly influence learning outcomes. While AI-driven personalization tools have emerged, most remain performance-focused, offering limited support for teachers and neglecting broader pedagogical needs. This paper presents the FACET framework, a teacher-facing, large language model (LLM)-based multi-agent system designed to generate individualized classroom materials that integrate both cognitive and motivational dimensions of learner profiles. The framework comprises three specialized agents: (1) learner agents that simulate diverse profiles incorporating topic proficiency and intrinsic motivation, (2) a teacher agent that adapts instructional content according to didactical principles, and (3) an evaluator agent that provides automated quality assurance. We tested the system using authentic grade 8 mathematics curriculum content and evaluated its feasibility through a) automated agent-based assessment of output quality and b) exploratory feedback from K-12 in-service teachers. Results from ten internal evaluations highlighted high stability and alignment between generated materials and learner profiles, and teacher feedback particularly highlighted structure and suitability of tasks. The findings demonstrate the potential of multi-agent LLM architectures to provide scalable, context-aware personalization in heterogeneous classroom settings, and outline directions for extending the framework to richer learner profiles and real-world classroom trials.

2025-08-15T11:10:40Z Jana Gonnermann-Müller Jennifer Haase Konstantin Fackeldey Sebastian Pokutta http://arxiv.org/abs/2605.20312v1 Pramana: A Protocol-Layer Treatment of Claim Verification in Autonomous Agent Networks 2026-05-19T17:00:33Z

Autonomous agents deployed in regulated domains must produce a verification artifact per consequential output: a record an auditor can re-execute offline, capturing what was claimed, against what source, by whom, when, and how. Production verification today splits into two unstandardized halves. Probabilistic verdict patterns (self-consistency voting, reviewer LLM ensembles) produce judgments, not artifacts. Artifact-producing patterns (RAG, tool-augmented traces, generator-verifier loops) produce vendor-specific records no external auditor can reconstruct without bespoke integration. Pramana defines the missing wire format. Every consequential agent output is wrapped in a typed ClaimAttestation with one of four variants (measurement, inference, analogy, citation), each paired with a verify() operation against the recorded source. verify() is deterministic for MeasurementClaim and CitationClaim. For InferenceClaim and AnalogyClaim, determinism is conditional on the oracle (audit-replayable when LLM-backed). The four-way typology derives from classical Indian epistemology (pramana, valid means of knowledge). The lifecycle is specified in TLA+ and exhaustively verified under TLC across three symmetry-reduced models: 38,563 distinct reachable states, zero invariant violations. The Python reference implementation passes 84 tests. An A2A and MCP wire-extension manifest layers three deployment-grade invariants: reachability, SLA bound, and offline re-verifiability. An exploratory pilot (n=100, 2,275 reviewer calls) probes LLM-as-judge in code generation. The strongest observation is a 40-percentage-point raw FPR delta across corpora, consistent with reference-solution quality contributing significantly. The pilot does not validate Pramana on its own; the structural argument and formal verification do that.

2026-05-19T17:00:33Z 23 pages, 4 figures, 5 tables, 42 references Ravi Kiran Kadaboina 10.5281/zenodo.20283646 http://arxiv.org/abs/2601.20529v3 A Practical Framework of Key Performance Indicators for Multi-Robot Lunar and Planetary Field Tests 2026-05-19T15:08:26Z

Robotic prospecting for critical resources on the Moon, such as ilmenite, rare earth elements, and water ice, requires robust exploration methods given the diverse terrain and harsh environmental conditions. Although numerous analog field trials address these goals, comparing their results remains challenging because of differences in robot platforms and experimental setups. These missions typically assess performance using selected, scenario-specific engineering metrics that fail to establish a clear link between field performance and science-driven objectives. In this paper, we address this gap by deriving a structured framework of KPI from three realistic multi-robot lunar scenarios reflecting scientific objectives and operational constraints. Our framework emphasizes scenario-dependent priorities in efficiency, robustness, and precision, and is explicitly designed for practical applicability in field deployments. We validated the framework in a multi-robot field test and found it practical and easy to apply for efficiency- and robustness-related KPI, whereas precision-oriented KPI require reliable ground-truth data that is not always feasible to obtain in outdoor analog environments. Overall, we propose this framework as a common evaluation standard enabling consistent, goal-oriented comparison of multi-robot field trials and supporting systematic development of robotic systems for future planetary exploration.

2026-01-28T12:10:56Z Presented at ICRA 2026 Workshop on Multi-Agent Robotic Systems: Real-World Collaboration and Interaction Julia Richter David Oberacker Gabriela Ligeza Valentin T. Bickel Philip Arm William Talbot Marvin Grosse Besselmann Florian Kehl Tristan Schnell Hendrik Kolvenbach Rüdiger Dillmann Arne Roennau Marco Hutter http://arxiv.org/abs/2605.19954v1 Equilibria in Multiplayer Graph Games: An Algorithmic Study 2026-05-19T15:05:52Z

To verify the robustness of a program or protocol, it is common in the computer science community to rely on the theoretical framework of game theory. In particular, if one seeks to enforce a desired property, or specification, despite an unpredictable environment, a useful abstraction is to model the situation as a two-player zero-sum game. The goal is then to find a strategy for the system that guarantees the specification against any strategy of the environment. However, to model more complex situations, such as multiple systems with different objectives or an environment composed of various agents, the richer framework of multiplayer games must be considered. In this setting, a natural question is to identify equilibria, i.e., strategy profiles that are robust in the sense that no player has an incentive to deviate. The most well-known equilibrium concept is the Nash equilibrium, but several alternatives exist. We study five such notions and, for each of them, we provide complexity results for the constrained existence problem, which consists of deciding whether a given game contains an equilibrium that ensures each player a payoff within a specified interval.

2026-05-19T15:05:52Z Léonard Brice http://arxiv.org/abs/2603.13671v3 Grassroots Bonds as a Foundation for Market Liquidity 2026-05-19T14:57:55Z

Global cryptocurrencies are unbacked and have high transaction cost incurred by global consensus. In contrast, grassroots cryptocurrencies are backed by the goods and services of their issuers -- any person, natural or legal -- and have no transaction cost beyond operating a smartphone. Liquidity in grassroots cryptocurrencies arises from mutual credit via coin exchange among issuers. However, as grassroots coins are redeemable 1-for-1 against any other grassroots coin, the credit-forming exchange must also be 1-for-1, lest prompt redemption after exchange would leave the parties with undue profit or loss. Thus, grassroots coins are incongruent with liquidity through interest-bearing credit. Here we introduce grassroots bonds, which extend grassroots coins with a maturity date, reframing grassroots coins -- cash -- as mature grassroots bonds. Bond redemption generalises coin redemption, allowing the lending of liquid coins in exchange for interest-bearing future-maturity bonds. We show that digital social contracts -- voluntary agreements among persons, specified, fulfilled, and enforced digitally -- can express the full gamut of financial instruments as the voluntary swap of grassroots bonds, including loans, sale of debt, forward contracts, options, and escrow-based instruments, and that classical liquidity ratios are applicable just as well to grassroots bonds. Grassroots bonds may thus allow local digital economies to form and grow without initial capital or external credit, harnessing mutual trust within communities into liquidity. The formal specification presented here was implemented in GLP, a concurrent logic programming language running on Dart for smartphone deployment. The implementation is illustrated by a running multiagent village market scenario in GLP.

2026-03-14T00:44:25Z Ehud Shapiro http://arxiv.org/abs/2605.19915v1 LLM Agents Make Collective Belief Dynamics Programmable: Challenges and Research Directions 2026-05-19T14:39:54Z

Classical models of opinion dynamics assume human participants with bounded rationality and limited coordination. The rise of LLM-based agents introduces a qualitative shift: agents can now participate in online discussions at scale, maintain consistent persuasion strategies, and coordinate systematically. This paper argues that LLM agents make collective belief dynamics programmable, enabling deliberate steering of population-level beliefs. We term this emerging problem programmable collective belief control. Through controlled multi-agent simulations, we provide proof-of-concept evidence that coordinated AI agents can induce measurable belief shifts that stabilize within a few interaction rounds. We identify four structural properties (indistinguishability, persistence, contextuality, and configurability) that make detection and defense fundamentally difficult. Based on these findings, we outline a research agenda spanning theoretical foundations for adversarial belief dynamics, operational methods for system-level detection and intervention, and simulation infrastructure for scalable experimentation. Our goal is not to present a complete solution, but to articulate why this problem demands urgent attention and to provide a conceptual foundation for future work.

2026-05-19T14:39:54Z Xin He Junxi Shen Yuchen Mou David M. Bossens Caishun Chen Ivor W. Tsang Yew Soon Ong http://arxiv.org/abs/2509.15435v3 ORCA: An Agentic Reasoning Framework for Hallucination and Adversarial Robustness in Vision-Language Models 2026-05-19T14:30:01Z

Large Vision-Language Models (LVLMs) exhibit strong multimodal capabilities but remain vulnerable to hallucinations from intrinsic errors and adversarial attacks from external exploitations, limiting their reliability in real-world applications. We present ORCA, an agentic reasoning framework that improves the factual accuracy and adversarial robustness of pretrained LVLMs through inference-time structured inference reasoning with a suite of small vision models (less than 3B parameters). ORCA operates via an Observe-Reason-Critique-Act loop, querying multiple visual tools with evidential questions, validating cross-model inconsistencies, and refining predictions iteratively without access to model internals or retraining. ORCA also stores intermediate reasoning traces, which supports auditable decision-making. Though designed primarily to mitigate object-level hallucinations, ORCA also exhibits emergent adversarial robustness without requiring adversarial training or defense mechanisms. We evaluate ORCA across three settings: (1) clean images on hallucination benchmarks, (2) adversarially perturbed images without defense, and (3) adversarially perturbed images with defense applied. On the POPE hallucination benchmark, ORCA improves standalone LVLMs performance by +3.64% to +40.67% across different subsets. Under adversarial perturbations on POPE, ORCA achieves an average accuracy gain of +20.11% across LVLMs. When combined with defense techniques on adversarially perturbed AMBER images, ORCA further improves standalone LVLM performance, with gains ranging from +1.20% to +48.00% across metrics. These results demonstrate that ORCA offers a promising path toward building more reliable and robust multimodal systems.

2025-09-18T21:17:23Z Accepted at the ACM International Conference on Cloud and Big Data Computing (ICCBDC 2026) Chung-En Johnny Yu Brian Jalaian Nathaniel D. Bastian http://arxiv.org/abs/2605.19887v1 DAG-Based QoS-Aware Dynamic Task Placement for Networked Multi-Stage Control Pipelines 2026-05-19T14:17:02Z

Current Physical AI (PAI) relies heavily on closed-loop visual-servoing pipelines, whose perception and planning stages may become computationally intensive onboard due to complex models embedded on robots. In practice, offloading the perception task to on-site edges statically is inappropriate for latency-sensitive, precise industrial settings over a standardized industrial network. This emphasizes the importance of Control-Communication-Computing (3C) co-design in industrial automation: monolithic local execution saturates AI-accelerated machine and robot hardware, while static edge offloading exposes the control loop to network jitter. Existing adaptive task placement (ATP) controllers can partially address the gap by relocating a single pipeline stage on binary threshold rules, without a multi-stage model and an explicit cost on placement switching. In this Work-in-Progress (WiP) paper, we propose a directed acyclic graph (DAG) based quality-of-service (QoS)-aware dynamic task placement (DTP) framework for sensing-perception-planning-control pipelines in networked robotics. This pipeline is formalized as a DAG with task-level and node-level attributes for compute cost, communication delay, and feasible placement sets; over a small interpretable candidate set (fully local, static offload, hybrid), a window-based cost function combines tail end-to-end latency, deadline violation rate, hardware utilization, and a Hamming-distance switching penalty, and a DTP algorithm with hysteresis and a minimum dwell-time bounds placement chatter. Our WiP paper presents the theoretical framework, a structured qualitative analysis, and a two-phase simulation plus hardware-in-the-loop validation roadmap.

2026-05-19T14:17:02Z 4 pages, 1 figure, 1 algorithm, accepted as a Work-in-Progress (WiP) paper, on the 24th IEEE International Conference on Industrial Informatics (INDIN), 26-29 July, 2026, Melbourne, Australia Thien Tran Jonathan Kua Thuong Hoang Minh Tran Yuemin Ding Jiong Jin http://arxiv.org/abs/2605.27422v1 Speed-Weighted Adaptive Flocking for Sailing Swarms under Dynamic Environmental Forcing 2026-05-19T13:15:52Z

Collective behavior models, such as aggregation and flocking, usually assume self-propelled robots that can directly execute their desired speed and direction of motion without fundamental constraints. However, autonomous sailing robots violate this assumption. Their motion is shaped by wind-dependent propulsion, restricted headings, and spatially varying wind conditions. In particular, maneuverability is coupled to wind speed: in weak wind, sailboats may turn only slowly or not at all, whereas stronger wind enables faster turns. This introduces transient heterogeneity in speed and maneuverability across the flock. We focus on this fast-slow coordination problem in sailing robot flocks. To study this problem, we introduce SailSwarmSwIM, a reduced-order simulator for autonomous sailing robot swarms that captures wind-dependent speed and maneuverability, no-go zones, tacking behavior, and steady or gusty wind fields. To design our novel flocking technique, we start from the Couzin model and introduce a speed-weighted social interaction rule that accounts for each robot's transient motion constraints. A key result is that increasing the social influence of slower robots improves polarization and reduces close encounters. This effect arises from a balance between attraction to fast neighbors, which helps maintain movement, and cohesion around slow neighbors, which prevents the flock from fragmenting. Together, our simulator, SailSwarmSwIM, and the speed-weighted interaction rule provide a modeling framework for studying adaptive collective behavior in robotic fleets whose motion capabilities are continuously shaped by wind.

2026-05-19T13:15:52Z Submitted at 18th International Conference on the Simulation of Adaptive Behavior (SAB 2026) Pranav Kedia Aaron Gan Hannah J. Williams Andreagiovanni Reina Heiko Hamann http://arxiv.org/abs/2605.19748v1 Memory-Augmented Reinforcement Learning Agent for CAD Generation 2026-05-19T12:16:31Z

Automatic generation of computer-aided design (CAD) models is a core technology for enabling intelligence in advanced manufacturing. Existing generation methods based on large language models (LLMs) often fall short when handling complex CAD models characterized by long operation sequences, diverse operation types, and strong geometric constraints, primarily because reasoning chains break and effective error-correction mechanisms are lacking. To address this problem, this paper proposes a memory-augmented reinforcement learning framework for CAD generation agents. The framework encapsulates the underlying geometric kernel into a structured toolchain callable by the agent and builds a closed-loop mechanism of design intent understanding, global planning, execution, and multi-dimensional verification. It also designs a dual-track memory module consisting of a case library and a skill library, and proposes a dynamic utility retrieval algorithm. By introducing reinforcement learning into retrieval and policy optimization, the agent can effectively avoid retrieval traps in which examples are semantically similar but geometrically infeasible, enabling online self-correction and continual evolution without additional large-scale annotated data. Experiments show that the proposed method significantly improves both the success rate and geometric consistency on complex CAD model generation tasks.

2026-05-19T12:16:31Z 26 pages; multilingual submission: English version first, followed by Chinese version Yin Xiaolong Liu Yu Shen Jiahang Lu Xingyu Ni Jingzhe Fan Fengxiao Sang Fan http://arxiv.org/abs/2605.27419v1 APS: Bias-Controlled Adaptive Prototype Simulation for Population-Scale LLM Agents 2026-05-19T08:45:41Z

LLM-agent simulation offers a flexible computational tool for studying population response trajectories that depend on scenario events, memory, demographics, and evolving social context. However, full multi-round simulation scales linearly with both population size and horizon, requiring every agent to query the LLM at every round. We propose Adaptive Prototype Simulation (APS), a framework that reframes scalable LLM-based simulation as a recurrent oracle-allocation problem. APS retains the designated LLM as the online transition oracle while querying adaptive core prototypes, selected singleton-tail agents, and shadow-audit agents. Prototype responses induce local response surfaces for nearby agents, reducing online LLM calls without replacing the underlying transition model. To control approximation bias, shadow-audit residual correction estimates propagation residuals for aggregate correction and future budget allocation, while tail-protected singleton routing directly queries selected isolated, heterogeneous, or high-curvature regions that are vulnerable to smoothing. Theoretically, we treat APS as an estimator for full-scale high-precision individual social simulation and decompose its errors into prototype-coverage error, shadow-audit residual-correction error, local-propagation bias, and temporal context mismatch. Under the reported protocols, APS gives lower reference-aligned distributional discrepancy than scale-oriented and same-budget baselines while reducing online LLM calls, with ablations and compact robustness checks diagnosing the main bias-control mechanisms. In a 10M-agent, multi-round public-opinion simulation, APS achieves a 381.1-fold reduction over full simulation, with reference-aligned final-round JSD of 0.094 against the corresponding full-LLM reference.

2026-05-19T08:45:41Z 32 pages, 5 figures Quan Zheng Yan Gao Shaobin He Haoxiang Guan Yuanhe Tian Jie Feng Ming Wang Shuxin Zheng Zhen Liu http://arxiv.org/abs/2603.23722v2 Dual-Gated Epistemic Time-Dilation: Autonomous Compute Modulation in Asynchronous MARL 2026-05-19T05:50:46Z

While Multi-Agent Reinforcement Learning (MARL) algorithms achieve unprecedented successes across complex continuous domains, their standard deployment strictly adheres to a synchronous operational paradigm. Under this paradigm, agents are universally forced to execute deep neural network inferences at every micro-frame, regardless of immediate necessity. This dense throughput acts as a fundamental barrier to physical deployment on edge-devices where thermal and metabolic budgets are highly constrained. We propose Epistemic Time-Dilation MAPPO (ETD-MAPPO), augmented with a Dual-Gated Epistemic Trigger. Instead of depending on rigid frame-skipping (macro-actions), agents autonomously modulate their execution frequency by interpreting aleatoric uncertainty (via Shannon entropy of their policy) and epistemic uncertainty (via state-value divergence in a Twin-Critic architecture). To format this, we structure the environment as a Semi-Markov Decision Process (SMDP) and build the SMDP-Aligned Asynchronous Gradient Masking Critic to ensure proper credit assignment. Empirical findings demonstrate massive improvements (> 60% relative baseline acquisition leaps) over current temporal models. By assessing LBF, MPE, and the 115-dimensional state space of Google Research Football (GRF), ETD correctly prevented premature policy collapse. Remarkably, this unconstrained approach leads to emergent Temporal Role Specialization, reducing computational overhead by a statistically dominant 73.6% entirely during off-ball execution without deteriorating centralized task dominance.

2026-03-24T21:19:06Z 14 pages, 5 figures. Code available at: https://github.com/xaiqo/edtmappo. Related materials available on Zenodo: 10.5281/zenodo.19206838 Igor Jankowski http://arxiv.org/abs/2605.19351v1 PAVE: A Cognitive Architecture for Legitimate Violation in Generative Agent Societies 2026-05-19T04:40:09Z

Generative agents based on large language models reproduce believable human behavior in cooperative settings, but how they should reason in situations where rule-breaking may be required, such as fire evacuation or authority-supervised emergency, remains poorly characterized. We propose PAVE (Perception, Assessment, Verdict, Emulation), a novel four-module cognitive architecture that addresses this gap end to end: (i) Perception extracts a structured context with explicit authority distance, peer behaviors, and severity-tagged situational cues; (ii) Assessment scores the context along five scalars including an explicit legitimacy judgment that checks necessity, proportionality, and absence of alternatives; (iii) Verdict decides to comply or violate under a hard legitimacy gate, with a per-agent threshold elicited from the persona; (iv) Emulation enacts the verdict and scopes the violation to the rule the trigger justifies. We instantiate PAVE in Voville, a tile-based traffic environment forked from Smallville, and evaluate across three scenarios, four LLM backbones, and a focused ablation. PAVE agents satisfy four properties simultaneously: legitimate violation (only when a trigger justifies it), authority deference (officer instructions override even high legitimacy), bounded scope (violations confined to the targeted rule), and recovery (baseline restored once the trigger ends). PAVE agents make more structured and interpretable decisions than vanilla across all four properties, and human evaluators rate them as more plausible. Ablating the legitimacy gate reproduces vanilla-like failures. We release Voville, the PAVE prompts and code, and the evaluation pipeline.

2026-05-19T04:40:09Z Preprint. 23 pages, 4 figures. Code and environment will be released upon publication Ahmad Yehia Abduallah Mohamed Kun Qian Tianyi Wang Jiseop Byeon Omar Hassanin Christian Claudel http://arxiv.org/abs/2605.19338v1 STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision 2026-05-19T04:20:43Z

Frontier AI models and multi-agent systems have led to significant improvements in mathematical reasoning. However, for problems requiring extended, long-horizon reasoning, existing systems continue to suffer from fundamental reliability issues: hallucination accumulation, memory fragmentation, and imbalanced reasoning-tool trade-offs. In this paper, we introduce STAR-PólyaMath, a multi-agent framework that systematically addresses these challenges through meta-level supervision and structured Reasoner-Verifier interaction. STAR-PólyaMath is structured as an orchestrated state machine with nested challenge-step-replan loops, governed by a reasoning-free Python orchestrator that separates control from inference and bounds error propagation through trace-back and re-planning. Our key innovation is a persistent Meta-Strategist that maintains cross-attempt memory and exercises meta-level control by issuing high-level strategic guidance or mandatory directives, so the system can escape unproductive loops rather than stagnate or over-rely on tools. STAR-PólyaMath achieves state-of-the-art results on all eight top-tier competition benchmarks: AIME 2025-2026, MathArena Apex Shortlist, MathArena Apex 2025, Putnam 2025, IMO 2025, HMMT February 2026, and USAMO 2026. It obtains perfect scores on AIMEs, Putnam, and HMMT, and shows its largest margin on Apex 2025, scoring 93.75% compared with 80.21% by the strongest baseline GPT-5.5. Ablation studies show that the gains arise from the framework's orchestration rather than from model-level diversity since removing key components or substituting in mixed backbones consistently weakens performance. Code is available at https://github.com/Julius-Woo/STAR-PolyaMath.

2026-05-19T04:20:43Z 25 pages, 4 figures. Code: https://github.com/Julius-Woo/STAR-PolyaMath Jiaao Wu Xian Zhang Hanzhang Liu Sophia Zhang Fan Yang Yinpeng Dong http://arxiv.org/abs/2605.19264v1 Swimming with Whales: Analysis of Power Imbalances in Stake-Weighted Governance 2026-05-19T02:25:16Z

Voting methods weighted by stakes are the fundamental governance paradigm in Proof-of-Stake (PoS) blockchains. Such a paradigm is known to be prone to power distortions: a few users possessing large stakes may completely control decision making, even without owning the totality of the stakes. We study this phenomenon through the lens of computational social choice, focusing on the extent of power imbalances in stake-weighted voting when power is quantified using the Penrose-Banzhaf power index. Our work presents both analytical and empirical contributions. Analytically, we demonstrate that while a perfect alignment between power and relative stake ownership is generally unattainable, it can be approximated in expectation under specific conditions. Empirically, using data from a real-world on-chain governance system (Project Catalyst), we provide a more fine-grained understanding of the power imbalances that are likely to occur in current stake-weighted governance systems.

2026-05-19T02:25:16Z Yuzhe Zhang Manvir Schneider Qin Wang Davide Grossi