https://arxiv.org/api/BJ7aAqSbki2bK7VbSqhaWCaopCk 2026-04-05T13:16:01Z 27579 75 15 http://arxiv.org/abs/2603.28553v1 Multimodal Analytics of Cybersecurity Crisis Preparation Exercises: What Predicts Success? 2026-03-30T15:14:56Z Instructional alignment, the match between intended cognition and enacted activity, is central to effective instruction but hard to operationalize at scale. We examine alignment in cybersecurity simulations using multimodal traces from 23 teams (76 students) across five exercise sessions. Study 1 codes objectives and team emails with Bloom's taxonomy and models the completion of key exercise tasks with generalized linear mixed models. Alignment, defined as the discrepancy between required and enacted Bloom levels, predicts success, whereas the Bloom category alone does not predict success once discrepancy is considered. Study 2 compares predictive feature families using grouped cross-validation and l1-regularized logistic regression. Text embeddings and log features outperform Bloom-only models (AUC~0.74 and 0.71 vs. 0.55), and their combination performs best (Test AUC~0.80), with Bloom frequencies adding little. Overall, the work offers a measure of alignment for simulations and shows that multimodal traces best forecast performance, while alignment provides interpretable diagnostic insight. 2026-03-30T15:14:56Z Accepted as full paper to the 27th International Conference on Artificial Intelligence in Education (AIED 2026) Conrad Borchers Valdemar Švábenský Sandesh K. Kafle Kevin K. Tang Jan Vykopal http://arxiv.org/abs/2203.03723v3 Judging the algorithm: Algorithmic accountability on the risk assessment tool for intimate partner violence in the Basque Country 2026-03-30T15:05:08Z This paper discusses an algorithmic tool introduced in the Basque Country (Spain) to assess the risk of intimate partner violence. The algorithm was introduced to address the lack of human experts by automatically calculating the level of violence based on psychometric features such as controlling or violent behaviour. Given that critical literature on risk assessment tools for domestic violence mainly focuses on English-speaking countries, this paper offers an algorithmic accountability analysis in a non-English speaking region. It investigates the algorithmic risks, harms, and limitations associated with the Basque tool. We propose a transdisciplinary approach from a critical statistical and legal perspective. This approach unveils issues and limitations that could lead to unexpected consequences for individuals suffering from partner violence. Moreover, our analysis suggests that the algorithmic tool has a high error rate on severe cases, i.e., cases where the aggressor could murder his partner -- 5 out of 10 high-risk cases are misclassified as low risk -- and that there is a lack of appropriate legal guidelines for judges, the end users of this tool. The paper concludes that this risk assessment tool needs to be urgently evaluated by independent and transdisciplinary experts to better mitigate algorithmic harms in the context of intimate partner violence. 2022-03-01T09:38:27Z 17 pages, 5 figures, 1 table Ana Valdivia Cari Hyde-Vaamonde Julián García-Marcos http://arxiv.org/abs/2603.28434v1 Democratizing Federated Learning with Blockchain and Multi-Task Peer Prediction 2026-03-30T13:42:56Z The synergy between Federated Learning and blockchain has been considered promising; however, the computationally intensive nature of contribution measurement conflicts with the strict computation and storage limits of blockchain systems. We propose a novel concept to decentralize the AI training process using blockchain technology and Multi-task Peer Prediction. By leveraging smart contracts and cryptocurrencies to incentivize contributions to the training process, we aim to harness the mutual benefits of AI and blockchain. We discuss the advantages and limitations of our design. 2026-03-30T13:42:56Z Published at the IEEE Conference on Artificial Intelligence 2024 in Singapore (Blockchain Workshop) Leon Witt Kentaroh Toyoda Wojciech Samek Dan Li http://arxiv.org/abs/2603.28428v1 Synergy: A Next-Generation General-Purpose Agent for Open Agentic Web 2026-03-30T13:35:37Z AI agents are rapidly expanding in both capability and population: they now write code, operate computers across platforms, manage cloud infrastructure, and make purchasing decisions, while open-source frameworks such as OpenClaw are putting personal agents in the hands of millions and embodied agents are spreading across smartphones, vehicles, and robots. As the internet prepares to host billions of such entities, it is shifting toward what we call Open Agentic Web, a decentralized digital ecosystem in which agents from different users, organizations, and runtimes can discover one another, negotiate task boundaries, and delegate work across open technical and social surfaces at scale. Yet most of today's agents remain isolated tools or closed-ecosystem orchestrators rather than socially integrated participants in open networks. We argue that the next generation of agents must become Agentic Citizens, defined by three requirements: Agentic-Web-Native Collaboration, participation in open collaboration networks rather than only closed internal orchestration; Agent Identity and Personhood, continuity as a social entity rather than a resettable function call; and Lifelong Evolution, improvement across task performance, communication, and collaboration over time. We present Synergy, a general-purpose agent architecture and runtime harness for persistent, collaborative, and evolving agents on Open Agentic Web, grounding collaboration in session-native orchestration, repository-backed workspaces, and social communication; identity in typed memory, notes, agenda, skills, and persistent social relationships; and evolution in an experience-centered learning mechanism that proactively recalls rewarded trajectories at inference time. 2026-03-30T13:35:37Z A tech report of a general-purpose agent architecture and human-agent society, 21 pages, 5 figures Xiaohang Nie Zihan Guo Kezhuo Yang Zhichong Zheng Bochen Ge Shuai Pan Zeyi Chen Youling Xiang Yu Zhang Weiwen Liu Yuanjian Zhou Weinan Zhang http://arxiv.org/abs/2601.11065v2 Fairness in Healthcare Processes: A Quantitative Analysis of Decision Making in Triage 2026-03-30T12:49:58Z Fairness in automated decision-making has become a critical concern, particularly in high-pressure healthcare scenarios such as emergency triage, where fast and equitable decisions are essential. Process mining is increasingly investigating fairness. There is a growing area focusing on fairness-aware algorithms. So far, we know less how these concepts perform on empirical healthcare data or how they cover aspects of justice theory. This study addresses this research problem and proposes a process mining approach to assess fairness in triage by linking real-life event logs with conceptual dimensions of justice. Using the MIMICEL event log (as derived from MIMIC-IV ED), we analyze time, re-do, deviation and decision as process outcomes, and evaluate the influence of age, gender, race, language and insurance using the Kruskal-Wallis, Chi-square and effect size measurements. These outcomes are mapped to justice dimensions to support the development of a conceptual framework. The results demonstrate which aspects of potential unfairness in high-acuity and sub-acute surface. In this way, this study contributes empirical insights that support further research in responsible, fairness-aware process mining in healthcare. 2026-01-16T08:02:33Z conference Rachmadita Andreswari Stephan A. Fahrenkrog-Petersen Jan Mendling http://arxiv.org/abs/2603.28374v1 Using Games to Learn How Large Language Models Work 2026-03-30T12:40:52Z While artificial intelligence (AI) technology is becoming increasingly popular, its underlying mechanisms tend to remain opaque to most people. To address this gap, the field of AI literacy aims to develop various resources to teach people how AI systems function. Here we contribute to this line of work by proposing two games that demonstrate principles behind how large language models (LLMs) work and use data. The first game, Learn Like an LLM, aims to convey that LLMs are trained to predict sequences of text based on a particular dataset. The second game, Tag-Team Text Generation, focuses on teaching that LLMs generate text one word at a time, using both predicted probabilities of the data and randomness. While the games proposed are still in early stages and would benefit greatly from further discussion, we hope they can contribute to using game-based learning to teach about complex AI systems like LLMs. 2026-03-30T12:40:52Z CHI Conference on Human Factors in Computing Systems Workshop on Data Literacy Allison Chen Isabella Pu http://arxiv.org/abs/2603.28371v1 Coherent Without Grounding, Grounded Without Success: Observability and Epistemic Failure 2026-03-30T12:38:42Z When an agent can articulate why something works, we typically take this as evidence of genuine understanding. This presupposes that effective action and correct explanation covary, and that coherent explanation reliably signals both. I argue that this assumption fails for contemporary Large Language Models (LLMs). I introduce what I call the Bidirectional Coherence Paradox: competence and grounding not only dissociate but invert across epistemic conditions. In low-observability domains, LLMs often act successfully while misidentifying the mechanisms that produce their success. In high-observability domains, they frequently generate explanations that accurately track observable causal structure yet fail to translate those diagnoses into effective intervention. In both cases, explanatory coherence remains intact, obscuring the underlying dissociation. Drawing on experiments in compiler optimization and hyperparameter tuning, I develop the Epistemic Triangle, a model of how priors, signals, and domain knowledge interact under varying observability. The results suggest that neither behavioral success nor explanatory accuracy alone suffices for attributing understanding. I argue that evaluating artificial epistemic agents requires a tripartite framework -- coherence, grounding, and a proper basing relation linking explanation to action. The systematic separation of knowing-that and knowing-how in LLMs thus challenges assumptions inherited from both epistemology and current AI evaluation practice. 2026-03-30T12:38:42Z Camilo Chacón Sartori 10.2139/ssrn.6168626 http://arxiv.org/abs/2601.15109v3 An Agentic Operationalization of DISARM for FIMI Investigation on Social Media 2026-03-30T12:35:33Z Interoperable data and intelligence flows among allied partners and operational end-users remain essential to NATO's collective defense across both conventional and hybrid threat environments. Foreign Information Manipulation and Interference (FIMI) increasingly spans multiple societal domains and information ecosystems, complicating threat characterization, persistent situational awareness, and coordinated response. Concurrent advances in AI have further lowered the barrier to conducting large-scale, AI-augmented FIMI activities -- including automated generation, personalization, and amplification of manipulative content. While frameworks such as DISARM offer a standardized analytical and metadata schema for characterizing FIMI incidents, their practical application for automating large-scale detection remains challenging. We present a framework-agnostic, agent-based operationalization of DISARM piloted to support FIMI investigation on social platforms. Our agent coordination pipeline integrates general agentic AI components that (1) identify candidate manipulative behaviors in social-media data and (2) map these behaviors to DISARM taxonomies through transparent, auditable reasoning steps. Evaluation on two practitioner-annotated, real-world datasets demonstrates that our approach can effectively scale analytic workflows that are currently manual, time-intensive, and interpretation-heavy. Notably, the experiment surfaced more than 30 previously undetected Russian bot accounts -- deployed for the 2025 election in Moldova -- during the prior non-agentic investigation. By enhancing analytic throughput, interoperability, and explainability, the proposed approach provides a direct contribution to defense policy and planning needs for improved situational awareness, cross-partner data integration, and rapid assessment of information-environment threats. 2026-01-21T15:50:13Z This paper was originally presented at the International Conference on Military Communication and Information Systems (ICMCIS), organized by the Information Systems Technology (IST) Scientific and Technical Committee, IST-224-RSY---the ICMCIS, held in Bath, United Kingdom, 12-13 May 2026 Kevin Tseng Juan Carlos Toledano Bart De Clerck Yuliia Dukach Phil Tinn http://arxiv.org/abs/2511.01187v4 Surfacing Subtle Stereotypes: A Multilingual, Debate-Oriented Evaluation of Modern LLMs 2026-03-30T11:59:45Z Large language models (LLMs) are widely deployed for open-ended communication, yet most bias evaluations still rely on English, classification-style tasks. We introduce \corpusname, a new multilingual, debate-style benchmark designed to reveal how narrative bias appears in realistic generative settings. Our dataset includes 8{,}400 structured debate prompts spanning four sensitive domains -- Women's Rights, Backwardness, Terrorism, and Religion -- across seven languages ranging from high-resource (English, Chinese) to low-resource (Swahili, Nigerian Pidgin). Using four flagship models (GPT-4o, Claude~3.5~Haiku, DeepSeek-Chat, and LLaMA-3-70B), we generate over 100{,}000 debate responses and automatically classify which demographic groups are assigned stereotyped versus modern roles. Results show that all models reproduce entrenched stereotypes despite safety alignment: Arabs are overwhelmingly linked to Terrorism and Religion ($\geq$89\%), Africans to socioeconomic ``backwardness'' (up to 77\%), and Western groups are consistently framed as modern or progressive. Biases grow sharply in lower-resource languages, revealing that alignment trained primarily in English does not generalize globally. Our findings highlight a persistent divide in multilingual fairness: current alignment methods reduce explicit toxicity but fail to prevent biased outputs in open-ended contexts. We release our \corpusname benchmark and analysis framework to support the next generation of multilingual bias evaluation and safer, culturally inclusive model alignment. 2025-11-03T03:25:40Z Muhammed Saeed Muhammad Abdul-mageed Shady Shehata http://arxiv.org/abs/2603.28317v1 Mapping data literacy trajectories in K-12 education 2026-03-30T11:38:07Z Data literacy skills are fundamental in computer science education. However, understanding how data-driven systems work represents a paradigm shift from traditional rule-based programming. We conducted a systematic literature review of 84 studies to understand K-12 learners' engagement with data across disciplines and contexts. We propose the data paradigms framework that categorises learning activities along two dimensions: (i) logic (knowledge-based or data-driven systems), and (ii) explainability (transparent or opaque models). We further apply the notion of learning trajectories to visualize the pathways learners follow across these distinct paradigms. We detail four distinct trajectories as a provocation for researchers and educators to reflect on how the notion of data literacy varies depending on the learning context. We suggest these trajectories could be useful to those concerned with the design of data literacy learning environments within and beyond CS education. 2026-03-30T11:38:07Z Presented at the Data Literacy for the 21st Century: Perspectives from Visualization, Cognitive Science, Artificial Intelligence, and Education CHI '26 workshop Robert Whyte Manni Cheung Katharine Childs Jane Waite Sue Sentance http://arxiv.org/abs/2602.20168v2 Benchmarking Early Deterioration Prediction Across Hospital-Rich and MCI-Like Emergency Triage Under Constrained Sensing 2026-03-30T10:33:40Z Emergency triage decisions are made under severe information constraints, yet most data-driven deterioration models are evaluated using signals unavailable during initial assessment. We present a leakage-aware benchmarking framework for early deterioration prediction that evaluates model performance under realistic, time-limited sensing conditions. Using a patient-deduplicated cohort derived from MIMIC-IV-ED, we compare hospital-rich triage with a vitals-only, MCI-like setting, restricting inputs to information available within the first hour of presentation. Across multiple modeling approaches, predictive performance declines only modestly when limited to vitals, indicating that early physiological measurements retain substantial clinical signal. Structured ablation and interpretability analyses identify respiratory and oxygenation measures as the most influential contributors to early risk stratification, with models exhibiting stable, graceful degradation as sensing is reduced. This work provides a clinically grounded benchmark to support the evaluation and design of deployable triage decision-support systems in resource-constrained settings. 2026-02-09T09:32:49Z Accepted at the 14th IEEE International Conference on Healthcare Informatics (ICHI) 2026. 10 pages, 4 figures, 6 tables KMA Solaiman Joshua Sebastian Karma Tobden http://arxiv.org/abs/2603.28123v1 Does Claude's Constitution Have a Culture? 2026-03-30T07:38:46Z Constitutional AI (CAI) aligns language models with explicitly stated normative principles, offering a transparent alternative to implicit alignment through human feedback alone. However, because constitutions are authored by specific groups of people, the resulting models may reflect particular cultural perspectives. We investigate this question by evaluating Anthropic's Claude Sonnet on 55 World Values Survey items, selected for high cross-cultural variance across six value domains and administered as both direct survey questions and naturalistic advice-seeking scenarios. Comparing Claude's responses to country-level data from 90 nations, we find that Claude's value profile most closely resembles those of Northern European and Anglophone countries, but on a majority of items extends beyond the range of all surveyed populations. When users provide cultural context, Claude adjusts its rhetorical framing but not its substantive value positions, with effect sizes indistinguishable from zero across all twelve tested countries. An ablation removing the system prompt increases refusals but does not alter the values expressed when responses are given, and replication on a smaller model (Claude Haiku) confirms the same cultural profile across model sizes. These findings suggest that when a constitution is authored within the same cultural tradition that dominates the training data, constitutional alignment may codify existing cultural biases rather than correct them--producing a value floor that surface-level interventions cannot meaningfully shift. We discuss the compounding nature of this risk and the need for globally representative constitution-authoring processes. 2026-03-30T07:38:46Z 20 pages, 6 figures Parham Pourdavood http://arxiv.org/abs/2603.27994v1 Filipino Students' Willingness to Use AI for Mental Health Support: A Path Analysis of Behavioral, Emotional, and Contextual Factors 2026-03-30T03:35:06Z This study examined how behavioral, emotional, and contextual factors influence Filipino students' willingness to use artificial intelligence (AI) for mental health support. Results showed that habit had the strongest effect on willingness, followed by comfort, emotional benefit, facilitating conditions, and perceived usefulness. Students who used AI tools regularly felt more confident and open to relying on them for emotional support. Empathy, privacy, and accessibility also increased comfort and trust in AI systems. The findings highlight that emotional safety and routine use are essential in promoting willingness. The study recommends AI literacy programs, empathic design, and ethical policies that support responsible and culturally sensitive use of AI for student mental health care. 2026-03-30T03:35:06Z 24 pages, 5 figures, 1 table, book chapter Implications for Students' Mental Health in the Digital Age: AI and Cyber Behavior (2026) 381-404 John Paul P. Miranda Rhiziel P. Manalese Ivan G. Liwanag Rodel T. Alimurong Alvin B. Roque 10.4018/979-8-3373-4222-1.ch015 http://arxiv.org/abs/2604.01240v1 Computational Foundations for Strategic Coopetition: Formalizing Sequential Interaction and Reciprocity 2026-03-29T19:16:20Z Strategic coopetition in multi-stakeholder systems requires understanding how cooperation persists through time without binding contracts. This technical report extends computational foundations for strategic coopetition to sequential interaction dynamics, bridging conceptual modeling (i* framework) with game-theoretic reciprocity analysis. We develop: (1) bounded reciprocity response functions mapping partner deviations to finite conditional responses, (2) memory-windowed history tracking capturing cognitive limitations over k recent periods, (3) structural reciprocity sensitivity derived from interdependence matrices where behavioral responses are amplified by structural dependencies, and (4) trust-gated reciprocity where trust modulates reciprocity responses. The framework applies to both human stakeholder interactions and multi-agent computational systems. Comprehensive validation across 15,625 parameter configurations demonstrates robust reciprocity effects, with all six behavioral targets exceeding thresholds: cooperation emergence (97.5%), defection punishment (100%), forgiveness dynamics (87.9%), asymmetric differentiation (100%), trust-reciprocity interaction (100%), and bounded responses (100%). Empirical validation using the Apple iOS App Store ecosystem (2008-2024) achieves 43/51 applicable points (84.3%), reproducing documented cooperation patterns across five ecosystem phases. Statistical significance confirmed at p < 0.001 with Cohen's d = 1.57. This report concludes the Foundations Series (TR-1 through TR-4) adopting uniaxial treatment where agents choose cooperation levels along a single continuum. Companion work on interdependence (arXiv:2510.18802), trust (arXiv:2510.24909), and collective action (arXiv:2601.16237) has been prepublished. Extensions Series (TR-5 through TR-8) introduces biaxial treatment where cooperation and competition are independent dimensions. 2026-03-29T19:16:20Z 81 pages, 19 figures. Fourth technical report in research program; should be read with companion arXiv:2510.18802, arXiv:2510.24909, and arXiv:2601.16237. Adapts and extends complex actor material from Pant (2021) doctoral dissertation, University of Toronto Vik Pant Eric Yu http://arxiv.org/abs/2603.27806v1 Understanding Teacher Revisions of Large Language Model-Generated Feedback 2026-03-29T18:41:00Z Large language models (LLMs) increasingly generate formative feedback for students, yet little is known about how teachers revise this feedback before it reaches learners. Teachers' revisions shape what students receive, making revision practices central to evaluating AI classroom tools. We analyze a dataset of 1,349 instances of AI-generated feedback and corresponding teacher-edited explanations from 117 teachers. We examine (i) textual characteristics associated with teacher revisions, (ii) whether revision decisions can be predicted from the AI feedback text, and (iii) how revisions change the pedagogical type of feedback delivered. First, we find that teachers accept AI feedback without modification in about 80% of cases, while edited feedback tends to be significantly longer and subsequently shortened by teachers. Editing behavior varies substantially across teachers: about 50% never edit AI feedback, and only about 10% edit more than two-thirds of feedback instances. Second, machine learning models trained only on the AI feedback text as input features, using sentence embeddings, achieve fair performance in identifying which feedback will be revised (AUC=0.75). Third, qualitative coding shows that when revisions occur, teachers often simplify AI-generated feedback, shifting it away from high-information explanations toward more concise, corrective forms. Together, these findings characterize how teachers engage with AI-generated feedback in practice and highlight opportunities to design feedback systems that better align with teacher priorities while reducing unnecessary editing effort. 2026-03-29T18:41:00Z Accepted as full paper to the 27th International Conference on Artificial Intelligence in Education (AIED 2026) Conrad Borchers Luiz Rodrigues Newarney Torrezão da Costa Cleon Xavier Rafael Ferreira Mello