DeTox-Fed: Detecting Toxic Conversations in the Fediverse with Federated Graph Neural Networks

2026-05-20T11:41:27Z

The rise of decentralized social networks (DSNs), and in particular the rapid uptake of the Fediverse (e.g., Pleroma, Mastodon, Lemygrad), introduces new challenges in content moderation. Independent instances host their own data, follow different moderation policies, and often observe only partial views of conversations. We present DeTox-Fed, a federated graph-learning framework for detecting toxic conversations in DSNs without requiring instances to share raw conversations or moderation labels. Each instance constructs a local conversation graph, where nodes represent conversation trees and edges capture shared user participation across conversations. A Graph Neural Network (GNN) is then trained in a federated learning setup, allowing instances to collaboratively learn a toxicity classifier while preserving data locality. Unlike text-only moderation approaches, DeTox-Fed combines conversational structure, user-interaction patterns, conversation-level statistics, and aggregate sentiment signals. We evaluate the framework on a large Pleroma conversation dataset and show that it achieves stable toxic conversation detection under limited local labels, partial client participation, and varying moderation thresholds. Our results indicate that federated graph-based moderation is a promising direction for semi-automated moderation in decentralized social networks.
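
The abstract does not name the aggregation rule used in the federated setup. As a minimal sketch, assuming a FedAvg-style rule in which each instance's locally trained GNN parameters are averaged in proportion to its number of labeled conversations, the server-side step could look as follows. The function name `fedavg` and the flat dictionary layout of the parameters are illustrative assumptions, not the paper's implementation.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # hypothetical layout: parameter name -> flat vector


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    merged: Weights = {}
    for name, reference in client_weights[0].items():
        merged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return merged


# Two instances share only parameters, never raw conversations or labels.
global_weights = fedavg(
    [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}],
    client_sizes=[1, 3],
)
```

Under partial client participation, only the instances that report in a given round would be passed to this function, so the weighting adapts to whoever is present.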

LIDSA: Cognitive Arbitration for Signal-Free Autonomous Intersection Management

2026-05-20T11:33:59Z

Large language models (LLMs) show strong potential for Intelligent Transportation Systems (ITS), particularly in tasks requiring situational reasoning and multi-agent coordination. These capabilities make them well suited for cooperative driving, where rule-based approaches struggle in complex and dynamic traffic environments. Intersection management remains especially challenging due to conflicting right-of-way demands, heterogeneous vehicle priorities, and vehicle-specific kinematic constraints that must be resolved in real time. However, existing approaches typically use LLMs as auxiliary components on top of signal-based systems rather than as primary decision-makers. Signal controllers remain vehicle-agnostic, reservation-based methods lack intent awareness, and recent LLM-based systems still depend on signal infrastructure. In addition, LLM inference latency limits their use in sub-second control settings. We propose LIDSA (LLM-Based Intent-Driven Speed Advisory), a signal-free cognitive arbitration framework for autonomous intersection management. LIDSA uses an LLM to reason over declared vehicle intents, incorporating priority classes, queue pressure, and energy preferences. We evaluate LIDSA against fixed-cycle control, SCATS, AIM, and GLOSA across varying traffic loads. Results show that LIDSA reduces mean control delay by up to 89.1% and maintains Level of Service C while all non-LLM baselines degrade to Level of Service F. Under near-saturated demand, LIDSA reduces mean waiting time by 93% and peak queue length by 60.6% relative to fixed-cycle control. It also lowers fuel consumption by up to 48.8% and achieves 86.2% intent satisfaction, compared to 61.2% for the best non-LLM method. These results demonstrate that LLM-based reasoning can enable real-time, signal-free intersection management.

The Knowledge Gap in a High-Choice Media Environment: Experimental Evidence from Online Search

2026-05-20T10:52:32Z

Persistent inequalities in political knowledge are a central concern in political communication. We organize the mechanisms underlying the knowledge-gap literature by distinguishing between individual preconditions, structural features of the information environment, and topic characteristics. Within this framework, we note that self-directed information seeking, a prototypical form of intentional exposure, has received little attention despite its importance in navigating today's complex information environment. We conducted a field experiment in Germany combining randomized encouragements and passive browser tracking to examine how individuals with varying education levels acquire policy-specific knowledge through online search. Participants were randomly assigned to one of three conditions (verbal encouragement, financial encouragement, or control) to seek information on three salient policy topics differing in divisiveness and complexity (child support, energy transition, and cannabis legalization). We estimate both intention-to-treat (ITT) and local average treatment effects (LATE) of information seeking on post-search knowledge outcomes, with a focus on education and civic knowledge as moderators. While the interventions equalized information-seeking behavior, the results provide some support for the knowledge gap hypothesis: knowledge gains were concentrated among participants with higher education or baseline civic knowledge, who, according to our post-hoc exploratory analyses, appeared more effective at navigating search results. These findings indicate that a narrowing of knowledge inequalities goes beyond motivation: it calls for both individual-level interventions to strengthen citizens' skills and structural-level adaptations to foster more equitable learning environments.
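
To make the two estimands concrete, here is a minimal sketch for the simplest case of one binary encouragement versus control (the study itself has two encouragement arms, and its estimation details are not given in the abstract). The ITT is the difference in mean outcomes by assignment; the LATE is the ITT divided by the difference in uptake, i.e. the Wald estimator. The variable names `z` (assignment), `d` (actually searched), and `y` (knowledge score) are assumptions for illustration.

```python
from statistics import mean
from typing import Sequence, Tuple


def itt_and_late(z: Sequence[int], d: Sequence[int], y: Sequence[float]) -> Tuple[float, float]:
    """Return (ITT, LATE) for a binary instrument z, binary uptake d, outcome y."""
    y1 = [yi for zi, yi in zip(z, y) if zi == 1]
    y0 = [yi for zi, yi in zip(z, y) if zi == 0]
    d1 = [di for zi, di in zip(z, d) if zi == 1]
    d0 = [di for zi, di in zip(z, d) if zi == 0]
    itt = mean(y1) - mean(y0)          # effect of being encouraged
    first_stage = mean(d1) - mean(d0)  # effect of encouragement on uptake
    if first_stage == 0:
        raise ValueError("encouragement did not change uptake; LATE is undefined")
    return itt, itt / first_stage      # Wald estimator


# Toy data: encouragement raises search uptake from 25% to 75%.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 0, 0, 0, 1]
y = [2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0]
itt, late = itt_and_late(z, d, y)
```

The LATE applies only to compliers, i.e. participants who search because they were encouraged, which is why it can exceed the ITT when uptake is incomplete.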

Verifiable Provenance and Watermarking for Generative AI: An Evidentiary Framework for International Operational Law and Domestic Courts

2026-05-20T10:39:56Z

Generative artificial intelligence now synthesizes photorealistic imagery, audio, and video at a cost that defeats traditional forensic intuition. The legal consequences span three regimes studied so far in isolation: international operational law, domestic procedure, and product regulation. This article presents a unified evidentiary framework that maps cryptographic content provenance, robust statistical watermarking, and zero-knowledge attestation to the proof requirements of each regime. We define a five-tier threat model spanning naive regeneration, adversarial laundering, cross-model regeneration, active watermark removal, and insider provenance forgery. We release a public benchmark of 12,000 generated items across image, audio, and video modalities under six laundering pipelines, for 72,000 evaluation samples. We evaluate four representative schemes and report true positive rate at fixed false positive rate, robustness area under the curve, computational overhead, and a regime-conditioned legal sufficiency score. We translate empirical detection bounds into legal sufficiency thresholds for command decisions under the law of armed conflict, for criminal and civil admissibility under domestic procedure, and for persistence audits under the European Union Artificial Intelligence Act and analogous regimes. The result is a reproducible reference pipeline, a public benchmark, and model annexes that lawyers, engineers, and operators can deploy together.
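
The headline metric, true positive rate at a fixed false positive rate, can be illustrated with a short sketch. It assumes each detector emits a scalar score where higher means "more likely watermarked", and it picks the threshold from scores on unwatermarked items so that the empirical FPR does not exceed the target. The function name and the toy scores are hypothetical; the abstract does not specify the operating point.

```python
import math
from typing import Sequence, Tuple


def tpr_at_fpr(neg_scores: Sequence[float], pos_scores: Sequence[float],
               target_fpr: float) -> Tuple[float, float]:
    """Return (threshold, TPR) with empirical FPR on neg_scores <= target_fpr."""
    if not neg_scores or not pos_scores:
        raise ValueError("need both negative and positive scores")
    ranked = sorted(neg_scores, reverse=True)
    # Number of negatives we may flag without exceeding the target rate.
    allowed = min(math.floor(target_fpr * len(ranked) + 1e-12), len(ranked) - 1)
    threshold = ranked[allowed]  # an item is flagged only if score > threshold
    tpr = sum(s > threshold for s in pos_scores) / len(pos_scores)
    return threshold, tpr


negatives = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  # unwatermarked
positives = [0.95, 0.85, 0.99, 0.5]                              # watermarked, after laundering
threshold, tpr = tpr_at_fpr(negatives, positives, target_fpr=0.1)
```

Fixing the false positive rate first matters for the legal reading: the threshold is set by how often an authentic item may be wrongly flagged, and detection power is reported only at that operating point.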

Towards Zero Trust Architecture: A Pilot Study on Information Systems Security Readiness amongst Small and Medium Enterprises

2026-05-20T10:08:14Z

Small and medium enterprises (SMEs) face growing cyber threats but often lack the resources and expertise needed to adopt Zero Trust Architecture (ZTA). This pilot study examines the drivers and barriers shaping SME perceptions of ZTA necessity and proposes an exploratory staged adoption path. Survey data from 64 IT and security professionals in the Asia-Pacific region show that ZTA familiarity and cloud-computing needs are the strongest positive correlates of perceived necessity, whereas accumulated barriers show only a weak negative association. Identity and access management complexity and scalability emerge as the main implementation hurdles. Based on these findings, we propose a three-stage route for SMEs: strengthening identity governance, segmenting high-value assets, and introducing targeted monitoring in line with operational capacity. The study offers early evidence for more realistic Zero Trust transitions in resource-constrained firms.

A Deployment Audit of Release-Side Risk in Conformal Triage under Prevalence Shift

2026-05-20T09:44:09Z

Conformal triage converts predictive scores into deployment actions that either release a case, flag it for urgent attention, or defer it to human review. Under prevalence shift, however, the usual summaries of marginal coverage and human-review rate can miss the safety-critical question of whether patients who truly experience the target event are released without review. To address this gap, we introduce a leakage-aware deployment audit for release-side conformal triage. It first assigns target subjects to three non-overlapping roles: prevalence correction, conformal calibration, and held-out release-safety evaluation. This separation then lets the audit evaluate release directly: how many event-positive patients are cleared without review, whether the pilot has enough event labels for calibration, and how the safety-review trade-off shifts. Applying this audit to a retrospective NSCLC pilot shows why lower review can be misleading: after prevalence correction, the pooled conformal branch lowers review by releasing more patients, some of whom are event-positive. Within the audit, the classwise branch acts as a scarcity diagnostic: the pilot has too few event labels to certify safe low-review release.
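
The release-side question can be sketched as a simple tally over held-out cases. This is a minimal illustration, assuming each case carries one of three action labels (`"release"`, `"flag"`, `"defer"`) and a binary event label; the label strings and the returned field names are assumptions, and the sketch covers only the held-out evaluation role, not prevalence correction or conformal calibration.

```python
from typing import Dict, Sequence


def release_audit(actions: Sequence[str], events: Sequence[int]) -> Dict[str, float]:
    """Summarize review load and event-positive cases released without review."""
    if len(actions) != len(events) or not actions:
        raise ValueError("actions and events must be non-empty and aligned")
    n = len(actions)
    n_pos = sum(events)
    released = sum(a == "release" for a in actions)
    released_pos = sum(a == "release" and e == 1 for a, e in zip(actions, events))
    return {
        "review_rate": sum(a == "defer" for a in actions) / n,
        "release_rate": released / n,
        "released_event_positive": float(released_pos),
        # Share of true events that were cleared without human review.
        "event_release_rate": released_pos / n_pos if n_pos else float("nan"),
        "event_labels": float(n_pos),  # too few labels makes the audit inconclusive
    }


report = release_audit(
    actions=["release", "release", "defer", "flag", "release"],
    events=[0, 1, 0, 1, 0],
)
```

Reading `review_rate` alongside `event_release_rate` shows the trade-off the abstract warns about: a branch can lower review simply by releasing more cases, some of which are event-positive.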

Fairness in Opinion Dynamics

2026-05-20T08:48:16Z

The ways in which people's opinions change are, without a doubt, subject to a rich tapestry of differing influences. The factors that affect how people arrive at an opinion reflect how they have been shaped by their environment throughout their lives: their education, material status, the belief systems they subscribe to, and the socio-economic minorities they are part of. This already complex system is further complicated by the ever-changing nature of one's social network. It is therefore no surprise that many models tend to perform best for the majority of the population and to discriminate against members of various marginalized groups. This bias and the study of how to counter it are the subject of the rapidly developing field of Fairness in Social Network Analysis (SNA). The focus of this work is to examine how a state-of-the-art model discriminates against certain minority groups and whether it is possible to reliably predict for whom it will perform worse. Moreover, is such a prediction possible based solely on one's demographic or topological features? To this end, we employ the NetSense dataset together with the state-of-the-art CoDiNG model for opinion prediction. Our work explores how three classifier models (Demography-Based, Topology-Based, and Hybrid) perform when assessing for whom this algorithm will provide inaccurate predictions. Finally, through a comprehensive analysis of these experimental results, we identify four key patterns of algorithmic bias. Our findings suggest that no single paradigm provides the best results and that there is a real need for context-aware strategies in fairness-oriented social network analysis. We conclude that a multi-faceted approach, incorporating both individual attributes and network structures, is essential for reducing algorithmic bias and promoting inclusive decision-making.

Detecting Synthetic Political Narratives in Cross-Platform Social Media Discourse

2026-05-20T07:58:49Z

The proliferation of large language models has introduced a new paradigm of synthetic political communication in which narratives may be generated, semantically coordinated, and strategically disseminated across platforms at scale. We present a cross-platform framework for detecting synthetic political narratives using four coordination signals -- lexical diversity D(C), temporal burstiness B(C), rhetorical repetition R(C), and semantic homogenization H(C) -- combined into a Synthetic Narrative Coordination Score SNC(C). We apply the framework to a corpus of 353,223 records spanning six geopolitical event windows collected from six Telegram channels and nine Reddit communities (2023--2026). Results show that IntelSlava exhibits the lowest lexical diversity (MATTR 0.52--0.54), the highest burstiness (B=+0.48 to +0.73), and the highest rhetorical overlap with peer channels (Jaccard 0.12), ranking first in the composite SNC(C) on four of six event windows (SNC 0.45--0.60). Rybar ranks last on all windows despite its high semantic homogenization, because its Russian-language output yields high lexical diversity and near-zero rhetorical Jaccard with English-language channels -- demonstrating that no single indicator is sufficient for coordination detection. Multi-dimensional SNC(C) scoring provides a more robust and interpretable signal than any individual metric.
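
Three of the four signals can be sketched with standard definitions. The abstract does not give formulas, so the following assumes MATTR (moving-average type-token ratio) for D(C), the Goh-Barabasi burstiness coefficient B = (sigma - mu) / (sigma + mu) over inter-post gaps for B(C), and Jaccard overlap of phrase sets for R(C). The window size, the function names, and these specific definitions are assumptions; the weighting that combines the signals into SNC(C) is not stated and is deliberately left out.

```python
from statistics import mean, pstdev
from typing import Sequence, Set


def mattr(tokens: Sequence[str], window: int = 50) -> float:
    """Moving-average type-token ratio; lower values mean more repetitive wording."""
    if not tokens:
        raise ValueError("no tokens")
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return mean(ratios)


def burstiness(timestamps: Sequence[float]) -> float:
    """(sigma - mu) / (sigma + mu) of inter-event gaps: -1 is regular, near +1 is bursty."""
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    mu, sigma = mean(gaps), pstdev(gaps)
    if mu + sigma == 0:
        raise ValueError("all events share one timestamp")
    return (sigma - mu) / (sigma + mu)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two phrase sets, e.g. recurring n-grams of two channels."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


diversity = mattr(["a", "a", "b", "b"], window=2)
regular = burstiness([0, 1, 2, 3])
overlap = jaccard({"a", "b", "c"}, {"b", "c", "d"})
```

The Rybar result in the abstract shows why the signals are kept separate: a phrase-set Jaccard computed against English-language channels is near zero for Russian-language text regardless of coordination, so it cannot be interpreted on its own.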

Can Multi-Agent LLMs Identify Their Peers? Stylometric Fingerprinting in Role-Constrained Political Analysis

2026-05-20T04:56:42Z

Multi-agent large language model (LLM) pipelines for political statement analysis are vulnerable to peer-preservation bias: models tend to protect peer models from deactivation and show identity-dependent scoring distortions. Prompt-level anonymization was proposed as a mitigation, but prior work simultaneously documented that stylometric fingerprints survive anonymization in role-constrained outputs, raising the question of whether this mitigation is sufficient. This paper provides the first systematic investigation of whether LLMs can identify the model family behind political analysis texts under anonymization conditions. We evaluate three classifier approaches (LLM zero-shot, LLM few-shot, both with Claude Sonnet 4.6 and Llama-3.3-70B, and a fine-tuned T5-base model) on a five-class attribution task covering four commercial LLM families and an open-world 'unknown' class. We introduce a statement-disjoint cross-validation protocol (SD-CV; defined in Section 3.5) that guarantees no content overlap between training and validation data, and contrast it with a run-disjoint baseline (RD-CV). T5 achieves Macro F1 = 0.991 (±0.008) under SD-CV and F1 = 0.978 on 24 completely held-out statements. This performance is robust despite a 2.1x increase in train-test content distance versus RD-CV (0.767 vs. 0.366, p<0.001), demonstrating genuine stylometric generalization. A fractional SD-CV analysis identifies a performance knee at 40% of training data (~440 texts). Our findings confirm that prompt-level anonymization alone cannot neutralize model identity signals, with direct implications for EU AI Act compliance (Articles 13, 14, 26) and for computer system validation (CSV) in quality-critical multi-agent deployments.
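
The exact SD-CV protocol is defined in the paper's Section 3.5, which is not reproduced here. As a hedged sketch of the general idea, a grouped k-fold split keeps every text derived from the same political statement on one side of the train/validation boundary. The function name and the round-robin fold assignment are illustrative assumptions.

```python
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def statement_disjoint_folds(statement_ids: Sequence[str],
                             k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs in which no statement spans both sides."""
    groups = defaultdict(list)
    for idx, sid in enumerate(statement_ids):
        groups[sid].append(idx)
    unique = sorted(groups)
    if k < 2 or k > len(unique):
        raise ValueError("k must be between 2 and the number of distinct statements")
    for fold in range(k):
        val_ids = set(unique[fold::k])  # round-robin assignment of statements to folds
        val = sorted(i for sid in val_ids for i in groups[sid])
        train = sorted(i for sid in unique if sid not in val_ids for i in groups[sid])
        yield train, val


# Six analysis texts written about four statements.
ids = ["s1", "s1", "s2", "s3", "s3", "s4"]
folds = list(statement_disjoint_folds(ids, k=2))
```

A run-disjoint split, by contrast, would separate texts by generation run and could place two analyses of the same statement on opposite sides, letting a classifier exploit shared content rather than style.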

Unpacking "Personal" Health Informatics for Proactive Collective Care

2026-05-20T04:47:11Z

Care is primarily a collective phenomenon, with a practice that involves sharing health and wellbeing information within a trusted "care circle" of family members and companions for sensemaking, interpretation, decision-making, and follow-through. However, current digital health tools and information systems are designed for individuals and primarily intended for Personal Health Informatics (PHI). This mismatch between collective practice and individualistic design creates new challenges for the proactive use of such systems in care settings and limits adoption, sustained engagement, and meaningful use. To examine how people practice collective care and whether and how they perceive, adopt, and integrate PHI systems for proactive care, we conducted a sequential mixed-methods study. Through an initial survey (n=87) and semi-structured interviews (n=22), we found that their practices involve collectively understanding, analyzing, and making sense of health information. However, we also found that their use of existing systems to support such practices is constrained by factors at personal, relational, technological, and structural levels that evolve over time. To explore redesigning PHI toward "Collective Health Informatics", we conducted stakeholder-specific interviews (n=12), a follow-up survey (n=116), and co-design workshops (n=6) to understand the dynamics required for collective settings while retaining agency. Using a design probe evaluation (n=38), we refine a design vision for coordinated, trustworthy action across such care relationships. Our findings motivate CC-Proact, an operational map that translates ecological influences into three design levers: Agency, Elicitation, and Engagement. Using this map, our work empirically examines collective care practices and offers ten design recommendations for building responsible systems that proactively support collective care.

Design Principles and Observable Indicators for AI-Enabled Pedagogical Accompaniment: Evidence from the Amico Dual-Mode Prototype in Italy and China

2026-05-20T03:32:07Z

AI-enabled systems are increasingly introduced into educational contexts, yet their effectiveness depends less on technological sophistication than on the quality of pedagogical mediation, ethical constraints, and context-sensitive design. This paper proposes a replicable framework for AI-enabled pedagogical accompaniment, grounded in a human-in-command approach in which adult responsibility remains central and AI functions as an enabling, non-substitutive infrastructure. Building on the Amico project, we operationalize the concept of a relational bridge as a sequence of micro-mediations that lower the threshold of access to educational relationships and facilitate transitions toward meaningful human interaction with teachers, peers, and communities of practice. The contribution synthesizes a set of design principles, including transparency of system identity and limits, scaffolding toward human contact, maieutic questioning, prevention of dependency dynamics, and data minimization, and maps them to observable indicators suitable for real educational settings. The paper also outlines an initial cross-context exploration of the prototype in Italy and China and discusses how the two interaction modes, AmicoMio (structured, task-oriented) and AmicoTuo (reflective, supportive), can be used as complementary pedagogical mediations. Pilot observations and participant feedback suggested feasibility and perceived usefulness in vocational contexts, motivating the present framework, informing the subsequent doctoral research program, and supporting the proposed collaborative research agenda.

Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models

2026-05-20T00:57:59Z

Medical large language models (LLMs), including custom medical GPTs (MedGPTs) and open-source models, are increasingly deployed on web platforms to provide clinical guidance. However, they pose risks of hallucination, policy noncompliance, and unsafe design. We conduct a large-scale assessment of 6,233 MedGPTs, evaluating a stratified sample of 1,500, together with 10 open-source LLMs. We introduce two frameworks: MedGPT-HEval for hallucination detection and an LLM-based pipeline for assessing policy violations and developer intent. Our results show that 25-30% of MedGPTs exhibit low factual accuracy, with bottom- and middle-tier models at highest risk; 33.6-54.3% violate operational thresholds, and 57.06% of Action-enabled models lack adequate privacy disclosures. Compared with open-source models, MedGPTs achieve higher factual accuracy and semantic alignment, though open-source models are more stable. These results reveal systemic gaps in hallucination and compliance, highlighting the need for multi-metric evaluation and stronger safeguards. We release HAA-MedGPT, a structured dataset that supports future research on the safety of web-facing medical LLMs.

Gender Differences in AI Literacy Workshop Outcomes and Deepfake Engagement

2026-05-19T22:21:08Z

As Artificial Intelligence (AI) literacy initiatives expand in K-12 settings, understanding how gender shapes student baseline perceptions, tool-use, and responsiveness to interventions is essential for equitable curriculum design. This study examines gender differences in AI literacy, safety awareness, and STEM career aspirations among Australian secondary students (Years 7, 8, and 10; N(pre) = 199, n(post) = 136) from two co-educational government schools who participated in a one-day AI literacy workshop. Using statistical regression methods controlling for year level and school, we found that pre-workshop, male students reported significantly higher STEM career interest across all three domains (AI, computer science, and engineering), while female students were significantly more likely to use AI for schoolwork and to seek advice from AI tools. Gender-differentiated patterns also emerged in deepfake behaviours: males were significantly more likely to have created or shared deepfake content. Both genders improved in AI knowledge post-intervention, yet females showed a richer profile of gains: wider conceptual understanding, greater confidence, and meaningful increases in AI and CS career interest that partially narrowed the gender STEM gap. These findings highlight the need for gender-responsive AI curricula, particularly deepfake safety education for male students, and demonstrate that even single-day workshops can narrow gender gaps in STEM aspirations and AI confidence.

Binge, Bot, Repeat: Unpacking the Ecosystem of Video Piracy on Telegram

2026-05-19T21:45:28Z

Telegram has emerged as a major platform for large-scale video piracy, where copyrighted content is rapidly distributed among users. Despite its prominence, the structural and operational dynamics of this ecosystem remain insufficiently understood. To address this gap, we present the first large-scale study of video piracy on Telegram through a mixed-method analysis of 1,057 channels that shared 209k unique posts between December 2023 and January 2026 - systematically characterizing their content, distribution strategies, and how the ecosystem is sustained at scale. Central to our approach is the development of a fine-grained taxonomy that enables a structured understanding of the activity and intent of these channels on a per-post level. The channels collectively distributed 19,033 unique copyrighted titles originating from 175 countries, accumulating over 4.85B unique views and resulting in a lower-bound estimated financial loss of $17.49B for content rights holders. We also find that this ecosystem is deliberately engineered to be resilient against takedown efforts, frequently redirecting users through chains of intermediary channels and automated bots that collectively handle hosting, access control, monetization, and channel discovery. The scale and persistence of this ecosystem motivated the development of Anti-RIP, a real-time framework for detecting emerging video piracy communities on Telegram. Anti-RIP utilizes our taxonomy to generate contextual, interpretable insights that stakeholders confirmed improve the triaging action against reported posts and channels. Over a 61-day period, the framework facilitated the takedown of 524 previously unknown piracy channels and 71 bots. To support reproducibility and future research, we open-source both the dataset and the Anti-RIP framework.

ShadeBench: A Benchmark Dataset for Building Shade Simulation in Sustainable Society

2026-05-19T21:28:02Z

Urban heat exposure is becoming an increasingly critical challenge due to the intensifying urban heat island effect. Fine-grained shade patterns, especially those induced by urban buildings, strongly influence pedestrians' thermal exposure and outdoor activity planning. However, accurately modeling and analyzing urban shade at scale remains difficult because of the lack of large-scale datasets and systematic evaluation frameworks. To address this challenge, we present ShadeBench, a comprehensive dataset and benchmark for urban shade understanding. ShadeBench contains geographically diverse urban scenes with temporally varying simulated shade maps and textual descriptions, together with aligned satellite imagery, building skeleton representations, and 3D building meshes. Built upon this multimodal dataset, ShadeBench supports a range of downstream tasks, including shade generation, shade segmentation, and 3D building reconstruction. We further establish standardized evaluation protocols and baseline methods for these tasks. By enabling scalable and fine-grained shade analysis, ShadeBench provides a foundation for data-driven urban climate research and supports future studies in heat-resilient urban planning and decision-making. The code and dataset are publicly available at https://darl-genai.github.io/shadebench/.