https://arxiv.org/api/i5UsK5WlBEuK0uT6HH+AgLxeD5g 2026-03-24T08:34:39Z 27415 15 15 http://arxiv.org/abs/2603.21519v1 Triangulating Temporal Dynamics in Multilingual Swiss Online News 2026-03-23T03:27:00Z Analyzing news coverage in multilingual societies can offer valuable insights into the dynamics of public discourse and the development of collective narratives, yet comprehensive studies that account for linguistic and cultural diversity within national media ecosystems remain limited, particularly in complex contexts such as Switzerland. This paper studies temporal trends in Swiss digital media across the country's three main linguistic regions, French, German, and Italian, using a triangulated methodology that combines quantitative analyses with qualitative insights. We collected and processed over 1.7 million news articles, applying lexical metrics, named entity recognition and Wikidata-based linking, targeted sentiment analysis, and consensus-based change-point detection. To enable principled cross-language comparisons and to connect to theories of domestication and cultural proximity, we derive domestication profiles together with a proximity salience ratio. Our analysis spans thematic, recurrent, and singular events. By integrating quantitative data with qualitative interpretation, we provide new insights into the dynamics of Swiss digital media and demonstrate the usefulness of triangulation in media studies. The findings reveal distinct temporal patterns and highlight how linguistic and cultural contexts influence reporting. Our approach offers a framework applicable to other multilingual or culturally diverse media environments, contributing to a deeper understanding of how news is shaped by linguistic and cultural factors. 
2026-03-23T03:27:00Z ICWSM 2026 Bros Victor Dufraisse Evan Popescu Adrian Gatica-Perez Daniel http://arxiv.org/abs/2603.21507v1 Delineating hierarchical activity space from high-resolution urban mobility flows 2026-03-23T02:53:06Z Current studies on activity space are limited by the conceptualization of absolute physical space that fails to consider the heterogeneity of relational spaces reconstructed from spatial interactions of human movements between locations and falls short in incorporating the inherent hierarchical property of human mobility. Consequently, these approaches cannot faithfully reflect how people interact with urban spaces through travels. From the lens of relational space, this study proposes the new Hierarchical Activity Region Model (HARM) to derive the space and hierarchical properties of activity spaces perceived by various urban groups. We demonstrate the enhanced validity of our model on travel behavior in Manhattan, New York City, before, during, and after Hurricane Sandy on the basis of taxi data. Empirical results show that intra-urban travel retains clear hierarchical organization, even under disruption of a major weather event. Yet, travel undergoes a compression effect in travel hierarchies, characterized by fewer hierarchical levels and enlarged characteristic scales, followed by a rebound. Clustering the derived hierarchies reveals pronounced heterogeneity that stems from differences in population profiles; some groups sustain deeper structures or recover quickly, while others experience a persistent loss of levels. This study provides valuable insights into the functional hierarchies of urban mobility, which could inform more sustainable, resilient and equitable urban planning. The proposed methodological framework is generic for studying human mobility in broader contexts. 2026-03-23T02:53:06Z Zhicheng Deng Zhaoya Gong Jean-Claude Thill Elizabeth C. 
Delmelle http://arxiv.org/abs/2511.12920v4 Auditing Google's AI Overviews and Featured Snippets: A Case Study on Baby Care and Pregnancy 2026-03-23T01:15:32Z Google Search increasingly surfaces AI-generated content through features like AI Overviews (AIO) and Featured Snippets (FS), which users frequently rely on despite having no control over their presentation. Through a systematic algorithm audit of 1,508 real baby care and pregnancy-related queries, we evaluate the quality and consistency of these information displays. Our robust evaluation framework assesses multiple quality dimensions, including answer consistency, relevance, presence of medical safeguards, source categories, and sentiment alignment. Our results reveal concerning gaps in information consistency, with information in AIO and FS displayed on the same search result page being inconsistent with each other in 33% of cases. Despite high relevance scores, both features critically lack medical safeguards (present in just 11% of AIO and 7% of FS responses). While health and wellness websites dominate source categories for both AIO and FS, FS also often link to commercial sources. These findings have important implications for public health information access and demonstrate the need for stronger quality controls in AI-mediated health information. Our methodology provides a transferable framework for auditing AI systems across high-stakes domains where information quality directly impacts user well-being. 2025-11-17T03:16:36Z 18 pages, 10 figures; to appear in AAAI ICWSM 2026 Desheng Hu Joachim Baumann Aleksandra Urman Elsa Lichtenegger Robin Forsberg Aniko Hannak Christo Wilson http://arxiv.org/abs/2603.21470v1 Empirical Evaluation of Link Deletion Methods for Limiting Information Diffusion on Social Media 2026-03-23T01:14:59Z Although beneficial information abounds on social media, the dissemination of harmful information such as so-called ``fake news'' has become a serious issue. 
Therefore, many researchers have devoted considerable effort to limiting the diffusion of harmful information. A promising approach to limiting diffusion of such information is link deletion methods in social networks. Link deletion methods have been shown to be effective in reducing the size of information diffusion cascades generated by synthetic models on a given social network. In this study, we evaluate the effectiveness of link deletion methods by using actual logs of retweet cascades, rather than by using synthetic diffusion models. Our results show that even after deleting 10\%--50\% of links from a social network, the size of cascades after link deletion is estimated to be only 50\% of the original size under the optimistic estimation, which suggests that the effectiveness of the link deletion strategy for suppressing information diffusion is limited. Moreover, our results also show that there is a considerable number of cascades with many seed users, which renders link deletion methods inefficient. 2026-03-23T01:14:59Z Social Network Analysis and Mining Furukawa, S., Tsugawa, S. Empirical evaluation of link deletion methods for limiting information diffusion on social media. Soc. Netw. Anal. Min. 12, 169 (2022) Shiori Furukawa Sho Tsugawa 10.1007/s13278-022-00994-6 http://arxiv.org/abs/2603.21447v1 Deliberative multi-agent large language models improve clinical reasoning in ophthalmology 2026-03-22T23:36:48Z Large language models (LLMs) show potential for ophthalmic clinical reasoning, yet individual models risk introducing harm. We evaluated whether multi-agent LLM deliberative councils improve diagnostic performance and mitigate harm compared to individual LLMs. In a comparative cross-sectional study, we assessed 12 individual LLMs and three multi-agent councils on 100 ophthalmology clinical vignettes. Each council comprised four models assembled by type: proprietary flagship, proprietary fast, and open-source. 
Models independently answered a vignette, anonymously ranked one another's responses, and a designated chair synthesized all responses and peer reviews into a final answer. Councils consistently outperformed pooled individual models across all three tiers. Accuracy improved for proprietary flagship (95.0% vs 90.8%; risk difference [RD]: 4.25 [95% CI: 0.45, 8.05]), proprietary fast (96.0% vs 86.5%; RD: 9.50 [5.31, 13.59]), and open-source councils (91.0% vs 83.2%; RD: 7.75 [4.17, 11.33]). Harm rates declined for proprietary flagship (10.0% vs 22.5%; RD: -12.50 [-16.86, -8.14]), proprietary fast (16.0% vs 31.8%; RD: -15.75 [-21.49, -10.01]), and open-source councils (22.0% vs 38.5%; RD: -16.50 [-22.27, -10.73]). Coverage analysis revealed net positive gains for accuracy (ΔCoverage: 4.4-9.8 percentage points) and safety (ΔCoverage: 13.6-20.6), indicating councils recovered correct diagnoses and averted harm. Councils elevated correct diagnoses to higher rank positions and produced more complete differentials and management plans (all P<.05). Harmful council responses showed reduced combined commission-and-omission errors and tended to be less severe. Structured deliberation via multi-agent LLM councils may enhance the reliability of LLM-assisted ophthalmic clinical reasoning. 2026-03-22T23:36:48Z Ehsan Misaghi Sean T Berkowitz Bing Yu Chen Qingyu Chen Renaud Duval Pearse A Keane Danny A Mammo Ariel Yuhan Ong Mertcan Sevgi Sumit Sharma Sunil K Srivastava Yih Chung Tham Fares Antaki http://arxiv.org/abs/2508.04668v7 Inequality in the Age of Pseudonymity 2026-03-22T23:26:47Z Inequality measures such as the Gini coefficient are used to inform and motivate policymaking, and are increasingly applied to digital platforms. We analyze how measures fare in pseudonymous settings that are common in the digital age. 
One key challenge of such environments is the ability of actors to create fake identities under fictitious names, also known as ``Sybils.'' While some actors may do so to preserve their privacy, we show that this can hamper inequality measurements: it is impossible for measures satisfying the literature's canonical set of desired properties to assess the inequality of an economy that may harbor Sybils. We characterize the class of all Sybil-proof measures, and prove that they must satisfy relaxed versions of the aforementioned properties. Furthermore, we show that the structure imposed restricts the ability to assess inequality at a fine-grained level. We then apply our results to prove that popular measures are not Sybil-proof, with the famous Gini coefficient being but one example out of many. Finally, we examine dynamics leading to the creation of Sybils in digital and traditional settings. 2025-08-06T17:36:01Z 41 pages, 1 figure Proceedings of the AAAI Conference on Artificial Intelligence, 40(20), 17293-17301 Aviv Yaish Nir Chemaya Dahlia Malkhi Lin William Cong 10.1609/aaai.v40i20.38781 http://arxiv.org/abs/2603.18203v2 How Psychological Learning Paradigms Shaped and Constrained Artificial Intelligence 2026-03-22T22:23:15Z The dominant paradigms of artificial intelligence were shaped by learning theories from psychology: behaviorism inspired reinforcement learning, cognitivism gave rise to deep learning and memory-augmented architectures, and constructivism influenced curriculum learning and compositional approaches. This paper argues that each AI paradigm inherited not only the strengths but the structural limitations of the psychological theory that inspired it. 
Reinforcement learning cannot account for the internal structure of knowledge, deep learning compresses representations into opaque parameter spaces resistant to principled update, and current integrative approaches lack a formal account of how new understanding is constructed from existing components. The paper further examines a cross-cultural divergence in the interpretation of rote learning, arguing that the Eastern conception of memorization as a structured, multi-phase precursor to understanding offers an underexploited bridge between psychological theory and AI methodology. Drawing on the systematicity debate and Aizawa's critique of both classicism and connectionism, this paper introduces ReSynth, a trimodular framework that separates reasoning (Intellect), purpose (Identity), and knowledge (Memory) as architecturally independent components. The paper traces the genealogy from psychological paradigm to AI method, diagnoses the inherited limitations at each stage, and argues that adaptability, the central challenge of artificial general intelligence, requires a representational architecture in which systematic behavior is a necessary consequence rather than an accidental property. 2026-03-18T18:54:36Z preprint journal Alex Anvi Eponon Ildar Batyrshin Christian E. Maldonado-Sifuentes Grigori Sidorov http://arxiv.org/abs/2505.18351v2 Persona Alchemy: Designing, Evaluating, and Implementing Psychologically-Grounded LLM Agents for Diverse Stakeholder Representation 2026-03-22T19:34:22Z Despite advances in designing personas for Large Language Models (LLMs), challenges remain in aligning them with human cognitive processes and representing diverse stakeholder perspectives. We introduce a Social Cognitive Theory (SCT) agent design framework for designing, evaluating, and implementing psychologically grounded LLMs with consistent behavior. 
Our framework operationalizes SCT through four personal factors (cognitive, motivational, biological, and affective) for designing, six quantifiable constructs for evaluating, and a graph database-backed architecture for implementing stakeholder personas. Experiments tested agents' responses to contradicting information of varying reliability. In the highly polarized renewable energy transition discourse, we design five diverse agents with distinct ideologies, roles, and stakes to examine stakeholder representation. The evaluation of these agents in contradictory scenarios occurs through comprehensive processes that implement the SCT. Results show consistent response patterns ($R^2$ range: $0.58-0.61$) and systematic temporal development of SCT construct effects. Principal component analysis identifies two dimensions explaining $73$% of variance, validating the theoretical structure. Our framework offers improved explainability and reproducibility compared to black-box approaches. This work contributes to ongoing efforts to improve diverse stakeholder representation while maintaining psychological consistency in LLM personas. 2025-05-23T20:18:14Z Accepted at ICLR 2026 Algorithmic Fairness Across Alignment Procedures and Agentic Systems (AFAA) Workshop Sola Kim Dongjune Chang Jieshu Wang http://arxiv.org/abs/2603.21359v1 Benchmarking Bengali Dialectal Bias: A Multi-Stage Framework Integrating RAG-Based Translation and Human-Augmented RLAIF 2026-03-22T18:44:57Z Large language models (LLMs) frequently exhibit performance biases against regional dialects of low-resource languages. However, frameworks to quantify these disparities remain scarce. We propose a two-phase framework to evaluate dialectal bias in LLM question-answering across nine Bengali dialects. First, we translate and gold-label standard Bengali questions into dialectal variants adopting a retrieval-augmented generation (RAG) pipeline to prepare 4,000 question sets. 
Since traditional translation quality evaluation metrics fail on unstandardized dialects, we evaluate fidelity using an LLM-as-a-judge, which human correlation confirms outperforms legacy metrics. Second, we benchmark 19 LLMs across these gold-labeled sets, running 68,395 RLAIF evaluations validated through multi-judge agreement and human fallback. Our findings reveal severe performance drops linked to linguistic divergence. For instance, responses to the highly divergent Chittagong dialect score 5.44/10, compared to 7.68/10 for Tangail. Furthermore, increased model scale does not consistently mitigate this bias. We contribute a validated translation quality evaluation method, a rigorous benchmark dataset, and a Critical Bias Sensitivity (CBS) metric for safety-critical applications. 2026-03-22T18:44:57Z 12 pages, 1 figure, 5 tables K. M. Jubair Sami Dipto Sumit Ariyan Hossain Farig Sadeque http://arxiv.org/abs/2603.21358v1 Personality-Driven Student Agent-Based Modeling in Mathematics Education: How Well Do Student Agents Align with Human Learners? 2026-03-22T18:43:11Z It is crucial to explore the impact of different teaching methods on student learning in educational research. However, real-person experiments face significant ethical constraints, and we cannot conduct repeated teaching experiments on the same student. LLM-based generative agents offer a promising avenue for simulating student behavior. Before large-scale experiments, a fundamental question must be addressed: are student agents truly credible, and can they faithfully simulate human learning? In this study, we built a Big Five Personality-based student agent model with a full pipeline of student-teacher interaction, self-study, and examination. To evaluate behavioral fidelity, we collected 13 empirical studies on Big Five traits and learning, and distilled them into 14 criteria. We found that 71.4% of the student agents' behavior was aligned with human learners. 
2026-03-22T18:43:11Z Short Paper Bushi Xiao Qian Shen http://arxiv.org/abs/2402.01749v3 Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models 2026-03-22T18:12:37Z The integration of machine learning techniques has become a cornerstone in the development of intelligent urban services, significantly contributing to the enhancement of urban efficiency, sustainability, and overall livability. Recent advancements in foundational models, such as ChatGPT, have introduced a paradigm shift within the fields of machine learning and artificial intelligence. These models, with their exceptional capacity for contextual comprehension, problem-solving, and task adaptability, present a transformative opportunity to reshape the future of smart cities and drive progress toward Urban General Intelligence (UGI). Despite increasing attention to Urban Foundation Models (UFMs), this rapidly evolving field faces critical challenges, including the lack of clear definitions, systematic reviews, and universalizable solutions. To address these issues, this paper first introduces the definition and concept of UFMs and highlights the distinctive challenges involved in their development. Furthermore, we present a data-centric taxonomy that classifies existing research on UFMs according to the various urban data modalities and types. In addition, we propose a prospective framework designed to facilitate the realization of versatile UFMs, aimed at overcoming the identified challenges and driving further progress in this field. Finally, this paper systematically summarizes and discusses existing benchmarks and datasets related to UFMs, and explores the wide-ranging applications of UFMs within urban contexts, illustrating their potential to significantly impact and transform urban systems. 
A comprehensive collection of relevant research papers and open-source resources has been collated and is continuously updated at: https://github.com/usail-hkust/Awesome-Urban-Foundation-Models. 2024-01-30T04:48:16Z Weijia Zhang Jindong Han Zhao Xu Hang Ni Tengfei Lyu Hao Liu Hui Xiong http://arxiv.org/abs/2507.12007v3 Predictable Drifts in Collective Cultural Attention: Evidence from Nation-Level Library Takeout Data 2026-03-22T17:38:11Z Predicting changes in consumer attention for cultural products, such as books, movies, and songs, is notoriously difficult. Past research suggests intrinsic limits for predicting consumer attention towards individual products. However, little is known about the limits for predicting shifts in collective attention. Here, we analyze five years of nationwide library loan data for almost 3 million individuals, comprising over 136 million loans of more than 750,000 unique titles. We find that culture, as measured by popularity distributions of loaned books, drifts continually from month to month at a near-constant rate, leading to a growing divergence over time, and that drift varies between book genres. By linking book loans to registry data, we investigate the influence of age, sex, educational level, and residential area type on cultural drift, finding heterogeneous effects. Our findings have important implications for market forecasting and algorithmic recommender systems, highlighting the need to account for drift dynamics. 2025-07-16T08:03:52Z Anders Weile Vedran Sekara http://arxiv.org/abs/2512.01166v2 Evaluating AI Companies' Frontier Safety Frameworks: Methodology and Results 2026-03-22T17:02:43Z Following the AI Seoul Summit in 2024, twelve AI companies published frontier AI safety frameworks (Frameworks) outlining their approaches to managing catastrophic risks from advanced AI systems. 
Emerging legislation increasingly treats these Frameworks as external accountability mechanisms, incorporating them into reporting requirements. But what do the Frameworks actually commit each company to do? This study assesses 12 Frameworks, using 65 weighted criteria, across four dimensions: risk identification, risk analysis & evaluation, risk treatment, and risk governance. Our criteria adapt established risk management principles from other high-risk industries (e.g. aviation, nuclear power) to the frontier AI context, following Campos et al. (2025). Overall scores range from 34% (Anthropic) to 8% (Cohere), with a median of 18%. Many aspects are missing or under-specified. These low scores may be natural given the nascency of AI risk management compared to industries with decades of practice. The current Frameworks are limited as accountability functions, with vague commitments that make it difficult to predict company decisions, assess whether planned responses are adequate, or determine whether commitments have been kept. Higher scores appear feasible within current constraints: a company adopting all leading practices currently adopted across its peers would score 51%, almost triple the median. 2025-12-01T00:55:18Z Lily Stelling Malcolm Murray Bruno Galizzi Max Schaffelder Siméon Campos Henry Papadatos http://arxiv.org/abs/2603.21288v1 Unpacking Interaction Profiles and Strategies in Human-AI Collaborative Problem Solving: A Cognitive Distribution and Regulation Perspective 2026-03-22T15:21:55Z This study adopts an integrated distributed cognition and regulation of learning perspective to examine the collaboration patterns and dynamics of human-AI collaboration when college students collaborate with AI for complex problem-solving. Through cluster analysis, three distinct collaborative problem-solving modes were identified in this study: Delegated Reasoning (DR), Concerted Interpretation (CI), and Delegated Elaboration (DE). 
This study found that the DR group achieved the highest task performance, significantly outperforming the CI group. Additionally, the semantic similarity between human and AI discourse was notably the highest in the DR group. In contrast, the CI group reported significantly greater use of self-regulation strategies. These findings uncover a critical tension between the efficiency of the distributed system and the depth of human learners' regulatory engagement. Insights from this study offer valuable implications for the future design of AI-empowered educational tools and student-AI collaborative learning frameworks. 2026-03-22T15:21:55Z Zhanxin Hao Xiaobo Liu Jiaxin Fan Yun Long Jifan Yu Wenli Chen Yu Zhang http://arxiv.org/abs/2603.21280v1 WARBENCH: A Comprehensive Benchmark for Evaluating LLMs in Military Decision-Making 2026-03-22T15:13:29Z Large Language Models are increasingly being considered for deployment in safety-critical military applications. However, current benchmarks suffer from structural blindspots that systematically overestimate model capabilities in real-world tactical scenarios. Existing frameworks typically ignore strict legal constraints based on International Humanitarian Law (IHL), omit edge computing limitations, lack robustness testing for fog of war, and inadequately evaluate explicit reasoning. To address these vulnerabilities, we present WARBENCH, a comprehensive evaluation framework establishing a foundational tactical baseline alongside four distinct stress testing dimensions. Through a large-scale empirical evaluation of nine leading models on 136 high-fidelity historical scenarios, we reveal severe structural flaws. First, baseline tactical reasoning systematically collapses under complex terrain and high force asymmetry. Second, while state-of-the-art closed-source models maintain functional compliance, edge-optimized small models expose extreme operational risks with legal violation rates approaching 70 percent. 
Furthermore, models experience catastrophic performance degradation under 4-bit quantization and systematic information loss. Conversely, explicit reasoning mechanisms serve as highly effective structural safeguards against inadvertent violations. Ultimately, these findings demonstrate that current models remain fundamentally unready for autonomous deployment in high-stakes tactical environments. 2026-03-22T15:13:29Z Zongjie Li Chaozheng Wang Yuchong Xie Pingchuan Ma Shuai Wang