http://arxiv.org/abs/2606.04274v1 Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit 2026-06-02T22:58:59Z

As large language models (LLMs) become default tools for online information verification, an implicit assumption follows them: that scale and general capability are sufficient for nuanced classification of misinformation discourse. We test this assumption directly on 900 Reddit comments spanning three PolitiFact-verified misinformation claims (environment, health, immigration), labelled as belief (propagates the claim), fact-check (corrects it), or other. We compare nine models across three paradigms -- BART-MNLI, three Llama variants, three commercial frontier LLMs (Claude Haiku 4.5, Gemini Flash Lite 2.5, Claude Sonnet 4.6), and fine-tuned DistilBERT and RoBERTa -- under universal and topic-specific label schemas. The assumption does not hold. Fine-tuned RoBERTa reaches 0.62 macro-$F_1$ against a best zero-shot result of 0.50 (Claude Haiku 4.5), at a fraction of the per-query cost; the supervised advantage is concentrated on the belief class, the implicit, affective category every zero-shot model under-detects. Scaling does not help: Llama-3-8B matches Llama-3-70B, and Claude Sonnet 4.6 underperforms the smaller Haiku under generic labels, collapsing belief detection to 0.17 and refusing outright on a subset of comments flagged as sensitive. This is a safety-alignment artefact, not a capacity limit. Label schema and topic jointly shape zero-shot performance, with the same model varying by more than 0.13 macro-$F_1$ across topics under matched labels. In a verification context, where missing belief is the costlier error, task-specific fine-tuning remains the more reliable choice despite the proliferation of large generative models.
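The comparison above rests on macro-$F_1$, which averages per-class $F_1$ without weighting by class frequency. Under-detecting one class (here, belief) therefore pulls the score down even when the other classes are handled well. The following is a minimal sketch of the metric for the three labels named in the abstract. The toy predictions are hypothetical and are not data from the study.

```python
LABELS = ("belief", "fact-check", "other")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Return (macro-F1, per-class F1) with every class weighted equally."""
    per_class = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class[label] = 2 * precision * recall / denom if denom else 0.0
    return sum(per_class.values()) / len(labels), per_class


# Hypothetical toy predictions: two of the three belief comments are missed.
y_true = ["belief", "belief", "belief", "fact-check", "other", "other"]
y_pred = ["other", "other", "belief", "fact-check", "other", "other"]
score, per_class = macro_f1(y_true, y_pred)
print(per_class)        # belief 0.5, fact-check 1.0, other ~0.667
print(round(score, 3))  # 0.722
```

The missed belief comments hurt twice in this example. They lower belief recall, and they also lower precision for "other", which absorbs them.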

2026-06-02T22:58:59Z JooYoung Lee Lin Tian Angela Brillantes Adriana-Simona Mihăiţă Marian-Andrei Rizoiu http://arxiv.org/abs/2606.04254v1 Behavioral and Performance Indicators of Depression and Anxiety in Electronic Learning Systems 2026-06-02T22:08:07Z

This study investigates whether behavioral and performance indicators derived from a Moodle-based learning management system (LMS) are associated with university students' depression and anxiety in two undergraduate Computer Engineering courses. Using a quantitative observational design, we integrated LMS event logs, academic records, and self-reported Beck Depression Inventory-II and Beck Anxiety Inventory scores from 97 students. A broad set of behavioral and performance indicators spanning temporal engagement, session structure, deadline-related behavior, page-refresh patterns, and LMS navigation was extracted from raw event logs and analyzed using descriptive statistics, independent-samples t-tests with Benjamini-Hochberg FDR correction, effect sizes, and Spearman correlations; inventory scores were confirmed invariant by sex and academic year. Several indicators were significantly associated with depression and anxiety. Higher depression was associated with shifted temporal activity patterns, longer session durations, and shorter homework submission lead times, while higher anxiety was associated with concentrated temporal engagement and session-based differences. These findings suggest that routine LMS data can provide meaningful behavioral signals related to student well-being and may support earlier educational awareness of students who experience mental-health-related strain. At the same time, such indicators should be interpreted as contextual and non-diagnostic markers rather than as substitutes for clinical assessment.
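The abstract names Benjamini-Hochberg FDR correction but gives no implementation details. The sketch below shows the standard BH step-up procedure applied to a list of p-values. The p-values are hypothetical and do not come from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans aligned with p_values: True means the
    hypothesis is rejected while controlling the false discovery rate at alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Largest rank k such that p_(k) <= (k / m) * alpha; all ranks <= k are rejected.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from four indicator-vs-score t-tests.
p = [0.01, 0.03, 0.035, 0.9]
print(benjamini_hochberg(p))  # [True, True, True, False]
```

Note the step-up behaviour. The p-value 0.03 exceeds its own threshold (0.025), but it is still rejected because a larger p-value (0.035) passes at a higher rank.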

2026-06-02T22:08:07Z Arya VarastehNezhad Fattaneh Taghiyareh http://arxiv.org/abs/2606.04214v1 Plateau That Never Comes: When Efficiency Claims in Datacenters and AI Become Greenwashing 2026-06-02T21:01:40Z

Datacenter expansion under generative AI is increasingly framed as compatible with sustainability because of efficiency gains, cleaner electricity procurement, and improved facility design. Yet these claims often do not show that absolute electricity, water, material, waste, and community-facing burdens are falling. This Perspective addresses that evidentiary gap. Rather than asking whether efficiency gains are real, we ask when such gains are being enlarged into claims of system-wide sustainability to justify continued expansion. We develop a rebound-informed diagnostic framework for evaluating AI and datacenter sustainability narratives across five tests: metric, boundary, reinvestment, burden shifting, and governance. Applied to major AI industry sustainability reporting, the framework shows that firms largely justify continued expansion through efficiency improvements and clean-energy procurement, rather than by demonstrating reductions in absolute resource use. Applied to plateau claims in the literature, we show that many claims establish local or relative improvements while leaving energy rebound, lifecycle burdens, and enforceable limits unresolved. We argue that these sustainable-growth narratives begin to function as greenwashing when they use efficiency improvements to claim sustainability even as absolute energy, water, material, and public health burdens continue to increase. We conclude by positioning digital sufficiency as a burden-of-proof framework for governance: those advocating further datacenter expansion must show that it reduces, rather than merely redistributes or defers, absolute burdens across the full system.

2026-06-02T21:01:40Z Harshit Gujral Eshta Bhardwaj Dushani Perera Christoph Becker Steve Easterbrook http://arxiv.org/abs/2004.10846v5 Reducing the Filtering Effect in Public School Admissions: A Bias-aware Analysis for Targeted Interventions 2026-06-02T20:09:17Z

Problem definition: Traditionally, New York City's top 8 public schools have selected candidates solely based on their scores in the Specialized High School Admissions Test (SHSAT). These scores are known to be affected by students' socioeconomic status and the test preparation they received in middle school, leading to a massive filtering effect in the education pipeline. The classical mechanisms for assigning students to schools do not naturally address problems like school segregation and class diversity, which have worsened over the years. Researchers and policymakers have reacted by incorporating group-specific quotas and proportionality constraints, with mixed results. The problem of finding effective and fair methods for broadening access to top-notch education is still unsolved. Methodology/results: We take an operations approach to the problem, different from most of the established literature, with the goal of increasing opportunities for students with high economic needs. Using data from the Department of Education (DOE) in New York City, we show that there is a shift in the distribution of scores obtained by students that the DOE classifies as "disadvantaged" (following criteria mostly based on economic factors). We model this shift as a "bias" that results from an underestimation of the true potential of disadvantaged students. We analyze the impact this bias has on an assortative matching market. We show that centrally planned interventions can significantly reduce the impact of bias through scholarships or training, when they target the segment of disadvantaged students with average performance.

2020-04-22T20:50:31Z Yuri Faenza Swati Gupta Aapeli Vuorinen Xuan Zhang http://arxiv.org/abs/2010.04396v7 Dropping Standardized Testing for Admissions Trades Off Information and Access 2026-06-02T20:06:23Z

We study the role of information and access in capacity-constrained selection problems with fairness concerns. We develop a statistical discrimination framework, where each applicant has multiple features and is potentially strategic. The model formalizes the trade-off between the (potentially positive) informational role of a feature and its (negative) exclusionary nature when members of different social groups have unequal access to this feature. Our framework finds a natural application to policy debates on dropping standardized testing in admissions. Our primary takeaway is that the decision to drop a feature (such as test scores) cannot be made without the joint context of the information provided by other features and how the requirement affects the applicant pool composition. Dropping a feature may exacerbate disparities by decreasing the amount of information available for each applicant, especially those from non-traditional backgrounds. However, in the presence of access barriers to a feature, the interaction between the informational environment and the effect of access barriers on the applicant pool size becomes highly complex. Furthermore, we consider an extension with two schools and costly tests, where strategic students decide whether to take the test or not. Our theoretical results reveal that the students' test-taking behavior can be non-monotonic. We characterize the two-school policy equilibria and show that each school's optimal decision to drop the test critically depends on the other school's test policy. Finally, using calibrated simulations, we demonstrate the presence of practical instances where the decision to eliminate standardized testing improves or worsens all metrics.

2020-10-09T07:07:28Z Forthcoming in Management Science Nikhil Garg Hannah Li Faidra Monachou 10.1287/mnsc.2023.02573 http://arxiv.org/abs/2606.04155v1 SocialCoach: Personalized Social Skill Learning with RL-based Agentic Tutoring and Practice 2026-06-02T19:20:54Z

Social skills such as negotiation and leadership are crucial for personal and professional success in today's interconnected world. However, scalable and effective training remains a significant challenge due to the scarcity of expert coaching. In this paper, we introduce SocialCoach, a holistic LLM-powered agentic tutoring system for personalized social skill development at scale. First, SocialCoach automatically constructs a pedagogically-grounded, theory-to-practice knowledge corpus from diverse expert sources, leveraging a multi-agent pipeline. Second, to personalize the learning journey, it employs an adaptive practice scheduling module that follows a prescription-retrieval-adaptation process. To maximize the long-term learning experience while overcoming the cold-start problem, this policy is optimized within a learner simulation environment through reinforcement learning. Finally, SocialCoach integrates immersive, goal-driven practice, causality-driven proficiency assessment and knowledge-grounded, reflective tutoring to help address the knowing-doing gap. We deploy it in our product, EQoach, and conduct extensive experiments. The results show that SocialCoach improves simulated pathway quality and judge-rated tutoring quality over baseline approaches, while early user feedback indicates strong perceived engagement and usefulness. These findings suggest a practical architecture for personalized and gamified pedagogical platforms for soft skill learning.

2026-06-02T19:20:54Z Tianfu Wang Max Xiong Jianxun Lian Hongyuan Zhu Zhengyu Hu Yuxuan Lei Linxiao Gong Xiaofang Li Peiting Tsai Nicholas Jing Yuan Qi Zhang http://arxiv.org/abs/2606.04152v1 Thinking Through Signs: PEEL as a Semiotic Scaffolding for Epistemically Accountable AI-Enabled Research 2026-06-02T19:19:52Z

Large language models are reshaping research practice while quietly eroding researchers' epistemic accountability. This commentary introduces PEEL - Protocols for Epistemically Engaged Literacy in AI, a working scaffolding that combines deterministic distant reading via Voyant Tools with LLM interpretation via Claude, grounded in Peircean semiotics and abductive reasoning. Applied to AI-generated condensations of three source texts, PEEL reveals systematic distortions in quantity, term frequency, and epistemic voice that are invisible without non-AI measurement -- and yields three design implications: deterministic instruments must accompany AI tools; fluency is not fidelity; epistemic authority must be designed in, not assumed.

2026-06-02T19:19:52Z 10 pages, 5 figures Clarisse de Souza Gabriel Barbosa Simone Diniz Junqueira Barbosa Bárbara Betts Renato Cerqueira Juliana Jansen Ferreira http://arxiv.org/abs/2605.04235v2 Conflict-Aware Seat Assignment in Classroom Environments 2026-06-02T17:30:59Z

Classroom dynamics depend on various elements that influence teaching performance and learning activities. A key challenge is to determine the most effective seating plan, that is, where students should sit in a specific classroom setting to achieve the best learning environment. This paper introduces the Student Seat Allocation Problem (SSAP) for strategically organizing student seating in traditional classrooms to minimize interpersonal conflicts. We propose a mathematical model and an Iterated Local Search (ILS) heuristic to solve the SSAP. Computational experiments showed that, in more complex scenarios, ILS outperformed a commercial solver applied to the proposed mathematical model. ILS was particularly efficient in real and artificial instances with a higher number of conflicts.
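The abstract does not give the SSAP model or the paper's neighbourhoods, so the following is only a generic Iterated Local Search sketched under loudly stated assumptions. The assumptions are hypothetical: seats form a grid, "conflict" means two students in edge-adjacent seats, the cost is the sum of pairwise conflict weights, local search uses pairwise swaps, and perturbation applies a few random swaps. It illustrates the ILS pattern of descent, perturbation, re-descent, and acceptance.

```python
import random


def adjacent(a, b, cols):
    """Seats are numbered row-major on a grid; neighbours share an edge."""
    ra, ca = divmod(a, cols)
    rb, cb = divmod(b, cols)
    return abs(ra - rb) + abs(ca - cb) == 1


def cost(seating, conflicts, cols):
    """seating[s] is the seat of student s; sum weights of conflicting neighbours."""
    return sum(w for (s, t), w in conflicts.items()
               if adjacent(seating[s], seating[t], cols))


def local_search(seating, conflicts, cols):
    """First-improvement descent over pairwise seat swaps (modifies seating)."""
    best = cost(seating, conflicts, cols)
    improved = True
    while improved:
        improved = False
        for i in range(len(seating)):
            for j in range(i + 1, len(seating)):
                seating[i], seating[j] = seating[j], seating[i]
                c = cost(seating, conflicts, cols)
                if c < best:
                    best, improved = c, True
                else:
                    seating[i], seating[j] = seating[j], seating[i]  # undo swap
    return seating, best


def iterated_local_search(n_students, cols, conflicts, iters=50, kick=2, seed=0):
    """ILS: local search, then repeatedly perturb the best solution and re-descend."""
    rng = random.Random(seed)
    seating = list(range(n_students))  # assumes exactly one seat per student
    rng.shuffle(seating)
    best, best_cost = local_search(seating, conflicts, cols)
    for _ in range(iters):
        candidate = best[:]
        for _ in range(kick):  # perturbation: a few random swaps
            i, j = rng.sample(range(n_students), 2)
            candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate, c = local_search(candidate, conflicts, cols)
        if c < best_cost:  # acceptance criterion: keep strict improvements only
            best, best_cost = candidate, c
    return best, best_cost


# Hypothetical 2x2 classroom with two conflicting pairs of students.
conflicts = {(0, 1): 1, (2, 3): 1}
seating, total = iterated_local_search(4, cols=2, conflicts=conflicts)
print(seating, total)  # each conflicting pair ends up diagonal, total 0
```

Accepting only strict improvements is the simplest acceptance rule. Real ILS variants often accept slightly worse solutions to escape plateaus.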

2026-05-05T19:23:04Z This manuscript is currently under review Bruna Cristina Braga Charytitsch Mariá Cristina Vasconcelos Nascimento http://arxiv.org/abs/2606.03919v1 Forecasting Conceptual Diffusion in Science: The Case of Quantum Computing 2026-06-02T17:12:02Z

Understanding and anticipating scientific change requires models that distinguish between endogenous consolidation and exogenous diffusion of scientific concepts. Using the quantum computing subtree of concepts in OpenAlex, we construct a temporally resolved concept co-occurrence network and track each concept pair through its upstream citation lineage and downstream diffusion. We train LightGBM models on distributional and diversity-aware features to predict four outcomes: endogenous reinforcement, exogenous diffusion, their ratio, and diffusion entropy. After controlling for overall publication growth of the scientific body, endogenous reinforcement proves largely unpredictable in the primary quantum-computing benchmark. In contrast, exogenous diffusion and entropy are strongly predictable ($R^2$ up to $0.78$) and are driven by upstream heterogeneity, citation breadth, and distributional dispersion, as shown by SHAP analyses; replications on robotics, advanced materials, and neuro implants confirm that exogenous diffusion remains the top-ranked target across fields ($R^2_{\text{test}} \sim 0.60$ to $0.87$), while endogenous predictability rises markedly in neuro implants ($R^2_{\text{test}} = 0.83$), indicating that the quantum-computing asymmetry does not generalise uniformly. Case studies reveal that sharp entropy increases coincide with the opening of new conceptual frontiers, while entropy collapses signal technological convergence or paradigm displacement. These results demonstrate that conceptual diffusion is governed by stable structural regularities embedded in semantic and citation environments. By identifying early diversity-based signals of cross-domain uptake, the approach provides a scalable foundation for anticipatory scientometrics, technology foresight, and innovation-oriented policy analysis in rapidly evolving research fields.
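The abstract does not define "diffusion entropy" precisely. A common reading, assumed here and not confirmed by the source, is the Shannon entropy of how a concept pair's downstream citations spread across fields. Under this reading, a rise in entropy signals uptake in new fields, and a collapse signals concentration. In the sketch below, the input format of one field label per citing paper is hypothetical.

```python
import math
from collections import Counter


def diffusion_entropy(citing_fields):
    """Shannon entropy (bits) of the field distribution of downstream citations.

    citing_fields: one field label per citing paper (hypothetical input format).
    0 bits means all uptake is in one field; higher values mean broader diffusion.
    """
    counts = Counter(citing_fields)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) summed over fields; this form stays non-negative.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical uptake of one concept pair in two successive years.
print(diffusion_entropy(["physics"] * 4))                             # 0.0
print(diffusion_entropy(["physics", "cs", "chemistry", "biology"]))   # 2.0
```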

2026-06-02T17:12:02Z 19 pages, 5 figures, 6 tables. Code and manuscript sources: https://github.com/wazaahhh/breakthroughs-diffusion . An earlier version was presented at the Global Tech Mining Conference (GTM) 2026 (submission #117) Thomas Maillart Thibaut Chataing David Dosu Paul Bagourd Julian Jang-Jaccard Alain Mermoud http://arxiv.org/abs/2606.03864v1 Explainable Forecasting of Scientific Breakthroughs from Concept Network Dynamics 2026-06-02T16:38:41Z

We introduce an explainable machine-learning approach that forecasts the structural precursors of scientific breakthroughs -- the emergence and intensification of links between research concepts -- by modelling how OpenAlex concept networks evolve over time. Using 59 semantic and topological features, a two-stage LightGBM model jointly predicts the formation and the future weight of concept pairs, adding a regression stage that quantifies expected intensity to prior link-existence forecasts. Relative to the state of the art, the approach improves accuracy and explainability at once: comparative validation across four technology and biomedical domains yields ROC-AUC in [0.954, 0.967] at all horizons without re-tuning, exceeding the roughly 0.90 of prior models, while every forecast rests on structural, auditable features rather than opaque embeddings. Classification performance is high (AUC about 0.95) and regression remains stable (RMSLE 0.45 to 0.6 over one to five years). Feature attribution shows that structural factors -- particularly Adamic-Adar similarity and degree-based Hadamard measures -- consistently drive accuracy, suggesting that breakthrough-relevant recombinations emerge in tightly connected sub-networks. Two expert-anchored cases, quantum annealing and AI-enabled quantum architectures, show the model surfacing technological convergence consistent with expert expectations. We then outline a three-layer decision architecture -- detection, expert translation, institutional integration -- that turns these forecasts into evidence-based research strategy and policy, anchored in open data and explainable features.

2026-06-02T16:38:41Z 18 pages, 10 figures, 4 tables. An earlier version was presented at Global Tech Mining Conference 2026. Code and data: https://github.com/wazaahhh/breakthroughs-forecasting Thomas Maillart Thibaut Chataing Ntorina Antoni David Dosu Paul Bagourd Julian Jang-Jaccard Alain Mermoud http://arxiv.org/abs/2606.04075v1 Large Language Models Hack Rewards, and Society 2026-06-02T16:29:48Z

Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs) to learn from rewards. We observe that societal regulations are structurally similar to reward functions. They define measurable outcomes, thresholds, and exceptions, while often leaving institutional intent only partially specified. We hypothesise that the RL training process may exploit these gaps and therefore ask whether models' well-known tendency to hack reward functions during RL can scale into a more consequential failure mode named societal hacking: discovering loopholes in the rules society runs on. To study this phenomenon, we introduce SocioHack, a sandbox of 72 societal environments, and find that within these environments, reward hacking naturally emerges and leads to regulatory loophole discovery. Models learn to hack the social rules and generate strategies that remain technically compliant while defeating regulatory intent, and current LLM safeguards provide only limited mitigation. Therefore, collecting in-the-wild feedback for model training requires greater caution, and we need a next-generation post-training paradigm for safely iterating LLMs in real society.

2026-06-02T16:29:48Z 14 pages, 9 figures, 7 tables Wei Liu Xinyi Mou Hanqi Yan Zhongyu Wei Yulan He http://arxiv.org/abs/2606.03823v1 Calibrating Urban Traffic Simulation from Sparse Road Observations via Genetic Optimization 2026-06-02T16:04:01Z

Urban traffic simulation is a critical tool for infrastructure planning, including the placement of electric vehicle charging stations. However, realistic traffic simulation across many cities is hindered by two fundamental data limitations: detailed real-world traffic measurements are available for only a small fraction of road segments in most cities, and employment distribution data critical for modeling commuter traffic is rarely available at the resolution needed for simulation. This paper presents a genetic algorithm-based framework that directly addresses both limitations, calibrating urban traffic simulations from sparse road observations without requiring detailed job location data. Using the SUMO traffic simulation platform for Greensboro, North Carolina, our approach optimizes job distributions and gate-traffic parameters to align simulated traffic with a small sample of roads with known traffic-flow rates. We demonstrate that this approach produces simulated traffic that correlates well with real-world measurements, generalizes to road segments withheld from training, and produces job distributions that show promising qualitative agreement with census employment data despite never directly training on that employment data. This work demonstrates that realistic urban traffic simulation can be achieved from minimal real-world observations, offering a scalable and data-light approach to simulation calibration that reduces the barrier to deploying traffic models across diverse cities.
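The calibration idea can be illustrated without SUMO. In the sketch below, a genetic algorithm evolves a job distribution across zones until simulated flows on a few observed roads match the observed counts. Everything about the simulator is a hypothetical stand-in: a fixed linear `ROUTING` matrix replaces the SUMO run, and the GA settings (population size, elitism fraction, uniform crossover, Gaussian mutation) are illustrative choices, not the paper's configuration. The paper also tunes gate-traffic parameters, which are omitted here.

```python
import random

# Hypothetical stand-in for a SUMO run: ROUTING[r][z] is the share of commuters
# bound for job zone z whose route crosses observed road r.
ROUTING = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.8],
]
TOTAL_COMMUTERS = 1000


def normalise(genes):
    total = sum(genes)
    return [g / total for g in genes]


def simulate(job_shares):
    """Toy simulator: flow on each observed road given job shares per zone."""
    return [sum(r * s * TOTAL_COMMUTERS for r, s in zip(row, job_shares))
            for row in ROUTING]


def error(genes, observed):
    """Squared error between simulated and observed flows (lower is fitter)."""
    return sum((f - o) ** 2 for f, o in zip(simulate(normalise(genes)), observed))


def calibrate(observed, pop_size=30, generations=60, sigma=0.1, seed=1):
    rng = random.Random(seed)
    n_zones = len(ROUTING[0])
    population = [[rng.random() + 1e-6 for _ in range(n_zones)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda g: error(g, observed))
        history.append(error(population[0], observed))
        elite = population[: pop_size // 5]  # elitism: best survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            # Gaussian mutation, clipped so every zone keeps a positive weight.
            children.append([max(1e-6, g + rng.gauss(0, sigma)) for g in child])
        population = elite + children
    best = min(population, key=lambda g: error(g, observed))
    return normalise(best), history


# Synthetic "observations" generated from known shares, then recovered.
observed = simulate([0.5, 0.3, 0.2])
shares, history = calibrate(observed)
print([round(s, 2) for s in shares], history[0], history[-1])
```

Because the elite individuals are carried over unchanged, the best error can never rise from one generation to the next. Each expensive simulator call therefore refines, rather than risks, the current fit.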

2026-06-02T16:04:01Z Hunter Sawyer Jesse Roberts Simon Matei http://arxiv.org/abs/2603.26791v3 Crystal: Characterizing Relative Impact of Scholarly Publications 2026-06-02T15:20:04Z

Assessing a cited paper's impact is typically done by analyzing its citation context in isolation within the citing paper. While this focuses on the most directly relevant text, it prevents relative comparisons across all the works a paper cites. We propose Crystal, which instead jointly ranks all cited papers within a citing paper using large language models (LLMs). To mitigate LLMs' positional bias, we rank each list three times in a randomized order and aggregate the impact labels through majority voting. This joint approach leverages the full citation context, rather than evaluating citations independently, to more reliably distinguish impactful references. Crystal outperforms a prior state-of-the-art impact classifier by +9.5% accuracy and +8.3% F1 on a dataset of human-annotated citations. Crystal further gains efficiency through fewer LLM calls and outperforms prior baselines using an open-weight model, enabling scalable, cost-effective citation impact analysis. In a case study of ACL Test-of-Time award-winning papers, we find that Crystal's impact characterizations align closely with long-term scientific recognition. We release Crystal-Bank, a 46.8k-paper dataset with rankings and impact labels, along with code.
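The debiasing step described above ranks each list three times in randomized order and takes a majority vote. It can be sketched as follows. `label_fn` is a hypothetical stand-in for the LLM call: it sees the whole ordered list of cited papers, as in joint ranking, and returns an impact label per paper. The labels "high" and "low" are illustrative and are not Crystal's actual label set.

```python
import random
from collections import Counter


def vote_impact_labels(cited, label_fn, runs=3, seed=0):
    """Label every cited paper `runs` times in shuffled order; keep the majority label.

    label_fn is a hypothetical stand-in for the LLM call: it receives the full,
    ordered list of cited papers and returns {paper: impact_label}.
    """
    rng = random.Random(seed)
    votes = {paper: Counter() for paper in cited}
    for _ in range(runs):
        order = list(cited)
        rng.shuffle(order)  # a fresh order each run dilutes positional bias
        for paper, label in label_fn(order).items():
            votes[paper][label] += 1
    # With two labels an odd `runs` rules out ties; otherwise most_common
    # breaks ties in favour of the label seen first.
    return {paper: c.most_common(1)[0][0] for paper, c in votes.items()}


TRUE = {"p1": "high", "p2": "low", "p3": "low", "p4": "low"}


def biased_llm(order):
    # Hypothetical labeller with positional bias: the first-listed paper is always "high".
    return {p: ("high" if i == 0 else TRUE[p]) for i, p in enumerate(order)}


print(vote_impact_labels(list(TRUE), biased_llm))
```

A paper is mislabelled only if the bias hits it in a majority of runs. Shuffling makes that less likely than labelling a single fixed order.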

2026-03-25T16:42:30Z Hannah Collison Benjamin Van Durme Daniel Khashabi http://arxiv.org/abs/2601.02380v5 LLMs, Reasoning and Plagiarism 2026-06-02T15:01:49Z

Recent reports claim that Large Language Models (LLMs) derive new science and exhibit human-level general intelligence. Such claims are entangled with two different narratives about what LLMs do: one in which they are an engine of synthesis that genuinely reasons to new knowledge, and one in which they retrieve and re-emit the work of others without attribution. In the scientific setting these are best understood as a contrast between \emph{reasoning} and \emph{plagiarism}. Finding where the truth lies between these two narratives is very challenging, as central components of the model -- the training data and the interaction transcript -- remain opaque. Thus claims of LLM reasoning do not satisfy Popper's refutability principle. We propose guidelines for transparency and reproducibility that will allow reasoning claims to be studied using the scientific method. The dominance of the reasoning narrative, we suggest, is in practice encouraging plagiarism in the scientific literature; we discuss what might be done about it.

2025-12-18T14:42:03Z The authors explicitly reserve all rights in this work. No permission is granted for the reproduction, storage, or use of this document for the purpose of training artificial intelligence systems or for text and data mining (TDM), including but not limited to the generation of embeddings, summaries, or synthetic derivatives. Claude and Gemini were used in writing this manuscript Elchanan Mossel http://arxiv.org/abs/2606.03704v1 Dynamic Objective Selection with Safeguards and LLM Oversight for Financial Decision-Making 2026-06-02T14:22:07Z

Financial decision-making tasks such as stock recommendation and portfolio allocation typically estimate future return and risk and then select trades or allocations for an investor, and the chosen optimization objective often determines realized performance. However, because market conditions evolve over time, a fixed objective can be suboptimal across regimes, while regime-switching pipelines that rely on latent regime estimates can be noisy or delayed and frequent switching can increase turnover and operational instability. In this paper, we propose DOSS (Dynamic Objective Selection with Safeguards), a learning-based selector that directly chooses the decision-relevant objective function at each time point from interpretable statistical summaries of recent returns, selecting among a small set of candidates (e.g., return-seeking, loss-averse, and risk-adjusted) without introducing intermediate regime variables. DOSS formulates objective selection as a classification problem over objectives and performs sequential updates with a rolling window to make forward-looking selections without temporal leakage, while also outputting a confidence score for each proposal. To mitigate misselection and excessive switching in deployment, DOSS applies confidence-aware gating with a fail-safe that overrides low-confidence proposals to a conservative default and enforces explicit controls tied to switching frequency. We further integrate governance by positioning a Large Language Model (LLM) as an oversight component rather than a generator of new objectives: the LLM is restricted to accept a proposed objective or override it to a predefined safe default, with deterministic rule-based constraints triggering overrides when needed.
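The deployment safeguards (confidence-aware gating with a fail-safe, a cap on switching, and an LLM restricted to accept-or-revert) can be sketched as a small decision layer on top of any objective classifier. Several details are hypothetical, because the abstract does not state them: the threshold, the switch budget, the window length, the choice of "risk-adjusted" as the conservative default, and the rule that moves to the default are exempt from the switch cap. The learned selector itself is not shown.

```python
from dataclasses import dataclass, field

SAFE_DEFAULT = "risk-adjusted"  # hypothetical choice of conservative default


@dataclass
class ObjectiveGate:
    min_confidence: float = 0.6  # hypothetical confidence threshold
    max_switches: int = 2        # hypothetical switch budget per window
    window: int = 10             # number of recent decisions the budget covers
    history: list = field(default_factory=list, init=False)

    def decide(self, proposal, confidence):
        # Fail-safe: low-confidence proposals fall back to the conservative default.
        choice = proposal if confidence >= self.min_confidence else SAFE_DEFAULT
        recent = self.history[-self.window:]
        switches = sum(a != b for a, b in zip(recent, recent[1:]))
        # Switching control: once the budget is spent, hold the current objective
        # (moves to the safe default are always allowed).
        if (recent and choice != recent[-1] and choice != SAFE_DEFAULT
                and switches >= self.max_switches):
            choice = recent[-1]
        self.history.append(choice)
        return choice


def llm_oversight(choice, llm_verdict, rule_triggered):
    """The overseer may only accept the proposal or revert to the safe default."""
    if rule_triggered or llm_verdict != "accept":
        return SAFE_DEFAULT
    return choice


gate = ObjectiveGate(max_switches=1)
print(gate.decide("return-seeking", 0.9))   # return-seeking
print(gate.decide("loss-averse", 0.3))      # risk-adjusted (low confidence)
print(gate.decide("return-seeking", 0.95))  # risk-adjusted (switch budget spent)
print(llm_oversight("loss-averse", "override", False))  # risk-adjusted
```

Restricting `llm_oversight` to two possible outputs mirrors the abstract's point that the LLM does not generate new objectives. It can only veto, so its failure modes are bounded by the safe default.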

2026-06-02T14:22:07Z Accepted to the 2nd Workshop on Advances in Financial AI: Towards Agentic and Responsible Systems at ICLR 2026 Keigo Sakurai Takahiro Ogawa Miki Haseyama Anjyu Anan Kei Nakagawa