https://arxiv.org/api/ssOoTPkhR0yCHtRFSR635V4v4hE 2026-06-18T13:14:26Z 28983 330 15 http://arxiv.org/abs/2603.10829v2 Spatially conditioned dynamics between population and built form 2026-06-03T09:00:40Z

Understanding the relationship between population and the built environment is essential for addressing socio-spatial inequalities. While researchers have long theorized these dynamics, empirical analyses remain limited. This study proposes a spatially explicit framework to quantify the relationship between population and the built environment at the scale of local census tracts in Czechia. The approach integrates a fine-grained classification of built form with a comprehensive set of socio-demographic indicators. The method compares global and geographically weighted classification models to assess the overall strength and spatial variability of the associations between population structure and built form. The results show that population characteristics exhibit linear, spatially conditioned relationships with built form, emphasizing that spatial heterogeneity must be accounted for when assessing these relationships. The analysis also reveals that some built form types are more socially selective than others, underscoring the importance of built form in reproducing socio-spatial inequalities.
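The study contrasts a global model with geographically weighted ones. As a minimal sketch of the geographically weighted idea, each focal tract gets its own local model trained mainly on nearby tracts. The sketch assumes a fixed bisquare distance kernel and uses a weighted nearest-centroid rule as a stand-in for whatever classifier the authors actually fit. Both choices, the function names, and the toy data are hypothetical:

```python
import math
from collections import defaultdict


def bisquare_weights(distances, bandwidth):
    """Fixed bisquare kernel: w = (1 - (d/b)^2)^2 if d < b, else 0."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return [(1.0 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0
            for d in distances]


def local_centroid_predict(features, labels, distances, bandwidth, query):
    """Classify `query` with a nearest-centroid rule whose class centroids
    are computed from tracts weighted by distance to the focal location.
    (Hypothetical stand-in for a geographically weighted classifier.)"""
    weights = bisquare_weights(distances, bandwidth)
    sums = defaultdict(lambda: [0.0] * len(query))
    totals = defaultdict(float)
    for x, label, w in zip(features, labels, weights):
        if w == 0.0:
            continue  # tract lies outside the local window
        totals[label] += w
        for j, value in enumerate(x):
            sums[label][j] += w * value
    if not totals:
        raise ValueError("no tracts fall inside the bandwidth")
    centroids = {c: [s / totals[c] for s in sums[c]] for c in totals}
    return min(centroids, key=lambda c: math.dist(query, centroids[c]))


# Hypothetical tracts: one socio-demographic feature, a built-form label,
# and distance to the focal tract. The last tract is far away.
features = [[0.0], [0.1], [1.0], [0.9], [5.0]]
labels = ["A", "A", "B", "B", "A"]
distances = [0.0, 1.0, 1.0, 2.0, 50.0]
print(local_centroid_predict(features, labels, distances, 10.0, [1.4]))
print(local_centroid_predict(features, labels, distances, 1000.0, [1.4]))
```

With a bandwidth of 10 the distant tract is excluded and the query is classified as `B`. With a bandwidth of 1000 the model behaves almost globally and the same feature vector is classified as `A`. A divergence of this kind is what "spatially conditioned" relationships look like in practice.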

2026-03-11T14:36:37Z Anna Brazdova Martin Fleischmann http://arxiv.org/abs/2606.04617v1 When Firms Learn to Game the Rules 2026-06-03T08:53:36Z

Rules-as-Code promises more testable legal obligations, but it also changes what regulated firms can learn. Existing work mostly emphasizes implementation gains; the open strategic question is whether machine-readable rules make boundary search cheaper. I study that question with a synthetic agent-based reinforcement-learning simulation that separates actual conduct near a legal threshold from proximity in the computable enforcement signal. Across 150 seed-level scenario runs, 378 common-random-number computability-sweep runs, 288 Latin-hypercube global-design runs, and a 2,880,000-row firm-period panel, computable static rules raise conduct boundary mass relative to ambiguous static rules (0.411 versus 0.367) and raise signal boundary mass more sharply (0.403 versus 0.281). Ordinary adaptive updates lower consumer harm (0.202 to 0.194) but do not reliably reduce boundary search. A budget-neutral anti-gaming design reduces conduct boundary mass by 0.032 and consumer harm by 0.025 relative to computable static rules. These are mechanism-oriented synthetic results, not estimates of real firm behavior in any particular jurisdiction or industry. The contribution is an estimand distinction, an inspectable ABM/RL mechanism, and a reproducible artifact showing that transparent behavioral assumptions are sufficient to generate gaming-like boundary dynamics without implying that computable regulation is inherently undesirable.
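The abstract separates two estimands: how much actual conduct sits near the legal threshold, and how much of the computable enforcement signal does. The following minimal sketch shows how the two can diverge for the same firms. It assumes that boundary mass is the share of firm-period observations within a fixed band around the threshold. That definition, the band width, and the numbers are hypothetical, not the paper's:

```python
def boundary_mass(values, threshold, band):
    """Share of observations within `band` of the legal `threshold`.

    Hypothetical definition for illustration; the paper's estimand may differ.
    """
    if not values:
        raise ValueError("need at least one observation")
    near = sum(1 for v in values if abs(v - threshold) <= band)
    return near / len(values)


# The same five firm-periods measured two ways: actual conduct and the
# computable enforcement signal the regulator observes (illustrative numbers).
conduct = [0.48, 0.52, 0.70, 0.30, 0.49]
signal = [0.47, 0.53, 0.62, 0.51, 0.52]
print(boundary_mass(conduct, 0.5, 0.05))  # conduct boundary mass
print(boundary_mass(signal, 0.5, 0.05))   # signal boundary mass
```

Here the signal boundary mass (0.8) exceeds the conduct boundary mass (0.6). This is the kind of gap the conduct/signal distinction is meant to expose.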

2026-06-03T08:53:36Z Includes synthetic simulation data, source code, figures, and reproducibility materials Xufeng He http://arxiv.org/abs/2606.04592v1 Synthetic Personalities: How Well Can LLMs Mimic Individual Respondents Using Socio-Economic Microdata? 2026-06-03T08:30:03Z

LLM-based digital twins promise to scale and accelerate market research, but most published twins are either coarse persona bots conditioned on a few demographic questions or detailed individual-level twins built on purpose-collected surveys and interview transcripts. Neither setup speaks to the operationally most relevant case for marketing practice: building detailed individual twins from the pre-existing heterogeneous panel data that firms already accumulate through CRM systems, loyalty programs, and repeat surveys. We construct detailed individual-level twins from the German Socio-Economic Panel (SOEP) and evaluate them across a $3 \times 5 \times 2 \times 2$ construction-method grid that covers three open-weights LLMs, five cumulative information depths ranked by normalized Shannon entropy, two embedding methods, and two reasoning modes, scoring over 2.1 million twin responses on 500 participants and 183 held-out questions. Twin quality rises with information depth but with diminishing returns past the 75 percent entropy quartile, which acts as a cost-efficient Pareto point relative to the best-performing 100 percent cells. Switching the embedding from a narrative persona summary to a raw dialog history of past responses raises hold-out accuracy in every model-by-reasoning cell at the 100 percent depth, while an explicit thinking mode raises rank-order correlation without moving accuracy. Best-cell accuracy reaches 78.8 percent and Fisher-$z$ correlation reaches $r = 0.590$ on the SOEP held-out evaluation set. The findings suggest that twin-based market research is no longer gated by data design, but by item volume, model selection, and a small set of construction-level decisions that this paper now maps.
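Two computations in this abstract are standard enough to sketch: ranking panel items by normalized Shannon entropy, and averaging correlations via Fisher's $z$ transform. The sketch normalizes by the log of the number of distinct observed answers. This is an assumption; the paper may instead normalize by the number of possible response options. The item names are hypothetical:

```python
import math
from collections import Counter


def normalized_entropy(responses):
    """Shannon entropy of an item's answer distribution divided by log(k),
    where k is the number of distinct answers observed
    (0 = no variation, 1 = answers spread evenly)."""
    counts = Counter(responses)
    k = len(counts)
    if k < 2:
        return 0.0
    n = len(responses)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


def fisher_z_mean(correlations):
    """Average correlations on Fisher's z scale, then back-transform."""
    zs = [math.atanh(r) for r in correlations]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical panel items, ranked from most to least informative.
items = {
    "smoker": ["no", "no", "no", "yes"],
    "region": ["north", "south", "east", "west"],
    "owns_car": ["yes", "yes", "yes", "yes"],
}
ranking = sorted(items, key=lambda q: normalized_entropy(items[q]), reverse=True)
print(ranking)
```

`fisher_z_mean([0.0, 0.8])` returns approximately 0.5 rather than the arithmetic mean of 0.4, because the averaging happens on the $z$ scale.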

2026-06-03T08:30:03Z Leonard Kinzinger Jochen Hartmann http://arxiv.org/abs/2606.04563v1 Addressing Negative Commons Governance with Positive Commons Principles 2026-06-03T07:52:38Z

Computing is accompanied by both positive and negative commons throughout its lifecycle of creation, execution, and disposal. We examine two governance systems situated within this lifecycle -- global e-waste trade and the Linux kernel community -- to evaluate whether Elinor Ostrom's eight design principles for common-pool resource (CPR) governance extend to the management of negative common-pool resources (NCPRs). Unlike traditional CPRs, where communities work to preserve a finite resource (e.g., clean water), NCPR governance seeks to collectively reduce a negative shared stock. In our two cases, e-waste governance aims to reduce the volume of mismanaged waste and illicit trade, while the Linux community aims to reduce the number of error-prone or malicious contributions that reach the main branch and, in turn, extend the life of existing hardware. Through qualitative analysis of primary sources from each domain, we find that Ostrom's eight principles, which aid positive commons governance, also tend to appear in successful negative commons governance systems. We argue that future NCPR governance design should prioritize Ostrom's principles, particularly clearly defined boundaries and well-functioning nested structures.

2026-06-03T07:52:38Z Paper in Proceedings of LIMITS 2026: 12th Workshop on Computing within Limits, 2026-06-23-25, Online Boyang Zhou Oleg Ianchenko http://arxiv.org/abs/2606.04543v1 Agentic AI and Pedagogical Best Practice: The Tension Between Automation and Learning 2026-06-03T07:26:23Z

Artificial intelligence in education is evolving from passive chatbots to proactive AI agents capable of initiating goal-directed interactions. While this shift offers opportunities for personalised learning, it risks undermining learner agency and cognitive effort. This paper reviews six pedagogical principles (prior knowledge activation, collaborative learning, problem-based learning, formative assessment, scaffolding, and metacognition) through the lens of agentic AI. We discuss the tension between automation and learning and propose design recommendations that prioritise intentional friction, dynamic scaffolding, human-in-the-loop oversight, and considered AI utilisation, so that AI supports rather than supplants human learning.

2026-06-03T07:26:23Z Accepted for publication at AIED 2026 - Festival of Learning HAI-Agency Workshop on Orchestrating Human and AI Agency for Proactive and Reflective Learning Steve Woollaston Brendan Flanagan Isanka Wijerathne Hiroaki Ogata http://arxiv.org/abs/2605.28829v2 Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning 2026-06-03T07:25:11Z

Competitive STEM examinations such as JEE and NEET require multi-step symbolic reasoning, precise numerical computation, and deep conceptual understanding across physics, chemistry, and mathematics. Recent large language models perform strongly on common reasoning benchmarks, yet they remain difficult to deploy at scale, where millions of student questions demand domain-specific, consistently structured problem solving. We introduce Aryabhata 2, a reasoning-focused language model for competitive STEM examinations, trained via reinforcement-learning post-training. Using PhysicsWallah's internal question banks, we construct a high-quality training curriculum and post-train GPT-OSS-20B through reinforcement learning with verifiable rewards. Training combines prolonged reinforcement learning with broadened exploration via progressively larger rollout group sizes. We evaluate Aryabhata 2 on competitive examination benchmarks, including JEE Main, JEE Advanced, and NEET, as well as out-of-distribution reasoning datasets such as AIME, HMMT, MMLU-Pro, MMLU-Redux 2.0, and GPQA. Results show that Aryabhata 2 outperforms its base model GPT-OSS-20B on competitive STEM reasoning while requiring substantially fewer output tokens (up to 64% fewer).
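The abstract does not name the RL algorithm, but "verifiable rewards" and "rollout group sizes" suggest a group-based method in the style of GRPO. Under that assumption, the following minimal sketch shows a binary answer-matching reward and group-relative advantage normalization. The function names, tolerance, and example answers are hypothetical:

```python
import statistics


def verifiable_reward(predicted, reference, tol=1e-6):
    """Binary reward: 1.0 if the final numeric answer matches the reference."""
    try:
        return 1.0 if abs(float(predicted) - float(reference)) <= tol else 0.0
    except (TypeError, ValueError):
        return 0.0  # unparseable answers earn no reward


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


answers = ["9.8", "9.81", "10", "9.8"]  # four rollouts for one question
rewards = [verifiable_reward(a, "9.8", tol=0.05) for a in answers]
print(rewards)
print(group_relative_advantages(rewards))
```

If every rollout in a group earns the same reward, all advantages are zero and the question contributes no learning signal. One plausible reading of "progressively larger rollout group sizes" is that bigger groups make mixed outcomes, and therefore nonzero advantages, more likely on hard problems.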

2026-04-10T06:53:27Z Ritvik Rastogi Vishal Singh Tejas Chaudhari Sandeep Varma http://arxiv.org/abs/2512.03296v2 Associating Healthcare Teamwork with Patient Outcomes for Predictive Analysis 2026-06-03T06:14:44Z

Cancer treatment outcomes are influenced not only by clinical and demographic factors but also by the collaboration of healthcare teams. However, prior work has largely overlooked the potential role of human collaboration in shaping patient survival. This paper presents an applied AI approach to uncovering the impact of healthcare professionals' (HCPs) collaboration, captured through electronic health record (EHR) systems, on cancer patient outcomes. We model EHR-mediated HCP interactions as networks and apply machine learning techniques to detect predictive signals of patient survival embedded in these collaborations. Our models are cross-validated to assess generalizability, and we explain the predictions by identifying key network traits associated with improved outcomes. Importantly, clinical experts and the published literature corroborate the relevance of the identified collaboration traits, reinforcing their potential for real-world applications. This work contributes to a practical workflow for leveraging digital traces of collaboration and AI to assess and improve team-based healthcare. The approach is potentially transferable to other domains involving complex collaboration and offers actionable insights to support data-informed interventions in healthcare delivery.

2025-12-02T23:16:03Z Hsiao-Ying Lu Kwan-Liu Ma http://arxiv.org/abs/2606.04490v1 Prioritization of Risks from Artificial Intelligence: A Delphi Study of 272 International Experts 2026-06-03T06:14:41Z

Artificial intelligence poses many risks, ranging from familiar present-day harms to unprecedented and potentially catastrophic ones. Effective risk management requires prioritization: we must understand which risks are most severe, who is most vulnerable, and who is most responsible for addressing them. We report results from a three-round Delphi study conducted in late 2025 with 272 international AI experts. Experts rated 24 AI risks on harm probability and severity, sector and actor vulnerability, actor responsibility, and overall concern. Experts estimated that the five most severe harms in the next 5 years were likely to come from dangerous capabilities, competitive dynamics, weapons & cyberattacks (including CBRNE), power centralization, and false information. In a business-as-usual scenario, experts judged 18 of 24 risks as having a more than 10% probability of catastrophic outcomes (e.g., more than 1 million deaths or more than USD 100B in financial loss) in the next 5 years (2025-2030). In a scenario where pragmatic mitigations are implemented, experts still judged five risks as having a more than 10% probability of catastrophic outcomes: dangerous capabilities, weapons & cyberattacks, environmental harm, inequality & unemployment, and power centralization. All 24 risks were judged as being more than 5% likely to cause catastrophic outcomes. AI users and the general public were judged the most vulnerable to these risks, but experts assigned the highest responsibility for addressing them to general-purpose AI developers and governance actors (including governments, regulators, and standards bodies). Across most risks, experts identified information, finance, and national security as the most vulnerable sectors. These findings can guide AI risk prioritization and clarify expert expectations about who should bear responsibility for mitigation.

2026-06-03T06:14:41Z Access data at https://osf.io/pj2qr Alexander K. Saeri Jess Graham Michael Noetel Peter Slattery Dennis Ah-king Edla Aittokallio Ibitola Akindehin Abbas Al Mahdi Elie Alhajjar Rafael Andersson Lipcsey Gary Ang Catherine M. Azam Amos Azaria Rishal Balkissoon Isabel Barberá Claudio Bareato Jonathan Barry Michael Basehart Andrew M. Bean Danny Belitz Samantha Augusta Bennett Kayla Blomquist Damian Borstel Ben Bucknall Tomas Bueno Momcilovic Aurelie Bugeau Nicholas Caputo Stephen Casper Gulam Chagani Ze Shen Chin Jiyeon Cho Jay Chooi Joel N. Christoph Dmytro Chumachenko Kieran Conboy Elizabeth M. Daly Tom David Paul de Font-Reaulx Antonio De Santis Fabrizio Degni Christopher W. DiCarlo Yawen Duan Janet Egan Ian W. Eisenberg Sherif M. Elsafty Adam Ennamli Mark Esposito Nicola Fabiano Gallo Fall Neil R. Fernandes Pip Foweraker Chiara Gallese Sandra Galletti Andrew Gamino-Cheong Rokas Gipiškis Gwyn Glasser Delaram Golpayegani Jeff Grayson Hans Gundlach Josiah Hagen Alexander Hagenah Amelia S. Haines The Anh Han Yixiong Hao Kasii Harris Tianxing He Koen Holtman Giorgos Iacovides Kenneth L. Ingham Krystal Jackson Adam Jones Himanshu Joshi Brian Judge Arturs Kanepajs Shreya Kapoor Win Myat Nwe Khine Aidan Kierans Aleksandra Korolova Markus Krebsz Nicholas Kruus Joe Kwon Valeria Lazzaroli Ray X. Lee Evelina Leivada Stephan Lewandowsky Michael B. Li Xiaojian Li Geunsik Lim Henrique Lisakowski Fabio Lonardoni Todd C. Lowe Jackson G. Lu Alexander Lyzhov Nada Madkour Parv Mahajan David Manheim Kareem Mathias Claudio Mayrink Verdun Sean McGregor Scott McLean Matthew J. McMahon Minas Megalokonomos Nicolas Moës Fernando Mourao Yaroslav Mukhin Malcolm Murray Simon Mylius Neeraj Nagpal Koichi Nakada Anna Neumann Jessica Newman Kwan Yee Ng Minh N. Nguyen Quynh Phuong Nguyen Seán S. Ó hÉigeartaigh Daria Onitiu Kelly Onu Oscar Oviedo-Trespalacios Ugur Ozer Chanwoo Park M. Alejandra Parra-Orlandoni Patricia Paskov Anna M. Pastwa Burak Piskin Jacob Pratt Claudiu A. 
Predincea Marjana Prifti Skenduli Kenneth Priore Mukunda Madhab Pujari Zhenting Qi Preethi Raghunathan Robi Rahman Deepika Raman Max Reddel Jyoti Ruparel Emma B. Ruttkamp-Bloem Tiffany Saade Greg Sadler Said Saillant Paul M. Salmon Ayrton San Joaquin Lama Saouma Maziya Sarangpurwala Supheakmungkol Sarin Daniel S. Schiff Anna D. Schilling Chris Schmitz Reva Schwartz Abeer Sharma Tianhao Shen Kehan Sheng Maury D. Shenk Eli Sherman Chandler Smith Julie M. Smith Estevenson Solano Oliver Sourbut Madhulika Srikumar Ryan Stendall Jakob Stenseke Michael Stern Joshua Sternfeld Nikko Stevens Ilia Sucholutsky Yuanyuan Sun Mariami Tkeshelashvili Cristian Trout Brian Tse Nikolaos Tsinganos Michelle Vaccaro Anthony R. Valiaveedu Ramakrishnan Veeramony Jeremy Verdo Pulkit Verma Andrea Luigi Vitali Jinge Wang JR Washebek Yonah Welker George F. Westerman James Williams Tristan Williams Rongwu Xu Mick Yang Xuemeng Yang Sander Zeijlemaker Jingyu Zhang Marta Ziosi Neil Thompson http://arxiv.org/abs/2606.04450v1 Listening to the Workforce: Measuring Construction Worker Safety Attitudes from Social Media Discourse Using LLMs 2026-06-03T04:54:40Z

Worker safety attitudes are key determinants of whether protective practices are applied or bypassed on construction sites. Yet measuring them at scale has remained out of reach. Safety attitudes are multidimensional, vary across topics, and surface most candidly in workers' own conversations. This study created and validated the Construction Safety Attitude Framework (CSAF), which integrates two components: a theory-grounded structure that characterizes safety attitudes along eight dimensions, and an operational codebook for measuring them in workers' naturalistic discourse. Applying CSAF to 250 posts and comments from the r/Construction community on Reddit, trained coders reached strong agreement (Krippendorff's α = 0.85). Pairwise lift and conditional-probability analyses confirmed that the eight dimensions are related yet distinct. To apply the framework across large volumes of discourse, CSAF was operationalized through a large language model (LLM) classifier. On 450 r/Construction contributions, the classifier reproduced expert human coding (Cohen's κ = 0.90, precision = 0.98, recall = 0.98), and on 400 contributions from r/Roofing it retained that accuracy after transfer to a different trade community (κ = 0.89, precision = 0.98, recall = 0.97). A proof-of-value case study then applied the validated classifier to 10,346 contributions from r/Roofing, demonstrating that CSAF can distinguish multidimensional attitudes by safety topic, track how they shift over time, and trace the reasoning behind unfavorable ones. The study therefore provides a theoretically grounded, empirically vetted instrument for examining safety attitudes, offering a basis for targeted interventions that address the attitudes underlying unsafe practices.
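The agreement statistics reported above can be computed directly. The following minimal sketch computes Cohen's κ between human and LLM codes, plus precision and recall for the `present` class. It assumes that each CSAF dimension is coded as a present/absent label per contribution; the label names and example data are hypothetical:

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two label sequences."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[k] * freq_b[k] for k in categories) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def precision_recall(gold, predicted, positive):
    """Precision and recall of `predicted` against `gold` for one class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


human = ["present", "present", "absent", "absent"]
llm = ["present", "absent", "absent", "absent"]
print(cohens_kappa(human, llm))
print(precision_recall(human, llm, "present"))
```

For the toy data, observed agreement is 0.75 and chance agreement is 0.5, giving κ = 0.5. Precision for `present` is 1.0 and recall is 0.5.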

2026-06-03T04:54:40Z Farouq Sammour Yuxin Zhang Zhenyu Zhang http://arxiv.org/abs/2606.04274v1 Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit 2026-06-02T22:58:59Z

As large language models (LLMs) become default tools for online information verification, an implicit assumption follows them: that scale and general capability are sufficient for nuanced classification of misinformation discourse. We test this assumption directly on 900 Reddit comments spanning three PolitiFact-verified misinformation claims (environment, health, immigration), labelled as belief (propagates the claim), fact-check (corrects it), or other. We compare nine models across three paradigms -- BART-MNLI, three Llama variants, three commercial frontier LLMs (Claude Haiku 4.5, Gemini Flash Lite 2.5, Claude Sonnet 4.6), and fine-tuned DistilBERT and RoBERTa -- under universal and topic-specific label schemas. The assumption does not hold. Fine-tuned RoBERTa reaches 0.62 macro-$F_1$ against a best zero-shot result of 0.50 (Claude Haiku 4.5), at a fraction of the per-query cost; the supervised advantage is concentrated on the belief class, the implicit, affective category every zero-shot model under-detects. Scaling does not help: Llama-3-8B matches Llama-3-70B, and Claude Sonnet 4.6 underperforms the smaller Haiku under generic labels, collapsing belief detection to 0.17 and refusing outright on a subset of comments flagged as sensitive. This is a safety-alignment artefact, not a capacity limit. Label schema and topic jointly shape zero-shot performance, with the same model varying by more than 0.13 macro-$F_1$ across topics under matched labels. In a verification context, where missing belief is the costlier error, task-specific fine-tuning remains the more reliable choice despite the proliferation of large generative models.
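Macro-$F_1$ averages per-class $F_1$ with equal weight. A model that misses the belief class is therefore penalized heavily even if it handles the other classes well. The following minimal sketch uses the three labels from the abstract; the example predictions are hypothetical:

```python
LABELS = ("belief", "fact-check", "other")


def macro_f1(gold, predicted, labels=LABELS):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN).

    A class absent from both gold and predictions scores 0 here;
    other conventions exist.
    """
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = ["belief", "belief", "fact-check", "other"]
predicted = ["other", "other", "fact-check", "other"]
print(macro_f1(gold, predicted))
```

The toy predictor never outputs `belief`, so that class scores 0. Macro-$F_1$ therefore drops to 0.5 despite a perfect `fact-check` score.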

2026-06-02T22:58:59Z JooYoung Lee Lin Tian Angela Brillantes Adriana-Simona Mihăiţă Marian-Andrei Rizoiu http://arxiv.org/abs/2606.04254v1 Behavioral and Performance Indicators of Depression and Anxiety in Electronic Learning Systems 2026-06-02T22:08:07Z

This study investigates whether behavioral and performance indicators derived from a Moodle-based learning management system (LMS) are associated with university students' depression and anxiety in two undergraduate Computer Engineering courses. Using a quantitative observational design, we integrated LMS event logs, academic records, and self-reported Beck Depression Inventory-II and Beck Anxiety Inventory scores from 97 students. A broad set of behavioral and performance indicators spanning temporal engagement, session structure, deadline-related behavior, page-refresh patterns, and LMS navigation was extracted from raw event logs and analyzed using descriptive statistics, independent-samples t-tests with Benjamini-Hochberg FDR correction, effect sizes, and Spearman correlations; inventory scores were confirmed invariant by sex and academic year. Several indicators were significantly associated with depression and anxiety. Higher depression was associated with shifted temporal activity patterns, longer session durations, and shorter homework submission lead times, while higher anxiety was associated with concentrated temporal engagement and session-based differences. These findings suggest that routine LMS data can provide meaningful behavioral signals related to student well-being and may support earlier educational awareness of students who experience mental-health-related strain. At the same time, such indicators should be interpreted as contextual and non-diagnostic markers rather than as substitutes for clinical assessment.
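The Benjamini-Hochberg procedure controls the false discovery rate across the many indicator tests. The following is a minimal sketch of the standard step-up rule; the p-values are hypothetical, not from the study:

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a reject flag per hypothesis under BH false-discovery-rate control.

    Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha and
    reject every hypothesis whose p-value is among the k smallest.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
print(benjamini_hochberg([0.01, 0.02, 0.03, 0.50]))
```

In the first call, only the smallest p-value survives. The values 0.03 and 0.04 would pass an uncorrected 0.05 threshold, but not their BH thresholds of 0.025 and 0.0375.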

2026-06-02T22:08:07Z Arya VarastehNezhad Fattaneh Taghiyareh http://arxiv.org/abs/2606.04214v1 Plateau That Never Comes: When Efficiency Claims in Datacenters and AI Become Greenwashing 2026-06-02T21:01:40Z

Datacenter expansion under generative AI is increasingly framed as compatible with sustainability because of efficiency gains, cleaner electricity procurement, and improved facility design. Yet these claims often do not show that absolute electricity, water, material, waste, and community-facing burdens are falling. This Perspective addresses that evidentiary gap. Rather than asking whether efficiency gains are real, we ask when such gains are being enlarged into claims of system-wide sustainability to justify continued expansion. We develop a rebound-informed diagnostic framework for evaluating AI and datacenter sustainability narratives across five tests: metric, boundary, reinvestment, burden shifting, and governance. Applied to major AI industry sustainability reporting, the framework shows that firms largely justify continued expansion through efficiency improvements and clean-energy procurement, rather than by demonstrating reductions in absolute resource use. Applied to plateau claims in the literature, the framework shows that many of these claims establish local or relative improvements while leaving energy rebound, lifecycle burdens, and enforceable limits unresolved. We argue that these sustainable-growth narratives begin to function as greenwashing when they use efficiency improvements to claim sustainability even as absolute energy, water, material, and public health burdens continue to increase. We conclude by positioning digital sufficiency as a burden-of-proof framework for governance: those advocating further datacenter expansion must show that it reduces, rather than merely redistributes or defers, absolute burdens across the full system.
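The rebound logic behind the metric test reduces to simple arithmetic. Absolute burden is per-unit intensity times volume, so an efficiency gain lowers the total only if volume grows more slowly than the gain offsets. The following minimal sketch uses hypothetical numbers:

```python
def absolute_change(intensity_change, volume_change):
    """Relative change in total resource use when per-unit intensity and
    service volume change by the given fractions (e.g. -0.3 = 30% lower)."""
    return (1 + intensity_change) * (1 + volume_change) - 1


def break_even_growth(intensity_change):
    """Volume growth at which an efficiency gain is exactly cancelled."""
    return 1 / (1 + intensity_change) - 1


# Hypothetical: energy per query falls 30% while query volume doubles.
print(round(absolute_change(-0.30, 1.00), 3))  # 0.4   -> total energy up 40%
print(round(break_even_growth(-0.30), 3))      # 0.429 -> gain erased at ~43% growth
```

In this example, a genuine 30% efficiency improvement coexists with a 40% rise in absolute energy use. This is why the framework asks for absolute rather than per-unit metrics.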

2026-06-02T21:01:40Z Harshit Gujral Eshta Bhardwaj Dushani Perera Christoph Becker Steve Easterbrook http://arxiv.org/abs/2004.10846v5 Reducing the Filtering Effect in Public School Admissions: A Bias-aware Analysis for Targeted Interventions 2026-06-02T20:09:17Z

Problem definition: Traditionally, New York City's top 8 public schools have selected candidates solely based on their scores in the Specialized High School Admissions Test (SHSAT). These scores are known to be affected by the socioeconomic status of students and the test preparation they receive in middle school, leading to a massive filtering effect in the education pipeline. The classical mechanisms for assigning students to schools do not naturally address problems like school segregation and class diversity, which have worsened over the years. The scientific community and policymakers have reacted by incorporating group-specific quotas and proportionality constraints, with mixed results. The problem of finding effective and fair methods for broadening access to top-notch education is still unsolved. Methodology/results: We take an operations approach to the problem, which differs from most of the established literature, with the goal of increasing opportunities for students with high economic needs. Using data from the Department of Education (DOE) in New York City, we show that there is a shift in the distribution of scores obtained by students that the DOE classifies as "disadvantaged" (following criteria mostly based on economic factors). We model this shift as a "bias" that results from an underestimation of the true potential of disadvantaged students. We analyze the impact this bias has on an assortative matching market. We show that centrally planned interventions can significantly reduce the impact of bias through scholarships or training, when they target the segment of disadvantaged students with average performance.
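The modeled "bias" can be illustrated in the simplest possible market: a single school admitting the top-scoring applicants. This sketch is not the paper's assortative matching model. It assumes that the observed scores of the disadvantaged group equal true potential minus a fixed shift. The scores, group labels, and shift are hypothetical:

```python
def observed_scores(potential, groups, shift, target="disadvantaged"):
    """Observed score = true potential, minus `shift` for the target group."""
    return [p - shift if g == target else p for p, g in zip(potential, groups)]


def admitted_share(scores, groups, capacity, target="disadvantaged"):
    """Fraction of the top-`capacity` seats that go to the target group."""
    ranked = sorted(zip(scores, groups), key=lambda sg: sg[0], reverse=True)
    return sum(g == target for _, g in ranked[:capacity]) / capacity


potential = [90, 85, 80, 88, 84, 70]
groups = ["advantaged"] * 3 + ["disadvantaged"] * 3
print(admitted_share(potential, groups, 3))  # ranking by true potential
print(admitted_share(observed_scores(potential, groups, 10), groups, 3))  # shifted
```

Ranked by potential, one of three seats goes to a disadvantaged student. With a shift of 10 points, none does. This is a toy version of the filtering effect that the targeted interventions aim to reduce.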

2020-04-22T20:50:31Z Yuri Faenza Swati Gupta Aapeli Vuorinen Xuan Zhang http://arxiv.org/abs/2010.04396v7 Dropping Standardized Testing for Admissions Trades Off Information and Access 2026-06-02T20:06:23Z

We study the role of information and access in capacity-constrained selection problems with fairness concerns. We develop a statistical discrimination framework, where each applicant has multiple features and is potentially strategic. The model formalizes the trade-off between the (potentially positive) informational role of a feature and its (negative) exclusionary nature when members of different social groups have unequal access to this feature. Our framework finds a natural application to policy debates on dropping standardized testing in admissions. Our primary takeaway is that the decision to drop a feature (such as test scores) cannot be made without the joint context of the information provided by other features and how the requirement affects the applicant pool composition. Dropping a feature may exacerbate disparities by decreasing the amount of information available for each applicant, especially those from non-traditional backgrounds. However, in the presence of access barriers to a feature, the interaction between the informational environment and the effect of access barriers on the applicant pool size becomes highly complex. Furthermore, we consider an extension with two schools and costly tests, where strategic students decide whether to take the test or not. Our theoretical results reveal that the students' test-taking behavior can be non-monotonic. We characterize the two-school policy equilibria and show that each school's optimal decision to drop the test critically depends on the other school's test policy. Finally, using calibrated simulations, we demonstrate the presence of practical instances where the decision to eliminate standardized testing improves or worsens all metrics.

2020-10-09T07:07:28Z Forthcoming in Management Science Nikhil Garg Hannah Li Faidra Monachou 10.1287/mnsc.2023.02573 http://arxiv.org/abs/2606.04155v1 SocialCoach: Personalized Social Skill Learning with RL-based Agentic Tutoring and Practice 2026-06-02T19:20:54Z

Social skills such as negotiation and leadership are crucial for personal and professional success in today's interconnected world. However, scalable and effective training remains a significant challenge due to the scarcity of expert coaching. In this paper, we introduce SocialCoach, a holistic LLM-powered agentic tutoring system for personalized social skill development at scale. First, SocialCoach automatically constructs a pedagogically-grounded, theory-to-practice knowledge corpus from diverse expert sources, leveraging a multi-agent pipeline. Second, to personalize the learning journey, it employs an adaptive practice scheduling module that follows a prescription-retrieval-adaptation process. To maximize the long-term learning experience while overcoming the cold-start problem, this policy is optimized within a learner simulation environment through reinforcement learning. Finally, SocialCoach integrates immersive, goal-driven practice, causality-driven proficiency assessment and knowledge-grounded, reflective tutoring to help address the knowing-doing gap. We deploy it in our product, EQoach, and conduct extensive experiments. The results show that SocialCoach improves simulated pathway quality and judge-rated tutoring quality over baseline approaches, while early user feedback indicates strong perceived engagement and usefulness. These findings suggest a practical architecture for personalized and gamified pedagogical platforms for soft-skill learning.

2026-06-02T19:20:54Z Tianfu Wang Max Xiong Jianxun Lian Hongyuan Zhu Zhengyu Hu Yuxuan Lei Linxiao Gong Xiaofang Li Peiting Tsai Nicholas Jing Yuan Qi Zhang