http://arxiv.org/abs/2503.15821v2
Temporal Point Process Modeling of Aggressive Behavior Onset in Psychiatric Inpatient Youths with Autism
2025-08-27T18:08:00Z
Aggressive behavior, including aggression towards others and self-injury, occurs in up to 80% of children and adolescents with autism, making it a leading cause of behavioral health referrals and a major driver of healthcare costs. Predicting when autistic youth will exhibit aggression can be challenging due to their communication difficulties. Many are minimally verbal or have poor emotional insight. Recent advances in machine learning and wearable biosensing demonstrate the ability to predict aggression within a limited future window (typically one to three minutes) in autistic individuals. However, existing works do not estimate aggression onset probability or the expected number of aggression onsets over longer periods, nor do they provide interpretable insights into onset dynamics. To address these limitations, we apply Temporal Point Processes (TPPs), particularly self-exciting Hawkes processes, to model the timing of aggressive behavior onsets in psychiatric inpatient autistic youth. We benchmark several TPP models by evaluating their goodness-of-fit and predictive metrics. Our results demonstrate that self-exciting TPPs more accurately capture the irregular and clustered nature of aggression onsets, especially compared to traditional Poisson models. These incipient findings suggest that TPPs can provide interpretable, probabilistic forecasts of aggression onset along a time continuum, supporting future clinical decision-making and preemptive intervention.
2025-03-20T03:12:54Z
Submitted to Nature Scientific Reports
Michael Potter, Michael Everett, Ashutosh Singh, Georgios Stratis, Yuna Watanabe, Ahmet Demirkaya, Deniz Erdogmus, Tales Imbiriba, Matthew S. Goodwin

http://arxiv.org/abs/2211.08637v3
Near-Peer Mentoring in Data Science: A Plot for Mutual Growth
2025-08-26T18:11:02Z
Universities have been expanding undergraduate data science programs. Involving graduate students in these new opportunities can foster their growth as data science educators. We describe two programs that employ a near-peer mentoring structure, in which graduate students mentor undergraduates, to (1) strengthen their teaching and mentoring skills and (2) provide research and learning experiences for undergraduates from diverse backgrounds. In the Data Science for Social Good program, undergraduate participants work in teams to tackle a data science project with social impact. Graduate mentors guide project work and provide just-in-time teaching and feedback. The Stanford Mentoring in Data Science course offers training in effective and inclusive mentorship strategies. In an experiential learning framework, enrolled graduate students are paired with undergraduate students from non-R1 schools, whom they mentor through weekly one-on-one remote meetings. In end-of-program surveys, mentors reported growth through both programs. Drawing from these experiences, we developed a self-paced mentor training guide, which engages teaching, mentoring and project management abilities. These initiatives and the shared materials can serve as prototypes of future programs that cultivate mutual growth of both undergraduate and graduate students in a high-touch, inclusive, and encouraging environment.
2022-11-16T03:13:01Z
Chiara Sabatti, Qian Zhao
10.1080/00031305.2025.2550314

http://arxiv.org/abs/2508.19070v1
Replicability: Terminology, Measuring Success, and Strategy
2025-08-26T14:26:36Z
Empirical science needs to be based on facts and claims that can be reproduced. This calls for replicating the studies that proclaim the claims, but practice in most fields still fails to implement this idea. 
When such studies emerged in the past decade, the results were generally disappointing. There have been an overwhelming number of papers addressing the ``reproducibility crisis'' in the last 20 years. Nevertheless, terminology is not yet settled, and there is no consensus about when a replication should be called successful. This paper intends to clarify such issues. A fundamental problem in empirical science is that usual claims only state that effects are non-zero, and such statements are scientifically void. An effect must have a \emph{relevant} size to become a reasonable item of knowledge. Therefore, estimation of an effect, with an indication of precision, forms a substantial scientific task, whereas testing it against zero does not. A relevant effect is one that is shown to exceed a relevance threshold. This paradigm has implications for the judgement on replication success.
A further issue is the unavoidable variability between studies, called heterogeneity in meta-analysis. Therefore, it is of little value, again, to test for zero difference between an original effect and its replication, but exceedance of a corresponding relevance threshold should be tested. In order to estimate the degree of heterogeneity, more than one replication is needed, and an appropriate indication of the precision of an estimated effect requires such an estimate.
These insights, which are discussed in the paper, show the complexity of obtaining solid scientific results, implying the need for a strategy to make replication happen.
2025-08-26T14:26:36Z
36 pages, 3 figures
Werner A. Stahel
ETH Zurich, Switzerland

http://arxiv.org/abs/2508.14009v2
Understanding Pedagogical Content Knowledge of Data Science Instructors: An Inaugural Framework
2025-08-25T20:27:53Z
As data science emerges as a distinct academic discipline, introductory data science (IDS) courses have also drawn attention to their role in providing foundational knowledge of data science to students. IDS courses not only help students transition to higher education but also expose students to the field, often for the first time. They are often taught by instructors without formal training in data science or pedagogy, creating a unique context for examining their pedagogical content knowledge (PCK). This study explores IDS instructors' PCK, particularly how instructors' varied backgrounds interact with their instructional practices. Employing empirical phenomenological methodology, we conducted semi-structured interviews to understand the nature of their PCK. Comparing instructors' PCK was inherently challenging due to their diverse backgrounds and teaching contexts. Prior experiences played a central role in shaping participants' instructional choices. Their perceptions regarding the goals and rationale for teaching data science reflected three distinct orientations. Instructors also acknowledged students entering IDS courses often brought preconceived notions that shaped their learning experiences. Despite the absence of national guidelines, participants demonstrated notable overlap in foundational IDS content, though some instructors felt less confident with advanced or specialized topics. Additionally, instructors commonly employed formative and summative assessment approaches, though few explicitly labeled their practices using these terms. 
The findings highlight key components of PCK in IDS and offer insights into supporting instructor development through targeted training and curriculum design. This work contributes to ongoing efforts to build capacity in data science education and expand the scope of PCK research into new interdisciplinary domains.
2025-08-19T17:15:14Z
76 pages, 3 tables
Sinem Demirci, Mine Doğucu, Andrew Zieffler, Joshua M. Rosenberg

http://arxiv.org/abs/2507.12424v3
Hierarchical Temporal Point Process Modeling of Aggressive Behavior Onset in Psychiatric Inpatient Youth with Autism for Branching Factor Estimation
2025-08-19T21:19:02Z
Aggressive behavior in autistic inpatient youth often arises in temporally clustered bursts, complicating efforts to distinguish external triggers from internal escalation. The sample population branching factor, the expected number of new onsets triggered by a given event, is a key summary of self-excitation in behavior dynamics. Prior pooled models overestimate this quantity by ignoring patient-specific variability. We addressed this using a hierarchical Hawkes process with an exponential kernel and edge-effect correction, allowing partial pooling across patients. This approach reduces bias from high-frequency individuals and stabilizes estimates for those with sparse data. Bayesian inference was performed using the No-U-Turn Sampler, with model evaluation via convergence diagnostics, power-scaling sensitivity analysis, and multiple Goodness-of-Fit (GOF) metrics: PSIS-LOO, the Lewis test with Durbin's modification, and residual analysis based on the Random Time Change Theorem (RTCT). The hierarchical model yielded a significantly lower and more precise mean branching factor estimate (0.742 ± 0.026) than the pooled model (0.899 ± 0.015) and narrower intervals than the unpooled model (0.717 ± 0.139). This led to a threefold smaller cascade of events per onset under the hierarchical model. 
Sensitivity analyses confirmed robustness to prior and likelihood perturbations, while the unpooled model showed instability for sparse individuals. GOF measures consistently favored the hierarchical model or rated it on par with the alternatives. Hierarchical Hawkes modeling with edge-effect correction provides robust estimation of branching dynamics by capturing both within- and between-patient variability. This enables clearer separation of endogenous from exogenous events, supports linkage to physiological signals, and enhances early warning systems, individualized treatment, and resource allocation in inpatient care.
2025-07-16T17:11:48Z
Submitted to BMC Medical Research Methodology
Michael Potter, Michael Everett, Deniz Erdogmus, Yuna Watanabe, Tales Imbiriba, Matthew S. Goodwin

http://arxiv.org/abs/2508.11726v1
Relationship Between Leisure Activities, Stress Management Methods, Study Methods, and Methods of Learning New Things Among First-Year Statistics Students
2025-08-15T07:09:03Z
The interplay between leisure activities, stress management methods, studying methods, and methods of learning new things is crucial and affects performance in all aspects of life. On the other hand, data science and statistics are rapidly growing fields with high demands across universities. Thus, this study aimed to identify the similarities and dissimilarities between the four dimensions: leisure activities, stress management methods, studying methods and methods of learning new things. The participants of this study were first-year undergraduates studying statistics at one of the universities in Sri Lanka. There were 117 students in the sample (65 female, 52 male). A self-reported questionnaire was used to collect data. First, individual responses for each question under each dimension were visualized using tile maps separately for males and females to identify similarities and dissimilarities in responses. Next, individuals were clustered based on the responses for each dimension separately. 
Finally, all resulting clusters were re-clustered to identify the relationships between the dimensions. In all cluster analyses, we used Jaccard distance with hierarchical clustering using the complete linkage method. The results were visualized using tile maps. Across all four dimensions we considered, the top activities were either listening to music or lectures and watching videos or TV shows, suggesting that individuals are introverts and passive learners. There was no strong relationship between these dimensions. By identifying these clusters and relationships, educators can tailor instructional approaches to enhance engagement and effectiveness in diverse learning environments.
2025-08-15T07:09:03Z
23 pages, 10 figures
Thiyanga S. Talagala

http://arxiv.org/abs/2508.10207v1
Examining the Association between Estimated Prevalence and Diagnostic Test Accuracy using Directed Acyclic Graphs
2025-08-13T21:35:12Z
There have been reports of correlation between estimates of prevalence and test accuracy across studies included in diagnostic meta-analyses. It has been hypothesized that this unexpected association arises because of certain biases commonly found in diagnostic accuracy studies. A theoretical explanation has not been studied systematically. In this work, we introduce directed acyclic graphs to illustrate common structures of bias in diagnostic test accuracy studies and to define the resulting data-generating mechanism behind a diagnostic meta-analysis. Using simulation studies, we examine how these common biases can produce a correlation between estimates of prevalence and index test accuracy and what factors influence its magnitude and direction. We found that an association arises either in the absence of a perfect reference test or in the presence of a covariate that simultaneously causes spectrum effect and is associated with the prevalence (confounding). 
We also show that the association between prevalence and accuracy can be removed by appropriate statistical methods. In the risk of bias evaluation in diagnostic meta-analyses, an observed association between estimates of prevalence and accuracy should be explored to understand its source and to adjust for latent or observed variables if possible.
2025-08-13T21:35:12Z
Yang Lu, Robert Platt, Nandini Dendukuri

http://arxiv.org/abs/2508.09563v1
Performances and Correlations of Centrality Measures in Complex Networks
2025-08-13T07:30:09Z
Numerous centrality measures have been proposed to evaluate the importance of nodes in networks, yet comparative analyses of these measures remain limited. Based on 80 real-world networks, we conducted an empirical analysis of 16 representative centrality measures. In general, there exists a moderate to high level of correlation between node rankings derived from different measures. We identified two distinct communities: one comprising 4 measures and the other 7 measures. Measures within the same community exhibit exceptionally strong pairwise correlations. In contrast, the remaining five measures display markedly different behaviors, showing weak correlations not only among themselves but also with the other measures. This suggests that each of these five measures likely captures unique properties of node importance. Further analysis reveals that the distribution patterns of the most influential nodes identified by different centrality measures vary significantly: some measures tend to cluster influential nodes closely together, while others disperse them across distant locations within the network. Using the epidemic spreading model, we found that LocalRank, Subgraph Centrality, and Katz Centrality perform best in identifying the most influential single node, whereas Leverage Centrality, Collective Influence, and Cycle Ratio excel in identifying the most influential node sets. 
Overall, measures that identify influential nodes with larger topological distances between them tend to perform better in detecting influential node sets. Interestingly, despite being applied to the same dynamical process, when using two seemingly similar tasks, identifying influential nodes versus identifying influential node sets, to rank the performances of the 16 centrality measures, the resulting rankings are negatively correlated.
2025-08-13T07:30:09Z
Yilin Bi, Xinshan Jiao, Tao Zhou

http://arxiv.org/abs/2508.09328v1
Dynamic Survival Prediction using Longitudinal Images based on Transformer
2025-08-12T20:31:55Z
Survival analysis utilizing multiple longitudinal medical images plays a pivotal role in the early detection and prognosis of diseases by providing insight beyond single-image evaluations. However, current methodologies often inadequately utilize censored data, overlook correlations among longitudinal images measured over multiple time points, and lack interpretability. We introduce SurLonFormer, a novel Transformer-based neural network that integrates longitudinal medical imaging with structured data for survival prediction. Our architecture comprises three key components: a Vision Encoder for extracting spatial features, a Sequence Encoder for aggregating temporal information, and a Survival Encoder based on the Cox proportional hazards model. This framework effectively incorporates censored data, addresses scalability issues, and enhances interpretability through occlusion sensitivity analysis and dynamic survival prediction. 
Extensive simulations and a real-world application in Alzheimer's disease analysis demonstrate that SurLonFormer achieves superior predictive performance and successfully identifies disease-related imaging biomarkers.
2025-08-12T20:31:55Z
Bingfan Liu, Haolun Shi, Jiguo Cao

http://arxiv.org/abs/2508.09079v1
The shape of economics before and after the financial crisis
2025-08-12T16:58:23Z
This paper investigates the impact of the global financial crisis on the shape of economics as a discipline by analyzing EconLit-indexed journals from 2006 to 2020 using a multilayer network approach. We consider two types of social relationships among journals, based on shared editors (interlocking editorship) and shared authors (interlocking authorship), as well as two forms of intellectual proximity, derived from bibliographic coupling and textual similarity. These four dimensions are integrated using Similarity Network Fusion to produce a unified similarity network from which journal communities are identified. Comparing the field in 2006, 2012, and 2019 reveals a high degree of structural continuity. Our findings suggest that, despite changes in research topics after the crisis, fundamental social and intellectual relationships among journals have remained remarkably stable. Editorial networks, in particular, continue to shape hierarchies and legitimize knowledge production.
2025-08-12T16:58:23Z
65 pages, 3 figures, 7 tables
Alberto Baccini, Lucio Barabesi, Carlo Debernardi

http://arxiv.org/abs/2508.07864v1
A Review and Classification of Model Uncertainty
2025-08-11T11:26:44Z
Model uncertainty is a crucial issue in statistics, econometrics and machine learning, yet its definition remains ambiguous and is subject to various interpretations in the literature. So far, there has not been a universally accepted definition of model uncertainty. 
We review different understandings of model uncertainty and categorize them into three distinct types: uncertainty about the true model, model selection uncertainty, and model selection instability. We further offer interpretations and examples for a better illustration of these definitions. We also discuss the potential consequences of neglecting model uncertainty in the process of conducting statistical inference, and provide effective solutions to these problems. Our aim is to help researchers better understand the concept of model uncertainty and obtain valid statistical inference results on the premise of its existence.
2025-08-11T11:26:44Z
Guangyuan Cui, Yuting Wei, Xinyu Zhang

http://arxiv.org/abs/2508.07474v1
The p-value from a fuzzy point of view
2025-08-10T20:17:44Z
The purpose of the paper is to provide a new way of seeing the p-value in terms of a fuzzy membership function. According to the ASA's statement, we aim at removing the arbitrary choice of the significance level and at demonstrating that the p-value can be profitably interpreted from a fuzzy point of view. In particular, we propose a new class of membership functions by viewing the p-value as a function of the null hypothesis and we apply our approach to compare two independent binomial proportions. The proposed membership functions can also be employed to assess the precision of confidence intervals and the power of statistical tests.
2025-08-10T20:17:44Z
Piero Quatto

http://arxiv.org/abs/2312.13619v2
The many routes to the ubiquitous Bradley-Terry model
2025-08-07T10:50:50Z
The rating of items based on pairwise comparisons has been a topic of statistical investigation for many decades. Numerous approaches have been proposed. One of the best known is the Bradley-Terry model. This paper seeks to assemble and explain a variety of motivations for its use. Some are based on principles or on maximising an objective function; others are derived from well-known statistical models, or stylised game scenarios. 
They include both examples well known in the literature and what are believed to be novel presentations.
2023-12-21T07:14:19Z
To be published in Statistical Science
Ian Hamilton, Nick Tawn, David Firth

http://arxiv.org/abs/2406.10612v2
Producing treatment hierarchies in network meta-analysis using probabilistic models and treatment-choice criteria
2025-08-06T14:05:25Z
A key output of network meta-analysis (NMA) is the relative ranking of treatments; nevertheless, it has attracted substantial criticism. Existing ranking methods often lack clear interpretability and fail to adequately account for uncertainty, over-emphasizing small differences in treatment effects. We propose a novel framework to estimate treatment hierarchies in NMA using a probabilistic model, focusing on a clinically relevant treatment-choice criterion (TCC). Initially, we formulate a mathematical expression to define a TCC based on smallest worthwhile differences (SWD), converting NMA relative treatment effects into treatment preference format. This data is then synthesized using a probabilistic ranking model, assigning each treatment a latent 'ability' parameter, representing its propensity to yield clinically important and beneficial true treatment effects relative to the rest of the treatments in the network. Parameter estimation relies on maximum likelihood theory, with standard errors derived asymptotically from Fisher's information matrix. To facilitate the use of our methods, we launched the R package mtrank. We applied our method to two clinical datasets: one comparing 18 antidepressants for major depression and another comparing 6 antihypertensives for the incidence of diabetes. Our approach provided robust, interpretable treatment hierarchies that account for a concrete TCC. We further examined the agreement between the proposed method and existing ranking metrics in 153 published networks, concluding that the degree of agreement depends on the precision of the NMA estimates. 
Our framework offers a valuable alternative for NMA treatment ranking, mitigating over-interpretation of minor differences. This enables more reliable and clinically meaningful treatment hierarchies.
2024-06-15T12:26:09Z
Theodoros Evrenoglou, Adriani Nikolakopoulou, Guido Schwarzer, Gerta Rücker, Anna Chaimani
10.1017/rsm.2026.10071

http://arxiv.org/abs/2508.04080v1
GeoSR: Cognitive-Agentic Framework for Probing Geospatial Knowledge Boundaries via Iterative Self-Refinement
2025-08-06T04:45:34Z
Recent studies have extended the application of large language models (LLMs) to geographic problems, revealing surprising geospatial competence even without explicit spatial supervision. However, LLMs still face challenges in spatial consistency, multi-hop reasoning, and geographic bias. To address these issues, we propose GeoSR, a self-refining agentic reasoning framework that embeds core geographic principles -- most notably Tobler's First Law of Geography -- into an iterative prediction loop. In GeoSR, the reasoning process is decomposed into three collaborating agents: (1) a variable-selection agent that selects relevant covariates from the same location; (2) a point-selection agent that chooses reference predictions at nearby locations generated by the LLM in previous rounds; and (3) a refine agent that coordinates the iterative refinement process by evaluating prediction quality and triggering further rounds when necessary. This agentic loop progressively improves prediction quality by leveraging both spatial dependencies and inter-variable relationships. We validate GeoSR on tasks ranging from physical-world property estimation to socioeconomic prediction. Experimental results show consistent improvements over standard prompting strategies, demonstrating that incorporating geostatistical priors and spatially structured reasoning into LLMs leads to more accurate and equitable geospatial predictions. 
The code of GeoSR is available at https://github.com/JinfanTang/GeoSR.
2025-08-06T04:45:34Z
16 pages, 9 figures
Jinfan Tang, Kunming Wu, Ruifeng Gongxie, Yuya He, Yuankai Wu