https://arxiv.org/api/MsfUF9pXNHQRYmj6n93FoxEUI2w2026-06-10T07:33:49Z168613515http://arxiv.org/abs/2512.09790v2A Conversation with Mike West2026-01-06T18:15:19ZMike West is currently the Arts & Sciences Distinguished Professor Emeritus of Statistics and Decision Sciences at Duke University. Mike's research in Bayesian analysis spans multiple interlinked areas: theory and methods of dynamic models in time series analysis, foundations of inference and decision analysis, multivariate and latent structure analysis, stochastic computation and optimisation, among others. Inter-disciplinary R&D has ranged across applications in commercial forecasting, dynamic networks, finance, econometrics, signal processing, climatology, systems biology, genomics and neuroscience, among other areas. Among Mike's currently active research areas are forecasting, causal prediction and decision analysis in business, economic policy and finance, as well as in personal decision making. Mike led the development of academic statistics at Duke University from 1990-2002, and has been broadly engaged in professional leadership elsewhere. He is past president of the International Society for Bayesian Analysis (ISBA), and has served in founding roles and as board member for several professional societies, national and international centres and institutes. Recipient of numerous awards, Mike has been active in research with various companies, banks, government agencies and academic centres, co-founder of a successful biotechnology company, and board member for several financial and IT companies. He has published 4 books, several edited volumes and over 200 papers. Mike has worked with many undergraduate and Master's research students, and as of 2025 has mentored around 65 primary PhD students and postdoctoral associates who moved to academic, industrial or governmental positions involving advanced statistical and data science research.2025-12-10T16:10:21ZHedibert F. LopesFilippo Ascolanihttp://arxiv.org/abs/2512.23110v2Assessing the Effects of Macroeconomic Variables on Child Mortality in D-8 Countries Using Panel Data Analysis2026-01-04T23:55:51ZThis research analyses the axiomatic link among health expenditures, inflation rate, and gross national income (GNI) per capita concerning the child mortality (CMU5) rate in D-8 nations, employing panel data analysis from 1995 to 2014. Utilising conventional panel unit root tests and linear regression models, we establish that education expenditures, in conjunction with health expenditures, inflation rate, and GNI per capita, display stationarity at level. Additionally, we examine fixed effects and random effects estimators for the pertinent variables, utilising metrics such as the Hausman Test (HT) and comparisons with CCMR correlations. Our data demonstrate that the CMU5 rate in D-8 nations has steadily decreased, according to a somewhat negative linear regression model, therefore slightly undermining the fourth Millennium Development Goal (MDG4) of the World Health Organisation (WHO).2025-12-28T23:17:53Z13 pages, 3 Figures, 4 tables, updated referencesM. Waseem AkramBinita ShahiM. Javed Akramhttp://arxiv.org/abs/2512.24930v1Constraints on the perfect phylogeny mixture model and their effect on reducing degeneracy2025-12-31T15:39:15ZThe perfect phylogeny mixture (PPM) model is useful due to its simplicity and applicability in scenarios where mutations can be assumed to accumulate monotonically over time. It is the underlying model in many tools that have been used, for example, to infer phylogenetic trees for tumor evolution and reconstruction. Unfortunately, the PPM model gives rise to substantial ambiguity -- in that many different phylogenetic trees can explain the same observed data -- even in the idealized setting where data are observed perfectly, i.e. fully and without noise. This ambiguity has been studied in this perfect setting by Pradhan et al. 2018, which proposed a procedure to bound the number of solutions given a fixed instance of observation data. Beyond this, studies have been primarily empirical. Recent work (Myers et al. 2019) proposed adding extra constraints to the PPM model to tackle ambiguity. In this paper, we first show that the extra constraints of Myers et al. 2019, called longitudinal constraints (LC), often fail to reduce the number of distinct trees that explain the observations. We then propose novel alternative constraints to limit solution ambiguity and study their impact when the data are observed perfectly. Unlike the analysis in Pradhan et al. 2018, our theoretical results regarding both the inefficacy of the LC and the extent to which our new constrains reduce ambiguity are not tied to a single observation instance. Rather, our theorems hold over large ensembles of possible inference problems. To the best of our knowledge, we are the first to study degeneracy in the PPM model in this ensemble-based theoretical framework.2025-12-31T15:39:15ZJohn MarangolaAzadeh SheikholeslamiJosé Bentohttp://arxiv.org/abs/2512.23371v1Domain matters: Towards domain-informed evaluation for link prediction2025-12-29T11:04:36ZLink prediction, a foundational task in complex network analysis, has extensive applications in critical scenarios such as social recommendation, drug target discovery, and knowledge graph completion. However, existing evaluations of algorithmic often rely on experiments conducted on a limited number of networks, assuming consistent performance rankings across domains. Despite the significant disparities in generative mechanisms and semantic contexts, previous studies often improperly highlight ``universally optimal" algorithms based solely on naive average over networks across domains. This paper systematically evaluates 12 mainstream link prediction algorithms across 740 real-world networks spanning seven domains. We present substantial empirical evidence elucidating the performance of algorithms in specific domains. This findings reveal a notably low degree of consistency in inter-domain algorithm rankings, a phenomenon that stands in stark contrast to the high degree of consistency observed within individual domains. Principal Component Analysis shows that response vectors formed by the rankings of the 12 algorithms cluster distinctly by domain in low-dimensional space, thus confirming domain attributes as a pivotal factor affecting algorithm performance. We propose a metric called Winner Score that could identify the superior algorithm in each domain: Non-Negative Matrix Factorization for social networks, Neighborhood Overlap-aware Graph Neural Networks for economics, Graph Convolutional Networks for chemistry, and L3-based Resource Allocation for biology. However, these domain-specific top-performing algorithms tend to exhibit suboptimal performance in other domains. This finding underscores the importance of aligning an algorithm's mechanism with the network structure.2025-12-29T11:04:36ZPhysica A: Statistical Mechanics and its Applications, 693, 131551 (2026)Yilin BiJunhao BianShuyan WanShuaijia WangTao Zhou10.1016/j.physa.2026.131551http://arxiv.org/abs/2512.23053v1LLteacher: A Tool for the Integration of Generative AI into Statistics Assignments2025-12-28T19:39:45ZAs generative AI becomes increasingly embedded in everyday life, the thoughtful and intentional integration of AI-based tools into statistics education has become essential. We address this need with a focus on homework assignments and we propose the use of LLMs as a companion to complete homework by developing an open-source tool named LLteacher. This LLM-based tool preserves learning processes and it guides students to engage with AI in ways that support their learning, while ensuring alignment with course content and equitable access. We illustrate LLteacher's design and functionality with examples from an undergraduate Statistical Computing course in R, showing how it supports two distinct pedagogical goals: recalling prior knowledge and discovering new concepts. While this is an initial version, LLteacher demonstrates one possible pathway for integrating generative AI into statistics courses, with strong potential for adaptation to other types of classes and assignments.2025-12-28T19:39:45ZEmanuela FurfaroSimone Mosciattihttp://arxiv.org/abs/2512.20068v2Change Point Detection and Mean-Field Dynamics of Variable Productivity Hawkes Processes2025-12-26T21:23:24ZMany self-exciting systems change because endogenous amplification, as opposed to exogenous forcing, varies. We study a Hawkes process with fixed background rate and kernel, but piecewise time-varying productivity. For exponential kernels we derive closed-form mean-field relaxation after a change and a deterministic surrogate for post-change Fisher information, revealing a boundary layer in which change time information localises and saturates, while post-change level information grows linearly beyond a short transient. These results motivate a Bayesian change point procedure that stabilizes inference on finite windows. We illustrate the method on invasive pneumococcal disease incidence in The Gambia, identifying a decline in productivity aligned with pneumococcal conjugate vaccine rollout.2025-12-23T05:43:55ZConor KresinBoris BaeumerSophie Phillipshttp://arxiv.org/abs/1006.5471v10Cognitive Constructivism and the Epistemic Significance of Sharp Statistical Hypotheses in Natural Sciences2025-12-24T09:07:31ZThis book presents our case in defense of a constructivist epistemological framework and the use of compatible statistical theory and inference tools. The basic metaphor of decision theory is the maximization of a gambler's expected fortune, according to his own subjective utility, prior beliefs an learned experiences. This metaphor has proven to be very useful, leading the development of Bayesian statistics since its XX-th century revival, rooted on the work of de Finetti, Savage and others. The basic metaphor presented in this text, as a foundation for cognitive constructivism, is that of an eigen-solution, and the verification of its objective epistemic status. The FBST - Full Bayesian Significance Test - is the cornerstone of a set of statistical tolls conceived to assess the epistemic value of such eigen-solutions, according to their four essential attributes, namely, sharpness, stability, separability and composability. We believe that this alternative perspective, complementary to the one ofered by decision theory, can provide powerful insights and make pertinent contributions in the context of scientific research.2010-06-28T21:15:07Z453 pagesJ. M. Sternhttp://arxiv.org/abs/2501.05584v3The Impact of Question Framing on the Performance of Automatic Occupation Coding2025-12-23T13:13:43ZOccupational data play a vital role in research, official statistics, and policymaking, yet their collection and accurate classification remain a challenge. This study investigates the effects of occupational question wording on data variability and the performance of automatic coding tools. We conducted and replicated a split-ballot survey experiment in Germany using two common occupational question formats: one focusing on "job title" (Berufsbezeichnung) and another on "berufliche Tätigkeit" (loosely translated as occupation or occupational task). Our analysis reveals that automatic coding tools, such as CASCOT and OccuCoDe, exhibit sensitivity to the form and origin of the data. Specifically, these tools were more efficient when coding responses to the job title question format than the occupational task format, suggesting a potential way to improve the respective questions for many German surveys. In a subsequent "detailed tasks and duties" question, providing a guiding example prompted respondents to give longer answers without broadening the range of unique words they used. These findings highlight the importance of harmonising survey questions and and ensuring that automatic coding tools are robust to differences in question wording. Further research is needed to optimise question design and coding tools for greater accuracy and applicability in occupational data collection.2025-01-09T21:27:43ZOlga KononykhinaFrauke KreuterMalte Schierholzhttp://arxiv.org/abs/2512.19740v1Asia Cup 2025: A Structured T20 Match-Level Dataset and Exploratory Analysis for Cricket Analytics2025-12-17T20:02:50ZThis paper presents a structured and comprehensive dataset corresponding to the 2025 Asia Cup T20 cricket tournament, designed to facilitate data-driven research in sports analytics. The dataset comprises records from all 19 matches of the tournament and includes 61 variables covering team scores, wickets, powerplay statistics, boundary counts, toss decisions, venues, and player-specific highlights. To demonstrate its analytical value, we conduct an exploratory data analysis focusing on team performance indicators, boundary distributions, and scoring patterns. The dataset is publicly released through Zenodo under a CC-BY 4.0 license to support reproducibility and further research in cricket analytics, predictive modeling, and strategic decision-making. This work contributes an open, machine-readable benchmark dataset for advancing cricket analytics research.2025-12-17T20:02:50ZDataset available via Zenodo:{https://doi.org/10.5281/zenodo.17228056}. Source code and analysis scripts are publicly available at : https://github.com/kousarraza/AsiaCup2025Kousar RazaFaizan Alihttp://arxiv.org/abs/2512.15507v1Change detection with adaptive sampling for binary responses2025-12-17T14:55:27ZWe propose using an adaptive sampling method to detect changes for a system with multiple lines. The adaptive sampling utilizes the information in responses to learn on which line is more likely to have a change thus allocating more units to the line. The learning process is formatted as a Markov decision process by integrating sampling information with likelihood ratio for changes to define rewards and the optimal sampling is approximated by using the Bellman operator iteratively based on the average reward criterion. We demonstrate the performance of the proposed method for binary responses using the exact distribution method for adaptive sampling. Our numeric results show that the adaptive sampling samples more often the line that has a change and the statistical power to detect a change is better than those with the equal randomization for sample sizes of 20 or higher. When sample sizes increase or the difference between out-of-control and in-control probabilities increases, the adaptive sampling allocates higher proportion of units averagely to the line with a change and the statistical power to detect a change increases.2025-12-17T14:55:27ZYanqing YiSu-Fen Yanghttp://arxiv.org/abs/2512.13354v1Data-driven inverse uncertainty quantification: application to the Chemical Vapor Deposition Reactor Modeling2025-12-15T14:07:11ZThis study presents a Bayesian framework for (inverse) uncertainty quantification and parameter estimation in a two-step Chemical Vapor Deposition coating process using production data. We develop an XGBoost surrogate model that maps reactor setup parameters to coating thickness measurements, enabling efficient Bayesian analysis while reducing sampling costs. The methodology handles a mixture of data including continuous, discrete integer, binary, and encoded categorical variables. We establish parameter prior distributions through Bayesian Model Selection and perform Inverse Uncertainty Quantification via weighted Approximate Bayesian Computation with summary statistics, providing robust parameter credible intervals while filtering measurement noise across multiple reactor locations. Furthermore, we employ clustering methods guided by geometry embeddings to focus analysis within homogeneous production groups. This integrated approach provides a validated tool for improving industrial process control under uncertainty.2025-12-15T14:07:11ZGeremy LoachamínEleni D. KoronakiDimitrios G. GiovanisMartin KathreinChristoph CzettlAndreas G. BoudouvisStéphane P. A. Bordashttp://arxiv.org/abs/2512.12579v1A Real Data-Driven, Robust Survival Analysis on Patients who Underwent Deep Brain Stimulation for Parkinson's Disease by Utilizing Parametric, Non-Parametric, and Semi-Parametric Approaches2025-12-14T07:09:02ZParkinson's Disease (PD) is a devastating neurodegenerative disorder that affects millions of people around the globe. Many researchers are continuously working to understand PD and develop treatments to improve the condition of PD patients, which affects their day-to-day lives. Since the last decades, the treatment, Deep Brain Stimulation (DBS) has given promising results for motor symptoms by improving the quality of daily living of PD patients. In the methodology of the present study, we have utilized sophisticated statistical approaches such as Nonparametric, Semi-parametric, and robust Parametric survival analysis to extract useful and important information about the long-term survival outcomes of the patients who underwent DBS for PD. Finally, we were able to conclude that the probabilistic behavior of the survival time of female patients is statistically different from that of male patients. Furthermore, we have identified that the probabilistic behavior of the survival times of Female patients is characterized by the 3-parameter Lognormal distribution, while that of Male patients is characterized by the 3-parameter Weibull distribution. More importantly, we have found that the Female patients have higher survival compared to the Male patients after conducting a robust parametric survival analysis. Using the semi-parametric COX-PH, we found that the initial implant of the right side leads to a high frequency of events occurring for the female patients with a bad prognostic factor, while for the male patients, a low events occurs with a good prognostic factor. Furthermore, we have found an interaction term between the number of revisions and the initial size of the implant, which increases the frequency of events occurring for the Male patients with a bad prognostic factor.2025-12-14T07:09:02ZMalinda IluppangamaDilmi AbeywardanaChris Tsokoshttp://arxiv.org/abs/2511.13384v4CBDC Stress Test in a Dual-Currency Setting2025-12-13T16:34:30ZThis study explores the potential impact of introducing a Central Bank Digital Currency (CBDC) on financial stability in an emerging dual-currency economy (Romania), where the domestic currency (RON) coexists with the euro. It develops an integrated analytical framework combining econometrics, machine learning, and behavioural modelling. CBDC adoption probabilities are estimated using XGBoost and logistic regression models trained on behavioural and macro-financial indicators rather than survey data. Liquidity stress simulations assess how banks would respond to deposit withdrawals resulting from CBDC adoption, while VAR, MSVAR, and SVAR models capture the macro-financial transmission of liquidity shocks into credit contraction and changes in monetary conditions. The findings indicate that CBDC uptake (co-circulating Digital RON and Digital EUR) would be moderate at issuance, amounting to around EUR 1 billion, primarily driven by digital readiness and trust in the central bank. The study concludes that a non-remunerated, capped CBDC, designed primarily as a means of payment rather than a store of value, can be introduced without compromising financial stability. In dual currency economies, differentiated holding limits for domestic and foreign digital currencies (e.g., Digital RON versus Digital Euro) are crucial to prevent uncontrolled euroisation and preserve monetary sovereignty. A prudent design with moderate caps, non remuneration, and macroprudential coordination can transform CBDC into a digital liquidity buffer and a complementary monetary policy instrument that enhances resilience and inclusion rather than destabilising the financial system.2025-11-17T13:55:02Z724 pages, including annexes; most figures and tables included; if not, then referencedCatalin Dumitrescuhttp://arxiv.org/abs/2512.11079v1Applying NLP to iMessages: Understanding Topic Avoidance, Responsiveness, and Sentiment2025-12-11T19:48:51ZWhat is your messaging data used for? While many users do not often think about the information companies can gather based off of their messaging platform of choice, it is nonetheless important to consider as society increasingly relies on short-form electronic communication. While most companies keep their data closely guarded, inaccessible to users or potential hackers, Apple has opened a door to their walled-garden ecosystem, providing iMessage users on Mac with one file storing all their messages and attached metadata. With knowledge of this locally stored file, the question now becomes: What can our data do for us? In the creation of our iMessage text message analyzer, we set out to answer five main research questions focusing on topic modeling, response times, reluctance scoring, and sentiment analysis. This paper uses our exploratory data to show how these questions can be answered using our analyzer and its potential in future studies on iMessage data.2025-12-11T19:48:51Z11 pages, 18 figures, https://github.com/Alanshnir/imessage-analyzer/blob/main/Research/NLP-iMessage-Analyzer%20Findings.pdfAlan GerberSam Coopermanhttp://arxiv.org/abs/2512.09316v2Group Cooperation Diverges onto Durable Low versus High Paths: Public Goods Experiments in 134 Honduran Villages2025-12-11T10:11:54ZWe performed large, lab-in-the-field experiment (2,591 participants across 134 Honduran villages; ten rounds) and tracked how contribution behavior unfolds in fixed, anonymous groups of size five. Contribution separates early into two durable paths, one low and one high, with rare convergence thereafter. High-path players can be identified with strong accuracy early on. Groups that begin with an early majority of above-norm contributors (about 60%) are very likely finish high. The empirical finding of a bifurcation, consistent with the theory, shows that early, high contributions by socially central people steer groups onto, and help keep them on, a high-cooperation path.2025-12-10T05:03:20ZThis is the initial version of the manuscript. The presentation of figures, tables, and analyses may be revised in future versions to better align with the requirements and scope of the target journalMarios PapamichalisNicholas ChristakisFeng Fu