http://arxiv.org/abs/2605.02403v1
Development and performance of npd for the evaluation of models with ordinal data
Updated: 2026-05-04T09:45:53Z
Introduction: Normalised prediction distribution errors (npde) are used to graphically and statistically evaluate continuous responses in non-linear mixed effect models. Here, our aim was to extend npde to categorical data and to evaluate their performance. We applied our approach to a real case study describing the evolution of severe onychomycosis (toenail infection) in a trial comparing two treatment groups.
Methods: Let V denote a dataset with categorical observations. The null hypothesis H0 is that the observations in V can be described by a model M. The residuals called npde can be adapted to categorical observations using jittering techniques, and their agreement with the theoretical standard normal distribution can be assessed with the Kolmogorov-Smirnov test. We evaluated the power of this approach through a simulation study and compared it with that of a Chi-square test. We illustrated the test and graphs on a real case study.
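A minimal sketch of the jittering step described in these Methods, in Python with NumPy and SciPy. It assumes that the candidate model M is available only through Monte Carlo replicates of each observation and that categories are coded as ordered integers. It computes npd only, without the decorrelation step of full npde. The function name `ordinal_npd`, the clipping rule and the toy four-category model are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy import stats


def ordinal_npd(y_obs, y_sim, rng):
    """Jittered normalised prediction discrepancies for ordinal data.

    y_obs : (n_obs,) observed categories, coded as ordered integers
    y_sim : (n_sim, n_obs) categories simulated under the candidate model M
    """
    n_sim = y_sim.shape[0]
    # Monte Carlo estimates of P(Y < y_i) and P(Y = y_i) under M
    p_below = (y_sim < y_obs).mean(axis=0)
    p_equal = (y_sim == y_obs).mean(axis=0)
    # Jittering: place each observation uniformly within its probability mass
    pd = p_below + rng.uniform(size=y_obs.shape) * p_equal
    # Keep pd away from 0 and 1 so the normal quantile stays finite
    eps = 1.0 / (2 * n_sim)
    return stats.norm.ppf(np.clip(pd, eps, 1 - eps))


rng = np.random.default_rng(1)
probs = np.array([0.4, 0.3, 0.2, 0.1])       # hypothetical model M over categories 0..3
n_obs, n_sim = 300, 1000
y_sim = rng.choice(4, size=(n_sim, n_obs), p=probs)
y_good = rng.choice(4, size=n_obs, p=probs)          # data generated by M
y_bad = rng.choice(4, size=n_obs, p=probs[::-1])     # data from a different model
for y in (y_good, y_bad):
    npd = ordinal_npd(y, y_sim, rng)
    counts = np.bincount(y, minlength=4)
    print(stats.kstest(npd, "norm").pvalue,
          stats.chisquare(counts, n_obs * probs).pvalue)
```

The marginal Chi-square test on category counts is only meaningful in this toy example because every observation shares the same predictive distribution. The npd route does not need that balance, which is the design flexibility the abstract emphasises.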
Results: npd were able to detect misspecifications in both the structural model and the model parameter values. As expected, the power to detect model misspecifications increased both with the difference in the shape of the probability and with the sample size. The Chi-square test performed better, but npd could be readily applied to all types of design. For the toenail data, the graphs revealed a large discrepancy for the base model and a good fit for the best model found.
Conclusions: npde can be extended to categorical data, particularly in clinical settings with unbalanced designs, and the graphs can be useful to evaluate both the model and the covariate effects.
Published: 2026-05-04T09:45:53Z
Comment: Supplementary material is available in a PDF file
Authors: Marc Cerou, Marylore Chenel, Emmanuelle Comets

http://arxiv.org/abs/2605.02338v1
Evaluation of the npde performance for the evaluation of joint model with longitudinal and TTE data: an application in metastatic hormono-resistant prostate cancer
Updated: 2026-05-04T08:42:25Z
Introduction: Joint models are increasingly used in clinical trials. An important part of model building is to properly assess the descriptive and predictive ability of these models. Normalised prediction discrepancies (npd) and normalised prediction distribution errors (npde) have been developed to evaluate non-linear mixed effect models for continuous responses graphically and statistically. In this work, we propose to use a combined test to evaluate joint models.
Methods: Prediction discrepancies (pd) are defined as the quantile of the observation within its predictive distribution and are obtained by Monte Carlo simulation. The pd for unobserved (censored) event times are imputed from a uniform distribution based on the model-predicted probability of censoring, using a method similar to the one developed to handle data below the lower limit of quantification (LOQ). We propose to combine the p-values of the tests on longitudinal data and on time-to-event (TTE) data, adjusted with a Bonferroni correction. We performed simulation studies based on a joint model characterising the relationship between the prostate-specific antigen (PSA) biomarker and survival in prostate cancer patients, to evaluate the type I error and power of npd/npde to detect different types of model misspecification.
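The two ingredients described in these Methods can be sketched as follows: imputing pd for censored event times and combining p-values with a Bonferroni correction. The sketch rests on these assumptions. The model is represented only by simulated event times. A censored record receives a pd drawn uniformly between the model-predicted probability of an event before its censoring time and 1. The longitudinal p-value is a hypothetical placeholder. Function names are made up for the example.

```python
import numpy as np
from scipy import stats


def tte_pd(t_obs, event, t_sim, rng):
    """Prediction discrepancies for right-censored time-to-event data.

    t_obs : (n,) event or censoring times
    event : (n,) True if the event was observed, False if censored
    t_sim : (n_sim, n) event times simulated under the model (uncensored)
    """
    n_sim = t_sim.shape[0]
    # Model-predicted probability of an event before each observed/censoring time
    f = (t_sim < t_obs).mean(axis=0)
    u = rng.uniform(size=t_obs.shape)
    # Observed event: pd is the predictive quantile of t_obs.
    # Censored: the event lies beyond t_obs, so pd is imputed as U(f, 1).
    pd = np.where(event, f, f + u * (1.0 - f))
    eps = 1.0 / (2 * n_sim)
    return np.clip(pd, eps, 1 - eps)


def combined_pvalue(p_values):
    """Bonferroni combination of k tests: k * (smallest p-value), capped at 1."""
    p = np.asarray(p_values, dtype=float)
    return min(1.0, p.size * p.min())


rng = np.random.default_rng(2)
n, n_sim, c = 300, 2000, 15.0
t_sim = rng.exponential(scale=10.0, size=(n_sim, n))   # model: hazard 0.1
t_true = rng.exponential(scale=1 / 0.3, size=n)        # data: hazard 0.3 (misspecified)
t_obs = np.minimum(t_true, c)                          # administrative censoring at c
event = t_true <= c
p_tte = stats.kstest(stats.norm.ppf(tte_pd(t_obs, event, t_sim, rng)), "norm").pvalue
p_long = 0.30   # hypothetical p-value from the longitudinal (PSA) npde test
print(p_tte, combined_pvalue([p_long, p_tte]))
```

The censored branch mirrors the LOQ approach the abstract mentions, but on the opposite tail. Below-LOQ data get a pd spread over the lower tail of their predictive distribution, while a right-censored time gets one spread over the upper tail.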
Results: For all types of misspecification, the type I error of the combined test was close to the expected 5%. The power of the combined test to detect model misspecifications increased with the difference from the true model and, as expected, with sample size. Graphically, the increase in power can be related to larger differences in the shape of the survival function or of the PSA evolution.
Conclusions: npd can be readily extended to event data by imputing the pd for censored events under the model. The test showed an adequate type I error and was quite sensitive to the alternative models tested.
Published: 2026-05-04T08:42:25Z
Comment: Supplementary material in additional PDF file
Authors: Marc Cerou, Jimmy Mullaert, Marc Lavielle, Sophie Peigné, Marylore Chenel, Emmanuelle Comets

http://arxiv.org/abs/2605.01615v1
Threshold Exceedance Estimation in Spatially Correlated Areal Data Using Maxima-Nominated Sampling
Updated: 2026-05-02T21:33:38Z
We study estimation of the proportion of areal units in a spatially correlated domain whose success probabilities exceed a prespecified threshold. Such problems arise in health surveillance, environmental monitoring, and social policy, where the goal is to estimate the fraction of high-risk areas. We propose a DUST-MNS design that combines maxima-nominated sampling (MNS) with the probability-proportional-to-size dependent unit sequential technique (pps-DUST), thereby promoting spatial spread while mitigating the effect of spatial autocorrelation. The design forms $n$ candidate sets of size $k$ and obtains final measurements only from the area judged to be at highest risk in each set, yielding $n$ measured areas from $nk$ screened candidates. Ranking may be based on expert judgment, prior surveys, or easily obtained auxiliary covariates. We derive a closed-form estimator of the exceedance probability $\theta$ based on data from the DUST-MNS design, establish its bias and variance, and show that, in the rare-to-moderate exceedance regime $\theta<\theta^\star(k)$, the proposed DUST-MNS estimator outperforms its SRS and DUST-SRS counterparts, where $\theta^\star(k)$ depends only on $k$. We also provide guidance on the choice of $k$, derive efficiency bounds under a Beta model, extend the method to imperfect ranking, and develop variance estimation and bootstrap confidence intervals.
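Why maxima nomination helps for rare exceedances can be seen in a deliberately simplified setting. The candidates are independent, with no spatial correlation and no pps-DUST step. Ranking on the true success probability is perfect. The final measurement reveals whether the selected area exceeds the threshold. Under these assumptions the selected area exceeds the threshold exactly when at least one of its $k$ candidates does, so $P(\text{exceed}) = 1-(1-\theta)^k$, which can be inverted for $\theta$. This is not the paper's DUST-MNS estimator, and the Beta(2, 8) risks and threshold are hypothetical.

```python
import numpy as np
from scipy import stats


def mns_exceedance_estimate(exceed_flags, k):
    """Invert P(top-ranked unit exceeds) = 1 - (1 - theta)^k.

    Valid only under perfect ranking and independent candidates: the top-ranked
    unit of a set exceeds the threshold exactly when at least one of its k
    candidates does.
    """
    p_hat = np.mean(exceed_flags)
    return 1.0 - (1.0 - p_hat) ** (1.0 / k)


rng = np.random.default_rng(3)
a, b, c = 2.0, 8.0, 0.4            # hypothetical Beta(2, 8) risks and threshold
theta = stats.beta.sf(c, a, b)     # true exceedance proportion (about 0.071)
n, k = 20000, 3
risks = rng.beta(a, b, size=(n, k))          # n candidate sets of size k
top = risks.max(axis=1)                      # perfect ranking picks the true maximum
theta_hat = mns_exceedance_estimate(top > c, k)
print(theta, theta_hat)
```

For these hypothetical values, a delta-method calculation gives a variance of about $0.024/n$ for this estimator, compared with about $0.066/n$ for the sample proportion from $n$ simply randomly sampled areas. This is in line with the rare-exceedance advantage the abstract describes. Imperfect ranking breaks the inversion formula, which is why the paper treats it separately.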
An application to county-level stroke prevalence data from CDC PLACES, using diabetes prevalence as the ranking concomitant, illustrates the proposed approach.
Published: 2026-05-02T21:33:38Z
Comment: 26 pages, 4 figures, 6 tables
Authors: Mohammad Jafari Jozani

http://arxiv.org/abs/2605.01571v1
Functional Liu Regression for Scalar-on-Functional Models in High-Dimensional Settings
Updated: 2026-05-02T18:44:24Z
This study develops a functional Liu-type shrinkage estimator (fLiu) for scalar-on-function regression in the presence of strong multicollinearity and high-dimensional functional predictors. The approach extends the classical Liu estimator to the functional setting by combining directional shrinkage with smoothness regularization, providing flexible control over the bias-variance trade-off. Theoretical analysis is used to examine the behavior of the estimator and the associated parameter selection problem. In particular, an explicit mean squared error (MSE) decomposition is derived, characterizing the risk of the estimator in terms of variance reduction and shrinkage bias. This further yields an explicit optimal choice of the shrinkage parameter of the fLiu estimator through a one-dimensional convex risk minimization problem, leading to a practical plug-in tuning rule. Moreover, it is shown that in high-dimensional (underdetermined) settings, commonly used criteria such as GCV (and, equivalently, PRESS/LOO-CV) become constant with respect to the shrinkage parameter d and are thus uninformative for tuning. This provides a theoretical explanation for the predominant focus on the overdetermined regime in existing Liu-type methods. Numerical results demonstrate that the estimator achieves competitive predictive accuracy relative to existing methods.
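For reference, the classical (non-functional) Liu estimator that fLiu extends can be written as $\hat\beta_d = (X^\top X + I)^{-1}(X^\top y + d\,\hat\beta)$, where $\hat\beta$ is a least-squares fit and $d$ is the shrinkage parameter. The sketch below discretises functional predictors on a grid. It uses the minimum-norm least-squares solution so that the formula stays defined when $p > n$. It omits the smoothness regularization that fLiu adds, and the simulated curves are hypothetical.

```python
import numpy as np


def liu_estimator(X, y, d):
    """Classical Liu estimator: (X'X + I)^{-1} (X'y + d * beta_ls).

    beta_ls is the minimum-norm least-squares solution (via the pseudo-inverse),
    which keeps the formula defined in the underdetermined case p > n.
    """
    p = X.shape[1]
    beta_ls = np.linalg.pinv(X) @ y
    return np.linalg.solve(X.T @ X + np.eye(p), X.T @ y + d * beta_ls)


rng = np.random.default_rng(4)
n, m = 40, 100                                  # 40 curves on a 100-point grid (p > n)
t = np.linspace(0, 1, m)
X = np.cumsum(rng.normal(size=(n, m)), axis=1) / np.sqrt(m)   # Brownian-like curves
beta_true = np.sin(2 * np.pi * t)
y = X @ beta_true / m + 0.1 * rng.normal(size=n)              # Riemann-sum integral
for d in (0.0, 0.5, 1.0):
    print(d, np.linalg.norm(liu_estimator(X, y, d)))
```

Setting $d = 1$ recovers the least-squares fit, and $d = 0$ gives ridge regression with penalty 1. In the eigenbasis of $X^\top X$, each component of the least-squares fit is scaled by $(\lambda_j + d)/(\lambda_j + 1)$, so the coefficient norm grows with $d$.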
Implementation is carried out in R using the fda package, and in Python via the fLiu.py package developed for this study.
Published: 2026-05-02T18:44:24Z
Authors: Shaista Ashraf, Stephen Becker, Farrukh Javed, Ismail Shah

http://arxiv.org/abs/2604.22791v2
R Package iglm: Regression under Interference in Connected Populations
Updated: 2026-05-02T15:25:47Z
We introduce the R package iglm, which implements a comprehensive framework for studying relationships among predictors and outcomes under interference. The implemented regression framework facilitates the study of spillover and other phenomena in connected populations and has important advantages over existing packages, among them scalability and provable theoretical guarantees. On the computational side, the regression framework relies on scalable methods that can be applied to small and large data sets by solving a convex optimization program based on pseudo-likelihoods using Minorization-Maximization and Quasi-Newton algorithms. On the statistical side, the regression framework comes with provable theoretical guarantees. To increase the versatility of iglm, users can add custom-built model terms. We showcase iglm using two data sets: hate speech on the social media platform X and communications among students.
Published: 2026-04-13T10:32:57Z
Authors: Cornelius Fritz, Michael Schweinberger

http://arxiv.org/abs/2605.00750v1
Quenched Amplification and Tail Shaping in Networked Systems with Memory and Regime Switching
Updated: 2026-05-01T16:01:12Z
Networked systems operating under intermittent adverse conditions and long memory can remain stable on average while exhibiting rare but extreme trajectory-level excursions. We study linear regime-switching network dynamics with Volterra-type memory, formulated through a finite-dimensional lifted ordinary differential equation embedding. Despite finite-horizon annealed boundedness, we show that quenched amplification emerges generically from the interaction of regime persistence, memory accumulation, and non-normal lifted operator geometry.
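The finite-dimensional lifting mentioned here can be illustrated in the simplest scalar case, without regime switching, together with the Euclidean logarithmic norm that this abstract proposes as an early-warning indicator. If the kernel is a sum of decaying exponentials, $K(t)=\sum_j c_j e^{-\lambda_j t}$, the Volterra equation becomes a linear ODE with one auxiliary state per exponential. The parameter values are hypothetical, and this is not the paper's network model.

```python
import numpy as np
from scipy.linalg import expm


def lifted_operator(a, b, c, lam):
    """Lift x'(t) = -a x + b * int_0^t K(t-s) x(s) ds, K(t) = sum_j c_j exp(-lam_j t).

    State is (x, z_1, ..., z_J) with z_j(t) = int_0^t exp(-lam_j (t-s)) x(s) ds,
    so z_j' = x - lam_j z_j and x' = -a x + b * sum_j c_j z_j.
    """
    c, lam = np.asarray(c, float), np.asarray(lam, float)
    J = c.size
    A = np.zeros((J + 1, J + 1))
    A[0, 0] = -a
    A[0, 1:] = b * c
    A[1:, 0] = 1.0
    A[1:, 1:] = -np.diag(lam)
    return A


def log_norm_2(A):
    """Euclidean logarithmic norm: largest eigenvalue of (A + A^T) / 2."""
    return np.linalg.eigvalsh((A + A.T) / 2).max()


A = lifted_operator(a=1.0, b=1.0, c=[0.08, 0.02], lam=[0.2, 1.0])
mu = log_norm_2(A)
abscissa = np.linalg.eigvals(A).real.max()
# ||exp(tA)||_2 <= exp(t * mu) always holds; a positive mu with a negative
# spectral abscissa means the flow is stable yet amplifies some initial states.
peak = max(np.linalg.norm(expm(t * A), 2) for t in np.linspace(0, 20, 201))
print(f"spectral abscissa {abscissa:.3f}, log norm {mu:.3f}, peak gain {peak:.3f}")
```

With these values the lifted matrix is asymptotically stable, with a negative spectral abscissa, yet its logarithmic norm is positive, so some initial states grow transiently. This is a small instance of the non-normal geometry the abstract refers to.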
A lower bound on burst-size distributions reveals power-law tails whose exponent is determined by the ratio between unfavorable dwell-time rates and an operator-defined instantaneous growth parameter. This parameter is computable online via the Euclidean logarithmic norm of the lifted operator, yielding a practical early-warning indicator. Building on this structure, we introduce a dynamic data-driven intervention strategy that enforces contraction on demand along rare amplification channels, thereby shaping or truncating tail risk without altering exogenous regime statistics or typical system behavior. The results provide a geometrically grounded and operationally actionable framework for understanding and mitigating extreme events in memory-driven regime-switching systems.
Published: 2026-05-01T16:01:12Z
Authors: Mauricio Herrera-Marín

http://arxiv.org/abs/2605.00729v1
Intermittency induced by long memory under stochastic regime switching
Updated: 2026-05-01T15:28:50Z
We study a fundamental instability mechanism in nonlinear, nonlocal dynamical systems arising from the interaction of long-range memory and stochastic regime switching. The dynamics are governed by network-coupled, operator-valued Volterra evolutions with completely monotone memory kernels whose excitation operators and kernel parameters are modulated by an ergodic finite-state continuous-time Markov chain. We formalize a sharp separation between annealed stability (in expectation) and quenched behaviour (along typical sample paths). On the annealed side, we identify an averaged memory gain that yields uniform moment bounds and a memory-adapted Lyapunov functional implying mean-square control under an averaged subcriticality condition. On the quenched side, we show that rare but persistent excursions into supercritical regimes are amplified by memory, producing intermittent macroscopic bursts with heavy-tailed statistics and a deterministic almost sure growth exponent obtained via a subadditive ergodic argument.
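The idea of a deterministic almost sure growth exponent can be illustrated with a toy switched linear system. A two-state continuous-time Markov chain selects between a subcritical and a supercritical memory gain in a lifted exponential-kernel model. The exponent is then estimated along sample paths by renormalised products, the standard numerical route to a limit whose existence follows from the subadditive ergodic theorem. Everything here (two regimes, a single exponential kernel, the gains and switching rates) is a hypothetical simplification. It does not reproduce the paper's operator-valued model, its bursts, or its annealed analysis.

```python
import numpy as np
from scipy.linalg import expm

a, lam = 1.0, 0.5                 # hypothetical decay rate and exponential-kernel rate
gains = [0.2, 1.5]                # regime 0 subcritical, regime 1 supercritical (g > a * lam)
# Lifted state (x, z) with z = int exp(-lam (t - s)) x(s) ds, one matrix per regime
A = [np.array([[-a, g], [1.0, -lam]]) for g in gains]
Q = np.array([[-0.2, 0.2],        # generator of the regime chain:
              [1.0, -1.0]])       # rare, short supercritical episodes


def lyapunov_estimate(T, rng):
    """Finite-time estimate of lim (1/t) log ||x(t)|| along one regime path."""
    x = np.array([1.0, 0.0])
    s, t, log_growth = 0, 0.0, 0.0
    while t < T:
        rate = -Q[s, s]
        tau = min(rng.exponential(1.0 / rate), T - t)   # exponential dwell time
        x = expm(tau * A[s]) @ x                        # exact flow within regime s
        nrm = np.linalg.norm(x)
        log_growth += np.log(nrm)                       # accumulate growth ...
        x = x / nrm                                     # ... and renormalise
        t += tau
        jump = np.clip(Q[s], 0.0, None) / rate          # probabilities of leaving s
        s = rng.choice(len(A), p=jump)
    return log_growth / T


rng = np.random.default_rng(5)
estimates = [lyapunov_estimate(500.0, rng) for _ in range(5)]
print(np.round(estimates, 3))
```

Independent paths should give estimates that agree up to finite-time fluctuations. In practice, that is what a deterministic almost sure exponent means.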
This establishes an annealed-quenched dichotomy specific to non-Markovian switching systems, where stability in expectation can coexist with pathwise growth and metastable burst phases. We further derive a micro-macro correspondence by proving that a population of regime-modulated self-exciting point processes converges, both annealed and quenched, to the random-coefficient Volterra limit, transferring the burst mechanism from microscopic branching dynamics to macroscopic long-memory flows. Numerical experiments illustrate how burst localization depends on graph geometry and on noncommuting excitation operators.
Published: 2026-05-01T15:28:50Z
Authors: Mauricio Herrera-Marín

http://arxiv.org/abs/2504.15290v2
Parental Imprints On Birth Weight: A Data-Driven Model For Neonatal Prediction In Low Resource Prenatal Care
Updated: 2026-04-30T14:19:19Z
Accurate fetal birth weight prediction is a cornerstone of prenatal care, yet traditional methods often rely on imaging technologies that remain inaccessible in resource-limited settings. This study presents a novel machine learning-based framework that circumvents these conventional dependencies, using a diverse set of physiological, environmental, and parental factors to refine birth weight estimation. A multi-stage feature selection pipeline filters the dataset into an optimized subset, revealing previously underexplored yet clinically relevant predictors of fetal growth. By integrating advanced regression architectures and ensemble learning strategies, the model captures non-linear relationships often overlooked by traditional approaches, offering a predictive solution that is both interpretable and scalable. Beyond predictive accuracy, this study addresses the question of whether birth weight can be reliably estimated without conventional diagnostic tools. The findings challenge entrenched methodologies by introducing an alternative pathway that enhances accessibility without compromising clinical utility.
While limitations exist, the study lays the foundation for a new era in prenatal analytics, one where data-driven inference competes with, and potentially redefines, established medical assessments. By bridging computational intelligence with obstetric science, this research establishes a framework for equitable, technology-driven advancements in maternal-fetal healthcare.
Published: 2025-04-07T08:15:39Z
Comment: Withdrawn due to identified issues in manuscript originality and overlap in some Sections requiring substantial revision and restructuring of the text and methodology. A corrected and improved version will be submitted
Authors: Rajeshwari Mistri, Harsh Joshi, Nachiket Kapure, Parul Kumari, Manasi Mali, Seema Purohit, Neha Sharma, Mrityunjoy Panday, Chittaranjan S. Yajnik

http://arxiv.org/abs/2604.27732v1
A Note on the Generalized Cape Cod Reserving Method
Updated: 2026-04-30T11:23:31Z
Claims reserving is one of the most important actuarial tasks in non-life insurance modeling. There are several popular methods to perform claims reserving, such as the chain-ladder (CL), the Bornhuetter-Ferguson (BF) or the generalized Cape Cod (GCC) methods. These methods were originally introduced as deterministic algorithms, and only later were they lifted to stochastic models that allow claims prediction uncertainty to be analyzed. This holds true for the CL and the BF methods, but not for the GCC method. The purpose of this article is to close this gap and derive an analytical formula for the mean squared error of prediction (MSEP) of the GCC method.
Published: 2026-04-30T11:23:31Z
Authors: Ronald Richman, Mario V. Wüthrich

http://arxiv.org/abs/2506.23040v5
Treatment, evidence, imitation, and chat
Updated: 2026-04-28T19:11:23Z
Large language models are thought to have the potential to aid in medical decision making. This work investigates the degree to which this might be the case. We start with the treatment problem, the patient's core medical decision-making task, which is solved in collaboration with a clinician.
We discuss different approaches to solving it, including, within evidence-based medicine, experimental and observational data. We then discuss the chat problem and how it differs from the treatment problem, in particular with respect to imitation (and how imitation alone cannot solve the true treatment problem, although this does not mean it is not useful). We then discuss how a large-language-model-based system might be trained to solve the treatment problem, highlighting that the major challenges relate to the ethics of experimentation and the assumptions associated with observation. We finally discuss how these challenges relate to evidence-based medicine and how this might inform the efforts of the medical research community to solve the treatment problem. Throughout, we illustrate our arguments with statins, a class of cholesterol-lowering medications.
Published: 2025-06-29T00:23:06Z
Comment: 12 pages
Authors: Samuel J. Weisenthal

http://arxiv.org/abs/2604.25402v1
Sudoku Solving and Finding Magic Squares by Probability Models and Markov Chains
Updated: 2026-04-28T09:15:04Z
Sudoku puzzles have a long history, with variations going back more than a hundred years, but their current and perhaps surprising worldwide prominence dates from certain initiatives, and then puzzle-generating computer programmes, from just after 2000. To solve a sudoku puzzle, a statistician can set up a probability model on the enormous space of $9\times9$ matrix possibilities, constructed to favour 'good attempts', and then engineer a Markov chain to sample a long enough chain of sudoku table realisations from that model, until the solution is found. The methods also work for other types of puzzles, like constructing 'magic squares' with wished-for properties (sums of rows, columns, diagonals equal, etc.), as is also illustrated in this article; via magic models and equally magic Markov chains I find impressively magic $8\times8$ and $10\times10$ squares.
Published: 2026-04-28T09:15:04Z
Comment: 11 pages, 5 figures. Statistical Research Report, Department of Mathematics, University of Oslo; will be submitted for publication
Authors: Nils Lid Hjort

http://arxiv.org/abs/2508.09079v2
Exploring the Shape of Economics: A Multilayer Network Analysis of Social Communities and Intellectual Similarity Among Journals Before and After the 2008 Financial Crisis
Updated: 2026-04-27T13:07:04Z
This paper develops a multilayer network approach for exploring the evolution of scientific disciplines, using the case of economics before and after the 2008 global financial crisis as a large-scale empirical testing ground. The units of analysis are journals, linked by social and intellectual relationships. The analysis covers all journals indexed in EconLit across three years (2006, 2012 and 2019). In the most recent year (2019), the dataset includes 909 journals, over 30,000 editorial board members, more than 260,000 authors, 134,000 articles, and nearly 2 million cited references. For each period, we model journals as connected in a four-layer multiplex network: the social relationships are based on shared editors (interlocking editorship) and shared authors (interlocking authorship), while the intellectual ones are based on shared references (bibliographic coupling) and textual similarity between articles. These four layers are integrated using Similarity Network Fusion to produce unified similarity networks from which journal communities are identified. Comparing the field across the three periods reveals a high degree of structural continuity. Although research topics changed after the crisis, the fundamental social and intellectual relationships among journals remained remarkably stable. A major result of the analysis is that editorial networks play the dominant role in shaping hierarchies and legitimizing knowledge production within the discipline.
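Similarity Network Fusion, used here to merge the four journal layers, repeatedly updates each layer's global similarity matrix by propagating the other layers' similarities through its own k-nearest-neighbour graph. Below is a compact NumPy sketch of this usual cross-diffusion update. It assumes that each layer is already a symmetric, non-negative affinity matrix over the same journals. The neighbourhood size, iteration count and toy two-community data are hypothetical choices, not the authors' settings.

```python
import numpy as np


def full_kernel(W):
    """Row-normalised affinity with 1/2 on the diagonal, then symmetrised."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    P = W / (2.0 * W.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return (P + P.T) / 2


def knn_kernel(W, k):
    """Keep each node's k strongest neighbours, then row-normalise."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        nbrs = np.argsort(W[i])[-k:]
        S[i, nbrs] = W[i, nbrs]
    return S / S.sum(axis=1, keepdims=True)


def snf(affinities, k=3, n_iter=20):
    """Cross-diffusion: P_v <- S_v (mean of the other layers' P) S_v^T, then average."""
    P = [full_kernel(W) for W in affinities]
    S = [knn_kernel(W, k) for W in affinities]
    m = len(P)
    for _ in range(n_iter):
        # Every layer is updated from the previous iterate (simultaneous update)
        P = [S[v] @ (sum(P[u] for u in range(m) if u != v) / (m - 1)) @ S[v].T
             for v in range(m)]
        P = [(Pv + Pv.T) / 2 for Pv in P]
    return sum(P) / m


rng = np.random.default_rng(6)
groups = np.repeat([0, 1], 4)                      # 8 toy journals, 2 communities
same = groups[:, None] == groups[None, :]
layers = []
for _ in range(4):                                 # stand-ins for the four layers
    W = np.where(same, 0.8, 0.1) + 0.2 * rng.random((8, 8))
    layers.append((W + W.T) / 2)
fused = snf(layers, k=3)
print(fused[same].mean(), fused[~same].mean())
```

Community detection, as in the abstract, would then run on the fused matrix, for example with spectral clustering.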
Whether this finding holds in other scientific disciplines remains an open question for future research.
Published: 2025-08-12T16:58:23Z
Comment: 66 pages, 3 figures, 7 tables
Authors: Alberto Baccini, Lucio Barabesi, Carlo Debernardi

http://arxiv.org/abs/2604.23797v1
From Random Fringes to Deterministic Response: Statistical Foundations of Time-Reversed Young Interferometry
Updated: 2026-04-26T16:40:59Z
Young interference is usually read as the gradual statistical accumulation of random detection events. Here we show that a time-reversed Young (TRY) geometry has a different statistical character: the fringe is not a marginal distribution of detector positions, but a conditional response indexed by a programmed source coordinate. With a fixed detector and a scanned source basis, the observable is an operational hybrid correlator between detector signal and source label. The resulting interference is deterministic at the response-function level, while noise enters only through estimation precision. We formulate this distinction using Fisher information, estimator variance, and noise scaling, clarifying why TRY naturally supports calibration, lock-in readout, null-fringe sensing, and source-plane superresolution.
Published: 2026-04-26T16:40:59Z
Authors: Jianming Wen

http://arxiv.org/abs/2604.23744v1
How temperature regimes near the equinox synchronize spring biological events
Updated: 2026-04-26T14:47:28Z
Many biological processes, including plant leafout and flowering, occur once cumulative temperatures reach a threshold (the thermal-sum model). In this way, temperatures are thought to coordinate the timing of biological events. But growing evidence suggests that, as climates warm, the advancement of spring has slowed (declining sensitivity) and the variance in the timing of spring events has increased (declining synchrony), raising questions about the resilience of temperature-based coordination to anthropogenic climate change.
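The thermal-sum model introduced here can be simulated directly as a stopped random walk. Daily forcing accumulates until it first reaches a threshold, and the stopping day is the event date. The sketch below rests on hypothetical assumptions: a cosine seasonal cycle, Gaussian daily noise, growing-degree-day forcing above a 5 °C base, and an arbitrary threshold. It sets up the model but does not reproduce the authors' analysis or data.

```python
import numpy as np


def event_days(warming, threshold=150.0, base=5.0, annual_mean=10.0,
               amplitude=12.0, sd=4.0, n_years=2000, seed=0):
    """Day of year on which cumulative forcing first reaches `threshold`.

    Each simulated year is a random walk of non-negative daily increments
    (growing degree days above `base`); the event is its stopping time.
    """
    rng = np.random.default_rng(seed)      # same seed => same daily anomalies
    day = np.arange(1, 366)
    # Hypothetical seasonal cycle, coldest in mid-January, shifted by `warming`
    mean_temp = annual_mean - amplitude * np.cos(2 * np.pi * (day - 15) / 365) + warming
    temps = mean_temp + sd * rng.standard_normal((n_years, day.size))
    forcing = np.maximum(temps - base, 0.0)
    reached = np.cumsum(forcing, axis=1) >= threshold
    if not reached[:, -1].all():
        raise ValueError("threshold not reached within the year")
    return reached.argmax(axis=1) + 1      # first True column, as a 1-based day


for w in (0.0, 2.0, 4.0):
    d = event_days(w)
    print(f"warming {w:+.0f} C: mean day {d.mean():.1f}, sd {d.std():.1f}")
```

Because the same seed is reused, each warmer scenario sees the same daily anomalies shifted upward, so event dates can only move earlier. The printed standard deviations show how the spread of event dates changes as events move earlier in the year.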
To answer these questions, researchers have complicated the thermal-sum model, introducing additional factors and mechanisms. We consider whether such complexity is necessary. Using results from the theory of stopped random walks, we show that sensitivity and synchrony are exactly as predicted by the basic thermal-sum model. The theory suggests a nonlinear relationship between temperatures and both the timing and synchrony of biological events. In particular, it predicts that as temperatures increase and springtime events shift from the equinox toward the solstice, the events themselves become less coordinated and more variable. We verify these predictions using experimental and real-world data, including 10,000 observations of common lilacs (United States, 1956-2025). We conclude that the theory provides a powerful tool for understanding the thermal-sum model, particularly when considering additional complexity.
Published: 2026-04-26T14:47:28Z
Authors: Jonathan Auerbach, Andrew Gelman, E. M. Wolkovich

http://arxiv.org/abs/2604.22998v1
Perceptions and Utilization of GenAI Tools among Data Science Students and Faculty
Updated: 2026-04-24T20:32:14Z
This study investigates perceptions and use of generative artificial intelligence (GenAI) tools among students and faculty in statistics and data science at a historically Black college or university. Survey data from 119 valid student responses and 14 faculty responses were used to examine familiarity, usage patterns, perceived benefits, awareness of limitations, and instructional support needs. Students reported substantial use of GenAI, with ChatGPT as the dominant tool, primarily for coding assistance and writing support. Although student perceptions of AI in data science workflows and careers were generally positive, confidence in interpreting AI-generated outputs was limited, and concerns about accuracy, reliability, and over-reliance were common.
Faculty also viewed GenAI favorably, but self-rated proficiency and the frequency of classroom integration remained limited. Comparisons across student subgroups suggested that familiarity with GenAI and awareness of its limitations varied more by academic level than by gender. These findings highlight a gap between AI adoption and AI literacy and underscore the need for structured training, validation practices, and clearer institutional guidance for responsible AI integration in data science education.
Published: 2026-04-24T20:32:14Z
Authors: Abeer M. Hasan, Sayed A. Mostafa