http://arxiv.org/abs/2605.19401v1 External Demand, Domestic Monetary Conditions, and Remittance Dynamics in Nepal 2026-05-19T05:54:31Z

This study investigates the macroeconomic determinants and dynamic behaviour of personal remittances as a share of Gross Domestic Product (GDP) in Nepal, emphasizing external demand in major destination countries and domestic monetary policy. Using annual data (1993-2024), we construct composite indices via Principal Component Analysis (PCA) for multi-country external demand and a domestic Monetary Conditions Index (MCI). Our small-sample econometric pipeline includes Autoregressive Distributed Lag (ARDL) bounds testing, Engle-Granger cointegration, Dynamic OLS (DOLS), and a two-step Error Correction Model (ECM). We also employ Granger causality tests and multi-model forecasting using machine learning and ECM scenarios. The analysis reveals a strong positive long-run effect of external demand on remittances and a significant negative impact of tighter domestic monetary conditions. The ECM confirms a stable cointegrating relationship, correcting approximately 26% of disequilibria annually. Medium-term projections indicate remittances will remain structurally important, reaching around 28.3% of GDP by 2030 under baseline conditions, while exhibiting high sensitivity to external demand shocks. This study advances the literature by integrating PCA-derived external demand and monetary conditions indices within a unified ARDL-ECM framework for small samples. Focusing on one of the world's most remittance-dependent economies, it offers actionable insights for monetary policy calibration, migration diversification, and the productive utilization of remittance inflows.
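
To make the reported adjustment speed concrete, the sketch below converts an error-correction rate into a half-life. It assumes the deviation from the long-run relationship decays geometrically at the stated rate and ignores short-run dynamics; the function names are illustrative, and only the 0.26 figure comes from the abstract.

```python
import math


def ecm_half_life(adjustment_speed: float) -> float:
    """Periods needed to close half of a deviation from long-run equilibrium.

    adjustment_speed is the share of last period's disequilibrium corrected
    each period (the absolute value of the error-correction coefficient).
    """
    if not 0.0 < adjustment_speed < 1.0:
        raise ValueError("adjustment_speed must lie strictly between 0 and 1")
    # After t periods a share (1 - adjustment_speed) ** t of the gap remains,
    # so the half-life solves (1 - adjustment_speed) ** t = 0.5.
    return math.log(0.5) / math.log(1.0 - adjustment_speed)


def remaining_gap(adjustment_speed: float, periods: int) -> float:
    """Share of an initial deviation still present after `periods` periods."""
    return (1.0 - adjustment_speed) ** periods


# 0.26 is the approximate annual correction reported in the abstract.
print(round(ecm_half_life(0.26), 2))     # years needed to close half of a shock
print(round(remaining_gap(0.26, 5), 3))  # share of a shock left after five years
```

Under this simplification, a correction of roughly 26% per year implies that about half of a shock is absorbed in a little over two years.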

2026-05-19T05:54:31Z 16 pages, 1 figure, 7 tables Sahaj Raj Malla http://arxiv.org/abs/2605.19370v1 A General Statistical Framework for Hardy-Weinberg Equilibrium Inference on the X Chromosome 2026-05-19T05:07:23Z

Testing for Hardy-Weinberg equilibrium (HWE) is a fundamental component of genetic data analysis, widely used for quality control and model validation. Although HWE testing is well established for autosomal loci, inference on the X chromosome is more complex due to sex-specific genotype structures and potential sex differences in minor allele frequency (sdMAF). Existing tests differ in their assumptions about sdMAF and male sample inclusion, often leading to distinct but poorly characterized null hypotheses. We develop a general statistical framework for HWE inference using the robust allele-based regression model. By formulating HWE testing as an assessment of allele-level dependence, the framework directly parameterizes Hardy-Weinberg disequilibrium, unifies existing Pearson chi-square-based tests under explicit modeling assumptions, and clarifies their null hypotheses, degrees of freedom, and sensitivity to sdMAF. The framework also accommodates covariate and population-structure adjustment within a unified regression-based formulation. The proposed framework provides robust, interpretable, and flexible inference, establishing a unified statistical foundation for HWE testing across autosomal and X-chromosomal regions. Simulation studies and analysis of high-coverage 1000 Genomes Project data demonstrate that commonly used X-chromosome tests can exhibit inflated type I error or misleading inference when sdMAF is present.
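
For orientation, the sketch below computes the classical Pearson chi-square statistic for HWE at a biallelic autosomal locus, the well-established case the abstract contrasts with the X chromosome. It is not the allele-based regression framework proposed in the paper, and it does not handle sex-specific genotypes or sdMAF; the function name and argument names are illustrative.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square statistic for HWE at a biallelic autosomal locus.

    Arguments are the observed counts of the AA, AB and BB genotypes.
    Under HWE the statistic is compared with a chi-square distribution
    with one degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("need at least one genotyped individual")
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; the test is undefined")
    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# A heterozygote deficit relative to HWE proportions.
print(hwe_chi_square(30, 40, 30))
```

The statistic is zero when the observed counts match Hardy-Weinberg proportions exactly and grows as heterozygotes become over- or under-represented.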

2026-05-19T05:07:23Z Lin Zhang Andrew Paterson Lei Sun http://arxiv.org/abs/2605.19275v1 Open-Weight LLMs Are Often Competitive with Commercial APIs for Political Science Text Classification 2026-05-19T02:46:35Z

Can researchers use local open-weight models instead of commercial APIs for LLM text classification? Local models avoid marginal API charges, keep data on the researcher's machine, and make exact model versions easier to preserve. I benchmark five local models against four commercial API models on 34 political science classification tasks. Local models are often competitive, especially on simpler tasks. In a task-specific oracle comparison, local models match or exceed API performance on 9 tasks; on average, the best API model exceeds the best local model by 0.015 F1. The four strongest observed model means fall within 0.021 F1. API models have their clearest edge on complex tasks with many labels or multiple outputs per item. Batching several items in one prompt usually reduces local runtime per item, but specific model-task pairs can return invalid response formats or labels. Taken together, the results make local open-weight models a practical candidate alternative for many political science classification tasks, provided researchers validate candidate models on task-specific labels and check batching reliability before scaling up.
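
The validation step recommended above can be sketched as follows: score a candidate model's predictions against task-specific gold labels and count responses that fall outside the label set. The abstract does not state which F1 variant is used, so macro-averaged F1 is an assumption here, and all function names are illustrative.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the labels that occur in y_true.

    A prediction outside the gold label set never counts as a true
    positive, so invalid outputs lower recall for the true label.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def invalid_label_rate(y_pred, valid_labels):
    """Share of predictions that are not members of the allowed label set."""
    valid = set(valid_labels)
    return sum(p not in valid for p in y_pred) / len(y_pred)


gold = ["a", "a", "b", "b"]
pred = ["a", "b", "b", "b"]
print(macro_f1(gold, pred))
print(invalid_label_rate(["a", "b", "??"], ["a", "b"]))
```

Running both checks on a small labelled sample, once with and once without batching, is one way to detect the format and label failures the abstract warns about before scaling up.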

2026-05-19T02:46:35Z Hanno Hilbig http://arxiv.org/abs/2605.19208v1 Precision Physical Activity Prescription via Reinforcement Learning for Functional Actions 2026-05-19T00:17:59Z

Physical activity (PA) plays an important role in maintaining and improving health. Daily steps have been a key PA measure that is easily accessible with common wearable devices. However, methods are lacking for recommending a personalized optimal distribution of daily steps over a period of time to optimize selected health biomarkers. In this paper, we fill this gap using data from the All of Us Research Program, which includes months of step counts as well as repeated measurements of key health biomarkers. We develop a new offline reinforcement learning (RL) algorithm to learn personalized and optimal PA distributions associated with cardiometabolic risk, where the action is a function representing the daily step distribution over a period of time. Simulation studies demonstrate the advantage of the proposed approach over existing continuous-action RL methods. The optimal policy learned from the All of Us data generally suggests that people take more daily steps and follow a more consistent pattern of PA over time, while offering tailored recommendations for subgroups defined by blood glucose level, body mass index, blood pressure, age, and sex.

2026-05-19T00:17:59Z Gefei Lin Rui Miao Jennifer Sacheck Xiaoke Zhang http://arxiv.org/abs/2504.08220v2 Feature aware covariance estimation, with application to mixtures of chemical exposures 2026-05-18T20:59:11Z

This article aims to improve inferences on the covariation in environmental exposures, motivated by data from a study of Toddlers Exposure to SVOCs in Indoor Environments (TESIE). The challenge is that the sample size is limited, so the empirical covariance provides a poor estimate. In related applications, Bayesian factor models have been popular; these approaches express the covariance as low rank plus diagonal and can infer the number of factors adaptively. However, they have the disadvantage of shrinking towards a diagonal covariance, often underestimating important covariation patterns in the data. Alternatively, the dimensionality problem is addressed by collapsing the detailed exposure data within chemical classes, potentially obscuring important information. We apply a feature aware covariance regression extension of Bayesian factor analysis, which improves performance by including information from features summarizing properties of the different exposures. This approach enables shrinkage to more flexible covariance structures, reducing the over-shrinkage problem, as we illustrate in the TESIE data using various chemical features.

2025-04-11T03:00:12Z 25 pages, 6 figures Elizabeth Bersson Kate Hoffman Heather M. Stapleton David B. Dunson http://arxiv.org/abs/2506.20058v2 Causal mediation analysis for longitudinal and survival data in continuous time using Bayesian non-parametric joint models 2026-05-18T20:51:54Z

Observational cohort data are an important source of information for understanding the causal effects of treatments on survival and the degree to which these effects are mediated through changes in disease-related risk factors. However, these analyses are often complicated by irregular data collection intervals and the presence of longitudinal confounders and mediators. We propose a causal mediation framework that jointly models longitudinal exposures, confounders, mediators, and time-to-event outcomes as continuous functions of age. This framework for longitudinal covariate trajectories enables statistical inference even at ages where the subject's covariate measurements are unavailable. The observed data distribution in our framework is modeled using an enriched Dirichlet process mixture (EDPM) model. Using data from the Atherosclerosis Risk in Communities cohort study, we apply our methods to assess how medication -- prescribed to target cardiovascular disease (CVD) risk factors -- affects the time to CVD death.

2025-06-24T23:43:36Z Saurabh Bhandari Michael J. Daniels Juned Siddique http://arxiv.org/abs/2605.19100v1 ldmppr: Location Dependent Marked Point Processes in R 2026-05-18T20:40:43Z

In this article, we present $\textbf{ldmppr}$, an R package for estimating, evaluating, simulating from, and visualizing location-dependent marked spatial point processes. To date, it has commonly been assumed that the marks associated with a point process are independent of the locations. However, when dealing with many point processes, such as those arising in forestry applications, the independence assumption proves unreasonable. We introduce a practical framework for generating marked point processes with dependence between the marks and locations. We provide a brief discussion of the theory underpinning our modeling approach and outline the use of the package in a typical scenario involving real data. We highlight the functionality of the package for both generating from and assessing the goodness-of-fit of a given model, enabling users to generate realistic point patterns given a reference pattern or parameter values of interest.

2026-05-18T20:40:43Z 31 pages, 5 figures Lane Drew Andee Kaplan http://arxiv.org/abs/2506.19958v4 RobustiPy: An efficient next generation multiversal library with model selection, averaging, resampling, and explainable artificial intelligence 2026-05-18T18:14:02Z

Scientific inference is often undermined by the vast but rarely explored "multiverse" of defensible modelling choices, which can generate results as variable as the phenomena under study. We introduce RobustiPy, an open-source Python library that systematizes multiverse analysis and model-uncertainty quantification at scale. RobustiPy unifies bootstrap-based inference, combinatorial specification search, model selection and averaging, joint-inference routines, and explainable AI methods within a modular, reproducible framework. Beyond exhaustive specification curves, it supports rigorous out-of-sample validation and quantifies the marginal contribution of each covariate. We demonstrate its utility across five simulation designs and ten empirical case studies spanning economics, sociology, psychology, and medicine, including a re-analysis of widely cited findings with documented discrepancies. Benchmarking on ~672 million simulated regressions shows that RobustiPy delivers state-of-the-art computational efficiency while expanding transparency in empirical research. By standardizing and accelerating robustness analysis, RobustiPy transforms how researchers interrogate sensitivity across the analytical multiverse, offering a practical foundation for more reproducible and interpretable computational science.
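
The combinatorial specification search mentioned above can be pictured with a few lines of plain Python. This sketch does not use or reproduce RobustiPy's API; the function and variable names are hypothetical and serve only to show how the number of defensible specifications grows with the set of optional covariates.

```python
from itertools import combinations


def enumerate_specifications(required, optional):
    """List every model specification formed by the required covariates
    plus any subset of the optional covariates (including the empty subset).
    """
    specs = []
    for size in range(len(optional) + 1):
        for subset in combinations(optional, size):
            specs.append(tuple(required) + subset)
    return specs


# Hypothetical covariate names, used only for illustration.
specs = enumerate_specifications(["treatment"], ["age", "income", "region"])
print(len(specs))  # number of specifications: 2 ** (number of optional covariates)
for spec in specs[:3]:
    print(spec)
```

With k optional covariates the search covers 2^k specifications before any resampling, which is why computational efficiency matters at the scale the abstract describes.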

2025-06-24T19:11:42Z Daniel Valdenegro Jiani Yan Duiyi Dai Charles Rahal http://arxiv.org/abs/2605.18728v1 Bayesian Sparse Regression for Microbiome-Metabolite Data Integration 2026-05-18T17:51:59Z

Numerous studies have shown that microbial metabolites, which represent the products of bacteria in the human gut, play a key role in shaping cancer risk and response to treatment. However, metabolite data typically contain a large proportion of missing values, which may result from either low abundance or technical challenges in data processing. Moreover, given the compositionality of microbiome data, where the observed abundances can only be interpreted on a relative scale, standard variable selection methods are not applicable. In this project, we propose a novel Bayesian regression method to address these challenges in the integration of metabolite and microbiome data. Key features of our proposed model include modeling the two different mechanisms of missingness for the metabolite data and adopting a Bayesian prior designed to address the compositional characteristics of microbiome data. We demonstrate on simulated data that our proposed model can accurately impute the unobserved true metabolite values and correctly select the relevant microbiome predictors. We further illustrate our method using real data from a study focused on understanding the interplay between the microbiome and metabolome in colorectal cancer.

2026-05-18T17:51:59Z 28 pages including references Kai Jiang Satabdi Saha Christine B. Peterson http://arxiv.org/abs/2605.14565v2 A Bayesian Longitudinal Spatial Normative Model for Individualized Brain Deviation Mapping 2026-05-18T17:38:32Z

Normative modeling enables individualized characterization of structural brain deviations by evaluating subjects against a reference population rather than a group average. Most existing implementations treat brain regions independently and remain cross-sectional, despite the availability of repeated neuroimaging measurements and the well-documented spatial organization of neuroanatomical variation. We propose a Bayesian longitudinal spatial normative model that jointly captures within-subject temporal dependence and spatially structured subject-specific deviations within a unified hierarchical framework. The individualized deviation map is treated as a latent spatial process with an explicit posterior distribution, yielding a principled Bayes estimator under squared error loss rather than an ad hoc residual summary. Across six simulation scenarios encompassing varying spatial dependence, nonlinear trajectories, irregular visit schedules, and missing follow-up, the proposed model consistently reduced deviation-map reconstruction error relative to independent cross-sectional and longitudinal non-spatial benchmarks while maintaining stable calibration. In an application to OASIS-3 structural MRI data, the model reduced RMSE by 54% relative to the independent cross-sectional model and by 45% relative to the longitudinal non-spatial model. Regional deviation burden was concentrated in the temporal pole, entorhinal cortex, inferior temporal cortex, posterior cingulate, and parahippocampal cortex, consistent with regions implicated in early Alzheimer-type neurodegeneration. Subject-level profiles revealed substantial heterogeneity in regional abnormality patterns, including marked multiregional deviation with preserved global cognitive scores.

2026-05-14T08:36:45Z J. T. Korley http://arxiv.org/abs/2605.07855v2 Jagged AI in Scientific Peer Review: Evidence from POMP Data Analysis 2026-05-18T17:09:53Z

Despite their growing use in academic writing and statistical analysis, the performance of artificial intelligence (AI) tools in scientific peer review remains a largely unexplored area. A key challenge is jagged AI, a phenomenon where AI exhibits strong ability spikes in some domains while remaining deficient in others. To study this jaggedness in a practical data science context, we considered the task of reviewing partially observed Markov process (POMP) data analyses. POMP models, also known as state-space models or hidden Markov models, are used to fit mechanistic dynamic models to time series data in diverse applications including disease transmission, ecological dynamics, and financial risk assessment. High-quality peer review in this area entails assessment of scientific context, identification of errors in implementing complex algorithms, and decisions concerning methodological best practices. We studied 72 POMP projects from four semesters of a University of Michigan graduate time series course for which the project reports, the source code, and student peer reviews are anonymized and open-access. We compared the human reviews with four AI reviewing agents, using Claude Code with differing instructions implemented as skill files. We found that AI reviewers exhibited a jagged capability profile, proficiently catching human-overlooked technical errors and invalid inference methodology, while failing to match human standards in checking interpretive errors, narrative coherence, and domain-informed model critique. The jaggedness was found to be similar for all agents, consistent with it being primarily a property of the underlying AI model rather than the specific instructions. Skill file configuration shifted which weaknesses agents emphasized, without removing the jaggedness.

2026-05-08T15:17:29Z Jin Wook Lee William Szegda Zhisheng Song Edward L. Ionides http://arxiv.org/abs/2605.18562v1 Estimating Item Difficulty with Large Language Models as Experts 2026-05-18T15:42:13Z

Accurate estimates of item difficulty are essential for valid assessment and effective adaptive learning. However, for newly created tasks, response data are typically unavailable. Pretesting and expert judgement can be costly and slow, while machine learning methods often require large labelled training datasets. Recent work suggests that large language models (LLMs) may help. However, there is limited evidence on the elicitation procedures and prompt configurations used to emulate experts for difficulty estimation. This study addresses this gap by evaluating three off-the-shelf LLMs as difficulty raters for newly created items without access to response data. Using an item bank from an online learning system, the study examined six domains of primary-school mathematics, with empirical difficulty estimates treated as the reference. The study used a full factorial design crossing three factors: judgement format (absolute vs pairwise), decision type (hard decisions vs token-probability-based estimates), and prompting strategy (zero-shot vs few-shot). LLM-derived difficulty estimates were compared with empirical difficulties using Spearman rank correlations. Across domains, LLM-based estimates exhibited moderate to strong positive correlations with empirical item difficulties. For simpler arithmetic tasks, some configurations approached the upper end of the accuracy range reported for human experts in previous research. Pairwise comparison consistently outperformed absolute judgement in the absence of additional refinements. However, when token-level probabilities were incorporated and examples of items with known empirical difficulty were provided, the absolute judgement configuration likewise demonstrated moderate-to-high alignment. The study positions LLMs as a promising tool for initial item calibration and offers insights into effective workflow configuration.
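
The comparison metric named above, the Spearman rank correlation, can be sketched in plain Python as the Pearson correlation of average ranks, which handles tied values. The sample numbers below are made up for illustration and are not taken from the study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0.0 or vy == 0.0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(vx * vy)


# Made-up LLM difficulty ratings and empirical difficulties for five items.
llm_ratings = [2, 3, 3, 5, 4]
empirical = [-1.2, -0.4, 0.1, 1.5, 0.9]
print(spearman(llm_ratings, empirical))
```

Because only ranks enter the calculation, the metric rewards an LLM for ordering items correctly even when its ratings are on a different scale from the empirical difficulties.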

2026-05-18T15:42:13Z 24 pages, 2 figures, 9 tables Diana Kolesnikova Department of Methodology and Statistics, Tilburg University, Tilburg, Netherlands Kirill Fedyanin Smart Business Technologies, Belgrade, Serbia Abe D. Hofman Department of Psychological Methods, University of Amsterdam, Amsterdam, Netherlands Prowise Learn, Amsterdam, Netherlands Matthieu J. S. Brinkhuis Department of Information and Computing Sciences, Utrecht University, Utrecht, Netherlands Maria Bolsinova Department of Methodology and Statistics, Tilburg University, Tilburg, Netherlands http://arxiv.org/abs/2605.18338v1 Robust Player-Conditional Champion Ranking for League of Legends: Style Similarity, Mastery Priors, and Archetype-Constrained Discovery 2026-05-18T12:52:16Z

Champion recommendation in multiplayer online battle arena games is usually framed informally as a problem of metagame strength, personal comfort, or global win rate. We formalize champion recommendation in League of Legends as an interpretable, player-conditional ranking problem under sparse, noisy, and non-stationary behavioral data. The proposed framework combines four information sources: a population-strength proxy, player-style similarity, direct and indirect mastery priors, and archetype-level guardrails. The method uses robust median/MAD normalization, logarithmic transforms for skewed event counts, recency-weighted player style vectors, mastery-weighted champion-pool vectors, weighted cosine similarity, rank-scaled score components, and k-means++ clustering for coarse archetype support. The implemented prototype uses a Python/Pandas modeling layer, Supabase-backed storage, and a web-facing recommendation interface. Unlike black-box supervised win-prediction systems, the proposed method returns decomposed recommendation scores that can be inspected as expected-performance proxy, fit, mastery, and archetype compatibility. A single-player case study on a 100-game history for the player identifier DIVINERAINRACCON is included as an end-to-end sanity check. The manuscript is therefore a methods and systems contribution: it specifies a reproducible, modular, and auditable champion recommender and gives a validation protocol for future large-scale evaluation through temporal train-test splits, next-champion recovery, calibration analysis, and ablation studies.
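
Two of the building blocks listed above, robust median/MAD normalization and weighted cosine similarity, can be sketched as follows. The abstract does not give the exact formulas, so the unscaled MAD, the zero fallback for a degenerate MAD, and the example weights are assumptions, and the function names are illustrative.

```python
import math
from statistics import median


def robust_z(values):
    """Center by the median and scale by the median absolute deviation (MAD).

    Assumption: the MAD is used unscaled, and a zero MAD maps every value to 0.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [(v - med) / mad for v in values]


def weighted_cosine(u, v, weights):
    """Cosine similarity in which feature i contributes with weight weights[i]."""
    dot = sum(w * a * b for w, a, b in zip(weights, u, v))
    norm_u = math.sqrt(sum(w * a * a for w, a in zip(weights, u)))
    norm_v = math.sqrt(sum(w * b * b for w, b in zip(weights, v)))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# A skewed event count is log-transformed before robust scaling.
raw_counts = [0, 3, 5, 8, 250]
scaled = robust_z([math.log1p(c) for c in raw_counts])
print(scaled)

# Hypothetical player-style and champion vectors with per-feature weights.
print(weighted_cosine([1.0, 0.2, -0.5], [0.8, 0.1, -0.4], [2.0, 1.0, 1.0]))
```

Median and MAD are far less sensitive to a single extreme match than mean and standard deviation, which suits the sparse and noisy histories the abstract describes.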

2026-05-18T12:52:16Z 11 pages, 3 figures Min Heo Pranav Kadiyam Prasun Panthi http://arxiv.org/abs/2508.08080v3 Symbolic Quantile Regression for the Interpretable Prediction of Conditional Quantiles 2026-05-18T12:48:24Z

Symbolic Regression (SR) is a well-established framework for generating interpretable or white-box predictive models. Although SR has been successfully applied to create interpretable estimates of the average of the outcome, it is currently not well understood how it can be used to estimate the relationship between variables at other points in the distribution of the target variable. Such estimates of e.g. the median or an extreme value provide a fuller picture of how predictive variables affect the outcome and are necessary in high-stakes, safety-critical application domains. This study introduces Symbolic Quantile Regression (SQR), an approach to predict conditional quantiles with SR. In an extensive evaluation, we find that SQR outperforms transparent models and performs comparably to a strong black-box baseline without compromising transparency. We also show how SQR can be used to explain differences in the target distribution by comparing models that predict extreme and central outcomes in an airline fuel usage case study. We conclude that SQR is suitable for predicting conditional quantiles and understanding interesting feature influences at varying quantiles.
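
Conditional quantile predictions are conventionally scored with the pinball (quantile) loss, sketched below. Whether SQR uses this loss as its search objective is an assumption, not something the abstract states; the function name and sample values are illustrative.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss of predictions y_pred for the tau-th quantile.

    Under-prediction (y > q) is penalized with weight tau and
    over-prediction (y < q) with weight 1 - tau.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


observed = [1.0, 2.0, 3.0]
predicted = [2.0, 2.0, 2.0]
print(pinball_loss(observed, predicted, 0.5))  # median: half the mean absolute error
print(pinball_loss(observed, predicted, 0.9))  # high quantile: under-prediction costs more
```

The asymmetry in the weights is what pushes a model fitted at tau = 0.9 towards the upper tail of the target distribution rather than its center.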

2025-08-11T15:27:40Z Transactions on Machine Learning Research, May 2026, https://openreview.net/pdf?id=x9OYbyPJOG Cas Oude Hoekstra Floris den Hengst http://arxiv.org/abs/2605.17920v1 Multivariate reconciliation for hierarchical time series 2026-05-18T06:29:24Z

Some time series can be hierarchically organized into levels based on certain characteristics, such as geography or other attributes of interest. These series are referred to as hierarchical time series. Typically, forecasts are generated at all levels and should be coherent, meaning that they satisfy the same aggregation constraints as the observed data. Various approaches have been proposed to guarantee this coherence by using a set of base forecasts. The process through which these forecasts are adjusted to become coherent is known as forecast reconciliation. Similar to the univariate case, multivariate time series can also be structured hierarchically. However, all existing approaches are limited to a single variable, so ensuring coherent forecasts requires reconciling each variable separately, which does not account for correlations among variables. To address this limitation, this paper proposes a multivariate reconciliation methodology that ensures coherent forecasts and incorporates relationships among variables. The proposed methodology was tested through numerical simulations, considering distinct scenarios within the series hierarchy and across multiple variables. Additionally, some base forecasting models were evaluated. The methodology was also applied to real employment data of admissions and dismissals in Brazil. The results demonstrated that multivariate reconciliation yielded more accurate outcomes than the other methods considered, both in simulated data and in practical applications.
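
To illustrate what reconciliation does, the sketch below applies ordinary least squares (OLS) reconciliation to the smallest possible hierarchy, a total with two children. This is the standard single-variable approach, not the multivariate methodology proposed in the paper, and the function name and sample forecasts are illustrative.

```python
def reconcile_ols(total, child_a, child_b):
    """OLS reconciliation for the hierarchy total = child_a + child_b.

    The base forecasts y = (total, child_a, child_b) are projected onto the
    coherent subspace spanned by S = [[1, 1], [1, 0], [0, 1]]. The bottom-level
    solution b = (S'S)^{-1} S'y has the closed form used below.
    """
    new_a = (total + 2.0 * child_a - child_b) / 3.0
    new_b = (total - child_a + 2.0 * child_b) / 3.0
    # The reconciled total is rebuilt from the children, so it is coherent.
    return new_a + new_b, new_a, new_b


# Incoherent base forecasts: the children sum to 90 but the total says 100.
rec_total, rec_a, rec_b = reconcile_ols(100.0, 40.0, 50.0)
print(rec_total, rec_a, rec_b)
print(abs(rec_total - (rec_a + rec_b)) < 1e-9)  # coherence check
```

Each series is adjusted a little so that the aggregation constraint holds exactly; base forecasts that are already coherent are returned unchanged.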

2026-05-18T06:29:24Z 22 pages, 7 figures, 8 tables Ana Caroline Pinheiro Rodrigo de Souza Bulhões Rob J. Hyndman Paulo Canas Rodrigues