Feed: https://arxiv.org/api/3y4RIBCAd7yD1L8t1ZHaBPU5Jo0
Feed updated: 2026-06-18T14:07:39Z
Total results: 23571; start index: 330; items per page: 15

http://arxiv.org/abs/2605.27184v1 Posterior Quantification of Borrowing from Multiple Historical Control Data in Bayesian Dynamic Borrowing Methods: A Scoping Review
Updated: 2026-05-26T15:36:28Z

Bayesian dynamic borrowing methods incorporate historical control data into current clinical trial analyses while allowing the degree of borrowing to depend on the compatibility between historical and current data. Although many methods have been proposed, the degree of borrowing is often difficult to interpret, especially when multiple historical control sources are available. This scoping review focuses on posterior quantification of borrowing from multiple historical controls. We discuss overall borrowing summaries based on effective historical sample size, together with method-specific source-level summaries of borrowing, information contribution, or compatibility arising from power priors, unit information priors, multisource exchangeability models, Dirichlet process mixture models, and potential bias models. We distinguish posterior borrowing measures from quantities describing prior information allocation or source-specific conflict. Two case studies, one with a binary endpoint and one with a continuous endpoint, illustrate that methods with broadly similar posterior treatment effect estimates may differ in both the overall amount and source-specific pattern of borrowing. These examples show that large overall borrowing may reflect selective borrowing from compatible historical sources rather than uniform borrowing from all sources. We recommend reporting treatment effect estimates together with overall and source-specific borrowing summaries, when available, to improve transparency in posterior inference.
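
As a minimal sketch of an overall-versus-source-level borrowing summary, assume a binary endpoint, a flat Beta(1, 1) initial prior, and fixed power-prior weights in [0, 1], one per historical control source (the counts, weights, and the names `Source`, `power_prior_summary`, and `update_with_current` are hypothetical). Each source then contributes weight × sample size pseudo-patients, so the total is an effective historical sample size and the per-source terms show how it is allocated. With fixed weights this describes prior information allocation; the posterior borrowing measures discussed in the review differ when the weights are learned from the data.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One historical control source: responders out of patients (hypothetical data)."""
    responders: int
    patients: int


def power_prior_summary(sources, weights, a0=1.0, b0=1.0):
    """Combine a Beta(a0, b0) prior with power-prior-weighted historical data.

    Returns (alpha, beta, total_ess, per_source_ess), where the effective
    historical sample size of source k is weight_k * patients_k.
    """
    if len(sources) != len(weights):
        raise ValueError("one weight per source is required")
    alpha, beta = a0, b0
    per_source_ess = []
    for src, w in zip(sources, weights):
        if not 0.0 <= w <= 1.0:
            raise ValueError("power-prior weights must lie in [0, 1]")
        alpha += w * src.responders
        beta += w * (src.patients - src.responders)
        per_source_ess.append(w * src.patients)
    return alpha, beta, sum(per_source_ess), per_source_ess


def update_with_current(alpha, beta, responders, patients):
    """Conjugate Beta update with the current trial's control arm."""
    return alpha + responders, beta + (patients - responders)


if __name__ == "__main__":
    history = [Source(30, 100), Source(12, 60), Source(50, 80)]
    a, b, ess, per_source = power_prior_summary(history, [0.8, 0.5, 0.1])
    print(ess, per_source)  # overall and source-level borrowing
    print(update_with_current(a, b, 10, 40))
```

With weights 0.8, 0.5, and 0.1 the overall summary is 80 + 30 + 8 = 118 pseudo-patients, most of it from the first source, which shows how a large overall figure can be driven mainly by one compatible source.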

Published: 2026-05-26T15:36:28Z
Authors: Tomohiro Ohigashi, Wataru Murasaki, Masahiko Gosho

http://arxiv.org/abs/2605.27085v1 Estimation and Inference for Win Measures with Multiple Ordinal Endpoints Subject to Missingness
Updated: 2026-05-26T14:34:45Z

Win measures, including the win ratio (WR), win odds (WO), net benefit (NB), and desirability of outcome ranking (DOOR), are increasingly used in randomized clinical trials with multiple hierarchical ordinal endpoints. In practice, however, one or more component endpoints may have missing data. The standard pairwise-comparison approach, which treats pairs with missing outcomes as ties, can produce biased estimates even when the data are missing completely at random (MCAR). Although inverse probability of censoring weighting (IPCW) methods have been developed for censored survival endpoints, corresponding methods for missing hierarchical ordinal endpoints are not yet available. To address this gap, we develop inverse probability weighting (IPW) and augmented IPW (AIPW) estimators for win measures with hierarchical ordinal endpoints subject to missing data, allowing missingness to depend on treatment assignment and baseline covariates. The IPW estimator corrects bias by reweighting fully observed outcomes by the inverse of their joint non-missingness probabilities when estimating the joint cell probabilities that define the win measures. The AIPW estimator additionally incorporates an outcome model, which improves efficiency and yields double robustness. For inference, we derive closed-form variance estimators for both methods based on influence functions. Simulation studies show that the standard approach can be substantially biased, whereas the proposed IPW and AIPW estimators remain consistent and achieve near-nominal coverage. Furthermore, the AIPW estimator is generally more efficient than the IPW estimator. Applications to the SCOUT-CAP and ACTT-1 trials illustrate the practical utility of the proposed methods. An R package, WinMO, is provided for implementation.
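
The basic weighting idea can be sketched for a hierarchy of ordinal endpoints. The sketch assumes each patient's probability of being fully observed is already known or estimated (the `p_obs` values and the function names are hypothetical), drops pairs with a missing outcome instead of counting them as ties, and weights each fully observed treatment-control pair by 1/(p_i p_j). It is not the WinMO implementation and omits both the AIPW augmentation and the variance estimators.

```python
def compare(a, b):
    """Hierarchical comparison of two outcome tuples, highest priority first.

    Returns +1 if `a` wins, -1 if `b` wins, 0 if tied on every endpoint.
    Larger values are assumed to be better on every endpoint.
    """
    for x, y in zip(a, b):
        if x > y:
            return 1
        if x < y:
            return -1
    return 0


def ipw_win_measures(treated, control):
    """IPW-weighted win ratio, win odds, and net benefit.

    Each arm is a list of (outcomes, p_obs) pairs, where `outcomes` is None
    when any endpoint is missing and `p_obs` is the probability that the
    patient's outcomes are fully observed.
    """
    wins = losses = ties = 0.0
    for y_t, p_t in treated:
        if y_t is None:
            continue  # incomplete pairs are dropped, not treated as ties
        for y_c, p_c in control:
            if y_c is None:
                continue
            weight = 1.0 / (p_t * p_c)
            result = compare(y_t, y_c)
            if result > 0:
                wins += weight
            elif result < 0:
                losses += weight
            else:
                ties += weight
    total = wins + losses + ties
    if total == 0.0:
        raise ValueError("no fully observed treatment-control pairs")
    return {
        "win_ratio": wins / losses if losses > 0 else float("inf"),
        "win_odds": (wins + 0.5 * ties) / (losses + 0.5 * ties)
        if losses + ties > 0 else float("inf"),
        "net_benefit": (wins - losses) / total,
    }


if __name__ == "__main__":
    treated = [((2, 1), 1.0), ((1, 0), 1.0), (None, 0.5)]
    control = [((1, 1), 0.5), ((1, 0), 1.0)]
    print(ipw_win_measures(treated, control))
```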

Published: 2026-05-26T14:34:45Z
Authors: Yi Liu, Huiman Barnhart, Sean O'Brien, Yuliya Lokhnygina, Roland A. Matsouaka

http://arxiv.org/abs/2605.26890v1 Nonlinear and Heavy-Tailed Predictability in Transition-Energy Financial Markets
Updated: 2026-05-26T11:52:31Z

Transition-related financial markets are increasingly exposed to abrupt repricing episodes, elevated volatility, and heterogeneous macro-financial shocks. Under such conditions, conventional Gaussian-linear forecasting frameworks may provide an incomplete representation of the dependence structure linking fossil-energy, renewable-energy, technology, and utility-sector assets. This paper investigates whether transition-related financial returns exhibit residual nonlinear predictability after controlling for heavy-tailed multivariate linear dynamics. To address this question, we develop a hybrid forecasting framework combining Student-t Vector Autoregressions with nonlinear recurrent residual learning architectures. The empirical analysis considers six major exchange-traded funds representing broad equity markets and key transition-sensitive sectors. The results reveal substantial departures from Gaussian-linear behavior, including excess kurtosis, volatility clustering, and remaining nonlinear dependence after econometric filtering. Out-of-sample forecasting experiments show that the proposed framework consistently improves predictive accuracy relative to conventional VAR models, standalone machine-learning methods, and alternative hybrid specifications. The forecasting gains become more pronounced during periods of macro-financial stress, particularly during the COVID-19 crisis and the Ukraine-related energy shock. Overall, the findings suggest that transition-related financial systems exhibit regime-sensitive and heavy-tailed predictive dynamics that are insufficiently captured by standard Gaussian-linear models alone.
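
The first diagnostic step, checking for heavy tails after linear filtering, can be sketched in one dimension: fit an AR(1) model by least squares and compute the sample excess kurtosis of its residuals (about 0 for Gaussian data). This univariate AR(1) is a stand-in for the Student-t VAR filter, chosen so the sketch needs only the standard library; the paper's multivariate estimator and recurrent residual learner are not reproduced, and all function names are illustrative.

```python
import random
import statistics


def fit_ar1(x):
    """Least-squares fit of x_t = c + phi * x_{t-1} + e_t; returns (c, phi, residuals)."""
    lagged, current = x[:-1], x[1:]
    mean_l, mean_c = statistics.fmean(lagged), statistics.fmean(current)
    cov = sum((a - mean_l) * (b - mean_c) for a, b in zip(lagged, current))
    var = sum((a - mean_l) ** 2 for a in lagged)
    phi = cov / var
    c = mean_c - phi * mean_l
    residuals = [b - c - phi * a for a, b in zip(lagged, current)]
    return c, phi, residuals


def excess_kurtosis(e):
    """Moment-based sample excess kurtosis m4 / m2**2 - 3."""
    m = statistics.fmean(e)
    m2 = statistics.fmean([(v - m) ** 2 for v in e])
    m4 = statistics.fmean([(v - m) ** 4 for v in e])
    return m4 / m2 ** 2 - 3.0


def simulate_ar1_t(n, phi, df, seed=0):
    """Simulate an AR(1) path with Student-t(df) innovations z / sqrt(chi2_df / df)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) as Gamma(df/2, scale 2)
        shock = rng.gauss(0.0, 1.0) / (chi2 / df) ** 0.5
        x.append(phi * x[-1] + shock)
    return x


if __name__ == "__main__":
    series = simulate_ar1_t(5000, phi=0.5, df=5)
    _, phi_hat, resid = fit_ar1(series)
    print(f"phi_hat={phi_hat:.3f}, residual excess kurtosis={excess_kurtosis(resid):.2f}")
```

Residuals that remain leptokurtic after the linear filter are the kind of departure from Gaussian-linear behavior that motivates a Student-t error model.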

Published: 2026-05-26T11:52:31Z
Authors: Kpante Emmanuel Gnandi (INSA Toulouse), Fredy Pokou (MRE, CRIStAL), Jules Sadefo Kamdem (MRE)

http://arxiv.org/abs/2605.26843v1 A warning system for risk prediction of metabolic syndrome in a healthy population of blood donors
Updated: 2026-05-26T10:56:30Z

Metabolic syndrome is a complex clinical condition characterized by the simultaneous presence of multiple metabolic risk factors and represents a major public health concern. The syndrome develops silently and may remain undiagnosed for long periods, highlighting the importance of investigating early metabolic alterations before overt disease onset. Longitudinal monitoring of predominantly healthy individuals may help identify metabolic risk early. The paper proposes a Bayesian statistical model to estimate the probability of metabolic syndrome among blood donors during pre-donation screening, incorporating information collected at previous visits. Using longitudinal data from AVIS Milan, one of the main blood donor associations in Italy, we analyze repeated clinical and lifestyle measurements from a predominantly healthy population of donors. In particular, we fit a Bayesian multivariate model that jointly describes the logarithms of the five diagnostic components of metabolic syndrome. The model accounts for within-donor dependence across repeated visits and provides probabilistic estimates of individual risk. Our framework aims to provide clinicians at AVIS Milan with an interpretable traffic-light warning system (low, intermediate, or high risk) during pre-donation screening. The system is intended to facilitate the identification of individuals at risk of metabolic syndrome at future visits and to support targeted preventive interventions during routine donor assessment, ultimately contributing to a long-term reduction in healthcare costs for the Italian national healthcare system.
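
Once a model yields posterior predictive draws of a donor's five components on the log scale, a warning level can be derived by Monte Carlo. The sketch assumes the common "at least three of five criteria" diagnostic rule and treats each component as a single value; the clinical cut-offs, the probability thresholds separating low, intermediate, and high risk, and the function names are all hypothetical, not those used at AVIS Milan.

```python
import math


def criteria_met(values, cutoffs, low_is_abnormal):
    """Count how many diagnostic criteria a set of component values meets.

    low_is_abnormal[k] is True when values below cutoffs[k] are abnormal
    (as for HDL cholesterol) and False when values at or above it are.
    """
    count = 0
    for value, cut, low_bad in zip(values, cutoffs, low_is_abnormal):
        if (value < cut) if low_bad else (value >= cut):
            count += 1
    return count


def syndrome_probability(log_draws, cutoffs, low_is_abnormal, min_criteria=3):
    """Monte Carlo P(at least `min_criteria` criteria met) from draws on the log scale."""
    hits = 0
    for draw in log_draws:
        values = [math.exp(v) for v in draw]  # back-transform each component
        if criteria_met(values, cutoffs, low_is_abnormal) >= min_criteria:
            hits += 1
    return hits / len(log_draws)


def traffic_light(prob, low=0.2, high=0.5):
    """Map a risk probability to a warning level; both thresholds are hypothetical."""
    if prob < low:
        return "low"
    if prob < high:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    # Hypothetical cut-offs: waist, triglycerides, HDL, blood pressure, glucose.
    cutoffs = [102.0, 150.0, 40.0, 130.0, 100.0]
    low_is_abnormal = [False, False, True, False, False]
    draws = [
        [math.log(v) for v in (110.0, 160.0, 35.0, 120.0, 90.0)],  # 3 criteria met
        [math.log(v) for v in (90.0, 100.0, 60.0, 120.0, 90.0)],   # none met
    ]
    p = syndrome_probability(draws, cutoffs, low_is_abnormal)
    print(p, traffic_light(p))
```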

Published: 2026-05-26T10:56:30Z
Authors: Simone Colombara, Ilenia Epifani, Alessandra Guglielmi, Ettore Lanzarone

http://arxiv.org/abs/2604.11481v2 Emergence of Complex Web Structures
Updated: 2026-05-26T09:26:25Z

Complex structures often emerge from initially homogeneous or weakly correlated states. We address the apparent tension between this ordering and entropy growth through a unified framework combining semi-microscopic phase-space dynamics, transport geometry, information theory, and coarse-grained effective modeling. The key point is that entropy depends on the level of description: a coarse-grained spatial field may become more ordered as structure forms, even while the full phase-space description becomes more complex through shell crossing, multistreaming, and the activation of velocity degrees of freedom. Using a Lagrangian-Eulerian transport map, we show how density amplification is governed by the Jacobian of the deformation and how anisotropic collapse arises from the eigenvalues of a hierarchy of deformation tensors. Long-range interaction or information flow is encoded in the displacement field, so that nonlocality enters directly through transport. We connect this geometric description to a maximum-entropy Gaussian baseline and show how nonlinear transport and nonlocal coupling generate scale coupling, higher-order correlations, and non-Gaussianity. We then formulate a Landau-Ginzburg description in which the growth of seed anisotropies is interpreted as the activation of lower effective free-energy branches, providing a coarse-grained realization of self-organization. Applied to generated cosmological fields, this framework indicates that the nonlocal tidal level becomes relevant already at moderate overdensity. Although cosmological structure formation is the main realization considered here, the framework is intended more broadly as a mesoscopic language for systems in which transport, anisotropy, nonlocality, and self-organization are central.
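
The link between the Jacobian and density amplification can be illustrated with the standard Lagrangian mass-conservation relation, valid in single-stream regions before shell crossing, together with the first-order (Zel'dovich) displacement as a special case. Here q is the Lagrangian coordinate, Ψ the displacement, D(t) the linear growth factor, φ a displacement potential, and λ₁ ≥ λ₂ ≥ λ₃ the eigenvalues of its Hessian; sign conventions vary, and the paper's hierarchy of deformation tensors goes beyond this first-order case.

```latex
\mathbf{x}(\mathbf{q},t) = \mathbf{q} + \boldsymbol{\Psi}(\mathbf{q},t),
\qquad
1 + \delta(\mathbf{x},t) = \frac{1}{\bigl|\det\bigl(\delta_{ij} + \partial \Psi_i / \partial q_j\bigr)\bigr|}

\boldsymbol{\Psi} = -D(t)\,\nabla_{\mathbf{q}}\phi(\mathbf{q})
\;\Longrightarrow\;
1 + \delta = \frac{1}{\bigl|(1 - D\lambda_1)(1 - D\lambda_2)(1 - D\lambda_3)\bigr|}
```

Collapse first occurs along the eigenvector of λ₁ when Dλ₁ reaches 1, producing sheet-like structures, which is the anisotropic collapse the paragraph refers to; filaments and knots follow as Dλ₂ and Dλ₃ approach 1.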

Published: 2026-04-13T13:47:35Z
Comments: 38 pages, 8 figures, revised manuscript after referee report
Authors: Francisco-Shu Kitaura

http://arxiv.org/abs/2606.07572v1 Forecasting Japanese elections: A nonlinear machine-learning approach
Updated: 2026-05-26T06:46:24Z

Despite Japan being one of the world's largest advanced democracies, the development of election forecasting models for its national elections remains limited. This study introduces nonlinear machine-learning forecasting models, based on decision tree and ensemble learning methods, for predicting the outcomes of Japanese lower-house elections. To assess the methodological benefits of our approach, we replicated the theoretical framework and dataset of Lewis-Beck and Tien's (LBT) foundational statistical forecasting model for Japanese elections. Our models demonstrated moderately but consistently improved predictive accuracy compared to LBT's model in both in-sample and out-of-sample evaluations, suggesting that nonlinear algorithms offer an alternative approach to classical linear methods in capturing complex electoral dynamics. This study represents one of the earlier applications of nonlinear machine-learning techniques to single-country election forecasting. It offers a replicable framework that, when combined with the country-specific electoral theories of other nations, may enhance the predictive performance of forecasting models in broader national contexts.

Published: 2026-05-26T06:46:24Z
Authors: Sota Kato, Xuan Luo, Budrul Ahsan, Asahi Obata, Takafumi Nakanishi

http://arxiv.org/abs/2605.26607v1 Log-linear Model for Dual System Estimation and Computational Considerations
Updated: 2026-05-26T06:43:28Z

Dual system estimation (DSE) is heavily used in Census Bureau operations. In DSE, methods are needed to infer the population size among those with missing data from one or both data sources. Log-linear models fitted with EM algorithms offer a way to estimate counts for all groups with incomplete recorded data, as shown by Van der Heijden et al. (2022). Unfortunately, the numerical computations involved scale very poorly as the population is divided into more groups, to the point where simultaneous analysis of several demographic and geographic factors, such as state of residence and ethnicity, becomes computationally infeasible. Here, we provide an alternative method for computing the log-linear estimates that obtains the maximum likelihood estimator with orders of magnitude less computation than the EM algorithm.
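
For a single fully observed stratum, the two-list log-linear model without an interaction term (the independence model) has a closed-form maximum likelihood estimate of the cell missed by both sources, and the resulting population total equals the classical Lincoln-Petersen estimator. The sketch covers only this case, applied stratum by stratum; it does not handle the incomplete covariate data that motivate the EM approach, nor the paper's faster algorithm, and the function names are illustrative.

```python
def dual_system_estimate(n11, n10, n01):
    """Two-list DSE under the independence log-linear model.

    n11: counted by both sources; n10: only by source 1; n01: only by source 2.
    Returns (estimated count missed by both, estimated population size).
    """
    if n11 <= 0:
        raise ValueError("the overlap count must be positive")
    n00_hat = n10 * n01 / n11  # from m11 * m00 = m10 * m01 under independence
    return n00_hat, n11 + n10 + n01 + n00_hat


def estimate_by_stratum(table):
    """Apply the estimator separately within each stratum (e.g. state x ethnicity)."""
    return {stratum: dual_system_estimate(*counts) for stratum, counts in table.items()}


if __name__ == "__main__":
    print(estimate_by_stratum({"A": (50, 30, 20), "B": (10, 5, 5)}))
```

For stratum A the missed cell is 30 × 20 / 50 = 12 and the total is 112, which matches Lincoln-Petersen: (50 + 30)(50 + 20) / 50 = 112.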

Published: 2026-05-26T06:43:28Z
Authors: Zhiyuan Lu

http://arxiv.org/abs/2503.11673v2 Crossing the Kolmogorov-Smirnov Boundary: Exact Tails, Sharp Bounds, and Broken Pivots
Updated: 2026-05-26T03:21:34Z

The Kolmogorov-Smirnov statistic is usually introduced as a supremum, but its finite-sample behavior is governed by a more local question: where does the empirical process first cross a boundary? This letter gives a partial answer through a finite-sample crossing ledger. The ledger rewrites the Smirnov-Birnbaum-Tingey one-sample formula as an explicit hitting-time law and yields a stable log-scale tail evaluator. For two samples, it gives one-wall and two-wall exact lattice recursions for arbitrary sample sizes, with the balanced reflection formula appearing as a special closed form. The same viewpoint explains the Dvoretzky-Kiefer-Wolfowitz-Massart inequality as an exponential compression of exact crossing sums and shows where exact distribution-free counting stops: under a composite null, fitted parameters change the path itself. Simulations and two small data diagnostics illustrate the resulting calibration warning.
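
The classical Smirnov-Birnbaum-Tingey formula for the one-sided statistic D_n^+ = sup_x (F_n(x) - F(x)) under a continuous null, P(D_n^+ ≥ d) = d Σ_j C(n, j) (1 - d - j/n)^(n-j) (d + j/n)^(j-1) for j from 0 to ⌊n(1 - d)⌋, can be evaluated on the log scale with a log-sum-exp. The sketch below does exactly that and compares the result with the one-sided DKW-Massart bound exp(-2nd²); it is one straightforward stable evaluator, not the letter's crossing ledger, and the function names are illustrative.

```python
import math


def log_ks_plus_tail(n, d):
    """log P(D_n^+ >= d) for a continuous null and 0 < d <= 1."""
    if not 0.0 < d <= 1.0:
        raise ValueError("d must lie in (0, 1]")
    log_terms = []
    for j in range(n + 1):
        gap = 1.0 - d - j / n
        if gap <= 0.0:
            # Remaining terms vanish: zero base with a positive exponent, or j > n(1 - d).
            break
        log_terms.append(
            math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + (n - j) * math.log(gap)
            + (j - 1) * math.log(d + j / n)
        )
    if not log_terms:
        return float("-inf")
    peak = max(log_terms)  # log-sum-exp avoids overflow and underflow
    return math.log(d) + peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def dkw_massart_one_sided(n, d):
    """One-sided DKW-Massart bound exp(-2 n d^2)."""
    return math.exp(-2.0 * n * d * d)


if __name__ == "__main__":
    for n, d in [(10, 0.3), (50, 0.2), (2000, 0.05)]:
        exact = math.exp(log_ks_plus_tail(n, d))
        print(n, d, exact, dkw_massart_one_sided(n, d))
```

Small cases check out by hand: for n = 2 and d = 0.5 only the j = 0 term survives and the tail is 0.25, which equals P(max(U₁, U₂) ≤ 0.5).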

Published: 2025-02-27T22:20:22Z
Authors: Elvis Han Cui, Yihao Li, Zhuang Liu

http://arxiv.org/abs/2605.27463v1 When prompt perturbations break your A/B test: A valid statistical test for generative surveying
Updated: 2026-05-26T00:35:58Z

Generative surveying, in which collections of LLM-based personas provide feedback on messages, has emerged as a cheap and scalable alternative to traditional market research. However, LLMs are sensitive to small variations in prompt design, and conclusions drawn from generative surveys may depend on arbitrary phrasing choices. Controlling for this sensitivity requires including semantically equivalent perturbations in the analysis. In this paper, we show that standard hypothesis tests, including the sign test and Wilcoxon signed-rank test, are invalid under a statistical model for generative surveying that includes realistic perturbation structure. We propose a permutation test that is valid under this model and formally characterize the conditions under which standard tests fail. Applying our framework to a simple generative surveying problem, we estimate relevant parameters, characterize the power of the permutation test under realistic conditions, and provide practical guidance on budget allocation across personas, perturbations, and replicates. Finally, we show that both the magnitude and direction of the estimated effect are sensitive to the choice of model, even within the same model family.

Published: 2026-05-26T00:35:58Z
Authors: Hayden Helm, Carey Priebe

http://arxiv.org/abs/2605.26401v1 Small-Area Precipitation Forecasting and Drought-Flood Early Warning with Reverse-Martingale Regularized Recurrent Networks
Updated: 2026-05-26T00:18:05Z

Small-area precipitation forecasts support real-time decisions for reservoir operation, irrigation planning, drought monitoring, and flash-flood response. Operational value depends not only on point accuracy, but also on calibrated exceedance probabilities and warning rules that remain stable when local weather regimes depart from the training climatology. We evaluate a reverse-martingale regularized recurrent neural network (RMRNN) for probabilistic precipitation forecasting and sequential early warning. A backward-coherence penalty is added to the recurrent hidden state; the resulting residual process drives a Shiryaev-Roberts (SR) detector, so the same latent trajectory that produces the forecast also supplies a continuously updated drought or flood-regime indicator. The framework is tested on the Taiwan CWA dense rain-gauge network, CHIRPS v2 daily gridded precipitation over Taiwan and the Horn of Africa, and NOAA GHCN-Daily stations over the Texas Hill Country. Across 1,000 replications, RMRNN matches or slightly improves on the GRU baseline in RMSE, MAE, and CRPS at lead times from 1 h to 72 h while substantially improving alarm characteristics. The SR detector reduces false-alarm ratios by a factor of three to five at matched detection power. In the 2020-2021 Taiwan drought, onset is flagged 8-12 days earlier than SPI-3 thresholding; in the 2023 Typhoon Haikui flood, flash-flood risk is signalled 4 h before the CWA operational alert.
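
The Shiryaev-Roberts recursion itself is short: R_n = (1 + R_{n-1}) · Λ_n with R_0 = 0, and an alarm is raised when R_n first reaches a threshold A. The sketch assumes, for illustration only, standardized residuals that are N(0, 1) before a regime change and N(mu1, 1) after it, so Λ_n = exp(mu1 · x_n - mu1²/2); `mu1`, `threshold`, and the function name are hypothetical, and the RMRNN residual process and the paper's threshold calibration are not reproduced.

```python
import math


def shiryaev_roberts(residuals, mu1, threshold):
    """Run R_n = (1 + R_{n-1}) * LR_n from R_0 = 0.

    LR_n is the likelihood ratio of N(mu1, 1) against N(0, 1) at residual x_n.
    Returns (index of the first alarm or None, list of R_n values).
    """
    r = 0.0
    path = []
    for n, x in enumerate(residuals):
        lr = math.exp(mu1 * x - 0.5 * mu1 * mu1)
        r = (1.0 + r) * lr
        path.append(r)
        if r >= threshold:
            return n, path
    return None, path


if __name__ == "__main__":
    stream = [0.0] * 50 + [1.0] * 20  # hypothetical shift after 50 steps
    alarm, path = shiryaev_roberts(stream, mu1=1.0, threshold=100.0)
    print("alarm at index", alarm)
```

Before the change the statistic settles near a small stationary level, while after it R_n grows roughly geometrically; raising A lowers the false-alarm rate at the cost of a longer detection delay.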

Published: 2026-05-26T00:18:05Z
Comments: 4 figures
Authors: Foo Hui-Mean, Yuan-chin Ivan Chang

http://arxiv.org/abs/2605.26253v1 Length-biased Birnbaum-Saunders quantile regression with application to water evaporation
Updated: 2026-05-25T18:24:09Z

Length-biased distributions arise naturally in environmental, reliability, and economic studies where the sampling mechanism favors larger observational units. In this paper, we propose a quantile regression model based on the length-biased Birnbaum-Saunders (QLBS) distribution. The model is constructed through a reparameterization of the length-biased Birnbaum-Saunders distribution in terms of its quantile function, thereby allowing direct interpretation of covariate effects on conditional quantiles of the response variable. We derive the log-likelihood function and the corresponding score equations, and obtain maximum likelihood estimators via numerical optimization. Asymptotic and bootstrap confidence intervals are considered. Two types of residuals are proposed for model assessment, namely the generalized Cox-Snell and randomized quantile residuals. An extensive Monte Carlo simulation study is carried out to evaluate the performance of the maximum likelihood estimators for several sample sizes and quantile levels. The proposed methodology is illustrated with a real meteorological data set from Brazil.
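
The distributional ingredients can be sketched numerically. Assuming the usual Birnbaum-Saunders(α, β) density f, the length-biased density is y f(y) / E[Y] with E[Y] = β(1 + α²/2); its CDF has no simple closed form, so the sketch integrates it with composite Simpson's rule and inverts it by bisection to obtain a τ-quantile. The paper instead reparameterizes the distribution directly in terms of its quantile; this numerical route and the function names are only an illustration.

```python
import math


def bs_pdf(y, alpha, beta):
    """Birnbaum-Saunders density with shape alpha and scale beta."""
    if y <= 0:
        return 0.0
    a = (math.sqrt(y / beta) - math.sqrt(beta / y)) / alpha
    da = (y + beta) / (2.0 * alpha * math.sqrt(beta) * y ** 1.5)  # derivative of a(y)
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi) * da


def lbs_pdf(y, alpha, beta):
    """Length-biased BS density y f(y) / E[Y], with E[Y] = beta (1 + alpha^2 / 2)."""
    return y * bs_pdf(y, alpha, beta) / (beta * (1.0 + alpha ** 2 / 2.0))


def lbs_cdf(y, alpha, beta, steps=2000):
    """Composite Simpson's rule on (0, y]; `steps` must be even."""
    if y <= 0:
        return 0.0
    h = y / steps
    total = lbs_pdf(y, alpha, beta)  # the density is 0 at the left endpoint
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * lbs_pdf(i * h, alpha, beta)
    return total * h / 3.0


def lbs_quantile(tau, alpha, beta, tol=1e-8):
    """Invert the numerical CDF by bisection on an expanding bracket."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    lo, hi = 0.0, beta
    while lbs_cdf(hi, alpha, beta) < tau:
        hi *= 2.0
    while hi - lo > tol * beta:
        mid = 0.5 * (lo + hi)
        if lbs_cdf(mid, alpha, beta) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(lbs_quantile(0.5, alpha=0.5, beta=2.0))  # median of the length-biased law
```

Because length-biasing reweights by y, the length-biased median lies above β, the Birnbaum-Saunders median; a quantile regression would then link the chosen quantile to covariates, for example through a log link.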

Published: 2026-05-25T18:24:09Z
Comments: 21 pages, 3 figures
Authors: Helton Saulo, Tailine Nonato, Roberto Vila

http://arxiv.org/abs/2601.09525v2 Sparse covariate-driven factorization of high-dimensional brain connectivity with application to site effect correction
Updated: 2026-05-25T15:34:55Z

Large-scale neuroimaging studies often collect data from multiple scanners across different sites, where variations in scanners, scanning procedures, and other conditions across sites can introduce artificial site effects. These effects may bias brain connectivity measures, such as functional connectivity (FC), which quantify functional network organization derived from functional magnetic resonance imaging (fMRI). How to leverage high-dimensional network structures to effectively mitigate site effects has yet to be addressed. In this paper, we propose SLACC (Sparse LAtent Covariate-driven Connectome) factorization, a multivariate method that explicitly parameterizes covariate effects in latent subject scores corresponding to sparse rank-1 latent patterns derived from brain connectivity. The proposed method identifies localized site-driven variability within and across brain networks, enabling targeted correction. We develop a penalized Expectation-Maximization (EM) algorithm for parameter estimation, incorporating the Bayesian Information Criterion (BIC) to guide optimization. Extensive simulations validate SLACC's robustness in recovering the true parameters and underlying connectivity patterns. Applied to the Autism Brain Imaging Data Exchange (ABIDE) dataset, SLACC demonstrates its ability to reduce site effects.

Published: 2026-01-14T14:48:13Z
Authors: Rongqian Zhang, Elena Tuzhilina, Jun Young Park

http://arxiv.org/abs/2605.25870v1 The Symmetric Location Problem: a Song of Efficiency and Robustness
Updated: 2026-05-25T13:56:35Z

The aim of this Lecture Note is to introduce the Signal Processing (SP) community to a powerful yet still under-utilised tool: semiparametric statistics. In short, the semiparametric framework allows us to estimate, or test hypotheses about, a finite-dimensional parameter in the presence of an infinite-dimensional nuisance parameter (i.e. a function), such as the density of the noise. Clearly, this framework is general enough to include almost every SP application. Remarkably, as the title (a nod to George R. R. Martin's famous book series) suggests, the greatest advantage of semiparametric statistics over parametric and nonparametric statistics is that it reconciles two seemingly dichotomous concepts: statistical efficiency and robustness. Here, robustness is understood in the sense of distribution-freeness, that is, the estimation performance must be robust to a lack of knowledge of the functional form of the data-generating distribution. To explain exactly what this means, in this Lecture Note we will focus on the fundamental symmetric location problem, which can be found (in various forms) in countless areas of SP: source localization, time synchronization, array signal processing, and distributed sensor networks, to name just a few. Furthermore, the methodology we will develop for this specific problem can be extended to much more general semiparametric estimation problems, such as the estimation of the location vector and covariance matrix in elliptical data.
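
To make the efficiency-robustness tension concrete, compare the sample mean (efficient under Gaussian noise but not robust), the sample median (robust but less efficient under Gaussian noise), and the Hodges-Lehmann estimator, the median of all Walsh averages (x_i + x_j)/2 with i ≤ j, a classical rank-based estimator of a symmetric location. It is shown only as a familiar example of this compromise, not as the semiparametric efficient estimator developed in the Lecture Note.

```python
import statistics
from itertools import combinations_with_replacement


def hodges_lehmann(x):
    """Median of the Walsh averages (x_i + x_j) / 2 over all pairs i <= j."""
    walsh = [(a + b) / 2.0 for a, b in combinations_with_replacement(x, 2)]
    return statistics.median(walsh)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 1000.0]  # one gross outlier
    print("mean  ", statistics.fmean(data))
    print("median", statistics.median(data))
    print("HL    ", hodges_lehmann(data))
```

On this toy sample the outlier pulls the mean to 251.5, while the median (2.5) and the Hodges-Lehmann estimate (2.75) stay near the bulk of the data. Under Gaussian noise the Hodges-Lehmann estimator keeps an asymptotic relative efficiency of 3/π ≈ 0.955 with respect to the mean, against 2/π ≈ 0.637 for the median; this direct O(n²) version is suitable only for small samples.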

Published: 2026-05-25T13:56:35Z
Authors: Stefano Fortunati

http://arxiv.org/abs/2602.05938v2 DiPPER: A Bayesian approach to differential prevalence analysis with applications in microbiome studies
Updated: 2026-05-25T12:31:56Z

Recent evidence suggests that analyzing the presence/absence of taxonomic features can offer a compelling alternative to differential abundance analysis in microbiome studies. However, standard approaches to differential prevalence analysis face challenges with boundary cases and multiple testing. To address these limitations, we developed DiPPER (Differential Prevalence via Probabilistic Estimation in R), a method based on Bayesian hierarchical modeling. We benchmarked our method against existing differential prevalence methods, along with two differential abundance tools, using publicly available data from 57 human gut microbiome studies. We observed considerable variation in performance across the evaluated methods. Importantly, DiPPER demonstrated high sensitivity to detect potentially differentially prevalent features while maintaining a well-calibrated family-wise error rate under the global null hypothesis. Most notably, it outperformed the alternatives in the replication of findings across independent studies. Furthermore, DiPPER provides differential prevalence estimates and uncertainty intervals that are inherently adjusted for multiple testing.
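
The boundary-case problem is easy to see with a feature that is absent from every sample in one group: the maximum-likelihood log-odds difference is infinite, whereas a Bayesian posterior stays finite. The sketch below places independent Beta(1, 1) priors on each group's prevalence and summarizes Monte Carlo draws of the log-odds difference for a single feature. It is deliberately simpler than DiPPER: there is no hierarchical model across features, so it provides none of DiPPER's built-in multiplicity adjustment, and the function name and defaults are hypothetical.

```python
import math
import random


def prevalence_log_odds_diff(present_a, n_a, present_b, n_b,
                             prior=(1.0, 1.0), draws=20000, seed=1):
    """Posterior summary of logit(p_b) - logit(p_a) under independent Beta priors.

    A non-hierarchical, single-feature sketch; counts are samples in which
    the feature is present out of the group size.
    """
    rng = random.Random(seed)
    a0, b0 = prior
    samples = []
    for _ in range(draws):
        p_a = rng.betavariate(a0 + present_a, b0 + n_a - present_a)
        p_b = rng.betavariate(a0 + present_b, b0 + n_b - present_b)
        samples.append(math.log(p_b / (1 - p_b)) - math.log(p_a / (1 - p_a)))
    samples.sort()
    lo, hi = samples[int(0.025 * draws)], samples[int(0.975 * draws) - 1]
    return {
        "mean": sum(samples) / draws,
        "interval_95": (lo, hi),
        "prob_positive": sum(s > 0 for s in samples) / draws,
    }


if __name__ == "__main__":
    # Feature absent in all 20 samples of group A, present in 12 of 20 in group B.
    print(prevalence_log_odds_diff(0, 20, 12, 20))
```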

Published: 2026-02-05T17:49:08Z
Comments: Source code and datasets: https://github.com/jepelt/differential-prevalence. R package: https://github.com/jepelt/DiPPER
Authors: Juho Pelto, Kari Auranen, Janne V. Kujala, Leo Lahti

http://arxiv.org/abs/2605.25734v1 Stein-Encoder: A White-Box Supervised Encoder via Stein Identities in Multi-Modal Studies
Updated: 2026-05-25T11:43:09Z

In multi-modal biomedical research, integrating high-dimensional genomic data with clinical baselines is essential for precision medicine. However, standard deep neural network approaches often entangle these modalities, obscuring the specific predictive impact of genetic features and possibly leading to suboptimal predictive performance. Motivated by the landmark METABRIC study of primary breast tumors, we propose the Stein-Encoder, a white-box supervised framework designed to isolate the genetic signal driving clinical outcomes conditional on nuisance covariates. By leveraging Stein's method and residualization techniques, our approach constructs an interpretable single index that summarizes relevant biological heterogeneity while flexibly incorporating clinical factors, and that can be used to improve downstream prediction. We establish theoretical guarantees for identification, consistency, and efficiency improvement. Applied to the METABRIC cohort, the Stein-Encoder outperforms unsupervised benchmarks in predictive accuracy. Crucially, it achieves structural disentanglement by revealing response-specific biological mechanisms: we find that tumor size is driven primarily by mitotic networks, whereas prognostic indices rely on a distinct proliferation-versus-immune axis. This work contributes a unified, computationally efficient framework that bridges statistical rigor with the representational power of neural networks, enabling interpretable, task-specific, and efficient compression of multi-modal health data for a wide range of precision medicine applications beyond biomarker discovery.
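
The first-order Stein identity behind such single-index encoders can be checked numerically: if X ~ N(0, I) and Y = g(βᵀX) + ε with independent zero-mean noise, then E[Y X] = E[g'(βᵀX)] β, so the sample average of Y·X recovers the direction of β whenever E[g'] ≠ 0. The sketch simulates this with a hypothetical link g(u) = u + u³/3; it omits residualization on clinical covariates and the neural-network components, illustrates the identity rather than the authors' estimator, and uses illustrative function names.

```python
import math
import random


def stein_direction(xs, ys):
    """Estimate E[Y X] by a sample average and normalize it to unit length."""
    p = len(xs[0])
    avg = [sum(y * x[k] for x, y in zip(xs, ys)) / len(ys) for k in range(p)]
    norm = math.sqrt(sum(a * a for a in avg))
    return [a / norm for a in avg]


def simulate(n, beta, seed=0, noise_sd=0.5):
    """Single-index data: standard Gaussian X, link g(u) = u + u**3 / 3, Gaussian noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        u = sum(b * v for b, v in zip(beta, x))
        xs.append(x)
        ys.append(u + u ** 3 / 3 + rng.gauss(0.0, noise_sd))
    return xs, ys


if __name__ == "__main__":
    beta = [0.8, 0.6, 0.0]  # unit-length true index direction
    xs, ys = simulate(20000, beta)
    print(stein_direction(xs, ys))
```

With a unit-length β, E[g'(u)] = 1 + E[u²] = 2, so the unnormalized average concentrates around 2β and the normalized estimate around β itself.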

Published: 2026-05-25T11:43:09Z
Authors: Jiarui Zhang, Shuoxun Xu, Jiasheng Shi, Xinzhou Guo