https://arxiv.org/api/UHiOO8o1URzZm2j/R7u8mPJLOXs 2026-03-20T18:51:36Z 34634 75 15 http://arxiv.org/abs/2508.03675v2 Partial Conjunction Analysis in Neuroimaging: A Comparative Study 2026-03-17T12:49:19Z Replicability is a cornerstone of science. The partial conjunction (PC) hypothesis testing framework objectively quantifies replicability across disciplines. Although several statistical methodologies for testing PC hypotheses exist, it is not clear which method performs well under which circumstances. In this paper, we consider the PC hypothesis testing problem from a neuroimaging perspective. Identifying the brain regions activated by a specific cognitive task constitutes a central challenge in neuroimaging. This problem becomes complex when the objective is to evaluate whether activation patterns are consistent across different cognitive tasks or subjects. In this paper, we cast this question as a PC hypothesis testing problem, assessing, for each location in the brain, whether it is activated in at least $γ$ subjects, for a pre-specified granularity $γ$. In our comparative study, we consider three methods, namely: adaFilter, CoFilter, and a method proposed by Benjamini, Heller, and Yekutieli (BHY). In equi-correlated simulated data, the BHY procedure tends to outperform the competing methods for high values of $γ$, while CoFilter performs well for low values of $γ$. In the real-data analysis, CoFilter dominates the other methods for intermediate values of $γ$. 2025-08-05T17:44:15Z Monitirtha Dey Anna Vesely Thorsten Dickhaus http://arxiv.org/abs/2603.01381v2 Differential gene expression analysis via two-component mixture models with a semiparametric skew-normal scale mixture alternative 2026-03-17T12:48:42Z Two-component mixture models are particularly useful for identifying differentially expressed genes, but their performance can deteriorate markedly when the alternative distribution departs from parametric assumptions or symmetry. 
We propose a semiparametric mixture model in which the null component is standard normal and the alternative follows a skew-normal scale mixture with an unspecified scale mixing distribution. This formulation accommodates skewness and heavy tails, providing a flexible and computationally tractable tool for differential gene-expression analysis without restrictive distributional assumptions. We establish identifiability and consistency of the model and develop an efficient estimation algorithm that incorporates nonparametric maximum likelihood estimation of the scale distribution. Numerical studies show notable improvements over existing parametric and nonparametric approaches for modeling the alternative distribution, and applications to colon cancer and leukemia datasets demonstrate reduced false discovery and false negative rates. 2026-03-02T02:22:09Z Sangkon Oh Geoffrey J. McLachlan http://arxiv.org/abs/2509.01435v2 On the interplay between prior weight and variance of the robustification component in Robust Mixture Prior Bayesian Dynamic Borrowing approach 2026-03-17T11:53:32Z Robust Mixture Prior (RMP) is a popular Bayesian dynamic borrowing method, which combines an informative historical distribution with a less informative component (referred to as the robustification component) in a mixture prior to enhance the efficiency of hybrid-control randomized trials. Current practice typically focuses solely on the selection of the prior weight that governs the relative influence of these two components, often fixing the variance of the robustification component to that of a single observation. In this study, we demonstrate that the performance of RMPs critically depends on the joint selection of both weight and variance of the robustification component. 
In particular, we show that a wide range of weight-variance pairs can yield practically identical posterior inferences (in certain regions of the parameter space) and that large-variance robustification components may be employed without incurring the so-called Lindley's paradox. We further show that the use of large-variance robustification components leads to improved asymptotic Type I error control and enhanced robustness of the RMP to the specification of the location parameter of the robustification component. Finally, we leverage these theoretical results to propose a novel and practical hyper-parameter elicitation routine. 2025-09-01T12:44:57Z 38 pages, 11 figures (5 in main, 6 in SM), 1 Table Marco Ratta Gaelle Saint-Hilary Mauro Gasparini Pavel Mozgunov http://arxiv.org/abs/2603.16400v1 A nonparametric approach to understand multivariate quantile dynamics in financial time series 2026-03-17T11:39:16Z Over the last decade, nonparametric methods have gained increasing attention for modeling complex data structures due to their flexibility and minimal structural assumptions. In this paper, we study a general multivariate nonparametric regression framework that encompasses a broad class of parametric models commonly used in financial econometrics. Both the response and the covariate processes are allowed to be multivariate with fixed finite dimensions, and the framework accommodates temporal dependence, thereby introducing additional modeling and theoretical hurdles. To address these challenges, we adopt a functional dependence structure which permits flexible dynamic behavior while maintaining tractable asymptotic analysis. Within this setting, we establish strong and weak convergence results for the estimators of the conditional mean and volatility functions. In addition, we investigate conditional geometric quantiles in the multivariate time series context and prove their consistency under mild regularity conditions. 
The finite sample performance is examined through comprehensive simulation studies, and the methodology is illustrated by modeling the stock returns of Maersk and Lockheed Martin as a nonparametric function of a geopolitical risk index. 2026-03-17T11:39:16Z Kunal Rai Archi Roy Itai Dattner Soudeep Deb http://arxiv.org/abs/2512.07709v2 Bounds on inequality with incomplete data 2026-03-17T11:35:03Z We develop a unified nonparametric framework for sharp partial identification and inference on inequality indices when the data contain coarsened observations of the variable of interest. We characterize the extremal allocations for all Schur-convex inequality measures, and show that sharp bounds are attained by distributions with finite support. This reduces the computational problem to finite-dimensional optimization, and for indices admitting linear-fractional representations after suitable ordering of the data (including the Gini coefficient and quantile ratios), we express the bound problems as linear or quadratic programs. We then establish $\sqrt{n}$ inference for the upper and lower bounds using a directional delta method and bootstrap confidence intervals. In applications, we compute sharp Gini bounds from household wealth data with mixed point and interval observations and use historical U.S. grouped income tables to bound time series for the Gini and quantile ratios. 2025-12-08T16:55:38Z James Banks Thomas Glinnan Tatiana Komarova http://arxiv.org/abs/2603.15175v2 Bayesian Inference in Epidemic Modelling: A Beginner's Guide 2026-03-17T11:30:41Z This lecture note provides a self-contained introduction to Bayesian inference and Markov Chain Monte Carlo (MCMC) methods for parameter estimation in epidemic models. 
Using the classical Susceptible-Infectious-Recovered (SIR) compartmental model as a running example, we derive the likelihood function from first principles, specify priors on the transmission and recovery parameters, and implement the Metropolis-Hastings algorithm to sample from the posterior distribution. The note is aimed at graduate students and researchers in mathematical epidemiology with limited prior exposure to Bayesian statistics. 2026-03-16T12:10:45Z 12 pages, 2 plots Augustine Okolie http://arxiv.org/abs/2603.16344v1 A flexible wrapped Lindley-type distribution for angular data modelling 2026-03-17T10:21:28Z Flexible distributions for modelling angular data have received considerable attention in recent years, with ongoing work extending existing circular models to provide greater flexibility in capturing diverse angular behaviours. In this paper, we introduce and study the w3PL distribution, a circular model obtained by extending the wrapped Lindley distribution by incorporating two additional shape parameters. The proposed generalisation increases flexibility in modelling concentration and skewness while preserving analytical tractability and encompassing existing circular models as special cases. Closed-form expressions for the probability density function, cumulative distribution function, and trigonometric moments are derived, allowing key distributional properties to be studied analytically. The distributional modality is characterised, and the nature of invariance is investigated for the newly proposed circular model. Parameter estimation is developed within a regularised maximum likelihood framework, and a simulation study demonstrates reliable parameter recovery and stable finite-sample performance. Applications to angular datasets from geology, marine biology, and finance illustrate the model's practical significance and show improved fit relative to existing circular alternatives. 
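As a point of reference for the wrapped Lindley abstract above, the following is a minimal sketch of the baseline (one-parameter) wrapped Lindley density only. It is not the w3PL model, whose two additional shape parameters are not specified in this listing. The closed form follows from summing the Lindley density $f(x) = \frac{\theta^2}{1+\theta}(1+x)e^{-\theta x}$ over the shifts $x = \phi + 2\pi k$, $k = 0, 1, \dots$, using geometric-series identities; the function names are illustrative assumptions.

```python
import math


def lindley_pdf(x, theta):
    """Density of the Lindley distribution on x >= 0 with parameter theta > 0."""
    if x < 0:
        return 0.0
    return theta**2 / (1.0 + theta) * (1.0 + x) * math.exp(-theta * x)


def wrapped_lindley_pdf(phi, theta):
    """Closed-form density on [0, 2*pi) of a Lindley variable wrapped onto the circle.

    Uses sum_k q^k = 1/(1-q) and sum_k k*q^k = q/(1-q)^2 with q = exp(-2*pi*theta).
    """
    q = math.exp(-2.0 * math.pi * theta)
    scale = theta**2 / (1.0 + theta) * math.exp(-theta * phi)
    return scale * ((1.0 + phi) / (1.0 - q) + 2.0 * math.pi * q / (1.0 - q) ** 2)


def wrapped_by_summation(phi, theta, terms=200):
    """Same density obtained by truncating the wrapping sum (used as a check)."""
    return sum(lindley_pdf(phi + 2.0 * math.pi * k, theta) for k in range(terms))


def integrate_midpoint(func, lower, upper, n=20000):
    """Plain midpoint rule, enough to check that a density integrates to one."""
    h = (upper - lower) / n
    return h * sum(func(lower + (i + 0.5) * h) for i in range(n))
```

Comparing `wrapped_lindley_pdf` with `wrapped_by_summation` at a few angles, and integrating the closed form over $[0, 2\pi)$, gives a quick sanity check before building any extension on top of it.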
2026-03-17T10:21:28Z Johan Ferreira Delene van Wyk-de Ridder Janet van Niekerk http://arxiv.org/abs/2409.01983v3 The causal interpretation of acceleration factors 2026-03-17T10:20:48Z In studies of time-to-event outcomes with unmeasured heterogeneity, the hazard ratio for treatment is known to have a complex causal interpretation. Accelerated failure time (AFT) models, which assess the effect on the survival time ratio scale, are often suggested as a better alternative because they model a parameter with direct causal interpretation while allowing straightforward adjustment for measured confounders. In this work, we formalize the causal interpretation of the acceleration factor in AFT models using structural causal models and data under independent censoring. We prove that the acceleration factor is a valid causal effect measure, even in the presence of frailty and treatment effect heterogeneity. Through simulations, we show that the acceleration factor better captures the causal effect than the hazard ratio when both AFT and conditional proportional hazards models apply. Additionally, we extend the interpretation to systems with time-dependent acceleration factors, illustrating the impossibility of distinguishing between a time-varying homogeneous effect and unmeasured effect heterogeneity. While the causal interpretation of acceleration factors is promising, we caution practitioners about potential challenges for the interpretation in the presence of effect heterogeneity. 2024-09-03T15:25:55Z Mari Brathovde Hein Putter Morten Valberg Richard A. J. Post http://arxiv.org/abs/2501.11738v2 A new class of non-stationary Gaussian fields with general smoothness on metric graphs 2026-03-17T10:16:16Z The increasing availability of network data has driven the development of advanced statistical models specifically designed for metric graphs, where Gaussian processes play a pivotal role. 
While models such as Whittle-Matérn fields have been introduced, there remains a lack of practically applicable options that accommodate flexible non-stationary covariance structures or general smoothness. To address this gap, we propose a novel class of generalized Whittle-Matérn fields, which are rigorously defined on general compact metric graphs and permit both non-stationarity and arbitrary smoothness. We establish new regularity results for these fields, which extend even to the standard Whittle-Matérn case. Furthermore, we introduce a method to approximate the covariance operator of these processes by combining the finite element method with a rational approximation of the operator's fractional power, enabling computationally efficient Bayesian inference for large datasets. Theoretical guarantees are provided by deriving explicit convergence rates for the covariance approximation error, and the practical utility of our approach is demonstrated through simulation studies and an application to traffic speed data, highlighting the flexibility and effectiveness of the proposed model class. 2025-01-20T20:49:56Z David Bolin Lenin Riera-Segura Alexandre B. Simas http://arxiv.org/abs/2503.07327v2 Casewise and Cellwise Robust Multilinear Principal Component Analysis 2026-03-17T09:49:22Z Multilinear Principal Component Analysis (MPCA) is an important tool for analyzing tensor data. It performs dimension reduction similar to PCA for multivariate data. However, standard MPCA is sensitive to outliers. It is highly influenced by observations deviating from the bulk of the data, called casewise outliers, as well as by individual outlying cells in the tensors, so-called cellwise outliers. This latter type of outlier is highly likely to occur in tensor data, as tensors typically consist of many cells. This paper introduces a novel robust MPCA method that can handle both types of outliers simultaneously, and can cope with missing values as well. 
This method uses a single loss function to reduce the influence of both casewise and cellwise outliers. The solution that minimizes this loss function is computed using an iteratively reweighted least squares algorithm with a robust initialization. Graphical diagnostic tools are also proposed to identify the different types of outliers that have been found by the new robust MPCA method. The performance of the method and associated graphical displays is assessed through simulations and illustrated on two real datasets. 2025-03-10T13:41:03Z Mehdi Hirari Fabio Centofanti Mia Hubert Stefan Van Aelst 10.1080/10618600.2026.2637632 http://arxiv.org/abs/2408.03415v2 Gradient-Based Approximate Bayesian Inference with Entropy-Optimized Summary Statistics for Compartmental Models 2026-03-17T09:21:22Z Recent pandemics have highlighted the critical role of infectious disease models in guiding public health decision-making, driving demand for realistic models that can provide timely answers under uncertainty. Compartmental models are widely used to capture disease dynamics, and advances in data availability, computational resources, and epidemiological understanding have allowed the development of models that incorporate detailed representations of population structure, disease progression, and intervention effects. While these advances improve model fidelity, they also increase model complexity, leading to high-dimensional parameter spaces, intractable likelihoods, and computational challenges for fitting models to limited surveillance data in real time. Existing likelihood-free methods, such as Approximate Bayesian Computation (ABC) and Bayesian Synthetic Likelihood (BSL), have developed largely independently, each with distinct strengths and limitations. 
We propose an integrated three-stage framework that synthesizes advances from both likelihood-based and likelihood-free methods: (1) ABC-based entropy minimization to identify low-dimensional, approximately orthogonal summary statistics; (2) BSL inference using these optimized summaries to construct tractable Gaussian approximations; and (3) Hamiltonian Monte Carlo sampling for efficient posterior exploration. Through an SEIR simulation study and an application to the 1978 British boarding school influenza outbreak, we demonstrate that our framework achieves reliable parameter estimation and uncertainty quantification while maintaining computational efficiency. 2024-08-06T19:29:34Z Xiahui Li Fergus J. Chadwick Ben Swallow http://arxiv.org/abs/2503.13148v3 Spearman's rho for zero-inflated count data: formulation and attainable bounds 2026-03-17T08:09:08Z We propose an alternative formulation of Spearman's rho for zero-inflated count data. The formulation yields an estimator with explicitly attainable bounds, facilitating interpretation in settings where the standard range [-1,1] is no longer informative. 2025-03-17T13:19:22Z Jasper Arends Guanjie Lyu Mhamed Mesfioui Elisa Perrone Julien Trufin http://arxiv.org/abs/2602.17922v2 Data-driven configuration tuning of glmnet for balancing accuracy and computational efficiency 2026-03-17T07:46:50Z The glmnet package in R is widely used for lasso estimation because of its computational efficiency. Despite its popularity, glmnet occasionally yields solutions that deviate substantially from the true ones because of the inappropriate default configuration of the algorithm. The accuracy of the obtained solutions can be improved by appropriately tuning the configuration. However, such improvements typically increase computational time, resulting in a tradeoff between accuracy and computational efficiency. Therefore, a systematic approach is required to determine the appropriate configuration. 
To address this need, we propose a unified data-driven framework specifically designed to optimize the configuration by balancing solution path accuracy and computational cost. Specifically, we generate a large-scale training dataset by measuring the accuracy and computation time of glmnet. Using this dataset, we construct neural networks to predict accuracy and computation time from data characteristics and configuration. For a new dataset, the proposed framework uses the trained networks to explore the configuration space and derive a Pareto front that represents the tradeoff between accuracy and computational cost. This front enables automatic selection of the configuration that maximizes accuracy under a user-specified time constraint. The proposed method is implemented in the R package glmnetconf, available at https://github.com/Shuhei-Muroya/glmnetconf.git. 2026-02-20T00:58:59Z 23 pages, 9 figures. Title changed. Revised for linguistic clarity and stylistic improvements; no changes to the main results Shuhei Muroya Kei Hirose http://arxiv.org/abs/2603.16213v1 Equivalence testing with data-dependent and post-hoc equivalence margins 2026-03-17T07:44:04Z Equivalence testing compares the hypothesis that an effect $μ$ is large against the alternative that it is negligible. Here, `large' is classically expressed as being larger than some `equivalence margin' $Δ$. A longstanding problem is that this margin must be specified but can rarely be objectively justified in practice. We lay the foundation for an alternative paradigm, arguing to instead report a data-dependent margin $\widehatΔ_α$ that bounds the true effect $μ$ with probability $1 - α$. Our key argument is that $\widehatΔ_α$ is more useful than a test outcome at a fixed margin $Δ$, as measured by the guarantees it offers to decision makers. We generalize this to a curve of margins $α\mapsto \widehatΔ_α$, uniformly valid under the post-hoc selection of the margin. 
These ideas rely on e-values, which we derive for models that are strictly totally positive of order 3, nesting the classical z-test and t-test settings. 2026-03-17T07:44:04Z Stan Koobs Nick W. Koning http://arxiv.org/abs/2603.16146v1 Deep Adaptive Model-Based Design of Experiments 2026-03-17T05:53:09Z Model-based design of experiments (MBDOE) is essential for efficient parameter estimation in nonlinear dynamical systems. However, conventional adaptive MBDOE requires costly posterior inference and design optimization between each experimental step, precluding real-time applications. We address this by combining Deep Adaptive Design (DAD), which amortizes sequential design into a neural network policy trained offline, with differentiable mechanistic models. For dynamical systems with known governing equations but uncertain parameters, we extend sequential contrastive training objectives to handle nuisance parameters and propose a transformer-based policy architecture that respects the temporal structure of dynamical systems. We demonstrate the approach on four systems of increasing complexity: a fed-batch bioreactor with Monod kinetics, a Haldane bioreactor with uncertain substrate inhibition, a two-compartment pharmacokinetic model with nuisance clearance parameters, and a DC motor for real-time deployment. 2026-03-17T05:53:09Z Arno Strouwen Sebastian Micluţa-Câmpeanu
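For readers unfamiliar with the first test system named in the last abstract, the following is a minimal sketch of a textbook fed-batch bioreactor with Monod growth kinetics, integrated with a forward Euler step. It is not the authors' model or code: the constant feed, the single substrate, and every parameter value are hypothetical choices made for illustration. The state is kept in total amounts (biomass and substrate mass, plus volume), so that the quantity biomass + yield × substrate − yield × feed concentration × volume stays constant along the trajectory.

```python
def monod_rate(substrate_conc, mu_max, k_s):
    """Monod specific growth rate mu_max * S / (K_s + S)."""
    return mu_max * substrate_conc / (k_s + substrate_conc)


def simulate_fed_batch(x0=0.1, s0=5.0, v0=1.0, feed_rate=0.05, s_in=20.0,
                       mu_max=0.4, k_s=0.5, yield_xs=0.5, dt=0.01, t_end=20.0):
    """Forward-Euler simulation of a fed-batch reactor (all defaults are hypothetical).

    Returns a list of (time, biomass_amount, substrate_amount, volume) tuples.
    """
    biomass = x0 * v0      # total biomass, concentration times volume
    substrate = s0 * v0    # total substrate, concentration times volume
    volume = v0
    trajectory = [(0.0, biomass, substrate, volume)]
    n_steps = int(round(t_end / dt))
    for step in range(1, n_steps + 1):
        mu = monod_rate(substrate / volume, mu_max, k_s)
        growth = mu * biomass  # biomass produced per unit time
        biomass, substrate, volume = (
            biomass + dt * growth,
            substrate + dt * (feed_rate * s_in - growth / yield_xs),
            volume + dt * feed_rate,
        )
        trajectory.append((step * dt, biomass, substrate, volume))
    return trajectory
```

In an experimental-design setting, uncertain parameters such as `mu_max` and `k_s` would be the estimation targets and the feed profile a design variable; here the feed is fixed to keep the sketch short.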