http://arxiv.org/abs/2606.01428v1 Quantifying Evidential Rigor in Meta-Analytic Corpora: A Simulation-Characterized, Bias-Robust Bayesian Workflow with a Nutrition Case Study 2026-05-31T19:56:12Z

Conventional meta-analysis summarizes evidence through pooled estimates, intervals, and p-values, but these outputs do not directly measure evidence for an effect, evidence for no effect, or the degree to which conclusions depend on publication selection or small-study effects. We introduce a corpus-scale Bayesian evidential-audit workflow for meta-analytic corpora. The workflow reconstructs or accepts study-level effects and standard errors, harmonizes directions, fits a matched Bayesian random-effects baseline and a bias-aware model-averaged ensemble, and reports paired estimates with component and joint model-family evidence. The central estimand is rigor: a joint Bayes-factor summary combining resolved effect/no-effect evidence with absence of an explicit bias component in the fitted ensemble. Rigor is not a positive-finding score; no-effect evidence can score highly, whereas inconclusive or bias-dependent evidence scores poorly. We characterize the workflow using an ADEMP-framed simulation/resampling design with known-cell synthetic simulation, empirical registry resampling, and empirical fitted-profile-weighted synthetic sampling. A nutrition intervention corpus provides the worked case study, where bias-aware fitting often attenuates conventional estimates and many nominally meaningful effects lose clean evidential support. A public companion repository provides empirical inputs, generated artifacts, simulation source/design files, and documentation for reproducing and adapting the audit.

2026-05-31T19:56:12Z 59 pages, 13 figures; supplementary material included as ancillary file; companion repository archived at Zenodo, DOI: 10.5281/zenodo.20467258 Matt Hester http://arxiv.org/abs/2506.02075v3 Position: Stop Chasing the C-index when Evaluating Survival Analysis Models 2026-05-31T18:10:23Z

The current state of evaluation in survival analysis is plagued by the persistent use of evaluation metrics in ways that are misaligned with the stated modeling objective. In addition, many such evaluations are based on censoring assumptions that are left implicit or unjustified. This means that the reported performance can be misleading and may fail to answer the scientific or modeling question the evaluation was intended to address. In this position paper, we critically examine evaluation practices in survival analysis and highlight how censoring makes evaluation fundamentally different from standard regression or classification. We place particular focus on concordance-based measures, such as the C-index, which we show are heavily overused in the literature. To help identify appropriate metrics, we propose a set of key desiderata and introduce a double-helix ladder, in which valid evaluation requires alignment between metric and modeling assumptions. Through controlled experiments, we show that violations of this alignment can lead to misleading model comparisons. We conclude by providing practical guidance on how to evaluate a survival model.
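As a point of reference for the concordance-based measures discussed above, the following is a minimal sketch of Harrell's C-index, assuming right-censored data and a scalar risk score per subject. The function name `harrell_c_index` and its argument names are illustrative and not taken from the paper, and pairs with tied observed times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs that the risk scores order correctly.

    times: observed times (event or censoring time)
    events: 1/True if the event was observed, 0/False if censored
    risk_scores: higher score means higher predicted risk (earlier event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Because only the ordering of `risk_scores` enters the computation, any strictly increasing transformation of the scores leaves the value unchanged. The measure therefore carries no information about calibration or about the accuracy of predicted event times, which is one reason it can be misaligned with a stated modeling objective.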

2025-06-02T07:59:34Z ICML 2026 Position Paper Track (Spotlight) Christian Marius Lillelund Shi-ang Qi Russell Greiner Christian Fischer Pedersen http://arxiv.org/abs/2508.01973v4 A New Class of Asymptotically Distribution-Free Smooth Tests 2026-05-31T18:02:36Z

This article demonstrates how recent developments in the theory of empirical processes allow us to construct a new family of asymptotically distribution-free smooth tests. Their distribution-free property is preserved even when the parameters are estimated, model selection is performed, and the sample size is only moderately large. A computationally efficient alternative to the classical parametric bootstrap is also discussed.

2025-08-04T01:15:07Z Xiangyu Zhang Sara Algeri http://arxiv.org/abs/2606.01346v1 FlowSDR: Sufficient Dimension Reduction via Conditional Normalizing Flows 2026-05-31T16:54:56Z

Sufficient dimension reduction (SDR) seeks a low-dimensional linear projection of predictors that preserves the conditional distribution of the response. Existing methods target this conditional distribution indirectly, via inverse moments, local forward regression, or neural ensemble regression. We propose FlowSDR, a likelihood-based framework that jointly learns the projection and the conditional density by maximizing a conditional log-likelihood, with the density parameterized by monotone rational-quadratic spline flows. The estimator is Fisher consistent under the SDR model, and its sample objective admits a population interpretation in terms of mutual information. As a complementary model within the same likelihood framework, we introduce the neural Gaussian SDR, a heteroscedastic conditional Gaussian model whose mean and variance are parameterized by shared neural-network functions of the projected predictors. In simulations spanning Gaussian errors, heavy-tailed distributions, two-component mixtures, and settings with tail behavior not captured by mean-variance structure, FlowSDR recovers the central subspace more accurately than existing SDR methods and the neural Gaussian SDR baseline. We further validate these advantages on a face-age prediction task using the UTKFace dataset.

2026-05-31T16:54:56Z 20 pages, 8 tables Yuexiao Dong Kenichiro Mcalinn Edoardo Airoldi Lei Li http://arxiv.org/abs/2606.01328v1 Scale-Free Priors and Survival Dynamics: A Bayesian Framework for Conflict Duration 2026-05-31T16:27:38Z

We have developed a fully Bayesian survival-analysis framework that reformulates inference about system lifetimes in terms of hazard and survival functions, and extends this representation to interacting actors. Starting from J. Richard Gott's Copernican principle, we express the scale-free prior as a baseline hazard $λ(t)=1/t$, thereby linking a static prior over lifetimes to the dynamic language of survival analysis. In this formulation, Bayesian updating corresponds to conditioning on survival, while the resulting posterior distribution admits a natural representation in terms of hazard and survival functions. The approach is intended for settings where data are sparse or unreliable, and where a scale-free, assumption-light baseline is preferable to heavily parameterized models. Building on this foundation, we derive general expressions for two-actor systems that characterize joint survival, conditional lifetimes, and comparative outcomes without requiring a specific parametric form of interaction. This yields a flexible and modular framework in which baseline dynamics are separated from interaction effects, allowing different mechanisms to be incorporated transparently. Thus, the primary contribution is a general hazard-based formulation of Bayesian updating and its extension to interacting systems. To illustrate the framework, we consider a multiplicative resource-depletion specification in which interaction modifies the baseline hazard through cumulative engagement intensity. This example demonstrates how interaction terms can be embedded while preserving analytical tractability, including closed-form expressions under simplifying assumptions. We further provide a stylized application to an asymmetric two-actor conflict, the 2026 US/Israel--Iran hostilities, to highlight the qualitative implications of the approach.
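The link between the baseline hazard $λ(t)=1/t$ and conditioning on survival can be made concrete with a short worked derivation. This is a sketch of the standard single-actor consequence of that hazard, not a reproduction of the paper's expressions; the symbols $t_0$ (the age already observed) and $T$ (the total lifetime) are notation assumed here.

```latex
% Assumed notation: t_0 is the age already observed; T is the total lifetime.
\begin{align*}
S(t \mid T > t_0) &= \exp\!\left(-\int_{t_0}^{t} \lambda(s)\,ds\right)
                   = \exp\!\left(-\int_{t_0}^{t} \frac{ds}{s}\right)
                   = \frac{t_0}{t}, \qquad t \ge t_0, \\
f(t \mid T > t_0) &= -\frac{d}{dt}\,S(t \mid T > t_0) = \frac{t_0}{t^{2}}, \\
S(t_{1/2} \mid T > t_0) = \tfrac12 &\;\Longrightarrow\; t_{1/2} = 2\,t_0 .
\end{align*}
```

Under this hazard the median remaining lifetime equals the age already observed. Setting the conditional survival to 0.975 and 0.025 gives a remaining lifetime between $t_0/39$ and $39\,t_0$ with probability 0.95, which is Gott's familiar interval.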

2026-05-31T16:27:38Z 10 pages, 1 figure Tomasz F. Stepinski http://arxiv.org/abs/2505.19925v2 Cellwise and Casewise Robust Covariance in High Dimensions 2026-05-31T14:26:42Z

The sample covariance matrix is a cornerstone of multivariate statistics, but it is highly sensitive to outliers. These can be casewise outliers, such as cases belonging to a different population, or cellwise outliers, which are deviating cells (entries) of the data matrix. Recently some robust covariance estimators have been developed that can handle both types of outliers, but their computation is feasible only in up to 20 dimensions. To remedy this we propose the cellRCov method, a robust covariance estimator that simultaneously handles casewise outliers, cellwise outliers, and missing data. It relies on a decomposition of the covariance on principal and orthogonal subspaces, leveraging recent work on robust PCA. It also employs a ridge-type regularization to stabilize the estimated covariance matrix. We establish some theoretical properties of cellRCov, including its casewise and cellwise influence functions as well as consistency and asymptotic normality. A simulation study demonstrates the superior performance of cellRCov in contaminated and missing data scenarios. Furthermore, its practical utility is illustrated in a real-world application to anomaly detection. We also construct and illustrate the cellRCCA method for robust and regularized canonical correlation analysis.

2025-05-26T12:46:44Z Fabio Centofanti Mia Hubert Peter J. Rousseeuw http://arxiv.org/abs/2606.01257v1 Statistical Inference on Gradient Flows 2026-05-31T14:22:37Z

Gradient-based algorithms are central to modern statistical estimation, yet their statistical analysis is often restricted to fixed-time behavior, such as convergence to a population target or fluctuations at a prescribed iteration. In many applications, however, uncertainty quantification is needed along the entire optimization path, especially when the stopping time is data-dependent or divergent. In this paper, we develop a theory for time-uniform statistical inference on gradient flows arising from empirical risk minimization. We prove a uniform central limit theorem that characterizes the deviation between empirical and population gradient flows as a continuous-time Gaussian process over the entire nonnegative real line. Building on this result, we introduce an algorithm-aware covariance estimator that evolves jointly with the gradient flow and avoids matrix inversion, resampling, or sample splitting. We show that the covariance estimator is uniformly consistent over time and use it to construct confidence intervals for the target parameter with asymptotically valid coverage. Our results connect optimization dynamics with statistical inference and provide practical tools for uncertainty quantification in gradient-based methods.

2026-05-31T14:22:37Z Tongyu Li Alexander Giessing http://arxiv.org/abs/2606.01256v1 Distribution-free changepoint localization after sequential change detection 2026-05-31T14:18:41Z

This paper introduces a distribution-free framework for constructing post-detection confidence sets for changepoints after stopping a sequential change detection procedure. It is well known that conformal test martingales can be used to sequentially detect changes in distribution, but by themselves they provide no inference for the time at which a proclaimed change occurred. Past work on post-detection inference requires pre- and post-change classes of distributions to be known, but this paper accomplishes localization of the changepoint without any distributional assumptions. We establish finite-sample coverage guarantees (conditional on correct detection). We provide non-asymptotic bounds on the conditional expected size of the confidence sets. Under suitable asymptotic regimes, we prove that the conditional expected size of the confidence set remains uniformly bounded, and we demonstrate strong empirical performance on simulated and real data. To the best of our knowledge, this is the first general distribution-free framework for sequential changepoint localization with a valid post-detection coverage guarantee.

2026-05-31T14:18:41Z Aytijhya Saha Aaditya Ramdas http://arxiv.org/abs/2605.29388v2 Gaussian Differentially Private $e$-values: Construction, Threshold Calibration, and Multiple Testing 2026-05-31T13:47:15Z

This paper develops a framework for differentially private $e$-values under Gaussian differential privacy ($μ$-GDP). We characterize the canonical noise mechanism, establishing that optimal multiplicative perturbation follows a Gaussian distribution. Using this distribution, we derive a globally sharp rejection threshold that strictly improves upon the standard Markov bound. Asymptotic analysis shows that in low-sensitivity regimes, the calibrated private test achieves a net power gain over the non-private baseline. For multiple testing, we introduce a recursive peeling algorithm that adaptively concentrates the privacy budget on the most promising hypotheses. This construction guarantees rigorous $μ$-GDP and yields valid private $e$-values compatible with standard multiple testing procedures. Simulations and a genome-wide association study confirm that the method controls the false discovery rate while improving upon naive all-noisy privatization and recovering power close to non-private benchmarks.
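For orientation, the "standard Markov bound" referred to above is the usual non-private rejection rule for $e$-values, stated here as a short reminder. The symbols $E$, $H_0$, and $\alpha$ are notation assumed for this sketch, and the paper's sharper private threshold is not reproduced.

```latex
% Assumed notation: E is an e-value for the null H_0; alpha in (0,1) is the level.
\[
E \ge 0,\;\; \mathbb{E}_{H_0}[E] \le 1
\quad\Longrightarrow\quad
\mathbb{P}_{H_0}\!\left(E \ge \tfrac{1}{\alpha}\right)
\;\le\; \alpha\,\mathbb{E}_{H_0}[E] \;\le\; \alpha .
\]
```

Rejecting when $E \ge 1/\alpha$ therefore controls the type I error at level $\alpha$ without any distributional knowledge beyond the $e$-value property. That generality is also why the bound is typically loose, and it leaves room for a calibrated threshold once the noise distribution is known.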

2026-05-28T05:41:52Z Qi Kuang Bowen Gang Yin Xia http://arxiv.org/abs/2606.01239v1 Functional Clustering of Survival Data via Smoothed Log-Hazard Trajectories: A Risk-Dynamics Perspective 2026-05-31T13:40:49Z

This paper investigates clustering in survival data by shifting the analytical focus from cumulative survival probabilities to instantaneous risk, as characterized by the hazard function. We model smoothed log-hazard trajectories as functional objects that capture the temporal evolution of risk and propose a clustering framework based on Functional Principal Component Analysis applied to B-spline smoothed log-hazard trajectories. The number of retained functional principal components is selected before clustering using a 95% cumulative explained-variance rule, and clustering is then performed on the unstandardized FPCA scores. The proposed methodology is evaluated through simulation studies covering progressively complex scenarios, including overlapping and crossing hazard functions, cohort imbalance, heterogeneous risk profiles, and outlier contamination. The framework is further illustrated on two real-world clinical datasets, the German Breast Cancer Study and the Primary Biliary Cirrhosis dataset. Results show that the proposed log-hazard-based functional clustering framework provides an interpretable representation of relative temporal risk dynamics, with competitive internal cohesion and explicit robustness diagnostics when compared with cumulative-survival-based benchmarks.
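The component-selection step described above can be sketched as follows, under the assumption that the smoothed log-hazard trajectories have already been evaluated on a common, uniformly spaced time grid, so that an ordinary PCA of the discretized curves stands in for FPCA. The function name `fpca_scores_95` and the array layout are illustrative choices, not the authors' implementation.

```python
import numpy as np


def fpca_scores_95(curves, threshold=0.95):
    """Discretized FPCA with a cumulative explained-variance rule.

    curves: array of shape (n_curves, n_grid_points), one smoothed
            log-hazard trajectory per row on a shared uniform grid.
    Returns (scores, k, cumulative), where `scores` holds the first k
    unstandardized principal component scores for each curve.
    """
    curves = np.asarray(curves, dtype=float)
    centered = curves - curves.mean(axis=0)  # remove the mean function
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    total = np.sum(s**2)
    if total == 0:
        raise ValueError("all curves are identical; nothing to decompose")
    cumulative = np.cumsum(s**2) / total
    # Smallest number of components whose cumulative share reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    k = min(k, len(s))  # guard against floating-point shortfall at threshold=1
    scores = u[:, :k] * s[:k]  # unstandardized scores (not rescaled to unit variance)
    return scores, k, cumulative
```

The returned `scores` array can be passed to any clustering routine. Leaving the scores unstandardized means that components explaining more variance weigh more heavily in Euclidean distances, which appears to be the intent of clustering on unstandardized FPCA scores.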

2026-05-31T13:40:49Z 23 pages, 16 figures, 4 tables Anna De Magistris Elvira Romano Fabrizio Maturo http://arxiv.org/abs/2211.04697v5 An average-case sensitivity analysis for unmeasured confounding 2026-05-31T11:56:23Z

Sensitivity analysis for the unconfoundedness assumption is crucial in observational studies. For this purpose, the marginal sensitivity model has recently gained popularity owing to its interpretability and favorable mathematical properties. However, most existing models only consider a worst-case parameter that bounds the logit difference between the observed and full data propensity scores, which may not fully capture the extent of unmeasured confounding. We propose a new sensitivity model that is parameterized by the second moment of the propensity score ratio, requiring only the average strength of unmeasured confounding to be bounded. By characterizing the associated sensitivity analysis as an optimization problem, we derive sharp closed-form bounds for the average potential outcomes under our model. We propose efficient one-step estimators for these bounds based on the corresponding efficient influence functions. Additionally, we apply the multiplier bootstrap to construct simultaneous confidence bands that cover the sensitivity curve, which consists of the bounds at different values of the sensitivity parameter. Through a real-data study, we illustrate how this average-case sensitivity analysis can provide tighter bounds and facilitate calibration of the results using observed covariates.

2022-11-09T06:05:37Z 42 pages, 3 figures, 2 tables Biometrika, 2026 Yao Zhang Qingyuan Zhao 10.1093/biomet/asag030 http://arxiv.org/abs/2606.01172v1 Revisiting Neural Processes via Fourier Transform and Volterra Series 2026-05-31T11:27:48Z

Modeling unknown latent functions from finite, irregularly sampled measurements is a recurring challenge across science and engineering. Neural processes (NPs), a family of probabilistic functional models, are promising solutions -- especially when endowed with domain-specific symmetries like translation equivariance, which improve sample efficiency and generalization. Yet existing translation-equivariant NPs face two limitations: (i) they stack generic components with non-linearities, obscuring the induced function class and limiting interpretability; and (ii) convolutional designs rely on kernels with local receptive fields and require dense uniform input grids, while attention-based methods avoid these issues but scale quadratically with the number of observations. We address both with two contributions. First, using the Volterra expansion, we characterize continuous translation-equivariant operators as sums of higher-order convolutions, yielding analytical transparency while admitting efficient approximation by first-order convolutions. Second, we introduce set Fourier convolutions (SFConvs), a frequency-domain parameterization that operates directly on irregularly sampled points, achieves approximately global receptive fields, and scales linearly in the number of observations. Building on these ideas, we propose two conditional NPs (CNPs): SFConvCNPs, which stack SFConv blocks with non-linearities, and SFVConvCNPs, which integrate the Volterra formulation. Experiments on synthetic and real-world datasets demonstrate our methods' efficacy against state-of-the-art baselines.

2026-05-31T11:27:48Z Peiman Mohseni Nick Duffield Raymond K. W. Wong http://arxiv.org/abs/2605.13203v2 Double Descent and Ensemble Emergence in Model Averaging Prediction 2026-05-31T08:57:45Z

This paper investigates the predictive performance of model averaging in high-dimensional linear regression where the number of regressors is comparable to the sample size. Leveraging tools from random matrix theory, we derive the exact limiting out-of-sample risk under a nested model setting and comprehensively characterize the risk landscape. This limiting risk helps to reveal two phenomena: simple weighting inherits the double descent trajectory and its associated variance explosion near the interpolation boundary; strategic weighting triggers an ensemble emergence that suppresses the localized risk surge and yields a globally flat risk surface. Building on this limiting risk, we also propose the Large Model Averaging (LaMA) method, in which we consider the discrepancy between in-sample and out-of-sample risks in the high-dimensional regime. Numerical studies and real data applications confirm that LaMA achieves superior predictive accuracy in high-dimensional environments.

2026-05-13T08:55:11Z Ke Chen Dandan Jiang Xinyu Zhang http://arxiv.org/abs/2606.01090v1 Measuring the Symmetry--Data Exchange Rate 2026-05-31T08:17:40Z

Equivariance theory predicts that an architectural symmetry prior reduces sample complexity by a factor of |G|; this is widely cited but rarely measured as a scaling law with controls that separate the prior from its confounds. On a controlled C_n-symmetric task, we report three findings. First, a wrong-group control with identical orbit size and matched compute is worse than no constraint (joint pairwise CI [+0.79, +3.26] excludes zero, robust across estimators); misaligned constraint is actively harmful, not merely unhelpful. Second, an augmentation baseline equipped with test-time orbit averaging matches the equivariant model exactly -- bit-identical per-epoch validation curves across matched cells -- so the architecture-vs-augmentation gap is conditional on asymmetric test-time computation, not unconditional. Third, the relative exchange rate beta_diff = 1.28 is consistent in sign and order of magnitude with the theoretical 1.0 (single-level CI [+0.92, +2.05]); the more conservative two-level bootstrap (seeds x group sizes) widens this to [-0.63, +1.72], including zero, and a finer-N replication on a sqrt(2)-spaced grid is inconclusive (point estimate -0.82). The methodological contributions -- the relative-rate estimator that cancels the shared-difficulty confound, the wrong-group control, and a pre-specified failure taxonomy -- transfer to any inductive bias whose strength can be parameterised. Honest scoping: the primary estimator beta_diff was adopted post-hoc after the initial analysis revealed a positive-slope identifiability problem; the design was never externally pre-registered; and the headline number rests on an OLS slope over seven group sizes on a coarse N grid. This is an exploratory study, not a confirmatory measurement; the wrong-group result is the cleanest finding and the one we report with the most confidence. A registered replication on fresh seeds is future work.

2026-05-31T08:17:40Z 19 pages, 9 figures. Exploratory study. Code and data at https://github.com/AhmedMostafa16/symmetry-exchange Ahmed M. Adly http://arxiv.org/abs/2606.01078v1 Non-Vacuous Certification of Transport MCMC via Oscillation-Controlled Normalizing Flows 2026-05-31T07:46:48Z

Transport MCMC trains a normalizing flow to precondition Metropolis--Hastings proposals, achieving high empirical efficiency on challenging posteriors; yet no prior work produces a numerically non-vacuous, rigorous spectral-gap bound for such samplers. We establish the first such bounds. For independence MH on the banana family we certify $γ^\ast = 0.828$ at $D = 2$ (covering in the original space) and $γ^\ast \ge 7.6\times 10^{-4}$ at $D = 5$ (covering in an analytically unwarped Gaussian space with a grid-certified gradient bound under the stated numerical Lipschitz certification), both rigorous at 95% confidence. The framework rests on three pillars: (i) spectral normalization with reduced scale clips constrains the flow Lipschitz constant from $10^{47}$ to $10^4$; (ii) a coverage-based empirical oscillation bound replaces the vacuous analytical bound with a data-dependent certificate; and (iii) oscillation-regularised training cuts the empirical oscillation by 60--90% at no cost to density fit, extending practical certificates through $D = 20$ ($γ^\ast \ge 1.7\times 10^{-4}$). Tests on four further targets (Gaussian mixture, shear-building, Neal's funnel, Bayesian logistic regression) identify three precise barriers: boundary curvature, target stiffness, and tail-coverage mismatch. An affine-vs-spline comparison shows that simpler architectures yield tighter certificates at identical NLL, inverting the usual expressiveness hierarchy.

2026-05-31T07:46:48Z 36 pages, includes appendix Jun Hu