Feed: https://arxiv.org/api/7tJaVMfPpN6boyMSlW3Xdcxy0ZI (updated 2026-06-10T19:41:55Z)

URL: http://arxiv.org/abs/2606.01428v1
Title: Quantifying Evidential Rigor in Meta-Analytic Corpora: A Simulation-Characterized, Bias-Robust Bayesian Workflow with a Nutrition Case Study
Updated: 2026-05-31T19:56:12Z
Abstract: Conventional meta-analysis summarizes evidence through pooled estimates, intervals, and p-values, but these outputs do not directly measure evidence for an effect, evidence for no effect, or the degree to which conclusions depend on publication selection or small-study effects. We introduce a corpus-scale Bayesian evidential-audit workflow for meta-analytic corpora. The workflow reconstructs or accepts study-level effects and standard errors, harmonizes directions, fits a matched Bayesian random-effects baseline and a bias-aware model-averaged ensemble, and reports paired estimates with component and joint model-family evidence. The central estimand is rigor: a joint Bayes-factor summary combining resolved effect/no-effect evidence with the absence of an explicit bias component in the fitted ensemble. Rigor is not a positive-finding score; no-effect evidence can score highly, whereas inconclusive or bias-dependent evidence scores poorly. We characterize the workflow using an ADEMP-framed simulation/resampling design with known-cell synthetic simulation, empirical registry resampling, and empirical fitted-profile-weighted synthetic sampling. A nutrition intervention corpus provides the worked case study, where bias-aware fitting often attenuates conventional estimates and many nominally meaningful effects lose clean evidential support.
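To make "evidence for an effect versus evidence for no effect" concrete, here is a minimal sketch that is not the paper's workflow. It assumes a normal random-effects model $y_i \sim N(\mu, s_i^2 + \tau^2)$ with the heterogeneity $\tau$ treated as known and a $N(0, \sigma_0^2)$ prior on $\mu$ under the alternative, and computes the pooled estimate and the Bayes factor $BF_{10}$ against $\mu = 0$. The bias-aware ensemble and the rigor summary are not reproduced; the function names are hypothetical.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2 * math.pi * var)


def random_effects_bf10(effects, ses, tau, prior_sd):
    """Pooled mean, its variance, and BF10 (mu != 0 vs mu = 0) for
    y_i ~ N(mu, se_i^2 + tau^2) with known tau and mu ~ N(0, prior_sd^2) under H1."""
    w = [1.0 / (s * s + tau * tau) for s in ses]       # random-effects weights
    var_pooled = 1.0 / sum(w)                           # variance of the pooled estimate
    mu_hat = var_pooled * sum(wi * yi for wi, yi in zip(w, effects))
    # The likelihood depends on mu only through mu_hat ~ N(mu, var_pooled), so
    # BF10 = N(mu_hat; 0, var_pooled + prior_sd^2) / N(mu_hat; 0, var_pooled).
    bf10 = normal_pdf(mu_hat, var_pooled + prior_sd**2) / normal_pdf(mu_hat, var_pooled)
    return mu_hat, var_pooled, bf10
```

With two precise studies centred on zero, $BF_{10} < 1$, i.e. the data actively favour no effect, something a non-significant p-value cannot express.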
A public companion repository provides empirical inputs, generated artifacts, simulation source/design files, and documentation for reproducing and adapting the audit.
Published: 2026-05-31T19:56:12Z
Comment: 59 pages, 13 figures; supplementary material included as ancillary file; companion repository archived at Zenodo, DOI: 10.5281/zenodo.20467258
Authors: Matt Hester

URL: http://arxiv.org/abs/2506.02075v3
Title: Position: Stop Chasing the C-index when Evaluating Survival Analysis Models
Updated: 2026-05-31T18:10:23Z
Abstract: The current state of evaluation in survival analysis is plagued by the persistent use of evaluation metrics in ways that are misaligned with the stated modeling objective. In addition, many such evaluations are based on censoring assumptions that are left implicit or unjustified. This means that the reported performance can be misleading and may fail to answer the scientific or modeling question the evaluation was intended to address. In this position paper, we critically examine evaluation practices in survival analysis and highlight how censoring makes evaluation fundamentally different from standard regression or classification. We place particular focus on concordance-based measures, such as the C-index, which we show are heavily overused in the literature. To help identify appropriate metrics, we propose a set of key desiderata and introduce a double-helix ladder, in which valid evaluation requires alignment between metric and modeling assumptions. Through controlled experiments, we show that violations of this alignment can lead to misleading model comparisons.
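For reference, a minimal sketch of Harrell's C-index under right censoring, the kind of concordance measure this abstract argues is overused. It assumes higher risk scores predict earlier events, treats a pair as comparable only when the shorter observed time is an event (pairs with tied times are skipped, which is one of several conventions), and scores tied risks as one half. The function name is hypothetical and nothing here comes from the paper's code.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk scores (higher = earlier event expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if times differ and the shorter one is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Censored subjects enter only as the longer-lived member of a pair, so which pairs count depends on the censoring pattern; this is one concrete way censoring separates survival evaluation from ordinary ranking metrics.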
We conclude by providing practical guidance on how to evaluate a survival model.
Published: 2025-06-02T07:59:34Z
Comment: ICML 2026 Position Paper Track (Spotlight)
Authors: Christian Marius Lillelund, Shi-ang Qi, Russell Greiner, Christian Fischer Pedersen

URL: http://arxiv.org/abs/2508.01973v4
Title: A New Class of Asymptotically Distribution-Free Smooth Tests
Updated: 2026-05-31T18:02:36Z
Abstract: This article demonstrates how recent developments in the theory of empirical processes allow us to construct a new family of asymptotically distribution-free smooth tests. Their distribution-free property is preserved even when the parameters are estimated, model selection is performed, and the sample size is only moderately large. A computationally efficient alternative to the classical parametric bootstrap is also discussed.
Published: 2025-08-04T01:15:07Z
Authors: Xiangyu Zhang, Sara Algeri

URL: http://arxiv.org/abs/2606.01346v1
Title: FlowSDR: Sufficient Dimension Reduction via Conditional Normalizing Flows
Updated: 2026-05-31T16:54:56Z
Abstract: Sufficient dimension reduction (SDR) seeks a low-dimensional linear projection of predictors that preserves the conditional distribution of the response. Existing methods target this conditional distribution indirectly, via inverse moments, local forward regression, or neural ensemble regression. We propose FlowSDR, a likelihood-based framework that jointly learns the projection and the conditional density by maximizing a conditional log-likelihood, with the density parameterized by monotone rational-quadratic spline flows. The estimator is Fisher consistent under the SDR model, and its sample objective admits a population interpretation in terms of mutual information. As a complementary model within the same likelihood framework, we introduce the neural Gaussian SDR, a heteroscedastic conditional Gaussian model whose mean and variance are parameterized by shared neural-network functions of the projected predictors.
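The neural Gaussian SDR objective can be written down directly. A minimal sketch of its average negative log-likelihood for a given projection `B`, assuming the variance is parameterized through its logarithm; `mean_fn` and `logvar_fn` are hypothetical stand-ins for the shared neural networks, and in a full fit `B` and those networks would presumably be learned jointly by minimizing this quantity. The FlowSDR spline-flow density is not reproduced.

```python
import numpy as np


def gaussian_sdr_nll(B, X, y, mean_fn, logvar_fn):
    """Average negative log-likelihood of y given X under
    y | X ~ N(mean_fn(X B), exp(logvar_fn(X B))).

    B: (p, d) projection matrix; mean_fn and logvar_fn map the (n, d)
    projected predictors to length-n arrays (stand-ins for neural networks).
    """
    Z = X @ B                      # projected predictors
    mu = mean_fn(Z)                # conditional mean
    logvar = logvar_fn(Z)          # conditional log-variance (heteroscedastic in general)
    nll = 0.5 * (np.log(2 * np.pi) + logvar + (y - mu) ** 2 / np.exp(logvar))
    return float(nll.mean())
```

Because the response depends on `X` only through `X @ B`, a projection that captures the right direction yields a much lower negative log-likelihood than one that misses it.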
In simulations spanning Gaussian errors, heavy-tailed distributions, two-component mixtures, and settings with tail behavior not captured by mean-variance structure, FlowSDR recovers the central subspace more accurately than existing SDR methods and the neural Gaussian SDR baseline. We further validate these advantages on a face-age prediction task using the UTKFace dataset.
Published: 2026-05-31T16:54:56Z
Comment: 20 pages, 8 tables
Authors: Yuexiao Dong, Kenichiro Mcalinn, Edoardo Airoldi, Lei Li

URL: http://arxiv.org/abs/2606.01328v1
Title: Scale-Free Priors and Survival Dynamics: A Bayesian Framework for Conflict Duration
Updated: 2026-05-31T16:27:38Z
Abstract: We have developed a fully Bayesian survival-analysis framework that reformulates inference about system lifetimes in terms of hazard and survival functions, and extends this representation to interacting actors. Starting from J. Richard Gott's Copernican principle, we express the scale-free prior as a baseline hazard $λ(t)=1/t$, thereby linking a static prior over lifetimes to the dynamic language of survival analysis. In this formulation, Bayesian updating corresponds to conditioning on survival, while the resulting posterior distribution admits a natural representation in terms of hazard and survival functions. The approach is intended for settings where data are sparse or unreliable, and where a scale-free, assumption-light baseline is preferable to heavily parameterized models.
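With the hazard $λ(t)=1/t$, conditioning on survival to the current age $t_0$ gives the cumulative hazard $\log(t/t_0)$ and hence $S(t \mid t_0) = t_0/t$ for $t \ge t_0$. A minimal sketch of that closed form, written for illustration rather than taken from the paper; it recovers Gott's well-known intervals, in which the remaining lifetime lies between $t_0/3$ and $3t_0$ with probability 1/2 and between $t_0/39$ and $39t_0$ with probability 0.95.

```python
def survival_given_age(t, t0):
    """P(T > t | T > t0) under the scale-free baseline hazard lambda(s) = 1/s.

    The cumulative hazard from t0 to t is log(t / t0), so
    S(t | t0) = exp(-log(t / t0)) = t0 / t for t >= t0.
    """
    if t <= t0:
        return 1.0  # survival up to the current age t0 is already observed
    return t0 / t


def gott_interval_probability(t0, low, high):
    """P(low * t0 <= T - t0 <= high * t0): remaining lifetime as a multiple of current age."""
    return survival_given_age(t0 * (1 + low), t0) - survival_given_age(t0 * (1 + high), t0)
```

For example, an actor that has survived 10 time units has conditional probability 1/2 of surviving past 20.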
Building on this foundation, we derive general expressions for two-actor systems that characterize joint survival, conditional lifetimes, and comparative outcomes without requiring a specific parametric form of interaction. This yields a flexible and modular framework in which baseline dynamics are separated from interaction effects, allowing different mechanisms to be incorporated transparently. Thus, the primary contribution is a general hazard-based formulation of Bayesian updating and its extension to interacting systems.
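The simplest two-actor case is the no-interaction baseline. Assuming, purely for illustration, two independent actors that each follow the scale-free hazard and have survived to ages $a$ and $b$, joint survival factorizes as $S_A(t)S_B(t)$, and integrating $S_A(t) = \min(1, a/t)$ against the density $b/t^2$ of $T_B$ on $[b, \infty)$ gives $P(T_A > T_B) = a/(2b)$ when $a \le b$ and $1 - b/(2a)$ otherwise. This is our own derivation under the independence assumption, not the paper's interaction model, and the function names are hypothetical; a Monte Carlo check is included.

```python
import random


def prob_a_outlasts_b(a, b):
    """P(T_A > T_B) for independent actors with hazard 1/t that have survived to ages a and b."""
    if a <= b:
        return a / (2 * b)
    return 1 - b / (2 * a)


def prob_a_outlasts_b_mc(a, b, n=200_000, seed=0):
    """Monte Carlo check: T = t0 / U with U ~ Uniform(0, 1] has P(T > t) = t0 / t for t >= t0."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        t_a = a / (1.0 - rng.random())  # 1 - random() lies in (0, 1]
        t_b = b / (1.0 - rng.random())
        wins += t_a > t_b
    return wins / n
```

For equal ages the closed form gives 1/2, as symmetry requires; interaction terms such as the paper's resource-depletion specification would modify the hazards and hence this comparison.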
To illustrate the framework, we consider a multiplicative resource-depletion specification in which interaction modifies the baseline hazard through cumulative engagement intensity. This example demonstrates how interaction terms can be embedded while preserving analytical tractability, including closed-form expressions under simplifying assumptions. We further provide a stylized application to an asymmetric two-actor conflict, the 2026 US/Israel--Iran hostilities, to highlight the qualitative implications of the approach.
Published: 2026-05-31T16:27:38Z
Comment: 10 pages, 1 figure
Authors: Tomasz F. Stepinski

URL: http://arxiv.org/abs/2505.19925v2
Title: Cellwise and Casewise Robust Covariance in High Dimensions
Updated: 2026-05-31T14:26:42Z
Abstract: The sample covariance matrix is a cornerstone of multivariate statistics, but it is highly sensitive to outliers. These can be casewise outliers, such as cases belonging to a different population, or cellwise outliers, which are deviating cells (entries) of the data matrix. Recently some robust covariance estimators have been developed that can handle both types of outliers, but their computation is only feasible up to at most 20 dimensions. To remedy this we propose the cellRCov method, a robust covariance estimator that simultaneously handles casewise outliers, cellwise outliers, and missing data. It relies on a decomposition of the covariance on principal and orthogonal subspaces, leveraging recent work on robust PCA. It also employs a ridge-type regularization to stabilize the estimated covariance matrix. We establish some theoretical properties of cellRCov, including its casewise and cellwise influence functions as well as consistency and asymptotic normality. A simulation study demonstrates the superior performance of cellRCov in contaminated and missing data scenarios. Furthermore, its practical utility is illustrated in a real-world application to anomaly detection.
We also construct and illustrate the cellRCCA method for robust and regularized canonical correlation analysis.
Published: 2025-05-26T12:46:44Z
Authors: Fabio Centofanti, Mia Hubert, Peter J. Rousseeuw

URL: http://arxiv.org/abs/2606.01257v1
Title: Statistical Inference on Gradient Flows
Updated: 2026-05-31T14:22:37Z
Abstract: Gradient-based algorithms are central to modern statistical estimation, yet their statistical analysis is often restricted to fixed-time behavior, such as convergence to a population target or fluctuations at a prescribed iteration. In many applications, however, uncertainty quantification is needed along the entire optimization path, especially when the stopping time is data-dependent or divergent. In this paper, we develop a theory for time-uniform statistical inference on gradient flows arising from empirical risk minimization. We prove a uniform central limit theorem that characterizes the deviation between empirical and population gradient flows as a continuous-time Gaussian process over the entire nonnegative real line. Building on this result, we introduce an algorithm-aware covariance estimator that evolves jointly with the gradient flow and avoids matrix inversion, resampling, or sample splitting. We show that the covariance estimator is uniformly consistent over time and use it to construct confidence intervals for the target parameter with asymptotically valid coverage. Our results connect optimization dynamics with statistical inference and provide practical tools for uncertainty quantification in gradient-based methods.
Published: 2026-05-31T14:22:37Z
Authors: Tongyu Li, Alexander Giessing

URL: http://arxiv.org/abs/2606.01256v1
Title: Distribution-free changepoint localization after sequential change detection
Updated: 2026-05-31T14:18:41Z
Abstract: This paper introduces a distribution-free framework for constructing post-detection confidence sets for changepoints after stopping a sequential change detection procedure.
It is well known that conformal test martingales can be used to sequentially detect changes in distribution, but by themselves they provide no inference for the time at which a proclaimed change occurred. Past work on post-detection inference requires the pre- and post-change classes of distributions to be known, but this paper accomplishes localization of the changepoint without any distributional assumptions. We establish finite-sample coverage guarantees (conditional on correct detection). We provide non-asymptotic bounds on the conditional expected size of the confidence sets. Under suitable asymptotic regimes, we prove that the conditional expected size of the confidence set remains uniformly bounded, and we demonstrate strong empirical performance on simulated and real data. To the best of our knowledge, this is the first general distribution-free framework for sequential changepoint localization with a valid post-detection coverage guarantee.
Published: 2026-05-31T14:18:41Z
Authors: Aytijhya Saha, Aaditya Ramdas

URL: http://arxiv.org/abs/2605.29388v2
Title: Gaussian Differentially Private $e$-values: Construction, Threshold Calibration, and Multiple Testing
Updated: 2026-05-31T13:47:15Z
Abstract: This paper develops a framework for differentially private $e$-values under Gaussian differential privacy ($μ$-GDP). We characterize the canonical noise mechanism, establishing that optimal multiplicative perturbation follows a Gaussian distribution. Using this distribution, we derive a globally sharp rejection threshold that strictly improves upon the standard Markov bound. Asymptotic analysis shows that in low-sensitivity regimes, the calibrated private test achieves a net power gain over the non-private baseline. For multiple testing, we introduce a recursive peeling algorithm that adaptively concentrates the privacy budget on the most promising hypotheses. This construction guarantees rigorous $μ$-GDP and yields valid private $e$-values compatible with standard multiple testing procedures.
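For orientation, the non-private baselines this abstract refers to can be sketched in a few lines. By Markov's inequality, an $e$-value $E$ satisfies $P(E \ge 1/\alpha) \le \alpha$ under the null, which gives the standard rejection threshold $1/\alpha$. For $m$ hypotheses, the e-BH procedure rejects the $k^\ast$ hypotheses with the largest $e$-values, where $k^\ast$ is the largest $k$ such that the $k$-th largest $e$-value is at least $m/(\alpha k)$. This sketch does not implement the paper's Gaussian-DP mechanism, its sharper threshold, or the peeling algorithm; the function names are hypothetical.

```python
def markov_reject(e_value, alpha):
    """Single test: under the null, P(E >= 1/alpha) <= alpha by Markov's inequality."""
    return e_value >= 1.0 / alpha


def e_bh(e_values, alpha):
    """e-BH: with e_(1) >= ... >= e_(m), let k* = max{k : e_(k) >= m / (alpha * k)}
    (k* = 0 if no such k) and reject the k* hypotheses with the largest e-values.
    Returns the rejected indices in increasing order."""
    m = len(e_values)
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k in range(1, m + 1):
        if e_values[order[k - 1]] >= m / (alpha * k):
            k_star = k
    return sorted(order[:k_star])
```

With $e$-values `[40, 1, 26, 0.5, 12]` and $\alpha = 0.1$, the largest value alone misses the $k = 1$ threshold of 50, but the top two clear the $k = 2$ threshold of 25, so hypotheses 0 and 2 are rejected.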
Simulations and a genome-wide association study confirm that the method controls the false discovery rate while improving upon naive all-noisy privatization and recovering power close to non-private benchmarks.
Published: 2026-05-28T05:41:52Z
Authors: Qi Kuang, Bowen Gang, Yin Xia

URL: http://arxiv.org/abs/2606.01239v1
Title: Functional Clustering of Survival Data via Smoothed Log-Hazard Trajectories: A Risk-Dynamics Perspective
Updated: 2026-05-31T13:40:49Z
Abstract: This paper investigates clustering in survival data by shifting the analytical focus from cumulative survival probabilities to instantaneous risk, as characterized by the hazard function. We model smoothed log-hazard trajectories as functional objects that capture the temporal evolution of risk and propose a clustering framework based on Functional Principal Component Analysis applied to B-spline smoothed log-hazard trajectories. The number of retained functional principal components is selected before clustering using a 95% cumulative explained-variance rule, and clustering is then performed on the unstandardized FPCA scores. The proposed methodology is evaluated through simulation studies covering progressively complex scenarios, including overlapping and crossing hazard functions, cohort imbalance, heterogeneous risk profiles, and outlier contamination. The framework is further illustrated on two real-world clinical datasets, the German Breast Cancer Study and the Primary Biliary Cirrhosis dataset.
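The component-selection rule and the use of unstandardized scores can be sketched on curves already evaluated on a common uniform grid, where FPCA reduces to ordinary PCA of the sampled curves up to a constant rescaling. This sketch assumes the log-hazard trajectories are already smoothed, skips the B-spline step, and, since the abstract does not name the clustering algorithm, uses plain k-means with a fixed initialization as a stand-in; `fpca_scores` and `kmeans` are hypothetical helpers.

```python
import numpy as np


def fpca_scores(curves, threshold=0.95):
    """PCA of curves sampled on a common uniform grid (one row per subject).

    On a uniform grid this matches FPCA up to a constant rescaling, which leaves
    explained-variance ratios and distance-based clustering unchanged.
    Returns the unstandardized scores on the smallest number of components whose
    cumulative explained variance reaches `threshold`, and that number.
    """
    centered = curves - curves.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = min(int(np.searchsorted(np.cumsum(explained), threshold)) + 1, len(s))
    scores = centered @ vt[:k].T  # not divided by component standard deviations
    return scores, k


def kmeans(points, init_idx, n_iter=50):
    """Plain Lloyd's k-means started from the rows in init_idx (assumes no cluster empties)."""
    centers = points[init_idx].astype(float)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        centers = np.array([points[labels == c].mean(axis=0) for c in range(len(centers))])
    return labels
```

For example, three increasing and three decreasing linear log-hazard curves on an 11-point grid need a single component under the 95% rule and split cleanly into their two groups.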
Results show that the proposed log-hazard-based functional clustering framework provides an interpretable representation of relative temporal risk dynamics, with competitive internal cohesion and explicit robustness diagnostics when compared with cumulative-survival-based benchmarks.
Published: 2026-05-31T13:40:49Z
Comment: 23 pages, 16 figures, 4 tables
Authors: Anna De Magistris, Elvira Romano, Fabrizio Maturo

URL: http://arxiv.org/abs/2211.04697v5
Title: An average-case sensitivity analysis for unmeasured confounding
Updated: 2026-05-31T11:56:23Z
Abstract: Sensitivity analysis for the unconfoundedness assumption is crucial in observational studies. For this purpose, the marginal sensitivity model has recently gained popularity due to its good interpretability and mathematical properties. However, most existing models only consider a worst-case parameter that bounds the logit difference between the observed and full data propensity scores, which may not fully capture the extent of unmeasured confounding. We propose a new sensitivity model that is parameterized by the second moment of the propensity score ratio, requiring only the average strength of unmeasured confounding to be bounded. By characterizing the associated sensitivity analysis as an optimization problem, we derive sharp closed-form bounds on the average potential outcomes under our model. We propose efficient one-step estimators for these bounds based on the corresponding efficient influence functions. Additionally, we apply the multiplier bootstrap to construct simultaneous confidence bands that cover the sensitivity curve, which consists of bounds at different values of the sensitivity parameters.
Through a real-data study, we illustrate how this average-case sensitivity analysis can provide tighter bounds and facilitate calibration of the results using observed covariates.
Published: 2022-11-09T06:05:37Z
Comment: 42 pages, 3 figures, 2 tables
Journal reference: Biometrika, 2026
Authors: Yao Zhang, Qingyuan Zhao
DOI: 10.1093/biomet/asag030

URL: http://arxiv.org/abs/2606.01172v1
Title: Revisiting Neural Processes via Fourier Transform and Volterra Series
Updated: 2026-05-31T11:27:48Z
Abstract: Modeling unknown latent functions from finite, irregularly sampled measurements is a recurring challenge across science and engineering. Neural processes (NPs), a family of probabilistic functional models, are promising solutions -- especially when endowed with domain-specific symmetries like translation equivariance, which improve sample efficiency and generalization. Yet existing translation-equivariant NPs face two limitations: (i) they stack generic components with non-linearities, obscuring the induced function class and limiting interpretability; and (ii) convolutional designs rely on kernels with local receptive fields and require dense uniform input grids, while attention-based methods avoid these issues but scale quadratically with the number of observations. We address both with two contributions. First, using the Volterra expansion, we characterize continuous translation-equivariant operators as sums of higher-order convolutions, yielding analytical transparency while admitting efficient approximation by first-order convolutions. Second, we introduce set Fourier convolutions (SFConvs), a frequency-domain parameterization that operates directly on irregularly sampled points, achieves approximately global receptive fields, and scales linearly in the number of observations. Building on these ideas, we propose two conditional NPs (CNPs): SFConvCNPs, which stack SFConv blocks with non-linearities, and SFVConvCNPs, which integrate the Volterra formulation.
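The translation-equivariance and linear-cost properties claimed for SFConvs can be illustrated with a generic frequency-domain convolution on an irregularly sampled set. This is our own simplified sketch, not the paper's SFConv layer: it assumes a fixed set of $K$ frequencies with complex filter weights, projects the context points onto those Fourier modes, applies the filter, and evaluates at arbitrary targets. The output equals $\sum_i y_i \kappa(t - x_i)$ for a kernel $\kappa$ built from sinusoids, so the receptive field is global, shifting all inputs shifts the output, and the cost is $O((N + M)K)$ for $N$ context points and $M$ targets.

```python
import numpy as np


def fourier_set_conv(x_ctx, y_ctx, x_tgt, freqs, filt):
    """Frequency-domain convolution of an irregularly sampled 1-D set.

    Returns f(t) = Re sum_k filt[k] * sum_i y_i * exp(1j * freqs[k] * (t - x_i)),
    which equals sum_i y_i * kappa(t - x_i) with the kernel
    kappa(u) = Re sum_k filt[k] * exp(1j * freqs[k] * u).
    Cost: O(N K) to encode N context points, O(M K) to decode M targets.
    """
    # Encode: project the context set onto K Fourier modes (no grid needed).
    coeffs = (y_ctx[:, None] * np.exp(-1j * np.outer(x_ctx, freqs))).sum(axis=0)
    # Filter in frequency space, then decode at arbitrary target locations.
    return np.real(np.exp(1j * np.outer(x_tgt, freqs)) @ (filt * coeffs))
```

In a learned layer the filter weights would be trained parameters; here they are simply given.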
Experiments on synthetic and real-world datasets demonstrate our methods' efficacy against state-of-the-art baselines.
Published: 2026-05-31T11:27:48Z
Authors: Peiman Mohseni, Nick Duffield, Raymond K. W. Wong

URL: http://arxiv.org/abs/2605.13203v2
Title: Double Descent and Ensemble Emergence in Model Averaging Prediction
Updated: 2026-05-31T08:57:45Z
Abstract: This paper investigates the predictive performance of model averaging in high-dimensional linear regression where the number of regressors is comparable to the sample size. Leveraging tools from random matrix theory, we derive the exact limiting out-of-sample risk under a nested model setting and comprehensively characterize the risk landscape. This limiting risk helps to reveal two phenomena: simple weighting inherits the double descent trajectory and its associated variance explosion near the interpolation boundary; strategic weighting triggers an ensemble emergence that suppresses the localized risk surge and yields a globally flat risk surface. Building on this limiting risk, we also propose the Large Model Averaging (LaMA) method, in which we consider the discrepancy between in-sample and out-of-sample risks in the high-dimensional regime. Numerical studies and real data applications confirm that LaMA achieves superior predictive accuracy in high-dimensional environments.
Published: 2026-05-13T08:55:11Z
Authors: Ke Chen, Dandan Jiang, Xinyu Zhang

URL: http://arxiv.org/abs/2606.01090v1
Title: Measuring the Symmetry--Data Exchange Rate
Updated: 2026-05-31T08:17:40Z
Abstract: Equivariance theory predicts that an architectural symmetry prior reduces sample complexity by a factor of |G|; this is widely cited but rarely measured as a scaling law with controls that separate the prior from its confounds. On a controlled C_n-symmetric task, we report three findings. First, a wrong-group control with identical orbit size and matched compute is worse than no constraint (joint pairwise CI [+0.79, +3.26] excludes zero, robust across estimators); misaligned constraint is actively harmful, not merely unhelpful.
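On a C_n-symmetric task we assume, for illustration, that the group acts by cyclic shifts of an n-element input, so the orbit of an input is its n shifts and |G| = n. A minimal sketch of orbit averaging under that assumption: averaging any predictor over the orbit yields a C_n-invariant predictor, at the price of n evaluations per input. The predictor `f` is a hypothetical placeholder, not a model from the study.

```python
import numpy as np


def orbit_average(f, x):
    """Average f over the orbit of x under the cyclic group C_n acting by shifts (n = len(x))."""
    n = len(x)
    return sum(f(np.roll(x, k)) for k in range(n)) / n


# Hypothetical predictor that is NOT shift-invariant, used only for illustration.
def f(v):
    return float(2 * v[0] + v[1] ** 2 - v[2])
```

Because every shift of `x` has the same orbit, `orbit_average(f, np.roll(x, k))` is identical for all `k`, even though `f` itself changes under shifts.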
Second, an augmentation baseline equipped with test-time orbit averaging matches the equivariant model exactly -- bit-identical per-epoch validation curves across matched cells -- so the architecture-vs-augmentation gap is conditional on asymmetric test-time computation, not unconditional. Third, the relative exchange rate beta_diff = 1.28 is consistent in sign and order of magnitude with the theoretical 1.0 (single-level CI [+0.92, +2.05]); the more conservative two-level bootstrap (seeds x group sizes) widens this to [-0.63, +1.72], including zero, and a finer-N replication on a sqrt(2)-spaced grid is inconclusive (point estimate -0.82). The methodological contributions -- the relative-rate estimator that cancels the shared-difficulty confound, the wrong-group control, and a pre-specified failure taxonomy -- transfer to any inductive bias whose strength can be parameterised. Honest scoping: the primary estimator beta_diff was adopted post-hoc after the initial analysis revealed a positive-slope identifiability problem; the design was never externally pre-registered; and the headline number rests on an OLS slope over seven group sizes on a coarse N grid. This is an exploratory study, not a confirmatory measurement; the wrong-group result is the cleanest finding and the one we report with the most confidence. A registered replication on fresh seeds is future work.
Published: 2026-05-31T08:17:40Z
Comment: 19 pages, 9 figures. Exploratory study. Code and data at https://github.com/AhmedMostafa16/symmetry-exchange
Authors: Ahmed M. Adly

URL: http://arxiv.org/abs/2606.01078v1
Title: Non-Vacuous Certification of Transport MCMC via Oscillation-Controlled Normalizing Flows
Updated: 2026-05-31T07:46:48Z
Abstract: Transport MCMC trains a normalizing flow to precondition Metropolis--Hastings proposals, achieving high empirical efficiency on challenging posteriors; yet no prior work produces a numerically non-vacuous, rigorous spectral-gap bound for such samplers. We establish the first such bounds.
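In independence Metropolis--Hastings the proposal $q$ does not depend on the current state, and a move from $x$ to $y$ is accepted with probability $\min\{1, w(y)/w(x)\}$, where $w = \pi/q$ is the importance weight; transport MCMC uses a trained flow as $q$. A minimal sketch with a one-dimensional Gaussian proposal standing in for the flow and an unnormalized target; all names and densities are illustrative assumptions, not the paper's code.

```python
import math
import random


def independence_mh(log_target, sample_q, log_q, x0, n_steps, seed=0):
    """Independence Metropolis--Hastings with a state-independent proposal q.

    A proposal y replaces the current state x with probability min(1, w(y) / w(x)),
    where w = pi / q; unnormalized log densities suffice because constants cancel.
    """
    rng = random.Random(seed)
    x = x0
    log_w_x = log_target(x) - log_q(x)
    chain, accepted = [], 0
    for _ in range(n_steps):
        y = sample_q(rng)
        log_w_y = log_target(y) - log_q(y)
        # Accept with probability min(1, exp(log_w_y - log_w_x)); the min avoids overflow.
        if rng.random() < math.exp(min(0.0, log_w_y - log_w_x)):
            x, log_w_x = y, log_w_y
            accepted += 1
        chain.append(x)
    return chain, accepted / n_steps
```

For example, targeting $N(1, 1)$ with an $N(0, 4)$ proposal gives a chain whose mean is close to 1, and a proposal equal to the target accepts every move, which is why an accurate flow makes the sampler efficient.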
For independence MH on the banana family we certify $\gamma^\ast = 0.828$ at $D = 2$ (covering in the original space) and $\gamma^\ast \ge 7.6\times 10^{-4}$ at $D = 5$ (covering in an analytically unwarped Gaussian space with a grid-certified gradient bound under the stated numerical Lipschitz certification), both rigorous at 95% confidence. The framework rests on three pillars: (i) spectral normalization with reduced scale clips constrains the flow Lipschitz constant from $10^{47}$ to $10^4$; (ii) a coverage-based empirical oscillation bound replaces the vacuous analytical bound with a data-dependent certificate; and (iii) oscillation-regularised training cuts the empirical oscillation by 60--90% at no cost to density fit, extending practical certificates through $D = 20$ ($\gamma^\ast \ge 1.7\times 10^{-4}$). Tests on four further targets (Gaussian mixture, shear-building, Neal's funnel, Bayesian logistic regression) identify three precise barriers: boundary curvature, target stiffness, and tail-coverage mismatch. An affine-vs-spline comparison shows that simpler architectures yield tighter certificates at identical NLL, inverting the usual expressiveness hierarchy.
Published: 2026-05-31T07:46:48Z
Comment: 36 pages, includes appendix
Authors: Jun Hu
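Pillar (i) of the transport MCMC abstract relies on spectral normalization. A linear map's Lipschitz constant is its largest singular value, and a composition of such maps with 1-Lipschitz activations has Lipschitz constant at most the product of those values. A minimal sketch of estimating the largest singular value by power iteration and rescaling a weight matrix to a chosen cap; the cap of 0.9 and the function names are illustrative assumptions, and the paper's specific reduced scale clips are not reproduced.

```python
import numpy as np


def spectral_norm(W, n_iter=50, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return float(u @ W @ v)


def clip_spectral_norm(W, scale=0.9):
    """Rescale W so its spectral norm (its Lipschitz constant as a linear map) is at most scale."""
    sigma = spectral_norm(W)
    return W if sigma <= scale else W * (scale / sigma)
```

Applying the clip to every weight matrix of a network with 1-Lipschitz activations bounds the whole network's Lipschitz constant by the product of the caps, which is the kind of control a certificate needs.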