http://arxiv.org/abs/2606.08819v1 Model Selection for SLOPE Models: A Bayesian Perspective 2026-06-07T20:15:52Z

Sorted $\ell_1$ Penalized Estimation (SLOPE) models, which perform either variable or group selection, control the false discovery rate (FDR) under orthogonal settings with known noise, but such settings are rare in practice. Under general conditions, cross-validation is the default model selection approach for SLOPE, yet it targets predictive performance rather than FDR control. We address this gap for the SLOPE family of models by proposing new Bayesian approaches, Bayesian Group SLOPE (BGSLOPE) and Bayesian Sparse-group SLOPE (BSGS). BGSLOPE and BSGS embed group-based SLOPE models into a spike-and-slab framework, with BSGS providing a continuous spike-and-slab framework for sparse-group models. We further introduce Two-step Orthogonal (TSO), which transforms a general setting into an orthogonal one to recover SLOPE's FDR control properties. Through extensive synthetic and real data studies comparing all major model selection strategies for SLOPE models, the proposed Bayesian models consistently control FDR, achieve higher power, and outperform competing methods in prediction.
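
For readers unfamiliar with the sorted $\ell_1$ penalty named in the abstract above, a minimal sketch follows. It evaluates the penalty $\sum_i \lambda_i |\beta|_{(i)}$ with a Benjamini-Hochberg-style sequence $\lambda_i = \Phi^{-1}(1 - iq/(2p))$, which is the usual choice in the SLOPE literature for the orthogonal, known-noise case. The function names, the example coefficients, and the level `q = 0.1` are illustrative assumptions; none of the Bayesian models (BGSLOPE, BSGS) or TSO is implemented here.

```python
from statistics import NormalDist


def bh_lambdas(p, q=0.1):
    """Benjamini-Hochberg-style sequence: lambda_i = Phi^{-1}(1 - i*q/(2p)), i = 1..p."""
    nd = NormalDist()
    return [nd.inv_cdf(1 - i * q / (2 * p)) for i in range(1, p + 1)]


def sorted_l1_penalty(beta, lambdas):
    """Sum of lambda_i * |beta|_(i), where |beta|_(1) >= |beta|_(2) >= ... ."""
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(lam * m for lam, m in zip(lambdas, magnitudes))


# Hypothetical coefficient vector with p = 4 entries.
lambdas = bh_lambdas(p=4, q=0.1)
penalty = sorted_l1_penalty([0.5, -2.0, 0.0, 1.0], lambdas)
```

Because the largest weight is paired with the largest coefficient magnitude, the penalty does not depend on the order of the coefficients, and with a constant sequence it reduces to an ordinary $\ell_1$ (lasso) penalty.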

2026-06-07T20:15:52Z Fabio Feser Marina Evangelou http://arxiv.org/abs/2606.08786v1 Inference for Balance in Dynamic Signed Networks 2026-06-07T19:07:14Z

Signed networks consist of both positive and negative relations, and structural balance theory provides an important conceptual framework for understanding their global tension structure. While existing statistical methods mainly focus on assessing empirical evidence of balance in a single observed network, many real-world signed relations evolve over time. This paper develops nonparametric inference for the population degree of structural balance at specified time points in dynamic signed networks, where the target time may or may not coincide with an observed snapshot. We consider a dynamic signed graphon model in which both edge formation and sign generation are governed by smoothly time-varying graphon functions. To exploit temporal smoothness, we construct a kernel-smoothed estimator that borrows information from snapshots near the target time point. Our theoretical analysis establishes a studentized inference procedure and a higher-order distributional approximation based on Edgeworth expansion, showing that temporal smoothing improves inference in sparse networks by reducing the variance of observation noise, up to smoothing bias and time-discretization errors. We demonstrate the finite-sample performance and practical usefulness of the proposed method through extensive simulation studies and an application to a dynamic international relation network in political science.

2026-06-07T19:07:14Z Ergan Shang Yuan Zhang Weijing Tang http://arxiv.org/abs/2502.15131v4 Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling 2026-06-07T15:55:48Z

We study the fundamental problem of calibrating a linear binary classifier of the form $\sigma(\hat{w}^\top x)$, where the feature vector $x$ is Gaussian, $\sigma$ is a link function, and $\hat{w}$ is an estimator of the true linear weight $w_\star$. By interpolating with a noninformative $\textit{chance classifier}$, we construct a well-calibrated predictor whose interpolation weight depends on the angle $\angle(\hat{w}, w_\star)$ between the estimator $\hat{w}$ and the true linear weight $w_\star$. We establish that this angular calibration approach is provably well-calibrated in a high-dimensional regime where the number of samples and features both diverge, at a comparable rate. The angle $\angle(\hat{w}, w_\star)$ can be consistently estimated. Furthermore, the resulting predictor is uniquely $\textit{Bregman-optimal}$, minimizing the Bregman divergence to the true label distribution within a suitable class of calibrated predictors. Our work is the first to provide a calibration strategy that satisfies both calibration and optimality properties provably in high dimensions. Additionally, we identify conditions under which a classical Platt-scaling predictor converges to our Bregman-optimal calibrated solution. Thus, Platt-scaling also inherits these desirable properties provably in high dimensions.

2025-02-21T01:24:27Z Yufan Li Pragya Sur http://arxiv.org/abs/2606.08691v1 Hierarchical Projection for Adaptive Knowledge Transfer 2026-06-07T15:44:34Z

Modern data-driven applications increasingly involve learning from multiple heterogeneous sources, where a target dataset is limited but related information is available across domains. Naively combining these sources can degrade performance when relevance varies or spurious signals are present, posing a fundamental challenge for trustworthy cross-domain learning. We propose Projection Transfer Learning (ProjectionTL), a unified framework that integrates hierarchical Bayesian modeling with adaptive projection for selective knowledge transfer. The key idea is to decouple transfer at two levels: first, we construct a source-guided hierarchical prior that aggregates information across sources using data-driven weights, capturing global alignment between each source and the target; second, we refine this borrowing through a posterior-projection step that operates at the feature level, selectively retaining coordinates that exhibit local agreement with the target signal. This two-stage design enables the method to simultaneously perform source selection and feature selection, thereby mitigating negative transfer while preserving interpretability. ProjectionTL provides a principled approach to integrating heterogeneous data across domains, bridging statistical modeling and modern machine learning paradigms for robust and interpretable transfer. Through simulations and real-world biomedical applications, we demonstrate improved accuracy, stability, and interpretability compared to existing methods. Our framework offers a scalable and generalizable strategy for trustworthy cross-domain learning in high-dimensional settings.

2026-06-07T15:44:34Z Samhita Pal Tian Gu http://arxiv.org/abs/2606.08679v1 Rank Intervals for Leaderboards: A Hierarchical Framework for Model Evaluation 2026-06-07T15:31:29Z

Pretrained models are often evaluated on multi-task leaderboards to measure their applicability in diverse contexts. However, current methods for aggregating performance across tasks into leaderboard-level rankings do not address the uncertainty and variability at the task level. While recent works have proposed interval-based model rankings, the principled aggregation of uncertainty from individual tasks to leaderboard-level rankings remains unaddressed, and variation in models' performance across tasks is frequently obscured. In this work, we introduce a hierarchical framework that constructs model rank intervals with statistical guarantees at both levels: task-level rank confidence intervals from pairwise comparisons, and leaderboard-level rank prediction intervals using a conformal approach. This enables reliable quantification of model rank for each observed task and for new potential tasks. Experiments on simulated data and the TabArena and PromptEval (MMLU) benchmarks show that our method yields statistically valid and informative intervals, enabling reliable, uncertainty-aware model ranking on leaderboards.
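
To make the idea of a task-level rank interval concrete, here is a minimal sketch of one common construction from pairwise comparisons: a model's best possible rank is one plus the number of models judged significantly better, and its worst possible rank is the total number of models minus the number judged significantly worse. The function `rank_intervals`, the model names, and the set `sig_better` are hypothetical. How the pairwise decisions are obtained and adjusted for multiplicity, and the conformal leaderboard-level step described in the abstract above, are not shown and may differ from the authors' procedure.

```python
def rank_intervals(models, sig_better):
    """Rank intervals from pairwise decisions.

    sig_better is a set of (a, b) pairs meaning
    'a was judged significantly better than b'.
    Rank 1 is the best rank.
    """
    n = len(models)
    intervals = {}
    for m in models:
        n_better = sum((other, m) in sig_better for other in models)
        n_worse = sum((m, other) in sig_better for other in models)
        intervals[m] = (1 + n_better, n - n_worse)
    return intervals


# Hypothetical task: only the A-versus-C comparison is significant.
models = ["A", "B", "C"]
sig_better = {("A", "C")}
intervals = rank_intervals(models, sig_better)
# A can be ranked 1 or 2, B anywhere from 1 to 3, C can be ranked 2 or 3.
```

Wide intervals such as the one for B signal that the task data cannot separate that model from its competitors, which is the kind of uncertainty a single point ranking hides.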

2026-06-07T15:31:29Z Bitya Neuhof Yuval Benjamini http://arxiv.org/abs/2606.08660v1 Active Learning with Bayesian Reasoning: A POGIL-Based Pedagogy in Introductory Statistics 2026-06-07T15:02:43Z

We introduce a Process Oriented Guided Inquiry Learning (POGIL)-style activity for teaching Bayesian reasoning in introductory statistics through conditional probability, Bayes' theorem, and belief updating. The activity is self-contained, uses hand-computable probabilities organized in two-way tables, and engages students in structured team roles. We evaluated the activity in four sections of an undergraduate introductory statistics course using a quasi-experimental comparison of POGIL-style and lecture-based instruction for a Bayes' theorem unit. Outcomes included student performance on Bayes' theorem final exam questions and satisfaction with instruction. We used a Bayesian bivariate generalized linear model to compare the two approaches while accounting for major type, gender, and race. The results indicated similar exam performance and similar probabilities of high satisfaction across instructional styles and demographic groups, with considerable uncertainty and no clear evidence of meaningful differences. These findings suggest that the POGIL-style activity performed comparably to lecture-based instruction for this unit while offering an active and classroom-ready way to introduce Bayesian reasoning without requiring difficult computation or simulation. We provide adaptable instructional materials and a reproducible Bayesian analytic framework for evaluating active learning innovations in introductory statistics. Our study supports the feasible inclusion of Bayesian reasoning in introductory courses and may help instructors considering active learning.
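
The kind of hand-computable, two-way-table calculation the abstract above describes can be sketched as follows. The counts are invented for illustration and are not taken from the authors' materials; the function `posterior` is likewise a hypothetical helper. It applies Bayes' theorem in its table form: the posterior probability of a hypothesis given the evidence is the joint count divided by the column total for that evidence.

```python
from fractions import Fraction

# Hypothetical two-way table of counts for 1000 people:
# rows are the true condition, columns are the test result.
table = {
    ("disease", "positive"): 9,
    ("disease", "negative"): 1,
    ("healthy", "positive"): 99,
    ("healthy", "negative"): 891,
}


def posterior(table, hypothesis, evidence):
    """P(hypothesis | evidence) = joint count / column total for the evidence."""
    joint = table[(hypothesis, evidence)]
    marginal = sum(count for (_, e), count in table.items() if e == evidence)
    return Fraction(joint, marginal)


prior = Fraction(9 + 1, 1000)                      # P(disease) = 1/100
updated = posterior(table, "disease", "positive")  # 9 / (9 + 99) = 1/12
```

Exact fractions keep the arithmetic identical to what a student would write by hand: the belief in disease moves from 1/100 before the test to 1/12 after a positive result.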

2026-06-07T15:02:43Z Cheng-Han Yu Angela Ebeling http://arxiv.org/abs/2302.01233v3 Sparse High-Dimensional Vector Autoregressive Bootstrap 2026-06-07T14:52:43Z

We introduce a high-dimensional multiplier bootstrap for time series data based on capturing dependence through a sparsely estimated vector autoregressive model. We prove its consistency for inference on high-dimensional means under two different moment assumptions on the errors, namely sub-gaussian moments and a finite number of absolute moments. In establishing these results, we derive a Gaussian approximation for the maximum mean of a linear process, which may be of independent interest.

2023-02-02T17:14:54Z Robert Adamek Stephan Smeekes Ines Wilms http://arxiv.org/abs/2606.08642v1 A Practical Framework for Sensitivity Analysis in Externally Controlled Trials: An Illustration with a Bayesian Hybrid Evidence Synthesis Case Study 2026-06-07T14:14:50Z

Externally controlled trials (ECTs), including single-arm studies augmented with historical data and hybrid randomized designs with partial external augmentation, are increasingly used when concurrent randomized controls are infeasible or unethical. Regulatory guidance from the FDA, EMA, and NMPA calls for sensitivity analysis of borrowing assumptions, yet provides no structured template for which analyses to run or how to interpret them together. We propose a three-pillar framework organized around three questions: was the borrowing appropriate, did it contribute meaningful value, and are the conclusions robust to perturbation? The framework comprises eight modular analyses covering heterogeneity diagnostics, source influence, no-borrowing references, effective sample size, prior sensitivity, tipping points, alternative borrowing methods, and structural model sensitivity. It is method-agnostic and applies to both Bayesian and frequentist borrowing in patient-level or hybrid settings. We illustrate the framework using simulated data that mimic a hybrid evidence synthesis from a historical approval of an ethnic-bridging submission under a real-world-evidence regulatory pathway. That original analysis combined individual patient data from a global pivotal study and a regional real-world study with aggregate data from two published cohorts, fitted via a Bayesian longitudinal model with ethnic-difference parameters. The worked example provides a reproducible template for sensitivity analysis in ECT submissions.

2026-06-07T14:14:50Z Xuemin Gu Kitty Guo Jane Zhang http://arxiv.org/abs/2605.16866v3 Heavy Tails and Predictive Ability Testing 2026-06-07T13:11:45Z

We study the asymptotic behaviour of widely used tests for evaluating and comparing predictive accuracy when forecast errors exhibit heavy tails. In particular, when loss differentials have infinite variance, the Diebold-Mariano test statistic converges to a nonstandard limit involving non-Gaussian stable random variables. As a consequence, conventional critical values can yield severely distorted inference: a nominal 5$\%$ test may reject a true null as often as 70$\%$ of the time. To establish these results, we develop a new stable limit theorem for strongly mixing, infinite-variance time series processes. Building on this theory, we consider sub-sampling-based inference that remains valid irrespective of tail-heaviness and requires no estimation of long-run variances or tail indices. An application to risk forecasts for emerging-market exchange rates shows that accounting for heavy tails can substantially alter conclusions about predictive performance relative to standard procedures.
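
As background for the abstract above, the following is a minimal sketch of the textbook Diebold-Mariano statistic for one-step-ahead forecasts under squared-error loss. It is a simplification made for illustration: the long-run variance is replaced by the plain sample variance of the loss differential (no autocovariance terms), and the function name and the example errors are hypothetical. The sub-sampling procedure proposed in the paper is not implemented here.

```python
import math


def dm_statistic(errors_a, errors_b):
    """Diebold-Mariano statistic for one-step forecasts, squared-error loss.

    Uses the sample variance of the loss differential without any
    autocovariance correction (an assumption suitable only for h = 1).
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    return mean_d / math.sqrt(var_d / n)


# Hypothetical forecast errors from two competing models.
errors_a = [1.0, 2.0, 1.0, 2.0]
errors_b = [0.0, 0.0, 0.0, 0.0]
stat = dm_statistic(errors_a, errors_b)
```

In standard practice this statistic is compared with standard normal critical values. The abstract's point is that this comparison can be badly distorted when the loss differential has infinite variance, since the sample variance in the denominator no longer stabilizes.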

2026-05-16T07:58:02Z 72 pages, 3 figures. Application in Econometrics Jonas F. Frederiksen Muneya Matsui Rasmus S. Pedersen http://arxiv.org/abs/2602.05553v2 Sensitivity analysis for contamination in egocentric-network randomized trials with interference 2026-06-07T10:51:15Z

Egocentric-Network Randomized Trials (ENRTs) are increasingly used to estimate causal effects under interference when measuring complete sociocentric network data is infeasible. ENRTs rely on egocentric network sampling, where a set of egos is first sampled, and each ego recruits a subset of its neighbors as alters. Treatments are then randomized across egos. While the observed ego-networks are disjoint by design, the underlying population network may contain edges connecting them, leading to contamination. Under a design-based framework, we show that the Horvitz-Thompson estimators of direct and indirect effects are biased whenever contamination is present. To address this, we derive bias-corrected estimators and propose a novel sensitivity analysis framework based on sensitivity parameters representing the probability or expected number of missing edges. This framework is implemented via both grid sensitivity analysis and probabilistic bias analysis, providing researchers with a flexible tool to assess the robustness of the causal estimators to contamination. We apply our methodology to the HIV Prevention Trials Network 037 study, finding that ignoring contamination may lead to underestimation of indirect effects and overestimation of direct effects.

2026-02-05T11:23:23Z Bar Weinstein Daniel Nevo http://arxiv.org/abs/2606.08560v1 CP-factorization for high dimensional tensor time series and double projection iterations 2026-06-07T10:34:03Z

We adopt the canonical polyadic (CP) decomposition to model high-dimensional tensor time series. Our primary goal is to identify and estimate the factor loadings in the CP decomposition. We propose a one-pass estimation procedure through standard eigen-analysis for a matrix constructed based on the serial dependence structure of the data. The asymptotic properties of the proposed estimator are established under a general setting as long as the factor loading vectors are linearly independent, allowing the factors to be correlated and the factor loading vectors to be not nearly orthogonal. The procedure adapts to the sparsity of the factor loading vectors, accommodates weak factors, and demonstrates strong performance across a wide range of scenarios. To further reduce estimation errors, we also introduce an iterative algorithm based on a novel double projection approach. We theoretically justify the improved convergence rate of the iterative estimator, and derive the associated limiting distribution. A consistent estimator of the asymptotic variance is also provided, which plays a key role in the related inference problems. All results are validated through extensive simulations and two real data applications.

2026-06-07T10:34:03Z Jinyuan Chang Guanglin Huang Qiwei Yao Long Yu http://arxiv.org/abs/2606.08551v1 Enhanced localized conformal prediction with imperfect auxiliary information 2026-06-07T10:12:30Z

There is growing interest in constructing conformal prediction sets that provide approximate or asymptotic conditional coverage guarantees, capturing local data heterogeneity. However, methods like localized conformal prediction (LCP) may face challenges in ensuring reliable prediction sets in regions with sparse calibration data. This paper introduces Enhanced Localized Conformal Prediction (ELCP), a novel approach that incorporates auxiliary data to refine localized prediction sets while preserving finite-sample marginal coverage guarantees. By utilizing a density-ratio-weighted kernel estimator, ELCP seamlessly integrates auxiliary and calibration data, accommodating potential distributional shifts and improving the local reliability of prediction sets. Theoretical analysis confirms that ELCP maintains marginal coverage and enhances asymptotic test-conditional coverage. Simulation results demonstrate its superior local coverage and smaller prediction sets compared to standard LCP, highlighting its effectiveness in settings with limited calibration data but available auxiliary information from related tasks.
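
For context on the abstract above, here is a minimal sketch of plain split conformal prediction, the unweighted baseline that localized variants refine. It is not LCP or ELCP: there is no kernel localization, no density-ratio weighting, and no auxiliary data. The function name, the calibration residuals, and the point prediction are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that index exceeds n, the prediction set must be unbounded,
    which is signalled by returning infinity.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


# Hypothetical absolute residuals |y - prediction| on a calibration set.
calibration_residuals = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7, 0.6, 0.8]
q = conformal_quantile(calibration_residuals, alpha=0.25)

# Prediction interval around a hypothetical point prediction of 2.0.
prediction = 2.0
interval = (prediction - q, prediction + q)
```

The same quantile is used everywhere in feature space, which is why the baseline only guarantees marginal coverage. With few calibration points the index can exceed `n`, giving an infinite interval, a small-sample difficulty that is even more acute for localized methods in sparse regions.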

2026-06-07T10:12:30Z Yinjie Min Liuhua Peng Changliang Zou http://arxiv.org/abs/2606.08499v1 A Transferability Criterion for Null-Optimized Variance Reduction in Cumulant-Based Error-Independence Testing 2026-06-07T07:56:14Z

Control-variate and polynomial-maximization (PMM) estimators are optimized at a single fixed distribution, yet they are increasingly proposed to strengthen hypothesis tests, which decide between two regions of a parameter family. We give a closed-form criterion for when this transfer succeeds. For an $H_0$-centered augmentation of a target moment statistic with null-optimized weight vector $K_0$, the alternative-side expectation equals the target plus $K_0^\top \mu_{a,H_1}$, where $\mu_{a,H_1}$ is the alternative-side mean of the augmenting basis. Null-variance reduction therefore transfers without bias only under the orthogonality condition $K_0^\top \mu_{a,H_1} = 0$; requiring each augmenting function to remain mean-zero is sufficient but not necessary. We instantiate the criterion on the recently proposed Wiedermann-Shi third-order cumulant test for measurement-error independence. A second-order PMM correction is unbiased and lower-variance under the null (relative efficiency $\geq 1$ in all 36 conditions; aggregated mean ARE values 1.23-5.16; Type-I 0.04-0.09), yet provably inconsistent under the alternative: the antisymmetric polynomial auxiliaries acquire nonzero means, attenuating the target by a closed-form factor and costing 7-52 percentage points of power, worst where the test is strongest and worsening under heavy tails. A fourth-order variant reduces variance (ratio 1.127) but fails a nuisance guard (rejection 0.295 versus 0.10). We derive a reusable alternative-consistency acceptance gate for variance-reduced test statistics.

2026-06-07T07:56:14Z 16 pages; no figures; submitted manuscript version Serhii Zabolotnii http://arxiv.org/abs/2606.08498v1 Tests for Independence of High-Dimensional Nonstationary Time Series 2026-06-07T07:54:53Z

This manuscript studies the problem of independence testing between two high-dimensional time series without assuming weak stationarity, that is, allowing their autocovariances to vary over time. To this end, we propose a bimodal weighted-average test statistic that removes the bias induced by temporal dependence under the null hypothesis, thereby avoiding the need to whiten the time series prior to hypothesis testing -- a procedure that is challenging in high-dimensional and nonstationary settings. To facilitate statistical inference, we develop a dependent wild bootstrap procedure. On the theoretical side, we derive a concentration inequality for quadratic forms of time series data stemming from a class of high-dimensional, nonlinear, and nonstationary processes. This result enables us to derive the asymptotic null distribution of the proposed test statistic and to establish the validity of the bootstrap algorithm. Numerical results show that the proposed test attains desired size and good power performance even when the dimension exceeds the sample size or when the data-generating process exhibits time-varying autocovariances. In contrast, tests based on whitening time series fail to maintain correct size in the presence of unstable autocovariance structures. Since nonstationary autocovariances commonly arise in real-life time series data, our work offers a robust procedure for independence testing.

2026-06-07T07:54:53Z Yunyi Zhang http://arxiv.org/abs/2606.08475v1 Parameter uncertainty in dynamical models: a practical identifiability index 2026-06-07T06:44:39Z

Ordinary differential equation models are widely used to understand and forecast complex dynamical systems, but their predictive value depends on reliable parameter estimation. Structural identifiability assesses whether parameters can be uniquely recovered from ideal observations, whereas practical identifiability depends on finite, noisy and partially observed data. We introduce the Practical Identifiability Index (PII), a marginal uncertainty-width metric based on the logarithmic span of confidence intervals. Expressed on an order-of-magnitude scale, the PII summarises how tightly individual positive-valued parameters are constrained by available observations, enabling comparison across parameters, models, error structures and observation designs. The PII is intended as a complementary diagnostic, not a standalone identifiability test, and should be interpreted alongside coverage, profile likelihoods, posterior summaries, sensitivity analysis or structural identifiability results. Using parametric bootstrap experiments across growth and compartmental epidemic models, we identify consistent principles: uncertainty decreases as calibration windows become more informative, increases with observation noise and parameter coupling, and remains high for latent or indirectly observed processes. Parameters governing early observable dynamics become constrained sooner, while additional observables can improve constraint for latent progression and recovery parameters. The PII provides a simple, reportable summary of marginal parameter uncertainty for dynamical modelling.
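
One way to read "the logarithmic span of confidence intervals" on an order-of-magnitude scale is the base-10 logarithm of the ratio between the upper and lower confidence limits of a positive parameter. The sketch below computes that quantity. This reading is an assumption: the exact definition of the PII is not given in the abstract above and may include further details, so the function is deliberately named `log10_ci_span` rather than after the index itself, and the example limits are invented.

```python
import math


def log10_ci_span(lower, upper):
    """Width of a confidence interval in orders of magnitude.

    Assumed form: log10(upper) - log10(lower), defined only for
    positive-valued parameters with lower <= upper.
    """
    if not (0 < lower <= upper):
        raise ValueError("need 0 < lower <= upper")
    return math.log10(upper) - math.log10(lower)


# Hypothetical 95% intervals for two positive rate parameters.
well_constrained = log10_ci_span(0.18, 0.22)    # well under one order of magnitude
poorly_constrained = log10_ci_span(0.01, 10.0)  # about three orders of magnitude
```

A log-scale width is unit-free, so intervals for parameters of very different magnitudes can be compared directly, which matches the cross-parameter and cross-model comparisons the abstract describes.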

2026-06-07T06:44:39Z Hamed Karami Alexandra Smirnova Sunmi Lee Gerardo Chowell