http://arxiv.org/abs/2602.16733v3 Scaling Reproducibility: An AI-Assisted Workflow for Large-Scale Replication and Reanalysis 2026-06-01T04:54:37Z

Computational reproducibility is central to scientific credibility, yet verifying published results at scale remains costly. We develop an AI-assisted workflow for automated full-paper replication -- retrieving materials, reconstructing environments, executing code, and matching outputs to point estimates reported in regression tables. We define a universe of all empirical and quantitative papers from the three top political science journals (2010--2025) and measure stated data availability using automated extraction. For a stratified sample of 384 studies, we apply the workflow to conduct full-paper replication, totaling 3,523 empirical models. We find that journal verification requirements, combined with data archiving mandates, drive reproducibility: the share of fully or largely reproducible papers rises from 20.8% before DA-RT adoption to 82.5% after, and conditional on accessible replication packages, 92.1% of papers are fully or largely reproducible (234/254). As a secondary application, we apply standardized IV diagnostics to 84 studies (597 IV specifications among 1,910 replicated models), illustrating how automated execution enables systematic reanalysis across heterogeneous empirical settings.

2026-02-17T20:32:04Z Yiqing Xu Leo Yang Yang http://arxiv.org/abs/2507.14464v2 Exact conditional goodness-of-fit tests for the mixed membership stochastic block model 2026-06-01T04:35:00Z

We propose exact conditional goodness-of-fit tests for directed mixed membership stochastic block models. Given dyad-level sender and receiver roles, the block-pair edge totals are sufficient for the block probability matrix; conditioning on these totals gives a nuisance-free uniform law on a finite fiber. This yields finite-sample randomization tests for residual sender and receiver heterogeneity, reciprocity, and directed transitive closure. The procedure uses an independent fiber sampler, Monte Carlo rank $p$-values, and can be applied after drawing latent block-pair assignments from the posterior distribution. Simulations and the Sampson monastery network show that the tests are calibrated under the null and diagnostically useful for directed model misspecification.
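
The Monte Carlo rank $p$-value mentioned in this abstract can be illustrated with a minimal sketch. It assumes the test statistic has already been evaluated on the observed network and on B independent draws from the conditional null; the fiber sampler itself is not reproduced here. The function name and the toy Gaussian null draws are hypothetical.

```python
import random


def monte_carlo_rank_p_value(observed, null_draws):
    """Rank p-value (1 + #{null >= observed}) / (B + 1).

    Counting the observed statistic among the draws keeps the p-value
    strictly positive and valid for any finite number of draws B.
    """
    exceed = sum(1 for t in null_draws if t >= observed)
    return (1 + exceed) / (len(null_draws) + 1)


# Toy stand-in for statistics computed on sampled fiber elements (assumption).
rng = random.Random(0)
null_draws = [rng.gauss(0.0, 1.0) for _ in range(999)]

p_extreme = monte_carlo_rank_p_value(10.0, null_draws)   # far in the upper tail
p_typical = monte_carlo_rank_p_value(-10.0, null_draws)  # below every null draw
```

With B = 999 draws the smallest attainable p-value is 1/1000, which is why the number of draws bounds the resolution of the test.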

2025-07-19T03:45:42Z Sourav Majumdar http://arxiv.org/abs/2606.01674v1 Higher-Order Efficient Estimators: A Review and Simulation-Based Benchmark Study 2026-06-01T04:30:19Z

Higher-order efficient estimators extend standard first-order semiparametric estimators by replacing second-order residuals with third- or higher-order terms, potentially enabling asymptotic efficiency under slower nuisance function convergence rates and improving finite-sample performance. Existing methods achieve higher-order expansions through structurally different approximation strategies, including basis truncation, kernel smoothing, and highly adaptive lasso (HAL) representations, making direct theoretical and practical comparison difficult. In this manuscript, we provide a focused review and a simulation-based empirical benchmark for second-order efficient estimators, using treatment-specific mean estimation as a canonical causal inference and missing data problem. We compare how higher-order influence function (HOIF) estimators, kernel-based higher-order targeted minimum loss-based estimator (HOTMLE), and HAL-based HOTMLE construct higher-order expansions and the approximation or regularization burdens they introduce. The asymptotic and numerical study evaluates first-order and empirical second-order estimators under controlled nuisance errors with constant or increasing sectional variation complexity. Results show that higher-order debiasing can substantially reduce first-order estimation bias; however, gains depend strongly on stability of the approximation or regularization required for higher-order correction. Empirical HAL-based HOTMLE shows relatively stable performance, while empirical HOIF remains sensitive to basis truncation and tuning choices. Overall, this manuscript clarifies when higher-order asymptotic improvements are attained in theory, when they may be practically visible, and when implementation instability may offset theoretical advantages.

2026-06-01T04:30:19Z Zeyi Wang Mark J. van der Laan http://arxiv.org/abs/2606.01669v1 Beyond principal ignorability: Nonparametric sensitivity bounds for principal stratification 2026-06-01T04:24:42Z

Principal stratification is an effective framework for addressing intermediate variables in causal inference. However, point identification of the principal causal effects (PCEs) often requires the untestable principal ignorability (PI) assumption. This article develops a nonparametric sensitivity analysis framework for evaluating PI violations. We introduce a margin-free bounding factor parameterized by the selection and outcome relative risks of an unmeasured confounder. Using this bounding factor, we derive sharp nonparametric bounds for each PCE. We prove that these bounds nest within the worst-case nonparametric bounds with and without the monotonicity assumption. We then discuss Cornfield-type conditions and principal E-values that quantify the minimum joint magnitude of unmeasured confounding required to nullify the target PCE. Furthermore, we generalize this methodology to principal generalized causal effects, extending the sensitivity bounds and falsification thresholds to the recent pairwise comparison estimands evaluated over a product space.

2026-06-01T04:24:42Z Xinyuan Chen Michael O. Harhay Fan Li http://arxiv.org/abs/2606.01661v1 Feature leakage and the identifiability of direct-dependency entropy models of neural activity 2026-06-01T04:15:49Z

Biological neurons receive thousands of synaptic inputs on branching, electrically excitable dendrites, yet population activity is often modeled with direct input-output rules in which each input contributes independently to a scalar drive. We study what successful prediction by such models does, and does not, reveal about neural computation. For conditional maximum-entropy models that match output rates and pairwise output-input coactivities, the entropy explained by a direct model is a prediction measure under the sampled input distribution, not a mechanism-identification test. A restricted MaxEnt fit is an information projection: omitted interaction, temporal, or hidden-state terms can be absorbed into fitted first-order parameters whenever they are correlated with the included sufficient statistics. For sparse correlated binary inputs, this absorption has an explicit coskewness form. We introduce diagnostics that separate in-distribution prediction from recovery of the response rule: state reweighting that holds P(y|x) fixed while changing P(x), conditional log-odds contrasts for local additivity, and temporal leakage controls. In ground-truth simulations, purely higher-order responses can pass first-order entropy and raw coactivity tests under leakage-prone sampling, but are correctly classified after reweighting. Applied to selected, leakage-enriched local tables from CA1 hippocampal recordings, approximately half of tables that appear first-order under empirical weights become distribution-sensitive under balanced reweighting, far above a matched additive-surrogate null. Thus direct entropy-explained fractions and raw coactivity predictions should be interpreted as predictions under the observed state distribution, not as evidence that mechanisms outside the direct model are absent or small.

2026-06-01T04:15:49Z Houman Safaai Bernardo L. Sabatini http://arxiv.org/abs/2606.01659v1 Data-Automated Policy Learning for Nonlinear Welfare 2026-06-01T04:13:49Z

This paper explores policy learning from observational data, focusing on a nonlinear welfare criterion in a binary treatment setting. The nonlinear criterion is inspired by scenarios where policymakers prioritize specific population segments. We model this criterion using a utility function that encompasses potential outcomes and intermediate parameters, with the latter capturing higher moments of the outcome distributions. When formulated in the context of observational data, both the intermediate parameters and the welfare criterion depend on the propensity score, which we estimate using machine-learning techniques. To address bias in machine learning estimates, we introduce a novel reweighting-based debiasing approach that offers a promising alternative to traditional orthogonality-based methods. To tackle the complexities of infinite-dimensional policy spaces, we employ sieve approximations and $K$-fold cross-validation for model selection, thereby fully automating the policy-learning process. Despite these complexities, we demonstrate that both the welfare regret and the average welfare regret of our proposed policy learning method satisfy an oracle inequality, thereby providing theoretical guarantees on the performance of the estimated policy relative to the best possible policy. This finding extends the existing results from linear to nonlinear welfare criteria, from finite-dimensional to infinite-dimensional policy spaces, and from a known propensity score to a machine-learned one.

2026-06-01T04:13:49Z Chunrong Ai Zeqi Wu Zheng Zhang http://arxiv.org/abs/2606.01650v1 Post Selection Estimation of Sharpe Ratios 2026-06-01T03:58:35Z

We consider the problem of estimating the true Sharpe ratio of an asset selected for having the highest observed in-sample Sharpe ratio among many assets. We discuss estimators based on the polyhedral lemma, James-Stein shrinkage, debiasing the expected maximum Sharpe ratio, thresholding, and empirical Bayes. We test these estimators in simulations, computing bias and root mean square error across different values of sample size, number of assets, and spread and shape of population Sharpe ratios. We also compute the rank correlation of the estimators against the underlying quantity, simulating how these estimators might be used to compare or rank the output of different teams that perform this selection process. We find that the James-Stein estimator provides the best performance across many different realistic values of the relevant parameters, followed by the GMLEB estimator of Jiang and Zhang. These results are fairly robust to correlation of asset returns, with some caveats.
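
To make the shrinkage idea concrete, here is a minimal sketch of a positive-part James-Stein estimator that shrinks observed Sharpe ratios toward their grand mean. It is not taken from the paper: the common sampling variance of 1/n_obs is a crude assumption for small per-period Sharpe ratios, and the function name and toy values are hypothetical.

```python
def james_stein_shrink(estimates, variance):
    """Positive-part James-Stein shrinkage toward the grand mean.

    Assumes every estimate has the same known sampling variance and that
    there are at least four estimates (the factor uses k - 3).
    """
    k = len(estimates)
    grand_mean = sum(estimates) / k
    spread = sum((x - grand_mean) ** 2 for x in estimates)
    if spread == 0.0:
        return list(estimates)  # nothing to shrink
    factor = max(0.0, 1.0 - (k - 3) * variance / spread)
    return [grand_mean + factor * (x - grand_mean) for x in estimates]


# Toy in-sample daily Sharpe ratios for five assets (hypothetical numbers).
sharpes = [0.05, 0.12, -0.02, 0.20, 0.08]
n_obs = 250
variance = 1.0 / n_obs  # assumption: Var(SR_hat) is roughly 1 / n_obs

shrunk = james_stein_shrink(sharpes, variance)
best = max(range(len(sharpes)), key=lambda i: sharpes[i])
grand_mean = sum(sharpes) / len(sharpes)
selected_estimate = shrunk[best]  # pulled from the observed maximum toward the mean
```

The selected asset's estimate moves toward the grand mean, which is the direction needed to offset the upward bias that comes from picking the in-sample maximum.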

2026-06-01T03:58:35Z Steven E. Pav http://arxiv.org/abs/2605.13397v2 Stabilised weighted data subsampling for accelerated inference in models with recursive likelihoods 2026-06-01T03:19:33Z

Inference for models with recursively defined likelihoods is computationally demanding, limiting scalability to large datasets. We propose a stabilised weighted subsampling methodology for accelerated inference based on an unbiased estimator of the log-likelihood. By assigning higher sampling probabilities to early observations, the method reduces the effective depth of recursive likelihood evaluations and hence computational cost. However, sampling probabilities that decay too slowly yield limited savings, while overly aggressive decay can substantially inflate estimator variance. We develop a stabilisation framework, supported by theory, that restricts the decay to avoid both computational and variance pathologies through principled hyperparameter tuning. We also derive an unbiased subsampling estimator of the log-likelihood gradient, enabling gradient-based inference. The methodology can be embedded within a range of inferential frameworks. We illustrate its use in variational Bayes and subsampling Markov chain Monte Carlo for conditional volatility models, including leverage effects. Empirical results show substantial computational speed-ups relative to full-data methods while maintaining inferential accuracy. We also compare with recent stochastic gradient MCMC and divide-and-conquer MCMC methods for temporally dependent data, observing favourable empirical performance.
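
The core of weighted subsampling can be sketched with a generic importance-weighted estimator of a sum of log-likelihood terms. This is an illustration under stated assumptions, not the authors' estimator: the terms are treated as already available (the recursion that makes late terms expensive is not modelled), and the geometric decay schedule and all names are hypothetical.

```python
import random


def decaying_probs(n, decay):
    """Sampling probabilities proportional to decay**t, favouring early t."""
    weights = [decay ** t for t in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def subsampled_sum(terms, probs, m, rng):
    """Unbiased estimate of sum(terms) from m draws with replacement.

    Each sampled term is divided by its sampling probability, so the
    expectation of one draw is sum_t probs[t] * terms[t] / probs[t].
    """
    idx = rng.choices(range(len(terms)), weights=probs, k=m)
    return sum(terms[i] / probs[i] for i in idx) / m


rng = random.Random(42)
probs = decaying_probs(5, 0.5)

# Sanity case: when terms are proportional to probs, every draw returns
# the same ratio, so the estimate equals the true sum for any sample.
terms = [-3.0 * p for p in probs]
estimate = subsampled_sum(terms, probs, m=10, rng=rng)
```

The trade-off described in the abstract shows up in the ratio terms[i] / probs[i]: a faster decay makes late observations rare but multiplies them by a large weight, which inflates the variance.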

2026-05-13T11:53:57Z Version 2: Revised and shortened for journal submission. Some technical material has been moved from the main paper to the appendix and supplementary material. Minor improvements to exposition and presentation. No substantive changes to the methodology, theoretical results, or conclusions. This version includes the main manuscript, appendix, and supplementary material in a single file. Matias Quiroz Aishwarya Bhaskaran Zixuan Wang Thomas Goodwin http://arxiv.org/abs/2511.07438v3 Two Datasets Are Better Than One: Method of Double Moments for 3-D Reconstruction in Cryo-EM 2026-06-01T03:12:13Z

Cryo-electron microscopy (cryo-EM) is a powerful imaging technique for reconstructing three-dimensional molecular structures from noisy tomographic projection images of randomly oriented particles. We introduce a new data fusion framework, termed the method of double moments (MoDM), which reconstructs molecular structures from two instances of the second-order moment of projection images obtained under distinct orientation distributions: one uniform, the other non-uniform and unknown. We prove that these moments generically uniquely determine the underlying structure, up to a global rotation and reflection, and we develop a convex-relaxation-based algorithm that achieves accurate recovery using only second-order statistics. Our results demonstrate the advantage of collecting and modeling multiple datasets under different experimental conditions, illustrating that leveraging dataset diversity can substantially enhance reconstruction quality in computational imaging tasks.

2025-11-02T20:10:34Z Joe Kileel Oscar Mickelin Amit Singer Sheng Xu http://arxiv.org/abs/2509.05563v2 Geometry-preserving and interpretable dimension reduction for compositional data 2026-06-01T01:59:02Z

High-dimensional compositional data pose unique statistical challenges due to the simplex constraint and excess zeros. While dimension reduction is indispensable for analyzing such data, conventional approaches often rely on log-ratio transformations that compromise interpretability and distort the data through ad hoc zero replacements. To address these issues, we introduce a geometry-preserving framework for dimension reduction of compositional data, mapping high-dimensional compositions directly to a lower-dimensional simplex. This framework is interpretable as a softened amalgamation of compositions and enables dual visualization -- showing both projected data and how variables contribute to reduced components -- for at-a-glance interpretation. Within this geometry, we define a new sufficient dimension reduction (SDR) approach for compositional predictors, whose identifiable object, termed the central compositional subspace, differs from the classical central subspace in Euclidean SDR. For estimation, we propose a kernel-based method that yields sparse solutions and comes with an intrinsic predictive model for direct downstream analyses. We prove consistency through a new subspace-comparison argument that allows the estimated and target subspaces to have different dimensions. Applications to real microbiome datasets demonstrate that our approach provides a powerful graphical exploration tool for uncovering meaningful biological patterns in high-dimensional compositional data.

2025-09-06T02:16:21Z 61 pages, 4 figures Junyoung Park Cheolwoo Park Jeongyoun Ahn http://arxiv.org/abs/2606.01553v1 Structural Change Detection in High-Dimensional Transformed Factor Models via Canonical Correlation Analysis 2026-06-01T01:57:58Z

This paper develops a canonical-correlation-based method for detecting structural changes in high-dimensional transformed factor models. The proposed approach exploits the low-rank canonical-correlation structure induced by dynamically dependent common factors, while serially uncorrelated idiosyncratic components correspond to a noise subspace with zero canonical correlations. We construct an eigenvalue-ratio criterion that measures residual dynamic dependence in the estimated noise subspace and identifies the true change point under sufficient separation of the regime-specific loading spaces or dynamic canonical correlation structures. Since the change-point location and the regime-specific factor numbers are both unknown, we further propose an alternating iterative estimation procedure that updates them sequentially until convergence. Under suitable mixing and moment conditions, we establish asymptotic properties of the proposed estimators, with convergence rates depending explicitly on factor strength, cross-sectional dimension, and sample size. Monte Carlo experiments and empirical applications to intraday stock returns and U.S. temperature series demonstrate the finite-sample

2026-06-01T01:57:58Z 35 pages Lei Jia Shouri Hu Zhaoxing Gao http://arxiv.org/abs/2606.01539v1 Scalable Counterfactual Risk Estimation for Rare Events in Longitudinal Data 2026-06-01T01:41:16Z

Estimating the causal effect of time-varying treatments on survival outcomes in large observational studies is computationally demanding, particularly when outcomes are rare. While g-formula-based methods such as the iterative conditional expectation (ICE) estimator provide a principled framework for longitudinal causal inference, they become computationally expensive, especially when bootstrap-based variance estimation is required. In addition, outcome rarity at each time point induces severe class imbalance, leading to instability and convergence issues in logistic regression and related models. To address these challenges, we propose a principled subsampling and reweighting strategy for longitudinal survival data that can be applied to a range of existing causal effect estimators in this setting, including the ICE estimator. The proposed method substantially reduces computational burden while preserving consistency and improving estimation stability in rare-outcome settings. We evaluate the method through simulations and validate it using a large-scale EHR cohort study on social and behavioral determinants of health (SBDH) and suicide risk, demonstrating its effectiveness for modeling rare outcomes in longitudinal data.

2026-06-01T01:41:16Z Accepted at KDD-2026, 12 pages Xiaohui Yin Avijit Mitra Ying Zhou Kun Chen Hong Yu http://arxiv.org/abs/2403.07008v3 AutoEval Done Right: Using Synthetic Data for Model Evaluation 2026-06-01T01:28:08Z

The evaluation of machine learning models using human-labeled validation data can be expensive and time-consuming. AI-labeled synthetic data can be used to decrease the number of human annotations required for this purpose in a process called autoevaluation. We suggest efficient and statistically principled algorithms for this purpose that improve sample efficiency while remaining unbiased. These algorithms increase the effective human-labeled sample size by up to 50% on experiments with GPT-4.
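
One standard way to combine a few human labels with many AI labels without introducing bias is a difference (prediction-powered style) estimator. The sketch below shows that idea for a mean, such as an accuracy. Whether it matches the paper's algorithms is an assumption, and the function name and toy labels are hypothetical.

```python
def debiased_mean(human_labels, synth_on_labeled, synth_on_unlabeled):
    """Mean of synthetic labels on the large pool, plus a correction.

    The correction is the average gap between human and synthetic labels
    on the small human-labeled set. If both sets come from the same
    distribution, the synthetic-label bias cancels in expectation.
    """
    n = len(human_labels)
    correction = sum(h - s for h, s in zip(human_labels, synth_on_labeled)) / n
    return sum(synth_on_unlabeled) / len(synth_on_unlabeled) + correction


# Toy data: 1 means "model output judged correct" (hypothetical values).
human = [1, 0, 1, 1, 0, 1]           # human judgments on 6 examples
synth_labeled = [1, 1, 1, 1, 0, 1]   # AI judgments on the same 6 examples
synth_unlabeled = [1] * 8 + [0] * 2  # AI judgments on 10 further examples

estimate = debiased_mean(human, synth_labeled, synth_unlabeled)
```

When the AI labels agree closely with the human labels, the correction term has low variance, and this is the source of the larger effective sample size.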

2024-03-09T02:47:11Z camera-ready paper version Pierre Boyeau Anastasios N. Angelopoulos Nir Yosef Jitendra Malik Michael I. Jordan http://arxiv.org/abs/2603.22215v2 Multiview Graph Fusion with Covariates 2026-06-01T01:25:56Z

Joint modeling of multiview graphs with a common set of nodes between views and auxiliary predictors is an essential, yet less explored, area in statistical methodology. Traditional approaches often treat graphs in different views as independent or fail to adequately incorporate predictors, potentially missing complex dependencies within and across graph views and leading to reduced inferential accuracy. Motivated by such methodological shortcomings, we introduce an integrative Bayesian approach for joint learning of a multiview graph with vector-valued predictors. Our modeling framework assumes a common set of nodes for each graph view while allowing for diverse interconnections or edge weights between nodes across graph views, accommodating both binary and continuous-valued edge weights. By adopting a hierarchical Bayesian modeling approach, our framework seamlessly integrates information from diverse graphs through carefully designed prior distributions on model parameters. This approach enables the estimation of crucial model parameters defining the relationship between these graph views and predictors, and it supports predictive inference of the graph views. Crucially, the approach provides uncertainty quantification in all such inferences. Theoretical analysis establishes that the posterior predictive density for our model asymptotically converges to the true data-generating density, under mild assumptions on the true data-generating density and the growth of the number of graph nodes relative to the sample size. Simulation studies validate the inferential advantages of our approach over predictor-dependent tensor learning and independent learning of different graph views with predictors. We further illustrate model utility by analyzing functional connectivity graphs in neuroscience under cognitive control tasks, relating task-related brain connectivity with phenotypic measures.

2026-03-23T17:12:48Z 46 pages Sharmistha Guha Jose Rodriguez-Acosta Ivo Dinov http://arxiv.org/abs/2606.01465v1 Comb Test: Histogram Uniformity Testing Based on Discrete Total Variation 2026-05-31T21:52:59Z

Histogram uniformity testing is a common statistical task usually performed using Pearson's chi-square test. This paper proposes a new test based on the discrete total variation that is easy to compute and, for comb-like (alternating) deviations, achieves up to 67% higher statistical power than Pearson's chi-square test, making it a complement to standard tests. The exact null distribution is computed via dynamic programming, and a gamma approximation with Monte Carlo estimation extends the test to arbitrarily large sample sizes. Experiments on simulated ADC alternating differential nonlinearity and on rounding bias detection in scientific data confirm the claims. The Python source code and precomputed data are available at https://github.com/DiscreteTotalVariation/CombTest.
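
A rough sketch of the idea follows. It assumes the statistic is the sum of absolute differences between adjacent bin counts and uses a plain Monte Carlo null instead of the exact dynamic-programming distribution. The paper's exact normalization is not stated in the abstract, and the function names here are hypothetical and are not the API of the linked repository.

```python
import random


def discrete_total_variation(counts):
    """Sum of absolute differences between adjacent histogram bins."""
    return sum(abs(b - a) for a, b in zip(counts, counts[1:]))


def simulate_uniform_histogram(n_samples, n_bins, rng):
    """Histogram of n_samples draws that are uniform over n_bins bins."""
    counts = [0] * n_bins
    for _ in range(n_samples):
        counts[rng.randrange(n_bins)] += 1
    return counts


def monte_carlo_p_value(counts, n_sims, rng):
    """Upper-tail rank p-value of the observed statistic under uniformity."""
    observed = discrete_total_variation(counts)
    n, k = sum(counts), len(counts)
    exceed = sum(
        discrete_total_variation(simulate_uniform_histogram(n, k, rng)) >= observed
        for _ in range(n_sims)
    )
    return (1 + exceed) / (n_sims + 1)


comb = [130, 70] * 4  # alternating ("comb-like") deviation, 800 samples
flat = [100] * 8      # perfectly even histogram, 800 samples

p_comb = monte_carlo_p_value(comb, 199, random.Random(0))
p_flat = monte_carlo_p_value(flat, 199, random.Random(0))
```

An alternating pattern makes every adjacent difference large at once, so the statistic responds strongly to the comb-like deviations the abstract highlights.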

2026-05-31T21:52:59Z 5 pages, 5 figures Nikola Banić Neven Elezović