https://arxiv.org/api/abpy5vk2IC39AVav+e5xtCwizz4 2026-06-10T22:05:32Z 36124 345 15 http://arxiv.org/abs/2509.23544v2 End-to-End Deep Learning for Predicting Metric Space-Valued Outputs 2026-05-30T18:23:59Z

Many modern applications involve predicting structured, non-Euclidean outputs such as probability distributions, networks, and symmetric positive-definite matrices. These outputs are naturally modeled as elements of general metric spaces, where classical regression techniques that rely on vector space structure no longer apply. We introduce E2M (End-to-End Metric regression), a deep learning framework for predicting metric space-valued outputs. E2M performs prediction via weighted Fréchet means over training outputs, where the weights are learned by a neural network conditioned on the input. This construction provides a principled mechanism for geometry-aware prediction that avoids surrogate embeddings and restrictive parametric assumptions, while fully preserving the intrinsic geometry of the output space. We establish theoretical guarantees, including a universal approximation theorem that characterizes the expressive capacity of the model and a convergence analysis of the entropy-regularized training objective. Through extensive simulations involving probability distributions, networks, and symmetric positive-definite matrices, we show that E2M consistently achieves state-of-the-art performance, with its advantages becoming more pronounced at larger sample sizes. Applications to human mortality distributions and New York City taxi networks further demonstrate the flexibility and practical utility of this framework.
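In one special case the weighted Fréchet mean has a closed form, which makes the prediction step easy to see. The sketch below assumes 1D distributions under the 2-Wasserstein metric. There, the weighted Fréchet mean is the weighted average of the quantile functions. It also assumes softmax-normalized weights. The hard-coded scores stand in for a trained network, and `softmax` and `wasserstein_frechet_mean` are hypothetical names rather than the authors' code.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn unnormalized scores into nonnegative weights that sum to one."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def wasserstein_frechet_mean(quantiles: Sequence[Sequence[float]],
                             weights: Sequence[float]) -> List[float]:
    """Weighted Frechet mean of 1D distributions under the 2-Wasserstein metric.

    Each distribution is given by its quantile function on a common grid of
    probability levels; in 1D the weighted Frechet mean is the pointwise weighted
    average of these quantile functions (still nondecreasing, so still valid).
    """
    grid_size = len(quantiles[0])
    return [sum(w * q[j] for w, q in zip(weights, quantiles))
            for j in range(grid_size)]


# Hypothetical training outputs: quantile functions on the grid (0.1, 0.5, 0.9).
train_quantiles = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]
# Stand-in for the network's input-dependent scores for one new input x.
scores_for_x = [0.0, 0.0]
weights = softmax(scores_for_x)
prediction = wasserstein_frechet_mean(train_quantiles, weights)
print(weights, prediction)  # [0.5, 0.5] [1.0, 2.0, 3.0]
```

Because the prediction is always a convex combination of observed outputs, it stays inside the output space without any embedding step.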

2025-09-28T00:46:12Z 38 pages, 4 figures, 9 tables Journal of Machine Learning Research, 27:1--38, 2026 Yidong Zhou Su I Iao Hans-Georg Müller http://arxiv.org/abs/2606.00797v1 Robust inference for risk heterogeneity under group imbalance 2026-05-30T16:35:45Z

Population-level heterogeneity is ubiquitous in biomedical data, where differences across demographic or clinical subgroups can substantially alter risk patterns. For example, in intensive care unit (ICU) studies, the mortality risk associated with specific admission diagnoses can vary across ethnic groups. Existing approaches for detecting risk heterogeneity are often sensitive to baseline model misspecification and regularization bias, both of which commonly arise in practice. In this paper, we propose a robust framework for inferring risk heterogeneity between two populations using Neyman orthogonality, which yields estimators that are locally insensitive to nuisance parameter estimation error. The proposed estimator is consistent and asymptotically normal, and simulation studies demonstrate that in finite samples our method substantially reduces bias and improves inferential stability compared with standard likelihood-based approaches. In an application to the eICU Collaborative Research Database, our method reveals clinically meaningful ethnicity-specific heterogeneity in the association between admission diagnoses and in-hospital mortality that standard likelihood-based methods fail to detect.
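The textbook example of a Neyman-orthogonal estimator may help make "locally insensitive to nuisance parameter estimation error" concrete. That example is the residual-on-residual (partialling-out) estimator in a partially linear model. It is swapped in for illustration and is not the authors' two-population estimator. The simple linear nuisance fits and the toy data are assumptions.

```python
from statistics import fmean
from typing import List, Sequence


def ols_residuals(target: Sequence[float], covariate: Sequence[float]) -> List[float]:
    """Residuals of a simple linear regression of target on covariate (with intercept)."""
    mx, mt = fmean(covariate), fmean(target)
    sxx = sum((x - mx) ** 2 for x in covariate)
    slope = sum((x - mx) * (t - mt) for x, t in zip(covariate, target)) / sxx
    return [t - (mt + slope * (x - mx)) for x, t in zip(covariate, target)]


def partialling_out_estimate(y: Sequence[float], d: Sequence[float],
                             x: Sequence[float]) -> float:
    """Residual-on-residual estimate of theta in y = theta * d + g(x) + noise.

    The score (y_res - theta * d_res) * d_res is Neyman orthogonal: small errors in
    the nuisance fits E[y | x] and E[d | x] affect theta only at second order.
    """
    y_res = ols_residuals(y, x)  # nuisance fit for E[y | x]
    d_res = ols_residuals(d, x)  # nuisance fit for E[d | x]
    return sum(a * b for a, b in zip(y_res, d_res)) / sum(b * b for b in d_res)


# Hypothetical toy data with theta = 2 and a confounding term 3 * x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
e = [1.0, -1.0, 0.0, 2.0, -2.0]
d = [xi + ei for xi, ei in zip(x, e)]
y = [2.0 * di + 3.0 * xi for di, xi in zip(d, x)]
print(round(partialling_out_estimate(y, d, x), 6))  # 2.0
```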

2026-05-30T16:35:45Z Mengqi Xu Subha Maity Joel Dubin http://arxiv.org/abs/2606.00767v1 The Effect of Choice of Metric and Scan Length on Reliability in Resting-State fMRI 2026-05-30T15:18:27Z

Resting-state fMRI (rs-fMRI) is widely used to investigate brain functional connectivity, but the reliability of these measurements remains a key concern for ensuring reproducibility. The distance-based intraclass correlation coefficient (dbICC) generalizes the classical ICC to more general data types, making it well-suited for assessing the reliability of measures of functional connectivity. In this study, we applied dbICC to assess the reliability of rs-fMRI data from the Midnight Scanning Club (MSC) dataset, which consists of 10 subjects, each undergoing 10 sessions of 30-minute rs-fMRI scans. Functional connectivity was estimated using Pearson's correlation coefficients between all pairs of brain regions, resulting in a correlation matrix for each session. To evaluate how the choice of metric affects the reliability of the estimated correlations, we compared two distance metrics: the widely used Frobenius metric and the Affine Invariant Riemannian Metric (AIRM), which was selected to respect the geometry of the space of covariance matrices. In addition, we investigated the impact of scan length and of the time interval between sessions on reliability. Results based on the two metrics agreed in some respects but disagreed in others, illustrating the impact of the choice of metric. We also found that longer scan lengths significantly improve reliability, while the time interval between sessions has less impact.
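The sketch below shows how the two metrics plug into dbICC. It uses the definition of dbICC as one minus the ratio of mean squared within-subject distance to mean squared between-subject distance, which is our understanding of it. It is restricted to 2x2 SPD matrices so the AIRM eigenvalues can be computed in closed form. The function names and the tiny data set are hypothetical, and real connectivity matrices would need a general eigensolver.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence

Matrix2 = List[List[float]]


def frobenius_distance(A: Matrix2, B: Matrix2) -> float:
    return math.sqrt(sum((A[i][j] - B[i][j]) ** 2 for i in range(2) for j in range(2)))


def airm_distance(A: Matrix2, B: Matrix2) -> float:
    """||log(A^{-1/2} B A^{-1/2})||_F for 2x2 SPD matrices.

    A^{-1/2} B A^{-1/2} has the same eigenvalues as A^{-1} B, which for a 2x2
    matrix are the roots of lambda^2 - tr * lambda + det = 0.
    """
    det_a = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    inv_a = [[A[1][1] / det_a, -A[0][1] / det_a],
             [-A[1][0] / det_a, A[0][0] / det_a]]
    M = [[sum(inv_a[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = math.sqrt(max(tr * tr - 4.0 * det, 0.0))  # clamp tiny negative round-off
    eigs = [(tr + root) / 2.0, (tr - root) / 2.0]
    return math.sqrt(sum(math.log(lam) ** 2 for lam in eigs))


def dbicc(data: Sequence[Sequence[Matrix2]], dist: Callable[[Matrix2, Matrix2], float]) -> float:
    """1 - mean within-subject d^2 / mean between-subject d^2; data[i][j] = subject i, session j."""
    within = [dist(s[j], s[k]) ** 2
              for s in data for j, k in combinations(range(len(s)), 2)]
    between = [dist(a, b) ** 2
               for s, t in combinations(data, 2) for a in s for b in t]
    return 1.0 - (sum(within) / len(within)) / (sum(between) / len(between))


def corr(r: float) -> Matrix2:
    return [[1.0, r], [r, 1.0]]


# Hypothetical data: 2 subjects x 2 sessions of 2x2 correlation matrices.
data = [[corr(0.30), corr(0.35)], [corr(0.70), corr(0.72)]]
print(dbicc(data, frobenius_distance), dbicc(data, airm_distance))
```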

2026-05-30T15:18:27Z Yu Huang Philip T. Reiss Seonjoo Lee R. Todd Ogden http://arxiv.org/abs/2606.00758v1 Statistical Testing on Directed Graphs by Surrogate Data Generation 2026-05-30T14:50:15Z

In recent years, graph signal processing has emerged as a powerful framework at the intersection of signal processing and graph theory, providing tools for the analysis of signals defined on nodes while accounting for their relationships represented by edges. These tools have been successfully applied in various settings, including statistical hypothesis testing. In particular, non-parametric approaches based on surrogate generation have been proposed for signals on undirected graphs. However, they have yet to be extended to directed graphs. In this work, we first revisit the notion of stationary graph signals on directed graphs. Specifically, using the eigendecomposition of the graph shift operator, we define wide-sense stationary signals on directed graphs. Then, we propose a new framework to generate surrogate graph signals that preserve the covariance structure under stationarity assumptions. Null distributions of the test metric can then be constructed from these surrogates and serve as a reference for the empirical data. Finally, we provide guiding examples and an application to real data, in which we compare the performance of our framework with existing techniques designed for undirected graphs or based on naive permutation, demonstrating the feasibility and superiority of the proposed approach.
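The simplest directed graph where this idea can be written out is the directed cycle. Its shift operator is a circulant matrix diagonalized by the discrete Fourier transform. In that case, keeping each spectral magnitude and drawing new phases preserves the graph power spectrum, and hence the covariance of a stationary signal. The sketch below uses this classical phase randomization as a stand-in. It does not cover general directed graphs, whose eigendecomposition V Λ V^{-1} need not be unitary. The function names and the choice of `max` as the test statistic are hypothetical.

```python
import cmath
import math
import random
from typing import Callable, List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Graph Fourier transform of the directed cycle: the ordinary DFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X: Sequence[complex]) -> List[complex]:
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def cycle_graph_surrogate(signal: Sequence[float], rng: random.Random) -> List[float]:
    """Phase-randomized surrogate of a real signal on a directed cycle graph."""
    n = len(signal)
    X = dft(signal)
    Y = list(X)
    for k in range(1, (n + 1) // 2):  # frequency pairs (k, n - k) with k < n - k
        phase = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
        Y[k] = X[k] * phase
        Y[n - k] = Y[k].conjugate()  # conjugate symmetry keeps the surrogate real
    # k = 0 (and k = n / 2 for even n) are real for real input and are left unchanged
    return [v.real for v in idft(Y)]


def surrogate_p_value(signal: Sequence[float], statistic: Callable[[Sequence[float]], float],
                      n_surrogates: int, rng: random.Random) -> float:
    """Monte Carlo p-value of the observed statistic against its surrogate null distribution."""
    observed = statistic(signal)
    exceed = sum(statistic(cycle_graph_surrogate(signal, rng)) >= observed
                 for _ in range(n_surrogates))
    return (1 + exceed) / (1 + n_surrogates)


signal = [0.0, 1.0, 3.0, 2.0, -1.0, 0.5, 4.0, -2.0]
print(surrogate_p_value(signal, max, 99, random.Random(0)))
```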

2026-05-30T14:50:15Z Submitted to IEEE Transactions on Signal and Information Processing over Networks Chun Hei Michael Chan Alexandre Cionca Dimitri Van De Ville http://arxiv.org/abs/2606.00754v1 Causal Density Functions 2026-05-30T14:41:25Z

We introduce causal density functions: Radon-Nikodym derivatives that compare interventional laws to observational laws and therefore act as local density ratios for causal effects. Whereas many causal-strength measures compare whole distributions after graph surgery, causal density functions provide a pointwise change-of-measure object that can be estimated, calibrated, and used to score directed influence. The basic identity \[ \mathbb{E}_{\mathrm{do}}[f(Y)] = \mathbb{E}_{\mathrm{obs}}\!\left[f(Y)\,\rho(X,Y)\right] \] makes causal density directly testable: if the estimated density ratio is correct, observational expectations reweighted by $\rho$ reproduce interventional expectations. We derive practical estimators for do-curves and directed edge scores, relate the construction to Radon-Nikodym/Kan semantics for conditioning and intervention, and evaluate the resulting estimators on synthetic and real perturbation benchmarks.
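The reweighting identity can be checked exactly on a small discrete model. The sketch assumes a hypothetical binary structural causal model with an observed confounder Z. In that case the density ratio for do(X = x0) works out to the inverse propensity weight 1{x = x0} / P(X = x0 | Z = z). It therefore depends on (Z, X) rather than only on (X, Y) as in the abstract's notation. Exact `Fraction` arithmetic shows that the reweighted observational mean equals the back-door interventional mean, while the naive conditional mean does not. All names and probabilities are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical binary SCM: Z -> X, Z -> Y, X -> Y, with exact probabilities.
P_Z = {0: Fraction(1, 2), 1: Fraction(1, 2)}
P_X1_GIVEN_Z = {0: Fraction(1, 5), 1: Fraction(7, 10)}
P_Y1_GIVEN_XZ = {(0, 0): Fraction(1, 10), (0, 1): Fraction(1, 2),
                 (1, 0): Fraction(2, 5), (1, 1): Fraction(4, 5)}


def bern(p1, v):
    """P(V = v) for a binary V with P(V = 1) = p1."""
    return p1 if v == 1 else 1 - p1


def p_obs(z, x, y):
    return P_Z[z] * bern(P_X1_GIVEN_Z[z], x) * bern(P_Y1_GIVEN_XZ[(x, z)], y)


def rho(z, x, x0):
    """Density ratio dP_do(X=x0) / dP_obs: 1{x = x0} / P(X = x0 | Z = z)."""
    return Fraction(0) if x != x0 else 1 / bern(P_X1_GIVEN_Z[z], x0)


def interventional_mean(f, x0):
    """E_do[f(Y)] via the back-door adjustment formula."""
    return sum(P_Z[z] * bern(P_Y1_GIVEN_XZ[(x0, z)], y) * f(y)
               for z, y in product((0, 1), repeat=2))


def reweighted_observational_mean(f, x0):
    """E_obs[f(Y) * rho], computed exactly over the observational joint law."""
    return sum(p_obs(z, x, y) * f(y) * rho(z, x, x0)
               for z, x, y in product((0, 1), repeat=3))


def naive_conditional_mean(f, x0):
    """E_obs[f(Y) | X = x0], which ignores confounding by Z."""
    num = sum(p_obs(z, x0, y) * f(y) for z, y in product((0, 1), repeat=2))
    den = sum(p_obs(z, x0, y) for z, y in product((0, 1), repeat=2))
    return num / den


def f(y):
    return y  # f(Y) = Y, so each expectation is a probability of Y = 1


print(interventional_mean(f, 1), reweighted_observational_mean(f, 1),
      naive_conditional_mean(f, 1))  # 3/5 3/5 32/45
```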

2026-05-30T14:41:25Z 25 pages Sridhar Mahadevan http://arxiv.org/abs/2505.18102v7 CapBencher: Give Your LLM Benchmark a Built-in Alarm for Test-Set Overfitting 2026-05-30T13:29:55Z

Publishing a large language model (LLM) benchmark (especially its ground-truth answers) on the Internet risks contaminating future LLMs and enabling evaluation gaming: the benchmark may be unintentionally (or intentionally) used to train or select a model, or exploited to overfit and hack leaderboards when labels are accessible. A common mitigation is to keep the benchmark private and let participants submit their models or predictions to the organizers, but this still permits test-set overfitting through feedback loops. To overcome this issue, we propose CapBencher, a way to publish benchmarks without fully disclosing the ground-truth answers, while preserving open evaluation of LLMs. The main idea is to reduce the best possible accuracy, i.e., the Bayes accuracy, by injecting randomness into the answers: several logically correct answers are prepared for each question, and only one of them is included as the solution in the benchmark. Not only does this obscure the ground-truth answers, but it also offers a test for leakage or gaming: since even fully capable models should not surpass the Bayes accuracy, any model that exceeds it is strong evidence of such contamination. We show theoretically and empirically that CapBencher accurately detects test-set overfitting across diverse benchmarks, models, training methodologies, and scenarios.
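The cap and the resulting leakage test can be sketched under simple assumptions. Suppose every question has k equally valid answers and the published label is drawn uniformly among them. Then a model that cannot see the draw is correct with probability at most 1/k per question. If questions are independent, an upper binomial tail then measures how surprising an observed score is. The uniform draw, the binomial test, and all names are assumptions for illustration, not necessarily CapBencher's exact construction.

```python
import math
import random
from typing import List, Sequence


def publish_labels(valid_answers: Sequence[Sequence[str]], rng: random.Random) -> List[str]:
    """Keep one uniformly chosen answer per question out of its k valid answers."""
    return [rng.choice(list(answers)) for answers in valid_answers]


def binomial_upper_tail(n_correct: int, n_questions: int, cap: float) -> float:
    """P(Binomial(n_questions, cap) >= n_correct): chance of scoring this high at the cap."""
    return sum(math.comb(n_questions, j) * cap ** j * (1.0 - cap) ** (n_questions - j)
               for j in range(n_correct, n_questions + 1))


k = 2  # hypothetical: every question has two equally valid answers
questions = [("A", "B")] * 40
labels = publish_labels(questions, random.Random(0))

honest = [answers[0] for answers in questions]  # capable, but cannot see the random draw
leaked = list(labels)                           # has memorized the released labels

for name, preds in [("honest", honest), ("leaked", leaked)]:
    n_correct = sum(p == t for p, t in zip(preds, labels))
    print(name, n_correct / len(labels), binomial_upper_tail(n_correct, len(labels), 1.0 / k))
```

A leaked model scores near 100% against a 50% cap, which yields a vanishingly small tail probability. An honest model hovers around the cap.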

2025-05-23T16:57:34Z ICML 2026 camera ready version Takashi Ishida Thanawat Lodkaew Ikko Yamane http://arxiv.org/abs/2606.00715v1 Rate-optimal neural boundary detection from unlabeled noisy images 2026-05-30T12:58:09Z

We study boundary detection for unlabeled noisy images from a statistical perspective. The aim is to recover an unknown object region from raw intensity observations without pixel-wise annotations or a parametric model for the intensity distributions. Motivated by robust Gibbs posterior approaches based on thresholded misclassification losses, we propose a continuous hinge-type surrogate loss for boundary detection. The proposed loss is amenable to gradient-based optimization and can be combined with deep neural networks to represent complex object boundaries. We prove that the proposed loss function is Fisher consistent under a mild separation assumption and obtain a calibration inequality linking excess surrogate risk to the symmetric difference error of the estimated region. Under a piecewise smooth boundary model, we prove that the resulting deep neural network estimator achieves the minimax-optimal boundary recovery rate, up to logarithmic factors. The piecewise smooth formulation accommodates boundaries with corners and kinks, thereby extending beyond globally smooth boundary models. Numerical experiments demonstrate that the proposed method accurately and stably recovers object boundaries across a range of noise levels and shape configurations, and compares favorably with existing unsupervised boundary detection methods.

2026-05-30T12:58:09Z Kyeongho Kim Ilsang Ohn http://arxiv.org/abs/2601.11229v4 ThSQCA: Threshold-Sweep Qualitative Comparative Analysis in R 2026-05-30T12:16:13Z

Qualitative Comparative Analysis (QCA) requires researchers to choose calibration and dichotomization thresholds, and these choices can substantially affect truth tables, minimization, and resulting solution formulas. Despite this dependency, threshold sensitivity is often examined only in an ad hoc manner because repeated analyses are time-intensive and error-prone. We present ThSQCA, an R package that automates threshold-sweep analyses by treating thresholds as explicit analytical variables. It provides four sweep functions (otSweep, ctSweepS, ctSweepM, dtSweep) to explore outcome thresholds, single-condition thresholds, multi-condition threshold grids, and joint outcome-condition threshold spaces, respectively. ThSQCA integrates with the established CRAN package QCA for truth table construction and Boolean minimization, while returning structured S3 objects with consistent print/summary methods and optional detailed results. The package also supports automated Markdown report generation and configuration-chart output to facilitate reproducible documentation of cross-threshold results.
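ThSQCA itself is an R package built on the CRAN package QCA. The Python sketch below is not its API, and it skips truth-table construction and Boolean minimization. It only illustrates, for a single crisp condition, what treating a threshold as an explicit analytical variable means. The raw condition is re-dichotomized at each candidate threshold, and the sufficiency consistency |X ∩ Y| / |X| is recorded. Function names and toy data are hypothetical.

```python
from typing import Dict, Sequence


def crisp_consistency(condition: Sequence[bool], outcome: Sequence[bool]) -> float:
    """Sufficiency consistency |X and Y| / |X| for crisp sets (nan if X is empty)."""
    n_x = sum(condition)
    if n_x == 0:
        return float("nan")
    return sum(c and o for c, o in zip(condition, outcome)) / n_x


def sweep_condition_threshold(x_raw: Sequence[float], y_raw: Sequence[float],
                              x_thresholds: Sequence[float],
                              y_threshold: float) -> Dict[float, float]:
    """Dichotomize the condition at each threshold and record its consistency with the outcome."""
    outcome = [y >= y_threshold for y in y_raw]
    return {t: crisp_consistency([x >= t for x in x_raw], outcome) for t in x_thresholds}


x_raw = [1, 2, 3, 4, 5, 6]
y_raw = [0, 0, 1, 0, 1, 1]
print(sweep_condition_threshold(x_raw, y_raw, [2, 4, 6], 1))
# {2: 0.6, 4: 0.6666666666666666, 6: 1.0}
```

Reporting the whole map rather than a single value makes it visible when a substantive conclusion holds only at one threshold.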

2026-01-16T12:19:01Z 27 pages, 2 figures, 7 tables. R package available on CRAN (https://cran.r-project.org/package=ThSQCA). v5: package renamed from TSQCA to ThSQCA (v2.0.0, now available on CRAN); updated all URLs and version numbers Yuki Toyoda http://arxiv.org/abs/2602.22768v2 Asymptotic Theory and Sequential Testing for Adaptive Bandits 2026-05-30T08:32:28Z

Multi-armed bandit (MAB) processes constitute a foundational subclass of reinforcement learning problems and represent a central topic in statistical decision theory. Yet, conducting valid sequential testing under adaptive allocation remains challenging due to the lack of asymptotic theory under non-i.i.d. reward sequences and sublinear sample sizes for some arms. To address this open challenge, we propose an Urn Bandit (UNB) process to integrate the reinforcement mechanism of urn probabilistic models with MAB principles, ensuring almost sure concentration of allocation proportions on optimal arms. We establish a joint functional central limit theorem (FCLT) for consistent estimators of expected rewards under non-i.i.d. reward sequences with non-sub-Gaussian tails and pairwise cross-arm dependence. To overcome the limitations of existing methods that focus mainly on cumulative regret and therefore provide only algorithmic performance guarantees without supporting valid sequential testing, we develop an asymptotic theory for sequential test statistics under the proposed UNB process. The resulting framework enables a broad class of sequential inference procedures, such as A/B testing and policy evaluation. Simulation studies and real data analysis demonstrate that UNB maintains testing performance comparable to that of the equal randomization (ER) design while achieving improved reward accumulation relative to ER.
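The urn reinforcement idea can be sketched with the classical randomized Pólya urn for Bernoulli arms. This is swapped in for illustration and is not the UNB process, whose reinforcement rule and FCLT-based sequential tests are not reproduced here. Each round, an arm is drawn with probability proportional to its ball count. A success adds one ball of that arm's color, so better arms are drawn more and more often. The function name and parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def urn_bandit(success_probs: Sequence[float], horizon: int,
               rng: random.Random) -> Tuple[List[int], List[float]]:
    """Randomized Polya urn allocation for Bernoulli arms; returns (pulls, ball counts)."""
    balls = [1.0] * len(success_probs)  # one initial ball per arm
    pulls = [0] * len(success_probs)
    for _ in range(horizon):
        arm = rng.choices(range(len(balls)), weights=balls)[0]  # draw proportional to balls
        pulls[arm] += 1
        if rng.random() < success_probs[arm]:  # Bernoulli reward
            balls[arm] += 1.0                  # reinforce the arm that paid off
    return pulls, balls


pulls, balls = urn_bandit([0.7, 0.4], 2000, random.Random(0))
print([p / sum(pulls) for p in pulls])  # allocation proportions per arm
```

Because allocation is random and data-dependent, rewards are no longer i.i.d. within an arm, which is why tests built on such designs need their own asymptotic theory.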

2026-02-26T08:59:20Z Li Yang Xiaodong Yan Dandan Jiang http://arxiv.org/abs/2606.00578v1 When Do Generalized Permutation Tests Achieve Optimal Power? A Dispersion Characterization 2026-05-30T07:03:58Z

We study generalized Monte Carlo permutation tests under a non-uniform distribution on permutations. Focusing on the difference-in-means statistic, we introduce two scalar dispersion measures that quantify departures from complete randomization at the individual and pairwise levels. We show that if both dispersions vanish asymptotically, then the conditional permutation distribution converges to its Gaussian benchmark, the critical value stabilizes, and the test attains optimal Pitman local power. Conversely, if these dispersions fail to vanish, the permutation distribution does not self-average, the critical value need not stabilize, and optimal local power cannot in general be guaranteed. We further show that beyond the standard Pitman local model, suitably chosen non-uniform permutation distributions can strictly dominate the uniform distribution by exploiting nuisance structure in the data.
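For reference, the uniform baseline these results compare against is the ordinary Monte Carlo permutation test with the difference-in-means statistic, sketched below. The paper's non-uniform permutation laws and its two dispersion measures are not implemented. The marked `shuffle` line is where a different permutation distribution would be plugged in. The add-one correction in the p-value is a standard choice that we assume here, and all names are hypothetical.

```python
import random
from statistics import fmean
from typing import Sequence


def diff_in_means(values: Sequence[float], n_treated: int) -> float:
    return fmean(values[:n_treated]) - fmean(values[n_treated:])


def mc_permutation_p_value(treated: Sequence[float], control: Sequence[float],
                           n_perm: int, rng: random.Random) -> float:
    """Two-sided Monte Carlo permutation p-value for the difference in means."""
    pooled = list(treated) + list(control)
    n_t = len(treated)
    observed = abs(diff_in_means(pooled, n_t))
    exceed = 0
    for _ in range(n_perm):
        perm = pooled[:]
        rng.shuffle(perm)  # uniform permutation; a non-uniform law would replace this line
        if abs(diff_in_means(perm, n_t)) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)  # add-one correction keeps the test valid


treated = [5.1, 4.8, 6.0, 5.5, 5.9]
control = [4.2, 4.9, 4.4, 3.8, 4.6]
print(mc_permutation_p_value(treated, control, 999, random.Random(0)))
```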

2026-05-30T07:03:58Z 34 pages, 3 figures, 1 table Yongmin Kim Ilmun Kim http://arxiv.org/abs/2501.18798v3 Targeted Data Fusion for Region-Specific Survival Effects in the AMP HIV Prevention Trials 2026-05-30T00:48:46Z

The Antibody Mediated Prevention (AMP) trials opened a new scientific frontier by showing that passively administered monoclonal broadly neutralizing antibodies (bnAbs) could prevent HIV-1 acquisition. Conducted across multiple geographic regions, including the United States, Brazil, Peru, Switzerland, and sub-Saharan Africa, the AMP trials revealed substantial regional heterogeneity in treatment efficacy. These differences, together with privacy and regulatory limits on central data pooling, call for methods that borrow strength across regions without sharing individual-level data. To estimate region- and treatment-specific survival curves under distributional heterogeneity, we develop a federated learning approach that combines site-specific estimators via an L1-regularized criterion that downweights data sources not aligned with the target. We further extend the framework to a general class of causal contrasts, including the risk difference (RD), survival ratio (SR), and restricted mean survival time (RMST) difference. Through extensive simulations and an analysis of the AMP trials under different target populations, we show that the proposed approach provides privacy-preserving, region-adaptive inference with improved precision.
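One of the contrasts mentioned, the RMST difference, is the area between two survival curves up to a horizon τ. The sketch below computes it from Kaplan-Meier curves for a single site. It does not reproduce the federated, L1-weighted combination of site-specific estimators, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """(time, survival) steps of the Kaplan-Meier estimator; events[i] = 1 if observed."""
    data = sorted(zip(times, events))
    at_risk, surv, steps, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, censored = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= deaths + censored
    return steps


def rmst(steps: Sequence[Tuple[float, float]], tau: float) -> float:
    """Restricted mean survival time: area under the step survival curve on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in steps:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


arm_a = kaplan_meier([2.0, 3.5, 4.0, 6.0, 7.5, 9.0], [1, 1, 0, 1, 0, 1])
arm_b = kaplan_meier([1.0, 1.5, 2.5, 3.0, 5.0, 8.0], [1, 1, 1, 0, 1, 1])
# RMST difference between the two hypothetical arms up to tau = 6
print(rmst(arm_a, 6.0) - rmst(arm_b, 6.0))
```

Without censoring, RMST reduces to the sample mean of min(T, τ), which is a convenient sanity check.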

2025-01-30T23:21:25Z Yi Liu Alexander W. Levis Ke Zhu Shu Yang Peter B. Gilbert Larry Han http://arxiv.org/abs/2605.20615v2 Evaluating causal indirect effects when mediators are left-censored by assay limit of quantification 2026-05-30T00:36:49Z

Causal mediation analysis is essential for disentangling the mechanisms by which investigational therapeutic and preventive agents impact clinical outcomes. However, the measurement of biological mediators is often subject to left-censoring by technical measurement limitations, most commonly an assay's limit of quantification. This form of censoring can pose severe challenges for both identification and estimation of causal mediation estimands, particularly when the censoring mechanism is deterministic and the resulting missingness is missing not at random (MNAR) or nonignorable. Motivated by the question of assessing the role of viral RNA in the action mechanism of monoclonal antibody therapies for COVID-19 in the Accelerating COVID-19 Therapeutics and Vaccine (ACTIV)-2 platform trial, we develop a semi-parametric framework for estimation of the natural direct and indirect effects when the mediator of interest is partially subject to this form of left-censoring. Our proposed strategy combines fractional imputation with a semi-parametric EM algorithm to flexibly estimate key components of the factorized data likelihood. Applying the proposed strategy to circumvent the left-censoring, we discuss both traditional plug-in and asymptotically efficient estimators of the direct and indirect effect estimands, introducing a data-adaptive $m$-out-of-$n$ bootstrap for robust inference under the imputation procedure. We demonstrate in numerical experiments that our approach significantly reduces bias and allows for reliable inference. An application to data from the ACTIV-2 platform trial confirms that monoclonal antibody therapies reduce the risk of hospitalization and death due to COVID-19, while suggesting that changes in viral RNA mediate only a modest proportion of the overall treatment effect.
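The imputation step can be pictured as expanding each record into weighted pseudo-records. Each left-censored mediator value is replaced by several values below the limit of quantification (LOQ), each carrying a fractional weight, so every subject's weights still sum to one. The sketch assumes a fixed normal working distribution and uses evenly spaced quantile points instead of random draws. In the paper, the relevant distribution is instead estimated and updated via a semi-parametric EM algorithm. The function name, parameters, and data are hypothetical.

```python
from statistics import NormalDist
from typing import List, Optional, Tuple


def fractional_impute(values: List[Optional[float]], loq: float,
                      working_dist: NormalDist, m: int = 5) -> List[Tuple[float, float]]:
    """Expand observations into (value, fractional weight) pairs.

    Observed values keep weight 1. A value below the LOQ (None) becomes m points at
    evenly spaced quantile levels of the working distribution truncated to (-inf, loq),
    each with weight 1/m.
    """
    p_below = working_dist.cdf(loq)  # mass of the working distribution below the LOQ
    rows = []
    for v in values:
        if v is not None:
            rows.append((v, 1.0))
        else:
            for j in range(m):
                u = (j + 0.5) / m * p_below  # quantile level strictly inside (0, p_below)
                rows.append((working_dist.inv_cdf(u), 1.0 / m))
    return rows


rows = fractional_impute([1.2, None, 0.9], loq=0.5, working_dist=NormalDist(1.0, 0.5))
print(sum(v * w for v, w in rows) / sum(w for _, w in rows))  # weighted mediator mean
```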

2026-05-20T02:02:48Z Cong Jiang Michael D. Hughes Nima S. Hejazi http://arxiv.org/abs/2606.00436v1 Weighted Conformal Clustering 2026-05-29T23:58:56Z

Clustering is a central tool for discovering latent structure in unlabeled data, yet modern clustering pipelines often end with a hard assignment of each observation to a cluster without rigorous measures of assignment uncertainty. We propose a novel weighted conformal approach for constructing valid confidence sets for cluster labels. The key difficulty is that the labels available for calibration are not observed ground-truth labels, but synthetic labels produced by a data-dependent clustering algorithm. By formulating conformal clustering as a conditional label-distribution shift problem, our conformal inference algorithm uses weights to correct the resulting mismatch between the synthetic labels and the latent target labels. We first derive an oracle procedure that attains finite-sample marginal coverage and then develop a computationally tractable, implementable version using estimated conditional label probabilities and a novel augmented calibration step. We show that the coverage of the estimated-weight procedure depends on the quality of the estimator, and we give an explicit bound on the coverage loss relative to the nominal level. Empirical studies demonstrate that the proposed weighted approach yields more informative (smaller) confidence sets than the recently proposed split conformal clustering procedure, especially in nonlinear and high-dimensional clustering applications.
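The core computation of a weighted split-conformal set is a weighted quantile of calibration scores. A point mass is placed at +∞ for the test point's own weight. Every candidate cluster label whose score falls below that quantile is then kept. The sketch below shows only this generic step, with scores and weights taken as given inputs. In the paper, the weights come from estimated conditional label probabilities and augmented calibration, which are not reproduced. All names and numbers are hypothetical.

```python
import math
from typing import Dict, Sequence, Set


def weighted_conformal_threshold(scores: Sequence[float], weights: Sequence[float],
                                 test_weight: float, alpha: float) -> float:
    """(1 - alpha) quantile of weighted calibration scores plus a point mass at +inf."""
    total = sum(weights) + test_weight
    cum = 0.0
    for s, w in sorted(zip(scores, weights)):
        cum += w
        if cum / total >= 1.0 - alpha - 1e-12:  # small tolerance for float round-off
            return s
    return math.inf  # only the test point's mass at +inf reaches 1 - alpha


def label_set(test_scores: Dict[int, float], threshold: float) -> Set[int]:
    """All candidate cluster labels whose nonconformity score is within the threshold."""
    return {k for k, s in test_scores.items() if s <= threshold}


cal_scores = [0.4, 1.1, 0.7, 2.3, 0.9, 1.6, 0.2, 1.3, 0.8]
cal_weights = [1.0, 0.8, 1.2, 0.5, 1.0, 0.9, 1.1, 0.7, 1.0]
thr = weighted_conformal_threshold(cal_scores, cal_weights, test_weight=1.0, alpha=0.1)
print(thr, label_set({0: 0.5, 1: 1.9, 2: 3.0}, thr))
```

With all weights equal to one, the threshold reduces to the usual split-conformal order statistic, the ⌈(n + 1)(1 − α)⌉-th smallest score.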

2026-05-29T23:58:56Z Anirban Nath YoonHaeng Hur Genevera I. Allen http://arxiv.org/abs/2606.00425v1 Empirical Likelihood with Generative AI 2026-05-29T23:21:50Z

Moment conditions are widely used to identify parameters in models where the full likelihood is either unknown or intentionally left unspecified. Empirical likelihood methods address this problem by assigning probability weights to the observed data so that the sample moment conditions hold exactly. Building on this idea, we propose a nonparametric Bayesian framework based on exponentially tilted empirical likelihood. This Bayesian formulation is particularly appealing in settings where prior information is more naturally specified on the observables rather than on the underlying parameters. Such settings arise in the presence of auxiliary data sources or synthetic data generated by modern generative AI models. Inference proceeds by projecting posterior draws from a Dirichlet process onto the moment-restricted model, yielding a computationally efficient procedure that is naturally amenable to parallelization. We establish new Bernstein--von Mises and consistency theorems for the resulting projection posterior under both vanishing-prior and persistent-prior regimes. In an application to return prediction using overnight news headlines, we show that AI-generated auxiliary data can provide a useful source of indirect regularization when informative priors on the parameter itself are unavailable.
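One standard way to map a weight vector π onto a moment constraint Σ w_i g_i = 0 is exponential tilting, w_i ∝ π_i exp(λ g_i). This is the Kullback-Leibler projection of π onto the constraint, and with uniform π it gives the ETEL weights. The sketch assumes a scalar moment g(x, θ) = x − θ. It uses a finite Dirichlet(1, ..., 1) draw as a stand-in for a Dirichlet-process posterior draw. Names are hypothetical, and this is not presented as the authors' algorithm.

```python
import math
import random
from typing import List, Sequence


def exponential_tilt(g: Sequence[float], base: Sequence[float]) -> List[float]:
    """Tilt positive base weights to w_i ∝ base_i * exp(lam * g_i) with sum_i w_i g_i = 0.

    The tilted mean of g is increasing in lam (its derivative is a variance), so lam
    is found by bracketing plus bisection. Requires min(g) < 0 < max(g).
    """
    def tilted(lam: float) -> List[float]:
        shift = max(lam * gi for gi in g)  # keep exponents <= 0 to avoid overflow
        return [p * math.exp(lam * gi - shift) for p, gi in zip(base, g)]

    def moment(lam: float) -> float:
        w = tilted(lam)
        return sum(wi * gi for wi, gi in zip(w, g)) / sum(w)

    lo, hi = -1.0, 1.0
    while moment(lo) > 0:
        lo *= 2.0
    while moment(hi) < 0:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if moment(mid) < 0:
            lo = mid
        else:
            hi = mid
    w = tilted((lo + hi) / 2.0)
    total = sum(w)
    return [wi / total for wi in w]


def dirichlet_draw(n: int, rng: random.Random) -> List[float]:
    """Dirichlet(1, ..., 1) draw via normalized Exp(1) variables (Bayesian bootstrap)."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    s = sum(draws)
    return [d / s for d in draws]


def etel_log_likelihood(data: Sequence[float], theta: float) -> float:
    g = [x - theta for x in data]
    w = exponential_tilt(g, [1.0 / len(data)] * len(data))
    return sum(math.log(wi) for wi in w)


data = [1.0, 2.0, 4.0, 7.0]
print([round(etel_log_likelihood(data, t), 3) for t in (2.0, 3.0, 3.5, 4.0, 5.0)])
projected = exponential_tilt([x - 3.0 for x in data], dirichlet_draw(len(data), random.Random(0)))
```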

2026-05-29T23:21:50Z Jiguang Li Sid Kankanala Veronika Rockova http://arxiv.org/abs/2110.11074v3 A Unified Framework for Regularized Estimating Equations via Fixed-Point and Variational Inequality Problems 2026-05-29T22:53:28Z

Many statistical problems are formulated within an estimating equation framework instead of a minimization framework. However, regularized estimating equations (REE) have been much less extensively studied than regularized minimization problems. In this paper, we study an improved regularized estimating equation formulation and explore its equivalence to (1) a fixed-point problem specified via the proximal operator of the corresponding regularizer and (2) a generalized variational inequality problem. These equivalences hold under general conditions and accommodate nonconvex regularizers. Moreover, they open up new possibilities for theoretical analysis and computational algorithms when studying the REE.
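For the L1 penalty, the fixed-point view can be sketched concretely. Assume the regularized equation takes the form 0 ∈ U(β) + λ∂‖β‖₁, which is our assumption about its form. Then β solves it exactly when β = prox_{ηλ‖·‖₁}(β − ηU(β)) for any η > 0, and this proximal operator is coordinatewise soft-thresholding. U need not be the gradient of any objective. Plain iteration converges when the map is a contraction, for example when U is strongly monotone and Lipschitz and η is small enough; this condition is also an assumption of the sketch. The function names are hypothetical.

```python
from typing import Callable, List, Sequence


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |.| (soft-thresholding)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def solve_l1_ree(U: Callable[[List[float]], List[float]], beta0: Sequence[float],
                 lam: float, eta: float = 0.5, n_iter: int = 1000,
                 tol: float = 1e-12) -> List[float]:
    """Iterate beta <- prox_{eta*lam*||.||_1}(beta - eta * U(beta)) until it stops moving."""
    beta = list(beta0)
    for _ in range(n_iter):
        u = U(beta)
        new = [soft_threshold(b - eta * ui, eta * lam) for b, ui in zip(beta, u)]
        if max(abs(a - b) for a, b in zip(new, beta)) < tol:
            return new
        beta = new
    return beta


# Hypothetical linear estimating function U(beta) = A beta - b with a non-symmetric A,
# so U is not the gradient of any loss.
A = [[1.0, 0.3], [-0.2, 1.0]]
b = [2.0, 0.1]


def U(beta: List[float]) -> List[float]:
    return [sum(A[i][j] * beta[j] for j in range(2)) - b[i] for i in range(2)]


print(solve_l1_ree(U, [0.0, 0.0], lam=0.5))
```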

2021-10-21T12:28:23Z Archer Y. Yang Yue Zhao Yi Lian Yuwen Gu Jun Fan