https://arxiv.org/api/HUfCgUaUcim0yEtua0Heoyx9f5o
2026-06-09T21:31:35Z

http://arxiv.org/abs/2606.09021v1
Sparse Convexification for High-Dimensional Constrained Regression
2026-06-08T04:39:29Z
We study high-dimensional linear regression under a general symmetric convex constraint. Rather than imposing a specific sparsity-inducing penalty, we start from an arbitrary sign-symmetric and permutation-invariant convex body $K\subseteq \mathbb R^p$ and construct the sparse convexification hierarchy \[ K^{(s)} = \operatorname{conv}\{v\in K:\|v\|_0\le s\}. \] We propose a penalized least-squares estimator that searches over this hierarchy and adapts to the best sparse convex approximation of the target. Under standard sub-Gaussian assumptions on the random design and noise, we prove an oracle inequality showing that the estimator adapts to the best sparse convex approximation of the target. For an $s$-sparse target, the result yields a squared-error rate governed by the effective sparse dimension $s\log(ep/s)$, the noise level $σ$, and the Euclidean diameter $d_s$ of the sparse convexification $K^{(s)}$. The method applies broadly to symmetric norm balls and can be implemented using oracle access to the Minkowski functional of $K$. As a special case, the framework yields a consistency result for the constrained Lasso.
2026-06-08T04:39:29Z
Matey Neykov

http://arxiv.org/abs/2506.04480v2
On the Wasserstein Geodesic Principal Component Analysis of probability measures
2026-06-08T03:34:40Z
This paper focuses on Geodesic Principal Component Analysis (GPCA) on a collection of probability distributions using the Otto-Wasserstein geometry. The goal is to identify geodesic curves in the space of probability measures that best capture the modes of variation of the underlying dataset. We first address the case of a collection of Gaussian distributions, and show how to lift the computations in the space of invertible linear maps.
For the more general setting of absolutely continuous probability measures, we leverage a novel approach to parameterizing geodesics in Wasserstein space with neural networks. Finally, we compare to classical tangent PCA through various examples and provide illustrations on real-world datasets.
2025-06-04T22:00:43Z
Nina Vesseron, Elsa Cazelles, Alice Le Brigant, Thierry Klein

http://arxiv.org/abs/2606.08981v1
Divide-and-shrink: An efficient and heterogeneity-agnostic approach for transfer estimation using summary statistics
2026-06-08T03:26:58Z
Knowledge transfer across data sources holds great promise for improving the estimation of target population parameters by leveraging the growing availability of data from different sources. However, the effectiveness of knowledge transfer is often challenged by the complex and pervasive heterogeneity between data sources and the lack of access to individual-level data. This paper proposes the divide-and-shrink (dShrink) method, a transfer estimation method that estimates target population parameters in a closed form using summary statistics from a target population and some external source populations while accounting for population heterogeneity. The dShrink estimator is guaranteed to outperform the estimator based solely on the target population in terms of expected quadratic error under arbitrary population heterogeneity. The gain can be substantial when the target and source populations are similar, or the underlying true parameter values are near zero. Notably, dShrink is model-free, requires no user-specified tuning parameters, is robust to various types of heterogeneity between data sources, and applies to a broad range of parameter estimation problems. dShrink remains effective even when the covariance matrix is not accessible for the external summary statistics and offers flexibility in incorporating side information and summary statistics from multiple source populations.
Simulations and real data analyses demonstrate the superior performance of the dShrink estimator and its potential as a robust tool for transfer estimation.
2026-06-08T03:26:58Z
Ruoyu Wang, Xihong Lin

http://arxiv.org/abs/2512.10250v2
Time-Averaged Drift Approximations are Inconsistent for Inference in Drift Diffusion Models
2026-06-08T03:22:53Z
Drift diffusion models (DDMs) have found widespread use in computational neuroscience, cognitive science, mathematical psychology as well as other fields. They model evidence accumulation in simple decision tasks as a stochastic process drifting towards decision barriers. In models where the drift is both time-varying within a trial and variable across trials, the high computational cost for accurate likelihood evaluation has often led to the use of a computationally convenient surrogate for parameter inference, the time-averaged drift approximation (TADA). In each trial, TADA assumes that the time-varying drift rate can be replaced by its temporal average throughout the trial. This approach enables fast parameter inference using analytical likelihood formulas for DDMs with constant drift. In this work, we show that such an estimator is inconsistent: it does not converge to the true drift, posing a risk of biasing scientific conclusions when parameter estimates are obtained by TADA and similar approximations. We provide an elementary proof of this inconsistency in what is perhaps the simplest possible setting: a Brownian motion with piecewise constant drift hitting a one-sided upper boundary. Furthermore, numerical examples based on an attentional DDM (aDDM) show that using TADA leads to systematic misestimation of attentional effects in decision making and can lead to false conclusions in scientific hypothesis testing.
2025-12-11T03:18:55Z
37 pages. Includes updates for the first revision
Sicheng Liu, Alexander Fengler, Michael J. Frank, Matthew T. Harrison

http://arxiv.org/abs/2606.08966v1
Class Imbalance Corrections Failed to Enhance Discrimination, Model Calibration, and Prediction Stability: An Empirical Simulation Study Based on Clinical Dataset
2026-06-08T03:05:24Z
Class imbalance is common when developing clinical prediction models (CPMs) and is often assumed to lead to poor predictive performance. Several methods have been proposed to correct data imbalance during CPM development. However, it remains unclear whether correcting class imbalance improves or harms CPM performance. This study investigated how imbalance correction affects classification performance and prediction stability. We simulated the development and internal validation of CPMs using penalised logistic regression under different imbalance-correction strategies, including algorithm-level rebalancing, data-level rebalancing by oversampling, and combined over- and under-sampling. The simulation dataset was derived from the GUSTO-I trial, which included 40,830 patients and 2,851 events. All imbalance-correction strategies were evaluated across sample-size scenarios ranging from 500 to 40,830. Model performance and prediction stability were assessed using 200 bootstrap resamples, including discrimination, calibration, calibration stability, mean absolute prediction error (MAPE), and classification instability index (CII). Class imbalance correction did not meaningfully improve model discrimination. Both data-level and algorithm-level correction led to miscalibration, risk overestimation, and increased prediction instability, as shown by prediction stability, MAPE, and CII plots, compared with models developed without correction. These findings suggest that class imbalance correction does not necessarily improve CPM performance and may compromise calibration and prediction stability. Class imbalance should not be treated as a pathology that automatically requires correction.
In clinical prediction modelling, routine imbalance correction by default is generally not advisable.
2026-06-08T03:05:24Z
47 pages
Wachiranun Sirikul, Natthanaphop Isaradech, Wuttipat Kiratipaisarl, Pakpoom Wongyikul, Noraworn Jirattikanwong, Phichayut Phinyo

http://arxiv.org/abs/2507.00312v4
Optimal Targeting in Dynamic Systems
2026-06-08T02:54:02Z
Modern treatment targeting methods often rely on estimating a conditional average treatment effect (CATE) using machine learning tools. While effective in identifying who benefits from treatment on the individual level, these approaches typically overlook system-level dynamics that may arise when treatments induce strain on shared capacity. We study the problem of targeting in Markovian systems, where treatment decisions must be made one at a time as units arrive, and early decisions can impact later outcomes through delayed or limited access to resources. We show that optimal policies in such settings compare CATE-like quantities to state-specific thresholds, where each threshold reflects the expected cumulative impact on the system of treating an additional individual in the given state. We propose an algorithm that augments standard CATE estimation with state-level value iteration to estimate these thresholds from observational data. Theoretical results establish consistency and convergence guarantees, and empirical studies demonstrate that our method improves long-run outcomes considerably relative to individual-level CATE targeting rules and generic offline reinforcement learning algorithms.
2025-06-30T23:02:08Z
Yuchen Hu, Shuangning Li, Stefan Wager

http://arxiv.org/abs/2606.08934v1
Backward Coherence and Hidden-State Stability in Recurrent Neural Networks: A Quasi-Reverse-Martingale Theory
2026-06-08T02:20:29Z
Recurrent neural networks maintain a hidden state $h_t$, but its probabilistic meaning is often unclear.
We study hidden-state stability through \emph{backward coherence}: the extent to which $h_t$ can be reconstructed from $h_{t+1}$ by a learned backward projector $g_φ$. Under contraction and summable backward drift, the hidden-state sequence forms a quasi-reverse-martingale. This yields almost-sure convergence, rates under mixing, an interpretable limiting representation, finite pathwise stopping times, and a theoretical framework for time-uniform confidence sequences.
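As a rough illustration of the backward-coherence idea described above, the sketch below measures how well $h_t$ can be reconstructed from $h_{t+1}$. It is not the paper's implementation: the tanh RNN, the Gaussian inputs, the affine form of the backward projector, and every function name are assumptions made only for this example.

```python
import numpy as np


def simulate_hidden_states(T, dim, rho=0.8, seed=0):
    """Run a contractive tanh RNN on Gaussian inputs (illustrative setup only)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, dim))
    W *= rho / np.linalg.norm(W, 2)  # rescale so the spectral norm equals rho < 1
    U = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    h = np.zeros(dim)
    states = np.empty((T, dim))
    for t in range(T):
        h = np.tanh(W @ h + U @ rng.standard_normal(dim))
        states[t] = h
    return states


def _design(states):
    """Stack h_{t+1} with an intercept column, for t = 0, ..., T-2."""
    return np.hstack([states[1:], np.ones((len(states) - 1, 1))])


def fit_linear_backward_projector(states):
    """Least-squares fit of h_t ~ A h_{t+1} + b (a hypothetical affine g_phi)."""
    coef, *_ = np.linalg.lstsq(_design(states), states[:-1], rcond=None)
    return coef  # shape (dim + 1, dim): rows of A^T followed by b


def backward_coherence_loss(states, coef):
    """Mean squared error of reconstructing h_t from h_{t+1}."""
    resid = states[:-1] - _design(states) @ coef
    return float(np.mean(np.sum(resid**2, axis=1)))


states = simulate_hidden_states(T=200, dim=4)
coef = fit_linear_backward_projector(states)
print("backward-coherence loss:", backward_coherence_loss(states, coef))
```

A small loss means consecutive hidden states are nearly recoverable from their successors. In a training setting one would presumably add such a term to the task loss as a regulariser, with a learned (possibly nonlinear) projector in place of the least-squares fit used here.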
Simulations support the theory. Backward-coherence regularisation reduces the empirical quasi-martingale total $\hat Q$ by $43$--$58\%$, reaches stability $28$--$44\%$ earlier than an unregularised RNN, and gives tracking-error recovery consistent with geometric bounds. Additional tests confirm echo-state forgetting rates bounded by $ρ$ and verify the increment-sum tube $R_t$ with $100\%$ simultaneous coverage, although $R_t$ is conservative; in practice, the defect-tail proxy $\hat Q_t$ is the more useful monitor. The backward-coherence loss is also equivalent to minimising a Kullback--Leibler divergence in a Gaussian backward model, linking the method to variational inference. Extensions cover $φ$-mixing inputs, change-point tracking, and finite-sample concentration.
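One way to read the Kullback--Leibler equivalence mentioned above, under assumptions that are ours rather than the abstract's: take the backward model to be $q_φ(h_t \mid h_{t+1}) = \mathcal N(g_φ(h_{t+1}), τ^2 I_d)$ with a fixed variance $τ^2$ and hidden dimension $d$, and write $p$ for the true conditional law of $h_t$ given $h_{t+1}$, assumed to have a density. Then
\[ \mathbb E_p\bigl[-\log q_φ(h_t \mid h_{t+1})\bigr] = \frac{1}{2τ^2}\,\mathbb E\,\|h_t - g_φ(h_{t+1})\|^2 + \frac{d}{2}\log(2πτ^2), \]
\[ \mathbb E\,\mathrm{KL}\bigl(p(\cdot \mid h_{t+1}) \,\|\, q_φ(\cdot \mid h_{t+1})\bigr) = \mathbb E_p\bigl[-\log q_φ(h_t \mid h_{t+1})\bigr] - H(h_t \mid h_{t+1}). \]
The conditional entropy $H(h_t \mid h_{t+1})$ and the log term do not depend on $φ$, so in this sketch minimising the squared reconstruction error over $φ$ is the same as minimising the expected KL divergence.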
Three real-data studies further validate the approach. On PhysioNet 2012 ICU data, the Reverse Martingale RNN (RMRNN) matches RNN mortality-prediction AUC while reaching stable representations 13 hours earlier. On FRED-MD, it reduces one-month-ahead forecast error by about fourfold under concept drift. On UCI Human Activity Recognition, it maintains lower post-transition tracking error with geometric decay. The guarantees apply under the stated assumptions; universality is not claimed.
2026-06-08T02:20:29Z
Yuan-chin Ivan Chang

http://arxiv.org/abs/2408.02122v2
Graph-Enabled Efficient Federated Bayesian Modeling
2026-06-08T02:19:29Z
Federated Bayesian modeling requires combining evidence from distributed users into a coherent global posterior while keeping users' raw data on-device. We propose Federated Latent Graph MCMC (FLaG-MCMC), a computationally efficient framework for federated learning in which historical posterior samples of a shared global parameter are encoded into a learned low-dimensional latent space, connected via a $k$-nearest-neighbor graph, and transferred sequentially to new users as a nonparametric prior. Each user runs graph-based MCMC in the latent space guided by their own likelihood, returns updated global samples to the server, and retains local latent variables on-device. We demonstrate FLaG-MCMC on Bayesian meta-analysis for opioid use disorder prevalence estimation and on federated topic modeling, where the federated posterior closely approximates the pooled full-data posterior for both global parameters and local user-level inference.
2024-08-04T19:37:09Z
20 pages, 7 figures
Chenyang Zhong, Shouxuan Ji, Tian Zheng

http://arxiv.org/abs/2606.07379v2
Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests
2026-06-08T01:53:05Z
A growing failure mode in agent evaluation and training is that models can achieve high evaluation scores by exploiting shortcuts instead of solving the intended task, producing deceptive performance. This makes evaluation scores unreliable as measures of true task-solving ability. We propose CapCode, a framework for constructing coding datasets with randomized tests whose best achievable non-cheating performance is deliberately capped below one. This capped-performance design gives evaluation scores a clearer interpretation: scores substantially above the cap are implausible and therefore provide evidence of cheating. To prevent cheating, we propose CapReward, a reward design based on the CapCode principle to discourage optimization beyond the cap. Experiments across multiple datasets show that CapCode detects cheating while preserving performance ranking of models, and CapReward reduces cheating behavior, yielding models that better follow the intended task specification.
2026-06-05T15:20:37Z
Thanawat Lodkaew, Johannes Ackermann, Soichiro Nishimori, Nontawat Charoenphakdee, Masashi Sugiyama, Takashi Ishida

http://arxiv.org/abs/2208.05543v6
A novel decomposition to explain heterogeneity in observational and randomized studies of causality
2026-06-08T01:41:51Z
This paper introduces a novel decomposition framework to explain heterogeneity in causal effects observed across different studies, considering both observational and randomized settings. We present a formal decomposition of between-study heterogeneity, identifying sources of variability in treatment effects across studies. The proposed methodology allows for robust estimation of causal parameters under various assumptions, addressing differences in pre-treatment covariate distributions, mediating variables, and the outcome mechanism.
Our approach is validated through a simulation study and applied to data from the Moving to Opportunity (MTO) study, demonstrating its practical relevance. This work contributes to the broader understanding of causal inference in multi-study environments, with potential applications in evidence synthesis and policy-making.
2022-08-10T20:05:34Z
edits to data analysis section, minor textual changes throughout
Brian Gilbert, Ivan Díaz, Kara E. Rudolph, Nicholas Williams, Tat-Thang Vo

http://arxiv.org/abs/2510.12744v2
Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency Without Model Sweeps
2026-06-08T01:14:11Z
We develop a unified statistical framework for softmax-gated Gaussian mixture of experts (SGMoE) that addresses three long-standing obstacles in parameter estimation and model selection: (i) non-identifiability of gating parameters up to common translations, (ii) intrinsic gate-expert interactions that induce coupled differential relations in the likelihood, and (iii) the tight numerator-denominator coupling in the softmax-induced conditional density. Our approach introduces Voronoi-type loss functions aligned with the gate-partition geometry and establishes finite-sample convergence rates for the maximum likelihood estimator (MLE). In over-specified models, we reveal a link between the MLE's convergence rate and the solvability of an associated system of polynomial equations characterizing near-nonidentifiable directions. For model selection, we adapt dendrograms of mixing measures to SGMoE, yielding a consistent, sweep-free selector of the number of experts that attains pointwise-optimal parameter rates under overfitting while avoiding multi-size training. Simulations on synthetic data corroborate the theory, accurately recovering the expert count and achieving the predicted rates for parameter estimation while closely approximating the regression function.
Under model misspecification (e.g., $ε$-contamination), the dendrogram selection criterion is robust, recovering the true number of mixture components, while the Akaike information criterion, the Bayesian information criterion, and the integrated completed likelihood tend to overselect as sample size grows. On a maize proteomics dataset of drought-responsive traits, our dendrogram-guided SGMoE selects two experts, exposes a clear mixing-measure hierarchy, stabilizes the likelihood early, and yields interpretable genotype-phenotype maps, outperforming standard criteria without multi-size training.
2025-10-14T17:23:44Z
Do Tien Hai, Trung Nguyen Mai, and TrungTin Nguyen are co-first authors. In Proceedings of The 29th International Conference on Artificial Intelligence and Statistics, AISTATS 2026 Spotlight, Acceptance rate 2.5% over 2102 submissions
Do Tien Hai, Trung Nguyen Mai, TrungTin Nguyen, Nhat Ho, Binh T. Nguyen, Christopher Drovandi

http://arxiv.org/abs/2505.10849v2
A Tractable Unified Skew-t Distribution and Its Copula for Heterogeneous Asymmetries
2026-06-08T00:43:52Z
Multivariate distributions that allow for asymmetry and heavy tails are important building blocks in many econometric and statistical models. The Unified Skew-t (UST) is a promising choice because it is both scalable and allows for a high level of flexibility in the asymmetry in the distribution. However, it suffers from parameter identification and computational hurdles that have to date inhibited its use for modeling data. In this paper we propose a new tractable variant of the unified skew-t (TrUST) distribution that addresses both challenges. Moreover, the copula of this distribution is shown to also be tractable, while allowing for greater heterogeneity in asymmetric dependence over variable pairs than the popular skew-t copula.
We show how Bayesian posterior inference for both the distribution and its copula can be computed using an extended likelihood derived from a generative representation of the distribution. The efficacy of this Bayesian method, and the enhanced flexibility of both the TrUST distribution and its implicit copula, is first demonstrated using simulated data. Applications of the TrUST distribution to highly skewed regional Australian electricity prices, and the TrUST copula to intraday U.S. equity returns, demonstrate how our proposed distribution and its copula can provide substantial increases in accuracy over the popular skew-t and its copula in practice.
2025-05-16T04:42:03Z
Lin Deng, Michael Stanley Smith, Worapree Maneesoonthorn

http://arxiv.org/abs/1910.07712v3
Estimating Spatially-Smoothed Fiber Orientation Distribution from Diffusion-MRI Experiments
2026-06-08T00:06:11Z
Diffusion-weighted magnetic resonance imaging (D-MRI) is a noninvasive in vivo technique for probing the microstructural architecture of biological tissues. At each voxel, the fiber orientation distribution (FOD) characterizes local fiber configurations and orientations and is therefore a central object of estimation in D-MRI analysis. We propose the Nearest-Neighbor Adaptive Regression Model (NARM), a spatially adaptive framework for FOD estimation that performs weighted local likelihood estimation over nested spatial neighborhoods, where the weights jointly encode spatial proximity and similarity among neighboring FODs, measured by either the optimal transport or Hellinger distance. To prevent over-smoothing while preserving structural heterogeneity, we introduce a voxel-wise rescaling scheme and a data-driven stopping rule based on minimum nearest-neighbor dissimilarity. We further develop a configuration-aware strategy for selecting the similarity-smoothing parameter, allowing the smoothing strength to adapt to local fiber complexity.
Simulation studies demonstrate that NARM improves FOD estimation accuracy relative to voxel-wise methods and the existing spatial smoothing approach PMARM. Application to test-retest data from the Human Connectome Project additionally shows that NARM yields more reproducible FOD estimates. Implementation details and scripts for the simulation and real data analyses are available at https://github.com/jie108/NARM
2019-10-17T05:13:49Z
Jilei Yan, Seungyong Hwan, Mengjie Shi, Jie Peng

http://arxiv.org/abs/2606.08853v1
AI-Assisted Variance Reduction in Randomized Experiments
2026-06-07T21:45:17Z
Generative AI and large language models can produce realistic predictions of human behavior from rich, unstructured inputs with little to no task-specific training data. Recent work uses these ``digital twin'' predictions to supplement human responses in surveys and experiments. We study the special case of using AI-generated predictions to reduce variance in randomized experiments. We argue that doing so requires no new estimators and that researchers can simply include AI predictions as covariates in standard regression adjustment, analogous to adjusting for a prognostic score. A benefit of this approach is a ``do no harm'' property whereby the adjusted estimator reverts to the unadjusted difference in means when predictions are uninformative. Other methods, such as variants of prediction-powered inference, do not have this guarantee. We provide implementation guidance, including how to obtain continuous scores from discrete LLM outputs and how to use LLMs to featurize unstructured inputs as auxiliary covariates. We demonstrate these ideas in simulations and three empirical applications: a survey mega-study, an email marketing A/B test, and a large-scale technology platform experiment. Overall, efficiency gains are real if modest, with greater benefits in studies that contain substantial text and other unstructured data. We also confirm the do no harm property empirically.
Given these gains and limited costs, we recommend adjusting for AI-generated predictions as a regular empirical practice.
2026-06-07T21:45:17Z
camera ready for KDD 2026
David Arbour, Eli Ben-Michael, Avi Feller, Apoorva Lal, Lo-Hua Yuan

http://arxiv.org/abs/2602.02753v2
Effect-Wise Inference for Smoothing Spline ANOVA on Tensor-Product Sobolev Space
2026-06-07T21:25:21Z
Functional ANOVA provides a nonparametric modeling framework for multivariate covariates, enabling flexible estimation and interpretation of effect functions such as main effects and interaction effects. However, effect-wise inference in such models remains challenging. Existing methods focus primarily on inference for entire functions rather than individual effects. Methods addressing effect-wise inference face substantial limitations: the inability to accommodate interactions, a lack of rigorous theoretical foundations, or restriction to pointwise inference. To address these limitations, we develop a unified framework for effect-wise inference in smoothing spline ANOVA on a subspace of tensor product Sobolev space. For each effect function, we establish rates of convergence, pointwise confidence intervals, and a Wald-type test for whether the effect is zero, with power achieving the minimax distinguishable rate up to a logarithmic factor. Main effects achieve the optimal univariate rates, and interactions achieve optimal rates up to logarithmic factors. The theoretical foundation relies on an orthogonality decomposition of effect subspaces, which enables the extension of the functional Bahadur representation framework to effect-wise inference in smoothing spline ANOVA with interactions. Simulation studies and real-data application to the Colorado temperature dataset demonstrate superior performance compared to existing methods.
2026-02-02T20:07:18Z
Youngjin Cho, Meimei Liu