http://arxiv.org/abs/2305.00578v3 High-Dimensional Clustering via Nearest-Neighbor Asymmetry 2026-05-14T08:17:32Z High-dimensional clustering often relies on geometric or local-similarity structure, but the dominant separation between groups may not always be location-based. Differences in dispersion can create asymmetric local-neighborhood patterns: points from a more dispersed component may be closer to points in a more concentrated component than to points from their own component. We turn this high-dimensional phenomenon into a clustering principle. The proposed method, NAC (Nearest-neighbor Asymmetry Clustering), constructs a directed $k$-nearest-neighbor graph and evaluates candidate partitions using two permutation-standardized statistics: a weighted within-edge statistic that captures overall within-cluster enrichment and a contrast statistic that captures asymmetric separation. The resulting objective combines these two standardized signals, allowing the method to adapt to different separation regimes without specifying a mixture model or a low-dimensional representation. We provide a population-level analysis showing how the two statistics target complementary nearest-neighbor patterns. Simulation studies across mean, scale, and combined location-scale differences show that NAC is competitive under location separation and especially effective when nearest-neighbor asymmetry is present; gene-expression applications further illustrate its usefulness in small-sample, high-dimensional clustering. 2023-04-30T21:18:20Z Hao Chen Xiancheng Lin http://arxiv.org/abs/2605.14491v1 Adaptive Long-Run Variance Thresholding for Sparse Covariance Estimation in High-Dimensional Time Series 2026-05-14T07:30:38Z Estimating a sparse covariance matrix is a fundamental problem in high-dimensional statistics. 
However, thresholding methods developed for independent data are generally not directly applicable to high-dimensional time series, where temporal dependence alters the stochastic behavior of sample covariance estimators. This paper studies sparse covariance matrix estimation for high-dimensional time series under weak dependence. We propose a thresholding procedure that incorporates long-run variance into the construction of entry-specific thresholds, thereby adapting to temporal dependence. Under suitable regularity conditions, we show that the proposed estimator is consistent under the spectral norm and attains the optimal convergence rate over a class of sparse covariance matrices. We further establish support recovery consistency for identifying the nonzero entries of the covariance matrix. In addition, we show that universal and adaptive thresholding methods developed for independent data may fail to recover the support consistently in the presence of autocorrelation. Simulation studies demonstrate that the proposed method compares favorably with existing thresholding estimators in terms of both estimation accuracy and support recovery. Applications to gene expression data and stock return data further illustrate its practical usefulness. 2026-05-14T07:30:38Z Wenhao Zhang Zhaoxing Gao http://arxiv.org/abs/2506.20425v3 Scalable Subset Selection in Linear Mixed Models 2026-05-14T07:22:23Z Linear mixed models (LMMs), which incorporate fixed and random effects, are key tools for analyzing heterogeneous data, such as in personalized medicine. Nowadays, this type of data is increasingly wide, sometimes containing thousands of candidate predictors, necessitating sparsity for prediction and interpretation. However, existing sparse learning methods for LMMs do not scale well beyond tens or hundreds of predictors, leaving a large gap compared with sparse methods for linear models, which ignore random effects. 
This paper closes the gap with a new $\ell_0$ regularized method for LMM subset selection that can run on datasets containing thousands of predictors in seconds to minutes. On the computational front, we develop a coordinate descent algorithm as our main workhorse and provide a guarantee of its convergence. We also develop a local search algorithm to help traverse the nonconvex optimization surface. Both algorithms readily extend to subset selection in generalized LMMs via a penalized quasi-likelihood approximation. On the statistical front, we provide a finite-sample bound on the Kullback-Leibler divergence of the new method. We then demonstrate its excellent performance in experiments involving synthetic and real datasets. 2025-06-25T13:39:30Z Ryan Thompson Matt P. Wand Joanna J. J. Wang http://arxiv.org/abs/2306.15199v3 Rank-Transformed Dissimilarity Profiles for High-Dimensional Classification 2026-05-14T07:09:27Z Despite advances in representation learning, high-dimensional classification remains challenging in low-sample-size regimes, where the dominant signal may vary across applications and labeled data are often limited. We propose a dissimilarity-profiling classification framework that represents each observation by its class-wise dissimilarity profile, transforming the original feature space into a low-dimensional representation that summarizes how the observation relates to each class. The key idea is to turn a consequence of the curse of dimensionality into signal: high-dimensional geometry can induce systematic within-class and between-class dissimilarity patterns under location, scale, or other distributional changes, and these patterns are captured by the class-wise profiles. Building on this representation, we introduce a rank-transformed algorithm that converts dissimilarities into class-wise rank profiles, yielding a compact representation for classification. 
The proposed method delivers competitive or improved performance relative to commonly used classifiers on two-class, multi-class, network, and real high-dimensional low-sample-size datasets. To provide insight into the mechanism underlying the method, we analyze a distance-based surrogate and show that the resulting profiles encode differences in first, second, and higher-order moments, while the rank transformation improves robustness to outliers. Together, these results show that rank-transformed dissimilarity profiles provide an adaptive representation for high-dimensional classification when the signal structure is unknown. 2023-06-27T04:46:54Z Xiangbo Mo Hao Chen http://arxiv.org/abs/2605.14463v1 KAP-CPD: Kernel Aggregation for Change-Point Detection in Dynamic Networks 2026-05-14T06:59:29Z Change-point detection in dynamic networks has received much attention due to its broad applications in social networks and biological systems. Kernel-based methods have shown strong potential for this problem. However, their performance can depend sensitively on the choice of kernel, and selecting an appropriate kernel is challenging when the underlying change pattern is unknown. Motivated by this challenge, we propose KAP-CPD, a new kernel-based testing framework for change-point detection in dynamic networks. KAP-CPD aggregates information from multiple kernels, allowing it to adapt to diverse change patterns. The proposed method does not assume a specific underlying network distribution and achieves strong empirical power across a wide range of network change scenarios. To improve scalability, we further develop a fast analytic testing procedure, KAPf-CPD, that substantially reduces computation time for long network sequences compared with permutation-based alternatives and current state-of-the-art methods. We evaluate our proposed framework through extensive simulations and real-world data on email communication networks and brain functional connectivity networks. 
2026-05-14T06:59:29Z Mingxuan Sun Hao Chen http://arxiv.org/abs/2505.09552v3 Scalable Krylov Subspace Methods for Generalized Mixed-Effects Models with Crossed Random Effects 2026-05-14T06:48:00Z Mixed-effects models are widely used to model data with hierarchical grouping structures and high-cardinality categorical predictor variables. However, for high-dimensional crossed random effects, current standard computations relying on Cholesky decompositions can become prohibitively slow. In this work, we present Krylov subspace-based methods that address existing computational bottlenecks, and we analyze them both theoretically and empirically. In particular, we derive new results on the convergence and accuracy of the preconditioned stochastic Lanczos quadrature and conjugate gradient methods for mixed-effects models, and we develop scalable methods for calculating predictive variances. In experiments with simulated and real-world data, the proposed methods yield speedups by factors of up to about 10,000 and are numerically more stable than Cholesky-based computations. 2025-05-14T16:50:19Z Pascal Kündig Fabio Sigrist http://arxiv.org/abs/2605.14453v1 Estimating Precision Matrices for High-Dimensional Interval-Valued Data 2026-05-14T06:47:31Z In the field of statistical learning and data analysis, estimating precision matrices (i.e., the inverse of covariance matrices) is a critical task, particularly for understanding dependency structures among variables. However, traditional methods often fall short when dealing with high-dimensional interval-valued data, where each observation is represented as an interval rather than a single point. This paper proposes a novel framework for estimating precision matrices in such contexts, addressing the unique challenges posed by the interval nature of the data. 
Specifically, we assume that the upper and lower bounds of the intervals share the same conditional dependency structure, and then formulate the interval graphical lasso optimization objective to estimate the precision matrix. At the optimization level, we provide an efficient computational approach, while at the theoretical level, we prove the sparsity and consistency of the estimator. Experiments on simulated and real data demonstrate the superiority of the proposed method in terms of estimation precision and interpretability. 2026-05-14T06:47:31Z Zhongfeng Qin Hao Xu Wenhao Cui Wan Tian http://arxiv.org/abs/2605.14444v1 Inlier Recovery for Robust Registration via Gram-Matrix Overlap 2026-05-14T06:39:36Z Robust point-set registration in the presence of noise and outliers is challenging because the matched points (inliers) must be identified before reliable alignment can be performed. Existing robust registration methods typically optimize over the transformation space and are often designed for regimes with a nonvanishing fraction of inliers. In this paper, we study the inlier recovery problem arising in robust registration by comparing two datasets through the Hadamard product of their Gram matrices. This formulation converts inlier identification into a structured recovery problem and avoids direct optimization over the rotation group. Based on this idea, we develop two methods: an eigenvector matching method based on the leading eigenvector of the Gram-matrix overlap, and a row-sum matching method based on aggregated entrywise comparison. We show that the eigenvector method achieves weak recovery when the dimension and sample size are of the same order, while the row-sum method achieves exact recovery under a broader range of dimensional scalings. 
In particular, when the dimension is comparable to the sample size, exact recovery is possible even when the inlier fraction vanishes, with the number of inliers as small as order $\sqrt{n}$, up to logarithmic factors. We also discuss a parallel implementation for large-scale settings. Numerical experiments on brain imaging data and image examples demonstrate that the proposed methods effectively identify matched structure under substantial corruption. 2026-05-14T06:39:36Z Ruizi Wu Yuehaw Khoo Wanjie Wang http://arxiv.org/abs/2404.13649v3 Distributional Principal Autoencoders 2026-05-14T05:09:20Z Dimension reduction techniques usually lose information in the sense that reconstructed data are not identical to the original data. However, we argue that it is possible to have reconstructed data identically distributed as the original data, irrespective of the retained dimension or the specific mapping. This can be achieved by learning a distributional model that matches the conditional distribution of data given its low-dimensional latent variables. Motivated by this, we propose Distributional Principal Autoencoder (DPA) that consists of an encoder that maps high-dimensional data to low-dimensional latent variables and a decoder that maps the latent variables back to the data space. For reducing the dimension, the DPA encoder aims to minimise the unexplained variability of the data with an adaptive choice of the latent dimension. For reconstructing data, the DPA decoder aims to match the conditional distribution of all data that are mapped to a certain latent value, thus ensuring that the reconstructed data retains the original data distribution. Our numerical results on climate data, single-cell data, and image benchmarks demonstrate the practical feasibility and success of the approach in reconstructing the original distribution of the data. 
DPA embeddings are shown to preserve meaningful structures of data such as the seasonal cycle in precipitation and cell types in gene expression. 2024-04-21T12:52:04Z Xinwei Shen Nicolai Meinshausen http://arxiv.org/abs/2512.24588v2 Multiple Testing of One-Sided Hypotheses with Conservative $p$-values 2026-05-14T02:30:37Z We study a large-scale one-sided multiple testing problem in which test statistics follow normal distributions with unit variance, and the goal is to identify signals with positive mean effects. A conventional approach is to compute $p$-values under the assumption that all null means are exactly zero and then apply standard multiple testing procedures such as the Benjamini-Hochberg (BH) or Storey-BH method. However, because the null hypothesis is composite, some null means may be strictly negative. In this case, the resulting $p$-values are conservative, leading to a substantial loss of power. Existing methods address this issue by modifying the multiple testing procedure itself, for example through conditioning strategies or discarding rules. In contrast, we focus on correcting the $p$-values so that they are exact under the null. Specifically, we estimate the marginal null distribution of the test statistics within an empirical Bayes framework and construct refined $p$-values based on this estimated distribution. These refined $p$-values can then be directly used in standard multiple testing procedures without modification. Extensive simulation studies show that the proposed method substantially improves power when conventional $p$-values are conservative, while achieving comparable performance to existing methods when conventional $p$-values are exact. An application to phosphorylation data further demonstrates the practical effectiveness of our approach. 
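The conventional pipeline described in the abstract above (one-sided $p$-values computed as if every null mean were exactly zero, followed by BH) can be sketched as follows. This is a minimal illustration only and not the authors' refined empirical Bayes method; the function names `one_sided_p_value` and `benjamini_hochberg` and the example z-scores are hypothetical, chosen purely for illustration.

```python
import math


def one_sided_p_value(z):
    """Upper-tail p-value P(Z >= z) for Z ~ N(0, 1).

    Assumes the null mean is exactly zero, as in the conventional approach.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices rejected by the BH step-up rule at level alpha."""
    m = len(p_values)
    # Indices of hypotheses ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: keep the largest rank whose p-value is below alpha * rank / m.
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Hypothetical z-scores: two positive signals and four nulls,
# several of which have clearly negative values.
z_scores = [3.5, 2.9, -1.2, -2.0, 0.1, -0.8]
p_values = [one_sided_p_value(z) for z in z_scores]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print(rejected)
```

Statistics drawn from nulls with strictly negative means tend to produce $p$-values close to 1 under this zero-mean calculation, which is the conservativeness the abstract refers to.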
2025-12-31T03:26:43Z Kwangok Seo Johan Lim Hyungwon Choi Jaesik Jeong http://arxiv.org/abs/2605.14222v1 Robust and Data-Adaptive Integration of Nonconcurrent Data in Platform Trials via Gaussian Processes 2026-05-14T00:31:24Z A platform trial is an innovative clinical trial design that enables simultaneous and continuous evaluation of multiple treatments within a single master protocol. Existing robust methods restrict analyses to concurrently randomized participants due to concerns that including nonconcurrent data may introduce bias from temporal trends. However, this exclusion represents a missed opportunity to improve efficiency. We propose a Gaussian process framework for incorporating nonconcurrent data that exploits temporal smoothness, a key feature of platform trials. The framework includes single-task and multi-task formulations and provides data-adaptive integration of nonconcurrent data with uncertainty quantification. The connection to kernel ridge regression yields a transparent frequentist interpretation of how nonconcurrent data are integrated. We establish two theoretical guarantees: incorporating nonconcurrent controls reduces the posterior variance of the treatment effect, and the resulting bias is controlled by a non-increasing bound. We extend the framework to discrete outcomes and to covariate adjustment, illustrate it on a hypothetical platform trial constructed from SURMOUNT-1, and provide an implementation in the R package RobinCID. 2026-05-14T00:31:24Z Yuhan Qian Yu Du Jingning Zhang Yanyao Yi Patrick J. Heagerty Ting Ye http://arxiv.org/abs/2511.08559v2 Reluctant Transfer Learning in Penalized Regressions for Individualized Treatment Rules under Effect Heterogeneity 2026-05-13T21:23:24Z Estimating individualized treatment rules (ITRs) is fundamental to precision medicine, where the goal is to tailor treatment decisions to individual patient characteristics. 
While numerous methods have been developed for ITR estimation, there is limited research on model updating that accounts for shifted treatment-covariate relationships in the ITR setting. In practice, models trained on source data must be updated for new (target) datasets that exhibit shifts in treatment effects. To address this challenge, we propose a Reluctant Transfer Learning (RTL) framework that enables efficient model adaptation by selectively transferring essential model components (e.g., regression coefficients) from source to target data, without requiring access to individual-level source data. Leveraging the principle of reluctant modeling, the RTL approach incorporates model adjustments only when they improve performance on the target dataset, thereby controlling complexity and enhancing generalizability. Our method supports multi-armed treatment settings, performs variable selection for interpretability, and provides a regret bound for the difference in value of the optimal ITR and that of the estimated ITR. Through simulation studies and an application to a real data example from the Best Apnea Interventions for Research (BestAIR) trial, we demonstrate that RTL outperforms existing alternatives. The proposed framework offers an efficient, practically feasible approach to adaptive treatment decision-making under evolving treatment effect conditions. 2025-11-11T18:41:50Z Eun Jeong Oh Min Qian http://arxiv.org/abs/2602.21376v2 Fenchel-Young Estimators of Perturbed Utility Models 2026-05-13T20:36:00Z The Perturbed Utility Model (PUM) framework provides a generalization of discrete choice analysis, unifying models like Multinomial Logit (MNL) and Sparsemax through convex optimization. However, standard Maximum Likelihood Estimation (MLE) encounters theoretical and computational limitations when applied to this broader class, particularly regarding non-convexity and instability in sparse regimes. 
To address these issues, this paper introduces a unified estimation framework for PUMs based on the Fenchel-Young loss. By leveraging the intrinsic convex conjugate structure of the choice probabilities, we demonstrate that the Fenchel-Young estimator guarantees global convexity, providing a stable alternative to MLE that accommodates both dense and sparse choice kernels. Furthermore, we establish the framework's asymptotic consistency and normality under standard regularity conditions. Leveraging the tractability of the Fenchel-Young estimator, we further develop a Parametric Basis Estimation (PBE) procedure that estimates utility parameters jointly with a tree-structured perturbation function within a pre-specified basis family. PBE employs a bi-level optimization architecture that parameterizes the unknown perturbation as a learnable convex combination of basis functions. For any fixed perturbation structure, the inner Fenchel-Young estimation problem is globally convex in the utility parameters, yielding a well-defined solution mapping that can be differentiated under regularity conditions. Empirical validation on the Swissmetro dataset demonstrates that the proposed framework improves predictive performance, as measured by the Brier score and Brier Skill Score, compared to the standard MNL baseline. 2026-02-24T21:14:46Z 46 pages, 5 figures. Distributionally robust extensions previously included in earlier versions are no longer part of this manuscript and will be presented separately Xi Lin Yafeng Yin Tianming Liu http://arxiv.org/abs/2605.14056v1 An MCMC-Based Method for Dynamic Causal Modeling of Effective Connectivity in Functional MRI 2026-05-13T19:28:46Z Effective connectivity analysis in functional magnetic resonance imaging (fMRI) studies directional interactions among brain regions and experimental stimuli. 
Dynamic causal modeling (DCM) is a widely used method to estimate effective connectivity, based on a state-space representation consisting of a latent neural signal model and an observation model transforming the neural signal into the observed blood-oxygen-level-dependent (BOLD) response. A standard DCM combines ordinary differential equation (ODE) dynamics for the latent signal with a complex neural-hemodynamic system for the observation model, and typically uses variational Bayes for parameter estimation. While physically well-motivated, this approach can lead to practical challenges such as inexact solutions and underestimated uncertainty. We introduce Canonical DCM (CDCM), a Markov chain Monte Carlo (MCMC)-based method that adopts a simpler observation model and the No-U-Turn Sampler for posterior sampling. The simpler observation model admits a piecewise analytic solution to the neural ODE, increasing computational efficiency and enabling explicit derivation of sufficient conditions for parameter identifiability. The results indicate that CDCM provides reliable uncertainty quantification and consistent estimation of parameters related to experimental inputs for simulated and real data. We use publicly available data from the Wellcome Centre for Human Neuroimaging and the Human Connectome Project (HCP) to benchmark CDCM against standard DCM methods and examine replicability of estimated connectivity patterns in small- and large-scale neuroimaging settings. 2026-05-13T19:28:46Z Kaitlyn R. Fales Hyebin Song Nicole A. Lazar http://arxiv.org/abs/2605.14041v1 Wahkon: A Statistically Principled Deep RKHS Superposition Network 2026-05-13T19:01:59Z Deep learning excels at prediction but often lacks finite-sample guarantees and calibrated uncertainty; RKHS (Reproducing Kernel Hilbert Space)-based methods provide those guarantees but struggle to adapt in high dimensions. 
We propose Wahkon, a deep RKHS superposition network that unifies Kolmogorov's superposition principle with RKHS regularization in the smoothing-spline tradition of Wahba. This yields a finite-dimensional deep representer theorem that makes training tractable and provides explicit layerwise complexity control. We show the penalized estimator is exactly the MAP (maximum a posteriori) estimate under a hierarchical Gaussian-process prior, extending the spline/GP duality to deep compositions. Using metric-entropy arguments, we establish minimax-optimal convergence rates under mild smoothness and clarify how depth and width trade off with regularity. Empirically, Wahkon outperforms multilayer perceptrons, Neural Tangent Kernels, and Kolmogorov-Arnold Networks across simulation benchmarks and a single-cell CITE-seq study. By unifying Kolmogorov's superposition principle with RKHS regularization, Wahkon delivers accuracy, interpretability, and statistical rigor in a single framework. 2026-05-13T19:01:59Z Yongkai Chen Wenxuan Zhong Ping Ma