https://arxiv.org/api/Q7Ua4lM55UnRMrZTTKi05SdWUZo 2026-03-20T09:02:47Z 27242 0 15 http://arxiv.org/abs/2603.19073v1 Finite-sample bounds for multi-output system identification 2026-03-19T15:59:21Z This paper presents uniform-in-time finite-sample bounds for regularized linear regression with vector-valued outputs and conditionally zero-mean subgaussian noise. By revisiting classical self-normalized martingale arguments, we obtain bounds that apply directly to multi-output regression, unlike most of the prior work. Compared to the state of the art, the new results are more general and yield tighter bounds, even for scalar-valued outputs. The mild assumptions we use allow for unknown dependencies between regressors and past noise terms, typically induced by system dynamics or feedback mechanisms. Therefore, these novel finite-sample bounds can be applied to many affine-in-parameter system identification problems, including the identification of a linear time-invariant system from full-state measurements. These new results may lead to significant improvements in stochastic learning-based controllers for safety-critical applications. 2026-03-19T15:59:21Z Submitted for review to IEEE Transactions on Automatic Control Léo Simpson Katrin Baumgärtner Johannes Köhler Moritz Diehl http://arxiv.org/abs/2403.07189v3 A multiscale cavity method for sublinear-rank symmetric matrix factorization 2026-03-19T15:57:49Z We consider a statistical model for symmetric matrix factorization with additive Gaussian noise in the high-dimensional regime, where the rank of the signal matrix to infer $M$ scales with its size $N$ as $M=\mathrm{o}(\sqrt{\ln N})$. Allowing for an $N$-dependent rank offers new challenges and requires new methods. Working in the Bayes-optimal setting, we show that whenever the signal has i.i.d. entries, the limiting mutual information between signal and data is given by a variational formula involving a rank-one replica symmetric potential. 
In other words, from the information-theoretic perspective, the case of a (slowly) growing rank is the same as when $M=1$ (namely, the standard spiked Wigner model). The proof is primarily based on a novel multiscale cavity method allowing for growing rank along with some information-theoretic identities on worst noise for the vector Gaussian channel. We believe that the cavity method developed here will play a role in the analysis of a broader class of inference and spin models where the degrees of freedom are large arrays instead of vectors. 2024-03-11T22:11:04Z 65 pages. Filled out proof details, improved multiscale cavity method and its proof. Equation and theorem numbering made consistent with published version Jean Barbier Justin Ko Anas A. Rahman 10.4171/MSL/57 http://arxiv.org/abs/2603.04172v2 The Pivotal Information Criterion 2026-03-19T15:35:12Z The Bayesian and Akaike information criteria aim at finding a good balance between under- and over-fitting. They are extensively used every day by practitioners. Yet we contend they suffer from at least two afflictions: their penalty parameter $λ=\log n$ and $λ=2$ are too small, leading to many false discoveries, and their inherent (best subset) discrete optimization is infeasible in high dimension. We alleviate these issues with the pivotal information criterion: PIC is defined as a continuous optimization problem, and the PIC penalty parameter $λ$ is selected at the detection boundary (under pure noise). PIC's choice of $λ$ is the quantile of a statistic that we prove to be (asymptotically) pivotal, provided the loss function is appropriately transformed. As a result, simulations show a phase transition in the probability of exact support recovery with PIC, a phenomenon studied with no noise in compressed sensing. Applied on real data, for similar predictive performances, PIC selects the least complex model among state-of-the-art learners. 
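For reference, the two penalties named in the abstract above ($λ=2$ for AIC and $λ=\log n$ for BIC) enter the classical criteria through the generic form $-2\log L + λk$, where $k$ is the number of fitted parameters. The following is a minimal sketch of that form only; it does not implement PIC, and the function names and sample numbers are illustrative assumptions.

```python
import math


def information_criterion(log_likelihood: float, n_params: int, lam: float) -> float:
    """Generic penalized criterion: -2 * log-likelihood + lam * n_params."""
    return -2.0 * log_likelihood + lam * n_params


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: penalty parameter lam = 2."""
    return information_criterion(log_likelihood, n_params, 2.0)


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: penalty parameter lam = log(n_obs)."""
    return information_criterion(log_likelihood, n_params, math.log(n_obs))


# Hypothetical fitted model: log-likelihood -10 with 3 parameters on 100 observations.
aic_value = aic(-10.0, 3)
bic_value = bic(-10.0, 3, 100)
```

Selecting a model by either criterion means minimizing this value over candidate supports, which is the discrete best-subset search the abstract describes as infeasible in high dimension.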
2026-03-04T15:26:32Z Sylvain Sardy Maxime van Cutsem Sara van de Geer http://arxiv.org/abs/2509.21181v4 Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias 2026-03-19T15:26:42Z For overparameterized linear regression with isotropic Gaussian design and minimum-$\ell_p$ interpolator $p\in(1,2]$, we give a unified, high-probability characterization for the scaling of the family of parameter norms $\{ \lVert \widehat{w_p} \rVert_r \}_{r \in [1,p]}$ with sample size. We solve this basic, but unresolved question through a simple dual-ray analysis, which reveals a competition between a signal *spike* and a *bulk* of null coordinates in $X^\top Y$, yielding closed-form predictions for (i) a data-dependent transition $n_\star$ (the "elbow"), and (ii) a universal threshold $r_\star=2(p-1)$ that separates $\lVert \widehat{w_p} \rVert_r$'s which plateau from those that continue to grow with an explicit exponent. This unified solution resolves the scaling of *all* $\ell_r$ norms within the family $r\in [1,p]$ under $\ell_p$-biased interpolation, and explains in one picture which norms saturate and which increase as $n$ grows. We then study diagonal linear networks (DLNs) trained by gradient descent. By calibrating the initialization scale $α$ to an effective $p_{\mathrm{eff}}(α)$ via the DLN separable potential, we show empirically that DLNs inherit the same elbow/threshold laws, providing a predictive bridge between explicit and implicit bias. Given that many generalization proxies depend on $\lVert \widehat{w_p} \rVert_r$, our results suggest that their predictive power will depend sensitively on which $\ell_r$ norm is used. 
2025-09-25T13:59:22Z Shuofeng Zhang Ard Louis http://arxiv.org/abs/2511.17928v2 Limit Theorems for Network Data without Metric Structure 2026-03-19T14:53:36Z This paper develops limit theorems for random variables with network dependence, without requiring the individuals in the network to be located in a Euclidean or metric space. This distinguishes our approach from most existing limit theorems in network statistics and econometrics, which are based on weak dependence concepts such as strong mixing, near-epoch dependence, or $ψ$-dependence. All these weak dependence concepts presuppose an underlying metric. By relaxing the assumption of an underlying metric space, our theorems can be applied to a broader range of network data, including financial and social networks. To derive the limit theorems, we generalize the concept of functional dependence (also known as physical dependence) from time series to random variables with network dependence. Using this framework, we establish several inequalities, a law of large numbers, and central limit theorems. Furthermore, we demonstrate the verifiability of our high-level conditions by deriving primitive sufficient conditions for spatial autoregressive models, which are widely used in network data analysis. 2025-11-22T05:56:33Z Wen Jiang Yachen Wang Zeqi Wu Xingbai Xu http://arxiv.org/abs/2603.18938v1 Kernel Single-Index Bandits: Estimation, Inference, and Learning 2026-03-19T14:15:16Z We study contextual bandits with finitely many actions in which the reward of each arm follows a single-index model with an arm-specific index parameter and an unknown nonparametric link function. We consider a regime in which arms correspond to stable decision options and covariates evolve adaptively under the bandit policy. This setting creates significant statistical challenges: the sampling distribution depends on the allocation rule, observations are dependent over time, and inverse-propensity weighting induces variance inflation. 
We propose a kernelized $\varepsilon$-greedy algorithm that combines Stein-based estimation of the index parameters with inverse-propensity-weighted kernel ridge regression for the reward functions. This approach enables flexible semiparametric learning while retaining interpretability. Our analysis develops new tools for inference with adaptively collected data. We establish asymptotic normality for the single-index estimator under adaptive sampling, yielding valid confidence regions, and derive a directional functional central limit theorem for the RKHS estimator, which provides asymptotically valid pointwise confidence intervals. The analysis relies on concentration bounds for inverse-weighted Gram matrices together with martingale central limit theorems. We further obtain finite-time regret guarantees, including $\tilde{O}(\sqrt{T})$ rates under common-link Lipschitz conditions, showing that semiparametric structure can be exploited without sacrificing statistical efficiency. These results provide a unified framework for simultaneous learning and inference in single-index contextual bandits. 2026-03-19T14:15:16Z Sakshi Arya Satarupa Bhattacharjee Bharath K. Sriperumbudur http://arxiv.org/abs/2508.19949v3 Estimating non-linear functionals of trawl processes 2026-03-19T12:22:37Z Trawl processes are a family of continuous-time, infinitely divisible, stationary processes whose correlation structure is entirely characterized by their so-called trawl function. This paper investigates the problem of estimating non-linear functionals of a trawl function under in-fill and long-span sampling schemes. Specifically, building on the work of \cite{SauriVeraart23}, we introduce non-parametric estimators for functionals of the type $Ψ_{t}(g)=\int_{0}^{t}g(a(s))\mathrm{d}s$ and $ Λ_t(g)=\int_{t}^{\infty}g(a(s))\mathrm{d}s$, where $a$ represents the trawl function of interest and $g$ a non-linear test function. 
We show that our estimator for $Ψ_{t}(g)$ is consistent and asymptotically Gaussian regardless of the memory of the process. We further demonstrate that the same phenomenon occurs for the estimation of $Λ_t(g)$ as long as $g(x)= \mathrm{O} (\lvert x\rvert^p)$, as $x\to0$, for some $p>3$. Additionally, we illustrate how our results can be used to construct a test statistic robust to memory effects for the presence of $T$-dependence. 2025-08-27T15:03:55Z Orimar Sauri http://arxiv.org/abs/2502.05322v3 Tropical Fréchet Means: a polyhedral approach to exact optimization 2026-03-19T10:49:43Z The Fréchet mean is a fundamental notion of central tendency defined as a minimizer of a sum of squared distances in a general metric space. In this paper, we study Fréchet means in tropical geometry -- a piecewise linear, combinatorial, and polyhedral variant of algebraic geometry -- by formulating and solving the associated tropical quadratic optimization problem. We give a geometric characterization of the collection of all tropical Fréchet means as a bounded set that is simultaneously tropically and classically convex, hence a polytrope. We establish the existence of positivity certificates for maxima of finitely many quadratic polynomials in $\mathbb{R}[x_1,\ldots,x_n]$ whose homogeneous quadratic components are sums of squares, which provides a symbolic framework for exact optimization. Using this structure, we develop algorithms for computing tropical Fréchet means and the associated Fréchet mean polytrope. We further describe a combinatorial type decomposition of the objective function induced by braid arrangements, yielding a piecewise quadratic representation and a fully symbolic method for exact computation. 2025-02-07T20:48:24Z 26 pages. 8 figures. v3: Added Section 5. 
Extended version as to appear in the special issue for the International Symposium on Symbolic and Algebraic Computation ISSAC 2025 Journal of Symbolic Computation (2026) 102572 Kamillo Ferry Bo Lin Carlos Améndola Anthea Monod Ruriko Yoshida 10.1016/j.jsc.2026.102572 http://arxiv.org/abs/2603.18590v1 Sometimes nonparametrics beat parametrics, even when the model is right 2026-03-19T07:56:49Z A basic issue in both teaching of and practice of statistics is the interplay between modelling assumptions and inference performance. The general message conveyed is that stronger assumptions lead to better statistical performance of the relevant estimators, tests and confidence intervals, provided that these assumptions hold. On the other hand, fewer assumptions often lead to safer and more robust methods that are good also outside narrow conditions, but not quite as good as specialist methods that exploit such narrower conditions, if these are fulfilled. This interplay is nicely illustrated in the context of density estimation, where parametric and nonparametric methods can be contrasted. The parametric ones have mean squared errors of size $O(n^{-1})$ in terms of sample size $n$ if the parametric model is right, but are not even consistent outside the model. The nonparametric methods are everywhere consistent and have mean squared errors of size $O(n^{-4/5})$ for broad classes of estimands. The point we are making here is that this picture is not universally true! We show that a simple kernel density estimator can perform better than a directly estimated parametric density on the latter's home turf, for small sample sizes, in the sense of mean integrated squared error. Our main example is that of estimating an unknown normal density. In the process of developing and discussing this somewhat counter-intuitive and half-paradoxical example we touch on several tangential issues of interest, pertaining to exact small-sample analysis of density estimators. 
2026-03-19T07:56:49Z 18 pages, 2 figures; Statistical Research Report, Department of Mathematics, University of Oslo, October 1996, but now arXiv'd March 2026 Morten Byholt Nils Lid Hjort http://arxiv.org/abs/2603.18506v1 Approximation by mixtures of multivariate Erlang distributions 2026-03-19T05:27:13Z We prove that finite multivariate Erlang mixture densities with a common rate parameter are dense in the class of probability densities on $\mathbb{R}_{+}^{d}$ that belong to $L^{p}$, for every dimension $d\in\mathbb{N}$ and every $1\le p<\infty$. The argument is constructive: the one-dimensional Szász--Mirakjan--Kantorovich operator yields Erlang mixture approximations, and its tensor product yields multivariate approximants with a common scale. We then obtain several quantitative consequences. These include compact-set uniform approximation bounds and, under local Hölder conditions of order $α\in(0,1]$, rates of order $n^{-α/2}$ as the common scale $1/n$ tends to zero, whole-domain convergence in weighted sup norms, weighted and unweighted $L^{p}$ rates, and explicit rates for finite mixtures indexed by the number of mixture components. In particular, if the approximating density is required to have at most $K$ mixture components, then on fixed compact cubes we obtain an algebraic rate of order $K^{-α/(2d)}$; in global weighted sup norms we obtain the explicit algebraic component-count rate $K^{-α/[2d(2d+α)]}$; and for $1<p<\infty$ we obtain corresponding weighted $L^{p}$ component-count rates. The results strengthen the weak-approximation theory for multivariate Erlang mixture distributions and yield immediate corollaries for broader classes such as product-gamma mixtures. Keywords: multivariate Erlang mixtures; Erlang distributions; Szász--Mirakjan--Kantorovich operator; density approximation; weighted $L^{p}$ approximation; approximation rates. 
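The constructive step in the abstract above can be illustrated in one dimension. The Szász--Mirakjan--Kantorovich operator is commonly written $K_n f(x) = n\sum_{k\ge 0} e^{-nx}\frac{(nx)^k}{k!}\int_{k/n}^{(k+1)/n} f(t)\,\mathrm{d}t$, and since $n e^{-nx}(nx)^k/k!$ is the Erlang density with shape $k+1$ and rate $n$, $K_n f$ is an Erlang mixture with common rate $n$ and weights $F((k+1)/n)-F(k/n)$. The sketch below assumes this standard form, truncates the sum at a hypothetical `k_max`, and uses an Exp(1) target chosen only for illustration; it is not the paper's multivariate construction.

```python
import math


def erlang_pdf(x: float, shape: int, rate: float) -> float:
    """Erlang(shape, rate) density, evaluated in log space for stability.

    Returns 0 for x <= 0 (the boundary value at x = 0 is ignored for simplicity).
    """
    if x <= 0.0:
        return 0.0
    log_pdf = (shape * math.log(rate) + (shape - 1) * math.log(x)
               - rate * x - math.lgamma(shape))
    return math.exp(log_pdf)


def smk_erlang_weights(cdf, n: int, k_max: int) -> list:
    """Mixture weights F((k+1)/n) - F(k/n) for k = 0, ..., k_max - 1."""
    return [cdf((k + 1) / n) - cdf(k / n) for k in range(k_max)]


def mixture_pdf(x: float, weights: list, n: int) -> float:
    """Erlang mixture: component k has shape k + 1 and the common rate n."""
    return sum(w * erlang_pdf(x, k + 1, n) for k, w in enumerate(weights))


def exp_cdf(t: float) -> float:
    """CDF of the illustrative Exp(1) target density."""
    return 1.0 - math.exp(-t)


n = 50                                   # common rate; the common scale is 1/n
weights = smk_erlang_weights(exp_cdf, n, k_max=2000)
approx_at_one = mixture_pdf(1.0, weights, n)
target_at_one = math.exp(-1.0)
```

For this particular target the untruncated mixture is again exponential, with rate $n(1-e^{-1/n})$, so the approximation error at a fixed point shrinks as $n$ grows.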
2026-03-19T05:27:13Z Hien Duy Nguyen http://arxiv.org/abs/2603.18490v1 The minimax optimal convergence rate of posterior density in the weighted orthogonal polynomials 2026-03-19T04:51:13Z We investigate Bayesian nonparametric density estimation via orthogonal polynomial expansions in weighted Sobolev spaces. A core challenge is establishing minimax optimal posterior convergence rates, especially for densities on unbounded domains without a strictly positive lower bound. For densities bounded away from zero, we give sufficient conditions under which the framework of \cite{shen2001} applies directly. For densities lacking a positive lower bound, the equivalence between Hellinger and weighted $L_2$-norm distance fails, invalidating the original theory. We propose a novel shifting method that lifts the true density $g_0$ to a sequence of proxy densities $g_{0,n}$. We prove a modified convergence theorem applicable to these shifted densities, preserving the optimal rate. We also construct a Gaussian sieve prior that achieves the minimax rate $\varepsilon_n=n^{-p/(2p+1)}$ for any integer $p\geq1$. Numerical results confirm that our estimator approximates the true density well and validate the theoretical convergence rate. 2026-03-19T04:51:13Z 27 pages, 2 figures, 1 supplementary material (11 pages) Yiqi Luo Xue Luo http://arxiv.org/abs/2603.14601v2 $K-$means with learned metrics 2026-03-19T00:57:30Z We study the Fréchet $k-$means of a metric measure space when both the measure and the distance are unknown and have to be estimated. We prove a general result that states that the $k-$means are continuous with respect to the measured Gromov-Hausdorff topology. In this situation, we also prove a stability result for the Voronoi clusters they determine. We do not assume uniqueness of the set of $k-$means, but when it is unique, the results are stronger. This framework provides a unified approach to proving consistency for a wide range of metric learning procedures. 
As concrete applications, we obtain new consistency results for several important estimators that were previously unestablished, even when $k=1$. These include $k-$means based on: (i) Isomap and Fermat geodesic distances on manifolds, (ii) diffusion distances, (iii) Wasserstein distances computed with respect to learned ground metrics. Finally, we consider applications beyond the statistical inference paradigm like (iv) first passage percolation and (v) discrete approximations of length spaces. 2026-03-15T20:50:59Z Pablo Groisman Matthieu Jonckheere Jordan Serres Mariela Sued http://arxiv.org/abs/2603.14561v2 Refined Inference for Asymptotically Linear Estimators with Non-Negligible Second-Order Remainders 2026-03-18T23:42:18Z Asymptotically linear estimators in semiparametric models achieve their point-estimation guarantees via a von Mises expansion in which a second-order remainder is declared negligible. Confidence intervals then treat the first-order influence-function term as the sole source of sampling variability. This reasoning is asymptotically exact but can fail materially in finite samples whenever the second-order remainder contributes variation of the same order as the influence-function variance -- a regime we call the \emph{near-boundary regime}, characterized by nuisance estimation operating at or near the product-rate threshold. We develop a general theory of inference for this regime. 
Our contributions are: (i) a \emph{finite-sample variance decomposition} that separates influence-function variance from remainder-induced variance and the covariance between them; (ii) a \emph{sandwich consistency theorem} that gives a precise necessary and sufficient condition -- strong remainder negligibility -- for the standard sandwich to be consistent for the total sampling variance, and shows this is strictly stronger than the product-rate condition that guarantees asymptotic linearity; (iii) two \emph{refined variance estimators} -- leave-one-unit-out jackknife and pairs cluster bootstrap -- each with full asymptotic validity guarantees in the near-boundary regime, together with a heteroskedasticity-corrected sandwich interpretation that is numerically equivalent to the jackknife Wald interval; and (iv) a \emph{clustered-data extension} in which the remainder interacts with intra-cluster correlation to produce an analytic formula for sandwich gap amplification. 2026-03-15T19:23:26Z 32 pages, 3 tables, 1 supplement Lin Li Pengcheng Wu http://arxiv.org/abs/2510.04265v3 Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation 2026-03-18T23:24:02Z Pass$@k$ is widely used to report the reasoning performance of LLMs, but it often produces unstable and potentially misleading rankings, especially when the number of trials (samples) is limited and computational resources are constrained. We present a principled Bayesian evaluation framework that replaces Pass$@k$ and average accuracy over $N$ trials (avg$@N$) with posterior estimates of a model's underlying success probability and credible intervals, yielding stable rankings and a transparent decision rule for differences. Evaluation outcomes are modeled as categorical (not just 0/1) with a Dirichlet prior, giving closed-form expressions for the posterior mean and uncertainty of any weighted rubric and enabling the use of prior evidence when appropriate. 
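The closed-form expressions mentioned in the preceding sentence follow from standard Dirichlet-categorical conjugacy. As a minimal sketch, assuming outcome counts $n_k$, prior parameters $α_k$, and rubric weights $w_k$ (all names here are illustrative and not taken from the authors' scorio package), the posterior is Dirichlet with parameters $a_k = α_k + n_k$, the posterior mean of $\sum_k w_k p_k$ is $\sum_k w_k a_k / A$ with $A = \sum_k a_k$, and its variance is $(\sum_k w_k^2 a_k/A - (\sum_k w_k a_k/A)^2)/(A+1)$.

```python
def dirichlet_rubric_posterior(counts, weights, prior=None):
    """Posterior mean and variance of sum_k w_k * p_k for p ~ Dirichlet(prior + counts).

    counts  : observed number of trials falling in each outcome category
    weights : rubric score attached to each category
    prior   : Dirichlet prior parameters; defaults to the uniform prior (assumption)
    """
    if prior is None:
        prior = [1.0] * len(counts)
    alpha = [a + c for a, c in zip(prior, counts)]       # posterior parameters
    total = sum(alpha)
    mean_p = [a / total for a in alpha]                   # posterior mean of each p_k
    mean = sum(w * m for w, m in zip(weights, mean_p))
    second_moment = sum(w * w * m for w, m in zip(weights, mean_p))
    variance = (second_moment - mean * mean) / (total + 1.0)
    return mean, variance


# Hypothetical binary example: 3 failures and 7 successes, success scored as 1.
post_mean, post_var = dirichlet_rubric_posterior(counts=[3, 7], weights=[0.0, 1.0])
```

In this binary case the result reduces to the mean and variance of a Beta(8, 4) posterior, which gives a quick sanity check on the general formula.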
Theoretically, under a uniform prior, the Bayesian posterior mean is order-equivalent to average accuracy (Pass$@1$), explaining its empirical robustness while adding principled uncertainty. Empirically, in simulations with known ground-truth success rates and on AIME'24/'25, HMMT'25, and BrUMO'25, the posterior-based procedure achieves faster convergence and greater rank stability than Pass$@k$ and recent variants, enabling reliable comparisons at far smaller sample counts. The framework clarifies when observed gaps are statistically meaningful (non-overlapping credible intervals) versus noise, and it naturally extends to graded, rubric-based evaluations. Together, these results recommend replacing Pass$@k$ for LLM evaluation and ranking with a posterior-based, compute-efficient protocol that unifies binary and non-binary evaluation while making uncertainty explicit. Source code is available at https://github.com/mohsenhariri/scorio 2025-10-05T16:14:03Z OpenReview (ICLR 2026): https://openreview.net/forum?id=PTXi3Ef4sT Mohsen Hariri Amirhossein Samandar Michael Hinczewski Vipin Chaudhary http://arxiv.org/abs/2603.18311v1 Minimax Optimal Estimation of Mean and Covariance Functions with Spectral Regularization 2026-03-18T21:50:46Z Estimation of the mean and covariance functions is a fundamental problem in functional data analysis, particularly for discretely observed functional data. In this work, we study a regularization-based framework for estimating the mean and the covariance functions within a reproducing kernel Hilbert space (RKHS) setting. Our approach utilizes a spectral regularization technique under Hölder-type source conditions, allowing for a broad class of regularization schemes and accommodating a wide range of smoothness assumptions on the target functions. Unlike previous works in the literature, the proposed work does not require the target functions to belong to the underlying RKHS. 
Convergence rates for the proposed estimators are derived, and optimality is established by obtaining matching minimax lower bounds. 2026-03-18T21:50:46Z Naveen Gupta Bharath K Sriperumbudur