http://arxiv.org/api/QC0YygZVZfYkBW/amNe8svNEqUA 2025-04-22T00:00:00-04:00 24847 30 15 http://arxiv.org/abs/2504.13977v1 2025-04-17T23:22:00Z 2025-04-17T23:22:00Z Testing Random Effects for Binomial Data In modern scientific research, small-scale studies with limited participants are increasingly common. However, interpreting individual outcomes can be challenging, making it standard practice to combine data across studies using random effects to draw broader scientific conclusions. In this work, we introduce an optimal methodology for assessing the goodness of fit between a given reference distribution and the distribution of random effects arising from binomial counts. Using the minimax framework, we characterize the smallest separation between the null and alternative hypotheses, called the critical separation, under the 1-Wasserstein distance that ensures the existence of a valid and powerful test. The optimal test combines a plug-in estimator of the Wasserstein distance with a debiased version of Pearson's chi-squared test. We focus on meta-analyses, where a key question is whether multiple studies agree on a treatment's effectiveness before pooling data. That is, researchers must determine whether treatment effects are homogeneous across studies. We begin by analyzing scenarios with a specified reference effect, such as testing whether all studies show the treatment is effective 80% of the time, and describe how the critical separation depends on the reference effect. We then extend the analysis to homogeneity testing without a reference effect and construct an optimal test by debiasing Cochran's chi-squared test. Finally, we illustrate how our proposed methodologies improve the construction of p-values and confidence intervals, with applications to assessing drug safety in the context of rare adverse outcomes and modeling political outcomes at the county level. 
Lucas Kania Larry Wasserman Sivaraman Balakrishnan http://arxiv.org/abs/2504.13322v1 2025-04-17T20:20:06Z 2025-04-17T20:20:06Z Foundations of locally-balanced Markov processes We formally introduce and study locally-balanced Markov jump processes (LBMJPs) defined on a general state space. These continuous-time stochastic processes with a user-specified limiting distribution are designed for sampling in settings involving discrete parameters and/or non-smooth distributions, addressing limitations of other processes such as the overdamped Langevin diffusion. The paper establishes the well-posedness, non-explosivity, and ergodicity of LBMJPs under mild conditions. We further explore regularity properties such as the Feller property and characterise the weak generator of the process. We then derive conditions for exponential ergodicity via spectral gaps and establish comparison theorems for different balancing functions. In particular, we show an equivalence between the spectral gaps of Metropolis--Hastings algorithms and LBMJPs with bounded balancing function, but show that LBMJPs can exhibit uniform ergodicity on unbounded state spaces when the balancing function is unbounded, even when the limiting distribution is not sub-Gaussian. We also establish a diffusion limit for an LBMJP in the small jump limit, and discuss applications to Monte Carlo sampling and non-reversible extensions of the processes. Samuel Livingstone Giorgos Vasdekis Giacomo Zanella Keywords: Markov Processes, Sampling Algorithms, Mixing Times, Ergodicity, Markov Chain Monte Carlo, Locally-balanced processes. 31 pages. http://arxiv.org/abs/2403.03868v3 2025-04-17T19:05:51Z 2024-03-06T17:18:24Z Confidence on the Focal: Conformal Prediction with Selection-Conditional Coverage Conformal prediction builds marginally valid prediction intervals that cover the unknown outcome of a randomly drawn test point with a prescribed probability. 
However, in practice, data-driven methods are often used to identify specific test unit(s) of interest, requiring uncertainty quantification tailored to these focal units. In such cases, marginally valid conformal prediction intervals may fail to provide valid coverage for the focal unit(s) due to selection bias. This paper presents a general framework for constructing a prediction set with finite-sample exact coverage, conditional on the unit being selected by a given procedure. The general form of our method accommodates arbitrary selection rules that are invariant to the permutation of the calibration units, and generalizes Mondrian Conformal Prediction to multiple test units and non-equivariant classifiers. We also work out computationally efficient implementations of our framework for a number of realistic selection rules, including top-K selection, optimization-based selection, selection based on conformal p-values, and selection based on properties of preliminary conformal prediction sets. The performance of our methods is demonstrated via applications in drug discovery and health risk prediction. Ying Jin Zhimei Ren Forthcoming at Journal of the Royal Statistical Society Series B http://arxiv.org/abs/2504.13273v1 2025-04-17T18:32:49Z 2025-04-17T18:32:49Z How Much Weak Overlap Can Doubly Robust T-Statistics Handle? In the presence of sufficiently weak overlap, it is known that no regular root-n-consistent estimators exist and standard estimators may fail to be asymptotically normal. This paper shows that a thresholded version of the standard doubly robust estimator is asymptotically normal with well-calibrated Wald confidence intervals even when constructed using nonparametric estimates of the propensity score and conditional mean outcome. 
The analysis implies a cost of weak overlap in terms of black-box nuisance rates, borne when the semiparametric bound is infinite, and the contribution of outcome smoothness to the outcome regression rate, which is incurred even when the semiparametric bound is finite. As a byproduct of this analysis, I show that under weak overlap, the optimal global regression rate is the same as the optimal pointwise regression rate, without the usual polylogarithmic penalty. The high-level conditions yield new rules of thumb for thresholding in practice. In simulations, thresholded AIPW can exhibit moderate overrejection in small samples, but I am unable to reject a null hypothesis of exact coverage in large samples. In an empirical application, the clipped AIPW estimator that targets the standard average treatment effect yields similar precision to a heuristic 10% fixed-trimming approach that changes the target sample. Jacob Dorn http://arxiv.org/abs/2504.13124v1 2025-04-17T17:41:05Z 2025-04-17T17:41:05Z Spatial Confidence Regions for Excursion Sets with False Discovery Rate Control Identifying areas where the signal is prominent is an important task in image analysis, with particular applications in brain mapping. In this work, we develop confidence regions for spatial excursion sets above and below a given level. We achieve this by treating the confidence procedure as a testing problem at the given level, allowing control of the False Discovery Rate (FDR). Methods are developed to control the FDR, separately for positive and negative excursions, as well as jointly over both. Furthermore, power is increased by incorporating a two-stage adaptive procedure. Simulation results with various signals show that our confidence regions successfully control the FDR under the nominal level. 
We showcase our methods with an application to functional magnetic resonance imaging (fMRI) data from the Human Connectome Project illustrating the improvement in statistical power over existing approaches. Howon Ryu Thomas Maullin-Sapey Armin Schwartzman Samuel Davenport http://arxiv.org/abs/2504.12989v1 2025-04-17T14:54:00Z 2025-04-17T14:54:00Z Query Complexity of Classical and Quantum Channel Discrimination Quantum channel discrimination has been studied from an information-theoretic perspective, wherein one is interested in the optimal decay rate of error probabilities as a function of the number of unknown channel accesses. In this paper, we study the query complexity of quantum channel discrimination, wherein the goal is to determine the minimum number of channel uses needed to reach a desired error probability. To this end, we show that the query complexity of binary channel discrimination depends logarithmically on the inverse error probability and inversely on the negative logarithm of the (geometric and Holevo) channel fidelity. As a special case of these findings, we precisely characterize the query complexity of discriminating between two classical channels. We also provide lower and upper bounds on the query complexity of binary asymmetric channel discrimination and multiple quantum channel discrimination. For the former, the query complexity depends on the geometric R\'enyi and Petz R\'enyi channel divergences, while for the latter, it depends on the negative logarithm of (geometric and Uhlmann) channel fidelity. For multiple channel discrimination, the upper bound scales as the logarithm of the number of channels. Theshani Nuradha Mark M. 
Wilde 22 pages; see also the independent work "Sampling complexity of quantum channel discrimination" DOI 10.1088/1572-9494/adcb9e http://arxiv.org/abs/2501.05803v3 2025-04-17T12:23:46Z 2025-01-10T09:10:30Z Test-time Alignment of Diffusion Models without Reward Over-optimization Diffusion models excel in generative tasks, but aligning them with specific objectives while maintaining their versatility remains challenging. Existing fine-tuning methods often suffer from reward over-optimization, while approximate guidance approaches fail to optimize target rewards effectively. Addressing these limitations, we propose a training-free, test-time method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution. Our approach, tailored for diffusion sampling and incorporating tempering techniques, achieves comparable or superior target rewards to fine-tuning methods while preserving diversity and cross-reward generalization. We demonstrate its effectiveness in single-reward optimization, multi-objective scenarios, and online black-box optimization. This work offers a robust solution for aligning diffusion models with diverse downstream objectives without compromising their general capabilities. Code is available at https://github.com/krafton-ai/DAS. Sunwoo Kim Minkyu Kim Dongmin Park ICLR 2025 (Spotlight). The Thirteenth International Conference on Learning Representations. 2025 http://arxiv.org/abs/2406.05714v6 2025-04-17T12:18:41Z 2024-06-09T10:12:08Z A conversion theorem and minimax optimality for continuum contextual bandits We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set, minimizing a function associated with the context. The goal is to minimize all the underlying functions for the received contexts, leading to the contextual notion of regret, which is stronger than the standard static regret. 
Assuming that the objective functions are $\gamma$-H\"older with respect to the contexts, $0<\gamma\le 1,$ we demonstrate that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear contextual regret. We prove a static-to-contextual regret conversion theorem that provides an upper bound for the contextual regret of the output algorithm as a function of the static regret of the input algorithm. We further study the implications of this general result for three fundamental cases of dependency of the objective function on the action variable: (a) Lipschitz bandits, (b) convex bandits, (c) strongly convex and smooth bandits. For Lipschitz bandits and $\gamma=1,$ combining our results with the lower bound of Slivkins (2014), we prove that the minimax optimal contextual regret for the noise-free adversarial setting is achieved. Then, we prove that in the presence of noise, the contextual regret rate as a function of the number of queries is the same for convex bandits as it is for strongly convex and smooth bandits. Lastly, we present a minimax lower bound, implying two key facts. First, obtaining a sub-linear contextual regret may be impossible over functions that are not continuous with respect to the context. Second, for convex bandits and strongly convex and smooth bandits, the algorithms that we propose achieve, up to a logarithmic factor, the minimax optimal rate of contextual regret as a function of the number of queries. Arya Akhavan Karim Lounici Massimiliano Pontil Alexandre B. Tsybakov http://arxiv.org/abs/2504.12872v1 2025-04-17T12:00:03Z 2025-04-17T12:00:03Z On perfect sampling: ROCFTP with Metropolis-multishift coupler ROCFTP is a perfect sampling algorithm that employs various random operations and requires a specific Markov chain construction for each target. To overcome this requirement, the Metropolis algorithm is incorporated as a random operation within ROCFTP. 
While the Metropolis sampler functions as a random operation, it is not a coupler. However, by employing the normal multishift coupler as a symmetric proposal for Metropolis, we obtain ROCFTP with Metropolis-multishift. ROCFTP was initially designed for bounded state spaces; its applicability to targets with unbounded state spaces is extended through the introduction of the Most Interest Range (MIR) for practical use. It is demonstrated that selecting MIR decreases the likelihood of ROCFTP hitting $MIR^C$ by a factor of $(1 - \epsilon)$, which is beneficial for practical implementation. The algorithm exhibits a convergence rate characterized by exponential decay. Its performance is rigorously evaluated across various targets, and tests ensure its goodness of fit. Lastly, an R package is provided for generating exact samples using ROCFTP Metropolis-multishift. Majid Nabipoor http://arxiv.org/abs/2407.05997v2 2025-04-17T11:47:45Z 2024-07-08T14:47:03Z On the differentiability of $φ$-projections in the discrete finite case In the case of finite measures on finite spaces, we state conditions under which $\phi$-projections are continuously differentiable. When the set on which one wishes to $\phi$-project is convex, we show that the required assumptions are implied by easily verifiable conditions. In particular, for input probability vectors and a rather large class of $\phi$-divergences, we obtain that $\phi$-projections are continuously differentiable when projecting on a set defined by linear equalities. The obtained results are applied to $\phi$-projection estimators (that is, minimum $\phi$-divergence estimators). A first application, rooted in robust statistics, concerns the computation of the influence functions of such estimators. 
In a second set of applications, we derive their asymptotics when projecting on parametric sets of probability vectors, on sets of probability vectors generated from distributions with certain moments fixed and on Fr\'echet classes of bivariate probability arrays. The resulting asymptotics hold whether the element to be $\phi$-projected belongs to the set on which one wishes to $\phi$-project or not. Gery Geenens Ivan Kojadinovic Tommaso Martini 33 pages, 3 figures, 1 table http://arxiv.org/abs/2406.07066v2 2025-04-17T11:18:59Z 2024-06-11T08:50:55Z Inferring the dependence graph density of binary graphical models in high dimension We consider a system of binary interacting chains describing the dynamics of a group of $N$ components that, at each time unit, either send some signal to the others or remain silent. The interactions among the chains are encoded by a directed Erd\"os-R\'enyi random graph with unknown parameter $p \in (0, 1)$. Moreover, the system is structured within two populations (excitatory chains versus inhibitory ones) which are coupled via a mean field interaction on the underlying Erd\"os-R\'enyi graph. In this paper, we address the question of inferring the connectivity parameter $p$ based only on the observation of the interacting chains over $T$ time units. In our main result, we show that the connectivity parameter $p$ can be estimated with rate $N^{-1/2}+N^{1/2}/T+(\log(T)/T)^{1/2}$ through an easy-to-compute estimator. Our analysis relies on a precise study of the spatio-temporal decay of correlations of the interacting chains. This is done through the study of coalescing random walks defining a backward regeneration representation of the system. Interestingly, we also show that this backward regeneration representation allows us to perfectly sample the system of interacting chains (conditionally on each realization of the underlying Erd\"os-R\'enyi graph) from its stationary distribution. 
These probabilistic results are of independent interest. Julien Chevallier Eva Löcherbach Guilherme Ost 85 pages, 2 figures http://arxiv.org/abs/2406.04071v2 2025-04-17T10:38:48Z 2024-06-06T13:36:41Z Dynamic angular synchronization under smoothness constraints Given an undirected measurement graph $\mathcal{H} = ([n], \mathcal{E})$, the classical angular synchronization problem consists of recovering unknown angles $\theta_1^*,\dots,\theta_n^*$ from a collection of noisy pairwise measurements of the form $(\theta_i^* - \theta_j^*) \mod 2\pi$, for all $\{i,j\} \in \mathcal{E}$. This problem arises in a variety of applications, including computer vision, time synchronization of distributed networks, and ranking from pairwise comparisons. In this paper, we consider a dynamic version of this problem where the angles, and also the measurement graphs, evolve over $T$ time points. Assuming a smoothness condition on the evolution of the latent angles, we derive three algorithms for joint estimation of the angles over all time points. Moreover, for one of the algorithms, we establish non-asymptotic recovery guarantees for the mean-squared error (MSE) under different statistical models. In particular, we show that the MSE converges to zero as $T$ increases under milder conditions than in the static setting. This includes the setting where the measurement graphs are highly sparse and disconnected, and also when the measurement noise is large and can potentially increase with $T$. We complement our theoretical results with experiments on synthetic data. Ernesto Araya Mihai Cucuringu Hemant Tyagi 42 pages, 9 figures. Corrected typos and added clarifications, as per the suggestions of reviewers. Added Remarks 4, 5 and Algorithm 4 (which is the same as Algorithm 3 but with TRS replaced by a spectral method). 
Accepted in JMLR http://arxiv.org/abs/1805.10721v4 2025-04-17T04:57:37Z 2018-05-28T01:00:07Z Bernstein's inequalities for general Markov chains We establish Bernstein's inequalities for functions of general (general-state-space and possibly non-reversible) Markov chains. These inequalities achieve sharp variance proxies and encompass the classical Bernstein inequality for independent random variables as a special case. The key analysis lies in bounding the operator norm of a perturbed Markov transition kernel by the exponential of the sum of two convex functions. One coincides with what delivers the classical Bernstein inequality, and the other reflects the influence of the Markov dependence. A convex analysis on these two functions then derives our Bernstein inequalities. As applications, we apply our Bernstein inequalities to the Markov chain Monte Carlo integral estimation problem and the robust mean estimation problem with Markov-dependent samples, and achieve tight deviation bounds that previous inequalities cannot. Bai Jiang Qiang Sun Jianqing Fan 32 pages including references http://arxiv.org/abs/2504.12615v1 2025-04-17T03:39:52Z 2025-04-17T03:39:52Z Shrinkage priors for circulant correlation structure models We consider a new statistical model called the circulant correlation structure model, which is a multivariate Gaussian model with unknown covariance matrix and has a scale-invariance property. We construct shrinkage priors for the circulant correlation structure models and show that Bayesian predictive densities based on those priors asymptotically dominate Bayesian predictive densities based on Jeffreys priors under the Kullback-Leibler (KL) risk function. While shrinkage of eigenvalues of covariance matrices of Gaussian models has been successful, the proposed priors shrink a non-eigenvalue part of covariance matrices. 
Michiko Okudo Tomonari Sei http://arxiv.org/abs/2406.19619v3 2025-04-17T00:28:27Z 2024-06-28T03:02:25Z ScoreFusion: Fusing Score-based Generative Models via Kullback-Leibler Barycenters We introduce ScoreFusion, a theoretically grounded method for fusing multiple pre-trained diffusion models that are assumed to generate from auxiliary populations. ScoreFusion is particularly useful for enhancing the generative modeling of a target population with limited observed data. Our starting point considers the family of KL barycenters of the auxiliary populations, which is proven to be an optimal parametric class in the KL sense, but difficult to learn. Nevertheless, by recasting the learning problem as score matching in denoising diffusion, we obtain a tractable way of computing the optimal KL barycenter weights. We prove a dimension-free sample complexity bound in total variation distance, provided that the auxiliary models are well-fitted for their own task and the auxiliary tasks combined capture the target well. The sample efficiency of ScoreFusion is demonstrated by learning handwritten digits. We also provide a simple adaptation of a Stable Diffusion denoising pipeline that enables sampling from the KL barycenter of two auxiliary checkpoints; on a portrait generation task, our method produces faces that enhance population heterogeneity relative to the auxiliary distributions. Hao Liu Junze Tony Ye Jose Blanchet Nian Si 41 pages, 21 figures. Accepted as an Oral (top 2%) paper by AISTATS 2025