http://arxiv.org/api/QC0YygZVZfYkBW/amNe8svNEqUA
Updated: 2025-04-22T00:00:00-04:00
248473015

http://arxiv.org/abs/2504.13977v1
Updated: 2025-04-17T23:22:00Z
Published: 2025-04-17T23:22:00Z
Testing Random Effects for Binomial Data

In modern scientific research, small-scale studies with limited participants
are increasingly common. However, interpreting individual outcomes can be
challenging, making it standard practice to combine data across studies using
random effects to draw broader scientific conclusions. In this work, we
introduce an optimal methodology for assessing the goodness of fit between a
given reference distribution and the distribution of random effects arising
from binomial counts.
Using the minimax framework, we characterize the smallest separation between
the null and alternative hypotheses, called the critical separation, under the
1-Wasserstein distance that ensures the existence of a valid and powerful test.
The optimal test combines a plug-in estimator of the Wasserstein distance with
a debiased version of Pearson's chi-squared test.
We focus on meta-analyses, where a key question is whether multiple studies
agree on a treatment's effectiveness before pooling data. That is, researchers
must determine whether treatment effects are homogeneous across studies. We
begin by analyzing scenarios with a specified reference effect, such as testing
whether all studies show the treatment is effective 80% of the time, and
describe how the critical separation depends on the reference effect. We then
extend the analysis to homogeneity testing without a reference effect and
construct an optimal test by debiasing Cochran's chi-squared test.
Finally, we illustrate how our proposed methodologies improve the
construction of p-values and confidence intervals, with applications to
assessing drug safety in the context of rare adverse outcomes and modeling
political outcomes at the county level.
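
As a point of reference for the homogeneity question in this abstract, the classical pooled-variance chi-squared statistic for binomial proportions can be sketched as below. This is a minimal illustration, not the debiased test the paper proposes; the function name `homogeneity_chi_squared` and the toy counts are assumptions made for the example, and the paper's exact normalisation may differ.

```python
def homogeneity_chi_squared(successes, trials):
    """Classical chi-squared statistic for homogeneity of binomial proportions.

    successes[i] is the number of successes out of trials[i] in study i.
    Under homogeneity and large trial counts, the statistic is approximately
    chi-squared distributed with len(trials) - 1 degrees of freedom.
    """
    if len(successes) != len(trials) or len(trials) < 2:
        raise ValueError("need matching lists with at least two studies")
    pooled = sum(successes) / sum(trials)  # pooled success proportion
    if pooled in (0.0, 1.0):
        raise ValueError("pooled proportion is degenerate")
    # Weighted squared deviations of each study's proportion from the pooled one.
    stat = sum(n * (x / n - pooled) ** 2 for x, n in zip(successes, trials))
    return stat / (pooled * (1.0 - pooled))


# Hypothetical toy data: three studies with different observed success rates.
q = homogeneity_chi_squared([8, 16, 24], [20, 20, 40])
print(round(q, 4))
```

With few participants per study, the chi-squared approximation for this statistic is unreliable, which is the regime the abstract targets.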
Lucas Kania, Larry Wasserman, Sivaraman Balakrishnan

http://arxiv.org/abs/2504.13322v1
Updated: 2025-04-17T20:20:06Z
Published: 2025-04-17T20:20:06Z
Foundations of locally-balanced Markov processes

We formally introduce and study locally-balanced Markov jump processes
(LBMJPs) defined on a general state space. These continuous-time stochastic
processes with a user-specified limiting distribution are designed for sampling
in settings involving discrete parameters and/or non-smooth distributions,
addressing limitations of other processes such as the overdamped Langevin
diffusion. The paper establishes the well-posedness, non-explosivity, and
ergodicity of LBMJPs under mild conditions. We further explore regularity
properties such as the Feller property and characterise the weak generator of
the process. We then derive conditions for exponential ergodicity via spectral
gaps and establish comparison theorems for different balancing functions. In
particular, we show an equivalence between the spectral gaps of
Metropolis--Hastings algorithms and LBMJPs with bounded balancing function, but
show that LBMJPs can exhibit uniform ergodicity on unbounded state spaces when
the balancing function is unbounded, even when the limiting distribution is not
sub-Gaussian. We also establish a diffusion limit for an LBMJP in the small
jump limit, and discuss applications to Monte Carlo sampling and non-reversible
extensions of the processes.
Samuel Livingstone, Giorgos Vasdekis, Giacomo Zanella
Keywords: Markov Processes, Sampling Algorithms, Mixing Times,
Ergodicity, Markov Chain Monte Carlo, Locally-balanced processes. 31 pages.

http://arxiv.org/abs/2403.03868v3
Updated: 2025-04-17T19:05:51Z
Published: 2024-03-06T17:18:24Z
Confidence on the Focal: Conformal Prediction with Selection-Conditional Coverage

Conformal prediction builds marginally valid prediction intervals that cover
the unknown outcome of a randomly drawn test point with a prescribed
probability. However, in practice, data-driven methods are often used to
identify specific test unit(s) of interest, requiring uncertainty
quantification tailored to these focal units. In such cases, marginally valid
conformal prediction intervals may fail to provide valid coverage for the focal
unit(s) due to selection bias. This paper presents a general framework for
constructing a prediction set with finite-sample exact coverage, conditional on
the unit being selected by a given procedure. The general form of our method
accommodates arbitrary selection rules that are invariant to the permutation of
the calibration units, and generalizes Mondrian Conformal Prediction to
multiple test units and non-equivariant classifiers. We also work out
computationally efficient implementations of our framework for a number of
realistic selection rules, including top-K selection, optimization-based
selection, selection based on conformal p-values, and selection based on
properties of preliminary conformal prediction sets. The performance of our
methods is demonstrated via applications in drug discovery and health risk
prediction.
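
For context, the marginally valid baseline that this abstract starts from can be sketched with standard split conformal prediction. The code below is only that baseline, not the selection-conditional procedure of the paper; the function name `split_conformal_interval` and the toy residuals are assumptions for illustration.

```python
import math


def split_conformal_interval(cal_residuals, prediction, alpha=0.1):
    """Marginal split-conformal interval around a point prediction.

    cal_residuals holds absolute residuals |y_i - f(x_i)| computed on a
    held-out calibration set that is exchangeable with the test point.
    """
    n = len(cal_residuals)
    # Rank of the conformal quantile among the calibration residuals.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for the requested level.
        return (-math.inf, math.inf)
    q = sorted(cal_residuals)[k - 1]
    return (prediction - q, prediction + q)


# Hypothetical calibration residuals and a point prediction of 5.0.
print(split_conformal_interval([1, 2, 3, 4, 5, 6, 7, 8, 9], 5.0, alpha=0.2))
```

The coverage guarantee of this interval is averaged over a randomly drawn test point, which is why it can fail once the test unit is picked by a data-driven selection rule.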
Ying Jin, Zhimei Ren
Forthcoming at Journal of the Royal Statistical Society Series B

http://arxiv.org/abs/2504.13273v1
Updated: 2025-04-17T18:32:49Z
Published: 2025-04-17T18:32:49Z
How Much Weak Overlap Can Doubly Robust T-Statistics Handle?

In the presence of sufficiently weak overlap, it is known that no regular
root-n-consistent estimators exist and standard estimators may fail to be
asymptotically normal. This paper shows that a thresholded version of the
standard doubly robust estimator is asymptotically normal with well-calibrated
Wald confidence intervals even when constructed using nonparametric estimates
of the propensity score and conditional mean outcome. The analysis implies a
cost of weak overlap in terms of black-box nuisance rates, borne when the
semiparametric bound is infinite, and the contribution of outcome smoothness to
the outcome regression rate, which is incurred even when the semiparametric
bound is finite. As a byproduct of this analysis, I show that under weak
overlap, the optimal global regression rate is the same as the optimal
pointwise regression rate, without the usual polylogarithmic penalty. The
high-level conditions yield new rules of thumb for thresholding in practice. In
simulations, thresholded AIPW can exhibit moderate overrejection in small
samples, but I am unable to reject a null hypothesis of exact coverage in large
samples. In an empirical application, the clipped AIPW estimator that targets
the standard average treatment effect yields similar precision to a heuristic
10% fixed-trimming approach that changes the target sample.
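
A generic AIPW (doubly robust) estimate with clipped propensity scores can be sketched as follows. This is a textbook-style illustration under the assumption that nuisance estimates are already available; it is not claimed to match the paper's thresholding rule, and the name `clipped_aipw` and the default threshold are hypothetical.

```python
def clipped_aipw(y, t, e_hat, mu1_hat, mu0_hat, threshold=0.05):
    """AIPW estimate of the average treatment effect with clipped propensities.

    y: outcomes, t: binary treatment indicators (0 or 1),
    e_hat: estimated propensity scores,
    mu1_hat, mu0_hat: estimated conditional mean outcomes under treatment/control.
    """
    total = 0.0
    for yi, ti, ei, m1, m0 in zip(y, t, e_hat, mu1_hat, mu0_hat):
        # Clip the propensity score into [threshold, 1 - threshold].
        e = min(max(ei, threshold), 1.0 - threshold)
        # Outcome-model contrast plus inverse-probability-weighted residuals.
        total += (m1 - m0) + ti * (yi - m1) / e - (1 - ti) * (yi - m0) / (1.0 - e)
    return total / len(y)


# Hypothetical toy data in which the outcome models fit the observed arm exactly.
print(clipped_aipw([3, 1, 4, 2], [1, 0, 1, 0], [0.5, 0.5, 0.01, 0.99],
                   [3, 2, 4, 3], [2, 1, 3, 2]))
```

Clipping keeps the estimand equal to the average treatment effect over the full sample, whereas trimming drops units and so changes the target sample, as the abstract notes.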
Jacob Dorn

http://arxiv.org/abs/2504.13124v1
Updated: 2025-04-17T17:41:05Z
Published: 2025-04-17T17:41:05Z
Spatial Confidence Regions for Excursion Sets with False Discovery Rate Control

Identifying areas where the signal is prominent is an important task in image
analysis, with particular applications in brain mapping. In this work, we
develop confidence regions for spatial excursion sets above and below a given
level. We achieve this by treating the confidence procedure as a testing
problem at the given level, allowing control of the False Discovery Rate (FDR).
Methods are developed to control the FDR, separately for positive and negative
excursions, as well as jointly over both. Furthermore, power is increased by
incorporating a two-stage adaptive procedure. Simulation results with various
signals show that our confidence regions successfully control the FDR under the
nominal level. We showcase our methods with an application to functional
magnetic resonance imaging (fMRI) data from the Human Connectome Project,
illustrating the improvement in statistical power over existing approaches.
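
FDR control of the kind mentioned in this abstract is commonly built on the Benjamini-Hochberg step-up procedure, sketched below. This is the standard procedure only; the paper's spatial and two-stage adaptive construction is not reproduced, and the function name and the toy p-values are assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is below q * rank / m.
        if p_values[idx] <= q * rank / m:
            k_max = rank
    # Reject every hypothesis ranked at or below that largest rank.
    return sorted(order[:k_max])


# Hypothetical p-values, for example one per spatial location.
print(benjamini_hochberg([0.039, 0.001, 0.2, 0.008], q=0.05))
```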
Howon Ryu, Thomas Maullin-Sapey, Armin Schwartzman, Samuel Davenport

http://arxiv.org/abs/2504.12989v1
Updated: 2025-04-17T14:54:00Z
Published: 2025-04-17T14:54:00Z
Query Complexity of Classical and Quantum Channel Discrimination

Quantum channel discrimination has been studied from an information-theoretic
perspective, wherein one is interested in the optimal decay rate of error
probabilities as a function of the number of unknown channel accesses. In this
paper, we study the query complexity of quantum channel discrimination, wherein
the goal is to determine the minimum number of channel uses needed to reach a
desired error probability. To this end, we show that the query complexity of
binary channel discrimination depends logarithmically on the inverse error
probability and inversely on the negative logarithm of the (geometric and
Holevo) channel fidelity. As a special case of these findings, we precisely
characterize the query complexity of discriminating between two classical
channels. We also provide lower and upper bounds on the query complexity of
binary asymmetric channel discrimination and multiple quantum channel
discrimination. For the former, the query complexity depends on the geometric
Rényi and Petz Rényi channel divergences, while for the latter, it depends
on the negative logarithm of (geometric and Uhlmann) channel fidelity. For
multiple channel discrimination, the upper bound scales as the logarithm of the
number of channels.
Theshani Nuradha, Mark M. Wilde
22 pages; see also the independent work "Sampling complexity of
quantum channel discrimination" DOI 10.1088/1572-9494/adcb9e

http://arxiv.org/abs/2501.05803v3
Updated: 2025-04-17T12:23:46Z
Published: 2025-01-10T09:10:30Z
Test-time Alignment of Diffusion Models without Reward Over-optimization

Diffusion models excel in generative tasks, but aligning them with specific
objectives while maintaining their versatility remains challenging. Existing
fine-tuning methods often suffer from reward over-optimization, while
approximate guidance approaches fail to optimize target rewards effectively.
Addressing these limitations, we propose a training-free, test-time method
based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target
distribution. Our approach, tailored for diffusion sampling and incorporating
tempering techniques, achieves comparable or superior target rewards to
fine-tuning methods while preserving diversity and cross-reward generalization.
We demonstrate its effectiveness in single-reward optimization, multi-objective
scenarios, and online black-box optimization. This work offers a robust
solution for aligning diffusion models with diverse downstream objectives
without compromising their general capabilities. Code is available at
https://github.com/krafton-ai/DAS.
Sunwoo Kim, Minkyu Kim, Dongmin Park
ICLR 2025 (Spotlight). The Thirteenth International Conference on
Learning Representations. 2025

http://arxiv.org/abs/2406.05714v6
Updated: 2025-04-17T12:18:41Z
Published: 2024-06-09T10:12:08Z
A conversion theorem and minimax optimality for continuum contextual bandits

We study the contextual continuum bandits problem, where the learner
sequentially receives a side information vector and has to choose an action in
a convex set, minimizing a function associated with the context. The goal is to
minimize all the underlying functions for the received contexts, leading to the
contextual notion of regret, which is stronger than the standard static regret.
Assuming that the objective functions are $\gamma$-Hölder with respect to the
contexts, $0<\gamma\le 1,$ we demonstrate that any algorithm achieving a
sub-linear static regret can be extended to achieve a sub-linear contextual
regret. We prove a static-to-contextual regret conversion theorem that provides
an upper bound for the contextual regret of the output algorithm as a function
of the static regret of the input algorithm. We further study the implications
of this general result for three fundamental cases of dependency of the
objective function on the action variable: (a) Lipschitz bandits, (b) convex
bandits, (c) strongly convex and smooth bandits. For Lipschitz bandits and
$\gamma=1,$ combining our results with the lower bound of Slivkins (2014), we
prove that the minimax optimal contextual regret for the noise-free adversarial
setting is achieved. Then, we prove that in the presence of noise, the
contextual regret rate as a function of the number of queries is the same for
convex bandits as it is for strongly convex and smooth bandits. Lastly, we
present a minimax lower bound, implying two key facts. First, obtaining a
sub-linear contextual regret may be impossible over functions that are not
continuous with respect to the context. Second, for convex bandits and strongly
convex and smooth bandits, the algorithms that we propose achieve, up to a
logarithmic factor, the minimax optimal rate of contextual regret as a function
of the number of queries.
Arya Akhavan, Karim Lounici, Massimiliano Pontil, Alexandre B. Tsybakov

http://arxiv.org/abs/2504.12872v1
Updated: 2025-04-17T12:00:03Z
Published: 2025-04-17T12:00:03Z
On perfect sampling: ROCFTP with Metropolis-multishift coupler

ROCFTP is a perfect sampling algorithm that employs various random
operations and requires a specific Markov chain construction for each target.
To overcome this requirement, the Metropolis algorithm is incorporated as a
random operation within ROCFTP. While the Metropolis sampler functions as a
random operation, it isn't a coupler. However, by employing the normal multishift
coupler as a symmetric proposal for Metropolis, we obtain ROCFTP with
Metropolis-multishift. Initially designed for bounded state spaces, ROCFTP's
applicability to targets with unbounded state spaces is extended through the
introduction of the Most Interest Range (MIR) for practical use. It was
demonstrated that selecting MIR decreases the likelihood of ROCFTP hitting
$MIR^C$ by a factor of $(1 - \epsilon)$, which is beneficial for practical
implementation. The algorithm exhibits a convergence rate characterized by
exponential decay. Its performance is rigorously evaluated across various
targets, and tests ensure its goodness of fit. Lastly, an R package is provided
for generating exact samples using ROCFTP Metropolis-multishift.
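
The Metropolis step with a symmetric normal proposal that this abstract builds on can be sketched as a plain random-walk Metropolis sampler. The code below is an ordinary approximate MCMC sampler, not a perfect sampler: it includes no coupling, no multishift coupler, and no ROCFTP logic. The function name, step size, and target are assumptions for illustration.

```python
import math
import random


def random_walk_metropolis(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis with a symmetric normal proposal (one dimension)."""
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)  # symmetric normal proposal
        log_p_new = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


# Hypothetical target: a standard normal density, up to an additive constant.
draws = random_walk_metropolis(lambda x: -0.5 * x * x, 0.0, 20000)
print(sum(draws) / len(draws))
```

Because the proposal is symmetric, the acceptance ratio involves only the target density, so no normalising constant is needed.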
Majid Nabipoor

http://arxiv.org/abs/2407.05997v2
Updated: 2025-04-17T11:47:45Z
Published: 2024-07-08T14:47:03Z
On the differentiability of $φ$-projections in the discrete finite case

In the case of finite measures on finite spaces, we state conditions under
which $\phi$-projections are continuously differentiable. When the set on
which one wishes to $\phi$-project is convex, we show that the required
assumptions are implied by easily verifiable conditions. In particular, for
input probability vectors and a rather large class of $\phi$-divergences, we
obtain that $\phi$-projections are continuously differentiable when projecting
on a set defined by linear equalities. The obtained results are applied to
$\phi$-projection estimators (that is, minimum $\phi$-divergence estimators).
A first application, rooted in robust statistics, concerns the computation of
the influence functions of such estimators. In a second set of applications, we
derive their asymptotics when projecting on parametric sets of probability
vectors, on sets of probability vectors generated from distributions with
certain moments fixed and on Fréchet classes of bivariate probability arrays.
The resulting asymptotics hold whether the element to be $\phi$-projected
belongs to the set on which one wishes to $\phi$-project or not.
Gery Geenens, Ivan Kojadinovic, Tommaso Martini
33 pages, 3 figures, 1 table

http://arxiv.org/abs/2406.07066v2
Updated: 2025-04-17T11:18:59Z
Published: 2024-06-11T08:50:55Z
Inferring the dependence graph density of binary graphical models in high dimension

We consider a system of binary interacting chains describing the dynamics of
a group of $N$ components that, at each time unit, either send some signal to
the others or remain silent otherwise. The interactions among the chains are
encoded by a directed Erdős-Rényi random graph with unknown parameter
$p \in (0, 1)$. Moreover, the system is structured within two populations
(excitatory chains versus inhibitory ones) which are coupled via a mean field
interaction on the underlying Erdős-Rényi graph. In this paper, we address
the question of inferring the connectivity parameter $p$ based only on the
observation of the interacting chains over $T$ time units. In our main result,
we show that the connectivity parameter $p$ can be estimated with rate
$N^{-1/2}+N^{1/2}/T+(\log(T)/T)^{1/2}$ through an easy-to-compute estimator.
Our analysis relies on a precise study of the spatio-temporal decay of
correlations of the interacting chains. This is done through the study of
coalescing random walks defining a backward regeneration representation of the
system. Interestingly, we also show that this backward regeneration
representation allows us to perfectly sample the system of interacting chains
(conditionally on each realization of the underlying Erdős-Rényi graph)
from its stationary distribution. These probabilistic results are of
independent interest.
Julien Chevallier, Eva Löcherbach, Guilherme Ost
85 pages, 2 figures

http://arxiv.org/abs/2406.04071v2
Updated: 2025-04-17T10:38:48Z
Published: 2024-06-06T13:36:41Z
Dynamic angular synchronization under smoothness constraints

Given an undirected measurement graph $\mathcal{H} = ([n], \mathcal{E})$, the
classical angular synchronization problem consists of recovering unknown angles
$\theta_1^*,\dots,\theta_n^*$ from a collection of noisy pairwise measurements
of the form $(\theta_i^* - \theta_j^*) \mod 2\pi$, for all $\{i,j\} \in
\mathcal{E}$. This problem arises in a variety of applications, including
computer vision, time synchronization of distributed networks, and ranking from
pairwise comparisons. In this paper, we consider a dynamic version of this
problem where the angles, and also the measurement graphs, evolve over $T$ time
points. Assuming a smoothness condition on the evolution of the latent angles,
we derive three algorithms for joint estimation of the angles over all time
points. Moreover, for one of the algorithms, we establish non-asymptotic
recovery guarantees for the mean-squared error (MSE) under different
statistical models. In particular, we show that the MSE converges to zero as
$T$ increases under milder conditions than in the static setting. This includes
the setting where the measurement graphs are highly sparse and disconnected,
and also when the measurement noise is large and can potentially increase with
$T$. We complement our theoretical results with experiments on synthetic data.
Ernesto Araya, Mihai Cucuringu, Hemant Tyagi
42 pages, 9 figures. Corrected typos and added clarifications, as per
the suggestions of reviewers. Added Remarks 4, 5 and Algorithm 4 (which is
the same as Algorithm 3 but with TRS replaced by a spectral method). Accepted in
JMLR

http://arxiv.org/abs/1805.10721v4
Updated: 2025-04-17T04:57:37Z
Published: 2018-05-28T01:00:07Z
Bernstein's inequalities for general Markov chains

We establish Bernstein's inequalities for functions of general
(general-state-space and possibly non-reversible) Markov chains. These
inequalities achieve sharp variance proxies and encompass the classical
Bernstein inequality for independent random variables as a special case. The key
analysis lies in bounding the operator norm of a perturbed Markov transition
kernel by the exponential of the sum of two convex functions. One coincides with
what delivers the classical Bernstein inequality, and the other reflects the
influence of the Markov dependence. A convex analysis on these two functions
then yields our Bernstein inequalities. As applications, we apply our
Bernstein inequalities to the Markov chain Monte Carlo integral estimation
problem and the robust mean estimation problem with Markov-dependent samples,
and achieve tight deviation bounds that previous inequalities cannot.
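
For reference, the classical Bernstein inequality for independent random variables, which the abstract says is recovered as a special case, can be stated as follows. The notation ($X_i$, $M$, $t$) is chosen here for illustration and is not taken from the paper; the Markov-chain versions involve additional dependence terms that are not reproduced.

```latex
% Classical Bernstein inequality (independent case).
% Assumptions: X_1, ..., X_n are independent, E[X_i] = 0, and |X_i| <= M almost surely.
\[
  \mathbb{P}\!\left( \sum_{i=1}^{n} X_i \ge t \right)
  \le \exp\!\left( - \frac{t^2 / 2}{\sum_{i=1}^{n} \mathbb{E}[X_i^2] + M t / 3} \right),
  \qquad t > 0 .
\]
```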
Bai Jiang, Qiang Sun, Jianqing Fan
32 pages including references

http://arxiv.org/abs/2504.12615v1
Updated: 2025-04-17T03:39:52Z
Published: 2025-04-17T03:39:52Z
Shrinkage priors for circulant correlation structure models

We consider a new statistical model called the circulant correlation
structure model, which is a multivariate Gaussian model with unknown covariance
matrix and has a scale-invariance property. We construct shrinkage priors for
the circulant correlation structure models and show that Bayesian predictive
densities based on those priors asymptotically dominate Bayesian predictive
densities based on Jeffreys priors under the Kullback-Leibler (KL) risk
function. While shrinkage of eigenvalues of covariance matrices of Gaussian
models has been successful, the proposed priors shrink a non-eigenvalue part of
covariance matrices.
Michiko Okudo, Tomonari Sei

http://arxiv.org/abs/2406.19619v3
Updated: 2025-04-17T00:28:27Z
Published: 2024-06-28T03:02:25Z
ScoreFusion: Fusing Score-based Generative Models via Kullback-Leibler Barycenters

We introduce ScoreFusion, a theoretically grounded method for fusing multiple
pre-trained diffusion models that are assumed to generate from auxiliary
populations. ScoreFusion is particularly useful for enhancing the generative
modeling of a target population with limited observed data. Our starting point
considers the family of KL barycenters of the auxiliary populations, which is
proven to be an optimal parametric class in the KL sense, but difficult to
learn. Nevertheless, by recasting the learning problem as score matching in
denoising diffusion, we obtain a tractable way of computing the optimal KL
barycenter weights. We prove a dimension-free sample complexity bound in total
variation distance, provided that the auxiliary models are well-fitted for
their own task and the auxiliary tasks combined capture the target well. The
sample efficiency of ScoreFusion is demonstrated by learning handwritten
digits. We also provide a simple adaptation of a Stable Diffusion denoising
pipeline that enables sampling from the KL barycenter of two auxiliary
checkpoints; on a portrait generation task, our method produces faces that
enhance population heterogeneity relative to the auxiliary distributions.
Hao Liu, Junze Tony Ye, Jose Blanchet, Nian Si
41 pages, 21 figures. Accepted as an Oral (top 2%) paper by AISTATS
2025