https://arxiv.org/api/v8r3nDY+jwaNNr6s9LulV6w5ZLc 2026-03-20T21:41:16Z 34634 105 15 http://arxiv.org/abs/2603.15182v1 Sequential Transport for Causal Mediation Analysis 2026-03-16T12:18:35Z We propose sequential transport (ST), a distributional framework for mediation analysis that combines optimal transport (OT) with a mediator directed acyclic graph (DAG). Instead of relying on cross-world counterfactual assumptions, ST constructs unit-level mediator counterfactuals by minimally transporting each mediator, either marginally or conditionally, toward its distribution under an alternative treatment while preserving the causal dependencies encoded by the DAG. For numerical mediators, ST uses monotone (conditional) OT maps based on conditional CDF/quantile estimators; for categorical mediators, it extends naturally via simplex-based transport. We establish consistency of the estimated transport maps and of the induced unit-level decompositions into mutatis mutandis direct and indirect effects under standard regularity and support conditions. When the treatment is randomized or ignorable (possibly conditional on covariates), these decompositions admit a causal interpretation; otherwise, they provide a principled distributional attribution of differences between groups aligned with the mediator structure. Gaussian examples show that ST recovers classical mediation formulas, while additional simulations confirm good performance in nonlinear and mixed-type settings. An application to the COMPAS dataset illustrates how ST yields deterministic, DAG-consistent counterfactual mediators and a fine-grained mediator-level attribution of disparities. 
2026-03-16T12:18:35Z Agathe Fernandes-Machado Iryna Voitsitska Arthur Charpentier Ewen Gallic http://arxiv.org/abs/2603.15149v1 Measuring the depth of multidimensional poverty with ordinal data 2026-03-16T11:43:14Z This paper proposes a positional poverty gap measure of multidimensional poverty within the Alkire-Foster counting framework. The measure captures the depth of deprivations even when indicators are ordinal, unlike the standard poverty gap, which requires cardinal variables. The proposed method draws on the fuzzy set literature and introduces a distribution-based measure of deprivation depth using the empirical cumulative distribution of each indicator, with the most deprived group as the benchmark. For each deprived individual, the method assigns a score based on the individual's relative position in the distribution. Depth is thus expressed as a difference in distributional positions, motivating the label positional poverty gap. The paper demonstrates that this measure preserves the identification and aggregation structure of the counting approach and satisfies its axiomatic properties when the reference distribution remains fixed over time. The framework remains flexible because it accommodates different identification rules, deprivation cutoffs, and variable types. Overall, it offers a simple, meaningful, and theoretically grounded way to incorporate depth into multidimensional poverty measurement with ordinal data. 2026-03-16T11:43:14Z Fernando Flores Tavares http://arxiv.org/abs/2603.15082v1 Identifying Topological Differences in Two Populations of Random Geometric Objects 2026-03-16T10:37:04Z We propose a statistical framework to identify topological differences in two populations of random geometric objects. The proposed framework involves first associating a topological signature with random geometric objects and then performing a two-sample test using the observed topological signatures. 
We associate persistence barcodes, a topological signature from topological data analysis, with each observed random geometric object. This, in turn, yields a two-sample problem on the space of persistence barcodes. As the space of persistence barcodes is not suitable for standard statistical analysis, we translate the two-sample problem on a suitable subset of a Euclidean space. In the course of this study, we embed the topological signatures in an ordered convex cone in a Euclidean space using functions from tropical geometry. We show that the embedding is a sufficient statistic for the persistence barcodes. This fact leads to the proposal of a two-sample test based on this sufficient statistic, and its equivalence to the two-sample problem on the barcode space is established. Finally, the consistency of the proposed test is studied. 2026-03-16T10:37:04Z Satish Kumar Subhra Sankar Dhar http://arxiv.org/abs/2512.19398v2 A Reduced Basis Decomposition Approach to Efficient Data Collection in Pairwise Comparison Studies 2026-03-16T10:18:28Z Comparative judgement studies elicit quality assessments through pairwise comparisons, typically analysed using the Bradley-Terry model. A challenge in these studies is experimental design, specifically, determining the optimal pairs to compare to maximize statistical efficiency. Constructing static experimental designs for these studies requires spectral decomposition of a covariance matrix over pairs of pairs, which becomes computationally infeasible for studies with more than approximately 150 objects. We propose a scalable method based on reduced basis decomposition that bypasses explicit construction of this matrix, achieving computational savings of two to three orders of magnitude. We establish eigenvalue bounds guaranteeing approximation quality and characterise the rank structure of the design matrix. 
Simulations demonstrate speedup factors exceeding 100 for studies with 64 or more objects, with negligible approximation error. We apply the method to construct designs for a 452-region spatial study in under 7 minutes and enable real-time design updates for classroom peer assessment, reducing computation time from 15 minutes to 15 seconds. 2025-12-22T13:48:53Z Author Accepted Manuscript Jiahua Jiang Joseph Marsh Rowland G Seymour http://arxiv.org/abs/2603.14984v1 Spatiotemporally Consistent Multivariate Bias Correction for Climate Projections via Nested Vine Copulas 2026-03-16T08:47:56Z Climate models are essential for understanding large-scale climate dynamics and long-term climate change, yet they exhibit systematic biases when compared with historical observations. Existing multivariate bias correction (MBC) approaches do not explicitly handle spatiotemporal dependence. However, preserving both spatiotemporal and inter-variable consistency is essential for realistic climate dynamics and reliable regional impact assessments. To address this gap, we propose a novel MBC method called GN-VBC that uses generalized additive models (GAMs) to disentangle spatiotemporal deterministic effects from stochastic residuals. To model joint distributions and dependencies across variables and locations, we introduce nested vine copulas (NVCs), a hierarchical vine merging strategy. NVC in the context of MBC combines two dependence levels: (i) spatial dependence across locations, modeled separately for each variable, and (ii) inter-variable dependence modeled at a selected reference location, which links the spatial models into a coherent multivariate and spatial structure. An application to Switzerland shows improvements in preserving inter-variable, spatial and temporal dependence across a wide range of evaluation metrics. 
2026-03-16T08:47:56Z 58 pages, 15 figures, 7 tables Theresa Meier Erwan Koch Valérie Chavez-Demoulin Thibault Vatter http://arxiv.org/abs/2603.06134v2 Clustering-Based Outcome Models for Clinical Studies: A Scoping Review 2026-03-16T08:33:03Z This review provides a systematic overview of methods that combine covariate-based clustering of observational units (patients) with outcome models for clinical studies. We distinguish between informed-cluster models, where the outcome contributes to cluster formation, and agnostic-cluster models, where clustering is performed solely on covariates in a separate first step. Informed-cluster models include product partition models with covariates (PPMx), finite mixtures of regression models (FMR), and cluster-aware supervised learning (CluSL). Agnostic-cluster models encompass two-step procedures using either model-based or algorithmic clustering followed by cluster-specific regression models. Following a systematic search of Web of Science and PubMed, 55 records were identified that propose or evaluate such models. We describe the key models, summarise study characteristics, and present applications from biomedical and public health research. Clustering-based outcome models are particularly relevant for settings with high-dimensional covariates (e.g., biomarker panels and "omics") and heterogeneous patient populations. These models can support risk stratification and we discuss extensions to estimate subgroup-specific treatment effects. They are most valuable when the population is clustered in distinct regions of the covariate space that correspond to different outcome distributions. We discuss applications to rare disease research, covariate adjustment and borrowing from historical data, and subgroup-specific treatment effect estimation in clinical trials. 
2026-03-06T10:41:52Z Johannes Vilsmeier Fabian Eibensteiner Franz König Francois Mercier Robin Ristl Nigel Stallard Marc Vandemeulebroecke Sarah Zohar Martin Posch http://arxiv.org/abs/2603.11867v2 Data Fusion with Distributional Equivalence Test-then-pool 2026-03-16T08:30:10Z Randomized controlled trials (RCTs) are the gold standard for causal inference, yet practical constraints often limit the size of the concurrent control arm. Borrowing control data from previous trials offers a potential efficiency gain, but naive borrowing can induce bias when historical and current populations differ. Existing test-then-pool (TTP) procedures address this concern by testing for equality of control outcomes between historical and concurrent trials before borrowing; however, standard implementations may suffer from reduced power or inadequate control of the Type-I error rate. We develop a new TTP framework that fuses control arms while rigorously controlling the Type-I error rate of the final treatment effect test. Our method employs kernel two-sample testing via maximum mean discrepancy (MMD) to capture distributional differences, and equivalence testing to avoid introducing uncontrolled bias, providing a more flexible and informative criterion for pooling. To ensure valid inference, we introduce partial bootstrap and partial permutation procedures for approximating null distributions in the presence of heterogeneous controls. We further establish the overall validity and consistency. We provide empirical studies demonstrating that the proposed approach achieves higher power than standard TTP methods while maintaining nominal error control, highlighting its value as a principled tool for leveraging historical controls in modern clinical trials. 2026-03-12T12:38:35Z Linying Yang Xing Liu Robin J. 
Evans http://arxiv.org/abs/2511.02660v2 Spectral analysis of high-dimensional spot volatility matrix with applications 2026-03-16T08:00:00Z In random matrix theory, the spectral distribution of the covariance matrix has been well studied under the large-dimensional asymptotic regime, in which the dimensionality and the sample size tend to infinity at the same rate. However, most existing theories are built upon the assumption of independent and identically distributed samples, which may be violated in practice. One example is observational data from continuous-time processes sampled at discrete time points, namely high-frequency data. In this paper, we extend the classical spectral analysis for the covariance matrix in large-dimensional random matrix theory to the spot volatility matrix by using high-frequency data. We establish the first-order limiting spectral distribution and obtain a second-order result, that is, the central limit theorem for linear spectral statistics. Moreover, we apply the results to design some feasible tests for the spot volatility matrix, including the identity and sphericity tests. Simulation studies justify the finite sample performance of the test statistics and verify our established theory. 2025-11-04T15:37:39Z Qiang Liu Yiming Liu Zhi Liu Wang Zhou http://arxiv.org/abs/2603.14815v1 On Heterogeneity in Wasserstein Space 2026-03-16T04:33:07Z Data represented by probability measures arise as empirical distributions, posterior distributions, and feature-based representations of complex objects. We study heterogeneity in a population of probability measures through the expected value of a chosen transform of the pairwise Wasserstein distance. The resulting estimator is unbiased and, under simple moment conditions on the population law, is strongly consistent, asymptotically normal, and equipped with a consistent standard error. 
This also yields a simple comparison of two populations and remains stable under plug-in approximation when the measures are estimated. The associated empirical eccentricities identify the observations that contribute most strongly to heterogeneity within a sample. 2026-03-16T04:33:07Z Kisung You http://arxiv.org/abs/2603.14752v1 Prior- and likelihood-free probabilistic inference with finite-sample calibration guarantees 2026-03-16T02:36:30Z Motivated by parametric models for which the likelihood is analytically unavailable, numerically unstable, or prohibitively expensive to compute or optimize, we develop a prior- and likelihood-free framework for fully probabilistic (Bayesian-like) uncertainty quantification with finite-sample calibration guarantees. Our method, a type of inferential model, produces data-dependent degrees of belief about claims concerning the unknown parameter while controlling the frequency with which high belief is assigned to false claims, even in finite-sample settings. Our procedure is general in that it requires only the ability to simulate from the model. We first rank candidate parameter values according to how well data simulated from the model agree with the observed data, and then rescale these rankings in a way that yields the desired finite-sample calibration guarantees. The key idea is to employ a permutation-invariant function, such as a depth function, to rank parameter values. We show that such a choice yields closed-form calibration rescaling calculations, making the procedure computationally simple. We illustrate our method's broad appeal with four examples, including differential privacy and Ising models. An analysis of the spatial configuration of 2025 measles outbreaks in the U.S. showcases our method's practical advantages. 2026-03-16T02:36:30Z 26 pages, 6 Figures Leonardo Cella Emily C. 
Hector http://arxiv.org/abs/2603.14676v1 Scalable Text-Embedding-informed Cognitive Diagnosis of Large Language Models 2026-03-16T00:14:47Z Large language models (LLMs) have achieved remarkable performance on diverse benchmarks, yet existing evaluation practices largely rely on coarse summary metrics that obscure underlying reasoning abilities. In this work, we propose novel methodologies to adapt cognitive diagnosis models (CDMs) in psychometrics to LLM evaluation, enabling fine-grained diagnosis via multidimensional discrete capability profiles and interpretable characterizations of LLM strengths and weaknesses. First, to enable CDM-based evaluation at benchmark scale (more than 1000 items), we propose a scalable method that jointly estimates LLM mastery profiles and the item-attribute Q-matrix, addressing key challenges posed by high-dimensional latent attributes (K > 20), large item pools, and the prohibitive computational cost of existing marginal maximum likelihood-based estimation. Second, we incorporate item-level textual information to construct AI-embedding-informed priors for the Q-matrix, stabilizing high-dimensional estimation while reducing reliance on costly human specification. We develop an efficient stochastic-approximation algorithm to jointly estimate LLM mastery profiles and the Q-matrix that balances data fit with text-embedding-informed priors. Simulation studies demonstrate accurate parameter recovery. An application to the MATH Level 5 benchmark illustrates the practical utility of our method for LLM evaluation and uncovers useful insights into LLMs' fine-grained capabilities. 2026-03-16T00:14:47Z 34 pages of main text, 12 pages of appendix, 7 figures Jia Liu Zhiyu Xu Yuqi Gu http://arxiv.org/abs/2502.09806v2 Two-Sided Prioritized Ranking: A Coherency-Preserving Design for Marketplace Experiments 2026-03-15T23:10:11Z Online marketplaces frequently run pricing experiments in environments where users choose from a list of items. 
In these settings, items compete for users' limited attention and demand, creating interference among items within a list: Changing prices for any item can affect the demand for others, biasing estimates from item-level A/B tests. Besides, a key consideration in pricing experiments is preserving platform coherency across prices and item availability. This requirement rules out experimental designs such as user-level A/B tests as they violate platform coherency. We propose Two-Sided Prioritized Ranking (TSPR) to estimate the total average treatment effect of price changes in such settings. TSPR exploits position bias in ranked search results to create variation in treatment exposure without compromising coherency. TSPR randomizes both users and items and reorders ranked lists, prioritizing treated items for one group of users and untreated items for the other. All users see the same items at consistent prices, but differ in exposure to treatment as they pay disproportionate attention across ranks. In semi-synthetic simulations based on Expedia hotel search data, TSPR outperforms baseline coherency-preserving experiment designs by reducing estimation bias and providing sufficient statistical power. 2025-02-13T22:48:09Z New version with revisions and updated title Mahyar Habibi Zahra Khanalizadeh Negar Ziaeian http://arxiv.org/abs/2603.10886v2 Kernel Tests of Equivalence 2026-03-15T21:47:48Z We propose novel kernel-based tests for assessing the equivalence between distributions. Traditional goodness-of-fit testing is inappropriate for concluding the absence of distributional differences, because failure to reject the null hypothesis may simply be a result of lack of test power, also known as the Type-II error. This motivates \emph{equivalence testing}, which aims to assess the \emph{absence} of a statistically meaningful effect under controlled error rates. 
However, existing equivalence tests are either limited to parametric distributions or focus only on specific moments rather than the full distribution. We address these limitations using two kernel-based statistical discrepancies: the \emph{kernel Stein discrepancy} and the \emph{Maximum Mean Discrepancy}. The null hypothesis of our proposed tests assumes the candidate distribution differs from the nominal distribution by at least a pre-defined margin, which is measured by these discrepancies. We propose two approaches for computing the critical values of the tests, one using an asymptotic normality approximation, and another based on bootstrapping. Numerical experiments are conducted to assess the performance of these tests. 2026-03-11T15:30:57Z 29 pages; 6 figures Xing Liu Axel Gandy http://arxiv.org/abs/2401.13208v3 Assessing Influential Observations in Pain Prediction using fMRI Data 2026-03-15T20:43:36Z Neuroimaging data allows researchers to model the relationship between multivariate patterns of brain activity and outcomes related to mental states and behaviors. However, the existence of outlying participants can potentially undermine the generalizability of these models and jeopardize the validity of downstream statistical analysis. To date, the ability to detect and account for participants unduly influencing various model selection approaches has been sorely lacking. Motivated by a task-based functional magnetic resonance imaging (fMRI) study of thermal pain, we propose and establish the asymptotic distribution for a diagnostic measure applicable to a number of different model selectors. A high-dimensional clustering procedure is further combined with this measure to detect multiple influential observations. In a series of simulations, our proposed method demonstrates clear advantages over existing methods in terms of improved detection performance, leading to enhanced predictive and variable selection outcomes. 
Application of our method to data from the thermal pain study illustrates the influence of outlying participants, in particular with regard to differences in activation between low and intense pain conditions. This allows for the selection of an interpretable model with high prediction power after removal of the detected observations. Though inspired by the fMRI-based thermal pain study, our methods are broadly applicable to other high-dimensional data types. 2024-01-24T03:38:55Z 6 figures Dongliang Zhang Masoud Asgharian Martin A. Lindquist http://arxiv.org/abs/2412.02945v2 Detection of Multiple Influential Observations on Model Selection 2026-03-15T19:50:15Z Outlying observations are frequently encountered across a wide spectrum of scientific domains, posing notable challenges to the generalizability of statistical models and the reproducibility of downstream analysis. They are identified through influential diagnostics, which aim to capture observations that unduly bias model estimation. To date, methods for identifying observations that influence the selection of a stochastically chosen submodel have been underdeveloped, especially in the high-dimensional setting where the number of predictors $p$ exceeds the sample size $n$. Recently we proposed an improved diagnostic measure to handle this setting. However, its distributional properties and approximations have not yet been explored. To address this shortcoming, we revisit the notion of exchangeability to determine the exact asymptotic distribution of our assessment measure. This foundation enables the introduction of theoretically supported parametric and nonparametric approaches for distributional approximation and derivation of thresholds for outlier identification. The resulting framework is further extended to logistic regression models and evaluated by comprehensive simulation studies comparing the performance of various detection methods. 
Finally, the framework is applied to data from a task-based fMRI study of thermal pain, with the goal of identifying outliers that distort the formulation of the statistical model using functional brain activity to predict physical pain ratings. Both linear and logistic models are used to demonstrate the benefits of detection and compare the performance of different detection procedures. In particular, we identify two influential observations that were not detected in prior studies. 2024-12-04T01:22:18Z 3 figures Dongliang Zhang Masoud Asgharian Martin A. Lindquist