http://arxiv.org/abs/2408.02122v2 Graph-Enabled Efficient Federated Bayesian Modeling 2026-06-08T02:19:29Z

Federated Bayesian modeling requires combining evidence from distributed users into a coherent global posterior while keeping users' raw data on-device. We propose Federated Latent Graph MCMC (FLaG-MCMC), a computationally efficient framework for federated learning in which historical posterior samples of a shared global parameter are encoded into a learned low-dimensional latent space, connected via a $k$-nearest-neighbor graph, and transferred sequentially to new users as a nonparametric prior. Each user runs graph-based MCMC in the latent space guided by their own likelihood, returns updated global samples to the server, and retains local latent variables on-device. We demonstrate FLaG-MCMC on Bayesian meta-analysis for opioid use disorder prevalence estimation and on federated topic modeling, where the federated posterior closely approximates the pooled full-data posterior for both global parameters and local user-level inference.

2024-08-04T19:37:09Z 20 pages, 7 figures Chenyang Zhong Shouxuan Ji Tian Zheng http://arxiv.org/abs/2606.08923v1 Scalable Network-Aware Experiment Design for Two-Sided Marketplaces 2026-06-08T02:01:30Z

Measuring causal effects in networked two-sided marketplaces is challenging due to treatment interference between market participants on different sides. When treatment is applied to one side (e.g., job seekers), their interactions with the other side (e.g., job posters) introduce spillover effects that violate the Stable Unit Treatment Value Assumption (SUTVA) and bias causal estimates. While cluster-based randomization mitigates this problem, prior approaches struggle with a fundamental trade-off: reducing spillover requires isolated clusters, but requiring isolation reduces the number of qualifying clusters and therefore statistical power. This paper introduces EgoCluster V3, an iterative clustering algorithm that reduces spillover by 3x compared to prior versions while preserving node coverage and doubling test power. We further introduce MultiEgoCluster, which extends V3 through a two-stage procedure that first groups highly connected egos into multi-ego clusters before applying the iterative clustering algorithm. This achieves an additional ~56% spillover reduction and ~38% increase in sample size. Both methods are deployed in production at LinkedIn and have systematically enabled high-impact two-sided marketplace experiments. Since residual bias cannot be fully eliminated through clustering alone, we derive a theoretical bias correction method for average treatment effect (ATE) estimation based on graph structure and propose an approach to generalize results to the general population.

2026-06-08T02:01:30Z Yi Su Zhen Yan 10.1145/3770855.3818478 http://arxiv.org/abs/2606.09941v1 Stochastic weather generators for high-frequency wind vector time series 2026-06-08T00:42:51Z

Surface winds can vary substantially from one minute to the next, so there is scope for studying their variation on this fine time scale. Restricting to the month of June to minimize seasonality, this work develops a range of machine learning models for generating realistic time series of surface wind vectors at a site in Lamont, Oklahoma based on more than 30 years of high-quality measurements at the minute time scale. Such a generator could be used as an input into models from a range of disciplines, notably for wind energy, but also wildfire spread and aviation, among others. The data show complex diurnal structures in both wind speed and direction that would be challenging to capture with standard time series models, so we consider a number of machine learning approaches to producing a stochastic wind generator based on time vector-quantized variational autoencoders. We consider generating a day's worth of data at a time and generating a day of wind vectors conditional on the previous day's winds. We also study methods for incorporating a discrete weather state variable in the generator. We evaluate the generators using a wide range of formal and informal methods. The best of these generators can capture many but not all of the complex features present in the observational data. In particular, the best of our approaches accurately mimic diurnal changes in wind volatility but struggle to match the observed distribution of extreme wind speeds.

2026-06-08T00:42:51Z Mingshi Cui Kevin Eng Justin T. Greene Zern Ke Abolfazl Sodagartojgi Zhiqiu Xia Gemma E. Moran Michael L. Stein http://arxiv.org/abs/2606.08693v1 An exploration into how susceptibility distribution misspecifications impact epidemic forecasting 2026-06-07T15:46:44Z

Heterogeneous susceptibility models for epidemic dynamics preferentially assume that individual susceptibility follows a gamma distribution, which permits analytical reduction to a low-dimensional system. However, the true empirical distributional form in any given population is unknown. Here we investigate the consequences of misspecifying the susceptibility distribution by comparing gamma and lognormal specifications in a Susceptible-Exposed-Infectious-Removed (SEIR) framework. When both distributions are matched on mean and coefficient of variation ($ν$), we find that their epidemic trajectories diverge once heterogeneity is moderate or high ($ν\gtrsim 1$), with the lognormal producing a later, larger peak and a greater final size. We then assess the impact of distributional misspecification on statistical inference. Using synthetic datasets, we fit correctly specified and misspecified models by maximum likelihood. In a default scenario, where inference is based on simulated data for a single epidemic, both models can reproduce the data by compensating through correlated shifts in heterogeneity and intervention parameters. When inference is based on two simulated epidemics, however, this compensation may be reduced by known constraints on how parameters are related across epidemics. In these cases, the correctly specified model recovers all parameters accurately, while the misspecified model tends to give biased estimates. These inference biases propagate into forecasts, but predictions remain relatively accurate compared with homogeneous models, which, for instance, more than double peak incidence in scenarios where $ν\approx 1$. We conclude that deviations resulting from the susceptibility distribution misspecifications assessed here are minor and encourage the adoption of heterogeneous models in future epidemic forecasting.

2026-06-07T15:46:44Z 18 pages, 8 figures, 4 tables Ibrahim Mohammed Chris Robertson M. Gabriela M. Gomes http://arxiv.org/abs/2606.08692v1 Logistic Credibility with Temporal Decay: Extending Bühlmann--Straub for Commercial Lines 2026-06-07T15:45:09Z

Bühlmann--Straub (B-S) credibility assigns each account a weight $Z_i = E_i/(E_i+K)$, where $K$ is a single portfolio-wide ratio. The formula assumes $K$ is the same for every account regardless of size, history length, or volatility, and that recent and older years carry equal weight. On a held-out US commercial auto dataset these assumptions fail: standard B-S applied to 96 companies produces a calibration slope of 29 for small accounts, a signature of severe under-crediting. We propose a joint framework that retains B-S interpretability while addressing these limitations. The credibility weight $Z_i$ is modelled as a logistic function of account characteristics; historical experience is discounted by an EWMA decay parameter $λ$ estimated from the data; and $Z$, $λ$, and the complement are optimised in a single likelihood pass. The framework formally nests Bühlmann--Straub as a special case, admitting a likelihood-ratio test for any proposed extension. On a two-year held-out test set the proposed model restores calibration (slope = 1.00) and reduces exposure-weighted prediction error by 38% (90% bootstrap interval: 26%--50%). A size gradient in the decay rate emerges ($\hatλ\approx 0.6$, $0.84$, $0.13$ for Small, Mid, Large) and replicates qualitatively on Other Liability. A simulation study confirms the mechanisms. The model requires only account-year summaries and delivers three transparent outputs: credibility weight, complement, and recommended renewal rate.
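As a rough illustration of the quantities named in this abstract, the sketch below computes the classical weight $Z_i = E_i/(E_i+K)$ and blends experience with a complement. The function names, the `lam**age` discounting form, and the example numbers are assumptions made for illustration; the abstract does not specify how the paper applies its EWMA decay or its logistic model for $Z_i$.

```python
def credibility_weight(exposure: float, k: float) -> float:
    """Classical Buhlmann-Straub weight Z = E / (E + K)."""
    if exposure < 0 or k <= 0:
        raise ValueError("exposure must be non-negative and k must be positive")
    return exposure / (exposure + k)


def ewma_experience(loss_ratios, exposures, lam):
    """Exposure-weighted mean with exponential discounting by age.

    Index 0 is the most recent year; a year that is `age` years old
    gets weight exposure * lam**age (an assumed form, not the paper's).
    Returns the discounted mean and the discounted total exposure.
    """
    weights = [e * lam ** age for age, e in enumerate(exposures)]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, loss_ratios)) / total
    return mean, total


def credibility_estimate(experience: float, complement: float, z: float) -> float:
    """Blend account experience with the complement using weight z."""
    return z * experience + (1.0 - z) * complement


# Hypothetical account: two years of loss ratios, most recent first.
mean, eff_exposure = ewma_experience([0.5, 1.0], [10.0, 10.0], lam=0.5)
z = credibility_weight(eff_exposure, k=15.0)
print(credibility_estimate(mean, complement=0.9, z=z))
```

With `lam = 1` the discounting disappears and the mean reduces to the plain exposure-weighted average, which is the equal-weight assumption the abstract attributes to standard B-S.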

2026-06-07T15:45:09Z 68 pages, 18 figures Jake Morris http://arxiv.org/abs/2606.08660v1 Active Learning with Bayesian Reasoning: A POGIL-Based Pedagogy in Introductory Statistics 2026-06-07T15:02:43Z

We introduce a Process Oriented Guided Inquiry Learning (POGIL)-style activity for teaching Bayesian reasoning in introductory statistics through conditional probability, Bayes' theorem, and belief updating. The activity is self-contained, uses hand-computable probabilities organized in two-way tables, and engages students in structured team roles. We evaluated the activity in four sections of an undergraduate introductory statistics course using a quasi-experimental comparison of POGIL-style and lecture-based instruction for a Bayes' theorem unit. Outcomes included student performance on Bayes' theorem final exam questions and satisfaction with instruction. We used a Bayesian bivariate generalized linear model to compare the two approaches while accounting for major type, gender, and race. The results indicated similar exam performance and similar probabilities of high satisfaction across instructional styles and demographic groups, with considerable uncertainty and no clear evidence of meaningful differences. These findings suggest that the POGIL-style activity performed comparably to lecture-based instruction for this unit while offering an active and classroom-ready way to introduce Bayesian reasoning without requiring difficult computation or simulation. We provide adaptable instructional materials and a reproducible Bayesian analytic framework for evaluating active learning innovations in introductory statistics. Our study supports the feasible inclusion of Bayesian reasoning in introductory courses and may help instructors considering active learning.

2026-06-07T15:02:43Z Cheng-Han Yu Angela Ebeling http://arxiv.org/abs/2606.08654v1 Operator learning for the 2D incompressible Navier-Stokes equations: a conformal prediction approach in the data-scarce regime 2026-06-07T14:49:37Z

In this paper, we propose a perturbation-based conformal prediction framework for uncertainty quantification in operator learning, with a focus on the 2D Navier--Stokes equations. While neural operators provide fast surrogates for expensive PDE solvers, they do not by themselves provide calibrated uncertainty for spatiotemporal field predictions. Our approach wraps a trained Fourier Neural Operator (FNO) with split conformal prediction and constructs the local uncertainty scale by comparing the predictions of two operators trained on nearly identical datasets: one on the original labels and one on labels perturbed by small Gaussian noise. We consider this procedure in the data-scarce regime, where the total label budget is fixed and methods that require a separate uncertainty network must divide training data between multiple models. On the 2D Navier--Stokes benchmark, the perturbation-based method produces substantially narrower conformal bands than existing methods under matched total data budgets while maintaining the target simultaneous coverage. These results suggest that perturbation sensitivity is a practical and sample-efficient uncertainty proxy for conformalized neural operators.
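The split conformal step described above can be sketched as follows, assuming fields are flattened to arrays of shape `(n_samples, n_points)` and that the local scale is the absolute difference between the two operators' predictions plus a small constant. The function names, the `eps` floor, and the sup-norm score are illustrative assumptions, not details confirmed by the abstract.

```python
import numpy as np


def local_scale(pred_original, pred_perturbed, eps=1e-8):
    """Pointwise uncertainty scale from the disagreement of two predictors."""
    return np.abs(pred_original - pred_perturbed) + eps


def sup_score(y_true, y_pred, scale):
    """One score per sample: the largest scaled error over the whole field."""
    return np.max(np.abs(y_true - y_pred) / scale, axis=-1)


def conformal_quantile(scores, alpha):
    """Finite-sample split conformal quantile at miscoverage level alpha."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # Too few calibration samples for this alpha: the band is unbounded.
    return np.inf if k > n else scores[k - 1]


def conformal_band(y_pred, scale, q):
    """Band intended to cover the whole field simultaneously."""
    return y_pred - q * scale, y_pred + q * scale
```

Because each calibration score is a maximum over all grid points, a test field lies inside the band everywhere exactly when its own score is at most `q`, which is what turns the usual marginal guarantee into a simultaneous one.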

2026-06-07T14:49:37Z Weinan Wang Bowen Gang Hao Deng http://arxiv.org/abs/2606.08642v1 A Practical Framework for Sensitivity Analysis in Externally Controlled Trials: An Illustration with a Bayesian Hybrid Evidence Synthesis Case Study 2026-06-07T14:14:50Z

Externally controlled trials (ECTs), including single-arm studies augmented with historical data and hybrid randomized designs with partial external augmentation, are increasingly used when concurrent randomized controls are infeasible or unethical. Regulatory guidance from the FDA, EMA, and NMPA calls for sensitivity analysis of borrowing assumptions, yet provides no structured template for which analyses to run or how to interpret them together. We propose a three-pillar framework organized around three questions: was the borrowing appropriate, did it contribute meaningful value, and are the conclusions robust to perturbation? The framework comprises eight modular analyses covering heterogeneity diagnostics, source influence, no-borrowing references, effective sample size, prior sensitivity, tipping points, alternative borrowing methods, and structural model sensitivity. It is method-agnostic and applies to both Bayesian and frequentist borrowing in patient-level or hybrid settings. We illustrate the framework using simulated data that mimic a hybrid evidence synthesis from a historical approval of an ethnic-bridging submission under a real-world-evidence regulatory pathway. That original analysis combined individual patient data from a global pivotal study and a regional real-world study with aggregate data from two published cohorts, fitted via a Bayesian longitudinal model with ethnic-difference parameters. The worked example provides a reproducible template for sensitivity analysis in ECT submissions.

2026-06-07T14:14:50Z Xuemin Gu Kitty Guo Jane Zhang http://arxiv.org/abs/2602.05553v2 Sensitivity analysis for contamination in egocentric-network randomized trials with interference 2026-06-07T10:51:15Z

Egocentric-Network Randomized Trials (ENRTs) are increasingly used to estimate causal effects under interference when measuring complete sociocentric network data is infeasible. ENRTs rely on egocentric network sampling, where a set of egos is first sampled, and each ego recruits a subset of its neighbors as alters. Treatments are then randomized across egos. While the observed ego-networks are disjoint by design, the underlying population network may contain edges connecting them, leading to contamination. Under a design-based framework, we show that the Horvitz-Thompson estimators of direct and indirect effects are biased whenever contamination is present. To address this, we derive bias-corrected estimators and propose a novel sensitivity analysis framework based on sensitivity parameters representing the probability or expected number of missing edges. This framework is implemented via both grid sensitivity analysis and probabilistic bias analysis, providing researchers with a flexible tool to assess the robustness of the causal estimators to contamination. We apply our methodology to the HIV Prevention Trials Network 037 study, finding that ignoring contamination may lead to underestimation of indirect effects and overestimation of direct effects.

2026-02-05T11:23:23Z Bar Weinstein Daniel Nevo http://arxiv.org/abs/2602.17640v4 huff: A Python package for Market Area Analysis 2026-06-07T08:50:02Z

Market area models, such as the Huff model and its extensions, are widely used to estimate regional market shares and customer flows of retail and service locations. Another, now very common, area of application is the analysis of catchment areas, supply structures and the accessibility of healthcare locations. The huff Python package provides a complete workflow for market area analysis, including data import, construction of origin-destination interaction matrices, basic model analysis, parameter estimation from empirical data, calculation of distance or travel time indicators, and map visualization. Additionally, the package provides several methods of spatial accessibility analysis. The package is modular and object-oriented. It is intended for researchers in economic geography, regional economics, spatial planning, marketing, geoinformation science, and health geography. The software is openly available via the Python Package Index (PyPI) (https://pypi.org/project/huff/); its development and version history are managed in a public GitHub Repository (https://github.com/geowieland/huff_official) and archived at Zenodo (https://doi.org/10.5281/zenodo.18639559).
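For readers unfamiliar with the basic Huff model, the following is a minimal NumPy sketch of its textbook form, in which the probability that customers at origin $i$ choose location $j$ is $A_j^{\gamma} d_{ij}^{-\lambda}$ divided by the same quantity summed over all locations. It deliberately does not use the huff package's API; the function name, the default exponents, and the example numbers are assumptions for illustration only.

```python
import numpy as np


def huff_probabilities(attraction, distance, gamma=1.0, lam=2.0):
    """Textbook Huff model choice probabilities.

    attraction: shape (n_destinations,), e.g. sales area of each store.
    distance:   shape (n_origins, n_destinations), distance or travel time.
    Returns an (n_origins, n_destinations) matrix whose rows sum to 1.
    """
    attraction = np.asarray(attraction, dtype=float)
    distance = np.asarray(distance, dtype=float)
    utility = attraction ** gamma * distance ** (-lam)
    return utility / utility.sum(axis=1, keepdims=True)


# Two equally attractive stores; the second origin is twice as far from store 2.
shares = huff_probabilities([100.0, 100.0], [[1.0, 1.0], [1.0, 2.0]])
population = np.array([1000.0, 500.0])       # hypothetical customers per origin
expected_flows = shares * population[:, None]  # origin-destination customer flows
print(shares)
print(expected_flows.sum(axis=0))              # expected customers per store
```

Multiplying each row of probabilities by the origin's population gives an origin-destination interaction matrix of expected flows, and column sums give each location's expected total.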

2026-02-19T18:52:46Z v1.2.1; added references, update of scientific usage and PyPI usage statistics Thomas Wieland http://arxiv.org/abs/2606.08407v1 Topological Effective Connectivity Modeling in Brain Networks 2026-06-07T02:11:09Z

Characterizing directed information flow in brain networks is difficult because neural circuits are full of recurrent feedback loops. Many existing tools for directed dependence assume a directed acyclic graph (DAG) structure to resolve directional ambiguity, and therefore cannot represent these loops. We present a nonparametric, information-theoretic framework that addresses this by coupling the discrete Hodge decomposition with lead-lag mutual information, splitting the resulting edge flow into three orthogonal components: a gradient term capturing hierarchical, feed-forward relationships; a curl term isolating triangle-level feedback loops; and a harmonic term capturing cyclic flow around topological holes. This separation makes it possible to disentangle feed-forward drive from recurrent circulation, which conventional measures conflate. We further develop a permutation-based hypothesis-testing layer that identifies nodes and triangular motifs whose information-flow signatures change significantly between conditions. We validate the framework on simulations with known ground-truth structure and apply it to local field potential recordings from a rodent model of focal ischemic stroke. In three of four animals, we find a post-stroke shift toward hierarchical, source-driven propagation at the expense of recurrent feedback, while the fourth shows no significant change.
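The gradient part of a discrete Hodge decomposition can be illustrated with a small least-squares sketch. The toy graph, the edge-flow values, and the function names below are assumptions; the paper builds its edge flow from lead-lag mutual information, which is not reproduced here, and separating the remainder into curl and harmonic parts would additionally need a triangle boundary operator that this sketch omits.

```python
import numpy as np


def incidence_matrix(n_nodes, edges):
    """Edge-by-node matrix: row (i, j) has -1 at node i and +1 at node j."""
    B = np.zeros((len(edges), n_nodes))
    for row, (i, j) in enumerate(edges):
        B[row, i] = -1.0
        B[row, j] = 1.0
    return B


def gradient_component(flow, B):
    """Project an edge flow onto gradient flows B @ potential (least squares)."""
    potential, *_ = np.linalg.lstsq(B, flow, rcond=None)
    return B @ potential, potential


edges = [(0, 1), (1, 2), (0, 2)]            # a single triangle, oriented i -> j
B = incidence_matrix(3, edges)

feed_forward = B @ np.array([0.0, 1.0, 2.0])  # flow induced by a node potential
circulation = np.array([1.0, 1.0, -1.0])      # flow around the loop 0 -> 1 -> 2 -> 0
flow = feed_forward + circulation

gradient, potential = gradient_component(flow, B)
remainder = flow - gradient                   # curl plus harmonic parts combined
print(gradient, remainder)
```

In this toy case the projection returns the feed-forward flow as the gradient part and leaves the pure circulation in the remainder, which mirrors the separation of hierarchical drive from recurrent flow that the abstract describes.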

2026-06-07T02:11:09Z 45 pages, 15 figures Anass El-Yaagoubi Moo K. Chung Hernando Ombao http://arxiv.org/abs/2411.03026v3 Robust Market Interventions 2026-06-06T21:54:21Z

When can interventions in markets be designed to increase surplus robustly -- i.e., with high probability -- accounting for uncertainty due to imprecise information about economic primitives? In a setting with many strategic firms, each possessing some market power, we present conditions for such interventions to exist. The key condition, recoverable structure, requires large-scale complementarities among families of products. The analysis works by decomposing the incidence of interventions in terms of principal components of a Slutsky matrix. Under recoverable structure, a noisy signal of this matrix reveals enough about these principal components to design robust interventions. Our results demonstrate the usefulness of spectral methods for analyzing imperfectly observed strategic interactions with many agents.

2024-11-05T11:49:11Z Andrea Galeotti Benjamin Golub Sanjeev Goyal Eduard Talamàs Omer Tamuz http://arxiv.org/abs/2606.05450v2 Eigenvector Spatial Filters Nuclear Norm Matrix Completion with Application to Air Quality Data 2026-06-06T19:32:05Z

Reliable reconstruction of missing observations in environmental panel datasets is essential for accurate exposure assessment and policy analysis. Traditional nuclear norm matrix completion methods effectively impute missing entries in low-rank matrices, yet often overlook the spatial dependence inherent to air quality processes. This paper introduces the Eigenvector Spatial Filters Nuclear Norm Matrix Completion (ESFNNMC) method, an extension of nuclear norm fixed-effects matrix completion that replaces unit-specific intercepts with a set of Moran-type eigenvectors capturing spatial autocorrelation in the data. To estimate the model, we propose a Block-Coordinate Descent (BCD) approach for multiconvex optimization problems, with soft-thresholded singular value decomposition and cross-validated regularization. Through comprehensive simulations varying missingness patterns, the level of spatial and temporal autocorrelation, and the dimension, shape, and rank structure of the matrices, ESFNNMC demonstrates substantial improvements in imputation accuracy over the standard fixed-effects approach, while keeping the computational cost approximately unchanged. The method is applied to impute missing entries in daily PM10 measurements at 64 monitoring stations in Lombardy, Italy, during the year 2021.
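The soft-thresholded singular value decomposition mentioned above is the standard proximal step for a nuclear norm penalty. The sketch below shows that single step in NumPy; the function name and the threshold `tau` are illustrative, and the paper's full block-coordinate descent with spatial filters and cross-validated regularization is not reproduced.

```python
import numpy as np


def svd_soft_threshold(matrix, tau):
    """Shrink every singular value by tau, truncating at zero.

    This is the proximal operator of tau * (nuclear norm): it returns the
    matrix X minimizing 0.5 * ||X - matrix||_F**2 + tau * ||X||_*.
    """
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt


M = np.diag([3.0, 1.0])
L = svd_soft_threshold(M, tau=2.0)
print(L)                          # singular values 3 and 1 become 1 and 0
print(np.linalg.matrix_rank(L))   # the shrinkage lowers the rank
```

Singular values below `tau` are set to zero, which is how the penalty encourages a low-rank estimate.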

2026-06-03T21:11:18Z 29 pages, 5 figures, 14 tables, draft version (do not cite yet) Rodolfo Metulini http://arxiv.org/abs/2606.08261v1 Sparse Longitudinal Functional Principal Component Analysis for Episodic Ambulatory Behavioral Assessments 2026-06-06T17:16:37Z

Accurately monitoring mental fatigue is critical for improving workplace safety and productivity. A recent study examined unobtrusively collected smartphone typing speed as a potential ambulatory proxy assessment of mental fatigue using data from the Intern Health Study (IHS). While population-level average typing speed patterns were found to be consistent with validated measures of mental fatigue, how these trajectories vary across participants and days may inform opportune moments for just-in-time interventions and remains an open question. Treating typing speed trajectories as sparsely observed functional data, we propose a novel sparse longitudinal functional principal component analysis (sparse LFPCA) method for decomposing variability and predicting individual curves. Specifically, sparse data are accommodated by casting covariance estimation as a structured penalized spline regression problem, enabling simultaneous estimation and smoothing of multiple covariance components while borrowing information across locations in the functional domain. Simulations show that sparse LFPCA (1) accurately estimates eigenfunctions and generates reasonable predictions for underlying curves, and (2) achieves similar or superior performance compared to existing alternatives. Our analysis of typing speed data collected from IHS reveals new and interpretable participant- and day-level patterns not captured by previous analyses and can be used to tailor behavioral interventions.

2026-06-06T17:16:37Z Nidhi Pai Yu Fang Srijan Sen Zhenke Wu Erjia Cui http://arxiv.org/abs/2606.08114v1 Robust applicability of continuous dynamical decoupling to decoherence reduction in longitudinal and transverse-noise settings: The role of anisotropy 2026-06-06T11:35:04Z

We analytically evaluate the efficiency of continuous dynamical decoupling (CDD) in curbing decoherence in generic qubit setups where diverse sources of noise can be present. Previous theoretical approaches to CDD have mainly focused on its potential to cope with longitudinal fluctuations. Here, the basic scenario tackled with CDD is generalized. Apart from dealing with pure dephasing induced by diagonal noise, we consider the impact of transverse fluctuations, which are usually present in practical arrangements. In particular, the implications of anisotropic noisy inputs are studied. Additionally, we analyze the role of the fluctuations in the dressing of the qubit by the CDD control field: since the driving field is usually switched on through linear ramps of its characteristic parameters, the associated dressing of the original states can be described in terms of noisy Landau-Zener transitions. In our approach, based on a sequence of unitary transformations, the noise entering the system is cast into effective stochastic terms whose spectral characteristics depend on the driving parameters. This description allows the design of strategies to mitigate the impact of the fluctuations using controlled changes in the effective-noise properties. Significant robustness of CDD against the generalization of the basic scenario can be achieved through an appropriate choice of the control parameters.

2026-06-06T11:35:04Z Phys. Rev. A 113, 062412 (2026) S. Afonso J. M. Gomez Llorente J. Plata 10.1103/gb82-y4z3