https://arxiv.org/api/rlrU68uiZe4pnpHDdHbTbJdXev4 2026-06-13T23:24:56Z 36171 690 15 http://arxiv.org/abs/2402.13472v3 Generalized linear models with spatial dependence and a functional covariate 2026-05-21T14:17:35Z

We extend generalized functional linear models under independence to a situation in which a functional covariate is related to a scalar response variable that exhibits spatial dependence, a complex yet prevalent phenomenon. For estimation, we apply basis expansion and truncation for dimension reduction of the covariate process, followed by a composite likelihood estimating equation to handle the spatial dependence. We establish asymptotic results for the proposed model under a repeating lattice asymptotic context, allowing us to construct a confidence interval for the spatial dependence parameter and a confidence band for the regression parameter function. A binary conditionals model with functional covariates is presented as a concrete illustration and is used in simulation studies to verify the applicability of the asymptotic inferential results. We apply the proposed model to a problem in which the objective is to relate annual corn yield in counties of Midwestern US states to daily maximum temperatures from April to September in those same geographic regions. The extension to an expanding lattice context is further discussed in the supplement.
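A minimal sketch of the two estimation ingredients, assuming curves observed on a common equally spaced grid, a user-supplied truncated basis, and a simple uncentered autologistic model as the binary conditionals model. The function names `basis_scores` and `neg_log_pseudolik` are hypothetical, and the paper's exact parameterization (for example a centered autologistic form) and its estimating equation may differ. The negative log pseudo-likelihood shown here is one common composite likelihood.

```python
import numpy as np


def basis_scores(curves, basis, dt):
    """Truncated basis scores of discretized curves via a Riemann sum.

    curves: (n_sites, n_grid) covariate curves on a common equally spaced grid
    basis:  (K, n_grid) first K basis functions on the same grid
    returns (n_sites, K) scores approximating integral X_i(t) phi_k(t) dt
    """
    return curves @ basis.T * dt


def neg_log_pseudolik(params, y, scores, neighbors):
    """Negative log pseudo-likelihood of a simple (uncentered) autologistic model.

    params = (alpha, eta, beta_1, ..., beta_K). The conditional log-odds at site i
    is alpha + scores[i] @ beta + eta * (sum of y over the neighbors of site i).
    """
    alpha, eta, beta = params[0], params[1], np.asarray(params[2:])
    nb_sum = np.array([y[nb].sum() for nb in neighbors], dtype=float)
    logit = alpha + scores @ beta + eta * nb_sum
    # log p(y_i | neighbors) = y_i * logit_i - log(1 + exp(logit_i))
    return -np.sum(y * logit - np.logaddexp(0.0, logit))
```

In practice the parameters could be estimated by passing `neg_log_pseudolik` to a general optimizer such as `scipy.optimize.minimize`; the spatial dependence parameter is `eta`, and the coefficient function is recovered as the basis functions weighted by `beta`.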

2024-02-21T02:02:11Z Sooran Kim Mark S. Kaiser Xiongtao Dai http://arxiv.org/abs/2508.12085v2 Unified Conformalized Multiple Testing with Full Data Efficiency 2026-05-21T12:23:48Z

Conformalized multiple testing offers a model-free way to control predictive uncertainty in decision-making. Existing methods typically use only part of the available data to build score functions tailored to specific settings. We propose a unified framework that puts data utilisation at the centre: it uses all available data (null, alternative, and unlabelled) to construct scores and calibrate p-values through a full permutation strategy. This unified use of all available data significantly improves power by enhancing non-conformity score quality and maximising calibration set size while rigorously controlling the false discovery rate. Crucially, our framework provides a systematic design principle for conformal testing and enables automatic selection of the best conformal procedure among candidates without extra data splitting. Extensive numerical experiments demonstrate that our enhanced methods deliver superior efficiency and adaptability across diverse scenarios.
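For orientation, the standard split-conformal baseline that such frameworks build on can be sketched as follows: conformal p-values computed from null calibration scores, followed by the Benjamini–Hochberg step-up procedure for false discovery rate control. This is a generic sketch, assuming larger scores indicate stronger evidence against the null; it does not implement the paper's full permutation strategy or its use of alternative and unlabelled data.

```python
import numpy as np


def conformal_pvalues(calib_null_scores, test_scores):
    """Split-conformal p-values; a larger score means more evidence against the null."""
    calib = np.asarray(calib_null_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    n = calib.size
    # p_j = (1 + #{i : calib_i >= test_j}) / (n + 1)
    counts = (calib[None, :] >= test[:, None]).sum(axis=1)
    return (1.0 + counts) / (n + 1.0)


def benjamini_hochberg(pvals, alpha):
    """Boolean rejection mask from the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest sorted index meeting its bound
        reject[order[: k + 1]] = True
    return reject
```

Because the procedure is step-up, a p-value above its own threshold can still be rejected when a larger p-value meets its bound.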

2025-08-16T15:45:29Z Yuyang Huo Xiaoyang Wu Changliang Zou Haojie Ren http://arxiv.org/abs/2605.22354v1 From Volterra Series to Kunchenko Stochastic Polynomials: Half a Century of Non-Gaussian Estimation Methodology 2026-05-21T11:42:32Z

This paper reconstructs the half-century evolution of the scientific school founded by Yuriy P. Kunchenko (1939–2006) as the development of a semiparametric methodology for non-Gaussian estimation. Starting with Kunchenko's 1972/1973 dissertation, which applied Volterra series to estimate parameters of random processes, the paper traces this trajectory through 2006–2026. Kunchenko stochastic polynomials are presented as a coherent family of moment-cumulant procedures: the polynomial maximization method (PMM) for parameter estimation, polynomial criteria for hypothesis testing, and decomposition in spaces with a generating element. The paper details the school's structure: a verified genealogy of 15 defended dissertations, collaborations in Poland, Slovakia, and Germany, and the R package EstemPMM. A recent 2026 paper on Volterra-based signal processing is analyzed, showing how Kunchenko's nonlinear formulation reappears in applied radio engineering. We build a formal bridge between finite Volterra models and generalized Kunchenko polynomials, while separating the MMSE/L2 criterion from PMM: the former is a covariance projection for kernel adaptation, whereas PMM is a parameter-dependent moment procedure. PMM efficiency claims are stated conditionally: gains require that moments exist, the centered correlant matrix is nondegenerate, and the variance reduction coefficient is below one. The concluding research program operationalizes the historical reconstruction into testable statistical and signal-processing tasks.

2026-05-21T11:42:32Z Bilingual submission: English followed by Ukrainian translation Serhii Zabolotnii http://arxiv.org/abs/2605.22352v1 Spatiotemporal dynamics and ecological risk factors of highly pathogenic avian influenza A(H5N1) in Canadian wildlife: A One Health surveillance analysis 2026-05-21T11:41:20Z

Highly pathogenic avian influenza A(H5N1) has expanded geographically and ecologically, affecting wild birds, mammalian wildlife, domestic animals, and humans. Wildlife surveillance provides critical early warning for One Health preparedness, yet national-scale analyses integrating host ecology, spatial patterns, seasonality, viral lineage, and risk factors remain limited. This study analysed Canadian wildlife HPAI A(H5N1) surveillance records from 2022 to 2026 to characterise spatiotemporal dynamics and identify factors associated with detection counts. A retrospective analysis of 2,657 detections across 13 provinces and territories was conducted using descriptive epidemiology, spatial clustering methods, and Negative Binomial mixed models. Detections were predominantly avian, with waterfowl and raptors as the major host groups, while mammals accounted for a smaller but epidemiologically important proportion. Detection burden was highest in 2022, with increased activity in autumn and spring. Ontario, Alberta, and British Columbia were identified as major hotspots, with evidence of local clustering in parts of the Prairie region. Reassortant Eurasian-North American lineages dominated detections and were strongly associated with higher detection counts. Modelling results identified year, season, and lineage as key predictors. These findings support risk-based One Health surveillance prioritising high-burden regions, migration-associated periods, key avian host groups, reassortant viral lineages, and continued monitoring of mammalian wildlife.

2026-05-21T11:41:20Z Hammed Olawale Fatoyinbo Hoyeon Jeong http://arxiv.org/abs/2605.22301v1 Chained Markov melding using divide and conquer sequential Monte Carlo 2026-05-21T10:47:41Z

Specifying a full Bayesian model that integrates multiple data sources can be challenging. One natural approach is to specify each individual model separately and join them afterwards. This is the approach adopted in Markov melding. However, when adjacent submodels share common quantities, as in chained Markov melding, posterior inference can be challenging for existing MCMC-based approaches. In this paper, we propose a new multi-stage sampler for chained Markov melding models involving an arbitrary number of submodels. The proposed sampler adopts a divide-and-conquer sequential Monte Carlo approach for the tree-structured model that fits naturally with the structure of chained Markov melding. The resulting multi-stage sampler provides a flexible alternative for sampling from complex joint models, as its separate sampling scheme for different submodels avoids the need for directly sampling from the full model. We demonstrate applications of the sampler through two examples. The first is a toy example involving 11 submodels of various types. The second example considers an integrated population model from ecology that combines multiple datasets to estimate immigration and reproduction rates.

2026-05-21T10:47:41Z Yixuan Liu Robert J. B. Goudie http://arxiv.org/abs/2605.22253v1 Bayesian Nonparametrics: Principles and Practice 2026-05-21T09:59:21Z

This extended preface [to the book 'Bayesian Nonparametrics', Cambridge University Press, 2010, by NL Hjort, CC Holmes, P Mueller, SG Walker] is meant to explain why you are right to be curious about Bayesian nonparametrics: why you may actually need it and how you can manage to understand it and use it. The preface also serves as an introductory chapter, giving an overview of the aims and contents of the book. We also explain the background for how the book came into existence, delve briefly into the history of the still relatively young field of Bayesian nonparametrics, and offer some concluding remarks pertaining to various challenges and likely future developments of the area.

2026-05-21T09:59:21Z 16 pages, no figures. This is the authors' extended preface to the book Bayesian Nonparametrics (Cambridge University Press, 2010), published there in modified form; it sketches the history of Bayesian Nonparametrics and points to developments, application domains, etc. Nils Lid Hjort Chris Holmes Peter Mueller Stephen G. Walker http://arxiv.org/abs/2605.22110v1 Two-stage Ensemble Clustering of Functional Data Using Random Projections 2026-05-21T07:43:03Z

We propose a computationally simple framework for clustering functional data based on Gaussian-process-generated random projections. In this approach, each curve is first projected onto a large collection of independent Gaussian process realizations. The resulting high-dimensional representations are clustered using the Mean Absolute Difference of Distances (MADD), a dissimilarity measure well suited for high-dimensional settings. A population-level analysis of this dissimilarity provides insight into how random projections help capture distributional differences between functional populations. We introduce a second stage of clustering to additionally leverage data-driven projection directions. Thus, in Stage I, an initial clustering is obtained using a set of prespecified projection families. In Stage II, this partition is refined by constructing Gaussian random projections based on an estimated covariance operator that uses the Stage I cluster labels. Finally, a normalized cost function is used to select the optimal clustering among candidate solutions. The proposed clustering algorithm is broadly applicable to diverse functional data regimes, including irregular and partially observed data. Through extensive simulations and real-data applications, we show that the proposed method achieves a high degree of accuracy and outperforms many state-of-the-art methods across a wide range of functional data settings.
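The Stage I representation can be sketched as follows, assuming curves on a common equally spaced grid, Brownian motion as the (hypothetical) choice of Gaussian process, and the usual MADD definition, in which the dissimilarity between points i and j is the average over all other points k of the absolute difference between the distances from i to k and from j to k. The paper's projection families, Stage II refinement, and cost function are not reproduced, and it may rescale distances differently.

```python
import numpy as np


def random_projections(curves, dt, n_proj, rng):
    """Project discretized curves onto Brownian-motion sample paths (one GP choice)."""
    n_grid = curves.shape[1]
    # Brownian paths on the grid: cumulative sums of N(0, dt) increments
    paths = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_proj, n_grid)), axis=1)
    # Riemann-sum approximation of the inner products <X_i, W_r>
    return curves @ paths.T * dt  # shape (n_curves, n_proj)


def madd(z):
    """Mean Absolute Difference of Distances between the rows of z (needs n >= 3)."""
    n = z.shape[0]
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)  # Euclidean distances
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mask = np.ones(n, dtype=bool)
            mask[[i, j]] = False  # average over k different from i and j
            out[i, j] = out[j, i] = np.abs(d[i, mask] - d[j, mask]).mean()
    return out
```

The resulting dissimilarity matrix can then be passed to any clustering routine that accepts precomputed dissimilarities, such as hierarchical clustering.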

2026-05-21T07:43:03Z 32 pages, 6 figures, 7 tables Sourav Chakrabarty Anirvan Chakraborty Shyamal K. De http://arxiv.org/abs/2605.22038v1 A Mixed Self-Exciting Process to Model Epileptic Seizures 2026-05-21T06:21:35Z

Epilepsy is a neurological disorder characterized by recurrent seizures affecting more than 70 million people worldwide. Often, an individual with epilepsy is more likely to experience subsequent seizures following an initial seizure, a process we call seizure clustering. Motivated by seizure diary data collected over three years from 407 individuals newly diagnosed with focal epilepsy in the Human Epilepsy Project (HEP), we propose a Bayesian mixed Hawkes process model that addresses seizure clustering and heterogeneity between individuals. In the Hawkes process, the intensity is accelerated each time an event occurs, through the composition of background and excitation intensity functions. The proposed model incorporates a Weibull baseline intensity to model a trend in background seizure rates over time, while the excitation process accounts for seizure clustering within individuals. We model heterogeneity among individuals by including covariates and random effects in both the background and excitation intensities. In the HEP study, the average time between primary and secondary seizures within an individual is 1.57 (95% CrI: 1.43, 1.70) days, with an average of 2.20 (1.96, 2.47) seizures per cluster. We demonstrate that omitting random effects in the presence of heterogeneity leads to underestimation of the background intensity and overestimation of excitation rates.
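A minimal sketch of a Hawkes conditional intensity with a Weibull-type baseline, assuming an exponential excitation kernel g(u) = alpha * beta * exp(-beta * u), so that alpha is the branching ratio (expected number of direct offspring per event). The kernel, the parameter names `nu`, `k`, `alpha`, `beta`, and the omission of covariates and random effects are all illustrative assumptions, not the paper's specification.

```python
import numpy as np


def hawkes_intensity(t, events, nu, k, alpha, beta):
    """Conditional intensity at time t > 0 given past event times.

    baseline mu(t) = nu * k * t**(k - 1)   (Weibull hazard shape: k > 1 rising, k < 1 falling)
    kernel   g(u)  = alpha * beta * exp(-beta * u) for u > 0, which integrates to alpha
    """
    events = np.asarray(events, dtype=float)
    past = events[events < t]  # only events strictly before t excite the process
    baseline = nu * k * t ** (k - 1.0)
    excitation = np.sum(alpha * beta * np.exp(-beta * (t - past)))
    return baseline + excitation


def expected_cluster_size(alpha):
    """Mean events per cluster (initial event plus all descendants), for alpha < 1."""
    return 1.0 / (1.0 - alpha)
```

Under these assumptions the mean delay from a parent event to a direct offspring is 1 / beta, and a mean cluster size c corresponds to alpha = 1 - 1/c; reading 2.20 seizures per cluster this way would give alpha of about 0.545. This is only an illustrative reading of the reported summary, not a result stated in the paper. In a mixed version, `nu` and `alpha` would typically carry individual-level random effects on the log scale.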

2026-05-21T06:21:35Z 35 pages, 5 figures, 33 pages supplementary material Karen Kanaster Giovani L. Silva Peter Mueller Jacob Pellinen Elizabeth Juarez-Colunga http://arxiv.org/abs/2605.22025v1 Testing for Serial Independence via Auto Hilbert-Schmidt Independence Criterion 2026-05-21T05:50:26Z

We develop a Hilbert–Schmidt independence criterion (HSIC)-based framework for testing serial independence in strictly stationary time series. The proposed auto Hilbert–Schmidt independence criterion (AutoHSIC) measures dependence between an observation and its lagged counterpart, providing a kernel-based approach to detecting nonlinear serial dependence. The empirical AutoHSIC statistic is a lagged U-statistic constructed from overlapping observations, and hence inherits temporal dependence even under the i.i.d. null. Its asymptotic analysis therefore differs from standard i.i.d. HSIC theory and must account for degeneracy under the null. We establish the limiting behaviour of the resulting single-lag and portmanteau tests under the null and under fixed alternatives. Since the limiting null distribution is non-pivotal, we develop a wild bootstrap procedure for critical value approximation and prove its asymptotic validity. The framework is further extended to residual-based model diagnostics, where parameter estimation affects the null distribution. Simulations and empirical applications illustrate its ability to detect nonlinear serial dependence in multivariate, functional and matrix time series.
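To make the lagged construction concrete, here is a sketch of a lag-h HSIC between X_t and X_{t+h}, assuming Gaussian kernels with a fixed bandwidth. For brevity it uses the biased V-statistic form tr(KHLH)/m^2 rather than the U-statistic in the abstract, and it omits the wild bootstrap calibration, so it yields a statistic but not a p-value. The function names are hypothetical.

```python
import numpy as np


def gaussian_gram(x, bandwidth):
    """Gaussian-kernel Gram matrix for rows of x (1-D input treated as one column)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=2)
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def auto_hsic(series, lag, bandwidth=1.0):
    """Biased (V-statistic) HSIC between X_t and X_{t+lag}, for lag >= 1."""
    x = np.asarray(series, dtype=float)
    a, b = x[:-lag], x[lag:]  # overlapping pairs (X_t, X_{t+lag})
    m = len(a)
    K, L = gaussian_gram(a, bandwidth), gaussian_gram(b, bandwidth)
    H = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    return np.trace(K @ H @ L @ H) / m ** 2


def portmanteau(series, max_lag, bandwidth=1.0):
    """Sum of single-lag statistics over lags 1..max_lag."""
    return sum(auto_hsic(series, h, bandwidth) for h in range(1, max_lag + 1))
```

Because the overlapping pairs share observations, the usual i.i.d. HSIC null approximations do not apply directly, which is why the abstract calls for a dedicated asymptotic analysis and bootstrap.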

2026-05-21T05:50:26Z Muyi Li Yuqing Xu Zhou Zhou http://arxiv.org/abs/2605.22004v1 Selecting Informative Conformal Prediction Sets with an Optimized FCR-Controlled Approach 2026-05-21T05:02:35Z

Conformal methods provide prediction sets for outcomes with confidence guarantees. We study their use in a selective inference setting, where inference is performed only when the prediction set is informative. The analyst may consider as informative, for example, cases with prediction sets that are sufficiently small, exclude null values, or satisfy other appropriate monotone constraints. Because inference is typically restricted to informative cases in practical applications, accounting for the resulting selection bias is crucial to maintaining false coverage rate (FCR) control. A general framework for constructing such informative conformal prediction sets while controlling the FCR on the selected sample was suggested in Gazin et al. (2025). In this work we focus on oracle-guided procedures. We derive the optimal decision policy under a suitable power objective in the oracle setting where the probability of belonging to each prediction set can be computed. In practice, of course, only estimated probabilities are available. We therefore introduce a calibration procedure that adjusts the oracle policy to maintain finite sample FCR control. We show that this approach can achieve substantially higher power than available alternatives. We demonstrate the effectiveness of our new methods for classification outcomes on both real and simulated data.

2026-05-21T05:02:35Z Israela Solomon Etienne Roquain Saharon Rosset Ruth Heller http://arxiv.org/abs/2606.02589v1 Rashomon-Seeded Annealing for Robust Bayesian Inference in Factorial Designs 2026-05-21T05:01:39Z

Integrating over model uncertainty in factorial designs via Bayesian model averaging is hindered by the combinatorial explosion of interpretable interaction effects, often yielding a multimodal posterior, where standard Markov chain Monte Carlo algorithms encounter significant convergence issues. We propose a general computational framework that repurposes Rashomon sets, collections of high-performing models traditionally valued for prediction and interpretability, as a strategic "warm start" for estimating the full posterior. Our method, Rashomon-seeded annealing, initializes annealed importance sampling (AIS) by anchoring the starting density within these pre-identified, high-evidence regions while preserving global support over the entire model space. Rather than restricting inference to the Rashomon set and understating uncertainty, the AIS correction restores full posterior inference, turning the Rashomon certificate from an inferential truncation into a proposal mechanism. We demonstrate this approach using Rashomon Partition Sets (RPS) as a rigorous, certified seed constructor for factorial designs. The resulting algorithm yields consistent self-normalized posterior summaries, such as model-averaged cell means, credible intervals, and uncertainty summaries without exhaustive enumeration of the complete model space. This bridges the gap between high-evidence model discovery and rigorous Bayesian inference, and outlines a general strategy in which any high-posterior seed set can provide computational leverage for AIS-based model averaging.
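The annealed importance sampling step can be illustrated generically on a continuous toy problem, assuming a normalized starting density that plays the role of the seed, a geometric path to an unnormalized target, and one random-walk Metropolis move per temperature. This is a standard AIS sketch with hypothetical names; it does not construct Rashomon Partition Sets or operate on the discrete model space of factorial designs.

```python
import numpy as np


def ais_log_evidence(log_target, sample_init, log_init, n_particles=2000,
                     n_temps=50, step=0.5, seed=0):
    """Annealed importance sampling along f_b = init**(1 - b) * target**b, b in [0, 1].

    log_init must be a normalized log density; returns (log_Z_hat, particles, log_weights).
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    x = sample_init(rng, n_particles)  # draws from the seed density
    logw = np.zeros(n_particles)

    def log_f(z, b):
        # geometric bridge between the seed density and the target
        return (1.0 - b) * log_init(z) + b * log_target(z)

    for b_prev, b in zip(betas[:-1], betas[1:]):
        # incremental importance weight f_b(x) / f_{b_prev}(x)
        logw += (b - b_prev) * (log_target(x) - log_init(x))
        # one random-walk Metropolis step that leaves f_b invariant
        prop = x + step * rng.normal(size=x.shape)
        log_accept = log_f(prop, b) - log_f(x, b)
        accept = np.log(rng.uniform(size=x.shape)) < log_accept
        x = np.where(accept, prop, x)
    log_z = np.logaddexp.reduce(logw) - np.log(n_particles)
    return log_z, x, logw
```

Self-normalized summaries follow from the returned weights, for example a posterior mean as the sum of `w * x` with `w = exp(logw - logw.max())` normalized to sum to one. The starting density keeps full support, so the weights correct for where it concentrates.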

2026-05-21T05:01:39Z 28 pages, 8 figures Yiyang Fan Soumyakanti Pan Tyler H. McCormick http://arxiv.org/abs/2605.20828v2 Adaptive Test for Jump 2026-05-21T04:05:33Z

We develop an adaptive jump test for discretely observed high-frequency semimartingales by combining the Aït-Sahalia–Jacod ratio statistic (Aït-Sahalia and Jacod, 2009) and the Lee–Mykland extreme-return statistic (Lee and Mykland, 2008) with the Cauchy combination rule. Allowing stochastic Itô drift, volatility, and leverage, we show asymptotic independence under the continuous-path null and dense local alternatives, yielding an analytically calibrated test with closed-form power; under finite-activity jumps, the test is consistent. We also extend the method to additive microstructure noise. Simulations show that the combined procedure performs well under both dense and sparse alternatives and is typically best overall.
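The Cauchy combination rule itself is short: each p-value p_i is mapped to tan((0.5 - p_i) * pi), the mapped values are averaged with weights, and the average is converted back through the standard Cauchy tail. A sketch assuming equal weights is below; the two input p-values would come from the ratio and extreme-return statistics, which are not implemented here.

```python
import numpy as np


def cauchy_combination(pvals, weights=None):
    """Combine p-values with the Cauchy combination rule (equal weights by default)."""
    p = np.asarray(pvals, dtype=float)
    if weights is None:
        w = np.full(p.size, 1.0 / p.size)
    else:
        w = np.asarray(weights, dtype=float)  # assumed nonnegative and summing to one
    # tan((0.5 - p) * pi) is standard Cauchy when p is uniform on (0, 1)
    t = np.sum(w * np.tan((0.5 - p) * np.pi))
    # standard Cauchy upper-tail probability of the combined statistic
    return 0.5 - np.arctan(t) / np.pi
```

The heavy Cauchy tail lets one very small p-value dominate the combination, which is what makes the rule responsive to sparse alternatives, while averaging retains power under dense ones.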

2026-05-20T07:21:06Z Huifang Ma Long Feng http://arxiv.org/abs/2605.21928v1 CausalGuard: Conformal Inference under Graph Uncertainty 2026-05-21T02:56:46Z

Estimating treatment effects from observational data requires choosing an adjustment set, but valid adjustment depends on an unknown causal graph. Graph misspecification can cause under-coverage, while graph-agnostic conformal wrappers may regain nominal coverage only through large padding. We introduce CausalGuard, a structure-weighted conformal framework that calibrates after aggregating graph-conditional doubly robust pseudo-outcomes. Candidate DAGs are proposed from an LLM-derived edge prior, pruned by conditional-independence tests, and reweighted by the Bayesian Information Criterion (BIC). A composite nonconformity score then calibrates the posterior-weighted pseudo-outcome. CausalGuard provides distribution-free finite-sample marginal coverage for this aggregated pseudo-outcome; under causal identification, overlap, conditional-mean nuisance stability, and concentration on target-aligned valid adjustment strategies, its conditional mean converges to the true Conditional Average Treatment Effect. Across five benchmarks, CausalGuard attains mean coverage above the nominal 90% level for the directly evaluable target and reduces width when graph-agnostic conformal baselines require large padding. Stress tests show that CausalGuard suppresses invalid collider adjustment and remains stable under misspecified priors when the retained candidate set is data-supported.
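The aggregate-then-calibrate idea can be sketched with standard pieces, assuming binary treatment, fitted nuisances (outcome regressions `mu0`, `mu1` and propensity `e`) for each candidate graph, the common approximation of posterior graph weights as proportional to exp(-BIC/2), and a plain absolute-residual score. The LLM edge prior, the conditional-independence pruning, and CausalGuard's composite nonconformity score are not reproduced; all names are hypothetical.

```python
import numpy as np


def aipw_pseudo_outcome(y, t, mu0, mu1, e):
    """Doubly robust (AIPW) pseudo-outcome whose conditional mean targets the CATE."""
    return mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)


def bic_weights(bics):
    """Approximate posterior weights over candidate graphs, proportional to exp(-BIC / 2)."""
    b = np.asarray(bics, dtype=float)
    w = np.exp(-0.5 * (b - b.min()))  # subtract the minimum for numerical stability
    return w / w.sum()


def split_conformal_interval(pred_calib, target_calib, pred_test, alpha=0.1):
    """Split-conformal interval with absolute-residual scores."""
    scores = np.abs(np.asarray(target_calib) - np.asarray(pred_calib))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the conformal quantile
    q = np.inf if k > n else np.sort(scores)[k - 1]
    return pred_test - q, pred_test + q
```

Given a (G, n) array `phis` of graph-conditional pseudo-outcomes, the aggregated target would be `bic_weights(bics) @ phis`, and the conformal step is then applied to that single aggregated pseudo-outcome, matching the order (aggregate first, then calibrate) described above.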

2026-05-21T02:56:46Z Vikash Singh Weicong Chen Debargha Ganguly Yanyan Zhang Nengbo Wang Sreehari Sankar Mohsen Hariri Alexander Nemecek Chaoda Song Shouren Wang Biyao Zhang Van Yang Erman Ayday Jing Ma Vipin Chaudhary http://arxiv.org/abs/2605.21884v1 Trend and seasonality estimation for point-process time series 2026-05-21T01:44:48Z

This article introduces estimators of trend and seasonality for time series of point processes. We assume the point processes follow a temporal or spatial doubly-stochastic Poisson model with log-Gaussian intensity functions. The proposed estimators are computationally simple M-estimators. Their asymptotic distribution is derived, and their finite-sample performance is studied by simulation. As an example of real-data application, we study the patterns of bike demand in the Divvy bike-sharing system of the city of Chicago.

2026-05-21T01:44:48Z Daniel Gervini Simon A. Kopischke http://arxiv.org/abs/2605.21848v1 Block-Independent Likelihood Ratio Testing for High-Dimensional Mean Vectors with Applications to Matrix-Variate Data 2026-05-21T00:44:47Z

Testing the equality of two high-dimensional mean vectors is a fundamental problem in multivariate analysis. While the classical Hotelling's $T^2$ test is optimal in low-dimensional settings, it fails when the dimension $p$ is comparable to or exceeds the sample size $n$. Several extensions, including the Diagonal Likelihood Ratio Test (DLRT), have been proposed under the working independence assumption among variables. However, such an assumption can lead to a substantial loss of power when correlations are present. In this paper, we propose a new test, the Block Independent Likelihood Ratio Test (BILT), which generalizes DLRT by relaxing the working independence assumption to a block independence assumption. We establish the asymptotic normality of the null distribution of the BILT statistic for 'increasing $p$ with small $n$' under mild regularity conditions. We further analyze the asymptotic power of BILT under local alternatives. Extensive simulation studies show that BILT maintains Type I error control and achieves substantially higher power than DLRT across a wide range of covariance structures. An application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset further illustrates the use of BILT for testing mean differences between two matrix-variate populations.
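A simplified sketch of the block idea, assuming Gaussian data with a common covariance and user-supplied variable blocks that are each smaller than $n_1 + n_2 - 2$: within each block the two-sample likelihood ratio statistic is $N \log(1 + T_b^2/(N-2))$, where $T_b^2$ is the block's Hotelling statistic and $N = n_1 + n_2$, and the blocks are summed as if independent. With blocks of size one, each term becomes a function of a squared univariate two-sample $t$ statistic, in the spirit of DLRT. The paper's exact statistic and its centering and scaling for the normal null limit are not reproduced; the function name is hypothetical.

```python
import numpy as np


def block_lrt_statistic(x1, x2, blocks):
    """Sum over variable blocks of two-sample Gaussian LRT statistics (unstandardized).

    x1: (n1, p), x2: (n2, p); blocks: list of index arrays partitioning range(p).
    Each block needs fewer variables than n1 + n2 - 2 so its pooled covariance is invertible.
    """
    n1, n2 = x1.shape[0], x2.shape[0]
    N = n1 + n2
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    # pooled sample covariance with N - 2 degrees of freedom
    c1 = x1 - x1.mean(axis=0)
    c2 = x2 - x2.mean(axis=0)
    S = (c1.T @ c1 + c2.T @ c2) / (N - 2)
    total = 0.0
    for idx in blocks:
        d = diff[idx]
        Sb = S[np.ix_(idx, idx)]  # within-block covariance; cross-block terms ignored
        t2 = (n1 * n2 / N) * d @ np.linalg.solve(Sb, d)  # block Hotelling T^2
        total += N * np.log1p(t2 / (N - 2))  # -2 log(likelihood ratio) for this block
    return total
```

Only within-block covariances are inverted, so the statistic remains computable when $p$ exceeds $n$ as long as each block is small.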

2026-05-21T00:44:47Z Minsub Shin Kwangok Seo Sang Han Lee Johan Lim