https://arxiv.org/api/XctwkOBTw0juJEhH8IPFj4r3G3w2026-06-19T01:39:51Z2357949515http://arxiv.org/abs/2605.13187v1Testing the Structural Properties of Marked Point Processes Using Local Inhomogeneous Mark-Weighted K-Functions2026-05-13T08:43:38ZThis work proposes $χ^2$-type test statistics to assess different hypotheses on the local structure of an observed marked point pattern. The test statistics are based on the local inhomogeneous extension of the mark-weighted $K$-function to investigate local behaviour of the marked point pattern. The summary statistic captures interactions between marks and locations by assessing local contributions to global deviations from independence or homogeneity. The methodology proves to be effective in identifying both global and localised departures from the null hypotheses, even in scenarios with subtle mark structures or small sample sizes. Real-world environmental applications to forestry and earthquake data demonstrate the utility of the proposed framework for detecting spatially dependent marked structures in the patterns.2026-05-13T08:43:38Zsubmitted for publicationNicoletta D'AngeloGiada AdelfioMatthias Eckardthttp://arxiv.org/abs/2602.01099v2Simultaneous Estimation of Seabed and Its Roughness With Longitudinal Waves2026-05-13T07:41:15ZThis paper introduces an infinite-dimensional Bayesian framework for acoustic seabed tomography, leveraging wave scattering to simultaneously estimate the seabed and its roughness. Tomography is considered an ill-posed problem where multiple seabed configurations can result in similar measurement patterns. We propose a novel approach focusing on the statistical isotropy of the seabed. Utilizing fractional differentiability to identify seabed roughness, the paper presents a robust numerical algorithm to estimate the seabed and quantify uncertainties. 
Extensive numerical experiments validate the effectiveness of this method, offering a promising avenue for large-scale seabed exploration.2026-02-01T08:46:45ZBabak Maboudi AfkhamAna Carpiohttp://arxiv.org/abs/2507.17172v2Local graph estimation with pathwise false discovery control2026-05-13T04:47:59ZMany datasets include a small set of variables, such as biomarkers or clinical outcomes, whose relationships to the broader system are of primary scientific interest. Estimating the full network of inter-variable relationships in such settings often obscures local structures around these targets, limiting interpretability. To address this fundamental problem, we introduce local graph estimation, a statistical framework for inferring substructures around target variables. We show that traditional graph estimation methods often fail to recover local structure, and present pathwise feature selection (PFS) as an effective alternative. PFS estimates local subgraphs by iteratively applying feature selection and propagating uncertainty along network paths, providing rigorous finite-sample false discovery control even in settings with mixed variable types and nonlinear dependencies. In four distinct applications spanning environmental and public health, multiomics, brain connectomics, and single-nucleus RNA sequencing, PFS recovers interpretable networks consistent with domain knowledge, highlighting its ability to uncover established mechanisms and generate novel hypotheses.2025-07-23T03:36:00ZOmar MelikechiDavid B. DunsonNoureddine MelikechiJeffrey W. Miller10.1038/s41467-026-72796-9http://arxiv.org/abs/2605.12977v1Enhancing a Risk Model by Adding Transient Statistical Factors2026-05-13T04:15:24ZEstimating the covariance of asset returns, i.e., the risk model, is a key component of financial portfolio construction and evaluation. 
Most risk modeling approaches produce a factor model that decomposes the asset variability into two components: the first attributed to a small number of factors that are common among the assets and the second attributed to the idiosyncratic behavior of each asset. Third-party providers typically provide risk models to investors, and while these models are typically of high quality, they may fail to capture important information, e.g., changing market regimes and transient factors. To overcome these limitations, we propose a systematic method based on maximum likelihood estimation to enhance an existing factor model by both refining the given model and adding new statistical factors. Our approach relies only on the observed sequence of realized returns and on the choice of two hyperparameters: the number of additional factors and the half-life parameter that determines the weights assigned to returns in the log-likelihood objective. Importantly, our methodology applies to the situation where asset returns may be missing, making it suitable for typical equity datasets. We demonstrate our approach on the Barra short-term US risk model, a high-quality risk model used in practice, for a universe of US high-capitalization equities. We show that the proposed extension captures structure in the returns that is missed by the original model.2026-05-13T04:15:24ZAlexandros E. TzikasEmmanuel J. CandèsTrevor HastieStephen P. BoydMykel J. KochenderferRonald N. Kahnhttp://arxiv.org/abs/1906.00573v9Conditional inference on the asset with maximum Sharpe ratio2026-05-13T03:45:45ZWe apply the procedure of Lee et al. to the problem of performing inference on the signal-noise ratio of the asset which displays maximum sample Sharpe ratio over a set of possibly correlated assets. We find a multivariate analogue of the commonly used approximate standard error of the Sharpe ratio to use in this conditional estimation procedure. 
We also consider several alternative procedures, including the simple Bonferroni correction for multiple hypothesis testing, which we fix for the case of positive common correlation among assets, the chi-bar square test against one-sided alternatives, Follman's test, and Hansen's asymptotic adjustments.
Testing indicates the conditional inference procedure achieves nominal type I rate, and does not appear to suffer from non-normality of returns. The conditional estimation test has low power under the alternative where there is little spread in the signal-noise ratios of the assets, and high power under the alternative where a single asset has high signal-noise ratio. Unlike the alternative procedures, it appears to enjoy rejection probabilities monotonic in the signal-noise ratio of the selected asset, and actually maintains near-nominal rejection rates under the conditional null.2019-06-03T04:50:52Zcode and latex source available from github repo, github.com/shabbychef/maxsharpeSteven E. Pavhttp://arxiv.org/abs/2605.12901v1A Bayesian Adaptive Latent Mixture Model for Zero-Inflated Weighted Brain Connectome Analysis2026-05-13T02:25:09ZReplicated weighted networks often exhibit many structural zeros alongside heterogeneous non-zero edge strengths. In structural connectomics, this zero-inflation coincides with subjects expressing overlapping, rather than discrete, connectivity patterns. To address these features, we propose a Bayesian adaptive latent mixture model for zero-inflated weighted networks. Our approach represents each subject network as a simplex mixture of shared low-rank latent score matrices, integrated with a hurdle likelihood that separates edge existence from conditional edge strength. A sparsity-coupling parameter enables absent edges to be either independent of, or informative about, the latent connectivity. For computation, we employ transformed Hamiltonian Monte Carlo on unconstrained coordinates, selecting the number of templates via predictive fit, held-out link prediction, and template stability. Theoretically, we establish posterior consistency, local asymptotic normality, a Bernstein--von Mises approximation, and predictive consistency for an identifiable quotient-space estimand under a fixed-template scenario. 
Simulations demonstrate performance gains over topology-only baselines in settings with mixed memberships or structure-informed sparsity. Applied to Human Connectome Project data, the model recovers stable latent score patterns and heterogeneous subject-level mixtures, with behavioural analyses serving strictly as exploratory annotations rather than confirmatory biomarker claims.2026-05-13T02:25:09ZHsin-Hsiung HuangYuh-Haur ChenTeng Zhanghttp://arxiv.org/abs/2605.12890v1Steer-to-Detect: Probing Hidden Representations for Detection of LLM-Generated Texts2026-05-13T02:14:21ZThe rapid advancement of large language models (LLMs) has made machine-generated text increasingly difficult to distinguish from human-written text. While recent studies explore leveraging internal representations of language models to uncover deeper detection signals, these raw features often exhibit substantial overlap between classes, limiting their discriminative power. To address this challenge, we propose Steer-to-Detect (\texttt{S2D}), a two-stage framework for detecting LLM-generated text. In the first stage, \texttt{S2D} learns a steering vector that is injected into the hidden states of a frozen observer LLM, producing representations with improved class separability. In the second stage, detection is performed via a hypothesis testing procedure based on the steered representations. We establish finite-sample, high-probability guarantees for Type I and Type II errors, providing a theoretical characterization of the procedure. 
Empirically, \texttt{S2D} achieves strong and consistent performance across a range of settings, including out-of-distribution scenarios and adversarial perturbations.2026-05-13T02:14:21ZLuxu LiangXiang Lihttp://arxiv.org/abs/2605.12840v1Decision Support for Marketplace Policies under Incomplete Evidence: From Replay to Launch Readiness2026-05-13T00:26:50ZMarketplace platforms routinely evaluate pricing and allocation policies using logged observational data, yet strong offline performance does not imply that a policy is safe to deploy. In real-time bidding (RTB) marketplaces, reserve-price and floor-policy changes affect not only revenue but also fill, advertiser value, budget pacing, and competition across auctions, creating feedback and interference. The central problem is therefore not to estimate whether a policy improves an offline metric, but to determine whether the available evidence justifies direct launch or only further validation. In this regard, we propose a support-aware decision-support system (DSS) that distinguishes promising from actionable evidence. The framework integrates replay, support-aware off-policy evaluation (OPE), conservative lower-bound ranking, multi-sided guardrails, out-of-time validation, sensitivity analysis, and interference-aware validation design into a claim-preserving pipeline that outputs a launch-readiness classification rather than a single performance estimate. Applying the framework to iPinYou-style RTB logs, we identify a margin-gated floor policy as the leading candidate, with a 47.7% replay yield lift, a 45.8% conservative lower-tail lift, and stable out-of-time performance. However, the framework does not recommend direct launch. A decision-rule ablation shows that simplified pipelines select the same policy but incorrectly recommend deployment, leaving key causal assumptions unresolved. 
In contrast, the proposed DSS selects the same policy but changes the action to online validation, reflecting missing evidence on propensities, bidder response, and interference. Overall, the contribution is a reproducible DSS protocol that prevents decision overclaim under partial identification and converts offline evaluation into an auditable, action-oriented recommendation.2026-05-13T00:26:50ZPrashant ShekharCaroline Howardhttp://arxiv.org/abs/2605.12832v1Digital Twins as Synthetic Controls in Single-Arm Trials2026-05-12T23:58:48ZSingle-arm trials are an important study design for evaluating drug efficacy and safety without enrolling patients into a control arm. Although they do not provide the gold-standard evidence of randomized controlled trials, they are increasingly used in clinical development as they offer an efficient, ethical, and practical alternative. A wide variety of approaches can be used to construct control comparators and estimate treatment effects, from fixed comparators informed by clinical knowledge to data-based and model-based patient-level comparators, also known as synthetic controls. Powerful and flexible machine learning models can allow outcome-model-based synthetic controls to overcome key limitations of direct data-based approaches, yield more robust estimates of treatment effects, and provide a principled way to incorporate corrections or encode additional assumptions when external data are not directly comparable. In this work, we argue that outcome-model-based synthetic control arms are an important tool for single-arm trials. We focus on digital twins, personalized predictions of disease progression generated from machine learning models trained on historical datasets, which naturally leverage these flexible approaches. We review doubly robust estimators, present power and sample size formulas, and discuss trade-offs in selecting historical data for training and analysis. 
We also outline practical considerations for deploying digital twins within the framework of recent FDA draft guidance on the use of artificial intelligence in drug development. Finally, we reanalyze data from trials in amyotrophic lateral sclerosis and Huntington's disease to demonstrate the proposed methods.2026-05-12T23:58:48ZDaniele BertoliniFranklin FullerAaron M. SmithJonathan R. WalshRun Zhuanghttp://arxiv.org/abs/2605.12797v1Evaluating the impact of outcome delay on the efficiency of sample size re-estimation2026-05-12T22:26:52ZSample size reestimation (SSR) can be a powerful tool to ensure that a clinical trial meets its prespecified power requirements when uncertainty regarding a design parameter exists at the planning stage. However, long-term primary endpoints can be harmful to the efficiency of this trial design. If recruitment is continued while treatment outcomes are awaited, a long delay can potentially lead to a large number of pipeline participants being recruited into the trial who do not contribute to the interim analysis. This may lead to a larger number of recruited participants than are actually required, resulting in an overpowered trial with high cost. This paper studies the exact impact of such outcome delay on the efficiency of internal pilot type SSR designs. The distribution of the final sample size post SSR is obtained under various delay lengths for both continuous and binary outcome data, and how delay impacts the precision of the final sample size estimate is then discussed. Specifically, the impact of delay on this precision is assessed through RMSE, as well as two more novel metrics, termed the delay impact and cost. The results indicate that as delay length increases, the delay impact increases, inflating average sample size and power. However, the severity of the effect of delayed outcomes depends highly on the exact trial setting. 
Trials where the reestimated sample size is smaller than originally planned suffer the most from delayed outcomes, often leading to an overpowered trial. However, the impact of delay is substantially less if the originally planned sample size remains smaller than the reestimated sample size.2026-05-12T22:26:52ZAritra MukherjeeMichael J GraylingJames J M S Wasonhttp://arxiv.org/abs/2603.14479v2Risk-Calibrated Process Capability Approval with Finite Samples2026-05-12T22:18:13ZProcess capability indices such as $C_{pk}$ are widely used in manufacturing to support supplier qualification, pilot-build release, and production approval. In practice, approval decisions are often based on deterministic threshold rules of the form $\widehat{C}_{pk} \ge C_0$. Because $\widehat{C}_{pk}$ is estimated from finite samples, however, such decisions are inherently stochastic, especially when the true capability lies near the approval threshold. This paper develops a risk-calibrated decision framework for process capability approval that explicitly accounts for estimation uncertainty and asymmetric operational loss. Capability approval is formulated as a binary statistical decision problem, leading to a rule of the form $\widehat{C}_{pk} \ge C_0 + k\,SE(\widehat{C}_{pk})$, where the calibration constant $k$ is determined either by a tolerable failure probability or by a false-accept/false-reject cost ratio. The resulting formulation unifies several commonly used procedures, including deterministic thresholding, lower confidence bound rules, and probability-based approval rules, and naturally extends them to cost-sensitive decision rules derived from asymmetric operational loss. 
Simulation experiments and an industrial case study show that risk calibration primarily affects near-threshold decisions, improves approval stability, and can substantially reduce expected operational loss when false acceptance is more costly than false rejection.2026-03-15T16:47:59Z17 pages, 4 figures and 6 tablesFei JiangLei Yanghttp://arxiv.org/abs/2605.12760v1How long should a block be?2026-05-12T21:14:09ZThe block maximum method, which is widely used in extreme value analysis, uses a generalized extreme value distribution to approximate that of the maximum of m observations. The quality of this approximation depends on the value of m and may be poor if m is too small. Surprisingly little attention has been paid to the choice of the block length, although a good choice is crucial to the success of the method. In this paper we assess the effect of taking excessively long blocks in terms of asymptotic relative efficiency, and propose likelihood-based approaches and graphical diagnostics to determine whether a proposed block length is suitable, allowing for potential rounding and left-censoring of observations. We investigate our ideas using simulation and illustrate them using wind speed, river flow and rainfall data.2026-05-12T21:14:09Z18 pages, plus supplementary materialLéo R. BelzileAnthony C. Davisonhttp://arxiv.org/abs/2503.17606v2Combining longitudinal cohort studies to examine cardiovascular risk factor trajectories across the adult lifespan2026-05-12T17:07:47ZWe introduce a statistical framework for combining data from multiple large longitudinal cardiovascular cohorts to enable the study of long-term cardiovascular health starting in early adulthood. Using data from seven cohorts belonging to the Lifetime Risk Pooling Project (LRPP), we present a Bayesian hierarchical multivariate approach that jointly models multiple longitudinal risk factors over time and across cohorts. 
Because few cohorts in our project cover the entire adult lifespan, our strategy uses information from all risk factors to increase precision for each risk factor trajectory and borrows information across cohorts to fill in unobserved risk factors. We develop novel diagnostic testing and model validation methods to ensure that our model robustly captures and maintains critical relationships over time and across risk factors. Our modeling reveals substantial age-related variation in risk factor trajectories, with patterns that differ across life stages, subgroups, and cohorts, thereby highlighting key periods for cardiovascular prevention and monitoring. Keywords: Bayesian hierarchical models; Missing data; Model validation; Multiple imputation; Random effects.2025-03-22T01:21:13ZZeynab AghabazazMichael J DanielsHongyan NingDonald M. Lloyd-JonesJuned Siddiquehttp://arxiv.org/abs/2604.16642v2Geometric coherence of single-cell CRISPR perturbations reveals regulatory architecture and predicts cellular stress2026-05-12T17:01:20ZGenome engineering has achieved remarkable sequence-level precision, yet predicting the transcriptomic state that a cell will occupy after perturbation remains an open problem. Single-cell CRISPR screens measure how far cells move from their unperturbed state, but this effect magnitude ignores a fundamental question: do the cells move together? Two perturbations with identical magnitude can produce qualitatively different outcomes if one drives cells coherently along a shared trajectory while the other scatters them across expression space. We introduce a geometric stability metric, Shesha, that quantifies the directional coherence of single-cell perturbation responses as the mean cosine similarity between individual cell shift vectors and the mean perturbation direction. 
Across five CRISPR datasets (2,200+ perturbations spanning CRISPRa, CRISPRi, and pooled screens), stability correlates strongly with effect magnitude (Spearman $ρ$ between 0.75 and 0.97), with a calibrated cross-dataset correlation of 0.97. Crucially, discordant cases where the two metrics decouple expose regulatory architecture: pleiotropic master regulators such as CEBPA and GATA1 pay a "geometric tax," producing large but incoherent shifts, while lineage-specific factors such as KLF1 produce tightly coordinated responses. After controlling for magnitude, geometric instability is independently associated with elevated chaperone activation (HSPA5/BiP; $ρ_{partial}=-0.34$ and $-0.21$ across datasets), and the high-stability/high-stress quadrant is systematically depleted. The magnitude-stability relationship persists in scGPT foundation model embeddings, confirming it is a property of biological state space rather than linear projection. Perturbation stability provides a complementary axis for hit prioritization in screens, phenotypic quality control in cell manufacturing, and evaluation of in silico perturbation predictions.2026-04-17T19:01:05ZPrashant C. Rajuhttp://arxiv.org/abs/2605.12248v1Time-variant reliability using time-dependent surrogate models2026-05-12T15:15:14ZTime-variant reliability analysis is a critical task for ensuring the safety of engineering dynamical systems subjected to stochastic excitations. However, assessing failure probability for realistic systems with Monte-Carlo simulation-based methods is often computationally intractable due to the high cost of the underlying models and the large number of simulations required. While surrogate models such as polynomial chaos expansions or Kriging are well-established for time-invariant reliability problems, their direct application to time-dependent systems remains challenging. 
This chapter introduces two advanced surrogate modeling frameworks designed specifically for dynamical systems: manifold-NARX (mNARX) and functional NARX (F-NARX). The mNARX approach constructs the surrogate on a reduced-order manifold of auxiliary state variables, enabling the efficient handling of high-dimensional inputs by embedding physical insight into a regression formulation. Conversely, the F-NARX framework exploits the functional nature of system trajectories, extracting principal component features from continuous time windows to mitigate issues associated with discrete lag selection and long-memory effects. We demonstrate the efficacy of these methods on two benchmark reliability problems: a stochastic quarter-car model and a hysteretic Bouc-Wen oscillator. The results highlight that, when combined with suitably biased experimental designs, both frameworks accurately capture the tail behavior of the system response, enabling precise and efficient estimation of first-passage probabilities.2026-05-12T15:15:14ZStefano MarelliStyfen SchärBruno Sudret