Fast confidence bounds for the false discovery proportion over a path of hypotheses

2026-03-09T15:38:58Z

This paper presents a new algorithm (and an additional trick) for quickly computing an entire curve of post hoc bounds for the False Discovery Proportion when the underlying bound $V^*_{\mathfrak{R}}$ is constructed from a reference family $\mathfrak{R}$ with a forest structure à la Durand et al. (2020). By an entire curve, we mean the values $V^*_{\mathfrak{R}}(S_1),\dotsc,V^*_{\mathfrak{R}}(S_m)$ computed on a path of increasing selection sets $S_1\subsetneq\dotsb\subsetneq S_m$, $|S_t|=t$. The new algorithm leverages the fact that going from $S_t$ to $S_{t+1}$ adds only one hypothesis. Compared to a more naive approach, the new algorithm has complexity $O(|\mathcal K|m)$ instead of $O(|\mathcal K|m^2)$, where $|\mathcal K|$ is the cardinality of the family.
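
As an illustration of why adding one hypothesis at a time helps, the Python sketch below maintains a forest recursion incrementally. It assumes the bound can be evaluated bottom-up as $v(k)=\min\big(\zeta_k,\ |(S\cap R_k)\setminus\bigcup_c R_c| + \sum_c v(c)\big)$ over the children $c$ of region $k$, with $V^*_{\mathfrak{R}}(S)$ equal to the number of selected hypotheses lying in no region plus $\sum_r v(r)$ over the roots. This is not the paper's algorithm; the function name `fdp_curve` and its inputs (`parent`, `zeta`, `leaf_of`) are hypothetical. Each step revisits only the ancestors of the region that receives the new hypothesis, so one step costs at most $O(|\mathcal K|)$.

```python
from typing import Dict, List, Optional


def fdp_curve(parent: List[Optional[int]], zeta: List[int],
              leaf_of: Dict[int, int], path: List[int]) -> List[int]:
    """Return [V*(S_1), ..., V*(S_m)] for S_t = set(path[:t]) (hypothetical sketch).

    parent[k]:  index of the region directly containing region k (None for a root)
    zeta[k]:    assumed nonnegative bound on the number of true nulls in region k
    leaf_of[i]: deepest region containing hypothesis i (missing if i is in no region)
    """
    n_regions = len(parent)
    own = [0] * n_regions        # selected hypotheses in region k but in none of its children
    child_sum = [0] * n_regions  # sum of v over the children of k
    v = [0] * n_regions          # v[k] = min(zeta[k], own[k] + child_sum[k])
    root_sum = 0                 # sum of v over root regions
    outside = 0                  # selected hypotheses lying in no region
    curve = []
    for i in path:
        k = leaf_of.get(i)
        if k is None:
            outside += 1
        else:
            own[k] += 1
            # Walk up the ancestors; stop as soon as a node's value is unchanged.
            while k is not None:
                new_v = min(zeta[k], own[k] + child_sum[k])
                delta = new_v - v[k]
                if delta == 0:
                    break
                v[k] = new_v
                p = parent[k]
                if p is None:
                    root_sum += delta
                else:
                    child_sum[p] += delta
                k = p
        curve.append(outside + root_sum)
    return curve


# Two leaves {0, 1} and {2, 3} nested in a root {0, 1, 2, 3}; hypothesis 4 is in no region.
print(fdp_curve([None, 0, 0], [2, 1, 0], {0: 1, 1: 1, 2: 2, 3: 2}, [2, 0, 3, 1, 4]))
# → [0, 1, 1, 1, 2]
```

Stopping the upward pass as soon as some $v(k)$ is unchanged is safe, because no ancestor depends on anything else that changed during that step.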

Fractional Topological Phases, Flat Bands, and Robust Edge States on Finite Cyclic Graphs via Single-Coin Split-Step Quantum Walks

2026-03-08T15:59:21Z

We report the first realization of a fractional topological phase in a fully unitary, noninteracting discrete-time quantum walk implemented on finite cyclic graphs. Using a single-coin split-step cyclic quantum walk (SCSS-CQW), we uncover topological phenomena that are inaccessible within conventional cyclic quantum-walk dynamics. The protocol enables controlled engineering of quasienergy spectra, flat bands, and topological phase transitions through the step-dependency parameter and coin-rotation angle. We show that cyclic graphs with even and odd numbers of sites exhibit qualitatively different band structures, while rotational flat bands arise exclusively in $4n$-site cycles; a general analytic condition for their emergence is derived. The SCSS-CQW produces fractional winding numbers $\pm \frac{1}{2}$ (Zak phases $\pm \frac{\pi}{2}$), in sharp contrast with the integer invariants of standard quantum walks. These fractional invariants lead to an unconventional bulk--boundary correspondence and support edge states beyond the usual integer topological classification. In the step-dependent protocol, transitions between distinct fractional winding sectors generate robust edge modes. Numerical simulations show that these states remain stable in the presence of both dynamic and static coin disorder as well as phase-preserving perturbations, while survival-probability analysis demonstrates their long-time persistence. Requiring only a constant number of detectors independent of the evolution time, the proposed scheme offers a minimal-resource and experimentally accessible platform for realizing fractional topology, flat bands, and protected edge states in small-scale synthetic quantum systems.

DisSim-FinBERT: Text Simplification for Core Message Extraction in Complex Financial Texts

2026-03-08T15:48:51Z

This study proposes DisSim-FinBERT, a novel framework that integrates Discourse Simplification (DisSim) with Aspect-Based Sentiment Analysis (ABSA) to enhance sentiment prediction in complex financial texts. By simplifying intricate documents such as Federal Open Market Committee (FOMC) minutes, DisSim improves the precision of aspect identification, resulting in sentiment predictions that align more closely with economic events. The model preserves the original informational content and captures the inherent volatility of financial language, offering a more nuanced and accurate interpretation of long-form financial communications. This approach provides a practical tool for policymakers and analysts aiming to extract actionable insights from central bank narratives and other detailed economic documents.

Group-Sparse Smoothing for Longitudinal Models with Time-Varying Coefficients

2026-03-08T14:39:35Z

Longitudinal data analysis is fundamental for understanding dynamic processes in biomedical and social sciences. Although varying coefficient models (VCMs) provide a flexible framework by allowing covariate effects to evolve over time, fitting all effects as time-varying may lead to overfitting, efficiency loss, and reduced interpretability when some effects are actually constant. In contrast, standard linear mixed models (LMMs) may suffer substantial bias when temporal heterogeneity is ignored. To address this issue, we propose time-varying effect selection, TV-Select, a unified framework for structural identification that simultaneously selects relevant variables and determines whether their effects are constant or time-varying. The proposed method decomposes each coefficient function into a time-invariant mean component and a centered time-varying deviation, where the latter is approximated by B-splines. We then construct a doubly penalized objective function that combines a group Lasso penalty for structural sparsity with a roughness penalty for smoothness control. An efficient block coordinate descent algorithm is developed for computation. Under standard semiparametric regularity conditions, we establish selection consistency and oracle-type asymptotic properties, including asymptotic normality for the constant-effect component after correct structure recovery. Simulation studies and a real-data application show that TV-Select achieves more accurate structural recovery, smoother functional estimation, and better predictive performance than competing methods.
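
The structure-selection step can be illustrated with a generic block coordinate descent for a group Lasso plus quadratic roughness penalty. The sketch below is a simplification under stated assumptions, not TV-Select itself: it omits the split into a constant mean and a centered B-spline deviation, treats each `X_j` as a given design block, and uses a hypothetical roughness matrix `Omega_j`. Each block is updated by one proximal-gradient step whose proximal map is group soft-thresholding, which sets a whole block exactly to zero when its gradient is small enough.

```python
import numpy as np


def group_soft_threshold(z, thresh):
    """Proximal map of thresh * ||z||_2: shrink the whole block, or zero it out."""
    norm = np.linalg.norm(z)
    if norm <= thresh:
        return np.zeros_like(z)
    return (1.0 - thresh / norm) * z


def bcd_group_sparse_smooth(X_blocks, y, lam, mu, Omegas, n_sweeps=500):
    """Block coordinate proximal-gradient descent for the (hypothetical) objective
    (1/2n)||y - sum_j X_j b_j||^2 + lam * sum_j ||b_j||_2 + (mu/2) * sum_j b_j' Omega_j b_j.
    """
    n = y.shape[0]
    b = [np.zeros(Xj.shape[1]) for Xj in X_blocks]
    # Step size per block = 1 / Lipschitz constant of that block's smooth part.
    steps = [1.0 / np.linalg.eigvalsh(Xj.T @ Xj / n + mu * Oj).max()
             for Xj, Oj in zip(X_blocks, Omegas)]
    r = np.asarray(y, dtype=float).copy()  # residual y - sum_j X_j b_j (all b_j start at 0)
    for _ in range(n_sweeps):
        for j, (Xj, Oj) in enumerate(zip(X_blocks, Omegas)):
            grad = -Xj.T @ r / n + mu * (Oj @ b[j])
            new_bj = group_soft_threshold(b[j] - steps[j] * grad, steps[j] * lam)
            r -= Xj @ (new_bj - b[j])  # keep the residual in sync with the update
            b[j] = new_bj
    return b
```

In this toy form, a block driven exactly to zero plays the role of a dropped effect; in TV-Select the same mechanism would act on the spline coefficients of the time-varying deviation.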

A note on diffusive/random-walk behaviour in Metropolis--Hastings algorithms

2026-03-07T18:59:24Z

We prove a general result that if a Metropolis--Hastings algorithm has a proposal that is not geometrically ergodic and the acceptance rate approaches unity at a suitable rate as the state variable becomes large, then the Metropolised chain will also not be geometrically ergodic. Our conditions seem stronger than might be expected, but are shown to be necessary through a counterexample. We then turn our attention to the random walk and guided walk Metropolis algorithms. We show that if the target distribution has polynomial tails the latter converges at twice the polynomial rate of the former, but that if instead the target distribution has strictly convex potential then the random walk Metropolis behaves as a $1/2$-lazy version of the guided walk Metropolis when the state variable is large, and therefore moves at a similar (ballistic) speed.
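
For readers unfamiliar with the guided walk, the Python sketch below contrasts it with the random walk Metropolis on a standard normal target. It assumes the guided walk in Gustafson's form: the state carries a direction $p\in\{-1,+1\}$, proposals move by $p|Z|$, and $p$ is reversed only on rejection. The target, step size, and function names are illustrative choices, not taken from the paper.

```python
import math
import random


def log_target(x):
    """Standard normal log-density up to an additive constant."""
    return -0.5 * x * x


def rwm(n, step=1.0, seed=0):
    """Random walk Metropolis with Gaussian increments."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        y = x + step * rng.gauss(0.0, 1.0)
        # 1 - random() lies in (0, 1], so the log is always finite.
        if math.log(1.0 - rng.random()) < log_target(y) - log_target(x):
            x = y
        out.append(x)
    return out


def guided_walk(n, step=1.0, seed=0):
    """Guided walk: keep moving in direction p, reverse p only when a proposal is rejected."""
    rng = random.Random(seed)
    x, p, out = 0.0, 1, []
    for _ in range(n):
        y = x + p * step * abs(rng.gauss(0.0, 1.0))
        if math.log(1.0 - rng.random()) < log_target(y) - log_target(x):
            x = y
        else:
            p = -p
        out.append(x)
    return out
```

Far in the tail of a strictly log-concave target, moves toward the mode are almost always accepted, so the guided walk keeps heading inward every step, while the random walk proposes an inward move only about half the time; this is one informal reading of the $1/2$-lazy comparison above.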

StablePCA: Distributionally Robust Learning of Shared Representations from Multi-Source Data

2026-03-07T18:38:13Z

When synthesizing multi-source high-dimensional data, a key objective is to extract low-dimensional representations that effectively approximate the original features across different sources. Such representations facilitate the discovery of transferable structures and help mitigate systematic biases such as batch effects. We introduce Stable Principal Component Analysis (StablePCA), a distributionally robust framework for constructing stable latent representations by maximizing the worst-case explained variance over multiple sources. A primary challenge in extending classical PCA to the multi-source setting lies in the nonconvex rank constraint, which renders the StablePCA formulation a nonconvex optimization problem. To overcome this challenge, we derive a convex relaxation of StablePCA and develop an efficient Mirror-Prox algorithm to solve the relaxed problem, with global convergence guarantees. Since the relaxed problem generally differs from the original formulation, we further introduce a data-dependent certificate to assess how well the algorithm solves the original nonconvex problem and establish conditions under which the relaxation is tight. Finally, we explore alternative distributionally robust formulations of multi-source PCA based on different loss functions.
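
The relaxation and Mirror-Prox step can be sketched as follows, under two assumptions not spelled out above: the rank constraint is relaxed to the Fantope $\{P: 0\preceq P\preceq I,\ \operatorname{tr}P=k\}$, and the worst case over sources is written as $\max_P \min_{w\in\Delta}\sum_s w_s\operatorname{tr}(\Sigma_s P)$. The sketch uses a Euclidean projection for $P$ and an entropic (multiplicative-weights) step for $w$; the function names and the step size `eta` are hypothetical, and the paper's certificate is not implemented.

```python
import numpy as np


def project_fantope(A, k, tol=1e-10):
    """Frobenius projection of a symmetric matrix onto {P : 0 <= P <= I, tr P = k}."""
    evals, evecs = np.linalg.eigh((A + A.T) / 2.0)
    lo, hi = evals.min() - 1.0, evals.max()
    # Bisection on the shift theta so that sum(clip(evals - theta, 0, 1)) = k.
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        if np.clip(evals - theta, 0.0, 1.0).sum() > k:
            lo = theta
        else:
            hi = theta
    gamma = np.clip(evals - 0.5 * (lo + hi), 0.0, 1.0)
    return (evecs * gamma) @ evecs.T


def stable_pca_relaxed(covs, k, eta=0.1, n_iter=2000):
    """Mirror-Prox for max_{P in Fantope} min_{w in simplex} sum_s w_s tr(Sigma_s P)."""
    d, n_src = covs[0].shape[0], len(covs)
    P, w = np.eye(d) * (k / d), np.full(n_src, 1.0 / n_src)
    P_avg, w_avg = np.zeros((d, d)), np.zeros(n_src)

    def grads(P, w):
        g_P = sum(ws * C for ws, C in zip(w, covs))       # gradient in P (ascent direction)
        g_w = np.array([np.trace(C @ P) for C in covs])   # per-source explained variance
        return g_P, g_w

    def step(P0, w0, g_P, g_w):
        P1 = project_fantope(P0 + eta * g_P, k)           # projected ascent for P
        # Entropic descent for w; subtracting min(g_w) only rescales before normalizing.
        w1 = w0 * np.exp(-eta * (g_w - g_w.min()))
        return P1, w1 / w1.sum()

    for _ in range(n_iter):
        P_half, w_half = step(P, w, *grads(P, w))         # extrapolation step
        P, w = step(P, w, *grads(P_half, w_half))         # update from the same base point
        P_avg += P_half / n_iter
        w_avg += w_half / n_iter
    worst = min(np.trace(C @ P_avg) for C in covs)
    return P_avg, w_avg, worst
```

Averaging the extrapolated iterates is the standard way Mirror-Prox obtains its ergodic convergence rate for convex-concave saddle problems; the returned `worst` is the worst-case explained variance of that averaged, generally non-projection, matrix.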

Bias- and Variance-Aware Probabilistic Rounding Error Analysis for Floating-Point Arithmetic

2026-03-07T15:14:33Z

Probabilistic rounding error analysis can yield much sharper bounds than classical worst-case theory, but existing results typically rely on zero-mean rounding errors and often leave the confidence parameter implicit. This work revisits probabilistic rounding error analysis in a moment-aware setting. We first derive a confidence-calibrated reformulation of the Higham and Mary [16] bound that makes its confidence parameter explicit. We then introduce a variance-informed probabilistic backward error bound based on the first two moments of $\log(1+δ)$, where $δ$ is the relative rounding error. This allows the analysis to accommodate biased rounding error models rather than relying on a zero-mean assumption. To illustrate this framework, we study both a uniform model and a log-space $\operatorname{Beta}$ model for rounding errors, the latter of which provides a simple way to represent bias. This perspective shows that the growth of probabilistic rounding error bounds is not universal: near-zero-mean regimes recover $\sqrt{n}$-like behavior, while biased models can exhibit faster accumulation. $\texttt{CUDA}$ experiments in single and half precision on dot products, sparse matrix-vector products, and a stochastic boundary-value problem show that the proposed framework is especially useful in low-precision regimes where deterministic bounds are overly conservative and where bias-aware modeling better matches observed error growth.
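
A small worked example of the moments involved: under the uniform model $\delta\sim\mathrm{Uniform}(-u,u)$, $\delta$ has mean zero but $\log(1+\delta)$ does not, since $\mathbb E[\log(1+\delta)] = \frac{(1+u)\log(1+u)-(1-u)\log(1-u)-2u}{2u}\approx -u^2/6$ and $\operatorname{Var}[\log(1+\delta)]\approx u^2/3$. The Python sketch below checks these values by quadrature; the function names are hypothetical, and the example does not reproduce the paper's bound.

```python
import math


def log_moments_uniform(u, n_grid=200_000):
    """Mean and variance of log(1 + delta), delta ~ Uniform(-u, u), by midpoint quadrature."""
    h = 2.0 * u / n_grid
    vals = [math.log1p(-u + (i + 0.5) * h) for i in range(n_grid)]
    mean = math.fsum(vals) / n_grid
    var = math.fsum((v - mean) ** 2 for v in vals) / n_grid
    return mean, var


def log_mean_uniform_closed_form(u):
    """E[log(1 + delta)] = ((1+u)log(1+u) - (1-u)log(1-u) - 2u) / (2u).

    The numerator cancels catastrophically for very small u (e.g. u = 2**-24);
    there the series -u**2/6 - u**4/20 is the safer choice.
    """
    return ((1.0 + u) * math.log1p(u) - (1.0 - u) * math.log1p(-u) - 2.0 * u) / (2.0 * u)


u = 2.0 ** -11  # unit roundoff of IEEE half precision
mean, var = log_moments_uniform(u)
print(mean / (-u * u / 6), var / (u * u / 3))  # both ratios are close to 1
```

If the $n$ log-errors are independent, their sum has mean $n\mu$ and standard deviation $\sqrt n\,\sigma$. With $\mu\approx-u^2/6$ and $\sigma\approx u/\sqrt3$, the drift term overtakes the fluctuation term only around $n\approx 12/u^2$, which is consistent with near-zero-mean regimes looking $\sqrt n$-like; a biased model with larger $|\mu|$ moves that crossover much earlier.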

Multi-parameter determination in the semilinear Helmholtz equation

2026-03-07T14:57:46Z

This paper studies an inverse boundary value problem for a semilinear Helmholtz equation with Neumann boundary conditions in a bounded domain $\Omega\subset \mathbb{R}^n$ ($n\ge2$). The objective is to recover the unknown linear and nonlinear coefficients from the associated Neumann-to-Dirichlet (NtD) map. Using a higher-order linearization approach, we establish the unique determination of both coefficients from boundary measurements. For spatial dimensions $n\ge3$, uniqueness holds under $C^{\gamma}(\overline{\Omega})$ regularity assumptions with $0<\gamma<1$, while in the two-dimensional case uniqueness is obtained under Sobolev regularity $W^{1,p}(\Omega)$ with $p>2$. The analysis relies on the well-posedness of the forward problem together with techniques from linear inverse problems, including Runge-type approximation arguments and Fourier analysis. In addition, we develop a numerical reconstruction framework for recovering the coefficients from boundary data. The forward problem is discretized using a finite difference scheme combined with a quasi-Newton iteration, and the inverse problem is formulated within a Bayesian inference framework. Posterior distributions of the coefficients are explored using the preconditioned Crank-Nicolson (pCN) Markov chain Monte Carlo algorithm, which provides both point estimates and uncertainty quantification. Numerical experiments demonstrate the effectiveness of the proposed reconstruction method and illustrate the theoretical uniqueness results.
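
The pCN sampler itself is short. The Python sketch below assumes a Gaussian prior $N(0,C)$ supplied through a sampling function and a negative log-likelihood $\Phi$; in the paper $\Phi$ would involve the finite-difference/quasi-Newton forward solver and the NtD data, which are replaced here by a hypothetical scalar toy problem. The function name `pcn` and the parameter `beta` are illustrative.

```python
import numpy as np


def pcn(neg_log_lik, sample_prior, u0, beta=0.3, n_steps=10_000, seed=0):
    """pCN MCMC for a density proportional to exp(-neg_log_lik(u)) under a N(0, C) prior.

    sample_prior(rng) must return one draw xi ~ N(0, C) with the same shape as u0.
    """
    rng = np.random.default_rng(seed)
    u = np.asarray(u0, dtype=float)
    phi_u = neg_log_lik(u)
    samples, accepted = [], 0
    for _ in range(n_steps):
        # Proposal v = sqrt(1 - beta^2) u + beta xi is reversible with respect to N(0, C).
        v = np.sqrt(1.0 - beta**2) * u + beta * sample_prior(rng)
        phi_v = neg_log_lik(v)
        # The prior cancels, so only the likelihood enters the acceptance ratio.
        if np.log(1.0 - rng.uniform()) < phi_u - phi_v:
            u, phi_u = v, phi_v
            accepted += 1
        samples.append(u.copy())
    return np.array(samples), accepted / n_steps
```

Because the proposal preserves the prior, the acceptance probability $\min\{1,\exp(\Phi(u)-\Phi(v))\}$ involves only the likelihood, which is what keeps pCN usable as the discretization of the coefficients is refined.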

Robustness and size-dependence of circadian rhythms in multiscale suprachiasmatic-nucleus networks

2026-03-07T08:54:32Z

Understanding how multi-scale network structure influences circadian rhythms in the suprachiasmatic nucleus (SCN) is essential for uncovering the principles of rhythmic robustness and synchronization. Previous studies using synthetic SCN networks suggested a size-dependent phenomenon, in which rhythmic activity initially strengthens with network size and then saturates, but it remains unclear whether this occurs in real SCN networks. Here, we apply geometric branch growth (GBG) and geometric renormalization (GR) to generate self-similar scaled-up and scaled-down replicas from a single-scale functional mouse SCN network. Unlike synthetic models, these SCN replicas do not exhibit size-dependent rhythms: average period, amplitude, and synchronization remain stable across scales. By increasing the average degree with network size, we reproduce size-dependent rhythms and show that they arise from network connectivity, whereas low-degree networks fragment and fail to sustain oscillations. Disrupting clustering self-similarity slightly reduces synchronization, but circadian rhythms remain robust, indicating that average degree, rather than clustering, is the dominant structural driver. These results highlight the resilience of SCN rhythms to network scaling and provide a framework for linking multi-scale network structure to biological timekeeping.

Parametric modal regression for right-censored positive responses

2026-03-07T08:19:35Z

We present a unified parametric framework for modal regression applicable to continuous positive distributions, with explicit support for right-censored observations. The key contribution is a systematic analytical reparameterization of density parameters as direct functions of the conditional mode. This closed-form mapping is derived for the Gamma, Beta, Weibull, Lognormal, and Inverse Gaussian distributions, directly linking the mode to a linear predictor. Maximum likelihood estimation is performed using the censored log-likelihood, with asymptotic inference based on the observed Fisher information matrix. A Monte Carlo simulation study across multiple distributions, sample sizes, and censoring levels confirms consistent parameter recovery. Empirical bias and RMSE decrease as expected, and Wald confidence intervals achieve nominal coverage. Finally, the proposed methodology is illustrated through an application to real-world reliability data. All methodology is implemented in the open-source R package ModalCens.
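
The reparameterization can be illustrated for two of the listed families using their standard mode formulas: a Weibull distribution with shape $k>1$ and scale $\lambda$ has mode $\lambda((k-1)/k)^{1/k}$, and a lognormal with parameters $(m,s)$ has mode $e^{m-s^2}$. The Python sketch inverts these maps and writes the right-censored log-likelihood ($\log f$ for observed events, $\log S$ for censored observations). The log link $\mathrm{mode}_i=\exp(x_i^\top\beta)$ and all function names are assumptions for illustration; this is not the ModalCens API.

```python
import math


def weibull_scale_from_mode(mode, k):
    """Weibull(shape k > 1, scale lam) has mode lam * ((k - 1) / k) ** (1 / k); solve for lam."""
    if k <= 1:
        raise ValueError("the Weibull mode is interior only for shape k > 1")
    return mode * (k / (k - 1.0)) ** (1.0 / k)


def lognormal_meanlog_from_mode(mode, s):
    """Lognormal(meanlog m, sdlog s) has mode exp(m - s**2); solve for m."""
    return math.log(mode) + s * s


def weibull_censored_loglik(y, event, mode, k):
    """One observation's contribution: log f(y) if the event was observed, log S(y) if censored."""
    lam = weibull_scale_from_mode(mode, k)
    z = (y / lam) ** k
    if event:
        return math.log(k / lam) + (k - 1.0) * math.log(y / lam) - z
    return -z


def modal_weibull_loglik(beta, k, X, y, event):
    """Total censored log-likelihood with a log link mode_i = exp(x_i . beta) (an assumption)."""
    total = 0.0
    for xi, yi, di in zip(X, y, event):
        mode_i = math.exp(sum(b * x for b, x in zip(beta, xi)))
        total += weibull_censored_loglik(yi, di, mode_i, k)
    return total
```

Maximizing `modal_weibull_loglik` over `beta` and `k` with any numerical optimizer would give mode-regression estimates in this toy setting.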

From Structural Equation Modeling to Targeted Learning: A Tutorial Introduction to Targeted Maximum Likelihood Estimation for SEM Researchers

2026-03-07T02:48:07Z

Structural equation modeling (SEM) and path analysis have long been central tools for studying complex causal relationships in the social and behavioral sciences, yet their reliance on parametric assumptions can lead to biased inference under model misspecification. To bridge traditional SEM with modern causal machine learning, this paper introduces targeted maximum likelihood estimation (TMLE), a doubly robust framework built on nonparametric structural equation modeling. We formally connect TMLE to classical path analysis, showing that standard SEM estimators arise as special cases of TMLE under restrictive parametric specifications and that both approaches can estimate common causal quantities such as direct, indirect, and total effects. Through simulation studies under both correctly specified and misspecified models, we demonstrate that while the two methods perform similarly when models are correctly specified, TMLE consistently achieves lower bias, reduced mean squared error, and improved confidence interval coverage when parametric assumptions are violated. We further illustrate these differences using an applied mediation analysis examining the role of poverty in access to high school education, where path analysis suggests a significant direct effect, whereas TMLE does not, highlighting the practical consequences of robustness in causal inference. Overall, this tutorial offers SEM researchers a conceptual and practical introduction to targeted learning, providing guidance on leveraging TMLE to enhance causal analysis beyond traditional parametric frameworks.
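
For SEM readers, the core TMLE recipe for a simple average treatment effect (not the mediation estimands above) fits in a few lines: start from an initial outcome regression, fluctuate it along the clever covariate $H(A,W)=A/g(W)-(1-A)/(1-g(W))$ with a one-parameter logistic model, and plug in the updated fit. The Python sketch below assumes binary $A$ and $Y$; the function names are hypothetical, and so is the toy data-generating process one might use to try it.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    return np.log(p / (1.0 - p))


def tmle_ate(W, A, Y, Q0_fn, g_fn, n_newton=50):
    """Single-fluctuation TMLE of E[Y(1)] - E[Y(0)] for binary treatment A and outcome Y.

    Q0_fn(a, W): initial estimate of P(Y = 1 | A = a, W) (may be misspecified)
    g_fn(W):     estimate of the propensity score P(A = 1 | W)
    """
    def _clip(q):
        return np.clip(q, 1e-6, 1.0 - 1e-6)

    g = np.clip(g_fn(W), 0.01, 0.99)
    Q_A = _clip(Q0_fn(A, W))
    Q_1 = _clip(Q0_fn(np.ones_like(A), W))
    Q_0 = _clip(Q0_fn(np.zeros_like(A), W))
    H_A = A / g - (1 - A) / (1 - g)          # clever covariate at the observed A
    H_1, H_0 = 1.0 / g, -1.0 / (1.0 - g)     # clever covariate under A = 1 and A = 0
    eps = 0.0
    for _ in range(n_newton):
        # Newton-Raphson for the logistic fluctuation with offset logit(Q_A).
        Q_eps = expit(logit(Q_A) + eps * H_A)
        score = np.sum(H_A * (Y - Q_eps))
        info = np.sum(H_A**2 * Q_eps * (1.0 - Q_eps))
        eps += score / info
    Q1_star = expit(logit(Q_1) + eps * H_1)
    Q0_star = expit(logit(Q_0) + eps * H_0)
    return float(np.mean(Q1_star - Q0_star))
```

With a correctly specified propensity score, the estimate remains consistent even when the initial outcome regression ignores the confounder entirely, which is the double robustness referred to above.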

Adaptive Bi-Level Variable Selection of Conditional Main Effects for Generalized Linear Models

2026-03-06T20:44:43Z

Understanding interaction effects among variables is important for regression modeling in various applications. The conventional approach of quantifying interactions as the product of variables often lacks clear interpretability, especially in complex systems. The concept of conditional main effects (CME) provides a more intuitive and interpretable framework for capturing interaction effects by quantifying the effect of one variable conditional on the level of another. A recent method called cmenet further considered the bi-level selection of CMEs by leveraging their natural grouping structure (e.g., sibling and cousin groups) through penalization. However, there are several limitations in the cmenet method, including the coupling ability of penalties for within-group CMEs, lack of adaptiveness for between-group penalties, and restriction to linear models with continuous responses. To overcome these limitations, we propose an adaptive cmenet method for CME selection under the generalized linear model (GLM) framework. The proposed method considers a penalized likelihood approach with adaptive weights to enable effective bi-level variable selection, improving both between-group and within-group selection. An efficient algorithm for parameter estimation is also developed by employing an iteratively reweighted least squares procedure. The performance of the proposed method is evaluated by both simulation studies and real-data studies in gene association analysis.

Bayesian Transfer Learning for High-Dimensional Linear Regression via Adaptive Shrinkage

2026-03-06T20:23:43Z

We introduce BLAST, Bayesian Linear regression with Adaptive Shrinkage for Transfer, a Bayesian multi-source transfer learning framework for high-dimensional linear regression. The proposed analytical framework leverages global-local shrinkage priors together with Bayesian source selection to balance information sharing and regularization. We show how Bayesian source selection allows for the extraction of the most useful data sources, while discounting biasing information that may lead to negative transfer. In this framework, both source selection and sparse regression are jointly accounted for in prediction and inference via Bayesian model averaging. The structure of our model admits efficient posterior simulation via a Metropolis-within-Gibbs sampling algorithm allowing full posterior inference for the target regression coefficients, making BLAST both computationally practical and inferentially straightforward. Our method achieves more accurate posterior inference for the target than regularization approaches based on target data alone, while offering competitive predictive performance and superior uncertainty quantification compared to current state-of-the-art transfer learning methods. We validate its effectiveness through extensive simulation studies and illustrate its analytical properties when applied to a case study on the estimation of tumor mutational burden from gene expression, using data from The Cancer Genome Atlas (TCGA).

Diagnostics for Semiparametric Accelerated Failure Time Models with R Package afttest

2026-03-06T04:14:52Z

The semiparametric accelerated failure time (AFT) model offers a direct and interpretable alternative to the Cox proportional hazards model, yet practical diagnostic tools for this framework remain limited. We introduce afttest, an R package that implements martingale-residual-based goodness-of-fit procedures for semiparametric AFT models. In addition to the recently developed multiplier bootstrap diagnostics, the package introduces a new computationally efficient resampling strategy based on an influence-function linear approximation. Unlike the original approach, which requires repeatedly solving estimating equations for each bootstrap replicate, the proposed method avoids iterative optimization and substantially reduces computation time while preserving asymptotic validity. Both the standard multiplier bootstrap and the accelerated linear approximation are implemented, allowing users to balance finite-sample performance and computational scalability. The package supports rank-based and least-squares estimators, provides omnibus, link function, and functional form tests, and includes graphical tools for visualizing residual processes. An application to the Mayo Clinic primary biliary cirrhosis study illustrates the workflow.

Two Localization Strategies for Sequential MCMC Data Assimilation with Applications to Nonlinear Non-Gaussian Geophysical Models

2026-03-06T02:03:04Z

We present a localized data assimilation (DA) scheme, localized SMCMC (LSMCMC), based on the sequential Markov Chain Monte Carlo (SMCMC) technique [Ruzayqat et al., 2024], a provably convergent method for filtering high-dimensional, nonlinear, and potentially non-Gaussian state-space models. Unlike particle filters, which are exact methods for nonlinear non-Gaussian models, SMCMC does not assign weights to samples and therefore avoids weight degeneracy in small-ensemble regimes. We design two localization approaches within the SMCMC framework that exploit spatial sparsity of observations to reduce the effective degrees of freedom and improve efficiency. The first variant collects observed blocks into a single reduced domain and runs parallel MCMC chains over this combined region. The second variant further reduces the per-chain state dimension by decomposing the observed region into independent blocks, each augmented with a compact halo, and applying Gaspari--Cohn observation-noise tapering to smoothly down-weight distant observations. When the observation model is linear and Gaussian, we show that our approximate filtering density reduces to a Gaussian mixture from which independent samples can be drawn exactly. For nonlinear or non-Gaussian observation models, we employ an MCMC kernel. We test on high-dimensional ($d \sim 10^4 - 10^5$) state-space models, including a linear Gaussian model and a nonlinear multilayer shallow water equation with both linear and nonlinear observation operators. We consider Gaussian and non-Gaussian (Student-$t$) observation noise, showing that LSMCMC naturally handles heavy-tailed errors that cause ensemble Kalman methods to diverge. Observations include synthetic and real data from the Surface Water and Ocean Topography (SWOT) mission (NASA) and ocean drifter data (NOAA). We compare the two variants against each other and the local ensemble transform Kalman filter (LETKF).
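
The Gaspari--Cohn function is a standard fifth-order piecewise rational taper that equals 1 at zero distance and vanishes beyond twice its half-width $c$. The Python sketch below pairs it with one common way to taper observation noise, inflating each error variance to $\sigma^2/\rho(d)$; that this matches the paper's tapering exactly is an assumption, the function names are hypothetical, and the block and halo construction is omitted.

```python
import numpy as np


def gaspari_cohn(dist, c):
    """Gaspari-Cohn taper rho(dist) with half-width c: rho(0) = 1, rho = 0 for dist >= 2c."""
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / c
    rho = np.zeros_like(r)
    near = r <= 1.0
    far = (r > 1.0) & (r < 2.0)
    a, b = r[near], r[far]
    rho[near] = -0.25 * a**5 + 0.5 * a**4 + 0.625 * a**3 - (5.0 / 3.0) * a**2 + 1.0
    rho[far] = (b**5 / 12.0 - 0.5 * b**4 + 0.625 * b**3 + (5.0 / 3.0) * b**2
                - 5.0 * b + 4.0 - 2.0 / (3.0 * b))
    return rho


def tapered_obs_variance(obs_var, dist, c, floor=1e-12):
    """Inflate error variances to obs_var / rho(dist); observations beyond 2c get infinite variance."""
    rho = gaspari_cohn(dist, c)
    return np.where(rho > floor, obs_var / np.maximum(rho, floor), np.inf)
```

An infinite variance simply removes an observation from the local update, so the taper's compact support directly controls how many observations each block has to process.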