http://arxiv.org/abs/2506.00188v2 Cluster-Aware Causal Mixer for Online Anomaly Detection in Multivariate Time Series 2026-06-04T04:48:12Z

Early and accurate detection of anomalies in time-series data is critical due to the substantial risks associated with false or missed detections. While MLP-based mixer models have shown promise in time-series analysis, they do not maintain temporal causality during data processing. Moreover, real-world multivariate time series often contain numerous channels with diverse inter-channel correlations. Spurious correlations in the reconstructed time series lead to noisy representations, resulting in inaccurate anomaly detection. In addition, anomaly scoring methods that ignore temporal continuity can mislead sequential detection. To address these challenges, we propose a cluster-aware causal mixer for multivariate time-series anomaly detection. Channels are grouped into clusters based on their correlations, and each cluster is embedded through a dedicated embedding layer. A causal mixer is introduced to integrate information while maintaining temporal causality. We further develop a sequential anomaly-scoring method that accumulates evidence over time and refines anomaly boundaries. Our proposed model operates in an online fashion, making it suitable for real-time time-series anomaly detection. Experimental evaluations across six public benchmark datasets demonstrate that the proposed approach consistently achieves superior performance.
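
The channel-grouping step described above can be illustrated with a minimal sketch. It assumes Pearson correlation and a fixed threshold, and it groups channels as connected components of the thresholded correlation graph; the abstract does not state which clustering rule the authors use, so `pearson`, `cluster_channels`, and `threshold` are hypothetical names chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cluster_channels(channels, threshold=0.8):
    """Group channel indices whose absolute correlation reaches `threshold`.

    Assumption: clusters are connected components of the thresholded
    correlation graph, found with a small union-find structure.
    """
    n = len(channels)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(channels[i], channels[j])) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data: channels 0 and 1 move together, as do channels 2 and 3.
toy_channels = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10.5],
    [5, 1, 4, 2, 3],
    [10, 2, 8, 4, 6.1],
]
clusters = cluster_channels(toy_channels)
```

In a model of the kind the abstract describes, each resulting index group would feed its own embedding layer; that part is not sketched here.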

2025-05-30T19:56:54Z Md Mahmuddun Nabi Murad Yasin Yilmaz http://arxiv.org/abs/2601.06655v2 Physics-constrained Gaussian Processes for Predicting Shockwave Hugoniot Curves 2026-06-04T04:10:19Z

A physics-constrained Gaussian process regression framework is developed for predicting shocked material states and their associated uncertainties along the Hugoniot curve using data from a small number of shockwave simulations. The proposed Gaussian process is constrained by the Rankine-Hugoniot jump conditions between the various shocked material states to construct a thermodynamically consistent covariance function. This leads to the formulation of an optimization problem over a small number of interpretable hyperparameters and enables the identification of regime transitions, from a leading elastic wave to trailing plastic and phase transformation waves. Shock Hugoniots are an important measure for understanding material behavior under extreme conditions, including the development of equations of state and the determination of material properties such as the Hugoniot Elastic Limit, but they are costly to generate through large-scale molecular dynamics simulations or shock experiments. Under these constraints, the proposed methodology establishes Hugoniot curves from a limited number of molecular dynamics simulations. We consider silicon carbide as a representative material and perform molecular dynamics simulations using a reverse ballistic approach. The framework reproduces the Hugoniot curve with satisfactory accuracy while also quantifying the uncertainty in the predictions using the Gaussian process posterior. These uncertain Hugoniot predictions can then be used to calibrate equation of state models, estimate material properties, or inform future experimental and/or simulation campaigns.

2026-01-10T18:54:24Z George D. Pasparakis Himanshu Sharma Rushik Desai Chunyu Li Alejandro Strachan Lori Graham-Brady Michael D. Shields http://arxiv.org/abs/2501.14291v3 Advances in Temporal Point Processes: Bayesian, Neural, and LLM Approaches 2026-06-04T03:54:20Z

Temporal point processes (TPPs) are stochastic process models used to characterize event sequences occurring in continuous time. Traditional statistical TPPs have a long-standing history, with numerous models proposed and successfully applied across diverse domains. In recent years, advances in deep learning have spurred the development of neural TPPs, enabling greater flexibility and expressiveness in capturing complex temporal dynamics. The emergence of large language models (LLMs) has further sparked excitement, offering new possibilities for modeling and analyzing event sequences by leveraging their rich contextual understanding. This survey presents a comprehensive review of recent research on TPPs from three perspectives: Bayesian, deep learning, and LLM approaches. We begin with a review of the fundamental concepts of TPPs, followed by an in-depth discussion of model design and parameter estimation techniques in these three frameworks. We also revisit classic application areas of TPPs to highlight their practical relevance. Finally, we outline challenges and promising directions for future research.

2025-01-24T07:13:26Z Feng Zhou Quyu Kong Jie Qiao Cheng Wan Yixuan Zhang Ruichu Cai http://arxiv.org/abs/2605.12951v2 Coreset-Induced Conditional Velocity Flow Matching 2026-06-04T03:21:02Z

We propose Coreset-Induced Conditional Velocity Flow Matching (CCVFM), a generative model that augments hierarchical rectified flow with a data-informed source distribution. Hierarchical flow matching models the full conditional velocity law in velocity space, but its inner flow is asked to transport isotropic Gaussian noise to a multimodal target velocity distribution from scratch. Our key observation is that this inner source can be replaced by a closed-form surrogate built from a coreset of the target. CCVFM first compresses the target into weighted atoms using an entropic Sinkhorn coreset and lifts them to a Gaussian mixture. The induced conditional velocity law is then a closed-form Gaussian mixture that can be sampled without a learned neural sampler. A lightweight correction flow, trained from this exact surrogate source, then refines the remaining surrogate-to-target residual rather than learning an entire noise-to-data map. We prove that the surrogate transport cost equals the target--surrogate Wasserstein gap under an explicit compression assumption, whereas the noise-source analogue has a dimension-scale lower bound. We further characterize the conditional second moment of the direct surrogate-source training target and show that its source-dependent excess is small when the surrogate conditional law is close to the true conditional velocity law in mean and covariance. Empirically, on MNIST, CIFAR-10, ImageNet-32, and CelebA-HQ, the proposed method reaches competitive few-step generation under matched architectures.

2026-05-13T03:34:40Z Xiao Wang Zihua She Jianxi Su http://arxiv.org/abs/2605.15454v2 Reasoning Models Don't Just Think Longer, They Move Differently 2026-06-04T02:57:35Z

Reasoning-trained language models often spend more tokens on harder problems, but longer chains of thought do not show whether a model is merely computing for more steps or following a different internal trajectory. We study this distinction through hidden-state trajectories during chain-of-thought generation across competitive programming, mathematics, and Boolean satisfiability. Raw trajectory geometry is strongly shaped by generation length: longer generations mechanically alter path statistics, so difficulty-dependent comparisons are misleading without adjustment. After residualizing trajectory statistics on length, difficulty remains systematically coupled to corrected trajectory geometry across all domains studied. The clearest reasoning-specific separation appears in the code domain, where harder problems show more direct corrected trajectories and less heterogeneous local curvature in reasoning-trained models than in matched instruction-tuned baselines. Corrected difficulty-geometry coupling is weaker, but still present, in mathematics and Boolean satisfiability. Prompt-stage linear probes do not mirror the code-domain separation, and behavioral annotations show that stronger corrected coupling co-occurs with strategy shifts and uncertainty monitoring. Together, these findings establish length correction as a prerequisite for generation-time trajectory analysis and show that reasoning training can be associated with distinct corrected trajectory geometry, with the strength of the effect depending on the domain.
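
The length correction mentioned above can be sketched as follows. This is a minimal illustration that assumes an ordinary least-squares fit of a trajectory statistic on generation length, with an intercept; the abstract does not specify the regression form, and `residualize` is a hypothetical helper name.

```python
def residualize(values, lengths):
    """Return `values` with their linear dependence on `lengths` removed.

    Fits values ~ intercept + slope * length by ordinary least squares and
    returns the residuals, which are uncorrelated with length by construction.
    """
    n = len(values)
    mean_len = sum(lengths) / n
    mean_val = sum(values) / n
    sxx = sum((length - mean_len) ** 2 for length in lengths)
    if sxx == 0:
        raise ValueError("all lengths are identical; the slope is undefined")
    sxy = sum((length - mean_len) * (v - mean_val)
              for length, v in zip(lengths, values))
    slope = sxy / sxx
    intercept = mean_val - slope * mean_len
    return [v - (intercept + slope * length)
            for v, length in zip(values, lengths)]


# Toy example: a path statistic that grows with generation length.
lengths = [100, 200, 300, 400, 500]
path_stat = [1.2, 2.1, 2.9, 4.3, 5.0]
corrected = residualize(path_stat, lengths)
```

Difficulty would then be compared against `corrected` rather than against the raw statistic, so that any remaining association cannot be attributed to a linear length effect.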

2026-05-14T22:37:33Z Preprint Anders Gjølbye Lars Kai Hansen Sanmi Koyejo http://arxiv.org/abs/2606.05599v1 Mitigating the Curse of Dimensionality in Uniform Convergence of Deep Neural Networks via Smooth Activations 2026-06-04T02:24:50Z

This paper establishes a theoretical framework for the uniform convergence of smoothly activated deep neural network (DNN) estimators. While standard ReLU networks achieve minimax-optimal rates in the $L^2(P)$ norm for various nonparametric regression tasks, we establish a theoretical lower bound demonstrating that least-squares ReLU estimators can suffer from the curse of dimensionality in their uniform convergence behavior. Motivated by the need for reliable uniform guarantees in downstream tasks requiring worst-case reliability, we address this limitation by analyzing smoothly activated DNNs (smooth DNNs), encompassing both feedforward and residual structures. We establish novel pseudo-dimension bounds, non-asymptotic approximation guarantees, and Hölder-norm bounds for the approximators of these models. Leveraging these results, we derive non-asymptotic uniform convergence rates for smooth DNN estimators across multiple statistical contexts, including Huber, least-squares, quantile, and logistic regression. We prove that smooth DNNs can mitigate the curse of dimensionality in uniform convergence by adaptively exploiting the low-dimensional hierarchical composition structure of the target function. Supported by both simulation studies and a real-world application, our results position smooth DNNs as a theoretically grounded and practically viable alternative to ReLU networks for statistical learning tasks requiring uniform guarantees.

2026-06-04T02:24:50Z 30 pages, 5 figures Yizhe Ding Runze Li Jia Liu Lingzhou Xue http://arxiv.org/abs/2606.05560v1 Wasserstein Exponential Smoothing 2026-06-04T01:14:22Z

Exponential smoothing (ES) often outperforms other techniques in time series forecasting across a wide range of data-generating processes. While ES has traditionally been applied to time series in $\mathbb{R}$, this paper extends the methodology to distributional time series, where each observation is a probability distribution on $\mathbb{R}$. The primary contribution of this work is twofold. First, we propose a principled and intuitive generalization of ES within the Wasserstein space, which retains the exceptional parsimony of classical ES. Second, we theoretically and empirically demonstrate that the smoothing parameter can be consistently estimated by minimizing a Wasserstein distance. Applications to distributional time series of high-frequency financial returns and household electricity demands confirm the practical effectiveness of our Wasserstein ES model.
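
One way to picture exponential smoothing in Wasserstein space is sketched below. It relies on a standard fact for distributions on $\mathbb{R}$: the 2-Wasserstein geodesic between two distributions is the pointwise linear interpolation of their quantile functions, and the 2-Wasserstein distance is the $L^2$ distance between quantile functions. Whether the paper's recursion takes exactly this form is an assumption. The sketch also assumes each distribution is stored as quantile values on a shared, evenly spaced grid of probability levels; `wasserstein_es` and `w2_distance` are hypothetical names.

```python
import math


def wasserstein_es(quantile_series, alpha):
    """Exponentially smooth a series of distributions given by their quantiles.

    Each element of `quantile_series` lists quantile values on the same grid
    of probability levels. The update moves a fraction `alpha` of the way
    along the quantile-wise interpolation from the current state to the new
    observation, mirroring s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    state = list(quantile_series[0])
    for observed in quantile_series[1:]:
        state = [alpha * q + (1.0 - alpha) * s
                 for q, s in zip(observed, state)]
    return state


def w2_distance(quantiles_a, quantiles_b):
    """Grid approximation of the 2-Wasserstein distance between two
    distributions on the real line, computed from their quantile values."""
    n = len(quantiles_a)
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(quantiles_a, quantiles_b)) / n)


# Three observed distributions, each summarized by three quantiles.
series = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
forecast = wasserstein_es(series, alpha=0.5)
```

Because a convex combination of non-decreasing quantile vectors is itself non-decreasing, the smoothed state remains a valid quantile function. A smoothing parameter could then be chosen by minimizing the summed `w2_distance` between one-step-ahead states and the next observations, in the spirit of the estimation procedure the abstract mentions.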

2026-06-04T01:14:22Z Takuo Matsubara Peiwen Jiang Minh-Ngoc Tran Wilson Ye Chen http://arxiv.org/abs/2603.11319v2 On the Robustness of Langevin Dynamics to Score Function Error 2026-06-04T01:00:47Z

We consider the robustness of score-based generative modeling to errors in the estimate of the score function. In particular, we show that Langevin dynamics is not robust to $L^2$ errors (more generally, $L^p$ errors) in the estimate of the score function. It is well-established that with small $L^2$ errors in the estimate of the score function, diffusion models can sample faithfully from the target distribution under fairly mild regularity assumptions in a polynomial time horizon. In contrast, our work shows that even for simple distributions in high dimensions, Langevin dynamics run for any polynomial time horizon will produce a distribution far from the target distribution in Total Variation (TV) distance, even when the $L^2$ error (more generally, $L^p$ error) of the estimate of the score function is arbitrarily small. Since such an error in the estimate of the score function is unavoidable in practice when learning the score function from data, our results provide further justification for diffusion models over Langevin dynamics and serve to caution against the use of Langevin dynamics with estimated scores.
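
For readers unfamiliar with the sampler under discussion, here is a minimal one-dimensional sketch of discretized (unadjusted) Langevin dynamics, $x_{k+1} = x_k + h\,s(x_k) + \sqrt{2h}\,\xi_k$ with $\xi_k \sim N(0,1)$. It uses the exact score of a standard Gaussian, so it does not reproduce the high-dimensional failure mode the abstract describes; it only shows where an estimated score would enter. The names `langevin_samples` and `gaussian_score`, and all default parameter values, are assumptions made for illustration.

```python
import math
import random


def gaussian_score(x):
    """Exact score (gradient of the log-density) of a standard Gaussian."""
    return -x


def langevin_samples(score, n_chains=1000, n_steps=500, step=0.01,
                     x0=3.0, seed=0):
    """Run independent unadjusted Langevin chains and return their final states.

    `score` is any callable approximating the gradient of the log-density;
    in practice it would be a learned estimate rather than the exact score.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step)
    states = [x0] * n_chains
    for _ in range(n_steps):
        states = [x + step * score(x) + noise_scale * rng.gauss(0.0, 1.0)
                  for x in states]
    return states


samples = langevin_samples(gaussian_score)
sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / len(samples)
```

With the exact score, the chains started at `x0 = 3.0` settle near mean 0 and variance 1. The abstract's claim concerns what can happen when `score` is replaced by an estimate with small $L^p$ error in high dimensions.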

2026-03-11T21:25:01Z ICML 2026 Daniel Yiming Cao August Y. Chen Karthik Sridharan Yuchen Wu http://arxiv.org/abs/2410.06326v3 Convex Estimation of Gaussian Graphical Regression Models with Covariates 2026-06-04T00:45:28Z

Gaussian graphical models (GGMs) are widely used to recover the conditional independence structure among random variables. Recent work has sought to incorporate auxiliary covariates to improve estimation, particularly in applications such as co-expression quantitative trait locus (eQTL) studies, where both gene expression levels and their conditional dependence structure may be influenced by genetic variants. Existing approaches to covariate-adjusted GGMs either restrict covariate effects to the mean structure or lead to nonconvex formulations when jointly estimating the mean and precision matrix. In this paper, we propose a convex framework that simultaneously estimates the covariate-adjusted mean and precision matrix via a natural parametrization of the multivariate Gaussian likelihood. The resulting formulation enables joint convex optimization and yields improved theoretical guarantees under high-dimensional scaling, where the sparsity and dimension of covariates grow with the sample size. We support our theoretical findings with numerical simulations and demonstrate the practical utility of the proposed method through a reanalysis of an eQTL study of glioblastoma multiforme and an analysis of diet on the human gut microbiome.

2024-10-08T20:02:10Z Ruobin Liu Guo Yu http://arxiv.org/abs/2606.03067v2 Trajectory-Aware Node Contributions and the Limits of Static Controllability 2026-06-04T00:13:05Z

A recurring data mining task in complex networks is to determine how individual nodes contribute to system behavior. Existing approaches rely on either static-graph centralities or control-theoretic quantities such as controllability Gramians, which assume linear, time-invariant dynamics. Estimated systems, however, are typically nonlinear and time-varying. We define "emergent contribution (EC)," a finite-horizon measure of a node's dynamical leverage: the metric-weighted energy of its impulse response accumulated along the system trajectory. Computed from the Jacobians of any differentiable model, EC is estimator-agnostic and reduces exactly to average controllability in the linear, time-invariant limit. Our contribution is a characterization of when the two measures agree and when they diverge. Using a controlled synthetic family with known ground-truth contribution, we construct a phase diagram spanning nonlinearity, regime structure, persistence, and perturbation amplitude. EC and average controllability agree under static or smoothly drifting dynamics and both track ground truth. Divergence emerges under persistent regime switching, is strongest under persistent sign reversal, and disappears when the sign reversal is removed. At extreme perturbation amplitudes, both measures degrade, identifying the limits of local linearization. We place five estimated real systems from several domains within this phase space. Their placement serves as a diagnostic of when EC provides information beyond static controllability and therefore justifies its additional computational cost. On one panel examined in depth, a twenty-seed retraining ensemble reveals a robust variance--leverage dissociation: nodes whose perturbations propagate widely despite low within-system variance, a pattern recovered by neither static centralities nor variance-based summaries.

2026-06-02T02:56:36Z 11 pages, 1 figure Valentina Kuskova Dmitry Zaytsev Michael Coppedge http://arxiv.org/abs/2603.20980v3 From Causal Discovery to Dynamic Causal Inference in Neural Time Series 2026-06-04T00:09:54Z

Time-varying causal models provide a powerful framework for studying dynamic scientific systems, yet most existing approaches assume that the underlying causal network is known a priori - an assumption rarely satisfied in real-world domains where causal structure is uncertain, evolving, or only indirectly observable. This limits the applicability of dynamic causal inference in many scientific settings. We propose Dynamic Causal Network Autoregression (DCNAR), a two-stage neural causal modeling framework that integrates data-driven causal discovery with time-varying causal inference. In the first stage, a neural autoregressive causal discovery model learns a sparse directed causal network from multivariate time series. In the second stage, this learned structure is used as a structural prior for a time-varying neural network autoregression, enabling dynamic estimation of causal influence without requiring pre-specified network structure. We evaluate the scientific validity of DCNAR using behavioral diagnostics that assess causal necessity, temporal stability, and sensitivity to structural change, rather than predictive accuracy alone. Experiments on multi-country panel time-series data demonstrate that learned causal networks yield more stable and behaviorally meaningful dynamic causal inferences than coefficient-based or structure-free alternatives, even when forecasting performance is comparable. These results position DCNAR as a general framework for using AI as a scientific instrument for dynamic causal reasoning under structural uncertainty.

2026-03-21T23:53:53Z 11 pages, 2 figures Dmitry Zaytsev Valentina Kuskova Michael Coppedge 10.1145/3770855.3818956 http://arxiv.org/abs/2602.01607v3 Minimax optimal differentially private synthetic data for smooth queries 2026-06-04T00:07:53Z

Differentially private synthetic data enables the sharing and analysis of sensitive datasets while providing rigorous privacy guarantees for individual contributors. A central challenge is to achieve strong utility guarantees for meaningful downstream analysis. Many existing methods ensure uniform accuracy over broad query classes, such as all Lipschitz functions, but this level of generality often leads to suboptimal rates for statistics of practical interest. Since many common data analysis queries exhibit smoothness beyond what worst-case Lipschitz bounds capture, we ask whether exploiting this additional structure can yield improved utility. We study the problem of generating $(\varepsilon,\delta)$-differentially private synthetic data from a dataset of size $n$ supported on the hypercube $[-1,1]^d$, with utility guarantees uniformly for all smooth queries having bounded derivatives up to order $k$. We propose a polynomial-time algorithm that achieves a minimax error rate of $O_{k,d}(n^{-\min \{1, \frac{k}{d}\}})$, up to a $\log(n)$ factor. This characterization uncovers a phase transition at $k=d$. Our results generalize the Chebyshev moment matching framework of (Musco et al., 2025; Wang et al., 2016) and strictly improve the error rates for $k$-smooth queries established in (Wang et al., 2016). Moreover, we establish the first minimax lower bound for the utility of $(\varepsilon,\delta)$-differentially private synthetic data with respect to $k$-smooth queries, extending the Wasserstein lower bound for $\varepsilon$-differential privacy in (Boedihardjo et al., 2024).

2026-02-02T03:54:11Z COLT 2026 arXiv version. 34 pages Rundong Ding Yiyun He Yizhe Zhu http://arxiv.org/abs/2510.10968v3 Blade: A Derivative-free Bayesian Inversion Method using Diffusion Priors 2026-06-03T23:23:40Z

Derivative-free Bayesian inversion arises in science and engineering applications, particularly when the forward model is costly or infeasible to differentiate through. Existing derivative-free methods collapse the posterior to a point estimate or return severely over-confident uncertainty on high-dimensional, nonlinear problems. We introduce Blade, which produces accurate and well-calibrated posteriors using an ensemble of interacting particles. Blade leverages diffusion models as data-driven priors, and only queries the forward model through forward evaluations (i.e., derivative-free). Theoretically, we show the convergence and stability of Blade under forward model approximation and prior score estimation error. Empirically, on nonlinear fluid dynamics, Blade produces well-calibrated posterior samples that existing derivative-free methods cannot, as measured by CRPS, the spread-skill ratio, and the rank histogram. Its accuracy and calibration improve consistently with more iterations and particles, backed by our convergence and stability analysis and empirical experiments.

2025-10-13T03:19:44Z Hongkai Zheng Austin Wang Zihui Wu Zhengyu Huang Ricardo Baptista Yisong Yue http://arxiv.org/abs/2606.05488v1 Sparse Functional Singular Value Decomposition for Biclustering and Triclustering Longitudinal Data 2026-06-03T22:26:01Z

Identifying subtypes of complex conditions, such as Inflammatory Bowel Disease (IBD), often requires capturing latent patterns in longitudinal omics data. However, these data are typically high-dimensional, sparsely sampled, and irregularly observed over time, posing substantial challenges for conventional (bi)clustering and functional data analysis methods. We propose Tri-SfSVD, a unified sparse functional Singular Value Decomposition framework for discovering biclusters and triclusters in longitudinal data. Unlike existing functional biclustering methods that rely on ad hoc imputation or enforce restrictive shape-homogeneity assumptions, Tri-SfSVD integrates continuous trajectory estimation with simultaneous subject, feature, and temporal selection within a single optimization framework. By imposing sparse penalties across subjects, variables, and temporal subregions, the proposed method works directly on observed data to uncover localized structures at the subject, subject-feature, and subject-feature-time levels. Extensive simulations demonstrate that Tri-SfSVD outperforms existing approaches in high-dimensional settings. Applied to IBD multi-omics data, the method identified three biclusters linking sample clusters with distinct IBD-related clinical characteristics to microbial pathway groups associated with specific bacterial taxa, providing interpretable subject-pathway associations for characterizing disease heterogeneity. Applied to multi-channel EEG data, the method identified three triclusters linking sample clusters with distinct alcohol-related phenotypes to localized brain activity patterns, including subgroup differences separated by temporal subregions within the same spatial region.

2026-06-03T22:26:01Z Yue Zhao Thierry Chekouo Sandra Safo http://arxiv.org/abs/2507.12257v4 Robust Causal Discovery in Real-World Time Series with Power-Laws 2026-06-03T21:53:44Z

Exploring causal relationships in stochastic time series is a challenging yet crucial task with a vast range of applications, including finance, economics, neuroscience, and climate science. Many algorithms for Causal Discovery (CD) have been proposed; however, they often exhibit a high sensitivity to noise, resulting in spurious causal inferences in real data. In this paper, we observe that the frequency spectra of many real-world time series follow a power-law distribution, notably due to an inherent self-organizing behavior. Leveraging this insight, we build a robust CD method based on the extraction of power-law spectral features that amplify genuine causal signals. Our method consistently outperforms state-of-the-art alternatives on both synthetic benchmarks and real-world datasets with known causal structures, demonstrating its robustness and practical relevance.

2025-07-16T14:02:21Z Matteo Tusoni Giuseppe Masi Andrea Coletta Aldo Glielmo Viviana Arrigoni Novella Bartolini