https://arxiv.org/api/HfqWI3OyqBWEIePJTitB0SgXpSs2026-06-21T12:27:01Z7851167515http://arxiv.org/abs/2606.04182v1Exact Unlearning in Reinforcement Learning2026-06-02T19:54:44ZWe formulate the problem of \emph{exact unlearning} in reinforcement learning, where the goal is to design an efficient framework that enables the removal of any user's data upon deletion request, i.e., the online learner's output after unlearning is \emph{indistinguishable} from what would have been produced had the deleted user never interacted with the learner. For any $ρ>0$, we show that there exists a reinforcement learning (RL) algorithm that is $ρ$-TV-stable and supports an exact unlearning procedure whose expected computational cost is only a $ρ\sqrt{\ln T}$ fraction of the computational cost of retraining from scratch. We construct such a $ρ$-TV-stable RL algorithm for tabular Markov decision processes (MDPs), which achieves a regret bound of $\mathcal{O}(H^2 \sqrt{SAT} + H^3 S^2 A + {H^{2.5} S^2 A}/ρ)$, where $S, A, H$, and $T$ denote the number of states, the number of actions, the episode horizon, and the number of episodes, respectively. We also establish a lower bound of $Ω(H\sqrt{\!SAT}\! +\! {SAH}/ρ)$ for $ρ$-TV-stable RL algorithms, showing that our algorithm is nearly minimax optimal.2026-06-02T19:54:44ZICML SpotlightThanh Nguyen-TangRaman Arorahttp://arxiv.org/abs/2606.09874v1Disjoint or Overlapping? Inference Windowing for Reconstruction-Based Time Series Anomaly Detection2026-06-02T19:42:30ZReconstruction-based methods are widely used for time series anomaly detection, where models are trained to reconstruct subsequences, and anomalies are identified through reconstruction errors. However, reported results are often hard to compare due to heterogeneous evaluation practices and underspecified inference procedures. In this paper, we revisit reconstruction-based anomaly detection in the univariate offline setting and study the role of the inference stride, which controls whether subsequences are processed as disjoint windows or with overlap. We propose a unified training, tuning, and multi-seed evaluation protocol on the curated TSB-AD benchmark, and study how overlapping inference affects anomaly detection performance for a range of reconstruction models, including PCA-based baselines, DLinear, an AutoEncoder, TimesNet, and Transformer variants. The results show that across all models, overlapping windows yield consistent improvements, with average relative gain up to +28%, and can alter method rankings. We further analyze variability across datasets, random seeds, and hyperparameter configurations. Finally, we complement the benchmark study with an evaluation on the full UCR archive using localization criteria aligned with sliding-window reconstruction. Overall, our results highlight that reconstruction-based anomaly detection performance depends not only on model architecture and training, but also on inference choices, motivating a clear and reproducible protocol. Our results show that reconstructionbased baselines achieve strong performance on both TSB-AD and UCR benchmarks, supporting them as competitive and practical approaches for univariate time series anomaly detection.2026-06-02T19:42:30ZGuillaume CoulaudUM, IROKOReza AkbariniaIROKOFlorent MassegliaIROKOhttp://arxiv.org/abs/2606.04176v1Low-rank Distributional Matrix Completion2026-06-02T19:41:07ZWe study a distributional generalization of the matrix completion problem in which each entry of the target matrix is a probability distribution rather than a scalar. In this setting, only a subset of matrix entries is observed, and even for observed entries, the underlying distributions are not directly accessible; instead, we observe finitely many samples drawn from them. To represent distributional entries, we employ kernel mean embeddings and introduce a notion of Tucker rank for distribution-valued matrices to capture their low-rank structure. The infinite-dimensional nature of kernel embeddings poses significant methodological challenges. To address this, we introduce functional unfolding operators that link the proposed distributional low-rank structure to the classical Tucker rank for finite-dimensional tensors. Based on this framework, we propose a novel estimator for distributional matrix completion. We establish non-asymptotic error bounds that characterize the statistical performance of the estimator. Extensive experiments on synthetic data and a real-world application demonstrate the effectiveness of the proposed method.2026-06-02T19:41:07ZJiayi WangRaymond K. W. Wonghttp://arxiv.org/abs/2107.01629v3From Live to Recording: Consumer Demand and Response to Price Across the Livestreaming Lifecycle2026-06-02T19:38:43ZLivestreaming has evolved into a thriving industry where creators can directly monetize and engage with their audiences and followers. In practice, creators and platforms typically concentrate their marketing efforts on the period leading up to the livestream. However, livestreaming events naturally transition into recorded formats once the event concludes, creating potential "residual" opportunities for monetization. This study systematically examines consumer demand for live events throughout the entire livestream life-cycle, using data from a large livestreaming platform that allows consumers to purchase the recorded version of a paid live event after the livestream ends. We find that the demand is surprisingly more price-sensitive during the pre-livestream period compared to the post-period. This is partly driven by two mechanisms: consumer self-selection (infrequent consumers who may have missed the live events exhibit a higher willingness to pay for recorded versions) and quality uncertainty (consumers face higher uncertainty in event quality during the pre-period than in the post-period). Our findings generate implications for the pricing and targeting strategies in livestreaming markets.2021-07-04T13:50:54ZAn earlier version of this paper was distributed under the title "The Role of 'Live' in Livestreaming Markets: Evidence Using Orthogonal Random Forest."Ziwei CongJia LiuPuneet Manchandahttp://arxiv.org/abs/2602.08142v2Variance-Gated Ensembles: An Epistemic-Aware Framework for Uncertainty Estimation2026-06-02T19:33:58ZMachine learning applications require fast and reliable per-sample uncertainty estimation. A common approach is to use predictive distributions from Bayesian or approximation methods and additively decompose uncertainty into aleatoric (i.e., data-related) and epistemic (i.e., model-related) components. However, additive decomposition has recently been questioned, with evidence that it breaks down when using finite-ensemble sampling and/or mismatched predictive distributions. This paper introduces Variance-Gated Ensembles (VGE), an intuitive, differentiable framework that injects epistemic sensitivity via a signal-to-noise gate computed from ensemble statistics. VGE provides: (i) a Variance-Gated Margin Uncertainty (VGMU) score that couples decision margins with ensemble predictive variance; and (ii) a Variance-Gated Normalization (VGN) layer that generalizes the variance-gated uncertainty mechanism to training via per-class, learnable normalization of ensemble member probabilities. We derive closed-form vector-Jacobian products enabling end-to-end training through ensemble sample mean and variance. VGE matches or exceeds state-of-the-art information-theoretic baselines while remaining computationally efficient. As a result, VGE provides a practical and scalable approach to epistemic-aware uncertainty estimation in ensemble models.2026-02-08T22:05:23ZPublished in Transactions on Machine Learning Research (06/2026)H. Martin GillisIsaac XuThomas Trappenberghttp://arxiv.org/abs/2601.18175v2Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success2026-06-02T19:09:34ZA widely used technique for improving policies is success conditioning, in which one collects trajectories, identifies those that achieve a desired outcome, and updates the policy to imitate the actions taken along successful trajectories. This principle appears under many names -- rejection sampling with SFT, goal-conditioned RL, Decision Transformers -- yet what optimization problem it solves, if any, has remained unclear. We prove that success conditioning exactly solves a trust-region optimization problem, maximizing policy improvement subject to a $χ^2$ divergence constraint whose radius is determined automatically by the data. This yields an identity: relative policy improvement, the magnitude of policy change, and a quantity we call action-influence -- measuring how random variation in action choices affects success rates -- are exactly equal at every state. Success conditioning thus emerges as a conservative improvement operator. Exact success conditioning cannot degrade performance or induce dangerous distribution shift, but when it fails, it does so observably, by hardly changing the policy at all. We apply our theory to the common practice of return thresholding, showing this can amplify improvement, but at the cost of potential misalignment with the true objective.2026-01-26T05:54:39ZDaniel Russohttp://arxiv.org/abs/2604.14575v2Generative Augmented Inference2026-06-02T18:59:00ZLarge language models enable inexpensive AI-generated annotations, but using them reliably for causal inference remains challenging. Naively pooling AI and human data induces bias, while existing methods such as Prediction-Powered Inference (PPI; Angelopoulos et al., 2023a) treat AI outputs as proxies of true labels -- an assumption often violated for generative model outputs in practice. We propose Generative Augmented Inference (GAI), a framework that treats AI outputs as general, potentially high-dimensional informative features for learning human labels rather than as surrogates. GAI flexibly models this relationship using nonparametric methods, enabling consistent estimation and valid inference from combined human and AI data. We establish asymptotic normality and show that, under random labeling, GAI strictly improves asymptotic efficiency over human-data-only estimation whenever AI outputs are informative for true labels. Empirical studies on real-world datasets demonstrate that GAI significantly reduces estimation error and improves confidence interval quality across diverse generative data sources relative to human-only and PPI-based estimation.2026-04-16T03:10:37ZCheng LuMengxin WangDennis J. ZhangHeng Zhanghttp://arxiv.org/abs/2606.04110v1Variance Reduction for Heavy-Tailed Monetization Metrics in Ranking Experiments via Post-Stratification2026-06-02T18:14:14ZOnline evaluation of ranking and retrieval systems often relies on downstream monetization metrics such as app revenue or creator earnings. These metrics are typically heavy-tailed, with a small fraction of users dominating both mean and variance, leading to low statistical power and unreliable conclusions in A/B experiments -- especially under limited traffic.
We present a practical framework for variance reduction in online experiments by combining post-stratification with CUPED. Our approach leverages pre-experiment covariates to improve the sensitivity of monetization experiments without requiring additional traffic. Deployed at ShareChat across ranking-driven monetization experiments, the method substantially reduces variance and improves decision stability, achieving equivalent statistical confidence with ~45\% less traffic than standard metrics. We further discuss practical design choices, guardrails, and limitations, providing guidance on when post-stratification is appropriate for real-world information retrieval and Recommendation systems.2026-06-02T18:14:14ZAccepted as Industry Track paper in the 2026 ACM SIGIR Conference on Research and Development in Information RetrievalNeeti PokharnaOlivier JeunenYatharth SarafAleksei Ustimenko10.1145/3805712.3808428http://arxiv.org/abs/2605.18106v3Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers2026-06-02T17:59:42ZA striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exhibit rich symmetry and equivariance properties, popular optimizers such as Adam and its variants operate inherently coordinate-wise, rendering them unable to respect the equivariance structures of the parameter space. We address this disparity by introducing a symmetry-compatible principle for optimizer design: the gradient update rule should be equivariant under the symmetry group acting on the corresponding weight block. Following this principle, we first provide a unified perspective on bi-orthogonally equivariant updates for general matrix layers, as employed by stochastic spectral descent, Muon, Scion, and polar gradient methods. More importantly, by moving from orthogonal groups to permutation and shared-shift symmetries, we derive symmetry-compatible optimizers for parameter blocks whose symmetries differ from those of general matrix layers: embedding and LM head matrices, SwiGLU MLP projections, and MoE router matrices. These constructions include one-sided spectral, row-norm, hybrid row-norm/spectral, row-aware, column-aware, centered row-norm, and left-spectral updates. They yield an end-to-end layerwise optimizer stack in which each major matrix-valued parameter class is assigned an update whose equivariance matches its symmetry group. We corroborate this principle through pre-training experiments on dense and sparse MoE language models, including Qwen3-0.6B-style, Gemma 3 1B-style, OLMoE-1B-7B-style, and downsized gpt-oss architectures. Across these experiments, symmetry-compatible update rules consistently improve final validation loss, reduce load imbalance in sparse MoE models, and in several cases improve training stability over the corresponding AdamW updates.2026-05-18T09:17:26ZTim Tsz-Kit LauWeijie Suhttp://arxiv.org/abs/2112.11671v4Partial recovery and weak consistency in the non-uniform hypergraph Stochastic Block Model2026-06-02T17:49:49ZWe consider the community detection problem in sparse random hypergraphs under the non-uniform hypergraph stochastic block model (HSBM), a general model of random networks with community structure and higher-order interactions. When the random hypergraph has bounded expected degrees, we provide a spectral algorithm that outputs a partition with at least a $γ$ fraction of the vertices classified correctly, where $γ\in (0.5,1)$ depends on the signal-to-noise ratio (SNR) of the model. When the SNR grows slowly as the number of vertices goes to infinity, our algorithm achieves weak consistency, which improves the previous results in Ghoshdastidar and Dukkipati (2017) for non-uniform HSBMs.
Our spectral algorithm consists of three major steps: (1) Hyperedge selection: select hyperedges of certain sizes to provide the maximal signal-to-noise ratio for the induced sub-hypergraph; (2) Spectral partition: construct a regularized adjacency matrix and obtain an approximate partition based on singular vectors; (3) Correction and merging: incorporate the hyperedge information from adjacency tensors to upgrade the error rate guarantee. The theoretical analysis of our algorithm relies on the concentration and regularization of the adjacency matrix for sparse non-uniform random hypergraphs, which can be of independent interest.2021-12-22T05:38:33Z54 pagesCombinator. Probab. Comp. 34 (2025) 1-51Ioana DumitriuHai-Xiao WangYizhe Zhu10.1017/S0963548324000166http://arxiv.org/abs/2304.13139v4Optimal and exact recovery on the general nonuniform Hypergraph Stochastic Block Model2026-06-02T17:38:39ZConsider the community detection problem in random hypergraphs under the non-uniform hypergraph stochastic block model (HSBM), where each hyperedge appears independently with some given probability depending only on the labels of its vertices. We establish, for the first time in the literature, a sharp threshold for exact recovery under this non-uniform case, subject to minor constraints; in particular, we consider the model with multiple communities. One crucial point here is that by aggregating information from all the uniform layers, we may obtain exact recovery even in cases when this may appear impossible if each layer were considered alone. Besides that, we prove a wide-ranging, information-theoretic lower bound on the number of misclassified vertices \emph{for any algorithm}, depending on a \emph{generalized Chernoff-Hellinger} divergence involving model parameters. We provide two efficient algorithms which successfully achieve exact recovery when above the threshold, and attain the lowest possible mismatch ratio when the exact recovery is impossible, proved to be optimal. The theoretical analysis of our algorithms relies on the concentration and regularization of the adjacency matrix for non-uniform random hypergraphs, which could be of independent interest. We also address some open problems regarding parameter knowledge and estimation.2023-04-25T20:30:33Z74 pages. The arXiv version contains a more comprehensive literature review, whereas the journal version has been shortened due to the page limitAnn. Statist. 54(1): 48-73 (February 2026)Ioana DumitriuHai-Xiao Wang10.1214/25-AOS2524http://arxiv.org/abs/2501.01324v4Fast data inversion for high-dimensional Ornstein-Uhlenbeck processes from noisy measurements2026-06-02T16:49:03ZIn this work, we develop a scalable approach for a flexible latent factor model for high-dimensional dynamical systems. Each latent factor process has its own correlation and variance parameters, and the orthogonal factor loading matrix can be either fixed or estimated. We utilize an orthogonal factor loading matrix that avoids computing the inversion of the posterior covariance matrix at each time of the Kalman filter, and derive closed-form expressions in an expectation-maximization algorithm for parameter estimation, which substantially reduces the computational complexity without approximation. Our approach has several applications, including noise filtering for high-dimensional time series, estimating nonseparable covariance structure between different time series, and estimating latent physical processes from real-world measurements. Extensive simulated studies illustrate higher accuracy and scalability of our approach compared to alternatives. Furthermore, by applying our method to geodetic measurements to estimate slow slip events from geodetic data in the Cascadia region, our estimated slip better agrees with independently measured seismic data of tremor events. The substantial acceleration from our method enables the use of massive noisy data for geological hazard quantification and other applications.2025-01-02T16:25:57ZYizi LinXubo LiuPaul SegallMengyang Guhttp://arxiv.org/abs/2606.03878v1Privacy-Robust Incrementality Measurement for Advertising Systems under Signal Loss2026-06-02T16:46:38ZAdvertising platforms use randomized lift tests to measure incrementality, but privacy-preserving reporting systems degrade the observed signal through match-rate loss, linkability loss, attribution-window loss, aggregation-threshold suppression, randomized reporting noise, and segment-heterogeneous signal loss. This paper formulates privacy-constrained advertising measurement as a robust causal decision problem under the mentioned signal losses. Given a randomized experiment and an ambiguity set for privacy-induced degradation, the framework projects the observation-compatible fiber of clean/unfiltered experimental worlds onto the incrementality functional and returns certified, rejected, and unresolved decisions. The main result gives a sharp decision frontier. Reports outside the frontier support uniformly valid certification or rejection, whereas reports inside it contain too little information for any method to uniformly distinguish above-threshold incrementality from non-incrementality. Supporting results give finite-sample certification, sample-complexity guarantees, a minimax lower bound showing that signal loss reduces effective information, and a reporting-granularity tradeoff. On 2.0M Criteo Uplift rows and the 64K-row Hillstrom email experiment, clean conversion lift is positive in both datasets, with lifts 0.00112 and 0.00495, respectively. Population certification survives mild degradation in Criteo and severe degradation in Hillstrom, while all considered finite-sample stress settings in both datasets remain unresolved after simultaneous uncertainty and reporting noise are included. Overall, the research contributes a decision-theoretic layer for privacy-aware incrementality measurement whose output is the strongest causal-claim justified by degraded ads signals.2026-06-02T16:46:38ZPrashant ShekharCaroline Howardhttp://arxiv.org/abs/2406.05242v4Markov chain Monte Carlo without evaluating the target: an auxiliary variable approach2026-06-02T16:27:25ZIn sampling tasks, it is common for target distributions to be known up to a normalizing constant. However, in many situations, even evaluating the unnormalized distribution can be costly or infeasible. This issue arises in scenarios such as sampling from the Bayesian posterior for tall datasets and the `doubly-intractable' distributions. In this paper, we begin by observing that seemingly different Markov chain Monte Carlo (MCMC) algorithms, such as the exchange algorithm, PoissonMH, and TunaMH, can be unified under a simple common procedure. We then extend this procedure into a novel framework that allows the use of auxiliary variables in both the proposal and the acceptance--rejection step. Several new MCMC algorithms emerge from this framework that uses estimated gradients to guide the proposal moves. They have demonstrated significantly better performance than existing methods on both synthetic and real datasets. We also develop theory for the new framework and use it to simplify and extend results for existing algorithms. The code to reproduce the experimental results can be found at https://github.com/ywwes26/Auxiliary-MCMC.2024-06-07T20:06:23ZICML 2026 oral. Code: https://github.com/ywwes26/Auxiliary-MCMCWei YuanGuanyang Wanghttp://arxiv.org/abs/2601.22443v2Weak Diffusion Priors Can Still Achieve Strong Inverse-Problem Performance2026-06-02T16:16:48ZCan a diffusion model trained on bedrooms recover human faces? Diffusion models are widely used as priors for inverse problems, but standard approaches usually assume a high-fidelity model trained on data that closely match the unknown signal. In practice, one often must use a mismatched or low-fidelity diffusion prior. Surprisingly, these weak priors often perform nearly as well as full-strength, in-domain baselines. We study when and why inverse solvers are robust to weak diffusion priors. Through extensive experiments, we find that weak priors succeed when measurements are highly informative (e.g., many observed pixels), and we identify regimes where they fail. To explain this behavior, we combine Bayesian-consistency theory with local-correlation analysis: the theory gives conditions under which high-dimensional measurements make the posterior concentrate near the true signal, while the correlation analysis shows that weak and stronger natural-image priors can share similar local spatial structure. These results provide a principled justification on when weak diffusion priors can be used reliably. Code is available at https://github.com/jjia131/weak-diffusion-priors-inverse-problem.2026-01-30T01:25:54Z37 pages, ICML 2026 spotlight. Code: https://github.com/jjia131/weak-diffusion-priors-inverse-problem, Project Page: https://jjia131.github.io/weak-diffusion-priors-inverse-problem/Jing JiaWei YuanSifan LiuLiyue ShenGuanyang Wang