https://arxiv.org/api/GkSqvTWi5jgnY8AfTz1KWShIyeg2026-06-13T23:05:55Z7835416515http://arxiv.org/abs/2502.01226v4Adaptive Prior Selection in Gaussian Process Bandits with Thompson Sampling2026-06-08T08:31:27ZGaussian process (GP) bandits provide a powerful framework for performing blackbox optimization of unknown functions. The characteristics of the unknown function depend heavily on the assumed GP prior. Most work in the literature assume that this prior is known but in practice this seldom holds. Instead, practitioners often rely on maximum likelihood estimation to select the hyperparameters of the prior - which lacks theoretical guarantees. In this work, we study two algorithms for joint prior selection and regret minimization in GP bandits based on GP Thompson sampling (GP-TS): Prior-Elimination GP-TS (PE-GP-TS) that disqualifies priors with poor predictive performance, and HyperPrior GP-TS (HP-GP-TS) that utilizes a bi-level Thompson sampling scheme. We theoretically analyze the algorithms and establish a sublinear regret bound for HP-GP-TS. In addition, we demonstrate the effectiveness of these algorithms compared to the alternatives through extensive experiments with synthetic and real-world data.2025-02-03T10:29:35Z30 pages, 12 figuresJack SandbergMorteza Haghir Chehreghanihttp://arxiv.org/abs/2606.09191v1Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards2026-06-08T08:26:37ZWe prove that $ρ\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any continuous risk functional $ρ$ (CVaR, mean-variance, Sharpe ratio, distortion risk measures, and more) on the class of distributions with bounded density and sub-Gaussian tails, including Gaussian arms. Both this result and its bounded-support counterpart require only continuity of $ρ$: strictly weaker than the dominance condition of prior parametric Thompson Sampling results, and strictly weaker than the Lipschitz condition of UCB-type algorithms, yielding the first instance-optimal guarantees for non-Lipschitz functionals such as the Sharpe ratio without parametric reward assumptions. The bounded-support case is developed first as a stepping stone sharing the same proof structure. The key technical contributions are a discretisation lemma (bounded support) and a truncated discretisation lemma (sub-Gaussian tails), each projecting the growing-alphabet Dirichlet posterior onto a fixed grid via the Dirichlet aggregation property, holding all polynomial prefactors at fixed degree independent of sample size and breaking the super-exponential barrier that blocked prior proofs.2026-06-08T08:26:37Z10 pages, 4 figuresJoel Q. L. Changhttp://arxiv.org/abs/2401.14591v11Ricci flow regularization in latent spaces for the forward learning of partial differential equations2026-06-08T07:14:30ZWe present a manifold-based machine learning encoder-decoder method for learning dynamics in time, notably partial differential equations (PDEs), in which the manifold latent space evolves according to Ricci flow. This can be accomplished by parameterizing the latent manifold stage and subsequently simulating Ricci flow in a physics-informed setting, matching manifold quantities so that Ricci flow is empirically achieved. We emphasize dynamics that admit low-dimensional representations. With our method, the manifold, induced by the metric, is discerned through the training procedure, while the latent evolution due to Ricci flow provides an accommodating representation. By use of this flow, we sustain a canonical manifold latent representation for all values in the ambient PDE time interval continuum. We showcase that the Ricci flow facilitates qualities such as learning for out-of-distribution data and adversarial robustness on select PDE data. Moreover, we provide a thorough expansion of our methods in regard to special cases which allow higher-dimensional representations, such as Ricci flow on the hypersphere and neural discovery of non-parametric geometric flows with entropic strategies.2024-01-26T01:36:48ZFixed a small error in appendix; some improvements to experimentsAndrew Gracykhttp://arxiv.org/abs/2606.01619v2ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL2026-06-08T06:27:26ZAgentic reinforcement learning (RL) enables LLM agents to improve continuously from environment rewards, yet the resulting policies do not systematically accumulate reusable strategies that generalize across tasks. Modular skills can provide such reusable strategies, yet existing skill-augmented RL methods decouple skill creation from policy optimization, risking adopting skills that conflict with the evolving policy. Inspired by Anthropic's Skill Creator, we introduce ReSkill, an RL-in-the-loop skill creation framework that reconciles skill evolution with policy learning. ReSkill exploits the group-wise structure of GRPO to naturally embed three mechanisms with only marginal additional overhead: (1) an assertion-driven skill creator that diagnoses failures from past experience and proposes conditional, trigger-based skill revisions; (2) within-group rollout sampling that enables controlled comparison of skill versions, capturing which version best supports the policy's ongoing learning; and (3) Thompson Sampling with adaptive discounting to balance exploration and exploitation in skill version selection as the policy evolves. Across several domains, ReSkill consistently outperforms existing memory and skill-based RL methods, with the largest gains on unseen tasks. Analysis of the skill lifecycle shows skills being automatically created, tested, refined, and pruned as the policy improves, demonstrating reconciled skill-policy co-evolution.2026-06-01T03:12:05ZZelin HeHaotian LinBoran HanWei ZhuHaoyang FangBernie WangXuan ZhuRunze LiMatthew Reimherrhttp://arxiv.org/abs/2603.10823v2ReTabSyn: Realistic Tabular Data Synthesis via Reinforcement Learning2026-06-08T06:00:31ZDeep generative models can help with data scarcity and privacy by producing synthetic training data, but they struggle in low-data, imbalanced tabular settings to fully learn the complex data distribution. We argue that striving for the full joint distribution could be overkill; for greater data efficiency, models should prioritize learning the conditional distribution $P(y\mid \bm{X})$, as suggested by recent theoretical analysis. Therefore, we overcome this limitation with \textbf{ReTabSyn}, a \textbf{Re}inforced \textbf{Tab}ular \textbf{Syn}thesis pipeline that provides direct feedback on feature correlation preservation during synthesizer training. This objective encourages the generator to prioritize the most useful predictive signals when training data is limited, thereby strengthening downstream model utility. We empirically fine-tune a language model-based generator using this approach, and across benchmarks with small sample sizes, class imbalance, and distribution shift, ReTabSyn consistently outperforms state-of-the-art baselines. Moreover, our approach can be readily extended to control various aspects of synthetic tabular data, such as applying expert-specified constraints on generated observations.2026-03-11T14:31:07ZXiaofeng LinSeungbae KimZhuoya LiZachary DeSotoCharles FlemingGuang Chenghttp://arxiv.org/abs/2606.09052v1INFUSER: Influence-Guided Self-Evolution Improves Reasoning2026-06-08T05:40:36ZSelf-evolution offers a scalable path to stronger reasoning: a pretrained language model improves itself with only minimal external supervision. Yet existing methods either depend on extensively curated or teacher-generated training data, or, when the generator runs unsupervised, reward it by a difficulty heuristic that need not improve the solver. We introduce INFUSER, an iterative co-training framework with two co-evolving roles: a Generator that drafts questions and reference golden answers from a pool of unstructured, automatically collected documents, and a Solver that improves by training on them. The solver is trained with standard correctness rewards against the generator-provided answers, while the generator is rewarded by an optimizer-aware influence score that measures whether each proposed question would actually improve the solver on the target distribution. Because this continuous, noisy influence score is poorly served by standard GRPO, we propose DuGRPO, a dual-normalized variant of GRPO, for generator training. Together, these turn the document pool into an adaptive curriculum that favors questions useful to the current solver, not just hard ones. On Qwen3-8B-Base, INFUSER outperforms strong self-evolution baselines with over 20% relative improvement on Olympiad and SuperGPQA benchmarks, and an 8B INFUSER co-evolving generator outperforms a frozen 32B thinking generator on math and coding. Ablations confirm each design choice is necessary, and two extensions, applying INFUSER to an instruction-finetuned anchor and augmenting it with rule-verifiable RLVR data, further demonstrate the flexibility and generalizability of the framework. Code is available at https://github.com/FFishy-git/INFUSER.2026-06-08T05:40:36Z66 pages, 17 figuresSiyu ChenMiao LuBeining WuHeejune SheenFengzhuo ZhangShuangning LiZhiyuan LiJose BlanchetTianhao WangZhuoran Yanghttp://arxiv.org/abs/2606.09049v1Data augmented bootstrap: Unifying confidence interval construction by approximate invariance2026-06-08T05:39:02ZWe propose the data augmented bootstrap (DAB), a framework for constructing confidence intervals from approximately invariant transformations of the data. As special cases, DAB recovers popular methods that rely on exact group symmetries, such as conformal prediction, wild bootstrap for Maximum Mean Discrepancy U-statistics and the recently proposed SymmPI. Meanwhile, DAB also recovers the classical bootstrap method, which exploits the dataset's approximate invariance under uniform sampling of data indices as the dataset size grows. For all DAB methods, we establish theoretical coverage results that interpolate between finite-sample and asymptotic guarantees according to the strength of the invariance, and without assuming a group structure. The approximate invariance is measured in the Kolmogorov distance and, for statistics that satisfy Gaussian universality, reduces to conditional mean and variance matching. This allows us to incorporate data augmentation (DA), a widely used machine learning heuristic based on approximate invariances, into known statistical methods. We empirically test the performance of incorporating DA into bootstrap, wild bootstrap and conformal prediction for simulated settings as well as for image, language and scientific data.2026-06-08T05:39:02ZKevin Han Huanghttp://arxiv.org/abs/2402.08922v3The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes2026-06-08T04:51:16ZLarge-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques involve computing gradients for every training point or repeated training on different subsets. These approaches face obvious computational challenges when scaled up to large datasets and models.
In this paper, we introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem: assessing how the predictions for training samples would be altered if the model were trained on specific test samples. Through both empirical and theoretical validations, we demonstrate the wide applicability of our hypothesis. Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point. This approach can capitalize on the common asymmetry in scenarios where the number of test samples under concurrent examination is much smaller than the scale of the training dataset, thus gaining a significant improvement in efficiency compared to existing approaches.
We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code will be made available at https://github.com/ruoxi-jia-group/Forward-INF.2024-02-14T03:43:05ZThe IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024Myeongseob KoFeiyang KangWeiyan ShiMing JinZhou YuRuoxi Jiahttp://arxiv.org/abs/2510.09783v2Large Language Models for Imbalanced Classification: Diversity makes the difference2026-06-08T04:36:18ZOversampling is one of the most widely used approaches for addressing imbalanced classification. The core idea is to generate additional minority samples to rebalance the dataset. Most existing methods, such as SMOTE, require converting categorical variables into numerical vectors, which often leads to information loss. Recently, large language model (LLM)-based methods have been introduced to overcome this limitation. However, current LLM-based approaches typically generate minority samples with limited diversity, reducing robustness and generalizability in downstream classification tasks. To address this gap, we propose a novel LLM-based oversampling method designed to enhance diversity. First, we introduce a sampling strategy that conditions synthetic sample generation on both minority labels and features. Second, we develop a new permutation strategy for fine-tuning pre-trained LLMs. Third, we fine-tune the LLM not only on minority samples but also on interpolated samples to further enrich variability. Extensive experiments on 10 tabular datasets demonstrate that our method significantly outperforms eight SOTA baselines. The generated synthetic samples are both realistic and diverse. Moreover, we provide theoretical analysis through an entropy-based perspective, proving that our method encourages diversity in the generated samples.2025-10-10T18:45:29ZDang NguyenSunil GuptaKien DoThin NguyenTaylor BraundAlexis WhittonSvetha Venkateshhttp://arxiv.org/abs/2606.09012v1Understanding Quantization-Aware Training: Gradients at Quantized Weights Bias to the Low-Loss Basin2026-06-08T04:21:39ZPost-training quantization (PTQ) converts a trained full-precision model into low-bit weights without task-level retraining, while quantization-aware training (QAT) incorporates quantization into the training loop. Although PTQ is efficient and often accurate at moderate bitwidths, it can fail sharply at aggressive bitwidths; QAT is more expensive but can often recover the lost accuracy. We propose a unified geometric framework that explains both PTQ failure and QAT recovery. We model full-precision training as following a low-loss \emph{river} inside a wider \emph{valley}: a normal neighborhood of the river forms a nearly flat \emph{basin}, while leaving this basin incurs a sharp loss increase. When the quantization grid is comparable to the basin width, local PTQ objectives, including rounding and Hessian-based second-order reconstruction, can select a high-loss deployed quantized point outside the basin even when nearby low-loss quantized points exist. In this regime, straight-through-estimator-based QAT has a useful bias: it evaluates gradients at the deployed quantized weights while updating latent full-precision weights, causing the gradient to sense the valley wall and acquire an inward component that steers subsequent quantized iterates back into the basin. We formalize this mechanism through a local landscape model, construct a geometric PTQ failure mode, and prove finite-time QAT recovery under local quantizer-compatibility assumptions. Experiments across vision and language models under multiple neural-network quantization schemes corroborate the predicted basin-crossing failure of PTQ and the corresponding recovery mechanism of QAT.2026-06-08T04:21:39Z31 pages, 10 figuresHanyang LiJianhao MaYing Cuihttp://arxiv.org/abs/2606.09002v1Multi-Armed Bandits with Arriving Arms: Sequential Screening, Dynamic Regret, and Sublinear Guarantees2026-06-08T03:58:38ZWe study a stochastic multi-armed bandit problem in which the set of available arms expands over time. This setting arises in sequential experimentation when new actions or treatments become available during an ongoing study, making regret against a single best arm in hindsight inappropriate. We instead evaluate performance relative to the best arm currently available, leading to a dynamic-regret criterion for arriving-arm environments. To address the resulting challenges of arrival information discrepancy (AID) and a drifting benchmark (DB), we propose UCB for Arriving Arms (UCB-AA), an elimination-based procedure with an aiding preliminary screening step for newly arrived arms before full competition with incumbent arms. We show that UCB-AA attains regret bounds that depend explicitly on the arrival process, achieves sublinear dynamic regret under regularity conditions on gap evolution, and admits an online extension for unknown horizons. Simulation results show that UCB-AA reduces wasted pulls and maintains a smaller active arm set while preserving competitive regret performance.2026-06-08T03:58:38Z24 pages, 4 figuresDeqi ZhengXiaoyang XuYuhong Yanghttp://arxiv.org/abs/2506.04480v2On the Wasserstein Geodesic Principal Component Analysis of probability measures2026-06-08T03:34:40ZThis paper focuses on Geodesic Principal Component Analysis (GPCA) on a collection of probability distributions using the Otto-Wasserstein geometry. The goal is to identify geodesic curves in the space of probability measures that best capture the modes of variation of the underlying dataset. We first address the case of a collection of Gaussian distributions, and show how to lift the computations in the space of invertible linear maps. For the more general setting of absolutely continuous probability measures, we leverage a novel approach to parameterizing geodesics in Wasserstein space with neural networks. Finally, we compare to classical tangent PCA through various examples and provide illustrations on real-world datasets.2025-06-04T22:00:43ZNina VesseronElsa CazellesAlice Le BrigantThierry Kleinhttp://arxiv.org/abs/2606.00419v3Parameter-Free and Group Conditional Online Conformal Prediction2026-06-08T03:33:21ZUncertainty quantification (UQ) is critical for the deployment of machine learning predictors in real-world scenarios where the data distribution may shift over time (i.e., data may not be exchangeable). Online conformal prediction (OCP) methods address this issue at the expense of either (i) group-wise error control or (ii) learning-rate independent implementation. Group-conditional coverage is essential for fairness across different collections of data points and for providing finer UQ guarantees. Parameter-free optimization is crucial for robustness to adversarial and unknown data shifts. We propose a parameter-free algorithm for group-conditional OCP and demonstrate that it achieves the best group-conditional coverage guarantees. We evaluate our algorithm on synthetic and real-world data, demonstrating that our method not only improves the reliability of existing parameter-free OCP methods but also provides prediction intervals that are comparable in size to well-tuned group-conditional approaches. By unifying group-conditional coverage with parameter-free online algorithms, our work lays a foundation for fair and robust uncertainty quantification in shifting environments.2026-05-29T23:15:04ZBeepul BhartiAmbar PalJacopo TeneggiJeremias Sulamhttp://arxiv.org/abs/2606.08941v1Estimate Collapsibility of Causal Effects in Completed Partial DAGs via Strong d-Convex Hulls2026-06-08T02:36:08ZThis paper proposes a collapsible method for estimating causal effects that maintains the estimator's consistency before and after marginalization over some variables in completed partially directed acyclic graphs (CPDAGs). We first introduce the estimate collapsibility for CPDAGs and characterize the minimal collapsible sets as strong d-convex hulls. An efficient algorithm is devised to obtain such sets in DAGs and is generalized to CPDAGs. Then, we combine the graph reduction procedure with the IDA framework. Finally, experiments and empirical analysis show the effectiveness of the collapsibility for causal estimations in CPDAGs. Code is available at https://github.com/Jamyang-D/strongly-convex.2026-06-08T02:36:08ZYuxin DengYi SunZhiming LiHuaxiong Liuhttp://arxiv.org/abs/2606.08934v1Backward Coherence and Hidden-State Stability in Recurrent Neural Networks: A Quasi-Reverse-Martingale Theory2026-06-08T02:20:29ZRecurrent neural networks maintain a hidden state $h_t$, but its probabilistic meaning is often unclear. We study hidden-state stability through \emph{backward coherence}: the extent to which $h_t$ can be reconstructed from $h_{t+1}$ by a learned backward projector $g_φ$. Under contraction and summable backward drift, the hidden-state sequence forms a quasi-reverse-martingale. This yields almost-sure convergence, rates under mixing, an interpretable limiting representation, finite pathwise stopping times, and a theoretical framework for time-uniform confidence sequences.
Simulations support the theory. Backward-coherence regularisation reduces the empirical quasi-martingale total $\hat Q$ by $43$--$58%$, reaches stability $28$--$44%$ earlier than an unregularised RNN, and gives tracking-error recovery consistent with geometric bounds. Additional tests confirm echo-state forgetting rates bounded by $ρ$ and verify the increment-sum tube $R_t$ with $100%$ simultaneous coverage, although $R_t$ is conservative; in practice, the defect-tail proxy $\hat Q_t$ is the more useful monitor. The backward-coherence loss is also equivalent to minimising a Kullback--Leibler divergence in a Gaussian backward model, linking the method to variational inference. Extensions cover $φ$-mixing inputs, change-point tracking, and finite-sample concentration.
Three real-data studies further validate the approach. On PhysioNet 2012 ICU data, the Reverse Martingale RNN (RMRNN) matches RNN mortality-prediction AUC while reaching stable representations 13 hours earlier. On FRED-MD, it reduces one-month-ahead forecast error by about fourfold under concept drift. On UCI Human Activity Recognition, it maintains lower post-transition tracking error with geometric decay. The guarantees apply under the stated assumptions; universality is not claimed.2026-06-08T02:20:29ZYuan-chin Ivan Chang