https://arxiv.org/api/J79c3ZP3nNOFLp60Xgz6K6hZnF02026-03-28T08:56:07Z2591246015http://arxiv.org/abs/2503.08371v2Density Ratio-based Proxy Causal Learning Without Density Ratios2026-03-26T14:06:29ZWe address the setting of Proxy Causal Learning (PCL), which has the goal of estimating causal effects from observed data in the presence of hidden confounding. Proxy methods accomplish this task using two proxy variables related to the latent confounder: a treatment proxy (related to the treatment) and an outcome proxy (related to the outcome). Two approaches have been proposed to perform causal effect estimation given proxy variables; however only one of these has found mainstream acceptance, since the other was understood to require density ratio estimation - a challenging task in high dimensions. In the present work, we propose a practical and effective implementation of the second approach, which bypasses explicit density ratio estimation and is suitable for continuous and high-dimensional treatments. We employ kernel ridge regression to derive estimators, resulting in simple closed-form solutions for dose-response and conditional dose-response curves, along with consistency guarantees. Our methods empirically demonstrate superior or comparable performance to existing frameworks on synthetic and real-world datasets.2025-03-11T12:27:54ZAISTATS 2025 accepted, 81 pagesBariscan BozkurtBen DeanerDimitri MeunierLiyuan XuArthur Grettonhttp://arxiv.org/abs/2401.12546v3On Building Myopic MPC Policies using Supervised Learning2026-03-26T14:03:11ZThe application of supervised learning techniques in combination with model predictive control (MPC) has recently generated significant interest, particularly in the area of approximate explicit MPC, where function approximators like deep neural networks are used to learn the MPC policy via optimal state-action pairs generated offline. While the aim of approximate explicit MPC is to closely replicate the MPC policy, substituting online optimization with a trained neural network, the performance guarantees that come with solving the online optimization problem are typically lost. This paper considers an alternative strategy, where supervised learning is used to learn the optimal value function offline instead of learning the optimal policy. This can then be used as the cost-to-go function in a myopic MPC with a very short prediction horizon, such that the online computation burden reduces significantly without affecting the controller performance. This approach differs from existing work on value function approximations in the sense that it learns the cost-to-go function by using offline-collected state-value pairs, rather than closed-loop performance data. The cost of generating the state-value pairs used for training is addressed using a sensitivity-based data augmentation scheme.2024-01-23T08:08:09ZUpdated version available as arXiv:2508.05804Christopher A. OrricoBokan YangDinesh Krishnamoorthyhttp://arxiv.org/abs/2511.01464v2Split-Flows: Measure Transport and Information Loss Across Molecular Resolutions2026-03-26T13:56:20ZBy reducing resolution, coarse-grained models greatly accelerate molecular simulations, unlocking access to long-timescale phenomena, though at the expense of microscopic information. Recovering this fine-grained detail is essential for tasks that depend on atomistic accuracy, making backmapping a central challenge in molecular modeling. We introduce split-flows, a novel flow-based approach that reinterprets backmapping as a continuous-time measure transport across resolutions. Unlike existing generative strategies, split-flows establish a direct probabilistic link between resolutions, enabling expressive conditional sampling of atomistic structures and -- for the first time -- a tractable route to computing mapping entropies, an information-theoretic measure of the irreducible detail lost in coarse-graining. We demonstrate these capabilities on diverse molecular systems, including chignolin, a lipid bilayer, and alanine dipeptide, highlighting split-flows as a principled framework for accurate backmapping and systematic evaluation of coarse-grained models.2025-11-03T11:23:13ZSander HummerichTristan BereauUllrich Köthehttp://arxiv.org/abs/2603.25440v1The Symmetric Perceptron: a Teacher-Student Scenario2026-03-26T13:37:22ZWe introduce and solve a teacher-student formulation of the symmetric binary Perceptron, turning a traditionally storage-oriented model into a planted inference problem with a guaranteed solution at any sample density. We adapt the formulation of the symmetric Perceptron which traditionally considers either the u-shaped potential or the rectangular one, by including labels in both regions. With this formulation, we analyze both the Bayes-optimal regime at for noise-less examples and the effect of thermal noise under two different potential/classification rules. Using annealed and quenched free-entropy calculations in the high-dimensional limit, we map the phase diagram in the three control parameters, namely the sample density $α$, the distance between the origin and one of the symmetric hyperplanes $κ$ and temperature $T$, and identify a robust scenario where learning is organized by a second-order instability that creates teacher-correlated suboptimal states, followed by a first-order transition to full alignment. We show how this structure depends on the choice of potential, the interplay between metastability of the suboptimal solution and its melting towards the planted configuration, which is relevant for Monte Carlo-based optimization algorithms.2026-03-26T13:37:22Z19 pages, 6 figuresGiovanni CataniaAurélien DecelleSuhanee Korpehttp://arxiv.org/abs/2601.21747v2Temporal Sepsis Modeling: a Fully Interpretable Relational Way2026-03-26T13:31:54ZSepsis remains one of the most complex and heterogeneous syndromes in intensive care, characterized by diverse physiological trajectories and variable responses to treatment. While deep learning models perform well in the early prediction of sepsis, they often lack interpretability and ignore latent patient sub-phenotypes. In this work, we propose a machine learning framework by opening up a new avenue for addressing this issue: a relational approach. Temporal data from electronic medical records (EMRs) are viewed as multivariate patient logs and represented in a relational data schema. Then, a propositionalisation technique (based on classic aggregation/selection functions from the field of relational data) is applied to construct interpretable features to "flatten" the data. Finally, the flattened data is classified using a selective naive Bayesian classifier. Experimental validation demonstrates the relevance of the suggested approach as well as its extreme interpretability. The interpretation is fourfold: univariate, global, local, and counterfactual.2026-01-29T14:02:26ZVincent LemaireNédra MeloulliPierre Jaquethttp://arxiv.org/abs/2505.21545v3Corruption-Aware Training of Latent Video Diffusion Models for Robust Text-to-Video Generation2026-03-26T13:23:26ZLatent Video Diffusion Models (LVDMs) have achieved state-of-the-art generative quality for image and video generation; however, they remain brittle under noisy conditioning, where small perturbations in text or multimodal embeddings can cascade over timesteps and cause semantic drift. Existing corruption strategies from image diffusion (Gaussian, Uniform) fail in video settings because static noise disrupts temporal fidelity. In this paper, we propose CAT-LVDM, a corruption-aware training framework with structured, data-aligned noise injection tailored for video diffusion. Our two operators, Batch-Centered Noise Injection (BCNI) and Spectrum-Aware Contextual Noise (SACN), align perturbations with batch semantics or spectral dynamics to preserve coherence. CAT-LVDM yields substantial gains: BCNI reduces FVD by 31.9 percent on WebVid-2M, MSR-VTT, and MSVD, while SACN improves UCF-101 by 12.3 percent, outperforming Gaussian, Uniform, and even large diffusion baselines like DEMO (2.3B) and Lavie (3B) despite training on 5x less data. Ablations confirm the unique value of low-rank, data-aligned noise, and theory establishes why these operators tighten robustness and generalization bounds. CAT-LVDM thus sets a new framework for robust video diffusion, and our experiments show that it can also be extended to autoregressive generation and multimodal video understanding LLMs. Code, models, and samples are available at https://github.com/chikap421/catlvdm2025-05-24T20:11:14ZICLR 2026 ReALM-GENChika MaduabuchiHao ChenYujin HanJindong Wanghttp://arxiv.org/abs/2511.22344v2Cleaning the Pool: Progressive Filtering of Unlabeled Pools in Deep Active Learning2026-03-26T13:17:52ZExisting active learning (AL) strategies capture fundamentally different notions of data value, e.g., uncertainty or representativeness. Consequently, the effectiveness of strategies can vary substantially across datasets, models, and even AL cycles. Committing to a single strategy risks suboptimal performance, as no single strategy dominates throughout the entire AL process. We introduce REFINE, an ensemble AL method that combines multiple strategies without knowing in advance which will perform best. In each AL cycle, REFINE operates in two stages: (1) Progressive filtering iteratively refines the unlabeled pool by considering an ensemble of AL strategies, retaining promising candidates capturing different notions of value. (2) Coverage-based selection then chooses a final batch from this refined pool, ensuring all previously identified notions of value are accounted for. Extensive experiments across 6 classification datasets and 3 foundation models show that REFINE consistently outperforms individual strategies and existing ensemble methods. Notably, progressive filtering serves as a powerful preprocessing step that improves the performance of any individual AL strategy applied to the refined pool, which we demonstrate on an audio spectrogram classification use case. Finally, the ensemble of REFINE can be easily extended with upcoming state-of-the-art AL strategies.2025-11-27T11:28:18ZAccepted at CVPR 2026Denis HuseljicMarek HerdeLukas RauchPaul HahnBernhard Sickhttp://arxiv.org/abs/2603.25414v1Decidable By Construction: Design-Time Verification for Trustworthy AI2026-03-26T13:09:36ZA prevailing assumption in machine learning is that model correctness must be enforced after the fact. We observe that the properties determining whether an AI model is numerically stable, computationally correct, or consistent with a physical domain do not necessarily demand post hoc enforcement. They can be verified at design time, before training begins, at marginal computational cost, with particular relevance to models deployed in high-leverage decision support and scientifically constrained settings. These properties share a specific algebraic structure: they are expressible as constraints over finitely generated abelian groups $\mathbb{Z}^n$, where inference is decidable in polynomial time and the principal type is unique. A framework built on this observation composes three prior results (arXiv:2603.16437, arXiv:2603.17627, arXiv:2603.18104): a dimensional type system carrying arbitrary annotations as persistent codata through model elaboration; a program hypergraph that infers Clifford algebra grade and derives geometric product sparsity from type signatures alone; and an adaptive domain model architecture preserving both invariants through training via forward-mode coeffect analysis and exact posit accumulation. We believe this composition yields a novel information-theoretic result: Hindley-Milner unification over abelian groups computes the maximum a posteriori hypothesis under a computable restriction of Solomonoff's universal prior, placing the framework's type inference on the same formal ground as universal induction. We compare four contemporary approaches to AI reliability and show that each imposes overhead that can compound across deployments, layers, and inference requests. This framework eliminates that overhead by construction.2026-03-26T13:09:36Z18 pages, 1 figureHouston Hayneshttp://arxiv.org/abs/2603.21877v2P^2O: Joint Policy and Prompt Optimization2026-03-26T12:57:04ZReinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs). However, vanilla RLVR suffers from inefficient exploration, particularly when confronting "hard samples" that yield nearzero success rates. In such scenarios, the reliance on sparse outcome rewards typically results in zero-advantage estimates, effectively starving the model of supervision signals despite the high informational value of these instances. To address this, we propose P^2O, a novel framework that synergizes Prompt Optimization with Policy Optimization. P^2O identifies hard samples during training iterations and leverages the GeneticPareto (GEPA) prompt optimization algorithm to evolve prompt templates that guide the model toward discovering successful trajectories. Crucially, unlike traditional prompt engineering methods that rely on input augmentation, P^2O distills the reasoning gains induced by these optimized prompts directly into the model parameters. This mechanism provides denser positive supervision signals for hard samples and accelerates convergence. Extensive experiments demonstrate that P^2O not only achieves superior performance on in-distribution datasets but also exhibits strong generalization, yielding substantial improvements on out-of-distribution benchmarks (+4.7% avg.).2026-03-23T12:08:47ZXinyu LuKaiqi ZhangJinglin YangBoxi CaoYaojie LuHongyu LinMin HeXianpei HanLe Sunhttp://arxiv.org/abs/2601.18420v2Gradient Regularized Natural Gradients2026-03-26T12:55:21ZGradient regularization (GR) has been shown to improve the generalizability of trained models. While Natural Gradient Descent has been shown to accelerate optimization in the initial phase of training, little attention has been paid to how the training dynamics of second-order optimizers can benefit from GR. In this work, we propose Gradient-Regularized Natural Gradients (GRNG), a family of scalable second-order optimizers that integrate explicit gradient regularization with natural gradient updates. Our framework introduces two frequentist algorithms: Regularized Explicit Natural Gradient (RENG), which utilizes double backpropagation to explicitly minimize the gradient norm, and Regularized Implicit Natural Gradient (RING), which incorporates regularization implicitly into the update direction. We also propose a Bayesian variant based on a Regularized-Kalman formulation that eliminates the need for FIM inversion entirely. We establish convergence guarantees for GRNG, showing that gradient regularization improves stability and enables convergence to global minima. Empirically, we demonstrate that GRNG consistently enhances both optimization speed and generalization compared to first-order methods (SGD, AdamW) and second-order baselines (K-FAC, Sophia), with strong results on vision and language benchmarks.2026-01-26T12:25:04ZSatya Prakash DashHossein AbdiWei PanSamuel KaskiMingfei Sunhttp://arxiv.org/abs/2603.25403v1Shape and Substance: Dual-Layer Side-Channel Attacks on Local Vision-Language Models2026-03-26T12:53:49ZOn-device Vision-Language Models (VLMs) promise data privacy via local execution. However, we show that the architectural shift toward Dynamic High-Resolution preprocessing (e.g., AnyRes) introduces an inherent algorithmic side-channel. Unlike static models, dynamic preprocessing decomposes images into a variable number of patches based on their aspect ratio, creating workload-dependent inputs. We demonstrate a dual-layer attack framework against local VLMs. In Tier 1, an unprivileged attacker can exploit significant execution-time variations using standard unprivileged OS metrics to reliably fingerprint the input's geometry. In Tier 2, by profiling Last-Level Cache (LLC) contention, the attacker can resolve semantic ambiguity within identical geometries, distinguishing between visually dense (e.g., medical X-rays) and sparse (e.g., text documents) content. By evaluating state-of-the-art models such as LLaVA-NeXT and Qwen2-VL, we show that combining these signals enables reliable inference of privacy-sensitive contexts. Finally, we analyze the security engineering trade-offs of mitigating this vulnerability, reveal substantial performance overhead with constant-work padding, and propose practical design recommendations for secure Edge AI deployments.2026-03-26T12:53:49Z13 pages, 8 figuresEyal HadadMordechai Gurihttp://arxiv.org/abs/2504.21419v3Kernel Density Machines2026-03-26T12:52:34ZWe introduce kernel density machines (KDM), an agnostic kernel-based framework for learning the Radon-Nikodym derivative (density) between probability measures under minimal assumptions. KDM applies to general measurable spaces and avoids the structural requirements common in classical nonparametric density estimators. We construct a sample estimator and prove its consistency and a functional central limit theorem. To enable scalability, we develop Nystrom-type low-rank approximations and derive optimal error rates, filling a gap in the literature where such guarantees for density learning have been missing. We demonstrate the versatility of KDM through applications to kernel-based two-sample testing and conditional distribution estimation, the latter enjoying dimension-free guarantees beyond those of locally smoothed methods. Experiments on simulated and real data show that KDM is accurate, scalable, and competitive across a range of tasks.2025-04-30T08:25:25ZAndrea Della VecchiaDamir FilipovicPaul Schneiderhttp://arxiv.org/abs/2603.25397v1A Causal Framework for Evaluating ICU Discharge Strategies2026-03-26T12:42:36ZIn this applied paper, we address the difficult open problem of when to discharge patients from the Intensive Care Unit. This can be conceived as an optimal stopping scenario with three added challenges: 1) the evaluation of a stopping strategy from observational data is itself a complex causal inference problem, 2) the composite objective is to minimize the length of intervention and maximize the outcome, but the two cannot be collapsed to a single dimension, and 3) the recording of variables stops when the intervention is discontinued. Our contributions are two-fold. First, we generalize the implementation of the g-formula Python package, providing a framework to evaluate stopping strategies for problems with the aforementioned structure, including positivity and coverage checks. Second, with a fully open-source pipeline, we apply this approach to MIMIC-IV, a public ICU dataset, demonstrating the potential for strategies that improve upon current care.2026-03-26T12:42:36Z8 pages, 2 figures, 2 tablesSagar Nagaraj SimhaJuliette OrtholandDave DongelmansJessica D. WorkumOlivier W. M. ThijssensAmeen Abu-HannaGiovanni Cinàhttp://arxiv.org/abs/2603.25385v1GlowQ: Group-Shared LOw-Rank Approximation for Quantized LLMs2026-03-26T12:36:44ZQuantization techniques such as BitsAndBytes, AWQ, and GPTQ are widely used as a standard method in deploying large language models but often degrades accuracy when using low-bit representations, e.g., 4 bits. Low-rank correction methods (e.g., LQER, QERA, ASER) has been proposed to mitigate this issue, however, they restore all layers and insert error-correction modules into every decoder block, which increases latency and memory overhead. To address this limitation, we propose GlowQ, a group-shared low-rank approximation for quantized LLMs that caches a single shared right factor per input-sharing group and restores only the groups or layers that yield the highest accuracy benefit. GlowQ computes the high-precision projection once per input-sharing group and reuses it across its modules, reducing parameter and memory overhead, and retaining the expressivity of layer-specific corrections. We also propose a selective variant, GlowQ-S, that applies the cached shared module only where it provides the largest benefit. Compared with strong baselines, our approach reduces TTFB by (5.6%) and increases throughput by (9.6%) on average, while reducing perplexity on WikiText-2 by (0.17%) and increasing downstream accuracy by 0.42 percentage points. The selective model GlowQ-S further reduces latency, cutting TTFB by (23.4%) and increasing throughput by (37.4%), while maintaining accuracy within 0.2 percentage points on average.2026-03-26T12:36:44ZSelim AnIl hong SuhYeseong Kimhttp://arxiv.org/abs/2510.04900v2Benchmarking M-LTSF: Frequency and Noise-Based Evaluation of Multivariate Long Time Series Forecasting Models2026-03-26T12:36:25ZUnderstanding the robustness of deep learning models for multivariate long-term time series forecasting (M-LTSF) remains challenging, as evaluations typically rely on real-world datasets with unknown noise properties. We propose a simulation-based evaluation framework that generates parameterizable synthetic datasets, where each dataset instance corresponds to a different configuration of signal components, noise types, signal-to-noise ratios, and frequency characteristics. These configurable components aim to model real-world multivariate time series data without the ambiguity of unknown noise. This framework enables fine-grained, systematic evaluation of M-LTSF models under controlled and diverse scenarios. We benchmark four representative architectures S-Mamba (state-space), iTransformer (transformer-based), R-Linear (linear), and Autoformer (decomposition-based). Our analysis reveals that all models degrade severely when lookback windows cannot capture complete periods of seasonal patters in the data. S-Mamba and Autoformer perform best on sawtooth patterns, while R-Linear and iTransformer favor sinusoidal signals. White and Brownian noise universally degrade performance with lower signal-to-noise ratio while S-Mamba shows specific trend-noise and iTransformer shows seasonal-noise vulnerability. Further spectral analysis shows that S-Mamba and iTransformer achieve superior frequency reconstruction. This controlled approach, based on our synthetic and principle-driven testbed, offers deeper insights into model-specific strengths and limitations through the aggregation of MSE scores and provides concrete guidance for model selection based on signal characteristics and noise conditions.2025-10-06T15:16:52ZNumber of pages: 13 Number of figures: 16 Number of Tables: 1Nick JanssenMelanie SchallerBodo Rosenhahn