https://arxiv.org/api/Sclruoz1z+Y/kgaMTZ5qE2HtteM 2026-06-14T02:48:54Z 272646 585 15 http://arxiv.org/abs/2606.11415v1 Spatially Masked Regression Reveals Local and Distributed Predictability in Electrophysiological Recordings 2026-06-09T20:05:44Z

Neural recordings are often interpreted as local measurements, yet the signal at any one sensor can also reflect structured activity distributed across the broader network. This raises a basic question: to what extent does an electrode's signal reflect local versus distributed information in the underlying system? More specifically, how much of an electrode's activity is carried by its immediate neighborhood, and how much is embedded more broadly across the array? We address this with a Spatially Masked Regression (SMR) framework that reconstructs each electrode's timeseries from the remaining electrodes while excluding a configurable neighborhood around the target. By progressively increasing this mask, spatial locality becomes an experimental control for quantifying how much predictive information survives after nearby channels are withheld. We apply SMR to intracranial EEG with heterogeneous electrode coverage and to scalp EEG with standardized montages over sensorimotor cortex. Using distance correlation between original and reconstructed signals, we find strong within-subject reconstruction in both modalities, substantial residual predictability even when local neighbors are excluded, and markedly stronger cross-subject transfer in EEG than in iEEG. Masking shows that nearby electrodes contribute strongly to reconstruction but do not account for all of it, indicating that individual channels reflect both local redundancy and broader distributed structure. Surrogates that preserve selected marginal or spectral properties while disrupting phase structure or temporal ordering substantially reduce performance, supporting the conclusion that SMR depends on structured temporal and cross-channel organization rather than on marginal statistics alone. These results position SMR as an interpretable framework for quantifying the balance between local and distributed information in recordings.

2026-06-09T20:05:44Z Maryam Ostadsharif Memar Nima Dehghani http://arxiv.org/abs/2602.19502v2 Human-Guided Agentic AI for Multimodal Clinical Prediction: Lessons from the AgentDS Healthcare Benchmark 2026-06-09T20:01:46Z

Agentic AI systems are increasingly capable of autonomous data science workflows, yet clinical prediction tasks demand domain expertise that purely automated approaches struggle to provide. We investigate how human guidance of agentic AI can improve multimodal clinical prediction, presenting our approach to all three AgentDS Healthcare benchmark challenges: 30-day hospital readmission prediction (Macro-F1 = 0.8986), emergency department cost forecasting (MAE = $465.13), and discharge readiness assessment (Macro-F1 = 0.7939). Across these tasks, human analysts directed the agentic workflow at key decision points, multimodal feature engineering from clinical notes, scanned PDF billing receipts, and time-series vital signs; task-appropriate model selection; and clinically informed validation strategies. Our approach ranked 5th overall in the healthcare domain, with a 3rd-place finish on the discharge readiness task. Ablation studies reveal that human-guided decisions compounded to a cumulative gain of +0.065 F1 over automated baselines, with multimodal feature extraction contributing the largest single improvement (+0.041 F1). We distill three generalizable lessons: (1) domain-informed feature engineering at each pipeline stage yields compounding gains that outperform extensive automated search; (2) multimodal data integration requires task-specific human judgment that no single extraction strategy generalizes across clinical text, PDFs, and time-series; and (3) deliberate ensemble diversity with clinically motivated model configurations outperforms random hyperparameter search. These findings offer practical guidance for teams deploying agentic AI in healthcare settings where interpretability, reproducibility, and clinical validity are essential.

2026-02-23T04:37:45Z Presented at the Data Challenge track at the 14th IEEE International Conference on Healthcare Informatics (ICHI) 2026 on June 3, 2026 Lalitha Pranathi Pulavarthy Raajitha Muthyala Aravind V Kuruvikkattil Zhenan Yin Rashmita Kudamala Saptarshi Purkayastha http://arxiv.org/abs/2606.11409v1 Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models 2026-06-09T19:59:12Z

Adversarial robustness evaluations of large language models (LLMs) typically report attack success rate (ASR) under fixed query budgets, implicitly treating all attacks as equally costly. In practice, the computational expense of different attack strategies can vary by orders of magnitude. Consequently, ASR at a fixed budget can obscure the true effort required to jailbreak a model, thereby making it hard to determine whether an attack's cost justifies its payoff to the attacker. We propose a compute-aware evaluation framework based on computational pressure, measured in cumulative floating-point operations (FLOPs), as a proxy for adversarial effort. We introduce risk-compute curves, which map compute budgets to attack risk, and derive two metrics that summarize the average pressure required for a given attack to succeed. Across ten models spanning three families and four different stages in language model training and alignment, evaluated with three attack strategies (gradient-based, iterative refinement, and template-based) on two jailbreak robustness benchmarks, we find: (1) alignment training has non-monotonic effects on compute-space robustness; (2) scaling model size reduces gradient-based attack effectiveness but has limited impact on cheaper template-based attacks; (3) gradient-based attacks optimized on a surrogate model can transfer to a separate target model, providing a way to reduce attacker costs; (4) compute cost varies by up to ${\approx}5{\times}$ across harm categories within a single model; and (5) safety-aligned RL increases aggregate cost while leaving some categories disproportionately accessible. We release our framework to enable compute-aware risk assessment and evaluation.

2026-06-09T19:59:12Z Malikeh Ehghaghi Boglárka Ecsedi Marsha Chechik Colin Raffel http://arxiv.org/abs/2505.03649v4 Weighted Random Dot Product Graphs 2026-06-09T19:54:19Z

Modeling of intricate relational patterns has become a cornerstone of contemporary statistical research and related data science fields. Networks, represented as graphs, offer a natural framework for this analysis. This paper extends the Random Dot Product Graph (RDPG) model to accommodate weighted graphs, markedly broadening the model's scope to scenarios where edges exhibit heterogeneous weight distributions. We propose a nonparametric weighted (W)RDPG model that assigns a sequence of latent positions to each node. Inner products of these nodal vectors specify the moments of their incident edge weights' distribution via moment-generating functions. In this way, and unlike prior art, the WRDPG can discriminate between weight distributions that share the same mean but differ in other higher-order moments. We derive statistical guarantees for an estimator of the nodal's latent positions adapted from the workhorse adjacency spectral embedding, establishing its consistency and asymptotic normality. We also contribute a generative framework that enables sampling of graphs that adhere to a (prescribed or data-fitted) WRDPG, facilitating, e.g., the analysis and testing of observed graph metrics using judicious reference distributions. The paper is organized to formalize the model's definition, the estimation (or nodal embedding) process and its guarantees, as well as the methodologies for generating weighted graphs, all complemented by illustrative and reproducible examples showcasing the WRDPG's effectiveness in various network analytic applications.

2025-05-06T15:57:00Z 30 pages, 12 figures, code to generate Figures 3 to 12 available at https://github.com/bmarenco/wrdpg. Updated to match the published version Electronic Journal of Statistics, 20(1), 2456-2499, 2026 Bernardo Marenco Paola Bermolen Marcelo Fiori Federico Larroca Gonzalo Mateos 10.1214/26-EJS2538 http://arxiv.org/abs/2410.24145v3 Projected random forests and conformal prediction of circular data 2026-06-09T19:53:34Z

We apply conformal prediction techniques to regression problems with circular responses, producing prediction sets with adaptive arc length and finite-sample coverage guarantees for any circular predictive model under the assumption of data exchangeability. Leveraging the high performance of existing predictive models designed for linear responses, we analyze a general projection procedure that converts any linear-response regression model into one suitable for circular responses. When random forests are used as base models in this projection procedure, we leverage the random forest out-of-bag mechanism to eliminate the need for a separate calibration sample in the construction of prediction sets. On synthetic and real datasets, the resulting projected random forest model produces more efficient out-of-bag conformal prediction sets, with shorter median arc length, than the split conformal prediction sets generated by two existing alternative models.

2024-10-31T17:05:52Z 7 pages; 4 figures Paulo C. Marques F. Rinaldo Artes Helton Graziadei http://arxiv.org/abs/2601.14792v2 Robustness of Mixtures of Experts to Feature Noise 2026-06-09T19:43:01Z

Despite their practical success, it remains unclear why Mixture of Experts (MoE) models can outperform dense networks beyond sheer parameter scaling. We study an iso-parameter regime where inputs exhibit latent modular structure but are corrupted by feature noise, a proxy for noisy internal activations. We show that sparse expert activation acts as a noise filter: compared to a dense estimator, MoEs achieve lower generalization error under feature noise, improved robustness to perturbations, and faster convergence speed. Empirical results on synthetic data and real-world language tasks corroborate the theoretical insights, demonstrating consistent robustness and efficiency gains from sparse modular computation.

2026-01-21T09:15:48Z ICML 2026 Dong Sun Rahul Nittala Rebekka Burkholz http://arxiv.org/abs/2601.10774v2 Analytic Bijections for Smooth and Interpretable Normalizing Flows 2026-06-09T19:36:35Z

A key challenge in normalizing flows is finding expressive invertible scalar bijections. Existing approaches face trade-offs: affine transformations are smooth and analytically invertible but lack expressivity; monotonic splines offer local control but are only piecewise smooth and act on bounded domains; residual flows achieve smoothness but need numerical inversion. We introduce three families of analytic bijections that are globally smooth ($C^\infty$), defined on all of $\mathbb{R}$, and analytically invertible in closed form, combining the favorable properties of prior approaches. Beyond serving as drop-in replacements in coupling flows, where they match or exceed spline performance, we develop radial flows: a novel architecture using direct parametrization that transforms the radial coordinate while preserving angular direction. Radial flows exhibit exceptional training stability, produce geometrically interpretable transformations, and on targets with radial structure can achieve comparable quality to coupling flows with $1000\times$ fewer parameters. We provide comprehensive evaluation on 1D and 2D benchmarks, and demonstrate applicability to higher-dimensional physics problems through experiments on $φ^4$ lattice field theory, where our bijections outperform affine baselines and enable problem-specific designs that address mode collapse.

2026-01-15T16:09:02Z Final ICML 2026 version. 9 + 14 pages, 10 + 11 figures, 3 + 2 tables. New CIFAR-10 and tabular-data results; main text shortened for readability Mathis Gerdes Miranda C. N. Cheng http://arxiv.org/abs/2505.13196v3 A Physics-Inspired Optimizer: Velocity Regularized Adam 2026-06-09T19:32:07Z

We introduce Velocity-Regularized Adam (VRAdam), a physics-inspired optimizer for training deep neural networks that draws on ideas from quartic terms for kinetic energy with its stabilizing effects on various system dynamics. Previous algorithms, including the ubiquitous Adam, operate at the so-called adaptive edge of stability regime during training, leading to rapid oscillations and slowed convergence of loss. However, VRAdam adds a higher order penalty on the learning rate based on the velocity such that the algorithm automatically slows down whenever weight updates become large. In practice, we observe that the effective dynamic learning rate shrinks in high-velocity regimes, and damping oscillations. By combining this velocity-based regularizer for global damping with per-parameter scaling of Adam, we create a powerful hybrid optimizer. For this optimizer, we provide rigorous theoretical analysis of operation at the edge of stability from a physical and control perspective for the momentum. Furthermore, we derive convergence bounds with the rate $\mathcal{O}(\ln(N)/\sqrt{N})$ for a stochastic non convex objective under mild assumptions. We demonstrate that VRAdam exceeds the performance against standard optimizers including AdamW. We benchmark various tasks such as image classification, language modeling, and generative modeling using diverse architectures and training methodologies including Convolutional Neural Networks (CNNs), Transformers, and GFlowNets.

2025-05-19T14:51:40Z L. Schorling and P. Vaidhyanathan contributed equally to this work. 20 pages, 10 figures Pranav Vaidhyanathan Lucas Schorling Natalia Ares Maike Osborne http://arxiv.org/abs/2606.11391v1 Recursive Binding on a Budget: Subspace Carving in Order-p Tensor Memories 2026-06-09T19:16:12Z

Tensor Product Representations provide the structural fidelity required for symbolic reasoning in models but suffer from exponential dimensionality growth when encoding deep recursive structures. Conversely, Vector Symbolic Architectures maintain constant dimensionality but sacrifice capacity and fidelity due to noisy compression via superposition. In this work, we propose Orthogonal Subspace Carving (OSC), a memory architecture that binds fillers to roles by projecting onto the null space of the role basis before aggregating into a fixed order-p tensor. OSC uses projections to enforce geometric orthogonality between bound structures within a static memory trace. We show that this mechanism decouples the tensor order from the structural depth, enabling deep recursive binding within a constant memory footprint. By performing retrieval via recognition, this construction allows for component vectors that are orders of magnitude smaller than the memory tensor, giving superior memory efficiency in settings involving high superposition. We also show that TPR is a special case of binding in Clifford algebra, and give a Clifford formulation of OSC.

2026-06-09T19:16:12Z 24 pages, 12 figures, 7 tables 43rd International Conference on Machine Learning 2026 Travis Pence Daisuke Yamada Vikas Singh http://arxiv.org/abs/2606.11390v1 A Scalable PyTorch Abstraction for Multi-GPU Gaussian Splatting 2026-06-09T19:15:39Z

Gaussian splatting methods have become increasingly popular for neural reconstruction of the real world. However, they are often limited in scale and resolution due to compute and memory constraints. We present a multi-GPU Gaussian splatting approach that scales reconstruction to higher resolutions and larger scenes while abstracting away the code complexity typically associated with distributing a model. To accomplish this, we propose a PyTorch backend that distributes the Gaussian parameters and splatting operators across GPUs via CUDA unified memory and NVLink. Because distribution occurs at the operator level, the model code requires no explicit cross-device communication. More broadly, the backend exposes multiple GPUs as an aggregate PyTorch device and supports other PyTorch operators. We demonstrate city-scale reconstructions with street-level detail consisting of over 1 billion Gaussian splats, more than 25 times as many as the current state of the art.

2026-06-09T19:15:39Z 14 pages, 6 tables, 2 figures, and 1 listing. Includes supplementary material Matthew Cong Francis Williams Jonathan Swartz Mark Harris Sanja Fidler Ken Museth http://arxiv.org/abs/2606.11387v1 Small Experiments, Cheaper Decisions: A Case Study in Staged Promotion for Micro-Pretraining 2026-06-09T19:10:54Z

Short pretraining runs can reduce experimental cost, but they can also over-promote configurations that only look strong at tiny budgets. We study an auditable staged-promotion protocol for a fixed micro-pretraining runner on two heterogeneous host blocks: Windows A100 and Linux L40S. Starting from twelve prior-screened configurations, we use staged budgets of 2 minutes, 5 minutes, 10 minutes, 60 minutes, and 12 hours, with frozen promotion rules before expensive continuations. The early screens are intentionally treated as unstable: the 5- and 10-minute rankings are host-sensitive, and the eventual 12-hour top-ranked condition is not the mean-best condition at the replicated 10-minute gate. Because seed ranges differ across stages, these changes are operational promotion evidence, not within-seed curves. A replicated 60-minute gate keeps the Staged Factorial Screening bridge reference in the promoted set, where it ranks first in all four 60-minute host-seed cells. In the final 12-hour confirmation package, the bridge condition ranks first in all four host-seed cells across two seeds; the greedy comparator does not meet the frozen 0.010 val_bpb near-equivalence rule; and the cheaper d8/ar48 (depth-8, aspect-48) sentinel does not meet the frozen 0.020 mean-gap rule. The executed 12-hour branch spends 144 GPU-hours, and the full staged protocol records 169.2 training GPU-hours including screening stages. Continuing all four 60-minute candidates would spend 192 GPU-hours, while continuing all nine replicated 10-minute candidates would spend 432 GPU-hours. The latter numbers are accounting counterfactuals for unrun continuations, not evidence that skipped candidates could not have overtaken the reference. The result is a bounded cost-allocation finding, not a claim of global optimality, capacity-normalized superiority, or superiority over adaptive hyperparameter optimization methods.

2026-06-09T19:10:54Z 14 pages, 5 figures; 12-hour dual-host micro-pretraining promotion study; source package includes curated ancillary artifacts Felipe Chavarro Polania http://arxiv.org/abs/2606.11382v1 GLACIER: A Multimodal Student-Teacher Foundation Model for Molecular Property Prediction 2026-06-09T19:05:58Z

Deep learning models facilitate the discovery of molecules with tailored properties among billions of candidate compounds. However, the computational burden to develop and deploy state-of-the-art models continuously increases, limiting their scalability. Most large-scale models are unimodal in nature and overlook the potential to leverage complementary molecular data modalities. To address these shortcomings, this paper introduces the Graph-Language Alignment for Chemical Inference and Exploration using Representations (GLACIER) model, a student-teacher framework that integrates molecular graphs, SMILES strings, and physicochemical descriptors to learn rich molecular embeddings. Our framework consists of three stages: (1) we pretrain three student encoders on 100,000 drug-like molecules: a message-passing neural network for molecular graphs, a transformer-based encoder for SMILES strings, and a multilayer perceptron for physicochemical descriptors, (2) we fuse these student modalities using a novel Finsler geometry-aware module, and (3) distill complementary knowledge from large teacher models, including MiniMol and MolFormer, into a single lightweight model via contrastive learning. We demonstrate that GLACIER is a robust framework that delivers high predictive performance and computational efficiency in complex molecular property prediction tasks. Our code is publicly available at https://github.com/eemokey/glacier.

2026-06-09T19:05:58Z Emily Nguyen Yongchan Hong Harsh Toshniwal Yan Liu Andreas Luttens http://arxiv.org/abs/2509.20241v2 Energy Use of AI Inference, Efficiency Pathways, and Test-Time Scaling 2026-06-09T19:01:15Z

As AI inference scales to billions of queries, estimates of per-query energy use are increasingly important for capacity planning, efficiency interventions, and policy. Yet many public estimates assume non-production settings, leading to systematic overestimation. We introduce a bottom-up framework estimating inference energy from token throughput, node power, and overhead under large-scale deployment assumptions. For frontier-scale models (>200B parameters) on H100 nodes, we estimate a median energy of 0.31 Wh/query (IQR 0.16-0.60), indicating widely cited estimates are overstated by 4-20x. In test-time scaling scenarios 15x longer than typical queries, the median energy rises 13x to 3.91 Wh (IQR 2.15-7.05). Across models, serving systems, and hardware, we estimate 8-20x line-of-sight energy reductions. At datacenter scale, serving 1 billion queries/day requires 0.7 GWh; if 10% are long queries, demand rises to 1.7 GWh/day. With efficiency interventions, it falls to 0.8 GWh/day, mitigating the energy impact of test-time scaling.

2025-09-24T15:32:01Z A preprint version with DOI is available at Zenodo: https://doi.org/10.5281/zenodo.17188770 Joule (2026) 102430 Felipe Oviedo Fiodar Kazhamiaka Esha Choukse Allen Kim Amy Luers Melanie Nakagawa Ricardo Bianchini Juan M. Lavista Ferres 10.1016/j.joule.2026.102430 http://arxiv.org/abs/2606.11375v1 When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis 2026-06-09T19:00:23Z

Standard linear probing declares a property "encoded" when a classifier on hidden states achieves high accuracy. The protocol works well on a snapshot but breaks across pre-training: probe accuracy saturates within the first few thousand steps, leaving most of training invisible to the instrument. We introduce fragility, a complementary per-layer metric defined as the activation-noise level at which probe accuracy collapses. Fragility is sensitive to both the margin of separability and the redundancy of representation, both of which keep evolving long after accuracy plateaus. Applied to open-checkpoint language models, fragility recovers structure that accuracy alone cannot see. Moralized representations emerge along a lexical $\to$ compositional gradient: lexical moral detection first, compositional moral encoding later. Because probe accuracy on its own tracks how lexically separable a dataset is, we establish the compositional encoding directly, by showing it transfers across construction types that share no contrast tokens. A layer-depth robustness gradient develops monotonically across training while accuracy stays flat. And matched fine-tuning corpora that produce identical probing accuracy leave distinct fragility fingerprints, showing that data curation reshapes probe robustness without changing probe accuracy. In every comparison we test, where probing accuracy returns a flat answer, fragility returns a structured one.

2026-06-09T19:00:23Z 22 pages, 5 figures. Code and datasets at https://github.com/deepsteer/deepsteer Orion Reblitz-Richardson http://arxiv.org/abs/2511.19314v2 PRInTS: Reward Modeling for Long-Horizon Information Seeking 2026-06-09T18:51:02Z

Information-seeking is a core capability for AI agents, requiring them to gather and reason over tool-generated information across long trajectories. However, such multi-step information-seeking tasks remain challenging for agents backed by language models. While process reward models (PRMs) can guide agents by ranking candidate steps at test-time, existing PRMs - designed for short reasoning with binary judgment - cannot capture richer dimensions of information-seeking steps, such as tool interactions and reasoning over tool outputs, nor handle the rapidly growing context in long-horizon tasks. To address these limitations, we introduce PRInTS, a generative PRM trained with dual capabilities: (1) dense scoring based on the PRM's reasoning across multiple dimensions of step quality (e.g., interpretation of tool outputs, tool call informativeness) and (2) trajectory summarization that compresses the growing context while preserving essential information for step evaluation. Extensive evaluations across FRAMES, GAIA (levels 1-3), and WebWalkerQA (easy-hard) benchmarks on multiple models reveal that best-of-n sampling with PRInTS enhances information-seeking in open-source models as well as specialized agents, matching or surpassing frontier models with a much smaller backbone agent and outperforming other strong reward modeling baselines.

2025-11-24T17:09:43Z ACL 2026, 19 pages, code: https://github.com/G-JWLee/PRInTS Jaewoo Lee Archiki Prasad Justin Chih-Yao Chen Zaid Khan Elias Stengel-Eskin Mohit Bansal