https://arxiv.org/api/Ok79O9bZL+kP0U4EcvLfXofwA0c 2026-04-09T17:24:47Z 171213 210 15 http://arxiv.org/abs/2604.06518v1 Adaptive Differential Privacy for Federated Medical Image Segmentation Across Diverse Modalities 2026-04-07T23:18:20Z Large volumes of medical data remain underutilized because centralizing distributed data is often infeasible due to strict privacy regulations and institutional constraints. In addition, models trained in centralized settings frequently fail to generalize across clinical sites because of heterogeneity in imaging protocols and continuously evolving data distributions arising from differences in scanners, acquisition parameters, and patient populations. Federated learning offers a promising solution by enabling collaborative model training without sharing raw data. However, incorporating differential privacy into federated learning, while essential for privacy guarantees, often leads to degraded accuracy, unstable convergence, and reduced generalization. In this work, we propose an adaptive differentially private federated learning (ADP-FL) framework for medical image segmentation that dynamically adjusts privacy mechanisms to better balance the privacy-utility trade-off. The proposed approach stabilizes training, significantly improves Dice scores and segmentation boundary quality, and maintains rigorous privacy guarantees. We evaluated ADP-FL across diverse imaging modalities and segmentation tasks, including skin lesion segmentation in dermoscopic images, kidney tumor segmentation in 3D CT scans, and brain tumor segmentation in multi-parametric MRI. Compared with conventional federated learning and standard differentially private federated learning, ADP-FL consistently achieves higher accuracy, improved boundary delineation, faster convergence, and greater training stability, with performance approaching that of non-private federated learning under the same privacy budgets. These results demonstrate the practical viability of ADP-FL for high-performance, privacy-preserving medical image segmentation in real-world federated settings. 2026-04-07T23:18:20Z 10 pages, 8 figures. Accepted in SPIE Medical Imaging 2026. Recipient of CAD Best Paper Award: 1st Place, and Robert F. Wagner All-Conference Best Paper Award: Finalist Proceedings Volume 13926, SPIE Medical Imaging 2026: Computer-Aided Diagnosis Puja Saha Eranga Ukwatta 10.1117/12.3075111 http://arxiv.org/abs/2604.06515v1 Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees 2026-04-07T23:17:23Z Sparse Mixture-of-Experts (MoE) allows scaling of language and vision models efficiently by activating only a small subset of experts per input. While this reduces computation, the large number of parameters still incurs substantial memory overhead during inference. Post-training quantization has been explored to address this issue. Because uniform quantization suffers from significant accuracy loss at low bit-widths, mixed-precision methods have been recently explored; however, they often require substantial computation for bit-width allocation and overlook the varying sensitivity of model performance to the quantization of different experts. We propose a theoretically grounded expert-wise mixed precision strategy that assigns bit-width to each expert primarily based on their change in routers l2 norm during training. Experts with smaller changes are shown to capture less frequent but critical features, and model performance is more sensitive to the quantization of these experts, thus requiring higher precision. Furthermore, to avoid allocating experts to lower precision that inject high quantization noise, experts with large maximum intra-neuron variance are also allocated higher precision. Experiments on large-scale MoE models, including Switch Transformer and Mixtral, show that our method achieves higher accuracy than existing approaches, while also reducing inference cost and incurring only negligible overhead for bit-width assignment. 2026-04-07T23:17:23Z The Fourteenth International Conference on Learning Representations, 2026 Mohammed Nowaz Rabbani Chowdhury Kaoutar El Maghraoui Hsinyu Tsai Naigang Wang Geoffrey W. Burr Liu Liu Meng Wang http://arxiv.org/abs/2604.06505v1 MedConclusion: A Benchmark for Biomedical Conclusion Generation from Structured Abstracts 2026-04-07T22:34:02Z Large language models (LLMs) are widely explored for reasoning-intensive research tasks, yet resources for testing whether they can infer scientific conclusions from structured biomedical evidence remain limited. We introduce $\textbf{MedConclusion}$, a large-scale dataset of $\textbf{5.7M}$ PubMed structured abstracts for biomedical conclusion generation. Each instance pairs the non-conclusion sections of an abstract with the original author-written conclusion, providing naturally occurring supervision for evidence-to-conclusion reasoning. MedConclusion also includes journal-level metadata such as biomedical category and SJR, enabling subgroup analysis across biomedical domains. As an initial study, we evaluate diverse LLMs under conclusion and summary prompting settings and score outputs with both reference-based metrics and LLM-as-a-judge. We find that conclusion writing is behaviorally distinct from summary writing, strong models remain closely clustered under current automatic metrics, and judge identity can substantially shift absolute scores. MedConclusion provides a reusable data resource for studying scientific evidence-to-conclusion reasoning. Our code and data are available at: https://github.com/Harvard-AI-and-Robotics-Lab/MedConclusion. 2026-04-07T22:34:02Z Weiyue Li Ruizhi Qian Yi Li Yongce Li Yunfan Long Jiahui Cai Yan Luo Mengyu Wang http://arxiv.org/abs/2602.22251v3 Zatom-1: A Multimodal Flow Foundation Model for 3D Molecules and Materials 2026-04-07T22:30:32Z General-purpose 3D chemical modeling encompasses molecules and materials, requiring both generative and predictive capabilities. However, most existing AI approaches are optimized for a single domain (molecules or materials) and a single task (generation or prediction), which limits representation sharing and transfer. We introduce Zatom-1, the first end-to-end, fully open-source foundation model that unifies generative and predictive learning of 3D molecules and materials. Zatom-1 is a Transformer trained with a multimodal flow matching objective that jointly models discrete atom types and continuous 3D geometries. This approach supports scalable pretraining with predictable gains as model capacity increases, while enabling fast and stable sampling. We use joint generative pretraining as a universal initialization for downstream multi-task prediction of properties, energies, and forces. Empirically, Zatom-1 matches or outperforms specialized baselines on both generative and predictive benchmarks, while reducing the generative inference time by more than an order of magnitude. Our experiments demonstrate positive predictive transfer between chemical domains from joint generative pretraining: modeling materials during pretraining improves molecular property prediction accuracy. Open-source code: https://github.com/Zatom-AI/zatom 2026-02-24T20:52:39Z 32 pages, 10 figures, 15 tables. ICLR 2026 FM4Science. Code, data, and model weights are available at https://github.com/Zatom-AI/zatom Alex Morehead Miruna Cretu Antonia Panescu Rishabh Anand Maurice Weiler Tynan Perez Samuel Blau Steven Farrell Wahid Bhimji Anubhav Jain Hrushikesh Sahasrabuddhe Pietro Lio Tommi Jaakkola Rafael Gomez-Bombarelli Rex Ying N. Benjamin Erichson Michael W. Mahoney http://arxiv.org/abs/2604.04384v2 Compressible Softmax-Attended Language under Incompressible Attention 2026-04-07T22:10:07Z Softmax attention defines an interaction through $d_h$ head dimensions, but not all dimensions carry equal weight once real text passes through. We decompose the attention logit field into a learned component and a generated component and measure their spectra separately. For all 5,888 KV heads in five transformer language models (124M--7B parameters, four architecture families), the logit energy field $\tilde{E}$ reaches 90\% of its variance in 2--11 singular components. The learned interaction matrix $W_Q^\mathrm{T} W_K$ needs 38--75 components for the same threshold out of $d_h \in {64, 128}$. The spectral gap is 5--25$\times$ in effective rank. The compressibility of softmax-attended language is a property of the data, not the frame that analyzes it. 2026-04-06T03:18:27Z 6 pages Wonsuk Lee http://arxiv.org/abs/2604.04563v2 Temporal Inversion for Learning Interval Change in Chest X-Rays 2026-04-07T22:05:38Z Recent advances in vision--language pretraining have enabled strong medical foundation models, yet most analyze radiographs in isolation, overlooking the key clinical task of comparing prior and current images to assess interval change. For chest radiographs (CXRs), capturing interval change is essential, as radiologists must evaluate not only the static appearance of findings but also how they evolve over time. We introduce TILA (Temporal Inversion-aware Learning and Alignment), a simple yet effective framework that uses temporal inversion, reversing image pairs, as a supervisory signal to enhance the sensitivity of existing temporal vision-language models to directional change. TILA integrates inversion-aware objectives across pretraining, fine-tuning, and inference, complementing conventional appearance modeling with explicit learning of temporal order. We also propose a unified evaluation protocol to assess order sensitivity and consistency under temporal inversion, and introduce MS-CXR-Tretrieval, a retrieval evaluation set constructed through a general protocol that can be applied to any temporal CXR dataset. Experiments on public datasets and real-world hospital cohorts demonstrate that TILA consistently improves progression classification and temporal embedding alignment when applied to multiple existing architectures. 2026-04-06T09:52:26Z 10 pages, 5 figures Hanbin Ko Kyungmin Jeon Doowoong Choi Chang Min Park http://arxiv.org/abs/2604.06495v1 Improving Robustness In Sparse Autoencoders via Masked Regularization 2026-04-07T21:56:23Z Sparse autoencoders (SAEs) are widely used in mechanistic interpretability to project LLM activations onto sparse latent spaces. However, sparsity alone is an imperfect proxy for interpretability, and current training objectives often result in brittle latent representations. SAEs are known to be prone to feature absorption, where general features are subsumed by more specific ones due to co-occurrence, degrading interpretability despite high reconstruction fidelity. Recent negative results on Out-of-Distribution (OOD) performance further underscore broader robustness related failures tied to under-specified training objectives. We address this by proposing a masking-based regularization that randomly replaces tokens during training to disrupt co-occurrence patterns. This improves robustness across SAE architectures and sparsity levels reducing absorption, enhancing probing performance, and narrowing the OOD gap. Our results point toward a practical path for more reliable interpretability tools. 2026-04-07T21:56:23Z 4 pages, 1 figure Vivek Narayanaswamy Kowshik Thopalli Bhavya Kailkhura Wesam Sakla http://arxiv.org/abs/2601.08950v3 ConvoLearn: A Dataset for Fine-Tuning Dialogic AI Tutors 2026-04-07T21:52:13Z Despite their growing adoption in education, LLMs remain misaligned with the core principle of effective tutoring: the dialogic construction of knowledge. We introduce CONVOLEARN1, a dataset of 2,134 semi-synthetic tutor-student dialogues operationalizing six dimensions of dialogic tutoring grounded in knowledge-building theory, situated in a middle school Earth Science curriculum. We show that dimension-labeled dialogic training data captures meaningful pedagogical signal that generalizes beyond its semi-synthetic domain: scores from a classifier trained on CONVOLEARN correlate significantly with expert-coded instructional quality in authentic classrooms across multiple subscales. As a proof of concept, we fine-tune MISTRAL-7B on CONVOLEARN and show that dimension-level fine-tuning can steer a 7B open-weight model toward dialogic tutoring behavior that credentialed teachers rate as competitive with a strong proprietary baseline. With this work, we support the development of AI tutors capable of more dialogic interactions. 2026-01-13T19:40:28Z Mayank Sharma Roy Pea Hari Subramonyam http://arxiv.org/abs/2604.06491v1 Discrete Flow Matching Policy Optimization 2026-04-07T21:49:29Z We introduce Discrete flow Matching policy Optimization (DoMinO), a unified framework for Reinforcement Learning (RL) fine-tuning Discrete Flow Matching (DFM) models under a broad class of policy gradient methods. Our key idea is to view the DFM sampling procedure as a multi-step Markov Decision Process. This perspective provides a simple and transparent reformulation of fine-tuning reward maximization as a robust RL objective. Consequently, it not only preserves the original DFM samplers but also avoids biased auxiliary estimators and likelihood surrogates used by many prior RL fine-tuning methods. To prevent policy collapse, we also introduce new total-variation regularizers to keep the fine-tuned distribution close to the pretrained one. Theoretically, we establish an upper bound on the discretization error of DoMinO and tractable upper bounds for the regularizers. Experimentally, we evaluate DoMinO on regulatory DNA sequence design. DoMinO achieves stronger predicted enhancer activity and better sequence naturalness than the previous best reward-driven baselines. The regularization further improves alignment with the natural sequence distribution while preserving strong functional performance. These results establish DoMinO as an useful framework for controllable discrete sequence generation. 2026-04-07T21:49:29Z Maojiang Su Po-Chung Hsieh Weimin Wu Mingcheng Lu Jiunhau Chen Jerry Yao-Chieh Hu Han Liu http://arxiv.org/abs/2604.06485v1 Inference-Time Code Selection via Symbolic Equivalence Partitioning 2026-04-07T21:37:59Z "Best-of-N" selection is a popular inference-time scaling method for code generation using Large Language Models (LLMs). However, to reliably identify correct solutions, existing methods often depend on expensive or stochastic external verifiers. In this paper, we propose Symbolic Equivalence Partitioning, a selection framework that uses symbolic execution to group candidate programs by semantic behavior and select a representative from the dominant functional partition. To improve grouping and selection, we encode domain-specific constraints as Satisfiability Modulo Theories (SMT) assumptions during symbolic execution to reduce path explosion and prevent invalid input searches outside the problem domain. At N=10, our method improves average accuracy over Pass@1 from 0.728 to 0.803 on HumanEval+ and from 0.516 to 0.604 on LiveCodeBench, without requiring any additional LLM inference beyond the initial N candidate generations. 2026-04-07T21:37:59Z David Cho Yifan Wang Fanping Sui Ananth Grama http://arxiv.org/abs/2604.06483v1 Distributed Interpretability and Control for Large Language Models 2026-04-07T21:35:08Z Large language models that require multiple GPU cards to host are usually the most capable models. It is necessary to understand and steer these models, but the current technologies do not support the interpretability and steering of these models in the multi-GPU setting as well as the single-GPU setting. We present a practical implementation of activation-level interpretability (logit lens) and steering (steering vector) that scales up to multi-GPU language models. Our system implements design choices that reduce the activation memory by up to 7x and increase the throughput by up to 41x compared to a baseline on identical hardware. We demonstrate the method across LLaMA-3.1 (8B, 70B) and Qwen-3 (4B, 14B, 32B), sustaining 20-100 tokens/s while collecting full layer-wise activation trajectories for sequences of 1,500 tokens. Using label-position steering vectors injected post-LayerNorm, we show controllable, monotonic shifts in model outputs with a mean steerability slope of 0.702 across evaluated datasets, without fine-tuning or additional forward passes. We release detailed benchmarks, ablations, and a reproducible instrumentation recipe to enable practical interpretability and real-time behavioral control for frontier LLMs at https://github.com/Devdesai1901/LogitLense. 2026-04-07T21:35:08Z Dev Arpan Desai Shaoyi Huang Zining Zhu http://arxiv.org/abs/2604.06481v1 Hybrid ResNet-1D-BiGRU with Multi-Head Attention for Cyberattack Detection in Industrial IoT Environments 2026-04-07T21:32:03Z This study introduces a hybrid deep learning model for intrusion detection in Industrial IoT (IIoT) systems, combining ResNet-1D, BiGRU, and Multi-Head Attention (MHA) for effective spatial-temporal feature extraction and attention-based feature weighting. To address class imbalance, SMOTE was applied during training on the EdgeHoTset dataset. The model achieved 98.71% accuracy, a loss of 0.0417%, and low inference latency (0.0001 sec /instance), demonstrating strong real-time capability. To assess generalizability, the model was also tested on the CICIoV2024 dataset, where it reached 99.99% accuracy and F1-score, with a loss of 0.0028, 0 % FPR, and 0.00014 sec/instance inference time. Across all metrics and datasets, the proposed model outperformed existing methods, confirming its robustness and effectiveness for real-time IoT intrusion detection. 2026-04-07T21:32:03Z 2025 International Conference on Intelligent Computer Systems, Data Science and Applications (IC2SDA) Afrah Gueriani Hamza Kheddar Ahmed Cherif Mazari 10.1109/IC2SDA68097.2025.11331431 http://arxiv.org/abs/2604.06465v1 Multi-objective Evolutionary Merging Enables Efficient Reasoning Models 2026-04-07T21:07:55Z Reasoning models have demonstrated remarkable capabilities in solving complex problems by leveraging long chains of thought. However, this more deliberate reasoning comes with substantial computational overhead at inference time. The Long-to-Short (L2S) reasoning problem seeks to maintain high accuracy using fewer tokens, but current training-free model merging approaches rely on scalarized, fixed-hyperparameter arithmetic methods that are highly brittle and force suboptimal compromises. To address this gap, we introduce Evo-L2S, a novel framework that formulates L2S reasoning as a multi-objective optimization challenge. By leveraging evolutionary model merging, Evo-L2S explicitly optimizes the trade-off between accuracy and output length to produce a robust Pareto front of merged models. To make this search computationally tractable for large language models, we propose an entropy-based subset sampling technique that drastically reduces the overhead of fitness estimation. Comprehensive experiments across 1.5B, 7B, and 14B parameter scales on six mathematical reasoning benchmarks demonstrate that Evo-L2S can reduce the length of generated reasoning traces by over 50% while preserving, or even improving, the problem-solving accuracy of the original reasoning models. 2026-04-07T21:07:55Z Mario Iacobelli Adrian Robert Minut Tommaso Mencattini Donato Crisostomi Andrea Santilli Iacopo Masi Emanuele RodolĂ  http://arxiv.org/abs/2510.23642v2 VisCoder2: Building Multi-Language Visualization Coding Agents 2026-04-07T20:59:49Z Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows due to limited language coverage, unreliable execution, and lack of iterative correction mechanisms. Progress has been constrained by narrow datasets and benchmarks that emphasize single-round generation and single-language tasks. To address these challenges, we introduce three complementary resources for advancing visualization coding agents. VisCode-Multi-679K is a large-scale, supervised dataset containing 679K validated and executable visualization samples with multi-turn correction dialogues across 12 programming languages. VisPlotBench is a benchmark for systematic evaluation, featuring executable tasks, rendered outputs, and protocols for both initial generation and multi-round self-debug. Finally, we present VisCoder2, a family of multi-language visualization models trained on VisCode-Multi-679K. Experiments show that VisCoder2 significantly outperforms strong open-source baselines and approaches the performance of proprietary models like GPT-4.1, with further gains from iterative self-debug, reaching 82.4% overall execution pass rate at the 32B scale, particularly in symbolic or compiler-dependent languages. 2025-10-24T18:03:57Z Yuansheng Ni Songcheng Cai Xiangchao Chen Jiarong Liang Zhiheng Lyu Jiaqi Deng Kai Zou Ping Nie Fei Yuan Xiang Yue Wenhu Chen http://arxiv.org/abs/2604.06448v1 From Load Tests to Live Streams: Graph Embedding-Based Anomaly Detection in Microservice Architectures 2026-04-07T20:43:07Z Prime Video regularly conducts load tests to simulate the viewer traffic spikes seen during live events such as Thursday Night Football as well as video-on-demand (VOD) events such as Rings of Power. While these stress tests validate system capacity, they can sometimes miss service behaviors unique to real event traffic. We present a graph-based anomaly detection system that identifies under-represented services using unsupervised node-level graph embeddings. Built on a GCN-GAE, our approach learns structural representations from directed, weighted service graphs at minute-level resolution and flags anomalies based on cosine similarity between load test and event embeddings. The system identifies incident-related services that are documented and demonstrates early detection capability. We also introduce a preliminary synthetic anomaly injection framework for controlled evaluation that show promising precision (96%) and low false positive rate (0.08%), though recall (58%) remains limited under conservative propagation assumptions. This framework demonstrates practical utility within Prime Video while also surfacing methodological lessons and directions, providing a foundation for broader application across microservice ecosystems. 2026-04-07T20:43:07Z Accepted at FSE 2026 - Industrial Track Srinidhi Madabhushi Pranesh Vyas Swathi Vaidyanathan Mayur Kurup Elliott Nash Yegor Silyutin