http://arxiv.org/abs/2511.23251v4
Deep Learning for Restoring MPI System Matrices Using Simulated Training Data
2026-03-19T13:52:23Z
Magnetic particle imaging reconstructs tracer distributions using a system matrix obtained through time-consuming, noise-prone calibration measurements. Methods for addressing imperfections in measured system matrices increasingly rely on deep neural networks, yet curated training data remain scarce. This study evaluates whether physics-based simulated system matrices can be used to train deep learning models for different system matrix restoration tasks, i.e., denoising, accelerated calibration, upsampling, and inpainting, that generalize to measured data. A large dataset of system matrices was generated using an equilibrium magnetization model extended with uniaxial anisotropy. The dataset spans particle, scanner, and calibration parameters for 2D and 3D trajectories, and includes background noise injected from empty-frame measurements. For each restoration task, deep learning models were compared with classical non-learning baseline methods. The models trained solely on simulated system matrices generalized to measured data across all tasks: for denoising, DnCNN/RDN/SwinIR outperformed the DCT-F baseline by >10 dB PSNR and up to 0.1 SSIM on simulations and led to perceptually better reconstructions of real data; for 2D upsampling, SMRnet exceeded bicubic interpolation by 20 dB PSNR and 0.08 SSIM at $\times 2$-$\times 4$, although this advantage did not transfer qualitatively to real measurements. For 3D accelerated calibration, SMRnet matched tricubic interpolation in noiseless cases and was more robust under noise, and for 3D inpainting, biharmonic inpainting was superior when noise-free but degraded with noise, while a PConvUNet maintained quality and yielded less blurry reconstructions.
The demonstrated transferability of deep learning models trained on simulations to real measurements mitigates the data-scarcity problem and enables the development of new methods beyond current measurement capabilities.
2025-11-28T15:00:40Z
Artyom Tsanda, Sarah Reiss, Konrad Scheffler, Marija Boberg, Tobias Knopp

http://arxiv.org/abs/2603.18723v1
A Hybrid Physical--Digital Framework for Annotated Fracture Reduction Data Evaluated using Clinically Relevant 3D metrics
2026-03-19T10:17:27Z
A major bottleneck in Computer-Assisted Preoperative Planning (CAPP) for fracture reduction is the limited availability of annotated data. While annotated datasets are now available for evaluating bone fracture segmentation algorithms, there is a notable lack of annotated data for the evaluation of automatic fracture reduction methods. Obtaining precise annotations of the reduced bone, which are essential for training and evaluating automatic CAPP algorithms, therefore remains a critical and underexplored challenge. Existing approaches to assess reduction methods rely either on synthetic fracture simulation, which often lacks realism, or on manual virtual reductions, which are complex, time-consuming, operator-dependent and error-prone. To address these limitations, we propose a hybrid physical-digital framework for generating annotated fracture reduction data. Based on fracture CTs, fragments are first 3D printed, physically reduced, fixed and CT scanned to accurately recover the transformation matrix applied to each fragment. To quantitatively assess reduction quality, we introduce a reproducible formulation of clinically relevant 3D fracture metrics, including 3D gap, 3D step-off, and total gap area. The framework was evaluated on 11 clinical acetabular fracture cases reduced by two independent operators. Compared to preoperative measurements, the proposed approach achieved mean improvements of 168.85 mm² in total gap area, 1.82 mm in 3D gap, and 0.81 mm in 3D step-off.
This hybrid physical--digital framework enables the efficient generation of realistic, clinically relevant annotated fracture reduction data that can be used for the development and evaluation of automatic fracture reduction algorithms.
2026-03-19T10:17:27Z
Basile Longo (LaTIM); Paul-Emmanuel Edeline (LaTIM, IMT Atlantique); Hoel Letissier (LaTIM); Marc-Olivier Gauci (IMT Atlantique, LaTIM); Aziliz Guezou-Philippe (IMT Atlantique, LaTIM); Valérie Burdin (IMT Atlantique, LaTIM); Guillaume Dardenne (LaTIM)

http://arxiv.org/abs/2603.18572v1
UEPS: Robust and Efficient MRI Reconstruction
2026-03-19T07:33:23Z
Deep unrolled models (DUMs) have become the state of the art for accelerated MRI reconstruction, yet their robustness under domain shift remains a critical barrier to clinical adoption. In this work, we identify coil sensitivity map (CSM) estimation as the primary bottleneck limiting generalization. To address this, we propose UEPS, a novel DUM architecture featuring three key innovations: (i) an Unrolled Expanded (UE) design that eliminates CSM dependency by reconstructing each coil independently; (ii) progressive resolution, which leverages k-space-to-image mapping for efficient coarse-to-fine refinement; and (iii) sparse attention tailored to MRI's 1D undersampling nature. These physics-grounded designs enable simultaneous gains in robustness and computational efficiency. We construct a large-scale zero-shot transfer benchmark comprising 10 out-of-distribution test sets spanning diverse clinical shifts -- anatomy, view, contrast, vendor, field strength, and coil configurations. Extensive experiments demonstrate that UEPS consistently and substantially outperforms existing DUM, end-to-end, diffusion, and untrained methods across all OOD tests, achieving state-of-the-art robustness with low-latency inference suitable for real-time deployment.
2026-03-19T07:33:23Z
The document contains the main paper and additional experimental details in the supplementary material.
Open-source code can be found at: https://github.com/HongShangGroup/UEPS
Xiang Zhou, Hong Shang, Zijian Zhan, Tianyu He, Jintao Meng, Dong Liang

http://arxiv.org/abs/2603.18544v1
SCISSR: Scribble-Conditioned Interactive Surgical Segmentation and Refinement
2026-03-19T07:00:18Z
Accurate segmentation of tissues and instruments in surgical scenes is annotation-intensive due to irregular shapes, thin structures, specularities, and frequent occlusions. While SAM models support point, box, and mask prompts, points are often too sparse and boxes too coarse to localize such challenging targets. We present SCISSR, a scribble-promptable framework for interactive surgical scene segmentation. It introduces a lightweight Scribble Encoder that converts freehand scribbles into dense prompt embeddings compatible with the mask decoder, enabling iterative refinement for a target object by drawing corrective strokes on error regions. Because all added modules (the Scribble Encoder, Spatial Gated Fusion, and LoRA adapters) interact with the backbone only through its standard embedding interfaces, the framework is not tied to a single model: we build on SAM 2 in this work, yet the same components transfer to other prompt-driven segmentation architectures such as SAM 3 without structural modification. To preserve pre-trained capabilities, we train only these lightweight additions while keeping the remaining backbone frozen. Experiments on EndoVis 2018 demonstrate strong in-domain performance, while evaluation on the out-of-distribution CholecSeg8k further confirms robustness across surgical domains.
SCISSR achieves 95.41% Dice on EndoVis 2018 with five interaction rounds and 96.30% Dice on CholecSeg8k with three interaction rounds, outperforming iterative point prompting on both benchmarks.
2026-03-19T07:00:18Z
Haonan Ping, Jian Jiang, Cheng Yuan, Qizhen Sun, Lv Wu, Yutong Ban

http://arxiv.org/abs/2511.14070v2
ELiC: Efficient LiDAR Geometry Compression via Cross-Bit-depth Feature Propagation and Bag-of-Encoders
2026-03-19T03:59:33Z
Hierarchical LiDAR geometry compression encodes voxel occupancies from low to high bit-depths, yet prior methods treat each depth independently and re-estimate local context from coordinates at every level, limiting compression efficiency. We present ELiC, a real-time framework that combines cross-bit-depth feature propagation, a Bag-of-Encoders (BoE) selection scheme, and a Morton-order-preserving hierarchy. Cross-bit-depth propagation reuses features extracted at denser, lower depths to support prediction at sparser, higher depths. BoE selects, per depth, the most suitable coding network from a small pool, adapting capacity to observed occupancy statistics without training a separate model for each level. The Morton hierarchy maintains global Z-order across depth transitions, eliminating per-level sorting and reducing latency. Together these components improve entropy modeling and computational efficiency, yielding state-of-the-art compression at real-time throughput on Ford and SemanticKITTI. Code and pretrained models are available at https://github.com/moolgom/ELiCv1.
2025-11-18T02:58:16Z
Junsik Kim, Gun Bang, Soowoong Kim

http://arxiv.org/abs/2603.18305v1
Energy-Aware Frame Rate Selection for Video Coding
2026-03-18T21:42:36Z
The main contributions of this paper are twofold: First, we present an in-depth analysis of the impact of frame rate reductions on the visual quality of the video and on the encoding and decoding energy. Second, we propose a lightweight frame rate selection method for energy- and quality-aware encoding.
Concerning the first contribution, this paper performs extensive encoding and decoding measurements, followed by an investigation of the impact of temporal downsampling on the energy demand of encoding and decoding at different frame rates. Furthermore, we determine the objective visual quality of the downsampled videos. As a result of this investigation, we identify content- and quantization-setting-dependent energy-aware frame rates, i.e., the temporal downsampling factors that lead to Pareto-optimality in terms of energy and quality. We demonstrate that significant energy savings are achieved while maintaining constant visual quality. Subsequently, a subjective experiment is conducted to verify this observation regarding perceptual quality using mean opinion scores. As the second contribution, we propose an energy-aware frame rate selection method that extracts spatio-temporal features from the video sequences. Based on these features, the proposed method employs a feature-based supervised machine learning approach to predict energy-aware frame rates for a given quantization parameter and video sequence, aiming to reduce energy consumption during encoding and decoding. The experimental results demonstrate that the proposed method offers significant energy savings, with average reductions of 17.46% and 17.60% in encoding and decoding energy demand, respectively, alongside 3.38% average bitrate savings at a constant quality.
2026-03-18T21:42:36Z
Geetha Ramasubbu, Andrè Kaup, Christian Herglotz
10.1109/ACCESS.2026.3672053

http://arxiv.org/abs/2411.15060v2
Hallucination Detection in Virtually-Stained Histology: A Latent Space Baseline
2026-03-18T19:09:28Z
Histopathologic analysis of stained tissue remains central to biomedical research and clinical care. Virtual staining (VS) offers a promising alternative, with potential to reduce costs and streamline workflows, yet hallucinations pose serious risks to clinical reliability.
Here, we formalize the problem of hallucination detection in VS and propose a scalable post-hoc method: Neural Hallucination Precursor (NHP), which leverages the generator's latent space to preemptively flag hallucinations. Extensive experiments across diverse VS tasks show NHP is both effective and robust. Critically, we also find that models with fewer hallucinations do not necessarily offer better detectability, exposing a gap in current VS evaluation and underscoring the need for hallucination detection benchmarks.
2024-11-22T16:46:00Z
Ji-Hun Oh, Kianoush Falahkheirkhah, John Cheville, Rohit Bhargava

http://arxiv.org/abs/2603.18119v1
Dual Agreement Consistency Learning with Foundation Models for Semi-Supervised Fetal Heart Ultrasound Segmentation and Diagnosis
2026-03-18T15:50:44Z
Congenital heart disease (CHD) screening from fetal echocardiography requires accurate analysis of multiple standard cardiac views, yet developing reliable artificial intelligence models remains challenging due to limited annotations and variable image quality. In this work, we propose FM-DACL, a semi-supervised Dual Agreement Consistency Learning framework for the FETUS 2026 challenge on fetal heart ultrasound segmentation and diagnosis. The method combines a pretrained ultrasound foundation model (EchoCare) with a convolutional network through heterogeneous co-training and an exponential moving average teacher to better exploit unlabeled data. Experiments on the multi-center challenge dataset show that FM-DACL achieves a Dice score of 59.66 and NSD of 42.82 using heterogeneous backbones, demonstrating the feasibility of the proposed semi-supervised framework. These results suggest that FM-DACL provides a flexible approach for leveraging heterogeneous models in low-annotation fetal cardiac ultrasound analysis.
The code is available at https://github.com/13204942/FM-DACL.
2026-03-18T15:50:44Z
Accepted to the ISBI 2026 Fetal HearT UltraSound Segmentation and Diagnosis (FETUS) Challenge
Fangyijie Wang, Guénolé Silvestre, Kathleen M. Curran

http://arxiv.org/abs/2603.20290v1
Transparent Fragments Contour Estimation via Visual-Tactile Fusion for Autonomous Reassembly
2026-03-18T14:58:17Z
The contour estimation of transparent fragments is very important for autonomous reassembly, especially in precision optical instrument repair, cultural relic restoration, and the handling of breakage incidents involving other valuable devices. Compared with intact transparent objects, the contour estimation of transparent fragments faces greater challenges due to their optical properties and their irregular shapes and edges. To address this issue, a general transparent fragment contour estimation framework based on visual-tactile fusion is proposed in this paper. First, we construct a transparent fragment dataset named TransFrag27K, which includes multi-scene synthetic data of broken fragments from multiple types of transparent objects, together with a scalable synthetic data generation pipeline. Second, we propose a visual grasping position detection network named TransFragNet to identify, locate and segment the sampling grasping position. We then use a two-finger gripper with GelSight Mini sensors to obtain reconstructed tactile information of the lateral edge of the fragments. By fusing this tactile information with visual cues, a visual-tactile fusion material classifier is proposed. Inspired by the way humans estimate a fragment's contour by combining vision and touch, we introduce a general transparent fragment contour estimation framework based on visual-tactile fusion, which demonstrates strong performance in real-world validation.
Finally, a contour matching and reassembly algorithm based on multi-dimensional similarity metrics is proposed, providing a reproducible benchmark for evaluating visual-tactile contour estimation and fragment reassembly. The experimental results demonstrate the validity of the proposed framework. The dataset and codes are available at https://github.com/Keithllin/Transparent-Fragments-Contour-Estimation.
2026-03-18T14:58:17Z
17 pages, 22 figures, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence
Qihao Lin, Borui Chen, Yuping Zhou, Jianing Wu, Yulan Guo, Weishi Zheng, Chongkun Xia

http://arxiv.org/abs/2603.17702v1
Cache-enabled Generative Joint Source-Channel Coding for Evolving Semantic Communications
2026-03-18T13:22:31Z
Learning-based semantic communication (SemCom) has recently emerged as a promising paradigm for improving the transmission efficiency of wireless networks. However, existing methods typically rely on extensive end-to-end training, which is both inflexible and computationally expensive in dynamic wireless environments. Moreover, they fail to exploit redundancy across multiple transmissions of semantically similar content, limiting overall efficiency. To overcome these limitations, we propose a channel-aware generative adversarial network (GAN) inversion-based joint source-channel coding (CAGI-JSCC) framework that enables training-free SemCom by leveraging a pre-trained SemanticStyleGAN model. By explicitly incorporating wireless channel characteristics into the GAN inversion process, CAGI-JSCC adapts to varying channel conditions without additional training. Furthermore, we introduce a cache-enabled dynamic codebook (CDC) that caches disentangled semantic components at both the transmitter and receiver, allowing the system to reuse previously transmitted content. This semantic-level caching can continuously reduce redundant transmissions as experience accumulates.
Extensive experiments on image transmission demonstrate the effectiveness of the proposed framework. In particular, our system achieves comparable perceptual quality with an average bandwidth compression ratio (BCR) of 1/224, and as low as 1/1024 for a single image, significantly outperforming baselines with a BCR of 1/128.
2026-03-18T13:22:31Z
Shunpu Tang, Qianqian Yang, Jihong Park, Zhaoyang Zhang, Kaibin Huang, Deniz Gunduz

http://arxiv.org/abs/2504.00638v3
Impact of Data Duplication on Deep Neural Network-Based Image Classifiers: Robust vs. Standard Models
2026-03-18T11:51:32Z
The accuracy and robustness of machine learning models against adversarial attacks are significantly influenced by factors such as training data quality, model architecture, the training process, and the deployment environment. In recent years, duplicated data in training sets, especially in language models, has attracted considerable attention. It has been shown that deduplication enhances both training performance and model accuracy in language models. While the importance of data quality in training image classifier Deep Neural Networks (DNNs) is widely recognized, the impact of duplicated images in the training set on model generalization and performance has received little attention.
In this paper, we address this gap and provide a comprehensive study on the effect of duplicates in image classification. Our analysis indicates that the presence of duplicated images in the training set not only negatively affects the efficiency of model training but also may result in lower accuracy of the image classifier. This negative impact of duplication on accuracy is particularly evident when duplicated data is non-uniform across classes or when duplication, whether uniform or non-uniform, occurs in the training set of an adversarially trained model. Even when duplicated samples are selected in a uniform way, increasing the amount of duplication does not lead to a significant improvement in accuracy.
2025-04-01T10:48:00Z
Alireza Aghabagherloo, Aydin Abadi, Sumanta Sarkar, Vishnu Asutosh Dasu, Bart Preneel

http://arxiv.org/abs/2511.16955v2
Neighbor GRPO: Contrastive ODE Policy Optimization Aligns Flow Models
2026-03-18T09:58:10Z
Group Relative Policy Optimization (GRPO) has shown promise in aligning image and video generative models with human preferences. However, applying it to modern flow matching models is challenging because of their deterministic sampling paradigm. Current methods address this issue by converting Ordinary Differential Equations (ODEs) to Stochastic Differential Equations (SDEs), which introduce stochasticity. However, this SDE-based GRPO suffers from inefficient credit assignment and incompatibility with high-order solvers for fewer-step sampling. In this paper, we first reinterpret existing SDE-based GRPO methods from a distance optimization perspective, revealing their underlying mechanism as a form of contrastive learning. Based on this insight, we propose Neighbor GRPO, a novel alignment algorithm that completely bypasses the need for SDEs. Neighbor GRPO generates a diverse set of candidate trajectories by perturbing the initial noise conditions of the ODE and optimizes the model using a softmax distance-based surrogate leaping policy.
We establish a theoretical connection between this distance-based objective and policy gradient optimization, rigorously integrating our approach into the GRPO framework. Our method fully preserves the advantages of deterministic ODE sampling, including efficiency and compatibility with high-order solvers. We further introduce symmetric anchor sampling for computational efficiency and group-wise quasi-norm reweighting to address reward flattening. Extensive experiments demonstrate that Neighbor GRPO significantly outperforms SDE-based counterparts in terms of training cost, convergence speed, and generation quality.
2025-11-21T05:02:47Z
CVPR 2026
Dailan He, Guanlin Feng, Xingtong Ge, Yazhe Niu, Yi Zhang, Bingqi Ma, Guanglu Song, Yu Liu, Hongsheng Li

http://arxiv.org/abs/2603.17547v1
Deep Learning-Based Airway Segmentation in Systemic Lupus Erythematosus Patients with Interstitial Lung Disease (SLE-ILD): A Comparative High-Resolution CT Analysis
2026-03-18T09:52:17Z
Purpose: To characterize lobar and segmental airway volume differences between systemic lupus erythematosus (SLE) patients with interstitial lung disease (ILD) and those without ILD (non-ILD) using a deep learning-based approach on non-contrast chest high-resolution CT (HRCT). Methods: A retrospective analysis was conducted on 106 SLE patients (27 SLE-ILD, 79 SLE-non-ILD) who underwent HRCT. A customized deep learning framework based on the U-Net architecture was developed to automatically segment airway structures at the lobar and segmental levels on HRCT. Volumetric measurements of lung lobes and segments derived from the segmentations were statistically compared between the two groups using two-sample t-tests (significance threshold: p < 0.05). Results: At the lobar level, significant airway volume enlargement in SLE-ILD patients was observed in the right upper lobe (p=0.009) and left upper lobe (p=0.039) compared to SLE-non-ILD.
At the segmental level, significant differences were found in segments including R1 (p=0.016), R3 (p<0.001), and L3 (p=0.038), with the most marked changes in the upper lung zones, while lower zones showed non-significant trends. Conclusion: Our study demonstrates that an automated deep learning-based approach can effectively quantify airway volumes on HRCT scans and reveal significant, region-specific airway dilation in patients with SLE-ILD compared to those without ILD. The pattern of involvement, predominantly affecting the upper lobes and specific segments, highlights a distinct topographic phenotype of SLE-ILD and implicates airway structural alterations as a potential biomarker for disease presence. This AI-powered quantitative imaging biomarker holds promise for enhancing the early detection and monitoring of ILD in the SLE population, ultimately contributing to more personalized patient management.
2026-03-18T09:52:17Z
Sirong Piao (Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China)
Ying Ming (Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China)
Ruijie Zhao (Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China)
Jiaru Wang (Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China)
Ran Xiao (Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China)
Rui Zhao (Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China)
Zicheng Liao (Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China)
Qiqi Xu (Research and Development Center)
Shaoze Luo (Research and Development Center)
Bing Li (Research and Development Center)
Lin Li (Research and Development Center)
Zhuangfei Ma (Canon Medical Systems)
Fuling Zheng (Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China)
Wei Song (Department of Radiology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China)

http://arxiv.org/abs/2603.04438v2
CogGen: Cognitive-Load-Informed Fully Unsupervised Deep Generative Modeling for Compressively Sampled MRI Reconstruction
2026-03-18T08:50:42Z
Fully unsupervised deep generative modeling (FU-DGM) is promising for compressively sampled MRI (CS-MRI) when training data or compute are limited. Classical FU-DGMs such as DIP and INR rely on architectural priors, but the ill-conditioned inverse problem often demands many iterations and easily overfits measurement noise. We propose CogGen, a cognitive-load-informed FU-DGM that casts CS-MRI as staged inversion and regulates task-side "cognitive load" by progressively scheduling intrinsic difficulty and extraneous interference. CogGen replaces uniform data fitting with an easy-to-hard k-space weighting/selection strategy: early iterations emphasize low-frequency, high-SNR, structure-dominant samples, while higher-frequency or noise-dominated measurements are introduced later. We realize this schedule through self-paced curriculum learning (SPCL) with complementary criteria: a student mode that reflects what the model can currently learn and a teacher mode that indicates what it should follow, supporting both soft weighting and hard selection.
Experiments and analyses show that CogGen-DIP and CogGen-INR improve reconstruction fidelity and convergence behavior compared with strong unsupervised baselines and competitive supervised pipelines.
2026-02-20T07:20:52Z
Qingyong Zhu, Yumin Tan, Xiang Gu, Dong Liang

http://arxiv.org/abs/2412.01525v5
Towards Clinical Practice in CT-Based Pulmonary Disease Screening: An Efficient and Reliable Framework
2026-03-18T06:50:07Z
Deep learning models for pulmonary disease screening from Computed Tomography (CT) scans promise to alleviate the immense workload on radiologists. Still, their high computational cost, stemming from processing entire 3D volumes, remains a major barrier to widespread clinical adoption. Current sub-sampling techniques often compromise diagnostic integrity by introducing artifacts or discarding critical information. To overcome these limitations, we propose an Efficient and Reliable Framework (ERF) that fundamentally improves the practicality of automated CT analysis. Our framework introduces two core innovations: (1) A Cluster-based Sub-Sampling (CSS) method that efficiently selects a compact yet comprehensive subset of CT slices by optimizing for both representativeness and diversity. By integrating an efficient k-nearest neighbor search with an iterative refinement process, CSS bypasses the computational bottlenecks of previous methods while preserving vital diagnostic features. (2) An Ambiguity-aware Uncertainty Quantification (AUQ) mechanism, which enhances reliability by specifically targeting data ambiguity arising from subtle lesions and artifacts. Unlike standard uncertainty measures, AUQ leverages the predictive discrepancy between auxiliary classifiers to construct a specialized ambiguity score. By maximizing this discrepancy during training, the system effectively flags ambiguous samples where the model lacks confidence due to visual noise or intricate pathologies.
Validated on two public datasets with 2,654 CT volumes across diagnostic tasks for 3 pulmonary diseases, ERF achieves diagnostic performance comparable to the full-volume analysis (over 90% accuracy and recall) while reducing processing time by more than 60%. This work represents a significant step towards deploying fast, accurate, and trustworthy AI-powered screening tools in time-sensitive clinical settings.
2024-12-02T14:18:17Z
Qian Shao, Bang Du, Yixuan Wu, Zepeng Li, Qiyuan Chen, Qianqian Tang, Jian Wu, Jintai Chen, Hongxia Xu
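
Illustrative note on the last entry above (ERF): the abstract says that AUQ builds an ambiguity score from the predictive discrepancy between auxiliary classifiers, but it does not give the formula. The sketch below is only one plausible reading, not the authors' method: it assumes two classifiers that output logits and measures their disagreement as the total variation distance between their softmax distributions. The function names `ambiguity_score` and `flag_ambiguous` and the threshold value are hypothetical.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def ambiguity_score(logits_a, logits_b):
    """Per-sample disagreement between two classifiers (assumed definition).

    Returns the total variation distance between the two predicted class
    distributions: 0 when they agree exactly, 1 when they put all mass on
    different classes.
    """
    p_a = softmax(np.asarray(logits_a, dtype=float))
    p_b = softmax(np.asarray(logits_b, dtype=float))
    return 0.5 * np.abs(p_a - p_b).sum(axis=-1)


def flag_ambiguous(scores, threshold=0.3):
    """Boolean mask of samples whose score exceeds a hypothetical threshold."""
    return np.asarray(scores) > threshold


if __name__ == "__main__":
    # Two samples, three classes: the classifiers agree on the first sample
    # and disagree strongly on the second.
    head_a = np.array([[4.0, 0.0, -2.0], [3.0, 0.0, 0.0]])
    head_b = np.array([[4.0, 0.0, -2.0], [0.0, 3.0, 0.0]])
    scores = ambiguity_score(head_a, head_b)
    print(scores, flag_ambiguous(scores))
```

In a screening setting, samples flagged this way could be routed to a radiologist instead of being reported automatically; the threshold would need to be tuned on validation data.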