https://arxiv.org/api/wj0D3KlFo6ytTK7Hj+xtHhXOx54 2026-03-20T09:05:59Z 30353 15 15 http://arxiv.org/abs/2412.01525v5 Towards Clinical Practice in CT-Based Pulmonary Disease Screening: An Efficient and Reliable Framework 2026-03-18T06:50:07Z Deep learning models for pulmonary disease screening from Computed Tomography (CT) scans promise to alleviate the immense workload on radiologists. Still, their high computational cost, stemming from processing entire 3D volumes, remains a major barrier to widespread clinical adoption. Current sub-sampling techniques often compromise diagnostic integrity by introducing artifacts or discarding critical information. To overcome these limitations, we propose an Efficient and Reliable Framework (ERF) that fundamentally improves the practicality of automated CT analysis. Our framework introduces two core innovations: (1) A Cluster-based Sub-Sampling (CSS) method that efficiently selects a compact yet comprehensive subset of CT slices by optimizing for both representativeness and diversity. By integrating an efficient k-nearest neighbor search with an iterative refinement process, CSS bypasses the computational bottlenecks of previous methods while preserving vital diagnostic features. (2) An Ambiguity-aware Uncertainty Quantification (AUQ) mechanism, which enhances reliability by specifically targeting data ambiguity arising from subtle lesions and artifacts. Unlike standard uncertainty measures, AUQ leverages the predictive discrepancy between auxiliary classifiers to construct a specialized ambiguity score. By maximizing this discrepancy during training, the system effectively flags ambiguous samples where the model lacks confidence due to visual noise or intricate pathologies. Validated on two public datasets with 2,654 CT volumes across diagnostic tasks for 3 pulmonary diseases, ERF achieves diagnostic performance comparable to the full-volume analysis (over 90% accuracy and recall) while reducing processing time by more than 60%. This work represents a significant step towards deploying fast, accurate, and trustworthy AI-powered screening tools in time-sensitive clinical settings. 2024-12-02T14:18:17Z Qian Shao Bang Du Yixuan Wu Zepeng Li Qiyuan Chen Qianqian Tang Jian Wu Jintai Chen Hongxia Xu http://arxiv.org/abs/2603.17415v1 Structured SIR: Efficient and Expressive Importance-Weighted Inference for High-Dimensional Image Registration 2026-03-18T06:46:55Z Image registration is an ill-posed dense vision task, where multiple solutions achieve similar loss values, motivating probabilistic inference. Variational inference has previously been employed to capture these distributions, however restrictive assumptions about the posterior form can lead to poor characterisation, overconfidence and low-quality samples. More flexible posteriors are typically bottlenecked by the complexity of high-dimensional covariance matrices required for dense 3D image registration. In this work, we present a memory and computationally efficient inference method, Structured SIR, that enables expressive, multi-modal, characterisation of uncertainty with high quality samples. We propose the use of a Sampled Importance Resampling (SIR) algorithm with a novel memory-efficient high-dimensional covariance parameterisation as the sum of a low-rank covariance and a sparse, spatially structured Cholesky precision factor. This structure enables capturing complex spatial correlations while remaining computationally tractable. We evaluate the efficacy of this approach in 3D dense image registration of brain MRI data, which is a very high-dimensional problem. We demonstrate that our proposed methods produces uncertainty estimates that are significantly better calibrated than those produced by variational methods, achieving equivalent or better accuracy. Crucially, we show that the model yields highly structured multi-modal posterior distributions, enable effective and efficient uncertainty quantification. 2026-03-18T06:46:55Z Ivor J. A. Simpson Neill D. F. Campbell http://arxiv.org/abs/2508.20476v3 Towards Inclusive Communication: A Unified Framework for Generating Spoken Language from Sign, Lip, and Audio 2026-03-18T06:03:16Z Audio is the primary modality for human communication and has driven the success of Automatic Speech Recognition (ASR) technologies. However, such audio-centric systems inherently exclude individuals who are deaf or hard of hearing. Visual alternatives such as sign language and lip reading offer effective substitutes, and recent advances in Sign Language Translation (SLT) and Visual Speech Recognition (VSR) have improved audio-less communication. Yet, these modalities have largely been studied in isolation, and their integration within a unified framework remains underexplored. In this paper, we propose the first unified framework capable of handling diverse combinations of sign language, lip movements, and audio for spoken-language text generation. We focus on three main objectives: (i) designing a unified, modality-agnostic architecture capable of effectively processing heterogeneous inputs; (ii) exploring the underexamined synergy among modalities, particularly the role of lip movements as non-manual cues in sign language comprehension; and (iii) achieving performance on par with or superior to state-of-the-art models specialized for individual tasks. Building on this framework, we achieve performance on par with or better than task-specific state-of-the-art models across SLT, VSR, ASR, and Audio-Visual Speech Recognition. Furthermore, our analysis reveals a key linguistic insight: explicitly modeling lip movements as a distinct modality significantly improves SLT performance by capturing critical non-manual cues. 2025-08-28T06:51:42Z Jeong Hun Yeo Hyeongseop Rha Sungjune Park Junil Won Yong Man Ro http://arxiv.org/abs/2603.17358v1 A 3D Reconstruction Benchmark for Asset Inspection 2026-03-18T04:42:14Z Asset management requires accurate 3D models to inform the maintenance, repair, and assessment of buildings, maritime vessels, and other key structures as they age. These downstream applications rely on high-fidelity models produced from aerial surveys in close proximity to the asset, enabling operators to locate and characterise deterioration or damage and plan repairs. Captured images typically have high overlap between adjacent camera poses, sufficient detail at millimetre scale, and challenging visual appearances such as reflections and transparency. However, existing 3D reconstruction datasets lack examples of these conditions, making it difficult to benchmark methods for this task. We present a new dataset with ground truth depth maps, camera poses, and mesh models of three synthetic scenes with simulated inspection trajectories and varying levels of surface condition on non-Lambertian scene content. We evaluate state-of-the-art reconstruction methods on this dataset. Our results demonstrate that current approaches struggle significantly with the dense capture trajectories and complex surface conditions inherent to this domain, exposing a critical scalability gap and pointing toward new research directions for deployable 3D reconstruction in asset inspection. Project page: https://roboticimaging.org/Projects/asset-inspection-dataset/ 2026-03-18T04:42:14Z 29 pages, 15 figures, 8 tables James L. Gray Australian Centre for Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, University of Sydney, Sydney, NSW, Australia Nikolai Goncharov Australian Centre for Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, University of Sydney, Sydney, NSW, Australia Alexandre Cardaillac Australian Centre for Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, University of Sydney, Sydney, NSW, Australia Ryan Griffiths Australian Centre for Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, University of Sydney, Sydney, NSW, Australia Jack Naylor Australian Centre for Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, University of Sydney, Sydney, NSW, Australia Donald G. Dansereau Australian Centre for Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, University of Sydney, Sydney, NSW, Australia http://arxiv.org/abs/2603.14832v2 Halfway to 3D: Ensembling 2.5D and 3D Models for Robust COVID-19 CT Diagnosis 2026-03-18T04:16:07Z We propose a deep learning framework for COVID-19 detection and disease classification from chest CT scans that integrates both 2.5D and 3D representations to capture complementary slice-level and volumetric information. The 2.5D branch processes multi-view CT slices (axial, coronal, sagittal) using a DINOv3 vision transformer to extract robust visual features, while the 3D branch employs a ResNet-18 architecture to model volumetric context and is pretrained with Variance Risk Extrapolation (VREx) followed by supervised contrastive learning to improve cross-source robustness. Predictions from both branches are combined through logit-level ensemble inference. Experiments on the PHAROS-AIF-MIH benchmark demonstrate the effectiveness of the proposed approach: for binary COVID-19 detection, the ensemble achieves 94.48% accuracy and a 0.9426 Macro F1-score, outperforming both individual models, while for multi-class disease classification the 2.5D DINOv3 model achieves the best performance with 79.35% accuracy and a 0.7497 Macro F1-score. These results highlight the benefit of combining pretrained slice-based representations with volumetric modeling for robust multi-source medical imaging analysis. Code is available at https://github.com/HySonLab/PHAROS-AIF-MIH 2026-03-16T05:24:10Z Tuan-Anh Yang Bao V. Q. Bui Chanh-Quang Vo-Van Truong-Son Hy http://arxiv.org/abs/2602.13293v2 NutVLM: A Self-Adaptive Defense Framework against Full-Dimension Attacks for Vision Language Models in Autonomous Driving 2026-03-18T02:30:50Z Vision Language Models (VLMs) have advanced perception in autonomous driving (AD), but they remain vulnerable to adversarial threats. These risks range from localized physical patches to imperceptible global perturbations. Existing defense methods for VLMs remain limited and often fail to reconcile robustness with clean-sample performance. To bridge these gaps, we propose NutVLM, a comprehensive self-adaptive defense framework designed to secure the entire perception-decision lifecycle. Specifically, we first employ NutNet++ as a sentinel, which is a unified detection-purification mechanism. It identifies benign samples, local patches, and global perturbations through three-way classification. Subsequently, localized threats are purified via efficient grayscale masking, while global perturbations trigger Expert-guided Adversarial Prompt Tuning (EAPT). Instead of the costly parameter updates of full-model fine-tuning, EAPT generates "corrective driving prompts" via gradient-based latent optimization and discrete projection. These prompts refocus the VLM's attention without requiring exhaustive full-model retraining. Evaluated on the Dolphins benchmark, our NutVLM yields a 4.89% improvement in overall metrics (e.g., Accuracy, Language Score, and GPT Score). These results validate NutVLM as a scalable security solution for intelligent transportation. Our code is available at https://github.com/PXX/NutVLM. 2026-02-09T05:42:59Z 12 pages, 6 figures Xiaoxu Peng Dong Zhou Jianwen Zhang Guanghui Sun Anh Tu Ngo Anupam Chattopadhyay http://arxiv.org/abs/2603.17156v1 A Lensless Polarization Camera 2026-03-17T21:39:35Z Polarization imaging is a technique that creates a pixel map of the polarization state in a scene. Although invisible to the human eye, polarization can assist various sensing and computer vision tasks. Existing polarization cameras use spatial or temporal multiplexing, which increases the camera volume, weight, cost, or all of the above. Recent lensless imaging approaches, such as DiffuserCam, have demonstrated that compact imaging systems can be realized by replacing the lens with a coding element and performing computational reconstruction. In this work, we propose a compact lensless polarization camera composed of a diffuser and a simple striped polarization mask. By combining this optical design with a reconstruction algorithm that explicitly models the polarization-encoded lensless measurements, four linear polarization images are recovered from a single snapshot. Our results demonstrate the potential of lensless approaches for polarization imaging and reveal the physical factors that govern reconstruction quality, guiding the development of high-quality practical systems. 2026-03-17T21:39:35Z Noa Kraicer Shay Elmalem Erez Yosef Hani Barhum Raja Giryes http://arxiv.org/abs/2603.17126v1 Topology-Preserving Deep Joint Source-Channel Coding for Semantic Communication 2026-03-17T20:40:36Z Many wireless vision applications, such as autonomous driving, require preservation of global structural information rather than only per-pixel fidelity. However, existing Deep joint source-channel coding (DeepJSCC) schemes mainly optimize pixel-wise losses and provide no explicit protection of connectivity or topology. This letter proposes TopoJSCC, a topology-aware DeepJSCC framework that integrates persistent-homology regularizers to end-to-end training. Specifically, we enforce topological consistency by penalizing Wasserstein distances between cubical persistence diagrams of original and reconstructed images, and between Vietoris--Rips persistence of latent features before and after the channel to promote a robust latent manifold. TopoJSCC is based on end-to-end learning and requires no side information. Experiments show improved topology preservation and peak signal-to-noise ratio (PSNR) in low signal-to-noise ratio (SNR) and bandwidth-ratio regimes. 2026-03-17T20:40:36Z Submitted to IEEE Journals for possible publication Omar Erak Omar Alhussein Fang Fang Sami Muhaidat http://arxiv.org/abs/2603.18050v1 Quality assessment of brain structural MR images: Comparing generalization of deep learning versus hand-crafted feature-based machine learning methods to new sites 2026-03-17T20:19:09Z Quality assessment of brain structural MR images is critical for large-scale neuroimaging studies, where motion artifacts can significantly bias clinical estimates. While visual rating remains the gold standard, it is time-consuming and subjective. This study evaluates the relative performance and generalization capabilities of two prominent Automated Quality Assessment (AQA) methods: MRIQC, which uses hand-crafted image-quality metrics with traditional machine learning, and CNNQC, which utilizes a deep learning (DL) architecture. Using a heterogeneous dataset of 1,098 T1-weighted volumes from 17 different sites, we assessed performance on both seen sites and entirely new sites using a leave-one-site-out (LOSO) approach. Our results indicate that both DL and traditional ML methods struggle to generalize to new scanners or sites. While MRIQC generally achieved higher accuracy across most unseen sites, CNNQC demonstrated higher sensitivity for detecting poor-quality scans. Given that DL-based methods like CNNQC offer higher computational efficiency and do not require expensive pre-processing, they may be preferred for widespread deployment, provided that future work focuses on improving cross-site generalizability. 2026-03-17T20:19:09Z Prabhjot Kaur John S. Thornton Frederik Barkhof Tarek A. Yousry Sjoerd B. Vos Hui Zhang http://arxiv.org/abs/2603.16788v1 Preserving Vertical Structure in 3D-to-2D Projection for Permafrost Thaw Mapping 2026-03-17T17:01:13Z Forecasting permafrost thaw from aerial lidar requires projecting 3D point cloud features onto 2D prediction grids, yet naive aggregation methods destroy the vertical structure critical in forest environments where ground, understory, and canopy carry distinct information about subsurface conditions. We propose a projection decoder with learned height embeddings that enable height-dependent feature transformations, allowing the network to differentiate ground-level signals from canopy returns. Combined with stratified sampling that ensures all forest strata remain represented, our approach preserves the vertical information critical for predicting subsurface conditions. Our approach pairs this decoder with a Point Transformer V3 encoder to predict dense thaw depth maps from drone-collected lidar over boreal forest in interior Alaska. Experiments demonstrate that z-stratified projection outperforms standard averaging-based methods, particularly in areas with complex vertical vegetation structure. Our method enables scalable, high-resolution monitoring of permafrost degradation from readily deployable UAV platforms. 2026-03-17T17:01:13Z Justin McMillen Robert Van Alphen Taha Sadeghi Chorsi Jason Shabaga Mel Rodgers Rocco Malservisi Timothy Dixon Yasin Yilmaz http://arxiv.org/abs/2603.14644v2 LUMINA: A Multi-Vendor Mammography Benchmark with Energy Harmonization Protocol 2026-03-17T16:50:59Z Publicly available full-field digital mammography (FFDM) datasets remain limited in size, clinical annotations, and vendor diversity, hindering the development of robust models. We introduce LUMINA, a curated, multi-vendor FFDM dataset that explicitly encodes acquisition energy and vendor metadata to capture clinically relevant appearance variations often overlooked in existing benchmarks. This dataset contains 1824 images from 468 patients (960 benign, 864 malignant), with pathology-confirmed labels, BI-RADS assessments, and breast-density annotations. LUMINA spans six acquisition systems and includes both high- and low-energy imaging styles, enabling systematic analysis of vendor- and energy-induced domain shifts. To address these variations, we propose a foreground-only pixel-space alignment method (''energy harmonization'') that maps images to a low-energy reference while preserving lesion morphology. We benchmark CNN and transformer models on three clinically relevant tasks: diagnosis (benign vs. malignant), BI-RADS classification, and density estimation. Two-view models consistently outperform single-view models. EfficientNet-B0 achieves an AUC of 93.54% for diagnosis, while Swin-T achieves the best macro-AUC of 89.43% for density prediction. Harmonization improves performance across architectures and produces more localized Grad-CAM responses. Overall, LUMINA provides (1) a vendor-diverse benchmark and (2) a model-agnostic harmonization framework for reliable and deployable mammography AI. 2026-03-15T22:41:40Z This paper was accepted to CVPR 2026 Hongyi Pan Gorkem Durak Halil Ertugrul Aktas Andrea M. Bejar Baver Tutun Emre Uysal Ezgi Bulbul Mehmet Fatih Dogan Berrin Erok Berna Akkus Yildirim Sukru Mehmet Erturk Ulas Bagci http://arxiv.org/abs/2603.16587v1 HistoAtlas: A Pan-Cancer Morphology Atlas Linking Histomics to Molecular Programs and Clinical Outcomes 2026-03-17T14:36:07Z We present HistoAtlas, a pan-cancer computational atlas that extracts 38 interpretable histomic features from 6,745 diagnostic H&E slides across 21 TCGA cancer types and systematically links every feature to survival, gene expression, somatic mutations, and immune subtypes. All associations are covariate-adjusted, multiple-testing corrected, and classified into evidence-strength tiers. The atlas recovers known biology, from immune infiltration and prognosis to proliferation and kinase signaling, while uncovering compartment-specific immune signals and morphological subtypes with divergent outcomes. Every result is spatially traceable to tissue compartments and individual cells, statistically calibrated, and openly queryable. HistoAtlas enables systematic, large-scale biomarker discovery from routine H&E without specialized staining or sequencing. Data and an interactive web atlas are freely available at https://histoatlas.com . 2026-03-17T14:36:07Z Pierre-Antoine Bannier http://arxiv.org/abs/2603.09531v2 Association of Progressive PPFE and Mortality in Lung Cancer Screening Cohorts 2026-03-17T14:06:33Z Background: Pleuroparenchymal fibroelastosis (PPFE) is an upper lobe predominant fibrotic lung abnormality associated with increased mortality in established interstitial lung disease. However, the clinical significance of radiologic PPFE progression in lung cancer screening (LCS) populations remains unclear. Methods: We analysed longitudinal low-dose CT scans and clinical data from two LCS studies: National Lung Screening Trial (NLST; n=7,980); SUMMIT study (n=8,561). An automated algorithm quantified PPFE volume on baseline and follow-up scans. Annualised change in PPFE was derived and dichotomised using a distribution-based threshold to define progressive PPFE. Associations between progressive PPFE and mortality were evaluated using Cox proportional hazards models adjusted for demographic and clinical variables. In SUMMIT cohort, associations between progressive PPFE and clinical outcomes were assessed using incidence rate ratios (IRR) and odds ratios (OR). Findings: Progressive PPFE independently associated with mortality in both LCS cohorts (NLST: Hazard Ratio (HR)=1.25, 95% Confidence Interval (CI): 1.01--1.56, p=0.042; SUMMIT: HR=3.14, 95% CI: 1.66--5.97, p<0.001). Within SUMMIT, progressive PPFE was strongly associated with higher respiratory admissions (IRR=2.79, p<0.001), increased antibiotic and steroid use (IRR=1.55, p=0.010), and showed a trend towards higher modified medical research council scores (OR=1.40, p=0.055). Interpretation: Radiologic PPFE progression independently associates with mortality across two large LCS cohorts, and associates with adverse clinical outcomes. Quantitative assessment of PPFE progression may provide a clinically relevant imaging biomarker to identify individuals at increased risk of respiratory morbidity within LCS programmes. 2026-03-10T11:37:50Z Shahab Aslani Mehran Azimbagirad Daryl Cheng Daisuke Yamada Ryoko Egashira Adam Szmul Justine Chan-Fook Robert Chapman Alfred Chung Pui So Shanshan Wang John McCabe Tianqi Yang Jose M Brenes Eyjolfur Gudmundsson The SUMMIT Consortium Susan M. Astley Daniel C. Alexander Sam M. Janes Joseph Jacob http://arxiv.org/abs/2603.14610v2 Make it SING: Analyzing Semantic Invariants in Classifiers 2026-03-17T07:33:45Z All classifiers, including state-of-the-art vision models, possess invariants, partially rooted in the geometry of their linear mappings. These invariants, which reside in the null-space of the classifier, induce equivalent sets of inputs that map to identical outputs. The semantic content of these invariants remains vague, as existing approaches struggle to provide human-interpretable information. To address this gap, we present Semantic Interpretation of the Null-space Geometry (SING), a method that constructs equivalent images, with respect to the network, and assigns semantic interpretations to the available variations. We use a mapping from network features to multi-modal vision language models. This allows us to obtain natural language descriptions and visual examples of the induced semantic shifts. SING can be applied to a single image, uncovering local invariants, or to sets of images, allowing a breadth of statistical analysis at the class and model levels. For example, our method reveals that ResNet50 leaks relevant semantic attributes to the null space, whereas DinoViT, a ViT pretrained with self-supervised DINO, is superior in maintaining class semantics across the invariant space. 2026-03-15T21:13:14Z Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026 Harel Yadid Meir Yossef Levi Roy Betser Guy Gilboa http://arxiv.org/abs/2603.15143v2 Clinical Priors Guided Lung Disease Detection in 3D CT Scans 2026-03-17T06:19:11Z Accurate classification of lung diseases from chest CT scans plays an important role in computer-aided diagnosis systems. However, medical imaging datasets often suffer from severe class imbalance, which may significantly degrade the performance of deep learning models, especially for minority disease categories. To address this issue, we propose a gender-aware two-stage lung disease classification framework. The proposed approach explicitly incorporates gender information into the disease recognition pipeline. In the first stage, a gender classifier is trained to predict the patient's gender from CT scans. In the second stage, the input CT image is routed to a corresponding gender-specific disease classifier to perform final disease prediction. This design enables the model to better capture gender-related imaging characteristics and alleviate the influence of imbalanced data distribution. Experimental results demonstrate that the proposed method improves the recognition performance for minority disease categories, particularly squamous cell carcinoma, while maintaining competitive performance on other classes. 2026-03-16T11:38:22Z Kejin Lu Jianfa Bai Qingqiu Li Runtian Yuan Jilan Xu Junlin Hou Yuejie Zhang Rui Feng