Structured SIR: Efficient and Expressive Importance-Weighted Inference for High-Dimensional Image Registration

2026-03-18T06:46:55Z

Image registration is an ill-posed dense vision task, where multiple solutions achieve similar loss values, motivating probabilistic inference. Variational inference has previously been employed to capture these distributions, however restrictive assumptions about the posterior form can lead to poor characterisation, overconfidence and low-quality samples. More flexible posteriors are typically bottlenecked by the complexity of high-dimensional covariance matrices required for dense 3D image registration. In this work, we present a memory and computationally efficient inference method, Structured SIR, that enables expressive, multi-modal, characterisation of uncertainty with high quality samples. We propose the use of a Sampled Importance Resampling (SIR) algorithm with a novel memory-efficient high-dimensional covariance parameterisation as the sum of a low-rank covariance and a sparse, spatially structured Cholesky precision factor. This structure enables capturing complex spatial correlations while remaining computationally tractable. We evaluate the efficacy of this approach in 3D dense image registration of brain MRI data, which is a very high-dimensional problem. We demonstrate that our proposed methods produces uncertainty estimates that are significantly better calibrated than those produced by variational methods, achieving equivalent or better accuracy. Crucially, we show that the model yields highly structured multi-modal posterior distributions, enable effective and efficient uncertainty quantification.

A 3D Reconstruction Benchmark for Asset Inspection

2026-03-18T04:42:14Z

Asset management requires accurate 3D models to inform the maintenance, repair, and assessment of buildings, maritime vessels, and other key structures as they age. These downstream applications rely on high-fidelity models produced from aerial surveys in close proximity to the asset, enabling operators to locate and characterise deterioration or damage and plan repairs. Captured images typically have high overlap between adjacent camera poses, sufficient detail at millimetre scale, and challenging visual appearances such as reflections and transparency. However, existing 3D reconstruction datasets lack examples of these conditions, making it difficult to benchmark methods for this task. We present a new dataset with ground truth depth maps, camera poses, and mesh models of three synthetic scenes with simulated inspection trajectories and varying levels of surface condition on non-Lambertian scene content. We evaluate state-of-the-art reconstruction methods on this dataset. Our results demonstrate that current approaches struggle significantly with the dense capture trajectories and complex surface conditions inherent to this domain, exposing a critical scalability gap and pointing toward new research directions for deployable 3D reconstruction in asset inspection. Project page: https://roboticimaging.org/Projects/asset-inspection-dataset/

Halfway to 3D: Ensembling 2.5D and 3D Models for Robust COVID-19 CT Diagnosis

2026-03-18T04:16:07Z

We propose a deep learning framework for COVID-19 detection and disease classification from chest CT scans that integrates both 2.5D and 3D representations to capture complementary slice-level and volumetric information. The 2.5D branch processes multi-view CT slices (axial, coronal, sagittal) using a DINOv3 vision transformer to extract robust visual features, while the 3D branch employs a ResNet-18 architecture to model volumetric context and is pretrained with Variance Risk Extrapolation (VREx) followed by supervised contrastive learning to improve cross-source robustness. Predictions from both branches are combined through logit-level ensemble inference. Experiments on the PHAROS-AIF-MIH benchmark demonstrate the effectiveness of the proposed approach: for binary COVID-19 detection, the ensemble achieves 94.48% accuracy and a 0.9426 Macro F1-score, outperforming both individual models, while for multi-class disease classification the 2.5D DINOv3 model achieves the best performance with 79.35% accuracy and a 0.7497 Macro F1-score. These results highlight the benefit of combining pretrained slice-based representations with volumetric modeling for robust multi-source medical imaging analysis. Code is available at https://github.com/HySonLab/PHAROS-AIF-MIH

NutVLM: A Self-Adaptive Defense Framework against Full-Dimension Attacks for Vision Language Models in Autonomous Driving

2026-03-18T02:30:50Z

Vision Language Models (VLMs) have advanced perception in autonomous driving (AD), but they remain vulnerable to adversarial threats. These risks range from localized physical patches to imperceptible global perturbations. Existing defense methods for VLMs remain limited and often fail to reconcile robustness with clean-sample performance. To bridge these gaps, we propose NutVLM, a comprehensive self-adaptive defense framework designed to secure the entire perception-decision lifecycle. Specifically, we first employ NutNet++ as a sentinel, which is a unified detection-purification mechanism. It identifies benign samples, local patches, and global perturbations through three-way classification. Subsequently, localized threats are purified via efficient grayscale masking, while global perturbations trigger Expert-guided Adversarial Prompt Tuning (EAPT). Instead of the costly parameter updates of full-model fine-tuning, EAPT generates "corrective driving prompts" via gradient-based latent optimization and discrete projection. These prompts refocus the VLM's attention without requiring exhaustive full-model retraining. Evaluated on the Dolphins benchmark, our NutVLM yields a 4.89% improvement in overall metrics (e.g., Accuracy, Language Score, and GPT Score). These results validate NutVLM as a scalable security solution for intelligent transportation. Our code is available at https://github.com/PXX/NutVLM.

A Lensless Polarization Camera

2026-03-17T21:39:35Z

Polarization imaging is a technique that creates a pixel map of the polarization state in a scene. Although invisible to the human eye, polarization can assist various sensing and computer vision tasks. Existing polarization cameras use spatial or temporal multiplexing, which increases the camera volume, weight, cost, or all of the above. Recent lensless imaging approaches, such as DiffuserCam, have demonstrated that compact imaging systems can be realized by replacing the lens with a coding element and performing computational reconstruction. In this work, we propose a compact lensless polarization camera composed of a diffuser and a simple striped polarization mask. By combining this optical design with a reconstruction algorithm that explicitly models the polarization-encoded lensless measurements, four linear polarization images are recovered from a single snapshot. Our results demonstrate the potential of lensless approaches for polarization imaging and reveal the physical factors that govern reconstruction quality, guiding the development of high-quality practical systems.

Topology-Preserving Deep Joint Source-Channel Coding for Semantic Communication

2026-03-17T20:40:36Z

Many wireless vision applications, such as autonomous driving, require preservation of global structural information rather than only per-pixel fidelity. However, existing Deep joint source-channel coding (DeepJSCC) schemes mainly optimize pixel-wise losses and provide no explicit protection of connectivity or topology. This letter proposes TopoJSCC, a topology-aware DeepJSCC framework that integrates persistent-homology regularizers to end-to-end training. Specifically, we enforce topological consistency by penalizing Wasserstein distances between cubical persistence diagrams of original and reconstructed images, and between Vietoris--Rips persistence of latent features before and after the channel to promote a robust latent manifold. TopoJSCC is based on end-to-end learning and requires no side information. Experiments show improved topology preservation and peak signal-to-noise ratio (PSNR) in low signal-to-noise ratio (SNR) and bandwidth-ratio regimes.

Quality assessment of brain structural MR images: Comparing generalization of deep learning versus hand-crafted feature-based machine learning methods to new sites

2026-03-17T20:19:09Z

Quality assessment of brain structural MR images is critical for large-scale neuroimaging studies, where motion artifacts can significantly bias clinical estimates. While visual rating remains the gold standard, it is time-consuming and subjective. This study evaluates the relative performance and generalization capabilities of two prominent Automated Quality Assessment (AQA) methods: MRIQC, which uses hand-crafted image-quality metrics with traditional machine learning, and CNNQC, which utilizes a deep learning (DL) architecture. Using a heterogeneous dataset of 1,098 T1-weighted volumes from 17 different sites, we assessed performance on both seen sites and entirely new sites using a leave-one-site-out (LOSO) approach. Our results indicate that both DL and traditional ML methods struggle to generalize to new scanners or sites. While MRIQC generally achieved higher accuracy across most unseen sites, CNNQC demonstrated higher sensitivity for detecting poor-quality scans. Given that DL-based methods like CNNQC offer higher computational efficiency and do not require expensive pre-processing, they may be preferred for widespread deployment, provided that future work focuses on improving cross-site generalizability.

Preserving Vertical Structure in 3D-to-2D Projection for Permafrost Thaw Mapping

2026-03-17T17:01:13Z

Forecasting permafrost thaw from aerial lidar requires projecting 3D point cloud features onto 2D prediction grids, yet naive aggregation methods destroy the vertical structure critical in forest environments where ground, understory, and canopy carry distinct information about subsurface conditions. We propose a projection decoder with learned height embeddings that enable height-dependent feature transformations, allowing the network to differentiate ground-level signals from canopy returns. Combined with stratified sampling that ensures all forest strata remain represented, our approach preserves the vertical information critical for predicting subsurface conditions. Our approach pairs this decoder with a Point Transformer V3 encoder to predict dense thaw depth maps from drone-collected lidar over boreal forest in interior Alaska. Experiments demonstrate that z-stratified projection outperforms standard averaging-based methods, particularly in areas with complex vertical vegetation structure. Our method enables scalable, high-resolution monitoring of permafrost degradation from readily deployable UAV platforms.

LUMINA: A Multi-Vendor Mammography Benchmark with Energy Harmonization Protocol

2026-03-17T16:50:59Z

Publicly available full-field digital mammography (FFDM) datasets remain limited in size, clinical annotations, and vendor diversity, hindering the development of robust models. We introduce LUMINA, a curated, multi-vendor FFDM dataset that explicitly encodes acquisition energy and vendor metadata to capture clinically relevant appearance variations often overlooked in existing benchmarks. This dataset contains 1824 images from 468 patients (960 benign, 864 malignant), with pathology-confirmed labels, BI-RADS assessments, and breast-density annotations. LUMINA spans six acquisition systems and includes both high- and low-energy imaging styles, enabling systematic analysis of vendor- and energy-induced domain shifts. To address these variations, we propose a foreground-only pixel-space alignment method (''energy harmonization'') that maps images to a low-energy reference while preserving lesion morphology. We benchmark CNN and transformer models on three clinically relevant tasks: diagnosis (benign vs. malignant), BI-RADS classification, and density estimation. Two-view models consistently outperform single-view models. EfficientNet-B0 achieves an AUC of 93.54% for diagnosis, while Swin-T achieves the best macro-AUC of 89.43% for density prediction. Harmonization improves performance across architectures and produces more localized Grad-CAM responses. Overall, LUMINA provides (1) a vendor-diverse benchmark and (2) a model-agnostic harmonization framework for reliable and deployable mammography AI.

HistoAtlas: A Pan-Cancer Morphology Atlas Linking Histomics to Molecular Programs and Clinical Outcomes

2026-03-17T14:36:07Z

We present HistoAtlas, a pan-cancer computational atlas that extracts 38 interpretable histomic features from 6,745 diagnostic H&E slides across 21 TCGA cancer types and systematically links every feature to survival, gene expression, somatic mutations, and immune subtypes. All associations are covariate-adjusted, multiple-testing corrected, and classified into evidence-strength tiers. The atlas recovers known biology, from immune infiltration and prognosis to proliferation and kinase signaling, while uncovering compartment-specific immune signals and morphological subtypes with divergent outcomes. Every result is spatially traceable to tissue compartments and individual cells, statistically calibrated, and openly queryable. HistoAtlas enables systematic, large-scale biomarker discovery from routine H&E without specialized staining or sequencing. Data and an interactive web atlas are freely available at https://histoatlas.com .

Association of Progressive PPFE and Mortality in Lung Cancer Screening Cohorts

2026-03-17T14:06:33Z

Background: Pleuroparenchymal fibroelastosis (PPFE) is an upper lobe predominant fibrotic lung abnormality associated with increased mortality in established interstitial lung disease. However, the clinical significance of radiologic PPFE progression in lung cancer screening (LCS) populations remains unclear. Methods: We analysed longitudinal low-dose CT scans and clinical data from two LCS studies: National Lung Screening Trial (NLST; n=7,980); SUMMIT study (n=8,561). An automated algorithm quantified PPFE volume on baseline and follow-up scans. Annualised change in PPFE was derived and dichotomised using a distribution-based threshold to define progressive PPFE. Associations between progressive PPFE and mortality were evaluated using Cox proportional hazards models adjusted for demographic and clinical variables. In SUMMIT cohort, associations between progressive PPFE and clinical outcomes were assessed using incidence rate ratios (IRR) and odds ratios (OR). Findings: Progressive PPFE independently associated with mortality in both LCS cohorts (NLST: Hazard Ratio (HR)=1.25, 95% Confidence Interval (CI): 1.01--1.56, p=0.042; SUMMIT: HR=3.14, 95% CI: 1.66--5.97, p<0.001). Within SUMMIT, progressive PPFE was strongly associated with higher respiratory admissions (IRR=2.79, p<0.001), increased antibiotic and steroid use (IRR=1.55, p=0.010), and showed a trend towards higher modified medical research council scores (OR=1.40, p=0.055). Interpretation: Radiologic PPFE progression independently associates with mortality across two large LCS cohorts, and associates with adverse clinical outcomes. Quantitative assessment of PPFE progression may provide a clinically relevant imaging biomarker to identify individuals at increased risk of respiratory morbidity within LCS programmes.

Make it SING: Analyzing Semantic Invariants in Classifiers

2026-03-17T07:33:45Z

All classifiers, including state-of-the-art vision models, possess invariants, partially rooted in the geometry of their linear mappings. These invariants, which reside in the null-space of the classifier, induce equivalent sets of inputs that map to identical outputs. The semantic content of these invariants remains vague, as existing approaches struggle to provide human-interpretable information. To address this gap, we present Semantic Interpretation of the Null-space Geometry (SING), a method that constructs equivalent images, with respect to the network, and assigns semantic interpretations to the available variations. We use a mapping from network features to multi-modal vision language models. This allows us to obtain natural language descriptions and visual examples of the induced semantic shifts. SING can be applied to a single image, uncovering local invariants, or to sets of images, allowing a breadth of statistical analysis at the class and model levels. For example, our method reveals that ResNet50 leaks relevant semantic attributes to the null space, whereas DinoViT, a ViT pretrained with self-supervised DINO, is superior in maintaining class semantics across the invariant space.

Clinical Priors Guided Lung Disease Detection in 3D CT Scans

2026-03-17T06:19:11Z

Accurate classification of lung diseases from chest CT scans plays an important role in computer-aided diagnosis systems. However, medical imaging datasets often suffer from severe class imbalance, which may significantly degrade the performance of deep learning models, especially for minority disease categories. To address this issue, we propose a gender-aware two-stage lung disease classification framework. The proposed approach explicitly incorporates gender information into the disease recognition pipeline. In the first stage, a gender classifier is trained to predict the patient's gender from CT scans. In the second stage, the input CT image is routed to a corresponding gender-specific disease classifier to perform final disease prediction. This design enables the model to better capture gender-related imaging characteristics and alleviate the influence of imbalanced data distribution. Experimental results demonstrate that the proposed method improves the recognition performance for minority disease categories, particularly squamous cell carcinoma, while maintaining competitive performance on other classes.

Deformation-Invariant Neural Network and Its Applications in Distorted Image Restoration and Analysis

2026-03-17T04:26:25Z

Images degraded by geometric distortions pose a significant challenge to imaging and computer vision tasks such as object recognition. Deep learning-based imaging models usually fail to give accurate performance for geometrically distorted images. In this paper, we propose the deformation-invariant neural network (DINN), a framework to address the problem of imaging tasks for geometrically distorted images. The DINN outputs consistent latent features for images that are geometrically distorted but represent the same underlying object or scene. The idea of DINN is to incorporate a simple component, called the quasiconformal transformer network (QCTN), into other existing deep networks for imaging tasks. The QCTN is a deep neural network that outputs a quasiconformal map, which can be used to transform a geometrically distorted image into an improved version that is closer to the distribution of natural or good images. It first outputs a Beltrami coefficient, which measures the quasiconformality of the output deformation map. By controlling the Beltrami coefficient, the local geometric distortion under the quasiconformal mapping can be controlled. The QCTN is lightweight and simple, which can be readily integrated into other existing deep neural networks to enhance their performance. Leveraging our framework, we have developed an image classification network that achieves accurate classification of distorted images. Our proposed framework has been applied to restore geometrically distorted images by atmospheric turbulence and water turbulence. DINN outperforms existing GAN-based restoration methods under these scenarios, demonstrating the effectiveness of the proposed framework. Additionally, we apply our proposed framework to the 1-1 verification of human face images under atmospheric turbulence and achieve satisfactory performance, further demonstrating the efficacy of our approach.

Three-Dimensional MRI Reconstruction with Gaussian Representations: Tackling the Undersampling Problem

2026-03-17T03:32:44Z

Three-Dimensional Gaussian Splatting (3DGS) has shown substantial promise in the field of computer vision, but remains unexplored in the field of magnetic resonance imaging (MRI). This study explores its potential for the reconstruction of isotropic resolution 3D MRI from undersampled k-space data. We introduce a novel framework termed 3D Gaussian MRI (3DGSMR), which employs 3D Gaussian distributions as an explicit representation for MR volumes. Experimental evaluations indicate that this method can effectively reconstruct voxelized MR images, achieving a quality on par with that of well-established 3D MRI reconstruction techniques found in the literature. Notably, the 3DGSMR scheme operates under a self-supervised framework, obviating the need for extensive training datasets or prior model training. This approach introduces significant innovations to the domain, notably the adaptation of 3DGS to MRI reconstruction and the novel application of the existing 3DGS methodology to decompose MR signals, which are presented in a complex-valued format.