arXiv API feed https://arxiv.org/api/w1oaC5UR81ed3J1zDyv3SrjuQ7s · Updated: 2026-04-11T11:50:18Z · Total results: 30509 · Start index: 210 · Items per page: 15

http://arxiv.org/abs/2501.04741v6 · Rethinking domain generalization in medical image segmentation: One image as one domain · Updated: 2026-03-21T07:07:24Z

Domain shifts in medical image segmentation, particularly when data come from different centers, pose significant challenges. Intra-center variability, such as differences in scanner models or imaging protocols, can cause domain shifts as large as, or even larger than, those between centers. To address this, we propose the "one image as one domain" (OIOD) hypothesis, which treats each image as a unique domain, enabling flexible and robust domain generalization. Based on this hypothesis, we develop a unified disentanglement-based domain generalization (UniDDG) framework, which simultaneously handles both multi-source and single-source domain generalization without requiring explicit domain labels. This approach simplifies training with a fixed architecture, independent of the number of source domains, reducing complexity and enhancing scalability. We decouple each input image into a content representation and a style code, then exchange and combine these within the batch for segmentation, reconstruction, and further disentanglement. By maintaining a distinct style code for each image, our model ensures thorough decoupling of content representations and style codes, improving the domain invariance of the content representations. Additionally, we enhance generalization with expansion mask attention (EMA) for boundary preservation and style augmentation (SA) to simulate diverse image styles, improving robustness to domain shifts. Extensive experiments show that our method achieves Dice scores of 84.43% and 88.91% for multi-source to single-center and single-center generalization in optic disc and optic cup segmentation, respectively, and 86.96% and 88.56% for prostate segmentation, outperforming current state-of-the-art domain generalization methods and demonstrating adaptability across clinical settings.
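Here is a minimal sketch of the in-batch content/style exchange. It rests on an assumption that the abstract does not state: the style code is taken to be the channel-wise mean and standard deviation of an encoder feature map, as in AdaIN. The function names `split_content_style`, `combine`, and `exchange_styles` are hypothetical. UniDDG's actual encoders, decoders, and losses are not reproduced.

```python
import numpy as np

def split_content_style(feat, eps=1e-5):
    """Split (B, C, H, W) features into an instance-normalized content map and a
    per-image style code (assumed here: channel-wise mean and std)."""
    mean = feat.mean(axis=(2, 3), keepdims=True)
    std = feat.std(axis=(2, 3), keepdims=True) + eps
    content = (feat - mean) / std
    style = np.concatenate([mean, std], axis=1)  # shape (B, 2C, 1, 1)
    return content, style

def combine(content, style):
    """Re-inject a style code into a content map (AdaIN-style)."""
    c = content.shape[1]
    mean, std = style[:, :c], style[:, c:]
    return content * std + mean

def exchange_styles(feat, rng):
    """Treat every image as its own domain: pair each content map with the
    style code of a randomly chosen image from the same batch."""
    content, style = split_content_style(feat)
    perm = rng.permutation(feat.shape[0])
    return combine(content, style[perm]), perm
```

No domain labels are involved. The permutation pairs arbitrary images, which is one way to read the OIOD hypothesis: every image contributes its own style.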

Published: 2025-01-08T03:29:52Z · Authors: Jin Hong, Bo Liu

http://arxiv.org/abs/2312.04140v3 · Polarimetric Light Transport Analysis for Specular Inter-reflection · Updated: 2026-03-20T21:43:29Z

Polarization is well known for its ability to decompose diffuse and specular reflections. However, existing decomposition methods focus only on direct reflection and overlook multiple reflections, especially specular inter-reflection. In this paper, we propose a novel decomposition method for handling specular inter-reflection of metal objects by using a unique polarimetric feature: the rotation direction of linear polarization. This rotation direction serves as a discriminative factor between direct and inter-reflection on specular surfaces. To decompose the reflectance components, we actively rotate the linear polarization of incident light and analyze the rotation direction of the reflected light. We evaluate our method using both synthetic and real data, demonstrating its effectiveness in decomposing specular inter-reflections of metal objects. Furthermore, we demonstrate that our method can be combined with other decomposition methods for a detailed analysis of light transport. As a practical application, we show its effectiveness in improving the accuracy of 3D measurement against strong specular inter-reflection.
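Standard linear Stokes parameters are enough to illustrate the rotation-direction cue. This sketch assumes a four-angle polarization camera (0°, 45°, 90°, 135°) and an incident polarizer rotated in known steps. The helpers `aolp` and `rotation_direction` are hypothetical. The paper derives which sign corresponds to direct reflection and which to inter-reflection, and that mapping is not encoded here.

```python
import numpy as np

def aolp(i0, i45, i90, i135):
    """Angle of linear polarization (radians, in (-pi/2, pi/2]) computed from
    intensities behind polarizers at 0, 45, 90 and 135 degrees."""
    s1 = i0 - i90    # linear Stokes parameter S1
    s2 = i45 - i135  # linear Stokes parameter S2
    return 0.5 * np.arctan2(s2, s1)

def rotation_direction(aolp_seq):
    """Net sense of rotation of the reflected polarization while the incident
    polarization is rotated step by step: +1 same sense, -1 opposite sense.
    AoLP is pi-periodic, so the doubled angle is unwrapped and then halved."""
    unwrapped = np.unwrap(2.0 * np.asarray(aolp_seq)) / 2.0
    return int(np.sign(unwrapped[-1] - unwrapped[0]))
```

The incident rotation step should stay well below 90° so that the π-periodic unwrapping is unambiguous.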

Published: 2023-12-07T08:55:28Z · Comment: Accepted to IEEE Transactions on Computational Imaging (TCI) · Authors: Ryota Maeda, Shinsaku Hiura · DOI: 10.1109/TCI.2024.3404612

http://arxiv.org/abs/2603.20448v1 · Thermal is Always Wild: Characterizing and Addressing Challenges in Thermal-Only Novel View Synthesis · Updated: 2026-03-20T19:25:04Z

Thermal cameras provide reliable visibility in darkness and adverse conditions, but thermal imagery remains significantly harder to use for novel view synthesis (NVS) than visible-light images. This difficulty stems primarily from two characteristics of affordable thermal sensors. First, thermal images have extremely low dynamic range, which weakens appearance cues and limits the gradients available for optimization. Second, thermal data exhibit rapid frame-to-frame photometric fluctuations together with slow radiometric drift, both of which destabilize correspondence estimation and create high-frequency floater artifacts during view synthesis, particularly when no RGB guidance (beyond camera pose) is available. Guided by these observations, we introduce a lightweight preprocessing and splatting pipeline that expands usable dynamic range and stabilizes per-frame photometry. Our approach achieves state-of-the-art performance across thermal-only NVS benchmarks, without requiring any dataset-specific tuning.
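The two preprocessing goals could be met by a robust per-frame photometric normalization followed by a sequence-wide percentile stretch. This is a generic sketch and not the authors' pipeline. The median/MAD matching, the 1st/99th percentile bounds, and the function names are all assumptions.

```python
import numpy as np

def stabilize_photometry(frames):
    """Remove per-frame offset/gain fluctuations in a (T, H, W) stack by
    matching each frame's median and median absolute deviation (MAD) to the
    sequence-wide median values."""
    frames = np.asarray(frames, dtype=np.float64)
    med = np.median(frames, axis=(1, 2), keepdims=True)
    mad = np.median(np.abs(frames - med), axis=(1, 2), keepdims=True) + 1e-12
    ref_med, ref_mad = np.median(med), np.median(mad)
    return (frames - med) / mad * ref_mad + ref_med

def expand_dynamic_range(frames, lo_pct=1.0, hi_pct=99.0):
    """Stretch the stack to [0, 1] using percentiles over the whole sequence,
    so the narrow band of occupied raw values fills the usable range."""
    lo, hi = np.percentile(frames, [lo_pct, hi_pct])
    return np.clip((frames - lo) / max(hi - lo, 1e-12), 0.0, 1.0)
```

A typical call would be `expand_dynamic_range(stabilize_photometry(raw_frames))`. Computing the stretch over the whole sequence, rather than per frame, avoids reintroducing frame-to-frame flicker.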

Published: 2026-03-20T19:25:04Z · Comment: To be published at CVPR 2026. 15 pages, 29 figures · Authors: M. Kerem Aydin, Vishwanath Saragadam, Emma Alexander

http://arxiv.org/abs/2603.20077v1 · A Unified Platform and Quality Assurance Framework for 3D Ultrasound Reconstruction with Robotic, Optical, and Electromagnetic Tracking · Updated: 2026-03-20T15:56:50Z

Three-dimensional (3D) Ultrasound (US) can facilitate diagnosis, treatment planning, and image-guided therapy. However, current studies rarely provide a comprehensive evaluation of volumetric accuracy and reproducibility, highlighting the need for robust Quality Assurance (QA) frameworks, particularly for tracked 3D US reconstruction using freehand or robotic acquisition. This study presents a QA framework for 3D US reconstruction and a flexible open source platform for tracked US research. A custom phantom containing geometric inclusions with varying symmetry properties enables straightforward evaluation of optical, electromagnetic, and robotic kinematic tracking for 3D US at different scanning speeds and insonation angles. A standardised pipeline performs real-time segmentation and 3D reconstruction of geometric targets (DSC = 0.97, FPS = 46) without GPU acceleration, followed by automated registration and comparison with ground-truth geometries. Applying this framework showed that our robotic 3D US achieves state-of-the-art reconstruction performance (DSC-3D = 0.94 ± 0.01, HD95 = 1.17 ± 0.12), approaching the spatial resolution limit imposed by the transducer. This work establishes a flexible experimental platform and a reproducible validation methodology for 3D US reconstruction. The proposed framework enables robust cross-platform comparisons and improved reporting practices, supporting the safe and effective clinical translation of 3D ultrasound in diagnostic and image-guided therapy applications.
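The reported DSC and HD95 values follow widely used definitions, which the NumPy sketch below reproduces. HD95 has several variants in the literature. This version is an assumption and not necessarily the one used in the paper: it takes the larger of the two directed 95th-percentile distances and computes distances by brute force, so it only suits small point sets.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

def hd95(points_a, points_b):
    """95th-percentile symmetric Hausdorff distance between (N, 3) and (M, 3)
    point sets (typically surface points, in physical units).
    Brute force: builds the full N x M distance matrix."""
    pa, pb = np.asarray(points_a, float), np.asarray(points_b, float)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return max(np.percentile(d.min(axis=1), 95), np.percentile(d.min(axis=0), 95))
```

For full-resolution volumes, a distance transform or KD-tree would replace the dense distance matrix.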

Published: 2026-03-20T15:56:50Z · Comment: This work has been submitted to the IEEE for possible publication · Authors: Lewis Howell, Manisha Waterston, Tze Min Wah, James H. Chandler, James R. McLaughlan

http://arxiv.org/abs/2603.20045v1 · Investigating a Policy-Based Formulation for Endoscopic Camera Pose Recovery · Updated: 2026-03-20T15:30:59Z

In endoscopic surgery, surgeons continuously locate the endoscopic view relative to the anatomy by interpreting the evolving visual appearance of the intraoperative scene in the context of their prior knowledge. Vision-based navigation systems seek to replicate this capability by recovering camera pose directly from endoscopic video, but most approaches do not embody the same principles of reasoning about new frames that make surgeons successful. Instead, they remain grounded in feature matching and geometric optimization over keyframes, an approach that has been shown to degrade under the challenging conditions of endoscopic imaging, such as low texture and rapid illumination changes. Here, we pursue an alternative approach and investigate a policy-based formulation of endoscopic camera pose recovery that seeks to imitate experts in estimating trajectories conditioned on the previous camera state. Our approach directly predicts short-horizon relative motions without maintaining an explicit geometric representation at inference time. It thus addresses, by design, some of the notorious challenges of geometry-based approaches, such as brittle correspondence matching, instability in texture-sparse regions, and limited pose coverage due to reconstruction failure. We evaluate the proposed formulation on cadaveric sinus endoscopy. Under oracle state conditioning, we compare short-horizon motion prediction quality against geometric baselines and achieve the lowest mean translation error with competitive rotational accuracy. We analyze robustness by grouping prediction windows according to texture richness and illumination change; this analysis indicates reduced sensitivity to low-texture conditions. These findings suggest that a learned motion policy offers a viable alternative formulation for endoscopic camera pose recovery.
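In the policy-based formulation, each predicted relative motion is chained onto the previous camera state rather than estimated from scene geometry. The sketch below covers only that bookkeeping, plus common translation and rotation error metrics. Given relative transforms stand in for the learned policy, the metric definitions are assumed rather than taken from the paper, and all names are hypothetical.

```python
import numpy as np

def rot_z(theta):
    """Rotation by theta (rad) about the camera z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def se3(R, t):
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def rollout(T0, relative_motions):
    """Chain short-horizon relative motions onto the previous camera state;
    no map or keyframe geometry is involved."""
    poses = [T0]
    for dT in relative_motions:
        poses.append(poses[-1] @ dT)
    return poses

def pose_errors(T_pred, T_gt):
    """Translation error (Euclidean) and rotation error (geodesic angle, rad)."""
    t_err = np.linalg.norm(T_pred[:3, 3] - T_gt[:3, 3])
    R_rel = T_pred[:3, :3].T @ T_gt[:3, :3]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return t_err, float(np.arccos(cos_angle))
```

Clipping the cosine guards against values slightly outside [-1, 1] caused by floating-point round-off.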

Published: 2026-03-20T15:30:59Z · Authors: Jan Emily Mangulabnan, Akshat Chauhan, Laura Fleig, Lalithkumar Seenivasan, Roger D. Soberanis-Mukul, S. Swaroop Vedula, Russell H. Taylor, Masaru Ishii, Gregory D. Hager, Mathias Unberath

http://arxiv.org/abs/2603.19994v1 · Evaluating Test-Time Adaptation For Facial Expression Recognition Under Natural Cross-Dataset Distribution Shifts · Updated: 2026-03-20T14:44:25Z

Deep learning models often struggle under natural distribution shifts, a common challenge in real-world deployments. Test-Time Adaptation (TTA) addresses this by adapting models during inference without access to labeled source data. We present the first evaluation of TTA methods for facial expression recognition (FER) under natural domain shifts, performing cross-dataset experiments with widely used FER datasets. This moves beyond synthetic corruptions to examine real-world shifts caused by differing collection protocols, annotation standards, and demographics. Results show that TTA can boost FER performance under natural shifts by up to 11.34%. Entropy minimization methods such as TENT and SAR perform best when the target distribution is clean. In contrast, prototype adjustment methods such as T3A excel when the distributional distance is larger. Finally, feature alignment methods such as SHOT deliver the largest gains when the target distribution is noisier than the source. Our cross-dataset analysis shows that TTA effectiveness is governed by the distributional distance and the severity of the natural shift across domains.
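TENT-style entropy minimization can be illustrated on a single linear classifier. The sketch is a simplification and not TENT's reference code. Test-batch features are normalized with their own statistics, and only a feature-wise scale `gamma` and shift `beta` are updated, using the analytic gradient of the mean prediction entropy. TENT itself updates the BatchNorm affine parameters of a deep network via autograd. Function names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

def entropy_min_adapt(x, W, b, steps=20, lr=0.05):
    """Adapt only (gamma, beta), applied after normalizing with the test batch's
    own statistics, by gradient descent on mean prediction entropy.
    The classifier weights W (classes x features) and bias b stay frozen."""
    x_hat = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-5)
    gamma, beta = np.ones(x.shape[1]), np.zeros(x.shape[1])
    for _ in range(steps):
        p = softmax((gamma * x_hat + beta) @ W.T + b)
        log_p = np.log(p + 1e-12)
        H = -(p * log_p).sum(axis=1, keepdims=True)
        g = -p * (log_p + H) / len(x)   # dH/dlogits, averaged over the batch
        g_feat = g @ W                  # back-propagate to the affine output
        gamma -= lr * (g_feat * x_hat).sum(axis=0)
        beta -= lr * g_feat.sum(axis=0)
    return gamma, beta, x_hat
```

The gradient uses the identity dH/dz_j = -p_j (log p_j + H) for H = -Σ_i p_i log p_i with p = softmax(z).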

Published: 2026-03-20T14:44:25Z · Comment: Accepted at ICASSP 2026 · Authors: John Turnbull, Shivam Grover, Amin Jalali, Ali Etemad

http://arxiv.org/abs/2506.21349v6 · Electromagnetic Inverse Scattering from a Single Transmitter · Updated: 2026-03-20T14:16:54Z

Electromagnetic Inverse Scattering Problems (EISP) seek to reconstruct relative permittivity from scattered fields and are fundamental to applications like medical imaging. This inverse process is inherently ill-posed and highly nonlinear, making it particularly challenging, especially under sparse transmitter setups, e.g., with only one transmitter. While recent machine learning-based approaches have shown promising results, they often rely on time-consuming, case-specific optimization and perform poorly under sparse transmitter setups. To address these limitations, we revisit EISP from a data-driven perspective. The scarcity of transmitters yields too few measurements to capture adequate physical information for stable inversion. Accordingly, we propose a fully end-to-end and data-driven framework that predicts the relative permittivity of scatterers from measured fields, leveraging data distribution priors to compensate for the incomplete information from sparse measurements. This design enables data-driven training and feed-forward prediction of relative permittivity while maintaining strong robustness to transmitter sparsity. Extensive experiments show that our method outperforms state-of-the-art approaches in reconstruction accuracy and robustness. Notably, we demonstrate, for the first time, high-quality reconstruction from a single transmitter. This work advances practical electromagnetic imaging by providing a new, cost-effective paradigm for inverse scattering. Code and models are released at https://gomenei.github.io/SingleTX-EISP/.
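A linearized toy problem shows why a data prior helps when there is only one transmitter. Assume a linear (Born-type) forward operator `A` that has fewer rows (receiver measurements) than unknowns, and a Gaussian prior on the permittivity contrast. Under these assumptions the posterior mean has a closed form. This is purely illustrative: the paper's setting is nonlinear, and its framework learns the prior end-to-end from data. `map_estimate` is a hypothetical name.

```python
import numpy as np

def map_estimate(A, y, mu, Sigma, noise_var):
    """Posterior mean of x given y = A x + n, with prior x ~ N(mu, Sigma)
    and white noise n ~ N(0, noise_var * I). The prior fills in the
    components of x that the few measurements cannot determine."""
    S = A @ Sigma @ A.T + noise_var * np.eye(A.shape[0])
    return mu + Sigma @ A.T @ np.linalg.solve(S, y - A @ mu)
```

When `A` has a nontrivial null space, least squares alone cannot pin down those components. Here they are set by `mu` and the correlations encoded in `Sigma`.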

Published: 2025-06-26T15:02:50Z · Authors: Yizhe Cheng, Chunxun Tian, Haoru Wang, Wentao Zhu, Xiaoxuan Ma, Yizhou Wang

http://arxiv.org/abs/2603.20355v1 · CaroTo: A Tool for Fast Comprehensive Analysis of Carotid Artery Stenosis in 4D PC- and 3D BB-MRI Data · Updated: 2026-03-20T13:28:28Z

Atherosclerosis of the carotid artery increases stroke risk. Atherosclerosis assessment with MRI requires multimodal and multidimensional segmentation of the carotid artery, reproducible extraction of biomarkers, and the visualization of segmentations and biomarkers. We developed CaroTo, a tool that allows for standardized carotid atherosclerosis assessment. It combines the capabilities of MEVISFlow with specialized tools for carotid geometry and vessel wall assessment. It supports manual and automatic segmentation for 2D, 2D+time, and 3D images, facilitating precise and consistent evaluations of carotid artery stenosis.
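CaroTo's biomarkers are not listed in the abstract. As one example of a reproducible stenosis biomarker computed from a lumen segmentation, the sketch below assumes the widely used NASCET diameter criterion. The diameter profile and helper names are hypothetical.

```python
import numpy as np

def equivalent_diameter(area):
    """Diameter of a circle with the given lumen cross-sectional area."""
    return 2.0 * np.sqrt(np.asarray(area, float) / np.pi)

def nascet_stenosis(diameters, distal_index):
    """NASCET-style percent stenosis: 100 * (1 - d_min / d_distal), where d_min
    is the narrowest lumen diameter along the centerline and d_distal the
    diameter of a normal segment distal to the lesion."""
    d = np.asarray(diameters, float)
    return 100.0 * (1.0 - d.min() / d[distal_index])
```

Deriving diameters from segmented cross-sectional areas, as `equivalent_diameter` does, makes the measurement reproducible across readers. The choice of distal reference segment still has to be standardized.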

Published: 2026-03-20T13:28:28Z · Comment: VCBM 2024, Poster Honorable Mention · Authors: Hinrich Rahlfs, Markus Hüllebrand, Sebastian Schmitter, Jonathan Andrae, Christoph Strecker, Andreas Harloff, Anja Hennemuth

http://arxiv.org/abs/2603.19925v1 · ReconMIL: Synergizing Latent Space Reconstruction with Bi-Stream Mamba for Whole Slide Image Analysis · Updated: 2026-03-20T13:09:54Z

Whole slide image (WSI) analysis heavily relies on multiple instance learning (MIL). While recent methods benefit from large-scale foundation models and advanced sequence modeling to capture long-range dependencies, they still struggle with two critical issues. First, directly applying frozen, task-agnostic features often leads to suboptimal separability due to the domain gap with specific histological tasks. Second, relying solely on global aggregators can cause over-smoothing, where sparse but critical diagnostic signals are overshadowed by the dominant background context. In this paper, we present ReconMIL, a novel framework designed to bridge this domain gap and balance global-local feature aggregation. Our approach introduces a Latent Space Reconstruction module that adaptively projects generic features into a compact, task-specific manifold, improving boundary delineation. To prevent information dilution, we develop a bi-stream architecture combining a Mamba-based global stream for contextual priors and a CNN-based local stream to preserve subtle morphological anomalies. A scale-adaptive selection mechanism dynamically fuses these two streams, determining when to rely on global context versus local saliency. Evaluations across multiple diagnostic and survival prediction benchmarks show that ReconMIL consistently outperforms current state-of-the-art methods, effectively localizing fine-grained diagnostic regions while suppressing background noise. Visualization results confirm the model's superior ability to localize diagnostic regions by effectively balancing global structure and local granularity.
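The bi-stream idea can be sketched with simple stand-ins. Attention pooling plays the global (Mamba) stream, a sliding-window average followed by max pooling plays the local (CNN) stream, and a sigmoid gate plays the scale-adaptive selection. All of these substitutions and names are assumptions made for illustration and do not reflect ReconMIL's architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_pool(H, w):
    """Global-stream stand-in: softmax attention over all N instance embeddings."""
    s = H @ w
    a = np.exp(s - s.max())
    a /= a.sum()
    return a @ H, a

def local_pool(H, k=3):
    """Local-stream stand-in: average k neighboring instances, then keep the
    strongest window per feature, so a small salient region is not diluted."""
    kernel = np.ones(k) / k
    smoothed = np.stack([np.convolve(H[:, j], kernel, mode="valid")
                         for j in range(H.shape[1])], axis=1)
    return smoothed.max(axis=0)

def fuse(H, w_att, w_gate):
    """Blend the two slide-level embeddings with a learned gate in (0, 1)."""
    g_vec, _ = attention_pool(H, w_att)
    l_vec = local_pool(H)
    alpha = sigmoid(np.concatenate([g_vec, l_vec]) @ w_gate)
    return alpha * g_vec + (1.0 - alpha) * l_vec, alpha
```

Because the gate depends on both embeddings, the blend can vary from slide to slide, which is the behavior the scale-adaptive selection is meant to provide.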

Published: 2026-03-20T13:09:54Z · Authors: Lubin Gan, Jing Zhang, Heng Zhang, Xin Di, Zhifeng Wang, Wenke Huang, Xiaoyan Sun

http://arxiv.org/abs/2603.20353v1 · Scene Representation using 360° Saliency Graph and its Application in Vision-based Indoor Navigation · Updated: 2026-03-20T12:44:25Z

A scene can be represented visually in many formats, such as RGB-D, LiDAR scans, keypoints, and rectangular, spherical, or multi-view images, but these formats embed the information relevant to applications such as scene indexing and vision-based navigation only implicitly. Thus, these representations may not be efficient for such applications. This paper proposes a novel 360° saliency graph representation of scenes. This rich representation explicitly encodes the relevant visual, contextual, semantic, and geometric information of the scene as nodes, edges, edge weights, and angular positions in the 360° graph. The representation is also robust to changes in scene view and addresses challenges of indoor environments, such as varied illumination, occlusions, and shadows, that affect existing traditional methods. We use this representation for vision-based navigation and compare it with existing navigation methods that use 360° scenes, which are limited by poor scene representations lacking scene-specific information. The proposed representation is used first to localize the query scene in a given topological map and then to facilitate 2D navigation by estimating the next movement direction toward the target destination in the topological map, using the geometric information embedded in the 360° saliency graph. Experimental results demonstrate the efficacy of the proposed 360° saliency graph representation in enhancing both scene localization and vision-based indoor navigation.
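This heavily simplified sketch shows how angular positions can make a 360° graph robust to view rotation. Each node is reduced to a `(label, azimuth)` pair, and a search finds the yaw shift that best aligns a query scene with a map scene. The heading to a target node is then read off. Edges, edge weights, and the paper's actual matching procedure are omitted. The function names and the brute-force 1° search are assumptions.

```python
import numpy as np

def angular_diff(a, b):
    """Signed smallest difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def align_rotation(query, reference, tol=10.0):
    """Find the yaw shift (degrees) that maps the most reference nodes onto
    query nodes with the same label. Nodes are (label, azimuth_deg) pairs."""
    best_shift, best_score = 0.0, -1
    for shift in np.arange(0.0, 360.0, 1.0):
        score = 0
        for label_q, az_q in query:
            score += any(label_q == label_r and
                         abs(angular_diff(az_q, az_r + shift)) <= tol
                         for label_r, az_r in reference)
        if score > best_score:
            best_shift, best_score = float(shift), score
    return best_shift, best_score

def heading_to(target_label, reference, shift):
    """Direction (degrees, query frame) of a labeled node from the map scene."""
    for label, az in reference:
        if label == target_label:
            return (az + shift) % 360.0
    return None
```

The alignment score also serves as a crude localization score: the map scene with the highest score is the most likely match for the query.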

Published: 2026-03-20T12:44:25Z · Authors: Preeti Meena, Himanshu Kumar, Sandeep Yadav

http://arxiv.org/abs/2603.19801v1 · Offshore oil and gas platform dynamics in the North Sea, Gulf of Mexico, and Persian Gulf: Exploiting the Sentinel-1 archive · Updated: 2026-03-20T09:40:32Z

The increasing use of marine spaces by offshore infrastructure, including oil and gas platforms, underscores the need for consistent, scalable monitoring. Offshore development has economic, environmental, and regulatory implications, yet maritime areas remain difficult to monitor systematically due to their inaccessibility and spatial extent. This study presents an automated approach to the spatiotemporal detection of offshore oil and gas platforms based on freely available Earth observation data. Using Sentinel-1 archive data and deep learning-based object detection, we created a consistent quarterly time series of platform locations for the period 2017-2025 in three major production regions: the North Sea, the Gulf of Mexico, and the Persian Gulf. In addition, platform size, water depth, distance to the coast, national affiliation, and installation and decommissioning dates were derived. In 2025, 3,728 offshore platforms were identified: 356 in the North Sea, 1,641 in the Gulf of Mexico, and 1,731 in the Persian Gulf. While expansion was observed in the Persian Gulf until 2024, the Gulf of Mexico and the North Sea saw a decline in platform numbers from 2018 to 2020. At the same time, platform turnover was pronounced: more than 2,700 platforms were installed or relocated to new sites, while a comparable number were decommissioned or relocated. Furthermore, the increasing number of platforms with short lifespans points to a structural change in the offshore sector associated with the growing importance of mobile offshore units such as jack-ups or drillships. The results highlight the potential of freely available Earth observation data and deep learning for consistent, long-term monitoring of marine infrastructure. The derived dataset is public and provides a basis for offshore monitoring, maritime planning, and analyses of the transformation of the offshore energy sector.
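Once detections are linked to persistent platform IDs, installation and decommissioning dates can be read off a quarterly detection series. The sketch assumes that linking has already been done, and its input format is hypothetical. Platforms present in the first or last quarter are treated as censored (`None`). Detection gaps are not handled, and a relocated unit would appear under a new ID.

```python
def lifespans(detections, quarters):
    """Derive first/last observed quarter per platform.
    `detections` maps quarter -> set of platform IDs detected in that quarter;
    `quarters` lists all quarters in chronological order."""
    result = {}
    for pid in set().union(*detections.values()):
        seen = [q for q in quarters if pid in detections.get(q, set())]
        first, last = seen[0], seen[-1]
        result[pid] = {
            # None = already present at the start (installation date censored)
            "installed": first if first != quarters[0] else None,
            # None = still present at the end (not yet decommissioned)
            "last_seen": last if last != quarters[-1] else None,
            "quarters_observed": len(seen),
        }
    return result
```

Short-lived IDs under this scheme (small `quarters_observed`) are the kind of signal the abstract associates with mobile units such as jack-ups.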

Published: 2026-03-20T09:40:32Z · Comment: 16 pages, 10 figures, 1 table · Authors: Robin Spanier, Thorsten Hoeser, John Truckenbrodt, Felix Bachofer, Claudia Kuenzer

http://arxiv.org/abs/2603.18123v2 · Understanding Task Aggregation for Generalizable Ultrasound Foundation Models · Updated: 2026-03-20T09:24:34Z

Foundation models promise to unify multiple clinical tasks within a single framework, but recent ultrasound studies report that unified models can underperform task-specific baselines. We hypothesize that this degradation arises not from model capacity limitations, but from task aggregation strategies that ignore interactions between task heterogeneity and available training data scale. In this work, we systematically analyze when heterogeneous ultrasound tasks can be jointly learned without performance loss, establishing practical criteria for task aggregation in unified clinical imaging models. We introduce M2DINO, a multi-organ, multi-task framework built on DINOv3 with task-conditioned Mixture-of-Experts blocks for adaptive capacity allocation. We systematically evaluate 27 ultrasound tasks spanning segmentation, classification, detection, and regression under three paradigms: task-specific, clinically-grouped, and all-task unified training. Our results show that aggregation effectiveness depends strongly on training data scale. While clinically-grouped training can improve performance in data-rich settings, it may induce substantial negative transfer in low-data settings. In contrast, all-task unified training exhibits more consistent performance across clinical groups. We further observe that task sensitivity varies by task type in our experiments: segmentation shows the largest performance drops compared with regression and classification. These findings provide practical guidance for ultrasound foundation models, emphasizing that aggregation strategies should jointly consider training data availability and task characteristics rather than relying on clinical taxonomy alone.
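A task-conditioned Mixture-of-Experts block can be sketched as a router that scores experts from a task embedding and mixes the outputs of the top-k experts. The abstract does not describe M2DINO's routing. Task-only routing, top-2 selection, ReLU experts, and the residual connection are therefore all assumptions, and `task_moe` is a hypothetical name.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def task_moe(x, task_emb, experts, W_gate, top_k=2):
    """Task-conditioned Mixture-of-Experts layer.
    x: (N, D) tokens; task_emb: (T,) task embedding;
    experts: list of (D, D) weight matrices; W_gate: (num_experts, T).
    Routing depends only on the task, so all tokens of one task share
    the same top-k experts."""
    logits = W_gate @ task_emb
    top = np.argsort(logits)[-top_k:]   # indices of the k highest-scoring experts
    weights = softmax(logits[top])
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        out += w * np.maximum(x @ experts[i], 0.0)  # ReLU expert
    return x + out, top                             # residual connection
```

Routing on the task rather than on individual tokens is one way to let related tasks share experts while keeping dissimilar tasks apart. This relates to the aggregation trade-offs the abstract studies.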

Published: 2026-03-18T16:43:43Z · Authors: Fangyijie Wang, Tanya Akumu, Vien Ngoc Dang, Amelia Jiménez-Sánchez, Jieyun Bai, Guénolé Silvestre, Karim Lekadir, Kathleen M. Curran

http://arxiv.org/abs/2511.17038v2 · DAPS++: Rethinking Diffusion Inverse Problems with Decoupled Posterior Annealing · Updated: 2026-03-19T21:30:58Z

From a Bayesian perspective, score-based diffusion solves inverse problems through joint inference, combining the likelihood with the prior to guide the sampling process. However, this formulation fails to explain its practical behavior: the prior offers limited guidance, while reconstruction is largely driven by the measurement-consistency term, leading to an inference process that is effectively decoupled from the diffusion dynamics. We show that the diffusion prior in these solvers functions primarily as a warm initializer that places estimates near the data manifold, while reconstruction is driven almost entirely by measurement consistency. Based on this observation, we introduce DAPS++, which fully decouples diffusion-based initialization from likelihood-driven refinement, allowing the likelihood term to guide inference more directly while maintaining numerical stability and providing insight into why unified diffusion trajectories remain effective in practice. By requiring fewer function evaluations (NFEs) and measurement-optimization steps, DAPS++ achieves high computational efficiency and robust reconstruction performance across diverse image restoration tasks.
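The decoupled view can be illustrated with a linear measurement model `y = A x`. A prior sample initializes the estimate; in the test it is a hypothetical stand-in for a diffusion output. Refinement is then plain gradient descent on measurement consistency. This is not the DAPS++ algorithm, which has its own annealing schedule. It only shows why a warm start matters when `A` is underdetermined. `refine` is a hypothetical name.

```python
import numpy as np

def refine(x_init, A, y, steps=200, lr=None):
    """Likelihood-driven refinement: gradient descent on 0.5 * ||A x - y||^2,
    starting from a prior-supplied initialization x_init."""
    if lr is None:
        lr = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L step for this quadratic
    x = x_init.copy()
    for _ in range(steps):
        x -= lr * A.T @ (A @ x - y)
    return x
```

Gradient steps only move the estimate within the row space of `A`, so the null-space component is inherited entirely from the initialization. This is one way to see the prior acting as a warm initializer.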

Published: 2025-11-21T08:28:36Z · Authors: Hao Chen, Renzheng Zhang, Scott S. Howard

http://arxiv.org/abs/2603.19386v1 · TuLaBM: Tumor-Biased Latent Bridge Matching for Contrast-Enhanced MRI Synthesis · Updated: 2026-03-19T18:22:09Z

Contrast-enhanced magnetic resonance imaging (CE-MRI) plays a crucial role in brain tumor assessment; however, its acquisition requires gadolinium-based contrast agents (GBCAs), which increase costs and raise safety concerns. Consequently, synthesizing CE-MRI from non-contrast MRI (NC-MRI) has emerged as a promising alternative. Early Generative Adversarial Network (GAN)-based approaches suffered from instability and mode collapse, while diffusion models, despite impressive synthesis quality, remain computationally expensive and often fail to faithfully reproduce critical tumor contrast patterns. To address these limitations, we propose Tumor-Biased Latent Bridge Matching (TuLaBM), which formulates NC-to-CE MRI translation as Brownian bridge transport between source and target distributions in a learned latent space, enabling efficient training and inference. To enhance tumor-region fidelity, we introduce a Tumor-Biased Attention Mechanism (TuBAM) that amplifies tumor-relevant latent features during bridge evolution, along with a boundary-aware loss that constrains tumor interfaces to improve margin sharpness. While bridge matching has been explored for medical image translation in pixel space, our latent formulation substantially reduces computational cost and inference time. Experiments on BraTS2023-GLI (BraSyn) and an in-house Cleveland Clinic liver MRI dataset show that TuLaBM consistently outperforms state-of-the-art baselines on both whole-image and tumor-region metrics, generalizes effectively to unseen liver MRI data in zero-shot and fine-tuned settings, and achieves inference times under 0.097 seconds per image.
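Brownian bridge transport between two latents can be sketched with the standard bridge marginal and drift target. The noise scale `sigma`, the drift parameterization, and the function names are assumptions. TuLaBM's encoder, TuBAM attention, and boundary-aware loss are omitted. In training, a network would regress `drift_target` from `(z_t, t)`. The test's oracle drift uses the known target only to check the integrator.

```python
import numpy as np

def bridge_sample(z0, z1, t, sigma, rng):
    """Sample the Brownian bridge between latents z0 (non-contrast) and
    z1 (contrast-enhanced) at time t in [0, 1]:
        z_t = (1 - t) z0 + t z1 + sigma * sqrt(t (1 - t)) * eps."""
    eps = rng.standard_normal(z0.shape)
    return (1.0 - t) * z0 + t * z1 + sigma * np.sqrt(t * (1.0 - t)) * eps

def drift_target(z_t, z1, t):
    """Regression target for a drift network: (z1 - z_t) / (1 - t), for t < 1."""
    return (z1 - z_t) / (1.0 - t)

def euler_transport(z0, drift_fn, n_steps=10):
    """Deterministic Euler integration of dz = v(z, t) dt from t = 0 to 1."""
    z, dt = z0.copy(), 1.0 / n_steps
    for k in range(n_steps):
        z = z + drift_fn(z, k * dt) * dt
    return z
```

The noise term vanishes at both endpoints, so the bridge is pinned to the source latent at t = 0 and to the target latent at t = 1.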

Published: 2026-03-19T18:22:09Z · Authors: Atharva Rege, Adinath Madhavrao Dukre, Numan Balci, Dwarikanath Mahapatra, Imran Razzak

http://arxiv.org/abs/2603.19187v1 · GenMFSR: Generative Multi-Frame Image Restoration and Super-Resolution · Updated: 2026-03-19T17:43:16Z

Camera pipelines receive raw Bayer-format frames that need to be denoised, demosaiced, and often super-resolved. Multiple frames are captured to exploit natural hand tremor and enhance resolution, so multi-frame super-resolution is a fundamental problem in camera pipelines. Existing adversarial methods are constrained by the quality of the ground truth. We propose GenMFSR, the first Generative Multi-Frame Raw-to-RGB Super Resolution pipeline, which incorporates image priors from foundation models to obtain sub-pixel information for camera ISP applications. Unlike existing single-frame super-resolution methods, GenMFSR aligns multiple raw frames. We also propose a loss term that restricts generation to high-frequency regions in the raw domain, thus preventing low-frequency artifacts.
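A loss that confines generation to high frequencies can be sketched as a penalty on low-pass differences only. The FFT mask, the cutoff of 0.1 cycles/pixel, and the function names are assumptions. The sketch also works on a single 2D plane rather than on GenMFSR's raw Bayer mosaic.

```python
import numpy as np

def lowpass(img, cutoff=0.1):
    """Keep spatial frequencies with radius <= cutoff (cycles/pixel) via an
    ideal FFT mask."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))

def low_freq_loss(generated, reference, cutoff=0.1):
    """Penalize only low-frequency deviations from the reference, leaving the
    generator free to add high-frequency detail."""
    diff = lowpass(generated, cutoff) - lowpass(reference, cutoff)
    return float(np.mean(diff ** 2))
```

An ideal mask causes ringing. A smooth (e.g., Gaussian) low-pass would be a reasonable alternative if that matters in practice.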

Published: 2026-03-19T17:43:16Z · Authors: Harshana Weligampola, Joshua Peter Ebenezer, Weidi Liu, Abhinau K. Venkataramanan, Sreenithy Chandran, Seok-Jun Lee, Hamid Rahim Sheikh