Text-Driven Weakly Supervised OCT Lesion Segmentation With Structural Guidance

  • Jiaqi Yang
  • Nitish Mehta
  • Xiaoling Hu
  • Chao Chen
  • Chia Ling Tsai
  • City University of New York
  • New York University
  • Harvard University

Research output: Contribution to journal › Article › peer-review

Abstract

Accurate segmentation of Optical Coherence Tomography (OCT) images is crucial for diagnosing and monitoring retinal diseases. However, the labor-intensive nature of pixel-level annotation limits the scalability of supervised learning for large datasets. Weakly Supervised Semantic Segmentation (WSSS) offers a promising alternative by using weaker forms of supervision, such as image-level labels, to reduce the annotation burden. Despite its advantages, weak supervision inherently carries limited information. We propose a novel WSSS framework with only image-level labels for OCT lesion segmentation that integrates structural and text-driven guidance to produce high-quality, pixel-level pseudo labels. The framework employs two visual processing modules: one that processes the original OCT images and another that operates on layer segmentations augmented with anomalous signals, enabling the model to associate lesions with their corresponding anatomical layers. Complementing these visual cues, we leverage large-scale pretrained models to provide two forms of textual guidance: label-derived descriptions that encode local semantics, and domain-agnostic synthetic descriptions that, although expressed in natural image terms, capture spatial and relational semantics useful for generating globally consistent representations. By fusing these visual and textual features in a multimodal framework, our method aligns semantic meaning with structural relevance, thereby improving lesion localization and segmentation performance. Experiments on three OCT datasets demonstrate state-of-the-art results, highlighting its potential to advance diagnostic accuracy and efficiency in medical imaging.
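As an illustration of the general idea of turning image-level labels into pixel-level pseudo labels through text-visual agreement, the sketch below scores each pixel's visual feature against one text embedding per lesion class and keeps only the classes named by the image-level label. It is not the authors' implementation: the cosine-similarity scoring, the function names `cosine` and `pseudo_labels`, the `threshold` value, and the toy 2-dimensional features are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pseudo_labels(pixel_feats, text_embeds, present, threshold=0.5):
    """Assign each pixel the best-matching present class, or 0 for background.

    pixel_feats: H x W grid of D-dimensional visual feature vectors (hypothetical input).
    text_embeds: dict mapping class id (>= 1) to a D-dimensional text embedding (hypothetical).
    present:     set of class ids given by the image-level label.
    threshold:   minimum similarity for a pixel to count as lesion (assumed value).
    """
    labels = []
    for row in pixel_feats:
        out_row = []
        for feat in row:
            best_cls, best_sim = 0, threshold
            # Only classes named by the image-level label may be assigned.
            for cls in sorted(present):
                sim = cosine(feat, text_embeds[cls])
                if sim > best_sim:
                    best_cls, best_sim = cls, sim
            out_row.append(best_cls)
        labels.append(out_row)
    return labels

# Toy 2 x 2 "image" with 2-dimensional features and two lesion classes.
feats = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.7, 0.7], [0.0, 0.0]]]
text = {1: [1.0, 0.0], 2: [0.0, 1.0]}

only_class_1 = pseudo_labels(feats, text, present={1})
both_classes = pseudo_labels(feats, text, present={1, 2})
```

Restricting the candidates to `present` is what lets the weak label act as supervision here: a pixel that resembles class 2 is still mapped to background when the image-level label lists only class 1.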

Original language: English
Pages (from-to): 3408-3421
Number of pages: 14
Journal: IEEE Journal of Biomedical and Health Informatics
Volume: 30
Issue number: 4
State: Published - Apr 1 2026

Keywords

  • Multimodal learning
  • retinal OCT lesion segmentation
  • structural guidance
  • vision-language models
  • weakly supervised semantic segmentation
