
Selecting training samples for ovarian cancer classification via a semi-supervised clustering approach

  • Jennifer Salguero L.
  • Prateek Prasanna
  • Germán Corredor
  • Angel Cruz-Roa
  • David Becerra
  • Eduardo Romero

  • Universidad Nacional de Colombia
  • Case Western Reserve University
  • Louis Stokes VA Medical Center
  • Universidad de los Llanos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Machine learning techniques have shown great promise in digital pathology. However, a major bottleneck is the difficulty of annotating the amount of tissue needed to account for several sources of variability, namely chemical fixation, sample slicing, and staining. Models are usually trained on sets of small annotated image patches, but the number of patches required may grow exponentially, and the patches must still represent that variability. This paper presents a method for automatic sample selection to train a classifier for ovarian cancer by integrating a novel soft clustering strategy. The method starts by classifying a large set of patches with a previously trained classifier and dividing the patches assigned to the cancer class into highly and moderately confident ones. An unsupervised selection step based on Probabilistic Latent Semantic Analysis (PLSA) then picks moderately confident samples from relevant and meaningful groups with maximum within-group variance. A new model is trained using the highly confident patches together with the patches selected by the PLSA. This strategy outperforms a model trained with a larger set of annotated patches, while requiring much less training time and far fewer samples. The strategy was evaluated on a set of patches from 18 patients with Serous Ovarian Cancer, obtaining a reduction of 54.62% in training time and 73.66% in the number of samples, while recall improved from 0.69 to 0.73.
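
The selection pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the confidence thresholds (0.9 and 0.5), the bag-of-features count matrix, the small EM-based PLSA, the hard assignment of each patch to its dominant latent group, and the helper names `split_by_confidence`, `plsa`, and `select_moderate` are all hypothetical choices made for the example.

```python
import numpy as np

def split_by_confidence(probs, high=0.9, low=0.5):
    """Split cancer-class patches by classifier probability (thresholds are assumed)."""
    probs = np.asarray(probs)
    high_idx = np.where(probs >= high)[0]
    mod_idx = np.where((probs >= low) & (probs < high))[0]
    return high_idx, mod_idx

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a non-negative (patches x features) count matrix.

    Returns P(z|d) with shape (patches, topics) and P(w|z) with shape (topics, features).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]      # (docs, topics, words)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both distributions from expected counts
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

def select_moderate(features, p_z_d, n_groups=2, per_group=5):
    """Pick patches from the latent groups with the largest within-group variance.

    Assumption: a patch belongs to the group with its highest P(z|d).
    """
    labels = p_z_d.argmax(axis=1)
    variances = {}
    for z in np.unique(labels):
        variances[z] = features[labels == z].var(axis=0).sum()
    top_groups = sorted(variances, key=variances.get, reverse=True)[:n_groups]
    chosen = []
    for z in top_groups:
        idx = np.where(labels == z)[0]
        # Within a group, take the patches most strongly associated with it
        order = idx[np.argsort(-p_z_d[idx, z])]
        chosen.extend(order[:per_group].tolist())
    return np.array(sorted(chosen), dtype=int)

# Synthetic stand-ins for patch features and classifier outputs
rng = np.random.default_rng(0)
features = rng.integers(0, 20, size=(200, 16)).astype(float)
probs = rng.random(200)

high_idx, mod_idx = split_by_confidence(probs)
p_z_d, p_w_z = plsa(features[mod_idx], n_topics=4)
picked = mod_idx[select_moderate(features[mod_idx], p_z_d)]

# Indices of the reduced training set: highly confident plus PLSA-selected patches
train_idx = np.concatenate([high_idx, picked])
```

In this sketch, `train_idx` would index the patches used to train the new model. How the groups are ranked and how many patches are drawn from each are design choices the abstract does not specify, so the values here are placeholders.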

Original language: English
Title of host publication: Medical Imaging 2022
Subtitle of host publication: Digital and Computational Pathology
Editors: John E. Tomaszewski, Aaron D. Ward, Richard M. Levenson
Publisher: SPIE
ISBN (Electronic): 9781510649538
DOIs
State: Published - 2022
Event: Medical Imaging 2022: Digital and Computational Pathology - Virtual, Online
Duration: Mar 21 2022 → Mar 27 2022

Publication series

Name: Progress in Biomedical Optics and Imaging - Proceedings of SPIE
Volume: 12039
ISSN (Print): 1605-7422

Conference

Conference: Medical Imaging 2022: Digital and Computational Pathology
City: Virtual, Online
Period: 03/21/22 → 03/27/22

Keywords

  • Pathologist navigation; Decision Support
  • Probabilistic Latent Semantic Analysis
  • Serous Ovarian Cancer
