TY - GEN
T1 - Selecting training samples for ovarian cancer classification via a semi-supervised clustering approach
AU - Salguero L., Jennifer
AU - Prasanna, Prateek
AU - Corredor, Germán
AU - Cruz-Roa, Angel
AU - Becerra, David
AU - Romero, Eduardo
N1 - Publisher Copyright:
© COPYRIGHT SPIE.
PY - 2022
Y1 - 2022
AB - Machine learning techniques have shown great promise in digital pathology. However, a major bottleneck is the difficulty of annotating the amount of tissue necessary to cover several sources of variability, namely chemical fixation, sample slicing, and staining. Models are usually trained on sets of small annotated image patches, but the number of required patches may grow exponentially, and the patches must still represent this variability. This paper presents a method for automatic sample selection to train a classifier for ovarian cancer by integrating a novel soft clustering strategy. The method starts by classifying a large set of patches with a previously trained classifier and dividing the patches assigned to the cancer class into highly confident and moderately confident groups. An unsupervised selection of the moderately confident patches by Probabilistic Latent Semantic Analysis (PLSA) picks samples from relevant and meaningful groups with maximum within-group variance. A new model is then trained using the highly confident patches together with the patches selected by the PLSA. This strategy outperforms a model trained with a larger set of annotated patches, while the training time and the number of samples are much smaller. The strategy was evaluated on a set of patches from 18 patients with Serous Ovarian Cancer, obtaining a reduction of 54.62% in the training time and 73.66% in the number of samples, while the recall improved from 0.69 to 0.73.
KW - Pathologist navigation; Decision Support
KW - Probabilistic Latent Semantic Analysis
KW - Serous Ovarian Cancer
UR - https://www.scopus.com/pages/publications/85132805551
U2 - 10.1117/12.2612984
DO - 10.1117/12.2612984
M3 - Conference contribution
AN - SCOPUS:85132805551
T3 - Progress in Biomedical Optics and Imaging - Proceedings of SPIE
BT - Medical Imaging 2022
A2 - Tomaszewski, John E.
A2 - Ward, Aaron D.
A2 - Levenson, Richard M.
PB - SPIE
T2 - Medical Imaging 2022: Digital and Computational Pathology
Y2 - 21 March 2022 through 27 March 2022
ER -