TY - JOUR
T1 - Generalisable long COVID subtypes
T2 - Findings from the NIH N3C and RECOVER programmes
AU - N3C Consortium
AU - RECOVER Consortium
AU - Reese, Justin T.
AU - Blau, Hannah
AU - Casiraghi, Elena
AU - Bergquist, Timothy
AU - Loomba, Johanna J.
AU - Callahan, Tiffany J.
AU - Laraway, Bryan
AU - Antonescu, Corneliu
AU - Coleman, Ben
AU - Gargano, Michael
AU - Wilkins, Kenneth J.
AU - Cappelletti, Luca
AU - Fontana, Tommaso
AU - Ammar, Nariman
AU - Antony, Blessy
AU - Murali, T. M.
AU - Caufield, J. Harry
AU - Karlebach, Guy
AU - McMurry, Julie A.
AU - Williams, Andrew
AU - Moffitt, Richard
AU - Banerjee, Jineta
AU - Solomonides, Anthony E.
AU - Davis, Hannah
AU - Kostka, Kristin
AU - Valentini, Giorgio
AU - Sahner, David
AU - Chute, Christopher G.
AU - Madlock-Brown, Charisse
AU - Haendel, Melissa A.
AU - Robinson, Peter N.
AU - Spratt, Heidi
AU - Visweswaran, Shyam
AU - Flack, Joseph Eugene
AU - Yoo, Yun Jae
AU - Gabriel, Davera
AU - Alexander, G. Caleb
AU - Mehta, Hemalkumar B.
AU - Liu, Feifan
AU - Miller, Robert T.
AU - Wong, Rachel
AU - Hill, Elaine L.
AU - Thorpe, Lorna E.
AU - Divers, Jasmin
N1 - Publisher Copyright: © 2022 The Authors
PY - 2023/1
Y1 - 2023/1
AB - Background: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. Methods: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning. Findings: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems. Interpretation: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC. Funding: NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.
KW - COVID-19
KW - Human Phenotype Ontology
KW - Long COVID
KW - Machine learning
KW - Precision medicine
KW - Semantic similarity
UR - https://www.scopus.com/pages/publications/85144488606
U2 - 10.1016/j.ebiom.2022.104413
DO - 10.1016/j.ebiom.2022.104413
M3 - Article
C2 - 36563487
AN - SCOPUS:85144488606
SN - 2352-3964
VL - 87
JO - EBioMedicine
JF - EBioMedicine
M1 - 104413
ER -