
Evaluation of crowdsourced mortality prediction models as a framework for assessing artificial intelligence in medicine

  • the Patient Mortality Prediction DREAM Challenge Consortium
  • Sage Bionetworks
  • University of Washington
  • University of Wisconsin-Madison
  • Proacta
  • University of Warsaw
  • University of Illinois at Chicago
  • University of Cambridge
  • Newcastle University
  • Imperial College London
  • Korea University
  • University of Washington
  • University of Rostock
  • ESAC Inc.
  • Georgetown University
  • University of Greifswald
  • Clalit Health Services
  • University of Luxembourg
  • Helmholtz Zentrum München - German Research Center for Environmental Health
  • Ludwig Maximilian University of Munich
  • CAS - Chongqing Institute of Green and Intelligent Technology
  • German Centre for Cardiovascular Research
  • University of Patras
  • PJM Consulting LLC

Research output: Contribution to journal › Article › Peer-review


Abstract

Objective: Applications of machine learning in healthcare are of high interest and have the potential to improve patient care. Yet the real-world accuracy of these models in clinical practice and across patient subpopulations remains unclear. To address these questions, we hosted a community challenge to evaluate methods that predict healthcare outcomes, using all-cause mortality as the challenge question.

Materials and methods: Using a Model-to-Data framework, 345 registered participants, forming 25 independent teams across 3 continents and 10 countries, generated 25 accurate models. All models were trained on a dataset of over 1.1 million patients and evaluated on patients prospectively collected during a 1-year observation period in a large health system.

Results: The top-performing team achieved a final area under the receiver operating characteristic curve (AUROC) of 0.947 (95% CI, 0.942-0.951) and an area under the precision-recall curve (AUPRC) of 0.487 (95% CI, 0.458-0.499) on the prospectively collected patient cohort.

Discussion: Post hoc analysis revealed that models differ in accuracy on subpopulations delineated by race or gender, even when they are trained on the same data.

Conclusion: This is, to date, the largest community challenge focused on evaluating state-of-the-art machine learning methods in a healthcare system, and it reveals both opportunities and pitfalls of clinical AI.
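The abstract reports discrimination as AUROC and AUPRC with 95% confidence intervals, plus a post hoc comparison across patient subgroups. The sketch below shows one common way to compute such numbers; it is not the challenge's scoring code, and the abstract does not state which estimators were used. Assumptions: AUROC is computed with the Mann-Whitney rank statistic, AUPRC as step-wise average precision, the CI as a percentile bootstrap over patients, and all helper names (`auroc`, `average_precision`, `bootstrap_ci`, `by_subgroup`) are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Metric = Callable[[Sequence[int], Sequence[float]], float]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney rank statistic; tied scores get average ranks."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC: sum over thresholds of (recall gain) * precision."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPRC needs at least one positive")
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Move every patient with this score above the threshold at once.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float], metric: Metric,
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float, float]:
    """Point estimate plus a percentile bootstrap CI (resampling patients)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return metric(labels, scores), lo, hi


def by_subgroup(labels: Sequence[int], scores: Sequence[float],
                groups: Sequence[str], metric: Metric) -> Dict[str, float]:
    """Evaluate the same metric separately within each (hypothetical) subgroup."""
    out: Dict[str, float] = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        out[g] = metric([labels[i] for i in idx], [scores[i] for i in idx])
    return out


if __name__ == "__main__":
    # Toy data: 1 = died during follow-up, 0 = survived (illustrative only).
    y = [0, 0, 1, 1, 0, 1, 0, 0]
    p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.05]
    print(bootstrap_ci(y, p, auroc))
    print(bootstrap_ci(y, p, average_precision))
    print(by_subgroup(y, p, ["a"] * 4 + ["b"] * 4, auroc))
```

Reporting both metrics matters for a rare outcome such as mortality: AUROC can stay high while AUPRC stays modest because precision is dragged down by the large negative class, which is consistent with the gap between 0.947 and 0.487 in the results above. Running `by_subgroup` with the same model and data mirrors, in miniature, the kind of per-subpopulation comparison the discussion describes.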

Original language: English
Pages (from-to): 35-44
Number of pages: 10
Journal: Journal of the American Medical Informatics Association
Volume: 31
Issue number: 1
State: Published - Jan 1 2024

Keywords

  • evaluation
  • health informatics
  • machine learning
