Characterising risk of in-hospital mortality following cardiac arrest using machine learning: A retrospective international registry study

by Shane Nanayakkara, Sam Fogarty, Michael Tremeer, Kelvin Ross, Brent Richards, Christoph Bergmeir, Sheng Xu, Dion Stub, Karen Smith, Mark Tacey, Danny Liew, David Pilcher, David M. Kaye


Resuscitated cardiac arrest is associated with high mortality; however, the ability to estimate risk of adverse outcomes using existing illness severity scores is limited. Using in-hospital data available within the first 24 hours of admission, we aimed to develop more accurate models of risk prediction using both logistic regression (LR) and machine learning (ML) techniques, with a combination of demographic, physiologic, and biochemical information.

Methods and findings

Patient-level data were extracted from the Australian and New Zealand Intensive Care Society (ANZICS) Adult Patient Database for patients who had experienced a cardiac arrest within 24 hours prior to admission to an intensive care unit (ICU) during the period January 2006 to December 2016. The primary outcome was in-hospital mortality. The models were trained and tested on a dataset (split 90:10) including age, lowest and highest physiologic variables during the first 24 hours, and key past medical history. LR and 5 ML approaches (gradient boosting machine [GBM], support vector classifier [SVC], random forest [RF], artificial neural network [ANN], and an ensemble) were compared to the APACHE III and Australian and New Zealand Risk of Death (ANZROD) predictions. In all, 39,566 patients from 186 ICUs were analysed. Mean (±SD) age was 61 ± 17 years; 65% were male. Overall in-hospital mortality was 45.5%. Models were evaluated in the test set. The APACHE III and ANZROD scores demonstrated good discrimination (area under the receiver operating characteristic curve [AUROC] = 0.80 [95% CI 0.79–0.82] and 0.81 [95% CI 0.8–0.82], respectively) and modest calibration (Brier score 0.19 for both), which was slightly improved by LR (AUROC = 0.82 [95% CI 0.81–0.83], DeLong test, p < 0.001). Discrimination was significantly improved using ML models (ensemble and GBM AUROCs = 0.87 [95% CI 0.86–0.88], DeLong test, p < 0.001), with an improvement in performance (Brier score reduction of 22%). Explainability models were created to assist in identifying the physiologic features that most contributed to an individual patient’s survival. Key limitations include the absence of pre-hospital data and absence of external validation.


ML approaches significantly enhance predictive discrimination for mortality following cardiac arrest compared to existing illness severity scores and LR, without the use of pre-hospital data. The discriminative ability of these ML models requires validation in external cohorts to establish generalisability.

Tratto da:
Note sul Copyright: Articles and accompanying materials published by PLOS on the PLOS Sites, unless otherwise indicated, are licensed by the respective authors of such articles for use and distribution by you subject to citation of the original source in accordance with the Creative Commons Attribution (CC BY) license.