Machine learning (ML)-driven risk models based on multi-protein proteomics outperform classical regression models and clinical risk scores in predicting all-cause mortality in patients at increased cardiovascular risk, according to new research.
Published online Monday and in the Oct. 19 issue of the Journal of the American College of Cardiology, the study compared proteomics-enabled ML algorithms with classical and clinical risk prediction methods for all-cause mortality in cohorts of patients with cardiovascular risk factors in the LIFE-Heart Study, followed by validation in the PLIC (Progressione della Lesione Intimale Carotidea) study.
“By using a proteomic analysis in two large, well characterized cohorts at elevated cardiovascular risk, application of modern ML techniques greatly outperformed models based on regression analyses and current clinical risk scores in survival prediction,” said the authors, led by Matthias Unterhuber, MD, from the Heart Center Leipzig at the University of Leipzig, Germany. “Our study is the first to demonstrate the feasibility and high precision of time-sensitive ML approaches applied to large and complex variable data sets when assessing unbiased outcomes.”
Using the OLINK-Cardiovascular-II panel, Unterhuber and colleagues measured 92 proteins in a cohort of 1,998 individuals from the LIFE-Heart Study (derivation) and 772 subjects from the PLIC cohort (external validation).
“We constructed protein based mortality prediction models using eXtreme Gradient Boosting (XGBoost) and a neural network, comparing the prediction performance with classical clinical risk scores (Systemic Coronary Risk Evaluation, Framingham), logistic and Cox regression models,” explained the authors.
On internal and external validation, respectively, the Framingham Risk Score achieved areas under the curve (AUCs) of 0.64 (95% confidence interval [CI]: 0.59-0.68) and 0.65 (95% CI: 0.58-0.74); logistic regression achieved AUCs of 0.65 (95% CI: 0.57-0.73) and 0.67 (95% CI: 0.59-0.74), and Cox regression AUCs of 0.55 (95% CI: 0.51-0.59) and 0.65 (95% CI: 0.57-0.73).
The XGBoost classifier had AUCs of 0.83 (95% CI: 0.79-0.87) and 0.91 (95% CI: 0.86-0.95), the XGBoost survival estimator AUCs of 0.83 (95% CI: 0.79-0.87) and 0.93 (95% CI: 0.88-0.97), and the neural network AUCs of 0.87 (95% CI: 0.83-0.91) and 0.94 (95% CI: 0.90-0.98), respectively (modern vs classical ML: P < 0.001).
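An AUC summarizes how well a model ranks patients who died above patients who survived: 0.5 is chance and 1.0 is a perfect ranking. The minimal Python sketch below illustrates the idea on hypothetical toy data, using the rank-based (Mann-Whitney) AUC and a percentile bootstrap for the 95% CI. It is not the authors' code: the function names and data are invented for illustration, the study's exact CI method is assumed rather than reported here, and time-to-event models are often evaluated with time-dependent AUCs, which this simple binary version does not handle.

```python
import random


def roc_auc(labels, scores):
    """Rank-based (Mann-Whitney) ROC AUC.

    Returns the probability that a randomly chosen positive case (label 1,
    e.g. a death) gets a higher score than a randomly chosen negative case
    (label 0); ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap (1 - alpha) confidence interval for the AUC.

    Assumption: percentile bootstrap is one common choice, not necessarily
    the method used in the study.
    """
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)  # fixed seed for reproducibility
    aucs = []
    while len(aucs) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(y, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int(alpha / 2 * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = died during follow-up, 0 = survived.
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.15, 0.30, 0.35, 0.40, 0.70, 0.80, 0.50, 0.90]

auc = roc_auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC = {auc:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

With the toy data, 22 of the 24 pairs made up of one death and one survivor are ranked correctly, so the AUC is 22/24, about 0.92.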
The team noted that modern ML algorithms offer only a marginal improvement over traditional regression models when applied to “simpler data,” such as clinical risk factors including age, sex, blood pressure, or medical history, all of which have proven linear relationships with outcomes.
“Thus, these algorithms might mainly excel when analyzing data with complex interactions,” they said.
“In this study, we demonstrated an excellent external predictive value of the different ML algorithms. This paves the way for an individual risk prediction based on multidimensional data that are unique to the subject and are progressively easier to obtain.”
Unterhuber and colleagues concluded that although the relationship between proteins and mortality remains poorly understood, new ML modeling methods could provide insights into mechanistic links, and that proteomics-based risk estimates could lead to new therapeutic options.
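The authors' point that ML algorithms mainly excel when data contain complex interactions can be illustrated with a deliberately artificial example. In the hypothetical data below, risk depends only on whether two standardized protein levels deviate in the same direction (an XOR-like interaction). A score built from one protein, or from the plain sum of both, ranks patients no better than chance, whereas a score that encodes the interaction ranks them perfectly. This is a toy sketch, not an analysis of the study's data; tree ensembles such as XGBoost and neural networks can learn interactions like this from the data, while a regression model needs the interaction term to be specified in advance.

```python
from itertools import product


def roc_auc(labels, scores):
    """Rank-based ROC AUC; a tie between a positive and a negative counts 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical standardized levels of two proteins on a small grid.
levels = (-2, -1, 1, 2)
patients = list(product(levels, levels))

# Toy outcome: "high risk" (1) only when both proteins deviate in the same
# direction, an interaction that neither protein reveals on its own.
labels = [1 if a * b > 0 else 0 for a, b in patients]

single = [a for a, _ in patients]            # one protein alone
additive = [a + b for a, b in patients]      # additive (linear) score
interaction = [a * b for a, b in patients]   # score with a product term

for name, score in [("single", single), ("additive", additive), ("interaction", interaction)]:
    print(f"{name:12s} AUC = {roc_auc(labels, score):.2f}")
```

On this grid the single-protein and additive scores both give an AUC of 0.50, while the interaction score gives 1.00.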
The future of practice?
Writing in an accompanying editorial, Jean-Sébastien Hulot, MD, PhD, from the Université de Paris, INSERM, and Hôpital Européen Georges-Pompidou, Paris, and Paul Clopton, MS, from Stanford University School of Medicine, said that ML algorithms have extraordinary potential to discover hidden patterns in clinical, imaging, or biological data from disparate sources.
“Simply said, ML algorithms have the potential to see what the human eye cannot see, especially when it comes to large and multidimensional data,” they noted, adding that as the volume of health care data explodes, ML-based improvements may therefore drive a revolution in medicine and cardiology.
The editorialists added that while the predictive power of the circulating proteome has been known for a long time, “what is novel is the development of proteomic platforms that are suitable for assaying a large number of these circulating proteins with a limited amount of blood (typically <250 μL).”
“However, this raises the question of how to appropriately analyze these complex data,” they said, adding that the new study provides one of the first “striking examples” of ML approaches applied to a large dataset to better estimate the risk of mortality.
The expert commentators noted that the ML models substantially outperformed the clinical models, with an important gain in AUC (>0.25), and that the ML algorithms were able to identify subtle and previously unrecognized patterns in the proteomics data, leading to a significant improvement in risk prediction.
“This is remarkable progress that now requires further validation across different populations,” they said. “However, we are certainly seeing the premise of future practices that will use ML-enabled testing for individualized risk prediction.”
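The size of the AUC gain can be checked against the figures reported earlier in the article. The short sketch below subtracts the Framingham Risk Score AUC from each ML model's AUC in both cohorts; the variable names and dictionary layout are illustrative only. On these figures every ML model gains more than 0.25 in the external validation cohort (0.26 to 0.29), while the internal validation gains over the Framingham score are 0.19 to 0.23, so the editorial's >0.25 figure is consistent with the external validation results.

```python
# AUCs as reported in the article: (internal validation, external validation).
framingham = (0.64, 0.65)
ml_models = {
    "XGBoost classifier": (0.83, 0.91),
    "XGBoost survival estimator": (0.83, 0.93),
    "neural network": (0.87, 0.94),
}

# Gain in AUC of each ML model over the Framingham Risk Score, per cohort,
# rounded to two decimals to match the precision of the reported values.
gains = {
    name: tuple(round(ml - ref, 2) for ml, ref in zip(aucs, framingham))
    for name, aucs in ml_models.items()
}

for name, (internal, external) in gains.items():
    print(f"{name}: internal +{internal:.2f}, external +{external:.2f}")
```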
Unterhuber M, Kresoja K-P, Rommel K-P, et al. Proteomics-Enabled Deep Learning Machine Algorithms Can Enhance Prediction of Mortality. J Am Coll Cardiol 2021;78:1621-1631.
Hulot J-S, Clopton P. When Natural Peptides Meet Artificial Intelligence to Improve Risk Prediction. J Am Coll Cardiol 2021;78:1632-1634.