Machine Learning Risk Stratification Approach Using Patient-Reported Outcomes for Forecasting Unplanned Health Care Use and Symptom Burden in Cancer Survivors
Journal article   Peer reviewed

JCO clinical cancer informatics, Vol.10, e2500389
2026-05
PMID: 42190150
Appears in  College Of Engineering - Latest Publications

Abstract

PURPOSE: Effective risk stratification in cancer survivorship requires handling longitudinal data characterized by multimodal inputs, irregular follow-up, and recurrent clinical events. This study evaluated the incremental value of integrating patient-reported outcomes (PROs) with electronic health record (EHR) data and identified optimal windowing strategies for machine learning-based prediction of adverse survivorship outcomes.

PATIENTS AND METHODS: This study used a cohort of 25,592 cancer survivors followed for 36 months. Data from four domains were integrated: baseline measures, treatments, PROs, and health care utilization (emergency room visits and hospitalizations). Two classification models, LASSO and CATBOOST, were applied across modality combinations and five temporal representations of patient history: static early-phase (0-6 months), cumulative history, sliding windows (4- and 12-month), and a most-recent baseline. Performance was evaluated for predicting monthly health care utilization and patient-reported symptom burden using average precision (AP). SHapley Additive exPlanations (SHAP) analysis identified key predictors and characterized their evolving influence.

RESULTS: For health care utilization, CATBOOST models trained on the full multimodal data set with time-windowed predictors achieved strong discrimination (AP = 0.207), outperforming static baselines by 27%. SHAP analyses emphasized dynamic contributions from recent utilization and treatment toxicity. For symptom burden, PRO integration was crucial, nearly doubling clinical-only performance (AP = 0.132 v 0.071), with longer historical context improving characterization of progressive functional decline and symptom severity. Flagging the top 10% of patients by predicted risk captured 51.7% of health care utilizations and 46.7% of symptom burden events.

CONCLUSION: Adverse survivorship risk is dynamic and outcome-specific: acute health care utilization is best predicted by recent clinical momentum, while longitudinal patient-reported trends drive symptom burden. Implementing decoupled, dynamic windows provides a flexible framework for risk stratification and risk prediction beyond standard clinical heuristics, facilitating proactive, precision-based survivorship care.
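
The two evaluation quantities named in the abstract, average precision (AP) and the share of events captured by flagging the top 10% of predicted risk, can be illustrated with a minimal sketch. This is not the authors' code. The function names `average_precision` and `capture_at_top_fraction`, the toy scores, and the toy labels are all hypothetical, and the sketch assumes one binary label per scored row with no tied scores.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k over the ranks k at which a positive label occurs.

    Assumption: scores are untied; ties keep their input order here.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def capture_at_top_fraction(
    y_true: Sequence[int], scores: Sequence[float], fraction: float = 0.10
) -> float:
    """Share of all positive labels that fall in the top `fraction` by score."""
    n_flagged = max(1, int(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = sum(y_true)
    captured = sum(y_true[i] for i in order[:n_flagged])
    return captured / total if total else 0.0


# Hypothetical toy data: 10 scored rows, 2 of which had an event.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

ap = average_precision(toy_labels, toy_scores)
capture = capture_at_top_fraction(toy_labels, toy_scores, fraction=0.10)
print(f"AP = {ap:.3f}, top-10% capture = {capture:.1%}")
```

A random ranking has an expected AP close to the event prevalence, so AP values for rare monthly events are best read against that base rate and not against 1.0.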
