Research Fellow, Washington University in St. Louis
Introduction: Artificial intelligence (AI)-driven disease prediction has advanced significantly in recent years. It can improve diagnostic precision, enable disease prevention through early detection, and streamline clinical decision-making. This study aims to train deep learning models on raw Electronic Health Record (EHR) data to predict the future onset of Degenerative Cervical Myelopathy (DCM).
Methods: EHR data from the Merative Explorys dataset were used in this study. DCM patients were identified using ICD codes and matched to control patients at a 30:1 control-to-case ratio. The dataset included time-ordered medical codes from the SNOMED, RxNorm, LOINC, and CPT vocabularies. Two models, a recurrent neural network (RNN) with gated recurrent units (GRUs) and a temporal convolutional network (TCN), were implemented using PyHealth; the third, the clmbr-t-base transformer model, had been pre-trained on data from 2.57 million Stanford Hospital patients. Each model fed a two-logit linear classifier trained to predict the diagnosis 6 months, 1 year, and 2 years in advance. Training used the Sophia optimizer with a weighted cross-entropy loss; the control group was undersampled, and gradient updates were reweighted for cohort samples.
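The class-imbalance handling described above can be illustrated with a minimal sketch. It assumes that "5:1 undersampling" means keeping at most five randomly chosen controls per case, and it uses inverse-frequency class weights, which is one common choice; the abstract does not state how the authors set their weights. The function names are hypothetical, the code is plain Python rather than PyHealth, and the Sophia optimizer is not shown.

```python
import math
import random


def undersample_controls(case_ids, control_ids, ratio, seed=0):
    """Keep every case and at most `ratio` randomly chosen controls per case (ratio=5 -> 5:1)."""
    rng = random.Random(seed)
    n_keep = min(len(control_ids), ratio * len(case_ids))
    return list(case_ids), rng.sample(list(control_ids), n_keep)


def inverse_frequency_weights(labels):
    """Hypothetical weighting: proportional to 1 / class count, scaled to average 1 over samples."""
    counts = {c: labels.count(c) for c in set(labels)}
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean cross-entropy over two-logit outputs (z_control, z_case).

    Dividing by the summed weights mirrors PyTorch's CrossEntropyLoss
    'mean' reduction when class weights are supplied.
    """
    total, norm = 0.0, 0.0
    for (z0, z1), y in zip(logits, labels):
        m = max(z0, z1)  # subtract the max logit for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(math.exp(z0 - m) + math.exp(z1 - m))
        nll = log_sum_exp - (z1 if y == 1 else z0)  # negative log-softmax of the true class
        total += weights[y] * nll
        norm += weights[y]
    return total / norm


if __name__ == "__main__":
    cases = [f"case{i}" for i in range(10)]
    controls = [f"ctrl{i}" for i in range(300)]  # 30 controls per case before undersampling
    kept_cases, kept_controls = undersample_controls(cases, controls, ratio=5)
    labels = [1] * len(kept_cases) + [0] * len(kept_controls)
    w = inverse_frequency_weights(labels)
    print(len(kept_controls), w[1], w[0])  # 50 3.0 0.6
    logits = [(0.2, 1.5)] * len(kept_cases) + [(1.0, -0.5)] * len(kept_controls)
    print(round(weighted_cross_entropy(logits, labels, w), 4))
```

With these weights, each of the 10 cases contributes five times as much to the loss (and hence to the gradient) as each of the 50 retained controls, so the two classes carry equal total weight.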
Results: The study included a total of 1,492,681 patients, of whom 49,756 were diagnosed with DCM. The clmbr-t-base model outperformed the other models, achieving an area under the receiver operating characteristic curve (AUROC) of 0.824 and a positive predictive value (PPV) of 0.333 for 6-month predictions with 5:1 undersampling. For 1-year predictions, it achieved an AUROC of 0.807 and a PPV of 0.241 with 3:1 undersampling; for 2-year predictions, it reached an AUROC of 0.752 and a PPV of 0.162 with 10:1 undersampling.
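The two reported metrics can be computed as in the following sketch. It is plain Python on toy data that are not from the study. The decision threshold used for PPV is not given in the abstract, so the 0.5 default here is a hypothetical choice.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random case scores above a random control.

    Ties count as one half. This pairwise form is O(cases x controls), which is
    fine for illustration; rank-based formulas scale to cohorts of this size.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv(scores, labels, threshold=0.5):
    """Positive predictive value: true positives / all samples flagged at `threshold`."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else float("nan")


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.4, 0.3, 0.2]  # toy predicted probabilities of DCM
    labels = [1, 0, 1, 0, 0]            # 1 = later diagnosed, 0 = control
    print(round(auroc(scores, labels), 3))  # 0.833
    print(ppv(scores, labels))              # 0.5
```

AUROC does not depend on the case prevalence in the evaluated population, but PPV does. PPV values obtained at different control-to-case ratios are therefore not directly comparable.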
Conclusion: Our findings suggest that EHR data can be leveraged to predict a DCM diagnosis in advance. This underscores the potential of predictive models to support preventive measures, which may help reduce disease incidence and improve overall patient outcomes.