Medical Student, Case Western Reserve University School of Medicine and Cleveland Clinic Foundation Center for Spine Health, Cleveland Heights, OH, US
Introduction: A growing body of literature reports prediction models for patient-reported outcomes of spine surgery, with broad implications for value-based care and clinical decision-making. This review assesses the performance and reporting transparency of these models.
Methods: We queried four databases for studies reporting the development and/or validation of prediction models for patient-reported outcome measures (PROMs) following elective spine surgery, with performance metrics such as the area under the receiver operating characteristic curve (AUC). Adherence to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD-AI) guidelines was assessed. One representative model was selected from each study.
Results: Of 4,471 screened studies, 35 were included: nine development studies, 24 combined development and evaluation studies, and two evaluation studies. Sixteen machine learning models and 19 traditional prediction models were represented. The Oswestry Disability Index and modified Japanese Orthopaedic Association scores were the most commonly used outcome measures. Among 29 categorical outcome prediction models, the median [interquartile range (IQR)] AUC was 0.79 [0.73, 0.84]; the median [IQR] AUC was 0.825 [0.76, 0.84] among machine learning models and 0.74 [0.71, 0.81] among traditional models. Adherence to TRIPOD-AI guidelines was inconsistent: no studies addressed healthcare inequalities in the sample population or model fairness, or disclosed study protocols or registration.
Conclusion: We found considerable variation between studies, not only in the patient populations and outcome measures chosen, but also in their manner of evaluation and reporting. Agreement on outcome definitions, more frequent external validation, and improved completeness of reporting may facilitate the effective use and interpretation of these models.