Medical Student, Drexel University College of Medicine
Introduction: Spondylolisthesis, the forward displacement of a vertebral body, requires accurate grading to guide treatment decisions. The Meyerding classification, commonly used to assess spondylolisthesis severity on lumbar spine X-rays, is traditionally measured manually, a process that is both time-consuming and susceptible to interrater variability. This study aims to automate the assessment of spondylolisthesis severity by leveraging an artificial intelligence (AI) model trained to identify vertebral positions on lumbar X-rays, allowing for efficient and standardized grading based on the Meyerding classification.
Methods: We trained a YOLO V11 Pose model on the BUU-LSPINE dataset, which contains X-ray images from 3,600 anonymized patients. Each patient contributes two views, including the lateral (LA) view, for a total of 7,200 images. The model was trained to locate the spinal endplates, from which vertebral displacement and, subsequently, the spondylolisthesis grade are calculated. The model was evaluated on an independent test set acquired from the Mayo Clinic, comprising 25 patients in each Meyerding grade (1–5), for 125 patients in total.
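The step from endplate locations to a Meyerding grade can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the choice of corner keypoints, and the assumption that pixel x increases toward the anterior side are all hypothetical. It assumes the conventional Meyerding definition, in which the slip is expressed as a percentage of the superior endplate length of the lower vertebra (grade 1 up to 25%, grade 2 up to 50%, grade 3 up to 75%, grade 4 up to 100%, grade 5 above 100%).

```python
import math

def slip_percent(upper_post, lower_post, lower_ant):
    """Anterior slip of the upper vertebra, as a percentage of the
    superior endplate length of the lower vertebra.

    All arguments are hypothetical (x, y) keypoints in pixels on a lateral view:
      upper_post: posterior corner of the upper vertebra's inferior endplate
      lower_post: posterior corner of the lower vertebra's superior endplate
      lower_ant:  anterior corner of the lower vertebra's superior endplate
    """
    # Vector along the lower endplate, pointing posterior -> anterior.
    ex, ey = lower_ant[0] - lower_post[0], lower_ant[1] - lower_post[1]
    length = math.hypot(ex, ey)
    if length == 0:
        raise ValueError("lower endplate has zero length")
    # Project the upper posterior corner onto the endplate axis.
    dx, dy = upper_post[0] - lower_post[0], upper_post[1] - lower_post[1]
    displacement = (dx * ex + dy * ey) / length
    return 100.0 * displacement / length

def meyerding_grade(pct):
    """Map a slip percentage to a Meyerding grade (0 means no anterior slip)."""
    if pct <= 0:
        return 0
    if pct <= 25:
        return 1
    if pct <= 50:
        return 2
    if pct <= 75:
        return 3
    if pct <= 100:
        return 4
    return 5

# Illustrative coordinates only, not taken from the study.
pct = slip_percent(upper_post=(30, 0), lower_post=(0, 0), lower_ant=(100, 0))
print(round(pct, 1), meyerding_grade(pct))  # 30.0 2
```

Projecting onto the endplate axis, rather than taking a plain horizontal difference, keeps the measurement independent of how the spine is tilted in the image.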
Results: In locating the spinal endplates, the model achieved a precision of 0.96. In classifying spondylolisthesis, sensitivity for lower grades (1 and 2) was 94% and specificity was 90%. For higher grades (3 to 5), sensitivity was 95% and specificity was 91%. Area under the curve (AUC) values were above 0.90 for all grades, indicating consistent classification performance across severity levels.
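For readers less familiar with these metrics, the following sketch shows one way sensitivity and specificity for a group of grades could be computed from true and predicted labels. The function name and the label lists are hypothetical and purely illustrative; they are not the study's data or evaluation code.

```python
def sensitivity_specificity(y_true, y_pred, positive_grades):
    """Treat `positive_grades` as the positive class and all other grades as negative."""
    positive = set(positive_grades)
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth in positive:
            if pred in positive:
                tp += 1  # positive case correctly identified
            else:
                fn += 1  # positive case missed
        else:
            if pred in positive:
                fp += 1  # negative case wrongly flagged
            else:
                tn += 1  # negative case correctly rejected
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Made-up labels for illustration only.
y_true = [1, 2, 3, 4, 5, 1]
y_pred = [1, 2, 3, 3, 5, 3]
sens, spec = sensitivity_specificity(y_true, y_pred, positive_grades=(1, 2))
print(round(sens, 3), round(spec, 3))  # 0.667 1.0
```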
Conclusion: The proposed AI-based approach provides a rapid and reliable method for assessing spondylolisthesis severity on lumbar spine X-rays, with performance metrics comparable to human evaluation. This model has the potential to streamline clinical workflows, reduce variability in spondylolisthesis grading, and support consistent, data-driven treatment planning in clinical practice. Further testing on external datasets is recommended to assess the generalizability of the model across diverse patient populations.