Introduction: Current approaches to vertebral keypoint detection overlook X-rays containing surgical hardware, such as those from spinal fusion or disc arthroplasty cases. This project aims to develop a model that accurately locates vertebral keypoints in lateral lumbar X-rays, which can then be used for radiographic parameter measurements.
Methods: We developed several keypoint detection models for comparison by fine-tuning ResNet-50, ResNet-101, and ResNet-152 in two phases. First, the models were fine-tuned on the open-source BUU-LSPINE dataset, augmented to 10,616 X-rays. Second, the models were fine-tuned again on 1,000 institutional X-rays, including both preoperative and postoperative images of patients who underwent lumbar fusion or lumbar disc arthroplasty. A held-out set of 100 unseen institutional X-rays was used to test the performance of each model.
Results: Model performance was compared using the mean squared error (MSE) between predicted and ground-truth keypoint coordinates. The first-phase models, trained on X-rays without surgical hardware, performed worse than their second-phase counterparts: ResNet-50 had an MSE of 32.2, ResNet-101 an MSE of 20.2, and ResNet-152 an MSE of 29.5. After the second phase of fine-tuning on institutional pre- and postoperative X-rays, ResNet-50 had an MSE of 11.0, ResNet-101 an MSE of 4.7, and ResNet-152 an MSE of 7.2. The best-performing model was the phase-two ResNet-101. For this model, the Euclidean distance between predicted and ground-truth keypoints, calculated over the 2,200 keypoints in the test set, had a mean of 7.3 pixels, a median of 5.4 pixels, and a standard deviation of 8.0 pixels. The coefficient of determination (R²) was then calculated for each coordinate (x and y of each keypoint) across the test set, averaging 0.98 for the x coordinates and 0.96 for the y coordinates.
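The evaluation metrics above can be illustrated with a short NumPy sketch. It assumes predictions and ground truth are stored as arrays of shape (images, keypoints, 2) in pixel units, and `keypoint_metrics` is a hypothetical helper. The abstract does not state the coordinate scale on which its MSE was computed, so the MSE here is simply in squared units of whatever coordinates are passed in. The standard deviation uses NumPy's default population formula (ddof=0), which is also an assumption. R² is computed per coordinate column (one x and one y column per keypoint) across the test images, then averaged separately over the x and y columns.

```python
import numpy as np


def keypoint_metrics(pred, true) -> dict:
    """Hypothetical helper. pred, true: shape (N_images, K_keypoints, 2), (x, y)."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    diff = pred - true

    # Mean of squared coordinate errors over all images, keypoints, and axes.
    mse = float(np.mean(diff ** 2))

    # One Euclidean distance per keypoint instance, flattened over the test set.
    dist = np.linalg.norm(diff, axis=-1).ravel()

    # R^2 for each (keypoint, axis) column, computed across the N images.
    # A column with zero variance in the ground truth would divide by zero.
    ss_res = np.sum(diff ** 2, axis=0)
    ss_tot = np.sum((true - true.mean(axis=0)) ** 2, axis=0)
    r2 = 1.0 - ss_res / ss_tot  # shape (K, 2)

    return {
        "mse": mse,
        "dist_mean": float(dist.mean()),
        "dist_median": float(np.median(dist)),
        "dist_std": float(dist.std()),  # population std (ddof=0)
        "r2_x": float(r2[:, 0].mean()),
        "r2_y": float(r2[:, 1].mean()),
    }
```

For a test set of 100 images with 22 keypoints each, `dist` would hold the 2,200 per-keypoint distances summarized by the mean, median, and standard deviation.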
Conclusion: The ResNet-101 model fine-tuned on both the open-source and institutional data demonstrated the best performance in locating vertebral keypoints, including in X-rays containing surgical hardware. Future work includes evaluating the model’s performance on radiographic parameter measurement.