Medical Student, Northwestern University, Chicago, IL, US
Introduction: Accurate vertebral segmentation is an important step in the diagnosis and treatment of spinal metastases. Segmenting these metastases is challenging given their radiographic heterogeneity. Conventional approaches to segmentation include manual review and deep learning. However, manual review is time-intensive and subject to interrater variability, while deep learning requires large training datasets. The rise of generative AI, notably Meta’s “Segment Anything Model 2” (SAM2), promises the ability to rapidly generate segmentations without any task-specific training.
The goal of this study was to assess the ability of SAM2 to segment vertebrae with metastases.
Methods: We used a dataset of spinal CT scans from The Cancer Imaging Archive, including patient sex, BMI, vertebral locations, lesion type (lytic, blastic, or mixed), and primary cancer type. We also extracted neuroradiologist-derived ground-truth segmentations for each vertebra.
SAM2 produced segmentations for each vertebral slice without any training data, which were compared to gold standard segmentations using the Dice score. We also assessed relative performance differences across clinical subgroups using standard statistical techniques.
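For readers unfamiliar with the metric, the Dice score between a predicted mask P and a ground-truth mask G is 2|P ∩ G| / (|P| + |G|), ranging from 0 (no overlap) to 1 (identical masks). The following is a minimal illustrative sketch, not the study's code: the function name `dice_score`, the flattened 0/1 mask representation, and the convention that two empty masks score 1.0 are all assumptions made for the example. A volumetric score for one vertebra could be obtained by pooling the voxels of all its slices into a single pair of masks before calling the function, though the abstract does not specify how this was done.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 sequences.

    Hypothetical helper for illustration; masks must have the same length.
    """
    pred = [bool(v) for v in pred]
    truth = [bool(v) for v in truth]
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    # Voxels labeled as vertebra in both masks.
    overlap = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)

    # Assumption: two empty masks are treated as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * overlap / total


# Toy example: 4 ground-truth voxels, 6 predicted voxels, 4 shared.
truth_mask = [0, 1, 1, 1, 1, 0, 0, 0]
pred_mask = [0, 1, 1, 1, 1, 1, 1, 0]
example = dice_score(pred_mask, truth_mask)  # 2 * 4 / (6 + 4) = 0.8
```

In this toy case the prediction covers every ground-truth voxel but over-segments by two voxels, which is enough to lower the score to 0.8.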
Results: We analyzed 55 patients and 782 unique thoracolumbar vertebrae, 153 of which had metastatic tumor involvement (59 blastic, 46 lytic, 58 mixed). Across these vertebrae, SAM2 had a mean volumetric Dice score of 0.840 (0.097). There was no significant difference in SAM2 performance across sex (p = 0.46) or BMI (p = 0.27). SAM2 performed significantly worse on thoracic vertebrae than on lumbar vertebrae (0.816 versus 0.874, p < 0.001). The model performed worst on vertebrae with mixed (0.783 [0.045]) and lytic lesions (0.820 [0.011]) relative to vertebrae with blastic lesions (0.885 [0.045]) or no metastatic disease (0.842 [0.022]) (p < 0.001). By primary cancer type, performance was lowest for urothelial (0.612 [0.026]), lung (0.738 [0.118]), and skin (0.738 [0.107]) lesions, and highest for soft tissue sarcoma (0.891 [0.035]), uterine (0.906 [0.027]), and cervical (0.904 [0.027]) lesions (p < 0.001).
Conclusion: Our results demonstrate that general-purpose segmentation models like SAM2 can provide reasonable vertebral segmentation accuracy out of the box, comparable to that of previously published trained models. Future research should include optimization for lesion location and type.