Automating Surgical Billing: The Role of Artificial Intelligence in Current Procedural Terminology Code Prediction for Open Vascular Neurosurgical Procedures

Presenting Author(s)

Joanna M. Roy, MBBS

Research Fellow
Thomas Jefferson University Hospital

Introduction: The United States spends about 25-31% of its healthcare expenditure on billing. Large language models (LLMs) have demonstrated the ability to automate tasks in healthcare. Our study aims to assess the ability of publicly available LLMs to predict Current Procedural Terminology (CPT) codes for open vascular neurosurgical procedures.

Methods: A total of 25 operative reports from patients treated between 2022 and 2024 were entered into three LLMs (ChatGPT 4.0, AtlasGPT and Gemini). Procedures included craniotomy for aneurysm clipping, carotid endarterectomy, bypass surgery, resection of cavernomas and arteriovenous malformation clipping. Each response was classified as correct, partially correct or incorrect. Univariate analyses were performed to compare responses across LLMs.
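The three-way classification described in the Methods can be sketched as a set comparison. This is a minimal illustration, not the study's scoring procedure: the abstract does not state how "partially correct" was defined, so the rules below (exact match of code sets is correct, any overlap is partially correct, no overlap is incorrect) are assumptions, as are the function names and the code values used in the usage note.

```python
def classify_response(predicted, reference):
    """Classify one LLM response against the reference CPT codes.

    ASSUMED rules (not stated in the abstract):
      - "correct": predicted set equals the reference set
      - "partially correct": at least one code overlaps
      - "incorrect": no overlap
    """
    predicted, reference = set(predicted), set(reference)
    if predicted == reference:
        return "correct"
    if predicted & reference:
        return "partially correct"
    return "incorrect"


def percent_codes_correct(predicted, reference):
    """Percentage of reference CPT codes that the response recovered.

    ASSUMED metric: share of reference codes present in the prediction.
    """
    reference = set(reference)
    if not reference:
        raise ValueError("reference code list must not be empty")
    return 100 * len(set(predicted) & reference) / len(reference)
```

For example, with illustrative code values only, `classify_response(["61700"], ["61700", "69990"])` falls in the partially correct category, and `percent_codes_correct` for the same pair gives 50.0.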

Results: ChatGPT provided correct responses in 8% of cases (n = 2) and partially correct responses in 88% of cases (n = 22). AtlasGPT provided correct responses in 8% of cases (n = 2) and partially correct responses in 48% of cases (n = 12). Gemini provided the highest proportion of incorrect responses (80%, n = 20; P < 0.001). On average, ChatGPT identified the highest percentage of correct CPT codes (43.96%), followed by AtlasGPT (20.69%) and Gemini (12%). A Kruskal-Wallis test revealed a significant difference in the percentage of correct CPT codes across LLMs (P < 0.001).
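For readers unfamiliar with the Kruskal-Wallis comparison reported above, the following sketch computes the tie-corrected H statistic from per-report percentages in plain Python. It is an illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, the inputs would be the per-report percentages of correct CPT codes for each model (not given in the abstract), and the p-value helper applies only to three groups, where the chi-square approximation has 2 degrees of freedom and its survival function reduces to exp(-H/2).

```python
import math
from itertools import chain


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)
    ranks = {}      # distinct value -> average rank among pooled data
    tie_term = 0    # sum of (t^3 - t) over runs of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j
    rank_sum_term = sum(sum(ranks[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def p_value_three_groups(h):
    """Chi-square approximation for exactly three groups (df = 2)."""
    return math.exp(-h / 2)
```

Calling `kruskal_wallis_h(chatgpt, atlasgpt, gemini)` with three lists of per-report percentages (hypothetical variable names) returns H, which `p_value_three_groups` converts to an approximate p-value. With only 25 reports per model and many tied values, the chi-square approximation is rough, so a statistics package would be preferable in practice.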

Conclusion: Untrained LLMs have the potential to provide correct and partially correct CPT codes from operative reports in open vascular neurosurgery. Training these models could improve their performance, allowing them to be incorporated into daily tasks and improving healthcare resource allocation.