Research Fellow, Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, US
Introduction: Manual chart review (MCR) for extracting surgical data from Electronic Health Records (EHRs) is time-consuming, prone to error, and a significant bottleneck in clinical research and quality control. This study aimed to develop and validate a novel artificial intelligence (AI) framework that integrates Natural Language Processing (NLP) with a Large Language Model (LLM) to automate both the extraction of relevant clinical data from spinal surgery EHRs and postoperative billing.
Methods: This study followed the TRIPOD+AI guidelines. We utilized three institutional databases comprising thoracolumbar adult spinal deformity cases (N=646), lumbar endoscopic spinal surgery cases (N=182), and lumbar decompression cases (N=5,998). The AI framework was run ten times to mitigate hallucinations. The primary outcome was the accurate identification of surgical details, including surgery type, levels operated, number of disks removed, levels fused, incidental durotomies, and postoperative billing. Secondary objectives explored time efficiency and costs. Performance metrics, including accuracy, sensitivity, AUC-ROC, F1-score, and positive predictive value, were calculated with bootstrapped 95% confidence intervals.
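As an illustration of the bootstrapped confidence intervals mentioned above, the following is a minimal sketch of a percentile bootstrap for sensitivity. It is not the study's code: the function names (`sensitivity`, `bootstrap_ci`), the number of resamples, the seed, and the percentile method itself are assumptions made for the example.

```python
import random

def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN). Returns NaN if there are no positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed method): resample cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value == value:  # drop NaN from resamples that contain no positives
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper

# Hypothetical labels: 1 = durotomy documented, 0 = not documented.
y_true = [1] * 40 + [0] * 60
y_pred = [1] * 38 + [0] * 2 + [0] * 60  # two missed positives
low, high = bootstrap_ci(y_true, y_pred, sensitivity)
```

The same `bootstrap_ci` helper accepts any metric with the signature `metric(y_true, y_pred)`, so accuracy or positive predictive value could be passed in the same way.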
Results: The NLP+LLM framework achieved a sensitivity of 0.999 and an AUC-ROC of 0.997 for clinical data extraction, performed similarly in billing automation, and outperformed the human control. A majority vote over the ten replications eliminated all errors observed in individual runs. Tokenization and cost analyses indicated substantial time savings (38.8 seconds overall) and cost savings ($9.04 overall) compared with manual chart review.
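The majority vote over replicated runs can be sketched as follows. This is a hedged illustration only: the abstract does not describe how votes were tallied, so the per-field voting scheme, the function name `majority_vote`, and the field names are all hypothetical.

```python
from collections import Counter

def majority_vote(runs):
    """Return the most frequent value per field across replicated runs.

    runs: list of dicts (one per run) mapping a field name to a hashable
    extracted value. Ties resolve to the value seen first (Counter ordering).
    """
    fields = set()
    for run in runs:
        fields.update(run.keys())
    consensus = {}
    for field in sorted(fields):
        votes = Counter(run[field] for run in runs if field in run)
        consensus[field] = votes.most_common(1)[0][0]
    return consensus

# Hypothetical outputs of ten runs on one operative note; two runs disagree.
runs = [{"levels_fused": ("L4", "L5"), "durotomy": False} for _ in range(8)]
runs.append({"levels_fused": ("L3", "L4"), "durotomy": False})
runs.append({"levels_fused": ("L4", "L5"), "durotomy": True})

consensus = majority_vote(runs)
```

In this toy case each single-run error is outvoted, which is the mechanism by which a vote across replications can suppress isolated hallucinations.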
Conclusion: We demonstrated that the integration of NLP and LLM within an AI framework can significantly improve the accuracy, time, and cost efficiency of clinical data extraction and postoperative billing. These results suggest the potential for widespread adoption in healthcare. Further research will focus on enhancing the sensitivity and validating the model in broader clinical settings to further optimize billing automation and clinical documentation processes.