Medical Student, Charles E. Schmidt College of Medicine, Florida Atlantic University
Introduction: Since its launch in early 2024 through a Journal of Neurosurgery editorial and an appearance at the American Association of Neurological Surgeons (AANS) annual conference, AtlasGPT has attracted attention in neurosurgery. While AtlasGPT has many features, the most impactful may be its use as an educational tool for patients. This study aims to evaluate AtlasGPT's viability as a patient education resource against existing AANS patient educational materials.
Methods: A literature review was conducted to identify the questions patients most commonly ask during a neurosurgical referral. To assess AtlasGPT's strengths, questions were divided into three categories: cranial, spine, and general neurosurgery. AtlasGPT was presented with these questions, and responses were evaluated for accuracy, readability, grade level, and understandability. Accuracy was judged on a scale from 1 (least accurate) to 5 (most accurate); readability was measured with the Flesch Reading Ease score (0-100), grade level with the Flesch-Kincaid Grade Level (approximately 1-18), and understandability with the Patient Education Materials Assessment Tool (PEMAT, 0-100). As a control, the same scores were calculated for AANS patient education articles, with scores assigned blindly by residents and attendings.
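The two Flesch metrics used above are standard closed-form formulas over words-per-sentence and syllables-per-word. A minimal sketch of how such scores can be computed is shown below; the syllable counter is a naive vowel-group heuristic (an assumption for illustration, not the validated tooling used in the study):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels (at least 1 per word).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # mean words per sentence
    spw = syllables / len(words)   # mean syllables per word
    # Flesch Reading Ease: higher = easier (0-100 for typical prose)
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    # Flesch-Kincaid Grade Level: approximate U.S. school grade
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade
```

In practice, a validated implementation (e.g., an established readability library or scoring tool) would be used rather than this heuristic syllable counter, which undercounts syllables in words with silent-e patterns and diphthongs.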
Results: A total of 60 questions were chosen from existing literature based on previous AANS educational research: 20 spine, 20 cranial, and 20 general neurosurgery. Mean accuracy, readability, grade level, and understandability scores for AtlasGPT responses were 5.0, 52.88 ± 14.04, 10.35 ± 2.59, and 7.9 ± 0.9, respectively. Compared to AANS educational literature, AtlasGPT demonstrated significantly improved readability (52.88 ± 14.04 vs 39.95 ± 7.83, p < 0.001) and grade level (10.35 ± 2.59 vs 12.26 ± 1.33, p < 0.001). AtlasGPT responses to spine and cranial questions also had significantly improved readability and grade level (p < 0.009) compared to AANS patient educational materials. Of the three question categories, AtlasGPT's responses to general neurosurgery questions had the best readability (57.41 ± 13.29) and grade level (9.56 ± 2.37).
Conclusion: The accuracy of AtlasGPT responses was superb, scoring perfectly. Despite delivering highly accurate information, AtlasGPT responses were consistently written at a 10th-grade reading level, well above the AMA and NIH recommendations of below sixth and eighth grade, respectively. Further improvements should be made to make responses readable for a wider audience, focusing on maintaining high accuracy while maximizing response readability.