Medical Student, Charles E. Schmidt College of Medicine, Florida Atlantic University, Boca Raton, FL, US
Introduction: The traditional peer review process is time-consuming and can lead to delays in disseminating critical research. This study evaluates the effectiveness of artificial intelligence (AI) in predicting the acceptance or rejection of neurosurgical manuscripts, offering a possible solution to optimize the process.
Methods: Neurosurgical articles from Preprint.org and medRxiv.org were analyzed. Preprints that were subsequently published were compared with those presumed rejected because they had remained on preprint servers for over 12 months. Each article was uploaded to ChatGPT 4.0, Gemini, and Copilot with the prompt: “Based on the literature up to the date this article was posted, will it be accepted or rejected for publication following peer review? Please provide a yes or no answer.” Each model's predictive accuracy was assessed, and journal metrics were compared between articles predicted to be accepted and those predicted to be rejected.
Results: A total of 51 preprints (31 skull base, 20 spine) were included, of which 28 were published and 23 were presumed rejected. For accepted articles, the mean impact factor and CiteScore were 4.36 ± 2.07 and 6.38 ± 3.67 for skull base topics and 3.48 ± 1.08 and 4.83 ± 1.37 for spine topics. Across all AI models, journal metrics did not differ significantly between articles predicted to be accepted and those predicted to be rejected (p > 0.05). ChatGPT correctly predicted the outcome of 66.67% of published and 61.67% of rejected skull base articles but was only 40% accurate for spine articles (p < 0.001). Gemini showed 50% accuracy for spine-related articles (p < 0.001). Copilot predicted acceptance for 100% of preprints, leaving it unable to identify rejections (p < 0.001). Overall, the AI models performed poorly, with accuracy ranging from 40% to 66.67% (p < 0.001).
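As an illustration of why a model that accepts every preprint is uninformative, the following minimal Python sketch computes accuracy separately for published and presumed-rejected articles. It assumes accuracy is defined as the fraction of correct yes/no predictions, which the abstract does not state explicitly; the function name `accuracy_by_class` and the label encoding (`True` = published or predicted accepted) are hypothetical and not taken from the study. The example uses only the reported totals (28 published, 23 presumed rejected) and an accept-all predictor.

```python
def accuracy_by_class(outcomes, predictions):
    """Return (accuracy on published, accuracy on rejected, overall accuracy).

    outcomes[i]    -- True if preprint i was published, False if presumed rejected
    predictions[i] -- True if the model predicted acceptance, False otherwise
    """
    if len(outcomes) != len(predictions):
        raise ValueError("outcomes and predictions must have the same length")

    pairs = list(zip(outcomes, predictions))
    published = [pred for out, pred in pairs if out]
    rejected = [pred for out, pred in pairs if not out]

    # A prediction is correct when it matches the observed outcome.
    correct_published = sum(1 for pred in published if pred)
    correct_rejected = sum(1 for pred in rejected if not pred)

    def ratio(numerator, denominator):
        # Avoid division by zero when a class is empty.
        return numerator / denominator if denominator else float("nan")

    return (
        ratio(correct_published, len(published)),
        ratio(correct_rejected, len(rejected)),
        ratio(correct_published + correct_rejected, len(pairs)),
    )


# Reported totals: 28 published, 23 presumed rejected.
outcomes = [True] * 28 + [False] * 23
# Hypothetical accept-all predictor (every preprint predicted accepted).
accept_all = [True] * len(outcomes)

acc_published, acc_rejected, acc_overall = accuracy_by_class(outcomes, accept_all)
print(f"published: {acc_published:.2%}, rejected: {acc_rejected:.2%}, "
      f"overall: {acc_overall:.2%}")
```

Under this definition, an accept-all predictor is right on every published article and wrong on every rejected one, so its overall accuracy equals the share of published articles in the sample (28 of 51) and conveys no information about which manuscripts will be rejected.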
Conclusion: Current AI models exhibit moderate accuracy in predicting peer review outcomes. Future AI models, developed in collaboration with journals and with the consent of authors, could access a more balanced dataset, enhancing accuracy and streamlining the peer review process.