Education

Application of large language and artificial intelligence modeling in the prediction of peer-review outcomes

Presenting Author(s)

Benjamin Hopkins, MD, MBA

Resident Physician
Keck School of Medicine of USC
Los Angeles, CA, US

Introduction: The rapid development of artificial intelligence (AI) presents an opportunity to streamline the peer-review process and provide key information to guide academic journals, editorial staff, reviewers, and authors. This study aimed to fine-tune several large language and transformer models based on textual peer-reviewer comments and editorial outcomes to find text-based associations with journal manuscript decisions.

Methods: All anonymized manuscripts (including reviewer comments) submitted to the Journal of Neurosurgery (JNS) and its subsidiaries from 2021 to 2023 were included for analysis. Final editorial decisions were grouped into a binary outcome (i.e., acceptance/revision vs. rejection/transfer). Leading words were removed, and reviewer comments were analyzed using AI models including BERT, GPT-2/3/4, and GRU variants to predict final decisions. Shapley Additive Explanations (SHAP) analysis was conducted to evaluate the impact of individual words on model predictions.
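The binary grouping of editorial decisions described above can be illustrated with a minimal sketch. The label strings and the function name `binarize_decision` are assumptions made for illustration; the abstract names only the four decision categories and does not describe how they were encoded.

```python
# Assumed label strings: the abstract names these four categories but
# does not specify how they were stored in the editorial data.
POSITIVE = {"acceptance", "revision"}
NEGATIVE = {"rejection", "transfer"}


def binarize_decision(decision: str) -> int:
    """Map a final editorial decision to 1 (acceptance/revision) or 0 (rejection/transfer)."""
    label = decision.strip().lower()
    if label in POSITIVE:
        return 1
    if label in NEGATIVE:
        return 0
    raise ValueError(f"unrecognized decision: {decision!r}")


# Toy input for illustration only; not data from the study.
decisions = ["Acceptance", "Rejection", "revision", "Transfer"]
labels = [binarize_decision(d) for d in decisions]
```

Raising an error on an unrecognized label, rather than defaulting to one class, keeps unexpected decision categories from silently skewing the outcome variable.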

Results: In ROC analysis, the fine-tuned GPT-4mini and GPT-3 models achieved the highest AUCs (0.91), followed by BERT and GPT-2 (0.84). The bidirectional GRU, GPT-3 (untrained), the unidirectional GRU, and GPT-4o each had an AUC of 0.75 or lower. In the SHAP analysis, logistic regression modeling identified words in manuscript reviews such as “future” (OR: 18.2, p < 0.001), “interesting” (OR: 15.3, p < 0.001), and “written” (OR: 15.5, p < 0.001) as positive predictors of manuscript acceptance to JNS, whereas “clear” (OR: 0.18, p < 0.001), “unclear” (OR: 0.07, p < 0.001), and “does” (OR: 0.16, p < 0.001) were associated with manuscript rejection. The GRU model identified “study,” “useful,” and “journal” as significant positive predictors, and “unclear,” “reading,” and “incidence” as negative predictors.
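To make the two reported metrics concrete, the sketch below computes an AUC and an odds ratio on toy numbers. It is an illustration under stated assumptions, not the study's pipeline: the AUC uses the standard pairwise-ranking definition, and the odds ratio is taken from a 2x2 table of word presence against outcome, which is a simplified stand-in for the logistic regression reported here. The function names and all input values are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    outscores a randomly chosen negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table:
    a = accepted, word present    b = rejected, word present
    c = accepted, word absent     d = rejected, word absent
    """
    return (a * d) / (b * c)


# Toy values for illustration only; not data from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
toy_or = odds_ratio(30, 5, 20, 45)         # (30 * 45) / (5 * 20) = 13.5
```

An odds ratio above 1 means the word appears more often in reviews of accepted or revised manuscripts, and a value below 1 means it appears more often in reviews of rejected or transferred ones, which is how the values for “future” (18.2) and “unclear” (0.07) should be read.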

Conclusion: This proof-of-concept study demonstrates that fine-tuned AI models can accurately predict manuscript acceptance using only reviewer comments, highlighting their potential to facilitate peer review. Emerging themes that appear to influence article outcome include clarity, utility, suitability, cohort size, and diligence in addressing reviewer queries.