Resident Physician, University of Florida, Gainesville, FL, US
Introduction: Chart review is a critical clinical research method, but it involves a time-intensive process of documenting and categorizing vast amounts of unstructured data. While automated data extraction from electronic records is possible, it depends on reliable coding, so manual chart review of narrative notes is often a better way to obtain accurate, detailed patient information. This paper explores the application of artificial intelligence (AI) to automating the categorization process after data collection, with the hypothesis that AI may streamline data extraction, improve consistency and accuracy, and enhance the reproducibility of research outcomes.
Methods: A retrospective chart review was conducted of all patients who presented to Connaught Hospital in Freetown, Sierra Leone, from 2019 to 2023. Reason for presentation, symptoms on presentation, and diagnoses were captured using open-ended questions, and generative artificial intelligence (ChatGPT-4) was used in an attempt to automatically clean and categorize the data prior to analysis. Over 200 patients were manually categorized, and ChatGPT was given these as examples, along with categorical rules, key words and phrases, and hierarchical rules for complex presentations. An iterative process was used to review and correct AI responses and update the rules to improve accuracy.
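A minimal sketch of how such a few-shot categorization prompt might be assembled from the manually categorized examples and rules described above. This is illustrative only: the function, rule wording, and example entries are hypothetical, not the study's actual prompts or code.

```python
# Hypothetical sketch: combine categorical rules, worked examples, and a new
# free-text chart entry into one categorization prompt for a language model.
# All rule text and example entries below are invented for illustration.

def build_prompt(rules, examples, free_text):
    """Assemble a few-shot categorization prompt as plain text."""
    lines = ["Categorize the clinical free-text entry using these rules:"]
    lines += [f"- {rule}" for rule in rules]
    lines.append("Examples of correct categorizations:")
    for raw, category in examples:
        lines.append(f'Text: "{raw}" -> Category: {category}')
    # The model is asked to complete the category for the new entry.
    lines.append(f'Text: "{free_text}" -> Category:')
    return "\n".join(lines)

rules = [
    "Map head injury, fall, and trauma keywords to 'Trauma'.",
    "For complex presentations with multiple diagnoses, apply the hierarchy: trauma > infection > other.",
]
examples = [("fell from height, GCS 14", "Trauma")]
prompt = build_prompt(rules, examples, "RTA with scalp laceration")
```

In the iterative process the abstract describes, miscategorized responses would feed back into `rules` and `examples` before the next pass.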
Results: Of the 1,886 patients evaluated with AI, categorizations were incorrect for reason for presentation in 603 (32.0%), symptoms in 1,499 (79.5%), history of loss of consciousness in 13 (6.5%), history of seizure in 107 (5.7%), diagnoses in 959 (50.8%), and imaging confirmation in 486 (25.8%).
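The reported percentages can be reproduced from the raw error counts as a quick consistency check; 1,886 is the full AI-evaluated cohort.

```python
# Recompute the reported error rates from the raw counts.
total = 1886  # patients evaluated with AI
errors = {
    "reason for presentation": 603,
    "symptoms": 1499,
    "history of seizure": 107,
    "diagnoses": 959,
    "imaging confirmation": 486,
}
rates = {field: round(100 * n / total, 1) for field, n in errors.items()}
# Loss of consciousness (13 errors, 6.5%) is omitted here: that percentage
# implies a smaller denominator, presumably the subset of charts in which
# the field was documented.
```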
Conclusion: This study demonstrates a failure of publicly available AI for the categorization step after data extraction in a clinical context. While the AI processed large amounts of data rapidly, error rates were unacceptably high and required manual correction that offset any initial time savings. These results highlight the ongoing challenges of applying AI to complex scenarios, particularly cases involving multiple diagnoses, incomplete records, or nuanced clinical interpretation. These findings underscore the need for a critical eye in AI research, further model refinement, and continued human oversight of AI-assisted research.