Medical Student, Case Western Reserve University School of Medicine
Introduction: Randomized controlled trials (RCTs) are considered the gold standard in neurosurgical and spine research because their strict methodology and statistical power enable stronger causal inference. Sample size is an important consideration in RCTs: smaller samples can lack generalizability, while larger samples may raise ethical concerns by exposing more patients than necessary to an intervention. To better characterize sample sizes in these RCTs, a novel natural language processing (NLP) algorithm was developed to extract sample size data from RCT abstracts.
Methods: A random sample of 200 neurosurgery and spine surgery RCT abstracts published between January 1, 2000, and December 31, 2022, was obtained from the PubMed database. A Python-ChatGPT NLP algorithm was developed to extract sample size data. Two independent reviewers verified the NLP output, and classification rates were calculated. Additionally, references to sample size and the terminology used to describe study subjects were collected.
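The abstract does not describe the pipeline's prompts or API calls, so the following is only an illustrative sketch of the kind of pattern matching such a Python pipeline might combine with an LLM to pull candidate sample sizes out of abstract text; the patterns and function name are hypothetical, not the study's actual implementation.

```python
import re

# Hypothetical regex patterns for candidate sample-size mentions; the
# study's actual Python-ChatGPT pipeline is not described in the abstract.
SAMPLE_SIZE_PATTERNS = [
    re.compile(r"\b[Nn]\s*=\s*(\d+)"),  # e.g. "n = 60"
    re.compile(
        r"\b(\d+)\s+(?:patients|participants|subjects|adults|volunteers)\b",
        re.IGNORECASE,
    ),  # e.g. "120 patients"
]

def extract_sample_sizes(abstract: str) -> list[int]:
    """Return every candidate sample-size number found in an abstract."""
    hits: list[int] = []
    for pattern in SAMPLE_SIZE_PATTERNS:
        hits.extend(int(m.group(1)) for m in pattern.finditer(abstract))
    return hits

text = "A total of 120 patients were randomized (n = 60 per arm)."
print(extract_sample_sizes(text))  # → [60, 120]
```

Candidates surfaced this way would still need the human verification step the Methods describes, since phrases like "60 per arm" mix overall and subgroup counts.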
Results: The algorithm achieved a true positive (TP) rate of 92.5% and a true negative (TN) rate of 7.5% for extracting overall sample size. For extracting subgroup sample size, the algorithm had a TP rate of 58.29%, a TN rate of 36.68%, a false positive rate of 1.51%, and a false negative rate of 3.52%. Among the 200 abstracts, 185 reported sample size, with a mean of 170 patients and a standard deviation of 527. Regarding terminology, 87.57% referred to subjects as “patients”, 3.78% by sex (“male” or “female”), 1.08% as “adults”, 1.08% as “volunteers”, and the remaining 6.49% used other terms. Regarding subgroups, 46.69% of abstracts listed their subgroup populations explicitly, while 53.31% implied them (e.g., “1:1 random assignment”).
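The subgroup rates above can be sketched as simple percentage arithmetic. The raw counts below are an assumed reconstruction, chosen only because 116 + 73 + 3 + 7 = 199 classifications reproduces the reported percentages exactly; the abstract does not state the actual denominators.

```python
def classification_rates(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Express each outcome count as a percentage of all classifications."""
    total = tp + tn + fp + fn
    return {
        "TP": round(100 * tp / total, 2),
        "TN": round(100 * tn / total, 2),
        "FP": round(100 * fp / total, 2),
        "FN": round(100 * fn / total, 2),
    }

# Hypothetical counts that happen to reproduce the reported subgroup rates.
print(classification_rates(tp=116, tn=73, fp=3, fn=7))
# → {'TP': 58.29, 'TN': 36.68, 'FP': 1.51, 'FN': 3.52}
```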
Conclusion: The Python-ChatGPT algorithm developed here may offer a novel tool for extracting sample size data from RCT abstracts, potentially improving study quality assessment and efficiency in meta-analyses.