Business Analyst BC Children's Hospital Research Institute
Introduction: Machine learning (ML) techniques may predict Engel outcome in pediatric patients undergoing epilepsy surgery from cortical surface saliency features derived from preoperative magnetic resonance imaging (MRI) scans. A key challenge is handling features that are distributed across many brain regions, especially in small datasets. One solution is to merge regions on physiological or anatomical grounds, which reduces the feature count and the risk of overfitting. However, merging may discard critical information, such as subtle regional differences that are important for predicting outcomes.
Methods: We explored consolidating cortical thickness, intrinsic curvature, and sulcal depth features from 68 brain regions into 16 broader categories based on regional anatomy, reducing the number of independent variables from 204 (68 regions × 3 features) to 48 (16 categories × 3 features), counting bilateral regions. We used absolute z-scores from the Multi-centre Epilepsy Lesion Detection (MELD) algorithm to highlight significant deviations in these features relative to the contralateral hemisphere and to age- and sex-matched controls. Two models were trained with 5-fold cross-validation on 50 patients with Engel scores, one using the 204 unmerged MRI features and the other using the 48 merged MRI features, to predict binarized Engel outcome. Model performance was evaluated with training accuracy, validation accuracy (21 patients), and the Bayesian Information Criterion (BIC), which balances model simplicity against explanatory power, to assess whether merging improves parsimony without sacrificing performance.
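As an illustration of the consolidation step, the following is a minimal sketch rather than the pipeline used in this study. It assumes that merging means averaging the absolute z-scores of the regions assigned to each category; the aggregation rule is not specified above. The region-to-category assignment and the input array are placeholders, and the names `merge_regions` and `region_to_group` are hypothetical.

```python
import numpy as np

N_REGIONS = 68        # cortical regions, as stated in the Methods
N_GROUPS = 16         # broader anatomical categories, as stated in the Methods
N_FEATURE_TYPES = 3   # cortical thickness, intrinsic curvature, sulcal depth


def merge_regions(abs_z, region_to_group, n_groups=N_GROUPS):
    """Collapse per-region features into per-category features.

    abs_z: array of shape (n_patients, n_regions, n_feature_types)
           holding absolute z-scores.
    region_to_group: integer array of length n_regions giving the
           category index of each region.
    Returns an array of shape (n_patients, n_groups * n_feature_types).
    ASSUMPTION: categories are summarised by the mean of their regions.
    """
    n_patients, _, n_types = abs_z.shape
    merged = np.zeros((n_patients, n_groups, n_types))
    for g in range(n_groups):
        members = region_to_group == g
        merged[:, g, :] = abs_z[:, members, :].mean(axis=1)
    return merged.reshape(n_patients, n_groups * n_types)


rng = np.random.default_rng(0)

# Placeholder assignment (hypothetical): NOT the anatomical grouping.
region_to_group = np.arange(N_REGIONS) % N_GROUPS

# Synthetic stand-in for MELD-derived absolute z-scores of 50 patients.
abs_z = np.abs(rng.normal(size=(50, N_REGIONS, N_FEATURE_TYPES)))

unmerged = abs_z.reshape(50, -1)                # 68 x 3 = 204 columns
merged = merge_regions(abs_z, region_to_group)  # 16 x 3 = 48 columns
print(unmerged.shape, merged.shape)
```

Other summaries, such as the maximum within a category, would preserve focal abnormalities better than the mean; which is preferable depends on how localized the relevant deviations are.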
Results: The unmerged model (204 features) showed high training accuracy (94%) but low validation accuracy (50%), a gap that indicates overfitting and excessive model complexity. In contrast, the merged model (48 features) showed more balanced performance (training accuracy of 69%, validation accuracy of 58%) and a lower BIC, indicating better generalizability.
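For reference, the standard definition of the BIC makes the role of feature count explicit. Here k is the number of estimated parameters, n the number of training observations, and \hat{L} the maximized likelihood. The numerical line is only a sketch: it assumes k equals the feature count and n = 50, and it compares the penalty terms alone, not the BIC values obtained in this study.

```latex
\mathrm{BIC} = k \ln n - 2 \ln \hat{L}

% Penalty difference under the stated assumptions (k = 204 vs. k = 48, n = 50):
(204 - 48)\,\ln 50 \approx 156 \times 3.912 \approx 610.3
```

Under these assumptions, the unmerged model would need a log-likelihood roughly 305 units higher than the merged model to reach the same BIC.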
Conclusion: Though less detailed, the model with merged MRI saliency features reduces overfitting and exhibits improved generalizability, suggesting that feature aggregation may be a beneficial strategy for reducing model complexity while maintaining accuracy. Incorporating these merged features alongside other critical clinical variables may help develop a robust model for post-surgical outcome prediction.