Assistant Professor, University of Louisville, Louisville, Kentucky, United States
Introduction: DNA methylation signatures are distinct across various nervous system neoplasms. Whether transcriptomic signatures exhibit similar uniqueness has not been comprehensively demonstrated. Additionally, no large-scale dataset is available for comparative gene expression analyses across the diagnostic spectrum of nervous system neoplasms. This study addresses these knowledge and resource gaps.
Methods: Raw transcriptomic and associated clinical data for nervous system neoplasms (5,402 samples) and non-neoplastic entities (1,973 samples) were obtained from publicly available sources. All data had been generated on a single microarray transcriptomic platform covering 20,360 genes and were reprocessed together for harmonized integration. Machine learning tools were used to visualize all samples and evaluate cluster formation. Of the combined samples, 2,127 needed to be reclassified according to the current classification. To do this, we used machine learning classifiers trained on the 5,248 samples with a known diagnosis.
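The reclassification step can be illustrated with a minimal sketch. The abstract does not name the classifiers or the confidence criterion used, so everything here is an assumption made for illustration: a nearest-centroid classifier stands in for the actual models, a softmax over distances stands in for a confidence score, and the function names, the 0.9 threshold, and the two-gene toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(samples, labels):
    """Average the expression vectors of each known diagnosis (stand-in model)."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict_proba(centroids, vec):
    """Softmax over negative Euclidean distances to each diagnosis centroid."""
    dists = {lab: math.dist(vec, c) for lab, c in centroids.items()}
    nearest = min(dists.values())  # subtract the minimum for numerical stability
    weights = {lab: math.exp(-(d - nearest)) for lab, d in dists.items()}
    total = sum(weights.values())
    return {lab: w / total for lab, w in weights.items()}


def reclassify(centroids, vec, threshold=0.9):
    """Return (diagnosis, confidence); diagnosis is None below the threshold."""
    proba = predict_proba(centroids, vec)
    best = max(proba, key=proba.get)
    confidence = proba[best]
    return (best if confidence >= threshold else None, confidence)


# Toy data: two hypothetical diagnoses measured on two hypothetical genes.
known_samples = [[0.0, 0.0], [0.2, 0.0], [10.0, 10.0], [10.0, 10.2]]
known_labels = ["diagnosis_A", "diagnosis_A", "diagnosis_B", "diagnosis_B"]
centroids = fit_centroids(known_samples, known_labels)

print(reclassify(centroids, [0.1, 0.1]))    # close to diagnosis_A: accepted
print(reclassify(centroids, [5.05, 5.05]))  # midway between both: left unassigned
```

Samples whose top score falls below the threshold stay unassigned. This mirrors the idea that only confident calls are accepted, which would leave a small fraction of samples without a new label.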
Results: We created a large-scale, clinically annotated transcriptomic atlas containing 7,375 samples. Visualization using machine learning tools revealed clustering primarily based on diagnosis. Using machine learning classifiers, we confidently reclassified nearly all (96.3%) of the 2,127 samples with uncertain diagnoses. This process revealed the need to refine the distinction between pilocytic astrocytomas and gangliogliomas, a finding supported by the DNA methylation-based classifier results (Nature, 2018).
Conclusion: We demonstrate that the diagnostic distinctiveness of bulk DNA methylation signatures also extends to gene expression, both across the diagnostic spectrum of nervous system neoplasms and across patient ages. Our atlas covers 52 diagnoses, including rarely studied entities, in samples ranging from fetuses to patients aged 100 years or older. The samples come from around the world, which broadens the atlas's ethnic representation. Its utility is further increased by the inclusion of clinical data such as sex, tumor location, genetic information, tumor grade/stage, and overall survival for many samples. Comparative gene expression analyses performed using this atlas will have robust statistical power, even for rare entities, because reclassifying older samples according to the latest classification yields a large number of usable samples. Finally, our methodological workflow can be used to integrate and harmonize existing raw data for other rare diagnoses and conditions, increasing their utility and informing future research.