Ten simple rules for predictive modeling
Thursday, February 3, 2022: 4:00 - 5:00 pm
Dr. Dustin Scheinost, Depts. of Radiology & Biomedical Imaging, Biomedical Engineering, and Statistics & Data Science, and the Yale Child Study Center, Yale School of Medicine; The Multi-modal Imaging, Neuroinformatics, & Data Science (MINDS) Lab
Watch: Ten simple rules for predictive modeling.
Watch: Ten simple rules for predictive modeling questions and answers.
Machine learning, artificial intelligence, and prediction are popular buzzwords in science. While many researchers wish to incorporate these methods, they are easy to misuse.
This talk introduces 10 "rules" to follow when using machine learning and predictive modeling for neuroimaging data:
- 4 rules for validating predictive models through independent data.
- 3 rules for assessing model performance.
- 3 rules for removing confounds and increasing interpretability of models.
These rules address common issues and are aimed at both novice and experienced users of predictive models, with the hope of encouraging more researchers to use these approaches. They focus on the concepts behind machine learning and predictive modeling rather than on algorithms or the underlying mathematics.
These rules are general and apply to most neuroimaging studies employing predictive modeling, independent of the exact algorithm used. Similarly, while the examples provided during the talk are based on functional magnetic resonance imaging (fMRI) connectivity data, the same concepts apply to other types of data, such as task activation, structural connectivity, or even non-neuroimaging data.
Most other data-analytic approaches are explanatory; that is, they focus on explaining some observation. Machine learning and prediction, on the other hand, focus on predicting information in data that are novel to the model. Because models are defined and validated with independent data, they promise to improve our ability to uncover generalizable brain-behavior associations.
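To make the contrast between explaining and predicting concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not material from the talk: the data are synthetic noise, and the sizes, the single train/test split, and the helper name `r_squared` are all assumptions chosen for the example. A linear model with more features than training subjects "explains" the training data almost perfectly, yet its predictions for independent subjects are typically no better than guessing the mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption for illustration): 40 "subjects",
# 200 pure-noise "features", and an outcome unrelated to the features.
n_subjects, n_features = 40, 200
X = rng.standard_normal((n_subjects, n_features))
y = rng.standard_normal(n_subjects)

# Split once into a training half and an independent test half.
train, test = np.arange(0, 20), np.arange(20, 40)

# Fit least squares (minimum-norm solution, no intercept) on the training half only.
coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

def r_squared(y_true, y_pred):
    """Coefficient of determination; it can be negative on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

r2_train = r_squared(y[train], X[train] @ coef)
r2_test = r_squared(y[test], X[test] @ coef)
print(f"in-sample R^2: {r2_train:.3f}")
print(f"held-out R^2:  {r2_test:.3f}")
```

Because the outcome is noise by construction, any apparent fit on the training half is overfitting, which is why the held-out score is the one that speaks to generalizability.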
Prerequisites: Basic skills in a programming language: MATLAB (requires a license), Python (free), or R (free). All three languages have various toolboxes and packages for machine learning. One will need to load the data into memory and call the algorithm function from these packages. A more advanced skill is having a sense of which algorithm to use for the task at hand.
- Statistics and Machine Learning Toolbox
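As a rough sketch of what "load the data and call the algorithm function" can look like in Python, the example below uses scikit-learn, which is one of several packages that could be used and is not specified by the talk. The data are simulated in place of a real dataset, and the choice of ridge regression, 10 folds, and `alpha=1.0` are assumptions made for illustration rather than recommendations.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for real data already loaded into memory:
# X is subjects x features, y is one behavioral score per subject.
n_subjects, n_features = 100, 50
X = rng.standard_normal((n_subjects, n_features))
true_weights = np.zeros(n_features)
true_weights[:5] = 1.0  # only the first five features carry signal
y = X @ true_weights + rng.standard_normal(n_subjects)

# Scaling sits inside the pipeline so it is re-estimated on each training fold.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# Each subject's prediction comes from a model that never saw that subject.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
y_pred = cross_val_predict(model, X, y, cv=cv)

r = np.corrcoef(y, y_pred)[0, 1]
print(f"correlation between observed and held-out predictions: {r:.2f}")
```

Wrapping preprocessing and the model in a single pipeline is a design choice that keeps every fitted step confined to the training folds, so the held-out subjects stay independent of the model.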
While many assume that a deep knowledge of math is needed to employ a predictive modeling approach, the conceptual issues presented in this talk are more important.
Datasets: Any dataset; the data type is not important. Sample sizes should be sufficiently large to avoid overfitting.
For neuroimaging data (in particular functional connectivity):
- Shen, X., Finn, E. S., Scheinost, D., Rosenberg, M. D., Chun, M. M., Papademetris, X., & Constable, R. T. (2017). Using connectome-based predictive modeling to predict individual behavior from brain connectivity. Nature Protocols, 12(3), 506-518, doi: 10.1038/nprot.2016.178
- Scheinost, D., Noble, S., Horien, C., Greene, A. S., Lake, E. M. R., Salehi, M., Gao, S., et al. (2019). Ten simple rules for predictive modeling of individual differences in neuroimaging. NeuroImage, 193, 35-45, doi: 10.1016/j.neuroimage.2019.02.057
There are many free resources for machine learning (too many to list). For beginners, I recommend focusing on the basic concepts and not worrying about the exact algorithms and underlying math.
Please subscribe to the MAPs workshop via Google Group: email@example.com