Midsummer Course Sharpens Skills in Informatics and Data Science

August 09, 2019
by Robert Forman

The Center for Biomedical Data Science (CBDS), in conjunction with the Yale Center for Medical Informatics (YCMI), has held its first midsummer course, “Introduction to Informatics and Data Science in the Clinical Health and Biomedical Context.” The course ran for five concentrated days on the Yale West Campus, with instruction in areas including databases, ontologies, natural language processing, machine learning, and FAIR data sharing and integration (meeting standards of Findability, Accessibility, Interoperability, and Reusability).

Approximately 50 students, ranging from MD and PhD candidates to postdoctoral associates and others with advanced degrees—including faculty members—took the course. Their shared purpose was to learn the basics of a field—data science—that has become essential to nearly all areas of biomedical scholarship.

“They're really generating a huge amount of data that, without computers, is really hard for them to sift through and understand what is going on,” says Kei-Hoi Cheung, PhD, professor of emergency medicine, who is affiliated with YCMI and CBDS and taught several of the classes. “I think the idea is finding the needle in the haystack. You have all this data but how can you understand how to find true signals with all the noise and errors included? That's what they are doing. I think it's important for them.”

Cheung notes that much of the material taught during the week was already available at Yale, but before this course students would have needed to hunt it down and absorb it in a scattered fashion. Instead, he says, with the material laid out methodically and presented in one week, they were better able to focus their minds on it and retain it.  

Declan Clarke, PhD, and two other scientists—Prashant Emani, PhD, and Jonathan Warrell, PhD—in the lab of Mark B. Gerstein, PhD, Albert L. Williams Professor of Biomedical Informatics, professor of molecular biophysics and biochemistry, of computer science, and of statistics, taught classes related to machine learning. Clarke says there was much to absorb from the course, and students are not yet experts. “But they know now at least that we’ve pointed them to some very good resources, and we’ve also invited them to contact us directly for consultation either in the context of their research or in terms of independent study that they might choose to start.”

Students said the course had served them well. Diana Yanez, a fifth-year MD-PhD student at the medical school, said it opened her eyes to new ways of absorbing her coursework and lab work. “It just simplified everything for me and I think it’s going to help me in the future when I’m looking at data, just how to look at it in a more basic manner and how to exactly extract what I want.”

Gabriela Pizzurro, PhD, a postdoctoral associate in the biomedical engineering lab of Kathryn Miller-Jensen, PhD, said the material will help her excel in a multidisciplinary environment. “It was perfect for me to get a little more background to be able to work with people in my lab and then collaborate more so I can understand better what they’re doing.”

Lolahon Kadiri, MD, PhD, earned her doctorate in neuroscience nine years ago. Much of data science has been developed since. She said it was “absolutely fantastic” to take the course for her role as a business development associate in Yale’s Office of Cooperative Research. “I wanted to learn more about state of the art, where the research is, what people are working on here, what kind of tools they use, and what kind of questions they can actually ask by using these tools,” she said, “and, what are potential implications for intellectual property, or for collaboration, or for output in terms of clinical recommendations or commercialization.”

The plan is to make the course a recurring event, says Xinxin (Katie) Zhu, MD, PhD, executive director of CBDS, who organized it with Cynthia Brandt, MD, MPH, professor of emergency medicine, of anesthesiology, and of biostatistics, and director of YCMI. Zhu says the commitment of the students was impressive. “There was a student traveling from Harlem to West Campus to take the course,” she notes. “Many stayed late to ask as many questions as possible; some skipped snack and coffee breaks—a clear sign that they need and want this.”

Zhu hopes to arrange another week of classes next spring. It may even be a version 2.0. “Already we have more than 40 students on the waiting list and we could easily have it right away,” she says, “but we want to take our time to recap and refresh and refine this material and make the students’ experience better every time we offer this.”

It was “truly a cross-campus team effort,” Zhu says. Participants included the School of Public Health, the School of Nursing, the School of Medicine, and the VA. Faculty and other scientists affiliated with CBDS, YCMI, and the Health Informatics Division participated in the teaching, along with students, postdoctoral associates, and research scientists from Computational Biology and Bioinformatics, and staff from the Medical Library.

Partial funding for curriculum development and the course itself came from a supplement to a longstanding training grant to YCMI from the National Library of Medicine (NLM). The course was also funded in part by the medical school’s Office of the Dean.

Submitted by Robert Forman on August 07, 2019