New center to harness burgeoning data

School of Medicine joins university-wide move toward using advanced data science to enhance research, scholarship, and education

Photo by Robert A. Lisak
Xinxin (Katie) Zhu came to Yale last August to take primary responsibility for growing the Center for Biomedical Data Science, whose goals include bringing together biomedical scientists and quantitative experts to bolster each other’s research, adding to data science expertise throughout the medical school and university, and exploring how to manage rapidly expanding sources of data.

Data have become an essential ingredient for investigators pursuing advanced biomedical science. James Noonan, PhD, associate professor of genetics, says the time has long passed when his lab could do its work on the human phenotype without the insights that abundant data provide. “To really understand these complex biological systems, we need data,” Noonan says. “We need a lot of data and a lot of different kinds of data.”

This reality for labs throughout the School of Medicine, and for the school’s clinical researchers and providers, led to the recent establishment of the Center for Biomedical Data Science (CBDS), which is one part clearinghouse for the newest data-related ideas; one part incubator for ways to educate students, trainees, and faculty on what data can offer; and one part innovator of data techniques to advance biomedical science and data science. As CBDS matures, additional pursuits are sure to follow.

Xinxin (Katie) Zhu, MD, PhD, arrived last August from the IBM Watson Research Center to take primary responsibility for moving CBDS forward. “People don’t necessarily consider biomedicine as a big data field, or health care as a big data field, but we are,” says Zhu. One of her first jobs has been to identify and connect Yale experts who can enhance each other’s work. “You could have a biologist working on things s/he’s been doing before,” Zhu says, “but s/he may not have the data science knowledge; or may not have a computer scientist to help with some computational models. That’s where we come in.” In turn, there are quantitative experts throughout Yale whose methods could help shape biomedical science but who have not yet been introduced to potential collaborators.

Ruth R. Montgomery, PhD, associate dean for scientific affairs and professor of internal medicine, who helped establish CBDS, says connections such as those Zhu describes have greatly helped her own science. “I have really found it fascinating to work with my bioinformatics colleagues when they analyze my data in a way that is not from my own background,” she says. “We go back and forth several times until we have each understood what the other was looking for and really find new things.”

One-to-one matchmaking is just an early step. CBDS has already enlisted more than 100 faculty members and expects to welcome more, with the aim of facilitating broader exchanges of ideas that can boost the scientific enterprise exponentially. Yale is the ideal place for it, according to Harlan M. Krumholz, MD, Harold H. Hines, Jr. Professor of Medicine, who is a member of the CBDS faculty steering committee. “We have a small-town atmosphere here,” says Krumholz. “People are not worried about turf in general. They are open to people from different sectors of the university coming together and thinking about how we might best do things together.”

Zhu foresees experts sharing insights with each other as never before. “Data science is quite new, a quite different discipline in the academic world,” she explains. “This is one of the areas where the more you share, the more you have, instead of holding something very tight to your chest, and you become very good.”

An essential task for CBDS will be figuring out how to manage ever-expanding datasets from multiple sources that Yale both generates and receives, so as not to drown in a sea of numbers without knowing how to make sense of them. “With all the technological advances we have seen, such as imaging, sequencing, wearable devices, hospital databases, and cloud computing,” says Hongyu Zhao, PhD, chair and Ira V. Hiscock Professor of Biostatistics, professor of genetics and of statistics and data science, and co-director of CBDS, “there is a great need to develop efficient ways to collect, store, manage, analyze, visualize, and interpret the results.”

It is a kind of “data fusion,” adds Mark B. Gerstein, PhD, Albert L. Williams Professor of Biomedical Informatics, professor of molecular biophysics and biochemistry, of computer science, and of statistics and data science, and CBDS co-director. “We’re integrating all the different types of data together, and that’s just something that is hard to do in a completely generic, automated way. You need to think about what you’re putting together.”

That includes gathering insights that enhance clinical care. Krumholz says that one of CBDS’s great advantages is the presence and participation of Yale clinicians, who can help steer data-driven science toward its greatest possible impact. “We have a medical center and actual patients being seen and care being delivered,” he explains, “and so we have the opportunity to go end to end, where we’re thinking at first about the end users.” That, Krumholz says, could lead to new discoveries, quicker paths toward better treatment decisions by clinicians, and even insights into how the broader health care system can be redesigned for the better.

The center also has a vital educational mission: giving Yale scientists at all levels, including accomplished investigators whose training preceded the “big data” era, a greater familiarity with data science and what it can do for them. That will require detailed planning and extensive listening to various constituencies’ needs. “It’s easy to expose graduate students to it because you can just make a course,” says Noonan, who is also on the CBDS steering committee. “But you can’t do that with postdocs and faculty. What we have to do through the center is get people in place who know how to teach these concepts to a diverse group of people with diverse backgrounds and help them understand what they need to know to function in this new world because they’re going to need to know it.” Gerstein adds that having more data expertise in the Yale community will be a catalyst toward another essential goal: attracting future recruits in the field at all levels, from students to expert faculty.

CBDS is part of a constellation of data science activity exploding throughout the university, from a new undergraduate major to the recruitment of theoreticians at the most advanced levels of the field. “Integrative data science and its mathematical foundations” is on the University Science Strategy Committee’s suggested list of the most promising opportunities for investment across the sciences, which President Peter Salovey, PhD ’86, endorsed in November. “CBDS fits perfectly into this new emphasis,” says Zhao. “This center really serves at the interface of mathematics, statistics, computer science, biology, medicine, and public health.”

“We’re poised to do good work. We’re positioned to be successful,” says Krumholz, who credits the late Carolyn W. Slayman, PhD, deputy dean for academic and scientific affairs, Sterling Professor of Genetics, and professor of cellular and molecular physiology, for having the early sense that a center was needed, and considers CBDS to be part of her legacy. “I think Yale has a unique opportunity to lead and inspire and produce tools that will take us into that next era, and I’m excited to be part of it.”