Big data is getting bigger. By 2025, genomics will have surpassed astronomy, Twitter, and YouTube to become the largest data-generating enterprise by far. What began 65 years ago when Watson, Crick, and Franklin unlocked the double helix of DNA has become, in just the past few years, an exponentially growing archive of individual genomes. Yale’s Center for Genome Analysis actually holds the ninth largest genomic library in the world.
However, large-scale genomic sequencing is just one of several big data enterprises that have taken off in biomedical science in the past five years. There’s also electronic health record analysis, proteomics, metabolomics, and data collected through smart, wearable electronics (think Fitbits), to name just a few. Due to this explosion of data, late last month Yale launched the new Center for Biomedical Data Science (CBDS), which will act as a hub for research and education in biomedical data science at Yale.
This announcement marks Yale’s second major investment in data science in the past year; in March 2017, the Department of Statistics became the Department of Statistics and Data Science. Tamar Gendler, Dean of the Faculty of Arts and Sciences and the Vincent J. Scully Professor of Philosophy, said of the opening, “We are delighted to see campus-wide collaboration in this important area and look forward to working together in this vital area of research and teaching.”
CBDS was officially introduced to the Yale biomedical research community at a Yale School of Medicine Dean’s Workshop on Feb. 7. After a welcome by Ruth Montgomery, associate dean for scientific affairs, the interim co-directors of CBDS, Mark Gerstein and Hongyu Zhao, elaborated on the multiple missions of the center. Montgomery, Gerstein, and Zhao also gave credit to the late Carolyn Slayman, deputy dean for academic and scientific affairs at the Yale School of Medicine, for being the driving force behind the formation of the biomedical data science center over the past decade.
“First and foremost, the center will serve as a focus for the emerging data science community in biomedicine at Yale and will enhance research in the broad area of biomedical data science,” said Zhao, chair and the Ira V. Hiscock Professor of Biostatistics. “Additionally, CBDS will be an educational hub that connects expert data scientists with non-expert investigators while also helping to train the next generation of biomedical research pioneers. Finally, the center will organize the necessary infrastructure, both physical and computational, to facilitate its research and educational missions.”
In its day-to-day operations, CBDS will be responsible for ingesting large amounts of data from labs and researchers across the university and performing calculations with the information, from simple statistical associations to complex machine learning models to simulations of molecular, cellular, and organismic systems. With proper handling, said the directors, big data analysis will be able to detect anomalies and patterns that often remain hidden when research stays in the silo of a single lab.
“The center will enhance research across the spectrum of biomedical sciences through the development and integration of rapidly emerging methods in data science,” said Gerstein, the Albert L. Williams Professor of Biomedical Informatics. “Acting as this centralized forum for exchange of data science applications and knowledge, CBDS will make possible what has been impossible for any single department or school to accomplish.”
Several investigators, early members of CBDS, shared their perspectives on the power of the center as it relates to their own research and the application of data science techniques in their field more generally. The plenary speaker was X. Shirley Liu, professor of statistics, biostatistics and computational biology at the Harvard T.H. Chan School of Public Health, who gave a talk titled “Hidden immunology signals from TCGA tumor RNA-seq data.”
Among the Yale faculty who spoke were Murat Günel of neuroscience and neurosurgery, Monika Jadi of psychiatry and neuroscience, Smita Krishnaswamy of genetics and computer science, Harlan Krumholz of cardiology and the Institute for Social and Policy Studies, Andre Levchenko of biomedical engineering, and Daniel Spielman of computer science and statistics and data science.
In his presentation, “Data Science and Medicine: Inventing the Future of Healthcare,” Krumholz made clear what’s at stake in doing data science right, especially at a research institution like Yale. “The current medical research enterprise cannot produce information at the pace required by patients, clinicians, and policymakers,” he said. “At Yale, we need to take responsibility for the ‘end to end’ use of data. Following the harvesting and analyzing of data, the implementation of data-driven approaches at the bedside is often where research enterprises fall short. Yale will be one of those places where we close the loop and translate our work into benefit.”
Krumholz, a cardiologist, described using big data to build an intelligent tool that would detect, with more accuracy than a trained doctor, coronary stenosis — the narrowing of arteries in the heart, the condition often responsible for heart attacks. “For at the end of the day,” said Krumholz, “we must judge ourselves not by the fancy algorithms we produce but by the positive impact we have on people’s lives.”
CBDS has a founding group of Yale faculty members; other faculty who are interested in becoming members should send their CVs to Beth Pranger (beth.pranger@yale.edu) in the Dean’s Office at the School of Medicine. All CBDS members will be listed on a new website and will receive emails about relevant seminars and center events. CBDS members will be expected to participate in center activities such as workshops, teaching, and recruitment committees, with specifics to be defined as the center evolves. A committee has convened to search for a permanent executive director for the center as well as a physical space for collaboration between CBDS members.