The Yale School of Public Health (YSPH) is creating a robust data science and data equity (DSDE) program that aims to transform public health research through data science discoveries and equitable implementation, and to provide data analysts with a new, collaborative career path in academic public health. The DSDE educational programming will enable the next generation of public health leaders to master core data science concepts.
The comprehensive DSDE strategy addresses topics ranging from new technology, including artificial intelligence, to global social change.
“Health equity refers to the principle that all individuals should have the opportunity to attain their highest level of health,” said Dr. Bhramar Mukherjee, PhD, who leads the DSDE initiative. “It involves ensuring that everyone has a fair and just opportunity to be as healthy as possible, regardless of socioeconomic status, race, ethnicity, gender, sexual identity, age, political belief, geographic location, or other personal attributes and circumstances. Similarly, data equity represents an ethos where all individuals in the world should benefit equally from data science innovations and products; communities across the globe should have equitable representation in terms of quality and availability of data.”
Dean Megan L. Ranney, MD, MPH, has identified data-driven leadership as a central pillar for the future of public health. “Our team is excited to be part of this school-wide, strategic vision,” Mukherjee said.
A Future Data Services Enclave
In building a strong academic data science workforce, Mukherjee imagines a community of DSDE analysts who are well-paid and have clear career paths, and whose work includes sharing reusable analytics and best practices with each other. The future Data Services Enclave will offer “quick turnaround services” such as basic data cleaning and formatting, visualization and coding, and AI assistance. She calls it “a good home for practicing data science with a mission.”
The DSDE Task Force
The DSDE Task Force, which is responsible for guiding and implementing the school’s DSDE strategy, was appointed in September. Its members represent various YSPH departments, including Biostatistics, Social & Behavioral Sciences, Epidemiology of Microbial Diseases, Health Policy and Management, Chronic Disease Epidemiology, and Environmental Health Sciences, as well as Yale School of Medicine, the Data-Intensive Social Science Center (DISSC), and graduate students.
The task force discussed DSDE’s vision, mission, and aspirations for public health data science, with a special focus on ensuring that data science is practiced in an equitable way and drives equitable decisions.
“This is a defining moment for data science,” Mukherjee said. A bill introduced in Congress, the Data Science and Literacy Act of 2023, proposed a grant program for data literacy, data science, and statistical education, and in October 2023 President Biden signed an executive order on the safe, secure, and trustworthy use of AI.
“AI is possibly going to play a major role in our lives, in our scholarship, and in our future,” Mukherjee said. “But biased data collection, measurements, and exclusionary cohorts for training AI models coupled with blindly trained algorithms can result in harmful and incorrect conclusions. This results in misguided policies that make inequity and disparity worse. The voices of those that are unseen in the datasets remain unheard.”
A DSDE Definition
“While data science leans on computer science, statistics, and domain science, and has a very well-established definition, our ideation of data equity revolves around four core areas: advocating for representative, high-quality data collection across the world; studying populations that continue to be underrepresented in scientific studies; invoking best practices around generalizability, representativeness, causality in our current data analysis; and focusing on algorithmic fairness, accountability, transparency and the ethics in AI. This focus is what sets DSDE apart,” Mukherjee said. The DSDE work at YSPH builds on and leverages the existing rich landscape of data science and AI at Yale. “YSPH can lead change by embracing the principles of data equity and fairness in everyday analysis of research data,” she said.
What is next?
DSDE must elevate and transform public health training and research in the coming years by growing four areas of investment:
Educational Programming
Equip the next generation of public health leaders with an essential mastery of core data science and AI skills needed to drive innovative research and foster equitable health outcomes.
Services and Resources
Offer a suite of services and resources to support public health research and data science efforts with scalable and nimble implementation.
Community Outreach
Create engaging workshops and networking opportunities designed to foster a collaborative community of public health researchers and data scientists who are connected with external partners at Yale and beyond.
Research Profiles
Develop innovative methods and tools that promote equitable health outcomes and drive positive social change.
About Dr. Bhramar Mukherjee
Dr. Bhramar Mukherjee, PhD, is the Anna M.R. Lauder Professor of Biostatistics and Professor of Chronic Disease Epidemiology at the Yale School of Public Health (YSPH). Professor Mukherjee serves as the inaugural Senior Associate Dean of Public Health Data Science and Data Equity at YSPH. She holds a secondary appointment in the Department of Statistics and Data Science and is affiliated with the MacMillan Center and the Institute for the Foundations of Data Science. She serves on the Yale Cancer Center Director’s Cabinet.
Dr. Mukherjee joined the YSPH faculty on August 1, 2024, as the inaugural senior associate dean of public health data science and data equity. She assumed the Lauder chair previously held by Dr. Paul D. Cleary, PhD, the school’s former dean.
Dr. Mukherjee is a leader in the field of biostatistics, with pioneering contributions in the integration of genetic, environmental, and healthcare data. Her research interests include analysis of electronic health records, studies of gene-environment interactions, shrinkage estimation, data integration, and assessment of multiple environmental pollutants. Her collaborative contributions have focused on cancer, cardiovascular diseases, COVID-19, exposure science, environmental epidemiology, and reproductive health.