Kei-Hoi Cheung, PhD, professor of biomedical informatics and data science, has been awarded a grant by the National Institute of Environmental Health Sciences (NIEHS) to research environmental health data and drinking water contamination using AI methods.
The U.S. Environmental Protection Agency (EPA) defines emerging contaminants, or contaminants of emerging concern, as “chemicals that have not previously been detected in water, or that are being detected at significantly different levels than expected.” These potential pollutants include pharmaceuticals, microplastics, and endocrine-disrupting chemicals stemming from industrial land use and agricultural runoff. Researchers and government agencies warn that these chemicals may have adverse health and ecological effects.
Only a fraction of these contaminants have been extensively evaluated, a gap Cheung’s project aims to address. The study will explore how new data and metadata standards can be used to harmonize diverse environmental health information. Integrating a variety of data types in this way could help other researchers investigate drinking water contaminants and their impact on human health. To extract and integrate these data types, Cheung’s team will deploy artificial intelligence (AI) techniques such as natural language processing (NLP) and machine learning. They also plan to build an environmental exposure knowledge graph and engage with users to evaluate the impact of their project.
“There is a great desire by the data science, exposure science, and epidemiology communities to use data and metadata standards to accelerate environmental research workflow, gain new knowledge, and increase data reuse,” said Cheung, who is also a professor of biostatistics at the Yale School of Public Health. “Bringing this desire to fruition requires a set of community-driven standards for describing environmental exposures and linking them to human health and disease-related data.”
Cheung’s co-investigators at Yale include Nicole Deziel, PhD, MHS, associate professor of epidemiology; Vasilis Vasiliou, PhD, Susan Dwight Bliss Professor of Epidemiology; and Hua Xu, PhD, FACMI, Robert T. McCluskey Professor of Biomedical Informatics and Data Science. Mark Musen, professor of biomedical informatics at Stanford University, is also a co-investigator.
The grant will provide $600,000 annually for the next five years.