Kei-Hoi Cheung, PhD
Professor of Biomedical Informatics & Data Science
Additional Titles
Professor, Biostatistics
Contact Info
About
Titles
Professor of Biomedical Informatics & Data Science
Professor, Biostatistics
Biography
Kei-Hoi Cheung, PhD has distinguished himself as a researcher and educator in the field of Biomedical Informatics with a growing national and international reputation. A particular strength is Dr. Cheung’s ability to forge strong, productive collaborations with a range of different bioscience researchers at Yale, in which his contributions include the development of complex databases and informatics tools that are critical for the research projects being performed. In the context of these collaborations, Dr. Cheung is simultaneously able to carry out his own informatics research on issues involved in robust interoperation and integration of databases and tools in the biosciences. In addition to giving talks and presentations at national and international meetings, he has published his own informatics research in peer-reviewed journals and conference proceedings, as well as contributing to publications focused on his collaborators’ research. He has established a broad base of collaborations with faculty in different departments at Yale, including Genetics, Pathology, Computer Science, Biostatistics, Molecular Biophysics and Biochemistry, and Biology. He was Director of Biostatistics and Bioinformatics Core of the NIDA Proteomics Center, focused on collaborative informatics support of neuroproteomics research at Yale. In addition to being a collaborator on numerous grants, Dr. Cheung has been PI on several federal grants (NIH and NSF). Dr. Cheung is also a core faculty member of Yale's Ph.D. Program in Computational Biology and Bioinformatics.
Dr. Cheung’s research interests include the Semantic Web, using the next generation of web technologies to integrate life science data and tools. He is co-editor of two books published by Springer-Verlag: “Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences” and “Semantic e-Science”. Dr. Cheung also led the BioRDF task force (2008-2010) of the Semantic Web for Health Care and Life Sciences Interest Group, an international community engaged in the creative use of the Semantic Web in biomedicine. In addition, Dr. Cheung has recently embarked on natural language processing (NLP) projects in annotating, extracting and retrieving information from clinical text as part of the Veterans Administration (VA) electronic medical records. In summary, Dr. Cheung’s biomedical informatics expertise in database/semantic web research and NLP tool development, his national and international recognition as a researcher/educator, and his research contributions in these areas exemplify the attributes of a prominent researcher in biomedical informatics.
Appointments
- Biomedical Informatics & Data Science: Professor, Primary
- Biostatistics: Professor, Secondary
Other Departments & Organizations
- Alzheimer's Disease Research Center (ADRC)
- Biomedical Informatics & Data Science
- Biostatistics
- Center for Biomedical Data Science
- Computational Biology and Biomedical Informatics
- Computational Biology and Bioinformatics
- Emergency Medicine York Street Campus Faculty
- NIDA Neuroproteomics Center
- Proteomics
- Yale Combined Program in the Biological and Biomedical Sciences (BBS)
- Yale School of Public Health
- Yale Superfund Research Center
- Yale Ventures
Education & Training
- PhD
- University of Connecticut, Computer Science (1998)
Research
Overview
Ongoing Projects:
- Yale Protein Expression Database (YPED). YPED is an institution-wide database for use by proteomics researchers at Yale and elsewhere.
- Human Immunology Project Consortium (HIPC). Established by NIAID, HIPC generates a wide variety of phenotypic and molecular data from well-characterized patient cohorts, including genome-wide expression profiling, high-dimensional flow cytometry and serum cytokine concentrations. The adoption of and adherence to data standards are critical to enable data integration across HIPC centers and to facilitate data re-use by the wider scientific community. One key component of HIPC is its data standardization effort, along with the supporting infrastructure that has been developed.
- Center for Expanded Data Annotation and Retrieval (CEDAR). CEDAR is part of the Big Data to Knowledge (BD2K) initiative funded by NIH. It studies the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse.
- Clinical Natural Language Processing (NLP). To extract and retrieve information from large amounts of clinical notes (unstructured data) for facilitating clinical research, a variety of NLP techniques including the incorporation of ontologies have been explored in different domains including lung/colon cancer, post-traumatic stress disorder, psychogenic nonepileptic seizure, and chronic pain.
ORCID
0000-0001-6432-9372
Yale Co-Authors
- Hamada Hamid Altalib, DO, MPH, FAES
- Joseph Lucien Goulet, PhD, MS
- Benjamin Tolchin, MD, MS, FAAN (Neurology), FAES
- Caroline Zeiss, DACVP, DACLAM
- Cynthia Brandt, MD, MPH
- Michael Krauthammer, MD, PhD
Research Interests
- Natural Language Processing
Publications
2024
Chemical entity normalization for successful translational development of Alzheimer’s disease and dementia therapeutics
Mullin S, McDougal R, Cheung K, Kilicoglu H, Beck A, Zeiss C. Chemical entity normalization for successful translational development of Alzheimer’s disease and dementia therapeutics. Journal Of Biomedical Semantics 2024, 15: 13. PMID: 39080729, PMCID: PMC11290083, DOI: 10.1186/s13326-024-00314-1.
Peer-Reviewed Original Research
Concepts: Entity normalization; Chemical mentions; Natural language model; Dictionary-based methods; Dictionary-based approach; CRAFT corpus; Downstream tasks; Language model; Chemical Entities of Biological Interest; PubMedBERT model; Disambiguation; ChEBI; Downstream applications; Article abstracts; Relationship types; Mentions; PubMedBERT; Ontology; Dementia literature; Task; Dementia Cohort; Method; Entities; Accuracy; Dementia
2020
Psychosis and Seizures in the Veteran Population (4651)
Bornovski Y, Argraves S, Jackson-Shaheed E, Tolchin B, Goulet J, Cheung K, Hitchins A, Altalib H. Psychosis and Seizures in the Veteran Population (4651). Neurology 2020, 94. DOI: 10.1212/wnl.94.15_supplement.4651.
Peer-Reviewed Original Research
2018
Preliminary Report of Psychogenic Non-Epileptic Seizure Diagnosis Among Veterans From 2004–2014 (P6.275)
Khan A, Proops N, Flaherty J, Fenton B, Pugh M, Cheung K, Goulet J, Brandt C, Altalib H. Preliminary Report of Psychogenic Non-Epileptic Seizure Diagnosis Among Veterans From 2004–2014 (P6.275). Neurology 2018, 90. DOI: 10.1212/wnl.90.15_supplement.p6.275.
Peer-Reviewed Original Research
2013
Utilizing protein structure to identify non-random somatic mutations
Ryslik GA, Cheng Y, Cheung KH, Modis Y, Zhao H. Utilizing protein structure to identify non-random somatic mutations. BMC Bioinformatics 2013, 14: 190. PMID: 23758891, PMCID: PMC3691676, DOI: 10.1186/1471-2105-14-190.
Peer-Reviewed Original Research
Concepts: Protein Data Bank; Protein structure; Somatic mutations; Tertiary structure; Dimensional protein structure; Protein tertiary structure; Tertiary protein structure; Driver mutations; Mutation clusters; Tumor suppressor; Protein; Mutational clustering; Pharmacological success; Mutations; Cancer proteins; Novel cluster; Mutational clusters; Oncogene; Single strands; R package; Data Bank; Current methodologies; Genome; EIF2AK2; Suppressor
2012
A semantic web framework to integrate cancer omics data with biological knowledge
Holford ME, McCusker JP, Cheung KH, Krauthammer M. A semantic web framework to integrate cancer omics data with biological knowledge. BMC Bioinformatics 2012, 13: s10. PMID: 22373303, PMCID: PMC3471346, DOI: 10.1186/1471-2105-13-s1-s10.
Peer-Reviewed Original Research
Concepts: Biological knowledge; Omics data; Semantic model; Fundamental biological knowledge; Gene ontology data; Cancer omics data; Epigenomic data; Regulatory networks; Semantic Web technologies; Transcription factors; Semantic Web framework; Unified data source; Gene promoter; Demethylating agent; Apoptosis pathway; Anti-cancer therapy; Semantic Web; SPARQL endpoints; Web technologies; RDF triples; Web framework; Ontology data; Data warehouse; Uniform interface; Reasoning tools
2010
Structured digital tables on the Semantic Web: toward a structured digital literature
Cheung KH, Samwald M, Auerbach RK, Gerstein MB. Structured digital tables on the Semantic Web: toward a structured digital literature. Molecular Systems Biology 2010, 6: msb201045. PMID: 20739925, PMCID: PMC2950080, DOI: 10.1038/msb.2010.45.
Peer-Reviewed Original Research
2007
AlzPharm: integration of neurodegeneration data using RDF
Lam H, Marenco L, Clark T, Gao Y, Kinoshita J, Shepherd G, Miller P, Wu E, Wong GT, Liu N, Crasto C, Morse T, Stephens S, Cheung KH. AlzPharm: integration of neurodegeneration data using RDF. BMC Bioinformatics 2007, 8: s4. PMID: 17493287, PMCID: PMC1892101, DOI: 10.1186/1471-2105-8-s3-s4.
Peer-Reviewed Original Research
MeSH Keywords: Brain; Database Management Systems; Databases, Factual; Documentation; Humans; Information Dissemination; Information Storage and Retrieval; Internationality; Internet; Natural Language Processing; Nerve Tissue Proteins; Neurodegenerative Diseases; Neurosciences; Pilot Projects; Research; Research Design; Semantics; Systems Integration
Concepts: Resource Description Framework; Semantic Web approach; Web approach; Data model; Data sets; RDF data model; Standard data model; Heterogeneous data sets; Data integration approach; Data of interest; Multiple research domains; RDF Schema; Description Framework; Domain ontology; Advanced queries; Oracle database; Description language; Web interface; Biomedical data; Data integration; Seamless integration; Neuroscience data; Ontological structure; Research domain; Neuroscience researchers
2005
YeastHub: a semantic web use case for integrating data in the life sciences domain
Cheung KH, Yip KY, Smith A, Deknikker R, Masiar A, Gerstein M. YeastHub: a semantic web use case for integrating data in the life sciences domain. Bioinformatics 2005, 21: i85-i96. PMID: 15961502, DOI: 10.1093/bioinformatics/bti1026.
Peer-Reviewed Original Research
Concepts: Resource Description Framework; RDF Site Summary; Data warehouse; Data integration; Prototype web-based application; Semantic Web technologies; RDF data stores; Web-based application; RDF format; Semantic Web; Web technologies; Data stores; Description Framework; RDF mapping; RDF structure; Use cases; Data repository; Site Summary; Biological datasets; Different formats; Integration needs; Warehouse; Different resources; Queries; Tabular format
Academic Achievements & Community Involvement
Activity: SenseLab Project
Research
01/01/2006 - 01/01/2006
Austria; China; South Korea
Abstract/Synopsis: This is a neuroscience informatics project involving the exploration of Semantic Web technologies in neuroscience data integration. In 2006 a visiting scholar from Zhejiang University, Hangzhou, Zhejiang, China came to YCMI to collaborate with Dr. Cheung on this neuroinformatics project. A PhD student from University of Vienna, Austria visited YCMI to work with Prof. Cheung as a summer intern on the SenseLab/Semantic Web project. It appears likely that a professor from the Catholic University of Korea, Seoul, South Korea will join YCMI as a visiting scholar to collaborate with Prof. Cheung on the SenseLab/Semantic Web project.
URL: http://senselab.med.yale.edu
News & Links
News
- May 15, 2024
Cheung Receives NIH Grant to Research Water Contaminants and Human Health
- October 02, 2023
What Does Natural Language Processing Mean for Biomedicine?
- October 03, 2018 (Source: Yale Daily News)
School of Public Health to offer new degree
- October 24, 2005
Research on Premature Birth Boosted with $10 Million NIH Grant