Hua Xu, PhD
Appointments
Additional Titles
Vice Chair for Research and Development, Department of Biomedical Informatics and Data Science
Associate Dean for Biomedical Informatics, Yale School of Medicine
Director, CBB MS Program, Biomedical Informatics & Data Science
Professor, Computer Science
Contact Info
Biomedical Informatics & Data Science
101 College St
New Haven, Connecticut 06510
United States
About
Titles
Robert T. McCluskey Professor of Biomedical Informatics and Data Science
Vice Chair for Research and Development, Department of Biomedical Informatics and Data Science; Associate Dean for Biomedical Informatics, Yale School of Medicine; Director, CBB MS Program, Biomedical Informatics & Data Science; Professor, Computer Science
Biography
Dr. Hua Xu is the Robert T. McCluskey Professor and Vice Chair for Research and Development in the Department of Biomedical Informatics and Data Science at Yale School of Medicine (YSM). He also serves as Associate Dean for Biomedical Informatics at YSM. He received his Ph.D. in Biomedical Informatics from Columbia University. His primary research interests include biomedical natural language processing (NLP), large language models (LLMs), and AI agents, as well as their applications in clinical practice and biomedical research. His research is funded by multiple agencies, including NLM, NCI, NIGMS, NIA, AHA, and CPRIT, and methods and tools developed in his lab have been widely used to support diverse biomedical applications. Dr. Xu is a fellow of both the American College of Medical Informatics (ACMI) and the International Academy of Health Sciences Informatics (IAHSI). More information about Dr. Xu's lab is available on the Clinical NLP Lab website.
Appointments
- Biomedical Informatics & Data Science: Professor (Primary)
- Computer Science: Professor (Secondary)
Other Departments & Organizations
- Biomedical Informatics & Data Science
- Clinical NLP Lab
- Computational Biology and Biomedical Informatics
- Computer Science
- Wu Tsai Institute
- Yale Biomedical Informatics & Computing
- Yale Combined Program in the Biological and Biomedical Sciences (BBS)
Education & Training
- PhD, Columbia University, Biomedical Informatics
- MS, New Jersey Institute of Technology, Computer Science
- BS, Nanjing University, Biochemistry
Research
Overview
Medical Research Interests
ORCID: 0000-0002-5274-4672
Lab Website: Clinical NLP Lab
Research at a Glance
Yale Co-Authors
- Lucila Ohno-Machado, MD, MBA, PhD
- Vipina K. Keloth, PhD
- Huan He, PhD
- Qingyu Chen, PhD
- Na Hong, PhD
- Harlan Krumholz, MD, SM
Research Interests
- Natural Language Processing
Publications
Featured Publications
Benchmarking large language models for biomedical natural language processing applications and recommendations
Chen Q, Hu Y, Peng X, Xie Q, Jin Q, Gilson A, Singer M, Ai X, Lai P, Wang Z, Keloth V, Raja K, Huang J, He H, Lin F, Du J, Zhang R, Zheng W, Adelman R, Lu Z, Xu H. Benchmarking large language models for biomedical natural language processing applications and recommendations. Nature Communications 2025, 16: 3280. PMID: 40188094, PMCID: PMC11972378, DOI: 10.1038/s41467-025-56989-2. Peer-Reviewed Original Research.
Concepts: Language model; Natural language processing applications; Biomedical natural language processing; Medical question answering; Language processing applications; Natural language processing; Growth of biomedical literature; Missing information; Few-shot; Question Answering; Zero-Shot; Knowledge curation; Language processing; Processing applications; BioNLP; BART model; Performance gap; Biomedical literature; General domain; Task; Benchmarks; BERT; Information; Performance; LLM
Medical foundation large language models for comprehensive text analysis and beyond
Xie Q, Chen Q, Chen A, Peng C, Hu Y, Lin F, Peng X, Huang J, Zhang J, Keloth V, Zhou X, Qian L, He H, Shung D, Ohno-Machado L, Wu Y, Xu H, Bian J. Medical foundation large language models for comprehensive text analysis and beyond. Npj Digital Medicine 2025, 8: 141. PMID: 40044845, PMCID: PMC11882967, DOI: 10.1038/s41746-025-01533-1. Peer-Reviewed Original Research.
Concepts: Text analysis tasks; Analysis tasks; Language model; Domain-specific knowledge; Zero-Shot; Human evaluation; Supervised setting; Task-specific instructions; Clinical data sources; Specialized medical knowledge; ChatGPT; Text analysis; Pretraining; Task; Data sources; Medical applications; Medical knowledge; Enhanced performance; Text; Performance
Improving large language models for clinical named entity recognition via prompt engineering
Hu Y, Chen Q, Du J, Peng X, Keloth V, Zuo X, Zhou Y, Li Z, Jiang X, Lu Z, Roberts K, Xu H. Improving large language models for clinical named entity recognition via prompt engineering. Journal Of The American Medical Informatics Association 2024, 31: 1812-1820. PMID: 38281112, PMCID: PMC11339492, DOI: 10.1093/jamia/ocad259. Peer-Reviewed Original Research.
Concepts: Clinical NER tasks; NER task; Task-specific prompts; Entity recognition; Language model; Training samples; State-of-the-art models; Few-shot learning; State-of-the-art; Minimal training data; Task-specific knowledge; F1-score; Annotated samples; Concept extraction; Model performance; Annotated datasets; Training data; F1 score; Task description; Format specifications; Complex clinical data; Optimal performance; Task; Evaluation schema; GPT model
BiomedRAG: A retrieval augmented large language model for biomedicine
Li M, Kilicoglu H, Xu H, Zhang R. BiomedRAG: A retrieval augmented large language model for biomedicine. Journal Of Biomedical Informatics 2025, 162: 104769. PMID: 39814274, PMCID: PMC11837810, DOI: 10.1016/j.jbi.2024.104769. Peer-Reviewed Original Research.
2026
A birth certificate for data to improve findability, accountability, and traceability
Li R, Das A, Yang Y, Li Z, Hong N, Xu H, Martone M, Zheng W. A birth certificate for data to improve findability, accountability, and traceability. NAR Genomics And Bioinformatics 2026, 8: lqag037. PMID: 41972008, PMCID: PMC13069677, DOI: 10.1093/nargab/lqag037. Peer-Reviewed Original Research.
Concepts: Advancement of artificial intelligence; Provenance information; Data creators; Artificial intelligence; Information management; Scientific domains; Data generation; Research reproducibility; Data stewardship; Data tracking; Birth certificates; Data; Findability; Traceability; Data quality; Identified system; Robust mechanism; Identifiers; Interoperability; Universal framework; Metadata; Framework; Intelligence; Certification
GLP-1 Receptor Agonist Prescriptions for Adolescents With Obesity and Associated Disparities
Kim C, Sharifi M, Ross J, Chen Y, Xu H, Krumholz H, Lu Y. GLP-1 Receptor Agonist Prescriptions for Adolescents With Obesity and Associated Disparities. JAMA Pediatrics 2026, 180: 334-336. PMID: 41557442, PMCID: PMC12820773, DOI: 10.1001/jamapediatrics.2025.5708. Peer-Reviewed Original Research.
Threading the needle: Practical considerations for merging theory-driven computational psychiatry with data-driven analytics to enhance precision health at scale
Cheng A, Konova A, Powers A, Corlett P, Levy I, Gu X, Huys Q, Pushkarskya H, Fineberg S, Hauser T, Bzdok D, Harpaz-Rotem I, Babuscio T, Nichols L, Zhao Y, Sharma M, Meeker D, Xu H, Rutledge R, Pearlson G, Pittenger C, Yip S. Threading the needle: Practical considerations for merging theory-driven computational psychiatry with data-driven analytics to enhance precision health at scale. Biological Psychiatry Cognitive Neuroscience And Neuroimaging 2026. PMID: 41763489, DOI: 10.1016/j.bpsc.2026.02.009. Peer-Reviewed Original Research.
Concepts: Computational psychiatry; Longitudinal trajectories; Longitudinal data; Parsing heterogeneity; Diagnostic boundaries; Behavioral tasks; Psychiatric disorders; Psychiatric diagnostics; Dimensional approach; Longitudinal course; Symptom trajectories; Cognitive processes; Diagnostic categories; Psychiatry; Individual changes; Underlying mechanisms; Clinical research methods; Clinical research; HiTOP; RDoC; Individuals; Clinical reality; Disorders; Symptoms; Task
A suite of large language models for public health infoveillance
Zhou X, Zhou J, Wang C, Xie Q, Ding K, Mao C, Liu Y, Cao Z, Chu H, Chen X, Xu H, Larson H, Luo Y. A suite of large language models for public health infoveillance. Npj Digital Medicine 2026, 9: 270. PMID: 41731011, DOI: 10.1038/s41746-026-02435-6. Peer-Reviewed Original Research.
Concepts: Language model; Zero-shot performance; State-of-the-art; Training corpus; English tasks; Multilingual task; Social media; Multilingual capabilities; Health monitoring; Cost-effective solution; Public engagement; Language; Public sentiment; Task; Health issues; Infoveillance; Public health issue; LoRa; English; Public health monitoring; Corpus; Health interventions
CancerLLM: a large language model in cancer domain
Li M, Zhan Z, Huang J, Yeung J, Ding K, Blaes A, Johnson S, Liu H, Xu H, Zhang R. CancerLLM: a large language model in cancer domain. Npj Digital Medicine 2026, 9: 266. PMID: 41720895, PMCID: PMC13036061, DOI: 10.1038/s41746-026-02441-8. Peer-Reviewed Original Research.
Concepts: Language model; Average F1 score improvement; F1 score improvement; Phenotype extraction; Cancer domain; GPU usage; NLP tasks; F1 score; Diagnosis generation; Generation task; Robust solution; Computational burden; Healthcare settings; Lack models; Clinical notes; Task; Score improvement; GPU; Clinical research; International benchmarks; Cancer types; Benchmarks; Domain
Leveraging multi-modal foundation models for analysing spatial multi-omic and histopathology data
Liu T, Huang T, Ding T, Wu H, Humphrey P, Perincheri S, Schalper K, Ying R, Xu H, Zou J, Mahmood F, Zhao H. Leveraging multi-modal foundation models for analysing spatial multi-omic and histopathology data. Nature Biomedical Engineering 2026, 1-18. PMID: 41644824, DOI: 10.1038/s41551-025-01602-6. Peer-Reviewed Original Research.
Concepts: Multi-omics analysis; Multi-omics technologies; Spatial biology; Multi-Omics; Spatial domain identification; Biological discovery; Interaction inference; Multi-modal representation; Tissue context; Downstream tasks; Language model; Histopathological images; Data types; Protein expression; Spatial multi-omics; Disease prediction; Computational framework; Complementary information; Domain identification; Single-mode model; Foundation model; Genes; Medical reports; Protein
Academic Achievements & Community Involvement
News
- March 24, 2026
AI in Cancer Workshop: Advancing Precision Medicine through Interdisciplinary Innovation
- March 04, 2026
AI in Medicine: Collaborating on Challenges and Opportunities
- September 16, 2025 (Source: NIH)
Yale Team Recognized in NIH $1 Million Data Sharing Challenge
- July 01, 2025
Hua Xu, PhD, Receives NIH Supplement to Advance Mental Health Research
Get In Touch
Contacts
Biomedical Informatics & Data Science
101 College St
New Haven, Connecticut 06510
United States
Events
- Everyone: Qingxia "Cindy" Chen, PhD
- Everyone: Speakers to be announced.