Hua Xu, PhD
Appointments
Additional Titles
Vice Chair for Research and Development, Department of Biomedical Informatics and Data Science
Assistant Dean for Biomedical Informatics, Yale School of Medicine
Contact Info
Biomedical Informatics & Data Science
100 College St
New Haven, Connecticut 06510
United States
About
Titles
Robert T. McCluskey Professor of Biomedical Informatics and Data Science
Vice Chair for Research and Development, Department of Biomedical Informatics and Data Science; Assistant Dean for Biomedical Informatics, Yale School of Medicine
Biography
Dr. Hua Xu is a well-known researcher in clinical natural language processing (NLP). He has developed novel algorithms for key clinical NLP tasks such as entity recognition and relation extraction, which have ranked at the top in more than a dozen international biomedical NLP challenges. His lab developed CLAMP, a comprehensive clinical NLP toolkit that has been commercialized and is used by hundreds of healthcare organizations. He has also led multiple national and international initiatives (e.g., as Chair of the NLP Working Group of the Observational Health Data Sciences and Informatics (OHDSI) program) that apply these NLP technologies to diverse clinical and translational studies, accelerating the generation of clinical evidence from electronic health record data. More recently, he has used NLP to harmonize the metadata of biomedical digital objects (e.g., indexing millions of biomedical datasets to make them findable), with the goal of promoting the FAIR principles in biomedicine. Dr. Xu's lab is currently developing large language models for diverse biomedical applications. More information is available on the Clinical NLP Lab website.
Appointments
Biomedical Informatics & Data Science
Professor (Primary)
Other Departments & Organizations
- Biomedical Informatics & Data Science
- Computational Biology and Biomedical Informatics
- Wu Tsai Institute
- Yale Biomedical Informatics & Computing
- Yale Combined Program in the Biological and Biomedical Sciences (BBS)
Education & Training
- PhD
- Columbia University, Biomedical Informatics
- MS
- New Jersey Institute of Technology, Computer Science
- BS
- Nanjing University, Biochemistry
Research
Overview
Medical Research Interests
ORCID
0000-0001-9730-7276
Lab Website: Clinical NLP Lab
Research at a Glance
Yale Co-Authors
Lucila Ohno-Machado, MD, MBA, PhD
Vipina K. Keloth, PhD
Qingyu Chen, PhD
Tsung-Ting Kuo, PhD
Huan He, PhD
Jihoon Kim, PhD
Research Interests
Natural Language Processing
Publications
2024
OncoSplicing 3.0: an updated database for identifying RBPs regulating alternative splicing events in cancers
Zhang Y, Liu K, Xu Z, Li B, Wu X, Fan R, Yao X, Wu H, Duan C, Gong Y, Chen K, Zeng J, Li L, Xu H. OncoSplicing 3.0: an updated database for identifying RBPs regulating alternative splicing events in cancers. Nucleic Acids Research 2024, gkae1098. PMID: 39558172, DOI: 10.1093/nar/gkae1098.
Peer-Reviewed Original Research
Sirtuin1 Suppresses Calcium Oxalate Nephropathy via Inhibition of Renal Proximal Tubular Cell Ferroptosis Through PGC‐1α‐mediated Transcriptional Coactivation
Duan C, Li B, Liu H, Zhang Y, Yao X, Liu K, Wu X, Mao X, Wu H, Xu Z, Zhong Y, Hu Z, Gong Y, Xu H. Sirtuin1 Suppresses Calcium Oxalate Nephropathy via Inhibition of Renal Proximal Tubular Cell Ferroptosis Through PGC‐1α‐mediated Transcriptional Coactivation. Advanced Science 2024, e2408945. PMID: 39498889, DOI: 10.1002/advs.202408945.
Peer-Reviewed Original Research
SEETrials: Leveraging large language models for safety and efficacy extraction in oncology clinical trials
Lee K, Paek H, Huang L, Hilton C, Datta S, Higashi J, Ofoegbu N, Wang J, Rubinstein S, Cowan A, Kwok M, Warner J, Xu H, Wang X. SEETrials: Leveraging large language models for safety and efficacy extraction in oncology clinical trials. Informatics in Medicine Unlocked 2024, 50: 101589. PMID: 39493413, PMCID: PMC11530223, DOI: 10.1016/j.imu.2024.101589.
Peer-Reviewed Original Research
Improving tabular data extraction in scanned laboratory reports using deep learning models
Li Y, Wei Q, Chen X, Li J, Tao C, Xu H. Improving tabular data extraction in scanned laboratory reports using deep learning models. Journal of Biomedical Informatics 2024, 159: 104735. PMID: 39393477, DOI: 10.1016/j.jbi.2024.104735.
Peer-Reviewed Original Research
Augmenting biomedical named entity recognition with general-domain resources
Yin Y, Kim H, Xiao X, Wei C, Kang J, Lu Z, Xu H, Fang M, Chen Q. Augmenting biomedical named entity recognition with general-domain resources. Journal of Biomedical Informatics 2024, 159: 104731. PMID: 39368529, DOI: 10.1016/j.jbi.2024.104731.
Peer-Reviewed Original Research
Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study
Yang R, Zeng Q, You K, Qiao Y, Huang L, Hsieh C, Rosand B, Goldwasser J, Dave A, Keenan T, Ke Y, Hong C, Liu N, Chew E, Radev D, Lu Z, Xu H, Chen Q, Li I. Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study. Journal of Medical Internet Research 2024, 26: e60601. PMID: 39361955, PMCID: PMC11487205, DOI: 10.2196/60601.
Peer-Reviewed Original Research
Relation extraction using large language models: a case study on acupuncture point locations
Li Y, Peng X, Li J, Zuo X, Peng S, Pei D, Tao C, Xu H, Hong N. Relation extraction using large language models: a case study on acupuncture point locations. Journal of the American Medical Informatics Association 2024, 31: 2622-2631. PMID: 39208311, PMCID: PMC11491641, DOI: 10.1093/jamia/ocae233.
Peer-Reviewed Original Research
Balancing the efforts of chart review and gains in PRS prediction accuracy: An empirical study
Lei Y, Christian Naj A, Xu H, Li R, Chen Y. Balancing the efforts of chart review and gains in PRS prediction accuracy: An empirical study. Journal of Biomedical Informatics 2024, 157: 104705. PMID: 39134233, DOI: 10.1016/j.jbi.2024.104705.
Peer-Reviewed Original Research
Re: Iver Nordentoft, Sia Viborg Lindskrog, Karin Birkenkamp-Demtröder, et al. Whole-genome Mutational Analysis for Tumor-informed Detection of Circulating Tumor DNA in Patients with Urothelial Carcinoma. Eur Urol. In press. https://doi.org/10.1016/j.eururo.2024.05.014
Wu X, Yao X, Chen Z, Xu H. Re: Iver Nordentoft, Sia Viborg Lindskrog, Karin Birkenkamp-Demtröder, et al. Whole-genome Mutational Analysis for Tumor-informed Detection of Circulating Tumor DNA in Patients with Urothelial Carcinoma. Eur Urol. In press. https://doi.org/10.1016/j.eururo.2024.05.014. European Urology 2024. PMID: 39117526, DOI: 10.1016/j.eururo.2024.07.021.
Peer-Reviewed Original Research
Leveraging error-prone algorithm-derived phenotypes: Enhancing association studies for risk factors in EHR data
Lu Y, Tong J, Chubak J, Lumley T, Hubbard R, Xu H, Chen Y. Leveraging error-prone algorithm-derived phenotypes: Enhancing association studies for risk factors in EHR data. Journal of Biomedical Informatics 2024, 157: 104690. PMID: 39004110, DOI: 10.1016/j.jbi.2024.104690.
Peer-Reviewed Original Research
News
- December 11, 2024 (Source: WFSB)
Researchers at Yale Working with A.I. to Make Getting a Second Opinion Easier
- November 25, 2024 (Source: WTNH)
How A.I. is Already Predicting Medical Outcomes in Connecticut
- October 21, 2024
Yale BIDS Presenting at the AMIA 2024 Annual Symposium
- October 02, 2024
NIH Awards $1.5 Million Grant to Improve Factual Correctness in Large Language Models in Health Care