Hua Xu, PhD
he/him/his
Robert T. McCluskey Professor of Biomedical Informatics and Data Science; Vice Chair for Research and Development, Department of Biomedical Informatics and Data Science; Assistant Dean for Biomedical Informatics, Yale School of Medicine
Research & Publications
Selected Publications
- Xu H, Demner Fushman D. Development of Clinical NLP Systems. 2024, 301-324. DOI: 10.1007/978-3-031-55865-8_11.
- Xu H, Demner Fushman D, Hong N, Raja K. Medical Concept Normalization. 2024, 137-164. DOI: 10.1007/978-3-031-55865-8_6.
- Demner Fushman D, Xu H. Introduction to Natural Language Processing of Clinical Text. 2024, 3-11. DOI: 10.1007/978-3-031-55865-8_1.
- Roberts K, Xu H, Demner Fushman D. NLP Applications—Other Biomedical Texts. 2024, 429-444. DOI: 10.1007/978-3-031-55865-8_15.
- Sahoo S, Plasek J, Xu H, Uzuner Ö, Cohen T, Yetisgen M, Liu H, Meystre S, Wang Y. Large language models for biomedicine: foundations, opportunities, challenges, and best practices. Journal Of The American Medical Informatics Association 2024, ocae074. PMID: 38657567, DOI: 10.1093/jamia/ocae074.
- Xiao Y, Hou Y, Zhou H, Diallo G, Fiszman M, Wolfson J, Zhou L, Kilicoglu H, Chen Y, Su C, Xu H, Mantyh W, Zhang R. Repurposing non-pharmacological interventions for Alzheimer's disease through link prediction on biomedical literature. Scientific Reports 2024, 14: 8693. PMID: 38622164, PMCID: PMC11018822, DOI: 10.1038/s41598-024-58604-8.
- Li Z, Wei Q, Huang L, Li J, Hu Y, Chuang Y, He J, Das A, Keloth V, Yang Y, Diala C, Roberts K, Tao C, Jiang X, Zheng W, Xu H. Ensemble pretrained language models to extract biomedical knowledge from literature. Journal Of The American Medical Informatics Association 2024, ocae061. PMID: 38520725, DOI: 10.1093/jamia/ocae061.
- Li Z, Lan L, Zhou Y, Li R, Chavin K, Xu H, Li L, Shih D, Zheng W. Developing deep learning-based strategies to predict the risk of hepatocellular carcinoma among patients with nonalcoholic fatty liver disease from electronic health records. Journal Of Biomedical Informatics 2024, 152: 104626. PMID: 38521180, DOI: 10.1016/j.jbi.2024.104626.
- Keloth V, Hu Y, Xie Q, Peng X, Wang Y, Zheng A, Selek M, Raja K, Wei C, Jin Q, Lu Z, Chen Q, Xu H. Advancing entity recognition in biomedicine via instruction tuning of large language models. Bioinformatics 2024, 40: btae163. PMID: 38514400, PMCID: PMC11001490, DOI: 10.1093/bioinformatics/btae163.
- Huang Y, Guo J, Chen W, Lin H, Tang H, Wang F, Xu H, Bian J. A scoping review of fair machine learning techniques when using real-world data. Journal Of Biomedical Informatics 2024, 151: 104622. PMID: 38452862, PMCID: PMC11146346, DOI: 10.1016/j.jbi.2024.104622.
- Fu S, Jia H, Vassilaki M, Keloth V, Dang Y, Zhou Y, Garg M, Petersen R, St Sauver J, Moon S, Wang L, Wen A, Li F, Xu H, Tao C, Fan J, Liu H, Sohn S. FedFSA: Hybrid and federated framework for functional status ascertainment across institutions. Journal Of Biomedical Informatics 2024, 152: 104623. PMID: 38458578, PMCID: PMC11005095, DOI: 10.1016/j.jbi.2024.104623.
- Li Y, Tao W, Li Z, Sun Z, Li F, Fenton S, Xu H, Tao C. Artificial intelligence-powered pharmacovigilance: A review of machine and deep learning in clinical text-based adverse drug event detection for benchmark datasets. Journal Of Biomedical Informatics 2024, 152: 104621. PMID: 38447600, DOI: 10.1016/j.jbi.2024.104621.
- He J, Li F, Li J, Hu X, Nian Y, Xiang Y, Wang J, Wei Q, Li Y, Xu H, Tao C. Prompt Tuning in Biomedical Relation Extraction. Journal Of Healthcare Informatics Research 2024, 8: 206-224. PMID: 38681754, PMCID: PMC11052745, DOI: 10.1007/s41666-024-00162-9.
- Hu Y, Chen Q, Du J, Peng X, Keloth V, Zuo X, Zhou Y, Li Z, Jiang X, Lu Z, Roberts K, Xu H. Improving large language models for clinical named entity recognition via prompt engineering. Journal Of The American Medical Informatics Association 2024, ocad259. PMID: 38281112, DOI: 10.1093/jamia/ocad259.
- Zuo X, Zhou Y, Duke J, Hripcsak G, Shah N, Banda J, Reeves R, Miller T, Waitman L, Natarajan K, Xu H. Standardizing Multi-site Clinical Note Titles to LOINC Document Ontology: A Transformer-based Approach. AMIA Annual Symposium Proceedings 2024, 2023: 834-843. PMID: 38222429, PMCID: PMC10785935.
- Khan H, Mosa A, Paka V, Rana M, Mandhadi V, Islam S, Xu H, McClay J, Sarker S, Rao P, Waitman L. Mapping Clinical Documents to the Logical Observation Identifiers, Names and Codes (LOINC) Document Ontology using Electronic Health Record Systems Structured Metadata. AMIA Annual Symposium Proceedings 2024, 2023: 1017-1026. PMID: 38222329, PMCID: PMC10785913.
- Tong J, Luo C, Sun Y, Duan R, Saine M, Lin L, Peng Y, Lu Y, Batra A, Pan A, Wang O, Li R, Marks-Anglin A, Yang Y, Zuo X, Liu Y, Bian J, Kimmel S, Hamilton K, Cuker A, Hubbard R, Xu H, Chen Y. Confidence score: a data-driven measure for inclusive systematic reviews considering unpublished preprints. Journal Of The American Medical Informatics Association 2023, 31: 809-819. PMID: 38065694, PMCID: PMC10990515, DOI: 10.1093/jamia/ocad248.
- Ciciora D, Vásquez E, Valachovic E, Hou L, Zheng Y, Xu H, Jiang X, Huang K, Gabriel K, Deng H, Gallant M, Zhang K. Social and Behavior Factors of Alzheimer's Disease and Related Dementias: A National Study in the U.S. American Journal Of Preventive Medicine 2023, 66: 573-581. PMID: 37995949, DOI: 10.1016/j.amepre.2023.11.017.
- Zhou H, Austin R, Lu S, Silverman G, Zhou Y, Kilicoglu H, Xu H, Zhang R. Complementary and Integrative Health Information in the literature: its lexicon and named entity recognition. Journal Of The American Medical Informatics Association 2023, 31: 426-434. PMID: 37952122, PMCID: PMC10797266, DOI: 10.1093/jamia/ocad216.
- Yu P, Xu H, Hu X, Deng C. Leveraging Generative AI and Large Language Models: A Comprehensive Roadmap for Healthcare Integration. Healthcare 2023, 11: 2776. PMID: 37893850, PMCID: PMC10606429, DOI: 10.3390/healthcare11202776.
- Hu Y, Chen Y, Xu H. Towards More Generalizable and Accurate Sentence Classification in Medical Abstracts with Less Data. Journal Of Healthcare Informatics Research 2023, 7: 542-556. PMID: 37927376, PMCID: PMC10620359, DOI: 10.1007/s41666-023-00141-6.
- Mayo K, Basford M, Carroll R, Dillon M, Fullen H, Leung J, Master H, Rura S, Sulieman L, Kennedy N, Banks E, Bernick D, Gauchan A, Lichtenstein L, Mapes B, Marginean K, Nyemba S, Ramirez A, Rotundo C, Wolfe K, Xia W, Azuine R, Cronin R, Denny J, Kho A, Lunt C, Malin B, Natarajan K, Wilkins C, Xu H, Hripcsak G, Roden D, Philippakis A, Glazer D, Harris P. The All of Us Data and Research Center: Creating a Secure, Scalable, and Sustainable Ecosystem for Biomedical Research. Annual Review Of Biomedical Data Science 2023, 6: 443-464. PMID: 37561600, PMCID: PMC11157478, DOI: 10.1146/annurev-biodatasci-122120-104825.
- Xiao Y, Hou Y, Zhou H, Diallo G, Fiszman M, Wolfson J, Kilicoglu H, Chen Y, Xu H, Mantyh W, Zhang R. Repurposing Drugs for Alzheimer's Diseases through Link Prediction on Biomedical Literature. 2023, 00: 750-752. DOI: 10.1109/ichi57859.2023.00137.
- Li Z, Ameer I, Hu Y, Abdelhameed A, Tao C, Selek S, Xu H. Suicide Tendency Prediction from Psychiatric Notes Using Transformer Models. 2023, 00: 481-483. DOI: 10.1109/ichi57859.2023.00074.
- Li Y, Peng X, Li J, Peng S, Pei D, Tao C, Xu H, Hong N. Development of a Natural Language Processing Tool to Extract Acupuncture Point Location Terms. 2023, 00: 344-351. DOI: 10.1109/ichi57859.2023.00053.
- Dang Y, Li F, Hu X, Keloth V, Zhang M, Fu S, Amith M, Fan J, Du J, Yu E, Liu H, Jiang X, Xu H, Tao C. Systematic design and data-driven evaluation of social determinants of health ontology (SDoHO). Journal Of The American Medical Informatics Association 2023, 30: 1465-1473. PMID: 37301740, PMCID: PMC10436148, DOI: 10.1093/jamia/ocad096.
- Mohtashamian M, Hu R, Abeysinghe R, Hao X, Xu H, Cui L. Automated Identification of Missing IS-A Relations in the Human Phenotype Ontology. AMIA Annual Symposium Proceedings 2023, 2022: 785-794. PMID: 37128366, PMCID: PMC10148310.
- Keloth V, Banda J, Gurley M, Heider P, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, V Patterson O, Peng Y, Raja K, Reeves R, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei W, Williams A, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, Xu H. Representing and utilizing clinical textual data for real world studies: An OHDSI approach. Journal Of Biomedical Informatics 2023, 142: 104343. PMID: 36935011, PMCID: PMC10428170, DOI: 10.1016/j.jbi.2023.104343.
- Chen Z, Zhang H, Yang X, Wu S, He X, Xu J, Guo J, Prosperi M, Wang F, Xu H, Chen Y, Hu H, DeKosky S, Farrer M, Guo Y, Wu Y, Bian J. Assess the documentation of cognitive tests and biomarkers in electronic health records via natural language processing for Alzheimer’s disease and related dementias. International Journal Of Medical Informatics 2022, 170: 104973. PMID: 36577203, DOI: 10.1016/j.ijmedinf.2022.104973.
- Wei Q, Zuo X, Anjum O, Hu Y, Denlinger R, Bernstam E, Citardi M, Xu H. ClinicalLayoutLM: A Pre-trained Multi-modal Model for Understanding Scanned Document in Electronic Health Records. 2022, 00: 2821-2827. DOI: 10.1109/bigdata55660.2022.10020569.
- Ramirez A, Sulieman L, Schlueter D, Halvorson A, Qian J, Ratsimbazafy F, Loperena R, Mayo K, Basford M, Deflaux N, Muthuraman K, Natarajan K, Kho A, Xu H, Wilkins C, Anton-Culver H, Boerwinkle E, Cicek M, Clark C, Cohn E, Ohno-Machado L, Schully S, Ahmedani B, Argos M, Cronin R, O’Donnell C, Fouad M, Goldstein D, Greenland P, Hebbring S, Karlson E, Khatri P, Korf B, Smoller J, Sodeke S, Wilbanks J, Hentges J, Mockrin S, Lunt C, Devaney S, Gebo K, Denny J, Carroll R, Glazer D, Harris P, Hripcsak G, Philippakis A, Roden D, Program T, Ahmedani B, Johnson C, Ahsan H, Antoine-LaVigne D, Singleton G, Anton-Culver H, Topol E, Baca-Motes K, Steinhubl S, Wade J, Begale M, Jain P, Sutherland S, Lewis B, Korf B, Behringer M, Gharavi A, Goldstein D, Hripcsak G, Bier L, Boerwinkle E, Brilliant M, Murali N, Hebbring S, Farrar-Edwards D, Burnside E, Drezner M, Taylor A, Channamsetty V, Montalvo W, Sharma Y, Chinea C, Jenks N, Cicek M, Thibodeau S, Holmes B, Schlueter E, Collier E, Winkler J, Corcoran J, D’Addezio N, Daviglus M, Winn R, Wilkins C, Roden D, Denny J, Doheny K, Nickerson D, Eichler E, Jarvik G, Funk G, Philippakis A, Rehm H, Lennon N, Kathiresan S, Gabriel S, Gibbs R, Rico E, Glazer D, Grand J, Greenland P, Harris P, Shenkman E, Hogan W, Igho-Pemu P, Pollan C, Jorge M, Okun S, Karlson E, Smoller J, Murphy S, Ross M, Kaushal R, Winford E, Wallace F, Khatri P, Kheterpal V, Ojo A, Moreno F, Kron I, Peterson R, Menon U, Lattimore P, Leviner N, Obedin-Maliver J, Lunn M, Malik-Gagnon L, Mangravite L, Marallo A, Marroquin O, Visweswaran S, Reis S, Marshall G, McGovern P, Mignucci D, Moore J, Munoz F, Talavera G, O'Connor G, O'Donnell C, Ohno-Machado L, Orr G, Randal F, Theodorou A, Reiman E, Roxas-Murray M, Stark L, Tepp R, Zhou A, Topper S, Trousdale R, Tsao P, Weidman L, Weiss S, Wellis D, Whittle J, Wilson A, Zuchner S, Zwick M. The All of Us Research Program: Data quality, utility, and diversity. Patterns 2022, 3: 100570. PMID: 36033590, PMCID: PMC9403360, DOI: 10.1016/j.patter.2022.100570.
- Hu Y, Chen Y, Xu H. Improving Sentence Classification in Abstracts of Randomized Controlled Trial using Prompt Learning. 2022, 00: 606-607. DOI: 10.1109/ichi54592.2022.00119.
- Luo C, Islam M, Sheils N, Buresh J, Reps J, Schuemie M, Ryan P, Edmondson M, Duan R, Tong J, Marks-Anglin A, Bian J, Chen Z, Duarte-Salles T, Fernández-Bertolín S, Falconer T, Kim C, Park R, Pfohl S, Shah N, Williams A, Xu H, Zhou Y, Lautenbach E, Doshi J, Werner R, Asch D, Chen Y. DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models. Nature Communications 2022, 13: 1678. PMID: 35354802, PMCID: PMC8967932, DOI: 10.1038/s41467-022-29160-4.
- Zhang Y, Yao X, Zhou H, Wu X, Tian J, Zeng J, Yan L, Duan C, Liu H, Li H, Chen K, Hu Z, Ye Z, Xu H. OncoSplicing: an updated database for clinically relevant alternative splicing in 33 human cancers. Nucleic Acids Research 2021, 50: d1340-d1347. PMID: 34554251, PMCID: PMC8728274, DOI: 10.1093/nar/gkab851.
- Li J, Zhou Y, Jiang X, Natarajan K, Pakhomov S, Liu H, Xu H. Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition. Journal Of The American Medical Informatics Association 2021, 28: 2193-2201. PMID: 34272955, PMCID: PMC8449609, DOI: 10.1093/jamia/ocab112.
- Wijarnpreecha K, Li F, Xiang Y, Xu X, Zhu C, Maroufy V, Wang Q, Tao W, Dang Y, Pham H, Zhou Y, Li J, Zhang X, Xu H, Taner C, Yang L, Tao C. Nonselective beta‐blockers are associated with a lower risk of hepatocellular carcinoma among cirrhotic patients in the United States. Alimentary Pharmacology & Therapeutics 2021, 54: 481-492. PMID: 34224163, DOI: 10.1111/apt.16490.
- Zuo X, Li J, Zhao B, Zhou Y, Dong X, Duke J, Natarajan K, Hripcsak G, Shah N, Banda J, Reeves R, Miller T, Xu H. Normalizing Clinical Document Titles to LOINC Document Ontology: an Initial Study. AMIA Annual Symposium Proceedings 2021, 2020: 1441-1450. PMID: 33936520, PMCID: PMC8075502.
- Rasmy L, Tiryaki F, Zhou Y, Xiang Y, Tao C, Xu H, Zhi D. Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies. Journal Of The American Medical Informatics Association 2020, 27: 1593-1599. PMID: 32930711, PMCID: PMC7647355, DOI: 10.1093/jamia/ocaa180.
- Ji Z, Wei Q, Xu H. BERT-based Ranking for Biomedical Entity Normalization. AMIA Joint Summits On Translational Science Proceedings 2020, 2020: 269-277. PMID: 32477646, PMCID: PMC7233044.
- Li Y, Luo Y, Wampfler J, Rubinstein S, Tiryaki F, Ashok K, Warner J, Xu H, Yang P. Efficient and Accurate Extracting of Unstructured EHRs on Cancer Therapy Responses for the Development of RECIST Natural Language Processing Tools: Part I, the Corpus. JCO Clinical Cancer Informatics 2020, 4: cci.19.00147. PMID: 32364754, PMCID: PMC7265793, DOI: 10.1200/cci.19.00147.
- Wei Q, Ji Z, Si Y, Du J, Wang J, Tiryaki F, Wu S, Tao C, Roberts K, Xu H. Relation Extraction from Clinical Narratives Using Pre-trained Language Models. AMIA Annual Symposium Proceedings 2020, 2019: 1236-1245. PMID: 32308921, PMCID: PMC7153059.
- Wang L, Wampfler J, Dispenzieri A, Xu H, Yang P, Liu H. Achievability to Extract Specific Date Information for Cancer Research. AMIA Annual Symposium Proceedings 2020, 2019: 893-902. PMID: 32308886, PMCID: PMC7153063.
- Xu H, Li J, Jiang X, Chen Q. Electronic Health Records for Drug Repurposing: Current Status, Challenges, and Future Directions. Clinical Pharmacology & Therapeutics 2020, 107: 712-714. PMID: 32012237, PMCID: PMC10815929, DOI: 10.1002/cpt.1769.
- Agah A, Liu M, Hu Y, Matheny M, Duan L, Xu H. Artificial Intelligence Approaches for Drug Safety Surveillance and Analysis. 2013, 431-452. DOI: 10.1201/b15618-28.