Hua Xu, PhD
he/him/his
Robert T. McCluskey Professor of Biomedical Informatics and Data Science; Vice Chair for Research and Development, Section of Biomedical Informatics and Data Science; Assistant Dean for Biomedical Informatics, Yale School of Medicine
Research & Publications
Selected Publications
- Hu Y, Chen Y, Xu H. Towards More Generalizable and Accurate Sentence Classification in Medical Abstracts with Less Data. Journal Of Healthcare Informatics Research 2023, 7: 542-556. PMID: 37927376, PMCID: PMC10620359, DOI: 10.1007/s41666-023-00141-6.
- Dang Y, Li F, Hu X, Keloth V, Zhang M, Fu S, Amith M, Fan J, Du J, Yu E, Liu H, Jiang X, Xu H, Tao C. Systematic design and data-driven evaluation of social determinants of health ontology (SDoHO). Journal Of The American Medical Informatics Association 2023, 30: 1465-1473. PMID: 37301740, PMCID: PMC10436148, DOI: 10.1093/jamia/ocad096.
- Mohtashamian M, Hu R, Abeysinghe R, Hao X, Xu H, Cui L. Automated Identification of Missing IS-A Relations in the Human Phenotype Ontology. AMIA Annual Symposium Proceedings 2023, 2022: 785-794. PMID: 37128366, PMCID: PMC10148310.
- Keloth V, Banda J, Gurley M, Heider P, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, V Patterson O, Peng Y, Raja K, Reeves R, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei W, Williams A, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, Xu H. Representing and utilizing clinical textual data for real world studies: An OHDSI approach. Journal Of Biomedical Informatics 2023, 142: 104343. PMID: 36935011, PMCID: PMC10428170, DOI: 10.1016/j.jbi.2023.104343.
- Chen Z, Zhang H, Yang X, Wu S, He X, Xu J, Guo J, Prosperi M, Wang F, Xu H, Chen Y, Hu H, DeKosky S, Farrer M, Guo Y, Wu Y, Bian J. Assess the documentation of cognitive tests and biomarkers in electronic health records via natural language processing for Alzheimer’s disease and related dementias. International Journal Of Medical Informatics 2022, 170: 104973. PMID: 36577203, DOI: 10.1016/j.ijmedinf.2022.104973.
- Wei Q, Zuo X, Anjum O, Hu Y, Denlinger R, Bernstam E, Citardi M, Xu H. ClinicalLayoutLM: A Pre-trained Multi-modal Model for Understanding Scanned Document in Electronic Health Records. 2022, 00: 2821-2827. DOI: 10.1109/bigdata55660.2022.10020569.
- Ramirez A, Sulieman L, Schlueter D, Halvorson A, Qian J, Ratsimbazafy F, Loperena R, Mayo K, Basford M, Deflaux N, Muthuraman K, Natarajan K, Kho A, Xu H, Wilkins C, Anton-Culver H, Boerwinkle E, Cicek M, Clark C, Cohn E, Ohno-Machado L, Schully S, Ahmedani B, Argos M, Cronin R, O’Donnell C, Fouad M, Goldstein D, Greenland P, Hebbring S, Karlson E, Khatri P, Korf B, Smoller J, Sodeke S, Wilbanks J, Hentges J, Mockrin S, Lunt C, Devaney S, Gebo K, Denny J, Carroll R, Glazer D, Harris P, Hripcsak G, Philippakis A, Roden D, Program T, Ahmedani B, Johnson C, Ahsan H, Antoine-LaVigne D, Singleton G, Anton-Culver H, Topol E, Baca-Motes K, Steinhubl S, Wade J, Begale M, Jain P, Sutherland S, Lewis B, Korf B, Behringer M, Gharavi A, Goldstein D, Hripcsak G, Bier L, Boerwinkle E, Brilliant M, Murali N, Hebbring S, Farrar-Edwards D, Burnside E, Drezner M, Taylor A, Channamsetty V, Montalvo W, Sharma Y, Chinea C, Jenks N, Cicek M, Thibodeau S, Holmes B, Schlueter E, Collier E, Winkler J, Corcoran J, D’Addezio N, Daviglus M, Winn R, Wilkins C, Roden D, Denny J, Doheny K, Nickerson D, Eichler E, Jarvik G, Funk G, Philippakis A, Rehm H, Lennon N, Kathiresan S, Gabriel S, Gibbs R, Rico E, Glazer D, Grand J, Greenland P, Harris P, Shenkman E, Hogan W, Igho-Pemu P, Pollan C, Jorge M, Okun S, Karlson E, Smoller J, Murphy S, Ross M, Kaushal R, Winford E, Wallace F, Khatri P, Kheterpal V, Ojo A, Moreno F, Kron I, Peterson R, Menon U, Lattimore P, Leviner N, Obedin-Maliver J, Lunn M, Malik-Gagnon L, Mangravite L, Marallo A, Marroquin O, Visweswaran S, Reis S, Marshall G, McGovern P, Mignucci D, Moore J, Munoz F, Talavera G, O'Connor G, O'Donnell C, Ohno-Machado L, Orr G, Randal F, Theodorou A, Reiman E, Roxas-Murray M, Stark L, Tepp R, Zhou A, Topper S, Trousdale R, Tsao P, Weidman L, Weiss S, Wellis D, Whittle J, Wilson A, Zuchner S, Zwick M. The All of Us Research Program: Data quality, utility, and diversity. Patterns 2022, 3: 100570. PMID: 36033590, PMCID: PMC9403360, DOI: 10.1016/j.patter.2022.100570.
- Hu Y, Chen Y, Xu H. Improving Sentence Classification in Abstracts of Randomized Controlled Trial using Prompt Learning. 2022, 00: 606-607. DOI: 10.1109/ichi54592.2022.00119.
- Luo C, Islam M, Sheils N, Buresh J, Reps J, Schuemie M, Ryan P, Edmondson M, Duan R, Tong J, Marks-Anglin A, Bian J, Chen Z, Duarte-Salles T, Fernández-Bertolín S, Falconer T, Kim C, Park R, Pfohl S, Shah N, Williams A, Xu H, Zhou Y, Lautenbach E, Doshi J, Werner R, Asch D, Chen Y. DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models. Nature Communications 2022, 13: 1678. PMID: 35354802, PMCID: PMC8967932, DOI: 10.1038/s41467-022-29160-4.
- Zhang Y, Yao X, Zhou H, Wu X, Tian J, Zeng J, Yan L, Duan C, Liu H, Li H, Chen K, Hu Z, Ye Z, Xu H. OncoSplicing: an updated database for clinically relevant alternative splicing in 33 human cancers. Nucleic Acids Research 2021, 50: D1340-D1347. PMID: 34554251, PMCID: PMC8728274, DOI: 10.1093/nar/gkab851.
- Li J, Zhou Y, Jiang X, Natarajan K, Pakhomov S, Liu H, Xu H. Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition. Journal Of The American Medical Informatics Association 2021, 28: 2193-2201. PMID: 34272955, PMCID: PMC8449609, DOI: 10.1093/jamia/ocab112.
- Wijarnpreecha K, Li F, Xiang Y, Xu X, Zhu C, Maroufy V, Wang Q, Tao W, Dang Y, Pham H, Zhou Y, Li J, Zhang X, Xu H, Taner C, Yang L, Tao C. Nonselective beta‐blockers are associated with a lower risk of hepatocellular carcinoma among cirrhotic patients in the United States. Alimentary Pharmacology & Therapeutics 2021, 54: 481-492. PMID: 34224163, DOI: 10.1111/apt.16490.
- Zuo X, Li J, Zhao B, Zhou Y, Dong X, Duke J, Natarajan K, Hripcsak G, Shah N, Banda J, Reeves R, Miller T, Xu H. Normalizing Clinical Document Titles to LOINC Document Ontology: an Initial Study. AMIA Annual Symposium Proceedings 2021, 2020: 1441-1450. PMID: 33936520, PMCID: PMC8075502.
- Rasmy L, Tiryaki F, Zhou Y, Xiang Y, Tao C, Xu H, Zhi D. Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies. Journal Of The American Medical Informatics Association 2020, 27: 1593-1599. PMID: 32930711, PMCID: PMC7647355, DOI: 10.1093/jamia/ocaa180.
- Ji Z, Wei Q, Xu H. BERT-based Ranking for Biomedical Entity Normalization. AMIA Joint Summits On Translational Science Proceedings 2020, 2020: 269-277. PMID: 32477646, PMCID: PMC7233044.
- Li Y, Luo Y, Wampfler J, Rubinstein S, Tiryaki F, Ashok K, Warner J, Xu H, Yang P. Efficient and Accurate Extracting of Unstructured EHRs on Cancer Therapy Responses for the Development of RECIST Natural Language Processing Tools: Part I, the Corpus. JCO Clinical Cancer Informatics 2020, 4: cci.19.00147. PMID: 32364754, PMCID: PMC7265793, DOI: 10.1200/cci.19.00147.
- Wang L, Wampfler J, Dispenzieri A, Xu H, Yang P, Liu H. Achievability to Extract Specific Date Information for Cancer Research. AMIA Annual Symposium Proceedings 2020, 2019: 893-902. PMID: 32308886, PMCID: PMC7153063.
- Wei Q, Ji Z, Si Y, Du J, Wang J, Tiryaki F, Wu S, Tao C, Roberts K, Xu H. Relation Extraction from Clinical Narratives Using Pre-trained Language Models. AMIA Annual Symposium Proceedings 2020, 2019: 1236-1245. PMID: 32308921, PMCID: PMC7153059.
- Xu H, Li J, Jiang X, Chen Q. Electronic Health Records for Drug Repurposing: Current Status, Challenges, and Future Directions. Clinical Pharmacology & Therapeutics 2020, 107: 712-714. PMID: 32012237, PMCID: PMC10815929, DOI: 10.1002/cpt.1769.