2025
Trans-ancestry GWAS identifies 59 loci and improves risk prediction and fine-mapping for kidney stone disease
Cao X, Jiang M, Guan Y, Li S, Duan C, Gong Y, Kong Y, Shao Z, Wu H, Yao X, Li B, Wang M, Xu H, Hao X. Trans-ancestry GWAS identifies 59 loci and improves risk prediction and fine-mapping for kidney stone disease. Nature Communications 2025, 16: 3473. PMID: 40216741, DOI: 10.1038/s41467-025-58782-7. Peer-Reviewed Original Research
Improving entity recognition using ensembles of deep learning and fine-tuned large language models: A case study on adverse event extraction from VAERS and social media
Li Y, Viswaroopan D, He W, Li J, Zuo X, Xu H, Tao C. Improving entity recognition using ensembles of deep learning and fine-tuned large language models: A case study on adverse event extraction from VAERS and social media. Journal of Biomedical Informatics 2025, 163: 104789. PMID: 39923968, DOI: 10.1016/j.jbi.2025.104789. Peer-Reviewed Original Research
Evaluating the Bias, type I error and statistical power of the prior Knowledge-Guided integrated likelihood estimation (PIE) for bias reduction in EHR based association studies
Jing N, Lu Y, Tong J, Weaver J, Ryan P, Xu H, Chen Y. Evaluating the Bias, type I error and statistical power of the prior Knowledge-Guided integrated likelihood estimation (PIE) for bias reduction in EHR based association studies. Journal of Biomedical Informatics 2025, 163: 104787. PMID: 39904407, DOI: 10.1016/j.jbi.2025.104787. Peer-Reviewed Original Research
BiomedRAG: A retrieval augmented large language model for biomedicine
Li M, Kilicoglu H, Xu H, Zhang R. BiomedRAG: A retrieval augmented large language model for biomedicine. Journal of Biomedical Informatics 2025, 162: 104769. PMID: 39814274, PMCID: PMC11837810, DOI: 10.1016/j.jbi.2024.104769. Peer-Reviewed Original Research
2024
Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study
Yang R, Zeng Q, You K, Qiao Y, Huang L, Hsieh C, Rosand B, Goldwasser J, Dave A, Keenan T, Ke Y, Hong C, Liu N, Chew E, Radev D, Lu Z, Xu H, Chen Q, Li I. Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study. Journal of Medical Internet Research 2024, 26: e60601. PMID: 39361955, PMCID: PMC11487205, DOI: 10.2196/60601. Peer-Reviewed Original Research
Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition
Zuo X, Kumar A, Shen S, Li J, Cong G, Jin E, Chen Q, Warner J, Yang P, Xu H. Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition. JCO Clinical Cancer Informatics 2024, 8: e2300166. PMID: 38885475, DOI: 10.1200/cci.23.00166. Peer-Reviewed Original Research
Repurposing non-pharmacological interventions for Alzheimer's disease through link prediction on biomedical literature
Xiao Y, Hou Y, Zhou H, Diallo G, Fiszman M, Wolfson J, Zhou L, Kilicoglu H, Chen Y, Su C, Xu H, Mantyh W, Zhang R. Repurposing non-pharmacological interventions for Alzheimer's disease through link prediction on biomedical literature. Scientific Reports 2024, 14: 8693. PMID: 38622164, PMCID: PMC11018822, DOI: 10.1038/s41598-024-58604-8. Peer-Reviewed Original Research
A scoping review of fair machine learning techniques when using real-world data
Huang Y, Guo J, Chen W, Lin H, Tang H, Wang F, Xu H, Bian J. A scoping review of fair machine learning techniques when using real-world data. Journal of Biomedical Informatics 2024, 151: 104622. PMID: 38452862, PMCID: PMC11146346, DOI: 10.1016/j.jbi.2024.104622. Peer-Reviewed Original Research
Mapping Clinical Documents to the Logical Observation Identifiers, Names and Codes (LOINC) Document Ontology using Electronic Health Record Systems Structured Metadata
Khan H, Mosa A, Paka V, Rana M, Mandhadi V, Islam S, Xu H, McClay J, Sarker S, Rao P, Waitman L. Mapping Clinical Documents to the Logical Observation Identifiers, Names and Codes (LOINC) Document Ontology using Electronic Health Record Systems Structured Metadata. AMIA Annual Symposium Proceedings 2024, 2023: 1017-1026. PMID: 38222329, PMCID: PMC10785913. Peer-Reviewed Original Research
Standardizing Multi-site Clinical Note Titles to LOINC Document Ontology: A Transformer-based Approach
Zuo X, Zhou Y, Duke J, Hripcsak G, Shah N, Banda J, Reeves R, Miller T, Waitman L, Natarajan K, Xu H. Standardizing Multi-site Clinical Note Titles to LOINC Document Ontology: A Transformer-based Approach. AMIA Annual Symposium Proceedings 2024, 2023: 834-843. PMID: 38222429, PMCID: PMC10785935. Peer-Reviewed Original Research
2023
AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models
Datta S, Lee K, Paek H, Manion F, Ofoegbu N, Du J, Li Y, Huang L, Wang J, Lin B, Xu H, Wang X. AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models. Journal of the American Medical Informatics Association 2023, 31: 375-385. PMID: 37952206, PMCID: PMC10797270, DOI: 10.1093/jamia/ocad218. Peer-Reviewed Original Research
The All of Us Data and Research Center: Creating a Secure, Scalable, and Sustainable Ecosystem for Biomedical Research
Mayo K, Basford M, Carroll R, Dillon M, Fullen H, Leung J, Master H, Rura S, Sulieman L, Kennedy N, Banks E, Bernick D, Gauchan A, Lichtenstein L, Mapes B, Marginean K, Nyemba S, Ramirez A, Rotundo C, Wolfe K, Xia W, Azuine R, Cronin R, Denny J, Kho A, Lunt C, Malin B, Natarajan K, Wilkins C, Xu H, Hripcsak G, Roden D, Philippakis A, Glazer D, Harris P. The All of Us Data and Research Center: Creating a Secure, Scalable, and Sustainable Ecosystem for Biomedical Research. Annual Review of Biomedical Data Science 2023, 6: 443-464. PMID: 37561600, PMCID: PMC11157478, DOI: 10.1146/annurev-biodatasci-122120-104825. Peer-Reviewed Original Research
A guide to the BRAIN Initiative Cell Census Network data ecosystem
Hawrylycz M, Martone M, Ascoli G, Bjaalie J, Dong H, Ghosh S, Gillis J, Hertzano R, Haynor D, Hof P, Kim Y, Lein E, Liu Y, Miller J, Mitra P, Mukamel E, Ng L, Osumi-Sutherland D, Peng H, Ray P, Sanchez R, Regev A, Ropelewski A, Scheuermann R, Tan S, Thompson C, Tickle T, Tilgner H, Varghese M, Wester B, White O, Zeng H, Aevermann B, Allemang D, Ament S, Athey T, Baker C, Baker K, Baker P, Bandrowski A, Banerjee S, Bishwakarma P, Carr A, Chen M, Choudhury R, Cool J, Creasy H, D’Orazi F, Degatano K, Dichter B, Ding S, Dolbeare T, Ecker J, Fang R, Fillion-Robin J, Fliss T, Gee J, Gillespie T, Gouwens N, Zhang G, Halchenko Y, Harris N, Herb B, Hintiryan H, Hood G, Horvath S, Huo B, Jarecka D, Jiang S, Khajouei F, Kiernan E, Kir H, Kruse L, Lee C, Lelieveldt B, Li Y, Liu H, Liu L, Markuhar A, Mathews J, Mathews K, Mezias C, Miller M, Mollenkopf T, Mufti S, Mungall C, Orvis J, Puchades M, Qu L, Receveur J, Ren B, Sjoquist N, Staats B, Tward D, van Velthoven C, Wang Q, Xie F, Xu H, Yao Z, Yun Z, Zhang Y, Zheng W, Zingg B. A guide to the BRAIN Initiative Cell Census Network data ecosystem. PLOS Biology 2023, 21: e3002133. PMID: 37390046, PMCID: PMC10313015, DOI: 10.1371/journal.pbio.3002133. Peer-Reviewed Original Research
Systematic design and data-driven evaluation of social determinants of health ontology (SDoHO)
Dang Y, Li F, Hu X, Keloth V, Zhang M, Fu S, Amith M, Fan J, Du J, Yu E, Liu H, Jiang X, Xu H, Tao C. Systematic design and data-driven evaluation of social determinants of health ontology (SDoHO). Journal of the American Medical Informatics Association 2023, 30: 1465-1473. PMID: 37301740, DOI: 10.1093/jamia/ocad096. Peer-Reviewed Original Research
Automated Identification of Missing IS-A Relations in the Human Phenotype Ontology
Mohtashamian M, Hu R, Abeysinghe R, Hao X, Xu H, Cui L. Automated Identification of Missing IS-A Relations in the Human Phenotype Ontology. AMIA Annual Symposium Proceedings 2023, 2022: 785-794. PMID: 37128366, PMCID: PMC10148310. Peer-Reviewed Original Research
Prediction of Brain Metastases Development in Patients With Lung Cancer by Explainable Artificial Intelligence From Electronic Health Records
Li Z, Li R, Zhou Y, Rasmy L, Zhi D, Zhu P, Dono A, Jiang X, Xu H, Esquenazi Y, Zheng W. Prediction of Brain Metastases Development in Patients With Lung Cancer by Explainable Artificial Intelligence From Electronic Health Records. JCO Clinical Cancer Informatics 2023, 7: e2200141. PMID: 37018650, PMCID: PMC10281421, DOI: 10.1200/cci.22.00141. Peer-Reviewed Original Research
Representing and utilizing clinical textual data for real world studies: An OHDSI approach
Keloth V, Banda J, Gurley M, Heider P, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, V Patterson O, Peng Y, Raja K, Reeves R, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei W, Williams A, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, Xu H. Representing and utilizing clinical textual data for real world studies: An OHDSI approach. Journal of Biomedical Informatics 2023, 142: 104343. PMID: 36935011, PMCID: PMC10428170, DOI: 10.1016/j.jbi.2023.104343. Peer-Reviewed Original Research
Blockchain-enabled immutable, distributed, and highly available clinical research activity logging system for federated COVID-19 data analysis from multiple institutions
Kuo T, Pham A, Edelson M, Kim J, Chan J, Gupta Y, Ohno-Machado L, Anderson D, Balacha C, Bath T, Baxter S, Becker-Pennrich A, Bell D, Bernstam E, Ngan C, Day M, Doctor J, DuVall S, El-Kareh R, Florian R, Follett R, Geisler B, Ghigi A, Gottlieb A, Hinske L, Hu Z, Ir D, Jiang X, Kim K, Kim J, Knight T, Koola J, Kuo T, Lee N, Mansmann U, Matheny M, Meeker D, Mou Z, Neumann L, Nguyen N, Nick A, Ohno-Machado L, Park E, Paul P, Pletcher M, Post K, Rieder C, Scherer C, Schilling L, Soares A, SooHoo S, Soysal E, Steven C, Tep B, Toy B, Wang B, Wu Z, Xu H, Yong C, Zheng K, Zhou Y, Zucker R. Blockchain-enabled immutable, distributed, and highly available clinical research activity logging system for federated COVID-19 data analysis from multiple institutions. Journal of the American Medical Informatics Association 2023, 30: 1167-1178. PMID: 36916740, PMCID: PMC10198529, DOI: 10.1093/jamia/ocad049. Peer-Reviewed Original Research
Social Determinants, Cardiovascular Disease, and Health Care Cost: A Nationwide Study in the United States Using Machine Learning
Sun F, Yao J, Du S, Qian F, Appleton A, Tao C, Xu H, Liu L, Dai Q, Joyce B, Nannini D, Hou L, Zhang K. Social Determinants, Cardiovascular Disease, and Health Care Cost: A Nationwide Study in the United States Using Machine Learning. Journal of the American Heart Association 2023, 12: e027919. PMID: 36802713, PMCID: PMC10111459, DOI: 10.1161/jaha.122.027919. Peer-Reviewed Original Research
A hierarchical strategy to minimize privacy risk when linking “De-identified” data in biomedical research consortia
Ohno-Machado L, Jiang X, Kuo T, Tao S, Chen L, Ram P, Zhang G, Xu H. A hierarchical strategy to minimize privacy risk when linking “De-identified” data in biomedical research consortia. Journal of Biomedical Informatics 2023, 139: 104322. PMID: 36806328, PMCID: PMC10975485, DOI: 10.1016/j.jbi.2023.104322. Peer-Reviewed Original Research