2024
Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study
Yang R, Zeng Q, You K, Qiao Y, Huang L, Hsieh C, Rosand B, Goldwasser J, Dave A, Keenan T, Ke Y, Hong C, Liu N, Chew E, Radev D, Lu Z, Xu H, Chen Q, Li I. Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study. Journal Of Medical Internet Research 2024, 26: e60601. PMID: 39361955, PMCID: PMC11487205, DOI: 10.2196/60601.
Peer-Reviewed Original Research
Concepts: Natural language processing; Natural language processing toolkit; Question-answering task; Language model; Text generation; Text processing; Domain-specific language models; Natural language processing functions; Minimal programming expertise; Text generation tasks; Medical knowledge graph; Machine translation tasks; ROUGE-L score; Domain-specific challenges; All-in-one solution; ROUGE-L; Text summarization; BLEU score; Knowledge graph; Machine translation; Unstructured text; Question-answering; Hugging Face; Processing toolkit; Language processing
Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition
Zuo X, Kumar A, Shen S, Li J, Cong G, Jin E, Chen Q, Warner J, Yang P, Xu H. Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition. JCO Clinical Cancer Informatics 2024, 8: e2300166. PMID: 38885475, DOI: 10.1200/cci.23.00166.
Peer-Reviewed Original Research
Concepts: Natural language processing; Domain-specific language models; Natural language processing systems; Information extraction system; Rule-based module; Narrative clinical texts; NLP tasks; Entity recognition; Text normalization; Assertion classification; Language model; Information extraction; Clinical text; Electronic health records; Learning-based; Clinical notes; Language processing; Test set; System performance; Health records; Response extraction; Time-consuming; Anticancer therapy; Information; Assessment information
Repurposing non-pharmacological interventions for Alzheimer's disease through link prediction on biomedical literature
Xiao Y, Hou Y, Zhou H, Diallo G, Fiszman M, Wolfson J, Zhou L, Kilicoglu H, Chen Y, Su C, Xu H, Mantyh W, Zhang R. Repurposing non-pharmacological interventions for Alzheimer's disease through link prediction on biomedical literature. Scientific Reports 2024, 14: 8693. PMID: 38622164, PMCID: PMC11018822, DOI: 10.1038/s41598-024-58604-8.
Peer-Reviewed Original Research
Concepts: Alzheimer's disease; Manual therapy techniques; R-GCN; Knowledge graph; AD prevention; Non-pharmacological interventions; Biomedical literature; Graph convolutional network model; KG embedding models; Test set; Link prediction model; Integrated health; Convolutional network model; Improve cognitive function; Highest scoring candidates; Domain experts; Embedding model; Non-pharmaceutical interventions; Real-world data analysis; Ground truth; Prevent AD; Cognitive function; Therapy techniques; Network model; Discovery patterns
A scoping review of fair machine learning techniques when using real-world data
Huang Y, Guo J, Chen W, Lin H, Tang H, Wang F, Xu H, Bian J. A scoping review of fair machine learning techniques when using real-world data. Journal Of Biomedical Informatics 2024, 151: 104622. PMID: 38452862, PMCID: PMC11146346, DOI: 10.1016/j.jbi.2024.104622.
Peer-Reviewed Original Research
Concepts: Real-world data; Health care applications; Health care domain; Machine learning; Artificial intelligence; Care applications; Multi-modal data; Integration of artificial intelligence; Machine learning techniques; Pre-processing techniques; Care domain; Bias mitigation approaches; Public datasets; AI/ML models; Model fairness; Learning techniques; Optimal fairness; Health care data; AI tools; Health care; Algorithmic bias; ML models; AI/ML; Fairness; Bias issues
Mapping Clinical Documents to the Logical Observation Identifiers, Names and Codes (LOINC) Document Ontology using Electronic Health Record Systems Structured Metadata.
Khan H, Mosa A, Paka V, Rana M, Mandhadi V, Islam S, Xu H, McClay J, Sarker S, Rao P, Waitman L. Mapping Clinical Documents to the Logical Observation Identifiers, Names and Codes (LOINC) Document Ontology using Electronic Health Record Systems Structured Metadata. AMIA Annual Symposium Proceedings 2024, 2023: 1017-1026. PMID: 38222329, PMCID: PMC10785913.
Peer-Reviewed Original Research
Concepts: Document ontology; Electronic health records; Bag-of-words approach; Natural language processing techniques; Free-text documents; Language processing techniques; Clinical documentation; Logical Observation Identifiers; Text documents; Structured metadata; Words approach; Computational scalability; Metadata; Health records; EHR documentation; Electronic health record fields; Processing techniques; Ontology; Documents; Automated pipeline; NLP; Scalability; Clinical care; Framework; LOINC
Standardizing Multi-site Clinical Note Titles to LOINC Document Ontology: A Transformer-based Approach.
Zuo X, Zhou Y, Duke J, Hripcsak G, Shah N, Banda J, Reeves R, Miller T, Waitman L, Natarajan K, Xu H. Standardizing Multi-site Clinical Note Titles to LOINC Document Ontology: A Transformer-based Approach. AMIA Annual Symposium Proceedings 2024, 2023: 834-843. PMID: 38222429, PMCID: PMC10785935.
Peer-Reviewed Original Research
2023
AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models
Datta S, Lee K, Paek H, Manion F, Ofoegbu N, Du J, Li Y, Huang L, Wang J, Lin B, Xu H, Wang X. AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models. Journal Of The American Medical Informatics Association 2023, 31: 375-385. PMID: 37952206, PMCID: PMC10797270, DOI: 10.1093/jamia/ocad218.
Peer-Reviewed Original Research
Concepts: Language model; Information extraction system; Overall F1 score; Criteria information; F1 score; Manual annotation; Scalable solution; Contextual information; Complex scenarios; Contextual attributes; Extraction system; Real-world settings; System evaluation; Modeling capabilities; Clinical trial protocol documents; Information; Protocol documents
The All of Us Data and Research Center: Creating a Secure, Scalable, and Sustainable Ecosystem for Biomedical Research
Mayo K, Basford M, Carroll R, Dillon M, Fullen H, Leung J, Master H, Rura S, Sulieman L, Kennedy N, Banks E, Bernick D, Gauchan A, Lichtenstein L, Mapes B, Marginean K, Nyemba S, Ramirez A, Rotundo C, Wolfe K, Xia W, Azuine R, Cronin R, Denny J, Kho A, Lunt C, Malin B, Natarajan K, Wilkins C, Xu H, Hripcsak G, Roden D, Philippakis A, Glazer D, Harris P. The All of Us Data and Research Center: Creating a Secure, Scalable, and Sustainable Ecosystem for Biomedical Research. Annual Review Of Biomedical Data Science 2023, 6: 443-464. PMID: 37561600, PMCID: PMC11157478, DOI: 10.1146/annurev-biodatasci-122120-104825.
Peer-Reviewed Original Research
A guide to the BRAIN Initiative Cell Census Network data ecosystem
Hawrylycz M, Martone M, Ascoli G, Bjaalie J, Dong H, Ghosh S, Gillis J, Hertzano R, Haynor D, Hof P, Kim Y, Lein E, Liu Y, Miller J, Mitra P, Mukamel E, Ng L, Osumi-Sutherland D, Peng H, Ray P, Sanchez R, Regev A, Ropelewski A, Scheuermann R, Tan S, Thompson C, Tickle T, Tilgner H, Varghese M, Wester B, White O, Zeng H, Aevermann B, Allemang D, Ament S, Athey T, Baker C, Baker K, Baker P, Bandrowski A, Banerjee S, Bishwakarma P, Carr A, Chen M, Choudhury R, Cool J, Creasy H, D’Orazi F, Degatano K, Dichter B, Ding S, Dolbeare T, Ecker J, Fang R, Fillion-Robin J, Fliss T, Gee J, Gillespie T, Gouwens N, Zhang G, Halchenko Y, Harris N, Herb B, Hintiryan H, Hood G, Horvath S, Huo B, Jarecka D, Jiang S, Khajouei F, Kiernan E, Kir H, Kruse L, Lee C, Lelieveldt B, Li Y, Liu H, Liu L, Markuhar A, Mathews J, Mathews K, Mezias C, Miller M, Mollenkopf T, Mufti S, Mungall C, Orvis J, Puchades M, Qu L, Receveur J, Ren B, Sjoquist N, Staats B, Tward D, van Velthoven C, Wang Q, Xie F, Xu H, Yao Z, Yun Z, Zhang Y, Zheng W, Zingg B. A guide to the BRAIN Initiative Cell Census Network data ecosystem. PLOS Biology 2023, 21: e3002133. PMID: 37390046, PMCID: PMC10313015, DOI: 10.1371/journal.pbio.3002133.
Peer-Reviewed Original Research
Systematic design and data-driven evaluation of social determinants of health ontology (SDoHO).
Dang Y, Li F, Hu X, Keloth V, Zhang M, Fu S, Amith M, Fan J, Du J, Yu E, Liu H, Jiang X, Xu H, Tao C. Systematic design and data-driven evaluation of social determinants of health ontology (SDoHO). Journal Of The American Medical Informatics Association 2023, 30: 1465-1473. PMID: 37301740, PMCID: PMC10436148, DOI: 10.1093/jamia/ocad096.
Peer-Reviewed Original Research
Automated Identification of Missing IS-A Relations in the Human Phenotype Ontology.
Mohtashamian M, Hu R, Abeysinghe R, Hao X, Xu H, Cui L. Automated Identification of Missing IS-A Relations in the Human Phenotype Ontology. AMIA Annual Symposium Proceedings 2023, 2022: 785-794. PMID: 37128366, PMCID: PMC10148310.
Peer-Reviewed Original Research
Prediction of Brain Metastases Development in Patients With Lung Cancer by Explainable Artificial Intelligence From Electronic Health Records
Li Z, Li R, Zhou Y, Rasmy L, Zhi D, Zhu P, Dono A, Jiang X, Xu H, Esquenazi Y, Zheng W. Prediction of Brain Metastases Development in Patients With Lung Cancer by Explainable Artificial Intelligence From Electronic Health Records. JCO Clinical Cancer Informatics 2023, 7: e2200141. PMID: 37018650, PMCID: PMC10281421, DOI: 10.1200/cci.22.00141.
Peer-Reviewed Original Research
Concepts: Brain metastases; Explainable artificial intelligence; Feature attribution methods; Lung cancer; EHR data; Artificial intelligence; Cerner Health Facts database; BM development; Explainable artificial intelligence approach; Brain metastasis development; Health Facts database; Electronic health record data; Recurrent neural network model; Artificial intelligence approach; Health record data; Model decision process; Structured EHR data; Neural network model; Decision process; Attribution methods; High-quality cohort; Electronic health records; Prompt treatment; Metastasis development; Intelligence approach
Representing and utilizing clinical textual data for real world studies: An OHDSI approach
Keloth V, Banda J, Gurley M, Heider P, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, V Patterson O, Peng Y, Raja K, Reeves R, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei W, Williams A, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, Xu H. Representing and utilizing clinical textual data for real world studies: An OHDSI approach. Journal Of Biomedical Informatics 2023, 142: 104343. PMID: 36935011, PMCID: PMC10428170, DOI: 10.1016/j.jbi.2023.104343.
Peer-Reviewed Original Research
Concepts: Natural language processing; Common data model; Textual data; NLP solution; Observational Health Data Sciences; OMOP Common Data Model; Specific use cases; Observational Medical Outcomes Partnership Common Data Model; Health Data Sciences; Representation of information; Use cases; Electronic health records; Real-world evidence generation; Data science; Clinical text; Data model; Clinical notes; Language processing; Health records; Load data; Clinical documentation; Current applications; Information; Workflow; Evidence generation
Blockchain-enabled immutable, distributed, and highly available clinical research activity logging system for federated COVID-19 data analysis from multiple institutions
Kuo T, Pham A, Edelson M, Kim J, Chan J, Gupta Y, Ohno-Machado L, Anderson D, Balacha C, Bath T, Baxter S, Becker-Pennrich A, Bell D, Bernstam E, Ngan C, Day M, Doctor J, DuVall S, El-Kareh R, Florian R, Follett R, Geisler B, Ghigi A, Gottlieb A, Hinske L, Hu Z, Ir D, Jiang X, Kim K, Kim J, Knight T, Koola J, Kuo T, Lee N, Mansmann U, Matheny M, Meeker D, Mou Z, Neumann L, Nguyen N, Nick A, Ohno-Machado L, Park E, Paul P, Pletcher M, Post K, Rieder C, Scherer C, Schilling L, Soares A, SooHoo S, Soysal E, Steven C, Tep B, Toy B, Wang B, Wu Z, Xu H, Yong C, Zheng K, Zhou Y, Zucker R. Blockchain-enabled immutable, distributed, and highly available clinical research activity logging system for federated COVID-19 data analysis from multiple institutions. Journal Of The American Medical Informatics Association 2023, 30: 1167-1178. PMID: 36916740, PMCID: PMC10198529, DOI: 10.1093/jamia/ocad049.
Peer-Reviewed Original Research
Concepts: Federated data analysis; User activity logs; Smart contract deployment; Run-time efficiency; Data analysis system; Data analysis activities; Activity logs; Data discovery; Querying time; Blockchain system; Blockchain technology; Network transactions; COVID-19 data analysis; Multiple institutions; Low deployment; Blockchain; GitHub repository; Multiple nodes; Large networks; Queries; Analysis activities; High availability; Language code; Baseline solution; Data analysis
Social Determinants, Cardiovascular Disease, and Health Care Cost: A Nationwide Study in the United States Using Machine Learning
Sun F, Yao J, Du S, Qian F, Appleton A, Tao C, Xu H, Liu L, Dai Q, Joyce B, Nannini D, Hou L, Zhang K. Social Determinants, Cardiovascular Disease, and Health Care Cost: A Nationwide Study in the United States Using Machine Learning. Journal Of The American Heart Association 2023, 12: e027919. PMID: 36802713, PMCID: PMC10111459, DOI: 10.1161/jaha.122.027919.
Peer-Reviewed Original Research
A hierarchical strategy to minimize privacy risk when linking “De-identified” data in biomedical research consortia
Ohno-Machado L, Jiang X, Kuo T, Tao S, Chen L, Ram P, Zhang G, Xu H. A hierarchical strategy to minimize privacy risk when linking “De-identified” data in biomedical research consortia. Journal Of Biomedical Informatics 2023, 139: 104322. PMID: 36806328, PMCID: PMC10975485, DOI: 10.1016/j.jbi.2023.104322.
Peer-Reviewed Original Research
Concepts: Privacy of individuals; Appropriate privacy protection; Data-driven models; Privacy protection; Privacy risks; Data Coordination Center; Data hub; Data repository; Hierarchical strategy; Privacy; Biomedical discovery; Data sets; Record linkage; Data Coordinating Center; Repository; Complex strategies; Coordination center; Technology; Technique; Data; Parties; Set; Hierarchy
2022
Assess the documentation of cognitive tests and biomarkers in electronic health records via natural language processing for Alzheimer’s disease and related dementias
Chen Z, Zhang H, Yang X, Wu S, He X, Xu J, Guo J, Prosperi M, Wang F, Xu H, Chen Y, Hu H, DeKosky S, Farrer M, Guo Y, Wu Y, Bian J. Assess the documentation of cognitive tests and biomarkers in electronic health records via natural language processing for Alzheimer’s disease and related dementias. International Journal Of Medical Informatics 2022, 170: 104973. PMID: 36577203, PMCID: PMC11325083, DOI: 10.1016/j.ijmedinf.2022.104973.
Peer-Reviewed Original Research
Concepts: Electronic health records; Patients' electronic health records; Cognitive tests; Cognitive test scores; Florida health system; Severity categories; Health records; AD-related dementia; AD/ADRD research; AD/ADRD; Patient level; Alzheimer's disease; Clinical narratives; Health system; Biomarkers; Different severity; Disease; Severity; Patients; ADRD research; Standardized approach; Dementia; Test scores; Population characteristics; Scores
Associations Between Vascular Diseases and Alzheimer’s Disease or Related Dementias in a Large Cohort of Men and Women with Colorectal Cancer
Du X, Song L, Schulz P, Xu H, Chan W. Associations Between Vascular Diseases and Alzheimer’s Disease or Related Dementias in a Large Cohort of Men and Women with Colorectal Cancer. Journal Of Alzheimer's Disease 2022, 90: 211-231. PMID: 36093703, PMCID: PMC9661325, DOI: 10.3233/jad-220548.
Peer-Reviewed Original Research
Concepts: Colorectal cancer; Vascular disease; Cardiovascular disease; Alzheimer's disease; Risk of AD; Significant dose-response relationship; Retrospective cohort study; Cohort of patients; Types of dementia; Long-term risk; Dose-response relationship; Risk of ADRD; Tumor factors; Cohort study; Cumulative incidence; Older patients; Large cohort; Patients; Related dementia; Hypertension; Cancer; Disease; Diabetes; Dementia; Stroke
A comparative study of pre-trained language models for named entity recognition in clinical trial eligibility criteria from multiple corpora
Li J, Wei Q, Ghiasvand O, Chen M, Lobanov V, Weng C, Xu H. A comparative study of pre-trained language models for named entity recognition in clinical trial eligibility criteria from multiple corpora. BMC Medical Informatics And Decision Making 2022, 22: 235. PMID: 36068551, PMCID: PMC9450226, DOI: 10.1186/s12911-022-01967-7.
Peer-Reviewed Original Research
Concepts: Pre-trained language models; NER task; Unstructured text; Entity recognition; Language model; Natural language processing techniques; Clinical trial eligibility criteria; Language processing techniques; Data augmentation results; Data augmentation approach; Domain-specific corpus; Better performance; Transformer model; Cross-validation show; Multiple data sources; Eligibility criteria text; Biomedical domain; Embedding models; NER performance; Augmentation approach; Contextual embeddings; Meaningful information; Evaluation results; Such documents; Processing techniques
Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz I, Wang N, Yang P, Xu H, Warner J, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clinical Cancer Informatics 2022, 6: e2200006. PMID: 35917480, PMCID: PMC9470142, DOI: 10.1200/cci.22.00006.
Peer-Reviewed Original Research