2024
Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study
Yang R, Zeng Q, You K, Qiao Y, Huang L, Hsieh C, Rosand B, Goldwasser J, Dave A, Keenan T, Ke Y, Hong C, Liu N, Chew E, Radev D, Lu Z, Xu H, Chen Q, Li I. Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study. Journal Of Medical Internet Research 2024, 26: e60601. PMID: 39361955, DOI: 10.2196/60601.
Peer-Reviewed Original Research
Concepts: Natural language processing; Natural language processing toolkit; Question-answering task; Language model; Text generation; Text processing; Domain-specific language models; Natural language processing functions; Minimal programming expertise; Text generation tasks; Medical knowledge graph; Machine translation tasks; ROUGE-L score; Domain-specific challenges; All-in-one solution; ROUGE-L; Text summarization; BLEU score; Knowledge graph; Machine translation; Unstructured text; Question-answering; Hugging Face; Processing toolkit; Language processing

Augmenting biomedical named entity recognition with general-domain resources
Yin Y, Kim H, Xiao X, Wei C, Kang J, Lu Z, Xu H, Fang M, Chen Q. Augmenting biomedical named entity recognition with general-domain resources. Journal Of Biomedical Informatics 2024, 104731. PMID: 39368529, DOI: 10.1016/j.jbi.2024.104731.
Peer-Reviewed Original Research
Concepts: BioNER datasets; Multi-task learning; NER datasets; Entity types; Biomedical datasets; Baseline model; General domain datasets; Biomedical language model; Neural network-based; Yield performance improvements; BioNER models; Entity recognition; Biomedical corpora; Human annotators; Label ambiguity; Language model; Transfer learning; F1 score; BioNER; Human effort; Network-based; Biomedical resources; Performance improvement; Dataset; Superior performance

Relation extraction using large language models: a case study on acupuncture point locations
Li Y, Peng X, Li J, Zuo X, Peng S, Pei D, Tao C, Xu H, Hong N. Relation extraction using large language models: a case study on acupuncture point locations. Journal Of The American Medical Informatics Association 2024, ocae233. PMID: 39208311, DOI: 10.1093/jamia/ocae233.
Peer-Reviewed Original Research
Concepts: Acupuncture point locations; Acupoint location; Location of acupoints; Clinical decision support; Acupuncture knowledge; Acupuncture training; Acupuncture therapy; Acupuncture; Acupoints; Complementary medicine; Educational module; Western Pacific Region; Informatics applications; Decision support; Scores; Generative Pre-trained Transformer; WHO standards; F1 score; Language model; Pacific region; WHO; Domain-specific fine-tuning; Training; Study; Micro-averaged F1 score

Balancing the efforts of chart review and gains in PRS prediction accuracy: An empirical study
Lei Y, Christian Naj A, Xu H, Li R, Chen Y. Balancing the efforts of chart review and gains in PRS prediction accuracy: An empirical study. Journal Of Biomedical Informatics 2024, 157: 104705. PMID: 39134233, DOI: 10.1016/j.jbi.2024.104705.
Peer-Reviewed Original Research
Concepts: Alzheimer's Disease Genetics Consortium; Chart review; PRS model; Case-control dataset; Genetic association analysis; Genetics Consortium; Phenotype misclassification; Simulated phenotypes; Phenotypic data; Association analysis; Estimation of associated parameters; Bias reduction method; Median threshold; Phenotype; Misclassification rate; Original phenotype; Diverse array; Charts; Misclassification; Genotypes; Review; Effects of bias; Bias; Prediction model; PRS

Leveraging error-prone algorithm-derived phenotypes: Enhancing association studies for risk factors in EHR data
Lu Y, Tong J, Chubak J, Lumley T, Hubbard R, Xu H, Chen Y. Leveraging error-prone algorithm-derived phenotypes: Enhancing association studies for risk factors in EHR data. Journal Of Biomedical Informatics 2024, 157: 104690. PMID: 39004110, DOI: 10.1016/j.jbi.2024.104690.
Peer-Reviewed Original Research
Concepts: Electronic health records; Electronic health record data; Kaiser Permanente Washington; EHR-derived phenotypes; Association studies; Health records; Colon cancer recurrence; Phenotyping errors; Computable phenotype; Risk factors; Cancer recurrence; Multiple phenotypes; Reduce bias; Improve estimation accuracy; Simulation study; Bias reduction; Kaiser; Reduction of bias; Bias; Estimation accuracy; Association; Study; Outcomes; Risk; Estimation efficiency

Develop and validate a computable phenotype for the identification of Alzheimer's disease patients using electronic health record data
He X, Wei R, Huang Y, Chen Z, Lyu T, Bost S, Tong J, Li L, Zhou Y, Li Z, Guo J, Tang H, Wang F, DeKosky S, Xu H, Chen Y, Zhang R, Xu J, Guo Y, Wu Y, Bian J. Develop and validate a computable phenotype for the identification of Alzheimer's disease patients using electronic health record data. Alzheimer's & Dementia Diagnosis Assessment & Disease Monitoring 2024, 16: e12613. PMID: 38966622, PMCID: PMC11220631, DOI: 10.1002/dad2.12613.
Peer-Reviewed Original Research
Concepts: Electronic health record data; Electronic health records; Computable phenotype; Health record data; Manual chart review; Health records; Alzheimer's disease; Diagnosis codes; Record data; Chart review; UTHealth; Alzheimer's disease patients; University of Minnesota; AD diagnosis; AD identification; Disease patients; Patients; Alzheimer; AD patients; Demographics; Diagnosis; Disease; Code; Data; University

Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition
Zuo X, Kumar A, Shen S, Li J, Cong G, Jin E, Chen Q, Warner J, Yang P, Xu H. Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition. JCO Clinical Cancer Informatics 2024, 8: e2300166. PMID: 38885475, DOI: 10.1200/cci.23.00166.
Peer-Reviewed Original Research
Concepts: Natural language processing; Domain-specific language models; Natural language processing systems; Information extraction system; Rule-based module; Narrative clinical texts; NLP tasks; Entity recognition; Text normalization; Assertion classification; Language model; Information extraction; Clinical text; Electronic health records; Learning-based; Clinical notes; Language processing; Test set; System performance; Health records; Response extraction; Time-consuming; Anticancer therapy; Information; Assessment information

NLP Applications—Other Biomedical Texts
Roberts K, Xu H, Demner Fushman D. NLP Applications—Other Biomedical Texts. Cognitive Informatics In Biomedicine And Healthcare 2024, 429-444. DOI: 10.1007/978-3-031-55865-8_15.
Peer-Reviewed Original Research

Introduction to Natural Language Processing of Clinical Text
Demner Fushman D, Xu H. Introduction to Natural Language Processing of Clinical Text. Cognitive Informatics In Biomedicine And Healthcare 2024, 3-11. DOI: 10.1007/978-3-031-55865-8_1.
Peer-Reviewed Original Research
Concepts: Natural language processing; Language processing; Complex language processing; Biomedical natural language processing; Clinical natural language processing; Language generation tasks; Clinical language processing; Biomedical language processing; Language model; Clinical text; Generation task; Machine learning; Delivery of information; Clinical language; Language

Medical Concept Normalization
Xu H, Demner Fushman D, Hong N, Raja K. Medical Concept Normalization. Cognitive Informatics In Biomedicine And Healthcare 2024, 137-164. DOI: 10.1007/978-3-031-55865-8_6.
Peer-Reviewed Original Research
Concepts: Concept normalization; Deep learning-based techniques; Medical concept normalization; Learning-based techniques; Contemporary machine learning; Rule-based methodology; Annotated corpus; NLP systems; Machine learning; Computing applications; Biomedical terminologies; Normalization approach; Standardized terminology; Ontology; Task; Learning

Development of Clinical NLP Systems
Xu H, Demner Fushman D. Development of Clinical NLP Systems. Cognitive Informatics In Biomedicine And Healthcare 2024, 301-324. DOI: 10.1007/978-3-031-55865-8_11.
Peer-Reviewed Original Research

Kamino: A Scalable Architecture to Support Medical AI Research Using Large Real World Data
Lin F, Young P, He H, Huang J, Gagne R, Rice D, Price N, Byron W, Hu Y, Felker D, Button W, Meeker D, Hsiao A, Xu H, Torre C, Schulz W. Kamino: A Scalable Architecture to Support Medical AI Research Using Large Real World Data. 2024, 00: 500-504. DOI: 10.1109/ichi61247.2024.00072.
Peer-Reviewed Original Research
Concepts: Electronic health records; AI research; Natural language processing tasks; Electronic health record data; Language processing tasks; Computing resource management; Large-scale data retrieval; Medical AI research; Leveraging electronic health records; Standard data model; Kubernetes orchestrator; Scalable architecture; Processing tasks; Resource allocation systems; Security considerations; Access management; Data retrieval; Data model; Architectural solutions; OMOP CDM; Real World Data; World Data; Health records; OMOP; Data

Mapping Study Variables to Common Data Elements Using GPT for Sheets: Towards Standardized Data Collection and Sharing
Ram P, Hong N, Xu H, Jiang X. Mapping Study Variables to Common Data Elements Using GPT for Sheets: Towards Standardized Data Collection and Sharing. 2024, 00: 320-325. DOI: 10.1109/ichi61247.2024.00048.
Peer-Reviewed Original Research

Large language models for biomedicine: foundations, opportunities, challenges, and best practices
Sahoo S, Plasek J, Xu H, Uzuner Ö, Cohen T, Yetisgen M, Liu H, Meystre S, Wang Y. Large language models for biomedicine: foundations, opportunities, challenges, and best practices. Journal Of The American Medical Informatics Association 2024, 31: 2114-2124. PMID: 38657567, PMCID: PMC11339493, DOI: 10.1093/jamia/ocae074.
Peer-Reviewed Original Research
Concepts: Natural language processing; Prompt tuning; NLP applications; Language model; State-of-the-art performance; NLP practitioners; Natural language processing applications; Biomedical NLP applications; Pre-training dataset; Natural language understanding; Neural network architecture model; Natural language generation; Biomedical informatics community; Network architecture model; American Medical Informatics Association (AMIA); Prompt-tuning; Few-shot; Zero-Shot; NLP challenge; NLP tasks; Reinforcement learning; Human feedback; Language generation; Language understanding; Evaluation metrics

Repurposing non-pharmacological interventions for Alzheimer's disease through link prediction on biomedical literature
Xiao Y, Hou Y, Zhou H, Diallo G, Fiszman M, Wolfson J, Zhou L, Kilicoglu H, Chen Y, Su C, Xu H, Mantyh W, Zhang R. Repurposing non-pharmacological interventions for Alzheimer's disease through link prediction on biomedical literature. Scientific Reports 2024, 14: 8693. PMID: 38622164, PMCID: PMC11018822, DOI: 10.1038/s41598-024-58604-8.
Peer-Reviewed Original Research
Concepts: Alzheimer's disease; Manual therapy techniques; R-GCN; Knowledge graph; AD prevention; Non-pharmacological interventions; Biomedical literature; Graph convolutional network model; KG embedding models; Test set; Link prediction model; Integrated health; Convolutional network model; Improve cognitive function; Highest scoring candidates; Domain experts; Embedding model; Non-pharmaceutical interventions; Real-world data analysis; Ground truth; Prevent AD; Cognitive function; Therapy techniques; Network model; Discovery patterns

Ensemble pretrained language models to extract biomedical knowledge from literature
Li Z, Wei Q, Huang L, Li J, Hu Y, Chuang Y, He J, Das A, Keloth V, Yang Y, Diala C, Roberts K, Tao C, Jiang X, Zheng W, Xu H. Ensemble pretrained language models to extract biomedical knowledge from literature. Journal Of The American Medical Informatics Association 2024, 31: 1904-1911. PMID: 38520725, PMCID: PMC11339500, DOI: 10.1093/jamia/ocae061.
Peer-Reviewed Original Research
Concepts: Natural language processing; Natural language processing systems; Language model; Expansion of biomedical literature; Zero-shot setting; Manually annotated corpus; Knowledge graph development; Task-specific models; Domain-specific models; Zero-Shot; Entity recognition; Billion parameters; Ensemble learning; Location information; Knowledge bases; Biomedical entities; Language processing; Free text; Graph development; Biomedical concepts; Automated technique; Biomedical literature; Detection method; Predictive performance; Biomedical knowledge

Developing deep learning-based strategies to predict the risk of hepatocellular carcinoma among patients with nonalcoholic fatty liver disease from electronic health records
Li Z, Lan L, Zhou Y, Li R, Chavin K, Xu H, Li L, Shih D, Zheng W. Developing deep learning-based strategies to predict the risk of hepatocellular carcinoma among patients with nonalcoholic fatty liver disease from electronic health records. Journal Of Biomedical Informatics 2024, 152: 104626. PMID: 38521180, DOI: 10.1016/j.jbi.2024.104626.
Peer-Reviewed Original Research
Concepts: Deep learning models; Electronic health records; HCC risk prediction; Health records; Time-varying covariates; Learning models; Electronic health record data; Risk prediction; Health record data; Accuracy of deep learning models; Deep learning-based strategy; Covariate imbalance; Disease prediction tasks; Learning-based strategy; Deep learning performance; Disease risk prediction; EHR database; Classification problem; Length of follow-up; Transfer learning; Fatty liver disease; Prediction task; Carcinoma risk; Model training; Record data

Advancing entity recognition in biomedicine via instruction tuning of large language models
Keloth V, Hu Y, Xie Q, Peng X, Wang Y, Zheng A, Selek M, Raja K, Wei C, Jin Q, Lu Z, Chen Q, Xu H. Advancing entity recognition in biomedicine via instruction tuning of large language models. Bioinformatics 2024, 40: btae163. PMID: 38514400, PMCID: PMC11001490, DOI: 10.1093/bioinformatics/btae163.
Peer-Reviewed Original Research
Concepts: Named Entity Recognition; Sequence labeling task; Natural language processing; Biomedical NER datasets; Language model; NER datasets; Entity recognition; Labeling task; Text generation; Field of natural language processing; Biomedical NER; Few-shot learning capability; Reasoning tasks; Multi-domain scenarios; Domain-specific models; End-to-end; Minimal fine-tuning; SOTA performance; F1 score; Healthcare applications; Biomedical entities; Biomedical domain; Language processing; Multi-tasking; PubMedBERT model

FedFSA: Hybrid and federated framework for functional status ascertainment across institutions
Fu S, Jia H, Vassilaki M, Keloth V, Dang Y, Zhou Y, Garg M, Petersen R, St Sauver J, Moon S, Wang L, Wen A, Li F, Xu H, Tao C, Fan J, Liu H, Sohn S. FedFSA: Hybrid and federated framework for functional status ascertainment across institutions. Journal Of Biomedical Informatics 2024, 152: 104623. PMID: 38458578, PMCID: PMC11005095, DOI: 10.1016/j.jbi.2024.104623.
Peer-Reviewed Original Research
Concepts: Natural language processing; Electronic health records; Status information; Information extraction; Functional status information; Rule-based information extraction; Federated learning framework; Private local data; Natural language processing framework; Healthcare sites; Patient's functional status; Multiple healthcare institutions; Federated learning; PyTorch library; Concept normalization; BERT model; Learning framework; Collaborative development efforts; Corpus annotation; Language processing; Healthcare institutions; Functional status; Predictor of health outcomes; Activities of daily living; Natural language processing performance

A scoping review of fair machine learning techniques when using real-world data
Huang Y, Guo J, Chen W, Lin H, Tang H, Wang F, Xu H, Bian J. A scoping review of fair machine learning techniques when using real-world data. Journal Of Biomedical Informatics 2024, 151: 104622. PMID: 38452862, PMCID: PMC11146346, DOI: 10.1016/j.jbi.2024.104622.
Peer-Reviewed Original Research
Concepts: Real-world data; Health care applications; Health care domain; Machine learning; Artificial intelligence; Care applications; Multi-modal data; Integration of artificial intelligence; Machine learning techniques; Pre-processing techniques; Care domain; Bias mitigation approaches; Public datasets; AI/ML models; Model fairness; Learning techniques; Optimal fairness; Health care data; AI tools; Health care; Algorithmic bias; ML models; AI/ML; Fairness; Bias issues