2024
Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study
Yang R, Zeng Q, You K, Qiao Y, Huang L, Hsieh C, Rosand B, Goldwasser J, Dave A, Keenan T, Ke Y, Hong C, Liu N, Chew E, Radev D, Lu Z, Xu H, Chen Q, Li I. Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study. Journal Of Medical Internet Research 2024, 26: e60601. PMID: 39361955, PMCID: PMC11487205, DOI: 10.2196/60601.
Peer-Reviewed Original Research
Concepts: Natural language processing; Natural language processing toolkit; Question-answering task; Language model; Text generation; Text processing; Domain-specific language models; Natural language processing functions; Minimal programming expertise; Text generation tasks; Medical knowledge graph; Machine translation tasks; ROUGE-L score; Domain-specific challenges; All-in-one solution; ROUGE-L; Text summarization; BLEU score; Knowledge graph; Machine translation; Unstructured text; Question-answering; Hugging Face; Processing toolkit; Language processing
Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition
Zuo X, Kumar A, Shen S, Li J, Cong G, Jin E, Chen Q, Warner J, Yang P, Xu H. Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition. JCO Clinical Cancer Informatics 2024, 8: e2300166. PMID: 38885475, DOI: 10.1200/cci.23.00166.
Peer-Reviewed Original Research
Concepts: Natural language processing; Domain-specific language models; Natural language processing systems; Information extraction system; Rule-based module; Narrative clinical texts; NLP tasks; Entity recognition; Text normalization; Assertion classification; Language model; Information extraction; Clinical text; Electronic health records; Learning-based; Clinical notes; Language processing; Test set; System performance; Health records; Response extraction; Time-consuming; Anticancer therapy; Information; Assessment information
Relation Extraction
Devarakonda M, Raja K, Xu H. Relation Extraction. Cognitive Informatics In Biomedicine And Healthcare 2024, 101-135. DOI: 10.1007/978-3-031-55865-8_5.
Peer-Reviewed Original Research
Concepts: Natural language processing; Biomedical text; Extraction task; Electronic health records; Language processing; Health records
Introduction to Natural Language Processing of Clinical Text
Demner Fushman D, Xu H. Introduction to Natural Language Processing of Clinical Text. Cognitive Informatics In Biomedicine And Healthcare 2024, 3-11. DOI: 10.1007/978-3-031-55865-8_1.
Peer-Reviewed Original Research
Concepts: Natural language processing; Language processing; Complex language processing; Biomedical natural language processing; Clinical natural language processing; Language generation tasks; Clinical language processing; Biomedical language processing; Language model; Clinical text; Generation task; Machine learning; Delivery of information; Clinical language; Language
Development of Clinical NLP Systems
Xu H, Demner Fushman D. Development of Clinical NLP Systems. Cognitive Informatics In Biomedicine And Healthcare 2024, 301-324. DOI: 10.1007/978-3-031-55865-8_11.
Peer-Reviewed Original Research
A Study of Biomedical Relation Extraction Using GPT Models.
Zhang J, Wibert M, Zhou H, Peng X, Chen Q, Keloth V, Hu Y, Zhang R, Xu H, Raja K. A Study of Biomedical Relation Extraction Using GPT Models. AMIA Joint Summits On Translational Science Proceedings 2024, 2024: 391-400. PMID: 38827097, PMCID: PMC11141827.
Peer-Reviewed Original Research
Large language models for biomedicine: foundations, opportunities, challenges, and best practices
Sahoo S, Plasek J, Xu H, Uzuner Ö, Cohen T, Yetisgen M, Liu H, Meystre S, Wang Y. Large language models for biomedicine: foundations, opportunities, challenges, and best practices. Journal Of The American Medical Informatics Association 2024, 31: 2114-2124. PMID: 38657567, PMCID: PMC11339493, DOI: 10.1093/jamia/ocae074.
Peer-Reviewed Original Research
Concepts: Natural language processing; Prompt tuning; NLP applications; Language model; State-of-the-art performance; NLP practitioners; Natural language processing applications; Biomedical NLP applications; Pre-training dataset; Natural language understanding; Neural network architecture model; Natural language generation; Biomedical informatics community; Network architecture model; American Medical Informatics Association (AMIA); Prompt-tuning; Few-shot; Zero-Shot; NLP challenge; NLP tasks; Reinforcement learning; Human feedback; Language generation; Language understanding; Evaluation metrics
Ensemble pretrained language models to extract biomedical knowledge from literature
Li Z, Wei Q, Huang L, Li J, Hu Y, Chuang Y, He J, Das A, Keloth V, Yang Y, Diala C, Roberts K, Tao C, Jiang X, Zheng W, Xu H. Ensemble pretrained language models to extract biomedical knowledge from literature. Journal Of The American Medical Informatics Association 2024, 31: 1904-1911. PMID: 38520725, PMCID: PMC11339500, DOI: 10.1093/jamia/ocae061.
Peer-Reviewed Original Research
Concepts: Natural language processing; Natural language processing systems; Language model; Expansion of biomedical literature; Zero-shot setting; Manually annotated corpus; Knowledge graph development; Task-specific models; Domain-specific models; Zero-Shot; Entity recognition; Billion parameters; Ensemble learning; Location information; Knowledge bases; Biomedical entities; Language processing; Free text; Graph development; Biomedical concepts; Automated technique; Biomedical literature; Detection method; Predictive performance; Biomedical knowledge
Advancing entity recognition in biomedicine via instruction tuning of large language models
Keloth V, Hu Y, Xie Q, Peng X, Wang Y, Zheng A, Selek M, Raja K, Wei C, Jin Q, Lu Z, Chen Q, Xu H. Advancing entity recognition in biomedicine via instruction tuning of large language models. Bioinformatics 2024, 40: btae163. PMID: 38514400, PMCID: PMC11001490, DOI: 10.1093/bioinformatics/btae163.
Peer-Reviewed Original Research
Concepts: Named Entity Recognition; Sequence labeling task; Natural language processing; Biomedical NER datasets; Language model; NER datasets; Entity recognition; Labeling task; Text generation; Field of natural language processing; Biomedical NER; Few-shot learning capability; Reasoning tasks; Multi-domain scenarios; Domain-specific models; End-to-end; Minimal fine-tuning; SOTA performance; F1 score; Healthcare applications; Biomedical entities; Biomedical domain; Language processing; Multi-tasking; PubMedBERT model
FedFSA: Hybrid and federated framework for functional status ascertainment across institutions
Fu S, Jia H, Vassilaki M, Keloth V, Dang Y, Zhou Y, Garg M, Petersen R, St Sauver J, Moon S, Wang L, Wen A, Li F, Xu H, Tao C, Fan J, Liu H, Sohn S. FedFSA: Hybrid and federated framework for functional status ascertainment across institutions. Journal Of Biomedical Informatics 2024, 152: 104623. PMID: 38458578, PMCID: PMC11005095, DOI: 10.1016/j.jbi.2024.104623.
Peer-Reviewed Original Research
Concepts: Natural language processing; Electronic health records; Status information; Information extraction; Functional status information; Rule-based information extraction; Federated learning framework; Private local data; Natural language processing framework; Healthcare sites; Patient's functional status; Multiple healthcare institutions; Federated learning; PyTorch library; Concept normalization; BERT model; Learning framework; Collaborative development efforts; Corpus annotation; Language processing; Healthcare institutions; Functional status; Predictor of health outcomes; Activities of daily living; Natural language processing performance
Prompt Tuning in Biomedical Relation Extraction
He J, Li F, Li J, Hu X, Nian Y, Xiang Y, Wang J, Wei Q, Li Y, Xu H, Tao C. Prompt Tuning in Biomedical Relation Extraction. Journal Of Healthcare Informatics Research 2024, 8: 206-224. PMID: 38681754, PMCID: PMC11052745, DOI: 10.1007/s41666-024-00162-9.
Peer-Reviewed Original Research
Concepts: Few-shot scenarios; Biomedical relation extraction; Natural language processing; Biomedical RE; Relation extraction; Prompt tuning; State-of-the-art performance; Text mining applications; Tuning model; BioCreative VI; SemEval-2013; Knowledge graph; Language model; Mining applications; Biomedical text; Original input; Computational resources; Language processing; External knowledge; Specific texts; Superior performance; Dataset; Efficient approach; Task; Model performance
2023
Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach
Hu Y, Keloth V, Raja K, Chen Y, Xu H. Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach. Bioinformatics 2023, 39: btad542. PMID: 37669123, PMCID: PMC10500081, DOI: 10.1093/bioinformatics/btad542.
Peer-Reviewed Original Research
Concepts: Natural language processing; Micro-F1 score; COVID-19 dataset; NLP pipeline; F1 score; Entity recognition model; AD dataset; PICO elements; Sentence classification; NER model; Recognition model; Language processing; Learning approach; Learning model; End evaluation; Supplementary data; Dataset; Pipeline; Extraction; Information; RCT abstracts; Annotation; Sentences; Bioinformatics; Complexity
Development of a Natural Language Processing Tool to Extract Acupuncture Point Location Terms
Li Y, Peng X, Li J, Peng S, Pei D, Tao C, Xu H, Hong N. Development of a Natural Language Processing Tool to Extract Acupuncture Point Location Terms. 2023, 00: 344-351. DOI: 10.1109/ichi57859.2023.00053.
Peer-Reviewed Original Research
Concepts: Acupuncture point locations; Natural language processing; Recurrent neural network; Conditional random field; World Health Organization; World Health Organization standards; Natural language processing tools; Effect of acupuncture therapy; Location information; Acupuncture research; Acupuncture therapy; Acupoint location; Recurrent neural network model; Dictionary lookup method; Natural language processing models; Deep learning techniques; Acupuncture; Language processing tools; Western Pacific Region; Free-text format; International anatomical terminology; Health Organization; F1 score; Informatics applications; Neural network
Representing and utilizing clinical textual data for real world studies: An OHDSI approach
Keloth V, Banda J, Gurley M, Heider P, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, V Patterson O, Peng Y, Raja K, Reeves R, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei W, Williams A, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, Xu H. Representing and utilizing clinical textual data for real world studies: An OHDSI approach. Journal Of Biomedical Informatics 2023, 142: 104343. PMID: 36935011, PMCID: PMC10428170, DOI: 10.1016/j.jbi.2023.104343.
Peer-Reviewed Original Research
Concepts: Natural language processing; Common data model; Textual data; NLP solution; Observational Health Data Sciences; OMOP Common Data Model; Specific use cases; Observational Medical Outcomes Partnership Common Data Model; Health Data Sciences; Representation of information; Use cases; Electronic health records; Real-world evidence generation; Data science; Clinical text; Data model; Clinical notes; Language processing; Health records; Load data; Clinical documentation; Current applications; Information; Workflow; Evidence generation
2022
Natural Language Processing
Xu H, Roberts K. Natural Language Processing. Cognitive Informatics In Biomedicine And Healthcare 2022, 213-234. DOI: 10.1007/978-3-031-09108-7_7.
Peer-Reviewed Original Research
Concepts: Natural language processing; Language processing; Electronic health records; Biomedical domain; Biomedical natural language processing; Common NLP tasks; Narrative text; NLP tasks; Biomedical articles; Clinical documents; NLP field; Text; Health records; Large amount; Basic concepts; Bibliographic databases; Processing; Task; Article; Documents; Domain; Chapter; Database; Information; Attention
Evaluation of mCODE Coverage in EHR: a Scoping Review of Cancer Natural Language Processing
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz I, Wang N, Yang P, Xu H, Warner J, Liu H. Evaluation of mCODE Coverage in EHR: a Scoping Review of Cancer Natural Language Processing. 2022, 00: 517-518. DOI: 10.1109/ichi54592.2022.00094.
Peer-Reviewed Original Research
Improving Pharmacovigilance Signal Detection from Clinical Notes with Locality Sensitive Neural Concept Embeddings.
Mower J, Bernstam E, Xu H, Myneni S, Subramanian D, Cohen T. Improving Pharmacovigilance Signal Detection from Clinical Notes with Locality Sensitive Neural Concept Embeddings. AMIA Joint Summits On Translational Science Proceedings 2022, 2022: 349-358. PMID: 35854716, PMCID: PMC9285153.
Peer-Reviewed Original Research
Concepts: Natural language processing; Clinical notes; Retrieval tasks; Concept embeddings; Neural embeddings; Leverage information; Language processing; Embedding method; Pharmacovigilance signal detection; ADR signals; Inherent complexity; Embedding; Signal detection; Signal recovery; Adverse drug reactions; Statistical measures; Information; Detection
Combining human and machine intelligence for clinical trial eligibility querying
Fang Y, Idnay B, Sun Y, Liu H, Chen Z, Marder K, Xu H, Schnall R, Weng C. Combining human and machine intelligence for clinical trial eligibility querying. Journal Of The American Medical Informatics Association 2022, 29: 1161-1171. PMID: 35426943, PMCID: PMC9196697, DOI: 10.1093/jamia/ocac051.
Peer-Reviewed Original Research
Concepts: Negation scope detection; Cohort queries; Scope detection; Health Information Technology Usability Evaluation Scale; Human-computer collaboration; Value normalization; Natural language processing; Machine intelligence; Domain experts; Eligibility criteria text; Usability evaluation; Learnability score; F1 score; User intervention; Language processing; Human intelligence; Usability score; Queries; Error correction; Engagement features; Intelligence; Disease trials; Frequent modifications; Enhanced modules; COVID-19 clinical trials
2021
A Discrete Joint Model for Entity and Relation Extraction from Clinical Notes.
Ji Z, Ghiasvand O, Wu S, Xu H. A Discrete Joint Model for Entity and Relation Extraction from Clinical Notes. AMIA Joint Summits On Translational Science Proceedings 2021, 2021: 315-324. PMID: 34457146, PMCID: PMC8378610.
Peer-Reviewed Original Research
Concepts: Relation classification; Pipeline architecture; Clinical natural language processing; Natural language processing; Entity recognition; Beam search; Relation extraction; Clinical notes; Language processing; Classification step; Entity pairs; Structured perceptron; Fundamental task; Clinical narratives; Traditional solutions; Recognition step; Error propagation; Architecture; Joint model; Task; Subtasks; Perceptron; Clinical concepts; Entities; Classification
2020
Relation Extraction from Clinical Narratives Using Pre-trained Language Models.
Wei Q, Ji Z, Si Y, Du J, Wang J, Tiryaki F, Wu S, Tao C, Roberts K, Xu H. Relation Extraction from Clinical Narratives Using Pre-trained Language Models. AMIA Annual Symposium Proceedings 2020, 2019: 1236-1245. PMID: 32308921, PMCID: PMC7153059.
Peer-Reviewed Original Research
Concepts: Pre-trained language models; Natural language processing; Language model; RE tasks; NLP tasks; Clinical narratives; Recent deep learning methods; Deep learning methods; Clinical NLP tasks; Relation extraction task; Traditional word embeddings; Traditional machine; Extraction task; Art performance; Relation extraction; BERT model; Language processing; Learning methods; Word embeddings; Shared Task; Previous state; Biomedical literature; Different implementations; Task; Open domain