2024
Augmenting biomedical named entity recognition with general-domain resources
Yin Y, Kim H, Xiao X, Wei C, Kang J, Lu Z, Xu H, Fang M, Chen Q. Augmenting biomedical named entity recognition with general-domain resources. Journal Of Biomedical Informatics 2024, 159: 104731. PMID: 39368529, DOI: 10.1016/j.jbi.2024.104731. Peer-Reviewed Original Research.
Concepts: BioNER datasets, Multi-task learning, NER datasets, Entity types, Biomedical datasets, Baseline model, General domain datasets, Biomedical language model, Neural network-based, Yield performance improvements, BioNER models, Entity recognition, Biomedical corpora, Human annotators, Label ambiguity, Language model, Transfer learning, F1 score, BioNER, Human effort, Network-based, Biomedical resources, Performance improvement, Dataset, Superior performance
Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition
Zuo X, Kumar A, Shen S, Li J, Cong G, Jin E, Chen Q, Warner J, Yang P, Xu H. Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition. JCO Clinical Cancer Informatics 2024, 8: e2300166. PMID: 38885475, DOI: 10.1200/cci.23.00166. Peer-Reviewed Original Research.
Concepts: Natural language processing, Domain-specific language models, Natural language processing systems, Information extraction system, Rule-based module, Narrative clinical texts, NLP tasks, Entity recognition, Text normalization, Assertion classification, Language model, Information extraction, Clinical text, Electronic health records, Learning-based, Clinical notes, Language processing, Test set, System performance, Health records, Response extraction, Time-consuming, Anticancer therapy, Information, Assessment information
Named Entity Recognition
Devarakonda M, Raja K, Xu H. Named Entity Recognition. Cognitive Informatics In Biomedicine And Healthcare 2024, 79-99. DOI: 10.1007/978-3-031-55865-8_4. Peer-Reviewed Original Research.
Ensemble pretrained language models to extract biomedical knowledge from literature
Li Z, Wei Q, Huang L, Li J, Hu Y, Chuang Y, He J, Das A, Keloth V, Yang Y, Diala C, Roberts K, Tao C, Jiang X, Zheng W, Xu H. Ensemble pretrained language models to extract biomedical knowledge from literature. Journal Of The American Medical Informatics Association 2024, 31: 1904-1911. PMID: 38520725, PMCID: PMC11339500, DOI: 10.1093/jamia/ocae061. Peer-Reviewed Original Research.
Concepts: Natural language processing, Natural language processing systems, Language model, Expansion of biomedical literature, Zero-shot setting, Manually annotated corpus, Knowledge graph development, Task-specific models, Domain-specific models, Zero-Shot, Entity recognition, Billion parameters, Ensemble learning, Location information, Knowledge bases, Biomedical entities, Language processing, Free text, Graph development, Biomedical concepts, Automated technique, Biomedical literature, Detection method, Predictive performance, Biomedical knowledge
Advancing entity recognition in biomedicine via instruction tuning of large language models
Keloth V, Hu Y, Xie Q, Peng X, Wang Y, Zheng A, Selek M, Raja K, Wei C, Jin Q, Lu Z, Chen Q, Xu H. Advancing entity recognition in biomedicine via instruction tuning of large language models. Bioinformatics 2024, 40: btae163. PMID: 38514400, PMCID: PMC11001490, DOI: 10.1093/bioinformatics/btae163. Peer-Reviewed Original Research.
Concepts: Named Entity Recognition, Sequence labeling task, Natural language processing, Biomedical NER datasets, Language model, NER datasets, Entity recognition, Labeling task, Text generation, Field of natural language processing, Biomedical NER, Few-shot learning capability, Reasoning tasks, Multi-domain scenarios, Domain-specific models, End-to-end, Minimal fine-tuning, SOTA performance, F1 score, Healthcare applications, Biomedical entities, Biomedical domain, Language processing, Multi-tasking, PubMedBERT model
Improving large language models for clinical named entity recognition via prompt engineering
Hu Y, Chen Q, Du J, Peng X, Keloth V, Zuo X, Zhou Y, Li Z, Jiang X, Lu Z, Roberts K, Xu H. Improving large language models for clinical named entity recognition via prompt engineering. Journal Of The American Medical Informatics Association 2024, 31: 1812-1820. PMID: 38281112, PMCID: PMC11339492, DOI: 10.1093/jamia/ocad259. Peer-Reviewed Original Research.
Concepts: Clinical NER tasks, NER task, Task-specific prompts, Entity recognition, Language model, Training samples, State-of-the-art models, Few-shot learning, State-of-the-art, Minimal training data, Task-specific knowledge, F1-score, Annotated samples, Concept extraction, Model performance, Annotated datasets, Training data, F1 score, Task description, Format specifications, Complex clinical data, Optimal performance, Task, Evaluation schema, GPT model
2022
A comparative study of pre-trained language models for named entity recognition in clinical trial eligibility criteria from multiple corpora
Li J, Wei Q, Ghiasvand O, Chen M, Lobanov V, Weng C, Xu H. A comparative study of pre-trained language models for named entity recognition in clinical trial eligibility criteria from multiple corpora. BMC Medical Informatics And Decision Making 2022, 22: 235. PMID: 36068551, PMCID: PMC9450226, DOI: 10.1186/s12911-022-01967-7. Peer-Reviewed Original Research.
Concepts: Pre-trained language models, NER task, Unstructured text, Entity recognition, Language model, Natural language processing techniques, Clinical trial eligibility criteria, Language processing techniques, Data augmentation results, Data augmentation approach, Domain-specific corpus, Better performance, Transformer model, Cross-validation show, Multiple data sources, Eligibility criteria text, Biomedical domain, Embedding models, NER performance, Augmentation approach, Contextual embeddings, Meaningful information, Evaluation results, Such documents, Processing techniques
2021
From Tokenization to Self-Supervision: Building a High-Performance Information Extraction System for Chemical Reactions in Patents
Wang J, Ren Y, Zhang Z, Xu H, Zhang Y. From Tokenization to Self-Supervision: Building a High-Performance Information Extraction System for Chemical Reactions in Patents. Frontiers In Research Metrics And Analytics 2021, 6: 691105. PMID: 35005421, PMCID: PMC8727901, DOI: 10.3389/frma.2021.691105. Peer-Reviewed Original Research.
Concepts: Event extraction, Entity recognition, Natural language processing techniques, Accurate information extraction, Information extraction system, Language processing techniques, Knowledge-based rules, Information extraction, Automatic tool, End system, Art results, Semantic roles, Language model, Self-Supervision, Free text, Chemical patents, Subtask 1, Reaction extraction, Different semantic roles, Hybrid approach, Event triggers, Processing techniques, Subtasks, Tokenization, High performance
A Discrete Joint Model for Entity and Relation Extraction from Clinical Notes.
Ji Z, Ghiasvand O, Wu S, Xu H. A Discrete Joint Model for Entity and Relation Extraction from Clinical Notes. AMIA Joint Summits On Translational Science Proceedings 2021, 2021: 315-324. PMID: 34457146, PMCID: PMC8378610. Peer-Reviewed Original Research.
Concepts: Relation classification, Pipeline architecture, Clinical natural language processing, Natural language processing, Entity recognition, Beam search, Relation extraction, Clinical notes, Language processing, Classification step, Entity pairs, Structured perceptron, Fundamental task, Clinical narratives, Traditional solutions, Recognition step, Error propagation, Architecture, Joint model, Task, Subtasks, Perceptron, Clinical concepts, Entities, Classification
Extracting postmarketing adverse events from safety reports in the vaccine adverse event reporting system (VAERS) using deep learning
Du J, Xiang Y, Sankaranarayanapillai M, Zhang M, Wang J, Si Y, Pham H, Xu H, Chen Y, Tao C. Extracting postmarketing adverse events from safety reports in the vaccine adverse event reporting system (VAERS) using deep learning. Journal Of The American Medical Informatics Association 2021, 28: 1393-1400. PMID: 33647938, PMCID: PMC8279785, DOI: 10.1093/jamia/ocab014. Peer-Reviewed Original Research.
Concepts: Deep learning algorithms, Learning-based methods, Vaccine Adverse Event Reporting System, Learning algorithm, Art deep learning algorithms, Deep learning-based methods, Conventional machine learning-based methods, Machine learning-based methods, Conventional machine learning, Adverse Event Reporting System, Guillain-Barré syndrome, Large models, Adverse events, Event Reporting System, VAERS reports, Deep learning, Machine learning, Entity recognition, Peer model, Influenza vaccine safety, Nervous system disorders, Exact match, Vaccine adverse events, Safety reports, Reporting system
2020
Named Entity Recognition from Table Headers in Randomized Controlled Trial Articles
Wei Q, Zhou Y, Zhao B, Hu X, Mei Q, Tao C, Xu H. Named Entity Recognition from Table Headers in Randomized Controlled Trial Articles. 2020, 00: 1-2. DOI: 10.1109/ichi48887.2020.9374323. Peer-Reviewed Original Research.
Concepts: Table headers, Entity recognition, Deep learning-based approach, Biomedical text mining, Learning-based approach, Named Entity Recognition, Information extraction, Biomedical entities, F1 score, Text mining, Unstructured nature, Biomedical articles, Contextual information, Computational applications, Header, Semantic complexity, Better performance, Corpus, Recognition, Information, Mining, Applications, Important information, Complexity, Biomedical research
2019
Applying a deep learning-based sequence labeling approach to detect attributes of medical concepts in clinical text
Xu J, Li Z, Wei Q, Wu Y, Xiang Y, Lee H, Zhang Y, Wu S, Xu H. Applying a deep learning-based sequence labeling approach to detect attributes of medical concepts in clinical text. BMC Medical Informatics And Decision Making 2019, 19: 236. PMID: 31801529, PMCID: PMC6894107, DOI: 10.1186/s12911-019-0937-2. Peer-Reviewed Original Research.
Concepts: Sequence labeling approach, Medical concepts, Entity recognition, Relation classification, Clinical text, Detection task, Bidirectional long short-term memory network, Long short-term memory network, Short-term memory network, Conditional Random Fields, Sequence labeling problem, Traditional methods, NLP applications, Bi-LSTM, Neural architecture, Labeling problem, Labeling approach, Memory network, Novel solution, Random fields, High accuracy, Efficient way, Task, Attributes, Classification
Deep learning in clinical natural language processing: a methodical review
Wu S, Roberts K, Datta S, Du J, Ji Z, Si Y, Soni S, Wang Q, Wei Q, Xiang Y, Zhao B, Xu H. Deep learning in clinical natural language processing: a methodical review. Journal Of The American Medical Informatics Association 2019, 27: 457-470. PMID: 31794016, PMCID: PMC7025365, DOI: 10.1093/jamia/ocz200. Peer-Reviewed Original Research.
Concepts: Natural language processing, Clinical natural language processing, Deep learning, Language processing, Computing Machinery Digital Library, Information extraction tasks, Medical informatics community, Computational Linguistics anthology, Recurrent neural network, Digital libraries, Text classification, Electronic health records, Extraction task, Entity recognition, Word2vec embeddings, Neural network, Relation extraction, NLP community, NLP research, Informatics community, Specific tasks, Health records, NLP problem, Learning, Clinical domains
Extracting entities with attributes in clinical text via joint deep learning
Shi X, Yi Y, Xiong Y, Tang B, Chen Q, Wang X, Ji Z, Zhang Y, Xu H. Extracting entities with attributes in clinical text via joint deep learning. Journal Of The American Medical Informatics Association 2019, 26: 1584-1591. PMID: 31550346, PMCID: PMC7647140, DOI: 10.1093/jamia/ocz158. Peer-Reviewed Original Research.
Concepts: Bidirectional long short-term memory, Short-term memory, Long short-term memory, Natural language processing, Entity recognition, Chinese corpus, Best F1, English corpus, Language processing, Joint deep learning, Task, Conditional Random Fields, Relation extraction, Attribute recognition, Memory, Sequential subtasks, Deep learning methods, Clinical text
Cost-aware active learning for named entity recognition in clinical text
Wei Q, Chen Y, Salimi M, Denny J, Mei Q, Lasko T, Chen Q, Wu S, Franklin A, Cohen T, Xu H. Cost-aware active learning for named entity recognition in clinical text. Journal Of The American Medical Informatics Association 2019, 26: 1314-1322. PMID: 31294792, PMCID: PMC6798575, DOI: 10.1093/jamia/ocz102. Peer-Reviewed Original Research.
Concepts: Annotation cost, User study, Active learning, AL methods, AL algorithm, Cost-CAUSE, Real-world environments, Annotation task, Annotation time, Annotation accuracy, Entity recognition, Clinical text, Annotation data, Passive learning, Informative examples, Curve score, Most approaches, Simulation area, Users, Syntactic features, Learning, Cost measures, Algorithm, Cost, Annotation
A study of deep learning approaches for medication and adverse drug event extraction from clinical text
Wei Q, Ji Z, Li Z, Du J, Wang J, Xu J, Xiang Y, Tiryaki F, Wu S, Zhang Y, Tao C, Xu H. A study of deep learning approaches for medication and adverse drug event extraction from clinical text. Journal Of The American Medical Informatics Association 2019, 27: 13-21. PMID: 31135882, PMCID: PMC6913210, DOI: 10.1093/jamia/ocz063. Peer-Reviewed Original Research.
Concepts: Deep learning-based approach, Deep learning approach, Learning-based approach, Traditional machine, Learning approach, National NLP Clinical Challenges, Adverse drug event extraction, Outperform traditional machine, Different ensemble approaches, Conditional Random Fields, Sequence labeling approach, MIMIC-III database, Event extraction, Medical domain, Entity recognition, Classification component, F1 score, Clinical text, Relation extraction, Clinical documents, Vector machine, End evaluation, Ensemble approach, Clinical corpus, Machine
2018
Combine Factual Medical Knowledge and Distributed Word Representation to Improve Clinical Named Entity Recognition.
Wu Y, Yang X, Bian J, Guo Y, Xu H, Hogan W. Combine Factual Medical Knowledge and Distributed Word Representation to Improve Clinical Named Entity Recognition. AMIA Annual Symposium Proceedings 2018, 2018: 1110-1117. PMID: 30815153, PMCID: PMC6371322. Peer-Reviewed Original Research.
Concepts: Recurrent neural network, Word embeddings, One-hot vectors, Word representations, Low-frequency words, Only word embeddings, Clinical Named Entity Recognition, Clinical NER tasks, Word embedding methods, Conditional Random Fields, Statistical language model, Named Entity Recognition, Unlabeled corpus, Language model, Language system, NER task, Decent representation, Factual medical knowledge, Important words, Deep learning models, Entity recognition, Clinical corpus, Named Entity Recognition System, Art performance, Feature representation
Clinical Named Entity Recognition Using Deep Learning Models.
Wu Y, Jiang M, Xu J, Zhi D, Xu H. Clinical Named Entity Recognition Using Deep Learning Models. AMIA Annual Symposium Proceedings 2018, 2017: 1812-1819. PMID: 29854252, PMCID: PMC5977567. Peer-Reviewed Original Research.
Concepts: Clinical Named Entity Recognition, Named Entity Recognition, Deep learning models, Convolutional neural network, Clinical NER system, Recurrent neural network, Neural network, Learning model, Entity recognition, RNN model, NER system, Deep neural network architecture, Popular deep learning architectures, Natural language processing tasks, Unsupervised learning features, Conditional random field model, Automatic feature learning, Deep learning architecture, Clinical NER tasks, Deep neural networks, Neural network architecture, Clinical concept extraction, Language processing tasks, Feature learning, Learning architecture
2017
CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines
Soysal E, Wang J, Jiang M, Wu Y, Pakhomov S, Liu H, Xu H. CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines. Journal Of The American Medical Informatics Association 2017, 25: 331-336. PMID: 29186491, PMCID: PMC7378877, DOI: 10.1093/jamia/ocx132. Peer-Reviewed Original Research.
Concepts: Graphic user interface, User interface, User-friendly graphic user interface, Natural language processing systems, Clinical natural language processing (NLP) systems, Natural language processing pipeline, Knowledge Extraction System, Language processing pipeline, Clinical Text Analysis, Language processing system, NLP components, NLP toolkit, Information extraction, NLP pipeline, Use cases, Entity recognition, Clinical text, End users, NLP community, Processing pipeline, Processing system, Individual tasks, Individual applications, Text analysis, Better performance
Entity recognition from clinical texts via recurrent neural network
Liu Z, Yang M, Wang X, Chen Q, Tang B, Wang Z, Xu H. Entity recognition from clinical texts via recurrent neural network. BMC Medical Informatics And Decision Making 2017, 17: 67. PMID: 28699566, PMCID: PMC5506598, DOI: 10.1186/s12911-017-0468-7. Peer-Reviewed Original Research.
Concepts: Recurrent neural network, Natural language processing, Entity recognition, Clinical text, Traditional machine, Neural network, Clinical natural language processing, Medical concept extraction, Hand-crafted features, Clinical entity recognition, Deep learning methods, Clinical event detection, Conditional Random Fields, Support vector machine, I2b2 NLP challenge, Performance of LSTM, Types of entities, Clinical domains, Context information, Feature engineering, Concept extraction, De-identification, Event detection, Knowledge bases, LSTM layers