2024
Medical Concept Normalization
Xu H, Demner-Fushman D, Hong N, Raja K. Medical Concept Normalization. Cognitive Informatics In Biomedicine And Healthcare 2024, 137-164. DOI: 10.1007/978-3-031-55865-8_6. Peer-Reviewed Original Research
Prompt Tuning in Biomedical Relation Extraction
He J, Li F, Li J, Hu X, Nian Y, Xiang Y, Wang J, Wei Q, Li Y, Xu H, Tao C. Prompt Tuning in Biomedical Relation Extraction. Journal Of Healthcare Informatics Research 2024, 8: 206-224. PMID: 38681754, PMCID: PMC11052745, DOI: 10.1007/s41666-024-00162-9. Peer-Reviewed Original Research
Improving large language models for clinical named entity recognition via prompt engineering
Hu Y, Chen Q, Du J, Peng X, Keloth V, Zuo X, Zhou Y, Li Z, Jiang X, Lu Z, Roberts K, Xu H. Improving large language models for clinical named entity recognition via prompt engineering. Journal Of The American Medical Informatics Association 2024, 31: 1812-1820. PMID: 38281112, PMCID: PMC11339492, DOI: 10.1093/jamia/ocad259. Peer-Reviewed Original Research
2022
ClinicalLayoutLM: A Pre-trained Multi-modal Model for Understanding Scanned Document in Electronic Health Records
Wei Q, Zuo X, Anjum O, Hu Y, Denlinger R, Bernstam E, Citardi M, Xu H. ClinicalLayoutLM: A Pre-trained Multi-modal Model for Understanding Scanned Document in Electronic Health Records. 2022, 00: 2821-2827. DOI: 10.1109/bigdata55660.2022.10020569. Peer-Reviewed Original Research
Natural Language Processing
Xu H, Roberts K. Natural Language Processing. Cognitive Informatics In Biomedicine And Healthcare 2022, 213-234. DOI: 10.1007/978-3-031-09108-7_7. Peer-Reviewed Original Research
Discovering novel drug-supplement interactions using SuppKG generated from the biomedical literature
Schutte D, Vasilakes J, Bompelli A, Zhou Y, Fiszman M, Xu H, Kilicoglu H, Bishop J, Adam T, Zhang R. Discovering novel drug-supplement interactions using SuppKG generated from the biomedical literature. Journal Of Biomedical Informatics 2022, 131: 104120. PMID: 35709900, PMCID: PMC9335448, DOI: 10.1016/j.jbi.2022.104120. Peer-Reviewed Original Research
2021
A Discrete Joint Model for Entity and Relation Extraction from Clinical Notes.
Ji Z, Ghiasvand O, Wu S, Xu H. A Discrete Joint Model for Entity and Relation Extraction from Clinical Notes. AMIA Joint Summits On Translational Science Proceedings 2021, 2021: 315-324. PMID: 34457146, PMCID: PMC8378610. Peer-Reviewed Original Research
2020
Relation Extraction from Clinical Narratives Using Pre-trained Language Models.
Wei Q, Ji Z, Si Y, Du J, Wang J, Tiryaki F, Wu S, Tao C, Roberts K, Xu H. Relation Extraction from Clinical Narratives Using Pre-trained Language Models. AMIA Annual Symposium Proceedings 2020, 2019: 1236-1245. PMID: 32308921, PMCID: PMC7153059. Peer-Reviewed Original Research
2019
Applying a deep learning-based sequence labeling approach to detect attributes of medical concepts in clinical text
Xu J, Li Z, Wei Q, Wu Y, Xiang Y, Lee H, Zhang Y, Wu S, Xu H. Applying a deep learning-based sequence labeling approach to detect attributes of medical concepts in clinical text. BMC Medical Informatics And Decision Making 2019, 19: 236. PMID: 31801529, PMCID: PMC6894107, DOI: 10.1186/s12911-019-0937-2. Peer-Reviewed Original Research
Extracting entities with attributes in clinical text via joint deep learning
Shi X, Yi Y, Xiong Y, Tang B, Chen Q, Wang X, Ji Z, Zhang Y, Xu H. Extracting entities with attributes in clinical text via joint deep learning. Journal Of The American Medical Informatics Association 2019, 26: 1584-1591. PMID: 31550346, PMCID: PMC7647140, DOI: 10.1093/jamia/ocz158. Peer-Reviewed Original Research
Enhancing clinical concept extraction with contextual embeddings
Si Y, Wang J, Xu H, Roberts K. Enhancing clinical concept extraction with contextual embeddings. Journal Of The American Medical Informatics Association 2019, 26: 1297-1304. PMID: 31265066, PMCID: PMC6798561, DOI: 10.1093/jamia/ocz096. Peer-Reviewed Original Research
2018
Clinical text annotation - what factors are associated with the cost of time?
Wei Q, Franklin A, Cohen T, Xu H. Clinical text annotation - what factors are associated with the cost of time? AMIA Annual Symposium Proceedings 2018, 2018: 1552-1560. PMID: 30815201, PMCID: PMC6371268. Peer-Reviewed Original Research
2017
Lightweight predicate extraction for patient-level cancer information and ontology development
Amith M, Song H, Zhang Y, Xu H, Tao C. Lightweight predicate extraction for patient-level cancer information and ontology development. BMC Medical Informatics And Decision Making 2017, 17: 73. PMID: 28699547, PMCID: PMC5506564, DOI: 10.1186/s12911-017-0465-x. Peer-Reviewed Original Research
A hybrid approach to automatic de-identification of psychiatric notes
Lee H, Wu Y, Zhang Y, Xu J, Xu H, Roberts K. A hybrid approach to automatic de-identification of psychiatric notes. Journal Of Biomedical Informatics 2017, 75: s19-s27. PMID: 28602904, PMCID: PMC5705430, DOI: 10.1016/j.jbi.2017.06.006. Peer-Reviewed Original Research
Knowledge-Based Approach for Named Entity Recognition in Biomedical Literature: A Use Case in Biomedical Software Identification
Amith M, Zhang Y, Xu H, Tao C. Knowledge-Based Approach for Named Entity Recognition in Biomedical Literature: A Use Case in Biomedical Software Identification. Lecture Notes In Computer Science 2017, 10351: 386-395. DOI: 10.1007/978-3-319-60045-1_40. Peer-Reviewed Original Research
Information retrieval for biomedical datasets: the 2016 bioCADDIE dataset retrieval challenge
Roberts K, Gururaj A, Chen X, Pournejati S, Hersh W, Demner-Fushman D, Ohno-Machado L, Cohen T, Xu H. Information retrieval for biomedical datasets: the 2016 bioCADDIE dataset retrieval challenge. Database 2017, 2017: bax068. DOI: 10.1093/database/bax068. Peer-Reviewed Original Research
2014
Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks
Tang B, Cao H, Wang X, Chen Q, Xu H. Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks. BioMed Research International 2014, 2014: 240403. PMID: 24729964, PMCID: PMC3963372, DOI: 10.1155/2014/240403. Peer-Reviewed Original Research
2012
A study of transportability of an existing smoking status detection module across institutions.
Liu M, Shah A, Jiang M, Peterson N, Dai Q, Aldrich M, Chen Q, Bowton E, Liu H, Denny J, Xu H. A study of transportability of an existing smoking status detection module across institutions. AMIA Annual Symposium Proceedings 2012, 2012: 577-86. PMID: 23304330, PMCID: PMC3540509. Peer-Reviewed Original Research
2010
An automated approach to calculating the daily dose of tacrolimus in electronic health records.
Xu H, Doan S, Birdwell K, Cowan J, Vincz A, Haas D, Basford M, Denny J. An automated approach to calculating the daily dose of tacrolimus in electronic health records. AMIA Joint Summits On Translational Science Proceedings 2010, 2010: 71-5. PMID: 21347153, PMCID: PMC3041548. Peer-Reviewed Original Research