2024
Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition
Zuo X, Kumar A, Shen S, Li J, Cong G, Jin E, Chen Q, Warner J, Yang P, Xu H. Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition. JCO Clinical Cancer Informatics 2024, 8: e2300166. PMID: 38885475, DOI: 10.1200/cci.23.00166.
Peer-Reviewed Original Research
Concepts: Natural language processing, Domain-specific language models, Natural language processing systems, Information extraction system, Rule-based module, Narrative clinical texts, NLP tasks, Entity recognition, Text normalization, Assertion classification, Language model, Information extraction, Clinical text, Electronic health records, Learning-based, Clinical notes, Language processing, Test set, System performance, Health records, Response extraction, Time-consuming, Anticancer therapy, Information, Assessment information
FedFSA: Hybrid and federated framework for functional status ascertainment across institutions
Fu S, Jia H, Vassilaki M, Keloth V, Dang Y, Zhou Y, Garg M, Petersen R, St Sauver J, Moon S, Wang L, Wen A, Li F, Xu H, Tao C, Fan J, Liu H, Sohn S. FedFSA: Hybrid and federated framework for functional status ascertainment across institutions. Journal Of Biomedical Informatics 2024, 152: 104623. PMID: 38458578, PMCID: PMC11005095, DOI: 10.1016/j.jbi.2024.104623.
Peer-Reviewed Original Research
Concepts: Natural language processing, Electronic health records, Status information, Information extraction, Functional status information, Rule-based information extraction, Federated learning framework, Private local data, Natural language processing framework, Healthcare sites, Patient's functional status, Multiple healthcare institutions, Federated learning, PyTorch library, Concept normalization, BERT model, Learning framework, Collaborative development efforts, Corpus annotation, Language processing, Healthcare institutions, Functional status, Predictor of health outcomes, Activities of daily living, Natural language processing performance
2022
ClinicalLayoutLM: A Pre-trained Multi-modal Model for Understanding Scanned Document in Electronic Health Records
Wei Q, Zuo X, Anjum O, Hu Y, Denlinger R, Bernstam E, Citardi M, Xu H. ClinicalLayoutLM: A Pre-trained Multi-modal Model for Understanding Scanned Document in Electronic Health Records. 2022, 00: 2821-2827. DOI: 10.1109/bigdata55660.2022.10020569.
Peer-Reviewed Original Research
Concepts: Optical character recognition, Multi-modal model, Electronic health records, Clinical documents, Natural language processing tasks, Information extraction technology, Pre-trained models, Health records, Language processing tasks, Information extraction, Image information, F1 score, Character recognition, Layout analysis, Processing tasks, Multi-modal approach, Clinical corpus, Baseline model, Documents, Open domain, Task, Extraction technology, Clinical operations, Different categories, Text
2021
From Tokenization to Self-Supervision: Building a High-Performance Information Extraction System for Chemical Reactions in Patents
Wang J, Ren Y, Zhang Z, Xu H, Zhang Y. From Tokenization to Self-Supervision: Building a High-Performance Information Extraction System for Chemical Reactions in Patents. Frontiers In Research Metrics And Analytics 2021, 6: 691105. PMID: 35005421, PMCID: PMC8727901, DOI: 10.3389/frma.2021.691105.
Peer-Reviewed Original Research
Concepts: Event extraction, Entity recognition, Natural language processing techniques, Accurate information extraction, Information extraction system, Language processing techniques, Knowledge-based rules, Information extraction, Automatic tool, End system, Art results, Semantic roles, Language model, Self-Supervision, Free text, Chemical patents, Subtask 1, Reaction extraction, Different semantic roles, Hybrid approach, Event triggers, Processing techniques, Subtasks, Tokenization, High performance
2020
Opioid2FHIR: A system for extracting FHIR-compatible opioid prescriptions from clinical text
Wang J, Mathews W, Pham H, Xu H, Zhang Y. Opioid2FHIR: A system for extracting FHIR-compatible opioid prescriptions from clinical text. 2020, 00: 1748-1751. DOI: 10.1109/bibm49941.2020.9313258.
Peer-Reviewed Original Research
Concepts: Fast Healthcare Interoperability Resources, Information extraction, Natural language processing techniques, Language processing techniques, Medical concept normalization, Opioid information, Post-processing rules, Clinical decision support, Manual effort, Concept normalization, Clinical text, F-measure, NLP applications, Prescription records, Clinical data standards, Data standards, Decision support, Free text, Processing tools, Prescription drug monitoring programs, National public health emergency, Processing techniques, Prescription opioid overdose, Drug monitoring programs, Drug overdose deaths
Named Entity Recognition from Table Headers in Randomized Controlled Trial Articles
Wei Q, Zhou Y, Zhao B, Hu X, Mei Q, Tao C, Xu H. Named Entity Recognition from Table Headers in Randomized Controlled Trial Articles. 2020, 00: 1-2. DOI: 10.1109/ichi48887.2020.9374323.
Peer-Reviewed Original Research
Concepts: Table headers, Entity recognition, Deep learning-based approach, Biomedical text mining, Learning-based approach, Named Entity Recognition, Information extraction, Biomedical entities, F1 score, Text mining, Unstructured nature, Biomedical articles, Contextual information, Computational applications, Header, Semantic complexity, Better performance, Corpus, Recognition, Information, Mining, Applications, Important information, Complexity, Biomedical research
2019
Developing Customizable Cancer Information Extraction Modules for Pathology Reports Using CLAMP
Soysal E, Warner J, Wang J, Jiang M, Harvey K, Jain S, Dong X, Song H, Siddhanamatha H, Wang L, Dai Q, Chen Q, Du X, Tao C, Yang P, Denny J, Liu H, Xu H. Developing Customizable Cancer Information Extraction Modules for Pathology Reports Using CLAMP. 2019, 264: 1041-1045. PMID: 31438083, PMCID: PMC7359882, DOI: 10.3233/shti190383.
Peer-Reviewed Original Research
Concepts: Electronic health records, NLP solution, Natural language processing technology, Information extraction module, Language processing technology, Information extraction tasks, User-friendly interface, Best F-measure, Information extraction, Extraction module, Extraction task, Customizable modules, NLP systems, F-measure, Academic use, Health records, Comparable performance, Processing technology, Vanderbilt University Medical Center, Module, Diverse types, Information, NLP, Substantial effort, System
2017
CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines
Soysal E, Wang J, Jiang M, Wu Y, Pakhomov S, Liu H, Xu H. CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines. Journal Of The American Medical Informatics Association 2017, 25: 331-336. PMID: 29186491, PMCID: PMC7378877, DOI: 10.1093/jamia/ocx132.
Peer-Reviewed Original Research
Concepts: Graphic user interface, User interface, User-friendly graphic user interface, Natural language processing systems, Clinical natural language processing (NLP) systems, Natural language processing pipeline, Knowledge Extraction System, Language processing pipeline, Clinical Text Analysis, Language processing system, NLP components, NLP toolkit, Information extraction, NLP pipeline, Use cases, Entity recognition, Clinical text, End users, NLP community, Processing pipeline, Processing system, Individual tasks, Individual applications, Text analysis, Better performance
Knowledge-Based Approach for Named Entity Recognition in Biomedical Literature: A Use Case in Biomedical Software Identification
Amith M, Zhang Y, Xu H, Tao C. Knowledge-Based Approach for Named Entity Recognition in Biomedical Literature: A Use Case in Biomedical Software Identification. Lecture Notes In Computer Science 2017, 10351: 386-395. DOI: 10.1007/978-3-319-60045-1_40.
Peer-Reviewed Original Research
Concepts: Entity recognition, Natural language processing, Contextual semantic information, Named Entity Recognition, Entity recognition method, Features of ontology, Machine learning approaches, Knowledge-based approach, Software entities, Software names, Information extraction, Use cases, Biomedical software, Semantic information, Software identification, Language processing, Recognition method, Learning approach, Biomedical literature, Recognition, Ontology, Entities, Software, Research abstracts, Task
2015
Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature
Chen G, Zhao J, Cohen T, Tao C, Sun J, Xu H, Bernstam E, Lawson A, Zeng J, Johnson A, Holla V, Bailey A, Lara-Guerra H, Litzenburger B, Meric-Bernstam F, Zheng W. Using Ontology Fingerprints to disambiguate gene name entities in the biomedical literature. Database 2015, 2015: bav034. PMID: 25858285, PMCID: PMC4390608, DOI: 10.1093/database/bav034.
Peer-Reviewed Original Research