2024
Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition
Zuo X, Kumar A, Shen S, Li J, Cong G, Jin E, Chen Q, Warner J, Yang P, Xu H. Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition. JCO Clinical Cancer Informatics 2024, 8: e2300166. PMID: 38885475, DOI: 10.1200/cci.23.00166.
Peer-Reviewed Original Research
Concepts: Natural language processing; Domain-specific language models; Natural language processing systems; Information extraction system; Rule-based module; Narrative clinical texts; NLP tasks; Entity recognition; Text normalization; Assertion classification; Language model; Information extraction; Clinical text; Electronic health records; Learning-based; Clinical notes; Language processing; Test set; System performance; Health records; Response extraction; Time-consuming; Anticancer therapy; Information; Assessment information
2023
AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models
Datta S, Lee K, Paek H, Manion F, Ofoegbu N, Du J, Li Y, Huang L, Wang J, Lin B, Xu H, Wang X. AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models. Journal Of The American Medical Informatics Association 2023, 31: 375-385. PMID: 37952206, PMCID: PMC10797270, DOI: 10.1093/jamia/ocad218.
Peer-Reviewed Original Research
Concepts: Language model; Information extraction system; Overall F1 score; Criteria information; F1 score; Manual annotation; Scalable solution; Contextual information; Complex scenarios; Contextual attributes; Extraction system; Real-world settings; System evaluation; Modeling capabilities; Clinical trial protocol documents; Information; Protocol documents
FinFax: Fast Interpretation of Fax with NLP
Anjum O, Chen L, Denlinger R, Anam E, Dongsheng Y, Wooldridge C, Citardi M, Zhang J, Xu H, Jiang X. FinFax: Fast Interpretation of Fax with NLP. 2023, 1-2. DOI: 10.1145/3584371.3613019.
Peer-Reviewed Original Research
End system; Critical medical information; Electronic health record systems; Health record systems; Real-life applications; Vital clinical data; First end; Manual processing; Information exchange; Healthcare organizations; Overall workflow; Health records; Medical information; Record system; Workflow; Final output; Fast interpretation; Fax; Academic environment; Information; Pertinent information; Multiple solutions; Processing; Real-life hospital settings; NLP
Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach
Hu Y, Keloth V, Raja K, Chen Y, Xu H. Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach. Bioinformatics 2023, 39: btad542. PMID: 37669123, PMCID: PMC10500081, DOI: 10.1093/bioinformatics/btad542.
Peer-Reviewed Original Research
Natural language processing; Micro-F1 score; COVID-19 dataset; NLP pipeline; F1 score; Entity recognition model; AD dataset; PICO elements; Sentence classification; NER model; Recognition model; Language processing; Learning approach; Learning model; End evaluation; Supplementary data; Dataset; Pipeline; Extraction; Information; RCT abstracts; Annotation; Sentences; Bioinformatics; Complexity
Representing and utilizing clinical textual data for real world studies: An OHDSI approach
Keloth V, Banda J, Gurley M, Heider P, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, V Patterson O, Peng Y, Raja K, Reeves R, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei W, Williams A, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, Xu H. Representing and utilizing clinical textual data for real world studies: An OHDSI approach. Journal Of Biomedical Informatics 2023, 142: 104343. PMID: 36935011, PMCID: PMC10428170, DOI: 10.1016/j.jbi.2023.104343.
Peer-Reviewed Original Research
Concepts: Natural language processing; Common data model; Textual data; NLP solution; Observational Health Data Sciences; OMOP Common Data Model; Specific use cases; Observational Medical Outcomes Partnership Common Data Model; Health Data Sciences; Representation of information; Use cases; Electronic health records; Real-world evidence generation; Data science; Clinical text; Data model; Clinical notes; Language processing; Health records; Load data; Clinical documentation; Current applications; Information; Workflow; Evidence generation
2022
Natural Language Processing
Xu H, Roberts K. Natural Language Processing. Cognitive Informatics In Biomedicine And Healthcare 2022, 213-234. DOI: 10.1007/978-3-031-09108-7_7.
Peer-Reviewed Original Research
Natural language processing; Language processing; Electronic health records; Biomedical domain; Biomedical natural language processing; Common NLP tasks; Narrative text; NLP tasks; Biomedical articles; Clinical documents; NLP field; Text; Health records; Large amount; Basic concepts; Bibliographic databases; Processing; Task; Article; Documents; Domain; Chapter; Database; Information; Attention
Improving Pharmacovigilance Signal Detection from Clinical Notes with Locality Sensitive Neural Concept Embeddings.
Mower J, Bernstam E, Xu H, Myneni S, Subramanian D, Cohen T. Improving Pharmacovigilance Signal Detection from Clinical Notes with Locality Sensitive Neural Concept Embeddings. AMIA Joint Summits On Translational Science Proceedings 2022, 2022: 349-358. PMID: 35854716, PMCID: PMC9285153.
Peer-Reviewed Original Research
Natural language processing; Clinical notes; Retrieval tasks; Concept embeddings; Neural embeddings; Leverage information; Language processing; Embedding method; Pharmacovigilance signal detection; ADR signals; Inherent complexity; Embedding; Signal detection; Signal recovery; Adverse drug reactions; Statistical measures; Information; Detection
2020
Conversational ontology operator: patient-centric vaccine dialogue management engine for spoken conversational agents
Amith M, Lin R, Cui L, Wang D, Zhu A, Xiong G, Xu H, Roberts K, Tao C. Conversational ontology operator: patient-centric vaccine dialogue management engine for spoken conversational agents. BMC Medical Informatics And Decision Making 2020, 20: 259. PMID: 33317519, PMCID: PMC7734717, DOI: 10.1186/s12911-020-01267-y.
Peer-Reviewed Original Research
Concepts: Dialogue engine; User-centric system; Ontology-based system; Question-answering system; Management engine; Software engine; Question Answering; Conversational agents; Dialogue interaction; Competency questions; Contextual information; Consumer users; Core task; Accuracy scores; Consumer questions; Engine; Conversational flow; Health information; Simulation trials; Information; Users; Future plans; Next step; Ontology; Wizard
Named Entity Recognition from Table Headers in Randomized Controlled Trial Articles
Wei Q, Zhou Y, Zhao B, Hu X, Mei Q, Tao C, Xu H. Named Entity Recognition from Table Headers in Randomized Controlled Trial Articles. 2020, 00: 1-2. DOI: 10.1109/ichi48887.2020.9374323.
Peer-Reviewed Original Research
Table headers; Entity recognition; Deep learning-based approach; Biomedical text mining; Learning-based approach; Named Entity Recognition; Information extraction; Biomedical entities; F1 score; Text mining; Unstructured nature; Biomedical articles; Contextual information; Computational applications; Header; Semantic complexity; Better performance; Corpus; Recognition; Information; Mining; Applications; Important information; Complexity; Biomedical research
Efficient and Accurate Extracting of Unstructured EHRs on Cancer Therapy Responses for the Development of RECIST Natural Language Processing Tools: Part I, the Corpus
Li Y, Luo Y, Wampfler J, Rubinstein S, Tiryaki F, Ashok K, Warner J, Xu H, Yang P. Efficient and Accurate Extracting of Unstructured EHRs on Cancer Therapy Responses for the Development of RECIST Natural Language Processing Tools: Part I, the Corpus. JCO Clinical Cancer Informatics 2020, 4: cci.19.00147. PMID: 32364754, PMCID: PMC7265793, DOI: 10.1200/cci.19.00147.
Peer-Reviewed Original Research
Concepts: Natural language processing tools; Electronic health records; Language processing tools; Gold standard data; Unstructured electronic health records; Processing tools; Amount of data; Clinical notes; Standard data; Mayo Clinic electronic health records; Clinic's electronic health record; Environment tools; Accurate annotation; Health records; Informatics tools; Effective analysis; Data sets; Textual sources; Corpus; Tool; Information; Data extraction; Set; Extracting; Annotation
2019
Developing Customizable Cancer Information Extraction Modules for Pathology Reports Using CLAMP
Soysal E, Warner J, Wang J, Jiang M, Harvey K, Jain S, Dong X, Song H, Siddhanamatha H, Wang L, Dai Q, Chen Q, Du X, Tao C, Yang P, Denny J, Liu H, Xu H. Developing Customizable Cancer Information Extraction Modules for Pathology Reports Using CLAMP. 2019, 264: 1041-1045. PMID: 31438083, PMCID: PMC7359882, DOI: 10.3233/shti190383.
Peer-Reviewed Original Research
Concepts: Electronic health records; NLP solution; Natural language processing technology; Information extraction module; Language processing technology; Information extraction tasks; User-friendly interface; Best F-measure; Information extraction; Extraction module; Extraction task; Customizable modules; NLP systems; F-measure; Academic use; Health records; Comparable performance; Processing technology; Vanderbilt University Medical Center; Module; Diverse types; Information; NLP; Substantial effort; System
2018
Adapting Word Embeddings from Multiple Domains to Symptom Recognition from Psychiatric Notes.
Zhang Y, Li H, Wang J, Cohen T, Roberts K, Xu H. Adapting Word Embeddings from Multiple Domains to Symptom Recognition from Psychiatric Notes. AMIA Joint Summits On Translational Science Proceedings 2018, 2017: 281-289. PMID: 29888086, PMCID: PMC5961810.
Peer-Reviewed Original Research
Word embeddings; Clinical text; Target domain; Source domain; Natural language processing techniques; Language processing techniques; Multiple word embeddings; Baseline methods; Biomedical literature; First work; Processing techniques; Embedding; Psychiatric notes; Multiple domains; Experimental results; Different weights; Such information; Important topic; Recognition; Different approaches; Wikipedia; Information; Personalization; Domain; Text
2017
CNN-based ranking for biomedical entity normalization
Li H, Chen Q, Tang B, Wang X, Xu H, Wang B, Huang D. CNN-based ranking for biomedical entity normalization. BMC Bioinformatics 2017, 18: 385. PMID: 28984180, PMCID: PMC5629610, DOI: 10.1186/s12859-017-1805-7.
Peer-Reviewed Original Research
Concepts: Biomedical entity normalization; Entity normalization; Semantic information; CNN architecture; Novel convolutional neural network architecture; Convolutional neural network architecture; Traditional rule-based methods; Neural network architecture; Rule-based system; Ranking method; Rule-based method; Network architecture; Biomedical entities; Benchmark datasets; Art performance; Entity mentions; Ranking problem; CNN; Normalization system; Architecture; Morphological information; Comparison results; Information; Dataset; System
A hybrid approach to automatic de-identification of psychiatric notes
Lee H, Wu Y, Zhang Y, Xu J, Xu H, Roberts K. A hybrid approach to automatic de-identification of psychiatric notes. Journal Of Biomedical Informatics 2017, 75: s19-s27. PMID: 28602904, PMCID: PMC5705430, DOI: 10.1016/j.jbi.2017.06.006.
Peer-Reviewed Original Research
Concepts: Psychiatric notes; CEGS N-GRID; Natural language processing systems; Rule-based component; Task Track 1; Language processing system; Rule-based approach; De-identification; Domain adaptation; Rich features; Processing system; Hybrid approach; N grid; Track 1; Clinical data; Test set; System performance; Machine; Health information; Hybrid system; System; Clinical application; Task; Information; Data
CATTLE (CAncer treatment treasury with linked evidence): An integrated knowledge base for personalized oncology research and practice
Soysal E, Lee H, Zhang Y, Huang L, Chen X, Wei Q, Zheng W, Chang J, Cohen T, Sun J, Xu H. CATTLE (CAncer treatment treasury with linked evidence): An integrated knowledge base for personalized oncology research and practice. CPT Pharmacometrics & Systems Pharmacology 2017, 6: 188-196. PMID: 28296354, PMCID: PMC5351410, DOI: 10.1002/psp4.12174.
Peer-Reviewed Original Research
2016
Leveraging syntactic and semantic graph kernels to extract pharmacokinetic drug drug interactions from biomedical literature
Zhang Y, Wu H, Xu J, Wang J, Soysal E, Li L, Xu H. Leveraging syntactic and semantic graph kernels to extract pharmacokinetic drug drug interactions from biomedical literature. BMC Systems Biology 2016, 10: 67. PMID: 27585838, PMCID: PMC5009562, DOI: 10.1186/s12918-016-0311-2.
Peer-Reviewed Original Research
Concepts: Paths graph kernel; Graph kernels; Semantic classes; Semantic information; Biomedical literature; Shallow semantic representations; Text mining techniques; Best F-score; Automatic DDI extraction; Problem of sparseness; Dependency structure; Semantic graph; DDI detection; Knowledge bases; DDI corpus; F-score; DDI extraction; Semantic representation; Novel approach; Experimental results; Kernel; High precision; Information; Sparseness; Graph
2015
Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations
Chen Y, Sun J, Huang L, Xu H, Zhao Z. Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations. BioMed Research International 2015, 2015: 491502. PMID: 26539502, PMCID: PMC4619847, DOI: 10.1155/2015/491502.
Peer-Reviewed Original Research
Concepts: Machine learning; F-measure; Available big data; Support vector machine; Big data; Vector machine; Classification experiments; Accurate classification; Cancer classification; Gene function information; Machine; Somatic mutation information; Classification; Mutation information; Function information; Learning; Gene symbols; Information; Gene features; Great opportunity; Performance; Somatic mutation data; Mutation data; Accuracy; Prediction
2014
PhenDisco: phenotype discovery system for the database of genotypes and phenotypes
Doan S, Lin K, Conway M, Ohno-Machado L, Hsieh A, Feupe S, Garland A, Ross M, Jiang X, Farzaneh S, Walker R, Alipanah N, Zhang J, Xu H, Kim H. PhenDisco: phenotype discovery system for the database of genotypes and phenotypes. Journal Of The American Medical Informatics Association 2014, 21: 31-36. PMID: 23989082, PMCID: PMC3912702, DOI: 10.1136/amiajnl-2013-001882.
Peer-Reviewed Original Research
Concepts: New information retrieval system; Information retrieval systems; Information retrieval tools; Database of Genotypes; Text processing tools; Retrieval system; Search scenarios; Discovery system; Retrieval tools; Authorized users; Non-standardized way; Cross-study validation; Search comparison; Processing tools; Promising performance; Users; Phenotype information; Database; Information; Biotechnology Information; Queries; Metadata; Entrez; Resources; System
2012
Extracting epidemiologic exposure and outcome terms from literature using machine learning approaches.
Lu Y, Xu H, Peterson N, Dai Q, Jiang M, Denny J, Liu M. Extracting epidemiologic exposure and outcome terms from literature using machine learning approaches. International Journal Of Data Mining And Bioinformatics 2012, 6: 447-59. PMID: 23155773, DOI: 10.1504/ijdmb.2012.049284.
Peer-Reviewed Original Research
2011
Modeling drug exposure data in electronic medical records: an application to warfarin.
Liu M, Jiang M, Kawai V, Stein C, Roden D, Denny J, Xu H. Modeling drug exposure data in electronic medical records: an application to warfarin. AMIA Annual Symposium Proceedings 2011, 2011: 815-23. PMID: 22195139, PMCID: PMC3243123.
Peer-Reviewed Original Research
Concepts: Natural language processing; Machine learning technologies; Electronic medical records; Drug exposure information; Learning technology; Language processing; Temporal information; Informatics framework; Clinical narratives; Drug mentions; Medical records; Drug exposure data; Framework; Receiver operator characteristic curve; Drug exposure history; Information; Drug-related research; Warfarin exposure; Drug regimens; Hospital admission; Drug exposure; Accurate modeling; Drug information; Exposure information; Exposure data