2024
Standardizing Multi-site Clinical Note Titles to LOINC Document Ontology: A Transformer-based Approach
Zuo X, Zhou Y, Duke J, Hripcsak G, Shah N, Banda J, Reeves R, Miller T, Waitman L, Natarajan K, Xu H. Standardizing Multi-site Clinical Note Titles to LOINC Document Ontology: A Transformer-based Approach. AMIA Annual Symposium Proceedings 2024, 2023: 834-843. PMID: 38222429, PMCID: PMC10785935. Peer-Reviewed Original Research
2023
AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models
Datta S, Lee K, Paek H, Manion F, Ofoegbu N, Du J, Li Y, Huang L, Wang J, Lin B, Xu H, Wang X. AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models. Journal Of The American Medical Informatics Association 2023, 31: 375-385. PMID: 37952206, PMCID: PMC10797270, DOI: 10.1093/jamia/ocad218. Peer-Reviewed Original Research
2022
A comparative study of pre-trained language models for named entity recognition in clinical trial eligibility criteria from multiple corpora
Li J, Wei Q, Ghiasvand O, Chen M, Lobanov V, Weng C, Xu H. A comparative study of pre-trained language models for named entity recognition in clinical trial eligibility criteria from multiple corpora. BMC Medical Informatics And Decision Making 2022, 22: 235. PMID: 36068551, PMCID: PMC9450226, DOI: 10.1186/s12911-022-01967-7. Peer-Reviewed Original Research
Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz I, Wang N, Yang P, Xu H, Warner J, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clinical Cancer Informatics 2022, 6: e2200006. PMID: 35917480, PMCID: PMC9470142, DOI: 10.1200/cci.22.00006. Peer-Reviewed Original Research
2021
Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition
Li J, Zhou Y, Jiang X, Natarajan K, Pakhomov S, Liu H, Xu H. Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition. Journal Of The American Medical Informatics Association 2021, 28: 2193-2201. PMID: 34272955, PMCID: PMC8449609, DOI: 10.1093/jamia/ocab112. Peer-Reviewed Original Research
Privacy-protecting, reliable response data discovery using COVID-19 patient observations
Kim J, Neumann L, Paul P, Day M, Aratow M, Bell D, Doctor J, Hinske L, Jiang X, Kim K, Matheny M, Meeker D, Pletcher M, Schilling L, SooHoo S, Xu H, Zheng K, Ohno-Machado L, Anderson D, Anderson N, Balacha C, Bath T, Baxter S, Becker-Pennrich A, Bernstam E, Carter W, Chau N, Choi Y, Covington S, DuVall S, El-Kareh R, Florian R, Follett R, Geisler B, Ghigi A, Gottlieb A, Hu Z, Ir D, Knight T, Koola J, Kuo T, Lee N, Mansmann U, Mou Z, Murphy R, Neumann L, Nguyen N, Niedermayer S, Park E, Perkins A, Post K, Rieder C, Scherer C, Soares A, Soysal E, Tep B, Toy B, Wang B, Wu Z, Zhou Y, Zucker R. Privacy-protecting, reliable response data discovery using COVID-19 patient observations. Journal Of The American Medical Informatics Association 2021, 28: 1765-1776. PMID: 34051088, PMCID: PMC8194878, DOI: 10.1093/jamia/ocab054. Peer-Reviewed Original Research
COVID-19 SignSym: a fast adaptation of a general clinical NLP tool to identify and normalize COVID-19 signs and symptoms to OMOP common data model
Wang J, Abu-El-Rub N, Gray J, Pham H, Zhou Y, Manion F, Liu M, Song X, Xu H, Rouhizadeh M, Zhang Y. COVID-19 SignSym: a fast adaptation of a general clinical NLP tool to identify and normalize COVID-19 signs and symptoms to OMOP common data model. Journal Of The American Medical Informatics Association 2021, 28: 1275-1283. PMID: 33674830, PMCID: PMC7989301, DOI: 10.1093/jamia/ocab015. Peer-Reviewed Original Research
2020
Achievability to Extract Specific Date Information for Cancer Research
Wang L, Wampfler J, Dispenzieri A, Xu H, Yang P, Liu H. Achievability to Extract Specific Date Information for Cancer Research. AMIA Annual Symposium Proceedings 2020, 2019: 893-902. PMID: 32308886, PMCID: PMC7153063. Peer-Reviewed Original Research
Relation Extraction from Clinical Narratives Using Pre-trained Language Models
Wei Q, Ji Z, Si Y, Du J, Wang J, Tiryaki F, Wu S, Tao C, Roberts K, Xu H. Relation Extraction from Clinical Narratives Using Pre-trained Language Models. AMIA Annual Symposium Proceedings 2020, 2019: 1236-1245. PMID: 32308921, PMCID: PMC7153059. Peer-Reviewed Original Research
2019
Developing Customizable Cancer Information Extraction Modules for Pathology Reports Using CLAMP
Soysal E, Warner J, Wang J, Jiang M, Harvey K, Jain S, Dong X, Song H, Siddhanamatha H, Wang L, Dai Q, Chen Q, Du X, Tao C, Yang P, Denny J, Liu H, Xu H. Developing Customizable Cancer Information Extraction Modules for Pathology Reports Using CLAMP. Studies In Health Technology And Informatics 2019, 264: 1041-1045. PMID: 31438083, PMCID: PMC7359882, DOI: 10.3233/shti190383. Peer-Reviewed Original Research
Cost-aware active learning for named entity recognition in clinical text
Wei Q, Chen Y, Salimi M, Denny J, Mei Q, Lasko T, Chen Q, Wu S, Franklin A, Cohen T, Xu H. Cost-aware active learning for named entity recognition in clinical text. Journal Of The American Medical Informatics Association 2019, 26: 1314-1322. PMID: 31294792, PMCID: PMC6798575, DOI: 10.1093/jamia/ocz102. Peer-Reviewed Original Research
Enhancing clinical concept extraction with contextual embeddings
Si Y, Wang J, Xu H, Roberts K. Enhancing clinical concept extraction with contextual embeddings. Journal Of The American Medical Informatics Association 2019, 26: 1297-1304. PMID: 31265066, PMCID: PMC6798561, DOI: 10.1093/jamia/ocz096. Peer-Reviewed Original Research
A study of deep learning approaches for medication and adverse drug event extraction from clinical text
Wei Q, Ji Z, Li Z, Du J, Wang J, Xu J, Xiang Y, Tiryaki F, Wu S, Zhang Y, Tao C, Xu H. A study of deep learning approaches for medication and adverse drug event extraction from clinical text. Journal Of The American Medical Informatics Association 2019, 27: 13-21. PMID: 31135882, PMCID: PMC6913210, DOI: 10.1093/jamia/ocz063. Peer-Reviewed Original Research
Time-sensitive clinical concept embeddings learned from large electronic health records
Xiang Y, Xu J, Si Y, Li Z, Rasmy L, Zhou Y, Tiryaki F, Li F, Zhang Y, Wu Y, Jiang X, Zheng W, Zhi D, Tao C, Xu H. Time-sensitive clinical concept embeddings learned from large electronic health records. BMC Medical Informatics And Decision Making 2019, 19: 58. PMID: 30961579, PMCID: PMC6454598, DOI: 10.1186/s12911-019-0766-3. Peer-Reviewed Original Research
A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text
Xiong Y, Wang Z, Jiang D, Wang X, Chen Q, Xu H, Yan J, Tang B. A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text. BMC Medical Informatics And Decision Making 2019, 19: 66. PMID: 30961602, PMCID: PMC6454584, DOI: 10.1186/s12911-019-0770-7. Peer-Reviewed Original Research
2018
Identifying direct temporal relations between time and events from clinical notes
Lee H, Zhang Y, Jiang M, Xu J, Tao C, Xu H. Identifying direct temporal relations between time and events from clinical notes. BMC Medical Informatics And Decision Making 2018, 18: 49. PMID: 30066643, PMCID: PMC6069692, DOI: 10.1186/s12911-018-0627-5. Peer-Reviewed Original Research
2017
A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge
Cohen T, Roberts K, Gururaj A, Chen X, Pournejati S, Alter G, Hersh W, Demner-Fushman D, Ohno-Machado L, Xu H. A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge. Database 2017, 2017: bax061. PMID: 29220453, PMCID: PMC5737202, DOI: 10.1093/database/bax061. Peer-Reviewed Original Research
2016
Automated identification of molecular effects of drugs (AIMED)
Fathiamini S, Johnson A, Zeng J, Araya A, Holla V, Bailey A, Litzenburger B, Sanchez N, Khotskaya Y, Xu H, Meric-Bernstam F, Bernstam E, Cohen T. Automated identification of molecular effects of drugs (AIMED). Journal Of The American Medical Informatics Association 2016, 23: 758-765. PMID: 27107438, PMCID: PMC4926748, DOI: 10.1093/jamia/ocw030. Peer-Reviewed Original Research
2014
Validating drug repurposing signals using electronic health records: a case study of metformin associated with reduced cancer mortality
Xu H, Aldrich M, Chen Q, Liu H, Peterson N, Dai Q, Levy M, Shah A, Han X, Ruan X, Jiang M, Li Y, St Julien J, Warner J, Friedman C, Roden D, Denny J. Validating drug repurposing signals using electronic health records: a case study of metformin associated with reduced cancer mortality. Journal Of The American Medical Informatics Association 2014, 22: 179-191. PMID: 25053577, PMCID: PMC4433365, DOI: 10.1136/amiajnl-2014-002649. Peer-Reviewed Original Research
2012
Recognition of medication information from discharge summaries using ensembles of classifiers
Doan S, Collier N, Xu H, Duy P, Phuong T. Recognition of medication information from discharge summaries using ensembles of classifiers. BMC Medical Informatics And Decision Making 2012, 12: 36. PMID: 22564405, PMCID: PMC3502425, DOI: 10.1186/1472-6947-12-36. Peer-Reviewed Original Research