2024
Leveraging error-prone algorithm-derived phenotypes: Enhancing association studies for risk factors in EHR data
Lu Y, Tong J, Chubak J, Lumley T, Hubbard R, Xu H, Chen Y. Leveraging error-prone algorithm-derived phenotypes: Enhancing association studies for risk factors in EHR data. Journal Of Biomedical Informatics 2024, 157: 104690. PMID: 39004110, DOI: 10.1016/j.jbi.2024.104690.
Peer-Reviewed Original Research
Concepts: Electronic health records; Electronic health record data; Kaiser Permanente Washington; EHR-derived phenotypes; Association studies; Health records; Colon cancer recurrence; Phenotyping errors; Computable phenotype; Risk factors; Cancer recurrence; Multiple phenotypes; Reduce bias; Improve estimation accuracy; Simulation study; Bias reduction; Kaiser; Reduction of bias; Bias; Estimation accuracy; Association; Study; Outcomes; Risk; Estimation efficiency
Develop and validate a computable phenotype for the identification of Alzheimer's disease patients using electronic health record data
He X, Wei R, Huang Y, Chen Z, Lyu T, Bost S, Tong J, Li L, Zhou Y, Li Z, Guo J, Tang H, Wang F, DeKosky S, Xu H, Chen Y, Zhang R, Xu J, Guo Y, Wu Y, Bian J. Develop and validate a computable phenotype for the identification of Alzheimer's disease patients using electronic health record data. Alzheimer's & Dementia Diagnosis Assessment & Disease Monitoring 2024, 16: e12613. PMID: 38966622, PMCID: PMC11220631, DOI: 10.1002/dad2.12613.
Peer-Reviewed Original Research
Concepts: Electronic health record data; Electronic health records; Computable phenotype; Health record data; Manual chart review; Health records; Alzheimer's disease; Diagnosis codes; Record data; Chart review; UTHealth; Alzheimer's disease patients; University of Minnesota; AD diagnosis; AD identification; Disease patients; Patients; Alzheimer; AD patients; Demographics; Diagnosis; Disease; Code; Data; University
Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition
Zuo X, Kumar A, Shen S, Li J, Cong G, Jin E, Chen Q, Warner J, Yang P, Xu H. Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition. JCO Clinical Cancer Informatics 2024, 8: e2300166. PMID: 38885475, DOI: 10.1200/cci.23.00166.
Peer-Reviewed Original Research
Concepts: Natural language processing; Domain-specific language models; Natural language processing systems; Information extraction system; Rule-based module; Narrative clinical texts; NLP tasks; Entity recognition; Text normalization; Assertion classification; Language model; Information extraction; Clinical text; Electronic health records; Learning-based; Clinical notes; Language processing; Test set; System performance; Health records; Response extraction; Time-consuming; Anticancer therapy; Information; Assessment information
Kamino: A Scalable Architecture to Support Medical AI Research Using Large Real World Data
Lin F, Young P, He H, Huang J, Gagne R, Rice D, Price N, Byron W, Hu Y, Felker D, Button W, Meeker D, Hsiao A, Xu H, Torre C, Schulz W. Kamino: A Scalable Architecture to Support Medical AI Research Using Large Real World Data. 2024, 00: 500-504. DOI: 10.1109/ichi61247.2024.00072.
Peer-Reviewed Original Research
Concepts: Electronic health records; AI research; Natural language processing tasks; Electronic health record data; Language processing tasks; Computing resource management; Large-scale data retrieval; Medical AI research; Leveraging electronic health records; Standard data model; Kubernetes orchestrator; Scalable architecture; Processing tasks; Resource allocation systems; Security considerations; Access management; Data retrieval; Data model; Architectural solutions; OMOP CDM; Real World Data; World Data; Health records; OMOP; Data
Developing deep learning-based strategies to predict the risk of hepatocellular carcinoma among patients with nonalcoholic fatty liver disease from electronic health records
Li Z, Lan L, Zhou Y, Li R, Chavin K, Xu H, Li L, Shih D, Zheng W. Developing deep learning-based strategies to predict the risk of hepatocellular carcinoma among patients with nonalcoholic fatty liver disease from electronic health records. Journal Of Biomedical Informatics 2024, 152: 104626. PMID: 38521180, DOI: 10.1016/j.jbi.2024.104626.
Peer-Reviewed Original Research
Concepts: Deep learning models; Electronic health records; HCC risk prediction; Health records; Time-varying covariates; Learning models; Electronic health record data; Risk prediction; Health record data; Accuracy of deep learning models; Deep learning-based strategy; Covariate imbalance; Disease prediction tasks; Learning-based strategy; Deep learning performance; Disease risk prediction; EHR database; Classification problem; Length of follow-up; Transfer learning; Fatty liver disease; Prediction task; Carcinoma risk; Model training; Record data
Mapping Clinical Documents to the Logical Observation Identifiers, Names and Codes (LOINC) Document Ontology using Electronic Health Record Systems Structured Metadata.
Khan H, Mosa A, Paka V, Rana M, Mandhadi V, Islam S, Xu H, McClay J, Sarker S, Rao P, Waitman L. Mapping Clinical Documents to the Logical Observation Identifiers, Names and Codes (LOINC) Document Ontology using Electronic Health Record Systems Structured Metadata. AMIA Annual Symposium Proceedings 2024, 2023: 1017-1026. PMID: 38222329, PMCID: PMC10785913.
Peer-Reviewed Original Research
Concepts: Document ontology; Electronic health records; Bag-of-words approach; Natural language processing techniques; Free-text documents; Language processing techniques; Clinical documentation; Logical Observation Identifiers; Text documents; Structured metadata; Words approach; Computational scalability; Metadata; Health records; EHR documentation; Electronic health record fields; Processing techniques; Ontology; Documents; Automated pipeline; NLP; Scalability; Clinical care; Framework; LOINC
Standardizing Multi-site Clinical Note Titles to LOINC Document Ontology: A Transformer-based Approach.
Zuo X, Zhou Y, Duke J, Hripcsak G, Shah N, Banda J, Reeves R, Miller T, Waitman L, Natarajan K, Xu H. Standardizing Multi-site Clinical Note Titles to LOINC Document Ontology: A Transformer-based Approach. AMIA Annual Symposium Proceedings 2024, 2023: 834-843. PMID: 38222429, PMCID: PMC10785935.
Peer-Reviewed Original Research
2023
FinFax: Fast Interpretation of Fax with NLP
Anjum O, Chen L, Denlinger R, Anam E, Dongsheng Y, Wooldridge C, Citardi M, Zhang J, Xu H, Jiang X. FinFax: Fast Interpretation of Fax with NLP. 2023, 1-2. DOI: 10.1145/3584371.3613019.
Peer-Reviewed Original Research
Concepts: End system; Critical medical information; Electronic health record systems; Health record systems; Real-life applications; Vital clinical data; First end; Manual processing; Information exchange; Healthcare organizations; Overall workflow; Health records; Medical information; Record system; Workflow; Final output; Fast interpretation; Fax; Academic environment; Information; Pertinent information; Multiple solutions; Processing; Real-life hospital settings; NLP
Suicide Tendency Prediction from Psychiatric Notes Using Transformer Models
Li Z, Ameer I, Hu Y, Abdelhameed A, Tao C, Selek S, Xu H. Suicide Tendency Prediction from Psychiatric Notes Using Transformer Models. 2023, 00: 481-483. DOI: 10.1109/ichi57859.2023.00074.
Peer-Reviewed Original Research
Concepts: Weighted F1 score; F1 score; Machine learning models; Electronic health records; Learning models; State-of-the-art models; State-of-the-art; Binary classification task; Health records; Binary classification model; Standard diagnosis codes; Classification task; Multiclass classification; Health informatics; Classification model; Mental health informatics; Transformation model; Prediction algorithm; Psychiatric notes; Initial psychiatric evaluation; Suicidal tendencies; Machine; Random forest model; Suicidal ideation; Performance
Representing and utilizing clinical textual data for real world studies: An OHDSI approach
Keloth V, Banda J, Gurley M, Heider P, Kennedy G, Liu H, Liu F, Miller T, Natarajan K, V Patterson O, Peng Y, Raja K, Reeves R, Rouhizadeh M, Shi J, Wang X, Wang Y, Wei W, Williams A, Zhang R, Belenkaya R, Reich C, Blacketer C, Ryan P, Hripcsak G, Elhadad N, Xu H. Representing and utilizing clinical textual data for real world studies: An OHDSI approach. Journal Of Biomedical Informatics 2023, 142: 104343. PMID: 36935011, PMCID: PMC10428170, DOI: 10.1016/j.jbi.2023.104343.
Peer-Reviewed Original Research
Concepts: Natural language processing; Common data model; Textual data; NLP solution; Observational Health Data Sciences; OMOP Common Data Model; Specific use cases; Observational Medical Outcomes Partnership Common Data Model; Health Data Sciences; Representation of information; Use cases; Electronic health records; Real-world evidence generation; Data science; Clinical text; Data model; Clinical notes; Language processing; Health records; Load data; Clinical documentation; Current applications; Information; Workflow; Evidence generation
2022
Assess the documentation of cognitive tests and biomarkers in electronic health records via natural language processing for Alzheimer’s disease and related dementias
Chen Z, Zhang H, Yang X, Wu S, He X, Xu J, Guo J, Prosperi M, Wang F, Xu H, Chen Y, Hu H, DeKosky S, Farrer M, Guo Y, Wu Y, Bian J. Assess the documentation of cognitive tests and biomarkers in electronic health records via natural language processing for Alzheimer’s disease and related dementias. International Journal Of Medical Informatics 2022, 170: 104973. PMID: 36577203, PMCID: PMC11325083, DOI: 10.1016/j.ijmedinf.2022.104973.
Peer-Reviewed Original Research
Concepts: Electronic health records; Patients' electronic health records; Cognitive tests; Cognitive test scores; Florida health system; Severity categories; Health records; AD-related dementia; AD/ADRD research; AD/ADRD; Patient level; Alzheimer's disease; Clinical narratives; Health system; Biomarkers; Different severity; Disease; Severity; Patients; ADRD research; Standardized approach; Dementia; Test scores; Population characteristics; Scores
ClinicalLayoutLM: A Pre-trained Multi-modal Model for Understanding Scanned Document in Electronic Health Records
Wei Q, Zuo X, Anjum O, Hu Y, Denlinger R, Bernstam E, Citardi M, Xu H. ClinicalLayoutLM: A Pre-trained Multi-modal Model for Understanding Scanned Document in Electronic Health Records. 2022, 00: 2821-2827. DOI: 10.1109/bigdata55660.2022.10020569.
Peer-Reviewed Original Research
Concepts: Optical character recognition; Multi-modal model; Electronic health records; Clinical documents; Natural language processing tasks; Information extraction technology; Pre-trained models; Health records; Language processing tasks; Information extraction; Image information; F1 score; Character recognition; Layout analysis; Processing tasks; Multi-modal approach; Clinical corpus; Baseline model; Documents; Open domain; Task; Extraction technology; Clinical operations; Different categories; Text
Natural Language Processing
Xu H, Roberts K. Natural Language Processing. Cognitive Informatics In Biomedicine And Healthcare 2022, 213-234. DOI: 10.1007/978-3-031-09108-7_7.
Peer-Reviewed Original Research
Concepts: Natural language processing; Language processing; Electronic health records; Biomedical domain; Biomedical natural language processing; Common NLP tasks; Narrative text; NLP tasks; Biomedical articles; Clinical documents; NLP field; Text; Health records; Large amount; Basic concepts; Bibliographic databases; Processing; Task; Article; Documents; Domain; Chapter; Database; Information; Attention
Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz I, Wang N, Yang P, Xu H, Warner J, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clinical Cancer Informatics 2022, 6: e2200006. PMID: 35917480, PMCID: PMC9470142, DOI: 10.1200/cci.22.00006.
Peer-Reviewed Original Research
Evaluation of mCODE Coverage in EHR: a Scoping Review of Cancer Natural Language Processing
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz I, Wang N, Yang P, Xu H, Warner J, Liu H. Evaluation of mCODE Coverage in EHR: a Scoping Review of Cancer Natural Language Processing. 2022, 00: 517-518. DOI: 10.1109/ichi54592.2022.00094.
Peer-Reviewed Original Research
2021
The application of artificial intelligence and data integration in COVID-19 studies: a scoping review
Guo Y, Zhang Y, Lyu T, Prosperi M, Wang F, Xu H, Bian J. The application of artificial intelligence and data integration in COVID-19 studies: a scoping review. Journal Of The American Medical Informatics Association 2021, 28: 2050-2067. PMID: 34151987, PMCID: PMC8344463, DOI: 10.1093/jamia/ocab098.
Peer-Reviewed Original Research
Concepts: AI applications; Artificial intelligence; Data integration; Heterogeneous data; Social media data analysis; Most AI applications; Heterogeneous data sources; Media data analysis; Proteomics data analysis; AI algorithms; AI framework; Electronic health records; Heterogenous data; Biased algorithms; Health records; COVID-19 research; Data analysis; Single-source approach; Research topic; Data sources; Research area; Intelligence; Surveillance system; Different sources; Algorithm
COVID-19 SignSym: a fast adaptation of a general clinical NLP tool to identify and normalize COVID-19 signs and symptoms to OMOP common data model
Wang J, Abu-El-Rub N, Gray J, Pham H, Zhou Y, Manion F, Liu M, Song X, Xu H, Rouhizadeh M, Zhang Y. COVID-19 SignSym: a fast adaptation of a general clinical NLP tool to identify and normalize COVID-19 signs and symptoms to OMOP common data model. Journal Of The American Medical Informatics Association 2021, 28: 1275-1283. PMID: 33674830, PMCID: PMC7989301, DOI: 10.1093/jamia/ocab015.
Peer-Reviewed Original Research
Concepts: Natural language processing tools; Common data model; Language processing tools; Electronic health records; Clinical natural language processing tools; Data model; Deep learning-based model; Processing tools; OMOP Common Data Model; Pattern-based rules; Observational Medical Outcomes Partnership Common Data Model; Learning-based models; Specific information needs; Use cases; NLP tools; Clinical text; Free text; Extensive evaluation; Downloadable package; Information needs; Hybrid approach; Research community; Health records; Data sources; High performance
2020
COVID-19 TestNorm: A tool to normalize COVID-19 testing names to LOINC codes
Dong X, Li J, Soysal E, Bian J, DuVall S, Hanchrow E, Liu H, Lynch K, Matheny M, Natarajan K, Ohno-Machado L, Pakhomov S, Reeves R, Sitapati A, Abhyankar S, Cullen T, Deckard J, Jiang X, Murphy R, Xu H. COVID-19 TestNorm: A tool to normalize COVID-19 testing names to LOINC codes. Journal Of The American Medical Informatics Association 2020, 27: 1437-1442. PMID: 32569358, PMCID: PMC7337837, DOI: 10.1093/jamia/ocaa145.
Peer-Reviewed Original Research
Concepts: Electronic health records; LOINC codes; Secondary use; Rule-based tool; Online web application; Open-source package; Critical data elements; Web application; Data networks; End users; Data elements; Independent test set; Health records; Test set; Key challenges; Data normalization; Critical resources; Test names; Routine clinical practice data; Code; Clinical practice data; Coronavirus disease 2019; COVID-19 diagnostic tests; Tool; Developers
Efficient and Accurate Extracting of Unstructured EHRs on Cancer Therapy Responses for the Development of RECIST Natural Language Processing Tools: Part I, the Corpus
Li Y, Luo Y, Wampfler J, Rubinstein S, Tiryaki F, Ashok K, Warner J, Xu H, Yang P. Efficient and Accurate Extracting of Unstructured EHRs on Cancer Therapy Responses for the Development of RECIST Natural Language Processing Tools: Part I, the Corpus. JCO Clinical Cancer Informatics 2020, 4: cci.19.00147. PMID: 32364754, PMCID: PMC7265793, DOI: 10.1200/cci.19.00147.
Peer-Reviewed Original Research
Concepts: Natural language processing tools; Electronic health records; Language processing tools; Gold standard data; Unstructured electronic health records; Processing tools; Amount of data; Clinical notes; Standard data; Mayo Clinic electronic health records; Clinic's electronic health record; Environment tools; Accurate annotation; Health records; Informatics tools; Effective analysis; Data sets; Textual sources; Corpus; Tool; Information; Data extraction; Set; Extracting; Annotation
2019
Deep learning in clinical natural language processing: a methodical review
Wu S, Roberts K, Datta S, Du J, Ji Z, Si Y, Soni S, Wang Q, Wei Q, Xiang Y, Zhao B, Xu H. Deep learning in clinical natural language processing: a methodical review. Journal Of The American Medical Informatics Association 2019, 27: 457-470. PMID: 31794016, PMCID: PMC7025365, DOI: 10.1093/jamia/ocz200.
Peer-Reviewed Original Research
Concepts: Natural language processing; Clinical natural language processing; Deep learning; Language processing; Computing Machinery Digital Library; Information extraction tasks; Medical informatics community; Computational Linguistics anthology; Recurrent neural network; Digital libraries; Text classification; Electronic health records; Extraction task; Entity recognition; Word2vec embeddings; Neural network; Relation extraction; NLP community; NLP research; Informatics community; Specific tasks; Health records; NLP problem; Learning; Clinical domains