2024
Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study
Yang R, Zeng Q, You K, Qiao Y, Huang L, Hsieh C, Rosand B, Goldwasser J, Dave A, Keenan T, Ke Y, Hong C, Liu N, Chew E, Radev D, Lu Z, Xu H, Chen Q, Li I. Ascle—A Python Natural Language Processing Toolkit for Medical Text Generation: Development and Evaluation Study. Journal Of Medical Internet Research 2024, 26: e60601. PMID: 39361955, PMCID: PMC11487205, DOI: 10.2196/60601.
Peer-Reviewed Original Research
Concepts: Natural language processing; Natural language processing toolkit; Question-answering task; Language model; Text generation; Text processing; Domain-specific language models; Natural language processing functions; Minimal programming expertise; Text generation tasks; Medical knowledge graph; Machine translation tasks; ROUGE-L score; Domain-specific challenges; All-in-one solution; ROUGE-L; Text summarization; BLEU score; Knowledge graph; Machine translation; Unstructured text; Question-answering; Hugging Face; Processing toolkit; Language processing
Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition
Zuo X, Kumar A, Shen S, Li J, Cong G, Jin E, Chen Q, Warner J, Yang P, Xu H. Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition. JCO Clinical Cancer Informatics 2024, 8: e2300166. PMID: 38885475, DOI: 10.1200/cci.23.00166.
Peer-Reviewed Original Research
Concepts: Natural language processing; Domain-specific language models; Natural language processing systems; Information extraction system; Rule-based module; Narrative clinical texts; NLP tasks; Entity recognition; Text normalization; Assertion classification; Language model; Information extraction; Clinical text; Electronic health records; Learning-based; Clinical notes; Language processing; Test set; System performance; Health records; Response extraction; Time-consuming; Anticancer therapy; Information; Assessment information
2022
DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models
Luo C, Islam M, Sheils N, Buresh J, Reps J, Schuemie M, Ryan P, Edmondson M, Duan R, Tong J, Marks-Anglin A, Bian J, Chen Z, Duarte-Salles T, Fernández-Bertolín S, Falconer T, Kim C, Park R, Pfohl S, Shah N, Williams A, Xu H, Zhou Y, Lautenbach E, Doshi J, Werner R, Asch D, Chen Y. DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models. Nature Communications 2022, 13: 1678. PMID: 35354802, PMCID: PMC8967932, DOI: 10.1038/s41467-022-29160-4.
Peer-Reviewed Original Research
2021
Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition
Li J, Zhou Y, Jiang X, Natarajan K, Pakhomov S, Liu H, Xu H. Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition. Journal Of The American Medical Informatics Association 2021, 28: 2193-2201. PMID: 34272955, PMCID: PMC8449609, DOI: 10.1093/jamia/ocab112.
Peer-Reviewed Original Research
The application of artificial intelligence and data integration in COVID-19 studies: a scoping review
Guo Y, Zhang Y, Lyu T, Prosperi M, Wang F, Xu H, Bian J. The application of artificial intelligence and data integration in COVID-19 studies: a scoping review. Journal Of The American Medical Informatics Association 2021, 28: 2050-2067. PMID: 34151987, PMCID: PMC8344463, DOI: 10.1093/jamia/ocab098.
Peer-Reviewed Original Research
Concepts: AI applications; Artificial intelligence; Data integration; Heterogeneous data; Social media data analysis; Most AI applications; Heterogeneous data sources; Media data analysis; Proteomics data analysis; AI algorithms; AI framework; Electronic health records; Heterogenous data; Biased algorithms; Health records; COVID-19 research; Data analysis; Single-source approach; Research topic; Data sources; Research area; Intelligence; Surveillance system; Different sources; Algorithm
Privacy-protecting, reliable response data discovery using COVID-19 patient observations
Kim J, Neumann L, Paul P, Day M, Aratow M, Bell D, Doctor J, Hinske L, Jiang X, Kim K, Matheny M, Meeker D, Pletcher M, Schilling L, SooHoo S, Xu H, Zheng K, Ohno-Machado L, Anderson D, Anderson N, Balacha C, Bath T, Baxter S, Becker-Pennrich A, Bernstam E, Carter W, Chau N, Choi Y, Covington S, DuVall S, El-Kareh R, Florian R, Follett R, Geisler B, Ghigi A, Gottlieb A, Hu Z, Ir D, Knight T, Koola J, Kuo T, Lee N, Mansmann U, Mou Z, Murphy R, Neumann L, Nguyen N, Niedermayer S, Park E, Perkins A, Post K, Rieder C, Scherer C, Soares A, Soysal E, Tep B, Toy B, Wang B, Wu Z, Zhou Y, Zucker R. Privacy-protecting, reliable response data discovery using COVID-19 patient observations. Journal Of The American Medical Informatics Association 2021, 28: 1765-1776. PMID: 34051088, PMCID: PMC8194878, DOI: 10.1093/jamia/ocab054.
Peer-Reviewed Original Research
2020
Learning from local to global: An efficient distributed algorithm for modeling time-to-event data
Duan R, Luo C, Schuemie M, Tong J, Liang C, Chang H, Boland M, Bian J, Xu H, Holmes J, Forrest C, Morton S, Berlin J, Moore J, Mahoney K, Chen Y. Learning from local to global: An efficient distributed algorithm for modeling time-to-event data. Journal Of The American Medical Informatics Association 2020, 27: 1028-1036. PMID: 32626900, PMCID: PMC7647322, DOI: 10.1093/jamia/ocaa044.
Peer-Reviewed Original Research
2019
Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm
Duan R, Boland M, Liu Z, Liu Y, Chang H, Xu H, Chu H, Schmid C, Forrest C, Holmes J, Schuemie M, Berlin J, Moore J, Chen Y. Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm. Journal Of The American Medical Informatics Association 2019, 27: 376-385. PMID: 31816040, PMCID: PMC7025371, DOI: 10.1093/jamia/ocz199.
Peer-Reviewed Original Research
Editorial: The second international workshop on health natural language processing (HealthNLP 2019)
Wang Y, Xu H, Uzuner O. Editorial: The second international workshop on health natural language processing (HealthNLP 2019). BMC Medical Informatics And Decision Making 2019, 19: 233. PMID: 31801516, PMCID: PMC6894102, DOI: 10.1186/s12911-019-0930-9.
Peer-Reviewed Original Research
Cost-aware active learning for named entity recognition in clinical text
Wei Q, Chen Y, Salimi M, Denny J, Mei Q, Lasko T, Chen Q, Wu S, Franklin A, Cohen T, Xu H. Cost-aware active learning for named entity recognition in clinical text. Journal Of The American Medical Informatics Association 2019, 26: 1314-1322. PMID: 31294792, PMCID: PMC6798575, DOI: 10.1093/jamia/ocz102.
Peer-Reviewed Original Research
Concepts: Annotation cost; User study; Active learning; AL methods; AL algorithm; Cost-CAUSE; Real-world environments; Annotation task; Annotation time; Annotation accuracy; Entity recognition; Clinical text; Annotation data; Passive learning; Informative examples; Curve score; Most approaches; Simulation area; Users; Syntactic features; Learning; Cost measures; Algorithm; Cost; Annotation
A study of deep learning approaches for medication and adverse drug event extraction from clinical text
Wei Q, Ji Z, Li Z, Du J, Wang J, Xu J, Xiang Y, Tiryaki F, Wu S, Zhang Y, Tao C, Xu H. A study of deep learning approaches for medication and adverse drug event extraction from clinical text. Journal Of The American Medical Informatics Association 2019, 27: 13-21. PMID: 31135882, PMCID: PMC6913210, DOI: 10.1093/jamia/ocz063.
Peer-Reviewed Original Research
Concepts: Deep learning-based approach; Deep learning approach; Learning-based approach; Traditional machine; Learning approach; National NLP Clinical Challenges; Adverse drug event extraction; Outperform traditional machine; Different ensemble approaches; Conditional Random Fields; Sequence labeling approach; MIMIC-III database; Event extraction; Medical domain; Entity recognition; Classification component; F1 score; Clinical text; Relation extraction; Clinical documents; Vector machine; End evaluation; Ensemble approach; Clinical corpus; Machine
Time-sensitive clinical concept embeddings learned from large electronic health records
Xiang Y, Xu J, Si Y, Li Z, Rasmy L, Zhou Y, Tiryaki F, Li F, Zhang Y, Wu Y, Jiang X, Zheng W, Zhi D, Tao C, Xu H. Time-sensitive clinical concept embeddings learned from large electronic health records. BMC Medical Informatics And Decision Making 2019, 19: 58. PMID: 30961579, PMCID: PMC6454598, DOI: 10.1186/s12911-019-0766-3.
Peer-Reviewed Original Research
Concepts: Concept similarity measure; Positive pointwise mutual information; Concept embeddings; Similarity measure; Predictive modeling tasks; Large electronic health record; Time-sensitive information; Pointwise mutual information; Important research area; Deep learning; Electronic health records; Medical domain; Large electronic health record database; Word2vec embeddings; Temporal dependencies; Learning methods; FastText algorithm; Modeling tasks; ResultsOur experiments; Extrinsic evaluation; Intrinsic evaluation; Mutual information; Health records; Distributional representations; Embedding
2018
Extraction of BI-RADS findings from breast ultrasound reports in Chinese using deep learning approaches
Miao S, Xu T, Wu Y, Xie H, Wang J, Jing S, Zhang Y, Zhang X, Yang Y, Zhang X, Shan T, Wang L, Xu H, Wang S, Liu Y. Extraction of BI-RADS findings from breast ultrasound reports in Chinese using deep learning approaches. International Journal Of Medical Informatics 2018, 119: 17-21. PMID: 30342682, DOI: 10.1016/j.ijmedinf.2018.08.009.
Peer-Reviewed Original Research
Concepts: Learning-based methods; Breast ultrasound reports; Electronic health record systems; Traditional machine learning-based methods; Deep learning-based approach; Deep learning-based methods; Natural language processing methods; Machine learning-based methods; Deep learning technology; Conditional random field algorithm; Deep learning approach; Language processing methods; Learning-based approach; Ultrasound reports; Breast cancer research; Rule-based method; Health record systems; Breast radiology reports; Learning technology; NLP approach; Learning approach; Field algorithm; Detailed clinical information; Wide adoption; Record system
Extracting psychiatric stressors for suicide from social media using deep learning
Du J, Zhang Y, Luo J, Jia Y, Wei Q, Tao C, Xu H. Extracting psychiatric stressors for suicide from social media using deep learning. BMC Medical Informatics And Decision Making 2018, 18: 43. PMID: 30066665, PMCID: PMC6069295, DOI: 10.1186/s12911-018-0632-8.
Peer-Reviewed Original Research
Concepts: Convolutional neural network; Recurrent neural network; Deep learning; Conditional Random Fields; Support vector machine; Suicide-related tweets; Clinical text; Neural network; Psychiatric stressors; Extra Trees; Binary classifier; Transfer learning strategies; Entity recognition task; Social media; Exact match; Traditional machine; Annotation cost; Learning strategies; Recognition problem; Sharing flow; Inexact match; Vector machine; Twitter data; Recognition task; Twitter
A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set
Rasmy L, Wu Y, Wang N, Geng X, Zheng W, Wang F, Wu H, Xu H, Zhi D. A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set. Journal Of Biomedical Informatics 2018, 84: 11-16. PMID: 29908902, PMCID: PMC6076336, DOI: 10.1016/j.jbi.2018.06.011.
Peer-Reviewed Original Research
Concepts: Recurrent neural network; Onset risk; Capability of RNN; Cerner Health Facts; Heterogeneous EHR data; Heart failure patients; Data sets; Electronic health record data; Deep learning models; Different patient populations; Neural network-based predictive model; Different patient groups; Health record data; EHR data sets; Predictive modeling; Small data sets; Failure patients; Patient group; Patient population; Reduction of AUC; Neural network; RNN model; RETAIN model; Health Facts; Hospital
Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation.
Lee H, Zhang Y, Roberts K, Xu H. Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation. AMIA Annual Symposium Proceedings 2018, 2017: 1070-1079. PMID: 29854175, PMCID: PMC5977650.
Peer-Reviewed Original Research
Interactive medical word sense disambiguation through informed learning
Wang Y, Zheng K, Xu H, Mei Q. Interactive medical word sense disambiguation through informed learning. Journal Of The American Medical Informatics Association 2018, 25: 800-808. PMID: 29584896, PMCID: PMC6658868, DOI: 10.1093/jamia/ocy013.
Peer-Reviewed Original Research
2017
CNN-based ranking for biomedical entity normalization
Li H, Chen Q, Tang B, Wang X, Xu H, Wang B, Huang D. CNN-based ranking for biomedical entity normalization. BMC Bioinformatics 2017, 18: 385. PMID: 28984180, PMCID: PMC5629610, DOI: 10.1186/s12859-017-1805-7.
Peer-Reviewed Original Research
Concepts: Biomedical entity normalization; Entity normalization; Semantic information; CNN architecture; Novel convolutional neural network architecture; Convolutional neural network architecture; Traditional rule-based methods; Neural network architecture; Rule-based system; Ranking method; Rule-based method; Network architecture; Biomedical entities; Benchmark datasets; Art performance; Entity mentions; Ranking problem; CNN; Normalization system; Architecture; Morphological information; Comparison results; Information; Dataset; System
Accurate Identification of Fatty Liver Disease in Data Warehouse Utilizing Natural Language Processing
Redman J, Natarajan Y, Hou J, Wang J, Hanif M, Feng H, Kramer J, Desiderio R, Xu H, El-Serag H, Kanwal F. Accurate Identification of Fatty Liver Disease in Data Warehouse Utilizing Natural Language Processing. Digestive Diseases And Sciences 2017, 62: 2713-2718. PMID: 28861720, DOI: 10.1007/s10620-017-4721-9.
Peer-Reviewed Original Research
Concepts: Data warehouse; Fatty liver disease; Language processing; Natural language processing; Liver disease; F-measure; Algorithm development; Veterans Affairs Corporate Data Warehouse; Magnetic resonance imaging reports; Outcomes of patients; Algorithm; Expert radiologists; Validation method; Electronic medical records; Corporate Data Warehouse; Warehouse; Abdominal ultrasound; Manual review; Hepatic steatosis; Medical records; Random national sample; Clinical studies; Large cohort; Computerized tomography; Imaging reports
Psychiatric symptom recognition without labeled data using distributional representations of phrases and on-line knowledge
Zhang Y, Zhang O, Wu Y, Lee H, Xu J, Xu H, Roberts K. Psychiatric symptom recognition without labeled data using distributional representations of phrases and on-line knowledge. Journal Of Biomedical Informatics 2017, 75: s129-s137. PMID: 28624644, PMCID: PMC5705397, DOI: 10.1016/j.jbi.2017.06.014.
Peer-Reviewed Original Research
Concepts: Psychiatric symptoms; Candidate symptom; Mental disorders; Psychiatric notes; List of symptoms; Mayo Clinic; Clinical data; Symptom recognition; Patient experience; Personalized prevention; Symptoms; American Psychiatric Association; AbstractText; Clinical concepts; Psychiatric Association; Phenotypic classification; Disorders; Clinical text; MIMIC-II; Healthcare knowledge; Conclusion; Subjective descriptions; Clinic; Disease; Diagnosis