2023
A hierarchical strategy to minimize privacy risk when linking “De-identified” data in biomedical research consortia
Ohno-Machado L, Jiang X, Kuo T, Tao S, Chen L, Ram P, Zhang G, Xu H. A hierarchical strategy to minimize privacy risk when linking “De-identified” data in biomedical research consortia. Journal Of Biomedical Informatics 2023, 139: 104322. PMID: 36806328, PMCID: PMC10975485, DOI: 10.1016/j.jbi.2023.104322.
Peer-Reviewed Original Research
Concepts: Privacy of individuals; Appropriate privacy protection; Data-driven models; Privacy protection; Privacy risks; Data Coordination Center; Data hub; Data repository; Hierarchical strategy; Privacy; Biomedical discovery; Data sets; Record linkage; Data Coordinating Center; Repository; Complex strategies; Coordination center; Technology; Technique; Data; Parties; Set; Hierarchy
2020
Efficient and Accurate Extracting of Unstructured EHRs on Cancer Therapy Responses for the Development of RECIST Natural Language Processing Tools: Part I, the Corpus
Li Y, Luo Y, Wampfler J, Rubinstein S, Tiryaki F, Ashok K, Warner J, Xu H, Yang P. Efficient and Accurate Extracting of Unstructured EHRs on Cancer Therapy Responses for the Development of RECIST Natural Language Processing Tools: Part I, the Corpus. JCO Clinical Cancer Informatics 2020, 4: cci.19.00147. PMID: 32364754, PMCID: PMC7265793, DOI: 10.1200/cci.19.00147.
Peer-Reviewed Original Research
Concepts: Natural language processing tools; Electronic health records; Language processing tools; Gold standard data; Unstructured electronic health records; Processing tools; Amount of data; Clinical notes; Standard data; Mayo Clinic electronic health records; Clinic's electronic health record; Environment tools; Accurate annotation; Health records; Informatics tools; Effective analysis; Data sets; Textual sources; Corpus; Tool; Information; Data extraction; Set; Extracting; Annotation
2018
A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set
Rasmy L, Wu Y, Wang N, Geng X, Zheng W, Wang F, Wu H, Xu H, Zhi D. A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set. Journal Of Biomedical Informatics 2018, 84: 11-16. PMID: 29908902, PMCID: PMC6076336, DOI: 10.1016/j.jbi.2018.06.011.
Peer-Reviewed Original Research
Concepts: Recurrent neural network; Onset risk; Capability of RNN; Cerner Health Facts; Heterogeneous EHR data; Heart failure patients; Data sets; Electronic health record data; Deep learning models; Different patient populations; Neural network-based predictive model; Different patient groups; Health record data; EHR data sets; Predictive modeling; Small data sets; Failure patients; Patient group; Patient population; Reduction of AUC; Neural network; RNN model; RETAIN model; Health Facts; Hospital
2017
Search Datasets in Literature: A Case Study of GWAS.
Dong X, Zhang Y, Xu H. Search Datasets in Literature: A Case Study of GWAS. AMIA Joint Summits On Translational Science Proceedings 2017, 2017: 40-49. PMID: 28815103, PMCID: PMC5543360.
Peer-Reviewed Original Research
Recognition system; MEDLINE abstracts; Dataset search engine; Pattern-based rules; Text mining methods; Data sets; Underlying data set; Search datasets; Data discoverability; Use cases; Search engines; Dataset attributes; Mining methods; F-measure; Domain dictionary; Scalable approach; Hybrid approach; DatasetFinder; Retrieving literature; Discoverability; Ultimate goal; Case study; Set; Scientific publications
Finding useful data across multiple biomedical data repositories using DataMed
Ohno-Machado L, Sansone S, Alter G, Fore I, Grethe J, Xu H, Gonzalez-Beltran A, Rocca-Serra P, Gururaj A, Bell E, Soysal E, Zong N, Kim H. Finding useful data across multiple biomedical data repositories using DataMed. Nature Genetics 2017, 49: 816-819. PMID: 28546571, PMCID: PMC6460922, DOI: 10.1038/ng.3864.
Peer-Reviewed Original Research
Concepts: Biomedical data repositories; Health big data; Data sets; Knowledge discovery; Big data; Multiple repositories; Search engines; Data index; FAIR principles; DataMed; Data repository; Service providers; Knowledge initiatives; Knowledge experts; Biomedical research community; Research community; Repository; Science landscape; Useful data; Interoperability; Metadata; Findability; Set; Engine; Data
2010
Recognizing Medication related Entities in Hospital Discharge Summaries using Support Vector Machine.
Doan S, Xu H. Recognizing Medication related Entities in Hospital Discharge Summaries using Support Vector Machine. Proceedings - International Conference On Computational Linguistics 2010, 2010: 259-266. PMID: 26848286, PMCID: PMC4736747.
Peer-Reviewed Original Research
Support vector machine; Hospital discharge summaries; Conditional Random Fields; Discharge summaries; Medication names; Related entities; Clinical text; Vector machine; Type of medication; Named Entity Recognition (NER) task; Entity recognition task; Rule-based system; Best F-score; I2b2 NLP challenge; Types of features; F-score; I2b2 challenge; NLP challenge; NER system; Semantic features; Recognition task; Machine; Data sets; Random fields; Better performance
An automated approach to calculating the daily dose of tacrolimus in electronic health records.
Xu H, Doan S, Birdwell K, Cowan J, Vincz A, Haas D, Basford M, Denny J. An automated approach to calculating the daily dose of tacrolimus in electronic health records. AMIA Joint Summits On Translational Science Proceedings 2010, 2010: 71-5. PMID: 21347153, PMCID: PMC3041548.
Peer-Reviewed Original Research
Electronic health records; Unstructured clinical data; Natural language processing; Health records; Time-consuming task; Unstructured format; Clinical text; Language processing; Automated Approach; Daily dose; Data sets; Test cases; Detailed drug information; Drug mentions; Daily doses; Clinical data; Medication information; Clinical research; Medication names; Drug information; Information; Tacrolimus; Medications; Dose; Task