2022
DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models
Luo C, Islam M, Sheils N, Buresh J, Reps J, Schuemie M, Ryan P, Edmondson M, Duan R, Tong J, Marks-Anglin A, Bian J, Chen Z, Duarte-Salles T, Fernández-Bertolín S, Falconer T, Kim C, Park R, Pfohl S, Shah N, Williams A, Xu H, Zhou Y, Lautenbach E, Doshi J, Werner R, Asch D, Chen Y. DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models. Nature Communications 2022, 13: 1678. PMID: 35354802, PMCID: PMC8967932, DOI: 10.1038/s41467-022-29160-4.
Peer-Reviewed Original Research
2021
Comprehensive Characterization of COVID-19 Patients with Repeatedly Positive SARS-CoV-2 Tests Using a Large U.S. Electronic Health Record Database
Dong X, Zhou Y, Shu X, Bernstam E, Stern R, Aronoff D, Xu H, Lipworth L. Comprehensive Characterization of COVID-19 Patients with Repeatedly Positive SARS-CoV-2 Tests Using a Large U.S. Electronic Health Record Database. Microbiology Spectrum 2021, 9: 10.1128/spectrum.00327-21. PMID: 34406805, PMCID: PMC8552669, DOI: 10.1128/spectrum.00327-21.
Peer-Reviewed Original Research
Concepts: Positive SARS-CoV-2 test; SARS-CoV-2 test; Second positive test; Electronic health record database; Cases of reinfection; Health record database; Positive test; Positive SARS-CoV-2 PCR test results; Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) testing; SARS-CoV-2 PCR test results; Record database; Severe acute respiratory syndrome coronavirus 2; Intensive care unit admission; Acute respiratory syndrome coronavirus 2; SARS-CoV-2 infection; Respiratory syndrome coronavirus 2; Long-term health consequences; Large electronic health record database; Potential long-term health consequences; Care unit admission; Overweight/obese; Chronic medical conditions; Positive molecular test; COVID-19 patients; Syndrome coronavirus 2
2020
Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies
Rasmy L, Tiryaki F, Zhou Y, Xiang Y, Tao C, Xu H, Zhi D. Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies. Journal of the American Medical Informatics Association 2020, 27: 1593-1599. PMID: 32930711, PMCID: PMC7647355, DOI: 10.1093/jamia/ocaa180.
Peer-Reviewed Original Research
MeSH Keywords: Aged; Databases, Factual; Electronic Health Records; Female; Humans; Male; Middle Aged; ROC Curve; Unified Medical Language System; Vocabulary, Controlled
Concepts: Unified Medical Language System; Recurrent neural network; Neural network; Prediction performance; Logistic regression; Predictive modeling; Deep learning; Data aggregation; Electronic health record data; Machine learning; Risk prediction; Better prediction performance; Dengue hemorrhagic fever; Health record data; EHR data; Cancer prediction; Large vocabulary; Different tasks; Predictive model; Heart failure; Diabetes patients; Pancreatic cancer; Clinical data; Hemorrhagic fever; ICD-9
2019
Enhancing clinical concept extraction with contextual embeddings
Si Y, Wang J, Xu H, Roberts K. Enhancing clinical concept extraction with contextual embeddings. Journal of the American Medical Informatics Association 2019, 26: 1297-1304. PMID: 31265066, PMCID: PMC6798561, DOI: 10.1093/jamia/ocz096.
Peer-Reviewed Original Research
Concepts: Clinical concept extraction; Contextual embeddings; Natural language processing tasks; Traditional word embeddings; Traditional word representations; Clinical NLP tasks; Language processing tasks; Semantic information; Word embedding methods; Large language models; Art performance; Concept extraction task; SemEval 2014; Word representations; NLP tasks; Language model; Word embeddings; Processing tasks; Neural network-based representation; I2b2 2010; Concept extraction; Task; Large clinical corpus; Clinical corpus; Network-based representation
Time-sensitive clinical concept embeddings learned from large electronic health records
Xiang Y, Xu J, Si Y, Li Z, Rasmy L, Zhou Y, Tiryaki F, Li F, Zhang Y, Wu Y, Jiang X, Zheng W, Zhi D, Tao C, Xu H. Time-sensitive clinical concept embeddings learned from large electronic health records. BMC Medical Informatics and Decision Making 2019, 19: 58. PMID: 30961579, PMCID: PMC6454598, DOI: 10.1186/s12911-019-0766-3.
Peer-Reviewed Original Research
MeSH Keywords: Algorithms; Databases, Factual; Deep Learning; Electronic Health Records; Humans; Information Storage and Retrieval; Time Factors
Concepts: Concept similarity measure; Positive pointwise mutual information; Concept embeddings; Similarity measure; Predictive modeling tasks; Large electronic health record; Time-sensitive information; Pointwise mutual information; Important research area; Deep learning; Electronic health records; Medical domain; Large electronic health record database; Word2vec embeddings; Temporal dependencies; Learning methods; FastText algorithm; Modeling tasks; ResultsOur experiments; Extrinsic evaluation; Intrinsic evaluation; Mutual information; Health records; Distributional representations; Embedding
Ontological representation–oriented term normalization and standardization of the Research Domain Criteria
Li F, Rao G, Du J, Xiang Y, Zhang Y, Selek S, Hamilton J, Xu H, Tao C. Ontological representation–oriented term normalization and standardization of the Research Domain Criteria. Health Informatics Journal 2019, 26: 726-737. PMID: 30843449, PMCID: PMC7863676, DOI: 10.1177/1460458219832059.
Peer-Reviewed Original Research
2018
Analysis of treatment pathways for three chronic diseases using OMOP CDM
Zhang X, Wang L, Miao S, Xu H, Yin Y, Zhu Y, Dai Z, Shan T, Jing S, Wang J, Zhang X, Huang Z, Wang Z, Guo J, Liu Y. Analysis of treatment pathways for three chronic diseases using OMOP CDM. Journal of Medical Systems 2018, 42: 260. PMID: 30421323, PMCID: PMC6244882, DOI: 10.1007/s10916-018-1076-5.
Peer-Reviewed Original Research
MeSH Keywords: China; Chronic Disease; Critical Pathways; Databases, Factual; Electronic Health Records; Humans; Models, Theoretical; Observation
Concepts: Treatment pathways; Chronic diseases; Study of drugs; Clinical data repository; Clinical treatment; Different medical institutions; Proportion of monotherapy; First-line medication; Medical institutions; First Affiliated Hospital; Type 2 diabetes; Nanjing Medical University; Different treatment pathways; Most patients; Common medications; Affiliated Hospital; Medications; National guidelines; Medication information; Local hospital; Medical University; Same disease; Disease; Patients; New drugs
A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set
Rasmy L, Wu Y, Wang N, Geng X, Zheng W, Wang F, Wu H, Xu H, Zhi D. A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set. Journal of Biomedical Informatics 2018, 84: 11-16. PMID: 29908902, PMCID: PMC6076336, DOI: 10.1016/j.jbi.2018.06.011.
Peer-Reviewed Original Research
Concepts: Recurrent neural network; Onset risk; Capability of RNN; Cerner Health Facts; Heterogeneous EHR data; Heart failure patients; Data sets; Electronic health record data; Deep learning models; Different patient populations; Neural network-based predictive model; Different patient groups; Health record data; EHR data sets; Predictive modeling; Small data sets; Failure patients; Patient group; Patient population; Reduction of AUC; Neural network; RNN model; RETAIN model; Health Facts; Hospital
Toward a normalized clinical drug knowledge base in China—applying the RxNorm model to Chinese clinical drugs
Wang L, Zhang Y, Jiang M, Wang J, Dong J, Liu Y, Tao C, Jiang G, Zhou Y, Xu H. Toward a normalized clinical drug knowledge base in China—applying the RxNorm model to Chinese clinical drugs. Journal of the American Medical Informatics Association 2018, 25: 809-818. PMID: 29635469, PMCID: PMC7647010, DOI: 10.1093/jamia/ocy020.
Peer-Reviewed Original Research
Concepts: Chinese patent drug; Drug knowledge base; Patent drugs; Clinical drugs; Chemical drugs; Chinese drugs; Manual review; Chinese patent medicine; Electronic health record systems; Clinical data; China's health insurance system; Health record systems; Drug Administration; Health insurance system; Drug information; Drugs; Patent medicine; Drug names; Record system; Pharmacy system
2017
Accurate Identification of Fatty Liver Disease in Data Warehouse Utilizing Natural Language Processing
Redman J, Natarajan Y, Hou J, Wang J, Hanif M, Feng H, Kramer J, Desiderio R, Xu H, El-Serag H, Kanwal F. Accurate Identification of Fatty Liver Disease in Data Warehouse Utilizing Natural Language Processing. Digestive Diseases and Sciences 2017, 62: 2713-2718. PMID: 28861720, DOI: 10.1007/s10620-017-4721-9.
Peer-Reviewed Original Research
Concepts: Data warehouse; Fatty liver disease; Language processing; Natural language processing; Liver disease; F-measure; Algorithm development; Veterans Affairs Corporate Data Warehouse; Magnetic resonance imaging reports; Outcomes of patients; Algorithm; Expert radiologists; Validation method; Electronic medical records; Corporate Data Warehouse; Warehouse; Abdominal ultrasound; Manual review; Hepatic steatosis; Medical records; Random national sample; Clinical studies; Large cohort; Computerized tomography; Imaging reports
Risk of angioedema associated with levetiracetam compared with phenytoin: Findings of the observational health data sciences and informatics research network
Duke J, Ryan P, Suchard M, Hripcsak G, Jin P, Reich C, Schwalm M, Khoma Y, Wu Y, Xu H, Shah N, Banda J, Schuemie M. Risk of angioedema associated with levetiracetam compared with phenytoin: Findings of the observational health data sciences and informatics research network. Epilepsia 2017, 58: e101-e106. PMID: 28681416, PMCID: PMC6632067, DOI: 10.1111/epi.13828.
Peer-Reviewed Original Research
MeSH Keywords: Angioedema; Community Networks; Databases, Factual; Epilepsy; Female; Humans; Levetiracetam; Male; Phenytoin; Piracetam
Concepts: Angioedema risk; Angioedema events; Hazard ratio; Observational Health Data Sciences; New-user cohort study; Summary hazard ratio; Risk of angioedema; Health Data Sciences; Adverse event reports; Phenytoin users; Research Network; Phenytoin group; Cohort study; Treat analysis; Antiepileptic drugs; Comparator group; Seizure patients; Lower risk; Levetiracetam; Angioedema; Further studies; Event reports; Significant increase; Risk; Phenytoin
Finding useful data across multiple biomedical data repositories using DataMed
Ohno-Machado L, Sansone S, Alter G, Fore I, Grethe J, Xu H, Gonzalez-Beltran A, Rocca-Serra P, Gururaj A, Bell E, Soysal E, Zong N, Kim H. Finding useful data across multiple biomedical data repositories using DataMed. Nature Genetics 2017, 49: 816-819. PMID: 28546571, PMCID: PMC6460922, DOI: 10.1038/ng.3864.
Peer-Reviewed Original Research
MeSH Keywords: Biological Ontologies; Biomedical Research; Computational Biology; Databases, Factual; Humans; Metadata; Software; Systems Integration
Concepts: Biomedical data repositories; Health big data; Data sets; Knowledge discovery; Big data; Multiple repositories; Search engines; Data index; FAIR principles; DataMed; Data repository; Service providers; Knowledge initiatives; Knowledge experts; Biomedical research community; Research community; Repository; Science landscape; Useful data; Interoperability; Metadata; Findability; Set; Engine; Data
CATTLE (CAncer treatment treasury with linked evidence): An integrated knowledge base for personalized oncology research and practice
Soysal E, Lee H, Zhang Y, Huang L, Chen X, Wei Q, Zheng W, Chang J, Cohen T, Sun J, Xu H. CATTLE (CAncer treatment treasury with linked evidence): An integrated knowledge base for personalized oncology research and practice. CPT Pharmacometrics & Systems Pharmacology 2017, 6: 188-196. PMID: 28296354, PMCID: PMC5351410, DOI: 10.1002/psp4.12174.
Peer-Reviewed Original Research
A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge
Cohen T, Roberts K, Gururaj A, Chen X, Pournejati S, Alter G, Hersh W, Demner-Fushman D, Ohno-Machado L, Xu H. A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge. Database 2017, 2017: bax061. PMID: 29220453, PMCID: PMC5737202, DOI: 10.1093/database/bax061.
Peer-Reviewed Original Research
2016
Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov
Xu J, Lee H, Zeng J, Wu Y, Zhang Y, Huang L, Johnson A, Holla V, Bailey A, Cohen T, Meric-Bernstam F, Bernstam E, Xu H. Extracting genetic alteration information for personalized cancer therapy from ClinicalTrials.gov. Journal of the American Medical Informatics Association 2016, 23: 750-757. PMID: 27013523, PMCID: PMC4926744, DOI: 10.1093/jamia/ocw009.
Peer-Reviewed Original Research
Concepts: Cancer treatment trials
2015
Colorectal cancer drug target prediction using ontology-based inference and network analysis
Tao C, Sun J, Zheng W, Chen J, Xu H. Colorectal cancer drug target prediction using ontology-based inference and network analysis. Database 2015, 2015: bav015. PMID: 25818893, PMCID: PMC4375358, DOI: 10.1093/database/bav015.
Peer-Reviewed Original Research
MeSH Keywords: Antineoplastic Agents; Colorectal Neoplasms; Databases, Factual; Drug Delivery Systems; Drug Design; Humans; Neoplasm Proteins
Concepts: Colorectal cancer
2014
Linking Biochemical Pathways and Networks to Adverse Drug Reactions
Zheng H, Wang H, Xu H, Wu Y, Zhao Z, Azuaje F. Linking Biochemical Pathways and Networks to Adverse Drug Reactions. IEEE Transactions on NanoBioscience 2014, 13: 131-137. PMID: 24893363, DOI: 10.1109/tnb.2014.2319158.
Peer-Reviewed Original Research
Network‐Assisted Prediction of Potential Drugs for Addiction
Sun J, Huang L, Xu H, Zhao Z. Network‐Assisted Prediction of Potential Drugs for Addiction. BioMed Research International 2014, 2014: 258784. PMID: 24689033, PMCID: PMC3932722, DOI: 10.1155/2014/258784.
Peer-Reviewed Original Research
2013
Characterization of Statin Dose Response in Electronic Medical Records
Wei W, Feng Q, Jiang L, Waitara M, Iwuchukwu O, Roden D, Jiang M, Xu H, Krauss R, Rotter J, Nickerson D, Davis R, Berg R, Peissig P, McCarty C, Wilke R, Denny J. Characterization of Statin Dose Response in Electronic Medical Records. Clinical Pharmacology & Therapeutics 2013, 95: 331-338. PMID: 24096969, PMCID: PMC3944214, DOI: 10.1038/clpt.2013.202.
Peer-Reviewed Original Research
MeSH Keywords: Algorithms; Alleles; Atorvastatin; Cholesterol, LDL; Cohort Studies; Databases, Factual; Dose-Response Relationship, Drug; Electronic Health Records; Genotype; Heptanoic Acids; Humans; Hydroxymethylglutaryl-CoA Reductase Inhibitors; Hyperlipidemias; Lipid Metabolism; Lipids; Phenotype; Polymorphism, Single Nucleotide; Pyrroles; Randomized Controlled Trials as Topic; Simvastatin
2012
Identifying the status of genetic lesions in cancer clinical trial documents using machine learning
Wu Y, Levy M, Micheel C, Yeh P, Tang B, Cantrell M, Cooreman S, Xu H. Identifying the status of genetic lesions in cancer clinical trial documents using machine learning. BMC Genomics 2012, 13: s21. PMID: 23282337, PMCID: PMC3535695, DOI: 10.1186/1471-2164-13-s8-s21.
Peer-Reviewed Original Research