2023
A hierarchical strategy to minimize privacy risk when linking “De-identified” data in biomedical research consortia
Ohno-Machado L, Jiang X, Kuo T, Tao S, Chen L, Ram P, Zhang G, Xu H. A hierarchical strategy to minimize privacy risk when linking “De-identified” data in biomedical research consortia. Journal of Biomedical Informatics 2023, 139: 104322. PMID: 36806328, PMCID: PMC10975485, DOI: 10.1016/j.jbi.2023.104322. Peer-Reviewed Original Research
2020
Time event ontology (TEO): to support semantic representation and reasoning of complex temporal relations of clinical events
Li F, Du J, He Y, Song H, Madkour M, Rao G, Xiang Y, Luo Y, Chen H, Liu S, Wang L, Liu H, Xu H, Tao C. Time event ontology (TEO): to support semantic representation and reasoning of complex temporal relations of clinical events. Journal of the American Medical Informatics Association 2020, 27: 1046-1056. PMID: 32626903, PMCID: PMC7647306, DOI: 10.1093/jamia/ocaa058. Peer-Reviewed Original Research
Efficient and Accurate Extracting of Unstructured EHRs on Cancer Therapy Responses for the Development of RECIST Natural Language Processing Tools: Part I, the Corpus
Li Y, Luo Y, Wampfler J, Rubinstein S, Tiryaki F, Ashok K, Warner J, Xu H, Yang P. Efficient and Accurate Extracting of Unstructured EHRs on Cancer Therapy Responses for the Development of RECIST Natural Language Processing Tools: Part I, the Corpus. JCO Clinical Cancer Informatics 2020, 4: cci.19.00147. PMID: 32364754, PMCID: PMC7265793, DOI: 10.1200/cci.19.00147. Peer-Reviewed Original Research
2017
Search Datasets in Literature: A Case Study of GWAS.
Dong X, Zhang Y, Xu H. Search Datasets in Literature: A Case Study of GWAS. AMIA Joint Summits on Translational Science Proceedings 2017, 2017: 40-49. PMID: 28815103, PMCID: PMC5543360. Peer-Reviewed Original Research
DATS, the data tag suite to enable discoverability of datasets
Sansone S, Gonzalez-Beltran A, Rocca-Serra P, Alter G, Grethe J, Xu H, Fore I, Lyle J, Gururaj A, Chen X, Kim H, Zong N, Li Y, Liu R, Ozyurt I, Ohno-Machado L. DATS, the data tag suite to enable discoverability of datasets. Scientific Data 2017, 4: 170059. PMID: 28585923, PMCID: PMC5460592, DOI: 10.1038/sdata.2017.59. Peer-Reviewed Original Research
Finding useful data across multiple biomedical data repositories using DataMed
Ohno-Machado L, Sansone S, Alter G, Fore I, Grethe J, Xu H, Gonzalez-Beltran A, Rocca-Serra P, Gururaj A, Bell E, Soysal E, Zong N, Kim H. Finding useful data across multiple biomedical data repositories using DataMed. Nature Genetics 2017, 49: 816-819. PMID: 28546571, PMCID: PMC6460922, DOI: 10.1038/ng.3864. Peer-Reviewed Original Research
Expressing Biomedical Ontologies in Natural Language for Expert Evaluation.
Amith M, Manion F, Harris M, Zhang Y, Xu H, Tao C. Expressing Biomedical Ontologies in Natural Language for Expert Evaluation. 2017, 245: 838-842. PMID: 29295217, PMCID: PMC6644701. Peer-Reviewed Original Research
2012
Extracting epidemiologic exposure and outcome terms from literature using machine learning approaches.
Lu Y, Xu H, Peterson N, Dai Q, Jiang M, Denny J, Liu M. Extracting epidemiologic exposure and outcome terms from literature using machine learning approaches. International Journal of Data Mining and Bioinformatics 2012, 6: 447-459. PMID: 23155773, DOI: 10.1504/ijdmb.2012.049284. Peer-Reviewed Original Research
2011
Detecting abbreviations in discharge summaries using machine learning methods.
Wu Y, Rosenbloom S, Denny J, Miller R, Mani S, Giuse D, Xu H. Detecting abbreviations in discharge summaries using machine learning methods. AMIA Annual Symposium Proceedings 2011, 2011: 1541-1549. PMID: 22195219, PMCID: PMC3243185. Peer-Reviewed Original Research