2024
Augmenting biomedical named entity recognition with general-domain resources
Yin Y, Kim H, Xiao X, Wei C, Kang J, Lu Z, Xu H, Fang M, Chen Q. Augmenting biomedical named entity recognition with general-domain resources. Journal Of Biomedical Informatics 2024, 159: 104731. PMID: 39368529, DOI: 10.1016/j.jbi.2024.104731.
Peer-Reviewed Original Research
BioNER datasets; Multi-task learning; NER datasets; Entity types; Biomedical datasets; Baseline model; General domain datasets; Biomedical language model; Neural network-based; Yield performance improvements; BioNER models; Entity recognition; Biomedical corpora; Human annotators; Label ambiguity; Language model; Transfer learning; F1 score; BioNER; Human effort; Network-based; Biomedical resources; Performance improvement; Dataset; Superior performance
A Study of Biomedical Relation Extraction Using GPT Models.
Zhang J, Wibert M, Zhou H, Peng X, Chen Q, Keloth V, Hu Y, Zhang R, Xu H, Raja K. A Study of Biomedical Relation Extraction Using GPT Models. AMIA Joint Summits On Translational Science Proceedings 2024, 2024: 391-400. PMID: 38827097, PMCID: PMC11141827.
Peer-Reviewed Original Research
Prompt Tuning in Biomedical Relation Extraction
He J, Li F, Li J, Hu X, Nian Y, Xiang Y, Wang J, Wei Q, Li Y, Xu H, Tao C. Prompt Tuning in Biomedical Relation Extraction. Journal Of Healthcare Informatics Research 2024, 8: 206-224. PMID: 38681754, PMCID: PMC11052745, DOI: 10.1007/s41666-024-00162-9.
Peer-Reviewed Original Research
Few-shot scenarios; Biomedical relation extraction; Natural language processing; Biomedical RE; Relation extraction; Prompt tuning; State-of-the-art performance; Text mining applications; Tuning model; BioCreative VI; SemEval-2013; Knowledge graph; Language model; Mining applications; Biomedical text; Original input; Computational resources; Language processing; External knowledge; Specific texts; Superior performance; Dataset; Efficient approach; Task; Model performance
2023
Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach
Hu Y, Keloth V, Raja K, Chen Y, Xu H. Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach. Bioinformatics 2023, 39: btad542. PMID: 37669123, PMCID: PMC10500081, DOI: 10.1093/bioinformatics/btad542.
Peer-Reviewed Original Research
Natural language processing; Micro-F1 score; COVID-19 dataset; NLP pipeline; F1 score; Entity recognition model; AD dataset; PICO elements; Sentence classification; NER model; Recognition model; Language processing; Learning approach; Learning model; End evaluation; Supplementary data; Dataset; Pipeline; Extraction; Information; RCT abstracts; Annotation; Sentences; Bioinformatics; Complexity
2019
Cost-sensitive Active Learning for Phenotyping of Electronic Health Records.
Ji Z, Wei Q, Franklin A, Cohen T, Xu H. Cost-sensitive Active Learning for Phenotyping of Electronic Health Records. AMIA Joint Summits On Translational Science Proceedings 2019, 2019: 829-838. PMID: 31259040, PMCID: PMC6568101.
Peer-Reviewed Original Research
Annotation time; Electronic health records; Active learning; Machine learning-based methods; Cost-sensitive active learning; Large annotated dataset; Learning-based methods; Health records; Use cases; Annotated dataset; User 1; AL algorithm; User 2; Phenotyping algorithm; AL approach; Secondary use; Algorithm; Better performance; Actual time; Learning; Experimental results; Breast cancer patients; Dataset; Model performance; Passive learning
2017
CNN-based ranking for biomedical entity normalization
Li H, Chen Q, Tang B, Wang X, Xu H, Wang B, Huang D. CNN-based ranking for biomedical entity normalization. BMC Bioinformatics 2017, 18: 385. PMID: 28984180, PMCID: PMC5629610, DOI: 10.1186/s12859-017-1805-7.
Peer-Reviewed Original Research
Concepts: Biomedical entity normalization; Entity normalization; Semantic information; CNN architecture; Novel convolutional neural network architecture; Convolutional neural network architecture; Traditional rule-based methods; Neural network architecture; Rule-based system; Ranking method; Rule-based method; Network architecture; Biomedical entities; Benchmark datasets; Art performance; Entity mentions; Ranking problem; CNN; Normalization system; Architecture; Morphological information; Comparison results; Information; Dataset; System
Search Datasets in Literature: A Case Study of GWAS.
Dong X, Zhang Y, Xu H. Search Datasets in Literature: A Case Study of GWAS. AMIA Joint Summits On Translational Science Proceedings 2017, 2017: 40-49. PMID: 28815103, PMCID: PMC5543360.
Peer-Reviewed Original Research
Recognition system; MEDLINE abstracts; Dataset search engine; Pattern-based rules; Text mining methods; Data sets; Underlying data set; Search datasets; Data discoverability; Use cases; Search engines; Dataset attributes; Mining methods; F-measure; Domain dictionary; Scalable approach; Hybrid approach; DatasetFinder; Retrieving literature; Discoverability; Ultimate goal; Case study; Set; Scientific publications
DATS, the data tag suite to enable discoverability of datasets
Sansone S, Gonzalez-Beltran A, Rocca-Serra P, Alter G, Grethe J, Xu H, Fore I, Lyle J, Gururaj A, Chen X, Kim H, Zong N, Li Y, Liu R, Ozyurt I, Ohno-Machado L. DATS, the data tag suite to enable discoverability of datasets. Scientific Data 2017, 4: 170059. PMID: 28585923, PMCID: PMC5460592, DOI: 10.1038/sdata.2017.59.
Peer-Reviewed Original Research
Information retrieval for biomedical datasets: the 2016 bioCADDIE dataset retrieval challenge
Roberts K, Gururaj A, Chen X, Pournejati S, Hersh W, Demner-Fushman D, Ohno-Machado L, Cohen T, Xu H. Information retrieval for biomedical datasets: the 2016 bioCADDIE dataset retrieval challenge. Database 2017, 2017: bax068. DOI: 10.1093/database/bax068.
Peer-Reviewed Original Research
Biomedical datasets; Retrieval challenges; Information retrieval techniques; Advanced query processing; Biomedical data repositories; Advanced retrieval methods; Query processing; Information retrieval; Test queries; Retrieval system; Rank framework; Retrieval approach; Retrieval techniques; Data repository; Retrieval method; Top precision; Dataset; Queries; Repository; Challenges; Retrieval; Task; Learning; System; Corpus
2012
Genetic studies of complex human diseases: Characterizing SNP-disease associations using Bayesian networks
Han B, Chen X, Talebizadeh Z, Xu H. Genetic studies of complex human diseases: Characterizing SNP-disease associations using Bayesian networks. BMC Systems Biology 2012, 6: s14. PMID: 23281790, PMCID: PMC3524021, DOI: 10.1186/1752-0509-6-s3-s14.
Peer-Reviewed Original Research
MeSH Keywords: Algorithms; Alzheimer Disease; Artificial Intelligence; Autistic Disorder; Bayes Theorem; Computational Biology; Computer Simulation; Databases, Genetic; Epistasis, Genetic; Genome-Wide Association Study; Humans; Macular Degeneration; Markov Chains; Models, Genetic; Monte Carlo Method; Polymorphism, Single Nucleotide
Concepts: Epistatic interaction detection; Bayesian network structure learning method; Two-layer Bayesian network; Bayesian network-based method; Bayesian network; Interaction detection; Markov chain Monte Carlo methods; Structure learning method; Real disease data; Network-based method; Real GWAS dataset; Monte Carlo method; High-order epistatic interactions; Machine learning; Search space; Learning methods; Disease dataset; Carlo method; Target node; Model complexity; Statistical methods; Real data; New scoring function; Complex human diseases; Dataset
Extracting epidemiologic exposure and outcome terms from literature using machine learning approaches.
Lu Y, Xu H, Peterson N, Dai Q, Jiang M, Denny J, Liu M. Extracting epidemiologic exposure and outcome terms from literature using machine learning approaches. International Journal Of Data Mining And Bioinformatics 2012, 6: 447-59. PMID: 23155773, DOI: 10.1504/ijdmb.2012.049284.
Peer-Reviewed Original Research