2023
AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning
Luo L, Wei C, Lai P, Leaman R, Chen Q, Lu Z. AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning. Bioinformatics 2023, 39: btad310. PMID: 37171899, PMCID: PMC10212279, DOI: 10.1093/bioinformatics/btad310. Peer-Reviewed Original Research
Concepts: Deep learning; Entity recognition; Training data; Entity types; Labeling training data; Natural language text; Text mining tasks; Significant domain expertise; Multi-task learning; Mining tasks; Information extraction; BioNER task; Domain expertise; Biomedical entities; Independent tasks; Source code; Benchmark tasks; Language text; Biomedical text; Art approaches; Accurate annotation; External data; Data scarcity; Task; Learning
2022
Assigning species information to corresponding genes by a sequence labeling framework
Luo L, Wei C, Lai P, Chen Q, Islamaj R, Lu Z. Assigning species information to corresponding genes by a sequence labeling framework. Database 2022, 2022: baac090. PMID: 36227127, PMCID: PMC9558450, DOI: 10.1093/database/baac090. Peer-Reviewed Original Research
Concepts: Novel deep learning-based framework; Deep learning-based framework; Learning-based framework; Text mining algorithms; Sequence labeling task; Gene normalization task; Sequence labeling framework; Binary classification framework; Source code; Baseline methods; Normalization task; Classification framework; Labeling task; Labeling framework; Automatic assignment; High-performance method; Heuristic rules; Gene mentions; Benchmarking results; Database URL; Database records; Assignment task
LitMC-BERT: Transformer-Based Multi-Label Classification of Biomedical Literature With An Application on COVID-19 Literature Curation
Chen Q, Du J, Allot A, Lu Z. LitMC-BERT: Transformer-Based Multi-Label Classification of Biomedical Literature With An Application on COVID-19 Literature Curation. IEEE/ACM Transactions On Computational Biology And Bioinformatics 2022, 19: 2584-2595. PMID: 35536809, PMCID: PMC9647722, DOI: 10.1109/tcbb.2022.3173562. Peer-Reviewed Original Research
Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations
Chen Q, Allot A, Leaman R, Islamaj R, Du J, Fang L, Wang K, Xu S, Zhang Y, Bagherzadeh P, Bergler S, Bhatnagar A, Bhavsar N, Chang Y, Lin S, Tang W, Zhang H, Tavchioski I, Pollak S, Tian S, Zhang J, Otmakhova Y, Yepes A, Dong H, Wu H, Dufour R, Labrak Y, Chatterjee N, Tandon K, Laleye F, Rakotoson L, Chersoni E, Gu J, Friedrich A, Pujari S, Chizhikova M, Sivadasan N, Vg S, Lu Z. Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. Database 2022, 2022: baac069. PMID: 36043400, PMCID: PMC9428574, DOI: 10.1093/database/baac069. Peer-Reviewed Original Research
2021
Artificial Intelligence in Action: Addressing the COVID-19 Pandemic with Natural Language Processing
Chen Q, Leaman R, Allot A, Luo L, Wei C, Yan S, Lu Z. Artificial Intelligence in Action: Addressing the COVID-19 Pandemic with Natural Language Processing. Annual Review Of Biomedical Data Science 2021, 4: 1-27. PMID: 34465169, DOI: 10.1146/annurev-biodatasci-021821-061045. Peer-Reviewed Original Research
Concepts: Natural language processing; Artificial intelligence; Language processing; Information needs; Literature-based discovery; Information retrieval; Entity recognition; Misinformation detection; Information overload; NLP studies; NLP tasks; Emotion analysis; Topic modeling; COVID-19 pandemic; Intelligence; Additional tasks; Human language; Public health measures; Task; Health measures; Processing; Serious health effects; Health effects; Retrieval; Dataset
2020
LitCovid: an open database of COVID-19 literature
Chen Q, Allot A, Lu Z. LitCovid: an open database of COVID-19 literature. Nucleic Acids Research 2020, 49: d1534-d1540. PMID: 33166392, PMCID: PMC7778958, DOI: 10.1093/nar/gkaa952. Peer-Reviewed Original Research
Concepts: Serious information overload; Curation workflow; Data mining; Information overload; Collected articles; Information needs; Open database; Manual curation; News articles; COVID-19 literature; Literature resources; Rapid growth; Users; COVID-19 research; Mining; Workflow; Algorithm; Curation; Date scientific information; Database; Information; General public; Resources; Access; Text
Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records
Chen Q, Du J, Kim S, Wilbur W, Lu Z. Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records. BMC Medical Informatics And Decision Making 2020, 20: 73. PMID: 32349758, PMCID: PMC7191680, DOI: 10.1186/s12911-020-1044-0. Peer-Reviewed Original Research
Concepts: End deep learning model; Encoder network; Deep learning models; Sentence embeddings; Biomedical corpora; Learning model; Random forest; Traditional machine; Text mining applications; Deep learning approach; Similar sentences; Machine learning models; High performance; Mining applications; Related datasets; Clinical notes; Learning approach; Sentence semantics; PubMed abstracts; Challenge task; Ensembled model; Best submission; Sentence pairs; Network; Test set
BioConceptVec: Creating and evaluating literature-based biomedical concept embeddings on a large scale
Chen Q, Lee K, Yan S, Kim S, Wei C, Lu Z. BioConceptVec: Creating and evaluating literature-based biomedical concept embeddings on a large scale. PLOS Computational Biology 2020, 16: e1007617. PMID: 32324731, PMCID: PMC7237030, DOI: 10.1371/journal.pcbi.1007617. Peer-Reviewed Original Research
Concepts: Concept embeddings; NER tools; Learning model; Biomedical text mining applications; Advanced deep learning models; Different machine learning models; Evaluation results; Text mining applications; Deep learning models; Semantics of concepts; Machine learning models; Literature-based discovery; Concept recognition; Different machine; Protein-protein interaction prediction; PubMed abstracts; Recognition tools; Massive number; Vector representation; Biomedical concepts; Large margin; Extrinsic evaluation; Biomedical literature; Intrinsic evaluation; Semantic relatedness
2019
ML-Net: multi-label classification of biomedical texts with deep neural networks
Du J, Chen Q, Peng Y, Xiang Y, Tao C, Lu Z. ML-Net: multi-label classification of biomedical texts with deep neural networks. Journal Of The American Medical Informatics Association 2019, 26: 1279-1285. PMID: 31233120, PMCID: PMC7647240, DOI: 10.1093/jamia/ocz085. Peer-Reviewed Original Research
Concepts: Multi-label classification; ML-Net; Biomedical text; End deep learning framework; Multi-label text classification; Deep learning framework; Deep neural networks; Traditional machine; Document context; Feature engineering; Text classification; Textual documents; Machine learning; Novel end; Learning framework; Prediction network; Individual classifiers; Neural network; Human effort; Target documents; F-measure; Art methods; Prediction mechanism; Contextual information; Label counts
BioWordVec, improving biomedical word embeddings with subword information and MeSH
Zhang Y, Chen Q, Yang Z, Lin H, Lu Z. BioWordVec, improving biomedical word embeddings with subword information and MeSH. Scientific Data 2019, 6: 52. PMID: 31076572, PMCID: PMC6510737, DOI: 10.1038/s41597-019-0055-0. Peer-Reviewed Original Research
Concepts: Word embeddings; Subword information; Word representations; Biomedical natural language processing; Natural language processing; Multiple NLP tasks; Biomedical word embeddings; Information retrieval; Unlabeled text; Biomedical text; Text mining; Biomedical domain; Language processing; NLP tasks; Structured resources; Challenging task; Previous state; Benchmarking results; Large corpus; Embedding; Word level; BioWordVec; Such information; Task; Information
LitSense: making sense of biomedical literature at sentence level
Allot A, Chen Q, Kim S, Alvarez R, Comeau D, Wilbur W, Lu Z. LitSense: making sense of biomedical literature at sentence level. Nucleic Acids Research 2019, 47: w594-w599. PMID: 31020319, PMCID: PMC6602490, DOI: 10.1093/nar/gkz289. Peer-Reviewed Original Research
Concepts: First web-based system; Filter search results; Neural embedding approach; Biomedical literature; User-friendly interface; Web-based system; Term-weighting approach; User queries; Query formulation; Unified access; Keyword matches; Biomedical entities; Sentence retrieval; Results visualization; Search results; Embedding approach; Current tools; Queries; Retrieval; Sentence level; Rare terms; Relevant results; Significant efforts; Previous knowledge; PubTator
Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine
Doğan R, Kim S, Chatr-aryamontri A, Wei C, Comeau D, Antunes R, Matos S, Chen Q, Elangovan A, Panyam N, Verspoor K, Liu H, Wang Y, Liu Z, Altınel B, Hüsünbeyi Z, Özgür A, Fergadis A, Wang C, Dai H, Tran T, Kavuluru R, Luo L, Steppi A, Zhang J, Qu J, Lu Z. Overview of the BioCreative VI Precision Medicine Track: mining protein interactions and mutations for precision medicine. Database 2019, 2019: bay147. PMID: 30689846, PMCID: PMC6348314, DOI: 10.1093/database/bay147. Peer-Reviewed Original Research
Concepts: Relation extraction task; Document triage task; Best F-score; Extraction task; Triage task; Knowledge bases; F-score; PubMed documents; Art deep learning methods; Text-mining research community; Large knowledge bases; Deep learning methods; Text mining system; Text mining model; Text mining tools; Best average precision; Data sets; Large-scale corpus; Human annotations; Electronic health records; System developers; Better recall; Text mining; Average precision; Learning methods