2024
GeneGPT: augmenting large language models with domain tools for improved access to biomedical information
Jin Q, Yang Y, Chen Q, Lu Z. GeneGPT: augmenting large language models with domain tools for improved access to biomedical information. Bioinformatics 2024, 40: btae075. PMID: 38341654, PMCID: PMC10904143, DOI: 10.1093/bioinformatics/btae075.
Peer-Reviewed Original Research
Concepts: API calls, Web APIs, Language model, State-of-the-art performance, Multi-hop questions, State-of-the-art, Domain-specific tools, Decoding algorithm, National Center for Biotechnology Information, GPT-3, Biomedical information, Database utilization, Experimental results, API, Task, Domain tools, Learning, ChatGPT, Specialized knowledge, Information, Language, Genomic questions, Algorithm, Dataset, Biotechnology Information
Improving large language models for clinical named entity recognition via prompt engineering
Hu Y, Chen Q, Du J, Peng X, Keloth V, Zuo X, Zhou Y, Li Z, Jiang X, Lu Z, Roberts K, Xu H. Improving large language models for clinical named entity recognition via prompt engineering. Journal Of The American Medical Informatics Association 2024, 31: 1812-1820. PMID: 38281112, PMCID: PMC11339492, DOI: 10.1093/jamia/ocad259.
Peer-Reviewed Original Research
Concepts: Clinical NER tasks, NER task, Task-specific prompts, Entity recognition, Language model, Training samples, State-of-the-art models, Few-shot learning, State-of-the-art, Minimal training data, Task-specific knowledge, F1-score, Annotated samples, Concept extraction, Model performance, Annotated datasets, Training data, F1 score, Task description, Format specifications, Complex clinical data, Optimal performance, Task, Evaluation schema, GPT model
2023
Opportunities and challenges for ChatGPT and large language models in biomedicine and health
Tian S, Jin Q, Yeganova L, Lai P, Zhu Q, Chen X, Yang Y, Chen Q, Kim W, Comeau D, Islamaj R, Kapoor A, Gao X, Lu Z. Opportunities and challenges for ChatGPT and large language models in biomedicine and health. Briefings In Bioinformatics 2023, 25: bbad493. PMID: 38168838, PMCID: PMC10762511, DOI: 10.1093/bib/bbad493.
Peer-Reviewed Original Research
Concepts: Large language models, Language model, Sensitive patient data, Biomedical information retrieval, Text generation tasks, Information retrieval, Privacy concerns, Domain experts, Information extraction, Text summarization, Biomedical domain, Art methods, Diverse applications, Previous state, Biomedical researchers, Generation task, Patient data, Such methods, Task, Distinct complexity, Generation capability, Extensive literature survey, Summarization, Recent rapid progress, Challenges
AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning
Luo L, Wei C, Lai P, Leaman R, Chen Q, Lu Z. AIONER: all-in-one scheme-based biomedical named entity recognition using deep learning. Bioinformatics 2023, 39: btad310. PMID: 37171899, PMCID: PMC10212279, DOI: 10.1093/bioinformatics/btad310.
Peer-Reviewed Original Research
Concepts: Deep learning, Entity recognition, Training data, Entity types, Labeling training data, Natural language text, Text mining tasks, Significant domain expertise, Multi-task learning, Mining tasks, Information extraction, BioNER task, Domain expertise, Biomedical entities, Independent tasks, Source code, Benchmark tasks, Language text, Biomedical text, Art approaches, Accurate annotation, External data, Data scarcity, Task, Learning
2021
Artificial Intelligence in Action: Addressing the COVID-19 Pandemic with Natural Language Processing
Chen Q, Leaman R, Allot A, Luo L, Wei C, Yan S, Lu Z. Artificial Intelligence in Action: Addressing the COVID-19 Pandemic with Natural Language Processing. Annual Review Of Biomedical Data Science 2021, 4: 1-27. PMID: 34465169, DOI: 10.1146/annurev-biodatasci-021821-061045.
Peer-Reviewed Original Research
Concepts: Natural language processing, Artificial intelligence, Language processing, Information needs, Literature-based discovery, Information retrieval, Entity recognition, Misinformation detection, Information overload, NLP studies, NLP tasks, Emotion analysis, Topic modeling, COVID-19 pandemic, Intelligence, Additional tasks, Human language, Public health measures, Task, Health measures, Processing, Serious health effects, Health effects, Retrieval, Dataset
2019
BioSentVec: creating sentence embeddings for biomedical texts
Chen Q, Peng Y, Lu Z. BioSentVec: creating sentence embeddings for biomedical texts. 2019, 00: 1-5. DOI: 10.1109/ichi.2019.8904728.
Peer-Reviewed Original Research
Concepts: Natural language processing systems, Sentence embeddings, Biomedical text, Advanced deep learning methods, Deep learning methods, Biomedical text mining, Biomedical word embeddings, Language processing system, Pre-trained sentence encoders, Text mining, Art performance, Learning methods, Sentence semantics, Sentence encoder, Word embeddings, Processing system, Benchmarking results, Embedding, Similarity task, Clinical notes, Task, Essential part, General domains, Clinical database, Semantics
BioWordVec, improving biomedical word embeddings with subword information and MeSH
Zhang Y, Chen Q, Yang Z, Lin H, Lu Z. BioWordVec, improving biomedical word embeddings with subword information and MeSH. Scientific Data 2019, 6: 52. PMID: 31076572, PMCID: PMC6510737, DOI: 10.1038/s41597-019-0055-0.
Peer-Reviewed Original Research
Concepts: Word embeddings, Subword information, Word representations, Biomedical natural language processing, Natural language processing, Multiple NLP tasks, Biomedical word embeddings, Information retrieval, Unlabeled text, Biomedical text, Text mining, Biomedical domain, Language processing, NLP tasks, Structured resources, Challenging task, Previous state, Benchmarking results, Large corpus, Embedding, Word level, BioWordVec, Such information, Task, Information