2024
SEETrials: Leveraging large language models for safety and efficacy extraction in oncology clinical trials
Lee K, Paek H, Huang L, Hilton C, Datta S, Higashi J, Ofoegbu N, Wang J, Rubinstein S, Cowan A, Kwok M, Warner J, Xu H, Wang X. SEETrials: Leveraging large language models for safety and efficacy extraction in oncology clinical trials. Informatics In Medicine Unlocked 2024, 50: 101589. PMID: 39493413, PMCID: PMC11530223, DOI: 10.1016/j.imu.2024.101589. Peer-Reviewed Original Research
Augmenting biomedical named entity recognition with general-domain resources
Yin Y, Kim H, Xiao X, Wei C, Kang J, Lu Z, Xu H, Fang M, Chen Q. Augmenting biomedical named entity recognition with general-domain resources. Journal Of Biomedical Informatics 2024, 159: 104731. PMID: 39368529, DOI: 10.1016/j.jbi.2024.104731. Peer-Reviewed Original Research
Relation extraction using large language models: a case study on acupuncture point locations
Li Y, Peng X, Li J, Zuo X, Peng S, Pei D, Tao C, Xu H, Hong N. Relation extraction using large language models: a case study on acupuncture point locations. Journal Of The American Medical Informatics Association 2024, 31: 2622-2631. PMID: 39208311, PMCID: PMC11491641, DOI: 10.1093/jamia/ocae233. Peer-Reviewed Original Research
Advancing entity recognition in biomedicine via instruction tuning of large language models
Keloth V, Hu Y, Xie Q, Peng X, Wang Y, Zheng A, Selek M, Raja K, Wei C, Jin Q, Lu Z, Chen Q, Xu H. Advancing entity recognition in biomedicine via instruction tuning of large language models. Bioinformatics 2024, 40: btae163. PMID: 38514400, PMCID: PMC11001490, DOI: 10.1093/bioinformatics/btae163. Peer-Reviewed Original Research
Improving large language models for clinical named entity recognition via prompt engineering
Hu Y, Chen Q, Du J, Peng X, Keloth V, Zuo X, Zhou Y, Li Z, Jiang X, Lu Z, Roberts K, Xu H. Improving large language models for clinical named entity recognition via prompt engineering. Journal Of The American Medical Informatics Association 2024, 31: 1812-1820. PMID: 38281112, PMCID: PMC11339492, DOI: 10.1093/jamia/ocad259. Peer-Reviewed Original Research
2023
AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models
Datta S, Lee K, Paek H, Manion F, Ofoegbu N, Du J, Li Y, Huang L, Wang J, Lin B, Xu H, Wang X. AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models. Journal Of The American Medical Informatics Association 2023, 31: 375-385. PMID: 37952206, PMCID: PMC10797270, DOI: 10.1093/jamia/ocad218. Peer-Reviewed Original Research
Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach
Hu Y, Keloth V, Raja K, Chen Y, Xu H. Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach. Bioinformatics 2023, 39: btad542. PMID: 37669123, PMCID: PMC10500081, DOI: 10.1093/bioinformatics/btad542. Peer-Reviewed Original Research
Development of a Natural Language Processing Tool to Extract Acupuncture Point Location Terms
Li Y, Peng X, Li J, Peng S, Pei D, Tao C, Xu H, Hong N. Development of a Natural Language Processing Tool to Extract Acupuncture Point Location Terms. 2023, 00: 344-351. DOI: 10.1109/ichi57859.2023.00053. Peer-Reviewed Original Research
Suicide Tendency Prediction from Psychiatric Notes Using Transformer Models
Li Z, Ameer I, Hu Y, Abdelhameed A, Tao C, Selek S, Xu H. Suicide Tendency Prediction from Psychiatric Notes Using Transformer Models. 2023, 00: 481-483. DOI: 10.1109/ichi57859.2023.00074. Peer-Reviewed Original Research
2022
ClinicalLayoutLM: A Pre-trained Multi-modal Model for Understanding Scanned Document in Electronic Health Records
Wei Q, Zuo X, Anjum O, Hu Y, Denlinger R, Bernstam E, Citardi M, Xu H. ClinicalLayoutLM: A Pre-trained Multi-modal Model for Understanding Scanned Document in Electronic Health Records. 2022, 00: 2821-2827. DOI: 10.1109/bigdata55660.2022.10020569. Peer-Reviewed Original Research
Discovering novel drug-supplement interactions using SuppKG generated from the biomedical literature
Schutte D, Vasilakes J, Bompelli A, Zhou Y, Fiszman M, Xu H, Kilicoglu H, Bishop J, Adam T, Zhang R. Discovering novel drug-supplement interactions using SuppKG generated from the biomedical literature. Journal Of Biomedical Informatics 2022, 131: 104120. PMID: 35709900, PMCID: PMC9335448, DOI: 10.1016/j.jbi.2022.104120. Peer-Reviewed Original Research
Combining human and machine intelligence for clinical trial eligibility querying
Fang Y, Idnay B, Sun Y, Liu H, Chen Z, Marder K, Xu H, Schnall R, Weng C. Combining human and machine intelligence for clinical trial eligibility querying. Journal Of The American Medical Informatics Association 2022, 29: 1161-1171. PMID: 35426943, PMCID: PMC9196697, DOI: 10.1093/jamia/ocac051. Peer-Reviewed Original Research
2020
Named Entity Recognition from Table Headers in Randomized Controlled Trial Articles
Wei Q, Zhou Y, Zhao B, Hu X, Mei Q, Tao C, Xu H. Named Entity Recognition from Table Headers in Randomized Controlled Trial Articles. 2020, 00: 1-2. DOI: 10.1109/ichi48887.2020.9374323. Peer-Reviewed Original Research
2019
A study of deep learning approaches for medication and adverse drug event extraction from clinical text
Wei Q, Ji Z, Li Z, Du J, Wang J, Xu J, Xiang Y, Tiryaki F, Wu S, Zhang Y, Tao C, Xu H. A study of deep learning approaches for medication and adverse drug event extraction from clinical text. Journal Of The American Medical Informatics Association 2019, 27: 13-21. PMID: 31135882, PMCID: PMC6913210, DOI: 10.1093/jamia/ocz063. Peer-Reviewed Original Research
2016
A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD)
Wu Y, Denny J, Rosenbloom S, Miller R, Giuse D, Wang L, Blanquicett C, Soysal E, Xu J, Xu H. A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD). Journal Of The American Medical Informatics Association 2016, 24: e79-e86. PMID: 27539197, PMCID: PMC7651947, DOI: 10.1093/jamia/ocw109. Peer-Reviewed Original Research
2015
A Study of Neural Word Embeddings for Named Entity Recognition in Clinical Text.
Wu Y, Xu J, Jiang M, Zhang Y, Xu H. A Study of Neural Word Embeddings for Named Entity Recognition in Clinical Text. AMIA Annual Symposium Proceedings 2015, 2015: 1326-33. PMID: 26958273, PMCID: PMC4765694. Peer-Reviewed Original Research