2024
Advancing entity recognition in biomedicine via instruction tuning of large language models
Keloth V, Hu Y, Xie Q, Peng X, Wang Y, Zheng A, Selek M, Raja K, Wei C, Jin Q, Lu Z, Chen Q, Xu H. Advancing entity recognition in biomedicine via instruction tuning of large language models. Bioinformatics 2024, 40: btae163. PMID: 38514400, PMCID: PMC11001490, DOI: 10.1093/bioinformatics/btae163.
Peer-Reviewed Original Research
Concepts: Named Entity Recognition; Sequence labeling task; Natural language processing; Biomedical NER datasets; Language model; NER datasets; Entity recognition; Labeling task; Text generation; Field of natural language processing; Biomedical NER; Few-shot learning capability; Reasoning tasks; Multi-domain scenarios; Domain-specific models; End-to-end; Minimal fine-tuning; SOTA performance; F1 score; Healthcare applications; Biomedical entities; Biomedical domain; Language processing; Multi-tasking; PubMedBERT model
2022
Natural Language Processing
Xu H, Roberts K. Natural Language Processing. Cognitive Informatics In Biomedicine And Healthcare 2022, 213-234. DOI: 10.1007/978-3-031-09108-7_7.
Peer-Reviewed Original Research
Concepts: Natural language processing; Language processing; Electronic health records; Biomedical domain; Biomedical natural language processing; Common NLP tasks; Narrative text; NLP tasks; Biomedical articles; Clinical documents; NLP field; Text; Health records; Large amount; Basic concepts; Bibliographic databases; Processing; Task; Article; Documents; Domain; Chapter; Database; Information; Attention
A comparative study of pre-trained language models for named entity recognition in clinical trial eligibility criteria from multiple corpora
Li J, Wei Q, Ghiasvand O, Chen M, Lobanov V, Weng C, Xu H. A comparative study of pre-trained language models for named entity recognition in clinical trial eligibility criteria from multiple corpora. BMC Medical Informatics And Decision Making 2022, 22: 235. PMID: 36068551, PMCID: PMC9450226, DOI: 10.1186/s12911-022-01967-7.
Peer-Reviewed Original Research
Concepts: Pre-trained language models; NER task; Unstructured text; Entity recognition; Language model; Natural language processing techniques; Clinical trial eligibility criteria; Language processing techniques; Data augmentation results; Data augmentation approach; Domain-specific corpus; Better performance; Transformer model; Cross-validation show; Multiple data sources; Eligibility criteria text; Biomedical domain; Embedding models; NER performance; Augmentation approach; Contextual embeddings; Meaningful information; Evaluation results; Such documents; Processing techniques
2019
Recognizing software names in biomedical literature using machine learning
Wei Q, Zhang Y, Amith M, Lin R, Lapeyrolerie J, Tao C, Xu H. Recognizing software names in biomedical literature using machine learning. Health Informatics Journal 2019, 26: 21-33. PMID: 31566474, PMCID: PMC7334865, DOI: 10.1177/1460458219869490.
Peer-Reviewed Original Research
Concepts: Software names; F-measure; Natural language processing methods; Biomedical literature; Word representation features; Language processing methods; Entity recognition system; Software catalog; Software repositories; Feature engineering; Biomedical software; Recognition system; Software tools; Biomedical domain; Representation features; MEDLINE abstracts; Word embeddings; Knowledge features; Manual curation; Software; Machine; Processing methods; Best system; Repository; System
2018
DataMed – an open source discovery index for finding biomedical datasets
Chen X, Gururaj A, Ozyurt B, Liu R, Soysal E, Cohen T, Tiryaki F, Li Y, Zong N, Jiang M, Rogith D, Salimi M, Kim H, Rocca-Serra P, Gonzalez-Beltran A, Farcas C, Johnson T, Margolis R, Alter G, Sansone S, Fore I, Ohno-Machado L, Grethe J, Xu H. DataMed – an open source discovery index for finding biomedical datasets. Journal Of The American Medical Informatics Association 2018, 25: 300-308. PMID: 29346583, PMCID: PMC7378878, DOI: 10.1093/jamia/ocx121.
Peer-Reviewed Original Research
Concepts: Ingestion pipeline; Biomedical datasets; Search engines; Biomedical domain; Advanced natural language processing; Relevant datasets; User-entered query; Data discovery system; Unified metadata model; Data ingestion pipelines; Natural language processing; Open-source package; Retrieval engine; Terminology services; Metadata model; Metadata information; Discovery system; Data reuse; DataMed; Benchmark datasets; Biomedical data; Data index; Average precision; Language processing; Source package
2015
A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature
Tang B, Feng Y, Wang X, Wu Y, Zhang Y, Jiang M, Wang J, Xu H. A comparison of conditional random fields and structured support vector machines for chemical entity recognition in biomedical literature. Journal Of Cheminformatics 2015, 7: s8. PMID: 25810779, PMCID: PMC4331698, DOI: 10.1186/1758-2946-7-s1-s8.
Peer-Reviewed Original Research
Concepts: Machine learning-based systems; Conditional Random Fields; Learning-based system; Entity recognition system; Support vector machine; Entity recognition; Recognition system; F-measure; Challenge organizers; Drug Named Entity Recognition; Vector machine; Structured support vector machine; Micro F-measure; Information extraction tasks; Word representation features; Named Entity Recognition; Test set; Random fields; Primary evaluation measure; Brown clustering; Document indexing; Individual subtasks; Extraction task; Random Indexing; Biomedical domain
2014
Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks
Tang B, Cao H, Wang X, Chen Q, Xu H. Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks. BioMed Research International 2014, 2014: 240403. PMID: 24729964, PMCID: PMC3963372, DOI: 10.1155/2014/240403.
Peer-Reviewed Original Research
Concepts: Biomedical Named Entity Recognition; Word representations; Named Entity Recognition (NER) task; Machine learning-based approach; Word representation features; Natural language processing; Learning-based approach; Entity recognition task; Named Entity Recognition; Cluster-based representation; JNLPBA corpus; Entity recognition; Biomedical domain; F-measure; Language processing; Representation features; Word embeddings; Recognition task; WR algorithm; Distributional representations; Task; Better performance; Algorithm; Representation; Different types
2007
Gene symbol disambiguation using knowledge-based profiles
Xu H, Fan J, Hripcsak G, Mendonça E, Markatou M, Friedman C. Gene symbol disambiguation using knowledge-based profiles. Bioinformatics 2007, 23: 1015-1022. PMID: 17314123, DOI: 10.1093/bioinformatics/btm056.
Peer-Reviewed Original Research
Concepts: Knowledge sources; Similarity scores; Information retrieval methods; Gene symbol disambiguation; Text mining system; Knowledge-based profiles; Testing data sets; Biomedical entities; Biomedical domain; MEDLINE abstracts; High similarity scores; Retrieval method; Ambiguous genes; Entrez Gene; Gene symbols; Disambiguation task; Testing set
2006
Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues
Xu H, Markatou M, Dimova R, Liu H, Friedman C. Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues. BMC Bioinformatics 2006, 7: 334. PMID: 16822321, PMCID: PMC1550263, DOI: 10.1186/1471-2105-7-334.
Peer-Reviewed Original Research
Concepts: Natural language processing; Biomedical domain; Information retrieval systems; ML methods; WSD classifier; Sense disambiguation; Machine learning methods; Vector machine classifier; Error rate; Word sense disambiguation; Retrieval system; Machine learning; ML techniques; Text mining; Biomedical abbreviations; Language processing; Learning methods; Cross-validation method; WSD problem; Machine classifier; Accurate access; Sense distribution; Classifier; Biomolecular entities; WSD task