2016
Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning
Zhang Y, Xu J, Chen H, Wang J, Wu Y, Prakasam M, Xu H. Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning. Database 2016, 2016: baw049. PMID: 27087307, PMCID: PMC4834204, DOI: 10.1093/database/baw049.
Peer-Reviewed Original Research
MeSH Keywords: Algorithms; Computational Biology; Data Mining; Databases, Chemical; Patents as Topic; Pattern Recognition, Automated; Pharmaceutical Preparations; Unsupervised Machine Learning
Concepts: Machine learning-based systems; Learning-based system; Conditional Random Fields; Domain knowledge; Entity recognition; Matthews correlation coefficient; Drug Named Entity Recognition; BioCreative V challenge; Information extraction system; Word representation features; Unsupervised feature learning; Unsupervised learning algorithm; Named Entity Recognition; Semantic type information; Support vector machine; Precision-recall curve; Brown clustering; Feature learning; Feature engineering; Unsupervised feature; Individual subtasks; Mining system; NER task; Learning algorithm; CPD task
2015
Recognizing Disjoint Clinical Concepts in Clinical Text Using Machine Learning-based Methods.
Tang B, Chen Q, Wang X, Wu Y, Zhang Y, Jiang M, Wang J, Xu H. Recognizing Disjoint Clinical Concepts in Clinical Text Using Machine Learning-based Methods. AMIA Annual Symposium Proceedings 2015, 2015: 1184-93. PMID: 26958258, PMCID: PMC4765674.
Peer-Reviewed Original Research
MeSH Keywords: Algorithms; Humans; Machine Learning; Natural Language Processing; Pattern Recognition, Automated; Semantics
A Study of Neural Word Embeddings for Named Entity Recognition in Clinical Text.
Wu Y, Xu J, Jiang M, Zhang Y, Xu H. A Study of Neural Word Embeddings for Named Entity Recognition in Clinical Text. AMIA Annual Symposium Proceedings 2015, 2015: 1326-33. PMID: 26958273, PMCID: PMC4765694.
Peer-Reviewed Original Research
MeSH Keywords: Algorithms; Data Curation; Humans; Natural Language Processing; Pattern Recognition, Automated; Semantics; Terminology as Topic
Concepts: Named Entity Recognition; Clinical NER system; Neural word embeddings; Clinical Named Entity Recognition; Word embeddings; NER system; Word representations; I2b2 data; Entity recognition; Embedding features; Clinical text; Natural language processing research; Conditional Random Fields; Language processing research; Word embedding features; Large unlabeled corpus; Brown clusters; Neural word; Important patient information; Feature representation; F1 score; Intelligent monitoring; Critical task; Unlabeled corpus; Semantic relations
Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2
Stubbs A, Kotfila C, Xu H, Uzuner Ö. Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2. Journal Of Biomedical Informatics 2015, 58: s67-s77. PMID: 26210362, PMCID: PMC4978189, DOI: 10.1016/j.jbi.2015.07.001.
Peer-Reviewed Original Research
MeSH Keywords: Aged; Boston; Cohort Studies; Comorbidity; Computer Security; Confidentiality; Coronary Artery Disease; Data Mining; Diabetes Complications; Electronic Health Records; Female; Humans; Incidence; Longitudinal Studies; Male; Middle Aged; Narration; Natural Language Processing; Pattern Recognition, Automated; Risk Assessment; Vocabulary, Controlled
Concepts: Coronary artery disease; Risk factors; Longitudinal medical records; Medical records; Medical risk factors; Artery disease; Diabetic patients; Smoking status; Heart disease; Family history; I2b2/UTHealth natural language processing; Disease; I2b2/UTHealth; Progression; UTHealth; Hypertension; Hyperlipidemia; Factors; Obesity; Diabetes; Patients
Ease of adoption of clinical natural language processing software: An evaluation of five systems
Zheng K, Vydiswaran V, Liu Y, Wang Y, Stubbs A, Uzuner Ö, Gururaj A, Bayer S, Aberdeen J, Rumshisky A, Pakhomov S, Liu H, Xu H. Ease of adoption of clinical natural language processing software: An evaluation of five systems. Journal Of Biomedical Informatics 2015, 58: s189-s196. PMID: 26210361, PMCID: PMC4974203, DOI: 10.1016/j.jbi.2015.07.008.
Peer-Reviewed Original Research
MeSH Keywords: Attitude to Computers; Data Mining; Electronic Health Records; Humans; Middle Aged; Natural Language Processing; Pattern Recognition, Automated; Software; User-Computer Interface
Concepts: Clinical NLP systems; NLP systems; Natural language processing software; Third-party components; Usability testing tool; Group of users; Language processing software; Ease of adoption; Expert evaluators; Software distribution; Biomedical software; Computer science; End users; Usability assessment; I2b2 challenge; Testing tools; Evaluation show; Human evaluators; System submissions; Ease of use; Health informatics; Processing software; Adoption issues; Users; Special track
2014
Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks
Tang B, Cao H, Wang X, Chen Q, Xu H. Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks. BioMed Research International 2014, 2014: 240403. PMID: 24729964, PMCID: PMC3963372, DOI: 10.1155/2014/240403.
Peer-Reviewed Original Research
MeSH Keywords: Algorithms; Artificial Intelligence; Biomedical Research; Data Mining; MEDLINE; Natural Language Processing; Pattern Recognition, Automated; Semantics; Terminology as Topic; Vocabulary, Controlled
Concepts: Biomedical Named Entity Recognition; Word representations; Named Entity Recognition (NER) task; Machine learning-based approach; Word representation features; Natural language processing; Learning-based approach; Entity recognition task; Named Entity Recognition; Cluster-based representation; JNLPBA corpus; Entity recognition; Biomedical domain; F-measure; Language processing; Representation features; Word embeddings; Recognition task; WR algorithm; Distributional representations; Task; Better performance; Algorithm; Representation; Different types
2013
Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features
Tang B, Cao H, Wu Y, Jiang M, Xu H. Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features. BMC Medical Informatics And Decision Making 2013, 13: s1. PMID: 23566040, PMCID: PMC3618243, DOI: 10.1186/1472-6947-13-s1-s1.
Peer-Reviewed Original Research
Concepts: Structural support vector machine; Word representation features; Clinical NER tasks; Conditional Random Fields; Support vector machine; Performance of ML; Clinical NER system; Machine learning; Representation features; NER system; NER task; Vector machine; Entity recognition; Natural language processing research; Sequential labeling algorithm; Clinical entity recognition; Large margin theory; Clinical text processing; Language processing research; Performance of CRFs; Highest F-measure; Clinical NLP research; I2b2 NLP challenge; Same feature sets; Better performance
2012
A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries.
Wu Y, Denny J, Rosenbloom S, Miller R, Giuse D, Xu H. A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries. AMIA Annual Symposium Proceedings 2012, 2012: 997-1003. PMID: 23304375, PMCID: PMC3540461.
Peer-Reviewed Original Research
MeSH Keywords: Abbreviations as Topic; Electronic Health Records; Humans; Natural Language Processing; Patient Discharge; Pattern Recognition, Automated
Combining corpus-derived sense profiles with estimated frequency information to disambiguate clinical abbreviations.
Xu H, Stetson P, Friedman C. Combining corpus-derived sense profiles with estimated frequency information to disambiguate clinical abbreviations. AMIA Annual Symposium Proceedings 2012, 2012: 1004-13. PMID: 23304376, PMCID: PMC3540457.
Peer-Reviewed Original Research
MeSH Keywords: Abbreviations as Topic; Electronic Health Records; Humans; Natural Language Processing; Pattern Recognition, Automated
Recognition of medication information from discharge summaries using ensembles of classifiers
Doan S, Collier N, Xu H, Duy P, Phuong T. Recognition of medication information from discharge summaries using ensembles of classifiers. BMC Medical Informatics And Decision Making 2012, 12: 36. PMID: 22564405, PMCID: PMC3502425, DOI: 10.1186/1472-6947-12-36.
Peer-Reviewed Original Research
MeSH Keywords: Algorithms; Artificial Intelligence; Decision Support Techniques; Female; Humans; Information Storage and Retrieval; Institutional Management Teams; Male; Medication Systems; Natural Language Processing; Patient Discharge; Pattern Recognition, Automated; Pharmaceutical Preparations; Reproducibility of Results; Semantics; Software Design; Support Vector Machine
Concepts: Conditional Random Fields; Natural language processing; Clinical natural language processing; Support vector machine; Best F-score; Ensemble classifier; F-score; Clinical text; Individual classifiers; Voting method; Majority voting; Local support vector machine; Supervised machine learning methods; Clinical entity recognition; Clinical NLP systems; Different voting strategies; Entity recognition system; Rule-based system; Ensemble of classifiers; Machine learning methods; Rule-based method; I2b2 NLP challenge; Entity recognition; Recognition system; NLP systems
2011
Detecting abbreviations in discharge summaries using machine learning methods.
Wu Y, Rosenbloom S, Denny J, Miller R, Mani S, Giuse D, Xu H. Detecting abbreviations in discharge summaries using machine learning methods. AMIA Annual Symposium Proceedings 2011, 2011: 1541-9. PMID: 22195219, PMCID: PMC3243185.
Peer-Reviewed Original Research
MeSH Keywords: Abbreviations as Topic; Algorithms; Artificial Intelligence; Decision Trees; Electronic Health Records; Humans; Natural Language Processing; Patient Discharge; Pattern Recognition, Automated; Support Vector Machine
Concepts: Natural language processing; Machine learning methods; Highest F-measure; F-measure; Clinical natural language processing; Lexical resources; Clinical abbreviations; Training set; Pre-defined features; Random forest classifier; Domain experts; ML algorithms; ML classifiers; Language processing; Voting scheme; Learning methods; Discharge summaries; Forest classifier; Test set; Classifier; Corpus-based method; Set; Resources; Algorithm; Abbreviations
A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries
Jiang M, Chen Y, Liu M, Rosenbloom S, Mani S, Denny J, Xu H. A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. Journal Of The American Medical Informatics Association 2011, 18: 601-606. PMID: 21508414, PMCID: PMC3168315, DOI: 10.1136/amiajnl-2011-000163.
Peer-Reviewed Original Research
Concepts: Entity extraction system; Center of Informatics; Concept extraction; Integrating Biology; Entity recognition module; Entity recognition system; Conditional Random Fields; Overall F-score; Support vector machine; Rule-based module; Assertion classification; Classification task; Recognition module; Recognition system; ML algorithms; Semantic information; Training data; Clinical text; Natural language; F-measure; Challenge organizers; F-score; Vector machine; Evaluation scripts; Training corpus
2008
Methods for building sense inventories of abbreviations in clinical notes.
Xu H, Stetson P, Friedman C. Methods for building sense inventories of abbreviations in clinical notes. AMIA Annual Symposium Proceedings 2008, 2008: 819. PMID: 18999007, PMCID: PMC2656023.
Peer-Reviewed Original Research
2006
Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues
Xu H, Markatou M, Dimova R, Liu H, Friedman C. Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues. BMC Bioinformatics 2006, 7: 334. PMID: 16822321, PMCID: PMC1550263, DOI: 10.1186/1471-2105-7-334.
Peer-Reviewed Original Research
Concepts: Natural language processing; Biomedical domain; Information retrieval systems; ML methods; WSD classifier; Sense disambiguation; Machine learning methods; Vector machine classifier; Error rate; Word sense disambiguation; Retrieval system; Machine learning; ML techniques; Text mining; Biomedical abbreviations; Language processing; Learning methods; Cross-validation method; WSD problem; Machine classifier; Accurate access; Sense distribution; Classifier; Biomolecular entities; WSD task