2020
Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies
Rasmy L, Tiryaki F, Zhou Y, Xiang Y, Tao C, Xu H, Zhi D. Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies. Journal of the American Medical Informatics Association 2020, 27: 1593-1599. PMID: 32930711, PMCID: PMC7647355, DOI: 10.1093/jamia/ocaa180.
Peer-Reviewed Original Research
MeSH Keywords: Aged; Databases, Factual; Electronic Health Records; Female; Humans; Male; Middle Aged; ROC Curve; Unified Medical Language System; Vocabulary, Controlled
Concepts: Unified Medical Language System; Recurrent neural network; Neural network; Prediction performance; Logistic regression; Predictive modeling; Deep learning; Data aggregation; Electronic health record data; Machine learning; Risk prediction; Better prediction performance; Dengue hemorrhagic fever; Health record data; EHR data; Cancer prediction; Large vocabulary; Different tasks; Predictive model; Heart failure; Diabetes patients; Pancreatic cancer; Clinical data; Hemorrhagic fever; ICD-9
2015
Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2
Stubbs A, Kotfila C, Xu H, Uzuner Ö. Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2. Journal of Biomedical Informatics 2015, 58: s67-s77. PMID: 26210362, PMCID: PMC4978189, DOI: 10.1016/j.jbi.2015.07.001.
Peer-Reviewed Original Research
MeSH Keywords: Aged; Boston; Cohort Studies; Comorbidity; Computer Security; Confidentiality; Coronary Artery Disease; Data Mining; Diabetes Complications; Electronic Health Records; Female; Humans; Incidence; Longitudinal Studies; Male; Middle Aged; Narration; Natural Language Processing; Pattern Recognition, Automated; Risk Assessment; Vocabulary, Controlled
Concepts: Coronary artery disease; Risk factors; Longitudinal medical records; Medical records; Medical risk factors; Artery disease; Diabetic patients; Smoking status; Heart disease; Family history; I2b2/UTHealth natural language processing; Disease; I2b2/UTHealth; Progression; UTHealth; Hypertension; Hyperlipidemia; Factors; Obesity; Diabetes; Patients
Named Entity Recognition in Chinese Clinical Text Using Deep Neural Network.
Wu Y, Jiang M, Lei J, Xu H. Named Entity Recognition in Chinese Clinical Text Using Deep Neural Network. 2015, 216: 624-8. PMID: 26262126, PMCID: PMC4624324.
Peer-Reviewed Original Research
Concepts: Deep neural networks; Large unlabeled corpus; Named Entity Recognition; Word embeddings; Unlabeled corpus; Unsupervised learning; Entity recognition; Neural network; Natural language processing technology; Novel deep learning method; Language processing technology; Deep learning methods; Unsupervised feature learning; Feature engineering approach; Important healthcare information; Chinese clinical text; Types of entities; Feature learning; NER task; Clinical text; Learning methods; Clinical documents; CRF model; Healthcare information; Free text
2014
Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks
Tang B, Cao H, Wang X, Chen Q, Xu H. Evaluating Word Representation Features in Biomedical Named Entity Recognition Tasks. BioMed Research International 2014, 2014: 240403. PMID: 24729964, PMCID: PMC3963372, DOI: 10.1155/2014/240403.
Peer-Reviewed Original Research
Concepts: Biomedical Named Entity Recognition; Word representations; Named Entity Recognition (NER) task; Machine learning-based approach; Word representation features; Natural language processing; Learning-based approach; Entity recognition task; Named Entity Recognition; Cluster-based representation; JNLPBA corpus; Entity recognition; Biomedical domain; F-measure; Language processing; Representation features; Word embeddings; Recognition task; WR algorithm; Distributional representations; Task; Better performance; Algorithm; Representation; Different types
2013
Analyzing differences between Chinese and English clinical text: a cross-institution comparison of discharge summaries in two languages.
Wu Y, Lei J, Wei W, Tang B, Denny J, Rosenbloom S, Miller R, Giuse D, Zheng K, Xu H. Analyzing differences between Chinese and English clinical text: a cross-institution comparison of discharge summaries in two languages. 2013, 192: 662-6. PMID: 23920639, PMCID: PMC4957806.
Peer-Reviewed Original Research
Concepts: Natural language processing tools; English clinical text; Clinical text; Language processing tools; Chinese clinical text; Cultural differences; Major clinical components; Text; Western institutions; Inpatient discharge summaries; Cross-country collaboration; Document level; Processing tools; Clinical documents; Language; US institutions; Uses; Unprecedented amount; Valuable insights; Institutions; Documents; China; Worldwide adoption; EMR data; Collaboration
2012
Extracting semantic lexicons from discharge summaries using machine learning and the C-Value method.
Jiang M, Denny J, Tang B, Cao H, Xu H. Extracting semantic lexicons from discharge summaries using machine learning and the C-Value method. AMIA Annual Symposium Proceedings 2012, 2012: 409-16. PMID: 23304311, PMCID: PMC3540581.
Peer-Reviewed Original Research
2011
A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries
Jiang M, Chen Y, Liu M, Rosenbloom S, Mani S, Denny J, Xu H. A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries. Journal of the American Medical Informatics Association 2011, 18: 601-606. PMID: 21508414, PMCID: PMC3168315, DOI: 10.1136/amiajnl-2011-000163.
Peer-Reviewed Original Research
Concepts: Entity extraction system; Center of Informatics; Concept extraction; Integrating Biology; Entity recognition module; Entity recognition system; Conditional Random Fields; Overall F-score; Support vector machine; Rule-based module; Assertion classification; Classification task; Recognition module; Recognition system; ML algorithms; Semantic information; Training data; Clinical text; Natural language; F-measure; Challenge organizers; F-score; Vector machine; Evaluation scripts; Training corpus
2007
Gene symbol disambiguation using knowledge-based profiles
Xu H, Fan J, Hripcsak G, Mendonça E, Markatou M, Friedman C. Gene symbol disambiguation using knowledge-based profiles. Bioinformatics 2007, 23: 1015-1022. PMID: 17314123, DOI: 10.1093/bioinformatics/btm056.
Peer-Reviewed Original Research
Concepts: Knowledge sources; Similarity scores; Information retrieval methods; Gene symbol disambiguation; Text mining system; Knowledge-based profiles; Testing data sets; Biomedical entities; Biomedical domain; MEDLINE abstracts; High similarity scores; Retrieval method; Ambiguous genes; Entrez Gene; Gene symbols; Disambiguation task; Testing set
2006
Natural language processing and visualization in the molecular imaging domain
Tulipano P, Tao Y, Millar W, Zanzonico P, Kolbert K, Xu H, Yu H, Chen L, Lussier Y, Friedman C. Natural language processing and visualization in the molecular imaging domain. Journal of Biomedical Informatics 2006, 40: 270-281. PMID: 17084109, DOI: 10.1016/j.jbi.2006.08.002.
Peer-Reviewed Original Research
MeSH Keywords: Animals; Cell Line; Computational Biology; Databases, Bibliographic; Databases, Genetic; Diagnostic Imaging; Genomics; Humans; Information Storage and Retrieval; Natural Language Processing; Phenotype; Programming Languages; Software; Systems Integration; Terminology as Topic; User-Computer Interface; Vocabulary, Controlled
Concepts: Imaging domain; Natural language processing systems; Natural language processing; Language processing system; Java viewer; NLP systems; Formal evaluation studies; Language processing; Information resources; Processing system; Medical imaging; Index images; System performance; Biological information; Information; Images; Visualization; BioMedLEE; Performance; NLP; Evaluation study; Domain; Genomics literature; System; Simultaneous visualization
Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues
Xu H, Markatou M, Dimova R, Liu H, Friedman C. Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues. BMC Bioinformatics 2006, 7: 334. PMID: 16822321, PMCID: PMC1550263, DOI: 10.1186/1471-2105-7-334.
Peer-Reviewed Original Research
Concepts: Natural language processing; Biomedical domain; Information retrieval systems; ML methods; WSD classifier; Sense disambiguation; Machine learning methods; Vector machine classifier; Error rate; Word sense disambiguation; Retrieval system; Machine learning; ML techniques; Text mining; Biomedical abbreviations; Language processing; Learning methods; Cross-validation method; WSD problem; Machine classifier; Accurate access; Sense distribution; Classifier; Biomolecular entities; WSD task