2018
Identifying and characterizing highly similar notes in big clinical note datasets
Gabriel R, Kuo T, McAuley J, Hsu C. Identifying and characterizing highly similar notes in big clinical note datasets. Journal of Biomedical Informatics 2018, 82: 63-69. PMID: 29679685, DOI: 10.1016/j.jbi.2018.04.009.
Peer-Reviewed Original Research
Concepts: Clinical note datasets; De-duplication algorithm; MIMIC-III dataset; Electronic health records; Jaccard similarity; De-duplication; Locality sensitive hashing; MIMIC-III; Near-duplicates; Scalable algorithm; Measure similarity; Accurate statistical models; Sources of duplication; Clustering method; Dataset; Algorithm; Approximation algorithm; Health records; Disjoint sets; Institutional dataset; Comparison of notes; Pairs of notes; Hash; Pairwise comparisons; Pairwise
2017
Ensembles of NLP Tools for Data Element Extraction from Clinical Notes.
Kuo T, Rao P, Maehara C, Doan S, Chaparro J, Day M, Farcas C, Ohno-Machado L, Hsu C. Ensembles of NLP Tools for Data Element Extraction from Clinical Notes. AMIA Annual Symposium Proceedings 2017, 2016: 1880-1889. PMID: 28269947, PMCID: PMC5333200.
Peer-Reviewed Original Research
MeSH Keywords: Data Collection; Electronic Health Records; Humans; Information Storage and Retrieval; Natural Language Processing
Concepts: Natural language processing; NLP tools; Electronic health records; Data elements; Concept extraction; Language processing; Ensemble method; Diverse concepts; Evaluation results; Health records; Element extraction; Clinical notes; Plausible solution; Tool; Pipeline; Extraction; Performance; Ensemble; Extraction performance; Concept; Narrative text; Processing; Text