Identifying and characterizing highly similar notes in big clinical note datasets
Gabriel R, Kuo T, McAuley J, Hsu C. Identifying and characterizing highly similar notes in big clinical note datasets. Journal of Biomedical Informatics 2018, 82: 63-69. PMID: 29679685, DOI: 10.1016/j.jbi.2018.04.009.
Peer-Reviewed Original Research
Concepts: Clinical note datasets; De-duplication algorithm; MIMIC-III dataset; Electronic health records; Jaccard similarity; De-duplication; Locality sensitive hashing; MIMIC-III; Near-duplicates; Scalable algorithm; Measure similarity; Accurate statistical models; Sources of duplication; Clustering method; Dataset; Algorithm; Approximation algorithm; Health records; Disjoint sets; Institutional dataset; Comparison of notes; Pairs of notes; Hash; Pairwise comparisons; Pairwise