2017
Comparative Analysis of Sequence Clustering Methods for Deduplication of Biological Databases
Chen Q, Wan Y, Zhang X, Lei Y, Zobel J, Verspoor K. Comparative Analysis of Sequence Clustering Methods for Deduplication of Biological Databases. Journal of Data and Information Quality 2017, 9: 1-27. DOI: 10.1145/3131611.
Peer-Reviewed Original Research
2016
Evaluation of CD-HIT for Constructing Non-Redundant Databases
Chen Q, Wan Y, Lei Y, Zobel J, Verspoor K. Evaluation of CD-HIT for Constructing Non-Redundant Databases. 2016, 703-706. DOI: 10.1109/bibm.2016.7822604.
Peer-Reviewed Original Research