2024
Privacy-Enhancing Technologies in Biomedical Data Science
Cho H, Froelicher D, Dokmai N, Nandi A, Sadhuka S, Hong M, Berger B. Privacy-Enhancing Technologies in Biomedical Data Science. Annual Review Of Biomedical Data Science 2024, 7: 317-343. PMID: 39178425, PMCID: PMC11346580, DOI: 10.1146/annurev-biodatasci-120423-120107. Peer-Reviewed Original Research
Secure discovery of genetic relatives across large-scale and distributed genomic datasets
Hong M, Froelicher D, Magner R, Popic V, Berger B, Cho H. Secure discovery of genetic relatives across large-scale and distributed genomic datasets. Genome Research 2024, 34: gr.279057.124. PMID: 39111815, PMCID: PMC11529841, DOI: 10.1101/gr.279057.124. Peer-Reviewed Original Research
Secure Discovery of Genetic Relatives Across Large-Scale and Distributed Genomic Datasets
Hong M, Froelicher D, Magner R, Popic V, Berger B, Cho H. Secure Discovery of Genetic Relatives Across Large-Scale and Distributed Genomic Datasets. Lecture Notes In Computer Science 2024, 14758: 308-313. PMID: 39027313, PMCID: PMC11257153, DOI: 10.1007/978-1-0716-3989-4_19. Peer-Reviewed Original Research
2023
Reconstruction of private genomes through reference-based genotype imputation
Mosca M, Cho H. Reconstruction of private genomes through reference-based genotype imputation. Genome Biology 2023, 24: 271. PMID: 38053191, PMCID: PMC10698978, DOI: 10.1186/s13059-023-03105-6. Peer-Reviewed Original Research
Assessing transcriptomic reidentification risks using discriminative sequence models
Sadhuka S, Fridman D, Berger B, Cho H. Assessing transcriptomic reidentification risks using discriminative sequence models. Genome Research 2023, 33: 1101-1112. PMID: 37541758, PMCID: PMC10538488, DOI: 10.1101/gr.277699.123. Peer-Reviewed Original Research
sfkit: a web-based toolkit for secure and federated genomic analysis
Mendelsohn S, Froelicher D, Loginov D, Bernick D, Berger B, Cho H. sfkit: a web-based toolkit for secure and federated genomic analysis. Nucleic Acids Research 2023, 51: w535-w541. PMID: 37246709, PMCID: PMC10320181, DOI: 10.1093/nar/gkad464. Peer-Reviewed Original Research
Scalable and Privacy-Preserving Federated Principal Component Analysis
Froelicher D, Cho H, Edupalli M, Sousa J, Bossuat J, Pyrgelis A, Troncoso-Pastoriza J, Berger B, Hubaux J. Scalable and Privacy-Preserving Federated Principal Component Analysis. IEEE Symposium On Security And Privacy (SP) 2023: 1908-1925. PMID: 38665901, PMCID: PMC11044025, DOI: 10.1109/sp46215.2023.10179350. Peer-Reviewed Original Research
Sequre: a high-performance framework for secure multiparty computation enables biomedical data sharing
Smajlović H, Shajii A, Berger B, Cho H, Numanagić I. Sequre: a high-performance framework for secure multiparty computation enables biomedical data sharing. Genome Biology 2023, 24: 5. PMID: 36631897, PMCID: PMC9832703, DOI: 10.1186/s13059-022-02841-5. Peer-Reviewed Original Research
2022
k-SALSA: k-Anonymous Synthetic Averaging of Retinal Images via Local Style Alignment
Jeon M, Park H, Kim H, Morley M, Cho H. k-SALSA: k-Anonymous Synthetic Averaging of Retinal Images via Local Style Alignment. Lecture Notes In Computer Science 2022, 13681: 661-678. PMID: 37525827, PMCID: PMC10388376, DOI: 10.1007/978-3-031-19803-8_39. Peer-Reviewed Original Research
Mechanisms for Hiding Sensitive Genotypes With Information-Theoretic Privacy
Ye F, Cho H, Rouayheb S. Mechanisms for Hiding Sensitive Genotypes With Information-Theoretic Privacy. IEEE Transactions On Information Theory 2022, 68: 4090-4105. PMID: 37283781, PMCID: PMC10243750, DOI: 10.1109/tit.2022.3156276. Peer-Reviewed Original Research
2021
Truly privacy-preserving federated analytics for precision medicine with multiparty homomorphic encryption
Froelicher D, Troncoso-Pastoriza J, Raisaro J, Cuendet M, Sousa J, Cho H, Berger B, Fellay J, Hubaux J. Truly privacy-preserving federated analytics for precision medicine with multiparty homomorphic encryption. Nature Communications 2021, 12: 5910. PMID: 34635645, PMCID: PMC8505638, DOI: 10.1038/s41467-021-25972-y. Peer-Reviewed Original Research
Privacy-preserving genotype imputation in a trusted execution environment
Dokmai N, Kockan C, Zhu K, Wang X, Sahinalp S, Cho H. Privacy-preserving genotype imputation in a trusted execution environment. Cell Systems 2021, 12: 983-993.e7. PMID: 34450045, PMCID: PMC8542641, DOI: 10.1016/j.cels.2021.08.001. Peer-Reviewed Original Research
Bayesian information sharing enhances detection of regulatory associations in rare cell types
Wu A, Peng J, Berger B, Cho H. Bayesian information sharing enhances detection of regulatory associations in rare cell types. Bioinformatics 2021, 37: i349-i357. PMID: 34252956, PMCID: PMC8275330, DOI: 10.1093/bioinformatics/btab269. Peer-Reviewed Original Research
Assessing single-cell transcriptomic variability through density-preserving data visualization
Narayan A, Berger B, Cho H. Assessing single-cell transcriptomic variability through density-preserving data visualization. Nature Biotechnology 2021, 39: 765-774. PMID: 33462509, PMCID: PMC8195812, DOI: 10.1038/s41587-020-00801-7. Peer-Reviewed Original Research
2020
Mechanisms for Hiding Sensitive Genotypes with Information-Theoretic Privacy
Ye F, Cho H, Rouayheb S. Mechanisms for Hiding Sensitive Genotypes with Information-Theoretic Privacy. IEEE International Symposium on Information Theory (ISIT) 2020: 902-907. DOI: 10.1109/isit44484.2020.9174492. Peer-Reviewed Original Research
Privacy-Preserving Biomedical Database Queries with Optimal Privacy-Utility Trade-Offs
Cho H, Simmons S, Kim R, Berger B. Privacy-Preserving Biomedical Database Queries with Optimal Privacy-Utility Trade-Offs. Cell Systems 2020, 10: 408-416.e9. PMID: 32359425, DOI: 10.1016/j.cels.2020.03.006. Peer-Reviewed Original Research
2019
Emerging technologies towards enhancing privacy in genomic data sharing
Berger B, Cho H. Emerging technologies towards enhancing privacy in genomic data sharing. Genome Biology 2019, 20: 128. PMID: 31262363, PMCID: PMC6604426, DOI: 10.1186/s13059-019-1741-0. Commentaries, Editorials and Letters
Geometric Sketching Compactly Summarizes the Single-Cell Transcriptomic Landscape
Hie B, Cho H, DeMeo B, Bryson B, Berger B. Geometric Sketching Compactly Summarizes the Single-Cell Transcriptomic Landscape. Cell Systems 2019, 8: 483-493.e7. PMID: 31176620, PMCID: PMC6597305, DOI: 10.1016/j.cels.2019.05.003. Peer-Reviewed Original Research
Large-Margin Classification in Hyperbolic Space
Cho H, DeMeo B, Peng J, Berger B. Large-Margin Classification in Hyperbolic Space. Proceedings Of Machine Learning Research 2019, 89: 1832-1840. PMID: 32832915, PMCID: PMC7434093. Peer-Reviewed Original Research
2018
Realizing private and practical pharmacological collaboration
Hie B, Cho H, Berger B. Realizing private and practical pharmacological collaboration. Science 2018, 362: 347-350. PMID: 30337410, PMCID: PMC6519716, DOI: 10.1126/science.aat4807. Peer-Reviewed Original Research