Hyunghoon Cho, PhD
he/him/his
Research
Publications
2025
Generating synthetic electronic health record data: a methodological scoping review with benchmarking on phenotype data and open-source software
Chen X, Wu Z, Shi X, Cho H, Mukherjee B. Generating synthetic electronic health record data: a methodological scoping review with benchmarking on phenotype data and open-source software. Journal Of The American Medical Informatics Association 2025, 32: 1227-1240. PMID: 40460023, PMCID: PMC12203555, DOI: 10.1093/jamia/ocaf082.
Peer-Reviewed Original Research
Concepts: GAN-based methods; Electronic health records; Open-source software; Baseline methods; MIMIC-III; Generative adversarial network (GAN)-based methods; Adversarial network (GAN)-based methods; Synthetic electronic health records; Downstream use cases; Rule-based method; Electronic health record dataset; Privacy exposure; Privacy protection; Evaluation metrics; Use cases; Competitive performance; Condition generation method; Synthetic data; Decision tree; Benchmark methods; Electronic health record data; Data generation; Generation method; Comprehensive benchmark; MIMIC-IV
TX-Phase: Secure Phasing of Private Genomes in a Trusted Execution Environment
Dokmai N, Zhu K, Sahinalp S, Cho H. TX-Phase: Secure Phasing of Private Genomes in a Trusted Execution Environment. Lecture Notes In Computer Science 2025, 15647: 325-329. DOI: 10.1007/978-3-031-90252-9_32.
Peer-Reviewed Original Research
Concepts: Trusted execution environment; Execution environment; Data privacy concerns; Side-channel leakage; State-of-the-art; Extract valuable insights; Open-source software; Private genomic data; Fixed-point arithmetic; Data confidentiality; Privacy constraints; Privacy concerns; Security phase; Algorithmic techniques; Server; Genomic data; Enhanced accuracy; Practical performance; Imputation workflow; Analysis tools; Dataset; Phase algorithm; Accuracy; Privacy; Haplotype phasing
Secure and federated genome-wide association studies for biobank-scale datasets
Cho H, Froelicher D, Chen J, Edupalli M, Pyrgelis A, Troncoso-Pastoriza J, Hubaux J, Berger B. Secure and federated genome-wide association studies for biobank-scale datasets. Nature Genetics 2025, 57: 809-814. PMID: 39994472, PMCID: PMC11985345, DOI: 10.1038/s41588-025-02109-1.
Peer-Reviewed Original Research
Concepts: Genome-wide association studies; Association studies; Genome-wide association study pipeline; Discovery of genetic variations; Biobank-scale datasets; Genomic studies; Genetic variation; Cryptographic tools; Privacy guarantees; Data confidentiality; Private data; Distributed algorithm; Computing promises; UK Biobank cohort; Multiple entities; Sharing data; Computational framework; Biobank cohort; Runtime; Collaborative analysis; Dataset; Principal-component analysis; Linear mixed models; Privacy; Disease
Learning-augmented sketching offers improved performance for privacy preserving and secure GWAS
Xu J, Zhu K, Cai J, Kockan C, Dokmai N, Cho H, Woodruff D, Sahinalp S. Learning-augmented sketching offers improved performance for privacy preserving and secure GWAS. IScience 2025, 28: 112011. PMID: 40124506, PMCID: PMC11927738, DOI: 10.1016/j.isci.2025.112011.
Peer-Reviewed Original Research
Concepts: Trusted execution environment; Genome-wide association studies; Public training datasets; Computational resource constraints; Optimize memory usage; Intel SGX; Privacy guarantees; Privacy preservation; Execution environment; Cloud providers; Memory usage; Memory constraints; Training dataset; Dataset; Resource constraints; Higher accuracy; Privacy; Dedicated memory; Experimental results; Improved performance; Significant SNPs; GWA studies; Association studies; Genotype data; SNPs
2024
Privacy of single-cell gene expression data
Cho H. Privacy of single-cell gene expression data. Patterns 2024, 5: 101096. PMID: 39568471, PMCID: PMC11573887, DOI: 10.1016/j.patter.2024.101096.
Commentaries, Editorials and Letters
Privacy-Enhancing Technologies in Biomedical Data Science
Cho H, Froelicher D, Dokmai N, Nandi A, Sadhuka S, Hong M, Berger B. Privacy-Enhancing Technologies in Biomedical Data Science. Annual Review Of Biomedical Data Science 2024, 7: 317-343. PMID: 39178425, PMCID: PMC11346580, DOI: 10.1146/annurev-biodatasci-120423-120107.
Peer-Reviewed Original Research
Concepts: Privacy-enhancing technologies; Adoption of privacy-enhancing technologies; Biomedical data science; Data science; Analyze sensitive data; Biomedical data repositories; Privacy protection; Sensitive data; Privacy concerns; Data silos; Protect privacy; Human subject data; Biomedical domain; Data repositories; Privacy; Subjective data; Conventional framework
Secure discovery of genetic relatives across large-scale and distributed genomic datasets
Hong M, Froelicher D, Magner R, Popic V, Berger B, Cho H. Secure discovery of genetic relatives across large-scale and distributed genomic datasets. Genome Research 2024, 34: gr.279057.124. PMID: 39111815, PMCID: PMC11529841, DOI: 10.1101/gr.279057.124.
Peer-Reviewed Original Research
Concepts: Multiparty homomorphic encryption; Identity-by-descent; Effective hash functions; Genomic datasets; Homomorphic encryption; Hash function; Private data; Federated algorithm; Bucketing strategy; Data holders; Data silos; Degree of relatedness; Relation detection; Genetic relation; Efficient algorithm; Multiple entities; Relatedness coefficients; Pairs of individuals; Genomic studies; Dataset; Identification of relations; Runtime; Genetic sequences; Accurate detection; Algorithm
2023
Reconstruction of private genomes through reference-based genotype imputation
Mosca M, Cho H. Reconstruction of private genomes through reference-based genotype imputation. Genome Biology 2023, 24: 271. PMID: 38053191, PMCID: PMC10698978, DOI: 10.1186/s13059-023-03105-6.
Peer-Reviewed Original Research
Assessing transcriptomic reidentification risks using discriminative sequence models
Sadhuka S, Fridman D, Berger B, Cho H. Assessing transcriptomic reidentification risks using discriminative sequence models. Genome Research 2023, 33: 1101-1112. PMID: 37541758, PMCID: PMC10538488, DOI: 10.1101/gr.277699.123.
Peer-Reviewed Original Research
Concepts: Expression quantitative trait loci; Gene expression data; Expression data; Quantitative trait loci; Omics data sets; Gene expression profiles; Trait loci; Genomic regions; Genetic variation; Gene expression; Expression profiles; Molecular insights; Linkage disequilibrium; Functional impact; Genotypes; Transcriptomics; Loci; Same individual; Disequilibrium; Sequence; Expression; Previous studies; Full extent; Data sets
Teaching & Mentoring
Mentoring
Lucy Zheng
CBB PhD student, 2024 - Present
Anupama Nandi
Postdoc, 2023 - Present
Natnatee Dokmai
Postdoc, 2023 - Present
Get In Touch
Locations
101 College Street
Academic Office
Fl 10, Rm 1021M
New Haven, CT 06510