Haoyu Cheng, PhD
Assistant Professor of Biomedical Informatics and Data Science
About
Titles
Assistant Professor of Biomedical Informatics and Data Science
Affiliated Faculty, Yale Center for Genomic Health
Biography
Haoyu Cheng is a tenure-track Assistant Professor in the Department of Biomedical Informatics and Data Science (BIDS) at Yale University. His research focuses on efficient computational methods for genomic applications such as genome assembly, read alignment, variant calling, and string indexing. He has developed a series of de novo genome assembly algorithms (e.g., hifiasm) that are widely used in large-scale sequencing projects, including the Human Pangenome Reference Consortium, the Vertebrate Genomes Project, and the Darwin Tree of Life project. Within these projects, he also works closely with collaborators to explore applications of genome assemblies. Haoyu was previously a Postdoctoral Scholar working with Dr. Heng Li at Dana-Farber Cancer Institute and Harvard Medical School. He obtained his Ph.D. in Computer Science from the University of Science and Technology of China under the supervision of Dr. Yun Xu.
Selected publications
1. Cheng H, Asri M, Lucas J, Koren S, Li H. “Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph.” Nat Methods (2024).
2. Cheng H, Jarvis ED, Fedrigo O, Koepfli KP, Urban L, Gemmell NJ, Li H. “Haplotype-resolved assembly of diploid genomes without parental data.” Nat Biotechnol (2022).
3. Cheng H, Concepcion GT, Feng X, Zhang H, Li H. “Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm.” Nat Methods (2021).
4. Cheng H, Wu M, Xu Y. “FMtree: a fast locating algorithm of FM-indexes for genomic data.” Bioinformatics (2018).
Appointments
Biomedical Informatics & Data Science
Assistant Professor (Primary)
Education & Training
- Postdoctoral Fellow, Dana-Farber Cancer Institute and Harvard Medical School (2024)
- PhD, University of Science and Technology of China, Computer Science (2019)
Research
Publications
Featured Publications
Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm.
Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 2021, 18: 170-175. PMID: 33526886, DOI: 10.1038/s41592-020-01056-5. Peer-Reviewed Original Research
Haplotype-resolved assembly of diploid genomes without parental data.
Cheng H, Jarvis ED, Fedrigo O, Koepfli KP, Urban L, Gemmell NJ, Li H. Haplotype-resolved assembly of diploid genomes without parental data. Nat Biotechnol 2022, 40: 1332-1335. PMID: 35332338, DOI: 10.1038/s41587-022-01261-x. Peer-Reviewed Original Research
Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph.
Cheng H, Asri M, Lucas J, Koren S, Li H. Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Nat Methods 2024, 21: 967-970. PMID: 38730258, DOI: 10.1038/s41592-024-02269-8. Peer-Reviewed Original Research
FMtree: a fast locating algorithm of FM-indexes for genomic data.
Cheng H, Wu M, Xu Y. FMtree: a fast locating algorithm of FM-indexes for genomic data. Bioinformatics 2018, 34: 416-424. PMID: 28968761, DOI: 10.1093/bioinformatics/btx596. Peer-Reviewed Original Research
2024
Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy.
Larivière D, Abueg L, Brajuka N, Gallardo-Alba C, Grüning B, Ko BJ, Ostrovsky A, Palmada-Flores M, Pickett BD, Rabbani K, Antunes A, Balacco JR, Chaisson MJP, Cheng H, Collins J, Couture M, Denisova A, Fedrigo O, Gallo GR, Giani AM, Gooder GM, Horan K, Jain N, Johnson C, Kim H, Lee C, Marques-Bonet T, O'Toole B, Rhie A, Secomandi S, Sozzoni M, Tilley T, Uliano-Silva M, van den Beek M, Williams RW, Waterhouse RM, Phillippy AM, Jarvis ED, Schatz MC, Nekrutenko A, Formenti G. Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nat Biotechnol 2024, 42: 367-370. PMID: 38278971, DOI: 10.1038/s41587-023-02100-3. Peer-Reviewed Original Research
2023
De novo reconstruction of satellite repeat units from sequence data.
Zhang Y, Chu J, Cheng H, Li H. De novo reconstruction of satellite repeat units from sequence data. Genome Res 2023, 33: 1994-2001. PMID: 37918962, DOI: 10.1101/gr.278005.123. Peer-Reviewed Original Research
Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome
Lee H, Greer S, Pavlichin D, Zhou B, Urban A, Weissman T, Consortium H, Liao W, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas J, Monlong J, Abel H, Buonaiuto S, Chang X, Cheng H, Chu J, Colonna V, Eizenga J, Feng X, Fischer C, Fulton R, Garg S, Groza C, Guarracino A, Harvey W, Heumos S, Howe K, Jain M, Lu T, Markello C, Martin F, Mitchell M, Munson K, Mwaniki M, Novak A, Olsen H, Pesout T, Porubsky D, Prins P, Sibbesen J, Tomlinson C, Villani F, Vollger M, Antonacci-Fulton L, Baid G, Baker C, Belyaeva A, Billis K, Carroll A, Chang P, Cody S, Cook D, Cornejo O, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld A, Formenti G, Frankish A, Gao Y, Giron C, Green R, Haggerty L, Hoekzema K, Hourlier T, Ji H, Kolesnikov A, Korbel J, Kordosky J, Lee H, Lewis A, Magalhães H, Marco-Sola S, Marijon P, McDaniel J, Mountcastle J, Nattestad M, Olson N, Puiu D, Regier A, Rhie A, Sacco S, Sanders A, Schneider V, Schultz B, Shafin K, Sirén J, Smith M, Sofia H, Tayoun A, Thibaud-Nissen F, Tricomi F, Wagner J, Wood J, Zimin A, Popejoy A, Bourque G, Chaisson M, Flicek P, Phillippy A, Zook J, Eichler E, Haussler D, Jarvis E, Miga K, Wang T, Garrison E, Marschall T, Hall I, Li H, Paten B, Ji H. Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome. Cell Reports Methods 2023, 3: 100543. PMID: 37671027, PMCID: PMC10475782, DOI: 10.1016/j.crmeth.2023.100543. Peer-Reviewed Original Research
A draft human pangenome reference
Liao W, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas J, Monlong J, Abel H, Buonaiuto S, Chang X, Cheng H, Chu J, Colonna V, Eizenga J, Feng X, Fischer C, Fulton R, Garg S, Groza C, Guarracino A, Harvey W, Heumos S, Howe K, Jain M, Lu T, Markello C, Martin F, Mitchell M, Munson K, Mwaniki M, Novak A, Olsen H, Pesout T, Porubsky D, Prins P, Sibbesen J, Sirén J, Tomlinson C, Villani F, Vollger M, Antonacci-Fulton L, Baid G, Baker C, Belyaeva A, Billis K, Carroll A, Chang P, Cody S, Cook D, Cook-Deegan R, Cornejo O, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld A, Formenti G, Frankish A, Gao Y, Garrison N, Giron C, Green R, Haggerty L, Hoekzema K, Hourlier T, Ji H, Kenny E, Koenig B, Kolesnikov A, Korbel J, Kordosky J, Koren S, Lee H, Lewis A, Magalhães H, Marco-Sola S, Marijon P, McCartney A, McDaniel J, Mountcastle J, Nattestad M, Nurk S, Olson N, Popejoy A, Puiu D, Rautiainen M, Regier A, Rhie A, Sacco S, Sanders A, Schneider V, Schultz B, Shafin K, Smith M, Sofia H, Abou Tayoun A, Thibaud-Nissen F, Tricomi F, Wagner J, Walenz B, Wood J, Zimin A, Bourque G, Chaisson M, Flicek P, Phillippy A, Zook J, Eichler E, Haussler D, Wang T, Jarvis E, Miga K, Garrison E, Marschall T, Hall I, Li H, Paten B. A draft human pangenome reference. Nature 2023, 617: 312-324. PMID: 37165242, PMCID: PMC10172123, DOI: 10.1038/s41586-023-05896-x. Peer-Reviewed Original Research
Gaps and complex structurally variant loci in phased genome assemblies
Porubsky D, Vollger M, Harvey W, Rozanski A, Ebert P, Hickey G, Hasenfeld P, Sanders A, Stober C, Consortium H, Korbel J, Paten B, Marschall T, Eichler E, Abel H, Antonacci-Fulton L, Asri M, Baid G, Baker C, Belyaeva A, Billis K, Bourque G, Buonaiuto S, Carroll A, Chaisson M, Chang P, Chang X, Cheng H, Chu J, Cody S, Colonna V, Cook D, Cook-Deegan R, Cornejo O, Diekhans M, Doerr D, Ebert P, Ebler J, Eichler E, Eizenga J, Fairley S, Fedrigo O, Felsenfeld A, Feng X, Fischer C, Flicek P, Formenti G, Frankish A, Fulton R, Gao Y, Garg S, Garrison E, Garrison N, Giron C, Green R, Groza C, Guarracino A, Haggerty L, Hall I, Harvey W, Haukness M, Haussler D, Heumos S, Hickey G, Hoekzema K, Hourlier T, Howe K, Jain M, Jarvis E, Ji H, Kenny E, Koenig B, Kolesnikov A, Korbel J, Kordosky J, Koren S, Lee H, Lewis A, Li H, Liao W, Lu S, Lu T, Lucas J, Magalhães H, Marco-Sola S, Marijon P, Markello C, Marschall T, Martin F, McCartney A, McDaniel J, Miga K, Mitchell M, Monlong J, Mountcastle J, Munson K, Mwaniki M, Nattestad M, Novak A, Nurk S, Olsen H, Olson N, Paten B, Pesout T, Phillippy A, Popejoy A, Porubsky D, Prins P, Puiu D, Rautiainen M, Regier A, Rhie A, Sacco S, Sanders A, Schneider V, Schultz B, Shafin K, Sibbesen J, Sirén J, Smith M, Sofia H, Tayoun A, Thibaud-Nissen F, Tomlinson C, Tricomi F, Villani F, Vollger M, Wagner J, Walenz B, Wang T, Wood J, Zimin A, Zook J. Gaps and complex structurally variant loci in phased genome assemblies. Genome Research 2023, 33: 496-510. PMID: 37164484, PMCID: PMC10234299, DOI: 10.1101/gr.277334.122. Peer-Reviewed Original Research
Concepts: Protein-coding genes; Genome assembly; Mbp of DNA; Linked-read data; Large segmental duplications; Strand-seq; Diversity panel; Inversion polymorphism; Haploid genome; Segmental duplications; Euchromatic DNA; More haplotypes; Identical repeats; Haploid assemblies; Variant loci; DNA; Haplotypes; Genes; Frequent expansion; Assembly gaps; Important target; Assembly; Human species; Human samples; MBP
2022
Semi-automated assembly of high-quality diploid human reference genomes
Jarvis E, Formenti G, Rhie A, Guarracino A, Yang C, Wood J, Tracey A, Thibaud-Nissen F, Vollger M, Porubsky D, Cheng H, Asri M, Logsdon G, Carnevali P, Chaisson M, Chin C, Cody S, Collins J, Ebert P, Escalona M, Fedrigo O, Fulton R, Fulton L, Garg S, Gerton J, Ghurye J, Granat A, Green R, Harvey W, Hasenfeld P, Hastie A, Haukness M, Jaeger E, Jain M, Kirsche M, Kolmogorov M, Korbel J, Koren S, Korlach J, Lee J, Li D, Lindsay T, Lucas J, Luo F, Marschall T, Mitchell M, McDaniel J, Nie F, Olsen H, Olson N, Pesout T, Potapova T, Puiu D, Regier A, Ruan J, Salzberg S, Sanders A, Schatz M, Schmitt A, Schneider V, Selvaraj S, Shafin K, Shumate A, Stitziel N, Stober C, Torrance J, Wagner J, Wang J, Wenger A, Xiao C, Zimin A, Zhang G, Wang T, Li H, Garrison E, Haussler D, Hall I, Zook J, Eichler E, Phillippy A, Paten B, Howe K, Miga K. Semi-automated assembly of high-quality diploid human reference genomes. Nature 2022, 611: 519-531. PMID: 36261518, PMCID: PMC9668749, DOI: 10.1038/s41586-022-05325-5. Peer-Reviewed Original Research
Concepts: Diploid genome assembly; Genome assembly; Protein-coding genes; Global genetic variation; Current human reference genome; Diploid human genome; High-quality assembly; Accurate long reads; Non-synonymous amino acid changes; Human reference genome; Amino acid changes; Most chromosomes; Reference assembly; Reference genome; Human genome; Centromeric regions; Genetic variation; High diversity; Genome sequencing; Long reads; Single nucleotide; Genome; Acid changes; Manual curation; Biological genomes