2024
Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph
Cheng H, Asri M, Lucas J, Koren S, Li H. Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Nature Methods 2024, 21: 967-970. PMID: 38730258, PMCID: PMC11214949, DOI: 10.1038/s41592-024-02269-8. Peer-Reviewed Original Research
2023
De novo reconstruction of satellite repeat units from sequence data
Zhang Y, Chu J, Cheng H, Li H. De novo reconstruction of satellite repeat units from sequence data. Genome Research 2023, 33: 1994-2001. PMID: 37918962, PMCID: PMC10760446, DOI: 10.1101/gr.278005.123. Peer-Reviewed Original Research
Concepts: Satellite repeat unit; Sequence data; Satellite repeats; Long tandem repeated sequences; Real sequencing data; Satellite DNA evolution; Tandem repeat sequences; De novo reconstruction; Repeat units; Genomic content; Genome sequence; Satellite DNA; DNA evolution; Model organisms; Genome; Complete assembly; Sequence; Repeats; Centromere; Assembly; DNA; Species; Annotation
Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome
Lee H, Greer S, Pavlichin D, Zhou B, Urban A, Weissman T, Consortium H, Liao W, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas J, Monlong J, Abel H, Buonaiuto S, Chang X, Cheng H, Chu J, Colonna V, Eizenga J, Feng X, Fischer C, Fulton R, Garg S, Groza C, Guarracino A, Harvey W, Heumos S, Howe K, Jain M, Lu T, Markello C, Martin F, Mitchell M, Munson K, Mwaniki M, Novak A, Olsen H, Pesout T, Porubsky D, Prins P, Sibbesen J, Tomlinson C, Villani F, Vollger M, Antonacci-Fulton L, Baid G, Baker C, Belyaeva A, Billis K, Carroll A, Chang P, Cody S, Cook D, Cornejo O, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld A, Formenti G, Frankish A, Gao Y, Giron C, Green R, Haggerty L, Hoekzema K, Hourlier T, Ji H, Kolesnikov A, Korbel J, Kordosky J, Lee H, Lewis A, Magalhães H, Marco-Sola S, Marijon P, McDaniel J, Mountcastle J, Nattestad M, Olson N, Puiu D, Regier A, Rhie A, Sacco S, Sanders A, Schneider V, Schultz B, Shafin K, Sirén J, Smith M, Sofia H, Tayoun A, Thibaud-Nissen F, Tricomi F, Wagner J, Wood J, Zimin A, Popejoy A, Bourque G, Chaisson M, Flicek P, Phillippy A, Zook J, Eichler E, Haussler D, Jarvis E, Miga K, Wang T, Garrison E, Marschall T, Hall I, Li H, Paten B, Ji H. Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome. Cell Reports Methods 2023, 3: 100543. PMID: 37671027, PMCID: PMC10475782, DOI: 10.1016/j.crmeth.2023.100543. Peer-Reviewed Original Research
A draft human pangenome reference
Liao W, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas J, Monlong J, Abel H, Buonaiuto S, Chang X, Cheng H, Chu J, Colonna V, Eizenga J, Feng X, Fischer C, Fulton R, Garg S, Groza C, Guarracino A, Harvey W, Heumos S, Howe K, Jain M, Lu T, Markello C, Martin F, Mitchell M, Munson K, Mwaniki M, Novak A, Olsen H, Pesout T, Porubsky D, Prins P, Sibbesen J, Sirén J, Tomlinson C, Villani F, Vollger M, Antonacci-Fulton L, Baid G, Baker C, Belyaeva A, Billis K, Carroll A, Chang P, Cody S, Cook D, Cook-Deegan R, Cornejo O, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld A, Formenti G, Frankish A, Gao Y, Garrison N, Giron C, Green R, Haggerty L, Hoekzema K, Hourlier T, Ji H, Kenny E, Koenig B, Kolesnikov A, Korbel J, Kordosky J, Koren S, Lee H, Lewis A, Magalhães H, Marco-Sola S, Marijon P, McCartney A, McDaniel J, Mountcastle J, Nattestad M, Nurk S, Olson N, Popejoy A, Puiu D, Rautiainen M, Regier A, Rhie A, Sacco S, Sanders A, Schneider V, Schultz B, Shafin K, Smith M, Sofia H, Abou Tayoun A, Thibaud-Nissen F, Tricomi F, Wagner J, Walenz B, Wood J, Zimin A, Bourque G, Chaisson M, Flicek P, Phillippy A, Zook J, Eichler E, Haussler D, Wang T, Jarvis E, Miga K, Garrison E, Marschall T, Hall I, Li H, Paten B. A draft human pangenome reference. Nature 2023, 617: 312-324. PMID: 37165242, PMCID: PMC10172123, DOI: 10.1038/s41586-023-05896-x. Peer-Reviewed Original Research
Gaps and complex structurally variant loci in phased genome assemblies
Porubsky D, Vollger M, Harvey W, Rozanski A, Ebert P, Hickey G, Hasenfeld P, Sanders A, Stober C, Consortium H, Korbel J, Paten B, Marschall T, Eichler E, Abel H, Antonacci-Fulton L, Asri M, Baid G, Baker C, Belyaeva A, Billis K, Bourque G, Buonaiuto S, Carroll A, Chaisson M, Chang P, Chang X, Cheng H, Chu J, Cody S, Colonna V, Cook D, Cook-Deegan R, Cornejo O, Diekhans M, Doerr D, Ebert P, Ebler J, Eichler E, Eizenga J, Fairley S, Fedrigo O, Felsenfeld A, Feng X, Fischer C, Flicek P, Formenti G, Frankish A, Fulton R, Gao Y, Garg S, Garrison E, Garrison N, Giron C, Green R, Groza C, Guarracino A, Haggerty L, Hall I, Harvey W, Haukness M, Haussler D, Heumos S, Hickey G, Hoekzema K, Hourlier T, Howe K, Jain M, Jarvis E, Ji H, Kenny E, Koenig B, Kolesnikov A, Korbel J, Kordosky J, Koren S, Lee H, Lewis A, Li H, Liao W, Lu S, Lu T, Lucas J, Magalhães H, Marco-Sola S, Marijon P, Markello C, Marschall T, Martin F, McCartney A, McDaniel J, Miga K, Mitchell M, Monlong J, Mountcastle J, Munson K, Mwaniki M, Nattestad M, Novak A, Nurk S, Olsen H, Olson N, Paten B, Pesout T, Phillippy A, Popejoy A, Porubsky D, Prins P, Puiu D, Rautiainen M, Regier A, Rhie A, Sacco S, Sanders A, Schneider V, Schultz B, Shafin K, Sibbesen J, Sirén J, Smith M, Sofia H, Tayoun A, Thibaud-Nissen F, Tomlinson C, Tricomi F, Villani F, Vollger M, Wagner J, Walenz B, Wang T, Wood J, Zimin A, Zook J. Gaps and complex structurally variant loci in phased genome assemblies. Genome Research 2023, 33: 496-510. PMID: 37164484, PMCID: PMC10234299, DOI: 10.1101/gr.277334.122. Peer-Reviewed Original Research
Concepts: Protein-coding genes; Genome assembly; Mbp of DNA; Linked-read data; Large segmental duplications; Strand-seq; Diversity panel; Inversion polymorphism; Haploid genome; Segmental duplications; Euchromatic DNA; More haplotypes; Identical repeats; Haploid assemblies; Variant loci; DNA; Haplotypes; Genes; Frequent expansion; Assembly gaps; Important target; Assembly; Human species; Human samples; MBP
2022
Semi-automated assembly of high-quality diploid human reference genomes
Jarvis E, Formenti G, Rhie A, Guarracino A, Yang C, Wood J, Tracey A, Thibaud-Nissen F, Vollger M, Porubsky D, Cheng H, Asri M, Logsdon G, Carnevali P, Chaisson M, Chin C, Cody S, Collins J, Ebert P, Escalona M, Fedrigo O, Fulton R, Fulton L, Garg S, Gerton J, Ghurye J, Granat A, Green R, Harvey W, Hasenfeld P, Hastie A, Haukness M, Jaeger E, Jain M, Kirsche M, Kolmogorov M, Korbel J, Koren S, Korlach J, Lee J, Li D, Lindsay T, Lucas J, Luo F, Marschall T, Mitchell M, McDaniel J, Nie F, Olsen H, Olson N, Pesout T, Potapova T, Puiu D, Regier A, Ruan J, Salzberg S, Sanders A, Schatz M, Schmitt A, Schneider V, Selvaraj S, Shafin K, Shumate A, Stitziel N, Stober C, Torrance J, Wagner J, Wang J, Wenger A, Xiao C, Zimin A, Zhang G, Wang T, Li H, Garrison E, Haussler D, Hall I, Zook J, Eichler E, Phillippy A, Paten B, Howe K, Miga K. Semi-automated assembly of high-quality diploid human reference genomes. Nature 2022, 611: 519-531. PMID: 36261518, PMCID: PMC9668749, DOI: 10.1038/s41586-022-05325-5. Peer-Reviewed Original Research
Concepts: Diploid genome assembly; Genome assembly; Protein-coding genes; Global genetic variation; Current human reference genome; Diploid human genome; High-quality assembly; Accurate long reads; Non-synonymous amino acid changes; Human reference genome; Amino acid changes; Most chromosomes; Reference assembly; Reference genome; Human genome; Centromeric regions; Genetic variation; High diversity; Genome sequencing; Long reads; Single nucleotide; Genome; Acid changes; Manual curation; Biological genomes
The complete sequence of a human genome
Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, Hoyt SJ, Diekhans M, Logsdon GA, Alonge M, Antonarakis SE, Borchers M, Bouffard GG, Brooks SY, Caldas GV, Chen NC, Cheng H, Chin CS, Chow W, de Lima LG, Dishuck PC, Durbin R, Dvorkina T, Fiddes IT, Formenti G, Fulton RS, Fungtammasan A, Garrison E, Grady PGS, Graves-Lindsay TA, Hall IM, Hansen NF, Hartley GA, Haukness M, Howe K, Hunkapiller MW, Jain C, Jain M, Jarvis ED, Kerpedjiev P, Kirsche M, Kolmogorov M, Korlach J, Kremitzki M, Li H, Maduro VV, Marschall T, McCartney AM, McDaniel J, Miller DE, Mullikin JC, Myers EW, Olson ND, Paten B, Peluso P, Pevzner PA, Porubsky D, Potapova T, Rogaev EI, Rosenfeld JA, Salzberg SL, Schneider VA, Sedlazeck FJ, Shafin K, Shew CJ, Shumate A, Sims Y, Smit AFA, Soto DC, Sović I, Storer JM, Streets A, Sullivan BA, Thibaud-Nissen F, Torrance J, Wagner J, Walenz BP, Wenger A, Wood JMD, Xiao C, Yan SM, Young AC, Zarate S, Surti U, McCoy RC, Dennis MY, Alexandrov IA, Gerton JL, O’Neill R, Timp W, Zook JM, Schatz MC, Eichler EE, Miga KH, Phillippy AM. The complete sequence of a human genome. Science 2022, 376: 44-53. PMID: 35357919, PMCID: PMC9186530, DOI: 10.1126/science.abj6987. Peer-Reviewed Original Research
Concepts: Human genome; Recent segmental duplications; Human reference genome; Protein coding; Segmental duplications; Gapless assembly; Heterochromatic regions; Reference genome; Gene prediction; Satellite arrays; Complete sequence; Genome; Acrocentric chromosomes; Pair sequence; Base pairs; Short arm; Functional studies; Chromosomes; Sequence; Complex region; Telomeres; Duplication; Region; Assembly; Consortium
Haplotype-resolved assembly of diploid genomes without parental data
Cheng H, Jarvis E, Fedrigo O, Koepfli K, Urban L, Gemmell N, Li H. Haplotype-resolved assembly of diploid genomes without parental data. Nature Biotechnology 2022, 40: 1332-1335. PMID: 35332338, PMCID: PMC9464699, DOI: 10.1038/s41587-022-01261-x. Peer-Reviewed Original Research
Curated variation benchmarks for challenging medically relevant autosomal genes
Wagner J, Olson N, Harris L, McDaniel J, Cheng H, Fungtammasan A, Hwang Y, Gupta R, Wenger A, Rowell W, Khan Z, Farek J, Zhu Y, Pisupati A, Mahmoud M, Xiao C, Yoo B, Sahraeian S, Miller D, Jáspez D, Lorenzo-Salazar J, Muñoz-Barrera A, Rubio-Rodríguez L, Flores C, Narzisi G, Evani U, Clarke W, Lee J, Mason C, Lincoln S, Miga K, Ebbert M, Shumate A, Li H, Chin C, Zook J, Sedlazeck F. Curated variation benchmarks for challenging medically relevant autosomal genes. Nature Biotechnology 2022, 40: 672-680. PMID: 35132260, PMCID: PMC9117392, DOI: 10.1038/s41587-021-01158-1. Peer-Reviewed Original Research
Concepts: Whole-genome assembly; Relevant genes; Autosomal genes; Long-read technologies; Single-nucleotide variations; Variant recall; Bottle Consortium; Whole genome; Single-nucleotide; Polymorphic complex; False duplications; Genes; GRCh38; GRCh37; Genome; Structural variations; Repetitive nature; Duplication; Assembly; Deletion; CRYAA; Variants; Clinical setting; CBS; Complex
2021
Fast alignment and preprocessing of chromatin profiles with Chromap
Zhang H, Song L, Wang X, Cheng H, Wang C, Meyer C, Liu T, Tang M, Aluru S, Yue F, Liu X, Li H. Fast alignment and preprocessing of chromatin profiles with Chromap. Nature Communications 2021, 12: 6566. PMID: 34772935, PMCID: PMC8589834, DOI: 10.1038/s41467-021-26865-w. Peer-Reviewed Original Research
2020
Chromosome-scale, haplotype-resolved assembly of human genomes
Garg S, Fungtammasan A, Carroll A, Chou M, Schmitt A, Zhou X, Mac S, Peluso P, Hatas E, Ghurye J, Maguire J, Mahmoud M, Cheng H, Heller D, Zook J, Moemke T, Marschall T, Sedlazeck F, Aach J, Chin C, Church G, Li H. Chromosome-scale, haplotype-resolved assembly of human genomes. Nature Biotechnology 2020, 39: 309-312. PMID: 33288905, PMCID: PMC7954703, DOI: 10.1038/s41587-020-0711-0. Peer-Reviewed Original Research
Concepts: Haplotype-resolved assemblies; Human genome; Structural variants; Assembly of human genomes; Discovery of structural variants; Chromosome-scale phasing; Complex genetic variation; Killer cell immunoglobulin-like receptors; Chromosome-scale; Diploid assembly; Haplotype-resolved; Contig length; Genome assembly; Heterozygous sites; Transposon insertion; Haplotype variation; Genetic variation; Pedigree information; Genome; Phase assembly; Precision medicine; Human leukocyte antigen; Immunoglobulin-like receptors; Assembly; Important regions
2017
FMtree: a fast locating algorithm of FM-indexes for genomic data
Cheng H, Wu M, Xu Y. FMtree: a fast locating algorithm of FM-indexes for genomic data. Bioinformatics 2017, 34: 416-424. PMID: 28968761, DOI: 10.1093/bioinformatics/btx596. Peer-Reviewed Original Research
Concepts: Full-text index; Genomic data; State-of-the-art algorithms; Multiway tree; Location algorithm; Location operations; State-of-the-art; FM-index; Tree-based algorithms; Position of patterns; Memory-efficient; Long text; Data locality; Supplementary data; Occurrence position; Suffix tree; Suffix array; Short patterns; Algorithm; Bioinformatics; Experimental results; Trees; Text; Operation; Task
2015
BitMapper: an efficient all-mapper based on bit-vector computing
Cheng H, Jiang H, Yang J, Xu Y, Shang Y. BitMapper: an efficient all-mapper based on bit-vector computing. BMC Bioinformatics 2015, 16: 192. PMID: 26063651, PMCID: PMC4462005, DOI: 10.1186/s12859-015-0626-9. Peer-Reviewed Original Research
Concepts: Next-generation sequencing; Mapping next-generation sequencing; State-of-the-art all-mappers; State-of-the-art; Reference genome; Raw reads; Bit-vector algorithm; Map location; Bit-vector; GPL license; Edit distance; Verification time; Genome; Running time; Data sets; Computational challenges; Experimental results; Indels; Http://home; Verification; Sequence; Algorithm; Multiple locations