Featured Publications
A draft human pangenome reference
Liao W, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas J, Monlong J, Abel H, Buonaiuto S, Chang X, Cheng H, Chu J, Colonna V, Eizenga J, Feng X, Fischer C, Fulton R, Garg S, Groza C, Guarracino A, Harvey W, Heumos S, Howe K, Jain M, Lu T, Markello C, Martin F, Mitchell M, Munson K, Mwaniki M, Novak A, Olsen H, Pesout T, Porubsky D, Prins P, Sibbesen J, Sirén J, Tomlinson C, Villani F, Vollger M, Antonacci-Fulton L, Baid G, Baker C, Belyaeva A, Billis K, Carroll A, Chang P, Cody S, Cook D, Cook-Deegan R, Cornejo O, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld A, Formenti G, Frankish A, Gao Y, Garrison N, Giron C, Green R, Haggerty L, Hoekzema K, Hourlier T, Ji H, Kenny E, Koenig B, Kolesnikov A, Korbel J, Kordosky J, Koren S, Lee H, Lewis A, Magalhães H, Marco-Sola S, Marijon P, McCartney A, McDaniel J, Mountcastle J, Nattestad M, Nurk S, Olson N, Popejoy A, Puiu D, Rautiainen M, Regier A, Rhie A, Sacco S, Sanders A, Schneider V, Schultz B, Shafin K, Smith M, Sofia H, Abou Tayoun A, Thibaud-Nissen F, Tricomi F, Wagner J, Walenz B, Wood J, Zimin A, Bourque G, Chaisson M, Flicek P, Phillippy A, Zook J, Eichler E, Haussler D, Wang T, Jarvis E, Miga K, Garrison E, Marschall T, Hall I, Li H, Paten B. A draft human pangenome reference. Nature 2023, 617: 312-324. PMID: 37165242, PMCID: PMC10172123, DOI: 10.1038/s41586-023-05896-x.
Peer-Reviewed Original Research
The Human Pangenome Project: a global resource to map genomic diversity
Wang T, Antonacci-Fulton L, Howe K, Lawson HA, Lucas JK, Phillippy AM, Popejoy AB, Asri M, Carson C, Chaisson MJP, Chang X, Cook-Deegan R, Felsenfeld AL, Fulton RS, Garrison EP, Garrison N, Graves-Lindsay TA, Ji H, Kenny EE, Koenig BA, Li D, Marschall T, McMichael JF, Novak AM, Purushotham D, Schneider VA, Schultz BI, Smith MW, Sofia HJ, Weissman T, Flicek P, Li H, Miga KH, Paten B, Jarvis ED, Hall IM, Eichler EE, Haussler D. The Human Pangenome Project: a global resource to map genomic diversity. Nature 2022, 604: 437-446. PMID: 35444317, PMCID: PMC9402379, DOI: 10.1038/s41586-022-04601-8.
Peer-Reviewed Original Research
MeSH Keywords: Genome, Human; Genomics; Haplotypes; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA
Concepts: Human reference genome; Reference genome; Genomic diversity; Genomic variation; Human genomic variation; Global genomic diversity; Single nucleotide variants; Gene-disease associations; Diploid genome; Genetic resources; Genome; Genomic research; Future biomedical research; High-quality reference; Structural variants; Human genetics; Routine assembly; Common variants; Functional elements; Polymorphic region; Diversity; Biomedical research; Variants; Major update; Genetics
Mitochondrial genome copy number measured by DNA sequencing in human blood is strongly associated with metabolic traits via cell-type composition differences
Ganel L, Chen L, Christ R, Vangipurapu J, Young E, Das I, Kanchi K, Larson D, Regier A, Abel H, Kang CJ, Scott A, Havulinna A, Chiang CWK, Service S, Freimer N, Palotie A, Ripatti S, Kuusisto J, Boehnke M, Laakso M, Locke A, Stitziel NO, Hall IM. Mitochondrial genome copy number measured by DNA sequencing in human blood is strongly associated with metabolic traits via cell-type composition differences. Human Genomics 2021, 15: 34. PMID: 34099068, PMCID: PMC8185936, DOI: 10.1186/s40246-021-00335-2.
Peer-Reviewed Original Research
MeSH Keywords: Adult; Aged; Apoptosis Regulatory Proteins; Cell Lineage; DNA Copy Number Variations; DNA, Mitochondrial; Exome Sequencing; Female; Genetic Predisposition to Disease; Genome, Mitochondrial; Genome-Wide Association Study; GTP-Binding Proteins; Humans; Male; Membrane Proteins; Mendelian Randomization Analysis; Middle Aged; Phenotype; Polymorphism, Single Nucleotide; Proto-Oncogene Proteins c-myb; Sequence Analysis, DNA
Concepts: Cell type composition; Genome copy number; Blood-derived DNA; Mitochondrial genome copy number; Combination of genomes; Copy number; Bulk DNA sequencing; DNA sequencing; Polygenic risk scores; Number of mitochondria; Exome sequencing data; Related traits; Sequencing data; Metabolic traits; Traits; Common variants; Loci; Rare variants; Sequencing; DNA; Finnish individuals; Mendelian randomization framework; UK Biobank; MetS traits; Genome
Genomic Analysis in the Age of Human Genome Sequencing
Lappalainen T, Scott AJ, Brandt M, Hall IM. Genomic Analysis in the Age of Human Genome Sequencing. Cell 2019, 177: 70-84. PMID: 30901550, PMCID: PMC6532068, DOI: 10.1016/j.cell.2019.02.032.
Peer-Reviewed Original Research
MeSH Keywords: Biological Specimen Banks; Chromosome Mapping; Genetic Predisposition to Disease; Genetic Testing; Genetic Variation; Genome, Human; Genome-Wide Association Study; Genomics; High-Throughput Nucleotide Sequencing; Human Genome Project; Humans; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Whole Genome Sequencing
Concepts: Functional genomics approach; Allele frequency spectrum; Human genome sequencing; Gene mapping studies; Genome sequencing technologies; Rare human diseases; Whole-genome sequencing; Genomic approaches; Genetic variant discovery; Genome variation; Human genome; Genome analysis; Genomic analysis; Sequencing technologies; Genome sequencing; Variant discovery; Human diseases; Human genetics; Genome; Functional interpretation; Mapping studies; Functional effects; Sequencing; Germline variants; Genetics
The impact of structural variation on human gene expression
Chiang C, Scott AJ, Davis JR, Tsang EK, Li X, Kim Y, Hadzic T, Damani FN, Ganel L, Montgomery S, Battle A, Conrad D, Hall I. The impact of structural variation on human gene expression. Nature Genetics 2017, 49: 692-699. PMID: 28369037, PMCID: PMC5406250, DOI: 10.1038/ng.3834.
Peer-Reviewed Original Research
The Complete Genome Sequences, Unique Mutational Spectra, and Developmental Potency of Adult Neurons Revealed by Cloning
Hazen JL, Faust GG, Rodriguez AR, Ferguson WC, Shumilina S, Clark RA, Boland MJ, Martin G, Chubukov P, Tsunemoto RK, Torkamani A, Kupriyanov S, Hall IM, Baldwin KK. The Complete Genome Sequences, Unique Mutational Spectra, and Developmental Potency of Adult Neurons Revealed by Cloning. Neuron 2016, 89: 1223-1236. PMID: 26948891, PMCID: PMC4795965, DOI: 10.1016/j.neuron.2016.02.004.
Peer-Reviewed Original Research
MeSH Keywords: Age Factors; Animals; Animals, Newborn; Cadherin Related Proteins; Cadherins; Cell Division; Cloning, Molecular; DNA Transposable Elements; Embryo, Mammalian; Female; Humans; Ki-67 Antigen; Mice; Mice, Transgenic; Microsatellite Repeats; Mutation; Nerve Tissue Proteins; Neurons; Nuclear Transfer Techniques; Olfactory Bulb; Oocytes; Sequence Analysis, DNA
Concepts: Cell type diversification; Complete genome sequence; Mobile element insertions; Nuclear transfer method; Whole-genome sequencing; Neuronal genome; Gene-disrupting mutations; Neuronal mutations; Genome sequence; Unique mutational spectrum; Developmental potency; Comprehensive mutation detection; Element insertions; Genomic mutations; Recurrent rearrangements; Novel mechanism; Unique mutations; Mutations; Somatic mutations; Gene bias; Genome; Adult neurons; Mutational spectrum; Fertile mice; Mutation detection
Mosaic Copy Number Variation in Human Neurons
McConnell MJ, Lindberg MR, Brennand KJ, Piper JC, Voet T, Cowing-Zitron C, Shumilina S, Lasken RS, Vermeesch JR, Hall IM, Gage FH. Mosaic Copy Number Variation in Human Neurons. Science 2013, 342: 632-637. PMID: 24179226, PMCID: PMC3975283, DOI: 10.1126/science.1243472.
Peer-Reviewed Original Research
Concepts: Copy number variations; HiPSC-derived neurons; Single-cell genomic approaches; Number variations; DNA copy number variations; Single-cell sequencing; Human neurons; Large copy number variations; Stem cell lines; Neural progenitor cells; Novo copy-number variations; Pluripotent stem cell line; Aneuploid neurons; Genomic approaches; De novo copy-number variations; Subchromosomal copy number variations; Aberrant genomes; Frontal cortex neurons; Large deletions; Progenitor cells; Cell lines; Subset of neurons; Euploid neurons; Deletion; Multiple alterations
Genome Sequencing of Mouse Induced Pluripotent Stem Cells Reveals Retroelement Stability and Infrequent DNA Rearrangement during Reprogramming
Quinlan AR, Boland MJ, Leibowitz ML, Shumilina S, Pehrson SM, Baldwin KK, Hall IM. Genome Sequencing of Mouse Induced Pluripotent Stem Cells Reveals Retroelement Stability and Infrequent DNA Rearrangement during Reprogramming. Cell Stem Cell 2011, 9: 366-373. PMID: 21982236, PMCID: PMC3975295, DOI: 10.1016/j.stem.2011.07.018.
Peer-Reviewed Original Research
MeSH Keywords: Animals; Base Sequence; Cell Lineage; Cellular Reprogramming; Chimera; DNA Copy Number Variations; False Negative Reactions; Gene Rearrangement; Gene Silencing; Genome; Genomic Instability; Humans; Induced Pluripotent Stem Cells; Mice; Molecular Sequence Data; Mutagenesis, Insertional; Organ Specificity; Retroelements; Sequence Analysis, DNA
Concepts: Pluripotent stem cells; Classes of SVs; Paired-end DNA sequencing; Stem cells; Genomic structural variation; Mouse Induced Pluripotent Stem Cells; Structural variations; DNA copy number variations; Embryonic stem cells; Most iPSC lines; Mouse iPSC lines; IPSC lines; Induced pluripotent stem cells; Copy number variations; Genome stability; Gene-disrupting mutations; Recent microarray studies; DNA rearrangements; Genome sequencing; Spontaneous mutations; Microarray studies; Deleterious genetic mutations; Number variations; DNA sequencing; Complex rearrangements
2023
Gaps and complex structurally variant loci in phased genome assemblies
Porubsky D, Vollger M, Harvey W, Rozanski A, Ebert P, Hickey G, Hasenfeld P, Sanders A, Stober C, Consortium H, Korbel J, Paten B, Marschall T, Eichler E, Abel H, Antonacci-Fulton L, Asri M, Baid G, Baker C, Belyaeva A, Billis K, Bourque G, Buonaiuto S, Carroll A, Chaisson M, Chang P, Chang X, Cheng H, Chu J, Cody S, Colonna V, Cook D, Cook-Deegan R, Cornejo O, Diekhans M, Doerr D, Ebert P, Ebler J, Eichler E, Eizenga J, Fairley S, Fedrigo O, Felsenfeld A, Feng X, Fischer C, Flicek P, Formenti G, Frankish A, Fulton R, Gao Y, Garg S, Garrison E, Garrison N, Giron C, Green R, Groza C, Guarracino A, Haggerty L, Hall I, Harvey W, Haukness M, Haussler D, Heumos S, Hickey G, Hoekzema K, Hourlier T, Howe K, Jain M, Jarvis E, Ji H, Kenny E, Koenig B, Kolesnikov A, Korbel J, Kordosky J, Koren S, Lee H, Lewis A, Li H, Liao W, Lu S, Lu T, Lucas J, Magalhães H, Marco-Sola S, Marijon P, Markello C, Marschall T, Martin F, McCartney A, McDaniel J, Miga K, Mitchell M, Monlong J, Mountcastle J, Munson K, Mwaniki M, Nattestad M, Novak A, Nurk S, Olsen H, Olson N, Paten B, Pesout T, Phillippy A, Popejoy A, Porubsky D, Prins P, Puiu D, Rautiainen M, Regier A, Rhie A, Sacco S, Sanders A, Schneider V, Schultz B, Shafin K, Sibbesen J, Sirén J, Smith M, Sofia H, Tayoun A, Thibaud-Nissen F, Tomlinson C, Tricomi F, Villani F, Vollger M, Wagner J, Walenz B, Wang T, Wood J, Zimin A, Zook J. Gaps and complex structurally variant loci in phased genome assemblies. Genome Research 2023, 33: 496-510. 
PMID: 37164484, PMCID: PMC10234299, DOI: 10.1101/gr.277334.122.
Peer-Reviewed Original Research
MeSH Keywords: DNA, Satellite; Haplotypes; Humans; Polymorphism, Genetic; Segmental Duplications, Genomic; Sequence Analysis, DNA
Concepts: Protein-coding genes; Genome assembly; Mbp of DNA; Linked-read data; Large segmental duplications; Strand-seq; Diversity panel; Inversion polymorphism; Haploid genome; Segmental duplications; Euchromatic DNA; More haplotypes; Identical repeats; Haploid assemblies; Variant loci; DNA; Haplotypes; Genes; Frequent expansion; Assembly gaps; Important target; Assembly; Human species; Human samples; MBP
2022
Semi-automated assembly of high-quality diploid human reference genomes
Jarvis E, Formenti G, Rhie A, Guarracino A, Yang C, Wood J, Tracey A, Thibaud-Nissen F, Vollger M, Porubsky D, Cheng H, Asri M, Logsdon G, Carnevali P, Chaisson M, Chin C, Cody S, Collins J, Ebert P, Escalona M, Fedrigo O, Fulton R, Fulton L, Garg S, Gerton J, Ghurye J, Granat A, Green R, Harvey W, Hasenfeld P, Hastie A, Haukness M, Jaeger E, Jain M, Kirsche M, Kolmogorov M, Korbel J, Koren S, Korlach J, Lee J, Li D, Lindsay T, Lucas J, Luo F, Marschall T, Mitchell M, McDaniel J, Nie F, Olsen H, Olson N, Pesout T, Potapova T, Puiu D, Regier A, Ruan J, Salzberg S, Sanders A, Schatz M, Schmitt A, Schneider V, Selvaraj S, Shafin K, Shumate A, Stitziel N, Stober C, Torrance J, Wagner J, Wang J, Wenger A, Xiao C, Zimin A, Zhang G, Wang T, Li H, Garrison E, Haussler D, Hall I, Zook J, Eichler E, Phillippy A, Paten B, Howe K, Miga K. Semi-automated assembly of high-quality diploid human reference genomes. Nature 2022, 611: 519-531. PMID: 36261518, PMCID: PMC9668749, DOI: 10.1038/s41586-022-05325-5.
Peer-Reviewed Original Research
Concepts: Diploid genome assembly; Genome assembly; Protein-coding genes; Global genetic variation; Current human reference genome; Diploid human genome; High-quality assembly; Accurate long reads; Non-synonymous amino acid changes; Human reference genome; Amino acid changes; Most chromosomes; Reference assembly; Reference genome; Human genome; Centromeric regions; Genetic variation; High diversity; Genome sequencing; Long reads; Single nucleotide; Genome; Acid changes; Manual curation; Biological genomes
2014
SAMBLASTER: fast duplicate marking and structural variant read extraction
Faust GG, Hall IM. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics 2014, 30: 2503-2505. PMID: 24812344, PMCID: PMC4147885, DOI: 10.1093/bioinformatics/btu314.
Peer-Reviewed Original Research
2012
YAHA: fast and flexible long-read alignment with optimal breakpoint detection
Faust GG, Hall IM. YAHA: fast and flexible long-read alignment with optimal breakpoint detection. Bioinformatics 2012, 28: 2417-2424. PMID: 22829624, PMCID: PMC3463118, DOI: 10.1093/bioinformatics/bts456.
Peer-Reviewed Original Research
Concepts: Single best alignment; Linux system; Queries; Acyclic graph; BWA-SW; Alignment tools; Simple heuristics; Multiple mappings; Optimal set; Assembly algorithm; Structural variant detection; Complex SVs; Breakpoint detection; Possible alignments; Sample data; SV classes; Yaha; Less time; Better alignment; Variant detection; Aligners; Download; Heuristics; SSAHA2; Algorithm