Featured Publications
To weight or not to weight? The effect of selection bias in 3 large electronic health record-linked biobanks and recommendations for practice
Salvatore M, Kundu R, Shi X, Friese C, Lee S, Fritsche L, Mondul A, Hanauer D, Pearce C, Mukherjee B. To weight or not to weight? The effect of selection bias in 3 large electronic health record-linked biobanks and recommendations for practice. Journal Of The American Medical Informatics Association 2024, 31: 1479-1492. PMID: 38742457, PMCID: PMC11187425, DOI: 10.1093/jamia/ocae098.
Peer-Reviewed Original Research
Concepts: EHR-linked biobanks, National Health Interview Survey data, Health Interview Survey data, Phenome-wide association study, Michigan Genomics Initiative, Electronic health record-linked biobank, Target population, Interview Survey data, Colorectal cancer, US adult population, Selection bias, UK Biobank, Association estimates, Biobank data, Recruitment strategies, Effect of selection bias, ICD codes, Log odds ratio, UKB, Selection weights, Effect size, Association studies, Adult population, Biobank, Impact prevalence
Statistical Inference for Association Studies Using Electronic Health Records: Handling Both Selection Bias and Outcome Misclassification
Beesley L, Mukherjee B. Statistical Inference for Association Studies Using Electronic Health Records: Handling Both Selection Bias and Outcome Misclassification. Biometrics 2020, 78: 214-226. PMID: 33179768, DOI: 10.1111/biom.13400.
Peer-Reviewed Original Research
Concepts: Electronic health records, Health records, Electronic health record data analysis, Electronic health record settings, Selection bias, Michigan Genomics Initiative, Association studies, EHR-linked, Health research, Inverse probability weighting method, Study sample, Effect estimates, Probability weighting method, Lack of representativeness, Type I error, Survey sampling literature, Standard error estimates, Gold standard labels, Disease status, Error estimates, Statistical inference, Misclassification, Inference strategy, Sampling literature, Standard labels
Set‐based tests for genetic association in longitudinal studies
He Z, Zhang M, Lee S, Smith J, Guo X, Palmas W, Kardia S, Diez Roux A, Mukherjee B. Set‐based tests for genetic association in longitudinal studies. Biometrics 2015, 71: 606-615. PMID: 25854837, PMCID: PMC4601568, DOI: 10.1111/biom.12310.
Peer-Reviewed Original Research
Concepts: Multi-Ethnic Study of Atherosclerosis, Genome-wide association studies, Joint effect of multiple variants, Linkage disequilibrium, Association studies, Effects of multiple variants, Markers of chronic disease, Genetic variants, Set-based test, Gene-based tests, Longitudinal outcomes, Multi-Ethnic Study, Genetic association studies, Study of Atherosclerosis, Chronic diseases, Phenotypic variation, Genetic association, Observational study, Longitudinal analysis, Within-subject correlation, Multiple variants, Score type tests, Joint test, Joint effects, Marker tests
2024
Improving prediction models of amyotrophic lateral sclerosis (ALS) using polygenic, pre-existing conditions, and survey-based risk scores in the UK Biobank
Jin W, Boss J, Bakulski K, Goutman S, Feldman E, Fritsche L, Mukherjee B. Improving prediction models of amyotrophic lateral sclerosis (ALS) using polygenic, pre-existing conditions, and survey-based risk scores in the UK Biobank. Journal Of Neurology 2024, 271: 6923-6934. PMID: 39249108, DOI: 10.1007/s00415-024-12644-2.
Peer-Reviewed Original Research
Concepts: Polygenic risk scores, Risk score, Pre-existing conditions, Phenome-wide association study, Controls of European descent, Phenotype risk score, UK Biobank data, Amyotrophic lateral sclerosis risk, Risk score distribution, Increased ALS risk, Influence of environmental exposures, Exposure-related factors, Combined risk score, UK Biobank, Amyotrophic lateral sclerosis, Baseline demographic covariates, Biobank data, PRS-CS, ALS risk, Amyotrophic lateral sclerosis diagnosis, Diagnosis 1, Demographic covariates, Association studies, European descent, MethodsUtilizing data
Multiple metal exposures associate with higher amyotrophic lateral sclerosis risk and mortality independent of genetic risk and correlate to self-reported exposures: a case-control study
Jang D, Dou J, Koubek E, Teener S, Zhou L, Bakulski K, Mukherjee B, Batterman S, Feldman E, Goutman S. Multiple metal exposures associate with higher amyotrophic lateral sclerosis risk and mortality independent of genetic risk and correlate to self-reported exposures: a case-control study. Journal Of Neurology Neurosurgery & Psychiatry 2024, jnnp-2024-333978. PMID: 39107037, DOI: 10.1136/jnnp-2024-333978.
Peer-Reviewed Original Research
Concepts: Amyotrophic lateral sclerosis risk, Environmental risk score, Associated with ALS risk, ALS risk, Genetic risk, Risk score, Polygenic risk scores, Self-reported exposure, Genome-wide association studies, Study investigated associations, Case-control study, Single-nucleotide polymorphisms, Association studies, Exposure mixtures, Control participants, Exposure sources, Risk, Participants, Amyotrophic lateral sclerosis, Survival models, Scores, Association, Environmental factors, Urine metals, Urine samples
2023
Uncovering associations between pre-existing conditions and COVID-19 Severity: A polygenic risk score approach across three large biobanks
Fritsche L, Nam K, Du J, Kundu R, Salvatore M, Shi X, Lee S, Burgess S, Mukherjee B. Uncovering associations between pre-existing conditions and COVID-19 Severity: A polygenic risk score approach across three large biobanks. PLOS Genetics 2023, 19: e1010907. PMID: 38113267, PMCID: PMC10763941, DOI: 10.1371/journal.pgen.1010907.
Peer-Reviewed Original Research
Concepts: Polygenic risk scores, Michigan Genomics Initiative, UK Biobank, Pre-existing conditions, Phenome-wide association study, Association studies, Cohort-specific analyses, Polygenic risk score approach, UK Biobank cohort, Meta-analysis, Increased risk of hospitalization, Genome-wide association studies, Body mass index, Risk of hospitalization, Identified novel associations, Risk score approach, COVID-19 outcome data, COVID-19 hospitalization, COVID-19, Mass index, Risk score, Biobank, Cardiovascular conditions, COVID-19 severity, Increased risk
2022
A Case-Crossover Phenome-wide association study (PheWAS) for understanding Post-COVID-19 diagnosis patterns
Haupert S, Shi X, Chen C, Fritsche L, Mukherjee B. A Case-Crossover Phenome-wide association study (PheWAS) for understanding Post-COVID-19 diagnosis patterns. Journal Of Biomedical Informatics 2022, 136: 104237. PMID: 36283580, PMCID: PMC9595430, DOI: 10.1016/j.jbi.2022.104237.
Peer-Reviewed Original Research
Concepts: Phenome-wide association study, Post-COVID-19 condition, COVID-19 survivors, Cohort of COVID-19 survivors, Association studies, Mental health disorders, Conditional logistic regression, Within-person confounding, SARS-CoV-2 infection, Robust study designs, Proportion of COVID-19 survivors, Post-COVID-19, Healthcare needs, Mental health, SARS-CoV-2, Circulatory diseases, Phenotype codes, Health disorders, SARS-CoV-2 positivity, Study design, SARS-CoV-2 positive patients, Logistic regression, PheWAS, Post-COVID-19 infection, COVID-19
The construction of cross-population polygenic risk scores using transfer learning
Zhao Z, Fritsche L, Smith J, Mukherjee B, Lee S. The construction of cross-population polygenic risk scores using transfer learning. American Journal Of Human Genetics 2022, 109: 1998-2008. PMID: 36240765, PMCID: PMC9674947, DOI: 10.1016/j.ajhg.2022.09.010.
Peer-Reviewed Original Research
Concepts: Genome-wide association studies, Polygenic risk scores, Ancestry groups, Transferability of PRS, PRS-CS, Polygenic risk score methods, European ancestry cohorts, Individuals of African ancestry, Individuals of South Asian ancestry, Non-European ancestry groups, Non-European ancestry, South Asian ancestry, Association studies, Dichotomous traits, South Asian sample, European ancestry, Genetic research, PRS model, Ancestry, Asian ancestry, African ancestry, African samples, UK Biobank, Risk score, Asian samples
Prediction of telomere length and telomere attrition using a genetic risk score: The multi-ethnic study of atherosclerosis (MESA)
Castro-Diehl C, Smith J, Zhao W, Wang X, Mukherjee B, Seeman T, Needham B. Prediction of telomere length and telomere attrition using a genetic risk score: The multi-ethnic study of atherosclerosis (MESA). Frontiers In Aging 2022, 3: 1021051. PMID: 36304436, PMCID: PMC9592760, DOI: 10.3389/fragi.2022.1021051.
Peer-Reviewed Original Research
Concepts: Multi-Ethnic Study of Atherosclerosis, Genetic risk score, Multi-Ethnic Study, Genome-wide association studies, Study of Atherosclerosis, Associated with TL, European ancestry genome-wide association studies, European ancestry, Risk score, TL-associated genetic variants, Shorter TL, European ancestry populations, Predictive of telomere length, Hispanic participants, Race/ethnic groups, Linear mixed effects models, Shorter telomere length, Mixed effects models, African Americans, Telomere attrition, Exam 1, Association studies, Relative TL, Telomere length, T/S ratio
ExPRSweb: An online repository with polygenic risk scores for common health-related exposures
Ma Y, Patil S, Zhou X, Mukherjee B, Fritsche L. ExPRSweb: An online repository with polygenic risk scores for common health-related exposures. American Journal Of Human Genetics 2022, 109: 1742-1760. PMID: 36152628, PMCID: PMC9606385, DOI: 10.1016/j.ajhg.2022.09.001.
Peer-Reviewed Original Research
Concepts: Polygenic risk scores, Chronic conditions, Phenome-wide association study, Michigan Genomics Initiative, Risk score, Association studies, Health-related exposures, Genome-wide association studies, UK Biobank, Genetic risk factors, PRS methods, Follow-up study, Risk factors, Complex traits, Genome Initiative, Genetic modifiers, Biobank, Influence of exposure, Environmental variables, Scores, Lipid levels, ExpR, Lifestyle, Smoking, Online repository
Incorporating family disease history and controlling case–control imbalance for population-based genetic association studies
Zhuang Y, Wolford B, Nam K, Bi W, Zhou W, Willer C, Mukherjee B, Lee S. Incorporating family disease history and controlling case–control imbalance for population-based genetic association studies. Bioinformatics 2022, 38: 4337-4343. PMID: 35876838, PMCID: PMC9477535, DOI: 10.1093/bioinformatics/btac459.
Peer-Reviewed Original Research
Concepts: Empirical saddlepoint approximation, Family disease history, Case-control imbalance, Saddlepoint approximation, Genome-wide association analysis, Population-based genetic association studies, Genetic association tests, Variant-phenotype associations, Disease history, Genetic association studies, Low detection power, Type I error inflation, Correlation of phenotypes, White British sample, Supplementary data, Association studies, Population-based biobanks, Increased phenotypic correlations, Korean Genome, Simulation study, Phenotype distribution, Phenotype, Association Test, Bioinformatics, Phenotypic correlations
Polygenic Liability to Depression Is Associated With Multiple Medical Conditions in the Electronic Health Record: Phenome-wide Association Study of 46,782 Individuals
Fang Y, Fritsche L, Mukherjee B, Sen S, Richmond-Rakerd L. Polygenic Liability to Depression Is Associated With Multiple Medical Conditions in the Electronic Health Record: Phenome-wide Association Study of 46,782 Individuals. Biological Psychiatry 2022, 92: 923-931. PMID: 35965108, PMCID: PMC10712651, DOI: 10.1016/j.biopsych.2022.06.004.
Peer-Reviewed Original Research
Concepts: Phenome-wide association study, Polygenic risk scores, MDD PRS, Health records, Risk score, Association studies, Genome-wide polygenic risk score, Associated with multiple medical conditions, Measures of genetic risk, Michigan Genomics Initiative, Psychiatric traits, Electronic health records, European ancestry participants, Major depressive disorder, Associated with tobacco use disorder, Tests of association, Multiple medical conditions, Genitourinary conditions, Tobacco use disorder, Disease-associated disability, Molecular genetic tools, Molecular genetic discoveries, Psychiatric disease categories, Health outcomes, Substance-related disorders
2021
On cross-ancestry cancer polygenic risk scores
Fritsche L, Ma Y, Zhang D, Salvatore M, Lee S, Zhou X, Mukherjee B. On cross-ancestry cancer polygenic risk scores. PLOS Genetics 2021, 17: e1009670. PMID: 34529658, PMCID: PMC8445431, DOI: 10.1371/journal.pgen.1009670.
Peer-Reviewed Original Research
Concepts: Polygenic risk scores, Genome-wide association studies, Prostate cancer polygenic risk scores, Polygenic risk score distribution, Recruitment of diverse participants, Ancestry groups, Polygenic risk score methods, Risk score, Non-genetic risk factors, Electronic health records, Breast cancer cases, Health records, UK Biobank, GWAS efforts, Disease risk assessment, Cancer cases, Association studies, Genetic data, European ancestry, Personalized risk stratification, Summary statistics, Risk factors, Ancestry, Diverse participants, Field of cancer
Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes
Bi W, Zhou W, Dey R, Mukherjee B, Sampson J, Lee S. Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes. American Journal Of Human Genetics 2021, 108: 825-839. PMID: 33836139, PMCID: PMC8206161, DOI: 10.1016/j.ajhg.2021.03.019.
Peer-Reviewed Original Research
Concepts: Ordinal categorical phenotypes, Genome-wide association studies, Categorical phenotypes, Genome-wide significant variants, Rare variants, Phenotype distribution, Controlled type I error rates, Type I error rate, Mixed model approach, Array genotyping, Association studies, Common variants, Quantitative traits, Significant variants, Logistic mixed models, Lack of analysis tools, UK Biobank, Linear mixed model approach, Phenotype, Association Test, Variants, Mixed models, Significance level, MAF, Traits
A Phenome-Wide Association Study (PheWAS) of COVID-19 Outcomes by Race Using the Electronic Health Records Data in Michigan Medicine
Salvatore M, Gu T, Mack J, Sankar S, Patil S, Valley T, Singh K, Nallamothu B, Kheterpal S, Lisabeth L, Fritsche L, Mukherjee B. A Phenome-Wide Association Study (PheWAS) of COVID-19 Outcomes by Race Using the Electronic Health Records Data in Michigan Medicine. Journal Of Clinical Medicine 2021, 10: 1351. PMID: 33805886, PMCID: PMC8037108, DOI: 10.3390/jcm10071351.
Peer-Reviewed Original Research
Concepts: Phenome-wide association study, COVID-19 outcomes, Intensive care unit, Association studies, Non-Hispanic blacks, Non-Hispanic whites, Academic medical center, Associated with hospitalization, Healthcare delivery, Associated with mortality, Medicine background, Pre-existing conditions, Medical phenome, Disease prevention, Vulnerable populations, Pulmonary heart disease, Targeted screening, Mental disorders, COVID-19, Associated with intensive care unit, Medical Center, Record Data, Care unit, Genitourinary conditions, Heart disease
2020
Cancer PRSweb: An Online Repository with Polygenic Risk Scores for Major Cancer Traits and Their Evaluation in Two Independent Biobanks
Fritsche L, Patil S, Beesley L, VandeHaar P, Salvatore M, Ma Y, Peng R, Taliun D, Zhou X, Mukherjee B. Cancer PRSweb: An Online Repository with Polygenic Risk Scores for Major Cancer Traits and Their Evaluation in Two Independent Biobanks. American Journal Of Human Genetics 2020, 107: 815-836. PMID: 32991828, PMCID: PMC7675001, DOI: 10.1016/j.ajhg.2020.08.025.
Peer-Reviewed Original Research
Concepts: Polygenic risk scores, Genome-wide association studies, Michigan Genomics Initiative, UK Biobank, Population-based UK Biobank, Polygenic risk score construction, Published genome-wide association studies, Longitudinal biorepository effort, Association studies, Predictive polygenic risk scores, Risk score, NHGRI-EBI GWAS Catalog, Cancer traits, Independent biobank, Michigan Medicine, GWAS Catalog, Genome Initiative, Biobank, Scores, Traits, Cancer research, Online repository, Michigan, Medicine, Evaluation
An efficient and computationally robust statistical method for analyzing case-control mother–offspring pair genetic association studies
Zhang H, Mukherjee B, Arthur V, Hu G, Hochner H, Chen J. An efficient and computationally robust statistical method for analyzing case-control mother–offspring pair genetic association studies. The Annals Of Applied Statistics 2020, 14: 560-584. DOI: 10.1214/19-aoas1298.
Peer-Reviewed Original Research
Concepts: Environmental risk factors, Risk factors, Maternal environmental risk factors, Offspring genetic effects, Perinatal environmental risk factors, Genetic association studies, Finite sample performance, Pregnancy health, Genetic risk factors, Assessment of pre-, Extensive simulation study, Gestational diabetes mellitus, Increased statistical efficiency, Logistic regression, Association studies, Maternal genotype, Sample performance, Mendelian transmission, Profile likelihood, Regression models, Offspring genotypes, Early-life, Inference procedures, Lagrange multiplier method, Likelihood method
An analytic framework for exploring sampling and observation process biases in genome and phenome‐wide association studies using electronic health records
Beesley L, Fritsche L, Mukherjee B. An analytic framework for exploring sampling and observation process biases in genome and phenome‐wide association studies using electronic health records. Statistics In Medicine 2020, 39: 1965-1979. PMID: 32198773, DOI: 10.1002/sim.8524.
Peer-Reviewed Original Research
Concepts: Electronic health records, Health records, Association studies, Observational health care databases, Electronic health record data, Longitudinal biorepository effort, Phenome-wide association study, Michigan Genomics Initiative, Health record data, Health care databases, Disease-gene association studies, Michigan Health System, Care database, Health system, Phenotype misclassification, Study bias, Record data, Nonprobability sampling, Association analysis, Data sources, Genome Initiative, Misclassification, Analysis approach, Records, Sensitivity analysis
2019
A Fast and Accurate Method for Genome-wide Scale Phenome-wide G × E Analysis and Its Application to UK Biobank
Bi W, Zhao Z, Dey R, Fritsche L, Mukherjee B, Lee S. A Fast and Accurate Method for Genome-wide Scale Phenome-wide G × E Analysis and Its Application to UK Biobank. American Journal Of Human Genetics 2019, 105: 1182-1192. PMID: 31735295, PMCID: PMC6904814, DOI: 10.1016/j.ajhg.2019.10.008.
Peer-Reviewed Original Research
Concepts: Case-control ratio, Genome-wide significance level, Measures of environmental exposure, Genome-wide analysis, European ancestry samples, Genetic association studies, Saddlepoint approximation, Case-control imbalance, Analysis of phenotypes, Gene-environment interactions, Population-based biobanks, Controlled type I error rates, Association studies, G x E effects, UK Biobank, Type I error rate, Genetic variants, E analysis, SPAGE, Complex diseases, Environmental exposures, Test statistics, E study, Simulation study, Wald test
Exploring various polygenic risk scores for skin cancer in the phenomes of the Michigan genomics initiative and the UK Biobank with a visual catalog: PRSWeb
Fritsche L, Beesley L, VandeHaar P, Peng R, Salvatore M, Zawistowski M, Taliun S, Das S, LeFaive J, Kaleba E, Klumpner T, Moser S, Blanc V, Brummett C, Kheterpal S, Abecasis G, Gruber S, Mukherjee B. Exploring various polygenic risk scores for skin cancer in the phenomes of the Michigan genomics initiative and the UK Biobank with a visual catalog: PRSWeb. PLOS Genetics 2019, 15: e1008202. PMID: 31194742, PMCID: PMC6592565, DOI: 10.1371/journal.pgen.1008202.
Peer-Reviewed Original Research
Concepts: Michigan Genomics Initiative, Electronic health records, Polygenic risk scores, Skin cancer subtypes, PheWAS results, UK Biobank, Electronic health record data, Longitudinal biorepository effort, Phenome-wide association study, Risk score, Health record data, UK Biobank data, Prediction of disease risk, Publicly-available sources, Health records, Genetic architecture, Biobank data, Michigan Medicine, Record data, Secondary phenotypes, Disease risk, Visual catalog, Association studies, Genome Initiative, PheWAS