Featured Publications
To weight or not to weight? The effect of selection bias in 3 large electronic health record-linked biobanks and recommendations for practice
Salvatore M, Kundu R, Shi X, Friese C, Lee S, Fritsche L, Mondul A, Hanauer D, Pearce C, Mukherjee B. To weight or not to weight? The effect of selection bias in 3 large electronic health record-linked biobanks and recommendations for practice. Journal Of The American Medical Informatics Association 2024, 31: 1479-1492. PMID: 38742457, PMCID: PMC11187425, DOI: 10.1093/jamia/ocae098.
Peer-Reviewed Original Research
Concepts: EHR-linked biobanks; National Health Interview Survey data; Health Interview Survey data; Phenome-wide association study; Michigan Genomics Initiative; Electronic health record-linked biobank; Target population; Interview Survey data; Colorectal cancer; US adult population; Selection bias; UK Biobank; Association estimates; Biobank data; Recruitment strategies; Effect of selection bias; ICD codes; Log odds ratio; UKB; Selection weights; Effect size; Association studies; Adult population; Biobank; Impact prevalence
Exploring the Big Data Paradox for various estimands using vaccination data from the global COVID-19 Trends and Impact Survey (CTIS)
Yang Y, Dempsey W, Han P, Deshmukh Y, Richardson S, Tom B, Mukherjee B. Exploring the Big Data Paradox for various estimands using vaccination data from the global COVID-19 Trends and Impact Survey (CTIS). Science Advances 2024, 10: eadj0266. PMID: 38820165, PMCID: PMC11314312, DOI: 10.1126/sciadv.adj0266.
Peer-Reviewed Original Research
Statistical Inference for Association Studies Using Electronic Health Records: Handling Both Selection Bias and Outcome Misclassification
Beesley L, Mukherjee B. Statistical Inference for Association Studies Using Electronic Health Records: Handling Both Selection Bias and Outcome Misclassification. Biometrics 2020, 78: 214-226. PMID: 33179768, DOI: 10.1111/biom.13400.
Peer-Reviewed Original Research
Concepts: Electronic health records; Health records; Electronic health record data analysis; Electronic health record settings; Selection bias; Michigan Genomics Initiative; Association studies; EHR-linked; Health research; Inverse probability weighting method; Study sample; Effect estimates; Probability weighting method; Lack of representativeness; Type I error; Survey sampling literature; Standard error estimates; Gold standard labels; Disease status; Error estimates; Statistical inference; Misclassification; Inference strategy; Sampling literature; Standard labels
Association of Polygenic Risk Scores for Multiple Cancers in a Phenome-wide Study: Results from The Michigan Genomics Initiative
Fritsche L, Gruber S, Wu Z, Schmidt E, Zawistowski M, Moser S, Blanc V, Brummett C, Kheterpal S, Abecasis G, Mukherjee B. Association of Polygenic Risk Scores for Multiple Cancers in a Phenome-wide Study: Results from The Michigan Genomics Initiative. American Journal Of Human Genetics 2018, 102: 1048-1061. PMID: 29779563, PMCID: PMC5992124, DOI: 10.1016/j.ajhg.2018.04.001.
Peer-Reviewed Original Research
Concepts: Polygenic risk scores; Electronic health records; Associations of polygenic risk scores; Phenome-wide significant associations; Polygenic risk score associations; Longitudinal biorepository effort; Non-cancer diagnoses; Patients' electronic health records; Phenome-wide association study; Analysis of temporal order; Michigan Genomics Initiative; Risk score; Associated with multiple phenotypes; Female breast cancer; NHGRI-EBI Catalog; Risk profile; Genetic risk profiles; Measures of genomic variation; Cancer traits; Case-control study; PheWAS analysis; Health records; Health system; Michigan Medicine; Cancer diagnosis
Characteristics Associated With Racial/Ethnic Disparities in COVID-19 Outcomes in an Academic Health Care System
Gu T, Mack J, Salvatore M, Sankar S, Valley T, Singh K, Nallamothu B, Kheterpal S, Lisabeth L, Fritsche L, Mukherjee B. Characteristics Associated With Racial/Ethnic Disparities in COVID-19 Outcomes in an Academic Health Care System. JAMA Network Open 2020, 3: e2025197. PMID: 33084902, PMCID: PMC7578774, DOI: 10.1001/jamanetworkopen.2020.25197.
Peer-Reviewed Original Research
MeSH Keywords: Adult; Aged; Betacoronavirus; Black or African American; Comorbidity; Coronavirus Infections; COVID-19; Diabetes Mellitus, Type 2; Female; Health Status Disparities; Hospitalization; Humans; Intensive Care Units; Kidney Diseases; Male; Michigan; Middle Aged; Neoplasms; Obesity; Odds Ratio; Pandemics; Pneumonia, Viral; Population Density; Retrospective Studies; Risk Factors; SARS-CoV-2; White People
Concepts: Associated with higher risk; International Classification of Diseases; Risk of hospitalization; Preexisting type 2 diabetes; Higher risk of hospitalization; Classification of Diseases; Type 2 diabetes; COVID-19 outcomes; Racial/ethnic disparities; White patients; Black patients; Intensive care unit; International Classification; Residential-level socioeconomic characteristics; Odds ratio; Statistically significant racial differences; High risk; Associated with higher risk of hospitalization; Non-Hispanic blacks; Association of risk factors; Non-Hispanic whites; Michigan Department of Health; Associated with increased risk of hospitalization; Comorbidity score; Department of Health
A meta-inference framework to integrate multiple external models into a current study.
Gu T, Taylor J, Mukherjee B. A meta-inference framework to integrate multiple external models into a current study. Biostatistics 2021, 24: 406-424. PMID: 34269371, PMCID: PMC10102901, DOI: 10.1093/biostatistics/kxab017.
Peer-Reviewed Original Research
Concepts: Accuracy of statistical inference; Empirical Bayes estimators; Summary-level information; Bias-variance trade-off; Relevant external information; Bayes estimators; Statistical inference; External information; External estimates; Naive analysis; Naive combination; International data; Weight estimation; External model; Meta-analysis framework; Individual-level data; Efficiency gains; Estimation; Influence of information; Trade-offs; Information; Framework
Toward Realizing the Promise of AI in Precision Health Across the Spectrum of Care
Wiens J, Spector-Bagdady K, Mukherjee B. Toward Realizing the Promise of AI in Precision Health Across the Spectrum of Care. Annual Review Of Genomics And Human Genetics 2024, 25: 141-159. PMID: 38724019, DOI: 10.1146/annurev-genom-010323-010230.
Peer-Reviewed Original Research
Concepts: Chronic care management; Spectrum of care; Artificial intelligence; Clinical care decisions; Academic medical center; Ethical challenges; Clinical decision-making; Improve care; Care decisions; Preventive care; Care management; Precision health; Tertiary care; Leveraging patient data; Reduce inequalities; Care; Medical Center; Inconsistent use; Selection bias; AI solutions; Patient data; Missing data; Decision-making; Design imperfections
A framework for understanding selection bias in real-world healthcare data
Kundu R, Shi X, Morrison J, Barrett J, Mukherjee B. A framework for understanding selection bias in real-world healthcare data. Journal Of The Royal Statistical Society Series A (Statistics In Society) 2024, 187: 606-635. PMID: 39281782, PMCID: PMC11393555, DOI: 10.1093/jrsssa/qnae039.
Peer-Reviewed Original Research
Concepts: Electronic health records; Selection bias; Association of cancer; Multiple sources of bias; Health records; Healthcare system; Sources of bias; Real-world healthcare data; Binary outcomes; Estimation of associated parameters; Healthcare data; Real-world data; Potential bias; Sample size; Standard error; Data example; Variance formula; Analysis of real-world data; Association; Simulation study; Weighting approach; Biological sex; Associated parameters; Bias; Multiple sources
Incorporating functional annotation with bilevel continuous shrinkage for polygenic risk prediction
Zhuang Y, Kim N, Fritsche L, Mukherjee B, Lee S. Incorporating functional annotation with bilevel continuous shrinkage for polygenic risk prediction. BMC Bioinformatics 2024, 25: 65. PMID: 38336614, PMCID: PMC11323637, DOI: 10.1186/s12859-024-05664-2.
Peer-Reviewed Original Research
Concepts: Predictive performance of polygenic risk scores; Functional annotation; Genetic architecture; Performance of polygenic risk scores; PRS-CS; Annotation information; Polygenic risk prediction; Genetic risk prediction; Polygenic risk scores; Functional annotation information; Kyoto Encyclopedia of Genes; Risk prediction; Proportion of variants; Encyclopedia of Genes; Genomes (KEGG; Source of annotation; Trait heritability; Annotation groups; Pathway information; Quantitative traits; Kyoto Encyclopedia; Functional categories; BackgroundGenetic variants; Heritable contribution; Real world data sources
Methods for mediation analysis with high-dimensional DNA methylation data: Possible choices and comparisons
Clark-Boucher D, Zhou X, Du J, Liu Y, Needham B, Smith J, Mukherjee B. Methods for mediation analysis with high-dimensional DNA methylation data: Possible choices and comparisons. PLOS Genetics 2023, 19: e1011022. PMID: 37934796, PMCID: PMC10655967, DOI: 10.1371/journal.pgen.1011022.
Peer-Reviewed Original Research
Concepts: Bayesian Sparse Linear Mixed Model; Mediation analysis; High-dimensional mediation analysis; Multi-ethnic cohort; Epigenetic research; Health outcomes; High-dimensional DNA methylation data; Linear mixed models; DNA methylation data; Continuous outcomes; Evaluate DNA methylation; DNA methylation; Methylation data; DNAm data; Mixed models; Diverse simulations; Seamless implementation; Modern statistical methods; Mediation effect; R package; United States; Outcomes
The importance of investing in data, models, experiments, team science, and public trust to help policymakers prepare for the next pandemic
Grieve R, Yang Y, Abbott S, Babu G, Bhattacharyya M, Dean N, Evans S, Jewell N, Langan S, Lee W, Molenberghs G, Smeeth L, Williamson E, Mukherjee B. The importance of investing in data, models, experiments, team science, and public trust to help policymakers prepare for the next pandemic. PLOS Global Public Health 2023, 3: e0002601. PMID: 38032861, PMCID: PMC10688710, DOI: 10.1371/journal.pgph.0002601.
Peer-Reviewed Original Research
Lessons from SARS-CoV-2 in India: A data-driven framework for pandemic resilience
Salvatore M, Purkayastha S, Ganapathi L, Bhattacharyya R, Kundu R, Zimmermann L, Ray D, Hazra A, Kleinsasser M, Solomon S, Subbaraman R, Mukherjee B. Lessons from SARS-CoV-2 in India: A data-driven framework for pandemic resilience. Science Advances 2022, 8: eabp8621. PMID: 35714183, PMCID: PMC9205583, DOI: 10.1126/sciadv.abp8621.
Peer-Reviewed Original Research
Comparative impact assessment of COVID-19 policy interventions in five South Asian countries using reported and estimated unreported death counts during 2020-2021
Kundu R, Datta J, Ray D, Mishra S, Bhattacharyya R, Zimmermann L, Mukherjee B. Comparative impact assessment of COVID-19 policy interventions in five South Asian countries using reported and estimated unreported death counts during 2020-2021. PLOS Global Public Health 2023, 3: e0002063. PMID: 38150465, PMCID: PMC10752546, DOI: 10.1371/journal.pgph.0002063.
Peer-Reviewed Original Research
Concepts: COVID-19 deaths; Policy of Bangladesh; Under-reporting of deaths; COVID-19 performance; South Asian countries; India's strategy; Death data; Global South; Death registration system; Policy interventions; Middle income countries; Death toll; Comparative impact assessment; South Asia; Pandemic policies; Death estimation; Public health interventions; Policy; Registration system; Death counts; Sri Lanka; Income countries; Countries; Asian countries; Health interventions
Exploiting Gene-Environment Independence for Analysis of Case–Control Studies: An Empirical Bayes-Type Shrinkage Estimator to Trade-Off Between Bias and Efficiency
Mukherjee B, Chatterjee N. Exploiting Gene-Environment Independence for Analysis of Case–Control Studies: An Empirical Bayes-Type Shrinkage Estimator to Trade-Off Between Bias and Efficiency. Biometrics 2007, 64: 685-694. PMID: 18162111, DOI: 10.1111/j.1541-0420.2007.00953.x.
Peer-Reviewed Original Research
Concepts: Gene-environment independence; Shrinkage estimators; Log odds ratio parameters; Case-control data; Gene-environment independence assumption; Odds ratio parameters; Case-control estimators; Data-adaptive fashion; Data example; Prospective logistic regression analysis; Binary exposure; Gene-environment associations; Independence assumption; Logistic regression analysis; Case-only; Maximum likelihood framework; Estimation; Sample size; Binary genes; Regression analysis; Chatterjee; Examples; Weighted average; Assumptions
Risk of Non-Melanoma Cancers in First-Degree Relatives of CDKN2A Mutation Carriers
Mukherjee B, DeLancey J, Raskin L, Everett J, Jeter J, Begg C, Orlow I, Berwick M, Armstrong B, Kricker A, Marrett L, Millikan R, Culver H, Rosso S, Zanetti R, Kanetsky P, From L, Gruber S, Investigators F. Risk of Non-Melanoma Cancers in First-Degree Relatives of CDKN2A Mutation Carriers. Journal Of The National Cancer Institute 2012, 104: 953-956. PMID: 22534780, PMCID: PMC3379723, DOI: 10.1093/jnci/djs221.
Peer-Reviewed Original Research
Concepts: First-degree relatives of carriers; CDKN2A mutation carriers; First-degree relatives; Mutation carriers; Non-melanoma cancers; First-degree relatives of melanoma patients; First-degree relatives of mutation carriers; Kin-cohort method; Confidence intervals; Risk of cancer; Melanoma patients; Lifetime risk; Proband's genotype; Non-melanoma; Family members; Increased risk; Gastrointestinal cancer; CDKN2A mutations; Wilms tumor; Risk; Melanoma Study; Pancreatic cancer; Noncarriers; Genotype distribution; Melanoma
Set‐based tests for genetic association in longitudinal studies
He Z, Zhang M, Lee S, Smith J, Guo X, Palmas W, Kardia S, Diez Roux A, Mukherjee B. Set‐based tests for genetic association in longitudinal studies. Biometrics 2015, 71: 606-615. PMID: 25854837, PMCID: PMC4601568, DOI: 10.1111/biom.12310.
Peer-Reviewed Original Research
Concepts: Multi-Ethnic Study of Atherosclerosis; Genome-wide association studies; Joint effect of multiple variants; Linkage disequilibrium; Association studies; Effects of multiple variants; Markers of chronic disease; Genetic variants; Set-based test; Gene-based tests; Longitudinal outcomes; Multi-Ethnic Study; Genetic association studies; Study of Atherosclerosis; Chronic diseases; Phenotypic variation; Genetic association; Observational study; Longitudinal analysis; Within-subject correlation; Multiple variants; Score type tests; Joint test; Joint effects; Marker tests
Set-Based Tests for the Gene–Environment Interaction in Longitudinal Studies
He Z, Zhang M, Lee S, Smith J, Kardia S, Roux V, Mukherjee B. Set-Based Tests for the Gene–Environment Interaction in Longitudinal Studies. Journal Of The American Statistical Association 2017, 112: 966-978. PMID: 29780190, PMCID: PMC5954413, DOI: 10.1080/01621459.2016.1252266.
Peer-Reviewed Original Research
Concepts: Gene-environment interactions; Multi-Ethnic Study of Atherosclerosis; Set-based test; Measures of neighborhood environment; Marginal genetic associations; Environmental exposures; Multi-Ethnic Study; Study of Atherosclerosis; Neighborhood environment; Measurement of blood pressure; Gene-environment; Main-effects model; Score type tests; Method of sieves; Longitudinal measures of blood pressure; Robust to misspecification; Genetic association; Genetic variants; Longitudinal study; Main effect; Study period; Effects model; Continuous environmental exposure; Potential bias; Independent conditions
2024
Impact of pandemic-related worries on mental health in India from 2020 to 2022
Yang Y, Sun A, Zimmermann L, Mukherjee B. Impact of pandemic-related worries on mental health in India from 2020 to 2022. Npj Mental Health Research 2024, 3: 57. PMID: 39582077, PMCID: PMC11586416, DOI: 10.1038/s44184-024-00101-x.
Peer-Reviewed Original Research
Concepts: Mental health; Calendar time; Self-reported symptoms of depression; Mental health outcomes; Pandemic-related worries; Symptoms of depression; Mental health trends; Self-reported symptoms; Health outcomes; Public health crisis; Health trends; Effect modifiers; Impact Survey; Residential status; Social media platforms; Financial stress; Health crisis; Worry; Health; Interaction terms; Media platforms; Anxiety; Depression; Survey data; Calendar
Prenatal exposure to per- and polyfluoroalkyl substances (PFAS) and their influence on inflammatory biomarkers in pregnancy: Findings from the LIFECODES cohort
Siwakoti R, Harris S, Ferguson K, Hao W, Cantonwine D, Mukherjee B, McElrath T, Meeker J. Prenatal exposure to per- and polyfluoroalkyl substances (PFAS) and their influence on inflammatory biomarkers in pregnancy: Findings from the LIFECODES cohort. Environment International 2024, 194: 109145. PMID: 39550829, DOI: 10.1016/j.envint.2024.109145.
Peer-Reviewed Original Research
Concepts: C-reactive protein; Nested case-control study of preterm birth; Case-control study of preterm birth; Inflammatory biomarkers; IL-10; TNF-a; Interquartile range increase; Nested case-control study; Study of preterm birth; Pre-pregnancy BMI; Quantile-based g-computation approach; IL-6; Prenatal PFAS exposure; Inflammatory process; Pregnancy plasma samples; Measurement of inflammatory biomarkers; Birth outcomes; Linear mixed models; Inverse association; Range increase; Maternal race; Maternal demographics; TNF-a levels; Adverse pregnancy; African Americans
Improving prediction models of amyotrophic lateral sclerosis (ALS) using polygenic, pre-existing conditions, and survey-based risk scores in the UK Biobank
Jin W, Boss J, Bakulski K, Goutman S, Feldman E, Fritsche L, Mukherjee B. Improving prediction models of amyotrophic lateral sclerosis (ALS) using polygenic, pre-existing conditions, and survey-based risk scores in the UK Biobank. Journal Of Neurology 2024, 271: 6923-6934. PMID: 39249108, DOI: 10.1007/s00415-024-12644-2.
Peer-Reviewed Original Research
Concepts: Polygenic risk scores; Risk score; Pre-existing conditions; Phenome-wide association study; Controls of European descent; Phenotype risk score; UK Biobank data; Amyotrophic lateral sclerosis risk; Risk score distribution; Increased ALS risk; Influence of environmental exposures; Exposure-related factors; Combined risk score; UK Biobank; Amyotrophic lateral sclerosis; Baseline demographic covariates; Biobank data; PRS-CS; ALS risk; Amyotrophic lateral sclerosis diagnosis; Diagnosis 1; Demographic covariates; Association studies; European descent; MethodsUtilizing data