Featured Publications
Statistical Inference for Association Studies Using Electronic Health Records: Handling Both Selection Bias and Outcome Misclassification
Beesley L, Mukherjee B. Statistical Inference for Association Studies Using Electronic Health Records: Handling Both Selection Bias and Outcome Misclassification. Biometrics 2020, 78: 214-226. PMID: 33179768, DOI: 10.1111/biom.13400.
Peer-Reviewed Original Research
Concepts: Electronic health records; Health records; Electronic health record data analysis; Electronic health record settings; Selection bias; Michigan Genomics Initiative; Association studies; EHR-linked; Health research; Inverse probability weighting method; Study sample; Effect estimates; Probability weighting method; Lack of representativeness; Type I error; Survey sampling literature; Standard error estimates; Gold standard labels; Disease status; Error estimates; Statistical inference; Misclassification; Inference strategy; Sampling literature; Standard labels
A meta-inference framework to integrate multiple external models into a current study.
Gu T, Taylor J, Mukherjee B. A meta-inference framework to integrate multiple external models into a current study. Biostatistics 2021, 24: 406-424. PMID: 34269371, PMCID: PMC10102901, DOI: 10.1093/biostatistics/kxab017.
Peer-Reviewed Original Research
Concepts: Accuracy of statistical inference; Empirical Bayes estimators; Summary-level information; Bias-variance trade-off; Relevant external information; Bayes estimators; Statistical inference; External information; External estimates; Naive analysis; Naive combination; International data; Weight estimation; External model; Meta-analysis framework; Individual-level data; Efficiency gains; Estimation; Influence of information; Trade-offs; Information; Framework
Exploiting Gene-Environment Independence for Analysis of Case–Control Studies: An Empirical Bayes-Type Shrinkage Estimator to Trade-Off Between Bias and Efficiency
Mukherjee B, Chatterjee N. Exploiting Gene-Environment Independence for Analysis of Case–Control Studies: An Empirical Bayes-Type Shrinkage Estimator to Trade-Off Between Bias and Efficiency. Biometrics 2007, 64: 685-694. PMID: 18162111, DOI: 10.1111/j.1541-0420.2007.00953.x.
Peer-Reviewed Original Research
Concepts: Gene-environment independence; Shrinkage estimators; Log odds ratio parameters; Case-control data; Gene-environment independence assumption; Odds ratio parameters; Case-control estimators; Data-adaptive fashion; Data example; Prospective logistic regression analysis; Binary exposure; Gene-environment associations; Independence assumption; Logistic regression analysis; Case-only; Maximum likelihood framework; Estimation; Sample size; Binary genes; Regression analysis; Chatterjee; Examples; Weighted average; Assumptions
2022
Case studies in bias reduction and inference for electronic health record data with selection bias and phenotype misclassification
Beesley L, Mukherjee B. Case studies in bias reduction and inference for electronic health record data with selection bias and phenotype misclassification. Statistics In Medicine 2022, 41: 5501-5516. PMID: 36131394, PMCID: PMC9826451, DOI: 10.1002/sim.9579.
Peer-Reviewed Original Research
Concepts: Electronic health records; Electronic health record data analysis; Electronic health record settings; Leverages external data sources; Electronic health record data; Population-based data sources; EHR-based research; Longitudinal health information; University of Michigan Health System; Health record data; Selection bias; Population-based research; Michigan Health System; Multiple sources of bias; Factors related to selection; Patient-level data; Health records; Health system; Health information; Phenotype misclassification; Summary estimates; Phenotyping errors; Cancer diagnosis; Sources of bias; Record data
2021
A comparison of parametric propensity score‐based methods for causal inference with multiple treatments and a binary outcome
Yu Y, Zhang M, Shi X, Caram M, Little R, Mukherjee B. A comparison of parametric propensity score‐based methods for causal inference with multiple treatments and a binary outcome. Statistics In Medicine 2021, 40: 1653-1677. PMID: 33462862, DOI: 10.1002/sim.8862.
Peer-Reviewed Original Research
Concepts: Comparative effectiveness research; Estimation of causal effects; Propensity score-based methods; Binary outcomes; Insurance networks; Causal effects; Propensity score methods; Propensity-based methods; Confounding bias; Continuous outcomes; Pharmacy claims; Effectiveness research; Observational study; Simulation study; Adverse outcomes; Propensity score; Emergency room
2020
An analytic framework for exploring sampling and observation process biases in genome and phenome‐wide association studies using electronic health records
Beesley L, Fritsche L, Mukherjee B. An analytic framework for exploring sampling and observation process biases in genome and phenome‐wide association studies using electronic health records. Statistics In Medicine 2020, 39: 1965-1979. PMID: 32198773, DOI: 10.1002/sim.8524.
Peer-Reviewed Original Research
Concepts: Electronic health records; Health records; Association studies; Observational health care databases; Electronic health record data; Longitudinal biorepository effort; Phenome-wide association study; Michigan Genomics Initiative; Health record data; Health care databases; Disease-gene association studies; Michigan Health System; Care database; Health system; Phenotype misclassification; Study bias; Record data; Nonprobability sampling; Association analysis; Data sources; Genome Initiative; Misclassification; Analysis approach; Records; Sensitivity analysis
Interaction analysis under misspecification of main effects: Some common mistakes and simple solutions
Zhang M, Yu Y, Wang S, Salvatore M, Fritsche L, He Z, Mukherjee B. Interaction analysis under misspecification of main effects: Some common mistakes and simple solutions. Statistics In Medicine 2020, 39: 1675-1694. PMID: 32101638, DOI: 10.1002/sim.8505.
Peer-Reviewed Original Research
Concepts: Type I error rate; Type I error inflation; Independence assumption; Wald and score tests; Correct type I error rates; Sandwich variance estimator; Sandwich estimator; Score test; Variance estimation; Simulation study; Misspecification; Michigan Genomics Initiative; Statistical practice; Binary outcomes; Tested interactions; Empirical facts; Flexible model; Data model; Test of interaction; Biobank study; Inflation; Assumptions; Continuous outcomes; Epidemiological literature; Linear regression models
2018
Imputation of missing values in a large job exposure matrix using hierarchical information
Roberts B, Cheng W, Mukherjee B, Neitzel R. Imputation of missing values in a large job exposure matrix using hierarchical information. Journal Of Exposure Science & Environmental Epidemiology 2018, 28: 615-648. PMID: 29789667, PMCID: PMC9929916, DOI: 10.1038/s41370-018-0037-x.
Peer-Reviewed Original Research
2017
Meta‐analysis of gene‐environment interaction exploiting gene‐environment independence across multiple case‐control studies
Estes J, Rice J, Li S, Stringham H, Boehnke M, Mukherjee B. Meta‐analysis of gene‐environment interaction exploiting gene‐environment independence across multiple case‐control studies. Statistics In Medicine 2017, 36: 3895-3909. PMID: 28744888, PMCID: PMC5624850, DOI: 10.1002/sim.7398.
Peer-Reviewed Original Research
MeSH Keywords: Age Factors; Alpha-Ketoglutarate-Dependent Dioxygenase FTO; Bayes Theorem; Bias; Biometry; Body Mass Index; Case-Control Studies; Computer Simulation; Diabetes Mellitus, Type 2; Gene-Environment Interaction; Humans; Logistic Models; Meta-Analysis as Topic; Models, Genetic; Models, Statistical; Polymorphism, Single Nucleotide; Retrospective Studies
Concepts: Gene-environment independence; Gene-environment; Empirical Bayes estimators; Gene-environment interactions; Case-control study; Meta-analysis setting; Bayes estimators; Retrospective likelihood framework; Shrinkage estimators; Meta-analysis; Testing gene-environment interactions; Combination of estimates; Factors body mass index; Simulation study; Body mass index; Unconstrained model; Likelihood framework; Inverse variance; Meta-analysis framework; FTO gene; Mass index; Genetic markers; Estimation; Standard alternative; Chatterjee
Robust Tests for Additive Gene-Environment Interaction in Case-Control Studies Using Gene-Environment Independence
Liu G, Mukherjee B, Lee S, Lee AW, Wu AH, Bandera EV, Jensen A, Rossing MA, Moysich KB, Chang-Claude J, Doherty JA, Gentry-Maharaj A, Kiemeney L, Gayther SA, Modugno F, Massuger L, Goode EL, Fridley BL, Terry KL, Cramer DW, Ramus SJ, Anton-Culver H, Ziogas A, Tyrer JP, Schildkraut JM, Kjaer SK, Webb PM, Ness RB, Menon U, Berchuck A, Pharoah PD, Risch H, Pearce CL, Consortium F. Robust Tests for Additive Gene-Environment Interaction in Case-Control Studies Using Gene-Environment Independence. American Journal Of Epidemiology 2017, 187: 366-377. PMID: 28633381, PMCID: PMC5860584, DOI: 10.1093/aje/kwx243.
Peer-Reviewed Original Research
2014
The impact of exposure-biased sampling designs on detection of gene–environment interactions in case–control studies with potential exposure misclassification
Stenzel S, Ahn J, Boonstra P, Gruber S, Mukherjee B. The impact of exposure-biased sampling designs on detection of gene–environment interactions in case–control studies with potential exposure misclassification. European Journal Of Epidemiology 2014, 30: 413-423. PMID: 24894824, PMCID: PMC4256150, DOI: 10.1007/s10654-014-9908-1.
Peer-Reviewed Original Research
Concepts: G-E interactions; Exposure information; Detection of gene-environment interactions; Prevalence of exposure; Gene-environment interactions; Sampling design; Case-control study; Random selection of subjects; Performance of sampling designs; Case-only; Exposure prevalence; Joint test; Exposure misclassification; Case-control; Rare exposures; Marginal association; Selection of subjects; Type I error; Empirical simulation study; Ideal sampling schemes; Joint effects; Prevalence; Random selection; G-E; Misclassification
The Role of Environmental Heterogeneity in Meta‐Analysis of Gene–Environment Interactions With Quantitative Traits
Li S, Mukherjee B, Taylor J, Rice K, Wen X, Rice J, Stringham H, Boehnke M. The Role of Environmental Heterogeneity in Meta‐Analysis of Gene–Environment Interactions With Quantitative Traits. Genetic Epidemiology 2014, 38: 416-429. PMID: 24801060, PMCID: PMC4108593, DOI: 10.1002/gepi.21810.
Peer-Reviewed Original Research
MeSH Keywords: Alpha-Ketoglutarate-Dependent Dioxygenase FTO; Bias; Body Mass Index; Case-Control Studies; Cholesterol, HDL; Cohort Studies; Diabetes Mellitus, Type 2; Gene Frequency; Gene-Environment Interaction; Genetic Predisposition to Disease; Humans; Meta-Analysis as Topic; Models, Genetic; Phenotype; Polymorphism, Single Nucleotide; Proteins; Quantitative Trait, Heritable
Concepts: Individual level data; Meta-analysis; Inverse-variance weighted meta-analysis; Environmental heterogeneity; Gene-environment interaction studies; Inverse-variance weighted estimator; Meta-analysis of interactions; Study of type 2 diabetes; Gene-environment interactions; Body mass index; Meta-regression approach; Single nucleotide polymorphisms; Adaptive weighted estimator; FTO gene; Type 2 diabetes; Mass index; Meta-regression; Quantitative traits; Summary statistics; Cholesterol data; Nucleotide polymorphisms; Level data; Univariate summary statistics; Data harmonization; Environmental covariates
2013
Environmental Confounding in Gene-Environment Interaction Studies
VanderWeele T, Ko Y, Mukherjee B. Environmental Confounding in Gene-Environment Interaction Studies. American Journal Of Epidemiology 2013, 178: 144-152. PMID: 23821317, PMCID: PMC3698991, DOI: 10.1093/aje/kws439.
Peer-Reviewed Original Research
Concepts: Gene-environment independence; Gene-environment interaction studies; Gene-environment interactions; Environmental confounders; Genetic factors; Joint test; Gene-environment; Genetic effects; Environmental factors; Confounding variables; Confounding; Interaction studies; Simulation study; Joint null; Sample size; Bias estimates; Factors; Independence; Study; Test
2011
Sensitivity analysis for interactions under unmeasured confounding
VanderWeele T, Mukherjee B, Chen J. Sensitivity analysis for interactions under unmeasured confounding. Statistics In Medicine 2011, 31: 2552-2564. PMID: 21976358, PMCID: PMC4226658, DOI: 10.1002/sim.4354.
Peer-Reviewed Original Research
A Latent Variable Approach to Study Gene–Environment Interactions in the Presence of Multiple Correlated Exposures
Sánchez B, Kang S, Mukherjee B. A Latent Variable Approach to Study Gene–Environment Interactions in the Presence of Multiple Correlated Exposures. Biometrics 2011, 68: 466-476. PMID: 21955029, PMCID: PMC4405908, DOI: 10.1111/j.1541-0420.2011.01677.x.
Peer-Reviewed Original Research
MeSH Keywords: Analysis of Variance; Bias; Biometry; Birth Weight; Case-Control Studies; Computer Simulation; Environmental Exposure; Epidemiologic Factors; Female; Gene-Environment Interaction; Humans; Infant, Newborn; Iron; Lead Poisoning; Models, Statistical; Pregnancy; Prenatal Exposure Delayed Effects; Principal Component Analysis
Concepts: Gene-environment interactions; Gene-environment; Environmental epidemiology; Cohort study; Gene-environment dependence; Burden of multiple testing; Study gene-environment interactions; Environmental exposures; Exposure data; Early life exposures; LV framework; G x E effects; Health Study; Correlated exposures; G x E; Disease risk; Life exposure; Multiple testing; Function of environmental exposure; E study; Genotype categories; Study of lead exposure; Birth weight; Iron metabolism genes; Adaptive trade-off
2007
Accounting for error due to misclassification of exposures in case–control studies of gene–environment interaction
Zhang L, Mukherjee B, Ghosh M, Gruber S, Moreno V. Accounting for error due to misclassification of exposures in case–control studies of gene–environment interaction. Statistics In Medicine 2007, 27: 2756-2783. PMID: 17879261, DOI: 10.1002/sim.3044.
Peer-Reviewed Original Research
Concepts: Case-control study; Case-control study of colorectal cancer; Gene-environment independence assumption; Study of gene-environment interactions; Study of colorectal cancer; Case-control study design; Environmental exposures; Disease-exposure associations; Case-control data; Misclassification of exposure; Gene-environment interactions; Degree of misclassification; Study design; Confidence intervals; Genotyping errors; Validation subsample; Colorectal cancer; Analysis of data; Misclassification error rate; Genetic factors; Independence assumption; Misclassification; Misclassified data; Analytical form; Estimation strategy