Featured Publications
A meta-inference framework to integrate multiple external models into a current study.
Gu T, Taylor J, Mukherjee B. A meta-inference framework to integrate multiple external models into a current study. Biostatistics 2021, 24: 406-424. PMID: 34269371, PMCID: PMC10102901, DOI: 10.1093/biostatistics/kxab017.
Peer-Reviewed Original Research
Concepts: Accuracy of statistical inference; Empirical Bayes estimators; Summary-level information; Bias-variance trade-off; Relevant external information; Bayes estimators; Statistical inference; External information; External estimates; Naive analysis; Naive combination; International data; Weight estimation; External model; Meta-analysis framework; Individual-level data; Efficiency gains; Estimation; Influence of information; Trade-offs; Information; Framework
Set‐based tests for genetic association in longitudinal studies
He Z, Zhang M, Lee S, Smith J, Guo X, Palmas W, Kardia S, Diez Roux A, Mukherjee B. Set‐based tests for genetic association in longitudinal studies. Biometrics 2015, 71: 606-615. PMID: 25854837, PMCID: PMC4601568, DOI: 10.1111/biom.12310.
Peer-Reviewed Original Research
Concepts: Multi-Ethnic Study of Atherosclerosis; Genome-wide association studies; Joint effect of multiple variants; Linkage disequilibrium; Association studies; Effects of multiple variants; Markers of chronic disease; Genetic variants; Set-based test; Gene-based tests; Longitudinal outcomes; Multi-Ethnic Study; Genetic association studies; Study of Atherosclerosis; Chronic diseases; Phenotypic variation; Genetic association; Observational study; Longitudinal analysis; Within-subject correlation; Multiple variants; Score type tests; Joint test; Joint effects; Marker tests
2024
Improving prediction of linear regression models by integrating external information from heterogeneous populations: James–Stein estimators
Han P, Li H, Park S, Mukherjee B, Taylor J. Improving prediction of linear regression models by integrating external information from heterogeneous populations: James–Stein estimators. Biometrics 2024, 80: ujae072. PMID: 39101548, PMCID: PMC11299067, DOI: 10.1093/biomtc/ujae072.
Peer-Reviewed Original Research
MeSH Keywords: Biometry; Computer Simulation; Data Interpretation, Statistical; Humans; Lead; Linear Models; Models, Statistical; Patella
Concepts: James-Stein estimator; Linear regression models; Individual-level data; Comprehensive simulation study; Regression models; Numerical performance; Simulation study; Shrinkage method; Coefficient estimates; Predictive mean; Reduced model; Study population heterogeneity; Internal model; Estimation; Study population; Blood lead levels; International studies; Covariates; Patella bone; Published literature; Lead levels; External studies; Summary information; Population; Subsets
2023
An inverse probability weighted regression method that accounts for right‐censoring for causal inference with multiple treatments and a binary outcome
Yu Y, Zhang M, Mukherjee B. An inverse probability weighted regression method that accounts for right‐censoring for causal inference with multiple treatments and a binary outcome. Statistics In Medicine 2023, 42: 3699-3715. PMID: 37392070, DOI: 10.1002/sim.9826.
Peer-Reviewed Original Research
MeSH Keywords: Computer Simulation; Humans; Male; Models, Statistical; Probability; Propensity Score; Prostatic Neoplasms; Regression Analysis; Treatment Outcome
Concepts: Right censoring; Weighted score function; Causal treatment effects; Average treatment effect; Asymptotic properties; Censored component; Pre-specified time window; Estimation consistency; Robustness properties; Simulation study; Binary outcomes; Presence of confounders; Censoring; Scoring function; Inverse probability; Treatment effects; Estimation; Sources of bias; Inference; Letter C; Comparative effectiveness research; Treatment switch; Regression method; Logistic regression models; Insurance claims database
A Synthetic Data Integration Framework to Leverage External Summary-Level Information from Heterogeneous Populations
Gu T, Taylor J, Mukherjee B. A Synthetic Data Integration Framework to Leverage External Summary-Level Information from Heterogeneous Populations. Biometrics 2023, 79: 3831-3845. PMID: 36876883, PMCID: PMC10480346, DOI: 10.1111/biom.13852.
Peer-Reviewed Original Research
Concepts: Covariate effects; Statistical inference; Heterogeneity of covariate effects; Regression coefficient estimates; Summary-level information; Improve statistical inference; International studies; Outcome Y; Covariate information; Data integration framework; Statistical efficiency; Coefficient estimates; Partial information; External population; General framework; Individual-level data; Risk prediction model; External model; Prediction problem; International study population; Multiple imputation
2022
Methods for large‐scale single mediator hypothesis testing: Possible choices and comparisons
Du J, Zhou X, Clark‐Boucher D, Hao W, Liu Y, Smith J, Mukherjee B. Methods for large‐scale single mediator hypothesis testing: Possible choices and comparisons. Genetic Epidemiology 2022, 47: 167-184. PMID: 36465006, PMCID: PMC10329872, DOI: 10.1002/gepi.22510.
Peer-Reviewed Original Research
Concepts: Null hypothesis; Test statistics; Mediation hypothesis testing; Composite null hypothesis; Hypothesis testing; Classes of methods; False positive rate; Alternative hypothesis; Simulation study; Hypothesis testing method; Continuous mediator; Reference distribution; Sobel test statistics; Continuous outcomes; Exposure-mediator interaction; Multi-Ethnic Study of Atherosclerosis; DNA methylation sites; Class; CRAN; Methylation sites
2021
A comparison of five epidemiological models for transmission of SARS-CoV-2 in India
Purkayastha S, Bhattacharyya R, Bhaduri R, Kundu R, Gu X, Salvatore M, Ray D, Mishra S, Mukherjee B. A comparison of five epidemiological models for transmission of SARS-CoV-2 in India. BMC Infectious Diseases 2021, 21: 533. PMID: 34098885, PMCID: PMC8181542, DOI: 10.1186/s12879-021-06077-9.
Peer-Reviewed Original Research
MeSH Keywords: Bayes Theorem; Communicable Disease Control; Computer Simulation; COVID-19; Forecasting; Humans; India; Models, Statistical; Pandemics
A comparison of parametric propensity score‐based methods for causal inference with multiple treatments and a binary outcome
Yu Y, Zhang M, Shi X, Caram M, Little R, Mukherjee B. A comparison of parametric propensity score‐based methods for causal inference with multiple treatments and a binary outcome. Statistics In Medicine 2021, 40: 1653-1677. PMID: 33462862, DOI: 10.1002/sim.8862.
Peer-Reviewed Original Research
MeSH Keywords: Bias; Causality; Comparative Effectiveness Research; Computer Simulation; Humans; Male; Models, Statistical; Propensity Score
Concepts: Comparative effectiveness research; Estimation of causal effects; Propensity score-based methods; Binary outcomes; Insurance networks; Causal effects; Propensity score methods; Propensity-based methods; Confounding bias; Continuous outcomes; Pharmacy claims; Effectiveness research; Observational study; Simulation study; Adverse outcomes; Propensity score; Emergency room
2020
Methods to Account for Uncertainty in Latent Class Assignments When Using Latent Classes as Predictors in Regression Models, with Application to Acculturation Strategy Measures.
Elliott M, Zhao Z, Mukherjee B, Kanaya A, Needham B. Methods to Account for Uncertainty in Latent Class Assignments When Using Latent Classes as Predictors in Regression Models, with Application to Acculturation Strategy Measures. Epidemiology 2020, 31: 194-204. PMID: 31809338, PMCID: PMC7480960, DOI: 10.1097/ede.0000000000001139.
Peer-Reviewed Original Research
Concepts: Measurement error model; Joint model; Regression parameters; Latent classes; Likelihood-based; Latent class model; Simulation study; Class model; Two-stage model; Class; Error model; Primary interest; Acculturation behaviors; Measurement error; South Asian immigrants; Latent class analysis; Asian immigrants; True class; Uncertainty; Class analysis; Estimation; Strategy measures
2019
Bayesian Shrinkage Estimation of High Dimensional Causal Mediation Effects in Omics Studies
Song Y, Zhou X, Zhang M, Zhao W, Liu Y, Kardia S, Roux A, Needham B, Smith J, Mukherjee B. Bayesian Shrinkage Estimation of High Dimensional Causal Mediation Effects in Omics Studies. Biometrics 2019, 76: 700-710. PMID: 31733066, PMCID: PMC7228845, DOI: 10.1111/biom.13189.
Peer-Reviewed Original Research
Concepts: Mediation analysis; Effect of socioeconomic status; Potential mediators; Multi-Ethnic Study; Causal mediation analysis; Cardiometabolic outcomes; DNA methylation regions; Socioeconomic status; High-throughput technologies; Mediation effect; Genomic data; Epidemiological data; Methylation regions; High-dimensional mediators; Bayesian inference methods; Continuous shrinkage priors; High-dimensional settings; Bayesian shrinkage estimators; Joint analysis; Outcomes; Shrinkage priors; Pathway; Null case; Inference methods; Biomedical studies
Estimating Outcome-Exposure Associations when Exposure Biomarker Detection Limits vary Across Batches.
Boss J, Mukherjee B, Ferguson K, Aker A, Alshawabkeh A, Cordero J, Meeker J, Kim S. Estimating Outcome-Exposure Associations when Exposure Biomarker Detection Limits vary Across Batches. Epidemiology 2019, 30: 746-755. PMID: 31299670, PMCID: PMC6677587, DOI: 10.1097/ede.0000000000001052.
Peer-Reviewed Original Research
Concepts: Binary outcome data; Likelihood-based methods; Complete-case analysis; Distributional assumptions; Assignment of samples; Superior estimation properties; Simulation study; Complete-case; Multiple imputation strategy; Exposure data; Multiple batches; Batch assignment; Estimated properties; Limit-variables; Single imputation; Multiple imputation; Cohort study
2018
Foetal ultrasound measurement imputations based on growth curves versus multiple imputation chained equation (MICE)
Ferguson K, Yu Y, Cantonwine D, McElrath T, Meeker J, Mukherjee B. Foetal ultrasound measurement imputations based on growth curves versus multiple imputation chained equation (MICE). Paediatric And Perinatal Epidemiology 2018, 32: 469-473. PMID: 30016545, PMCID: PMC6939297, DOI: 10.1111/ppe.12486.
Peer-Reviewed Original Research
Concepts: Linear mixed models; Complete-case analysis; Multiple imputation; Epidemiological studies of risk factors; Imputed datasets; Complete-case; Demographic factors; Study of risk factors; LIFECODES birth cohort; Ultrasound measurements; Calculate associations; Birth cohort; Cross-section; Epidemiological studies; Risk factors; Study visits; Longitudinal analysis; Parametric linear mixed model; Imputation; Missing data; Mixed models; Longitudinal measurements; Sample size; Covariate data; Growth restriction
Imputation of missing values in a large job exposure matrix using hierarchical information
Roberts B, Cheng W, Mukherjee B, Neitzel R. Imputation of missing values in a large job exposure matrix using hierarchical information. Journal Of Exposure Science & Environmental Epidemiology 2018, 28: 615-648. PMID: 29789667, PMCID: PMC9929916, DOI: 10.1038/s41370-018-0037-x.
Peer-Reviewed Original Research
Improving estimation and prediction in linear regression incorporating external information from an established reduced model
Cheng W, Taylor J, Vokonas P, Park S, Mukherjee B. Improving estimation and prediction in linear regression incorporating external information from an established reduced model. Statistics In Medicine 2018, 37: 1515-1530. PMID: 29365342, PMCID: PMC5889759, DOI: 10.1002/sim.7600.
Peer-Reviewed Original Research
MeSH Keywords: Bayes Theorem; Data Interpretation, Statistical; Humans; Linear Models; Models, Statistical; Regression Analysis
Concepts: Outcome variable Y; Efficiency of estimation; Approximate Bayesian inference; Bayes solution; Variable Y; Nonlinear constraints; Inferential framework; Variable B; E(Y|X; Improve inference; Bayesian inference; Effective computational method; Parameter space; Reduced model; Improved estimates; Linear regression models; Transformation approach; Standard error; Dunson; Inference; Estimation; Regression models; Problem; Covariates; Space
2017
Robust distributed lag models using data adaptive shrinkage
Chen Y, Mukherjee B, Adar S, Berrocal V, Coull B. Robust distributed lag models using data adaptive shrinkage. Biostatistics 2017, 19: 461-478. PMID: 29040386, PMCID: PMC6454578, DOI: 10.1093/biostatistics/kxx041.
Peer-Reviewed Original Research
MeSH Keywords: Air Pollution; Bayes Theorem; Biostatistics; Computer Simulation; Environmental Exposure; Epidemiology; Health Surveys; Humans; Models, Statistical
Concepts: Distributed lag models; Distributed Lag; Lag model; Time series data; Effects of air pollution; Bias-variance trade-off; Generalized ridge regression; Shrinkage method; Air pollution studies; Hierarchical Bayes approach; Shrinkage approach; Time series; Dl function; Air pollution; Pollution studies; Effect estimates; Trade-offs; Extensive simulation study; Dependent variable; Shrinking coefficients; Mean square error; Lag; Simulation study; Bayes approach; Ridge regression
Meta‐analysis of gene‐environment interaction exploiting gene‐environment independence across multiple case‐control studies
Estes J, Rice J, Li S, Stringham H, Boehnke M, Mukherjee B. Meta‐analysis of gene‐environment interaction exploiting gene‐environment independence across multiple case‐control studies. Statistics In Medicine 2017, 36: 3895-3909. PMID: 28744888, PMCID: PMC5624850, DOI: 10.1002/sim.7398.
Peer-Reviewed Original Research
MeSH Keywords: Age Factors; Alpha-Ketoglutarate-Dependent Dioxygenase FTO; Bayes Theorem; Bias; Biometry; Body Mass Index; Case-Control Studies; Computer Simulation; Diabetes Mellitus, Type 2; Gene-Environment Interaction; Humans; Logistic Models; Meta-Analysis as Topic; Models, Genetic; Models, Statistical; Polymorphism, Single Nucleotide; Retrospective Studies
Concepts: Gene-environment independence; Gene-environment; Empirical Bayes estimators; Gene-environment interactions; Case-control study; Meta-analysis setting; Bayes estimators; Retrospective likelihood framework; Shrinkage estimators; Meta-analysis; Testing gene-environment interactions; Combination of estimates; Factors body mass index; Simulation study; Body mass index; Unconstrained model; Likelihood framework; Inverse variance; Meta-analysis framework; FTO gene; Mass index; Genetic markers; Estimation; Standard alternative; Chatterjee
Update on the State of the Science for Analytical Methods for Gene-Environment Interactions
Gauderman W, Mukherjee B, Aschard H, Hsu L, Lewinger J, Patel C, Witte J, Amos C, Tai C, Conti D, Torgerson D, Lee S, Chatterjee N. Update on the State of the Science for Analytical Methods for Gene-Environment Interactions. American Journal Of Epidemiology 2017, 186: 762-770. PMID: 28978192, PMCID: PMC5859988, DOI: 10.1093/aje/kwx228.
Peer-Reviewed Original Research
Concepts: Genome-wide association studies; G x E; Gene-environment interactions; Association studies; Analysis of gene-environment interactions; Quantitative trait studies; Complex traits; Genetic data; Gene sets; Trait studies; Gene-environment; Case-control; Environmental data; Consortium setting; Formation of consortia; Genes; Consortium; Analytical challenges; Traits; Sets; Study; Interaction; Statistical approach; Data
Exposure enriched outcome dependent designs for longitudinal studies of gene–environment interaction
Sun Z, Mukherjee B, Estes J, Vokonas P, Park S. Exposure enriched outcome dependent designs for longitudinal studies of gene–environment interaction. Statistics In Medicine 2017, 36: 2947-2960. PMID: 28497531, PMCID: PMC5523112, DOI: 10.1002/sim.7332.
Peer-Reviewed Original Research
Concepts: Longitudinal cohort study; Cohort study; Case-only design; Longitudinal study; G x E interaction; Normative Aging Study; Complete-case analysis; Gene-environment; Sampling design; Case-control; Veterans Administration; Complex human diseases; E interaction; Exposure information; Aging Study; Outcome trajectories; Stratified sampling; Retrospective genotyping; Individual exposure; Covariate data; Exposure effects; Joint effects; Outcomes; Time-varying outcome; Environmental factors
2016
Classification and Clustering Methods for Multiple Environmental Factors in Gene–Environment Interaction
Ko Y, Mukherjee B, Smith J, Kardia S, Allison M, Roux A. Classification and Clustering Methods for Multiple Environmental Factors in Gene–Environment Interaction. Epidemiology 2016, 27: 870-878. PMID: 27479650, PMCID: PMC5039086, DOI: 10.1097/ede.0000000000000548.
Peer-Reviewed Original Research
MeSH Keywords: Aged; Aged, 80 and over; Atherosclerosis; Bayes Theorem; Cluster Analysis; Data Interpretation, Statistical; Environmental Exposure; Epidemiologic Research Design; Female; Follow-Up Studies; Gene-Environment Interaction; Genetic Predisposition to Disease; Humans; Middle Aged; Models, Statistical; Regression Analysis; Risk Factors
Concepts: Multiple environmental exposures; Gene-environment interactions; G x E; Environmental exposures; Multiethnic Study of Atherosclerosis; Study of Atherosclerosis; Gene-environment; Effect modification; Multiethnic Study; Environmental factors; Exposure subgroups; Environmental exposure profiles; Main effect; Exposure profiles; E study; Efficient analysis strategy; E analysis; Multiple environmental factors; Subgroups; Analysis strategy; Factors; Exposure; Product terms
Mediation Formula for a Binary Outcome and a Time-Varying Exposure and Mediator, Accounting for Possible Exposure-Mediator Interaction
Chen Y, Mukherjee B, Ferguson K, Meeker J, VanderWeele T. Mediation Formula for a Binary Outcome and a Time-Varying Exposure and Mediator, Accounting for Possible Exposure-Mediator Interaction. American Journal Of Epidemiology 2016, 184: 157-159. PMID: 27325886, PMCID: PMC4945703, DOI: 10.1093/aje/kww045.
Peer-Reviewed Original Research