Featured Publications
A meta-inference framework to integrate multiple external models into a current study.
Gu T, Taylor J, Mukherjee B. A meta-inference framework to integrate multiple external models into a current study. Biostatistics 2021, 24: 406-424. PMID: 34269371, PMCID: PMC10102901, DOI: 10.1093/biostatistics/kxab017.
Peer-Reviewed Original Research
Concepts: Accuracy of statistical inference; Empirical Bayes estimators; Summary-level information; Bias-variance trade-off; Relevant external information; Bayes estimators; Statistical inference; External information; External estimates; Naive analysis; Naive combination; International data; Weight estimation; External model; Meta-analysis framework; Individual-level data; Efficiency gains; Estimation; Influence of information; Trade-offs; Information; Framework
Set‐based tests for genetic association in longitudinal studies
He Z, Zhang M, Lee S, Smith J, Guo X, Palmas W, Kardia S, Diez Roux A, Mukherjee B. Set‐based tests for genetic association in longitudinal studies. Biometrics 2015, 71: 606-615. PMID: 25854837, PMCID: PMC4601568, DOI: 10.1111/biom.12310.
Peer-Reviewed Original Research
Concepts: Multi-Ethnic Study of Atherosclerosis; Genome-wide association studies; Joint effect of multiple variants; Linkage disequilibrium; Association studies; Effects of multiple variants; Markers of chronic disease; Genetic variants; Set-based test; Gene-based tests; Longitudinal outcomes; Multi-Ethnic Study; Genetic association studies; Study of Atherosclerosis; Chronic diseases; Phenotypic variation; Genetic association; Observational study; Longitudinal analysis; Within-subject correlation; Multiple variants; Score type tests; Joint test; Joint effects; Marker tests
2024
Improving prediction of linear regression models by integrating external information from heterogeneous populations: James–Stein estimators
Han P, Li H, Park S, Mukherjee B, Taylor J. Improving prediction of linear regression models by integrating external information from heterogeneous populations: James–Stein estimators. Biometrics 2024, 80: ujae072. PMID: 39101548, PMCID: PMC11299067, DOI: 10.1093/biomtc/ujae072.
Peer-Reviewed Original Research
MeSH Keywords: Biometry; Computer Simulation; Data Interpretation, Statistical; Humans; Lead; Linear Models; Models, Statistical; Patella
Concepts: James-Stein estimator; Linear regression models; Individual-level data; Comprehensive simulation study; Regression models; Numerical performance; Simulation study; Shrinkage method; Coefficient estimates; Predictive mean; Reduced model; Study population heterogeneity; Internal model; Estimation; Study population; Blood lead levels; International studies; Covariates; Patella bone; Published literature; Lead levels; External studies; Summary information; Population; Subsets
2020
Interaction analysis under misspecification of main effects: Some common mistakes and simple solutions
Zhang M, Yu Y, Wang S, Salvatore M, Fritsche L, He Z, Mukherjee B. Interaction analysis under misspecification of main effects: Some common mistakes and simple solutions. Statistics In Medicine 2020, 39: 1675-1694. PMID: 32101638, DOI: 10.1002/sim.8505.
Peer-Reviewed Original Research
Concepts: Type I error rate; Type I error inflation; Independence assumption; Wald and score tests; Correct type I error rates; Sandwich variance estimator; Sandwich estimator; Score test; Variance estimation; Simulation study; Misspecification; Michigan Genomics Initiative; Statistical practice; Binary outcomes; Tested interactions; Empirical facts; Flexible model; Data model; Test of interaction; Biobank study; Inflation; Assumptions; Continuous outcomes; Epidemiological literature; Linear regression models
2019
Estimating Outcome-Exposure Associations when Exposure Biomarker Detection Limits vary Across Batches.
Boss J, Mukherjee B, Ferguson K, Aker A, Alshawabkeh A, Cordero J, Meeker J, Kim S. Estimating Outcome-Exposure Associations when Exposure Biomarker Detection Limits vary Across Batches. Epidemiology 2019, 30: 746-755. PMID: 31299670, PMCID: PMC6677587, DOI: 10.1097/ede.0000000000001052.
Peer-Reviewed Original Research
MeSH Keywords: Biomarkers; Computer Simulation; Data Interpretation, Statistical; Environmental Exposure; Epidemiologic Research Design; Humans; Limit of Detection; Models, Statistical
Concepts: Binary outcome data; Likelihood-based methods; Complete-case analysis; Distributional assumptions; Assignment of samples; Superior estimation properties; Simulation study; Complete-case; Multiple imputation strategy; Exposure data; Multiple batches; Batch assignment; Estimated properties; Limit-variables; Single imputation; Multiple imputation; Cohort study
2018
Foetal ultrasound measurement imputations based on growth curves versus multiple imputation chained equation (MICE)
Ferguson K, Yu Y, Cantonwine D, McElrath T, Meeker J, Mukherjee B. Foetal ultrasound measurement imputations based on growth curves versus multiple imputation chained equation (MICE). Paediatric And Perinatal Epidemiology 2018, 32: 469-473. PMID: 30016545, PMCID: PMC6939297, DOI: 10.1111/ppe.12486.
Peer-Reviewed Original Research
MeSH Keywords: Data Interpretation, Statistical; Female; Fetal Growth Retardation; Humans; Linear Models; Longitudinal Studies; Models, Statistical; Pregnancy; Reference Values; Ultrasonography, Prenatal
Concepts: Linear mixed models; Complete-case analysis; Multiple imputation; Epidemiological studies of risk factors; Imputed datasets; Complete-case; Demographic factors; Study of risk factors; LIFECODES birth cohort; Ultrasound measurements; Calculate associations; Birth cohort; Cross-section; Epidemiological studies; Risk factors; Study visits; Longitudinal analysis; Parametric linear mixed model; Imputation; Missing data; Mixed models; Longitudinal measurements; Sample size; Covariate data; Growth restriction
Improving estimation and prediction in linear regression incorporating external information from an established reduced model
Cheng W, Taylor J, Vokonas P, Park S, Mukherjee B. Improving estimation and prediction in linear regression incorporating external information from an established reduced model. Statistics In Medicine 2018, 37: 1515-1530. PMID: 29365342, PMCID: PMC5889759, DOI: 10.1002/sim.7600.
Peer-Reviewed Original Research
MeSH Keywords: Bayes Theorem; Data Interpretation, Statistical; Humans; Linear Models; Models, Statistical; Regression Analysis
Concepts: Outcome variable Y; Efficiency of estimation; Approximate Bayesian inference; Bayes solution; Variable Y; Nonlinear constraints; Inferential framework; Variable B; E(Y|X; Improve inference; Bayesian inference; Effective computational method; Parameter space; Reduced model; Improved estimates; Linear regression models; Transformation approach; Standard error; Dunson; Inference; Estimation; Regression models; Problem; Covariates; Space
2016
Classification and Clustering Methods for Multiple Environmental Factors in Gene–Environment Interaction
Ko Y, Mukherjee B, Smith J, Kardia S, Allison M, Roux A. Classification and Clustering Methods for Multiple Environmental Factors in Gene–Environment Interaction. Epidemiology 2016, 27: 870-878. PMID: 27479650, PMCID: PMC5039086, DOI: 10.1097/ede.0000000000000548.
Peer-Reviewed Original Research
MeSH Keywords: Aged; Aged, 80 and over; Atherosclerosis; Bayes Theorem; Cluster Analysis; Data Interpretation, Statistical; Environmental Exposure; Epidemiologic Research Design; Female; Follow-Up Studies; Gene-Environment Interaction; Genetic Predisposition to Disease; Humans; Middle Aged; Models, Statistical; Regression Analysis; Risk Factors
Concepts: Multiple environmental exposures; Gene-environment interactions; G x E; Environmental exposures; Multiethnic Study of Atherosclerosis; Study of Atherosclerosis; Gene-environment; Effect modification; Multiethnic Study; Environmental factors; Exposure subgroups; Environmental exposure profiles; Main effect; Exposure profiles; E study; Efficient analysis strategy; E analysis; Multiple environmental factors; Subgroups; Analysis strategy; Factors; Exposure; Product terms
Mediation Formula for a Binary Outcome and a Time-Varying Exposure and Mediator, Accounting for Possible Exposure-Mediator Interaction
Chen Y, Mukherjee B, Ferguson K, Meeker J, VanderWeele T. Mediation Formula for a Binary Outcome and a Time-Varying Exposure and Mediator, Accounting for Possible Exposure-Mediator Interaction. American Journal Of Epidemiology 2016, 184: 157-159. PMID: 27325886, PMCID: PMC4945703, DOI: 10.1093/aje/kww045.
Peer-Reviewed Original Research
MeSH Keywords: Data Interpretation, Statistical; Epidemiologic Research Design; Humans; Longitudinal Studies; Models, Statistical; Time Factors
2015
Applying Novel Methods for Assessing Individual- and Neighborhood-Level Social and Psychosocial Environment Interactions with Genetic Factors in the Prediction of Depressive Symptoms in the Multi-Ethnic Study of Atherosclerosis
Ware E, Smith J, Mukherjee B, Lee S, Kardia S, Diez-Roux A. Applying Novel Methods for Assessing Individual- and Neighborhood-Level Social and Psychosocial Environment Interactions with Genetic Factors in the Prediction of Depressive Symptoms in the Multi-Ethnic Study of Atherosclerosis. Behavior Genetics 2015, 46: 89-99. PMID: 26254610, PMCID: PMC4720563, DOI: 10.1007/s10519-015-9734-6.
Peer-Reviewed Original Research
Concepts: Depressive symptom scores; Multi-Ethnic Study of Atherosclerosis; Gene region; Neighborhood level; Multi-Ethnic Study; Predictive of depressive symptoms; Study of Atherosclerosis; Multiple race/ethnicities; Multiple testing correction; Assess individual-; SKAT analysis; Neighborhood factors; Etiology of depressive illness; Depressive symptoms; Psychosocial stressors; Symptom scores; Complex illness; Testing correction; Race/ethnicity; Race/ethnicities; Ethnic groups; Depressive illness; Genetic predisposition; Individual-; Genetic factors
Statistical methods for modeling repeated measures of maternal environmental exposure biomarkers during pregnancy in association with preterm birth
Chen Y, Ferguson K, Meeker J, McElrath T, Mukherjee B. Statistical methods for modeling repeated measures of maternal environmental exposure biomarkers during pregnancy in association with preterm birth. Environmental Health 2015, 14: 9. PMID: 25619201, PMCID: PMC4417225, DOI: 10.1186/1476-069x-14-9.
Peer-Reviewed Original Research
MeSH Keywords: Adult; Age Factors; Biomarkers; Boston; Case-Control Studies; Cross-Sectional Studies; Data Interpretation, Statistical; Environmental Exposure; Female; Hazardous Substances; Humans; Infant, Newborn; Maternal Exposure; Middle Aged; Models, Statistical; Phthalic Acids; Pregnancy; Premature Birth; Socioeconomic Factors; Young Adult
Concepts: Preterm birth; Environmental chemical exposures; Measures of urinary phthalate metabolites; Nested case-control study; Cross-sectional analysis; Average exposure; Measures of exposure; Case-control study; Urinary phthalate metabolites; Model repeated measures; Epidemiological research projects; Longitudinal exposure; Repeated measures; Premature birth; Preterm; Environmental exposure biomarkers; Exposure measurements; Urinary metabolites; Maternal factors; Phthalate metabolites; Pregnancy; Study of phthalates; Longitudinal predictors; Chemical exposure; Birth
2013
Bayesian Analysis of Time-Series Data under Case-Crossover Designs: Posterior Equivalence and Inference
Li S, Mukherjee B, Batterman S, Ghosh M. Bayesian Analysis of Time-Series Data under Case-Crossover Designs: Posterior Equivalence and Inference. Biometrics 2013, 69: 925-936. PMID: 24289144, PMCID: PMC4108592, DOI: 10.1111/biom.12102.
Peer-Reviewed Original Research
Concepts: Semi-parametric Bayesian approach; Likelihood-based approach; Random nuisance parameters; Time series analysis; Frequentist literature; Nuisance parameters; Dirichlet process; Inferential issues; Conditional likelihood; Posterior distribution; Risk function; Time series; Bayesian work; Frequentist approach; Case-crossover design; Simulation study; Restrictive assumptions; Bayesian approach; Time Series Data; Likelihood formulation; Bayesian methods; Equivalent results; Bayesian analysis; Case-crossover; Bayesian framework
2012
On the equivalence of posterior inference based on retrospective and prospective likelihoods: application to a case‐control study of colorectal cancer
Ghosh M, Song J, Forster J, Mitra R, Mukherjee B. On the equivalence of posterior inference based on retrospective and prospective likelihoods: application to a case‐control study of colorectal cancer. Statistics In Medicine 2012, 31: 2196-2208. PMID: 22495822, DOI: 10.1002/sim.5358.
Peer-Reviewed Original Research
MeSH Keywords: Bayes Theorem; Case-Control Studies; Colorectal Neoplasms; Computer Simulation; Data Interpretation, Statistical; Humans; Likelihood Functions; Odds Ratio; Prospective Studies; Retrospective Studies; Risk Factors
Concepts: Posterior inference; Case-control study of colorectal cancer; Odds ratio parameters; Categorical response data; Bayesian analysis of data; Study of colorectal cancer; Case-control study; General class; Prospective likelihood; Simulation study; Categorical responses; Bayesian analysis; Colorectal cancer; Matched case-control study; Inference; Analysis of data; Response data; Priors; Retrospective design; Retrospective model; Equivalence
Likelihood‐based methods for regression analysis with binary exposure status assessed by pooling
Lyles R, Tang L, Lin J, Zhang Z, Mukherjee B. Likelihood‐based methods for regression analysis with binary exposure status assessed by pooling. Statistics In Medicine 2012, 31: 2485-2497. PMID: 22415630, PMCID: PMC3528351, DOI: 10.1002/sim.4426.
Peer-Reviewed Original Research
Concepts: Population-based case-control study of colorectal cancer; Case-control study of colorectal cancer; Population-based case-control study; Study of colorectal cancer; Exposure status; Binary outcomes; Regression models; Case-control sample; Logistic regression models; Gene-disease associations; Observed binary outcome; Study design; Epidemiological studies; Colorectal cancer; Assess exposure; Maximum likelihood analysis; Regression analysis; Likelihood-based methods; Exposure assessment; Maximum likelihood approach; Likelihood approach; Cross-section; Simulation study; Outcomes; Likelihood analysis
Principal interactions analysis for repeated measures data: application to gene–gene and gene–environment interactions
Mukherjee B, Ko Y, VanderWeele T, Roy A, Park S, Chen J. Principal interactions analysis for repeated measures data: application to gene–gene and gene–environment interactions. Statistics In Medicine 2012, 31: 2531-2551. PMID: 22415818, PMCID: PMC4046647, DOI: 10.1002/sim.5315.
Peer-Reviewed Original Research
MeSH Keywords: Age Factors; Biomarkers; Computer Simulation; Data Interpretation, Statistical; Gene-Environment Interaction; Hearing; Humans; Longitudinal Studies; Models, Statistical; Oxidative Stress
Concepts: Gene-environment interactions; Gene-gene; Longitudinal cohort study; Normative Aging Study; Health outcomes; Main effect terms; Measured outcomes; Aging Study; Occupational history; Epistasis models; Environmental exposures; Main effect; Longitudinal nature; Longitudinal data; Resampling-based methods; Cell means; Classification array; Quantitative traits; Interaction analysis; Robust class; Leading eigenvalues; Simulation study; Time-varying effects; Subject-specific; Outcomes
Efficient designs of gene–environment interaction studies: implications of Hardy–Weinberg equilibrium and gene–environment independence
Chen J, Kang G, VanderWeele T, Zhang C, Mukherjee B. Efficient designs of gene–environment interaction studies: implications of Hardy–Weinberg equilibrium and gene–environment independence. Statistics In Medicine 2012, 31: 2516-2530. PMID: 22362617, PMCID: PMC3448495, DOI: 10.1002/sim.4460.
Peer-Reviewed Original Research
MeSH Keywords: Case-Control Studies; Computer Simulation; Data Interpretation, Statistical; Gene-Environment Interaction; Humans; Polymorphism, Single Nucleotide; Research Design
Concepts: Presence of G-E interactions; G-E interactions; Subsample of cases; Gene-environment; Hardy-Weinberg equilibrium; G-E independence; Gene-environment interaction studies; Gene-environment independence; Random subsample; Genetic susceptibility variants; Case-control sample; Environmental risk factors; Susceptibility variants; External control data; Risk factors; Genetic effects; Wald statistic; Interaction studies; Subsample; Variable E; Control data; Environmental effects; Independence; Data; Wald
2011
Sensitivity analysis for interactions under unmeasured confounding
VanderWeele T, Mukherjee B, Chen J. Sensitivity analysis for interactions under unmeasured confounding. Statistics In Medicine 2011, 31: 2552-2564. PMID: 21976358, PMCID: PMC4226658, DOI: 10.1002/sim.4354.
Peer-Reviewed Original Research
Logistic regression analysis of biomarker data subject to pooling and dichotomization
Zhang Z, Liu A, Lyles R, Mukherjee B. Logistic regression analysis of biomarker data subject to pooling and dichotomization. Statistics In Medicine 2011, 31: 2473-2484. PMID: 21953741, DOI: 10.1002/sim.4367.
Peer-Reviewed Original Research
MeSH Keywords: Biomarkers; Colorectal Neoplasms; Computer Simulation; Data Interpretation, Statistical; Humans; Logistic Models; Polymorphism, Single Nucleotide; Prospective Studies
Concepts: Population-based case-control study of colorectal cancer; Case-control study of colorectal cancer; Prospective logistic regression model; Population-based case-control study; Study of colorectal cancer; Epidemiological studies; Logistic regression models; Analysis of epidemiological data; Logistic regression analysis; Binary exposure; Pooled measure; Colorectal cancer; Regression models; Epidemiological data; Regression analysis; Analysis of biomarker data; Disease status; Exposed subjects; Biomarker data; Choice of design; Subjects; Estimated parameters; Status; Recommendations
2009
Shrinkage estimation for robust and efficient screening of single‐SNP association from case‐control genome‐wide association studies
Luo S, Mukherjee B, Chen J, Chatterjee N. Shrinkage estimation for robust and efficient screening of single‐SNP association from case‐control genome‐wide association studies. Genetic Epidemiology 2009, 33: 740-750. PMID: 19434716, PMCID: PMC3103068, DOI: 10.1002/gepi.20428.
Peer-Reviewed Original Research
MeSH Keywords: Case-Control Studies; Computational Biology; Computer Simulation; Data Interpretation, Statistical; False Positive Reactions; Genetic Markers; Genome; Genome, Human; Genome-Wide Association Study; Genotype; Humans; Likelihood Functions; Models, Statistical; Polymorphism, Single Nucleotide; Reproducibility of Results
Concepts: Hardy-Weinberg equilibrium; Association Test; Population-based case-control design; Genome-wide association scan; Genome-wide association studies; Single-SNP associations; Case-control design; Case-control study; Association scans; Association studies; Genetic markers; Susceptibility SNPs; Recessive effect; Underlying population; Association; False-positive results; Efficient screening; SNPs; Rare disease; Shrinkage estimators; Simulation study; Study; Test; Two-degrees-of-freedom; Population
2008
Modeling Unobserved Sources of Heterogeneity in Animal Abundance Using a Dirichlet Process Prior
Dorazio R, Mukherjee B, Zhang L, Ghosh M, Jelks H, Jordan F. Modeling Unobserved Sources of Heterogeneity in Animal Abundance Using a Dirichlet Process Prior. Biometrics 2008, 64: 635-644. PMID: 17680831, DOI: 10.1111/j.1541-0420.2007.00873.x.
Peer-Reviewed Original Research
MeSH Keywords: Animals; Biometry; Computer Simulation; Data Interpretation, Statistical; Demography; Models, Statistical; Population Density
Concepts: Sampling locations; Sampling protocol; Natural populations of animals; Predictions of abundance; Abundance of animals; Distribution of abundance; Endangered fish species; Induce spatial heterogeneity; Animal abundance; Okaloosa Darters; Populations of animals; Unsampled locations; Fish species; Removal sampling; Spatial heterogeneity; Analysis of counts; Abundance; Dirichlet process; Data-adaptive way; Model specification; Sources of heterogeneity; Species; Parametric alternatives; Darters; Parametric model