Featured Publications
A framework for understanding selection bias in real-world healthcare data
Kundu R, Shi X, Morrison J, Barrett J, Mukherjee B. A framework for understanding selection bias in real-world healthcare data. Journal Of The Royal Statistical Society Series A (Statistics In Society) 2024, 187: 606-635. PMID: 39281782, PMCID: PMC11393555, DOI: 10.1093/jrsssa/qnae039.
Peer-Reviewed Original Research
Concepts: Electronic health records, Selection bias, Association of cancer, Multiple sources of bias, Health records, Healthcare system, Sources of bias, Real-world healthcare data, Binary outcomes, Estimation of associated parameters, Healthcare data, Real-world data, Potential bias, Sample size, Standard error, Data example, Variance formula, Analysis of real-world data, Association, Simulation study, Weighting approach, Biological sex, Associated parameters, Bias, Multiple sources
Exploiting Gene-Environment Independence for Analysis of Case–Control Studies: An Empirical Bayes-Type Shrinkage Estimator to Trade-Off Between Bias and Efficiency
Mukherjee B, Chatterjee N. Exploiting Gene-Environment Independence for Analysis of Case–Control Studies: An Empirical Bayes-Type Shrinkage Estimator to Trade-Off Between Bias and Efficiency. Biometrics 2007, 64: 685-694. PMID: 18162111, DOI: 10.1111/j.1541-0420.2007.00953.x.
Peer-Reviewed Original Research
Concepts: Gene-environment independence, Shrinkage estimators, Log odds ratio parameters, Case-control data, Gene-environment independence assumption, Odds ratio parameters, Case-control estimators, Data-adaptive fashion, Data example, Prospective logistic regression analysis, Binary exposure, Gene-environment associations, Independence assumption, Logistic regression analysis, Case-only, Maximum likelihood framework, Estimation, Sample size, Binary genes, Regression analysis, Chatterjee, Examples, Weighted average, Assumptions
2021
A hierarchical integrative group least absolute shrinkage and selection operator for analyzing environmental mixtures
Boss J, Rix A, Chen Y, Narisetty N, Wu Z, Ferguson K, McElrath T, Meeker J, Mukherjee B. A hierarchical integrative group least absolute shrinkage and selection operator for analyzing environmental mixtures. Environmetrics 2021, 32. PMID: 34899005, PMCID: PMC8664243, DOI: 10.1002/env.2698.
Peer-Reviewed Original Research
Concepts: Group least absolute shrinkage, Environmental health studies, Health outcomes, Health Study, LIFECODES birth cohort, Birth cohort, Exposure interactions, Penalized regression methods, Dose-response relationship, Exposure mixtures, Comprehensive R Archive Network, Interaction effects, Induce sparsity, Adaptive weights, Group lasso, Selection operator, Heredity constraint, Least Absolute Shrinkage, Selection framework, Nonlinear interaction effects, Sample size, Variable selection, Joint effects, Coefficient estimates, Group structure
Revisiting the genome-wide significance threshold for common variant GWAS
Chen Z, Boehnke M, Wen X, Mukherjee B. Revisiting the genome-wide significance threshold for common variant GWAS. G3: Genes, Genomes, Genetics 2021, 11: jkaa056. PMID: 33585870, PMCID: PMC8022962, DOI: 10.1093/g3journal/jkaa056.
Peer-Reviewed Original Research
Concepts: Genome-wide significance threshold, P-value threshold, GWAS meta-analyses, Meta-analysis consortium, Excessive false positive rates, Significance threshold, Gene set enrichment, Benjamini-Yekutieli procedure, Modest-sized studies, FDR-controlling procedures, Global lipids, Meta-analyses, Pathway analysis, GWAS, Replication study, P-value, Increased discovery, Multiple testing strategies, Sample size, Positive discoveries, Benjamini-Hochberg, Lipid levels, Testing strategies, Downstream work, FDR
2018
Foetal ultrasound measurement imputations based on growth curves versus multiple imputation chained equation (MICE)
Ferguson K, Yu Y, Cantonwine D, McElrath T, Meeker J, Mukherjee B. Foetal ultrasound measurement imputations based on growth curves versus multiple imputation chained equation (MICE). Paediatric And Perinatal Epidemiology 2018, 32: 469-473. PMID: 30016545, PMCID: PMC6939297, DOI: 10.1111/ppe.12486.
Peer-Reviewed Original Research
Concepts: Linear mixed models, Complete-case analysis, Multiple imputation, Epidemiological studies of risk factors, Imputed datasets, Complete-case, Demographic factors, Study of risk factors, LIFECODES birth cohort, Ultrasound measurements, Calculate associations, Birth cohort, Cross-section, Epidemiological studies, Risk factors, Study visits, Longitudinal analysis, Parametric linear mixed model, Imputation, Missing data, Mixed models, Longitudinal measurements, Sample size, Covariate data, Growth restriction
2016
Increasing efficiency for estimating treatment–biomarker interactions with historical data
Boonstra P, Taylor J, Mukherjee B. Increasing efficiency for estimating treatment–biomarker interactions with historical data. Statistical Methods In Medical Research 2016, 25: 2959-2971. PMID: 24855118, PMCID: PMC5450810, DOI: 10.1177/0962280214535370.
Peer-Reviewed Original Research
2013
Statistical strategies for constructing health risk models with multiple pollutants and their interactions: possible choices and comparisons
Sun Z, Tao Y, Li S, Ferguson K, Meeker J, Park S, Batterman S, Mukherjee B. Statistical strategies for constructing health risk models with multiple pollutants and their interactions: possible choices and comparisons. Environmental Health 2013, 12: 85. PMID: 24093917, PMCID: PMC3857674, DOI: 10.1186/1476-069x-12-85.
Peer-Reviewed Original Research
Concepts: Multipollutant models, Health impacts of environmental factors, Effect estimates, Exposure-response associations, Exposure to multiple pollutants, Time series design, Consequence of environmental exposure, Sample size, Health impacts, Environmental exposures, Presence of multicollinearity, Risk prediction, Potential interactive effects, Initial screening, Pollutant mixtures, Impact of environmental factors, Supervised principal component analysis, Model dimensions, Statistical literature, Data examples, Tree-based methods, Multiple pollutants, Variable selection, Simulation study, Reduce model dimension
Environmental Confounding in Gene-Environment Interaction Studies
Vanderweele T, Ko Y, Mukherjee B. Environmental Confounding in Gene-Environment Interaction Studies. American Journal Of Epidemiology 2013, 178: 144-152. PMID: 23821317, PMCID: PMC3698991, DOI: 10.1093/aje/kws439.
Peer-Reviewed Original Research
Concepts: Gene-environment independence, Gene-environment interaction studies, Gene-environment interactions, Environmental confounders, Genetic factors, Joint test, Gene-environment, Genetic effects, Environmental factors, Confounding variables, Confounding, Interaction studies, Simulation study, Joint null, Sample size, Bias estimates, Factors, Independence, Study, Test
2006
A Score Test for Determining Sample Size in Matched Case‐Control Studies with Categorical Exposure
Sinha S, Mukherjee B. A Score Test for Determining Sample Size in Matched Case‐Control Studies with Categorical Exposure. Biometrical Journal 2006, 48: 35-53. PMID: 16544811, DOI: 10.1002/bimj.200510200.
Peer-Reviewed Original Research
Concepts: Case-control study, Categorical exposure, Matched case-control study, Score test, Dichotomous exposure, Null hypothesis, Exposure variables, Odds ratio, Natural order, Disease-gene associations, Matched sets, Disease risk, Colorectal cancer, Power function, Sample size, Association, Odds, Generalization, Disease, Sets, Scores, Estimation, Exposure, Study, Risk
2004
Bayesian Semiparametric Modeling for Matched Case–Control Studies with Multiple Disease States
Sinha S, Mukherjee B, Ghosh M. Bayesian Semiparametric Modeling for Matched Case–Control Studies with Multiple Disease States. Biometrics 2004, 60: 41-49. PMID: 15032772, DOI: 10.1111/j.0006-341x.2004.00169.x.
Peer-Reviewed Original Research
Concepts: Semiparametric Bayesian framework, Bayesian semiparametric model, Semiparametric model, Dirichlet process, Stratum effects, Conditional likelihood, Probability of disease development, Bayesian approach, Numerical integration scheme, Bayesian framework, Sample size, Dirichlet, Actual estimation, MLE, Missingness, Markov, Integration scheme, Exposure distribution, Bayesian, Estimation, Regression models, Multiple disease states, Distribution, Probability, Disease states