Featured Publications
Statistical Inference for Association Studies Using Electronic Health Records: Handling Both Selection Bias and Outcome Misclassification
Beesley L, Mukherjee B. Statistical Inference for Association Studies Using Electronic Health Records: Handling Both Selection Bias and Outcome Misclassification. Biometrics 2020, 78: 214-226. PMID: 33179768, DOI: 10.1111/biom.13400.
Peer-Reviewed Original Research
Concepts: Electronic health records; Health records; Electronic health record data analysis; Electronic health record settings; Selection bias; Michigan Genomics Initiative; Association studies; EHR-linked; Health research; Inverse probability weighting method; Study sample; Effect estimates; Probability weighting method; Lack of representativeness; Type I error; Survey sampling literature; Standard error estimates; Gold standard labels; Disease status; Error estimates; Statistical inference; Misclassification; Inference strategy; Sampling literature; Standard labels
Exploiting Gene-Environment Independence for Analysis of Case–Control Studies: An Empirical Bayes-Type Shrinkage Estimator to Trade-Off Between Bias and Efficiency
Mukherjee B, Chatterjee N. Exploiting Gene-Environment Independence for Analysis of Case–Control Studies: An Empirical Bayes-Type Shrinkage Estimator to Trade-Off Between Bias and Efficiency. Biometrics 2007, 64: 685-694. PMID: 18162111, DOI: 10.1111/j.1541-0420.2007.00953.x.
Peer-Reviewed Original Research
Concepts: Gene-environment independence; Shrinkage estimators; Log odds ratio parameters; Case-control data; Gene-environment independence assumption; Odds ratio parameters; Case-control estimators; Data-adaptive fashion; Data example; Prospective logistic regression analysis; Binary exposure; Gene-environment associations; Independence assumption; Logistic regression analysis; Case-only; Maximum likelihood framework; Estimation; Sample size; Binary genes; Regression analysis; Chatterjee; Examples; Weighted average; Assumptions
2020
Methods to Account for Uncertainty in Latent Class Assignments When Using Latent Classes as Predictors in Regression Models, with Application to Acculturation Strategy Measures.
Elliott M, Zhao Z, Mukherjee B, Kanaya A, Needham B. Methods to Account for Uncertainty in Latent Class Assignments When Using Latent Classes as Predictors in Regression Models, with Application to Acculturation Strategy Measures. Epidemiology 2020, 31: 194-204. PMID: 31809338, PMCID: PMC7480960, DOI: 10.1097/ede.0000000000001139.
Peer-Reviewed Original Research
Concepts: Measurement error model; Joint model; Regression parameters; Latent classes; Likelihood-based; Latent class model; Simulation study; Class model; Two-stage model; Class; Error model; Primary interest; Acculturation behaviors; Measurement error; South Asian immigrants; Latent class analysis; Asian immigrants; True class; Uncertainty; Class analysis; Estimation; Strategy measures
2017
Exposure enriched outcome dependent designs for longitudinal studies of gene–environment interaction
Sun Z, Mukherjee B, Estes J, Vokonas P, Park S. Exposure enriched outcome dependent designs for longitudinal studies of gene–environment interaction. Statistics In Medicine 2017, 36: 2947-2960. PMID: 28497531, PMCID: PMC5523112, DOI: 10.1002/sim.7332.
Peer-Reviewed Original Research
Concepts: Longitudinal cohort study; Cohort study; Case-only design; Longitudinal study; G x E interaction; Normative Aging Study; Complete-case analysis; Gene-environment; Sampling design; Case-control; Veterans Administration; Complex human diseases; E interaction; Exposure information; Aging Study; Outcome trajectories; Stratified sampling; Retrospective genotyping; Individual exposure; Covariate data; Exposure effects; Joint effects; Outcomes; Time-varying outcome; Environmental factors
2014
Testing departure from additivity in Tukey's model using shrinkage: application to a longitudinal setting
Ko Y, Mukherjee B, Smith J, Park S, Kardia S, Allison M, Vokonas P, Chen J, Diez‐Roux A. Testing departure from additivity in Tukey's model using shrinkage: application to a longitudinal setting. Statistics In Medicine 2014, 33: 5177-5191. PMID: 25112650, PMCID: PMC4227925, DOI: 10.1002/sim.6281.
Peer-Reviewed Original Research
MeSH Keywords: Aged; Aged, 80 and over; Aging; Atherosclerosis; Bone and Bones; Computer Simulation; Environmental Exposure; Ethnicity; Female; Gene-Environment Interaction; Humans; Iron; Lead; Least-Squares Analysis; Likelihood Functions; Longitudinal Studies; Male; Middle Aged; Models, Genetic; United States; United States Department of Veterans Affairs
Concepts: Gene-environment interactions; Multi-Ethnic Study of Atherosclerosis; Model of gene-environment interaction; Multi-Ethnic Study; Tukey's model; Longitudinal setting; Study of Atherosclerosis; Normative Aging Study; Case-control study; Increasing categories; Aging Study; Tested interactions; Longitudinal study; Categorical variables; Robust to misspecification; Interaction terms; Test departures; Shrinkage estimators; Wald test; Interaction estimates; Increased power; One-degree-of-freedom model; Interaction effects; Sets; Environmental markers
2013
Bayesian Analysis of Time-Series Data under Case-Crossover Designs: Posterior Equivalence and Inference
Li S, Mukherjee B, Batterman S, Ghosh M. Bayesian Analysis of Time-Series Data under Case-Crossover Designs: Posterior Equivalence and Inference. Biometrics 2013, 69: 925-936. PMID: 24289144, PMCID: PMC4108592, DOI: 10.1111/biom.12102.
Peer-Reviewed Original Research
Concepts: Semi-parametric Bayesian approach; Likelihood-based approach; Random nuisance parameters; Time series analysis; Frequentist literature; Nuisance parameters; Dirichlet process; Inferential issues; Conditional likelihood; Posterior distribution; Risk function; Time series; Bayesian work; Frequentist approach; Case-crossover design; Simulation study; Restrictive assumptions; Bayesian approach; Time Series Data; Likelihood formulation; Bayesian methods; Equivalent results; Bayesian analysis; Case-crossover; Bayesian framework
Novel Likelihood Ratio Tests for Screening Gene‐Gene and Gene‐Environment Interactions With Unbalanced Repeated‐Measures Data
Ko Y, Saha‐Chaudhuri P, Park S, Vokonas P, Mukherjee B. Novel Likelihood Ratio Tests for Screening Gene‐Gene and Gene‐Environment Interactions With Unbalanced Repeated‐Measures Data. Genetic Epidemiology 2013, 37: 581-591. PMID: 23798480, PMCID: PMC4009698, DOI: 10.1002/gepi.21744.
Peer-Reviewed Original Research
Concepts: Gene-environment interactions; Gene-gene interactions; Testing gene-gene interactions; Model gene-gene interactions; Repeated-measures study; Longitudinal cohort study; Normative Aging Study; Cumulative lead exposure; Case-control study; Gene-environment; Gene-gene; Type I error rate; Cohort study; Screening tool; Aging Study; Likelihood ratio test; Main effect; Epistasis patterns; Ratio test; Lead exposure; Hemochromatosis gene; Power properties; Pulse pressure; Regression-based approach; Restrictive assumptions
2012
Point source modeling of matched case–control data with multiple disease subtypes
Li S, Mukherjee B, Batterman S. Point source modeling of matched case–control data with multiple disease subtypes. Statistics In Medicine 2012, 31: 3617-3637. PMID: 22826092, PMCID: PMC4331356, DOI: 10.1002/sim.5388.
Peer-Reviewed Original Research
Concepts: Adjacent-category logit model; Markov chain Monte Carlo techniques; Evaluate maximum likelihood; Extensive simulation study; Profile likelihood; Hierarchical Bayesian approach; Case-control data; Simulation study; Bayesian approach; Monte Carlo technique; Bayesian methods; Maximum likelihood; Multiple disease subtypes; Categorical outcomes; Covariate adjustment; Nonlinear model; Estimation stability; Medicaid claims data; Case-control design; Pediatric asthma population; Asthma population; Elevated odds; Markov; Logit model; Covariates
On the equivalence of posterior inference based on retrospective and prospective likelihoods: application to a case‐control study of colorectal cancer
Ghosh M, Song J, Forster J, Mitra R, Mukherjee B. On the equivalence of posterior inference based on retrospective and prospective likelihoods: application to a case‐control study of colorectal cancer. Statistics In Medicine 2012, 31: 2196-2208. PMID: 22495822, DOI: 10.1002/sim.5358.
Peer-Reviewed Original Research
Concepts: Posterior inference; Case-control study of colorectal cancer; Odds ratio parameters; Categorical response data; Bayesian analysis of data; Study of colorectal cancer; Case-control study; General class; Prospective likelihood; Simulation study; Categorical responses; Bayesian analysis; Colorectal cancer; Matched case-control study; Inference; Analysis of data; Response data; Priors; Retrospective design; Retrospective model; Equivalence
Likelihood‐based methods for regression analysis with binary exposure status assessed by pooling
Lyles R, Tang L, Lin J, Zhang Z, Mukherjee B. Likelihood‐based methods for regression analysis with binary exposure status assessed by pooling. Statistics In Medicine 2012, 31: 2485-2497. PMID: 22415630, PMCID: PMC3528351, DOI: 10.1002/sim.4426.
Peer-Reviewed Original Research
Concepts: Population-based case-control study of colorectal cancer; Case-control study of colorectal cancer; Population-based case-control study; Study of colorectal cancer; Exposure status; Binary outcomes; Regression models; Case-control sample; Logistic regression models; Gene-disease associations; Observed binary outcome; Study design; Epidemiological studies; Colorectal cancer; Assess exposure; Maximum likelihood analysis; Regression analysis; Likelihood-based methods; Exposure assessment; Maximum likelihood approach; Likelihood approach; Cross-section; Simulation study; Outcomes; Likelihood analysis
A Bayesian Semiparametric Approach for Incorporating Longitudinal Information on Exposure History for Inference in Case–Control Studies
Bhadra D, Daniels M, Kim S, Ghosh M, Mukherjee B. A Bayesian Semiparametric Approach for Incorporating Longitudinal Information on Exposure History for Inference in Case–Control Studies. Biometrics 2012, 68: 361-370. PMID: 22313248, PMCID: PMC3935236, DOI: 10.1111/j.1541-0420.2011.01686.x.
Peer-Reviewed Original Research
Concepts: Bayesian semiparametric approach; Semiparametric approach; Case-control study; Reversible jump Markov chain Monte Carlo algorithm; Markov chain Monte Carlo algorithm; Measures of cumulative exposure; Longitudinal biomarker information; Monte Carlo algorithm; Clinically meaningful estimates; Smooth functions; Case-control study of prostate cancer; Weighted integrals; Cumulative exposure; Influence function; Joint likelihood; Likelihood formulation; Exposure history; Study of prostate cancer; Disease risk models; Hierarchical Bayesian framework; Disease status; Bayesian framework; Case-control; Risk model; Cohort study
2011
High Risk of Colorectal and Endometrial Cancer in Ashkenazi Families With the MSH2 A636P Founder Mutation
Mukherjee B, Rennert G, Ahn J, Dishon S, Lejbkowicz F, Rennert H, Shiovitz S, Moreno V, Gruber S. High Risk of Colorectal and Endometrial Cancer in Ashkenazi Families With the MSH2 A636P Founder Mutation. Gastroenterology 2011, 140: 1919-1926. PMID: 21419771, PMCID: PMC4835182, DOI: 10.1053/j.gastro.2011.02.071.
Peer-Reviewed Original Research
MeSH Keywords: Adult; Age Factors; Aged; Aged, 80 and over; Case-Control Studies; Colorectal Neoplasms, Hereditary Nonpolyposis; Endometrial Neoplasms; Female; Founder Effect; Gene Frequency; Genetic Predisposition to Disease; Genetic Testing; Heredity; Humans; Israel; Jews; Likelihood Functions; Male; Mass Screening; Middle Aged; Mutation; MutS Homolog 2 Protein; Pedigree; Penetrance; Phenotype; Proportional Hazards Models; Registries; Risk Assessment; Risk Factors; Sex Factors; Young Adult
Concepts: Risk of colorectal cancer; Hazard ratio; Colorectal cancer; Cumulative risk; Population-based; Lifetime risk of colorectal cancer; Cumulative risk of colorectal cancer; Estimates of colorectal cancer; Age-specific cumulative risk; High risk of colorectal; Cases of colorectal cancer; Modified segregation analysis; Risk of colorectal; Clinical genetics services; Clinic-based sample; Endometrial cancer; Risk of EC; Case-control study; Genetic services; Lynch syndrome; Cancer screening; EC risk; Lifetime risk; Ashkenazi families; Estimated penetrance
2009
Shrinkage estimation for robust and efficient screening of single‐SNP association from case‐control genome‐wide association studies
Luo S, Mukherjee B, Chen J, Chatterjee N. Shrinkage estimation for robust and efficient screening of single‐SNP association from case‐control genome‐wide association studies. Genetic Epidemiology 2009, 33: 740-750. PMID: 19434716, PMCID: PMC3103068, DOI: 10.1002/gepi.20428.
Peer-Reviewed Original Research
MeSH Keywords: Case-Control Studies; Computational Biology; Computer Simulation; Data Interpretation, Statistical; False Positive Reactions; Genetic Markers; Genome; Genome, Human; Genome-Wide Association Study; Genotype; Humans; Likelihood Functions; Models, Statistical; Polymorphism, Single Nucleotide; Reproducibility of Results
Concepts: Hardy-Weinberg equilibrium; Association Test; Population-based case-control design; Genome-wide association scan; Genome-wide association studies; Single-SNP associations; Case-control design; Case-control study; Association scans; Association studies; Genetic markers; Susceptibility SNPs; Recessive effect; Underlying population; Association; False-positive results; Efficient screening; SNPs; Rare disease; Shrinkage estimators; Simulation study; Study; Test; Two-degrees-of-freedom; Population
2008
Semiparametric Bayesian modeling of random genetic effects in family‐based association studies
Zhang L, Mukherjee B, Hu B, Moreno V, Cooney K. Semiparametric Bayesian modeling of random genetic effects in family‐based association studies. Statistics In Medicine 2008, 28: 113-139. PMID: 18792083, PMCID: PMC2684653, DOI: 10.1002/sim.3413.
Peer-Reviewed Original Research
Concepts: Random effects distribution; Random effects parameters; Bayesian approach; Problem of estimating covariance; Sensitive to parametric specification; Semiparametric Bayesian model; Nonparametric Bayesian approach; Fixed covariate effects; Flexible Bayesian approach; Effective distribution; Integrated likelihood; Dirichlet process; Covariate effects; Nonparametric model; Bayesian paradigm; Parametric specification; Hierarchical Bayesian paradigm; Bayes methodology; Inference problem; Simulation study; Random genetic effects; Computational advantages; Correlation structure; Numerical integration scheme; Theoretical sense
Fitting stratified proportional odds models by amalgamating conditional likelihoods
Mukherjee B, Ahn J, Liu I, Rathouz P, Sánchez B. Fitting stratified proportional odds models by amalgamating conditional likelihoods. Statistics In Medicine 2008, 27: 4950-4971. PMID: 18618428, PMCID: PMC3085191, DOI: 10.1002/sim.3325.
Peer-Reviewed Original Research
Concepts: Nuisance parameters; Conditional likelihood; Proportional odds model; Stratum-specific nuisance parameters; Cumulative logit model; Stratum-specific intercepts; General regression framework; Multiple ordered categories; Odds model; Continuous covariates; Sandwich estimator; Data examples; Binary exposure; Robust sandwich estimator; Likelihood principle; Proportional odds; Standard software; Regression framework; Natural choice; Outcome model; Estimation; Classical methods; Stratified data; Logistic regression models; Random-effects model
Inference of the Haplotype Effect in a Matched Case-Control Study Using Unphased Genotype Data
Sinha S, Gruber S, Mukherjee B, Rennert G. Inference of the Haplotype Effect in a Matched Case-Control Study Using Unphased Genotype Data. The International Journal Of Biostatistics 2008, 4: article 6. PMID: 20231916, PMCID: PMC2835450, DOI: 10.2202/1557-4679.1079.
Peer-Reviewed Original Research
Concepts: Case-control study; Unphased genotype data; Hardy-Weinberg equilibrium; Locus-specific genotype data; Genotype data; Beta-Carotene Cancer Prevention Study; Cancer Prevention Study; Case-control study design; Study of breast cancer patients; Matched case-control study; Case-control design; Phasing of haplotypes; Disease risk models; Breast cancer patients; Prevention Study; Haplotype effects; Study design; Gametic phase; Polymorphic loci; Haplotype frequencies; Cancer patients; Loci; Conditional likelihood approach; Association; Haplotypes
2007
Analysis of matched case–control data with multiple ordered disease states: possible choices and comparisons
Mukherjee B, Liu I, Sinha S. Analysis of matched case–control data with multiple ordered disease states: possible choices and comparisons. Statistics In Medicine 2007, 26: 3240-3257. PMID: 17206600, DOI: 10.1002/sim.2790.
Peer-Reviewed Original Research
MeSH Keywords: Case-Control Studies; Data Interpretation, Statistical; Disease Progression; Health Status; Humans; Likelihood Functions; Logistic Models; Odds Ratio; United States
Concepts: Conditional logistic regression; Stratum-specific nuisance parameters; Case-control data; Adjacent-category logit model; Case-control study; Ordered categorical data; Conditional-likelihood approach; Likelihood-based approach; Nuisance parameters; Proportional-odds model; Cumulative logits; Simulation study; Analyse such data; Mantel-Haenszel approach; Cumulative logit model; Natural order; Potential risk factors; Stages of cancer; Reference category; Categorical data; Logistic regression; Ordinal nature; Effect of potential risk factors; Low birthweight; Risk factors
2004
Bayesian Semiparametric Modeling for Matched Case–Control Studies with Multiple Disease States
Sinha S, Mukherjee B, Ghosh M. Bayesian Semiparametric Modeling for Matched Case–Control Studies with Multiple Disease States. Biometrics 2004, 60: 41-49. PMID: 15032772, DOI: 10.1111/j.0006-341x.2004.00169.x.
Peer-Reviewed Original Research
Concepts: Semiparametric Bayesian framework; Bayesian semiparametric model; Semiparametric model; Dirichlet process; Stratum effects; Conditional likelihood; Probability of disease development; Bayesian approach; Numerical integration scheme; Bayesian framework; Sample size; Dirichlet; Actual estimation; MLE; Missingness; Markov; Integration scheme; Exposure distribution; Bayesian; Estimation; Regression models; Multiple disease states; Distribution; Probability; Disease states