Featured Publications
Statistical Inference for Association Studies Using Electronic Health Records: Handling Both Selection Bias and Outcome Misclassification
Beesley L, Mukherjee B. Statistical Inference for Association Studies Using Electronic Health Records: Handling Both Selection Bias and Outcome Misclassification. Biometrics 2020, 78: 214-226. PMID: 33179768, DOI: 10.1111/biom.13400.
Peer-Reviewed Original Research
Concepts: Electronic health records; Health records; Electronic health record data analysis; Electronic health record settings; Selection bias; Michigan Genomics Initiative; Association studies; EHR-linked; Health research; Inverse probability weighting method; Study sample; Effect estimates; Probability weighting method; Lack of representativeness; Type I error; Survey sampling literature; Standard error estimates; Gold standard labels; Disease status; Error estimates; Statistical inference; Misclassification; Inference strategy; Sampling literature; Standard labels
Association of Polygenic Risk Scores for Multiple Cancers in a Phenome-wide Study: Results from The Michigan Genomics Initiative
Fritsche L, Gruber S, Wu Z, Schmidt E, Zawistowski M, Moser S, Blanc V, Brummett C, Kheterpal S, Abecasis G, Mukherjee B. Association of Polygenic Risk Scores for Multiple Cancers in a Phenome-wide Study: Results from The Michigan Genomics Initiative. American Journal Of Human Genetics 2018, 102: 1048-1061. PMID: 29779563, PMCID: PMC5992124, DOI: 10.1016/j.ajhg.2018.04.001.
Peer-Reviewed Original Research
Concepts: Polygenic risk scores; Electronic health records; Associations of polygenic risk scores; Phenome-wide significant associations; Polygenic risk score associations; Longitudinal biorepository effort; Non-cancer diagnoses; Patients' electronic health records; Phenome-wide association study; Analysis of temporal order; Michigan Genomics Initiative; Risk score; Associated with multiple phenotypes; Female breast cancer; NHGRI-EBI Catalog; Risk profile; Genetic risk profiles; Measures of genomic variation; Cancer traits; Case-control study; PheWAS analysis; Health records; Health system; Michigan Medicine; Cancer diagnosis
A framework for understanding selection bias in real-world healthcare data
Kundu R, Shi X, Morrison J, Barrett J, Mukherjee B. A framework for understanding selection bias in real-world healthcare data. Journal Of The Royal Statistical Society Series A (Statistics In Society) 2024, 187: 606-635. PMID: 39281782, PMCID: PMC11393555, DOI: 10.1093/jrsssa/qnae039.
Peer-Reviewed Original Research
Concepts: Electronic health records; Selection bias; Association of cancer; Multiple sources of bias; Health records; Healthcare system; Sources of bias; Real-world healthcare data; Binary outcomes; Estimation of associated parameters; Healthcare data; Real-world data; Potential bias; Sample size; Standard error; Data example; Variance formula; Analysis of real-world data; Association; Simulation study; Weighting approach; Biological sex; Associated parameters; Bias; Multiple sources
2024
PATIENT RECRUITMENT USING ELECTRONIC HEALTH RECORDS UNDER SELECTION BIAS: A TWO-PHASE SAMPLING FRAMEWORK.
Zhang G, Beesley L, Mukherjee B, Shi X. PATIENT RECRUITMENT USING ELECTRONIC HEALTH RECORDS UNDER SELECTION BIAS: A TWO-PHASE SAMPLING FRAMEWORK. The Annals Of Applied Statistics 2024, 18: 1858-1878. PMID: 39149424, PMCID: PMC11323140, DOI: 10.1214/23-aoas1860.
Peer-Reviewed Original Research
2023
Using Multi-Modal Electronic Health Record Data for the Development and Validation of Risk Prediction Models for Long COVID Using the Super Learner Algorithm
Jin W, Hao W, Shi X, Fritsche L, Salvatore M, Admon A, Friese C, Mukherjee B. Using Multi-Modal Electronic Health Record Data for the Development and Validation of Risk Prediction Models for Long COVID Using the Super Learner Algorithm. Journal Of Clinical Medicine 2023, 12: 7313. PMID: 38068365, PMCID: PMC10707399, DOI: 10.3390/jcm12237313.
Peer-Reviewed Original Research
Concepts: Composite risk score; Risk score; Electronic health records; Analyses identified several factors; Validation of risk prediction models; Moderate discriminatory ability; Risk prediction model; Post-acute sequelae of COVID-19; Health records; Combined risk score; Post-acute; Identification of individuals; Prevention efforts; Super Learner algorithm; Medical records; Healthcare challenges; Public health; Medical phenotypes; COVID-19; Increased risk; Predictive factors; COVID-19 infection; Record Data; Post-acute sequelae; High risk
Cohort profile: Epidemiologic Questionnaire (EPI-Q) – a scalable, app-based health survey linked to electronic health record and genotype data
Salvatore M, Clark-Boucher D, Fritsche L, Ortlieb J, Houghtby J, Driscoll A, Caldwell-Larkins B, Smith J, Brummett C, Kheterpal S, Lisabeth L, Mukherjee B. Cohort profile: Epidemiologic Questionnaire (EPI-Q) – a scalable, app-based health survey linked to electronic health record and genotype data. Epidemiology And Health 2023, 45: e2023074. PMID: 37591787, PMCID: PMC10867525, DOI: 10.4178/epih.e2023074.
Peer-Reviewed Original Research
Concepts: Electronic health records; Health records; Self-reported health data; Family health history; Epidemiological questionnaire; Cancer screening; Health cohort; Health Survey; Health history; Financial toxicity; Baseline survey; EHR data; Health data; Cohort data; EPI-Q; Average age; Occupational exposure; Genotype data; Participants; Genotype information; Institutional review board approval; Response rate; Cohort; Life meaning; Questionnaire
2022
Case studies in bias reduction and inference for electronic health record data with selection bias and phenotype misclassification
Beesley L, Mukherjee B. Case studies in bias reduction and inference for electronic health record data with selection bias and phenotype misclassification. Statistics In Medicine 2022, 41: 5501-5516. PMID: 36131394, PMCID: PMC9826451, DOI: 10.1002/sim.9579.
Peer-Reviewed Original Research
Concepts: Electronic health records; Electronic health record data analysis; Electronic health record settings; Leverages external data sources; Electronic health record data; Population-based data sources; EHR-based research; Longitudinal health information; University of Michigan Health System; Health record data; Selection bias; Population-based research; Michigan Health System; Multiple sources of bias; Factors related to selection; Patient-level data; Health records; Health system; Health information; Phenotype misclassification; Summary estimates; Phenotyping errors; Cancer diagnosis; Sources of bias; Record data
Estimating COVID-19 Vaccination and Booster Effectiveness Using Electronic Health Records From an Academic Medical Center in Michigan
Roberts E, Gu T, Wagner A, Mukherjee B, Fritsche L. Estimating COVID-19 Vaccination and Booster Effectiveness Using Electronic Health Records From an Academic Medical Center in Michigan. AJPM Focus 2022, 1: 100015. PMID: 36942016, PMCID: PMC9323299, DOI: 10.1016/j.focus.2022.100015.
Peer-Reviewed Original Research
Concepts: Intensive care unit admission; Electronic health record data; Health record data; Electronic health records; Medical Center; Unit admission; Academic medical center; Odds of vaccination; Health records; Severe COVID-19 outcomes; Affluent areas; Healthcare workers; Study design; Record data; Calendar quarter; COVID-19; COVID-19 outcomes; Disease overall; University of Michigan Medical Center; Observational study; Severe COVID-19; SARS-CoV-2 infection; Vaccine effectiveness; Booster status; Ongoing surveillance
Assessing the added value of linking electronic health records to improve the prediction of self-reported COVID-19 testing and diagnosis
Clark-Boucher D, Boss J, Salvatore M, Smith J, Fritsche L, Mukherjee B. Assessing the added value of linking electronic health records to improve the prediction of self-reported COVID-19 testing and diagnosis. PLOS ONE 2022, 17: e0269017. PMID: 35877617, PMCID: PMC9312965, DOI: 10.1371/journal.pone.0269017.
Peer-Reviewed Original Research
Concepts: Electronic health records; Health records; COVID-19-related outcomes; COVID-19 testing; Survey respondents; Self-reported outcomes; Self-reported data; COVID-19 outcomes; Electronic records; Survey data; COVID-19; Prediction model; Model context; Survey; COVID-19 diagnosis; Outcomes; Predictor variables; Digital survey; Data sources; Coronavirus disease 2019; Respondents; Predictors; COVID-19 cases; Diagnosis; Records
Polygenic Liability to Depression Is Associated With Multiple Medical Conditions in the Electronic Health Record: Phenome-wide Association Study of 46,782 Individuals
Fang Y, Fritsche L, Mukherjee B, Sen S, Richmond-Rakerd L. Polygenic Liability to Depression Is Associated With Multiple Medical Conditions in the Electronic Health Record: Phenome-wide Association Study of 46,782 Individuals. Biological Psychiatry 2022, 92: 923-931. PMID: 35965108, PMCID: PMC10712651, DOI: 10.1016/j.biopsych.2022.06.004.
Peer-Reviewed Original Research
Concepts: Phenome-wide association study; Polygenic risk scores; MDD PRS; Health records; Risk score; Association studies; Genome-wide polygenic risk score; Associated with multiple medical conditions; Measures of genetic risk; Michigan Genomics Initiative; Psychiatric traits; Electronic health records; European ancestry participants; Major depressive disorder; Associated with tobacco use disorder; Tests of association; Multiple medical conditions; Genitourinary conditions; Tobacco use disorder; Disease-associated disability; Molecular genetic tools; Molecular genetic discoveries; Psychiatric disease categories; Health outcomes; Substance-related disorders
2021
On cross-ancestry cancer polygenic risk scores
Fritsche L, Ma Y, Zhang D, Salvatore M, Lee S, Zhou X, Mukherjee B. On cross-ancestry cancer polygenic risk scores. PLOS Genetics 2021, 17: e1009670. PMID: 34529658, PMCID: PMC8445431, DOI: 10.1371/journal.pgen.1009670.
Peer-Reviewed Original Research
Concepts: Polygenic risk scores; Genome-wide association studies; Prostate cancer polygenic risk scores; Polygenic risk score distribution; Recruitment of diverse participants; Ancestry groups; Polygenic risk score methods; Risk score; Non-genetic risk factors; Electronic health records; Breast cancer cases; Health records; UK Biobank; GWAS efforts; Disease risk assessment; Cancer cases; Association studies; Genetic data; European ancestry; Personalized risk stratification; Summary statistics; Risk factors; Ancestry; Diverse participants; Field of cancer
2020
Phenotype risk scores (PheRS) for pancreatic cancer using time-stamped electronic health record data: Discovery and validation in two large biobanks
Salvatore M, Beesley L, Fritsche L, Hanauer D, Shi X, Mondul A, Pearce C, Mukherjee B. Phenotype risk scores (PheRS) for pancreatic cancer using time-stamped electronic health record data: Discovery and validation in two large biobanks. Journal Of Biomedical Informatics 2020, 113: 103652. PMID: 33279681, PMCID: PMC7855433, DOI: 10.1016/j.jbi.2020.103652.
Peer-Reviewed Original Research
Concepts: Electronic health records; Polygenic risk scores; Electronic health record data; Michigan Genomics Initiative; Phenotype risk score; High-risk individuals; Pancreatic cancer diagnosis; Body mass index; Risk score; Cancer diagnosis; Medical phenome; UK Biobank (UKB; Health record data; Source of patient information; Risk prediction; Hypothesis-generating associations; Disease risk prediction; Health records; Unadjusted associations; Drinking status; Smoking status; Epidemiological covariates; UKB; Patient information; Multivariate associations
A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank
Bi W, Fritsche L, Mukherjee B, Kim S, Lee S. A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank. American Journal Of Human Genetics 2020, 107: 222-233. PMID: 32589924, PMCID: PMC7413891, DOI: 10.1016/j.ajhg.2020.06.003.
Peer-Reviewed Original Research
Concepts: Controlled type I error rates; Time-to-event data analysis; Type I error rate; Genetic studies of human diseases; Genome-wide significance level; Time-to-event phenotypes; Saddlepoint approximation; Genome-wide analysis; European ancestry samples; Minor allele frequency; Study of human disease; Electronic health records; Cox PH regression model; Regression models; Standard Wald test; Proportional hazards; Binary phenotypes; Data analysis; Ancestry samples; Genetic studies; Health records; UK Biobank; Allele frequencies; Inpatient data; Cox proportional hazards
An analytic framework for exploring sampling and observation process biases in genome and phenome‐wide association studies using electronic health records
Beesley L, Fritsche L, Mukherjee B. An analytic framework for exploring sampling and observation process biases in genome and phenome‐wide association studies using electronic health records. Statistics In Medicine 2020, 39: 1965-1979. PMID: 32198773, DOI: 10.1002/sim.8524.
Peer-Reviewed Original Research
Concepts: Electronic health records; Health records; Association studies; Observational health care databases; Electronic health record data; Longitudinal biorepository effort; Phenome-wide association study; Michigan Genomics Initiative; Health record data; Health care databases; Disease-gene association studies; Michigan Health System; Care database; Health system; Phenotype misclassification; Study bias; Record data; Nonprobability sampling; Association analysis; Data sources; Genome Initiative; Misclassification; Analysis approach; Records; Sensitivity analysis
2019
The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities
Beesley L, Salvatore M, Fritsche L, Pandit A, Rao A, Brummett C, Willer C, Lisabeth L, Mukherjee B. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities. Statistics In Medicine 2019, 39: 773-800. PMID: 31859414, PMCID: PMC7983809, DOI: 10.1002/sim.8445.
Peer-Reviewed Original Research
Concepts: Electronic health records; Health records; Michigan Genomics Initiative; Biobank-based studies; Health-related research; UK Biobank; Health research; Disease-gene associations; Study design; Agnostic search; Biobank; Disease-treatment; Informatics infrastructure; Hypothesis-generating study; Phenotypic identification; Genome Initiative; Missing data; Resource catalog; Exploratory questions; Current body; Biobank research; Data types; Medical research; Recruitment mechanisms; Practical guidance
Exploring various polygenic risk scores for skin cancer in the phenomes of the Michigan genomics initiative and the UK Biobank with a visual catalog: PRSWeb
Fritsche L, Beesley L, VandeHaar P, Peng R, Salvatore M, Zawistowski M, Taliun S, Das S, LeFaive J, Kaleba E, Klumpner T, Moser S, Blanc V, Brummett C, Kheterpal S, Abecasis G, Gruber S, Mukherjee B. Exploring various polygenic risk scores for skin cancer in the phenomes of the Michigan genomics initiative and the UK Biobank with a visual catalog: PRSWeb. PLOS Genetics 2019, 15: e1008202. PMID: 31194742, PMCID: PMC6592565, DOI: 10.1371/journal.pgen.1008202.
Peer-Reviewed Original Research
Concepts: Michigan Genomics Initiative; Electronic health records; Polygenic risk scores; Skin cancer subtypes; PheWAS results; UK Biobank; Electronic health record data; Longitudinal biorepository effort; Phenome-wide association study; Risk score; Health record data; UK Biobank data; Prediction of disease risk; Publicly-available sources; Health records; Genetic architecture; Biobank data; Michigan Medicine; Record data; Secondary phenotypes; Disease risk; Visual catalog; Association studies; Genome Initiative; PheWAS