2022
An unbiased kinship estimation method for genetic data analysis
Jiang W, Zhang X, Li S, Song S, Zhao H. An unbiased kinship estimation method for genetic data analysis. BMC Bioinformatics 2022, 23: 525. PMID: 36474154, PMCID: PMC9727941, DOI: 10.1186/s12859-022-05082-2. Peer-Reviewed Original Research
Concepts: Rigorous mathematical proof, Genetic data analysis, Real data analysis, Unbiased estimation method, Estimation method, Individual-level genotype data, Sample correlation coefficient, Mathematical proof, Mathematical derivation, Mean square error, Coefficient estimation, Matrix method, Estimation accuracy, Estimation bias, Heritability estimation, Root mean square error, Data analysis, Square error, Accurate estimates, Estimation, UKIN, Variances of genotypes, Spurious associations, Kinship coefficients, Estimates
A Zero-Inflated Logistic Normal Multinomial Model for Extracting Microbial Compositions
Zeng Y, Pang D, Zhao H, Wang T. A Zero-Inflated Logistic Normal Multinomial Model for Extracting Microbial Compositions. Journal Of The American Statistical Association 2022, 118: 2356-2369. DOI: 10.1080/01621459.2022.2044827. Peer-Reviewed Original Research
Concepts: Maximum likelihood estimation, Efficient iterative algorithm, Probabilistic PCA models, Empirical Bayes approach, Approximation estimator, Variational approximation, Excessive zeros, M-estimation, Asymptotic normality, Iterative algorithm, Likelihood estimation, Bayes approach, Count data, High dimensionality, Raw count data, Multinomial model, Extensive simulations, Zeros, Supplementary material, Microbiome data, Compositional nature, Estimation, PCA model, Composition estimation, Approximation
Variance estimation and confidence intervals from genome-wide association studies through high-dimensional misspecified mixed model analysis
Dao C, Jiang J, Paul D, Zhao H. Variance estimation and confidence intervals from genome-wide association studies through high-dimensional misspecified mixed model analysis. Journal Of Statistical Planning And Inference 2022, 220: 15-23. PMID: 37089275, PMCID: PMC10121196, DOI: 10.1016/j.jspi.2022.01.003.Peer-Reviewed Original Research
2019
Sparse principal component analysis with missing observations
Park S, Zhao H. Sparse principal component analysis with missing observations. The Annals Of Applied Statistics 2019, 13: 1016-1042. DOI: 10.1214/18-aoas1220. Peer-Reviewed Original Research
Concepts: High-dimensional settings, Principal subspace, Step estimation procedure, Rate of convergence, Sparse principal component analysis, Dimensional setting, Simulated examples, Missing observations, Statistical methods, Estimation procedure, Sparse PCA methods, Single-cell data, Subspace, PCA method, Single-cell RNA-sequencing data, Number of features, Competitive performance, Principal component analysis, Convergence, Sample size, Estimation, Wide range, Component analysis
Prediction Analysis for Microbiome Sequencing Data
Wang T, Yang C, Zhao H. Prediction Analysis for Microbiome Sequencing Data. Biometrics 2019, 75: 875-884. PMID: 30994187, DOI: 10.1111/biom.13061. Peer-Reviewed Original Research
Concepts: Monte Carlo expectation-maximization algorithm, Inverse regression model, Real data example, Types of covariates, New statistical framework, Maximum likelihood estimation, Expectation-maximization algorithm, Dimension reduction structure, Inverse regression, Statistical framework, Data examples, Statistical challenges, Likelihood estimation, Microbiome sequencing data, Human microbiome studies, Human microbiome composition, Different library sizes, Zeros, Predictive analysis, Model, Estimation, Algorithm, Simulations, Regression models, Framework
2017
Graphical model selection with latent variables
Wu C, Zhao H, Fang H, Deng M. Graphical model selection with latent variables. Electronic Journal Of Statistics 2017, 11: 3485-3521. DOI: 10.1214/17-ejs1331. Peer-Reviewed Original Research
Concepts: Graphical model selection, Model selection consistency, Efficient ADMM algorithm, Sparse precision matrix, Graphical models, Gaussian graphical models, Genetical genomics data, Selection consistency, Penalized estimation, Statistical inference, Precision matrix, Latent variables, Parameter estimation, Theoretical properties, Identifiability conditions, ADMM algorithm, Model selection, Simulation study, Conditional dependence, Estimation, Trace loss, Superior performance, Estimator, Graph, Variables