2013
Negligible impact of rare autoimmune-locus coding-region variants on missing heritability
Hunt KA, Mistry V, Bockett NA, Ahmad T, Ban M, Barker JN, Barrett JC, Blackburn H, Brand O, Burren O, Capon F, Compston A, Gough SC, Jostins L, Kong Y, Lee JC, Lek M, MacArthur DG, Mansfield JC, Mathew CG, Mein CA, Mirza M, Nutland S, Onengut-Gumuscu S, Papouli E, Parkes M, Rich SS, Sawcer S, Satsangi J, Simmonds MJ, Trembath RC, Walker NM, Wozniak E, Todd JA, Simpson MA, Plagnol V, van Heel DA. Negligible impact of rare autoimmune-locus coding-region variants on missing heritability. Nature 2013, 498: 232-235. PMID: 23698362, PMCID: PMC3736321, DOI: 10.1038/nature12170. Peer-Reviewed Original Research
Adjusting for Background Mutation Frequency Biases Improves the Identification of Cancer Driver Genes
Evans P, Avey S, Kong Y, Krauthammer M. Adjusting for Background Mutation Frequency Biases Improves the Identification of Cancer Driver Genes. IEEE Transactions On NanoBioscience 2013, 12: 150-157. PMID: 23694700, PMCID: PMC3989533, DOI: 10.1109/tnb.2013.2263391. Peer-Reviewed Original Research
2012
Length distribution of sequencing by synthesis: fixed flow cycle model
Kong Y. Length distribution of sequencing by synthesis: fixed flow cycle model. Journal Of Mathematical Biology 2012, 67: 389-410. PMID: 22689207, DOI: 10.1007/s00285-012-0556-3. Peer-Reviewed Original Research
2009
Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants
Du J, Bjornson RD, Zhang ZD, Kong Y, Snyder M, Gerstein MB. Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants. PLOS Computational Biology 2009, 5: e1000432. PMID: 19593373, PMCID: PMC2700963, DOI: 10.1371/journal.pcbi.1000432. Peer-Reviewed Original Research
Calculating complexity of large randomized libraries
Kong Y. Calculating complexity of large randomized libraries. Journal Of Theoretical Biology 2009, 259: 641-645. PMID: 19376134, DOI: 10.1016/j.jtbi.2009.04.008. Peer-Reviewed Original Research
2007
Generalized Correlation Functions and Their Applications in Selection of Optimal Multiple Spaced Seeds for Homology Search
Kong Y. Generalized Correlation Functions and Their Applications in Selection of Optimal Multiple Spaced Seeds for Homology Search. Journal Of Computational Biology 2007, 14: 238-254. PMID: 17456017, DOI: 10.1089/cmb.2006.0008. Peer-Reviewed Original Research