# Jeffrey Townsend, PhD

### Research Summary

**1. BIOINFORMATIC TOOLS FOR CANCER GENETICS AND EPIDEMIOLOGY**

Whole-exome sequencing has created tremendous potential for revealing the genetic basis and underlying molecular mechanisms of many forms of cancer. However, somatic mutations occur at a significant frequency within tumors of most cancer types, and identification of the mutations that are on the causative trajectory from normal tissue to cancerous tissue is challenging. We are making algorithmic advances in clustering across discrete linear sequences that facilitate maximum likelihood inference of model-averaged clustering in discrete linear sequences of somatic amino acid replacement mutations appearing within mutated genes, and applying evolutionary theory to the repeated evolution of cancer in whole-exome sequence data sets to reveal the level of clonal natural selection for cancer drivers.

**2. BIOSTATISTICAL ANALYSIS FOR NONLINEAR MATHEMATICAL MODELS OF THE EPIDEMIOLOGY OF DISEASE**

I am developing probabilistic statistical methodologies for the mathematical modeling of disease emergence and spread. For diverse reasons, data for estimation of epidemiological parameters is often sparse. Evaluating a model with the “best point estimate” of sparse data may convey a misleading certitude to policy makers basing decisions on deterministic models of disease outbreak, spread, and persistence. Conversely, policy makers who are aware that models are parameterized with limited data may be dismissive of deterministic predictions that yet have significant validity. We address these issues by probabilistic sensitivity analysis of parameters and full uncertainty analysis of outcomes of interest.

### Extensive Research Description

**1. TOOLS FOR CANCER GENETICS AND EPIDEMIOLOGY**

Whole-exome sequencing has created tremendous potential for revealing the genetic basis and underlying molecular mechanisms of many forms of cancer. However, somatic mutations occur at a significant frequency within tumors of most cancer types, and identification of the mutations that are on the causative trajectory from normal tissue to cancerous tissue is challenging. We are making algorithmic advances in clustering across discrete linear sequences to enact two powerful approaches to this identification. First, we are applying maximum likelihood approaches that we have developed for model-averaged clustering in discrete linear sequences to somatic amino acid replacement mutations appearing within mutated genes. Because amino acids of proteins that are functionally important are locally clustered in domains, mutations in multiple tumors that are functionally important to the development of cancer cluster in the linear sequence of relevant genes, allowing inference of relevance and function even in cases without three-dimensional protein structure. These clustering analyses have the power to demonstrate, for instance, cross-cancer consistency in the functional importance of the DNA binding domain of tumor suppressor p53, whether in a cancer with extensive exome data (ovarian serous adenocarcinoma) or in a cancer with much less extensive exome data (e.g. rectal adenocarcinoma).
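The idea of likelihood-based, model-averaged clustering along a linear sequence can be illustrated with a deliberately simplified sketch. It is not the published algorithm: it assumes a single contiguous cluster per model, a Bernoulli mutation rate inside and outside that cluster, and AIC weights for the averaging. The function names, the parameter count, and the toy `hits` vector are all hypothetical.

```python
import math

def bernoulli_loglik(k, n):
    """Maximized log-likelihood of k mutated sites among n sites under one rate."""
    if n == 0 or k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)

def model_averaged_rates(hits):
    """hits: list of 0/1 flags per site. Returns an AIC-weighted rate per site."""
    n = len(hits)
    prefix = [0]
    for h in hits:
        prefix.append(prefix[-1] + h)
    total = prefix[-1]

    # Each model is (aic, start, end, rate_inside, rate_outside).
    # Null model: one rate for the whole sequence (1 parameter).
    models = [(2 * 1 - 2 * bernoulli_loglik(total, n), 0, n, total / n, total / n)]
    # Cluster models: window [start, end) has its own rate.
    # Assumed parameter count: 2 rates + 2 boundaries = 4.
    for start in range(n):
        for end in range(start + 1, n + 1):
            if start == 0 and end == n:
                continue  # identical to the null model
            k_in, n_in = prefix[end] - prefix[start], end - start
            k_out, n_out = total - k_in, n - n_in
            loglik = bernoulli_loglik(k_in, n_in) + bernoulli_loglik(k_out, n_out)
            models.append((2 * 4 - 2 * loglik, start, end, k_in / n_in, k_out / n_out))

    # Akaike weights, then average the per-site rate across all models.
    best = min(m[0] for m in models)
    weights = [math.exp(-(m[0] - best) / 2) for m in models]
    norm = sum(weights)
    rates = [0.0] * n
    for w, (_, start, end, p_in, p_out) in zip(weights, models):
        for site in range(n):
            rates[site] += (w / norm) * (p_in if start <= site < end else p_out)
    return rates

# Toy gene of 40 sites with mutations concentrated at sites 15-22 (hypothetical data).
hits = [0] * 15 + [1, 1, 0, 1, 1, 1, 0, 1] + [0] * 17
rates = model_averaged_rates(hits)
```

Sites inside the mutated stretch receive a higher model-averaged rate than the flanking sites, which is the kind of signal that would point to a functionally important domain without requiring a three-dimensional structure.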

Second, we are applying evolutionary theory to the problem of identifying the genetic architecture underlying cancer development. The path from normal to cancerous tissue is navigated by an evolutionary process. Tools from evolutionary theory have the potential to parse those mutations that are selected within cells on the path to cancer from those mutations that arise incidentally during the somatic evolution of cancer. The theory we are applying makes use of differences in expectation for synonymous and replacement mutations. Synonymous mutations are expected to have no functional impact; thus they yield a proxy expectation for the “incidental” mutations, whereas carcinogenic replacement mutations will spread within tumors more frequently and are clustered within gene sequence. Our theory also employs human population polymorphism data, which most evolutionary biologists believe can be largely assumed to be neutral. These data facilitate calibration of the probable impact of replacement changes to sequence conservation by eliminating the confounding variable of the degree of purifying selection, which decreases the number of mutations observed in some genes and allows others to accumulate many mutations with little impact.

We are extending this approach to estimating selection intensity on mutations along the trajectory toward cancer, revealing the level of selection within tumors for replacement mutations compared to synonymous mutations. This evolutionary analysis is ideal for detecting the history of selection on sites within genes during the evolution of cancer from exome sequencing data. These sites, particularly when they represent gain-of-function mutations, will help identify candidate loci for pharmacological intervention and inform the design of “personal genomics” drugs appropriate for the genetics of individual cancers in individual patients. As a component of that project, we are constructing an “active-experiment” cancer exome database to facilitate further bioinformatics investigation of cancer exome data.
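The comparison of replacement to synonymous mutations can be made concrete with a crude ratio of per-site mutation rates, in the spirit of a dN/dS statistic. This is only an illustrative sketch, not the estimator described above: the function name, the site counts, and the mutation counts are hypothetical, and it ignores the polymorphism-based calibration mentioned in the text.

```python
def selection_ratio(n_replacement, n_synonymous, replacement_sites, synonymous_sites):
    """Ratio of the replacement mutation rate per replacement site to the
    synonymous mutation rate per synonymous site.

    Values near 1 are what incidental (neutral) mutation would produce;
    values well above 1 suggest positive selection for replacement changes.
    """
    if n_synonymous == 0 or synonymous_sites == 0 or replacement_sites == 0:
        raise ValueError("need nonzero site counts and at least one synonymous mutation")
    rate_replacement = n_replacement / replacement_sites
    rate_synonymous = n_synonymous / synonymous_sites
    return rate_replacement / rate_synonymous

# Hypothetical gene with 300 replacement sites and 100 synonymous sites.
neutral_like = selection_ratio(30, 10, 300, 100)   # same rate per site in both classes
selected_like = selection_ratio(90, 10, 300, 100)  # threefold excess of replacements
```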

**2. BIOSTATISTICAL ANALYSIS FOR NONLINEAR MATHEMATICAL MODELS OF THE EPIDEMIOLOGY OF DISEASE**

I am developing probabilistic statistical methodologies for the mathematical modeling of disease emergence and spread. Robustness of models has usually been assessed by techniques that explore the relative impact and importance of parameters upon the mathematical behavior of the function and the mathematical predictions of the model. For diverse reasons including the difficulty or cost of acquisition, restrictions due to privacy, and urgency of analysis in the case of outbreaks, data for estimation of epidemiological parameters is often sparse. Evaluating a model with the “best point estimate” of sparse data may convey a misleading certitude to policy makers basing decisions on deterministic models of disease outbreak, spread, and persistence. Conversely, policy makers who are aware that models are parameterized with limited data may be dismissive of deterministic predictions that yet have significant validity. These issues may be most straightforwardly addressed by probabilistic sensitivity analysis of parameters and full uncertainty analysis of outcomes of interest. These analyses amount to accommodating the uncertainty of parameters directly into an analysis by probabilistically resampling data or likely distributions of parameters to calculate a probabilistic distribution of outcomes.
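As a minimal sketch of what resampling sparse data into a distribution of outcomes can look like, the example below bootstraps a small data set and pushes each resample through a toy model. The data values, the `toy_model` function, and its `transmission_rate` default are hypothetical and chosen only for illustration.

```python
import random
import statistics

def bootstrap_outcomes(data, model, n_draws=2000, seed=1):
    """Resample the data with replacement, re-estimate the parameter (here the
    mean), and evaluate the model on each estimate."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_draws):
        resample = [rng.choice(data) for _ in data]
        outcomes.append(model(statistics.mean(resample)))
    return sorted(outcomes)

def percentile_interval(sorted_values, level=0.95):
    """Central interval read directly from the sorted outcome values."""
    last = len(sorted_values) - 1
    return sorted_values[int((1 - level) / 2 * last)], sorted_values[int((1 + level) / 2 * last)]

def toy_model(mean_duration, transmission_rate=0.3):
    """Hypothetical outcome: expected secondary infections per case."""
    return transmission_rate * mean_duration

# Five observed infectious periods in days (hypothetical, deliberately sparse).
durations = [3.0, 4.5, 5.0, 6.5, 9.0]

point_estimate = toy_model(statistics.mean(durations))
outcomes = bootstrap_outcomes(durations, toy_model)
low, high = percentile_interval(outcomes)
```

Reporting `low` and `high` alongside `point_estimate` conveys how much of the apparent certainty in the point estimate is an artifact of the small sample.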

For instance, one of the most common modeling approaches for evaluating interventions is based on differential equation models of disease such as the standard Susceptible-Infected-Recovered (SIR) model. In the SIR model and other more complex constructions, a closed-form solution can often be calculated for the basic reproductive number, $R_0$, the average number of secondary infections that would follow upon a primary infection in a naïve host population. In a population where there is preexisting immunity due to either vaccination or previous infection, the effective reproductive number, $R_e$, is defined as the average number of secondary infections following a primary infection in a population that is not completely naïve. $R_e$ is of particular interest in public health because interventions that bring its value below 1 are predicted to eradicate the disease. This deterministic threshold of $R_e$ is proposed as the basis for policy decisions regarding the level of interventions that should be implemented. However, the best estimates for the parameters that are needed for the closed-form solution of $R_e$ are inevitably inexact.

To address this point, sensitivity analyses are frequently performed to evaluate models and explore the relationship between model parameters and outcomes. In such deterministic sensitivity analyses, one or more parameters are perturbed and the corresponding effects on outcomes are examined. The perturbation can be done either by evaluating the effect of arbitrarily small changes in parameter values (e.g. ± 1%) or by evaluating the effects across a range of values defined by plausible probability density functions. Because the values of other parameters are held fixed at best point estimates, these strategies do not account for interaction effects in non-linear dynamic models, and do not assess global uncertainty in outcome. Uncertainty analysis has been recommended for many fields of mathematical modeling, including medical decision making, as an optimal approach to presenting models. In the case of dynamic transmission modeling, however, authoritative best practices have not included uncertainty analyses. Modeling guidelines recommend probabilistic sensitivity analysis, in which both global parameter uncertainty and output uncertainty are addressed, as the best practice method for uncertainty analysis. Yet that ideal has not been extended to dynamic transmission models, for which its implementation has been challenging.
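For reference, the textbook SIR model without births or deaths and its reproductive numbers can be written as follows. The symbols are assumptions that do not appear in the text above: $\beta$ is the transmission rate, $\gamma$ the recovery rate, $N$ the total population size, and $S$, $I$, $R$ the numbers of susceptible, infected, and recovered hosts (the compartment $R$ is distinct from the reproductive numbers $R_0$ and $R_e$).

```latex
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S I}{N}, &
\frac{dI}{dt} &= \frac{\beta S I}{N} - \gamma I, &
\frac{dR}{dt} &= \gamma I, \\
R_0 &= \frac{\beta}{\gamma}, &
R_e &= R_0 \, \frac{S}{N}, &
R_e &< 1 \iff \frac{S}{N} < \frac{1}{R_0}.
\end{aligned}
```

In this simple case the threshold $R_e < 1$ depends on only two estimated parameters, yet any uncertainty in $\beta$ or $\gamma$ carries directly into the susceptible fraction at which the threshold is crossed.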

We are developing methods for global probabilistic sensitivity analysis that allow the contribution of each parameter to model outcomes to be investigated while also taking into account the uncertainty of other model parameters. Uncertainty in parameter values can be accounted for by sampling randomly from empirical data or from probability density functions fit to empirical data. Depending on the instance, such sampling techniques include bootstrapping, Monte Carlo sampling, and Latin hypercube sampling. The model output generated from parameter samples can then be analyzed using linear (e.g. partial correlation coefficients), monotonic (e.g. partial rank correlation coefficients), and non-monotonic (e.g. sensitivity index) statistical tests to determine the contribution of each parameter to the variation in output values.

For a global sensitivity analysis to yield the outcome probabilities that are of greatest utility to policy makers, probabilistic analyses of parameter uncertainty must be carried through to the model outcomes. For example, the probability of eradication of an epidemic is sensitive to levels of both vaccination and treatment. Moreover, a policy based on the analysis of data should take into consideration not only the best estimate of necessary action, but also the uncertainty around that estimate. Deterministic policy advice, which indicates an exact cline of treatment and vaccination that should put an influenza epidemic into abeyance, is very different from, and can be misleading compared to, a probabilistic statement, which gives a policymaker a predictive probability that a particular policy of treatment and vaccination will put an influenza epidemic into abeyance. Similar approaches applied with a next-generation matrix to rabies vaccination in Tanzania demonstrated that, in two districts, the WHO goal of 70% vaccination coverage of dogs had a more than sufficient probability of controlling rabies, if only the effort needed to achieve that not impractical goal could be mustered.
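The sampling-then-analysis workflow described above can be sketched with the standard library alone. This is a toy under stated assumptions: the parameter ranges are hypothetical, the outcome is a simple effective reproductive number for a vaccinated population, and a plain Spearman rank correlation is used as a simpler stand-in for the partial rank correlation coefficient (it does not adjust for the other parameters).

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """One stratified draw per equal-probability bin for each parameter."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in bounds.items():
        # One uniform point inside each of n_samples bins, in shuffled bin order.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(points)
        columns[name] = [low + p * (high - low) for p in points]
    return [{name: columns[name][i] for name in bounds} for i in range(n_samples)]

def effective_r(sample):
    """Toy outcome: (beta / gamma) scaled by the unprotected fraction."""
    r0 = sample["beta"] / sample["gamma"]
    return r0 * (1 - sample["coverage"] * sample["efficacy"])

def ranks(values):
    """Rank positions (0-based); ties are not expected for continuous draws."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Rank correlation: Pearson correlation of the rank-transformed values."""
    return pearson(ranks(x), ranks(y))

# Hypothetical uniform ranges for each uncertain parameter.
bounds = {
    "beta": (0.2, 0.6),       # transmission rate
    "gamma": (0.1, 0.3),      # recovery rate
    "coverage": (0.3, 0.9),   # vaccination coverage
    "efficacy": (0.6, 0.95),  # vaccine efficacy
}
samples = latin_hypercube(1000, bounds)
outcomes = [effective_r(s) for s in samples]

# Probability, over parameter uncertainty, that the outcome falls below threshold.
prob_control = sum(r < 1 for r in outcomes) / len(outcomes)
# Monotonic sensitivity of the outcome to each parameter.
sensitivity = {name: spearman([s[name] for s in samples], outcomes) for name in bounds}
```

The `sensitivity` dictionary ranks parameters by their monotonic influence on the outcome, while `prob_control` is the kind of probabilistic statement that can be handed to a policymaker in place of a single threshold value.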

A public health decision maker would find most useful the assignment of the probability of eradication to each level of treatment, so that they may precisely weigh the cost of intervention against the potential for failure. These probabilistic outcome distributions also feed forward extremely fluidly with cost-effectiveness estimation, a field which has embraced uncertainty analysis but which has until our recent work not incorporated uncertainty from nonlinear infectious disease models into calculations.
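A table that assigns a probability of eradication to each intervention level could be produced along the following lines. This is a hedged sketch, not the lab's model: the triangular distribution for the basic reproductive number, the uniform range for efficacy, and the coverage levels are all hypothetical, and vaccination coverage stands in for the intervention.

```python
import random

def eradication_probability(coverage, n_draws=5000, seed=42):
    """Fraction of parameter draws for which the effective reproductive
    number falls below 1 at the given coverage."""
    rng = random.Random(seed)  # same seed per level: common random numbers
    below_threshold = 0
    for _ in range(n_draws):
        r0 = rng.triangular(1.2, 2.5, 1.6)  # hypothetical (low, high, mode)
        efficacy = rng.uniform(0.7, 0.95)   # hypothetical range
        if r0 * (1 - coverage * efficacy) < 1:
            below_threshold += 1
    return below_threshold / n_draws

levels = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
table = {coverage: eradication_probability(coverage) for coverage in levels}
for coverage, probability in table.items():
    print(f"coverage {coverage:.0%}: P(R_e < 1) = {probability:.2f}")
```

Reusing the same seed at every coverage level evaluates each level against identical parameter draws, so differences between rows reflect the intervention rather than sampling noise. Each row can then be paired with the cost of achieving that coverage in a cost-effectiveness calculation.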

We have many projects ongoing in the lab covering the topics summarized below, including many on which we have already published and many on which we have not. In particular, many of our projects on the somatic evolution of cancer are not yet published.

### Research Interests

Algorithms; Bacteria; Bacterial Infections and Mycoses; Beer; Bread; Cell Transformation, Neoplastic; Coccidioidomycosis; Computing Methodologies; Biological Evolution; Fungi; Genetic Engineering; Microbiological Phenomena; Models, Genetic; Models, Theoretical; Mycoses; Neoplasm Metastasis; Neoplasms; Phylogeny; Viruses; Wine; Models, Statistical; Likelihood Functions; Logistic Models; Polymerase Chain Reaction; Sequence Analysis, DNA; Nonlinear Dynamics; Molecular Epidemiology; Gene Transfer Techniques; Crops, Agricultural; Evolution, Molecular; Nature; Sequence Analysis, Protein; Gene Expression Profiling; Public Health Informatics; Microarray Analysis; Genetic Speciation; Host-Pathogen Interactions; Genetic Phenomena; Mathematical Concepts; Organisms; Phenomena and Processes

### Public Health Interests

Antibiotic Resistance; Bioinformatics; Cancer; Evolution; Genetics, Genomics, Epigenetics; Health Policy; Hepatitis; HIV/AIDS; Infectious Diseases; Influenza; Metabolism; Microbial Ecology; Modeling; Vaccines; Zoonotic Diseases; Pollution; Tick-borne Diseases; COVID-19

### Selected Publications

- Wells CR, Townsend JP, Pandey A, Moghadas SM, Krieger G, Singer B, McDonald RH, Fitzpatrick MC, Galvani AP. Optimal COVID-19 quarantine and testing strategies. Nature Communications 2021, 12: 356. PMID: 33414470, PMCID: PMC7788536, DOI: 10.1038/s41467-020-20742-8.
- Claus EB, Cannataro VL, Gaffney SG, Townsend JP. Environmental and sex-specific molecular signatures of glioma causation. Neuro-oncology 2022, 24: 29-36. PMID: 33942853, PMCID: PMC8730771, DOI: 10.1093/neuonc/noab103.
- Dornburg A, Su Z, Townsend JP. Optimal Rates for Phylogenetic Inference and Experimental Design in the Era of Genome-Scale Data Sets. Systematic Biology 2019, 68: 145-156. PMID: 29939341, DOI: 10.1093/sysbio/syy047.
- Cannataro VL, Gaffney SG, Sasaki T, Issaeva N, Grewal NKS, Grandis JR, Yarbrough WG, Burtness B, Anderson KS, Townsend JP. APOBEC-induced mutations and their cancer effect size in head and neck squamous cell carcinoma. Oncogene 2019, 38: 3475-3487. PMID: 30647454, PMCID: PMC6499643, DOI: 10.1038/s41388-018-0657-6.
- Cannataro VL, Townsend JP. Wagging the long tail of drivers of prostate cancer. PLoS Genetics 2019, 15: e1007820. PMID: 30653503, PMCID: PMC6336235, DOI: 10.1371/journal.pgen.1007820.
- Townsend JP. The Cancer Tree. Scientific American 2018, 318: 34-41. PMID: 29557973, DOI: 10.1038/scientificamerican0418-34.
- Cannataro VL, Townsend JP. Neutral Theory and the Somatic Evolution of Cancer. Molecular Biology And Evolution 2018, 35: 1308-1315. PMID: 29684198, PMCID: PMC5967571, DOI: 10.1093/molbev/msy079.
- Cannataro VL, Gaffney SG, Townsend JP. Effect Sizes of Somatic Mutations in Cancer. Journal Of The National Cancer Institute 2018, 110: 1171-1177. PMID: 30365005, PMCID: PMC6235682, DOI: 10.1093/jnci/djy168.
- Zhao ZM, Campbell MC, Li N, Lee DSW, Zhang Z, Townsend JP. Detection of Regional Variation in Selection Intensity within Protein-Coding Genes Using DNA Sequence Polymorphism and Divergence. Molecular Biology And Evolution 2017, 34: 3006-3022. PMID: 28962009, PMCID: PMC5850860, DOI: 10.1093/molbev/msx213.
- Trail F, Wang Z, Stefanko K, Cubba C, Townsend JP. The ancestral levels of transcription and the evolution of sexual phenotypes in filamentous fungi. PLoS Genetics 2017, 13: e1006867. PMID: 28704372, PMCID: PMC5509106, DOI: 10.1371/journal.pgen.1006867.
- Zhao ZM, Zhao B, Bai Y, Iamarino A, Gaffney SG, Schlessinger J, Lifton RP, Rimm DL, Townsend JP. Early and multiple origins of metastatic lineages within primary tumors. Proceedings Of The National Academy Of Sciences Of The United States Of America 2016, 113: 2140-5. PMID: 26858460, PMCID: PMC4776530, DOI: 10.1073/pnas.1525677113.
- Gaffney SG, Townsend JP. PathScore: a web tool for identifying altered pathways in cancer data. Bioinformatics (Oxford, England) 2016, 32: 3688-3690. PMID: 27503224, DOI: 10.1093/bioinformatics/btw512.
- Lewnard JA, Townsend JP. Climatic and evolutionary drivers of phase shifts in the plague epidemics of colonial India. Proceedings Of The National Academy Of Sciences Of The United States Of America 2016, 113: 14601-14608. PMID: 27791071, PMCID: PMC5187705, DOI: 10.1073/pnas.1604985113.
- Scarpino SV, Iamarino A, Wells C, Yamin D, Ndeffo-Mbah M, Wenzel NS, Fox SJ, Nyenswah T, Altice FL, Galvani AP, Meyers LA, Townsend JP. Epidemiological and viral genomic sequence analysis of the 2014 ebola outbreak reveals clustered transmission. Clinical Infectious Diseases : An Official Publication Of The Infectious Diseases Society Of America 2015, 60: 1079-82. PMID: 25516185, PMCID: PMC4375398, DOI: 10.1093/cid/ciu1131.
- Hong WS, Shpak M, Townsend JP. Inferring the Origin of Metastases from Cancer Phylogenies. Cancer Research 2015, 75: 4021-5. PMID: 26260528, PMCID: PMC4833389, DOI: 10.1158/0008-5472.CAN-15-1889.
- Hodgins-Davis A, Rice DP, Townsend JP. Gene Expression Evolves under a House-of-Cards Model of Stabilizing Selection. Molecular Biology And Evolution 2015, 32: 2130-40. PMID: 25901014, PMCID: PMC4592357, DOI: 10.1093/molbev/msv094.
- Townsend JP, Su Z, Tekle YI. Phylogenetic signal and noise: predicting the power of a data set to resolve phylogeny. Systematic Biology 2012, 61: 835-49. PMID: 22389443, DOI: 10.1093/sysbio/sys036.
- Zhang Z, Townsend JP. Maximum-likelihood model averaging to profile clustering of site types across discrete linear sequences. PLoS Computational Biology 2009, 5: e1000421. PMID: 19557160, PMCID: PMC2695770, DOI: 10.1371/journal.pcbi.1000421.
- Townsend JP. Profiling phylogenetic informativeness. Systematic Biology 2007, 56: 222-31. PMID: 17464879, DOI: 10.1080/10635150701311362.
- Townsend JP, Cavalieri D, Hartl DL. Population genetic variation in genome-wide gene expression. Molecular Biology And Evolution 2003, 20: 955-63. PMID: 12716989, DOI: 10.1093/molbev/msg106.
- Townsend JP, Hartl DL. Bayesian analysis of gene expression levels: statistical quantification of relative mRNA level across multiple strains or treatments. Genome Biology 2002, 3: RESEARCH0071. PMID: 12537560, PMCID: PMC151173, DOI: 10.1186/gb-2002-3-12-research0071.