2023 Request for Project Proposals

Fellows accepted into the program will gain insight into research in both academia and industry. Our prestigious fellowships are fully funded for a three-year period and provide the fellows’ salaries, fringe benefits, and a generous travel allowance. Our fellowships are open to outstanding candidates seeking career advancement in biomedical data sciences.

The Yale-BI Joint Selection Committee (JSC) has established the following data-driven research themes for the 2023 program. When judging submissions and postdoctoral applicants, the selection committee will consider how well the proposed research projects align with these prioritized themes. Selected applications will be further reviewed by the JSC for approval.

Theme 1: Machine Learning and Algorithms for High Throughput Biomedical Data

Yale researchers have developed many widely used computational methods for analyzing high-throughput data, such as single-cell sequencing data, that provide start-to-finish analytical ecosystems for large-scale biomedical datasets. These data come from experiments involving a wide range of systems of high interest to this program, including infectious disease, cancer, autoimmunity, heterogeneous respiratory disease, NASH, obesity, and population genetics. Models based on methods such as manifold learning and deep learning have been developed for supervised and unsupervised learning to process and visualize data, understand disease progression, characterize patient phenotypic diversity, and uncover underlying causal mechanisms. This theme will leverage the strength of Yale researchers in developing machine learning tools to address significant problems in data integration, denoising, and analysis with respect to disease progression, disease mechanisms, and the effects of therapeutic interventions.

Theme 2: Multi-omics Analytics for Personalized Medicine

Advances in -omics technologies, such as genomics, transcriptomics, proteomics, metabolomics, and immune repertoire profiling, have begun to enable personalized medicine at an extraordinarily detailed molecular level. A key advantage of these technologies is that they allow a more granular understanding of cell populations and make it possible to interrogate their roles in molecular processes. As multi-omics data become increasingly useful, methodological development must keep pace if their value is to be realized. Yale is a world leader in developing and enabling -omics technologies for biomedical research and has been engaged in many national and international programs, such as BrainSpan, ENCODE, modENCODE, the 1000 Genomes Project, PCAWG, the exRNA Consortium, IMPACC, the Human Immunology Project Consortium (HIPC), the dGTEx project, and the Center for Mendelian Diseases. Yale researchers have developed computational approaches for processing large-scale multi-omics data, as well as methods for data integration. This theme will focus on cutting-edge method development for integrating human multi-omics data (e.g., single-cell multi-omics such as scATAC, scRNA, and spatial transcriptomics) for both basic and translational research in obesity, PF-ILD, IBD, SSc, CNSDR, retinopathies, and NASH. Additional areas of interest include applying graph neural networks (GNNs) together with biological networks and embedding methods to patient datasets with small cohort sizes. These techniques can simplify the analysis of ‘unmatched’ multi-modal data and improve overall performance compared with traditional methods that do not consider graph structure.

Theme 3: Moving from GWAS to Causal Genes and Variants

A central problem in genetics research is deciphering how genomic variation affects gene function and results in disease or altered response to treatment. While genome-wide association studies (GWAS) offer some insight into the genetic basis of common diseases, innovative methods will allow us to effectively integrate diverse data types and sources of information, identify functional genes and variants, and understand how they shape clinically relevant phenotypes. Yale researchers have developed and applied methods to address the major challenges in post-GWAS analysis, such as how to move from a genetic association signal in a chromosomal region to disease-associated genes and causal variants, as a step toward understanding the underlying disease process. Fine mapping, sequencing, functional studies, and other approaches have been used to find the causal variants involved in complex diseases. We are particularly interested in computational and statistical approaches that integrate diverse data sources (e.g., GWAS and transcriptomics from different cohorts) to interrogate causal relationships, at the single-cell level, between genetic variation and perturbed pathological mechanisms.

Theme 4: Genomic Health, Longitudinal Modelling, and Biomarkers

Individuals exhibit substantial heterogeneity due to genetics, environmental factors, and life histories. These differences can shape not only the onset and trajectory of disease but also treatment efficacy and biomarker identification: for example, how drugs are absorbed and metabolized in the body, and how individuals respond to preventive measures such as vaccination. Yale researchers have been developing resources and tools to identify biomarkers that account for differences between individual patients. One example is the Generations Project, which aims to recruit 100,000 patients and, with the help of the Yale Center for Genome Analysis, use genomic information to find the right drug for the right patient at the right time. Another example is the development of epigenetic clocks that can be used for risk stratification. Many biomarkers, such as proteins and metabolites, offer valuable information for patient selection, monitoring disease onset, prognosis, pharmacodynamics, treatment effect, safety, and other clinical outcomes. There is great interest in biomarker-driven approaches that assess the pharmacologic response to a therapeutic intervention and predict treatment efficacy more quickly than conventional clinical endpoints, thereby accelerating product development. Yale researchers have been active in developing methods for biomarker discovery from -omics and other data. This theme focuses on developing computational and statistical methods to model and mine rich data sources (e.g., -omics data) to discover, validate, and classify promising biomarkers.

Theme 5: Electronic Health Records and Digital Health

Yale researchers have extensive experience building informatics infrastructure for clinical research and studying issues such as data integration and the management of clinical vocabularies used in clinical research databases. Research projects span broad informatics domains such as Real-World Data (RWD) and Electronic Health Records (EHRs), which can be used to understand the effectiveness of therapies in patient subpopulations, and synthetic data, which can provide a foundation for standards development while protecting patient privacy. Problems of high interest in these domains include improving our understanding of the impact of domain shift between different biobanks or EHR databases in the context of both supervised and unsupervised learning, with a focus on disease prediction and subtyping, and developing methods to mitigate such domain shift. Another is the application of methods that integrate machine learning with causal inference to identify factors causally relevant to the task at hand (rather than merely correlated ones), which would allow more robust domain adaptation and out-of-distribution prediction across biobank and EHR data. Finally, the development of general causal inference techniques that extract complex dependencies between diseases and patient features from large biobank and EHR databases is also of high interest.

How to Apply

What to Submit?

  • Application form
  • Current CV
  • Project proposal that includes a detailed research plan (1000 word limit)
  • Statement of how the Program aligns with your educational and career goals (800 word limit)

Key Dates

  • Submission open: March 1, 2023 12:00am ET
  • Submission deadline: April 18, 2023 12:00am ET
  • Notification to applicants: July 6, 2023
  • Program start date: September 1, 2023

Qualifications

  • PhD degree in computational biology, bioinformatics, data science, or a related scientific discipline, as well as excellent computing skills
  • Good analytical and written communication skills, including the ability to effectively describe scientific material to both specialized and lay audiences
  • A strong CV, preferably having published in high-impact journals and presented at international meetings

Fellow Responsibilities

  • Conduct high-quality research by developing novel computational methods and tools to address significant biomedical problems
  • Publish research results in high-impact journals
  • Communicate research progress at regular intervals with Yale and BI mentors
  • Participate in required program activities