
David van Dijk: The Role of Machine Learning in Biomedical Discovery

May 21, 2020
by Elisabeth Reitman

David van Dijk, PhD, uses machine learning algorithms to analyze complex biomedical data. A computer scientist by training, Van Dijk holds a dual appointment in Medicine and Computer Science at Yale, where he uses graph signal processing and deep learning to find patterns in large data sets.

Launched in September 2019, the Van Dijk Lab uses algorithms to accelerate discoveries. The lab develops new machine learning-based computational methods and applies them to large data sets to advance our understanding of a wide range of biological systems and diseases.

Van Dijk never anticipated that he would work in cardiovascular research. As a student, he saw his interests evolve from computer science theory to its applications in a variety of fields. “Computer science can be a vehicle to understand the world, whether it’s biology, medicine, or sociology,” he explains. After college, Van Dijk earned a master’s degree and a doctorate in computer science, during which his focus shifted toward computational biology.

Insights from Machine Learning

At the Weizmann Institute of Science in Israel, Van Dijk worked with computational scientist Eran Segal, PhD, to investigate variability in gene expression, the process by which the information encoded in a gene’s DNA is read by the cell. Van Dijk co-developed a model to understand how promoter DNA sequences encode instructions that determine whether a gene is active or inactive. Segal and Van Dijk created advanced algorithms to work with large amounts of promoter sequence data. Using yeast data collected in the lab, Van Dijk designed promoter regions and mutated them to create a large data set. The researchers then used their algorithms to find sequence patterns that predicted gene activity.

At the time, many scientists were measuring the expression, or activity, of genes at the tissue level. For his postdoctoral research, Van Dijk wanted to conduct experiments to see whether gene expression could be used to make predictions at the level of individual cells. The opportunity came in 2015, when Van Dijk accepted a position at Columbia University. Single-cell RNA sequencing was becoming more widely adopted. This experimental technology could be used to measure, in high throughput, which genes are expressed in individual cells. RNA sequencing from single cells could help answer research questions about stem cell maturation, cancer heterogeneity, and variability within complex tissues.

For example, individual cells have diverse expression patterns that are hidden when expression is measured on average, or in bulk. Scientists can use single-cell RNA sequencing to measure the activity of individual cells in a tumor and understand the tumor’s complexity. The technology allowed scientists to generate enormous data sets, but it had two limitations. First, the data lacked structure. Second, the methods used to gather the data were inefficient, and critical information was often lost. Van Dijk realized that this was the perfect machine learning problem.

For decades, MIT’s Robert Weinberg has contributed to the characterization of human cancer genes. Van Dijk and Weinberg collaborated to develop an algorithm that they applied to a breast cancer model used to study the spread of cancer cells to new areas of the body. They discovered that when cells transition from their baseline epithelial phenotype into a metastatic mesenchymal phenotype, the process is associated with a certain stem cell signature. The cells actually become stem-like before transitioning to this more metastatic state, a subtle change that could lead to breakthroughs in cancer research.

Future of Individualized Medicine at Yale

Scientists hope that machine learning will enable physicians to spend more time with patients. The technology exists, but until recently it had not been deployed in health care.

In September, Yale launched a comprehensive DNA sequencing project called Generations. Even so, progress in the field has stalled. Outdated healthcare systems were not designed with machine learning in mind, and collecting large amounts of data can be difficult. The data may also contain biases, inconsistencies, and incomplete information.

The algorithms that Van Dijk develops can be applied to a variety of scenarios, whether the data come from molecular biology or the clinic. On a given day, Van Dijk could be working with clinicians to answer important questions about health records or designing an experiment to answer a fundamental question in molecular biology. One challenge is to take health records data collected at Yale New Haven Hospital and relate it to patient outcomes. For example, Van Dijk believes heart failure is one area that would benefit from using patient data to make better predictions.

In October, Van Dijk co-authored a paper in Nature Methods in which Yale scientists used an artificial intelligence neural network called SAUCIE to analyze 11 million cells and reveal cellular differences within individuals, as well as broader patterns that shed light on how the body functions. More recently, Van Dijk, in collaboration with Craig Wilen, MD, PhD, co-authored a paper on single-cell analysis of SARS-CoV-2 infection dynamics.

Currently, the Van Dijk Lab is collaborating with Yale colleagues on several projects. “Everyone is excited to collaborate,” he says. “I get exposed to so many interesting systems, and I have the opportunity to impact healthcare and medicine.”

Biomedical Imaging

Van Dijk is working with nuclear cardiologists to develop the first algorithm that can analyze 3D images to find new phenotypes. A positron emission tomography (PET) scan is an imaging test that helps reveal areas of decreased blood flow to the heart. Every day, dozens of patients at Yale New Haven Hospital with severe chest pain receive a stress test to track the function of the heart in 3D. A PET scan helps clinicians determine how well the heart is functioning and whether a patient may need invasive treatment. Van Dijk is currently working with experts to extract outcomes information and use that data to make more accurate diagnoses and determine which patients benefit most from a surgical procedure. “The idea here is that perhaps we’re not maximizing the information we get. If we can extract additional information that perhaps you traditionally wouldn't have looked for, there may be very subtle information,” he says. The challenge is figuring out how to let an algorithm ingest a three-dimensional image and then extract meaningful information from it. Eventually, Van Dijk hopes to combine health records data with the nuclear imaging data.

Machine Learning and Collaboration

Van Dijk is currently developing a research project with vascular biologist Stefania Nicoli, PhD, from the Yale Cardiovascular Research Center (YCVRC). “There's a lot of excitement here,” he says. “The atmosphere here has been really great and welcoming.” The partnership hopes to use machine learning to predict complex brain vascular patterns, which could provide new insights into how the cardiovascular system is shaped by genetic activity. In addition to his research with the YCVRC, Van Dijk is a collaborator on several immunology projects that could impact cardiovascular health.

The Van Dijk Lab is collaborating with David Hafler, the William S. and Lois Stiles Edgerly Professor in the Department of Immunobiology, to generate a data set of immune cells from the cerebrospinal fluid and locate signatures of the homeostatic immune system that could help predict multiple sclerosis (MS). Hafler, who is also neurologist-in-chief at Yale New Haven Hospital, is widely recognized for his contributions to identifying the underlying causes of MS.

Other collaborators include Noah Palm, PhD, an assistant professor of immunobiology, and Aaron Ring, MD, PhD. Palm’s research focuses on the complex interactions between the immune system and the gut microbiota. Together, Van Dijk and Palm are investigating gut microbiome-host interactions. Palm has developed an experimental technology that can measure how microbes interact with our immune system, and Van Dijk’s goal is to develop a computational model that identifies signatures in the data it generates. Ring hopes to understand and manipulate the activity of immune receptors using structural and combinatorial biology approaches. If successful, Ring’s approach could measure all antibody reactivity to predict the outcome of cancer immunotherapy.

For research updates, follow Van Dijk on Twitter.

Recruitment Opportunities

The Van Dijk Lab is currently recruiting students at all levels with an engineering or quantitative background. Medical students and fellows who are interested in machine learning are also encouraged to apply. “I bring people together that otherwise would not have met each other because they live in different universes. But in my lab, I bring them together and we can do some amazing things,” says Van Dijk.

For more information, please send an email to: david.vandijk
