The Role of Machine Learning Algorithms in Biomedical Discovery

Machine learning algorithms accelerate biomedical research in viral infection, cardiovascular disease, breast cancer and more at Yale’s van Dijk Lab.

David van Dijk, PhD, uses machine learning algorithms that analyze complex biomedical data. A computer scientist by training, van Dijk holds a dual appointment in medicine and computer science at Yale, where he uses graph signal processing and deep learning to find patterns in large data sets.

Launched in September 2019, the van Dijk Lab uses algorithms to accelerate discoveries in medicine. The lab develops new computational methods, based on machine learning, and applies these to large data sets to advance our understanding of a wide range of biological systems and diseases.

Van Dijk never anticipated that he would work in cardiovascular research. As a student, his interests evolved from computer science theory to application in a variety of fields. “Computer science can be a vehicle to understand the world, whether it’s biology, medicine, or sociology,” he explains. After graduating from college, van Dijk went on to earn a master’s and doctorate in computer science, at which point his focus shifted toward computational biology.

Insights into DNA Sequences from Machine Learning Algorithms

At the Weizmann Institute of Science in Israel, van Dijk worked with computational scientist Eran Segal, PhD, to investigate variability in gene expression, a process where DNA sequences enable genetic information to be read by the cell. Van Dijk co-developed a model to understand how promoter DNA sequences impart codes that determine whether a gene should be active or inactive.

Segal and van Dijk created advanced machine learning algorithms to work with the large amounts of data encoded in the promoter regions of genes. Working with yeast in the lab, van Dijk designed promoter regions and mutated them to create a large data set. The researchers then used the algorithms to find patterns that predicted the activity of genes based on these DNA sequences.
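The idea of predicting gene activity from promoter sequence can be sketched with a deliberately simple toy model. This is not the method Segal and van Dijk used; it is a minimal illustration that assumes a single hypothetical motif ("TATA"), counts how often it appears in each promoter variant, and fits a straight line to made-up activity values. The sequences, the activity numbers, and the function names are all invented for this example.

```python
from statistics import mean


def count_motif(seq, motif):
    """Count occurrences of motif in seq, including overlapping ones."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic toy data, invented for illustration only:
# designed promoter variants and a made-up activity value for each.
promoters = ["GGTATACC", "GGCCGGCC", "TATATATA", "GGTATATA"]
activity = [1.0, 0.0, 3.0, 2.0]

# Feature: number of copies of the hypothetical motif in each variant.
features = [count_motif(p, "TATA") for p in promoters]
slope, intercept = fit_line(features, activity)


def predict(seq):
    """Predict activity of an unseen promoter from its motif count."""
    return slope * count_motif(seq, "TATA") + intercept
```

A real model would learn from many thousands of designed variants and far richer sequence features, but the workflow has the same shape: turn each sequence into numbers, fit a model to measured activity, then predict activity for sequences that were never measured.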

At the time, many scientists were measuring the expression or the activity of genes at the tissue level. For his postdoctoral research, van Dijk wanted to conduct experiments to see whether gene expression could be measured and analyzed at the level of individual cells. The opportunity came in 2015 when van Dijk accepted a position at Columbia University.

Single-cell RNA sequencing was becoming more widely accepted. This experimental technology could be used to learn which genes are expressed in individual cells at high throughput. RNA sequencing from single cells could answer research questions on topics such as stem cell maturation, cancer heterogeneity, and variability within complex tissues.

For example, individual cells have diverse expression patterns that are hidden when expression is measured “on average” or “in bulk.” Scientists use single-cell RNA sequencing to measure cell activity in a tumor and understand the tumor’s complexity. With these insights into cell development, scientists could generate large data sets, but the technology had two limits. First, the data lacked structure. Second, the methods used to gather data were inefficient, and critical information was often lost. Van Dijk realized that this was the perfect machine learning problem.
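Why bulk measurement can miss what single-cell measurement reveals is easy to see in a toy calculation. The sketch below uses invented expression values for one gene in two hypothetical samples of ten cells each; it is an illustration of the averaging effect only, not an analysis from the van Dijk Lab.

```python
from statistics import mean, pstdev

# Invented values for illustration: expression of one gene in ten cells.
# In the mixed sample, half the cells have the gene off and half strongly on.
mixed_sample = [0.0] * 5 + [10.0] * 5
# In the uniform sample, every cell expresses the gene at the same level.
uniform_sample = [5.0] * 10

# A bulk measurement reports roughly the average over all cells.
bulk_mixed = mean(mixed_sample)
bulk_uniform = mean(uniform_sample)

# Per-cell measurements expose the spread that the average hides.
spread_mixed = pstdev(mixed_sample)
spread_uniform = pstdev(uniform_sample)

print(f"bulk: mixed={bulk_mixed}, uniform={bulk_uniform}")
print(f"cell-to-cell spread: mixed={spread_mixed}, uniform={spread_uniform}")
```

Both samples give the same bulk value, yet only the per-cell view shows that the mixed sample contains two distinct populations, which is the kind of heterogeneity researchers look for inside a tumor.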

For decades, MIT’s Robert Weinberg has contributed to the characterization of human cancer genes. Van Dijk and Weinberg developed an algorithm that they applied to a breast cancer model used to measure the spread of cancer cells to new areas of the body. They discovered that when cells transition from their baseline epithelial phenotype into a metastatic mesenchymal phenotype, the process was associated with certain stem cell signatures. The cells become stem-like before transitioning to this more metastatic state, a subtle change that could lead to breakthroughs in cancer research.

From Large Data Sets to Improved Patient Care

The end goal of this research is not only to provide more precise diagnoses and treatments but also to enable physicians to spend more time with patients. The technology exists, but until recently it had not been deployed in the healthcare field.

In September 2019, Yale launched a comprehensive DNA sequencing project called Generations. Progress is slow, however, for several reasons. Outdated healthcare systems were not designed with machine learning in mind, so it can be challenging to collect large amounts of data. The data may also contain biases, inconsistencies, and incomplete information.

The machine learning algorithms that van Dijk developed can be applied to a variety of scenarios. On a given day, van Dijk could be working with clinicians to answer important questions about health records or designing an experiment to answer a fundamental question about molecular biology. The same algorithms often apply to both molecular biology and clinical data. The goal is to use health records data collected at Yale New Haven Hospital and relate it to patient outcomes. For example, van Dijk believes heart failure is one area that would benefit from this approach.

Van Dijk co-authored a paper in Nature Methods, where Yale scientists used a neural network called SAUCIE to analyze 11 million cells, revealing cellular differences within individuals and providing more information on broader patterns in how the body functions.

Currently, the van Dijk Lab is collaborating with Yale colleagues on several projects. “Everyone is excited to collaborate,” he says. “I get exposed to so many interesting systems, and I have the opportunity to impact healthcare and medicine.”

Biomedical Imaging and Cardiovascular Research

Van Dijk is working with nuclear cardiologists to develop the first machine learning algorithm with the ability to analyze 3D images for new phenotypes.

A positron emission tomography (PET) scan is an imaging test that helps reveal areas of decreased blood flow to the heart. Every day, dozens of patients at Yale New Haven Hospital with severe chest pain receive a stress test to track the function of the heart in 3D. A PET scan helps clinicians determine how well the heart is functioning and whether a patient may need invasive treatment. Van Dijk is now working with experts to extract outcomes information and then leverage that data to make more accurate diagnoses and to determine which patients benefited the most from a surgical procedure.

Author: Elisabeth Reitman
