
Bridging Biology and AI: Yale and Google's Collaborative Breakthrough in Single-Cell RNA Analysis

Model identified a potential new cancer therapy pathway

Google and Yale researchers have developed a more “advanced and capable” AI model that uses large language models to analyze single-cell RNA data and is expected to “lead to new insights and potential biological discoveries.”

“This announcement marks a milestone for AI in science,” Google said.

On social media and in comments, scientists and developers applauded the model, which Google released Oct. 15, as a much-needed bridge that makes single-cell data accessible to, and interpretable by, AI.

Many scientists, including cancer researchers working to improve the outcomes of immunotherapies, have homed in on single-cell data to understand the mechanisms that either promote or thwart disease progression. But their efforts have been slowed by the size and complexity of the data.

In an April blog post, Google explained: “Single-cell data are massive, high-dimensional, and hard to interpret. Each cell can be represented by thousands of numbers—its gene expression measurements—which traditionally require specialized tools and models to analyze. This makes single-cell analysis slow, difficult to scale, and limited to expert users.”

Working with the lab of Yale’s David van Dijk, PhD, Google Research introduced an initial version of the tool, called Cell2Sentence-Scale, in the spring, describing it as “a family of powerful, open-source large language models (LLMs) trained to ‘read’ and ‘write’ biological data at the single-cell level.”

Before the new version of the model was released, Yale and Google researchers tested how it might be applied to biological questions. The findings, which will be described in a forthcoming paper, revealed a potential new cancer therapy pathway.


The van Dijk lab at Yale describes its work, in part, as engineering “large-scale foundation models that learn across biological scales, from single molecules to whole organs. By casting omics data as a biological language, our models construct virtual cells that decode cellular programs driving cancer, autoimmune disease, and tissue regeneration.”

“Just as AlphaFold transformed how we think about proteins, we’re now approaching that moment for cellular biology. We can finally begin to simulate how real human cells behave, in context, in silico,” says van Dijk, assistant professor of medicine (cardiovascular medicine) at Yale School of Medicine. “This is where AI stops being just an analysis tool and starts becoming a model system for biology itself.”

The newly released model—Cell2Sentence-Scale 27B—is a scaled-up version of the previous iteration. Google's blog post concluded: “The open model and its resources are available today for the research community. We invite you to explore these tools, build on our work and help us continue to translate the language of life.”


Author

Naedine Hazell
Yale Cancer Center Senior Communications Officer
