Google and Yale researchers have developed a more “advanced and capable” AI model that uses large language models to analyze single-cell RNA data and is expected to “lead to new insights and potential biological discoveries.”
“This announcement marks a milestone for AI in science,” Google said.
On social media and in comments, scientists and developers applauded the model, which Google released Oct. 15, as the much-needed bridge to make single-cell data accessible to, or interpretable by, AI.
Many scientists, including cancer researchers focusing on improving the outcomes of immunotherapies, have homed in on single-cell data to understand the mechanisms that either promote or thwart disease progression. But their efforts have been slowed by the size and complexity of the data.
In an April blog post, Google explained: “Single-cell data are massive, high-dimensional, and hard to interpret. Each cell can be represented by thousands of numbers—its gene expression measurements—which traditionally require specialized tools and models to analyze. This makes single-cell analysis slow, difficult to scale, and limited to expert users.”
Working with the lab of Yale’s David van Dijk, PhD, Google Research introduced an initial version of the tool, called Cell2Sentence-Scale, in the spring, describing it as “a family of powerful, open-source large language models (LLMs) trained to ‘read’ and ‘write’ biological data at the single-cell level.”
Before its release, Yale and Google researchers tested how the new version of the model might be applied to biological questions. The findings, which will be described in a forthcoming paper, revealed a potential new cancer therapy pathway.