ChatGPT made headlines in 2023 after it identified the cause of a young boy’s mysterious chronic pain. His mother had taken him to 17 doctors over three years, but none could figure out the cause of her son’s suffering. Out of frustration, she then turned to ChatGPT. After entering as much information as she could on his condition, she finally received the long-awaited answer—tethered spinal cord syndrome. She made an appointment with a neurosurgeon who confirmed the chatbot’s diagnosis. The boy finally received surgery to treat his chronic pain.
Indeed, chatbots show potential in assisting doctors in diagnosing complex medical cases. One 2023 study in JAMA found that OpenAI’s chatbot GPT-4 accurately identified the final diagnoses of challenging medical cases 39% of the time, and it included the correct diagnosis in its list of possible conditions 64% of the time. While these results are promising, the chatbot’s lack of specialized medical training still leaves much room for improvement.
Me-LLaMA is a novel family of LLMs introduced by Yale School of Medicine (YSM) researchers. The models are similar to their cousins ChatGPT and GPT-4, but those chatbots are closed-source, meaning they aren’t easily accessible to or customizable by researchers.
To address this issue, Hua Xu, PhD, Robert T. McCluskey Professor of Biomedical Informatics and Data Science and assistant dean for Biomedical Informatics, and his team are developing this new family of LLMs, collectively known as Me-LLaMA, one of the first and largest families of open-source models to be trained on extensive biomedical and clinical data. “Me-LLaMA is an open-source medical foundation model that we are continuously training on large amounts of biomedical text and releasing to the community,” Xu says. His team used over 129 billion tokens—small pieces of text, like words or parts of words, that the model processes—to train Me-LLaMA. “We are doing both pre-training and fine-tuning to improve its performance on many biomedical applications.”
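To make the idea of a "token" concrete: real models like Me-LLaMA use learned subword vocabularies built with algorithms such as byte-pair encoding, but a purely hypothetical toy tokenizer (everything below is illustrative, not Me-LLaMA's actual method) can show how words get split into smaller pieces:

```python
# Toy illustration of tokenization: words are split into smaller pieces.
# Real LLM tokenizers learn their subword vocabulary from data (e.g., BPE);
# this hypothetical version just peels off a few common English suffixes.
def toy_tokenize(text):
    tokens = []
    for word in text.lower().split():
        for suffix in ("ing", "ed", "s"):
            if len(word) > len(suffix) + 2 and word.endswith(suffix):
                # Split into a stem plus a suffix piece (marked with "##").
                tokens.extend([word[: -len(suffix)], "##" + suffix])
                break
        else:
            tokens.append(word)
    return tokens

print(toy_tokenize("The model processes biomedical texts"))
# → ['the', 'model', 'processe', '##s', 'biomedical', 'text', '##s']
```

Counting pieces like these, rather than whole words, is how training-corpus sizes such as "129 billion tokens" are measured.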
Xu’s team is training these models on massive amounts of data, including millions of biomedical articles from the PubMed database, clinical notes from anonymized databases, clinical guidelines, and more. The researchers are also studying how well the models perform various tasks. For example, users can ask the chatbot questions about specific publications or ask it to extract relevant information about a clinical trial.
The researchers are also comparing the performance of Me-LLaMA with that of other LLMs using publicly available datasets that test these models in different areas, such as answering medical questions. So far, they are finding that Me-LLaMA outperforms other existing open medical LLMs, such as Meditron-70B, as well as commercial models such as ChatGPT and GPT-4, across these kinds of tasks.
“We are showing that large language models have great potential as an AI assistant that helps with clinical diagnostic reasoning, accelerating clinical documentation, and making clinical work more efficient while improving patient care,” says Qianqian Xie, PhD, associate research scientist in Xu’s lab. Xie is currently exploring Me-LLaMA’s ability not only to come up with potential diagnoses when given a summary of a particular case but also to explain its reasoning for each one.
Updating Me-LLaMA requires significant computational resources. Fortunately, says Xu, Yale is dedicated to supporting the development of a robust graphics processing unit (GPU) infrastructure. Recently, the Office of the Provost announced that it will invest over $150 million in AI development.