
NLP/LLM Interest Group

"From Compound Figures to Composite Understanding"

From Compound Figures to Composite Understanding: Developing a Multi-Modal LLM from Biomedical Literature with Medical Multiple-Image Benchmarking and Validation

In healthcare, disease diagnosis and longitudinal patient monitoring require clinicians to synthesize information across multiple images from different modalities or time points, yet multi-image reasoning remains a significant gap for most current multi-modal LLMs (MLLMs). This gap persists because of a critical bottleneck: the lack of large-scale, high-quality annotated training data for medical multi-image understanding. This study addresses that scarcity by leveraging compound figures from the biomedical literature. We devised a novel five-stage, context-aware instruction generation pipeline to create the PMC-MI-Dataset, comprising over 260,000 compound images, and then developed M³LLM, a medical multi-image multi-modal LLM. For comprehensive evaluation, we also constructed the expert-validated PMC-MI-Bench. M³LLM significantly outperforms state-of-the-art general-purpose and specialized MLLMs across the diverse tasks of PMC-MI-Bench and on public benchmarks such as OmniMedVQA and MMMU-Med. Clinical validation on the MIMIC longitudinal chest X-ray dataset further confirms its strong performance on real-world tasks, including disease diagnosis and progression prediction. Our study establishes a scalable paradigm for medical multi-image understanding, and the model, dataset, and benchmark will be publicly released.

Admission

Free

Event Type

Lectures and Seminars

Next upcoming occurrences of this event

Monday, October 27, 2025