NLP/LLM Interest Group
"From Compound Figures to Composite Understanding"
From Compound Figures to Composite Understanding: Developing a Multi-Modal LLM from Biomedical Literature with Medical Multiple-Image Benchmarking and Validation
In healthcare, disease diagnosis and longitudinal patient monitoring require clinicians to synthesize information across multiple images from different modalities or time points, yet this multi-image reasoning remains a significant gap for most current multi-modal large language models (MLLMs). The gap persists because of a critical bottleneck: the lack of large-scale, high-quality annotated training data for medical multi-image understanding. This study addresses that scarcity by leveraging compound figures from the biomedical literature. We devised a novel five-stage, context-aware instruction-generation pipeline to create the PMC-MI-Dataset, comprising over 260,000 compound images, and subsequently developed M³LLM, a medical multi-image MLLM. For comprehensive evaluation, we also constructed the expert-validated PMC-MI-Bench. M³LLM significantly outperforms state-of-the-art general-purpose and specialized MLLMs, achieving superior performance across the diverse tasks of PMC-MI-Bench and on public benchmarks such as OmniMedVQA and MMMU-Med. Furthermore, clinical validation on the MIMIC longitudinal chest X-ray dataset confirms its superiority on real-world tasks, including disease diagnosis and progression prediction. Our study establishes a scalable paradigm for medical multi-image understanding, and the model, dataset, and benchmark will be publicly released.
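The abstract does not enumerate the five pipeline stages. As a purely illustrative sketch of what a context-aware instruction-generation pipeline over compound figures might look like, consider the following; every stage name, field, and heuristic here is a hypothetical assumption for illustration, not the authors' method.

```python
"""Hypothetical sketch of instruction generation from compound figures.
All stage names and heuristics are illustrative assumptions; the talk
abstract names a five-stage pipeline but does not describe its stages."""

from dataclasses import dataclass, field


@dataclass
class CompoundFigure:
    figure_id: str
    caption: str                  # full compound-figure caption
    panel_labels: list[str]       # e.g. ["A", "B", "C"]
    inline_mentions: list[str] = field(default_factory=list)  # article sentences citing the figure


def split_panel_captions(caption: str, labels: list[str]) -> dict[str, str]:
    """Hypothetical stage: split a compound caption into per-panel
    sub-captions using "(A) ... (B) ..." markers, a common but
    imperfect heuristic."""
    parts: dict[str, str] = {}
    for i, label in enumerate(labels):
        start = caption.find(f"({label})")
        end = caption.find(f"({labels[i + 1]})") if i + 1 < len(labels) else len(caption)
        if start != -1:
            parts[label] = caption[start:end].strip()
    return parts


def gather_context(fig: CompoundFigure) -> str:
    """Hypothetical stage: collect in-text mentions so instructions
    reflect the article's discussion of the figure, not only its caption."""
    return " ".join(fig.inline_mentions)


def make_instructions(fig: CompoundFigure) -> list[dict]:
    """Hypothetical stage: combine per-panel captions with article
    context into templated multi-image question/answer pairs."""
    panels = split_panel_captions(fig.caption, fig.panel_labels)
    context = gather_context(fig)
    return [
        {
            "figure": fig.figure_id,
            "panel": label,
            "question": f"Comparing the panels, what does panel {label} show?",
            "answer": f"{subcaption} Context: {context}",
        }
        for label, subcaption in panels.items()
    ]


if __name__ == "__main__":
    fig = CompoundFigure(
        figure_id="PMC-demo-Fig1",
        caption="(A) Baseline chest X-ray. (B) Follow-up X-ray after treatment.",
        panel_labels=["A", "B"],
        inline_mentions=["Figure 1 shows resolution of the opacity after treatment."],
    )
    for item in make_instructions(fig):
        print(item)
```

A real pipeline of this kind would presumably replace the templated question/answer step with an LLM-driven generator and add expert or automated quality filtering, but those details are not given in the abstract.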
Related Media
2024-11-11-Zhen-Chen-
Host Organizations
- Biomedical Informatics & Data Science
- Clinical NLP Lab