NLP/LLM Interest Group
"From Compound Figures to Composite Understanding"
From Compound Figures to Composite Understanding: Developing a Multi-Modal LLM from Biomedical Literature with Medical Multiple-Image Benchmarking and Validation
In healthcare, disease diagnosis and longitudinal patient monitoring require clinicians to synthesize information across multiple images from different modalities or time points, yet this multi-image reasoning remains a significant gap for most current multi-modal large language models (MLLMs). The gap persists because of a critical bottleneck: the lack of large-scale, high-quality annotated training data for medical multi-image understanding. This study addresses that scarcity by leveraging compound figures from the biomedical literature. We devised a novel five-stage, context-aware instruction-generation pipeline to create the PMC-MI-Dataset, comprising over 260,000 compound images, and subsequently developed M³LLM, a medical multi-image MLLM. For comprehensive evaluation, we also constructed the expert-validated PMC-MI-Bench. M³LLM significantly outperforms state-of-the-art general-purpose and specialized MLLMs, achieving superior performance across the diverse tasks of PMC-MI-Bench and on public benchmarks such as OmniMedVQA and MMMU-Med. Furthermore, clinical validation on the MIMIC longitudinal chest X-ray dataset confirms its superiority on real-world tasks, including disease diagnosis and progression prediction. Our study establishes a scalable paradigm for medical multi-image understanding, and the model, dataset, and benchmark will be publicly released.
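The abstract does not enumerate the five pipeline stages. As a purely illustrative sketch of what a context-aware instruction-generation pipeline over compound figures might look like, consider the following; every stage name, field, and heuristic here is a hypothetical assumption for illustration, not the authors' method.

```python
"""Hypothetical sketch of instruction generation from compound figures.
All stage names and heuristics are illustrative assumptions; the talk
abstract names a five-stage pipeline but does not describe its stages."""

from dataclasses import dataclass, field


@dataclass
class CompoundFigure:
    figure_id: str
    caption: str                  # full compound-figure caption
    panel_labels: list[str]       # e.g. ["A", "B", "C"]
    inline_mentions: list[str] = field(default_factory=list)  # article sentences citing the figure


def split_panel_captions(caption: str, labels: list[str]) -> dict[str, str]:
    """Hypothetical stage: split a compound caption into per-panel
    sub-captions using "(A) ... (B) ..." markers, a common but
    imperfect heuristic."""
    parts: dict[str, str] = {}
    for i, label in enumerate(labels):
        start = caption.find(f"({label})")
        end = caption.find(f"({labels[i + 1]})") if i + 1 < len(labels) else len(caption)
        if start != -1:
            parts[label] = caption[start:end].strip()
    return parts


def gather_context(fig: CompoundFigure) -> str:
    """Hypothetical stage: collect in-text mentions so instructions
    reflect the article's discussion of the figure, not only its caption."""
    return " ".join(fig.inline_mentions)


def make_instructions(fig: CompoundFigure) -> list[dict]:
    """Hypothetical stage: combine per-panel captions with article
    context into templated multi-image question/answer pairs."""
    panels = split_panel_captions(fig.caption, fig.panel_labels)
    context = gather_context(fig)
    return [
        {
            "figure": fig.figure_id,
            "panel": label,
            "question": f"Comparing the panels, what does panel {label} show?",
            "answer": f"{subcaption} Context: {context}",
        }
        for label, subcaption in panels.items()
    ]


if __name__ == "__main__":
    fig = CompoundFigure(
        figure_id="PMC-demo-Fig1",
        caption="(A) Baseline chest X-ray. (B) Follow-up X-ray after treatment.",
        panel_labels=["A", "B"],
        inline_mentions=["Figure 1 shows resolution of the opacity after treatment."],
    )
    for item in make_instructions(fig):
        print(item)
```

A real pipeline of this kind would presumably replace the templated question/answer step with an LLM-driven generator and add expert or automated quality filtering, but those details are not given in the abstract.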
Related Media
2024-11-11-Zhen-Chen-
Host Organizations
- Biomedical Informatics & Data Science
- Clinical NLP Lab