🤖 AI Frontiers: Hosted by NLP/LLM Interest Group
Are Multimodal LLMs Ready for Clinical Dermatology? A Real-World Evaluation in Dermatology
Abstract: Multimodal large language models (MLLMs) have shown promising performance on dermatology benchmark datasets, but their ability to support real-world clinical decision-making remains unclear. In this seminar, I will present findings from a large-scale evaluation of MLLMs across both public dermatology benchmarks and a multi-site hospital dermatology consultation cohort. I will discuss model performance for differential diagnosis generation and urgent dermatology triage, the impact of clinical context on diagnostic accuracy, and key limitations related to visual grounding and context integration. These results highlight the importance of realistic evaluation frameworks for assessing the clinical readiness of AI systems in dermatology.