NLP/LLM Interest Group
"Opportunities and Pitfalls of Large Language Models in Digestive Diseases"
The evidence base for LLMs in digestive diseases shows wide performance variability, underscoring safety risks and the need for rigorous evaluation.
In this talk, Mauro Giuffrè, MD, will present an overview of four of his main contributions to the field: a systematic review that quantified the range of reported accuracy and highlighted methodological gaps; a guideline-grounded study showing that retrieval-augmented and fine-tuned GPT-4 markedly improve the quality of open-ended answers and treatment selection for patients with hepatitis C virus (HCV) infection; EVAL, an “expert-of-experts” verification framework that aligns automated grading with human expert judgment and boosts correctness through rejection sampling; and a randomized simulation trial (GutGPT) showing that better usability does not automatically translate into adoption, which points to trust and workflow integration as the key levers for impact.