NLP/LLM Interest Group
This session features two talks exploring novel approaches to AI-driven biomedical discovery and drug representation.
"PredMed: Extrapolating Future Discoveries from the Literature Universe" by Chia-Hsuan Chang, PhD - Postdoctoral Associate in Biomedical Informatics and Data Science
Abstract: Most AI co-scientists are limited by retrieval-augmented generation (RAG) over static corpora and rely heavily on human guidance. We present PredMed, a novel framework that redefines hypothesis generation as a temporal extrapolation task within the high-dimensional literature universe. Using time-based regression and a specialized Embedding Language Model (ELM) acting as a decoder, we project and translate future-state embeddings back into natural language. Our results show that this temporal steering mechanism explores scientific territory that standard prompting cannot reach, outperforming baseline methods in both novelty and relational depth. We also validate PredMed’s efficacy through expert-reviewed hypotheses in the CAR-T therapy domain, highlighting a new frontier for autonomous scientific discovery.
"A Literature-Based Drug Embedding Resource for Biomedical Research" by Zhiyuan Cao - PhD Student in Computational Biology and Biomedical Informatics (CBB)
Abstract: We introduce DrugSpace, a reusable text-based drug embedding resource designed to support similarity search, retrieval, and downstream modeling in biomedical research. Built from large-scale PubMed abstracts and aligned with heterogeneous DrugBank drug descriptions through a two-stage training pipeline, DrugSpace is released both as a versioned embedding dataset and as an embedder for generating representations from new drug text. To support realistic reuse, the resource is evaluated under a prospective setting that separates drug-level alignment from later drug introductions and updates. Across intrinsic similarity discrimination, ATC-based therapeutic retrieval, robustness to input perturbations, and integration into a representative DDI prediction pipeline, DrugSpace consistently remains competitive with strong biomedical and general-purpose text embedding baselines, supporting its utility as a practical and extensible drug representation resource.
Host Organizations
- Biomedical Informatics & Data Science
- Clinical NLP Lab
- MyYSM