Vipina K. Keloth, PhD
Associate Research Scientist in Biomedical Informatics and Data Science
About
Titles
Associate Research Scientist in Biomedical Informatics and Data Science
Biography
Dr. Vipina Keloth is an Associate Research Scientist in the Department of Biomedical Informatics and Data Science at Yale School of Medicine. Previously, she was a Postdoctoral Associate at Yale BIDS and, before that, a Postdoctoral Research Fellow at the School of Biomedical Informatics at the University of Texas Health Science Center at Houston. She received her doctoral degree in Computer Science from New Jersey Institute of Technology (NJIT) in 2021. She has also worked as an assistant lecturer in the Department of Mathematical and Computational Sciences at the National Institute of Technology Karnataka, India. Her research interests include biomedical ontologies and terminologies, and clinical and biomedical natural language processing.
Appointments
Biomedical Informatics & Data Science
Associate Research Scientist (Primary)
Other Departments & Organizations
- Biomedical Informatics & Data Science
- Clinical NLP Lab
Education & Training
- Postdoctoral Associate
- Yale University (2024)
- Postdoctoral Research Fellow
- University of Texas Health Science Center at Houston (2023)
- PhD
- New Jersey Institute of Technology, Computer Science (2021)
- MS
- National Institute of Technology Karnataka, Systems Analysis and Computer Applications (2014)
- MSc
- Mahatma Gandhi University, Computer Applications (2010)
- BS
- Kannur University, Physics (2007)
Research
Overview
Medical Research Interests
ORCID
0000-0001-6919-1122
Research at a Glance
Yale Co-Authors
- Hua Xu, PhD
- Qingyu Chen, PhD
- Kalpana Raja, PhD, MRSB, CSci
- Cynthia Brandt, MD, MPH
- Hamita Sachar, MD
- Jeffrey Zhang
Research Interests
- Biological Ontologies
- Social Determinants of Health
- Natural Language Processing
Publications
2025
Social determinants of health extraction from clinical notes across institutions using large language models
Keloth V, Selek S, Chen Q, Gilman C, Fu S, Dang Y, Chen X, Hu X, Zhou Y, He H, Fan J, Wang K, Brandt C, Tao C, Liu H, Xu H. Social determinants of health extraction from clinical notes across institutions using large language models. Npj Digital Medicine 2025, 8: 287. PMID: 40379919, PMCID: PMC12084648, DOI: 10.1038/s41746-025-01645-8. Peer-Reviewed Original Research.
Benchmarking large language models for biomedical natural language processing applications and recommendations
Chen Q, Hu Y, Peng X, Xie Q, Jin Q, Gilson A, Singer M, Ai X, Lai P, Wang Z, Keloth V, Raja K, Huang J, He H, Lin F, Du J, Zhang R, Zheng W, Adelman R, Lu Z, Xu H. Benchmarking large language models for biomedical natural language processing applications and recommendations. Nature Communications 2025, 16: 3280. PMID: 40188094, PMCID: PMC11972378, DOI: 10.1038/s41467-025-56989-2. Peer-Reviewed Original Research.
Concepts: Language model; Natural language processing applications; Biomedical natural language processing; Medical question answering; Language processing applications; Natural language processing; Growth of biomedical literature; Missing information; Few-shot; Question Answering; Zero-Shot; Knowledge curation; Language processing; Processing applications; BioNLP; BART model; Performance gap; Biomedical literature; General domain; Task; Benchmarks; BERT; Information; Performance; LLM
The Development Landscape of Large Language Models for Biomedical Applications
Cao Z, Keloth V, Xie Q, Qian L, Liu Y, Wang Y, Shi R, Zhou W, Yang G, Zhang J, Peng X, Zhen E, Weng R, Chen Q, Xu H. The Development Landscape of Large Language Models for Biomedical Applications. Annual Review Of Biomedical Data Science 2025, 8: 251-274. PMID: 40169010, PMCID: PMC12372014, DOI: 10.1146/annurev-biodatasci-102224-074736. Peer-Reviewed Original Research.
Concepts: Language model; Task-specific fine-tuning; Privacy concerns; Improve data sharing; Computational resources; Specialized medical applications; Biomedical data; Data sharing; Fine-tuning; Biomedical literature; Transform healthcare; Model access; Development process; Medical applications; Multimodal integration; ChatGPT; Privacy; Applications; Model characteristics; Training; Architecture; LLM; Medical research
Medical foundation large language models for comprehensive text analysis and beyond
Xie Q, Chen Q, Chen A, Peng C, Hu Y, Lin F, Peng X, Huang J, Zhang J, Keloth V, Zhou X, Qian L, He H, Shung D, Ohno-Machado L, Wu Y, Xu H, Bian J. Medical foundation large language models for comprehensive text analysis and beyond. Npj Digital Medicine 2025, 8: 141. PMID: 40044845, PMCID: PMC11882967, DOI: 10.1038/s41746-025-01533-1. Peer-Reviewed Original Research.
Concepts: Text analysis tasks; Analysis tasks; Language model; Domain-specific knowledge; Zero-Shot; Human evaluation; Supervised setting; Task-specific instructions; Clinical data sources; Specialized medical knowledge; ChatGPT; Text analysis; Pretraining; Task; Data sources; Medical applications; Medical knowledge; Enhanced performance; Text; Performance
2024
Detection of Gastrointestinal Bleeding With Large Language Models to Aid Quality Improvement and Appropriate Reimbursement
Zheng N, Keloth V, You K, Kats D, Li D, Deshpande O, Sachar H, Xu H, Laine L, Shung D. Detection of Gastrointestinal Bleeding With Large Language Models to Aid Quality Improvement and Appropriate Reimbursement. Gastroenterology 2024, 168: 111-120.e4. PMID: 39304088, DOI: 10.1053/j.gastro.2024.09.014. Peer-Reviewed Original Research.
Concepts: Electronic health records; Overt gastrointestinal bleeding; Gastrointestinal bleeding; Recurrent bleeding; Machine learning models; Health records; Clinically relevant applications; Nursing notes; Language model; Acute gastrointestinal bleeding; Quality improvement; Learning models; Detection of gastrointestinal bleeding; Reimbursement; Identification of clinical conditions; Separate hospitals; Quality measures; Hospital; Bleeding; Clinical conditions; Patient management; Early identification; Patients; Reimbursement codes; Coding algorithm
A Study of Biomedical Relation Extraction Using GPT Models.
Zhang J, Wibert M, Zhou H, Peng X, Chen Q, Keloth V, Hu Y, Zhang R, Xu H, Raja K. A Study of Biomedical Relation Extraction Using GPT Models. AMIA Joint Summits On Translational Science Proceedings 2024, 2024: 391-400. PMID: 38827097, PMCID: PMC11141827. Peer-Reviewed Original Research.
Ensemble pretrained language models to extract biomedical knowledge from literature
Li Z, Wei Q, Huang L, Li J, Hu Y, Chuang Y, He J, Das A, Keloth V, Yang Y, Diala C, Roberts K, Tao C, Jiang X, Zheng W, Xu H. Ensemble pretrained language models to extract biomedical knowledge from literature. Journal Of The American Medical Informatics Association 2024, 31: 1904-1911. PMID: 38520725, PMCID: PMC11339500, DOI: 10.1093/jamia/ocae061. Peer-Reviewed Original Research.
Concepts: Natural language processing; Natural language processing systems; Language model; Expansion of biomedical literature; Zero-shot setting; Manually annotated corpus; Knowledge graph development; Task-specific models; Domain-specific models; Zero-Shot; Entity recognition; Billion parameters; Ensemble learning; Location information; Knowledge bases; Biomedical entities; Language processing; Free text; Graph development; Biomedical concepts; Automated technique; Biomedical literature; Detection method; Predictive performance; Biomedical knowledge
Advancing entity recognition in biomedicine via instruction tuning of large language models
Keloth V, Hu Y, Xie Q, Peng X, Wang Y, Zheng A, Selek M, Raja K, Wei C, Jin Q, Lu Z, Chen Q, Xu H. Advancing entity recognition in biomedicine via instruction tuning of large language models. Bioinformatics 2024, 40: btae163. PMID: 38514400, PMCID: PMC11001490, DOI: 10.1093/bioinformatics/btae163. Peer-Reviewed Original Research.
Concepts: Named Entity Recognition; Sequence labeling task; Natural language processing; Biomedical NER datasets; Language model; NER datasets; Entity recognition; Labeling task; Text generation; Field of natural language processing; Biomedical NER; Few-shot learning capability; Reasoning tasks; Multi-domain scenarios; Domain-specific models; End-to-end; Minimal fine-tuning; SOTA performance; F1 score; Healthcare applications; Biomedical entities; Biomedical domain; Language processing; Multi-tasking; PubMedBERT model
FedFSA: Hybrid and federated framework for functional status ascertainment across institutions
Fu S, Jia H, Vassilaki M, Keloth V, Dang Y, Zhou Y, Garg M, Petersen R, St Sauver J, Moon S, Wang L, Wen A, Li F, Xu H, Tao C, Fan J, Liu H, Sohn S. FedFSA: Hybrid and federated framework for functional status ascertainment across institutions. Journal Of Biomedical Informatics 2024, 152: 104623. PMID: 38458578, PMCID: PMC11005095, DOI: 10.1016/j.jbi.2024.104623. Peer-Reviewed Original Research.
Concepts: Natural language processing; Electronic health records; Status information; Information extraction; Functional status information; Rule-based information extraction; Federated learning framework; Private local data; Natural language processing framework; Healthcare sites; Patient's functional status; Multiple healthcare institutions; Federated Learning; PyTorch library; Concept normalization; BERT model; Learning framework; Collaborative development efforts; Corpus annotation; Language processing; Healthcare institutions; Functional status; Predictor of health outcomes; Activities of daily living; Natural language processing performance
Improving large language models for clinical named entity recognition via prompt engineering
Hu Y, Chen Q, Du J, Peng X, Keloth V, Zuo X, Zhou Y, Li Z, Jiang X, Lu Z, Roberts K, Xu H. Improving large language models for clinical named entity recognition via prompt engineering. Journal Of The American Medical Informatics Association 2024, 31: 1812-1820. PMID: 38281112, PMCID: PMC11339492, DOI: 10.1093/jamia/ocad259. Peer-Reviewed Original Research.
Concepts: Clinical NER tasks; NER task; Task-specific prompts; Entity recognition; Language model; Training samples; State-of-the-art models; Few-shot learning; State-of-the-art; Minimal training data; Task-specific knowledge; F1-score; Annotated samples; Concept extraction; Model performance; Annotated datasets; Training data; F1 score; Task description; Format specifications; Complex clinical data; Optimal performance; Task; Evaluation schema; GPT model
Academic Achievements & Community Involvement
Activities
- American Medical Informatics Association (AMIA)
- Professional Organizations, Member (05/15/2018 - Present)
- Journal of Biomedical Informatics
- Journal Service, Reviewer (06/30/2021 - Present)
- BMC Supplements
- Journal Service, Reviewer (06/01/2021 - Present)
- AMIA DEI Communications Subcommittee
- Professional Organizations, Member (10/28/2021 - Present)
- JAMIA Open
- Journal Service, Reviewer (04/03/2023 - Present)
News
Get In Touch
Contacts
Locations
Academic Office
100 College Street
New Haven, CT 06510