Team

  • Clinical NLP Lab

    • Robert T. McCluskey Professor of Biomedical Informatics and Data Science; Vice Chair for Research and Development, Department of Biomedical Informatics and Data Science; Associate Dean for Biomedical Informatics, Yale School of Medicine; Director, CBB MS Program, Biomedical Informatics & Data Science; Professor, Computer Science

      Research Interests
      • Natural Language Processing
      Dr. Hua Xu is Robert T. McCluskey Professor and Vice Chair for Research and Development, Department of Biomedical Informatics and Data Science at Yale School of Medicine (YSM). He also serves as Associate Dean for Biomedical Informatics at YSM. He received his Ph.D. in Biomedical Informatics from Columbia University. His primary research interests include biomedical natural language processing (NLP), large language models (LLMs), and AI agents, as well as their applications in clinical practice and biomedical research. His research is funded by multiple agencies, including NLM, NCI, NIGMS, NIA, AHA, and CPRIT, and methods and tools developed in his lab have been widely used to support diverse biomedical applications. Dr. Xu is a fellow of both the American College of Medical Informatics (ACMI) and the International Academy of Health Sciences Informatics (IAHSI).
    • Assistant Professor

      After obtaining my Ph.D. in Information Science and completing postdoctoral training in Medical Informatics, I have actively collaborated on interdisciplinary projects in the areas of Medicines and Informatics, which has enhanced my multidisciplinary background in medical informatics and digital health. My current research focuses on clinical information standards and standard-based data applications, involving data normalization, harmonization, ontology, and metadata development. I have also gained experience in medical literature mining, clinical predictive modeling using Electronic Health Records (EHRs), and clinical decision support systems. As a co-investigator or key researcher, I have contributed to several ongoing grants. I have also co-authored over 80 peer-reviewed journal articles, conference proceedings, and books.
    • Associate Research Scientist in Biomedical Informatics and Data Science

      Research Interests
      • Natural Language Processing
      • Biological Ontologies
      • Social Determinants of Health
      Dr. Vipina Keloth is an Associate Research Scientist at the Department of Biomedical Informatics and Data Science at Yale School of Medicine. Previously, she was a Postdoctoral Associate at Yale BIDS and prior to that a Postdoctoral Research Fellow at the School of Biomedical Informatics at the University of Texas Health Science Center at Houston. Vipina graduated with a doctoral degree in Computer Science from New Jersey Institute of Technology (NJIT) in 2021. She has also worked as an assistant lecturer in the Department of Mathematical and Computational Sciences at the National Institute of Technology Karnataka, India. Her research interests lie broadly in the domain of biomedical ontologies/terminologies and clinical and biomedical natural language processing.
    • Associate Research Scientist in Biomedical Informatics and Data Science

      Brian Ondov is an Associate Research Scientist at the Yale School of Medicine Department of Biomedical Informatics and Data Science. He earned his doctorate in Computer Science at University of Maryland College Park under a training award from the National Institutes of Health. His published research spans Computational Genomics, Machine Learning, Natural Language Processing, and Human-Computer Interaction. Drawing on this background, he is currently exploring how Large Language Models (LLMs) can help researchers and healthcare consumers interact with biomedical literature and knowledge sources.
    • Instructor of Biomedical Informatics and Data Science

      Research Interests
      • Medical Informatics
      • Natural Language Processing
      • Phosphorylation
      • Comorbidity
      • Machine Learning
      Professional Summary
      Dr. Kalpana Raja is an interdisciplinary computer scientist, molecular informatics expert, and a Research Scientist in the Department of Biomedical Informatics & Data Science (BIDS) at the Yale School of Medicine. Since joining the department in February 2023, her research has focused on the intersection of natural language processing (NLP), large language models (LLMs), artificial intelligence (AI), and data harmonization & standardization, with a core emphasis on biomedical text extraction, automated knowledge curation, and the advancement of open science.

      Current Research & Initiatives
      At Yale, Dr. Raja focuses on the development of Python-based NLP pipelines, deep learning paradigms, and large language model (LLM) workflows to accelerate biomedical discoveries. A strong advocate for the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles, she is currently leading an NIH SBIR grant to evaluate the reliability of immunology datasets using information extraction and sentiment analysis. She also serves as a key co-investigator and point person within the Data Coordination Center (DCC) for the $7.88 million NIH-funded IMPACT-MH program, managing informatics infrastructure to build standard data representation models. Additionally, in collaboration with Dr. Hua Xu and Dr. Lucila Ohno-Machado, Dr. Raja won Phase 1 of an NIH-organized challenge for proposing the S-index, a refined data-sharing metric designed to reward the explicit reuse of biomedical datasets, and she is currently leading the technical development of its web platform.

      Academic Background & Past Experience
      Before moving to New Haven, Dr. Raja served as an Assistant Professor at the School of Biomedical Informatics at UTHealth Houston and as a scientist at Sema4, a patient-centered healthcare data company in Stamford, CT. Dr. Raja’s interdisciplinary foundation begins with a bachelor’s degree in pharmacy from Tamil Nadu Dr. M.G.R. Medical University in Chennai, India, where she is also a registered pharmacist. Driven by a vision to develop software for biological applications, she earned a master’s degree in Computing (Software Technology) from The Robert Gordon University in Aberdeen, UK. During this time, she developed ProfileSKiM, an intelligent document retrieval tool that won a university award in 2005 and the Technology Award from the British Computer Society in 2006. She subsequently completed a second master’s degree in Bioinformatics and an interdisciplinary PhD in Computing (Software Technology – Bioinformatics) from Bharathiar University in Coimbatore, India. She presented her doctoral findings at the 2012 Asia Pacific Bioinformatics Conference (APBC) in Melbourne and BioCreative Conference V in Washington, DC.

      Publications & Recognition
      Dr. Raja’s methodologies for information retrieval, literature-based discovery, and automated hypothesis generation have been applied across diverse biological domains, including protein-protein interactions, drug repurposing, and disease comorbidities. She has published over 100 articles in peer-reviewed journals, books, and conference proceedings, and frequently provides NLP support for complex genomics and transcriptomics projects. She was elected as a Member of the Royal Society of Biology (MRSB) in 2019 and honored as a Chartered Scientist (CSci) in 2022 by the Royal Society of Biology, London. She is also the recipient of the 2019 Women Scientist Award from the Society for Bioinformatics and Biological Sciences, a non-profit organization in India.

      Areas of Expertise
      • Natural Language Processing (NLP)
      • Artificial Intelligence (AI) & Large Language Models (LLMs)
      • Machine Learning & Deep Learning
      • Biomedical Informatics & Literature Mining
      • Data Harmonization & Standardization
      • FAIR Data Principles & Data Quality
      Google scholar https://scholar.google.com/cit...
  • Staff

  • Postdocs

    • Postdoctoral Associate in Biomedical Informatics and Data Science

      Dr. Chia-Hsuan Chang is a postdoctoral associate in the Clinical NLP lab, led by Dr. Hua Xu, at the Department of Biomedical Informatics and Data Science at Yale School of Medicine. His research focuses on natural language processing, data science, and large language models, particularly their applications in enhancing healthcare. He obtained a Ph.D. in information management from National Sun Yat-sen University, where he was advised by Dr. San-Yih Hwang. Previously, he was a postdoctoral researcher in the College of Computing & Informatics at Drexel University, mentored by Dr. Christopher Yang.
    • Postdoctoral Associate in Biomedical Informatics and Data Science

      Xiang Lan is a Postdoctoral Associate at the Yale School of Medicine Department of Biomedical Informatics and Data Science. He earned his Ph.D. from National University of Singapore (NUS), where he was awarded the Graduate Student Research Award. His research focuses on designing and applying models to address real-world healthcare challenges, with a long-term goal of building expert-level multimodal generalists to enhance clinical decision-making and improve patient care. His recent work concentrates on advancing Multimodal Large Language Models, leveraging their remarkable capacity for synergizing diverse modalities for reasoning and planning in clinical settings.
    • Postdoctoral Associate in Biomedical Informatics and Data Science

      Xueqing Peng, PhD, is a Postdoctoral Associate at the Section of Biomedical Informatics and Data Science, Yale School of Medicine. Previously, she was a Postdoctoral Research Fellow at the School of Biomedical Informatics, University of Texas Health Science Center at Houston. Xueqing graduated with a doctoral degree in Medical Systems Biology from Fudan University in 2022. Xueqing’s research interests include medical data science, machine learning, and natural language processing (NLP).
    • Postdoctoral Associate in Biomedical Informatics and Data Science

      Jingyi Zhang is a Postdoctoral Associate at the Yale School of Medicine Department of Biomedical Informatics and Data Science. She received her PhD from the College of Computing and Data Science at Nanyang Technological University (NTU), Singapore. Her research is centered on multimodal AI, with a long-term goal of developing intelligent multimodal systems that can perceive and reason over complex real-world information in a safe and reliable way, especially in high-stakes domains such as medicine and healthcare. Her recent research has focused on Multimodal Large Language Models, with an emphasis on enabling them to integrate diverse data modalities for clinical reasoning and accurate diagnosis.
    • Postdoctoral Associate in Biomedical Informatics and Data Science

      Weipeng Zhou is currently a postdoctoral researcher at the Yale University School of Medicine, working under the mentorship of Professor Hua Xu. His research focuses on the pre-training and evaluation of large medical language models using electronic health records and medical claims data. He received his Ph.D. in Medical Informatics from the University of Washington and holds a Bachelor’s degree in Computer Science from the University of Wisconsin. During his doctoral studies, Weipeng applied natural language processing (NLP) and large language models (LLMs) to tackle critical healthcare challenges related to Long COVID, cardiovascular diseases, and suicide prevention. He also led a pilot grant project focused on characterizing and predicting Long COVID using NLP techniques. Currently, Weipeng's work includes pre-training clinical LLMs from scratch and identifying emerging water contaminants by analyzing millions of PubMed articles. https://scholar.google.com/citations?user=5S-jczYAAAAJ&hl=en
  • Postgrads

  • Students

    • PhD Student, Computational Biology and Biomedical Informatics

      Kevin Jin is a third-year PhD student in the Interdepartmental Program in Computational Biology and Biomedical Informatics at Yale University. He is advised by Hua Xu in the Clinical NLP Lab, a research group in the Department of Biomedical Informatics and Data Science at Yale School of Medicine. He completed his undergraduate work at Johns Hopkins University, receiving a B.S. in Molecular and Cellular Biology in 2020. He is supported by the NSF Graduate Research Fellowship.
    • Second Year PhD Student in Biomedical Informatics and Data Science

      Supervisor: Hua Xu, PhD, Robert T. McCluskey Professor of Biomedical Informatics and Data Science