Team

  • Clinical NLP Lab

    • Robert T. McCluskey Professor of Biomedical Informatics and Data Science; Vice Chair for Research and Development, Department of Biomedical Informatics and Data Science; Associate Dean for Biomedical Informatics, Yale School of Medicine

      Research Interests
      • Natural Language Processing
      Dr. Hua Xu is a well-known researcher in clinical natural language processing (NLP). He has developed novel algorithms for important clinical NLP tasks such as entity recognition and relation extraction, which have been top ranked in over a dozen international biomedical NLP challenges. His lab has developed CLAMP, a comprehensive clinical NLP toolkit that has been successfully commercialized and used by hundreds of healthcare organizations. Moreover, he has led multiple national and international initiatives (e.g., as Chair of the NLP working group of the Observational Health Data Sciences and Informatics (OHDSI) program) to apply NLP technologies to diverse clinical and translational studies, thus greatly accelerating clinical evidence generation from electronic health record data. More recently, he has also used NLP to harmonize metadata of biomedical digital objects (e.g., indexing millions of biomedical datasets to make them findable), with the goal of promoting FAIR principles in biomedicine. Currently, Dr. Xu's lab is actively developing large language models (LLMs) for diverse biomedical applications. See more information about Dr. Xu's lab here.
    • Research Scientist in Biomedical Informatics and Data Science

      Dr. Huan He is a Research Scientist in the Section of Biomedical Informatics and Data Science at Yale University School of Medicine. His primary research area is visual analytics and its applications in healthcare-related research. Currently, his work is focused on designing and developing visual analytics systems using natural language processing (NLP) and machine learning (ML) technologies, with the goal of facilitating data exploration for health-related clinical questions. Before joining Yale University, Dr. He served as a Research Fellow in the Department of Artificial Intelligence and Informatics at Mayo Clinic Rochester. During his time at Mayo Clinic, he led a living evidence synthesis project aimed at establishing a novel informatics infrastructure for providing living systematic reviews and meta-analyses through data visualization, NLP, and ML techniques. Notably, he contributed to an internal grant that utilized data visualization techniques to present national trends of COVID-19 for community surveillance. He also played a crucial role in the development of an OHNLP text annotation toolkit, which promotes privacy-preserving corpus development through a serverless architecture. In addition to his research contributions, Dr. He is an active member of the medical informatics and NLP communities. He has served on organizing committees for the VAHC workshops and IEEE ICHI conferences. Furthermore, he serves as a regular reviewer for journals such as JBI, JMIR, and TVCJ, and for various international conferences, including IEEE VIS, IEEE ICHI, IEEE BIBM, AAAI, and AMIA.
    • Instructor of Biomedical Informatics and Data Science

      After obtaining my Ph.D. in Information Science and completing postdoctoral training in Medical Informatics, I have actively collaborated on interdisciplinary projects in medicine and informatics, which has strengthened my multidisciplinary background in medical informatics and digital health. My current research focuses on clinical information standards and standards-based data applications, involving data normalization, harmonization, ontology, and metadata development. I have also gained experience in medical literature mining, clinical predictive modeling using Electronic Health Records (EHRs), and clinical decision support systems. As a co-investigator or key researcher, I have contributed to several ongoing grants. I have also co-authored over 80 peer-reviewed journal articles, conference proceedings, and books.
    • Associate Research Scientist in Biomedical Informatics and Data Science

      Research Interests
      • Natural Language Processing
      • Biological Ontologies
      • Social Determinants of Health
      Dr. Vipina Keloth is an Associate Research Scientist at the Department of Biomedical Informatics and Data Science at Yale School of Medicine. Previously, she was a Postdoctoral Associate at Yale BIDS and prior to that a Postdoctoral Research Fellow at the School of Biomedical Informatics at the University of Texas Health Science Center at Houston. Vipina graduated with a doctoral degree in Computer Science from New Jersey Institute of Technology (NJIT) in 2021. She has also worked as an assistant lecturer in the Department of Mathematical and Computational Sciences at the National Institute of Technology Karnataka, India. Her research interests lie broadly in the domain of biomedical ontologies/terminologies and clinical and biomedical natural language processing.
    • Associate Research Scientist in Biomedical Informatics and Data Science

      Brian Ondov is an Associate Research Scientist at the Yale School of Medicine Department of Biomedical Informatics and Data Science. He earned his doctorate in Computer Science at University of Maryland College Park under a training award from the National Institutes of Health. His published research spans Computational Genomics, Machine Learning, Natural Language Processing, and Human-Computer Interaction. Drawing on this background, he is currently exploring how Large Language Models (LLMs) can help researchers and healthcare consumers interact with biomedical literature and knowledge sources.
    • Instructor of Biomedical Informatics and Data Science

      Research Interests
      • Medical Informatics
      • Natural Language Processing
      • Phosphorylation
      • Comorbidity
      • Machine Learning
      Kalpana Raja, PhD, joined the Section of Biomedical Informatics & Data Science (BIDS) at Yale School of Medicine in February 2023. Before moving to New Haven, CT, Kalpana worked as an assistant professor at the School of Biomedical Informatics, University of Texas Health Science Center (UTHealth) at Houston, TX. She also worked as a scientist at Sema4, a patient-centered healthcare company located in Stamford, CT. Kalpana completed her bachelor’s degree in pharmacy at Tamil Nadu Dr. M.G.R. Medical University in Chennai, India. She is a registered pharmacist with the Indian Pharmacy Council. With a vision to develop software for biological applications, she completed her master’s degree in computing with a focus in software technology at The Robert Gordon University, Aberdeen, UK. She developed ProfileSKiM, an intelligent document retrieval tool, and reported the findings in her MSc thesis. ProfileSKiM received a reward from the Robert Gordon University in 2005 and the Technology Award from the British Computer Society, London, UK in 2006. Kalpana completed her second master’s degree in bioinformatics and her PhD in computing: software technology – bioinformatics (inter-disciplinary) at Bharathiar University, Coimbatore, India. She presented findings from her PhD research at the 2012 Asia Pacific Bioinformatics Conference (APBC) held in Melbourne, Australia, and at BioCreative Conference V held in Washington DC. Kalpana’s research interests include natural language processing (NLP) and machine learning. She has developed methodologies and software for information retrieval, information extraction, knowledge summarization, literature-based discovery, and automated hypothesis generation. She has applied her approaches to various biological domains such as protein-protein interaction, protein phosphorylation, drug-drug interactions, adverse drug events, drug repurposing, and disease comorbidity. She has also provided NLP support for various genomics and transcriptomics projects. Kalpana has published more than 70 articles in peer-reviewed journals, books, and conference proceedings. She has reviewed several research articles submitted to prestigious journals such as Briefings in Bioinformatics and serves as an associate editor of the Journal of Embryology & Stem Cell Research. Kalpana was elected as a “Member of Royal Society of Biology” (MRSB) in 2019 by the Royal Society of Biology, London, UK. Recently, she was honored as a “Chartered Scientist” (CSci) by the Royal Society of Biology, London, UK. Kalpana also received the “2019 Women Scientist Award” from the Society for Bioinformatics and Biological Sciences, a non-profit professional society based in India. Areas of Expertise: Natural Language Processing; Artificial Intelligence (AI); Large Language Models (LLMs); Deep Learning; Machine Learning; Biomedical Informatics. Google Scholar: https://scholar.google.com/cit...
  • Staff

  • Postdocs

    • Postdoctoral Associate in Biomedical Informatics and Data Science

      Dr. Chia-Hsuan Chang is a postdoctoral associate in the Clinical NLP lab, led by Dr. Hua Xu, at the Department of Biomedical Informatics and Data Science at Yale School of Medicine. His research focuses on natural language processing, data science, and large language models, particularly their applications in enhancing healthcare. He obtained a Ph.D. in information management from National Sun Yat-sen University, under the supervision of Dr. San-Yih Hwang. Previously, he was a postdoctoral researcher in the College of Computing & Informatics at Drexel University, mentored by Dr. Christopher Yang.
    • Postdoctoral Associate in Biomedical Informatics and Data Science

      Dr. Mauro Giuffrè is a physician–scientist and Postdoctoral Associate at Yale School of Medicine’s Department of Biomedical Informatics & Data Science (BIDS), where he is a member of the Clinical NLP Lab led by Prof. Hua Xu. His research sits at the intersection of hepatology and artificial intelligence, with two complementary lines: (i) development of non-invasive, machine-learning models for risk stratification and prediction of decompensation in advanced chronic liver disease; and (ii) design and validation of safe, explainable large language model systems for guideline-concordant clinical decision support. Prior to joining BIDS, Dr. Giuffrè worked with Prof. Dennis Shung in the Human + Artificial Intelligence in Medicine (HAIM) Lab at Yale. He received his medical degree (cum laude) from the University of Trieste, an MSc in Biostatistics for Clinical Research from the University of Padova, and completed a fellowship in Gastroenterology and Hepatology at the University of Trieste. His work has been published in peer-reviewed venues and presented at major digestive-disease conferences. In addition, Dr. Giuffrè serves on the European Association for the Study of the Liver (EASL) AI Task Force and the Italian Association for the Study of the Liver (AISF) AI Commission.
    • Postdoctoral Associate in Biomedical Informatics and Data Science

      Xiang Lan is a Postdoctoral Associate at the Yale School of Medicine Department of Biomedical Informatics and Data Science. He earned his Ph.D. from National University of Singapore (NUS), where he was awarded the Graduate Student Research Award. His research focuses on designing and applying models to address real-world healthcare challenges, with a long-term goal of building expert-level multimodal generalists to enhance clinical decision-making and improve patient care. His recent work concentrates on advancing Multimodal Large Language Models, leveraging their remarkable capacity for synergizing diverse modalities for reasoning and planning in clinical settings.
    • Postdoctoral Associate in Biomedical Informatics and Data Science

      Xueqing Peng, PhD, is a Postdoctoral Associate at the Section of Biomedical Informatics and Data Science, Yale School of Medicine. Previously, she was a Postdoctoral Research Fellow at the School of Biomedical Informatics, University of Texas Health Science Center at Houston. Xueqing graduated with a doctoral degree in Medical Systems Biology from Fudan University in 2022. Xueqing’s research interests include medical data science, machine learning, and natural language processing (NLP).
    • Postdoctoral Associate in Biomedical Informatics and Data Science

      Weipeng Zhou is currently a postdoctoral researcher at the Yale University School of Medicine, working under the mentorship of Professor Hua Xu. His research focuses on the pretraining and evaluation of large medical language models using electronic health records and medical claims data. He received his Ph.D. in Medical Informatics from the University of Washington and holds a Bachelor’s degree in Computer Science from the University of Wisconsin. During his doctoral studies, Weipeng applied natural language processing (NLP) and large language models (LLMs) to tackle critical healthcare challenges related to Long COVID, cardiovascular diseases, and suicide prevention. He also led a pilot grant project focused on characterizing and predicting Long COVID using NLP techniques. Currently, Weipeng's work includes pretraining clinical LLMs from scratch and identifying emerging water contaminants by analyzing millions of PubMed articles. Google Scholar: https://scholar.google.com/citations?user=5S-jczYAAAAJ&hl=en
  • Postgrads

  • Students

    • PhD Student, Computational Biology and Biomedical Informatics

      Kevin Jin is a third-year PhD student in the Interdepartmental Program in Computational Biology and Biomedical Informatics at Yale University. He is advised by Hua Xu in the Clinical NLP Lab, a research group in the Department of Biomedical Informatics and Data Science at Yale School of Medicine. He completed his undergraduate work at Johns Hopkins University, receiving a B.S. in Molecular and Cellular Biology in 2020. He is supported by the NSF Graduate Research Fellowship.