Na Hong, PhD
Instructor of Biomedical Informatics and Data Science
Contact Info
Biomedical Informatics & Data Science
100 College Street
New Haven, CT 06510
United States
About
Titles
Instructor of Biomedical Informatics and Data Science
Biography
Since obtaining my Ph.D. in Information Science and completing postdoctoral training in Medical Informatics, I have collaborated on interdisciplinary projects spanning medicine and informatics, which has strengthened my multidisciplinary background in medical informatics and digital health. My current research focuses on clinical information standards and standards-based data applications, including data normalization, harmonization, ontology, and metadata development. I also have experience in medical literature mining, clinical predictive modeling using Electronic Health Records (EHRs), and clinical decision support systems. As a co-investigator or key researcher, I have contributed to several ongoing grants. I have also co-authored over 80 peer-reviewed journal articles, conference proceedings, and books.
Appointments
Biomedical Informatics & Data Science
Instructor, Primary
Other Departments & Organizations
- All Institutions
- Biomedical Informatics & Data Science
- Clinical NLP Lab
Education & Training
- Research Fellow, Mayo Clinic (2018)
- PhD, Chinese Academy of Sciences (2010)
Research
Overview
My recent research focuses on medical data standards, such as OHDSI, FHIR, and i2b2, and on standards-based data applications, including data normalization, harmonization, and large-scale research networks. I also have experience in clinical predictive modeling using EHR data and in clinical decision support systems.
Public Health Interests
ORCID
0000-0001-6798-1761
Research at a Glance
Yale Co-Authors
Hua Xu, PhD
Huan He, PhD
Kalpana Raja, PhD, MRSB, CSci
Publications
2025
AcuKG: a comprehensive knowledge graph for medical acupuncture
Li Y, Peng X, Peng S, Li J, Pei D, Zhang Q, Lu Y, Hu Y, Li F, Zhou L, He Y, Tao C, Xu H, Hong N. AcuKG: a comprehensive knowledge graph for medical acupuncture. Journal Of The American Medical Informatics Association 2025, 33: 359-370. PMID: 41124298, PMCID: PMC12844574, DOI: 10.1093/jamia/ocaf179.
Peer-Reviewed Original Research
Concepts: Acupuncture research; Comprehensive knowledge graph; Knowledge graph; Acupuncture application; Acupuncture knowledge; Medical acupuncture; Relevant acupoints; Acupuncture; Complementary therapy; Entity recognition; Question-answering; Use cases; Ontology mapping; Multiple ontologies; Obesity; Chinese medicine; Semantic retrieval; Online resources; Computational framework; Data quality; Acupoints; Ontology; Traditional practices; Clinical trials; Multiple sources
Extracting language information from clinical notes using large language models
Qian L, Hong N, Zhou Y, Xie Q, Weng R, Chairuengjitjaras P, Du X, Lian J, Marshall G, Blackley S, Novoa-Laurentiev J, Quiroz Y, Kim T, Adams N, Dossett M, Zhou L, Xu H. Extracting language information from clinical notes using large language models. International Journal Of Medical Informatics 2025, 205: 106116. PMID: 40992205, PMCID: PMC12490899, DOI: 10.1016/j.ijmedinf.2025.106116.
Peer-Reviewed Original Research
Concepts: Language information; Language model; Electronic health records; Field of electronic health records; Yale-New Haven Hospital; NER framework; Zero-Shot; Entity recognition; Information extraction; MIMIC dataset; F1 score; Clinical narratives; Patient-provider communication; Clinical notes; Patient-centered care; MIMIC-III; Equitable healthcare delivery; Service allocation; Superior performance; Cross-site validation; Automated extraction; Open-source model; Health records; BERT; Patients' language proficiency
PheCatcher: Leveraging LLM-Generated Synthetic Data for Automated Phenotype Definition Extraction from Biomedical Literature.
Hu Y, Hong N, Li Y, Peng X, Chen Y, Xu H. PheCatcher: Leveraging LLM-Generated Synthetic Data for Automated Phenotype Definition Extraction from Biomedical Literature. Studies In Health Technology And Informatics 2025, 329: 718-722. PMID: 40775952, DOI: 10.3233/shti250934.
Peer-Reviewed Original Research
Concepts: Named Entity Recognition; Synthetic data; Relation extraction; Information extraction; F1 score; Potential of synthetic data; Biomedical literature; Model F1 scores; Entity recognition; Definition extraction; Human annotators; IE systems; Knowledge bases; Phenotype definition; Manual input; Phenotype; Model performance; Automated pipeline; Personalized medicine; PheKB; Standard code; Pipeline; OHDSI; Annotation; Code
A comparative study of recent large language models on generating hospital discharge summaries for lung cancer patients
Li Y, Li F, Hong N, Li M, Roberts K, Cui L, Tao C, Xu H. A comparative study of recent large language models on generating hospital discharge summaries for lung cancer patients. Journal Of Biomedical Informatics 2025, 168: 104867. PMID: 40544901, DOI: 10.1016/j.jbi.2025.104867.
Peer-Reviewed Original Research
Concepts: Discharge summaries; Healthcare settings; Clinical notes; Continuity of care; Hospital discharge summaries; Cancer patients; Lung cancer patients; Patient care; Patient information; Workflow efficiency; Semantic similarity scores; Clinical practice; Care; Healthcare; Overall quality; Clinical narratives; Manual evaluation; Language model; Evaluation metrics; Scores; Similarity scores; Summary; Model fine-tuning; Decision-making; Patients
CDEMapper: enhancing National Institutes of Health common data element use with large language models
Wang Y, Huang J, He H, Zhang V, Zhou Y, Hao X, Ram P, Qian L, Xie Q, Weng R, Lin F, Hu Y, Cui L, Jiang X, Xu H, Hong N. CDEMapper: enhancing National Institutes of Health common data element use with large language models. Journal Of The American Medical Informatics Association 2025, 32: 1130-1139. PMID: 40332956, PMCID: PMC12202029, DOI: 10.1093/jamia/ocaf064.
Peer-Reviewed Original Research
Concepts: Data elements; Recommendation accuracy; Semantic search; Language model; Usability testing; Manual annotation; Data interoperability; Human review; Evaluation results; BM25; Map services; Research reproducibility; Mapping tool; National Institute; Core module; Usability; Data; Embedding; Streamlined pipeline; CDE recommendation; Elasticsearch; Interoperability; Rankers; Users; Value sets
2024
Relation extraction using large language models: a case study on acupuncture point locations
Li Y, Peng X, Li J, Zuo X, Peng S, Pei D, Tao C, Xu H, Hong N. Relation extraction using large language models: a case study on acupuncture point locations. Journal Of The American Medical Informatics Association 2024, 31: 2622-2631. PMID: 39208311, PMCID: PMC11491641, DOI: 10.1093/jamia/ocae233.
Peer-Reviewed Original Research
Concepts: Acupuncture point locations; Acupoint location; Location of acupoints; Clinical decision support; Acupuncture knowledge; Acupuncture training; Acupuncture therapy; Acupuncture; Acupoints; Complementary medicine; Educational module; Western Pacific Region; Informatics applications; Decision support; Scores; Generative Pre-trained Transformer; WHO standards; F1 score; Language model; Pacific region; WHO; Domain-specific fine-tuning; Training; Study; Micro-averaged F1 score
Medical Concept Normalization
Xu H, Demner Fushman D, Hong N, Raja K. Medical Concept Normalization. Cognitive Informatics In Biomedicine And Healthcare 2024, 137-164. DOI: 10.1007/978-3-031-55865-8_6.
Peer-Reviewed Original Research
Concepts: Concept normalization; Deep learning-based techniques; Medical concept normalization; Learning-based techniques; Contemporary machine learning; Rule-based methodology; Annotated corpus; NLP systems; Machine learning; Computing applications; Biomedical terminologies; Normalization approach; Standardized terminology; Ontology; Task; Learning
Mapping Study Variables to Common Data Elements Using GPT for Sheets: Towards Standardized Data Collection and Sharing
Ram P, Hong N, Xu H, Jiang X. Mapping Study Variables to Common Data Elements Using GPT for Sheets: Towards Standardized Data Collection and Sharing. 2024, 00: 320-325. DOI: 10.1109/ichi61247.2024.00048.
Peer-Reviewed Original Research
2023
Measuring the worldwide spread of COVID-19 using a comprehensive modeling method
Zhou X, Ma X, Gao S, Ma Y, Gao J, Jiang H, Zhu W, Hong N, Long Y, Su L. Measuring the worldwide spread of COVID-19 using a comprehensive modeling method. BMC Medical Informatics And Decision Making 2023, 21: 384. PMID: 37715170, PMCID: PMC10504693, DOI: 10.1186/s12911-023-02213-4.
Peer-Reviewed Original Research
Concepts: Decision-making support; Comprehensive observations; Trend analysis; Multiple mathematical models; Trajectory model; COVID-19 epidemic data; Logistic growth model; Spreading trend; Scalable analysis methods; Comprehensive modeling method; Modeling method; Trends; Public health management; Peak time; Spread of COVID-19; Outbreak levels; Health management; Public health interventions; Group-based trajectory modeling; Growth model; Prediction model; Epidemic data; Epidemic trend
Development of a Natural Language Processing Tool to Extract Acupuncture Point Location Terms
Li Y, Peng X, Li J, Peng S, Pei D, Tao C, Xu H, Hong N. Development of a Natural Language Processing Tool to Extract Acupuncture Point Location Terms. 2023, 00: 344-351. DOI: 10.1109/ichi57859.2023.00053.
Peer-Reviewed Original Research
Concepts: Acupuncture point locations; Natural language processing; Recurrent neural network; Conditional random field; World Health Organization; World Health Organization standards; Natural language processing tools; Effect of acupuncture therapy; Location information; Acupuncture research; Acupuncture therapy; Acupoint location; Recurrent neural network model; Dictionary lookup method; Natural language processing models; Deep learning techniques; Acupuncture; Language processing tools; Western Pacific Region; Free-text format; International anatomical terminology; Health Organization; F1 score; Informatics applications; Neural network
Academic Achievements & Community Involvement
Teaching & Mentoring
Mentoring
Wei Pang
Graduate student, 2023 - 2024
Yingxue Pan
PhD student, 2023 - 2024
News
Get In Touch
Locations
101 College Street
Academic Office
New Haven, CT 06510