Na Hong, PhD
Instructor of Biomedical Informatics and Data Science
Contact Info
Biomedical Informatics & Data Science
100 College Street
New Haven, CT 06510
United States
About
Titles
Instructor of Biomedical Informatics and Data Science
Biography
After obtaining my Ph.D. in Information Science and completing postdoctoral training in Medical Informatics, I have actively collaborated on interdisciplinary projects at the intersection of medicine and informatics, which has broadened my multidisciplinary background in medical informatics and digital health. My current research focuses on clinical information standards and standard-based data applications, including data normalization, harmonization, ontology development, and metadata development. I have also gained experience in medical literature mining, clinical predictive modeling using Electronic Health Records (EHRs), and clinical decision support systems. As a co-investigator or key researcher, I have contributed to several ongoing grants, and I have co-authored over 80 peer-reviewed journal articles, conference proceedings, and books.
Appointments
Biomedical Informatics & Data Science
Instructor (Primary)
Other Departments & Organizations
- Biomedical Informatics & Data Science
- Clinical NLP Lab
Education & Training
- Research Fellow
- Mayo Clinic (2018)
- PhD
- Chinese Academy of Sciences (2010)
Research
Overview
My recent research focuses on medical data standards such as OHDSI, FHIR, and i2b2, and on standard-based data applications, including data normalization, harmonization, and large-scale research networks. I also have research experience in clinical predictive modeling using EHR data and in clinical decision support systems.
Publications
2025
Extracting language information from clinical notes using large language models
Qian L, Hong N, Zhou Y, Xie Q, Weng R, Chairuengjitjaras P, Du X, Lian J, Marshall G, Blackley S, Novoa-Laurentiev J, Quiroz Y, Kim T, Adams N, Dossett M, Zhou L, Xu H. Extracting language information from clinical notes using large language models. International Journal Of Medical Informatics 2025, 205: 106116. PMID: 40992205, PMCID: PMC12490899, DOI: 10.1016/j.ijmedinf.2025.106116.
Peer-Reviewed Original Research
PheCatcher: Leveraging LLM-Generated Synthetic Data for Automated Phenotype Definition Extraction from Biomedical Literature.
Hu Y, Hong N, Li Y, Peng X, Chen Y, Xu H. PheCatcher: Leveraging LLM-Generated Synthetic Data for Automated Phenotype Definition Extraction from Biomedical Literature. Studies In Health Technology And Informatics 2025, 329: 718-722. PMID: 40775952, DOI: 10.3233/shti250934.
Peer-Reviewed Original Research
2023
Measuring the worldwide spread of COVID-19 using a comprehensive modeling method
Zhou X, Ma X, Gao S, Ma Y, Gao J, Jiang H, Zhu W, Hong N, Long Y, Su L. Measuring the worldwide spread of COVID-19 using a comprehensive modeling method. BMC Medical Informatics And Decision Making 2023, 21: 384. PMID: 37715170, PMCID: PMC10504693, DOI: 10.1186/s12911-023-02213-4.
Peer-Reviewed Original Research
2022
Comparison of nomogram and machine‐learning methods for predicting the survival of non‐small cell lung cancer patients
Lei H, Li X, Ma W, Hong N, Liu C, Zhou W, Zhou H, Gong M, Wang Y, Wang G, Wu Y. Comparison of nomogram and machine‐learning methods for predicting the survival of non‐small cell lung cancer patients. Cancer Innovation 2022, 1: 135-145. PMID: 38090651, PMCID: PMC10686174, DOI: 10.1002/cai2.24.
Peer-Reviewed Original Research
Machine learning‐based prognostic and metastasis models of kidney cancer
Zhang Y, Hong N, Huang S, Wu J, Gao J, Xu Z, Zhang F, Ma S, Liu Y, Sun P, Tang Y, Liu C, Shou J, Chen M. Machine learning‐based prognostic and metastasis models of kidney cancer. Cancer Innovation 2022, 1: 124-134. PMID: 38090650, PMCID: PMC10686164, DOI: 10.1002/cai2.22.
Peer-Reviewed Original Research
Application of informatics in cancer research and clinical practice: Opportunities and challenges
Hong N, Sun G, Zuo X, Chen M, Liu L, Wang J, Feng X, Shi W, Gong M, Ma P. Application of informatics in cancer research and clinical practice: Opportunities and challenges. Cancer Innovation 2022, 1: 80-91. PMID: 38089452, PMCID: PMC10686161, DOI: 10.1002/cai2.9.
Peer-Reviewed Original Research
Hybrid Methods of Bibliographic Coupling and Text Similarity Measurement for Biomedical Paper Recommendation.
Guo H, Shen Z, Zeng J, Hong N. Hybrid Methods of Bibliographic Coupling and Text Similarity Measurement for Biomedical Paper Recommendation. Studies In Health Technology And Informatics 2022, 290: 287-291. PMID: 35673019, DOI: 10.3233/shti220080.
Peer-Reviewed Original Research
Development and Validation of a Risk Prediction Model for Venous Thromboembolism in Lung Cancer Patients Using Machine Learning
Lei H, Zhang M, Wu Z, Liu C, Li X, Zhou W, Long B, Ma J, Zhang H, Wang Y, Wang G, Gong M, Hong N, Liu H, Wu Y. Development and Validation of a Risk Prediction Model for Venous Thromboembolism in Lung Cancer Patients Using Machine Learning. Frontiers In Cardiovascular Medicine 2022, 9: 845210. PMID: 35321110, PMCID: PMC8934875, DOI: 10.3389/fcvm.2022.845210.
Peer-Reviewed Original Research
State of the Art of Machine Learning–Enabled Clinical Decision Support in Intensive Care Units: Literature Review
Hong N, Liu C, Gao J, Han L, Chang F, Gong M, Su L. State of the Art of Machine Learning–Enabled Clinical Decision Support in Intensive Care Units: Literature Review. JMIR Medical Informatics 2022, 10: e28781. PMID: 35238790, PMCID: PMC8931648, DOI: 10.2196/28781.
Peer-Reviewed Original Research
2021
Early Prediction of Mortality, Severity, and Length of Stay in the Intensive Care Unit of Sepsis Patients Based on Sepsis 3.0 by Machine Learning Models
Su L, Xu Z, Chang F, Ma Y, Liu S, Jiang H, Wang H, Li D, Chen H, Zhou X, Hong N, Zhu W, Long Y. Early Prediction of Mortality, Severity, and Length of Stay in the Intensive Care Unit of Sepsis Patients Based on Sepsis 3.0 by Machine Learning Models. Frontiers In Medicine 2021, 8: 664966. PMID: 34291058, PMCID: PMC8288021, DOI: 10.3389/fmed.2021.664966.
Peer-Reviewed Original Research
Teaching & Mentoring
Mentoring
Wei Pang
Graduate student, 2023 - 2024
Yingxue Pan
PhD student, 2023 - 2024
Locations
101 College Street
Academic Office
New Haven, CT 06510