2025
1319-P: Leveraging Routinely Collected Health Data and Social and Behavioral Determinants of Health in Building Machine-Learning Tools for Diabetes Screening
Ding Q, Tu Z, Wojeck B, Doyle-Delgado K, Zhang T. 1319-P: Leveraging Routinely Collected Health Data and Social and Behavioral Determinants of Health in Building Machine-Learning Tools for Diabetes Screening. Diabetes 2025, 74. DOI: 10.2337/db25-1319-p.
Peer-Reviewed Original Research. Concepts: Graph neural networks; Machine learning; Pre-diabetes; Graph neural network model; Risk calculator; Healthcare access disparities; Identifying pre-diabetes; Self-reported health; Advanced machine learning; LUPI framework; F1 score; Privileged information; Neural network; ML framework; Lifestyle data; Undiagnosed diabetes; Performance metrics; HbA1c screening; Diabetes risk; Diabetes screening; Machine-learning tools; Access disparities; NHANES data; Behavioral determinants; ML models.
Human vs. AI: A comparative effectiveness study of large language models for automated biomarker extraction.
Hung T, Shin E, Yu Y, Riaz N, Lee N, Kang J. Human vs. AI: A comparative effectiveness study of large language models for automated biomarker extraction. Journal Of Clinical Oncology 2025, 43: e13605-e13605. DOI: 10.1200/jco.2025.43.16_suppl.e13605.
Peer-Reviewed Original Research. Concepts: HPV status; Language model; Memorial Sloan Kettering Cancer Center; Head & neck cancer; Natural language processing capabilities; Cancer registry reports; Processing time; Language processing capabilities; Individual patient care; Comparative effectiveness studies; Manual extraction; Biomarker data; P16 status; P16+; Neck cancer; Patient care; Enhance scalability; F1 score; Oncology population; HPV; Real-time integration; Scalable solution; Registry reports; Cancer Center; P16.
Improving entity recognition using ensembles of deep learning and fine-tuned large language models: A case study on adverse event extraction from VAERS and social media
Li Y, Viswaroopan D, He W, Li J, Zuo X, Xu H, Tao C. Improving entity recognition using ensembles of deep learning and fine-tuned large language models: A case study on adverse event extraction from VAERS and social media. Journal Of Biomedical Informatics 2025, 163: 104789. PMID: 39923968, DOI: 10.1016/j.jbi.2025.104789.
Peer-Reviewed Original Research. Concepts: Traditional deep learning models; Deep learning models; Recurrent neural network; Learning models; Entity recognition; Language model; F1 score; Ensemble of deep learning; Advances of natural language processing; Effectiveness of ensemble methods; Micro-averaged F1; Bidirectional Encoder Representations; Extensive labeled data; Natural language processing; Fine-tuned models; Biomedical text mining; Feature representation; Encoder Representations; Event extraction; Entity types; Text data; Deep learning; Sequential data; GPT-2; Neural network.
Using natural language processing to identify emergency department patients with incidental lung nodules requiring follow-up
Moore C, Socrates V, Hesami M, Denkewicz R, Cavallo J, Venkatesh A, Taylor R. Using natural language processing to identify emergency department patients with incidental lung nodules requiring follow-up. Academic Emergency Medicine 2025, 32: 274-283. PMID: 39821298, DOI: 10.1111/acem.15080.
Peer-Reviewed Original Research. Concepts: Natural language processing; Incidental lung nodules; Follow-up; Chest CTs; CT reports; F1 score; Lung nodules; Emergency department; Language processing; Follow-up of incidental findings; Incidental finding; Natural language processing developers; Absence of malignancy; Metrics of precision; Natural language processing pipeline; Natural language processing metrics; Chest CT reports; Recommended follow-up; Emergency department patients; Follow-up rate; Language model; Lung cancer; Reduce errors; Malignancy; Department patients.
2024
Feature Selection and Machine Learning Approaches in Prediction of Current E-Cigarette Use Among U.S. Adults in 2022
Fang W, Liu Y, Xu C, Luo X, Wang K. Feature Selection and Machine Learning Approaches in Prediction of Current E-Cigarette Use Among U.S. Adults in 2022. International Journal Of Environmental Research And Public Health 2024, 21: 1474. PMID: 39595741, PMCID: PMC11594230, DOI: 10.3390/ijerph21111474.
Peer-Reviewed Original Research. Concepts: Support vector machine; Feature selection; Machine learning; Random forest; Collection of features; Machine learning approach; Imbalance data; F1 score; Vector machine; ML techniques; Learning approach; ML tools; Relevant features; Patient Health Questionnaire-4; E-cigarette use; ML models; ML approaches; RF algorithm; Random oversampling examples; Machine; Algorithm; E-cigarettes; Selection operator; Logistic regression; Health Information National Trends Survey.
Validating International Classification of Diseases Code 10th Revision algorithms for accurate identification of pulmonary embolism
Bikdeli B, Khairani C, Bejjani A, Lo Y, Mahajan S, Caraballo C, Jimenez J, Krishnathasan D, Zarghami M, Rashedi S, Jimenez D, Barco S, Secemsky E, Klok F, Hunsaker A, Aghayev A, Muriel A, Hussain M, Appah-Sampong A, Lu Y, Lin Z, Mojibian H, Aneja S, Khera R, Konstantinides S, Goldhaber S, Wang L, Zhou L, Monreal M, Piazza G, Krumholz H, Investigators P. Validating International Classification of Diseases Code 10th Revision algorithms for accurate identification of pulmonary embolism. Journal Of Thrombosis And Haemostasis 2024, 23: 556-564. PMID: 39505153, DOI: 10.1016/j.jtha.2024.10.013.
Peer-Reviewed Original Research. Concepts: Discharge codes; International Classification; ICD-10; Yale New Haven Health System; Positive predictive value; Mass General Brigham hospitals; Accuracy of ICD-10; ICD-10 codes; Pulmonary embolism; Health system; Image coding; Electronic databases; F1 score; Pre-specified protocol; Excellent positive predictive value; Independent physicians; Highest F1 score; Identification of pulmonary embolism; Acute pulmonary embolism; Secondary code; PE codes; Scores; Identified PE; Revised algorithm.
Integrating Multimodal Affective Signals for Stress Detection from Audio-Visual Data
Ghose D, Gitelson O, Scassellati B. Integrating Multimodal Affective Signals for Stress Detection from Audio-Visual Data. 2024, 22-32. DOI: 10.1145/3678957.3685717.
Peer-Reviewed Original Research. Concepts: Audio-visual data; Late fusion; Stress detection; F1 score; Multimodal fusion method; Detect stress; Human emotional expressions; Ablation studies; Unimodal classifiers; EDA sensors; Fusion method; Intermediate fusion; Fusion technique; Facial expressions; Modes of information; Video clips; Textual sentiment; Everyday environments; Real-world settings; Audio-visual; Traditional methods; Vocal prosody; Sensor; Measurement of physiological responses; Classifier.
Uncertainty-Aware Deep Learning Characterization of Knee Radiographs for Large-Scale Registry Creation
Mulford K, Grove A, Kaji E, Rouzrokh P, Roman R, Kremers M, Maradit Kremers H, Taunton M, Wyles C. Uncertainty-Aware Deep Learning Characterization of Knee Radiographs for Large-Scale Registry Creation. The Journal Of Arthroplasty 2024, 40: 1232-1238. PMID: 39477040, PMCID: PMC11985313, DOI: 10.1016/j.arth.2024.10.103.
Peer-Reviewed Original Research. Concepts: Object detection model; Detection model; F1 score; Conformal prediction; Average precision; Out-of-domain images; Out-of-domain; Per-class F1-scores; Uncertainty-aware; Multilabel classifier; EfficientNet model; Ingestion pipeline; Label outputs; Classification model; Classifier; Domain detection; Held-out; Multilabel; Uncertainty quantification; Model performance; Knee images; EfficientNet; Large-scale; Hardware; Precision.
SEETrials: Leveraging large language models for safety and efficacy extraction in oncology clinical trials
Lee K, Paek H, Huang L, Hilton C, Datta S, Higashi J, Ofoegbu N, Wang J, Rubinstein S, Cowan A, Kwok M, Warner J, Xu H, Wang X. SEETrials: Leveraging large language models for safety and efficacy extraction in oncology clinical trials. Informatics In Medicine Unlocked 2024, 50: 101589. PMID: 39493413, PMCID: PMC11530223, DOI: 10.1016/j.imu.2024.101589.
Peer-Reviewed Original Research. Concepts: Antibody-drug conjugates; Overall response rate; Multiple myeloma; F1 score; CAR-T; Complete response; Bispecific antibodies; Comparative performance analysis; Clinical trial study; Clinical trial outcomes; Language model; Accurate data extraction; Therapy subgroup; Fine granularity; Oncology clinical trials; Adverse events; Clinical decision-making; Performance analysis; Clinical trials; Innovative therapies; Diverse therapies; Clinical trial abstracts; Cancer domain; Data elements; Therapy.
Augmenting biomedical named entity recognition with general-domain resources
Yin Y, Kim H, Xiao X, Wei C, Kang J, Lu Z, Xu H, Fang M, Chen Q. Augmenting biomedical named entity recognition with general-domain resources. Journal Of Biomedical Informatics 2024, 159: 104731. PMID: 39368529, DOI: 10.1016/j.jbi.2024.104731.
Peer-Reviewed Original Research. Concepts: BioNER datasets; Multi-task learning; NER datasets; Entity types; Biomedical datasets; Baseline model; General domain datasets; Biomedical language model; Neural network-based; Yield performance improvements; BioNER models; Entity recognition; Biomedical corpora; Human annotators; Label ambiguity; Language model; Transfer learning; F1 score; BioNER; Human effort; Network-based; Biomedical resources; Performance improvement; Dataset; Superior performance.
Relation extraction using large language models: a case study on acupuncture point locations
Li Y, Peng X, Li J, Zuo X, Peng S, Pei D, Tao C, Xu H, Hong N. Relation extraction using large language models: a case study on acupuncture point locations. Journal Of The American Medical Informatics Association 2024, 31: 2622-2631. PMID: 39208311, PMCID: PMC11491641, DOI: 10.1093/jamia/ocae233.
Peer-Reviewed Original Research. Concepts: Acupuncture point locations; Acupoint location; Location of acupoints; Clinical decision support; Acupuncture knowledge; Acupuncture training; Acupuncture therapy; Acupuncture; Acupoints; Complementary medicine; Educational module; Western Pacific Region; Informatics applications; Decision support; Scores; Generative Pre-trained Transformer; WHO standards; F1 score; Language model; Pacific region; WHO; Domain-specific fine-tuning; Training; Study; Micro-averaged F1 score.
Deep learning classification of pediatric spinal radiographs for use in large scale imaging registries
Mulford K, Regan C, Todderud J, Nolte C, Pinter Z, Chang-Chien C, Yan S, Wyles C, Khosravi B, Rouzrokh P, Maradit Kremers H, Larson A. Deep learning classification of pediatric spinal radiographs for use in large scale imaging registries. Spine Deformity 2024, 12: 1607-1614. PMID: 39039392, DOI: 10.1007/s43390-024-00933-9.
Peer-Reviewed Original Research. Concepts: Test set; Convolutional neural network classifier; Neural network classifier; Deep learning classifier; Deep learning classification; Network classifier; Data ingestion; F1 score; Learning classifiers; Model architecture; Performance metrics; Picture Archiving; Training set; Classifier; Automatic system; Spine radiographs; Overall accuracy; Lateral images; Architecture; AP images; Image registry; Pediatric scoliosis patients; Lateral spine radiographs; Accuracy; Radiographs of patients.
Enhancing post-traumatic stress disorder patient assessment: leveraging natural language processing for research of domain criteria identification using electronic medical records
Miranda O, Kiehl S, Qi X, Brannock M, Kosten T, Ryan N, Kirisci L, Wang Y, Wang L. Enhancing post-traumatic stress disorder patient assessment: leveraging natural language processing for research of domain criteria identification using electronic medical records. BMC Medical Informatics And Decision Making 2024, 24: 154. PMID: 38835009, PMCID: PMC11151516, DOI: 10.1186/s12911-024-02554-8.
Peer-Reviewed Original Research. Concepts: Natural language processing; Post-traumatic stress disorder; Post-traumatic stress disorder patients; RDoC domains; Language processing; Abnormal instances; Leverage natural language processing; Diagnosis of post-traumatic stress disorder; Domain criteria; Similarity threshold value; F1-macro score; Positive valence systems; Heightened cue reactivity; Electronic medical records; Clinical notes; F1 score; Cue reactivity; Stress disorder; RDoC; Valence systems; Real-time; Mental health improvement; Disease trajectory; Extraction research; Sensorimotor disturbances.
GPT-4 Performance for Neurologic Localization
Lee J, Choi E, McDougal R, Lytton W. GPT-4 Performance for Neurologic Localization. Neurology Clinical Practice 2024, 14: e200293. PMID: 38596779, PMCID: PMC11003355, DOI: 10.1212/cpj.0000000000200293.
Peer-Reviewed Original Research. Concepts: Generative Pretrained Transformer; Text classification; Text datasets; Class labels; Pretrained Transformer; Language model; F1 score; Performance metrics; Knowledge base; Metrics; Inadequate knowledge base; Classification; Reduce health care disparities; Clinical reasoning; Capability; Performance; Text; Health care disparities; Dataset; Software.
Advancing entity recognition in biomedicine via instruction tuning of large language models
Keloth V, Hu Y, Xie Q, Peng X, Wang Y, Zheng A, Selek M, Raja K, Wei C, Jin Q, Lu Z, Chen Q, Xu H. Advancing entity recognition in biomedicine via instruction tuning of large language models. Bioinformatics 2024, 40: btae163. PMID: 38514400, PMCID: PMC11001490, DOI: 10.1093/bioinformatics/btae163.
Peer-Reviewed Original Research. Concepts: Named Entity Recognition; Sequence labeling task; Natural language processing; Biomedical NER datasets; Language model; NER datasets; Entity recognition; Labeling task; Text generation; Field of natural language processing; Biomedical NER; Few-shot learning capability; Reasoning tasks; Multi-domain scenarios; Domain-specific models; End-to-end; Minimal fine-tuning; SOTA performance; F1 score; Healthcare applications; Biomedical entities; Biomedical domain; Language processing; Multi-tasking; PubMedBERT model.
Optimized Multilayer Perceptron for Sensorimotor Functional Mapping Based on a Few Minutes of Intracranial Electroencephalogram Data
Iktimal A, Spencer D, Alkawadri R. Optimized Multilayer Perceptron for Sensorimotor Functional Mapping Based on a Few Minutes of Intracranial Electroencephalogram Data. Annals Of Neurology 2024, 96: 187-193. PMID: 38506405, DOI: 10.1002/ana.26915.
Peer-Reviewed Original Research. Concepts: Multilayer perceptron; Tanh activation function; Area under the curve; Multilayer Perceptron performance; Optimal multilayer perceptron; Imbalanced data; F1 score; Feature extension; Neural network; Activation function; Electroencephalogram-data; Sensorimotor cortex; Intractable epilepsy patients; Perceptron; Central sulcus; Cross-validation; Anterior lip; Epilepsy patients; Gaussian distribution function; Functional mapping; Network; Metrics.
Using Computer Vision to Detect E-cigarette Content in TikTok Videos
Murthy D, Ouellette R, Anand T, Radhakrishnan S, Mohan N, Lee J, Kong G. Using Computer Vision to Detect E-cigarette Content in TikTok Videos. Nicotine & Tobacco Research 2024, 26: s36-s42. PMID: 38366342, PMCID: PMC10873490, DOI: 10.1093/ntr/ntad184.
Peer-Reviewed Original Research. Concepts: E-cigarette-related content; Social media platforms; Video-based social media platforms; Media platforms; Computer vision models; Average F1 score; Computer vision methods; Computer vision techniques; Machine learning models; Text-based approach; Computer vision; Object detection; Annotated images; Visual content; Social media; F1 score; Vision techniques; Recall values; Vision methods; Vision models; Social media platform; Learning models; Video; Computer; TikTok posts.
Improving large language models for clinical named entity recognition via prompt engineering
Hu Y, Chen Q, Du J, Peng X, Keloth V, Zuo X, Zhou Y, Li Z, Jiang X, Lu Z, Roberts K, Xu H. Improving large language models for clinical named entity recognition via prompt engineering. Journal Of The American Medical Informatics Association 2024, 31: 1812-1820. PMID: 38281112, PMCID: PMC11339492, DOI: 10.1093/jamia/ocad259.
Peer-Reviewed Original Research. Concepts: Clinical NER tasks; NER task; Task-specific prompts; Entity recognition; Language model; Training samples; State-of-the-art models; Few-shot learning; State-of-the-art; Minimal training data; Task-specific knowledge; F1-score; Annotated samples; Concept extraction; Model performance; Annotated datasets; Training data; F1 score; Task description; Format specifications; Complex clinical data; Optimal performance; Task; Evaluation schema; GPT model.
2023
AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models
Datta S, Lee K, Paek H, Manion F, Ofoegbu N, Du J, Li Y, Huang L, Wang J, Lin B, Xu H, Wang X. AutoCriteria: a generalizable clinical trial eligibility criteria extraction system powered by large language models. Journal Of The American Medical Informatics Association 2023, 31: 375-385. PMID: 37952206, PMCID: PMC10797270, DOI: 10.1093/jamia/ocad218.
Peer-Reviewed Original Research. Concepts: Language model; Information extraction system; Overall F1 score; Criteria information; F1 score; Manual annotation; Scalable solution; Contextual information; Complex scenarios; Contextual attributes; Extraction system; Real-world settings; System evaluation; Modeling capabilities; Clinical trial protocol documents; Information; Protocol documents.
Clinical Text Reports to Stratify Patients Affected with Myeloid Neoplasms Using Natural Language Processing
Asti G, Sauta E, Curti N, Carlini G, Dall'Olio L, Lanino L, Maggioni G, Campagna A, Ubezio M, Russo A, Todisco G, Tentori C, Morandini P, Bicchieri M, Grondelli M, Zampini M, Travaglino E, Savevski V, Derus N, Dall'Olio D, Sala C, Zhao L, Santoro A, Kordasti S, Santini V, Kubasch A, Platzbecker U, Diez-Campelo M, Fenaux P, Zeidan A, Haferlach T, Castellani G, Della Porta M, D'Amico S. Clinical Text Reports to Stratify Patients Affected with Myeloid Neoplasms Using Natural Language Processing. Blood 2023, 142: 122. DOI: 10.1182/blood-2023-188292.
Peer-Reviewed Original Research. Concepts: Natural language processing; Language model; Artificial intelligence; F1 score; Language processing; Data layers; Pre-trained language models; Pre-trained models; Multimodal patient data; Bidirectional Encoder Representations; Relevant information; Transformer framework; Text embeddings; NLP technology; Domain adaptation; Training data; Clinical text; Encoder Representations; Text representation; Numerical embedding; Text reports; Text sentences; Information layers; Key technologies; Masked tokens.