Arman Cohan
Assistant Professor
Publications
2026
Investigating Retrieval-Augmented Generation Systems on Unanswerable, Uncheatable, Realistic, Multi-hop Queries
Liu G, Li B, Cohan A, Walden W, Yang E. Investigating Retrieval-Augmented Generation Systems on Unanswerable, Uncheatable, Realistic, Multi-hop Queries. Lecture Notes In Computer Science 2026, 16484: 360-370. DOI: 10.1007/978-3-032-21300-6_26. Peer-Reviewed Original Research
2025
From Scores to Steps: Diagnosing and Improving LLM Performance in Evidence-Based Medical Calculations.
Wang B, Xia I, Zhang Y, Wang J, Ouyang F, Han S, Cohan A, Yu H, Yao Z. From Scores to Steps: Diagnosing and Improving LLM Performance in Evidence-Based Medical Calculations. 2025, 2025: 10809-10833. PMID: 41799784, PMCID: PMC12961587, DOI: 10.18653/v1/2025.emnlp-main.548. Peer-Reviewed Original Research
Keywords: Real-world medical applications; Entity extraction; Code execution; Language model; Human evaluation; Medical benchmarks; Evaluation pipeline; Medical calculators; Arithmetic computations; Granular framework; Reasoning failures; Error analysis framework; Analysis framework; Benchmarks; Expert judgment; Medical applications; Benchmarking practices; Accuracy; Numerical tolerance; Framework; Performance; Dataset; Decision-making; Computer; Execution
Risks of AI scientists: prioritizing safeguarding over autonomy
Tang X, Jin Q, Zhu K, Yuan T, Zhang Y, Zhou W, Qu M, Zhao Y, Tang J, Zhang Z, Cohan A, Greenbaum D, Lu Z, Gerstein M. Risks of AI scientists: prioritizing safeguarding over autonomy. Nature Communications 2025, 16: 8317. PMID: 40968279, PMCID: PMC12446425, DOI: 10.1038/s41467-025-63913-1. Peer-Reviewed Original Research
A Large-Scale Study of Reranker Relevance Feedback at Inference
Reddy R, Dasigi P, Sultan A, Cohan A, Sil A, Ji H, Hajishirzi H. A Large-Scale Study of Reranker Relevance Feedback at Inference. 2025, 3010-3014. DOI: 10.1145/3726302.3730160. Peer-Reviewed Original Research
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Zhao Y, Zhang H, Xie L, Hu T, Gan G, Long Y, Hu Z, Chen W, Li C, Xu Z, Wang C, Shangguan Z, Liang Z, Liu Y, Zhao C, Cohan A. MMVU: Measuring Expert-Level Multi-Discipline Video Understanding. 2025, 00: 8475-8489. DOI: 10.1109/cvpr52734.2025.00793. Peer-Reviewed Original Research
Keywords: Video understanding; Expert level; Relevant domain knowledge; In-depth error analysis; Domain-specific knowledge; Video benchmarks; Domain knowledge; Human experts; Human expertise; Video; Specialized domains; Visual perception; Benchmarks; In-depth analysis; High performance; Error analysis; Data quality control; High quality; Case study; Dataset; Test model; Data; Foundation model; Model; Knowledge
RouterRetriever: Routing over a Mixture of Expert Embedding Models
Lee H, Soldaini L, Cohan A, Seo M, Lo K. RouterRetriever: Routing over a Mixture of Expert Embedding Models. Proceedings Of The AAAI Conference On Artificial Intelligence 2025, 39: 11995-12003. DOI: 10.1609/aaai.v39i11.33306. Peer-Reviewed Original Research
Keywords: Embedding model; Routing mechanism; General domain datasets; Multi-task training; Domain-specific data; Information retrieval methods; Multi-task model; Domain-specific experts; Expert retrieval; Information retrieval; Language model; Routing techniques; Retrieval model; Underperforming models; Retrieval method; Retrieval; Specialized domains; Dataset; Generation research; Experts; Query; Information; Language; Training; Embedding
mFollowIR: A Multilingual Benchmark for Instruction Following in Retrieval
Weller O, Chang B, Yang E, Yarmohammadi M, Barham S, MacAvaney S, Cohan A, Soldaini L, Van Durme B, Lawrie D. mFollowIR: A Multilingual Benchmark for Instruction Following in Retrieval. Lecture Notes In Computer Science 2025, 15573: 295-310. DOI: 10.1007/978-3-031-88711-6_19. Peer-Reviewed Original Research
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
Weller O, Chang B, MacAvaney S, Lo K, Cohan A, Van Durme B, Lawrie D, Soldaini L. FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions. 2025, 11926-11942. DOI: 10.18653/v1/2025.naacl-long.597. Peer-Reviewed Original Research
ReIFE: Re-evaluating Instruction-Following Evaluation
Liu Y, Shi K, Fabbri A, Zhao Y, Wang P, Wu C, Joty S, Cohan A. ReIFE: Re-evaluating Instruction-Following Evaluation. 2025, 12247-12287. DOI: 10.18653/v1/2025.naacl-long.610. Peer-Reviewed Original Research
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
Wadden D, Shi K, Morrison J, Li A, Naik A, Singh S, Barzilay N, Lo K, Hope T, Soldaini L, Shen S, Downey D, Hajishirzi H, Cohan A. SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature. 2025, 6083-6120. DOI: 10.18653/v1/2025.emnlp-main.310. Peer-Reviewed Original Research