Leveraging a large language model to predict protein phase transition: A physical, multiscale, and interpretable approach
Frank M, Ni P, Jensen M, Gerstein M. Leveraging a large language model to predict protein phase transition: A physical, multiscale, and interpretable approach. Proceedings of the National Academy of Sciences of the United States of America 2024, 121: e2320510121. PMID: 39110734, PMCID: PMC11331094, DOI: 10.1073/pnas.2320510121.
Peer-Reviewed Original Research
Concepts: Protein phase transitions; Associated with reduced gene expression; Protein structure prediction; Alzheimer's disease-related proteins; Disease-related proteins; Alzheimer's disease; Protein sequences; Sequence variants; Structure prediction; Amyloid aggregates; Protein design; Gene expression; Age-related diseases; Natural defense mechanisms; Soluble state; Protein; Defense mechanisms; Biophysical features; Alzheimer; Sequence; Amyloid; Variants; Expression; Language model; Computational framework

MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations
Tang X, Tran A, Tan J, Gerstein M. MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations. Bioinformatics 2024, 40: i357-i368. PMID: 38940177, PMCID: PMC11256921, DOI: 10.1093/bioinformatics/btae260.
Peer-Reviewed Original Research
Concepts: Transformer encoder; Downstream tasks; Language model; Biomedical text; Self-supervised pre-training; Explicit 3D representation; Representation improves performance; Deep learning models; Representation of molecules; Contrastive learning; Supervisory signal; Extract embeddings; Representation capability; Joint representation; Biomedical domain; Pre-training; Textual data; Learning models; Molecular representations; Model weights; Jupyter Notebook; Step-by-step guidance; Encoding; Property prediction; Structural information

FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations
Li T, Zhou H, Verma V, Tang X, Shao Y, Van Buren E, Weng Z, Gerstein M, Neale B, Sunyaev S, Lin X. FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations. Bioinformatics Advances 2024, 4: vbae143. PMID: 39387060, PMCID: PMC11461909, DOI: 10.1093/bioadv/vbae143.
Peer-Reviewed Original Research
Concepts: Variant functional annotation; Functional annotation; Natural language interface; Functional annotation data; Disease-associated variants; Language interface; Whole genome; Functional prioritization; Genome; User prompts; Retrieval framework; Language model; Raw annotations; Annotated data; Annotation; Users; Retrieval; Online resources; Chatbot; Information interpretation; Usability; Variants; Database