2024
MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations
Tang X, Tran A, Tan J, Gerstein M. MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations. Bioinformatics 2024, 40: i357-i368. PMID: 38940177, PMCID: PMC11256921, DOI: 10.1093/bioinformatics/btae260.
Peer-Reviewed Original Research
Concepts: Transformer encoder; Downstream tasks; Language model; Biomedical text; Self-supervised pre-training; Explicit 3D representation; Representation improves performance; Deep learning models; Representation of molecules; Contrastive learning; Supervisory signal; Extract embeddings; Representation capability; Joint representation; Biomedical domain; Pre-training; Textual data; Learning models; Molecular representations; Model weights; Jupyter Notebook; Step-by-step guidance; Encoding; Property prediction; Structural information
2016
The real cost of sequencing: scaling computation to keep pace with data generation
Muir P, Li S, Lou S, Wang D, Spakowicz DJ, Salichos L, Zhang J, Weinstock GM, Isaacs F, Rozowsky J, Gerstein M. The real cost of sequencing: scaling computation to keep pace with data generation. Genome Biology 2016, 17: 53. PMID: 27009100, PMCID: PMC4806511, DOI: 10.1186/s13059-016-0917-0.
Peer-Reviewed Original Research
2010
Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project
Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, Liu T, Yip KY, Robilotto R, Rechtsteiner A, Ikegami K, Alves P, Chateigner A, Perry M, Morris M, Auerbach RK, Feng X, Leng J, Vielle A, Niu W, Rhrissorrakrai K, Agarwal A, Alexander RP, Barber G, Brdlik CM, Brennan J, Brouillet JJ, Carr A, Cheung MS, Clawson H, Contrino S, Dannenberg LO, Dernburg AF, Desai A, Dick L, Dosé AC, Du J, Egelhofer T, Ercan S, Euskirchen G, Ewing B, Feingold EA, Gassmann R, Good PJ, Green P, Gullier F, Gutwein M, Guyer MS, Habegger L, Han T, Henikoff JG, Henz SR, Hinrichs A, Holster H, Hyman T, Iniguez AL, Janette J, Jensen M, Kato M, Kent WJ, Kephart E, Khivansara V, Khurana E, Kim JK, Kolasinska-Zwierz P, Lai EC, Latorre I, Leahey A, Lewis S, Lloyd P, Lochovsky L, Lowdon RF, Lubling Y, Lyne R, MacCoss M, Mackowiak SD, Mangone M, McKay S, Mecenas D, Merrihew G, Miller DM, Muroyama A, Murray JI, Ooi SL, Pham H, Phippen T, Preston EA, Rajewsky N, Rätsch G, Rosenbaum H, Rozowsky J, Rutherford K, Ruzanov P, Sarov M, Sasidharan R, Sboner A, Scheid P, Segal E, Shin H, Shou C, Slack FJ, Slightam C, Smith R, Spencer WC, Stinson EO, Taing S, Takasaki T, Vafeados D, Voronina K, Wang G, Washington NL, Whittle CM, Wu B, Yan KK, Zeller G, Zha Z, Zhong M, Zhou X, Consortium M, Ahringer J, Strome S, Gunsalus KC, Micklem G, Liu XS, Reinke V, Kim SK, Hillier LW, Henikoff S, Piano F, Snyder M, Stein L, Lieb JD, Waterston RH. Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project. Science 2010, 330: 1775-1787. 
PMID: 21177976, PMCID: PMC3142569, DOI: 10.1126/science.1196914.
Peer-Reviewed Original Research
MeSH Keywords: Animals; Caenorhabditis elegans; Caenorhabditis elegans Proteins; Chromatin; Chromosomes; Computational Biology; Conserved Sequence; Evolution, Molecular; Gene Expression Profiling; Gene Expression Regulation; Gene Regulatory Networks; Genes, Helminth; Genome, Helminth; Genomics; Histones; Models, Genetic; Molecular Sequence Annotation; Regulatory Sequences, Nucleic Acid; RNA, Helminth; RNA, Untranslated; Transcription Factors
Concepts: Accurate gene models; Genome-wide identification; Transcription factor-binding sites; Key model organism; Transcription factor binding; Alternative splice forms; Factor-binding sites; Chromatin composition; ModENCODE project; Chromatin organization; Histone modifications; Genome annotation; Model organisms; Nematode Caenorhabditis; Chromosomal location; Putative functions; Gene models; Transcriptome profiling; Chromosome arms; Transcription factors; Noncoding RNAs; Factor binding; Splice forms; X chromosome; Gene expression
2006
Relating Three-Dimensional Structures to Protein Networks Provides Evolutionary Insights
Kim PM, Lu LJ, Xia Y, Gerstein MB. Relating Three-Dimensional Structures to Protein Networks Provides Evolutionary Insights. Science 2006, 314: 1938-1941. PMID: 17185604, DOI: 10.1126/science.1136174.
Peer-Reviewed Original Research