Each person has about 4 million sequence differences in their genome relative to the reference human genome. These differences are known as variants. A central goal in precision medicine is understanding which of these variants contribute to disease in a particular patient. Therefore, much of the human genome annotation effort is devoted to developing resources to help interpret the relative contribution of human variants to different observable phenotypes – i.e., determining variant impact.
Recently, Yale School of Medicine led a large NIH-sponsored study in which multiple institutions and international collaborators came together to address this challenge. The study generated a large, organized dataset from four individual donors, using high-quality genome sequencing to identify all the variants and many different assays to determine their effects on molecular phenotypes in 25 different tissues. Known as EN-TEx, the resource is an important step toward the future of personalized care. The team published its findings in Cell on March 30.
“Our work helps provide a better annotation of the genome and a better understanding of variant impact,” says Mark Gerstein, PhD, Albert Williams Professor of Biomedical Informatics and member of the new Yale Section of Biomedical Informatics & Data Science. At Yale, he is also affiliated with molecular biophysics & biochemistry, computer science, and statistics & data science. “An average person’s personal genome has variants in 4 million places. We’re trying to figure out which of these lead to meaningful differences.”
“This work represents the type of innovative large-scale data mining and teamwork that Yale is well-poised to create, coordinate, or participate in,” says Lucila Ohno-Machado, MD, MBA, PhD, Waldemar von Zedtwitz Professor of Medicine and of Biomedical Informatics & Data Science, and chair of the new section. “As our new academic unit grows, we expect to see more and more of this type of exemplary biomedical data science work originate from here.”