The lab is active in four research domains:

1. Personalized medicine / translational bioinformatics (high-throughput sequencing of cancer genomes; building diagnostic, prognostic, and predictive Omics tests)

2. Translational research infrastructure (biobanking, tissue and data sharing)

3. Semantic Web technology in biomedicine (smart reasoning engines over Omics data)

4. Data and text mining in biomedicine (development of biomedical search engines)

Much of our work is at the intersection of clinical medicine, basic research, and computer science, a field often termed translational bioinformatics. Our lab is tightly integrated with the Yale SPORE in skin cancer, a large translational cancer grant that aims to bring genomic discoveries into clinical practice. We are studying the melanoma genome, transcriptome, and epigenome using high-throughput technologies, mostly next-generation sequencing. We work in close collaboration with physicians, biologists, and pharmacologists to distill actionable information from large Omics data sets.

The lab is involved in nationwide initiatives for the storage and dissemination of cancer tissue and Omics data. We participate in data standards initiatives and in the design and deployment of grid-enabled databases. The goal is to accelerate discovery and translation through mutual sharing of data.

We are taking advantage of emerging Semantic Web technologies to build smart systems for ontology-driven reasoning over large Omics data sets. With the availability of complex domain ontologies (large descriptions of entities, such as genes and processes, and the relationships between them), we are able to interrogate Omics data for hidden associations. We can tackle difficult questions such as: “Why is a particular cancer sample resistant to epigenetic treatment?” or “What genomic aberration may explain the growth behavior of metastatic samples?”

We are also active in the exciting field of text and image mining. We tackle the fundamental problem of keeping up with an increasing number of research publications. How can we stay abreast of the latest discoveries? How can we find specific information in millions of PubMed abstracts? A picture is worth a thousand words, and we are therefore equally interested in mining image information from biomedical publications. Below is one of our recent projects in biomedical image mining.


Yale Image Finder (YIF) is a publicly accessible search engine featuring a new way of retrieving biomedical images and associated papers based on the text carried inside the images. Queries can also be issued against the image caption, as well as against words in the title and abstract of the associated paper. A typical search scenario using YIF is as follows: a user provides a few search keywords, and the most relevant images are returned and presented as thumbnails. Users can click on an image of interest to retrieve the high-resolution version. In addition, the search engine provides two types of related images: those that appear in the same paper, and those from other papers with similar image content. Retrieved images link back to their source papers, allowing users to find related papers starting from an image of interest. Currently, YIF has indexed over 140,000 images from over 34,000 public-access biomedical journal papers.
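
To make the search scenario above concrete, here is a minimal sketch in Python. It is not the YIF implementation: the record layout, the field names (`image_text`, `caption`, `title`, `abstract`), the sample records, and the functions `build_index`, `search`, and `related` are all hypothetical, and simple keyword overlap stands in for whatever measure of "similar image content" the real system uses. The sketch assumes the text inside each image has already been extracted.

```python
import re
from collections import defaultdict

# Hypothetical sample records; not taken from the YIF index.
IMAGES = [
    {"id": "img1", "paper": "paperA",
     "image_text": "melanoma survival curve",
     "caption": "Kaplan-Meier survival of patients",
     "title": "Prognostic markers in melanoma",
     "abstract": "We study survival in melanoma patients"},
    {"id": "img2", "paper": "paperA",
     "image_text": "gene expression heatmap",
     "caption": "Expression of selected genes",
     "title": "Prognostic markers in melanoma",
     "abstract": "We study survival in melanoma patients"},
    {"id": "img3", "paper": "paperB",
     "image_text": "survival curve by treatment arm",
     "caption": "Overall survival",
     "title": "Treatment response in lung cancer",
     "abstract": "A trial of treatment response"},
]

# Assumed searchable fields: text inside the image, caption, paper title, abstract.
FIELDS = ("image_text", "caption", "title", "abstract")


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(images):
    """Map each (field, token) pair to the ids of the images containing it."""
    index = defaultdict(set)
    for img in images:
        for field in FIELDS:
            for tok in tokenize(img[field]):
                index[(field, tok)].add(img["id"])
    return index


def search(index, query, fields=FIELDS):
    """Return ids of images in which every keyword occurs in a selected field."""
    result = None
    for tok in tokenize(query):
        hits = set()
        for field in fields:
            hits |= index.get((field, tok), set())
        result = hits if result is None else result & hits
    return sorted(result or [])


def related(images, image_id):
    """Return (same-paper images, images from other papers sharing image text)."""
    target = next(img for img in images if img["id"] == image_id)
    target_tokens = tokenize(target["image_text"])
    same_paper = sorted(
        img["id"] for img in images
        if img["paper"] == target["paper"] and img["id"] != image_id
    )
    # Token overlap is only a stand-in for a real content-similarity measure.
    similar = sorted(
        img["id"] for img in images
        if img["paper"] != target["paper"]
        and tokenize(img["image_text"]) & target_tokens
    )
    return same_paper, similar


index = build_index(IMAGES)
print(search(index, "survival curve", fields=("image_text",)))
print(search(index, "melanoma"))
print(related(IMAGES, "img1"))
```

Restricting `fields` to `("image_text",)` mimics a query against the text carried inside the images only, while the default searches all four fields. Because each record keeps its `paper` identifier, every hit can be linked back to its source paper.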

The paper is available at

The search engine is available at