Yale team builds new search engine that retrieves images based on embedded text

Yale Medicine Magazine, 2009 - Winter


In July a team of Yale scientists published a paper describing a search engine that offers a new way of finding biomedical images. Search engines and websites already allow scientists to search for images based on titles and captions. “We are not aware of a biomedical search engine that can retrieve images by searching the text within biomedical images,” Michael O. Krauthammer, M.D., Ph.D., assistant professor of pathology, and colleagues wrote in their paper published in Bioinformatics.

The Yale Image Finder (YIF) lets researchers locate diagrams, graphs and other experimental figures based on text contained in the images.

Krauthammer calls this new technology a major step in biomedical literature retrieval, as most important information exists in places other than image captions, which, until now, have been the primary targets of image search engines.

YIF works by running optical character recognition (OCR) on each image before making it available for search. Users can restrict a query to the text within the images, the image caption, the paper title, the abstract, the full text or any combination of these. After a query is submitted, YIF presents thumbnails of the matching images. Once the user selects an image of interest, YIF provides a high-resolution version of it, along with the abstract, full text and other images from the associated paper.
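To make the field-restricted query concrete, here is a minimal sketch in Python. It is not YIF's implementation, which the article does not describe. The record layout, the field names and the `search` function are all hypothetical, the OCR step is assumed to have already produced the `image_text` value, and simple substring matching stands in for whatever retrieval method the real system uses.

```python
from dataclasses import dataclass

# Hypothetical field names: the article lists these searchable fields
# but does not say how YIF stores or indexes them.
FIELDS = ("image_text", "caption", "title", "abstract", "full_text")


@dataclass
class ImageRecord:
    image_id: str
    image_text: str = ""  # assumed output of an OCR step run beforehand
    caption: str = ""
    title: str = ""
    abstract: str = ""
    full_text: str = ""


def search(records, query, fields=FIELDS):
    """Return ids of records whose selected fields contain every query term.

    Uses case-insensitive substring matching, a stand-in for a real index.
    """
    unknown = set(fields) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    terms = query.lower().split()
    hits = []
    for rec in records:
        # Concatenate only the fields the user chose to search.
        haystack = " ".join(getattr(rec, f) for f in fields).lower()
        if all(term in haystack for term in terms):
            hits.append(rec.image_id)
    return hits


# Two made-up records for illustration only.
records = [
    ImageRecord("fig1", image_text="p53 expression (fold change)",
                caption="Western blot results"),
    ImageRecord("fig2", image_text="time (hours)",
                caption="p53 levels over time"),
]

in_image = search(records, "p53", fields=("image_text",))
in_caption = search(records, "p53", fields=("caption",))
anywhere = search(records, "p53")
```

In this toy data, restricting the query to `image_text` finds the figure whose axis label mentions the term even though its caption does not, which is the case a caption-only search would miss.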

“The idea is to augment text mining with image mining, with the idea that we can have a better understanding of a research article using automated means,” Krauthammer says. “I’ve felt that images are undervalued in terms of their representative quality and what type of information they can hold. In the future, we should be able to obtain even more information from the images, and get a pretty good understanding of what the paper is about.”