
New Website Search Engine Deployed

January 20, 2016
by Justin Fansler

Today the School of Medicine's Web Group launched a new, enhanced search engine to replace the aging Google Search Appliance (GSA) hosted by Yale ITS. The new tool, based on Elasticsearch and hosted in Microsoft's Azure cloud, indexes all Tridion-hosted content on School of Medicine and School of Public Health websites.

The website search problem to be solved

The university's Google Search Appliance license covered only 1 million of the approximately 11 million pages at Yale, so search results could display only about 10% of the available website content. In addition, the GSA relied on "crawlers" that traversed websites and indexed their content on a regular schedule. As a result, it could take up to a week for newly published content to appear in the GSA search index, or for removed content to disappear from it. Pages with secured content, whether they required the user to be on Yale's campus (or the VPN) or to log in through CAS, were not included in the search results.

Azure search

The new search tool offers several significant improvements, including:

  1. Quick indexing: As content is published to or unpublished from live websites, the search index is typically updated within one minute to reflect the change, so search results reflect the latest content on school sites.
  2. Secured results: Pages and documents that are available only on Yale's campus (or through the VPN), that require CAS login (NetID), or that require a generic username and password now appear in the search results. Website visitors are asked to log in to view these results.
  3. Scalable and reliable infrastructure: The new search tool is hosted in the Azure cloud environment, which provides built-in redundancy and scalability. Network, power and other outages on Yale's campus will not impact the school's websites.

Secured results

If the search results include pages protected by security (VPN, CAS, or a generic username/password), website visitors will be prompted to log in to view the secured results. Visitors will only be able to see results they already have access to, based on the security files associated with the website.

The prompt shows a separate message for each of the three security options, stating how many secured items would become available if the visitor authenticated using that option.
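The post does not say how the security files are evaluated. As a simplified sketch, assume each result carries at most one security label (campus/VPN, CAS, or generic password) and that the visitor's session records which of those options they already satisfy. The code separates the visible results from the hidden ones and tallies the hidden ones per option, which is the information a login prompt like the one described above needs. `Result`, `split_results`, `login_prompts`, and the option labels are hypothetical names, not the Web Group's code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for the three security options named in the post.
CAMPUS_VPN = "campus or VPN"
CAS = "CAS (NetID)"
GENERIC = "generic username and password"


@dataclass(frozen=True)
class Result:
    url: str
    # None means the page is public; otherwise the single option that protects it
    # (a simplifying assumption; real security files may be richer than this).
    security: Optional[str] = None


def split_results(results: list[Result], satisfied: set[str]):
    """Return (visible results, Counter of hidden results per security option).

    `satisfied` holds the security options the visitor already meets,
    e.g. {CAMPUS_VPN} for someone browsing from campus without logging in.
    """
    visible = [r for r in results if r.security is None or r.security in satisfied]
    hidden = Counter(
        r.security for r in results
        if r.security is not None and r.security not in satisfied
    )
    return visible, hidden


def login_prompts(hidden: Counter) -> list[str]:
    # One message per security option that would unlock additional results,
    # sorted by option label so the output order is stable.
    return [
        f"{n} more result(s) available via {option}"
        for option, n in sorted(hidden.items())
    ]
```

For a visitor on campus who has not logged in, campus-only pages appear directly, while CAS- and password-protected pages stay hidden and are counted under their own messages.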

Immediate updates based on Tridion publishing and unpublishing

The search index is updated almost immediately when content is published to or unpublished from an SDL Tridion-hosted website. This means that any content published to a live website will be included in the search results, and unpublished content will be removed from them.
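Unlike the crawler model described earlier, which revisited pages on a schedule, a publish-driven index changes only when Tridion reports an event. The post does not describe how those events reach the index, so this is a minimal in-memory sketch assuming a hypothetical hook that calls `on_publish` or `on_unpublish` for each change. `SearchIndex`, `Page`, and both hook names are illustrative, not the Elasticsearch or Azure API.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    title: str
    body: str


@dataclass
class SearchIndex:
    """Hypothetical in-memory stand-in for the real search index."""
    docs: dict[str, Page] = field(default_factory=dict)

    def on_publish(self, page: Page) -> None:
        # Publishing (or republishing) upserts the document keyed by URL,
        # so the index holds the newest version as soon as the event arrives.
        self.docs[page.url] = page

    def on_unpublish(self, url: str) -> None:
        # Unpublishing deletes the document; unknown URLs are ignored.
        self.docs.pop(url, None)

    def search(self, term: str) -> list[str]:
        # Naive case-insensitive substring match over title and body,
        # for illustration only; a real engine tokenizes and ranks.
        term = term.lower()
        return sorted(
            url for url, p in self.docs.items()
            if term in p.title.lower() or term in p.body.lower()
        )
```

Because every change is pushed as it happens, there is no crawl interval to wait for; freshness depends only on how quickly the publish event is delivered.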

What to do if incorrect pages or documents appear in search

The new search tool includes all content that is hosted in SDL Tridion and published to a live website. There are two scenarios for incorrect content appearing in search results:

  1. Forgotten or incorrect pages: These are pages that were published but not included in the navigation. If they should not be available in search results, unpublish the page or document in Tridion to remove it from the search index.
  2. Content that should not be publicly available: These are pages that should be secured and not available to the general public. To secure this content, please follow the instructions on our Support website at http://web.yale.edu/support/templates/external/applications/security/index.aspx.

Upcoming search improvements

While the new search tool is a fantastic improvement over Yale's Google Search Appliance, we plan to make additional improvements over the coming months, including:


  • Content filtering: Will enable website visitors to filter search results by content types such as videos, image galleries, faculty, etc.
  • Image thumbnails: Will allow us to display thumbnails of images that appear on a page or a thumbnail of a document in the search results.
  • Tridion page keywords and descriptions: If metadata is added to a Tridion page (e.g., keywords or a description), it will be added to the index and help improve search results.
  • Search suggestions and type-ahead: As the visitor types a search term into the search box, we will recommend search terms based on the content in the index.
  • Additional content types: We do not currently include NCI cancer content or podcasts in the search index. These need to be added to the indexing process.
  • Facet support: We will offer visitors the option to further filter their results by items such as published date, modification date, security, content type, etc.
  • Other infrastructure improvements: These include better synonym support, improved indexing when navigation changes, query cooking, index tuning and possibly including images in the search index.
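Facet support is still planned, but the general technique can be illustrated: count how often each field value occurs in the current result set so the interface can show those counts next to each filter option, then narrow the results when a visitor picks a value. This sketch does it in memory over plain dictionaries; the field names `content_type` and `security` are hypothetical. Engines such as Elasticsearch compute these counts server-side (for example with terms aggregations), which a production implementation would more likely rely on.

```python
from collections import Counter, defaultdict


def facet_counts(docs: list[dict], fields: list[str]) -> dict[str, Counter]:
    """Count how many documents carry each value of each facet field."""
    counts = defaultdict(Counter)
    for doc in docs:
        for name in fields:
            # Documents missing a field simply don't contribute to that facet.
            if name in doc:
                counts[name][doc[name]] += 1
    return dict(counts)


def apply_filters(docs: list[dict], selected: dict[str, str]) -> list[dict]:
    """Keep only documents matching every facet value the visitor selected."""
    return [d for d in docs if all(d.get(k) == v for k, v in selected.items())]
```

Recomputing `facet_counts` on the filtered list after each selection keeps the displayed counts consistent with what the visitor would actually see.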

Up Next: Improved people search

The new search tool forms the foundation for a unified search approach that will include people, news, organizations and SDL Tridion-hosted content. In the coming months, the team will work on significant improvements to the people search results. We are targeting a March launch date for an enhanced people search that will enable users to filter results by fields such as international activity, research interests and patient care-focused topics.