Sitecore Search

Recipe for implementing search experience using Sitecore Search

Last updated: Nov 20, 2024

Problem

The majority of web applications require a search tool for content search. Customers who purchase both XM Cloud and Search must integrate the two products to use Search in the sites built with XM Cloud.

This article discusses how to integrate Sitecore Search with XM Cloud. If you need to integrate a different search provider, you will need to review these techniques and see how they can be applied to your scenario.

Solution

This is a two-step process. First, the customer will need to index their content within Sitecore Search. They will then use that search data to build out Search-based experiences in their applications and use Search events to collect user interactions.

Indexing Data

The Advanced Web Crawler is the primary way to pull data into Sitecore Search (for XM Cloud or non-XMC-based sites) and should work for the majority of use cases while allowing for quick iteration on configuring and tuning the index.

Sitecore Search has other indexing methods that offer both pull and push options for indexing content. However, these should be used only for clear, specific scenarios that the Advanced Web Crawler may not be able to handle.

Crawling Methods

Here is a quick overview of the available indexing methods in Sitecore Search:

  • Advanced Web Crawler - Allows for indexing content in multiple languages and parsing web pages with JavaScript to construct indexes.
  • Web Crawler - A simple web crawler that has limited configuration options.
  • API Crawler - Used to consume responses from an HTTP API endpoint that returns JSON. Allows for indexing content in multiple languages and parsing the response with JavaScript.
  • Ingestion API - Used to push content into a Sitecore Search Index.

You can read an overview of how to work with Index Sources in Sitecore Search on our documentation site: how to index contents.

Implementing Search in the Web Application

Implementing Search in the web application requires you to call Sitecore Search's REST API endpoints. The REST API endpoints are the recommended way to implement Sitecore Search. Additionally, Sitecore provides a React SDK (Documentation) for development. A Starter Kit for Sitecore Search is also available; it is intended for exploratory or prototyping purposes at this time.
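As a rough illustration of calling the REST API, the sketch below builds the JSON body for a keyword search. The body shape (`context`, `widget.items`, `rfk_id`, `search.query.keyphrase`) and the placeholder values ("rfkid_7", "content") are assumptions to verify against the Search API reference and your own widget configuration; `buildSearchRequest` is a hypothetical helper, not part of any Sitecore package.

```typescript
// Hypothetical helper: builds the body of a keyword search request.
// The field names below are assumptions; confirm them in the API reference.
interface SearchRequestOptions {
  keyphrase: string;
  widgetId: string; // ID of the search results widget (placeholder value in the usage note)
  entity: string;   // entity type configured in your Search domain
  limit?: number;
  offset?: number;
  language?: string;
  country?: string;
}

interface SearchWidgetItem {
  entity: string;
  rfk_id: string;
  search: {
    content: Record<string, never>;
    query: { keyphrase: string };
    limit: number;
    offset: number;
  };
}

interface SearchRequestBody {
  context: { locale: { country: string; language: string } };
  widget: { items: SearchWidgetItem[] };
}

function buildSearchRequest(options: SearchRequestOptions): SearchRequestBody {
  return {
    context: {
      locale: {
        country: options.country ?? "us",
        language: options.language ?? "en",
      },
    },
    widget: {
      items: [
        {
          entity: options.entity,
          rfk_id: options.widgetId,
          search: {
            content: {}, // empty object: request the default content attributes (assumed behavior)
            query: { keyphrase: options.keyphrase },
            limit: options.limit ?? 10,
            offset: options.offset ?? 0,
          },
        },
      ],
    },
  };
}
```

The resulting object would be sent as the JSON body of a POST request to the search endpoint for your domain, for example `buildSearchRequest({ keyphrase: "events", widgetId: "rfkid_7", entity: "content" })`. Keeping the builder separate from the HTTP call makes it easy to unit test and to reuse from server-side or client-side code.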

Discussion

The web crawler can be an effective way to index all or part of the site. However, each crawler run updates the index completely, so crawlers cannot be used to partially update the source index. For example, if the page '/events/last' changes, you cannot use the web crawler to update the index for '/events/last' alone; you must launch the crawler to re-index the source completely, which updates all index documents with the latest data. To partially update a source index, you need to implement Incremental Updates, which use the Ingestion API in addition to the crawler.
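As a minimal sketch of the Ingestion API side of an incremental update, the helper below builds the request that would replace a single document, such as the one for '/events/last'. The URL pattern, the `document.fields` body, the `Authorization` header, and the `documentIdForPath` helper are assumptions for illustration; check them against the Ingestion API reference for your domain.

```typescript
// Where the document lives. All values are supplied by the caller.
interface IngestionTarget {
  baseUrl: string;  // base URL of the Ingestion API for your region
  domainId: string;
  sourceId: string;
  entity: string;   // e.g. "content"
  locale: string;   // e.g. "en_us"
}

interface IngestionRequest {
  method: "PUT";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical convention: derive a stable document ID from the page path.
// "/events/last" becomes "events_last"; the site root becomes "home".
function documentIdForPath(path: string): string {
  const trimmed = path.replace(/^\/+|\/+$/g, "");
  return trimmed.replace(/[^A-Za-z0-9]+/g, "_") || "home";
}

// Builds (but does not send) the request that replaces one document.
// The path segments and body shape are assumptions, not a confirmed contract.
function buildDocumentUpdate(
  target: IngestionTarget,
  apiKey: string,
  documentId: string,
  fields: Record<string, unknown>,
): IngestionRequest {
  const url =
    `${target.baseUrl}/ingestion/v1/domains/${target.domainId}` +
    `/sources/${target.sourceId}/entities/${target.entity}` +
    `/documents/${encodeURIComponent(documentId)}?locale=${target.locale}`;
  return {
    method: "PUT",
    url,
    headers: { "Content-Type": "application/json", Authorization: apiKey },
    body: JSON.stringify({ document: { fields } }),
  };
}
```

Only the changed page is sent, so the rest of the source index stays as the last full crawl left it.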

A common solution that teams may arrive at is to use a combination of webhooks and the Ingestion API to integrate Sitecore Search and their XM Cloud implementation tightly. However, this solution, while superficially a good fit, often is not able to effectively meet the overall requirements for the search implementation.

While you can configure a webhook in either the XMC CM or Edge to trigger an update on publish, all you can do is pull the data directly from Edge to update the Search Source. Web pages generated by the front-end application often use multiple external data sources in addition to the content inside XM Cloud, in which case you would need to trigger a re-crawl of the page that has been updated. As mentioned above, Sitecore Search does not have a way to trigger individual page crawls; a crawl must run in its entirety across the configured crawl source and configuration. The ability to crawl individual pages is a planned feature for a future release of Sitecore Search.

Pushing Data into Sitecore Search via a Webhook

To achieve this, you will need to provision a service that accepts the request from the webhook, processes the data, and then uses the Search API to update the indexes. The process is as follows:

  • Register an “OnUpdate” webhook on the Experience Edge tenant.
  • Use the RootItem ID to make a GraphQL Request to Edge to get the rendered layout data for that item.
  • Use this data to call the Ingestion API and update the data in the Source Index. Sitecore documentation provides examples of how to create and update a document.

You can find out more about the different elements of this approach in the Sitecore documentation.
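The steps above can be sketched as a small handler. This is an outline under stated assumptions, not a reference implementation: the webhook payload shape (`updates`, `entity_definition`, `operation`, `entity_culture`), the GraphQL query using `item(path:, language:)` with the `rendered` field, and the layout-to-fields mapping are all assumptions to verify against a real webhook delivery and your Edge schema. The `fetchLayout` and `ingest` functions are hypothetical dependencies that you would implement with your HTTP client.

```typescript
// Assumed shape of the Experience Edge webhook payload.
interface EdgeUpdate {
  identifier: string;
  entity_definition: string; // assumed values include "Item" and "LayoutData"
  operation: string;         // assumed values include "Update" and "Delete"
  entity_culture?: string;
}
interface EdgeWebhookPayload { updates: EdgeUpdate[]; }

interface ItemRef { id: string; language: string; }
interface GraphQLRequest { query: string; variables: ItemRef; }

// Step 1: pick the updated items out of the webhook payload.
function updatedItems(payload: EdgeWebhookPayload): ItemRef[] {
  return payload.updates
    .filter((u) => u.entity_definition === "Item" && u.operation === "Update")
    .map((u) => ({ id: u.identifier, language: u.entity_culture ?? "en" }));
}

// Step 2: build the GraphQL request for the rendered layout of one item.
function buildLayoutQuery(item: ItemRef): GraphQLRequest {
  const query = `query ItemLayout($id: String!, $language: String!) {
  item(path: $id, language: $language) { id name rendered }
}`;
  return { query, variables: item };
}

// Assumed subset of the rendered layout JSON.
interface RenderedLayout {
  sitecore?: { route?: { name?: string; fields?: Record<string, { value?: unknown }> } };
}

// Step 3a: flatten string-valued route fields into index attributes.
function toSearchFields(rendered: RenderedLayout): Record<string, string> {
  const route = rendered.sitecore?.route;
  const routeFields = route?.fields ?? {};
  const fields: Record<string, string> = {};
  if (route?.name) fields["name"] = route.name;
  for (const key of Object.keys(routeFields)) {
    const value = routeFields[key]?.value;
    if (typeof value === "string") fields[key.toLowerCase()] = value;
  }
  return fields;
}

// Hypothetical dependencies, injected so the flow can be tested without a network.
interface WebhookDeps {
  fetchLayout(request: GraphQLRequest): Promise<RenderedLayout>;
  ingest(item: ItemRef, fields: Record<string, string>): Promise<void>;
}

// Step 3b: run the flow for every updated item; returns how many were processed.
async function handleWebhook(payload: EdgeWebhookPayload, deps: WebhookDeps): Promise<number> {
  const items = updatedItems(payload);
  for (const item of items) {
    const rendered = await deps.fetchLayout(buildLayoutQuery(item));
    await deps.ingest(item, toSearchFields(rendered));
  }
  return items.length;
}
```

Note that this flow only sees what Edge returns for the item. Content that the front-end application pulls from other data sources at render time will not be present, which is the limitation described above.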

Event tracking

As visitors interact with your page by clicking on widgets, logging in, upvoting articles, and so on, you must track those interactions and send them to Sitecore Search using the Events API. Sitecore Search uses events for generating analytic dashboards and driving personalization.

It is worth noting that Search events are different from Sitecore CDP and Personalize events: they have a different structure and purpose. If your project includes CDP and/or Personalize, this is the time to define your event tracking and personalization strategy.

In Search, events have two specific goals:

  • Measure conversions: This is done via Funnel events. These are visitor actions that could lead to a conversion, for example, page view, add to cart, remove from cart, logging in, and so on. Conversion analytics are generated based on these events.
  • Personalize the search experience: This happens through Widget events. These record when a visitor views or clicks on a Sitecore Search widget.

It is important to consider events from the beginning of your project: the more events come into the system, the better the value you return to customers. The starter kit provides basic tracking and can be useful to familiarize yourself with the mechanism and how events are built.
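When planning the tracking strategy, it can help to model the two event categories explicitly in application code. The sketch below is purely illustrative: the type names, action names, and fields are hypothetical and do not represent the Events API schema, which should be taken from the Events API documentation.

```typescript
// Hypothetical application-side model of the two event categories.
// These are NOT the Events API payloads; map them to the real schema when sending.
type FunnelAction = "page_view" | "add_to_cart" | "remove_from_cart" | "login";
type WidgetAction = "view" | "click";

interface FunnelEvent {
  kind: "funnel";
  action: FunnelAction;
  visitorId: string;
  timestampMs: number;
}

interface WidgetEvent {
  kind: "widget";
  action: WidgetAction;
  widgetId: string; // ID of the Search widget the visitor interacted with
  visitorId: string;
  timestampMs: number;
}

type TrackedEvent = FunnelEvent | WidgetEvent;

function funnelEvent(action: FunnelAction, visitorId: string, timestampMs: number): FunnelEvent {
  return { kind: "funnel", action, visitorId, timestampMs };
}

function widgetEvent(
  action: WidgetAction,
  widgetId: string,
  visitorId: string,
  timestampMs: number,
): WidgetEvent {
  return { kind: "widget", action, widgetId, visitorId, timestampMs };
}

// Funnel events feed conversion analytics; widget events feed personalization.
function purposeOf(event: TrackedEvent): string {
  return event.kind === "funnel" ? "conversion analytics" : "personalization";
}
```

For example, `purposeOf(widgetEvent("click", "rfkid_7", "visitor-1", Date.now()))` returns "personalization" (the widget and visitor IDs here are placeholders). Separating the two kinds in the type system makes it harder to ship a page that tracks clicks but forgets the funnel events needed for conversion reporting.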

Search provides tools to:

Furthermore, in compliance with GDPR, the product provides a tool to support the right to erasure.

Throttle limits

Based on the contract (entitlement), API calls are limited per second and per day. Implementers should always handle rate limiting, including responses with the HTTP status code 429 (Too Many Requests). If a retry is needed, back off between attempts and cap the number of retries to avoid an infinite retry loop.
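One common way to handle 429 responses is exponential backoff with jitter and a retry cap. The sketch below is a generic pattern, not Sitecore-specific code; the function names, the 500 ms base delay, the 30 second cap, and the default of five retries are illustrative assumptions to tune against your entitlement.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Default sleep based on setTimeout; tests can inject a no-op instead.
function defaultSleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });
}

// Calls `call` and retries while it returns HTTP 429, up to `maxRetries` retries.
// If the limit is reached, the last 429 response is returned so the caller can
// decide what to do (log, surface an error, queue for later).
async function withBackoff<T extends { status: number }>(
  call: () => Promise<T>,
  maxRetries = 5,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const response = await call();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    // Jitter: wait between 50% and 100% of the computed delay so that
    // many clients do not retry at the same instant.
    const delay = backoffDelayMs(attempt);
    await sleep(delay / 2 + Math.random() * (delay / 2));
  }
}
```

A call such as `withBackoff(() => fetch(url, options))` would wrap any request to the Search APIs. Because the retry count is bounded, a sustained rate limit results in a visible 429 for the caller rather than an endless loop.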

The throttle limits can be checked in the CEC under Developer Resources > API Access. The dashboard shows a table like the following:

[Image: Throttle limits]

Developer experience

Sitecore Search provides three integration methods that you can choose from:

  • JavaScript SDK for React - integration involves adding SDK packages containing components and features to your project. Built specially for React applications, the SDK is the fastest way to integrate with Search and requires the least time and development effort.
  • JavaScript Data - can be used with vanilla JavaScript or in any JS framework. This is a quick way to make search queries and receive responses from Search.
  • REST APIs - integration involves accessing endpoints that expose Search services. Configure the requests you want to send to Search. You'll also need to handle the responses and translate them into Search experiences.

In addition, for the pure purpose of prototyping and training, there are the following resources:

© Copyright 2024, Sitecore. All Rights Reserved