Programmatically Updating Sitecore Indexes

Ever since Sitecore 7 introduced the ContentSearch namespace, more than half of the projects I’ve worked on have required me to update Lucene indexes from code. This topic has proven to be even more complicated than publishing Sitecore items programmatically! This post will examine some of the ways you can update the Sitecore indexes as needed from your code, along with my opinions on when you should use each method.

Types of updates

First of all, we need to distinguish between the different kinds of index update actions. Here is a brief table that should help to make it pretty clear.

| Action | Events Triggered | Description |
| --- | --- | --- |
| Rebuild/FullRebuild | indexing:start, indexing:end | Given an index, deletes all index documents before crawling and re-indexing all documents again. |
| Refresh | none | Given an index and a starting item, re-indexes (creates or updates the index documents for) that item and all of its descendants. |
| RefreshTree | none | Given a starting item, calls Refresh for all indexes. |
| Update/UpdateItem | indexing:updatingitem, indexing:updateditem, indexing:updatedependents | Given a document identifier and an index, updates the index for the specified item only. If the specified item cannot be found by the crawler, it is deleted from the index. This also calls Update for any items returned from the indexing.getDependencies pipeline. |
| Incremental | indexing:start, indexing:end | Given a list of document identifiers and an index, calls Update for each one. |
| ForcedIncremental | indexing:start, indexing:end | Same as Incremental, except that it will start the index operation even if indexing is paused or stopped. |
| Delete/DeleteItem/DeleteVersion | indexing:deletegroup, indexing:deleteitem | Given a document identifier and an index, removes that document from the index. |

Methods

There are basically two ways to initiate the index operations.

  1. Calling the methods directly on the Index object
  2. Using IndexCustodian

Like the PublishManager, the IndexCustodian does some extra work that is useful, like queuing and running asynchronous indexing jobs when you call DeleteItem, or refreshing an item in all indexes when calling RefreshTree. Because of this, you should usually prefer to use the IndexCustodian. On the other hand, if you want to deliberately perform a synchronous action on an index, you would not use the IndexCustodian.
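
To make the difference concrete, here is a minimal sketch of both approaches. The class name, the method names, and the index name `sitecore_web_index` are placeholders chosen for illustration, and exact signatures and return types vary between Sitecore versions, so treat this as a starting point rather than drop-in code.

```csharp
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Maintenance;
using Sitecore.Data.Items;

// Hypothetical helper class; none of these names come from Sitecore itself.
public static class IndexUpdateExamples
{
    // Assumed index name; substitute whichever index your solution defines.
    private const string IndexName = "sitecore_web_index";

    // Option 1: call the method directly on the index object.
    // The calling thread does the work and waits for it to finish.
    public static void RefreshDirectly(Item root)
    {
        ISearchIndex index = ContentSearchManager.GetIndex(IndexName);
        index.Refresh(new SitecoreIndexableItem(root));
    }

    // Option 2: go through IndexCustodian, which wraps the same
    // operation in an indexing job for a single index.
    public static void RefreshViaCustodian(Item root)
    {
        ISearchIndex index = ContentSearchManager.GetIndex(IndexName);
        IndexCustodian.Refresh(index, new SitecoreIndexableItem(root));
    }

    // RefreshTree takes no index argument: it refreshes the item
    // and its descendants in every index.
    public static void RefreshInAllIndexes(Item root)
    {
        IndexCustodian.RefreshTree(new SitecoreIndexableItem(root));
    }
}
```

The IndexCustodian methods hand back the job (or jobs) they create, so if you need to know when the work has finished you can hold on to the return value instead of discarding it as this sketch does.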

Multiple Servers

There is an additional consideration when updating search indexes. Since each index lives as a physical set of files on each server, updating the index on one server does not synchronize the change to the others. When a new item is added to the web index on the Content Management server, it is not necessarily added to the web index on any of the Content Delivery servers.

I won’t go into it here except to point you to an article by John West that explains it well. What you are looking for in that article is the “RemoteRebuildStrategy”.
