Ever since Sitecore 7 introduced the ContentSearch namespace, more than half of the projects I’ve worked on have required me to update Lucene indexes from code. This topic has proven to be even more complicated than publishing Sitecore items programmatically! This post examines some of the ways you can update the Sitecore indexes as needed from your code, along with my opinions on when you should use each method.
Types of updates
First of all, we need to distinguish between the different kinds of index update actions. Here is a brief table that should help make the differences clear.
Action | Events Triggered | Description |
---|---|---|
Rebuild/FullRebuild | indexing:start, indexing:end | Given an index, deletes all index documents before crawling and re-indexing all documents again. |
Refresh | none | Given an index and a starting item, re-indexes (creates or updates index) for that item and all of its descendants. |
RefreshTree | none | Given a starting item, calls Refresh for all indexes. |
Update/UpdateItem | indexing:updatingitem, indexing:updateditem, indexing:updatedependents | Given a document identifier and an index, updates the index for specified item only. If specified item cannot be found by the crawler, then it is deleted from the index. This also calls Update for any items returned from the indexing.getDependencies pipeline. |
Incremental | indexing:start, indexing:end | Given a list of document identifiers and an index, calls Update for each one. |
ForcedIncremental | indexing:start, indexing:end | Same as Incremental except that it will start the index operation even if indexing is paused or stopped. |
Delete/DeleteItem/DeleteVersion | indexing:deletegroup, indexing:deleteitem | Given a document identifier and an index, it removes that document from the index. |
Methods
There are basically two ways to initiate the index operations.
- Calling the methods directly on the Index object
- Using IndexCustodian
Like the PublishManager, the IndexCustodian does some extra work that is useful, like queuing and running asynchronous indexing jobs when you call DeleteItem, or refreshing an item in all indexes when calling RefreshTree. Because of this, you should usually prefer to use the IndexCustodian. On the other hand, if you deliberately want to perform a synchronous action on a single index, call the method directly on the Index object instead.
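To make the difference concrete, here is a minimal sketch of both approaches. It assumes the Sitecore 7-era ContentSearch signatures as I remember them, and it assumes the default index name "sitecore_web_index"; the class and method names (IndexUpdateExamples and so on) are made up for illustration. Check the signatures against the Sitecore version you are actually running before relying on any of this.

```csharp
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Maintenance;
using Sitecore.Data.Items;

// Hypothetical helper class, for illustration only.
public static class IndexUpdateExamples
{
    // Assumption: the default web index name. Adjust for your solution.
    private const string IndexName = "sitecore_web_index";

    // Direct call on the index object: runs in the calling thread and
    // only touches this one index.
    public static void RefreshDirectly(Item root)
    {
        ISearchIndex index = ContentSearchManager.GetIndex(IndexName);

        // Item converts to SitecoreIndexableItem, which implements IIndexable.
        index.Refresh((SitecoreIndexableItem)root);
    }

    // IndexCustodian: queues indexing jobs that refresh the item and its
    // descendants in all indexes, then returns without waiting for them.
    public static void RefreshViaCustodian(Item root)
    {
        IndexCustodian.RefreshTree((SitecoreIndexableItem)root);
    }

    // IndexCustodian: queues a job that updates a single item
    // (identified by its ItemUri) in one specific index.
    public static void UpdateSingleItem(Item item)
    {
        ISearchIndex index = ContentSearchManager.GetIndex(IndexName);
        IndexCustodian.UpdateItem(index, new SitecoreItemUniqueId(item.Uri));
    }
}
```

The custodian calls return as soon as the jobs are queued, so code that needs the index to be current immediately afterwards is the main case for the direct call.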
Multiple Servers
There is an additional consideration when updating search indexes in a multi-server environment. Each index lives as a physical set of files on each server, so updating the index on one server does not synchronize it to the others. When a new item gets added to the web index on the Content Management server, it is not necessarily added to the web index on any of the Content Delivery servers.
I won’t go into it here except to point you to an article by John West that explains it well. What you are looking for in that article is the “RemoteRebuildStrategy”.