Monday, November 17, 2014

Sitecore 7.2 with Solr: Search operator, boosting the most recent results, highlighting and proximity search.

Recently I needed to add a few extensions to the Sitecore Solr provider that I would like to share. For the original implementation I added a "GroupBy" method, which you can read about in my other blog post, "Sitecore 7 and GroupBy method for Solr". This time I had to add four more things:
  • passing of a search operator "AND";
  • boosting the most recent results to the top;
  • implementing highlighting;
  • and proximity search.
Because I already had a class where the GroupBy extension was implemented, I used the same class to add the query parameters for the new functionality.

Passing of a search operator "AND"

Out of the box, Solr uses the "OR" operator for all queries. To retrieve "AND" results for a phrase such as private equity, I had to pass the "AND" search operator. The Solr query should look like this:

?q={!q.op=AND}((_template:(d84da6272e284cdb87869691dea4e692) OR _template:(e6ec2506eb4e4998aaa922b41372a8b7) OR _template:(d07c1b814d2546529c4738a0cc7a48dc)) AND people_sm:(296ee3f7692b48eba1f3f41387eee6c3))&rows=4500&fl=*,score&fq=((iscontent_b:(True) AND show_in_search_results_b:(True)) AND _latestversion:(True))&fq=_indexname:(sitecore_master_index)&sort=start_date_tdt desc,publication_date_tdt desc

or like this:

?q=((_template:(d84da6272e284cdb87869691dea4e692) OR _template:(e6ec2506eb4e4998aaa922b41372a8b7) OR _template:(d07c1b814d2546529c4738a0cc7a48dc)) AND people_sm:(296ee3f7692b48eba1f3f41387eee6c3))&rows=4500&fl=*,score&fq=((iscontent_b:(True) AND show_in_search_results_b:(True)) AND _latestversion:(True))&fq=_indexname:(sitecore_master_index)&q.op=AND&sort=start_date_tdt desc,publication_date_tdt desc

If you wish, you can pass both forms in the same request.

To add this parameter to the Solr query, I added an Operator parameter to my GroupResults method (the one that returns grouped search results) to indicate which operator should be used.

public static ExtendedSearchResults<TSource> GroupResults<TSource, TKey>(this IQueryable<TSource> source, IProviderSearchContext context, Expression<Func<TSource, TKey>> keySelector, int groupLimit, bool includSpellChecking, Operator op, string text, IDictionary<BoostField,string> boostFields = null)
        {
            if (source == null)
                throw new ArgumentNullException("source");

            var linqToSolr = new CustomLinqToSolrIndex((SolrSearchContext)context, (IExecutionContext)null);
            
            // Default grouping field; replaced below when the key selector resolves to an indexed field.
            var param = "_template";
            MemberExpression member = keySelector.Body as MemberExpression;
            if (member != null)
            {
                PropertyInfo propInfo = member.Member as PropertyInfo;
                if (propInfo != null)
                {
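                    // Translate the key selector into its Solr field name. ExpressionParser.Visit is not public, so it is invoked through reflection.
                    var expressionParser = new ExpressionParser(typeof(TKey), typeof(TSource), linqToSolr.PublicFieldNameTranslator);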
                    var methodInfo = expressionParser.GetType().GetMethod("Visit", BindingFlags.Instance | BindingFlags.NonPublic, Type.DefaultBinder, new Type[] { typeof(Expression) }, null);
                    if (methodInfo != null)
                    {
                        QueryNode queryNode = methodInfo.Invoke(expressionParser, new object[] { keySelector.Body }) as QueryNode;
                        FieldNode fieldNode = queryNode as FieldNode;
                        if (fieldNode != null)
                        {
                            param = fieldNode.FieldKey;
                        }
                    }                                    
                }
            }
            var extendedQuery = ExtendNativeQuery((IHasNativeQuery)source, groupLimit, param, includSpellChecking, op, text, boostFields);
            return linqToSolr.Execute<ExtendedSearchResults<TSource>>(extendedQuery);            
        }

This method calls the ExtendNativeQuery method, passing along the Operator parameter. ExtendNativeQuery is where the query parameters are assembled:

private static ExtendedCompositeQuery ExtendNativeQuery(IHasNativeQuery hasNativeQuery, int groupLimit, string expression, bool includSpellChecking, Operator op, string text, IDictionary<BoostField,string> boostFields = null)
        {            
            var query = (SolrCompositeQuery)hasNativeQuery.Query;
            query.Methods.Add((QueryMethod)new GetResultsMethod(GetResultsOptions.Default));

            var options = new QueryOptions()
                {
                    Grouping = new GroupingParameters()
                    {
                        Fields = new[] { expression },
                        Format = GroupingFormat.Grouped,
                        Limit = groupLimit,
                    },
                    Highlight = GetHighlightParameter()
                };

            var localParams = new LocalParams();
            
            if (op == Operator.AND)
            {
                localParams.Add("q.op", "AND");
            }
            return new ExtendedCompositeQuery(query.Query, query.Filter, query.Methods, query.VirtualFieldProcessors, query.FacetQueries, options, localParams);
        }

Here I create a LocalParams object, add the q.op parameter to it, and call the ExtendedCompositeQuery constructor, passing in the newly created localParams object. The ExtendedCompositeQuery class inherits from SolrCompositeQuery and is very simple.

public class ExtendedCompositeQuery : SolrCompositeQuery
    {
        public QueryOptions QueryOptions { get; set; }
        public LocalParams LocalParams { get; set; }
        public ExtendedCompositeQuery(AbstractSolrQuery query, AbstractSolrQuery filterQuery, IEnumerable<Sitecore.ContentSearch.Linq.Methods.QueryMethod> methods, IEnumerable<IFieldQueryTranslator> virtualFieldProcessors, IEnumerable<FacetQuery> facetQueries, QueryOptions options, LocalParams localParams = null)
            : base(query, filterQuery, methods, virtualFieldProcessors, facetQueries)
        {
            QueryOptions = options;
            LocalParams = localParams;
        } 
    }

Compared with SolrCompositeQuery, ExtendedCompositeQuery only adds properties: QueryOptions and, new for this functionality, LocalParams.

After the query object is assembled, the Execute method is called on the CustomLinqToSolrIndex object, a class that extends LinqToSolrIndex from the original Sitecore code. In the extension class I added the following code block to the internal Execute method:

if (compositeQuery.LocalParams != null)
{
   SearchLog.Log.Info("Serialized Query - ?q=" + compositeQuery.LocalParams.ToString() + q + "&" + string.Join("&", Enumerable.ToArray(Enumerable.Select<KeyValuePair<string, string>, string>(loggingSerializer.GetAllParameters(options), (Func<KeyValuePair<string, string>, string>)(p => string.Format("{0}={1}", (object)p.Key, (object)p.Value))))), (Exception)null);
   return solrOperations.Query(compositeQuery.LocalParams + q, options);
}

It prepends the local parameters to the query when the LocalParams property of the composite query is set. This is the point where the string representation of the query is generated and passed to Solr.
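
To make the prepending step concrete, here is a minimal stand-alone sketch of how a local-params prefix such as {!q.op=AND} can be rendered and placed in front of the query text. It does not use SolrNet's LocalParams class; the LocalParamsSketch class, the BuildPrefix helper and the title_t field are made up for illustration, and the quoting rule is simplified to values that contain whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LocalParamsSketch
{
    // Renders Solr local params, e.g. {!q.op=AND}, or an empty string when there are none.
    public static string BuildPrefix(IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var parts = parameters.Select(p => p.Key + "=" + Quote(p.Value)).ToArray();
        return parts.Length == 0 ? string.Empty : "{!" + string.Join(" ", parts) + "}";
    }

    // Values containing whitespace are quoted so that Solr reads them as a single value.
    private static string Quote(string value)
    {
        if (!value.Any(char.IsWhiteSpace)) return value;
        return "'" + value.Replace("\\", "\\\\").Replace("'", "\\'") + "'";
    }

    public static void Main()
    {
        var localParams = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("q.op", "AND")
        };

        // Hypothetical query text; the real one comes from the LINQ provider.
        var query = "(title_t:(private equity))";

        // Prints ?q={!q.op=AND}(title_t:(private equity))
        Console.WriteLine("?q=" + BuildPrefix(localParams) + query);
    }
}
```

With an empty parameter list the prefix is an empty string, so the query is sent unchanged, which matches the null check in the Execute block above.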

Boosting the most recent results

The second requirement was to boost the most recent results to the top of the result set. For Solr to return the most recent results first, the query should look something like this:

?q={!boost b=recip(ms(NOW,publication_date_tdt),3.16e-11,1,1) q.op=AND}((_template:(d84da6272e284cdb87869691dea4e692) OR _template:(e6ec2506eb4e4998aaa922b41372a8b7) OR _template:(d07c1b814d2546529c4738a0cc7a48dc)) AND people_sm:(296ee3f7692b48eba1f3f41387eee6c3))&rows=4500&fl=*,score&fq=((iscontent_b:(True) AND show_in_search_results_b:(True)) AND _latestversion:(True))&fq=_indexname:(sitecore_master_index)&q.op=AND&sort=start_date_tdt desc,publication_date_tdt desc

To add this parameter to the query, I added one more parameter, boostFields, to the ExtendNativeQuery method, which is responsible for putting together the composite query object. In the body of the method I check whether the parameter is null, and if it is not, I add the "boost b" parameter to the localParams object.

if (boostFields != null)
{
    foreach (var field in boostFields.Keys)
    {
        switch (field)
        {
            case BoostField.DateLatestFirst:
                localParams.Add("boost b", string.Format("recip(ms(NOW,{0}),3.16e-11,1,1)", boostFields[field]));
                break;
            default:
                break;
        }
    }
}

The rest gets taken care of by the CustomLinqToSolrIndex class that you saw in the search operator implementation.
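
To get a feel for what this boost does to the score, it helps to evaluate the function by hand. Solr's recip(x,m,a,b) computes a / (m*x + b), and 3.16e-11 is roughly one divided by the number of milliseconds in a year, so the multiplier starts at 1 for a brand new document and falls to about 0.5 after one year. The sketch below only reproduces that arithmetic outside Solr; the class and method names are made up for illustration.

```csharp
using System;

public static class RecencyBoostSketch
{
    // Solr's recip(x, m, a, b) function evaluates to a / (m * x + b).
    public static double Recip(double x, double m, double a, double b)
    {
        return a / (m * x + b);
    }

    // Multiplier for a document of the given age, mirroring recip(ms(NOW,date),3.16e-11,1,1).
    public static double BoostForAge(TimeSpan age)
    {
        return Recip(age.TotalMilliseconds, 3.16e-11, 1, 1);
    }

    public static void Main()
    {
        foreach (var days in new[] { 0, 30, 365, 3650 })
        {
            Console.WriteLine("{0,5} days old -> boost {1:F3}", days, BoostForAge(TimeSpan.FromDays(days)));
        }
    }
}
```

A document that is a month old keeps roughly 92% of its score, while one that is ten years old keeps roughly 9%. Changing the m constant changes how quickly older content fades.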

Proximity Search

You have probably seen an example of the proximity search LINQ syntax in the Sitecore Search documentation. For one reason or another, it didn't really work for me. First, it didn't wrap the term I was passing to the Like method in quotes, which Solr needs in order to treat the query as a proximity search, so I had to add the quotes myself. In addition, when I passed an integer value of 1000, it kept being converted to ~0.5, which was wrong. Only when I passed the value as a float was ~1000 appended, which is what I wanted. I think it should be the other way around, and the reflected code does seem to check for int, but for whatever reason it worked in reverse.

In the end the code looked like this:

float slop = 1000F;
query = query.Where(p =>
    p.FirstName.MatchWildcard(correctSpelling) ||
    p.LastName.MatchWildcard(correctSpelling) ||
    p.OfficeName.MatchWildcard(correctSpelling) ||
    p.Title.Like("\"" + correctSpelling + "\"", slop) ||
    p.ReferenceItem.Like("\"" + correctSpelling + "\"", slop) ||
    p.SiteContent.Like("\"" + correctSpelling + "\"", slop) ||
    p.PageContent.Like("\"" + correctSpelling + "\"", slop));
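
For reference, the clause that these Like calls need to end up producing is the Lucene phrase-proximity syntax: the phrase in double quotes followed by a tilde and the slop, for example "private equity"~1000. Below is a minimal sketch of a helper that builds such a clause by hand. The ToProximityClause name is hypothetical and it is not part of the Sitecore API; it only shows the target string and the escaping the quoted phrase needs.

```csharp
using System;

public static class ProximitySketch
{
    // Builds a Lucene/Solr proximity clause such as "private equity"~1000.
    public static string ToProximityClause(string phrase, int slop)
    {
        if (phrase == null) throw new ArgumentNullException("phrase");
        if (slop < 0) throw new ArgumentOutOfRangeException("slop");

        // Escape backslashes and double quotes so the phrase cannot close the quoted section early.
        var escaped = phrase.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return "\"" + escaped + "\"~" + slop;
    }

    public static void Main()
    {
        // Prints "private equity"~1000
        Console.WriteLine(ToProximityClause("private equity", 1000));
    }
}
```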

Highlighting keywords in the search results

Implementing highlighting was the most involved. Not only did I have to pass the query with all the highlighting parameters, but I also had to extract the highlights section from the Solr response.

Once again, in the same ExtendNativeQuery method where I added the other parameters to the composite query, I set the Highlight property of QueryOptions.



var options = new QueryOptions()
{
   Grouping = new GroupingParameters()
   {
       Fields = new[] { expression },
       Format = GroupingFormat.Grouped,
       Limit = groupLimit,
   },
   Highlight = GetHighlightParameter()
};

The method responsible for generating the Highlight parameter:


private static HighlightingParameters GetHighlightParameter()
{
    var snippetCount = WeilConfig.SiteProperties.PropertyValue<int>("SolrHighlightsNumberOfSnippets");
    var fields = WeilConfig.SiteProperties.PropertyValues<string>("SolrHighlightsFields");
    return new HighlightingParameters
        {
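            Fields = fields != null ? fields.ToArray() : new[] { "pagecontent_t" },
            // Apply the configured snippet count; when it is not set (assumed to read as 0), keep Solr's default.
            Snippets = snippetCount > 0 ? snippetCount : (int?)null,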
            Fragmenter = SolrHighlightFragmenter.Regex,
            RegexPattern = @"\w[^|;.!?]{50,400}[|;.!?]",
            Fragsize = 300,
            RegexSlop = 0.2
        };
}
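
To see what kind of text the RegexPattern above can match, the sketch below runs the same pattern over a sample string with .NET's Regex class. Solr evaluates the pattern with Java's regex engine, but this particular pattern only uses constructs that both engines support. The sample text and the FirstFragment helper are made up for illustration, and Solr's fragmenter also takes Fragsize and RegexSlop into account, which this sketch does not.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FragmentPatternSketch
{
    // Same pattern as RegexPattern above: a word character, then 50 to 400 characters
    // that are not | ; . ! ?, then one of those terminators.
    public const string Pattern = @"\w[^|;.!?]{50,400}[|;.!?]";

    // Returns the first stretch of text matching the pattern, or null when nothing matches.
    public static string FirstFragment(string text)
    {
        var match = Regex.Match(text, Pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        var longSentence = "Private equity funds raise capital from investors and use it to acquire companies over several years.";

        // "Too short." is skipped because it ends before 50 characters; the long sentence is printed.
        Console.WriteLine(FirstFragment("Too short. " + longSentence + " Done."));
    }
}
```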

The GroupResults method returns ExtendedSearchResults, an extension of the Sitecore SearchResults object that was created for the GroupBy implementation, so I could add processing of the Highlights section to the same class.

The actual processing of the Highlights section of the Solr response is done in the GetExtendedResults method of the CustomLinqToSolrIndex class:

internal TResult GetExtendedResults<TResult, TDocument>(ExtendedCompositeQuery compositeQuery, SolrSearchResults<TDocument> processedResults, SolrQueryResults<Dictionary<string, object>> results)
        {
            object obj = default(TResult);
            IEnumerable<Linq.GroupedResults<TDocument>> groups = processedResults.GetGroupedResults();
            FacetResults facetResults = this.FormatFacetResults(processedResults.GetFacets(), compositeQuery.FacetQueries);
            IEnumerable<SearchHit<TDocument>> searchResults = processedResults.GetSearchHits();
            var spellcheckedResponse = processedResults.GetSpellCheckedResults();
            obj = Activator.CreateInstance(typeof(TResult), (object)searchResults, (object)groups, (object)processedResults.NumberFound, spellcheckedResponse, processedResults.Highlights, (object)facetResults);
            return (TResult)Convert.ChangeType(obj, typeof(TResult));
        }

The SolrSearchResults struct is responsible for processing the Solr response and holds the processed values in its members. I added a new highlights member to it to hold the Highlights portion of the response.


public struct SolrSearchResults<TElement>
    {
        private readonly SolrSearchContext context;
        private readonly SolrQueryResults<Dictionary<string, object>> searchResults;
        private readonly IDictionary<string, SolrNet.GroupedResults<Dictionary<string, object>>> groupedSearchResults;
        private readonly SolrIndexConfiguration solrIndexConfiguration;
        private readonly IIndexDocumentPropertyMapper<Dictionary<string, object>> mapper;
        private readonly SelectMethod selectMethod;
        private readonly IEnumerable<IFieldQueryTranslator> virtualFieldProcessors;
        private readonly int numberFound;
        private readonly string spellCheckerResults;
        private readonly IDictionary<string, HighlightedSnippets> highlights;
        private readonly IEnumerable<IExecutionContext> executionContexts;
    
        public int NumberFound
        {
            get
            {
                return this.numberFound;
            }
        }
        
        public SolrSearchResults(SolrSearchContext context, SolrQueryResults<Dictionary<string, object>> searchResults, SelectMethod selectMethod, IEnumerable<IExecutionContext> executionContexts, IEnumerable<IFieldQueryTranslator> virtualFieldProcessors)
        {
            this.context = context;
            this.solrIndexConfiguration = (SolrIndexConfiguration)this.context.Index.Configuration;
            this.executionContexts = executionContexts;

            OverrideExecutionContext<IIndexDocumentPropertyMapper<Dictionary<string, object>>> executionContext = this.executionContexts != null ? Enumerable.FirstOrDefault<IExecutionContext>(this.executionContexts, (Func<IExecutionContext, bool>)(c => c is OverrideExecutionContext<IIndexDocumentPropertyMapper<Dictionary<string, object>>>)) as OverrideExecutionContext<IIndexDocumentPropertyMapper<Dictionary<string, object>>> : (OverrideExecutionContext<IIndexDocumentPropertyMapper<Dictionary<string, object>>>)null;
            this.mapper = (executionContext != null ? executionContext.OverrideObject : (IIndexDocumentPropertyMapper<Dictionary<string, object>>)null) ?? this.solrIndexConfiguration.IndexDocumentPropertyMapper;
            
            this.selectMethod = selectMethod;
            this.virtualFieldProcessors = virtualFieldProcessors;
            this.numberFound = searchResults.NumFound;
            this.searchResults = SolrSearchResults<TElement>.ApplySecurity(searchResults, context.SecurityOptions, context.Index.Locator.GetInstance<ICorePipeline>(), context.Index.Locator.GetInstance<IAccessRight>(), ref this.numberFound);
            this.groupedSearchResults = SolrSearchResults<TElement>.ApplyGroupSecurity(this.searchResults.Grouping, context.SecurityOptions, context.Index.Locator.GetInstance<ICorePipeline>(), context.Index.Locator.GetInstance<IAccessRight>(), ref this.numberFound);
            this.spellCheckerResults = SolrSearchResults<TElement>.GetSpellCheckedString(searchResults.SpellChecking);
            this.highlights = searchResults.Highlights;
        }
...


The next step is generating the ExtendedSearchResults object to be returned by the GroupResults method. If the highlights parameter is passed to the constructor, the created object will have its Highlights property populated.


    public class ExtendedSearchResults<TSource>
    {
        public string CorrectedSpelling { get; set; }
        public IDictionary<string, IList<TSource>> SimilarResults { get; set; }
        public int TotalSearchResults { get; private set; }
        public IEnumerable<SearchHit<TSource>> Hits { get; private set; }
        public IEnumerable<Linq.GroupedResults<TSource>> Groups { get; private set; }
        public FacetResults Facets { get; private set; }
        public IDictionary<string, HighlightedSnippets> Highlights { get; private set; }

        public ExtendedSearchResults(IEnumerable<SearchHit<TSource>> results, int totalSearchResults)
        {
            if (results == null)
                throw new ArgumentNullException("results");
            this.Hits = results;
            this.TotalSearchResults = totalSearchResults;
        }

        public ExtendedSearchResults(IEnumerable<SearchHit<TSource>> results, int totalSearchResults, FacetResults facets = null)
            : this(results, totalSearchResults)
        {
            this.Facets = facets;
        }

        public ExtendedSearchResults(IEnumerable<Linq.GroupedResults<TSource>> results, int totalSearchResults, FacetResults facets = null)
            : this(results, totalSearchResults)
        {
            this.Facets = facets;
        }

        public ExtendedSearchResults(IEnumerable<SearchHit<TSource>> results, IEnumerable<Linq.GroupedResults<TSource>> groups, int totalSearchResults, string spellcheckedString, FacetResults facets = null)
            : this(results, totalSearchResults)
        {
            this.Facets = facets;
            this.Groups = groups;
            this.CorrectedSpelling = spellcheckedString;
        }

        public ExtendedSearchResults(IEnumerable<SearchHit<TSource>> results, IEnumerable<Linq.GroupedResults<TSource>> groups, int totalSearchResults, string spellcheckedString, IDictionary<string, HighlightedSnippets> highlights, FacetResults facets = null)
            : this(results, totalSearchResults)
        {
            this.Facets = facets;
            this.Groups = groups;
            this.CorrectedSpelling = spellcheckedString;
            this.Highlights = highlights;
        }

        public ExtendedSearchResults(IEnumerable<Linq.GroupedResults<TSource>> results, int totalSearchResults)
        {
            if (results == null)
                throw new ArgumentNullException("results");
            this.Groups = results;
            this.TotalSearchResults = totalSearchResults;
        }

    }

In the end you get an ExtendedSearchResults object that includes a Highlights property, which you can use to highlight keywords in the search results on the front end.


You can choose to display multiple highlighted snippets per result and to use the regex fragmenter. How you construct the highlighting query parameters is entirely up to you.
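
As a rough illustration of the front-end side, here is a minimal sketch of picking a snippet to display for one search hit. Solr's highlighting response is keyed by document id, then by field name, with a list of snippets for each field, and by default it wraps the matched terms in <em> tags. The sketch uses plain dictionaries as a stand-in for the Highlights property; the SnippetOrFallback helper, the document id and the sample text are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HighlightSketch
{
    // Returns the first highlighted snippet for a document and field, or the fallback text.
    public static string SnippetOrFallback(
        IDictionary<string, IDictionary<string, ICollection<string>>> highlights,
        string documentId, string field, string fallback)
    {
        IDictionary<string, ICollection<string>> byField;
        ICollection<string> snippets;
        if (highlights != null
            && highlights.TryGetValue(documentId, out byField)
            && byField.TryGetValue(field, out snippets)
            && snippets.Count > 0)
        {
            return snippets.First();
        }
        return fallback;
    }

    public static void Main()
    {
        // Stand-in data shaped like the highlighting section of a Solr response.
        var highlights = new Dictionary<string, IDictionary<string, ICollection<string>>>
        {
            {
                "doc-1",
                new Dictionary<string, ICollection<string>>
                {
                    { "pagecontent_t", new List<string> { "Our <em>private</em> <em>equity</em> team advises funds." } }
                }
            }
        };

        Console.WriteLine(SnippetOrFallback(highlights, "doc-1", "pagecontent_t", "No preview available."));
        Console.WriteLine(SnippetOrFallback(highlights, "doc-2", "pagecontent_t", "No preview available."));
    }
}
```

Falling back to some other text matters because Solr only returns snippets for documents where the highlighted fields actually contain a match.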