Solr Query Re-ranking: Navigating Search Complexity

Overview

Most organizations have complex relevance requirements, and evaluating them at query time can cause performance problems. Search engines commonly rely on signals such as product popularity, product rating, recency, and click-through rate to influence the result set for a user's search query. Solr provides a feature called “Query Re-ranking”, which lets you run a simple, inexpensive main query first and then re-rank only the top N documents with a second, more complex and more costly query. Because the costly query is evaluated against only those N documents, its impact on search performance is limited. Choose the first query carefully: documents that score very low on the simple query may never reach the re-ranking phase, even if they would score very highly on the complex query.

Re-ranking can therefore be seen as a two-stage scoring mechanism in which the “expensive” scoring calculations are applied in the second stage, where only a limited number of documents are involved.
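The two-stage idea can be illustrated with a small, Solr-independent sketch. The function name `two_stage_rank`, the field names `text_match` and `popularity`, and the sample documents are all assumptions made for illustration; this is not how Solr implements re-ranking internally, only a picture of the trade-off described above.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def two_stage_rank(
    docs: List[Doc],
    cheap_score: Callable[[Doc], float],
    costly_score: Callable[[Doc], float],
    top_n: int,
) -> List[Doc]:
    """Rank every document cheaply, then re-order only the top N."""
    # Stage 1: the cheap function is evaluated for every matching document.
    first_pass = sorted(docs, key=cheap_score, reverse=True)
    # Stage 2: the costly function is evaluated only for the top N documents.
    head = sorted(first_pass[:top_n], key=costly_score, reverse=True)
    # Documents below the cut-off keep their stage-1 order.
    return head + first_pass[top_n:]


# Hypothetical documents: "c" is the most popular but matches the text poorly.
docs = [
    {"id": "a", "text_match": 0.9, "popularity": 0.1},
    {"id": "b", "text_match": 0.8, "popularity": 0.9},
    {"id": "c", "text_match": 0.1, "popularity": 1.0},
]

ranked = two_stage_rank(
    docs,
    cheap_score=lambda d: d["text_match"],
    costly_score=lambda d: d["popularity"],
    top_n=2,
)
ranked_ids = [d["id"] for d in ranked]  # ["b", "a", "c"]
```

Document "c" would win on the costly score, yet it stays last because it never makes the top-N cut. This is why the first query has to be chosen with care.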

Specifying a Ranking Query

A ranking query is specified with the rq request parameter. The rq parameter must contain a query string that, when parsed, produces a RankQuery. Solr currently provides three QParserPlugin implementations for rank queries; if needed, you can also write a custom QParserPlugin and configure Solr to use it.

ReRankQParserPlugin – parser (rerank)
ExportQParserPlugin – parser (xport)
LTRQParserPlugin – parser (ltr)

ReRank Query Parser

The “rerank” parser provided with Solr is easy to use. It wraps a query specified by a local parameter (reRankQuery), along with additional parameters indicating how many documents should be re-ranked (reRankDocs) and how the final scores should be computed (reRankWeight).

reRankQuery – The query string for your complex ranking query. It can also be a parameter reference such as $rqq.

reRankDocs – The minimum number of top N documents from the original query that should be re-ranked.

reRankWeight – A multiplicative factor that will be applied to the score from the reRankQuery for each of the top matching documents.

reRankScale – Scales the rerank scores between min and max values, written as min-max. Accepts only positive integers. Example: reRankScale=0-1

reRankMainScale – Scales the main query scores between min and max values, written as min-max. Accepts only positive integers. Example: reRankMainScale=0-1

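reRankOperator – Controls how the two scores are combined. With the default operator (add), the score from the reRankQuery multiplied by the reRankWeight is added to the original score. The other supported operators are multiply and replace.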
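The arithmetic behind reRankWeight and reRankOperator can be sketched as follows. The function `combine_scores` is a hypothetical helper, not a Solr API, and it only models documents that match the reRankQuery. The add branch follows the description above; the multiply and replace branches reflect my reading of the Solr reference guide and should be treated as assumptions.

```python
def combine_scores(original: float, rerank: float, weight: float,
                   operator: str = "add") -> float:
    """Combine a main-query score with a reRankQuery score.

    Models only documents that match the reRankQuery.
    """
    weighted = rerank * weight
    if operator == "add":
        # Default: weighted rerank score is added to the original score.
        return original + weighted
    if operator == "multiply":
        # Assumption: weighted rerank score multiplies the original score.
        return original * weighted
    if operator == "replace":
        # Assumption: weighted rerank score replaces the original score.
        return weighted
    raise ValueError(f"unsupported operator: {operator}")


# Illustrative scores: original 1.5, rerank 0.5, reRankWeight 3.
added = combine_scores(1.5, 0.5, 3.0)                  # 1.5 + 0.5 * 3 = 3.0
multiplied = combine_scores(1.5, 0.5, 3.0, "multiply")  # 1.5 * 1.5 = 2.25
replaced = combine_scores(1.5, 0.5, 3.0, "replace")     # 0.5 * 3 = 1.5
```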

Example

q=Samsung+5G&rq={!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3}&rqq=mobile

The above example finds all documents that match the query “Samsung 5G” and then re-ranks the top 1000 of them: for each document that also matches “mobile”, the score from that second query is multiplied by 3 and added to the original score.
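For readers who issue the request from code, here is a minimal sketch that assembles the same parameters and URL-encodes them. The host, port, and collection name (`products`) are placeholders I chose for illustration, and `build_rerank_params` is a hypothetical helper; only the q, rq, and rqq parameters come from the example above.

```python
from urllib.parse import urlencode


def build_rerank_params(main_query: str, rerank_query: str,
                        rerank_docs: int, rerank_weight: float) -> dict:
    """Return request parameters for a main query plus a rerank query."""
    rq = (
        "{!rerank reRankQuery=$rqq "
        f"reRankDocs={rerank_docs} reRankWeight={rerank_weight}}}"
    )
    return {
        "q": main_query,      # cheap first-stage query
        "rq": rq,             # rank query; refers to rqq by name
        "rqq": rerank_query,  # costly second-stage query
    }


params = build_rerank_params("Samsung 5G", "mobile", 1000, 3)

# Placeholder endpoint: adjust host and collection to your own setup.
url = "http://localhost:8983/solr/products/select?" + urlencode(params)
```

Keeping the rerank query in its own rqq parameter avoids having to escape it inside the local-params braces.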

ExportQParserPlugin

Solr ships with a default /export request handler that allows a fully sorted result set to be streamed out of Solr using a special rank query parser and response writer. These have been specifically designed to work together to handle scenarios that involve sorting and exporting millions of records. The handler uses rq={!xport} and wt=xsort to sort the results before streaming them out. This feature uses a stream sorting technique that begins to send records within milliseconds and continues to stream results until the entire result set has been sorted and exported.

All the fields being sorted and exported must have docValues set to true. Other supported response writers are json and javabin. The sort property defines how documents will be sorted in the exported result set. Results can be sorted by any field whose type is int, long, float, double, or string, and the sort fields must be single-valued. The fl property defines the fields that will be exported with the result set. Any field type that can be sorted can also be used in the field list.
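A request to the /export handler can be sketched the same way. The helper `build_export_url`, the endpoint, the collection name (`products`), and the field names (`id`, `price`) are all hypothetical. The sketch assumes, based on my understanding of the handler, that both sort and fl must be supplied, so it rejects requests that omit them.

```python
from typing import List
from urllib.parse import urlencode


def build_export_url(base_url: str, query: str, sort: str,
                     fields: List[str]) -> str:
    """Build an /export URL with the sort and fl properties set."""
    if not sort:
        raise ValueError("sort is required for export")
    if not fields:
        raise ValueError("fl needs at least one field")
    params = {
        "q": query,
        "sort": sort,             # e.g. "price asc"; single-valued docValues field
        "fl": ",".join(fields),   # fields to export; docValues must be enabled
    }
    return f"{base_url}/export?{urlencode(params)}"


# Placeholder endpoint and field names, for illustration only.
export_url = build_export_url(
    "http://localhost:8983/solr/products", "*:*", "price asc", ["id", "price"]
)
```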

Learning To Rank

The LTR (Learning To Rank) module lets you configure and run machine-learned ranking models in Solr. Learning To Rank re-ranks the top N retrieved documents using trained machine learning models. The idea is that such sophisticated models can make more nuanced ranking decisions than standard ranking functions like TF-IDF or BM25.

Main components in LTR

Ranking Model

A ranking model computes the scores used to rerank documents. Irrespective of any particular algorithm or implementation, a ranking model’s computation can use three types of inputs:

- parameters that represent the scoring algorithm
- features that represent the document being scored
- features that represent the query for which the document is being scored

Feature

A feature is a value, a number, that represents some quantity or quality of the document being scored or of the query for which documents are being scored. For example, documents often have a ‘recency’ quality, while ‘number of past purchases’ might be a quantity that is passed to Solr as part of the search query.
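To make the relationship between model parameters and features concrete, here is a minimal sketch of a linear ranking model, one of the simplest model types. The function `linear_model_score`, the feature names, and the weights are illustrative assumptions; a real model's parameters would come from training, and Solr's own model classes are not shown here.

```python
from typing import Dict


def linear_model_score(weights: Dict[str, float],
                       features: Dict[str, float]) -> float:
    """Score a document as the weighted sum of its feature values."""
    # The weights are the model's parameters; a missing feature counts as 0.
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


# Hypothetical parameters that a training process might have produced.
weights = {"recency": 0.5, "pastPurchases": 0.25}

# One document feature (recency) and one query-supplied feature (pastPurchases).
features = {"recency": 0.5, "pastPurchases": 4.0}

score = linear_model_score(weights, features)  # 0.5 * 0.5 + 0.25 * 4.0 = 1.25
```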

Example

q=Samsung+5G&rq={!ltr model=myModel reRankDocs=100}&fl=id,score

The above example finds all documents that match the query “Samsung 5G” and re-ranks the top 100 of them using the model named in the rq parameter (myModel).
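The rq value for an LTR request can also be assembled programmatically. The helper `build_ltr_rq` below is a hypothetical sketch. The optional `efi.` entries are my assumption about how query-side feature values such as ‘number of past purchases’ are passed to the model; they are not part of the example above. The feature name `pastPurchases` is made up, and values containing spaces would need quoting, which this sketch does not handle.

```python
from typing import Mapping, Optional


def build_ltr_rq(model: str, rerank_docs: int,
                 efi: Optional[Mapping[str, object]] = None) -> str:
    """Build the local-params string for an LTR rank query."""
    parts = [f"model={model}", f"reRankDocs={rerank_docs}"]
    # Assumption: external feature values are passed as efi.<name>=<value>.
    for name, value in (efi or {}).items():
        parts.append(f"efi.{name}={value}")
    return "{!ltr " + " ".join(parts) + "}"


# Same request as the example above.
ltr_params = {
    "q": "Samsung 5G",
    "rq": build_ltr_rq("myModel", 100),
    "fl": "id,score",
}

# Variant that also passes a hypothetical query-side feature value.
rq_with_efi = build_ltr_rq("myModel", 100, {"pastPurchases": 4})
```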

Conclusion

Re-ranking is a good way to apply expensive scoring dynamically when scoring every matching document with a complex query would cause performance issues. It is not a general-purpose solution for adjusting scores, but it is very helpful in many situations.

