[[boosting-by-popularity]] === Boosting by Popularity

Imagine that we have a website that hosts blog posts and enables users to vote for the blog posts that they like.((("relevance", "controlling", "boosting by popularity")))((("popularity", "boosting by")))((("boosting", "by popularity"))) We would like more-popular posts to appear higher in the results list, but still have the full-text score as the main relevance driver. We can do this easily by storing the number of votes with each blog post:

[role="pagebreak-before"]

[source,json]

PUT /blogposts/post/1 { "title": "About popularity", "content": "In this post we will talk about...", "votes": 6

}

At search time, we can use the function_score query ((("function_score query", "field_value_factor function")))((("field_value_factor function")))with the field_value_factor function to combine the number of votes with the full-text relevance score:

[source,json]

GET /blogposts/post/_search { "query": { "function_score": { <1> "query": { <2> "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { <3> "field": "votes" <4> } } }

}

<1> The function_score query wraps the main query and the function we would like to apply.

<2> The main query is executed first.

<3> The field_value_factor function is applied to every document matching the main query.

<4> Every document must have a number in the votes field for the function_score to work.

In the preceding example, the final _score for each document has been altered as follows:

new_score = old_score * number_of_votes

This will not give us great results. The full-text _score range usually falls somewhere between 0 and 10. As can be seen in <>, a blog post with 10 votes will completely swamp the effect of the full-text score, and a blog post with 0 votes will reset the score to zero.

[[img-popularity-linear]] .Linear popularity based on an original _score of 2.0 image::images/elas_1701.png[Linear popularity based on an original _score of 2.0]

==== modifier

A better way to incorporate popularity is to smooth out the votes value with some modifier. ((("modifier parameter")))((("field_value_factor function", "modifier parameter")))In other words, we want the first few votes to count a lot, but for each subsequent vote to count less. The difference between 0 votes and 1 vote should be much bigger than the difference between 10 votes and 11 votes.

A typical modifier for this use case is log1p, which changes the formula to the following:

new_score = old_score * log(1 + number_of_votes)

The log function smooths out the effect of the votes field to provide a curve like the one in <>.

[[img-popularity-log]] .Logarithmic popularity based on an original _score of 2.0 image::images/elas_1702.png[Logarithmic popularity based on an original _score of 2.0]

The request with the modifier parameter looks like the following:

[source,json]

GET /blogposts/post/_search { "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p" <1> } } }

}

<1> Set the modifier to log1p.

[role="pagebreak-before"] The available modifiers are none (the default), log, log1p, log2p, ln, ln1p, ln2p, square, sqrt, and reciprocal. You can read more about them in the http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#_field_value_factor[`field_value_factor` documentation].

==== factor

The strength of the popularity effect can be increased or decreased by multiplying the value((("factor (function_score)")))((("field_value_factor function", "factor parameter"))) in the votes field by some number, called the factor:

[source,json]

GET /blogposts/post/_search { "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p", "factor": 2 <1> } } }

}

<1> Doubles the popularity effect

Adding in a factor changes the formula to this:

new_score = old_score * log(1 + factor * number_of_votes)

A factor greater than 1 increases the effect, and a factor less than 1 decreases the effect, as shown in <>.

[[img-popularity-factor]] .Logarithmic popularity with different factors image::images/elas_1703.png[Logarithmic popularity with different factors]

==== boost_mode

Perhaps multiplying the full-text score by the result of the field_value_factor function ((("function_score query", "boost_mode parameter")))((("boost_mode parameter")))still has too large an effect. We can control how the result of a function is combined with the _score from the query by using the boost_mode parameter, which accepts the following values:

multiply:: Multiply the _score with the function result (default)

sum:: Add the function result to the _score

min:: The lower of the _score and the function result

max:: The higher of the _score and the function result

replace:: Replace the _score with the function result

If, instead of multiplying, we add the function result to the _score, we can achieve a much smaller effect, especially if we use a low factor:

[source,json]

GET /blogposts/post/_search { "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p", "factor": 0.1 }, "boost_mode": "sum" <1> } }

}

<1> Add the function result to the _score.

The formula for the preceding request now looks like this (see <>):

new_score = old_score + log(1 + 0.1 * number_of_votes)

[[img-popularity-sum]] .Combining popularity with sum image::images/elas_1704.png["Combining popularity with sum"]

==== max_boost

Finally, we can cap the maximum effect((("function_score query", "max_boost parameter")))((("max_boost parameter"))) that the function can have by using the max_boost parameter:

[source,json]

GET /blogposts/post/_search { "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p", "factor": 0.1 }, "boost_mode": "sum", "max_boost": 1.5 <1> } }

}

<1> Whatever the result of the field_value_factor function, it will never be greater than 1.5.

NOTE: The max_boost applies a limit to the result of the function only, not to the final _score.

powered by Gitbook该页面构建时间: 2017-08-11 12:51:16

results matching ""

    No results matching ""