[[boosting-by-popularity]] === Boosting by Popularity
Imagine that we have a website that hosts blog posts and enables users to vote for the blog posts that they like.((("relevance", "controlling", "boosting by popularity")))((("popularity", "boosting by")))((("boosting", "by popularity"))) We would like more-popular posts to appear higher in the results list, but still have the full-text score as the main relevance driver. We can do this easily by storing the number of votes with each blog post:
[role="pagebreak-before"]
[source,json]
PUT /blogposts/post/1 { "title": "About popularity", "content": "In this post we will talk about...", "votes": 6
}
At search time, we can use the function_score
query ((("function_score query", "field_value_factor function")))((("field_value_factor function")))with the
field_value_factor
function to combine the number of votes with the full-text relevance score:
[source,json]
GET /blogposts/post/_search { "query": { "function_score": { <1> "query": { <2> "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { <3> "field": "votes" <4> } } }4>3>2>1>
}
<1> The function_score
query wraps the main query and the function we would
like to apply.1>
<2> The main query is executed first.2>
<3> The field_value_factor
function is applied to every document matching
the main query
.3>
<4> Every document must have a number in the votes
field for
the function_score
to work.4>
In the preceding example, the final _score
for each document has been altered as
follows:
new_score = old_score * number_of_votes
This will not give us great results. The full-text _score
range
usually falls somewhere between 0 and 10. As can be seen in <
[[img-popularity-linear]]
.Linear popularity based on an original _score
of 2.0
image::images/elas_1701.png[Linear popularity based on an original _score
of 2.0
]
==== modifier
A better way to incorporate popularity is to smooth out the votes
value
with some modifier
. ((("modifier parameter")))((("field_value_factor function", "modifier parameter")))In other words, we want the first few votes to count a
lot, but for each subsequent vote to count less. The difference between 0
votes and 1 vote should be much bigger than the difference between 10 votes
and 11 votes.
A typical modifier
for this use case is log1p
, which changes the formula
to the following:
new_score = old_score * log(1 + number_of_votes)
The log
function smooths out the effect of the votes
field to provide a
curve like the one in <
[[img-popularity-log]]
.Logarithmic popularity based on an original _score
of 2.0
image::images/elas_1702.png[Logarithmic popularity based on an original _score
of 2.0
]
The request with the modifier
parameter looks like the following:
[source,json]
GET /blogposts/post/_search { "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p" <1> } } }1>
}
<1> Set the modifier
to log1p
.1>
[role="pagebreak-before"]
The available modifiers are none
(the default), log
, log1p
, log2p
,
ln
, ln1p
, ln2p
, square
, sqrt
, and reciprocal
. You can read more
about them in the
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#_field_value_factor[`field_value_factor` documentation].
==== factor
The strength of the popularity effect can be increased or decreased by
multiplying the value((("factor (function_score)")))((("field_value_factor function", "factor parameter"))) in the votes
field by some number, called the
factor
:
[source,json]
GET /blogposts/post/_search { "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p", "factor": 2 <1> } } }1>
}
<1> Doubles the popularity effect1>
Adding in a factor
changes the formula to this:
new_score = old_score * log(1 + factor * number_of_votes)
A factor
greater than 1
increases the effect, and a factor
less than 1
decreases the effect, as shown in <
[[img-popularity-factor]] .Logarithmic popularity with different factors image::images/elas_1703.png[Logarithmic popularity with different factors]
==== boost_mode
Perhaps multiplying the full-text score by the result of the
field_value_factor
function ((("function_score query", "boost_mode parameter")))((("boost_mode parameter")))still has too large an effect. We can control
how the result of a function is combined with the _score
from the query by
using the boost_mode
parameter, which accepts the following values:
multiply
::
Multiply the _score
with the function result (default)
sum
::
Add the function result to the _score
min
::
The lower of the _score
and the function result
max
::
The higher of the _score
and the function result
replace
::
Replace the _score
with the function result
If, instead of multiplying, we add the function result to the _score
, we can
achieve a much smaller effect, especially if we use a low factor
:
[source,json]
GET /blogposts/post/_search { "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p", "factor": 0.1 }, "boost_mode": "sum" <1> } }1>
}
<1> Add the function result to the _score
.1>
The formula for the preceding request now looks like this (see <
new_score = old_score + log(1 + 0.1 * number_of_votes)
[[img-popularity-sum]]
.Combining popularity with sum
image::images/elas_1704.png["Combining popularity with sum
"]
==== max_boost
Finally, we can cap the maximum effect((("function_score query", "max_boost parameter")))((("max_boost parameter"))) that the function can have by using the
max_boost
parameter:
[source,json]
GET /blogposts/post/_search { "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p", "factor": 0.1 }, "boost_mode": "sum", "max_boost": 1.5 <1> } }1>
}
<1> Whatever the result of the field_value_factor
function, it will never be
greater than 1.5
.1>
NOTE: The max_boost
applies a limit to the result of the function only, not
to the final _score
.