Recently working on a recommendation engine, I stumbled across an interesting problem. I wanted to put some automated checks in place so that if the newly generated list of recommendations significantly differ from the previous one then the system raises some kind of flags (such as an email alert). The challenge was however how to quantify similarity or differences between the new and the old recommendation list. The output of the recommendation engine is a sorted list of items. Thus, the problem was to quantify similarity between two ranked lists.

As it turns out, the problem of comparing two ranked lists is already known and quite popular. A very interesting paper on the topic is “A Similarity Measure for Indefinite Rankings“. Not only the literature review is very informative, but even the suggested approach seems to be comprehensive. Below is a brief summary of the paper. Also, I have put together an implementation of rank biased overlap (RBO) that is mentioned in the above linked paper.

There are broadly two different approaches to compare ranked lists: (1) Rank Correlation and (2) Set Based Measure.

**Ranked Correlation:**

Rank correlation based approaches such as Kendall Tau essentially measure the probability of two items being in the same order in the two ranked lists. For instance if item A appears before the item B in list 1, then what’s the probability that the item A also appears before the item B in list 2. Mathematically, Kendall Tau is computed as

where:

- N = Total number of pairs = , where n is the number of items in a list.
- C = Number of concordant pairs i.e number of pairs for which relative ordering (as explained above) is preserved in the two lists. Thus, probability of concordant pairs .
- D = Number of discordant pairs i.e. number of pairs for which relative ordering is reversed. Thus, probability of discordant pairs .

There are however certain challenges with all the rank correlation based measures. Often referred as **dis-jointness problem**, one of the challenges is what happens if an item is present only in one of the list. In this case there is no way to calculate number of concordant or discordant pairs unless you make certain assumptions about the missing item. There are number of extensions of Kendall Tau that make certain assumptions to deal with the above dis-jointness problem. For instance, a common approach is to assume that the missing item is at the bottom of the list.

Another problem, known as **top-weighteness**, with the Kendall Tau is that the rank position of an item has no effect on the final similarity score. For instance, consider three lists:

A: [a, b, c, d, e]

B: [b, a, c, d, e]

C: [a, b, c, e, d]

As compared to list A, both list B and C switch one item pair (items a and b are switched in B and items e and d are switched in list C), but at different rank positions. However, both have get the same Kendall Tau score of 0.8. However, in some cases (such as search engine), the top ranking items are more important as compared to lower ranking items. Hence, it is expected to give lower score to a list that switches item position at a higher rank position (List B) as compared to a list that switches items at a lower ranked position (List C).

**Set Based Measure:**

The general idea of set intersection is well known. By itself, set intersection has no concept of rank. It just considers two bag of items and returns the list of items that are common in the two bags. However, researchers and mathematicians have been able to able to use the idea of set intersection in some innovative ways to quantify similarity between two ranked list. In general, the idea is to determine the fraction of content overlapping at different depths. For instance, we compare List A and B in the above example, the length of set intersection at various depth is shown below

Depth | Items in List A @ k | Items in List B @ k | Set intersection | Fraction |

1 | a | b | {} | 0/2 = 0 |

2 | a,b | b,a | {a,b} | 2/2 = 1 |

3 | a,b,c | b,a,c | {a,b,c} | 3/3 = 1 |

4 | a,b,c,d | b,a,c,d | {a,b,c,d} | 4/4 = 1 |

5 | a,b,c,d,e | b,a,c,e | {a,b,c,d,e} | 5/5 = 1 |

Once we have fraction of content overlapping at various depth, one can either plot the distribution to study how similar two lists are or, as in the case of Average overlap score, return the average of the last column. Since observing a common item at higher rank contributes to all the lower ranked intersections, this kind of approach is naturally top-weighted i.e. it gives more importance to items at the higher rank. For instance, if we compare List A and B in the above example, the average overlap score at depth 5 is 4/5=0.8. However, if comparing List A and C, the average overlap score is .

Rank Biased Overlap (RBO) extends the above idea of average overlap in two ways. First, the above approach is not bounded. That is the average overlap value can range anywhere from zero to infinity and therefore is not useful. In order to address this issue, RBO uses geometric series. One nice property of geometric series that is used in RBO is that it is a type of convergent series. That is, it can be easily shown that the indefinite sum of geometric series is finite and is given as . Thus, using geometric series, RBO is able to bound the unbounded average overlap score for indefinite rank position. However, another advantage of using geometric series in RBO. Since the values in geometric series decreases with the increasing depth, it allows RBO to explicitly model user’s behavior i.e. the likelihood of going from a given rank position i to i+1. This is the crux of RBO but if you read the above paper then you will many more things such as how to extend the idea to lists that are of different lengths, etc.

Quick Links

1. RBO Implementation

How can the average overlap score ever be above 1?

Also you have 0.92 in your overlap score but your github code gives 0.95. I figured out its because you count the overlap of [a,b,c,d] and [a,b,c,e] as 3/4 vs. 3/5 in the article.

Thanks for capturing that. You are right. Average overlap should be 0.95. I have fixed it.

Looks like you might have a typo; depth 5 of your table for items in list B should read [b,a,c,d,e] not [b,a,c,e].

Thanks for this!

I believe he meant to write that the

weightsare unbounded and that RBO’sweightsare bounded because they follow a convergent, geometric series.Dear Ritesch,

thank you for this post. I am very interested in ranking problems and the RBO approach looks very good. I was in all sorts of bother before I read your post and had not got further than Kendall’s Tau (that great man). I have a copy of his ‘Multivariate Analysis’ and he writes simply, extremely clearly and best of all compactly!

If you have any other references on this I would really appreciate anything you can pass on. I am mathematically horribly naive so I have difficulty finding relevant literature because I don’t know the terminology.

In particular, I am interested in working on ranking subjective estimates (which I guess encompasses recommendations) because it seems to me that many intuitive estimations can be very good and I can not see any other approach than ranking to quantify them.

Another problem seems to be the time evolution of rankings which I am sure must be worth a good look at.

Anyway, thank you again and if I get anything working I will pop it up on git hub along with any other documents and sent you the address.

regards,

Jeremy

Very useful article. Thanks a lot.

Nice post. Here’s a paper that I came across which connects information theory and data compression to the problem. Pretty easy to read: http://arxiv.org/abs/1310.0110

Nice article, but I do not find any explanation on how to use the implemented RBO of the link you provided here

You mean how to use the RBO function or in theory how to use RBO ? For python example, below is a very simple example:

from measures.rankedlist import RBO

x = ['a', 'b', 'c', 'd']

y = ['b', 'a', 'c', 'd']

rbo1 = RBO.score(x, y)

Excellent write up! What recommendations would you have for applying RBO to lists of different length? Current implementation will treat [‘a’,’b’,’c’,’d’] vs. [‘a’] as a RBO of 1.0, but realistically it should be less than 1.0.

I don’t think so. RBO can handle lists of different length.

Thank you for the great article!! Can you give me the reference to the “Average overlap score”. It is easy to understand and apply, but I need to cite the scientific article. (I could not find anything called Average overlap score)