Abstract
Learning preference models from human-generated data is an important task in modern information processing systems. A popular setting consists of simple input ratings, assigned numerical values to indicate their relevancy with respect to a specific query. Since ratings are often specified within a small range, several objects may receive the same rating, thus creating ties among objects for a given query. Dealing with this phenomenon presents the general problem of modelling preferences in the presence of ties in a query-specific manner. To this end, we present in this paper a novel approach that constructs probabilistic models directly on the collection of objects, exploiting the combinatorial structure induced by the ties among them. The proposed probabilistic setting allows exploration of a super-exponential combinatorial state-space with an unknown number of partitions and an unknown order among them. Learning and inference in such a large state-space are challenging, yet we present efficient algorithms to perform these tasks. Our approach exploits discrete choice theory, imposing a generative process in which the finite set of objects is partitioned into subsets in a stagewise procedure, thus reducing the state-space significantly at each stage. Efficient Markov chain Monte Carlo algorithms are then presented for the proposed models. We demonstrate that the models can potentially be trained in a large-scale setting of hundreds of thousands of objects on an ordinary computer. In fact, in some special cases with appropriate model specification, our models can be learned in linear time. We evaluate the models on two application areas: (i) document ranking with data from the Yahoo! challenge and (ii) collaborative filtering with movie data. We demonstrate that the models are competitive against the state of the art.
Notes
This rating-to-rank conversion is not reversible, since we cannot generally infer ratings from a ranking. First, the top rating for each query is always converted into rank 1, even if it is not the maximum score on the rating scale. Second, there are no gaps in a ranking, whereas we may rate the best object with 5 stars but the second best with only 3 stars.
We are aware that clickthrough data can help to obtain a complete ordering, but the data may be noisy.
We caution against confusing ‘rating’ and ‘ranking’ here. Ranking is the process of sorting a set of objects in increasing or decreasing order, whereas in ‘rating’, each object is assigned a value indicating its preference.
Strictly speaking, a partition can be an empty set, but we deliberately leave out this case because empty sets do not contribute to the probability mass of the model, and the case does not match the real-world intuition of an object’s worth.
More precisely, when the number of partitions K is given, the cardinality ranges from 1 to \(N-K+1\) since partitions are non-empty.
The usual understanding would also contain the empty set, but we exclude it in this paper.
i.e. the function value does not depend on the order of elements within the partition.
To illustrate this intuition, suppose the remainder set is \(R_{k}=\left\{ a,b\right\} \), hence its power set, excluding \(\emptyset \), contains 3 subsets \(\left\{ a\right\} ,\left\{ b\right\} ,\left\{ a,b\right\} \). Under the arithmetic mean assumption, the denominator in Eq. (7) becomes \(\phi \left( r_{a}\right) +\phi \left( r_{b}\right) +\frac{1}{2}\left\{ \phi \left( r_{a}\right) +\phi \left( r_{b}\right) \right\} =(1+\frac{1}{2})\sum _{x\in \left\{ a,b\right\} }\phi \left( r_{x}\right) \). The constant term is \(C=\frac{3}{2}\) in this case.
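This enumeration can be checked directly. The sketch below uses hypothetical worth values for \(\phi(r_a)\) and \(\phi(r_b)\) and confirms that the subset-mean sum equals \(C\) times the total worth, with \(C=\frac{3}{2}\):

```python
from itertools import combinations

def mean_denominator(worths):
    """Sum of arithmetic-mean aggregated worths over all non-empty subsets."""
    items = list(worths)
    total = 0.0
    for m in range(1, len(items) + 1):
        for subset in combinations(items, m):
            total += sum(worths[x] for x in subset) / len(subset)
    return total

# Remainder set {a, b}; the worth values are illustrative placeholders.
worths = {"a": 2.0, "b": 4.0}
denom = mean_denominator(worths)
assert abs(denom - 1.5 * sum(worths.values())) < 1e-9  # C = 3/2
```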
To be more precise, for \(k=1\) we define \( X _{1:0}\) to be \(\emptyset \).
This is 2-D because we also need to index the parameters as well as the subsets.
We especially thank the reviewer who pointed out that the computation could be efficient for this case.
Please note that these states are defined for the Markov random field under study only.
A confusion may arise here: although during training each training query q is supplied with a list of related objects and their ratings, during the ranking phase the system still needs to return a ranking over the list of related objects for an unseen query.
In document querying, for example, the list may consist of all documents which contain one or more query words.
Note that generally \(K\le M+1\) because there may be gaps in rating scales for a specific query.
This is much larger than the commonly used LETOR 3.0 and 4.0 data sets. During the preparation of this manuscript, we learnt that Microsoft had released two large data sets of size comparable to Yahoo!’s, but due to time constraints, we do not report results on them here.
Strictly speaking, RankNet uses neural networks as the scoring function, but the overall loss is still logistic; for simplicity, we use a simple perceptron.
We are aware that the results should be accompanied by error bars; however, since the data are huge, running the experiments repeatedly is extremely time-consuming.
Our result using second-order features was submitted to the Yahoo! challenge and placed in the top 4 % of 1055 teams, even though our main purpose was to propose a new theoretical and useful model.
The code is available at: http://www.cofirank.org/downloads. We implement a simple wrapper to compute the ERR and NDCG scores (at various positions), which are not available in the code.
Note that, this is different from saying the states of variables are independent.
References
Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans Knowl Data Eng 17(6):734–749
Becchetti L, Colesanti UM, Marchetti-Spaccamela A, Vitaletti A (2011) Recommending items in pervasive scenarios: models and experimental analysis. Knowl Inf Syst 28(3):555–578
Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs. Biometrika 39:324–345
Brin S, Page L (1998) The anatomy of a large-scale hypertextual Web search engine. Comput Netw ISDN Syst 30(1–7):107–117
Burges C, Shaked T, Renshaw E, Lazier A, Deeds M, Hamilton N, Hullender G (2005) Learning to rank using gradient descent. In: Proceedings of ICML, 96
Cao Z, Qin T, Liu TY, Tsai MF, Li H (2007) Learning to rank: from pairwise approach to listwise approach. In: Proceedings of the 24th international conference on machine learning, 136 pp. ACM
Carreira-Perpiñán MA, Hinton GE (2005) On contrastive divergence learning. In: Cowell RG, Ghahramani Z (eds) Proceedings of the 10th international workshop on artificial intelligence and statistics (AISTATS). Society for Artificial Intelligence and Statistics, Barbados, pp 33–40, Jan 6–8
Chapelle O, Chang Y (2011) Yahoo! learning to rank challenge overview. JMLR workshop and conference proceedings, vol 14, pp 1–24
Chapelle O, Metlzer D, Zhang Y, Grinspan P (2009) Expected reciprocal rank for graded relevance. In: CIKM. ACM, pp 621–630
Chu W, Ghahramani Z (2006) Gaussian processes for ordinal regression. J Mach Learn Res 6(1):1019
Chu W, Keerthi SS (2007) Support vector ordinal regression. Neural Comput 19(3):792–815
Cossock D, Zhang T (2008) Statistical analysis of Bayes optimal subset ranking. IEEE Trans Inf Theory 54(11):5140–5154
Davidson RR (1970) On extending the Bradley-Terry model to accommodate ties in paired comparison experiments. J Am Stat Assoc 65(329):317–328
Diaconis P (1988) Group representations in probability and statistics. Institute of Mathematical Statistics Hayward, CA
Dietterich TG, Lathrop RH, Lozano-Pérez T (1997) Solving the multiple instance problem with axis-parallel rectangles. Artif Intell 89(1–2):31–71
Fligner MA, Verducci JS (1988) Multistage ranking models. J Am Stat Assoc 83(403):892–901
Freund Y, Iyer R, Schapire RE, Singer Y (2004) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4(6):933–969
Fürnkranz J, Hüllermeier E (2010) Preference learning. Springer, New York
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell PAMI 6(6):721–742
Glenn WA, David HA (1960) Ties in paired-comparison experiments using a modified Thurstone-Mosteller model. Biometrics 16(1):86–109
Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109
Hinton GE (2002) Training products of experts by minimizing contrastive divergence. Neural Comput 14:1771–1800
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Huang J, Guestrin C, Guibas L (2009) Fourier theoretic probabilistic inference over permutations. J Mach Learn Res 10:997–1070
Huang TK, Weng RC, Lin CJ (2006) Generalized Bradley-Terry models and multi-class probability estimates. J Mach Learn Res 7:115
Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM Trans Inf Syst TOIS 20(4):446
Joachims T (2002) Optimizing search engines using clickthrough data. In: Proceedings of SIGKDD. ACM, New York, NY, USA, pp 133–142
Kendall MG (1938) A new measure of rank correlation. Biometrika 30(1/2):81–93
Koren Y (2008) Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: KDD
Lauritzen SL (1996) Graphical models. Oxford Science Publications, Oxford
Lebanon G, Mao Y (2008) Non-parametric modeling of partially ranked data. J Mach Learn Res 9:2401–2429
Leung CW, Chan SC, Chung F (2006) A collaborative filtering framework based on fuzzy association rules and multiple-level similarity. Knowl Inf Syst 10(3):357–381
Liu NN, Zhao M, Yang Q (2009) Probabilistic latent preference analysis for collaborative filtering. In: CIKM. ACM, pp 759–766
Liu TY (2009) Learning to rank for information retrieval. Found Trends Inf Retr 3(3):225–331
Luce RD (1959) Individual choice behavior. Wiley, New York
Mallows CL (1957) Non-null ranking models. I. Biometrika 44(1):114–130
Marden JI (1995) Analyzing and modeling rank data. Chapman & Hall/CRC, London
Marlin B, Swersky K, Chen B, de Freitas N (2010) Inductive principles for restricted Boltzmann machine learning. In: Proceedings of the 13th international conference on artificial intelligence and statistics, Chia Laguna Resort, Sardinia, Italy
Mureşan M (2008) A concrete approach to classical analysis. Springer, Berlin
Neal RM (2001) Annealed importance sampling. Stat Comput 11(2):125–139
Plackett RL (1975) The analysis of permutations. Appl Stat 24(2):193–202
Rao PV, Kupper LL (1967) Ties in paired-comparison experiments: a generalization of the Bradley-Terry model. J Am Stat Assoc 62(317):194–204
Resnick P, Iacovou N, Suchak M, Bergstorm P, Riedl J (1994) GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of ACM conference on computer supported cooperative work. Chapel Hill, North Carolina. ACM, pp 175–186
Sarwar B, Karypis G, Konstan J, Reidl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th international conference on World Wide Web. ACM Press, New York, NY, USA, pp 285–295
Shi Y, Larson M, Hanjalic A (2010) List-wise learning to rank with matrix factorization for collaborative filtering. In: ACM RecSys. ACM, pp 269–272
Spearman C (1904) The proof and measurement of association between two things. Am J Psychol 15(1):72–101
Tieleman T, Hinton G (2009) Using fast weights to improve persistent contrastive divergence. In: Proceedings of the 26th annual international conference on machine learning. ACM, New York, NY, USA
Truyen T, Phung DQ, Venkatesh S (2011) Probabilistic models over ordered partitions with applications in document ranking and collaborative filtering. In: Proceedings of SIAM conference on data mining (SDM), Mesa, Arizona, USA. SIAM
van Lint JH, Wilson RM (1992) A course in combinatorics. Cambridge University Press, Cambridge
Vembu S, Gärtner T (2010) Label ranking algorithms: a survey. In Preference learning, p 45
Volkovs MN, Zemel RS (2009) BoltzRank: learning to maximize expected ranking gain. In: Proceedings of the 26th annual international conference on machine learning. ACM, New York, NY, USA
Weimer M, Karatzoglou A, Le Q, Smola A (2008) CoFi\(^{RANK}\)-maximum margin matrix factorization for collaborative ranking. Adv Neural Inf Process Syst 20:1593–1600
Xia F, Liu TY, Wang J, Zhang W, Li H (2008) Listwise approach to learning to rank: theory and algorithm. In: Proceedings of ICML, pp 1192–1199
Younes L (1989) Parametric inference for imperfectly observed Gibbsian fields. Probab Theory Relat Fields 82(4):625–645
Zhou K, Xue GR, Zha H, Yu Y (2008) Learning to rank with ties. In: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval. ACM, pp 275–282
Acknowledgments
We thank the anonymous reviewers for their constructive comments, in particular the suggestion of minimum/maximum aggregation functions.
Appendix
1.1 Computing \(C_{k}\)
Here we calculate the constant \(C_{k}\) in Eq. (11). Let us rewrite the equation for ease of comprehension
where \(2^{R_{k}}\) is the power set with respect to the set \(R_{k}\), or the set of all non-empty subsets of \(R_{k}\). Equivalently
If all objects have the same worth, then this simplifies to
where \(N_{k}=|R_{k}|\). In the last equation, we have used the fact that \(\sum _{S\in 2^{R_{k}}}1\) is the number of all possible non-empty subsets, or equivalently the size of the power set excluding the empty set, which is \(2^{N_{k}}-1\). One way to derive this result is to imagine a collection of \(N_{k}\) variables, each with two states: \(\mathtt {selected}\) and \(\mathtt {notselected}\), where \(\mathtt {selected}\) means the object belongs to the subset. Since there are \(2^{N_{k}}\) such configurations over all states, the number of non-empty subsets must be \(2^{N_{k}}-1\).
For arbitrary objects, let us examine the probability that an object x belongs to a random subset of size m, which is \(\frac{m}{N_{k}}\). Recall from standard combinatorics that the number of m-element subsets is the binomial coefficient \(\binom{N_{k}}{m}\), where \(1\le m\le N_{k}\). Thus the number of times an object appears in m-element subsets is \(\binom{N_{k}}{m}\frac{m}{N_{k}}\). Taking into account that this count is weighted down by m (i.e. |S| in Eq. (11)), the contribution towards \(C_{k}\) is \(\binom{N_{k}}{m}\frac{1}{N_{k}}\). Finally, we can compute the constant \(C_{k}\), which is the weighted number of times an object belongs to any subset of any size, as follows
We have made use of the known identity \(\sum _{m=1}^{N_{k}}\binom{N_{k}}{m}=2^{N_{k}}-1\).
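The derivation above can be verified by brute force. The sketch below (with arbitrary worths as placeholders) enumerates all non-empty subsets and confirms that the ratio of the subset-mean sum to the total worth equals \(\frac{2^{N_{k}}-1}{N_{k}}\):

```python
from itertools import combinations
import random

def brute_force_constant(n, seed=0):
    """Ratio of the subset-mean sum to the total worth, by enumeration."""
    rng = random.Random(seed)
    phi = [rng.uniform(0.5, 2.0) for _ in range(n)]  # arbitrary worths
    total = 0.0
    for m in range(1, n + 1):
        for s in combinations(range(n), m):
            total += sum(phi[i] for i in s) / m  # mean aggregation over S
    return total / sum(phi)

# Matches the closed form (2^n - 1) / n for small n.
for n in range(1, 8):
    assert abs(brute_force_constant(n) - (2 ** n - 1) / n) < 1e-9
```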
1.2 Computing \(M_{k}(x)\)
We now calculate the constant \(M_{k}(x)\) in Eq. (17), which is reproduced here for clarity:
First, we arrange the objects in decreasing order of worth \(\phi (x)\). For notational convenience, we assume the order is \(1,2,3,\ldots ,N_{k}\). The largest object appears in a subset consisting of only itself, and in \(2^{N_{k}-1}-1\) other subsets. Thus \(M_{k}(1)=2^{N_{k}-1}\), since for every subset to which the largest object belongs, the maximum aggregation is, by definition, the worth of that object. Now, removing the largest object, consider the second largest one. By the same argument, \(M_{k}(2)=2^{N_{k}-2}\). Continuing the same line of reasoning, we end up with \(M_{k}(n)=2^{N_{k}-n}\).
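The counting argument above can also be checked by enumeration. In this sketch, indices \(1,\ldots,N_{k}\) stand in for objects sorted by decreasing worth, so object n attains the maximum in a subset exactly when it has the smallest index:

```python
from itertools import combinations

def count_max_subsets(n_objects, obj):
    """Count non-empty subsets of {1..n_objects} in which `obj` has the
    smallest index, i.e. the largest worth under the decreasing ordering."""
    count = 0
    universe = list(range(1, n_objects + 1))
    for m in range(1, n_objects + 1):
        for s in combinations(universe, m):
            if min(s) == obj:
                count += 1
    return count

N = 6
for n in range(1, N + 1):
    assert count_max_subsets(N, n) == 2 ** (N - n)  # M_k(n) = 2^(N_k - n)
```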
1.3 Pairwise losses
Let \(f(x_{i},w)\) be the scoring function parameterised by w that takes the input vector \(x_{i}\) and outputs a real value indicating the relevancy of object i. Let \(\delta _{ij}(w)=f(x_{i},w)-f(x_{j},w)\). Pairwise models are quite similar in their general setting. The only difference is the specific loss function:
However, these losses behave quite differently from each other. For RankNet and RankBoost, minimising the loss widens the margin between the scores for \(x_{i}\) and \(x_{j}\) as much as possible; the difference is that RankNet is less sensitive to noise due to the log-scale. Ranking SVM, however, aims only to achieve a margin of 1, while rank regression attempts to bound the margin by 1.
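A minimal sketch of the four pairwise losses, using the standard forms consistent with the description above (logistic for RankNet, exponential for RankBoost, hinge for Ranking SVM, squared margin for rank regression); the precise forms used in the paper may differ slightly:

```python
import math

def ranknet_loss(delta):      # logistic: widens the margin, log-scale damps noise
    return math.log(1.0 + math.exp(-delta))

def rankboost_loss(delta):    # exponential: widens the margin aggressively
    return math.exp(-delta)

def ranksvm_loss(delta):      # hinge: satisfied once the margin reaches 1
    return max(0.0, 1.0 - delta)

def rankregress_loss(delta):  # squared: pulls the margin toward exactly 1
    return (delta - 1.0) ** 2
```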
At first sight, the cost of gradient evaluation for pairwise losses would be \(\mathcal {O}(0.5N(N-1)F)\), where F is the number of parameters. However, we can achieve \(\max \{\mathcal {O}(0.5N(N-1)),\mathcal {O}(NF)\}\) as follows. The overall loss for a particular query is
Taking derivative with respect to w yields
As \(\left\{ \frac{\partial \ell (x_{i}\succ x_{j};w)}{\partial \delta _{ij}}\right\} _{i,j|x_{i}\succ x_{j}}\) can be computed in \(\mathcal {O}(0.5N(N-1))\) time, and \(\left\{ \frac{\partial f_{i}}{\partial w}\right\} _{i}\) in \(\mathcal {O}(NF)\) time, the overall cost would be \(\max \{\mathcal {O}(0.5N(N-1)),\mathcal {O}(NF)\}\).
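The factoring above can be sketched as follows, assuming for concreteness a linear scorer \(f(x,w)=w^{\top }x\) and a hinge pairwise loss (both illustrative choices, not the paper's specification). Per-pair derivatives are accumulated into per-object coefficients, so the feature-dimension work is a single \(\mathcal {O}(NF)\) pass:

```python
import numpy as np

def pairwise_grad(X, ratings, w):
    """Gradient of the summed hinge pairwise loss with a linear scorer
    f(x, w) = w @ x. Pair coefficients cost O(N^2); feature work is O(NF)."""
    scores = X @ w                      # O(NF)
    coef = np.zeros(len(scores))        # accumulated d loss / d f_i per object
    for i in range(len(scores)):
        for j in range(len(scores)):
            if ratings[i] > ratings[j]:          # x_i preferred over x_j
                d = scores[i] - scores[j]
                g = -1.0 if d < 1.0 else 0.0     # d hinge / d delta_ij
                coef[i] += g
                coef[j] -= g
    return coef @ X                     # O(NF): one pass over the features
```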
1.4 Learning the pairwise ties models
This subsection describes the details of learning the pairwise ties models discussed in Sect. 6.
1.4.1 Davidson method
Recall from Sect. 2 that in the Davidson method, the probability masses are defined as
where \(Z_{ij}=\phi (x_{i})+\phi (x_{j})+\nu \sqrt{\phi (x_{i})\phi (x_{j})}\) and \(\nu \ge 0\). For simplicity of unconstrained optimisation, let \(\nu =e^{\beta }\) for \(\beta \in \mathbb {R}\). Let \(P_{i}=P(x_{i}\succ x_{j};w)\), \(P_{j}=P(x_{j}\succ x_{i};w)\) and \(P_{ij}=P(x_{i}\sim x_{j};w)\).
Taking derivatives of the log-likelihood gives
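As a numerical sanity check on the Davidson masses defined above, the following sketch (with hypothetical worth values) verifies that the three probabilities sum to one:

```python
import math

def davidson_masses(phi_i, phi_j, beta):
    """Davidson win/loss/tie probabilities with nu = exp(beta) >= 0."""
    nu = math.exp(beta)
    tie = nu * math.sqrt(phi_i * phi_j)
    Z = phi_i + phi_j + tie             # Z_ij from the definition above
    return phi_i / Z, phi_j / Z, tie / Z

# Illustrative worths: object i twice as worthy as object j.
P_i, P_j, P_ij = davidson_masses(2.0, 1.0, 0.0)
assert abs(P_i + P_j + P_ij - 1.0) < 1e-12
assert P_i > P_j                        # the worthier object wins more often
```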
1.4.2 Rao-Kupper method
Recall from Sect. 2 that the Rao-Kupper model defines the following probability masses
where \(\theta \ge 1\) is the ties factor and w is the model parameter. Note that \(\phi (.)\) is also a function of w, which we omit here for clarity. For ease of unconstrained optimisation, let \(\theta =1+e^{\alpha }\) for \(\alpha \in \mathbb {R}\). In learning, we want to estimate both \(\alpha \) and w. Let
Taking partial derivatives of the log-likelihood gives
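A similar sanity check applies to the Rao-Kupper model, using the standard Rao-Kupper probability masses with the substitution \(\theta =1+e^{\alpha }\) (worth values here are hypothetical placeholders):

```python
import math

def rao_kupper_masses(phi_i, phi_j, alpha):
    """Rao-Kupper win/loss/tie probabilities with theta = 1 + exp(alpha) > 1."""
    theta = 1.0 + math.exp(alpha)
    p_i = phi_i / (phi_i + theta * phi_j)
    p_j = phi_j / (phi_j + theta * phi_i)
    p_tie = ((theta ** 2 - 1.0) * phi_i * phi_j) / (
        (phi_i + theta * phi_j) * (phi_j + theta * phi_i))
    return p_i, p_j, p_tie

# Illustrative worths; the three masses must sum to one.
P_i, P_j, P_ij = rao_kupper_masses(2.0, 1.0, 0.0)
assert abs(P_i + P_j + P_ij - 1.0) < 1e-12
```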
Tran, T., Phung, D. & Venkatesh, S. Modelling human preferences for ranking and collaborative filtering: a probabilistic ordered partition approach. Knowl Inf Syst 47, 157–188 (2016). https://doi.org/10.1007/s10115-015-0840-9