[Paper Review] Neural Collaborative Filtering (2017)

🍊 Paper link: https://arxiv.org/pdf/1708.05031.pdf

He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T. S. (2017, April). Neural collaborative filtering.

In Proceedings of the 26th international conference on world wide web (pp. 173-182).


1. INTRODUCTION

  Prior recommender-system research was dominated by Matrix Factorization (MF) methods built on Collaborative Filtering. However, the plain inner product used by MF combines latent features only linearly, which may not be enough to capture the complex structure of user-item interaction data. Some studies had applied deep learning to recommendation, but they were limited in capturing the complex latent features of implicit user interaction data.

 

This paper aims to model user preference better by exploiting implicit user-item interaction information.

To this end, the authors propose NCF, a general neural-network framework for collaborative filtering. They instantiate it as a model that combines an MF-based GMF model with a neural-network-based MLP model.

 

2. PRELIMINARIES

2.1 Learning from implicit data

 In a recommender system, the relationship between users and items is expressed through explicit or implicit data. Explicit data is feedback in which users directly state their preference for an item, such as ratings or likes. Implicit data, by contrast, reflects preference only indirectly, such as watch time or interaction logs.

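This paper learns latent features from implicit user-item data. To do so, it sets $y_{ui} = 1$ if an interaction (e.g. a rating) between user $u$ and item $i$ is observed, and $y_{ui} = 0$ otherwise. A value of 1 does not necessarily mean the user likes the item. Likewise, a 0 does not necessarily mean dislike: the user may simply not know the item.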
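Below is a minimal Python sketch of this binarization. It assumes the raw data arrives as hypothetical `(user, item, rating)` triples with 0-based integer IDs. `build_implicit_matrix` is an illustrative helper, not code from the paper, and a real dataset would use a sparse structure instead of a dense list of lists.

```python
# Minimal sketch (hypothetical helper): build the implicit-feedback matrix Y,
# where y_ui = 1 if (user u, item i) was observed and 0 otherwise.


def build_implicit_matrix(records, num_users, num_items):
    """Return a num_users x num_items list of lists with 1 for observed pairs."""
    y = [[0] * num_items for _ in range(num_users)]
    for user, item, _rating in records:
        # The rating value is discarded: only the fact that an interaction
        # happened is kept, so even a 1-star rating becomes y_ui = 1.
        y[user][item] = 1
    return y


records = [(0, 1, 5.0), (0, 2, 1.0), (1, 0, 3.0)]  # hypothetical toy data
Y = build_implicit_matrix(records, num_users=2, num_items=3)
print(Y)  # [[0, 1, 1], [1, 0, 0]]
```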

 

2.2 Matrix Factorization

 This paper builds on MF, so some background on MF from prior work is needed. Below is a link to the 2009 MF paper associated with the Netflix Prize.

- Matrix Factorization Techniques for Recommender Systems (2009)

- Paper link: https://datajobs.com/data-science-repo/Recommender-Systems-%5BNetflix%5D.pdf
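For reference, MF estimates an interaction as the inner product of a user latent vector $p_u$ and an item latent vector $q_i$: $\hat{y}_{ui} = p_u^T q_i = \sum_{f=1}^{k} p_{uf} q_{if}$. The sketch below shows only this prediction step, using hand-picked vectors. `mf_predict` is a hypothetical helper, and learning the latent vectors is out of scope.

```python
# Minimal sketch of the MF prediction step: y_hat_ui = p_u^T q_i.
# The latent vectors are hypothetical values, not learned ones.


def mf_predict(p_u, q_i):
    """Inner product of a user latent vector and an item latent vector."""
    if len(p_u) != len(q_i):
        raise ValueError("latent vectors must share the same dimension k")
    return sum(pf * qf for pf, qf in zip(p_u, q_i))


p_u = [0.5, -1.0, 2.0]  # hypothetical user latent vector (k = 3)
q_i = [1.0, 0.5, 0.25]  # hypothetical item latent vector
print(mf_predict(p_u, q_i))  # 0.5
```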

 

3. NEURAL COLLABORATIVE FILTERING

  ๋…ผ๋ฌธ์—์„œ ์ œ์‹œํ•˜๋Š” new neural matrix factorization model์€ MF์™€ MLP์˜ ์•™์ƒ๋ธ”์ด๋‹ค.

๋”ฐ๋ผ์„œ MF ์˜ linearity์™€ MLP์˜ non-linearity์˜ ์žฅ์ ์„ ๋™์‹œ์— ํ™œ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

 

- 3.1 General Framework

  As input, the proposed framework uses only the identities of the user and the item, each one-hot encoded as a binary sparse vector. Each input vector passes through an embedding layer and becomes a dense vector. These embeddings then pass through what the paper calls the neural collaborative filtering layers, which produce the final prediction score.

The figure above shows the model NCF uses to build a multi-layer representation. To predict the user-item interaction $y_{ui}$, the output of each layer is fed as input to the next layer, starting from the one-hot encoded user and item vectors.

These inputs pass through the embedding layers and then through fully connected layers. Each neural CF layer can be customized to discover particular latent structures of user-item interactions. The NCF prediction model is formulated as follows.

 

$$ \hat{y}_{ui} = f(P^T v^U_{u}, Q^T v^I_{i} |P, Q, \Theta_{f}) $$

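$P \in \mathbb{R}^{M \times k}$ and $Q \in \mathbb{R}^{N \times k}$ are the latent factor matrices of users and items, respectively. Here $M$ and $N$ are the numbers of users and items, $k$ is the latent dimension, and $v^U_u$ and $v^I_i$ are the one-hot identity vectors of user $u$ and item $i$. $\Theta_{f}$ denotes the model parameters of the interaction function $f$.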
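Because $v^U_u$ is one-hot, $P^T v^U_u$ simply picks out row $u$ of $P$. This is why an embedding layer can be implemented as a table lookup. Below is a small check in plain Python; the values of $P$ and the helper names `one_hot` and `transpose_matvec` are hypothetical.

```python
# Minimal sketch: P^T v_u with a one-hot v_u equals row u of P,
# so an embedding layer is equivalent to a row lookup. P is hypothetical.


def one_hot(index, size):
    """Binary sparse identity vector with a single 1 at `index`."""
    v = [0] * size
    v[index] = 1
    return v


def transpose_matvec(matrix, vector):
    """Compute matrix^T @ vector for an M x k list-of-lists matrix."""
    k = len(matrix[0])
    return [sum(matrix[m][j] * vector[m] for m in range(len(matrix))) for j in range(k)]


P = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # M = 3 users, k = 2
v_u = one_hot(1, len(P))
p_u = transpose_matvec(P, v_u)
print(p_u)  # [0.3, 0.4], identical to the lookup P[1]
```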

 

$$ f(P^T v^U_{u}, Q^T v^I_{i}) = \phi_{out}(\phi_{X}(...{\phi_{2}(\phi_{1}(P^T v^U_{u}, Q^T v^I_{i}))...})) $$

$\phi_{out}$ and $\phi_{x}$ denote the mapping functions of the output layer and of the x-th neural CF layer, respectively, with $X$ neural CF layers in total.
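The nesting above is function composition. The sketch below wires hypothetical placeholder layers together purely to show the data flow: concatenation for $\phi_1$, a ReLU for $\phi_2$, and a sigmoid over the sum for $\phi_{out}$. These layer choices and the helper names are assumptions for illustration, not the layers the paper uses.

```python
import math
from functools import reduce

# Minimal sketch: f(p_u, q_i) = phi_out(phi_X(...phi_2(phi_1(p_u, q_i))...)).
# The concrete layers below are hypothetical placeholders.


def ncf_predict(p_u, q_i, phi_1, hidden_layers, phi_out):
    z = phi_1(p_u, q_i)  # first CF layer sees both latent vectors
    z = reduce(lambda acc, phi: phi(acc), hidden_layers, z)  # apply phi_2 ... phi_X in order
    return phi_out(z)  # output layer produces the score


def concat(p, q):  # hypothetical phi_1
    return p + q


def relu(z):  # hypothetical phi_2
    return [max(0.0, x) for x in z]


def sigmoid_of_sum(z):  # hypothetical phi_out
    return 1.0 / (1.0 + math.exp(-sum(z)))


score = ncf_predict([1.0, -2.0], [0.5, -0.5], concat, [relu], sigmoid_of_sum)
print(round(score, 4))  # 0.8176
```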

 

- 3.1.1 Learning NCF

  The objective function the model learns is shown below. NCF is trained with SGD to minimize it. The objective is the binary cross-entropy (log loss). Here $y$ denotes the set of observed interactions, and $y^-$ the set of negative instances obtained by negative sampling from the unobserved interactions. The output $\hat{y}_{ui}$ is squashed into $[0, 1]$ by a sigmoid so it can be read as a probability.

$$ L = - \sum_{(u,i)\in y}\log\hat{y}_{ui} - \sum_{(u,j)\in y^-}\log(1 - \hat{y}_{uj}) = - \sum_{(u,i)\in y\cup y^-}\left( y_{ui}\log\hat{y}_{ui} + (1-y_{ui})\log(1-\hat{y}_{ui}) \right) $$
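Below is a minimal plain-Python sketch of this objective with uniform negative sampling. The helper names, the uniform sampler, and `num_neg` are assumptions for illustration. The sampler also assumes every user has at least one unobserved item; otherwise its resampling loop would never end.

```python
import math
import random

# Minimal sketch (assumed helpers, not the paper's code): log loss over
# observed pairs (label 1) plus uniformly sampled unobserved pairs (label 0).


def sample_negatives(positives, num_items, num_neg, rng):
    """For each observed (u, i), draw `num_neg` items j such that (u, j) is unobserved.

    Assumes every user has at least one unobserved item; otherwise the
    resampling loop below would never terminate.
    """
    observed = set(positives)
    negatives = []
    for u, _ in positives:
        for _ in range(num_neg):
            j = rng.randrange(num_items)
            while (u, j) in observed:  # resample until the pair is unobserved
                j = rng.randrange(num_items)
            negatives.append((u, j))
    return negatives


def log_loss(labels, preds, eps=1e-12):
    """L = -sum(y * log(y_hat) + (1 - y) * log(1 - y_hat)), clipping y_hat away from 0 and 1."""
    total = 0.0
    for y, y_hat in zip(labels, preds):
        y_hat = min(max(y_hat, eps), 1.0 - eps)
        total -= y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
    return total


rng = random.Random(0)
positives = [(0, 1), (0, 2), (1, 0)]  # hypothetical observed interactions
negatives = sample_negatives(positives, num_items=4, num_neg=2, rng=rng)
print(len(negatives))  # 6
print(round(log_loss([1, 0], [0.5, 0.5]), 4))  # 1.3863
```

Each negative pair gets label 0 and each observed pair label 1, so both sums of the objective collapse into the single cross-entropy sum that `log_loss` computes.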

 

- 3.2 Generalized Matrix Factorization (GMF)

  This is the part of the paper I found most interesting. The authors show that MF can be interpreted as a special case of the NCF framework above. Let the user latent vector $p_{u}$ be $P^T v^U_{u}$, and likewise let the item latent vector $q_{i}$ be $Q^T v^I_{i}$. The first neural CF layer then takes the element-wise product of the two:

$$ \phi_{1}({p_{u}, q_{i}}) = p_{u} \odot q_{i} $$

 

The output layer is:

$$ \hat{y}_{ui} = a_{out}(h^T(p_{u} \odot q_{i})) $$

$a_{out}$ is the activation function and $h$ is the edge-weight vector of the output layer. If $a_{out}$ is the identity function and $h$ is a uniform vector of 1s, the equation above is exactly the MF model. In this work, $a_{out}$ is the sigmoid function and $h$ is learned from data with the log loss, which generalizes MF to a non-linear setting. The paper calls this model GMF.
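The reduction to MF can be checked numerically. In this sketch, with hypothetical vectors and helper names, setting `a_out` to the identity and `h` to all ones reproduces the MF inner product. A sigmoid and a non-uniform `h` give the GMF form.

```python
import math

# Minimal sketch of the GMF layer: y_hat = a_out(h^T (p_u ⊙ q_i)).
# All values and helper names are hypothetical.


def gmf(p_u, q_i, h, a_out):
    elementwise = [p * q for p, q in zip(p_u, q_i)]  # phi_1 = p_u ⊙ q_i
    return a_out(sum(w * e for w, e in zip(h, elementwise)))  # a_out(h^T phi_1)


def identity(x):
    return x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


p_u, q_i = [1.0, 2.0], [3.0, 4.0]
print(gmf(p_u, q_i, h=[1.0, 1.0], a_out=identity))  # 11.0, the MF inner product
print(round(gmf(p_u, q_i, h=[0.2, -0.1], a_out=sigmoid), 4))  # 0.4502
```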

 

- 3.3 Multi-Layer Perceptron (MLP)

  The MLP concatenates the user and item latent vectors, which arrive through two separate pathways. Simple concatenation alone does not model any interaction between the user and item latent features. Several hidden layers are therefore stacked on the concatenated vector to give the model more flexibility and non-linearity. As the equations below show, the vector passes through an activation function at every layer. $W_{x}$, $b_{x}$, and $a_{x}$ denote the weight matrix, bias vector, and activation function of the x-th layer, and ReLU is used as the activation. This is the standard MLP structure used in deep learning.

$$ z_{1} = \phi_{1}(p_{u}, q_{i}) = \begin{bmatrix}p_{u} \\ q_{i} \end{bmatrix} $$

$$ \phi_{2}(z_{1}) = a_{2}(W^T_{2}z_{1} + b_{2}) $$

$$ \phi_{L}(z_{L-1}) = a_{L}(W^T_{L}z_{L-1} + b_{L}) $$

$$ \hat{y}_{ui} = \sigma (h^T\phi_{L}(z_{L-1})) $$
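Below is a plain-Python sketch of the MLP branch following the equations above. The weights are hypothetical hand-picked values, and `dense_relu` and `mlp_predict` are illustrative names. The 4-2-1 layer sizes only mimic the paper's tower pattern, where each layer is narrower than the one below it.

```python
import math

# Minimal sketch of the MLP branch:
#   z_1 = [p_u; q_i],  phi_l(z) = ReLU(W_l^T z + b_l),  y_hat = sigmoid(h^T phi_L).
# W_l is stored as (in_dim x out_dim), so W_l^T z is computed per output column.


def dense_relu(z, W, b):
    """One hidden layer: ReLU(W^T z + b)."""
    out_dim = len(W[0])
    pre = [sum(W[r][c] * z[r] for r in range(len(z))) + b[c] for c in range(out_dim)]
    return [max(0.0, x) for x in pre]


def mlp_predict(p_u, q_i, layers, h):
    z = p_u + q_i  # z_1: concatenation of the two latent vectors
    for W, b in layers:  # phi_2 ... phi_L
        z = dense_relu(z, W, b)
    logit = sum(w * x for w, x in zip(h, z))
    return 1.0 / (1.0 + math.exp(-logit))  # output layer: sigma(h^T phi_L)


# Hypothetical tower 4 -> 2 -> 1 (each layer narrower than the previous one).
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0], [-1.0]], [0.0]),
]
print(round(mlp_predict([0.5, 1.0], [0.5, -2.0], layers, h=[1.0]), 4))  # 0.7311
```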

 

- 3.4 Fusion of GMF and MLP

  This is the final proposed model, NeuMF. As mentioned earlier, it fuses GMF and MLP, and this ensemble lets it benefit from both linearity and non-linearity. The one-hot user and item vectors are first passed through embedding layers. The two branches could share one embedding layer, but that would force GMF and MLP to use embeddings of the same size and could limit the fused model. Each branch therefore learns its own embeddings. The structure and equations of the final model are as follows.

The user and item one-hot vectors are embedded separately for GMF and for MLP, and each branch passes through its own layers. The two branch outputs are then concatenated and passed through the NeuMF layer to predict the final $\hat{y}_{ui}$, and the model is trained with the log loss. The objective is non-convex, so training from random initialization can get stuck in poor local optima. To avoid this, GMF and MLP are first pre-trained separately, and their parameters initialize the corresponding parts of NeuMF. The output weights of the two pre-trained models are concatenated as $h \leftarrow [\alpha h^{GMF}; (1-\alpha) h^{MLP}]$, with $\alpha = 0.5$. Pre-training uses the Adam optimizer. NeuMF itself is then trained with vanilla SGD, because the pre-trained parameters come without Adam's saved momentum information.

 

$$\phi^{GMF} = p^G_{u} \odot q^G_{i},$$

$$\phi^{MLP} = a_{L}(W^T_{L}(a_{L-1}(...a_2(W^T_2 \begin{bmatrix}p^M_u \\q^M_i \end{bmatrix} + b_2)...))+b_L),$$

$$\hat{y_{ui}} = \sigma(h^T\begin{bmatrix}\phi^{GMF} \\\phi^{MLP} \end{bmatrix}),$$

  • $p^G_{u}$: User embedding for GMF
  • $p^M_{u}$: User embedding for MLP
  • $q^G_{i}$: Item embedding for GMF
  • $q^M_{i}$: Item embedding for MLP
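Below is a sketch of the fused output layer and of the pre-training initialization of $h$. Hypothetical vectors stand in for trained parameters, and `neumf_predict` and `init_h_from_pretrained` are illustrative names. `alpha` is the trade-off weight between the two pre-trained models and is set to 0.5 here, the value used in the paper. The MLP branch output is hard-coded rather than computed.

```python
import math

# Minimal sketch of the NeuMF output layer: y_hat = sigma(h^T [phi_GMF; phi_MLP]),
# plus the pre-training initialization h <- [alpha * h_GMF; (1 - alpha) * h_MLP].
# All vectors are hypothetical stand-ins for trained parameters.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def neumf_predict(phi_gmf, phi_mlp, h):
    fused = phi_gmf + phi_mlp  # concatenate the two branch outputs
    return sigmoid(sum(w * x for w, x in zip(h, fused)))


def init_h_from_pretrained(h_gmf, h_mlp, alpha=0.5):
    """Combine pre-trained output weights into the NeuMF output weight h."""
    return [alpha * w for w in h_gmf] + [(1 - alpha) * w for w in h_mlp]


p_g, q_g = [1.0, 2.0], [0.5, 0.5]  # hypothetical GMF embeddings p^G_u, q^G_i
phi_gmf = [p * q for p, q in zip(p_g, q_g)]  # p^G_u ⊙ q^G_i
phi_mlp = [0.3, 0.0]  # hypothetical output of the MLP tower
h = init_h_from_pretrained(h_gmf=[1.0, 1.0], h_mlp=[2.0, -2.0])
print(h)  # [0.5, 0.5, 1.0, -1.0]
print(round(neumf_predict(phi_gmf, phi_mlp, h), 4))  # 0.7408
```

The GMF and MLP branches keep separate embeddings (`p_g`, `q_g` here versus the MLP's own inputs). Only their outputs meet, at the concatenation inside `neumf_predict`.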

4. EXPERIMENTS

  The paper sets out to answer the following research questions:

    • RQ1. Do our proposed NCF methods outperform the state-of-the-art implicit collaborative filtering methods?
    • RQ2. How does our proposed optimization framework (log loss with negative sampling) work for the recommendation task?
    • RQ3. Are deeper layers of hidden units helpful for learning from user-item interaction data?

- The datasets used are MovieLens and Pinterest.

 

The experiments answer all three RQs, and the proposed methods achieve state-of-the-art performance. I will skip the detailed results and the conclusion here.


Review

 This paper improves on MF-based recommender systems by proposing a framework that applies deep learning. I found it interesting that it learns linearity and non-linearity at the same time through an ensemble. It is a foundational paper in deep-learning-based recommendation, and it made me want to write a short paper improving the NCF architecture. The paper uses only the user and item identity vectors as input, but the approach could likely be extended to embed more information. I plan to post NCF code next.
