
Neural Collaborative Filtering (NCF): paper notes

    MF

Although some recent work has employed deep learning for recommendation, they primarily used it to model auxiliary information, such as textual descriptions of items and acoustic features of musics. When it comes to model the key factor in collaborative filtering — the interaction between user and item features, they still resorted to matrix factorization and applied an inner product on the latent features of users and items. By replacing the inner product with a neural architecture that can learn an arbitrary function from data, we present a general framework named NCF, short for Neural network-based Collaborative Filtering. NCF is generic and can express and generalize matrix factorization under its framework. To supercharge NCF modelling with non-linearities, we propose to leverage a multi-layer perceptron to learn the user-item interaction function.

In earlier work, neural networks in recommender systems were applied only to auxiliary information; the user-item interaction itself was still handled by matrix factorization. The main idea of the NCF paper is to model the interaction between users and items with a multi-layer neural network.

• Implicit feedback, such as browsing history and purchase records. The paper explores the central problem of how to use DNNs to model noisy implicit feedback signals.

In general, explicit feedback is scarce; after all, users view far more items than they buy.

Point-wise loss: minimize the gap between the prediction \(\hat{y}_{ui}\) and the target \(y_{ui}\).
Pair-wise loss: maximize the margin between an observed entry \(\hat{y}_{ui}\) and an unobserved entry \(\hat{y}_{uj}\).

Let \(M\) and \(N\) denote the number of users and items, respectively. We define the user-item interaction matrix \(\mathbf{Y} \in \mathbb{R}^{M \times N}\) from users' implicit feedback as
\[
y_{ui} = \begin{cases} 1, & \text{if interaction (user } u \text{, item } i \text{) is observed} \\ 0, & \text{otherwise.} \end{cases}
\]
Here a value of 1 for \(y_{ui}\) indicates that there is an interaction between user \(u\) and item \(i\); however, it does not mean \(u\) actually likes \(i\). Similarly, a value of 0 does not necessarily mean \(u\) does not like \(i\); it can be that the user is not aware of the item. This poses challenges in learning from implicit data, since it provides only noisy signals about users' preference. While observed entries at least reflect users' interest in items, the unobserved entries can be just missing data, and there is a natural scarcity of negative feedback.

\(M\) and \(N\) denote the numbers of users and items, respectively. \(y_{ui} = 1\) only indicates that user \(u\) has interacted with item \(i\); it does not mean \(u\) likes \(i\). Likewise, \(y_{ui} = 0\) does not mean \(u\) dislikes \(i\); the user may simply be unaware of the item. Negative feedback is therefore missing, which is one of the challenges of implicit feedback: it provides only noisy signals of user preference.
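As a minimal sketch of the definition above (the interaction pairs and the matrix sizes are made-up placeholders), the binary matrix \(\mathbf{Y}\) can be built from a list of observed (user, item) pairs:

```python
import numpy as np

# Hypothetical observed (user, item) interactions; ids are 0-indexed.
interactions = [(0, 1), (0, 3), (1, 0), (2, 1), (2, 2)]
M, N = 3, 4  # number of users and items

# y_ui = 1 if the interaction is observed, 0 otherwise.
Y = np.zeros((M, N), dtype=np.int8)
for u, i in interactions:
    Y[u, i] = 1
```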

MF associates each user and item with a real-valued vector of latent features. Let \(\mathbf{p}_u\) and \(\mathbf{q}_i\) denote the latent vector for user \(u\) and item \(i\), respectively; MF estimates an interaction \(y_{ui}\) as the inner product of \(\mathbf{p}_u\) and \(\mathbf{q}_i\):
\[
\hat{y}_{ui} = f(u, i \mid \mathbf{p}_u, \mathbf{q}_i) = \mathbf{p}_u^T \mathbf{q}_i = \sum_{k=1}^{K} p_{uk} q_{ik},
\]
where \(K\) denotes the dimension of the latent space. As we can see, MF models the two-way interaction of user and item latent factors, assuming each dimension of the latent space is independent of each other and linearly combining them with the same weight. As such, MF can be deemed as a linear model of latent factors.

MF estimates the interaction \(\hat{y}_{ui}\) as the inner product of \(\mathbf{p}_u\) and \(\mathbf{q}_i\). The MF model is a two-way interaction of user and item latent factors: it assumes each dimension of the latent space is independent of the others and combines them linearly with equal weight. MF can thus be viewed as a linear model of latent factors.
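As a minimal NumPy sketch of the formula above (the latent vectors are random placeholders and \(K\) is an assumed latent dimension), the MF prediction is just a dot product:

```python
import numpy as np

K = 8                      # assumed latent dimension
rng = np.random.default_rng(0)
p_u = rng.normal(size=K)   # latent vector for user u
q_i = rng.normal(size=K)   # latent vector for item i

# hat{y}_ui = p_u^T q_i = sum_k p_uk * q_ik
y_hat = p_u @ q_i
```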

In Figure 1 of the paper (not reproduced here), the left side shows the original user-item matrix, from which the similarities among u1, u2, and u3 can be computed. If we factorize the matrix and reduce the latent space to 2 dimensions, p_i is the latent vector of u_i. On the right, the angles between these vectors correctly express the similarities among u1, u2, and u3: u2 and u3 are the most similar, and u1 is more similar to u2 than to u3.

Now consider u4, shown in the dashed box: it is closest to u1, then u3, and only then u2. But this relation cannot be represented in the latent space: if p4 is to have the smallest angle with p1, then its angle with p2 must necessarily be smaller than its angle with p3.

Drawbacks
Intuitively, the problem above shows up when latent vectors are used to compute similarity, because similarity is measured by the angle. But how does this show that using the inner product to estimate scores is problematic? The ordering of the angles becomes inconsistent after the dimensionality reduction, and the numerator of the cosine is exactly the inner product of the two vectors. This indirectly suggests that the inner product alone is not expressive enough to reliably predict scores.

    NCF

The framework in the paper

To permit a full neural treatment of collaborative filtering, we adopt a multi-layer representation to model a user-item interaction \(y_{ui}\) as shown in Figure 2, where the output of one layer serves as the input of the next one. The bottom input layer consists of two feature vectors \(\mathbf{v}_u^U\) and \(\mathbf{v}_i^I\) that describe user \(u\) and item \(i\), respectively; they can be customized to support a wide range of modelling of users and items, such as context-aware [28, 1], content-based [3], and neighbor-based [26]. Since this work focuses on the pure collaborative filtering setting, we use only the identity of a user and an item as the input feature, transforming it to a binarized sparse vector with one-hot encoding. Note that with such a generic feature representation for inputs, our method can be easily adjusted to address the cold-start problem by using content features to represent users and items.

A multi-layer perceptron (MLP) is used to model the interaction between user and item. The input layer consists of two feature vectors, representing the user and the item respectively.
This sentence is important: "they can be customized to support a wide range of modelling of users and items, such as context-aware [28, 1], content-based [3], and neighbor-based [26]". The user, item, or other feature vectors can be customized here, so multiple related embeddings can be fused.
NCF itself uses only the user and item identity embeddings.

Above the input layer is the embedding layer; it is a fully connected layer that projects the sparse representation to a dense vector. The obtained user (item) embedding can be seen as the latent vector for the user (item) in the context of the latent factor model. The user embedding and item embedding are then fed into a multi-layer neural architecture, which we term neural collaborative filtering layers, to map the latent vectors to prediction scores. Each layer of the neural CF layers can be customized to discover certain latent structures of user-item interactions. The dimension of the last hidden layer \(X\) determines the model's capability. The final output layer is the predicted score \(\hat{y}_{ui}\), and training is performed by minimizing the pointwise loss between \(\hat{y}_{ui}\) and its target value \(y_{ui}\). We note that another way to train the model is by performing pairwise learning, such as using the Bayesian Personalized Ranking [27] and margin-based loss [33]. As the focus of the paper is on the neural network modelling part, we leave the extension to pairwise learning of NCF as future work.

Above the input layer sits the embedding layer, a fully connected layer that maps the sparse input representation to a dense vector. These embedded vectors can be seen as the latent vectors of the user (item). The embedding vectors are then fed into a multi-layer network, which finally outputs the predicted score. Each NCF layer can be customized to discover certain latent structures of user-item interactions. The dimension of the last hidden layer \(X\) determines the model's capability. The final output is the predicted score \(\hat{y}_{ui}\).
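As a minimal PyTorch sketch (the user/item counts and embedding size are assumptions), a one-hot input multiplied by a dense weight matrix is exactly an embedding lookup, so the embedding layer can be written with `nn.Embedding`:

```python
import torch
import torch.nn as nn

M, N, K = 1000, 2000, 32          # assumed user/item counts and embedding size

user_emb = nn.Embedding(M, K)     # equivalent to one-hot(u) @ P, with P in R^{M x K}
item_emb = nn.Embedding(N, K)     # equivalent to one-hot(i) @ Q, with Q in R^{N x K}

u = torch.tensor([3])             # a user id
i = torch.tensor([42])            # an item id
p_u, q_i = user_emb(u), item_emb(i)   # dense latent vectors fed to the NCF layers
```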

GitHub repo: https://github.com/hexiangnan/neural_collaborative_filtering

Predictive model (trained with a point-wise loss):
\[
\hat{y}_{ui} = f(\mathbf{P}^T \mathbf{v}_u^U, \mathbf{Q}^T \mathbf{v}_i^I \mid \mathbf{P}, \mathbf{Q}, \Theta_f),
\]
where \(f\) is formulated as the multi-layer network
\[
f(\mathbf{P}^T \mathbf{v}_u^U, \mathbf{Q}^T \mathbf{v}_i^I) = \phi_{out}(\phi_X(\ldots \phi_2(\phi_1(\mathbf{P}^T \mathbf{v}_u^U, \mathbf{Q}^T \mathbf{v}_i^I)) \ldots)).
\]

The NCF framework thus turns the task into a binary classification problem: \(y_{ui} = 1\) if the user and item have interacted, 0 otherwise, and the loss function is the log loss (binary cross-entropy).
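A minimal training-step sketch (the model, optimizer, and batch tensors are placeholders; the paper samples the negative instances from the unobserved interactions):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # the log loss over predicted probabilities

def train_step(model, optimizer, users, items, labels):
    """One point-wise update; labels is a float tensor of 1.0 for observed
    pairs and 0.0 for sampled negatives."""
    optimizer.zero_grad()
    y_hat = model(users, items)   # sigmoid outputs in (0, 1)
    loss = bce(y_hat, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```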

GMF applies a linear kernel to model the latent feature interactions; MLP uses a non-linear kernel to learn the interaction function from data.
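A minimal GMF sketch (sizes are assumptions): it generalizes MF by taking the element-wise product of the embeddings and passing it through a learned linear layer and a sigmoid.

```python
import torch
import torch.nn as nn

class GMF(nn.Module):
    def __init__(self, n_users, n_items, k):
        super().__init__()
        self.P = nn.Embedding(n_users, k)
        self.Q = nn.Embedding(n_items, k)
        self.h = nn.Linear(k, 1, bias=False)   # learned output weights h

    def forward(self, u, i):
        # y_hat = sigmoid(h^T (p_u * q_i)); with an identity activation and
        # h fixed to all-ones this reduces to plain MF.
        return torch.sigmoid(self.h(self.P(u) * self.Q(i))).squeeze(-1)
```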

Since MF has the problem above, the natural fix is to bring in a multi-layer perceptron: concatenate the user and item embeddings and feed them into an MLP. In the MF setting, these embeddings are exactly the latent vectors.

The MLP introduces non-linear transformations and can capture more complex feature combinations, so it should be able to learn a better model from the user and item latent vectors for estimating whether a user-item interaction exists.
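A minimal sketch of the MLP part (the embedding size and tower widths are assumptions; the paper uses ReLU activations in a tower pattern):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, n_users, n_items, k=32, hidden=(64, 32, 16)):
        super().__init__()
        self.P = nn.Embedding(n_users, k)
        self.Q = nn.Embedding(n_items, k)
        layers, d = [], 2 * k                  # input is the concatenation [p_u ; q_i]
        for h in hidden:                       # tower of ReLU layers
            layers += [nn.Linear(d, h), nn.ReLU()]
            d = h
        self.tower = nn.Sequential(*layers)
        self.out = nn.Linear(d, 1)

    def forward(self, u, i):
        x = torch.cat([self.P(u), self.Q(i)], dim=-1)
        return torch.sigmoid(self.out(self.tower(x))).squeeze(-1)
```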

Combining GMF and MLP
MF takes the inner product of the user and item latent vectors, so it is a linear model, while the MLP is non-linear. Combining linear and non-linear parts may well help, in the same spirit as Wide & Deep or LR + GBDT. Sharing the embeddings gives:

\[
\hat{y}_{ui} = \sigma\left(\mathbf{h}^T a\left(\mathbf{p}_u \odot \mathbf{q}_i + \mathbf{W} \begin{bmatrix} \mathbf{p}_u \\ \mathbf{q}_i \end{bmatrix} + \mathbf{b}\right)\right)
\]
However, sharing embeddings of GMF and MLP might limit the performance of the fused model. For example, it implies that GMF and MLP must use the same size of embeddings; for datasets where the optimal embedding size of the two models varies a lot, this solution may fail to obtain the optimal ensemble.

To provide more flexibility to the fused model, we allow GMF and MLP to learn separate embeddings, and combine the two models by concatenating their last hidden layer. Figure 3 illustrates our proposal, the formulation of which is given as follows:
\[
\begin{aligned}
\phi^{GMF} &= \mathbf{p}_u^G \odot \mathbf{q}_i^G, \\
\phi^{MLP} &= a_L\left(\mathbf{W}_L^T\left(a_{L-1}\left(\ldots a_2\left(\mathbf{W}_2^T \begin{bmatrix} \mathbf{p}_u^M \\ \mathbf{q}_i^M \end{bmatrix} + \mathbf{b}_2\right) \ldots\right)\right) + \mathbf{b}_L\right)
\end{aligned}
\]

\(\hat{y}_{ui} = \sigma\left(\mathbf{h}^T \begin{bmatrix} \phi^{GMF} \\ \phi^{MLP} \end{bmatrix}\right)\)

So the last-layer vectors of MLP and GMF are concatenated and handed to a logistic-regression-style output. In the figure above, GMF and MLP appear to share one embedding; in the paper, sharing the embedding forces GMF and MLP to use the same embedding size, and sharing the embedding layer may limit the fused model's performance. For example, it implies that GMF and MLP must use embeddings of the same size, so on datasets where the two models' optimal embedding sizes differ a lot, this solution may fail to reach the best ensemble. Learning separate embeddings can therefore give a better fusion.

Supplement: the cold-start problem

Cold-start problem: how to build a personalized recommender without much user data, so that users are satisfied with the recommendations and willing to keep using the system.
Categories:
(1) User cold start: how to make personalized recommendations for a new user.
(2) Item cold start: how to recommend a new item to users who may be interested in it; this is especially important on time-sensitive sites such as news portals.
(3) System cold start: how to provide personalized recommendations on a newly launched site, so that users experience personalization from the start, when there are no users yet and only some item information.

• Related reading:
Steps to resolve git commit code conflicts
侯小厨's latest technical explorations
Groovy study notes (master index)
Compilation failure: class file for org.apache.http.annotation.NotThreadSafe not found
Grafana graph error "parse_exception: Encountered..."
Vue internals study 6: node compilation and attribute traversal
thanks for everything
Spring Data MongoDB connection
windows docker lookup registry-1.docker.io on 192.168.65.5:53: no such host.
Two ways to validate antd forms
• Original article: https://www.cnblogs.com/gaowenxingxing/p/12864512.html