AFM Paper: A Close Reading

A brief review of AFM, from "Applications of Deep Learning in Recommender Systems (Part 2)"

The AFM model (Attentional Factorization Machine)
- Original paper
  Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks
- Model architecture
- Model formulation
\[ \hat{y}_{AFM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + p^T \sum_{i=1}^{n} \sum_{j=i+1}^{n} a_{ij} (v_i \odot v_j) x_i x_j \]
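- Key idea
  Compared with FM, AFM adds attention-based pooling: the learned attention weights indicate how important each pair-wise feature interaction is.
- Implementations
  https://github.com/hexiangnan/attentional_factorization_machine
  Blog post "Recommender Systems Meet Deep Learning (8): AFM Theory and Practice" (推荐系统遇上深度学习(八)--AFM模型理论和实践)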
Derivation

\[ \text{Set of (non-zero) features: } \chi \]
\[ \text{Embedding output of the (non-zero) features: } \varepsilon = \{ v_i x_i \}_{i \in \chi} \]
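To make the embedding step concrete, here is a minimal Python sketch. It assumes a sparse input stored as a dict `{feature_index: value}` and an embedding table `V` given as a list of k-dimensional lists; the function name `embed` and all numbers are hypothetical, chosen only for illustration.

```python
from typing import Dict, List

def embed(x: Dict[int, float], V: List[List[float]]) -> Dict[int, List[float]]:
    """Embedding layer: return {i: v_i * x_i} for every non-zero feature i (the set chi)."""
    return {i: [v * xi for v in V[i]] for i, xi in x.items() if xi != 0.0}

# Toy embedding table: n = 4 features, embedding size k = 2 (illustrative values).
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4]]
# Sparse input: feature 3 is stored with value 0.0, so it is not in chi.
x = {0: 1.0, 2: 1.0, 3: 0.0}
eps = embed(x, V)
print(eps)  # {0: [0.1, 0.2], 2: [0.0, 0.5]}
```

Only features in χ produce an embedding, which is why the later layers only ever see the m non-zero features.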
The FM model:

\[ \hat{y}_{FM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \hat{w}_{ij} x_i x_j, \quad \hat{w}_{ij} = v_i^T v_j \tag{1} \]
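A direct, unoptimized Python sketch of Eq. (1), with the factorized weight ŵ_ij = v_iᵀv_j. The names `dot` and `fm_predict` and the toy values are illustrative assumptions; practical FM code usually uses the O(kn) reformulation of the pair-wise sum instead of this O(kn²) double loop.

```python
from typing import List

def dot(a: List[float], b: List[float]) -> float:
    """Inner product <a, b>."""
    return sum(p * q for p, q in zip(a, b))

def fm_predict(x: List[float], w0: float, w: List[float], V: List[List[float]]) -> float:
    """Eq. (1): w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j (naive O(k n^2) loop)."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            y += dot(V[i], V[j]) * x[i] * x[j]  # w_hat_ij = v_i^T v_j
    return y

# Toy example (illustrative values): n = 3, k = 2.
x = [1.0, 0.0, 1.0]
y = fm_predict(x, w0=0.5, w=[0.1, 0.2, 0.3], V=[[1.0, 0.0], [0.5, 0.5], [1.0, 2.0]])
print(round(y, 6))  # 1.9
```

Every interaction ŵ_ij enters the sum with the same weight of 1; this is exactly what AFM replaces with a learned a_ij.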
Pair-wise interaction layer (it expands m vectors into m(m − 1)/2 interacted vectors, where m = |χ| is the number of non-zero features):

\[ f_{PI}(\varepsilon) = \{ (v_i \odot v_j) x_i x_j \}_{(i,j) \in R_x} \tag{2} \]
\[ \text{where } R_x = \{ (i,j) \}_{i \in \chi,\ j \in \chi,\ j > i} \]
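The pair-wise interaction layer of Eq. (2) can be sketched in a few lines of Python. This assumes the same hypothetical sparse-dict input and list-of-lists embedding table as above; `pairwise_interactions` is an illustrative name. `itertools.combinations` over the sorted non-zero indices yields exactly the pairs (i, j) with j > i, so the output has m(m − 1)/2 entries.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def pairwise_interactions(x: Dict[int, float],
                          V: List[List[float]]) -> Dict[Tuple[int, int], List[float]]:
    """Eq. (2): {(v_i ⊙ v_j) x_i x_j} for every (i, j) in R_x, i.e. i, j in chi with j > i."""
    chi = sorted(i for i, xi in x.items() if xi != 0.0)
    return {
        (i, j): [vi * vj * x[i] * x[j] for vi, vj in zip(V[i], V[j])]  # element-wise product
        for i, j in combinations(chi, 2)  # combinations of a sorted list gives i < j
    }

V = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0], [2.0, 2.0]]  # toy embeddings, k = 2
x = {0: 1.0, 1: 1.0, 3: 2.0}  # m = 3 non-zero features
pi = pairwise_interactions(x, V)
print(pi)  # {(0, 1): [3.0, 8.0], (0, 3): [4.0, 8.0], (1, 3): [12.0, 16.0]}
```

Unlike FM, the element-wise product keeps a k-dimensional vector per pair instead of collapsing it to a scalar, which is what lets the attention network score each interaction.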
The attention network is defined as:

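\[ a'_{ij} = h^T \mathrm{ReLU}\left( W (v_i \odot v_j) x_i x_j + b \right), \qquad a_{ij} = \frac{\exp(a'_{ij})}{\sum_{(i,j) \in R_x} \exp(a'_{ij})} \tag{5} \]
\[ \text{where } W \in \mathbb{R}^{t \times k},\ b \in \mathbb{R}^{t},\ h \in \mathbb{R}^{t};\ t \text{ is the hidden-layer size of the attention network (the attention factor) and } k \text{ is the embedding size.} \]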
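A minimal Python sketch of Eq. (5), assuming the interaction vectors are passed in as a dict keyed by (i, j) and that W is a list of t rows of length k. The function name and the toy weights are hypothetical. Subtracting the maximum score before exponentiating is a standard numerical-stability trick and does not change the softmax result.

```python
import math
from typing import Dict, List, Tuple

Pair = Tuple[int, int]

def attention_weights(inter: Dict[Pair, List[float]], W: List[List[float]],
                      b: List[float], h: List[float]) -> Dict[Pair, float]:
    """Eq. (5): a'_ij = h^T ReLU(W e_ij + b), then a softmax over R_x.

    inter maps (i, j) to e_ij = (v_i ⊙ v_j) x_i x_j (length k); W is t x k, b and h have length t.
    """
    if not inter:
        return {}
    scores = {}
    for key, e in inter.items():
        hidden = [max(0.0, sum(w_rc * e_c for w_rc, e_c in zip(W_r, e)) + b_r)
                  for W_r, b_r in zip(W, b)]  # ReLU(W e + b), length t
        scores[key] = sum(h_r * z_r for h_r, z_r in zip(h, hidden))  # h^T(...)
    top = max(scores.values())  # shift by the max score: softmax unchanged, exp cannot overflow
    exps = {key: math.exp(s - top) for key, s in scores.items()}
    total = sum(exps.values())
    return {key: v / total for key, v in exps.items()}

# Toy attention network (illustrative values): t = 2, k = 2.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
h = [1.0, 1.0]
a = attention_weights({(0, 1): [1.0, 1.0], (0, 2): [0.0, 0.0]}, W, b, h)
print({key: round(v, 4) for key, v in a.items()})  # {(0, 1): 0.8808, (0, 2): 0.1192}
```

Because the weights are normalized over R_x, they always sum to 1 for a given input, so a_ij can be read as the relative importance of each interaction for that sample.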
Putting it all together, the AFM model is:

\[ \hat{y}_{AFM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + p^T \sum_{i=1}^{n} \sum_{j=i+1}^{n} a_{ij} (v_i \odot v_j) x_i x_j \]
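The full forward pass can be sketched end to end in plain Python. This is an illustrative, unoptimized sketch assuming a sparse dict input and list-based parameters; `afm_predict` is a hypothetical name, not the API of the authors' code. The usage at the bottom checks one property that follows from the formula: with h = 0 every a'_ij is equal, so a_ij = 1/|R_x|, and with p set to all ones p^T(v_i ⊙ v_j) = v_iᵀv_j, so the interaction term becomes the FM pair-wise term divided by |R_x|.

```python
import math
from itertools import combinations
from typing import Dict, List

def afm_predict(x: Dict[int, float], w0: float, w: List[float], V: List[List[float]],
                W: List[List[float]], b: List[float], h: List[float], p: List[float]) -> float:
    """y_hat_AFM(x) = w0 + sum_i w_i x_i + p^T sum_{(i,j) in R_x} a_ij (v_i ⊙ v_j) x_i x_j."""
    y = w0 + sum(w[i] * xi for i, xi in x.items())  # linear part
    chi = sorted(i for i, xi in x.items() if xi != 0.0)
    pairs = list(combinations(chi, 2))  # R_x
    if not pairs:  # fewer than two non-zero features: no interaction term
        return y
    # Pair-wise interaction layer, Eq. (2)
    inter = [[vi * vj * x[i] * x[j] for vi, vj in zip(V[i], V[j])] for i, j in pairs]
    # Attention scores a'_ij = h^T ReLU(W e + b), Eq. (5)
    scores = []
    for e in inter:
        hidden = [max(0.0, sum(wc * ec for wc, ec in zip(W_r, e)) + b_r)
                  for W_r, b_r in zip(W, b)]
        scores.append(sum(hr * zr for hr, zr in zip(h, hidden)))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    # Attention-based pooling: sum over R_x of a_ij * e_ij, a k-dimensional vector
    pooled = [0.0] * len(p)
    for ex, e in zip(exps, inter):
        for c, ec in enumerate(e):
            pooled[c] += (ex / total) * ec
    return y + sum(pc * qc for pc, qc in zip(p, pooled))  # projection p^T

# Sanity check (illustrative values): h = 0 gives uniform a_ij = 1/3 over three pairs.
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = afm_predict({0: 1.0, 1: 1.0, 2: 1.0}, w0=0.0, w=[0.0, 0.0, 0.0], V=V,
                W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0], h=[0.0, 0.0], p=[1.0, 1.0])
print(round(y, 6))  # 0.666667  (FM pair-wise sum 0 + 1 + 1 = 2, divided by |R_x| = 3)
```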
The model's parameter set:

\[ \Theta = \{ w_0, \{w_i\}_{i=1}^{n}, \{v_i\}_{i=1}^{n}, p, W, b, h \} \]
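Counting the entries of Θ gives |Θ| = 1 + n + nk for the FM part plus k + tk + 2t for p, W, b and h. A small Python helper (hypothetical name, toy sizes) makes the arithmetic explicit and shows that the attention part does not grow with the number of features n.

```python
def afm_param_count(n: int, k: int, t: int) -> int:
    """|Theta| for AFM: w0, w (n), V (n x k), p (k), W (t x k), b (t), h (t)."""
    fm_part = 1 + n + n * k              # same parameters as FM
    attention_part = k + t * k + 2 * t   # p, W, b, h added by AFM
    return fm_part + attention_part

print(afm_param_count(n=10, k=8, t=4))  # 139
```

For realistic sparse data n is huge, so the overhead of AFM over FM (k + tk + 2t) is negligible compared with the nk embedding parameters.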
Key points from the paper
- We point out that in these methods (e.g., Wide&Deep, DeepCross), feature interactions are implicitly captured by a deep neural network, rather than FM that explicitly models each interaction as the inner product of two features. As such, these deep methods are not interpretable, as the contribution of each feature interaction is unknown. By directly extending FM with the attention mechanism that learns the importance of each feature interaction, our AFM is more interpretable and empirically demonstrates superior performance over Wide&Deep and DeepCross.
- RQ1 How do the key hyper-parameters of AFM (i.e., dropout on feature interactions and regularization on the attention network) impact its performance?
  Tune the dropout rate and the L2 regularization coefficient separately on the public datasets.
- RQ2 Can the attention network effectively learn the importance of feature interactions?
  Compare training only the embeddings with training only the attention network.
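- RQ3 How does AFM perform as compared to the state-of-the-art methods for sparse data prediction?
  Compare the number of parameters and the loss on the public datasets: AFM achieves a lower loss with fewer parameters than the deep baselines.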
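The two hyper-parameters studied in RQ1 can be illustrated with a short Python sketch. It assumes element-wise inverted dropout on the pair-wise interaction vectors and a squared-error objective with an L2 penalty λ‖W‖² on the attention network's weight matrix W; the helper names are hypothetical, and the authors' released code may place dropout differently.

```python
import random
from typing import List

def interaction_dropout(inter: List[List[float]], rate: float, rng: random.Random,
                        training: bool = True) -> List[List[float]]:
    """Inverted dropout on the pair-wise interaction vectors (element-wise; an assumption).

    Each element is kept with probability 1 - rate and scaled by 1 / (1 - rate), so the
    input is returned unchanged at inference time without any rescaling.
    """
    if not training or rate == 0.0:
        return inter
    keep = 1.0 - rate
    return [[v / keep if rng.random() < keep else 0.0 for v in e] for e in inter]

def regularized_squared_loss(y_pred: List[float], y_true: List[float],
                             W: List[List[float]], lam: float) -> float:
    """Squared error plus lam * ||W||^2 on the attention-network weight matrix W."""
    sq_err = sum((yp - yt) ** 2 for yp, yt in zip(y_pred, y_true))
    l2 = sum(w * w for row in W for w in row)
    return sq_err + lam * l2

rng = random.Random(0)
dropped = interaction_dropout([[1.0, 2.0], [3.0, 4.0]], rate=0.5, rng=rng)
print(dropped)  # each element is either 0.0 or doubled
print(regularized_squared_loss([1.0, 2.0], [1.5, 2.0], W=[[1.0, 2.0]], lam=0.1))  # 0.75
```

Tuning the dropout rate and λ then amounts to sweeping `rate` and `lam` and comparing validation loss, which is the experiment RQ1 describes.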

Original article: https://www.cnblogs.com/arachis/p/AFM_detail.html
