  • A personalized recommender system based on baseline estimates and stochastic gradient descent


    This article mainly covers Section 2.1 of Koren's 2008 paper [1]; the remaining sections will be added over time.

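    Koren's paper uses the Netflix data set, which is too large: a run on an ordinary PC takes a very long time. Since the main purpose of this article is to introduce and summarize the method, the MovieLens data set is used instead.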

    Variables used:

    Baseline estimates
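    Reconstructed from initialBias in the code and from Section 2.1 of [1], the baseline estimate of a rating is the overall mean plus a user bias and an item bias. Here $r_{ui}$ is the rating user $u$ gave item $i$, $\mu$ is the mean of all training ratings, $R(i)$ is the set of users who rated item $i$, and $R(u)$ is the set of items rated by user $u$. The constants 25 and 10 are the ones used in initialBias to set the starting values of the biases:

```latex
b_{ui} = \mu + b_u + b_i

b_i = \frac{\sum_{u \in R(i)} (r_{ui} - \mu)}{25 + |R(i)|}
\qquad
b_u = \frac{\sum_{i \in R(u)} (r_{ui} - \mu - b_i)}{10 + |R(u)|}
```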

         

    Objective function:
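    A reconstruction consistent with the sgd function in the code, using the notation of [1]: $r_{ui}$ is an observed rating, $\mu$ the mean training rating, $b_u$ and $b_i$ the user and item biases, $K$ the set of observed (user, item) pairs in the training set, and $\lambda$ the regularization coefficient (beta1 = 0.1 in the code):

```latex
\min_{b_*} \sum_{(u,i) \in K} (r_{ui} - \mu - b_u - b_i)^2
  + \lambda \Big( \sum_u b_u^2 + \sum_i b_i^2 \Big)
```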

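    Gradient update (stochastic gradient descent is used to drive the value of the objective function above as low as possible within the configured number of iterations):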
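    Written out as a sketch matching the update lines in sgd: for each training rating $r_{ui}$ the prediction error is computed first, then each bias moves along the negative gradient of the regularized squared error. $\gamma$ is the learning rate (alpha1 in the code) and $\lambda$ the regularization coefficient (beta1); the constant factor 2 from differentiating the square is absorbed into $\gamma$:

```latex
\begin{aligned}
e_{ui} &= r_{ui} - \mu - b_u - b_i \\
b_u &\leftarrow b_u + \gamma \, (e_{ui} - \lambda \, b_u) \\
b_i &\leftarrow b_i + \gamma \, (e_{ui} - \lambda \, b_i)
\end{aligned}
```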

    Evaluation criterion:
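    The code evaluates with the root mean squared error (RMSE), as computed by calRmse. Writing $T$ for the test set, $r_{ui}$ for a true rating, and $\hat{r}_{ui} = \mu + b_u + b_i$ for the predicted rating:

```latex
\mathrm{RMSE} = \sqrt{ \frac{1}{|T|} \sum_{(u,i) \in T} (\hat{r}_{ui} - r_{ui})^2 }
```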


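    Parameter settings:

    The number of iterations is maxStep = 100. In the code the learning rate starts at 0.002 and is multiplied by 0.99 after every iteration (slowRate), and the regularization coefficient is 0.1. The remaining parameter settings follow reference [2].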
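    To make the decay concrete, here is a minimal sketch of the learning-rate schedule implied by alpha1 = 0.002 and slowRate = 0.99 in the code. The function name learning_rate_schedule and its parameter names are hypothetical, introduced only for illustration:

```python
def learning_rate_schedule(alpha0=0.002, decay=0.99, max_step=100):
    """Return the learning rate used at each iteration: alpha0 * decay**step."""
    rates = []
    alpha = alpha0
    for _ in range(max_step):
        rates.append(alpha)   # rate used during this pass over the training set
        alpha *= decay        # shrink the rate before the next pass
    return rates


rates = learning_rate_schedule()
print(len(rates), rates[0], rates[-1])
```

    The rate shrinks geometrically, so later passes make smaller corrections to the biases than early ones.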

    Code implementation

    '''
    Created on Dec 11, 2012 
     
    @Author: Dennis Wu 
    @E-mail: hansel.zh@gmail.com 
    @Homepage: http://blog.csdn.net/wuzh670 
     
    Data set download from : http://www.grouplens.org/system/files/ml-100k.zip 
     
    '''  
    from math import sqrt
      
    def load_data():  
          
        train = {}  
        test = {}  
          
        filename_train = 'data/ua.base'  
        filename_test = 'data/ua.test'  
          
        # each line holds: user id, item id, rating, timestamp (tab-separated)
        with open(filename_train) as f:
            for line in f:
                userId, itemId, rating, timestamp = line.strip().split('\t')
                train.setdefault(userId, {})
                train[userId][itemId] = float(rating)

        with open(filename_test) as f:
            for line in f:
                userId, itemId, rating, timestamp = line.strip().split('\t')
                test.setdefault(userId, {})
                test[userId][itemId] = float(rating)
          
        return train, test  
      
    def calMean(train):  
        sta = 0  
        num = 0  
        for u in train.keys():  
            for i in train[u].keys():  
                sta += train[u][i]  
                num += 1  
        mean = sta*1.0/num  
        return mean  
      
    def initialBias(train, userNum, movieNum):  
      
        mean = calMean(train)  
        bu = {}  
        bi = {}  
        biNum = {}  
        buNum = {}  
          
        u = 1  
        while u < (userNum+1):  
            su = str(u)  
            for i in train[su].keys():  
                bi.setdefault(i,0)  
                biNum.setdefault(i,0)  
                bi[i] += (train[su][i] - mean)  
                biNum[i] += 1  
            u += 1  
              
        i = 1  
        while i < (movieNum+1):  
            si = str(i)  
            biNum.setdefault(si,0)  
            if biNum[si] >= 1:  
                bi[si] = bi[si]*1.0/(biNum[si]+25)  
            else:  
                bi[si] = 0.0  
            i += 1  
      
        u = 1  
        while u < (userNum+1):  
            su = str(u)  
            for i in train[su].keys():  
                bu.setdefault(su,0)  
                buNum.setdefault(su,0)  
                bu[su] += (train[su][i] - mean - bi[i])  
                buNum[su] += 1  
            u += 1  
              
        u = 1  
        while u < (userNum+1):  
            su = str(u)  
            buNum.setdefault(su,0)  
            if buNum[su] >= 1:  
                bu[su] = bu[su]*1.0/(buNum[su]+10)  
            else:  
                bu[su] = 0.0  
            u += 1  
      
        return bu,bi,mean  
      
    def sgd(train, test, userNum, movieNum):  
      
        bu, bi, mean = initialBias(train, userNum, movieNum)  
      
        alpha1 = 0.002  
        beta1 = 0.1  
        slowRate = 0.99  
        step = 0  
        preRmse = 1000000000.0  
        nowRmse = 0.0  
        while step < 100:  
            rmse = 0.0  
            n = 0  
            for u in train.keys():  
                for i in train[u].keys():  
                    pui = 1.0 * (mean + bu[u] + bi[i])  
                    eui = train[u][i] - pui  
                    rmse += pow(eui,2)  
                    n += 1  
                    bu[u] += alpha1 * (eui - beta1 * bu[u])  
                    bi[i] += alpha1 * (eui - beta1 * bi[i])  
      
            nowRmse = sqrt(rmse*1.0/n)  
            print('step: %d      Rmse: %s' % ((step+1), nowRmse))
            if (nowRmse < preRmse):  
                preRmse = nowRmse  
            alpha1 *= slowRate  
            step += 1  
        return bu, bi, mean  
      
    def calRmse(test, bu, bi, mean):  
          
        rmse = 0.0  
        n = 0  
        for u in test.keys():  
            for i in test[u].keys():  
                pui = 1.0 * (mean + bu[u] + bi[i])  
                eui = pui - test[u][i]  
                rmse += pow(eui,2)  
                n += 1  
        rmse = sqrt(rmse*1.0 / n)  
        return rmse
         
    if __name__ == "__main__":  
      
      
        # load data  
        train, test = load_data()  
          
        # baseline + stochastic gradient descent  
        bu, bi, mean = sgd(train, test, 943, 1682)  
          
        # compute the rmse of test set  
        print('the Rmse of test set is: %s' % calRmse(test, bu, bi, mean))
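
    As a usage sketch, the learned biases can be turned into a single rating prediction. The helper predict below is hypothetical and not part of the original script. Falling back to a zero bias for unseen users or items, and clipping to the 1 to 5 MovieLens rating scale, are assumptions of this sketch; the numbers are toy values, not experimental results:

```python
def predict(user, item, bu, bi, mean, lo=1.0, hi=5.0):
    """Baseline prediction mean + bu[user] + bi[item], clipped to [lo, hi].

    Unknown users or items contribute a bias of 0 (an assumption of this sketch).
    """
    score = mean + bu.get(user, 0.0) + bi.get(item, 0.0)
    return min(hi, max(lo, score))


# toy values for illustration only
bu = {'1': 0.3}
bi = {'10': -0.5}
mean = 3.5

print(predict('1', '10', bu, bi, mean))    # known user and item
print(predict('999', '10', bu, bi, mean))  # unseen user: only the item bias applies
```

    Keys are strings because load_data stores the ids exactly as they are read from the file.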

     Experimental results

     

    REFERENCES

     

    1. Y. Koren. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. Proc. 14th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'08), pp. 426–434, 2008.

    2. Y. Koren. The BellKor Solution to the Netflix Grand Prize, 2009.

     

     

     

     

  • Original article: https://www.cnblogs.com/gt123/p/3451802.html