  • Machine Learning in Action - Study Notes - Chapter 14

    1. Copy the code to F:\studio\MachineLearningInAction\ch14.

    2. Start IPython.

    3. In IPython, change the working directory to F:\studio\MachineLearningInAction\ch14:

    In [17]: cd F:\studio\MachineLearningInAction\ch14
    F:\studio\MachineLearningInAction\ch14

    4. In the working directory, create a new file svdRec.py and add the following code:

    from numpy import *
    from numpy import linalg as la
    
    def loadExData():
        return[[0, 0, 0, 2, 2],
               [0, 0, 0, 3, 3],
               [0, 0, 0, 1, 1],
               [1, 1, 1, 0, 0],
               [2, 2, 2, 0, 0],
               [5, 5, 5, 0, 0],
               [1, 1, 1, 0, 0]]

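    5. Run the SVD and verify the decomposition (the session assumes NumPy names such as linalg and mat are already in the namespace, for example via from numpy import *):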

    In [18]: import svdRec
    
    In [19]: Data=svdRec.loadExData()
    
    In [20]: U,Sigma,VT=linalg.svd(Data)
    
    In [21]: Sigma
    Out[21]:
    array([  9.64365076e+00,   5.29150262e+00,   9.11145502e-16,
             1.40456183e-16,   3.09084552e-17])
    
    In [22]: Sig2=mat([[Sigma[0],0],[0,Sigma[2]]])
    
    In [23]: Sig2
    Out[23]:
    matrix([[  9.64365076e+00,   0.00000000e+00],
            [  0.00000000e+00,   9.11145502e-16]])
    
    In [24]: Sig2=mat([[Sigma[0],0],[0,Sigma[1]]])
    
    In [25]: Sig2
    Out[25]:
    matrix([[ 9.64365076,  0.        ],
            [ 0.        ,  5.29150262]])
    
    In [26]: U[:,:2]*Sig2*VT[:2,:]
    Out[26]:
    matrix([[ -1.36157966e-16,  -8.59140046e-16,  -8.59140046e-16,
               2.00000000e+00,   2.00000000e+00],
            [  7.22982080e-16,  -3.61491040e-16,  -3.61491040e-16,
               3.00000000e+00,   3.00000000e+00],
            [  2.40994027e-16,  -1.20497013e-16,  -1.20497013e-16,
               1.00000000e+00,   1.00000000e+00],
            [  1.00000000e+00,   1.00000000e+00,   1.00000000e+00,
              -8.60707644e-18,  -8.60707644e-18],
            [  2.00000000e+00,   2.00000000e+00,   2.00000000e+00,
              -1.72141529e-17,  -1.72141529e-17],
            [  5.00000000e+00,   5.00000000e+00,   5.00000000e+00,
              -1.39716789e-16,  -1.39716789e-16],
            [  1.00000000e+00,   1.00000000e+00,   1.00000000e+00,
              -8.60707644e-18,  -8.60707644e-18]])

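    As the output shows, U[:,:2]*Sig2*VT[:2,:] is a very close approximation of the original Data matrix: the entries differ from it only by floating-point error on the order of 1e-16.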
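
    The check in step 5 can also be written as a short script. The following is only a minimal sketch, assuming Python 3 and plain NumPy arrays rather than the mat type used in the session; the names data, k and approx are illustrative and do not come from svdRec.py.

```python
import numpy as np

# Same 7x5 matrix that loadExData() returns in svdRec.py
data = np.array([[0, 0, 0, 2, 2],
                 [0, 0, 0, 3, 3],
                 [0, 0, 0, 1, 1],
                 [1, 1, 1, 0, 0],
                 [2, 2, 2, 0, 0],
                 [5, 5, 5, 0, 0],
                 [1, 1, 1, 0, 0]], dtype=float)

U, sigma, VT = np.linalg.svd(data)

# Keep only the two singular values that are not numerically zero
k = 2
approx = U[:, :k] @ np.diag(sigma[:k]) @ VT[:k, :]

# Compare the rank-2 product with the original, allowing for rounding error
print(np.allclose(approx, data))
```

    Using np.allclose avoids having to read the 1e-16 entries by eye.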

    6. Add the following code to svdRec.py:

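    def ecludSim(inA,inB):
        # Euclidean distance mapped into (0, 1]; identical vectors score 1.0
        return 1.0/(1.0 + la.norm(inA - inB))
    
    def pearsSim(inA,inB):
        # Pearson correlation rescaled from [-1, 1] to [0, 1]
        if len(inA) < 3 : return 1.0  # too few points to correlate
        return 0.5+0.5*corrcoef(inA, inB, rowvar = 0)[0][1]
    
    def cosSim(inA,inB):
        # Cosine similarity rescaled from [-1, 1] to [0, 1]
        num = float(inA.T*inB)
        denom = la.norm(inA)*la.norm(inB)
        return 0.5+0.5*(num/denom)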

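    The code above defines three similarity measures: Euclidean distance, Pearson correlation, and cosine similarity, each scaled so that a larger value means more similar.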
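
    To get a feel for the three measures, here is a small sketch that applies them to one pair of vectors. It assumes Python 3 and 1-D NumPy arrays instead of column matrices, and the function names euclid_sim, pearson_sim and cos_sim are my own renamings, not the ones in svdRec.py.

```python
import numpy as np

def euclid_sim(a, b):
    # 1 / (1 + distance): 1.0 for identical vectors, approaching 0 as they diverge
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def pearson_sim(a, b):
    # With fewer than 3 points the correlation is not meaningful; return 1.0
    if len(a) < 3:
        return 1.0
    return 0.5 + 0.5 * np.corrcoef(a, b)[0, 1]

def cos_sim(a, b):
    # Cosine of the angle between a and b, rescaled from [-1, 1] to [0, 1]
    return 0.5 + 0.5 * (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([4.0, 1.0])
b = np.array([2.0, 2.0])
print(euclid_sim(a, b))   # about 0.309017
print(pearson_sim(a, b))  # 1.0 (fewer than 3 points)
print(cos_sim(a, b))      # about 0.928746
```

    The same pair of vectors can give quite different scores depending on the measure, which is why the choice of simMeas changes the ranking in step 7.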

    7. Use the naive similarity-based recommender to generate recommendations:

    In [44]: reload(svdRec)
    Out[44]: <module 'svdRec' from 'svdRec.py'>
    
    In [45]: myMat=mat(svdRec.loadExData())
    
    In [46]: myMat
    Out[46]:
    matrix([[0, 0, 0, 2, 2],
            [0, 0, 0, 3, 3],
            [0, 0, 0, 1, 1],
            [1, 1, 1, 0, 0],
            [2, 2, 2, 0, 0],
            [5, 5, 5, 0, 0],
            [1, 1, 1, 0, 0]])
    
    In [47]: myMat[0,1]=myMat[0,0]=myMat[1,0]=myMat[2,0]=4
    
    In [48]: myMat[3,3]=2
    
    In [49]: myMat
    Out[49]:
    matrix([[4, 4, 0, 2, 2],
            [4, 0, 0, 3, 3],
            [4, 0, 0, 1, 1],
            [1, 1, 1, 2, 0],
            [2, 2, 2, 0, 0],
            [5, 5, 5, 0, 0],
            [1, 1, 1, 0, 0]])
    
    In [50]: svdRec.recommend(myMat,2)
    the 1 and 0 similarity is: 1.000000
    the 1 and 3 similarity is: 0.928746
    the 1 and 4 similarity is: 1.000000
    the 2 and 0 similarity is: 1.000000
    the 2 and 3 similarity is: 1.000000
    the 2 and 4 similarity is: 0.000000
    Out[50]: [(2, 2.5), (1, 2.0243290220056256)]
    
    
    In [53]: svdRec.recommend(myMat,2,simMeas=svdRec.ecludSim)
    the 1 and 0 similarity is: 1.000000
    the 1 and 3 similarity is: 0.309017
    the 1 and 4 similarity is: 0.333333
    the 2 and 0 similarity is: 1.000000
    the 2 and 3 similarity is: 0.500000
    the 2 and 4 similarity is: 0.000000
    Out[53]: [(2, 3.0), (1, 2.8266504712098603)]
    
    In [54]: svdRec.recommend(myMat,2,simMeas=svdRec.pearsSim)
    the 1 and 0 similarity is: 1.000000
    the 1 and 3 similarity is: 1.000000
    the 1 and 4 similarity is: 1.000000
    the 2 and 0 similarity is: 1.000000
    the 2 and 3 similarity is: 1.000000
    the 2 and 4 similarity is: 0.000000
    Out[54]: [(2, 2.5), (1, 2.0)]
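
    The session above calls svdRec.recommend, whose code is not listed in these notes. The following is a minimal sketch of an item-based estimator and recommender that reproduces the cosine-similarity result of In [50]. It assumes Python 3 and NumPy arrays; the names stand_est, recommend, cos_sim and their parameters are assumptions of mine and may differ from the actual svdRec.py. The per-pair print lines are omitted.

```python
import numpy as np

def cos_sim(a, b):
    # Cosine similarity rescaled to [0, 1]
    return 0.5 + 0.5 * (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def stand_est(data, user, sim_meas, item):
    # Estimate the user's rating of `item` as a similarity-weighted
    # average of the ratings the user has already given.
    sim_total = 0.0
    rat_sim_total = 0.0
    for j in range(data.shape[1]):
        user_rating = data[user, j]
        if user_rating == 0:
            continue  # the user has not rated item j
        # Rows (users) that rated both `item` and item j
        overlap = np.nonzero((data[:, item] > 0) & (data[:, j] > 0))[0]
        if len(overlap) == 0:
            similarity = 0.0
        else:
            similarity = sim_meas(data[overlap, item], data[overlap, j])
        sim_total += similarity
        rat_sim_total += similarity * user_rating
    return 0.0 if sim_total == 0 else rat_sim_total / sim_total

def recommend(data, user, n=3, sim_meas=cos_sim, est_method=stand_est):
    # Score every item the user has not rated and return the top n
    unrated = np.nonzero(data[user, :] == 0)[0]
    scores = [(int(item), float(est_method(data, user, sim_meas, item)))
              for item in unrated]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]

# The modified matrix from In [49]
my_mat = np.array([[4, 4, 0, 2, 2],
                   [4, 0, 0, 3, 3],
                   [4, 0, 0, 1, 1],
                   [1, 1, 1, 2, 0],
                   [2, 2, 2, 0, 0],
                   [5, 5, 5, 0, 0],
                   [1, 1, 1, 0, 0]], dtype=float)

print(recommend(my_mat, 2))
```

    For user 2 this sketch ranks item 2 (estimate 2.5) ahead of item 1 (estimate about 2.02), matching Out[50].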

    8. Use SVD to improve the recommendations

    Add the following code to svdRec.py:

    def loadExData2():
        return[[0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 5],
               [0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 3],
               [0, 0, 0, 0, 4, 0, 0, 1, 0, 4, 0],
               [3, 3, 4, 0, 0, 0, 0, 2, 2, 0, 0],
               [5, 4, 5, 0, 0, 0, 0, 5, 5, 0, 0],
               [0, 0, 0, 0, 5, 0, 1, 0, 0, 5, 0],
               [4, 3, 4, 0, 0, 0, 0, 5, 5, 0, 1],
               [0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 4],
               [0, 0, 0, 2, 0, 2, 5, 0, 0, 1, 2],
               [0, 0, 0, 0, 5, 0, 0, 0, 0, 4, 0],
               [1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0]]

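    The matrix above is fairly sparse. Now work out how many singular values are needed to keep 90% of its energy, that is, 90% of the sum of the squared singular values. In the session below, the first two squared values sum to about 378.8, short of the 487.8 threshold, while the first three sum to about 500.5, so three dimensions are enough.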

    In [57]: reload(svdRec)
    Out[57]: <module 'svdRec' from 'svdRec.py'>
    
    In [58]: U,Sigma,VT=la.svd(mat(svdRec.loadExData2()))
    
    In [59]: Sigma
    Out[59]:
    array([ 15.77075346,  11.40670395,  11.03044558,   4.84639758,
             3.09292055,   2.58097379,   1.00413543,   0.72817072,
             0.43800353,   0.22082113,   0.07367823])
    
    In [60]: mat(svdRec.loadExData2())
    Out[60]:
    matrix([[0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 5],
            [0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 3],
            [0, 0, 0, 0, 4, 0, 0, 1, 0, 4, 0],
            [3, 3, 4, 0, 0, 0, 0, 2, 2, 0, 0],
            [5, 4, 5, 0, 0, 0, 0, 5, 5, 0, 0],
            [0, 0, 0, 0, 5, 0, 1, 0, 0, 5, 0],
            [4, 3, 4, 0, 0, 0, 0, 5, 5, 0, 1],
            [0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 4],
            [0, 0, 0, 2, 0, 2, 5, 0, 0, 1, 2],
            [0, 0, 0, 0, 5, 0, 0, 0, 0, 4, 0],
            [1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0]])
    
    In [61]: Sig2=Sigma**2
    
    In [62]: Sig2
    Out[62]:
    array([  2.48716665e+02,   1.30112895e+02,   1.21670730e+02,
             2.34875695e+01,   9.56615756e+00,   6.66142570e+00,
             1.00828796e+00,   5.30232598e-01,   1.91847092e-01,
             4.87619735e-02,   5.42848136e-03])
    
    In [63]: sum(Sig2)
    Out[63]: 541.99999999999955
    
    In [64]: sum(Sig2)*0.9
    Out[64]: 487.79999999999961
    
    In [65]: sum(Sig2[:2])
    Out[65]: 378.82955951135784
    
    In [66]: sum(Sig2[:3])
    Out[66]: 500.50028912757921
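
    The manual steps In [61] to In [66] can be folded into one helper. This is only a sketch, assuming Python 3; the function name dims_for_energy and its fraction parameter are hypothetical and not part of svdRec.py.

```python
import numpy as np

def dims_for_energy(matrix, fraction=0.9):
    # Smallest k such that the first k squared singular values
    # hold at least `fraction` of the total energy.
    sigma = np.linalg.svd(matrix, compute_uv=False)
    energy = sigma ** 2
    cumulative = np.cumsum(energy)
    # Index of the first cumulative sum that reaches the threshold
    k = int(np.searchsorted(cumulative, fraction * energy.sum())) + 1
    return min(k, len(sigma))  # guard against rounding at fraction=1.0

# Same matrix that loadExData2() returns
data2 = np.array([[0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 5],
                  [0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 3],
                  [0, 0, 0, 0, 4, 0, 0, 1, 0, 4, 0],
                  [3, 3, 4, 0, 0, 0, 0, 2, 2, 0, 0],
                  [5, 4, 5, 0, 0, 0, 0, 5, 5, 0, 0],
                  [0, 0, 0, 0, 5, 0, 1, 0, 0, 5, 0],
                  [4, 3, 4, 0, 0, 0, 0, 5, 5, 0, 1],
                  [0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 4],
                  [0, 0, 0, 2, 0, 2, 5, 0, 0, 1, 2],
                  [0, 0, 0, 0, 5, 0, 0, 0, 0, 4, 0],
                  [1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0]], dtype=float)

print(dims_for_energy(data2))  # 3, as in the session above
```

    The total energy equals the sum of the squared entries of the matrix, which is why sum(Sig2) comes out as 542 up to rounding.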

    9. Estimate ratings with SVD:

    Add the following code to svdRec.py:

    def svdEst(dataMat, user, simMeas, item):
        n = shape(dataMat)[1]  # number of items
        simTotal = 0.0; ratSimTotal = 0.0
        U,Sigma,VT = la.svd(dataMat)
        Sig4 = mat(eye(4)*Sigma[:4]) #arrange Sig4 into a diagonal matrix
        xformedItems = dataMat.T * U[:,:4] * Sig4.I  #create transformed items
        for j in range(n):
            userRating = dataMat[user,j]
            # skip items the user has not rated, and the target item itself
            if userRating == 0 or j==item: continue
            similarity = simMeas(xformedItems[item,:].T,
                                 xformedItems[j,:].T)
            print('the %d and %d similarity is: %f' % (item, j, similarity))
            simTotal += similarity
            ratSimTotal += similarity * userRating
        if simTotal == 0: return 0
        else: return ratSimTotal/simTotal  # similarity-weighted average rating

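    It defines an SVD-based rating estimator: item similarities are computed in the reduced four-dimensional space rather than on the raw rating columns.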
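
    It may help to see what the transformed items actually are. Since dataMat = U * Sigma * VT, the product dataMat.T * U[:,:4] * Sig4.I works out to the first four columns of V, so each item is represented by a row of four coordinates. The sketch below checks this numerically; it assumes Python 3 and NumPy arrays, and the names data2, k and xformed are illustrative only.

```python
import numpy as np

# Same matrix that loadExData2() returns
data2 = np.array([[0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 5],
                  [0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 3],
                  [0, 0, 0, 0, 4, 0, 0, 1, 0, 4, 0],
                  [3, 3, 4, 0, 0, 0, 0, 2, 2, 0, 0],
                  [5, 4, 5, 0, 0, 0, 0, 5, 5, 0, 0],
                  [0, 0, 0, 0, 5, 0, 1, 0, 0, 5, 0],
                  [4, 3, 4, 0, 0, 0, 0, 5, 5, 0, 1],
                  [0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 4],
                  [0, 0, 0, 2, 0, 2, 5, 0, 0, 1, 2],
                  [0, 0, 0, 0, 5, 0, 0, 0, 0, 4, 0],
                  [1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0]], dtype=float)

U, sigma, VT = np.linalg.svd(data2)

# Same transform as in svdEst, written with arrays instead of matrices
k = 4
xformed = data2.T @ U[:, :k] @ np.linalg.inv(np.diag(sigma[:k]))

print(xformed.shape)                      # (11, 4): one row per item
print(np.allclose(xformed, VT[:k, :].T))  # compare with the first k columns of V
```

    Seen this way, svdEst compares items by their coordinates along the four strongest singular directions.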

    10. Test the results

    In [69]: myMat=mat(svdRec.loadExData2())
    
    In [70]: svdRec.recommend(myMat,1,estMethod=svdRec.svdEst)
    the 0 and 3 similarity is: 0.490950
    the 0 and 5 similarity is: 0.484274
    the 0 and 10 similarity is: 0.512755
    the 1 and 3 similarity is: 0.491294
    the 1 and 5 similarity is: 0.481516
    the 1 and 10 similarity is: 0.509709
    the 2 and 3 similarity is: 0.491573
    the 2 and 5 similarity is: 0.482346
    the 2 and 10 similarity is: 0.510584
    the 4 and 3 similarity is: 0.450495
    the 4 and 5 similarity is: 0.506795
    the 4 and 10 similarity is: 0.512896
    the 6 and 3 similarity is: 0.743699
    the 6 and 5 similarity is: 0.468366
    the 6 and 10 similarity is: 0.439465
    the 7 and 3 similarity is: 0.482175
    the 7 and 5 similarity is: 0.494716
    the 7 and 10 similarity is: 0.524970
    the 8 and 3 similarity is: 0.491307
    the 8 and 5 similarity is: 0.491228
    the 8 and 10 similarity is: 0.520290
    the 9 and 3 similarity is: 0.522379
    the 9 and 5 similarity is: 0.496130
    the 9 and 10 similarity is: 0.493617
    Out[70]: [(4, 3.3447149384692283), (7, 3.3294020724526971), (9, 3.3281008763900695)]
    
    In [71]: svdRec.recommend(myMat,1,estMethod=svdRec.svdEst,simMeas=svdRec.pearsSim)
    the 0 and 3 similarity is: 0.341942
    the 0 and 5 similarity is: 0.124132
    the 0 and 10 similarity is: 0.116698
    the 1 and 3 similarity is: 0.345560
    the 1 and 5 similarity is: 0.126456
    the 1 and 10 similarity is: 0.118892
    the 2 and 3 similarity is: 0.345149
    the 2 and 5 similarity is: 0.126190
    the 2 and 10 similarity is: 0.118640
    the 4 and 3 similarity is: 0.450126
    the 4 and 5 similarity is: 0.528504
    the 4 and 10 similarity is: 0.544647
    the 6 and 3 similarity is: 0.923822
    the 6 and 5 similarity is: 0.724840
    the 6 and 10 similarity is: 0.710896
    the 7 and 3 similarity is: 0.319482
    the 7 and 5 similarity is: 0.118324
    the 7 and 10 similarity is: 0.113370
    the 8 and 3 similarity is: 0.334910
    the 8 and 5 similarity is: 0.119673
    the 8 and 10 similarity is: 0.112497
    the 9 and 3 similarity is: 0.566918
    the 9 and 5 similarity is: 0.590049
    the 9 and 10 similarity is: 0.602380
    Out[71]: [(4, 3.3469521867021736), (9, 3.3353796573274703), (6, 3.307193027813037)]
    
    In [72]:
  • Original article: https://www.cnblogs.com/littlesuccess/p/5096559.html