  • Getting Started with the Collaborative Filtering (CF) Algorithm

    Data Wrangling

    First, read the rating data from ratings.dat into a DataFrame:

    >>> import numpy as np
    >>> import pandas as pd

    In [2]: import pandas as pd

    In [3]: df = pd.read_csv('2014-12-18.csv')

    In [4]: df.head()
    Out[4]:
         user_id    item_id  behavior_type user_geohash  item_category  hour
    0  100268421  284019855              1      95ridd7           1863    19
    1  109802727   56489946              1          NaN           8291    10
    2  109802727   56489946              1          NaN           8291    10
    3  109802727  266907147              1      99ctk96           9117
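
The pivot step that follows works on a `ratings` frame with `user_id`, `movie_id` and `rating` columns, which the session above never builds. Here is a minimal sketch of that load, assuming `ratings.dat` uses a MovieLens-style layout (`user_id::movie_id::rating::timestamp`, no header row). The column names and the three inline rows are illustrative assumptions, not data from the article.

```python
import io

import pandas as pd

# Assumed layout: user_id::movie_id::rating::timestamp, no header row.
# These three rows are invented stand-ins for the real file.
sample = io.StringIO(
    "1::10::5::1000000000\n"
    "1::20::3::1000000100\n"
    "2::10::4::1000000200\n"
)

column_names = ["user_id", "movie_id", "rating", "timestamp"]

# A multi-character separator such as "::" requires the Python parsing engine.
ratings = pd.read_csv(sample, sep="::", header=None,
                      names=column_names, engine="python")

print(ratings)
```

For the real file, pass the path `'ratings.dat'` in place of `sample`.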

     

    >>> data = ratings.pivot(index='user_id',columns='movie_id',values='rating')

    >>> data[:5]
    movie_id  1   2   3   4   5   6 
    user_id                                                                       
    1          5 NaN NaN NaN NaN NaN ...
    2        NaN NaN NaN NaN NaN NaN ...
    3        NaN NaN NaN NaN NaN NaN ...
    4        NaN NaN NaN NaN NaN NaN ...
    5        NaN NaN NaN NaN NaN   2 ...
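
To make the reshaping concrete, here is a small sketch on a toy `ratings` frame (values invented for illustration). `DataFrame.pivot` raises a `ValueError` when a (user_id, movie_id) pair appears more than once, so the sketch drops duplicates first. That deduplication step is an added assumption, not something the article does.

```python
import pandas as pd

# Toy long-format ratings: one row per (user, movie) pair.
ratings = pd.DataFrame({
    "user_id":  [1, 1, 2, 3, 3],
    "movie_id": [1, 2, 2, 1, 3],
    "rating":   [5, 3, 4, 2, 1],
})

# Guard against repeated (user, movie) pairs, which pivot() rejects.
ratings = ratings.drop_duplicates(subset=["user_id", "movie_id"], keep="last")

# Rows become users, columns become movies, missing ratings become NaN.
data = ratings.pivot(index="user_id", columns="movie_id", values="rating")

print(data)
print(data.count(axis=1))  # number of known ratings per user
```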
     

    >>> check_size = 1000

    >>> check = {}
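    >>> check_data = data.copy()  # work on a copy so the original data is not modified
    >>> check_data = check_data.loc[check_data.count(axis=1) > 200]  # drop users with 200 or fewer ratings
    >>> for user in np.random.permutation(check_data.index):
    ...     movie = np.random.permutation(check_data.loc[user].dropna().index)[0]  # pick one rated movie at random
    ...     check[(user, movie)] = check_data.loc[user, movie]  # remember the true rating
    ...     check_data.loc[user, movie] = np.nan  # hide it from the data used for correlations
    ...     check_size -= 1
    ...     if not check_size:
    ...         break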
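
The hold-out loop can also be wrapped in a function. The following is a sketch rather than the author's code: the name `hold_out_ratings`, its parameters, and the seeded `numpy.random.default_rng` generator are assumptions added here for reproducibility, and the toy matrix uses a much lower threshold than 200.

```python
import numpy as np
import pandas as pd


def hold_out_ratings(data, min_ratings, n_samples, seed=0):
    """Hide one random known rating per user, for up to n_samples users.

    Returns (train, held_out): train is a copy of the eligible rows with the
    sampled cells set to NaN, held_out maps (user, movie) -> true rating.
    """
    rng = np.random.default_rng(seed)
    # Keep only users with more than min_ratings known ratings.
    train = data.loc[data.count(axis=1) > min_ratings].copy()
    held_out = {}
    for user in rng.permutation(train.index)[:n_samples]:
        rated = train.loc[user].dropna().index
        movie = rng.choice(rated)
        held_out[(user, movie)] = train.loc[user, movie]
        train.loc[user, movie] = np.nan
    return train, pd.Series(held_out, dtype=float)


# Toy user x movie matrix (values invented for illustration).
toy = pd.DataFrame(
    [[5, 3, np.nan, 1],
     [4, np.nan, np.nan, 1],
     [1, 1, 5, 4],
     [np.nan, np.nan, np.nan, 2]],
    index=pd.Index([1, 2, 3, 4], name="user_id"),
    columns=pd.Index([10, 20, 30, 40], name="movie_id"),
)

train, held_out = hold_out_ratings(toy, min_ratings=1, n_samples=2)
print(held_out)
```

Filtering before copying leaves the input frame untouched, which matches the intent of `data.copy()` in the session above.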
     
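    >>> corr = check_data.T.corr(min_periods=200)  # user-user Pearson correlation; pairs with fewer than 200 co-rated movies become NaN
    >>> corr_clean = corr.dropna(how='all')
    >>> corr_clean = corr_clean.dropna(axis=1, how='all')  # drop rows and columns that are entirely NaN
    >>> check_ser = pd.Series(check)  # the 1000 true ratings that were held out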
    >>> check_ser[:5]
    (15593)     4
    (23555)     3
    (333363)    4
    (362355)    5
    (533605)    4
    dtype: float64
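
The article stops once the held-out ratings are collected. One common way to use them, sketched here under assumptions rather than taken from the source, is to predict each hidden rating as a similarity-weighted average of the ratings that positively correlated users gave the same movie, then compare the predictions with the true values. The function name `predict_rating`, the positive-similarity filter, and the toy numbers are all illustrative.

```python
import numpy as np
import pandas as pd


def predict_rating(train, corr, user, movie):
    """Similarity-weighted mean of other users' ratings for one movie.

    Returns NaN when no positively correlated user has rated the movie.
    """
    # Ratings of this movie by everyone except the target user.
    others = train[movie].dropna().drop(index=user, errors="ignore")
    # Similarities between the target user and those raters.
    sims = corr.loc[user].reindex(others.index).dropna()
    sims = sims[sims > 0]
    if sims.empty:
        return np.nan
    return float((others.loc[sims.index] * sims).sum() / sims.sum())


# Toy user x movie matrix; user 1's rating of movie 40 has been hidden.
train = pd.DataFrame(
    {10: [5.0, 4.0, 1.0, 5.0],
     20: [3.0, 3.0, 5.0, 2.0],
     30: [4.0, 5.0, 1.0, 4.0],
     40: [np.nan, 5.0, 1.0, 4.0]},
    index=pd.Index([1, 2, 3, 4], name="user_id"),
)
held_out = pd.Series({(1, 40): 5.0})  # the hidden true rating

# User-user correlation, requiring at least 3 co-rated movies per pair.
corr = train.T.corr(min_periods=3)

predicted = pd.Series(
    [predict_rating(train, corr, u, m) for u, m in held_out.index],
    index=held_out.index,
)
mae = (predicted - held_out).abs().mean()  # mean absolute error

print(predicted)
print("MAE:", mae)
```

With the article's data, `train` would correspond to `check_data`, `corr` to `corr_clean`, and `held_out` to `check_ser`.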
     

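    References:

    Collaborative-filtering-based recommendation in Python (Python 基于协同过滤的推荐)

    Climbing the Kaggle MNIST leaderboard with Python's theano library (利用python的theano库刷kaggle mnist排行榜)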

  • Original article: https://www.cnblogs.com/jkmiao/p/4443968.html