  • Machine Learning: Recommender Systems

    Author: 大树 (Shenzhen)
    Updated: 2018.02.08

    email:59888745@qq.com

    Note: there is a lot of material to cover, so the xxx study summary will be updated over time.

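    Technical architecture

    1. Process the content data, user data, and behavior data: format, clean, and merge them.
    2. Build the recommender system according to business rules: content profiles, user profiles, and behavior profiles.
    3. Use those profiles to generate recommendations: related recommendations, personalized recommendations, popular recommendations, and so on.
    4. Recommendations can take several forms: similarity-based, related content, friend suggestions, and ranking-based.

    The core algorithms are similarity computation (for example with the Euclidean distance formula) and ranking.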
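Step 1 above (format, clean, merge) can be sketched as follows. This is only an illustration under stated assumptions: the raw records, the `build_prefs` helper, and the 0 to 5 rating scale are hypothetical and do not come from the original post. The result has the same nested `{user: {item: rating}}` shape as the `critics` dictionary used later.

```python
from collections import defaultdict

# Hypothetical raw behavior log: (user, item, rating) records, some of them dirty.
raw_records = [
    ("dennychen", "Snakes on a Plane", "3.5"),
    ("dennychen", " Snakes on a Plane ", "4.5"),  # repeated rating with stray whitespace
    ("alexye", "Superman Returns", "5.0"),
    ("alexye", "Just My Luck", "n/a"),            # rating that cannot be parsed
    ("", "Superman Returns", "4.0"),              # missing user
    ("Toby", "Superman Returns", "9.0"),          # outside the assumed 0-5 scale
]

def build_prefs(records, lo=0.0, hi=5.0):
    """Format, clean, and merge raw records into {user: {item: rating}}."""
    collected = defaultdict(lambda: defaultdict(list))
    for user, item, rating in records:
        user, item = user.strip(), item.strip()   # format: normalize whitespace
        try:
            value = float(rating)
        except ValueError:
            continue                              # clean: drop unparseable ratings
        if not user or not item or not lo <= value <= hi:
            continue                              # clean: drop incomplete or out-of-range rows
        collected[user][item].append(value)
    # merge: average repeated ratings of the same item by the same user
    return {
        user: {item: sum(vals) / len(vals) for item, vals in items.items()}
        for user, items in collected.items()
    }

prefs = build_prefs(raw_records)
print(prefs)
```

Averaging repeated ratings is just one possible merge rule; keeping the most recent rating would be an equally reasonable choice.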

     

    dennychen in shenzhen

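    1. Making recommendations

       1. Collaborative filtering

       2. Collecting preferences

       3. Finding similar users

       4. Recommending items, either by user similarity or by item ranking

       5. Matching products

       6. Building the recommender system

       7. Item-based filtering

       8. Using a dataset

       9. Choosing between user-based and item-based filtering

    2. Computing user similarity: Euclidean distance and Pearson correlation

    3. Computing the similarity between two people: a naive recommender would simply suggest the works with the highest average ratings; to account for how closely two people's tastes match, each rating is instead weighted by that similarity before averaging.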
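The difference between a plain average and a similarity-weighted average can be seen in a small worked example. The similarity values and ratings below are made up for illustration and are not taken from the dataset in this post.

```python
# Hypothetical similarities between a target user and three other users,
# and those users' ratings of one movie the target user has not seen.
similarities = {"alexye": 0.9, "jackfan": 0.5, "antyonywang": 0.1}
ratings = {"alexye": 2.0, "jackfan": 4.0, "antyonywang": 5.0}

# Plain average: every rater counts equally.
plain_average = sum(ratings.values()) / len(ratings)

# Weighted average: each rating is multiplied by the rater's similarity,
# and the total is divided by the sum of the similarities.
weighted_total = sum(similarities[u] * ratings[u] for u in ratings)
weighted_average = weighted_total / sum(similarities[u] for u in ratings)

print(round(plain_average, 3))     # 3.667
print(round(weighted_average, 3))  # 2.867
```

The most similar user disliked the movie, so the weighted score ends up well below the plain average.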

    In [ ]:
     
    from math import sqrt
    
    critics={'dennychen': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
     'tomastang': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
     'The Night Listener': 3.0},
    'alexye': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
     'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
     'You, Me and Dupree': 3.5},
    'Michaelzhou': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
     'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'josephtcheng': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
     'The Night Listener': 4.5, 'Superman Returns': 4.0,
     'You, Me and Dupree': 2.5},
    'antyonywang': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
     'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
     'You, Me and Dupree': 2.0},
    'jackfan': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
     'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}
    
    print(critics['dennychen']['Lady in the Water'])
    print(critics['alexye']['Lady in the Water'])
    
    In [37]:
    from math import sqrt
    
    # Euclidean-distance score: measures how similar two people are;
    # the closer the value is to 1, the more similar they are
    def sim_distance(prefs, person1, person2):
        # Items rated by both people
        a = [item for item in prefs[person1] if item in prefs[person2]]
        # Nothing in common: similarity is 0
        if len(a) == 0:
            return 0
        print('a', a)
        # Sum of the squared rating differences over the shared items
        sum_of_squares = sum(pow(prefs[person1][item] - prefs[person2][item], 2) for item in a)
        print('sum_of_squares', sum_of_squares)
        return 1 / (1 + sqrt(sum_of_squares))
    
    print(sim_distance(critics, 'dennychen', 'Michaelzhou'))
    print(sim_distance(critics, 'dennychen', 'alexye'))
    
     
    a ['Lady in the Water', 'Snakes on a Plane', 'Superman Returns', 'The Night Listener']
    sum_of_squares 1.25
    0.4721359549995794
    a ['Lady in the Water', 'Snakes on a Plane', 'Superman Returns', 'You, Me and Dupree', 'The Night Listener']
    sum_of_squares 3.5
    0.3483314773547883
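Written as a formula, the score printed above is as follows. The symbols are introduced here for illustration: S is the set of items both people rated, and r_{p,i} is person p's rating of item i. The second line redoes the arithmetic for dennychen and Michaelzhou from the `critics` data.

```latex
\mathrm{sim}(p_1, p_2) = \frac{1}{1 + \sqrt{\sum_{i \in S} \left( r_{p_1,i} - r_{p_2,i} \right)^2}}

\sum_{i \in S} \left( r_{p_1,i} - r_{p_2,i} \right)^2 = 0^2 + 0.5^2 + 0^2 + (-1)^2 = 1.25,
\qquad
\mathrm{sim} = \frac{1}{1 + \sqrt{1.25}} \approx 0.4721
```

Identical ratings give a distance of 0 and a score of 1; the score falls toward 0 as the ratings diverge.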
    
    In [38]:
    sim_pearson(critics, 'dennychen', 'alexye')
    
    Out[38]:
    0.6085806194501843
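As a cross-check of the value above, Pearson's r can also be computed from deviations around each person's mean rating. The `pearson_centered` helper is an illustrative name, not part of the original notebook; the two lists hold the ratings of the five movies that both dennychen and alexye rated.

```python
from math import sqrt

# Shared movies, in order: Lady in the Water, Snakes on a Plane,
# Superman Returns, You, Me and Dupree, The Night Listener
x = [2.5, 3.5, 3.5, 2.5, 3.0]   # dennychen
y = [3.0, 3.5, 5.0, 3.5, 3.0]   # alexye

def pearson_centered(x, y):
    """Pearson's r computed from deviations around the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    den = sqrt(var_x * var_y)
    # A constant rating vector has zero variance; treat that as "no correlation".
    return cov / den if den else 0.0

r = pearson_centered(x, y)
print(r)  # approximately 0.6086
```

Unlike the Euclidean score, this measure is not pulled down when one person consistently rates higher than the other, because each rating is compared against that person's own mean.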
    In [32]:
    from math import sqrt
    
    # Pearson correlation score
    def sim_pearson(prefs, person1, person2):
        # Items rated by both people
        si = {}
        for item in prefs[person1]:
            if item in prefs[person2]:
                si[item] = 1

        n = len(si)
        # Nothing in common: return 0 (no evidence of similarity)
        if n == 0:
            return 0

        # Sum of each person's ratings over the shared items
        sum1 = sum([prefs[person1][item] for item in si])
        sum2 = sum([prefs[person2][item] for item in si])

        # Sum of the squares of each person's ratings
        sum1_square = sum([pow(prefs[person1][item], 2) for item in si])
        sum2_square = sum([pow(prefs[person2][item], 2) for item in si])

        # Sum of the products of the paired ratings
        sum_square = sum([prefs[person1][item] * prefs[person2][item] for item in si])

        # Pearson correlation coefficient
        den = sqrt((sum1_square - pow(sum1, 2) / n) * (sum2_square - pow(sum2, 2) / n))
        if den == 0:
            return 0

        return (sum_square - (sum1 * sum2 / n)) / den



    # Return the n people most similar to person
    def topMatches(prefs, person, n=5, similarity=sim_pearson):
        scores = [(similarity(prefs, person, other), other) for other in prefs if other != person]

        # Sort the list so that the highest scores come first
        scores.sort()
        print('scores:', scores)
        scores.reverse()
        # Take the top n (no need to check n against the list length:
        # a slice tolerates an end index that is out of range)
        return scores[0:n]



    # Recommend items to a user from a weighted average of everyone else's ratings
    def get_recommendations(prefs, person, similarity=sim_pearson):
        # Per movie: sum of the other users' ratings, each weighted by similarity
        totals = {}
        # Per movie: sum of the similarities of the users who rated it
        sim_sums = {}
        for other in prefs:
            # Do not compare the user with themselves
            if other == person:
                continue

            # Compute the similarity
            sim = similarity(prefs, person, other)
            # Ignore similarities less than or equal to 0
            if sim <= 0:
                continue

            # Accumulate the weighted ratings of every movie that other has rated
            for item in prefs[other]:
                # Only recommend movies the user has not seen yet
                if item not in prefs[person] or prefs[person][item] == 0:
                    # Weighted sum of ratings for this movie
                    totals.setdefault(item, 0)
                    totals[item] += prefs[other][item] * sim
                    # Sum of the similarities for this movie
                    sim_sums.setdefault(item, 0)
                    sim_sums[item] += sim

        # Divide each weighted total by the similarity sum to get a weighted average
        rankings = [(total / sim_sums[item], item) for item, total in totals.items()]

        # Return the list sorted from highest to lowest score
        rankings.sort()
        rankings.reverse()
        return rankings

    sim_distance(critics, 'dennychen', 'Michaelzhou')
    # sim_pearson(critics, 'dennychen', 'alexye')
    topMatches(critics, 'dennychen', n=3)

    # get_recommendations(critics, 'Toby')
    # get_recommendations(critics, 'Toby', similarity=sim_distance)
     
    a ['Lady in the Water', 'Snakes on a Plane', 'Superman Returns', 'The Night Listener']
    sum_of_squares 1.25
    scores: [(0.40451991747794525, 'Michaelzhou'), (0.5606119105813882, 'josephtcheng'), (0.6085806194501843, 'alexye'), (0.7071067811865475, 'antyonywang'), (0.7470178808339965, 'jackfan'), (0.9912407071619299, 'Toby')]
    
    Out[32]:
    [(0.9912407071619299, 'Toby'),
     (0.7470178808339965, 'jackfan'),
     (0.7071067811865475, 'antyonywang')]
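The outline lists item-based filtering, which the code above does not cover. One common way to get there, sketched here as an assumption rather than as the author's method, is to flip the ratings dictionary so that movies become the outer keys. The `transform_prefs` helper and the small `sample` subset are illustrative names chosen so the block runs on its own.

```python
def transform_prefs(prefs):
    """Flip {person: {item: rating}} into {item: {person: rating}}."""
    result = {}
    for person, ratings in prefs.items():
        for item, rating in ratings.items():
            result.setdefault(item, {})[person] = rating
    return result

# A small subset of the ratings used in this post.
sample = {
    'jackfan': {'Snakes on a Plane': 4.0, 'Superman Returns': 5.0},
    'Toby': {'Snakes on a Plane': 4.5, 'Superman Returns': 4.0, 'You, Me and Dupree': 1.0},
}

movies = transform_prefs(sample)
print(movies['Superman Returns'])  # {'jackfan': 5.0, 'Toby': 4.0}
```

With the dictionary flipped, a call such as `topMatches(movies, 'Superman Returns')` would compare movies with each other instead of people.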
  • Original article: https://www.cnblogs.com/csj007523/p/8435762.html