  • Data mining: using support and confidence

    An association rule has the form: if a customer buys item XX, they are likely to buy item YY as well.

     

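    Two measures are commonly used to evaluate such rules: support and confidence.

    Support is the number of times a rule holds, i.e. the number of transactions in which the customer bought both the premise item and the conclusion item.

    Confidence is the proportion of times the rule holds: its support divided by the number of transactions that contain the premise item.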
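    A small worked example of the two definitions, using four made-up transactions (hypothetical data, not from 777.xls) and two illustrative helper functions, `support` and `confidence`, that are not part of the original post. Each row is one customer, with columns in the order bread, milk, cheese, apples, bananas.

```python
# Hypothetical transactions: one row per customer, columns are
# bread, milk, cheese, apples, bananas; 1 means the item was bought.
transactions = [
    [1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 0, 1, 0],
]

BREAD, MILK = 0, 1


def support(rows, premise, conclusion):
    """Number of rows in which both the premise and the conclusion were bought."""
    return sum(1 for row in rows if row[premise] == 1 and row[conclusion] == 1)


def confidence(rows, premise, conclusion):
    """Support divided by the number of rows in which the premise was bought."""
    premise_count = sum(1 for row in rows if row[premise] == 1)
    if premise_count == 0:
        return 0.0
    return support(rows, premise, conclusion) / premise_count


print(support(transactions, BREAD, MILK))     # 2 (rows 0 and 3)
print(confidence(transactions, BREAD, MILK))  # 0.6666666666666666 (bread is in 3 rows)
```

    For the rule "bread implies milk", bread appears in three rows and two of them also contain milk, so the support is 2 and the confidence is 2/3.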

    Here is the code:

    # Columns of the data: bread, milk, cheese, apples, bananas
    from collections import defaultdict

    import numpy as np
    from pyexcel_xls import get_data

    # get_data returns an ordered dict mapping each sheet name to its list of rows
    xls_data = get_data(r"777.xls")
    features = ["bread", "milk", "cheese", "apples", "bananas"]

    X = np.array(xls_data['Sheet1'])
    n_samples, n_features = X.shape  # number of rows (customers) and columns (items)
    print(n_samples)
    print(n_features)

    # Count how many people bought apples (column 3)
    num_apple_purchases = 0
    for sample in X:
        if sample[3] == 1:
            num_apple_purchases += 1
    print("{0} people bought Apples".format(num_apple_purchases))

    valid_rules = defaultdict(int)      # times each rule held (premise and conclusion both bought)
    invalid_rules = defaultdict(int)    # times each rule failed (premise bought, conclusion not)
    num_occurrences = defaultdict(int)  # times each premise item was bought

    for sample in X:                                  # loop over rows (customers)
        for premise in range(n_features):             # loop over columns (items)
            if sample[premise] == 0:                  # item not bought: skip to the next column
                continue
            num_occurrences[premise] += 1             # record a purchase of the premise item
            for conclusion in range(n_features):      # check every other item in the same row
                if premise == conclusion:             # a rule needs two different items
                    continue
                if sample[conclusion] == 1:           # customer bought the premise and the conclusion
                    valid_rules[(premise, conclusion)] += 1
                else:                                 # premise bought, conclusion not bought
                    invalid_rules[(premise, conclusion)] += 1

    support = valid_rules                             # support = number of times the rule held
    confidence = defaultdict(float)
    for premise, conclusion in valid_rules.keys():
        # confidence = times the rule held / times the premise was bought
        confidence[(premise, conclusion)] = valid_rules[(premise, conclusion)] / num_occurrences[premise]

    for premise, conclusion in confidence:            # keys are (premise, conclusion) index pairs
        premise_name = features[premise]              # features[i] is the item name of column i
        conclusion_name = features[conclusion]
        print("Rule: If a customer buys {0}, they may also buy {1}".format(premise_name, conclusion_name))
        print(" - Confidence: {0:.3f}".format(confidence[(premise, conclusion)]))
        print(" - Support: {0}".format(support[(premise, conclusion)]))
        print("")

    Result: support and confidence together show which other items customers tend to buy once they have bought a given item. The script prints the number of rows (25 customers), the number of columns (5 items), the apple count, and then every rule:

    25
    5
    18 people bought Apples
    Rule: If a customer buys bread, they may also buy milk
     - Confidence: 0.533
     - Support: 8
    
    Rule: If a customer buys milk, they may also buy cheese
     - Confidence: 0.222
     - Support: 2
    
    Rule: If a customer buys apples, they may also buy cheese
     - Confidence: 0.333
     - Support: 6
    
    Rule: If a customer buys milk, they may also buy apples
     - Confidence: 0.444
     - Support: 4
    
    Rule: If a customer buys bread, they may also buy apples
     - Confidence: 0.667
     - Support: 10
    
    Rule: If a customer buys apples, they may also buy bread
     - Confidence: 0.556
     - Support: 10
    
    Rule: If a customer buys apples, they may also buy bananas
     - Confidence: 0.611
     - Support: 11
    
    Rule: If a customer buys apples, they may also buy milk
     - Confidence: 0.222
     - Support: 4
    
    Rule: If a customer buys milk, they may also buy bananas
     - Confidence: 0.556
     - Support: 5
    
    Rule: If a customer buys cheese, they may also buy bananas
     - Confidence: 0.556
     - Support: 5
    
    Rule: If a customer buys cheese, they may also buy bread
     - Confidence: 0.556
     - Support: 5
    
    Rule: If a customer buys cheese, they may also buy apples
     - Confidence: 0.667
     - Support: 6
    
    Rule: If a customer buys cheese, they may also buy milk
     - Confidence: 0.222
     - Support: 2
    
    Rule: If a customer buys bananas, they may also buy apples
     - Confidence: 0.647
     - Support: 11
    
    Rule: If a customer buys bread, they may also buy bananas
     - Confidence: 0.467
     - Support: 7
    
    Rule: If a customer buys bananas, they may also buy cheese
     - Confidence: 0.294
     - Support: 5
    
    Rule: If a customer buys milk, they may also buy bread
     - Confidence: 0.889
     - Support: 8
    
    Rule: If a customer buys bananas, they may also buy milk
     - Confidence: 0.294
     - Support: 5
    
    Rule: If a customer buys bread, they may also buy cheese
     - Confidence: 0.333
     - Support: 5
    
    Rule: If a customer buys bananas, they may also buy bread
     - Confidence: 0.412
     - Support: 7
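    The nested loops in the script count co-occurrences one row at a time. As a cross-check, and assuming the purchase matrix contains only 0s and 1s, the same counts can be obtained with a single matrix product. This is a swapped-in NumPy technique, not the original author's code, and the small matrix below is hypothetical stand-in data rather than the contents of 777.xls.

```python
import numpy as np

features = ["bread", "milk", "cheese", "apples", "bananas"]

# Hypothetical 0/1 purchase matrix (rows = customers, columns = features).
X = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 0, 1, 0],
])

# co_occurrence[i, j] = number of rows where items i and j were both bought,
# i.e. the support of rule i -> j. The diagonal holds how often each item was bought.
co_occurrence = X.T @ X
num_occurrences = np.diag(co_occurrence)

# confidence[i, j] = support(i -> j) / occurrences(i); rows for items that
# were never bought stay at 0 instead of dividing by zero.
confidence = np.divide(
    co_occurrence,
    num_occurrences[:, None],
    out=np.zeros(co_occurrence.shape, dtype=float),
    where=num_occurrences[:, None] > 0,
)
np.fill_diagonal(confidence, 0.0)  # a rule needs two different items

print(np.round(confidence, 3))
```

    With real data, `co_occurrence[i, j]` should match `valid_rules[(i, j)]` from the loop and `num_occurrences[i]` should match `num_occurrences[i]`, which makes this a quick way to sanity-check the counting code.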
     

    The last step is to sort the rules by confidence.
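    A minimal sketch of that sorting step, assuming `confidence` and `support` are dicts keyed by (premise, conclusion) index pairs, as in the script. To keep the block self-contained it uses a handful of values copied from the output above instead of reading 777.xls.

```python
from operator import itemgetter

features = ["bread", "milk", "cheese", "apples", "bananas"]

# A few (premise, conclusion) entries copied from the output above; in the
# full script these dicts are produced by the counting loop.
confidence = {
    (0, 1): 0.533,  # bread -> milk
    (1, 0): 0.889,  # milk -> bread
    (3, 4): 0.611,  # apples -> bananas
    (4, 3): 0.647,  # bananas -> apples
    (1, 2): 0.222,  # milk -> cheese
}
support = {(0, 1): 8, (1, 0): 8, (3, 4): 11, (4, 3): 11, (1, 2): 2}

# Sort the (rule, confidence) pairs from highest to lowest confidence.
sorted_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)

for (premise, conclusion), conf in sorted_confidence:
    print("Rule: If a customer buys {0}, they may also buy {1}".format(
        features[premise], features[conclusion]))
    print(" - Confidence: {0:.3f}".format(conf))
    print(" - Support: {0}".format(support[(premise, conclusion)]))
    print("")
```

    `itemgetter(1)` sorts on the dict values, so the strongest rule (milk implies bread) comes first; slicing, e.g. `sorted_confidence[:5]`, keeps only the top rules.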

  • Original article: https://www.cnblogs.com/baili-luoyun/p/11217075.html