  • Association rule analysis with Weka

    Market basket analysis:

    The Apriori algorithm:

    Parameter settings:

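    1. car: if set to true, class association rules are mined instead of general (global) association rules.

    2. classIndex: index of the class attribute. If set to -1, the last attribute is used as the class attribute.

    3. delta: the step by which the minimum support is lowered on each iteration. The support is reduced repeatedly until it reaches the lower bound or the required number of rules has been generated.

    4. lowerBoundMinSupport: lower bound for the minimum support.

    5. metricType: the metric used to rank the rules. It can be confidence (class association rules can only be mined with confidence), lift, leverage, or conviction.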

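    Besides confidence, Weka provides several similar metrics for measuring how strongly the two sides of a rule are associated:

    a) Lift: P(A,B)/(P(A)P(B)). Lift = 1 means A and B are independent. The further the value rises above 1, the less likely it is that A and B appear in the same basket by chance, i.e. the stronger the association.

    b) Leverage: P(A,B)-P(A)P(B)

    Leverage = 0 means A and B are independent; the larger the leverage, the more closely A and B are related.

    c) Conviction: P(A)P(!B)/P(A,!B), where !B means that B did not occur. Conviction also measures how far A and B are from independence (Conviction = 1 means they are independent). Its relationship to lift (negate B, substitute into the lift formula, then take the reciprocal) shows that the larger the value, the stronger the association between A and B.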

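    6. minMetric: minimum value of the metric.

    7. numRules: number of rules to find.

    8. outputItemSets: if set to true, the itemsets are also printed in the output.

    9. removeAllMissingCols: remove columns in which every value is missing.

    10. significanceLevel: significance level for the significance test (confidence metric only).

    11. upperBoundMinSupport: upper bound for the minimum support. The minimum support is lowered iteratively starting from this value.

    12. verbose: if set to true, the algorithm runs in verbose mode.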
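
    The three metrics listed under metricType can be computed directly from basket counts. The following is a minimal sketch of the formulas exactly as written above, not Weka's own implementation; the function name rule_metrics and the counts passed to it are hypothetical and are not taken from the basket data set.

```python
def rule_metrics(n_total, n_a, n_b, n_ab):
    """Return (confidence, lift, leverage, conviction) for the rule A ==> B.

    n_total: number of baskets
    n_a, n_b: baskets containing A, baskets containing B
    n_ab: baskets containing both A and B
    """
    confidence = n_ab / n_a                      # P(A,B) / P(A)
    lift = (n_ab * n_total) / (n_a * n_b)        # P(A,B) / (P(A)P(B))
    leverage = n_ab / n_total - (n_a / n_total) * (n_b / n_total)
    n_a_not_b = n_a - n_ab                       # baskets with A but without B
    if n_a_not_b == 0:
        conviction = float("inf")                # the rule never fails
    else:
        # P(A)P(!B) / P(A,!B), written with counts to avoid rounding noise
        conviction = (n_a * (n_total - n_b)) / (n_total * n_a_not_b)
    return confidence, lift, leverage, conviction


# Hypothetical counts: 100 baskets, 40 with A, 50 with B, 30 with both.
associated = rule_metrics(n_total=100, n_a=40, n_b=50, n_ab=30)

# Same marginals, but A and B co-occur exactly as often as chance predicts.
independent = rule_metrics(n_total=100, n_a=40, n_b=50, n_ab=20)

print(associated)
print(independent)
```

    In the independent case lift and conviction both come out as 1 and leverage as 0, which is the baseline each metric is read against.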

    === Run information ===
    
    Scheme:       weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
    Relation:     basket
    Instances:    940  // 940 instances (rows)
    Attributes:   11    // 11 attributes (fields)
                  fruitveg
                  freshmeat
                  dairy
                  cannedveg
                  cannedmeat
                  frozenmeal
                  beer
                  wine
                  softdrink
                  fish
                  confectionery
    === Associator model (full training set) ===
    
    
    Apriori
    =======
    
    Minimum support: 0.1 (94 instances) // minimum support is 0.1, i.e. at least 94 instances
    Minimum metric <confidence>: 0.9  // minimum confidence is 0.9
    Number of cycles performed: 18   // 18 rounds of search were performed
    
    Generated sets of large itemsets:   // the frequent itemsets that were generated
    
    Size of set of large itemsets L(1): 22 // 22 frequent 1-itemsets
    
    Size of set of large itemsets L(2): 171  // 171 frequent 2-itemsets
    
    Size of set of large itemsets L(3): 633
    
    Size of set of large itemsets L(4): 992
    
    Size of set of large itemsets L(5): 1130
    
    Size of set of large itemsets L(6): 538
    
    Size of set of large itemsets L(7): 143
    
    Best rules found:  // the 10 best rules
    
     1. cannedveg=F beer=F fish=T confectionery=F 118 ==> wine=F 109    conf:(0.92)
     2. freshmeat=F cannedveg=F beer=F fish=T confectionery=F 102 ==> wine=F 94    conf:(0.92)
     3. fruitveg=F freshmeat=F cannedveg=T softdrink=F 147 ==> dairy=F 135    conf:(0.92)
     4. freshmeat=F wine=T confectionery=F 117 ==> dairy=F 107    conf:(0.91)
     5. fruitveg=F freshmeat=F cannedveg=T wine=F softdrink=F 105 ==> dairy=F 96    conf:(0.91)
     6. fruitveg=F freshmeat=F cannedveg=T softdrink=F confectionery=F 113 ==> dairy=F 103    conf:(0.91)
     7. fruitveg=F freshmeat=F cannedveg=T cannedmeat=F softdrink=F 112 ==> dairy=F 102    conf:(0.91)
     8. fruitveg=F cannedveg=T softdrink=F confectionery=F 128 ==> dairy=F 116    conf:(0.91)
     9. fruitveg=F freshmeat=F cannedveg=T softdrink=F fish=F 117 ==> dairy=F 106    conf:(0.91)
    10. fruitveg=F dairy=F cannedveg=T wine=F softdrink=F 106 ==> freshmeat=F 96    conf:(0.91)
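
    The line "Number of cycles performed: 18" can be related to the options in the Scheme line (-U 1.0, -D 0.05, -M 0.1, -N 10). The sketch below is a simplified model of the support-lowering loop, not Weka's source code. It assumes that the first threshold tried is one delta below the upper bound, an assumption chosen because it reproduces the 18 cycles reported in this run. The callback count_rules and the stand-in fake_count are hypothetical.

```python
from decimal import Decimal


def lower_support_until(count_rules, num_rules=10,
                        upper=Decimal("1.0"), lower=Decimal("0.1"),
                        delta=Decimal("0.05")):
    """Lower the minimum support step by step; return (support, cycles)."""
    # Assumption: the first trial threshold is one step below the upper bound.
    support = max(upper - delta, lower)
    cycles = 0
    while True:
        cycles += 1
        # Stop once enough rules were found or the lower bound was reached.
        if count_rules(support) >= num_rules or support <= lower:
            return support, cycles
        support = max(support - delta, lower)


def fake_count(support):
    """Hypothetical stand-in for the mining step: pretend that 10 rules
    only appear once the threshold has dropped to 0.1."""
    return 10 if support <= Decimal("0.1") else 0


final_support, cycles = lower_support_until(fake_count)
print(final_support, cycles)
```

    Decimal is used instead of float so that repeatedly subtracting 0.05 lands exactly on 0.1. Under these assumptions the thresholds tried are 0.95, 0.90, ..., 0.10, which is 18 cycles.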

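    Meaning of the result:

    Rule 1: cannedveg=F, beer=F, fish=T and confectionery=F hold together in 118 instances, and wine=F also holds in 109 of them, so the confidence of the rule is 109/118 ≈ 0.92.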
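
    The confidence can be recomputed from the two counts printed in a rule line. The following sketch parses one line of the Apriori output shown above with a regular expression; the pattern and the helper name parse_rule are illustrative assumptions that match the layout of this particular output, not a format guaranteed by Weka.

```python
import re

# Layout assumed from the output above:
#   "<no>. <premise items> <count> ==> <consequence items> <count>    conf:(<value>)"
RULE = re.compile(
    r"^\s*\d+\.\s+(?P<lhs>.+?)\s+(?P<n_lhs>\d+)\s+==>\s+"
    r"(?P<rhs>.+?)\s+(?P<n_both>\d+)\s+conf:\((?P<conf>[\d.]+)\)"
)


def parse_rule(line, n_instances):
    """Split a rule line into its parts and recompute confidence and support."""
    m = RULE.match(line)
    if m is None:
        raise ValueError("not an Apriori rule line")
    n_lhs = int(m["n_lhs"])      # instances matching the premise
    n_both = int(m["n_both"])    # instances matching premise and consequence
    return {
        "premise": m["lhs"].split(),
        "consequence": m["rhs"].split(),
        "confidence": n_both / n_lhs,
        "support": n_both / n_instances,
    }


line = " 1. cannedveg=F beer=F fish=T confectionery=F 118 ==> wine=F 109    conf:(0.92)"
rule = parse_rule(line, n_instances=940)
print(rule["premise"], "==>", rule["consequence"])
print(round(rule["confidence"], 2), round(rule["support"], 3))
```

    The support computed here, 109/940, is above the reported minimum support of 0.1 (94 instances), as expected for every rule in the list.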
    FilteredAssociator
    === Run information ===
    
    Scheme:       weka.associations.FilteredAssociator -F "weka.filters.MultiFilter -F "weka.filters.unsupervised.attribute.ReplaceMissingValues "" -c -1 -W weka.associations.Apriori -- -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
    Relation:     basket
    Instances:    940
    Attributes:   11
                  fruitveg
                  freshmeat
                  dairy
                  cannedveg
                  cannedmeat
                  frozenmeal
                  beer
                  wine
                  softdrink
                  fish
                  confectionery
    === Associator model (full training set) ===
    
    FilteredAssociator using weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1 on data filtered through weka.filters.MultiFilter -F "weka.filters.unsupervised.attribute.ReplaceMissingValues "
    
    Filtered Header
    @relation basket-weka.filters.unsupervised.attribute.ReplaceMissingValues-weka.filters.MultiFilter-Fweka.filters.unsupervised.attribute.ReplaceMissingValues
    
    @attribute fruitveg {F,T}
    @attribute freshmeat {F,T}
    @attribute dairy {F,T}
    @attribute cannedveg {F,T}
    @attribute cannedmeat {F,T}
    @attribute frozenmeal {F,T}
    @attribute beer {F,T}
    @attribute wine {F,T}
    @attribute softdrink {F,T}
    @attribute fish {F,T}
    @attribute confectionery {F,T}
    
    @data
    
    
    Associator Model
    
    Apriori
    =======
    
    Minimum support: 0.1 (94 instances)
    Minimum metric <confidence>: 0.9
    Number of cycles performed: 18
    
    Generated sets of large itemsets:
    
    Size of set of large itemsets L(1): 22
    
    Size of set of large itemsets L(2): 171
    
    Size of set of large itemsets L(3): 633
    
    Size of set of large itemsets L(4): 992
    
    Size of set of large itemsets L(5): 1130
    
    Size of set of large itemsets L(6): 538
    
    Size of set of large itemsets L(7): 143
    
    Best rules found:
    
     1. cannedveg=F beer=F fish=T confectionery=F 118 ==> wine=F 109    conf:(0.92)
     2. freshmeat=F cannedveg=F beer=F fish=T confectionery=F 102 ==> wine=F 94    conf:(0.92)
     3. fruitveg=F freshmeat=F cannedveg=T softdrink=F 147 ==> dairy=F 135    conf:(0.92)
     4. freshmeat=F wine=T confectionery=F 117 ==> dairy=F 107    conf:(0.91)
     5. fruitveg=F freshmeat=F cannedveg=T wine=F softdrink=F 105 ==> dairy=F 96    conf:(0.91)
     6. fruitveg=F freshmeat=F cannedveg=T softdrink=F confectionery=F 113 ==> dairy=F 103    conf:(0.91)
     7. fruitveg=F freshmeat=F cannedveg=T cannedmeat=F softdrink=F 112 ==> dairy=F 102    conf:(0.91)
     8. fruitveg=F cannedveg=T softdrink=F confectionery=F 128 ==> dairy=F 116    conf:(0.91)
     9. fruitveg=F freshmeat=F cannedveg=T softdrink=F fish=F 117 ==> dairy=F 106    conf:(0.91)
    10. fruitveg=F dairy=F cannedveg=T wine=F softdrink=F 106 ==> freshmeat=F 96    conf:(0.91)

     The result is the same as the one above, so it is not explained again.

    Tertius

     

    === Run information ===

    Scheme: weka.associations.Tertius -K 10 -F 0.0 -N 1.0 -L 4 -G 0 -c 0 -I 0 -P 0
    Relation: basket
    Instances: 940
    Attributes: 11
    fruitveg
    freshmeat
    dairy
    cannedveg
    cannedmeat
    frozenmeal
    beer
    wine
    softdrink
    fish
    confectionery
    === Associator model (full training set) ===


    Tertius
    =======

    1. /* 0.433417 0.022340 */ frozenmeal = F ==> cannedveg = F or beer = F
    2. /* 0.427294 0.028723 */ beer = F ==> cannedveg = F or frozenmeal = F
    3. /* 0.426433 0.025532 */ cannedveg = F ==> frozenmeal = F or beer = F
    4. /* 0.394573 0.015957 */ dairy = F and frozenmeal = T and beer = T ==> cannedveg = T
    5. /* 0.388260 0.019149 */ dairy = F and cannedveg = T and beer = T ==> frozenmeal = T
    6. /* 0.382993 0.019149 */ beer = F ==> cannedveg = F or frozenmeal = F or softdrink = T
    7. /* 0.382471 0.017021 */ frozenmeal = F ==> cannedveg = F or beer = F or softdrink = T
    8. /* 0.380465 0.025532 */ dairy = F and cannedveg = T and frozenmeal = T ==> beer = T
    9. /* 0.376718 0.017021 */ cannedveg = F ==> frozenmeal = F or beer = F or confectionery = T
    10. /* 0.374939 0.018085 */ frozenmeal = F ==> cannedveg = F or beer = F or confectionery = T

    Number of hypotheses considered: 43952
    Number of hypotheses explored: 22282

     Conclusion:

    How to read the output, taking the first rule as an example:

    1. /* 0.433417 0.022340 */ frozenmeal = F ==> cannedveg = F or beer = F

    The first number given with the rules is the confirmation value, and the second number is the frequency of counter-instances.
    The “number of hypotheses considered” is the number of rules generated with the refinement operator.
    The “number of hypotheses explored” is the number of rules that were “potentially interesting” and were considered for adding to the results or refining.
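
    The second number can be checked by hand. For rule 1, 0.022340 × 940 ≈ 21, which suggests that the frequency is the share of all instances that contradict the rule (frozenmeal = F, yet cannedveg = T and beer = T). The sketch below counts counter-instances under that reading; the helper name, the literal representation, and the toy rows are hypothetical, and the confirmation value is not computed here.

```python
def counter_instance_frequency(rows, body, head):
    """Fraction of rows where every body literal holds and no head literal does."""
    def holds(row, literal):
        attribute, value = literal
        return row[attribute] == value

    counter = sum(
        1 for row in rows
        if all(holds(row, lit) for lit in body)
        and not any(holds(row, lit) for lit in head)
    )
    return counter / len(rows)


# Hypothetical toy baskets, not the 940-instance data set.
rows = [
    {"frozenmeal": "F", "cannedveg": "T", "beer": "T"},  # contradicts the rule
    {"frozenmeal": "F", "cannedveg": "F", "beer": "T"},  # head holds
    {"frozenmeal": "T", "cannedveg": "T", "beer": "T"},  # body does not hold
    {"frozenmeal": "F", "cannedveg": "F", "beer": "F"},  # head holds
]

# frozenmeal = F ==> cannedveg = F or beer = F
freq = counter_instance_frequency(
    rows,
    body=[("frozenmeal", "F")],
    head=[("cannedveg", "F"), ("beer", "F")],
)
print(freq)
```

    Only the first toy row satisfies the body while violating every head literal, so one of the four rows counts as a counter-instance.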

  • Original article: https://www.cnblogs.com/tomcattd/p/3478678.html