  • Apriori Algorithm: A Python Implementation

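    Apriori is the forerunner of the frequent-pattern mining algorithms in data mining. It has been popular since the 1990s, and the idea behind it is simple. First, mine the frequent patterns of length 1. Then, for k = 2, 3, ..., merge the frequent patterns of length k-1 into candidate patterns of length k, count how often each candidate occurs, and make sure that every length-(k-1) subset of a candidate is itself frequent. Note that, to avoid generating duplicates, two patterns are merged only when their first k-2 items are identical and the (k-1)-th item of one is smaller than that of the other.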
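
    The merge and subset check described above can be sketched in isolation as follows. This is only an illustration under stated assumptions: itemsets are represented as sorted tuples, and the names join_candidates and prune_candidates are hypothetical helpers, not part of the original script.

```python
from itertools import combinations


def join_candidates(frequent):
    """Join frequent (k-1)-itemsets (sorted tuples) into candidate k-itemsets."""
    candidates = set()
    for a in frequent:
        for b in frequent:
            # Same first k-2 items, and a's last item sorts before b's,
            # so each candidate is generated exactly once.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidates.add(a + (b[-1],))
    return candidates


def prune_candidates(candidates, frequent):
    """Keep only candidates whose (k-1)-subsets are all frequent."""
    return {
        c for c in candidates
        if all(s in frequent for s in combinations(c, len(c) - 1))
    }


# Hypothetical frequent 2-itemsets, used only for illustration.
frequent_pairs = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")}
candidates = join_candidates(frequent_pairs)
kept = prune_candidates(candidates, frequent_pairs)
print(sorted(candidates))
print(sorted(kept))
```

    Here the join yields (a, b, c) and (b, c, d). The second candidate is pruned because its subset (c, d) is not frequent, so no transaction scan is needed for it.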

    Below is a Python implementation of the algorithm:

    __author__ = 'linfuyuan'
    # Python 3 version. Itemsets are stored as sorted tuples of items, so
    # items may be longer than a single character.
    min_frequency = int(input('please input min_frequency:'))
    file_name = input('please input the transaction file:')
    transactions = []
    
    
    def has_infrequent_subset(candidate, Lk):
        # Return True if any (k-1)-subset of the sorted candidate is not in Lk.
        for i in range(len(candidate)):
            subset = candidate[:i] + candidate[i + 1:]
            if subset not in Lk:
                return True
        return False
    
    
    def countFrequency(candidate, transactions):
        # Count the transactions that contain every item of the candidate.
        count = 0
        for transaction in transactions:
            if transaction.issuperset(candidate):
                count += 1
        return count
    
    
    # Read one comma-separated transaction per line, skipping blank lines.
    with open(file_name) as f:
        for line in f:
            line = line.strip()
            if line:
                transactions.append(set(line.split(',')))
    
    # Count the single items and keep the frequent ones as 1-tuples.
    currentFrequencySet = {}
    for transaction in transactions:
        for item in transaction:
            currentFrequencySet[item] = currentFrequencySet.get(item, 0) + 1
    Lk = set()
    for (item, count) in currentFrequencySet.items():
        if count >= min_frequency:
            Lk.add((item,))
    print(', '.join(' '.join(itemset) for itemset in sorted(Lk)))
    
    while len(Lk) > 0:
        newLk = set()
        for itemset1 in Lk:
            for itemset2 in Lk:
                # Merge only if all items but the last are equal and the last
                # item of itemset1 sorts before that of itemset2 (no duplicates).
                if itemset1[:-1] == itemset2[:-1] and itemset1[-1] < itemset2[-1]:
                    newitemset = itemset1 + (itemset2[-1],)
                    if not has_infrequent_subset(newitemset, Lk) and countFrequency(newitemset, transactions) >= min_frequency:
                        newLk.add(newitemset)
        if newLk:
            print(', '.join(' '.join(itemset) for itemset in sorted(newLk)))
        Lk = newLk
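
    To see the level-wise procedure on concrete data without preparing an input file, here is a small in-memory sketch. It is not the script above: the function apriori, its return format (one sorted list of itemsets per level) and the five sample transactions are assumptions made only for this illustration.

```python
from itertools import combinations


def apriori(transactions, min_frequency):
    """Return a list of levels; entry k-1 holds the frequent k-itemsets."""
    def support(itemset):
        # Number of transactions containing every item of the itemset.
        return sum(1 for t in transactions if t.issuperset(itemset))

    items = {item for t in transactions for item in t}
    level = {(i,) for i in items if support((i,)) >= min_frequency}
    levels = []
    while level:
        levels.append(sorted(level))
        next_level = set()
        for a in level:
            for b in level:
                # Join step: equal prefix, last items in increasing order.
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = a + (b[-1],)
                    # Prune step, then the actual frequency count.
                    subsets = combinations(cand, len(cand) - 1)
                    if all(s in level for s in subsets) \
                            and support(cand) >= min_frequency:
                        next_level.add(cand)
        level = next_level
    return levels


# Hypothetical sample data: five transactions over the items a, b, c, d.
transactions = [set("abc"), set("abd"), set("ac"), set("bc"), set("abc")]
for k, itemsets in enumerate(apriori(transactions, 3), start=1):
    print(k, itemsets)
```

    With a minimum frequency of 3, the items a, b and c and the pairs (a, b), (a, c) and (b, c) are frequent. The triple (a, b, c) passes the subset check but occurs in only two transactions, so the search stops at length 2.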
    


    Copyright notice: This is an original blog post and may not be reproduced without permission.

  • Original article: https://www.cnblogs.com/mfrbuaa/p/4620279.html