zoukankan      html  css  js  c++  java
  • spark mllib prefixspan demo

    ./bin/spark-submit ~/src_test/prefix_span_test.py
    

    source code:

    import os
    import sys 
    from  pyspark.mllib.fpm import PrefixSpan
    from pyspark import SparkContext
    from pyspark import SparkConf
    
    sc = SparkContext("local","testing")
    print(sc)
    data = [ 
       [['a'],["a", "b", "c"], ["a","c"],["d"],["c", "f"]],
       [["a","d"], ["c"],["b", "c"], ["a", "e"]],
       [["e", "f"], ["a", "b"], ["d","f"],["c"],["b"]],
       [["e"], ["g"],["a", "f"],["c"],["b"],["c"]]
       ]   
    rdd = sc.parallelize(data, 2)
    model = PrefixSpan.train(rdd, 0.5,4)
    result = sorted(model.freqSequences().collect())
    print("*"*88)
    print(result)
    print("*"*88)
    

     output:

    ****************************************************************************************
    [FreqSequence(sequence=[['a']], freq=4), FreqSequence(sequence=[['a'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b']], freq=4), FreqSequence(sequence=[['a'], ['b'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c']], freq=4), FreqSequence(sequence=[['a'], ['c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c'], ['b']], freq=3), FreqSequence(sequence=[['a'], ['c'], ['c']], freq=3), FreqSequence(sequence=[['a'], ['d']], freq=2), FreqSequence(sequence=[['a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['f']], freq=2), FreqSequence(sequence=[['b']], freq=4), FreqSequence(sequence=[['b'], ['a']], freq=2), FreqSequence(sequence=[['b'], ['c']], freq=3), FreqSequence(sequence=[['b'], ['d']], freq=2), FreqSequence(sequence=[['b'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b'], ['f']], freq=2), FreqSequence(sequence=[['b', 'a']], freq=2), FreqSequence(sequence=[['b', 'a'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['f']], freq=2), FreqSequence(sequence=[['b', 'c']], freq=2), FreqSequence(sequence=[['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['c']], freq=4), FreqSequence(sequence=[['c'], ['a']], freq=2), FreqSequence(sequence=[['c'], ['b']], freq=3), FreqSequence(sequence=[['c'], ['c']], freq=3), FreqSequence(sequence=[['d']], freq=3), FreqSequence(sequence=[['d'], ['b']], freq=2), FreqSequence(sequence=[['d'], ['c']], freq=3), FreqSequence(sequence=[['d'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e']], freq=3), FreqSequence(sequence=[['e'], ['a']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['f']], freq=3), FreqSequence(sequence=[['f'], ['b']], freq=2), FreqSequence(sequence=[['f'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c'], ['b']], freq=2)]
    ****************************************************************************************

  • 相关阅读:
    江の島西浦写真館2-1
    江の島西浦写真館1-2
    Oracle 查询表空间使用情况
    Oracle 的开窗函数 rank,dense_rank,row_number
    oracle11G 用户密码180天修改概要文件过程
    CentOS6 安装 MySQL5.7
    linux下SS 网络命令详解
    CentOS6 网络设置
    redhat 6 红帽6 Linux 网络配置
    Oracle分析函数——函数列表
  • 原文地址:https://www.cnblogs.com/bonelee/p/10755622.html
Copyright © 2011-2022 走看看