  • Spark MLlib PrefixSpan demo

    ./bin/spark-submit ~/src_test/prefix_span_test.py
    

    Source code:

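    from pyspark import SparkContext
    from pyspark.mllib.fpm import PrefixSpan

    # Run locally with a single worker thread; "testing" is the application name.
    sc = SparkContext("local", "testing")
    print(sc)

    # Each sequence is an ordered list of itemsets; each itemset is a list of items.
    data = [
        [["a"], ["a", "b", "c"], ["a", "c"], ["d"], ["c", "f"]],
        [["a", "d"], ["c"], ["b", "c"], ["a", "e"]],
        [["e", "f"], ["a", "b"], ["d", "f"], ["c"], ["b"]],
        [["e"], ["g"], ["a", "f"], ["c"], ["b"], ["c"]],
    ]
    rdd = sc.parallelize(data, 2)  # distribute the sequences over 2 partitions

    # minSupport=0.5: a pattern must occur in at least half of the sequences.
    # maxPatternLength=4: patterns longer than 4 are not reported.
    model = PrefixSpan.train(rdd, minSupport=0.5, maxPatternLength=4)
    result = sorted(model.freqSequences().collect())
    print("*" * 88)
    print(result)
    print("*" * 88)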
    

    Output:

    ****************************************************************************************
    [FreqSequence(sequence=[['a']], freq=4), FreqSequence(sequence=[['a'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b']], freq=4), FreqSequence(sequence=[['a'], ['b'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c']], freq=4), FreqSequence(sequence=[['a'], ['c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c'], ['b']], freq=3), FreqSequence(sequence=[['a'], ['c'], ['c']], freq=3), FreqSequence(sequence=[['a'], ['d']], freq=2), FreqSequence(sequence=[['a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['f']], freq=2), FreqSequence(sequence=[['b']], freq=4), FreqSequence(sequence=[['b'], ['a']], freq=2), FreqSequence(sequence=[['b'], ['c']], freq=3), FreqSequence(sequence=[['b'], ['d']], freq=2), FreqSequence(sequence=[['b'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b'], ['f']], freq=2), FreqSequence(sequence=[['b', 'a']], freq=2), FreqSequence(sequence=[['b', 'a'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['f']], freq=2), FreqSequence(sequence=[['b', 'c']], freq=2), FreqSequence(sequence=[['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['c']], freq=4), FreqSequence(sequence=[['c'], ['a']], freq=2), FreqSequence(sequence=[['c'], ['b']], freq=3), FreqSequence(sequence=[['c'], ['c']], freq=3), FreqSequence(sequence=[['d']], freq=3), FreqSequence(sequence=[['d'], ['b']], freq=2), FreqSequence(sequence=[['d'], ['c']], freq=3), FreqSequence(sequence=[['d'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e']], freq=3), FreqSequence(sequence=[['e'], ['a']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c'], ['b']], freq=2), 
    FreqSequence(sequence=[['e'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['f']], freq=3), FreqSequence(sequence=[['f'], ['b']], freq=2), FreqSequence(sequence=[['f'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c'], ['b']], freq=2)]
    ****************************************************************************************
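To make the `freq` values easier to read, here is a small brute-force check in plain Python (no Spark needed). It is only a sketch of how support is counted, not how Spark's PrefixSpan works internally. The helper names `contains` and `support` are made up for this illustration. A pattern counts as present in a sequence when each of its itemsets is a subset of a distinct itemset of the sequence, in the same order. With 4 input sequences and a minimum support of 0.5, a pattern has to appear in at least 2 of them.

```python
data = [
    [["a"], ["a", "b", "c"], ["a", "c"], ["d"], ["c", "f"]],
    [["a", "d"], ["c"], ["b", "c"], ["a", "e"]],
    [["e", "f"], ["a", "b"], ["d", "f"], ["c"], ["b"]],
    [["e"], ["g"], ["a", "f"], ["c"], ["b"], ["c"]],
]


def contains(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` as an ordered subsequence.

    Both arguments are lists of itemsets. Matching is greedy: each pattern
    itemset is matched against the earliest remaining sequence itemset that
    contains all of its items.
    """
    pos = 0
    for itemset in pattern:
        needed = set(itemset)
        # Skip sequence itemsets that do not contain every needed item.
        while pos < len(sequence) and not needed.issubset(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match a later itemset
    return True


def support(database, pattern):
    """Count how many sequences in `database` contain `pattern`."""
    return sum(1 for sequence in database if contains(sequence, pattern))


print(support(data, [["a"], ["b"]]))
print(support(data, [["a"], ["b", "c"]]))
print(support(data, [["e"], ["f"], ["c"], ["b"]]))
```

The three calls give 4, 2 and 2, the same `freq` values that the output above reports for these patterns.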

  • Original article: https://www.cnblogs.com/bonelee/p/10755622.html