zoukankan      html  css  js  c++  java
  • spark mllib prefixspan demo

    ./bin/spark-submit ~/src_test/prefix_span_test.py
    

    source code:

    import os
    import sys 
    from  pyspark.mllib.fpm import PrefixSpan
    from pyspark import SparkContext
    from pyspark import SparkConf
    
    sc = SparkContext("local","testing")
    print(sc)
    data = [ 
       [['a'],["a", "b", "c"], ["a","c"],["d"],["c", "f"]],
       [["a","d"], ["c"],["b", "c"], ["a", "e"]],
       [["e", "f"], ["a", "b"], ["d","f"],["c"],["b"]],
       [["e"], ["g"],["a", "f"],["c"],["b"],["c"]]
       ]   
    rdd = sc.parallelize(data, 2)
    model = PrefixSpan.train(rdd, 0.5,4)
    result = sorted(model.freqSequences().collect())
    print("*"*88)
    print(result)
    print("*"*88)
    

     output:

    ****************************************************************************************
    [FreqSequence(sequence=[['a']], freq=4), FreqSequence(sequence=[['a'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b']], freq=4), FreqSequence(sequence=[['a'], ['b'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c']], freq=2), FreqSequence(sequence=[['a'], ['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c']], freq=4), FreqSequence(sequence=[['a'], ['c'], ['a']], freq=2), FreqSequence(sequence=[['a'], ['c'], ['b']], freq=3), FreqSequence(sequence=[['a'], ['c'], ['c']], freq=3), FreqSequence(sequence=[['a'], ['d']], freq=2), FreqSequence(sequence=[['a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['a'], ['f']], freq=2), FreqSequence(sequence=[['b']], freq=4), FreqSequence(sequence=[['b'], ['a']], freq=2), FreqSequence(sequence=[['b'], ['c']], freq=3), FreqSequence(sequence=[['b'], ['d']], freq=2), FreqSequence(sequence=[['b'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b'], ['f']], freq=2), FreqSequence(sequence=[['b', 'a']], freq=2), FreqSequence(sequence=[['b', 'a'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d']], freq=2), FreqSequence(sequence=[['b', 'a'], ['d'], ['c']], freq=2), FreqSequence(sequence=[['b', 'a'], ['f']], freq=2), FreqSequence(sequence=[['b', 'c']], freq=2), FreqSequence(sequence=[['b', 'c'], ['a']], freq=2), FreqSequence(sequence=[['c']], freq=4), FreqSequence(sequence=[['c'], ['a']], freq=2), FreqSequence(sequence=[['c'], ['b']], freq=3), FreqSequence(sequence=[['c'], ['c']], freq=3), FreqSequence(sequence=[['d']], freq=3), FreqSequence(sequence=[['d'], ['b']], freq=2), FreqSequence(sequence=[['d'], ['c']], freq=3), FreqSequence(sequence=[['d'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e']], freq=3), FreqSequence(sequence=[['e'], ['a']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['a'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['b']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c']], freq=2), FreqSequence(sequence=[['e'], ['f'], ['c'], ['b']], freq=2), FreqSequence(sequence=[['f']], freq=3), FreqSequence(sequence=[['f'], ['b']], freq=2), FreqSequence(sequence=[['f'], ['b'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c']], freq=2), FreqSequence(sequence=[['f'], ['c'], ['b']], freq=2)]
    ****************************************************************************************

  • 相关阅读:
    eclipse pom文件报错 org.apache.maven.archiver.MavenArchiver.getManifest(org.apache.maven.project.Mav (Click for 1 more)
    严重: Compilation error org.eclipse.jdt.internal.compiler.classfmt.ClassFormatException
    powercfg -duplicatescheme 设置电源方案
    测试3
    测试2
    markdonwn 测试1
    Java线程池-线程工厂ThreadFactory
    Java线程池-拒绝策略
    一文读懂Base64编码
    ThreadLocal
  • 原文地址:https://www.cnblogs.com/bonelee/p/10755622.html
Copyright © 2011-2022 走看看