  • spark 2.4 java8 hello world

    Download JDK 8, extract it, and add the following to .bashrc:

    export JAVA_HOME=/home/bonelee/jdk1.8.0_211
    export JRE_HOME=$JAVA_HOME/jre
    export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
    export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
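
    After `source ~/.bashrc`, a quick sanity check that the variables resolve (a minimal sketch; the JDK path is the one used in this post and will differ on your machine):

    ```python
    import os

    # JAVA_HOME should point at the extracted JDK; fall back to the
    # path used in this post (adjust for your own install).
    java_home = os.environ.get("JAVA_HOME", "/home/bonelee/jdk1.8.0_211")

    # spark-submit will launch the JVM from $JAVA_HOME/bin/java.
    java_bin = os.path.join(java_home, "bin", "java")
    print(java_bin)
    ```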

    Download Spark, unzip it, and run:

    ./bin/spark-submit ~/src_test/spark_hello.py
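
    The script reads /tmp/test.txt via a file:// URL, so create the input file first (a sketch; the sample text is arbitrary — in the original run, the file actually contained the HDFS word-count snippet shown later in the post):

    ```python
    # Write a small sample input for the word count below.
    sample = "hello spark\nhello world\n"
    with open("/tmp/test.txt", "w") as f:
        f.write(sample)
    ```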

    spark_hello.py:

    from pyspark.context import SparkContext
    from pyspark.conf import SparkConf
    
    sc = SparkContext(conf=SparkConf().setAppName("mnist_parallelize"))
    text_file = sc.textFile("file:///tmp/test.txt")
    counts = text_file.flatMap(lambda line: line.split(" ")) \
                 .map(lambda word: (word, 1)) \
                 .reduceByKey(lambda a, b: a + b)
    print(counts.collect())
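
    The flatMap / map / reduceByKey chain is an ordinary word count: flatMap splits lines into words, map pairs each word with 1, and reduceByKey sums the 1s per word. The same semantics can be sanity-checked without a cluster using plain Python (a sketch of the logic, not the Spark API):

    ```python
    from collections import Counter

    def word_count(lines):
        # flatMap: split every line on single spaces
        words = [w for line in lines for w in line.split(" ")]
        # map to (word, 1) + reduceByKey with `+` is just counting
        return Counter(words)

    counts = word_count(["hello spark", "hello world"])
    print(sorted(counts.items()))  # [('hello', 2), ('spark', 1), ('world', 1)]
    ```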
    

    Contents of /tmp/test.txt (the input used here is just the HDFS word-count snippet, saved as plain text):

    text_file = sc.textFile("hdfs://...")
    counts = text_file.flatMap(lambda line: line.split(" ")) \
                 .map(lambda word: (word, 1)) \
                              .reduceByKey(lambda a, b: a + b)
                              counts.saveAsTextFile("hdfs://...")
    

    output:

    [('100', 1), ('text_file', 1), ('=', 2), ('counts', 1), ('text_file.flatMap(lambda', 1), ('line.split("', 1), ('"))', 1), ('', 65), ('word:', 1), ('(word,', 1), ('1))', 1), ('b:', 1), ('sc.textFile("hdfs://...")', 1), ('line:', 1), ('\', 2), ('.map(lambda', 1), ('.reduceByKey(lambda', 1), ('a,', 1), ('a', 1), ('+', 1), ('b)', 1), ('counts.saveAsTextFile("hdfs://...")', 1)]
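
    The `('', 65)` entry is not a Spark bug: `split(" ")` keeps an empty string for every extra space in a run (the input file is indented), unlike no-argument `split()`:

    ```python
    # split(" ") preserves empty strings between consecutive spaces,
    # which is where the ('', 65) pairs in the output come from.
    assert "  a  b".split(" ") == ["", "", "a", "", "b"]

    # split() with no argument collapses runs of whitespace instead.
    assert "  a  b".split() == ["a", "b"]
    ```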
    
  • Original post: https://www.cnblogs.com/bonelee/p/10755575.html