  • Spark 2.4 + Java 8 hello world

    Download JDK 8, extract it, and add the following to .bashrc (note that JRE_HOME must be defined before the lines that reference it):

    export JAVA_HOME=/home/bonelee/jdk1.8.0_211
    export JRE_HOME=$JAVA_HOME/jre
    export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
    export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
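
    The variables can be sanity-checked in a fresh shell; the JDK path below is the post's assumed extraction location, so adjust it to your own:

    ```shell
    # Assumed install path -- change to wherever the JDK was extracted.
    export JAVA_HOME="$HOME/jdk1.8.0_211"
    export JRE_HOME="$JAVA_HOME/jre"
    export PATH="$JAVA_HOME/bin:$JRE_HOME/bin:$PATH"

    # If the JDK is actually present, this prints the version string.
    if command -v java >/dev/null 2>&1; then
        java -version
    fi
    ```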

    Download Spark, unzip it, and run:

    ./bin/spark-submit ~/src_test/spark_hello.py

    spark_hello.py:

    from pyspark.context import SparkContext
    from pyspark.conf import SparkConf
    
    sc = SparkContext(conf=SparkConf().setAppName("mnist_parallelize"))
    text_file = sc.textFile("file:///tmp/test.txt")
    counts = (text_file.flatMap(lambda line: line.split(" "))
                       .map(lambda word: (word, 1))
                       .reduceByKey(lambda a, b: a + b))
    print(counts.collect())
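
    For reference, the same flatMap/map/reduceByKey pipeline can be sketched in plain Python (no Spark needed) to see what the word count computes; the sample lines here are made up for illustration:

    ```python
    from collections import Counter

    lines = ["hello spark hello", "java world"]

    # flatMap: split every line into words and flatten into one list
    words = [w for line in lines for w in line.split(" ")]
    # map + reduceByKey: emit (word, 1) pairs and sum the counts per word
    counts = Counter(words)

    print(sorted(counts.items()))
    # [('hello', 2), ('java', 1), ('spark', 1), ('world', 1)]
    ```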
    

    Contents of /tmp/test.txt (the classic Spark word-count snippet itself, used here as the test input):

    text_file = sc.textFile("hdfs://...")
    counts = text_file.flatMap(lambda line: line.split(" ")) \
                 .map(lambda word: (word, 1)) \
                 .reduceByKey(lambda a, b: a + b)
    counts.saveAsTextFile("hdfs://...")
    

    output:

    [('100', 1), ('text_file', 1), ('=', 2), ('counts', 1), ('text_file.flatMap(lambda', 1), ('line.split("', 1), ('"))', 1), ('', 65), ('word:', 1), ('(word,', 1), ('1))', 1), ('b:', 1), ('sc.textFile("hdfs://...")', 1), ('line:', 1), ('\', 2), ('.map(lambda', 1), ('.reduceByKey(lambda', 1), ('a,', 1), ('a', 1), ('+', 1), ('b)', 1), ('counts.saveAsTextFile("hdfs://...")', 1)]
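
    The ('', 65) entry in the output is worth noting: split(" ") splits on every single space, so runs of consecutive spaces (e.g. the indentation in the test file) produce empty-string tokens. The no-argument split() would collapse them:

    ```python
    line = "a  b   c"

    # split(" ") keeps one empty token per extra space
    print(line.split(" "))  # ['a', '', 'b', '', '', 'c']

    # split() with no argument splits on runs of whitespace
    print(line.split())     # ['a', 'b', 'c']
    ```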
    
  • Original post: https://www.cnblogs.com/bonelee/p/10755575.html