  • spark 2.4 java8 hello world

    download JDK 8, extract it, and add the following to ~/.bashrc:

    export JAVA_HOME=/home/bonelee/jdk1.8.0_211
    export JRE_HOME=$JAVA_HOME/jre
    export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
    export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

    download Spark, unpack it, and run:

    ./bin/spark-submit ~/src_test/spark_hello.py

    spark_hello.py :

    from pyspark.context import SparkContext
    from pyspark.conf import SparkConf

    sc = SparkContext(conf=SparkConf().setAppName("mnist_parallelize"))
    text_file = sc.textFile("file:///tmp/test.txt")

    # split each line into words, map each word to (word, 1), then sum counts per word;
    # the chained calls are wrapped in parentheses so they can span multiple lines
    counts = (text_file.flatMap(lambda line: line.split(" "))
                       .map(lambda word: (word, 1))
                       .reduceByKey(lambda a, b: a + b))
    print(counts.collect())
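    The same flatMap / map / reduceByKey pipeline can be sanity-checked in plain Python without a Spark installation (a minimal sketch; the sample lines below are hypothetical, not the actual contents of /tmp/test.txt):

    ```python
    from collections import Counter

    # simulate flatMap(line.split(" ")) followed by map/reduceByKey on an in-memory sample
    lines = ["hello spark hello", "hello world"]
    words = [word for line in lines for word in line.split(" ")]  # flatMap
    counts = Counter(words)                                       # map + reduceByKey
    print(sorted(counts.items()))  # → [('hello', 3), ('spark', 1), ('world', 1)]
    ```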
    

    /tmp/test.txt contains the following snippet (Spark code, used here only as sample text to count):

    text_file = sc.textFile("hdfs://...")
    counts = text_file.flatMap(lambda line: line.split(" ")) \
                 .map(lambda word: (word, 1)) \
                 .reduceByKey(lambda a, b: a + b)
    counts.saveAsTextFile("hdfs://...")
    

    output:

    [('100', 1), ('text_file', 1), ('=', 2), ('counts', 1), ('text_file.flatMap(lambda', 1), ('line.split("', 1), ('"))', 1), ('', 65), ('word:', 1), ('(word,', 1), ('1))', 1), ('b:', 1), ('sc.textFile("hdfs://...")', 1), ('line:', 1), ('\', 2), ('.map(lambda', 1), ('.reduceByKey(lambda', 1), ('a,', 1), ('a', 1), ('+', 1), ('b)', 1), ('counts.saveAsTextFile("hdfs://...")', 1)]
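    Note the ('', 65) entry in the output: str.split(" ") yields an empty string for every extra space in a run of consecutive spaces (such as the indentation in the sample file), whereas split() with no argument collapses all whitespace. A small illustration:

    ```python
    line = "  counts = text_file.flatMap(...)"

    # splitting on a literal space keeps empty strings for the leading indentation...
    print(line.split(" "))   # → ['', '', 'counts', '=', 'text_file.flatMap(...)']

    # ...while split() with no separator discards all runs of whitespace
    print(line.split())      # → ['counts', '=', 'text_file.flatMap(...)']
    ```

    Using line.split() in the flatMap above would remove the spurious empty-string counts.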
    
  • Original post: https://www.cnblogs.com/bonelee/p/10755575.html