  • spark 2.4 java8 hello world

    Download JDK 8, extract it, and add the following to .bashrc:

    export JAVA_HOME=/home/bonelee/jdk1.8.0_211
    export JRE_HOME=$JAVA_HOME/jre
    export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
    export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
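
    After `source ~/.bashrc`, a quick sanity check that the variables resolve (a minimal sketch; the JDK path is the one used in this post and will differ on your machine):

    ```python
    import os

    # JAVA_HOME should point at the extracted JDK; fall back to the
    # path used in this post (adjust for your own install).
    java_home = os.environ.get("JAVA_HOME", "/home/bonelee/jdk1.8.0_211")

    # spark-submit will launch the JVM from $JAVA_HOME/bin/java.
    java_bin = os.path.join(java_home, "bin", "java")
    print(java_bin)
    ```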

    Download Spark, unzip it, and run:

    ./bin/spark-submit ~/src_test/spark_hello.py
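
    The script reads /tmp/test.txt via a file:// URL, so create the input file first (a sketch; the sample text is arbitrary — in the original run, the file actually contained the HDFS word-count snippet shown later in the post):

    ```python
    # Write a small sample input for the word count below.
    sample = "hello spark\nhello world\n"
    with open("/tmp/test.txt", "w") as f:
        f.write(sample)
    ```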

    spark_hello.py:

    from pyspark.context import SparkContext
    from pyspark.conf import SparkConf
    
    sc = SparkContext(conf=SparkConf().setAppName("mnist_parallelize"))
    text_file = sc.textFile("file:///tmp/test.txt")
    counts = text_file.flatMap(lambda line: line.split(" ")) \
                 .map(lambda word: (word, 1)) \
                 .reduceByKey(lambda a, b: a + b)
    print(counts.collect())
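
    The flatMap / map / reduceByKey chain is an ordinary word count: flatMap splits lines into words, map pairs each word with 1, and reduceByKey sums the 1s per word. The same semantics can be sanity-checked without a cluster using plain Python (a sketch of the logic, not the Spark API):

    ```python
    from collections import Counter

    def word_count(lines):
        # flatMap: split every line on single spaces
        words = [w for line in lines for w in line.split(" ")]
        # map to (word, 1) + reduceByKey with `+` is just counting
        return Counter(words)

    counts = word_count(["hello spark", "hello world"])
    print(sorted(counts.items()))  # [('hello', 2), ('spark', 1), ('world', 1)]
    ```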
    

    Contents of /tmp/test.txt (the input used here is just the HDFS word-count snippet, saved as plain text):

    text_file = sc.textFile("hdfs://...")
    counts = text_file.flatMap(lambda line: line.split(" ")) \
                 .map(lambda word: (word, 1)) \
                              .reduceByKey(lambda a, b: a + b)
                              counts.saveAsTextFile("hdfs://...")
    

    output:

    [('100', 1), ('text_file', 1), ('=', 2), ('counts', 1), ('text_file.flatMap(lambda', 1), ('line.split("', 1), ('"))', 1), ('', 65), ('word:', 1), ('(word,', 1), ('1))', 1), ('b:', 1), ('sc.textFile("hdfs://...")', 1), ('line:', 1), ('\', 2), ('.map(lambda', 1), ('.reduceByKey(lambda', 1), ('a,', 1), ('a', 1), ('+', 1), ('b)', 1), ('counts.saveAsTextFile("hdfs://...")', 1)]
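
    The `('', 65)` entry is not a Spark bug: `split(" ")` keeps an empty string for every extra space in a run (the input file is indented), unlike no-argument `split()`:

    ```python
    # split(" ") preserves empty strings between consecutive spaces,
    # which is where the ('', 65) pairs in the output come from.
    assert "  a  b".split(" ") == ["", "", "a", "", "b"]

    # split() with no argument collapses runs of whitespace instead.
    assert "  a  b".split() == ["a", "b"]
    ```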
    
  • Original post: https://www.cnblogs.com/bonelee/p/10755575.html