  • spark

     # Spark is a fast and general engine for large-scale data processing.

# Spark ships with built-in libraries (Spark SQL, MLlib, GraphX, Spark Streaming)
# and can run standalone or on a cluster manager such as YARN.

# run a bundled example; the argument (10) is the number of slices
./bin/run-example SparkPi 10
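What SparkPi parallelizes is a Monte Carlo estimate of pi: sample random points in the unit square and count those falling inside the quarter circle. A plain-Scala sketch of that core computation, runnable without a cluster (object and method names are mine, not Spark's):

```scala
// Monte Carlo pi: pi ≈ 4 * (points inside quarter circle) / (total points).
// SparkPi splits this sampling across the number of slices given on the CLI.
object PiSketch {
  def estimatePi(samples: Int, seed: Long = 42L): Double = {
    val rng = new scala.util.Random(seed)
    val inside = (1 to samples).count { _ =>
      val x = rng.nextDouble()
      val y = rng.nextDouble()
      x * x + y * y <= 1.0 // point falls inside the quarter circle
    }
    4.0 * inside / samples
  }

  def main(args: Array[String]): Unit =
    println(f"Pi is roughly ${estimatePi(1000000)}%.3f")
}
```

With a million samples the estimate typically lands within a hundredth or so of pi; Spark's version simply distributes the sampling loop across partitions.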


./bin/spark-shell --master spark://IP:PORT   # connect to a standalone master
./bin/spark-shell                            # local mode
http://192.168.1.112:8080/   # standalone master web UI
http://192.168.1.112:4040/   # running application (driver) web UI


    RDD (Resilient Distributed Dataset)

    # create RDD using hdfs
val textFile = sc.textFile("hdfs://localhost:9000/user/root/BUILDING.txt")
textFile.count()   // number of lines
textFile.first()   // first line
textFile.filter(line => line.contains("hadoop")).count()   // lines mentioning "hadoop"
val counts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.collect()   // materialize the word counts on the driver
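The flatMap/map/reduceByKey pipeline above can be sketched with plain Scala collections, which makes the semantics visible without a cluster: reduceByKey behaves like a groupBy followed by a per-key reduction (object name is mine):

```scala
// Word count with the same shape as the RDD pipeline above.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))                                 // RDD.flatMap
      .map(word => (word, 1))                                // RDD.map
      .groupBy(_._1)                                         // the shuffle inside reduceByKey
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) }   // per-key _ + _
}
```

For example, `WordCountSketch.wordCount(Seq("a b", "a"))` yields `Map("a" -> 2, "b" -> 1)`. The difference in Spark is that the groupBy step shuffles records across the cluster by key.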

    Some concepts
    --------------------------------
RDD (Resilient Distributed Dataset): the basic abstraction, an immutable partitioned collection of records.
Task: tasks come in two kinds, ShuffleMapTask and ResultTask, roughly analogous to Map and Reduce in Hadoop.
Job: the set of stages triggered by a single action (e.g. count, collect).
Stage: a group of tasks that can run without a shuffle; the scheduler splits a job into stages at shuffle boundaries.
Partition: a chunk of an RDD's data; one task processes one partition.
NarrowDependency: each parent partition feeds at most one child partition (e.g. map, filter).
ShuffleDependency: a child partition depends on many parent partitions, requiring a shuffle (e.g. reduceByKey).
DAG (Directed Acyclic Graph): the graph of RDD/stage dependencies that the scheduler executes.
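The narrow vs. shuffle distinction can be sketched in plain Scala by modeling partitions as lists (names and the hash-based partitioner are my illustration, not Spark internals):

```scala
// Narrow dependency: each output partition comes from exactly one input
// partition. Shuffle dependency: output partitions draw records from every
// input partition, regrouped by key.
object DependencySketch {
  type Partition = List[(String, Int)]

  // Narrow (map-like): transform each partition independently.
  def narrowMap(parts: List[Partition]): List[Partition] =
    parts.map(_.map { case (k, v) => (k, v * 2) })

  // Shuffle (reduceByKey-like regrouping): hash every record's key to a
  // target partition, pulling records across all input partitions.
  def shuffleByKey(parts: List[Partition], numPartitions: Int): List[Partition] =
    parts.flatten
      .groupBy { case (k, _) => math.abs(k.hashCode) % numPartitions }
      .toList.sortBy(_._1).map(_._2)
}
```

Because narrowMap never moves records between partitions, Spark can pipeline such steps inside one stage; shuffleByKey is where a stage boundary falls.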

    Core functions
    --------------------------------
SparkContext: the entry point to Spark functionality; the shell creates one for you as `sc`.


    hadoop-2.7.2/etc/hadoop/core-site.xml
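The `hdfs://localhost:9000` URI used by `sc.textFile` above is resolved against Hadoop's `fs.defaultFS` setting in this file. A minimal sketch of the relevant fragment (host and port are assumptions matching the example):

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```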

  • Original article: https://www.cnblogs.com/weiweifeng/p/7489436.html