http://blog.jobbole.com/86232/
1. Install the libraries
Materials:
spark : http://spark.apache.org/downloads.html
hadoop : http://hadoop.apache.org/releases.html
jdk: http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html
hadoop-common : https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip (for Windows 7)
Download versions that match each other.
Steps:
a. Install the JDK with the default options.
b. Unzip Spark (D:\spark-2.0.0-bin-hadoop2.7)
c. Unzip Hadoop (D:\hadoop2.7)
d. Unzip hadoop-common-bin (Windows 7 only)
e. Copy hadoop-common-bin\bin into hadoop\bin (Windows 7 only)
2. Environment variables
SPARK_HOME = D:\spark-2.0.0-bin-hadoop2.7
HADOOP_HOME = D:\hadoop2.7
PATH append = D:\spark-2.0.0-bin-hadoop2.7\bin;D:\hadoop2.7\bin
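The variables above can be sanity-checked from Python before going further. A minimal sketch that only reads the current process environment (set the variables system-wide first, then open a fresh cmd window, since an already-open window does not see changes):

```python
import os

# Print the variables from step 2 as the current process sees them.
# "<not set>" means the variable was not picked up -- open a new cmd
# window after setting it, since changes are not retroactive.
for name in ("SPARK_HOME", "HADOOP_HOME"):
    value = os.environ.get(name)
    print(name, "=", value if value else "<not set>")
```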
3. Python lib setup
a. Copy D:\spark-2.0.0-bin-hadoop2.7\python\pyspark to [Your-Python-Home]\Lib\site-packages
b. pip install py4j
c. pip install psutil
(for windows: http://www.lfd.uci.edu/~gohlke/pythonlibs/#psutil)
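As an alternative to copying pyspark into site-packages (step 3a), Spark's bundled python directory can be put on sys.path at runtime. A sketch, assuming SPARK_HOME from step 2; the literal fallback path is just this guide's example location:

```python
import os
import sys

# Resolve Spark's bundled python directory from SPARK_HOME; the
# hard-coded path is only a fallback matching the example in step 1.
spark_home = os.environ.get("SPARK_HOME", r"D:\spark-2.0.0-bin-hadoop2.7")
spark_python = os.path.join(spark_home, "python")

# Prepend it so `import pyspark` resolves against this Spark install.
if spark_python not in sys.path:
    sys.path.insert(0, spark_python)
```

With this in place, `pip install py4j` (step 3b) still applies, since pyspark imports py4j at startup.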
4. Testing
cmd -> pyspark : it should start without errors and drop you into the interactive PySpark shell.
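Beyond the shell starting, a short script can confirm that jobs actually execute. A minimal smoke test (the app name and numbers are arbitrary), assuming the setup above succeeded and pyspark is importable:

```python
from pyspark import SparkConf, SparkContext

# Run a tiny job on the local machine (no cluster needed).
conf = SparkConf().setMaster("local[1]").setAppName("install-smoke-test")
sc = SparkContext(conf=conf)

# Sum 0..9 on the local executor; a working install prints 45.
total = sc.parallelize(range(10)).sum()
print(total)
sc.stop()
```

Save it as smoke.py and run `python smoke.py` from cmd; if it prints 45 and exits cleanly, the installation works end to end.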