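With Hadoop installed, the next step is to install Spark.
Environment: Ubuntu 16.04, Hadoop 2.7.2
Choose Spark 1.6.1, the package pre-built for Hadoop 2.6. Download page: http://spark.apache.org/downloads.html
Verify the download by comparing its MD5 checksum with the published one: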
md5sum spark-1.6.1-bin-hadoop2.6.tgz
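Comparing the digest by eye is error-prone, so the check can be scripted. This is a minimal sketch, not part of the original steps: verify_md5 is a hypothetical helper name, and the expected digest has to be copied from wherever the checksum for your download is published.

```shell
# verify_md5 FILE EXPECTED
# Succeeds (exit status 0) only if FILE exists and its MD5 digest equals EXPECTED.
# Hypothetical helper for illustration; written for a POSIX shell with md5sum available.
verify_md5() {
    md5_file="$1"
    [ -f "$md5_file" ] || return 1
    # Normalise the expected digest to lower case, since some sites publish upper case.
    md5_expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    md5_actual="$(md5sum "$md5_file" | awk '{print $1}')"
    [ -n "$md5_actual" ] && [ "$md5_actual" = "$md5_expected" ]
}

# Example use (the digest placeholder must be replaced with the published value):
#   verify_md5 spark-1.6.1-bin-hadoop2.6.tgz "<published digest>" && echo "checksum OK"
```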
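After downloading (here to ~/下载, the Downloads folder on a Chinese-locale system), run the following commands to install:
sudo tar -zxf ~/下载/spark-1.6.1-bin-hadoop2.6.tgz -C /usr/local/
cd /usr/local
sudo mv ./spark-1.6.1-bin-hadoop2.6/ ./spark
sudo chown -R hadoop:hadoop ./spark # replace hadoop with your own user name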
After installation, Spark's classpath needs to be set in ./conf/spark-env.sh. First create that file by copying the template:
cd /usr/local/spark
cp ./conf/spark-env.sh.template ./conf/spark-env.sh
Edit ./conf/spark-env.sh (vim ./conf/spark-env.sh) and append the following line at the end:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
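If you prefer not to edit the file by hand, the same line can be appended from the shell. The sketch below is one possible way to do it, assuming Hadoop lives under /usr/local/hadoop as in the line above; add_dist_classpath is a hypothetical helper name, and it skips the append when the setting is already present so that running it twice does not duplicate the line.

```shell
# add_dist_classpath CONF_FILE HADOOP_BIN
# Appends "export SPARK_DIST_CLASSPATH=$(HADOOP_BIN classpath)" to CONF_FILE,
# unless CONF_FILE already contains an export of SPARK_DIST_CLASSPATH.
# Hypothetical helper for illustration only.
add_dist_classpath() {
    conf_file="$1"
    hadoop_bin="$2"
    # \$ keeps the command substitution literal, so it is evaluated when
    # spark-env.sh is sourced by Spark, not when this function runs.
    line="export SPARK_DIST_CLASSPATH=\$($hadoop_bin classpath)"
    if grep -q '^export SPARK_DIST_CLASSPATH=' "$conf_file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$line" >> "$conf_file"
}

# Example use, from /usr/local/spark:
#   add_dist_classpath ./conf/spark-env.sh /usr/local/hadoop/bin/hadoop
```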
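Once configured, run the bundled SparkPi example to check that the installation works:
cd /usr/local/spark
./bin/run-example SparkPi
The run prints a lot of log output. To see only the result, redirect stderr into stdout and filter with grep:
./bin/run-example SparkPi 2>&1 | grep "Pi"
The Python version of the same example can be submitted with spark-submit:
./bin/spark-submit examples/src/main/python/pi.py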
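The grep filter can be taken one step further to pull out just the number. Both the Scala and the Python example print a line that starts with "Pi is roughly", so a small sed filter is enough. This is an illustrative sketch; extract_pi is a hypothetical name, and the pattern assumes that line format.

```shell
# extract_pi
# Reads Spark output on stdin and prints only the numeric estimate
# from the line "Pi is roughly <value>". Prints nothing if no such line exists.
# Hypothetical helper for illustration only.
extract_pi() {
    sed -n 's/^Pi is roughly \([0-9.]*\).*$/\1/p'
}

# Example use, from /usr/local/spark:
#   ./bin/run-example SparkPi 2>&1 | extract_pi
```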
Spark also provides interactive shells: ./bin/spark-shell for Scala and ./bin/pyspark for Python. To start the Scala shell:
./bin/spark-shell
Reference / adapted from: http://www.powerxing.com/spark-quick-start-guide/