Hadoop Cluster Setup
1 Cluster Planning
Three virtual machines
Operating system: CentOS 7 Minimal
Networking: bridged mode (NAT and host-only should also work)
IP addresses:
192.168.1.101
192.168.1.102
192.168.1.103
2 Basic Cluster Configuration
Add the following entries to /etc/hosts on all three machines:
192.168.1.101 master
192.168.1.102 slave1
192.168.1.103 slave2
Then edit /etc/hostname on each machine, setting its content to master, slave1, or slave2 respectively.
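The /etc/hosts additions can also be scripted. The sketch below appends the entries from the cluster plan; it writes to an illustrative copy at /tmp/hosts.new, whereas on a real node you would append to /etc/hosts itself as root:

```shell
# Host entries from the cluster plan above
HOSTS_ENTRIES='192.168.1.101 master
192.168.1.102 slave1
192.168.1.103 slave2'

# Append to a working copy; as root you would append to /etc/hosts directly
cp /etc/hosts /tmp/hosts.new
printf '%s\n' "$HOSTS_ENTRIES" >> /tmp/hosts.new
grep '192.168.1' /tmp/hosts.new
```

On CentOS 7, `hostnamectl set-hostname master` (respectively slave1/slave2) has the same effect as editing /etc/hostname by hand.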
3 Passwordless SSH Setup
On slave1:

su
vim /etc/ssh/sshd_config
    StrictModes no
    RSAAuthentication yes
    PubkeyAuthentication yes
/bin/systemctl restart sshd.service
mkdir .ssh

On master:

ssh-keygen -t dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/id_dsa.pub | ssh galaxy@slave1 'cat - >> ~/.ssh/authorized_keys'
ssh slave1

Repeat the same steps for slave2 and for master itself (master must also be able to SSH to itself without a password).
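Once the keys are distributed, a quick loop can confirm that every node is reachable without a password. This is a sketch: BatchMode makes ssh fail instead of prompting for a password, and /tmp/ssh-check.log is just an illustrative path:

```shell
# Check passwordless SSH to every node; prints one status line per host
for h in master slave1 slave2; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
        echo "$h: OK"
    else
        echo "$h: passwordless SSH not working"
    fi
done | tee /tmp/ssh-check.log
```

Note that `ssh-copy-id`, which ships with OpenSSH, automates the append-to-authorized_keys step used above.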
References:
http://my.oschina.net/u/1169607/blog/175899
http://segmentfault.com/a/1190000002911599
4 Install Hadoop
(omitted)
5 Configure Hadoop
hadoop-env.sh

#export JAVA_HOME=$JAVA_HOME    # wrong: this must be an explicit path, not $JAVA_HOME
export JAVA_HOME=/usr/java/jdk1.8.0_45
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp</value>
  </property>
</configuration>

Note: in Hadoop 2.x, fs.default.name is deprecated in favor of fs.defaultFS (the old name still works). Also, with hadoop.tmp.dir under /tmp, HDFS data is lost on reboot; for anything beyond testing, use a persistent directory.
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>

Note: mapred.job.tracker is the MRv1 property; to run MapReduce jobs on YARN in Hadoop 2.x, set mapreduce.framework.name to yarn instead.
masters
master
slaves
slave1
slave2
Copy the configuration to the other two machines:

scp etc/hadoop/* galaxy@slave1:/home/galaxy/hadoop-2.5.1/etc/hadoop/
scp etc/hadoop/* galaxy@slave2:/home/galaxy/hadoop-2.5.1/etc/hadoop/
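Before distributing the configs, the XML files can be sanity-checked for well-formedness. The sketch below writes a throwaway copy of core-site.xml under an illustrative /tmp/hadoop-conf path and parses it with Python's standard-library XML parser; on a real node you would point the loop at etc/hadoop/*.xml instead:

```shell
mkdir -p /tmp/hadoop-conf
cat > /tmp/hadoop-conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF

# Parse each config; a malformed file makes the parser exit non-zero
for f in /tmp/hadoop-conf/*.xml; do
    python3 -c 'import sys, xml.dom.minidom as m; m.parse(sys.argv[1])' "$f" \
        && echo "$f: well-formed"
done
```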
6 Start the Hadoop Cluster
Format the NameNode:

./bin/hadoop namenode -format

The output should include:

15/11/09 19:25:59 INFO common.Storage: Storage directory /tmp/dfs/name has been successfully formatted.
Start Hadoop:
./sbin/start-dfs.sh
Verify with jps that all daemons are running:

[galaxy@master hadoop-2.5.1]$ jps
5924 ResourceManager
6918 SecondaryNameNode
7718 Jps
6743 NameNode

[galaxy@slave1 ~]$ jps
6402 Jps
6345 DataNode

[galaxy@slave2 ~]$ jps
25552 Jps
25495 DataNode
7 Check Cluster Status
From the command line:

./bin/hdfs dfsadmin -report
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Via the web UI:
http://192.168.1.101:50070
Note: the CentOS 7 firewall must be stopped first: systemctl stop firewalld (add systemctl disable firewalld to keep it off across reboots).
8 Run a Test Job
Create local test files:

mkdir input
vim input/f1
vim input/f2
Create the HDFS directories:

./bin/hadoop fs -mkdir /tmp
./bin/hadoop fs -mkdir /tmp/input
./bin/hadoop fs -ls /
Upload the test files:

./bin/hadoop fs -put input/ /tmp

Note: the CentOS 7 firewall must be stopped on all nodes (systemctl stop firewalld), otherwise the upload fails.

./bin/hadoop fs -ls /tmp/input
-rw-r--r--   2 galaxy supergroup         16 2015-11-11 04:30 /tmp/input/f1
-rw-r--r--   2 galaxy supergroup         24 2015-11-11 04:30 /tmp/input/f2
Run wordcount:
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar wordcount /tmp/input /output
View the output:

[galaxy@master hadoop-2.5.1]$ ./bin/hadoop fs -ls /output
Found 2 items
-rw-r--r--   2 galaxy supergroup          0 2015-11-11 04:44 /output/_SUCCESS
-rw-r--r--   2 galaxy supergroup         31 2015-11-11 04:44 /output/part-r-00000
[galaxy@master hadoop-2.5.1]$ ./bin/hadoop fs -cat /output/*
bye	2
hadoop	2
hello	2
world	1
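The MapReduce result can be cross-checked with plain shell tools. The sketch below recreates sample input files locally (the exact contents of f1/f2 above were not recorded, so these contents are assumed, chosen to produce the same word set) and performs the same split-and-count aggregation that wordcount does:

```shell
# Recreate sample inputs locally (contents assumed; the original f1/f2 are not recorded)
mkdir -p /tmp/wc-input
printf 'hello world\nhello hadoop\n' > /tmp/wc-input/f1
printf 'bye hadoop\nbye\n' > /tmp/wc-input/f2

# Split on spaces, sort, and count occurrences -- the aggregation wordcount performs
cat /tmp/wc-input/f1 /tmp/wc-input/f2 | tr -s ' ' '\n' | sort | uniq -c | tee /tmp/wc-counts.txt
# prints: 2 bye / 2 hadoop / 2 hello / 1 world
```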