I. Introduction
Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS).
HDFS is highly fault-tolerant and designed to run on low-cost hardware. It provides high-throughput access to application data, which suits applications with very large data sets. HDFS relaxes certain POSIX requirements in order to allow streaming access to file system data.
The two core pieces of the Hadoop framework are HDFS and MapReduce: HDFS provides storage for massive data sets, and MapReduce provides the computation over them.
II. Deployment Environment Planning
1. Server Address Plan
No. | IP Address | Hostname      | Role     | User
----|------------|---------------|----------|------------
1   | 10.0.0.67  | Master.Hadoop | NameNode | Hadoop/root
2   | 10.0.0.68  | Slave1.Hadoop | DataNode | Hadoop/root
3   | 10.0.0.69  | Slave2.Hadoop | DataNode | Hadoop/root
2. Deployment Environment
[root@Master ~]# cat /etc/redhat-release
CentOS release 6.9 (Final)
[root@Master ~]# uname -r
2.6.32-696.el6.x86_64
[root@Master ~]# /etc/init.d/iptables status
iptables: Firewall is not running.
[root@Master ~]# getenforce
Disabled
3. Unified /etc/hosts Resolution
10.0.0.67 Master.Hadoop
10.0.0.68 Slave1.Hadoop
10.0.0.69 Slave2.Hadoop
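One way to apply these entries on all three machines (run as root on each node; a sketch that assumes the lines are not already present in the file):
cat >> /etc/hosts <<'EOF'
10.0.0.67 Master.Hadoop
10.0.0.68 Slave1.Hadoop
10.0.0.69 Slave2.Hadoop
EOF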
III. Passwordless SSH Configuration
1. On the Master
[root@Master ~]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
d9:50:b7:b1:f9:aa:83:6e:34:b9:0a:10:61:b9:83:e8 root@Master.Hadoop
The key's randomart image is:
+--[ DSA 1024]----+
| o. . o          |
| ... . . =       |
|.... . +         |
|o o. + .         |
|. .. S.. .       |
| E . + .         |
| . . + .         |
| . + ..          |
| .+. ..          |
+-----------------+
[root@Master ~]# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2. Distribute the Public Key to Both Slaves
① Slave1
[root@Slave1 ~]# scp root@Master.Hadoop:~/.ssh/id_dsa.pub ~/.ssh/master_dsa.pub
The authenticity of host 'master.hadoop (10.0.0.67)' can't be established.
RSA key fingerprint is b4:24:ea:5f:aa:06:3b:7c:76:93:b9:11:4c:65:70:95.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master.hadoop,10.0.0.67' (RSA) to the list of known hosts.
root@master.hadoop's password:
id_dsa.pub                                    100%  608     0.6KB/s   00:00
[root@Slave1 ~]# cat ~/.ssh/master_dsa.pub >> ~/.ssh/authorized_keys
② Slave2
[root@Slave2 ~]# scp root@Master.Hadoop:~/.ssh/id_dsa.pub ~/.ssh/master_dsa.pub
The authenticity of host 'master.hadoop (10.0.0.67)' can't be established.
RSA key fingerprint is b4:24:ea:5f:aa:06:3b:7c:76:93:b9:11:4c:65:70:95.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master.hadoop,10.0.0.67' (RSA) to the list of known hosts.
root@master.hadoop's password:
id_dsa.pub                                    100%  608     0.6KB/s   00:00
[root@Slave2 ~]# cat ~/.ssh/master_dsa.pub >> ~/.ssh/authorized_keys
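As an alternative to the scp-and-append steps above, ssh-copy-id (shipped with openssh-clients on CentOS 6) pushes the key and appends it to authorized_keys in one step per slave. A sketch, run from the Master:
for slave in Slave1.Hadoop Slave2.Hadoop; do
    ssh-copy-id -i ~/.ssh/id_dsa.pub root@$slave    # prompts once for each slave's root password
done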
③ Test the Connection from Master to the Slaves
[root@Master ~]# ssh Slave1.Hadoop
Last login: Tue Aug  7 10:30:53 2018 from 10.0.0.67
[root@Slave1 ~]# exit
logout
Connection to Slave1.Hadoop closed.
[root@Master ~]# ssh Slave2.Hadoop
Last login: Tue Aug  7 10:31:04 2018 from 10.0.0.67
IV. Hadoop Installation and Environment Configuration
1. On the Master
① Install the Java Environment
tar xf jdk-8u181-linux-x64.tar.gz -C /usr/local/
ln -s /usr/local/jdk1.8.0_181/ /usr/local/jdk
Configure the environment variables:
[root@Master ~]# tail -4 /etc/profile
export JAVA_HOME=/usr/local/jdk1.8.0_181
export JRE_HOME=/usr/local/jdk1.8.0_181/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH
[root@Master ~]# source /etc/profile
[root@Master ~]# java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
2. Install and Configure Hadoop
① Install
tar -xf hadoop-2.8.0.tar.gz -C /usr/
mv /usr/hadoop-2.8.0/ /usr/hadoop
### Hadoop environment variables: append to /etc/profile, then run `source /etc/profile` ###
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
② Configure hadoop-env.sh
vim /usr/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_181
source /usr/hadoop/etc/hadoop/hadoop-env.sh
[root@Master usr]# hadoop version    # check the Hadoop version
Hadoop 2.8.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 91f2b7a13d1e97be65db92ddabc627cc29ac0009
Compiled by jdu on 2017-03-17T04:12Z
Compiled with protoc 2.5.0
From source with checksum 60125541c2b3e266cbf3becc5bda666
This command was run using /usr/hadoop/share/hadoop/common/hadoop-common-2.8.0.jar
③ Create the Directories Hadoop Needs
mkdir /usr/hadoop/{tmp,hdfs}
mkdir /usr/hadoop/hdfs/{name,tmp,data} -p
④ Edit the core configuration file core-site.xml, which sets the address and port of the HDFS master (i.e. the NameNode)
vim /usr/hadoop/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <final>true</final>
        <!-- Note: create the tmp directory under /usr/hadoop first (step ③ above) -->
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://10.0.0.67:9000</value>
        <!-- the hostname form hdfs://Master.Hadoop:9000 works as well -->
        <final>true</final>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
</configuration>
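A side note: fs.default.name is the legacy Hadoop 1.x key; Hadoop 2.8.0 still honors it but logs it as deprecated in favor of fs.defaultFS. Once the configuration is in place, the effective value can be verified with hdfs getconf:
hdfs getconf -confKey fs.defaultFS
# expected to print: hdfs://10.0.0.67:9000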
⑤ Configure hdfs-site.xml
vim /usr/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/hadoop/hdfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master.hadoop:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
⑥ Configure mapred-site.xml
vim /usr/hadoop/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
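Note that the Hadoop 2.8.0 tarball ships only a template for this file, so create it before editing:
cp /usr/hadoop/etc/hadoop/mapred-site.xml.template /usr/hadoop/etc/hadoop/mapred-site.xml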
⑦ Configure yarn-site.xml
vim /usr/hadoop/etc/hadoop/yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master.Hadoop:18040</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master.Hadoop:18030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master.Hadoop:18088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master.Hadoop:18025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master.Hadoop:18141</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
⑧ Configure the masters and slaves Files
echo "10.0.0.67" > /usr/hadoop/etc/hadoop/masters
echo -e "10.0.0.68\n10.0.0.69" > /usr/hadoop/etc/hadoop/slaves
Verify:
[root@Master hadoop]# cat /usr/hadoop/etc/hadoop/masters
10.0.0.67
[root@Master hadoop]# cat /usr/hadoop/etc/hadoop/slaves
10.0.0.68
10.0.0.69
3. Install and Configure the Slaves
① Copy the JDK to the Slaves
scp -rp /usr/local/jdk1.8.0_181 root@Slave1.Hadoop:/usr/local/
scp -rp /usr/local/jdk1.8.0_181 root@Slave2.Hadoop:/usr/local/
② Copy /etc/profile
scp -rp /etc/profile root@Slave1.Hadoop:/etc/
scp -rp /etc/profile root@Slave2.Hadoop:/etc/
③ Copy /usr/hadoop (a combined loop for all three copy steps follows below)
scp -rp /usr/hadoop root@Slave1.Hadoop:/usr/
scp -rp /usr/hadoop root@Slave2.Hadoop:/usr/
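Steps ① through ③ can also be run as a single loop from the Master (a sketch that relies on the passwordless SSH configured in section III):
for slave in Slave1.Hadoop Slave2.Hadoop; do
    scp -rp /usr/local/jdk1.8.0_181 root@$slave:/usr/local/
    scp -rp /etc/profile root@$slave:/etc/
    scp -rp /usr/hadoop root@$slave:/usr/
done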
This completes the environment setup.
V. Start and Verify the Hadoop Cluster
1. Start
① Format the HDFS Filesystem
/usr/hadoop/bin/hadoop namenode -format
# (the hadoop executable lives in bin/, not sbin/; the non-deprecated form is: /usr/hadoop/bin/hdfs namenode -format)
② Start the Whole Hadoop Cluster
/usr/hadoop/sbin/start-all.sh
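start-all.sh is deprecated in Hadoop 2.x and simply delegates to the two scripts below; starting HDFS and YARN separately is equivalent and makes failures easier to localize:
/usr/hadoop/sbin/start-dfs.sh     # NameNode, SecondaryNameNode, DataNodes
/usr/hadoop/sbin/start-yarn.sh    # ResourceManager, NodeManagers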
Check the Hadoop processes:
[root@Master sbin]# ps -ef|grep hadoop
root 1523 1 3 16:37 ? 00:00:07 /usr/local/jdk1.8.0_181/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.library.path=/usr/hadoop/lib -Dhadoop.log.dir=/usr/hadoop/logs -Dhadoop.log.file=hadoop-root-secondarynamenode-Master.Hadoop.log -Dhadoop.home.dir=/usr/hadoop -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
root 1670 1 11 16:37 pts/0 00:00:19 /usr/local/jdk1.8.0_181/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.home.dir= -Dyarn.id.str=root -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.log.file=yarn-root-resourcemanager-Master.Hadoop.log -Dyarn.home.dir=/usr/hadoop -Dhadoop.home.dir=/usr/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -classpath /usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/share/hadoop/common/lib/*:/usr/hadoop/share/hadoop/common/*:/usr/hadoop/share/hadoop/hdfs:/usr/hadoop/share/hadoop/hdfs/lib/*:/usr/hadoop/share/hadoop/hdfs/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/mapreduce/lib/*:/usr/hadoop/share/hadoop/mapreduce/*:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/contrib/capacity-scheduler/*.jar:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
root 1941 1235 0 16:40 pts/0 00:00:00 grep --color=auto hadoop
③ Stop the Whole Hadoop Cluster
/usr/hadoop/sbin/stop-all.sh
Check the Hadoop processes on Slave1.Hadoop and Slave2.Hadoop:
[root@Slave1 ~]# ps -ef|grep hadoop
root 1271 1 2 16:37 ? 00:00:12 /usr/local/jdk1.8.0_181/bin/java -Dproc_datanode -Xmx1000m -Djava.library.path=/usr/hadoop/lib -Dhadoop.log.dir=/usr/hadoop/logs -Dhadoop.log.file=hadoop-root-datanode-Slave1.Hadoop.log -Dhadoop.home.dir=/usr/hadoop -Dhadoop.id.str=root -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
root 1363 1 4 16:37 ? 00:00:19 /usr/local/jdk1.8.0_181/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.home.dir= -Dyarn.id.str=root -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/usr/hadoop/logs -Dyarn.log.dir=/usr/hadoop/logs -Dhadoop.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.log.file=yarn-root-nodemanager-Slave1.Hadoop.log -Dyarn.home.dir=/usr/hadoop -Dhadoop.home.dir=/usr/hadoop -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop/lib/native -classpath /usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/etc/hadoop:/usr/hadoop/share/hadoop/common/lib/*:/usr/hadoop/share/hadoop/common/*:/usr/hadoop/share/hadoop/hdfs:/usr/hadoop/share/hadoop/hdfs/lib/*:/usr/hadoop/share/hadoop/hdfs/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/mapreduce/lib/*:/usr/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/usr/hadoop/share/hadoop/yarn/*:/usr/hadoop/share/hadoop/yarn/lib/*:/usr/hadoop/etc/hadoop/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager
root 1499 1238 0 16:45 pts/0 00:00:00 grep --color=auto hadoop
2. Test with the jps Command
① Master
[root@Master ~]# jps
11329 NameNode
11521 SecondaryNameNode
12269 Jps
11677 ResourceManager
② Slave
[root@Slave1 ~]# jps
4320 Jps
4122 NodeManager
4012 DataNode
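To collect the same check from all three nodes in one pass (a sketch; it uses the full JDK path because a non-interactive ssh session does not source /etc/profile):
for node in Master.Hadoop Slave1.Hadoop Slave2.Hadoop; do
    echo "== $node =="
    ssh root@$node /usr/local/jdk1.8.0_181/bin/jps
done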
3. Check Cluster Status from the Master
[root@Master ~]# hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

18/08/08 11:16:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 38020816896 (35.41 GB)
Present Capacity: 27476373504 (25.59 GB)
DFS Remaining: 27476316160 (25.59 GB)
DFS Used: 57344 (56 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 10.0.0.68:50010 (Slave1.Hadoop)
Hostname: Slave1.Hadoop
Decommission Status : Normal
Configured Capacity: 19010408448 (17.70 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4270084096 (3.98 GB)
DFS Remaining: 13767794688 (12.82 GB)
DFS Used%: 0.00%
DFS Remaining%: 72.42%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Aug 08 11:16:09 CST 2018

Name: 10.0.0.69:50010 (Slave2.Hadoop)
Hostname: Slave2.Hadoop
Decommission Status : Normal
Configured Capacity: 19010408448 (17.70 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 4329357312 (4.03 GB)
DFS Remaining: 13708521472 (12.77 GB)
DFS Used%: 0.00%
DFS Remaining%: 72.11%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Aug 08 11:16:08 CST 2018
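As a final smoke test, write a small file into HDFS and list it back; /test here is just an arbitrary example path:
hdfs dfs -mkdir -p /test
hdfs dfs -put /etc/hosts /test/
hdfs dfs -ls /test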
4. Check Cluster Status via the Web UI
http://10.0.0.67:50070
This is the HDFS NameNode UI; with the yarn-site.xml above, the YARN ResourceManager UI is served at http://10.0.0.67:18088.