0. Prerequisites
- 3 hosts
- Java environment
- hosts file configuration
- passwordless SSH login between the nodes (a sketch of both follows this list)
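The hosts entries and the passwordless-SSH setup are assumed to look roughly like the following sketch. The IP addresses are taken from the dfsadmin report further below; adjust them to your own environment.
# /etc/hosts on every node
10.20.1.188 hadoop01
10.20.1.189 hadoop02
10.20.1.190 hadoop03
# on the Master node: generate a key and push it to every node, itself included
ssh-keygen -t rsa
ssh-copy-id root@hadoop01
ssh-copy-id root@hadoop02
ssh-copy-id root@hadoop03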
1. Download the installation package
We download the Hadoop package on the Master node first, adjust the configuration there, and then copy it to the Slave nodes, where only minor changes are needed.
- Download the package and create the Hadoop directory
# download
wget http://apache.claz.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
# extract into the /apprun directory
sudo tar -xzvf hadoop-3.2.1.tar.gz -C /apprun
# change ownership of the extracted Hadoop directory
sudo chown -R ubuntu:ubuntu /apprun/hadoop-3.2.1
# rename the directory...
sudo mv /apprun/hadoop-3.2.1 /apprun/hadoop
# ...or, preferably, keep the versioned directory and create a symlink instead
sudo ln -s /apprun/hadoop-3.2.1 /apprun/hadoop
2. Configure Hadoop environment variables on the Master node
2.1 Global environment variables
[root@hadoop01 hadoop]# vi /etc/profile
export HADOOP_HOME=/apprun/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
[root@hadoop01 apprun]# source /etc/profile
[root@hadoop01 ~]# echo $HADOOP_HOME
/apprun/hadoop
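As a quick sanity check that the new PATH entry works, the hadoop command itself can be run; it should print the version banner:
# should report Hadoop 3.2.1
hadoop version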
2.2 Hadoop environment variables
Point the JAVA_HOME parameter in hadoop-env.sh at your own JDK path; this is where Hadoop picks up its Java environment at runtime. (These are the only active lines in the script; everything else is commented out.)
[root@hadoop01 hadoop]# vi etc/hadoop/hadoop-env.sh
export JAVA_HOME=/apprun/jdk
export HADOOP_HOME=/apprun/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
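A quick check that the JDK path exported above actually exists (this assumes the JDK was installed to /apprun/jdk, as in the export):
# should print the Java version
/apprun/jdk/bin/java -version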
3. Configure the Master node
Every Hadoop component is configured through XML files, all of which live in the /apprun/hadoop/etc/hadoop directory:
- core-site.xml: common properties, such as I/O settings shared by HDFS and MapReduce
- hdfs-site.xml: HDFS daemon configuration, covering the namenode, secondary namenode and datanodes
- mapred-site.xml: MapReduce daemon configuration
- yarn-site.xml: resource-scheduling configuration
3.1 Edit core-site.xml
Modify the file as follows:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/apprun/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop01:9000</value>
</property>
</configuration>
Parameter notes:
- fs.defaultFS: the default filesystem; HDFS clients need this parameter to reach HDFS
- hadoop.tmp.dir: the base directory for Hadoop's working data; several other directories are derived from it, so put it somewhere with enough space rather than under the default /tmp
If hadoop.tmp.dir is not set, the default temporary directory /tmp/hadoop-${user.name} is used. That directory is wiped on every reboot, so the namenode would have to be re-formatted each time or startup will fail.
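A minimal sketch for pre-creating the directory referenced by hadoop.tmp.dir (assuming the /apprun/hadoop layout used throughout this guide):
# create the base temporary directory on the Master node
mkdir -p /apprun/hadoop/tmp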
3.2 Edit hdfs-site.xml
Modify the file as follows:
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/apprun/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/apprun/hadoop/hdfs/data</value>
</property>
<!-- web addresses of the namenode and the secondary namenode -->
<property>
<name>dfs.http.address</name>
<value>hadoop01:50070</value>
</property>
<property>
<name>dfs.secondary.http.address</name>
<value>hadoop02:50090</value>
</property>
</configuration>
Parameter notes:
- dfs.replication: number of replicas kept for each block
- dfs.name.dir: directory where the namenode stores its metadata (legacy alias for dfs.namenode.name.dir, which Hadoop 3 still accepts)
- dfs.data.dir: directory where the datanodes store block data (legacy alias for dfs.datanode.data.dir)
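The namenode and datanode directories configured above do not have to exist beforehand, but creating them up front avoids permission surprises; a sketch, to be run on every node:
# pre-create the HDFS metadata and block directories
mkdir -p /apprun/hadoop/hdfs/name /apprun/hadoop/hdfs/data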
3.3 Edit mapred-site.xml
Modify the file as follows:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
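The classpath value above points at the jars shipped under $HADOOP_HOME/share/hadoop/mapreduce; a quick way to confirm they are where the property expects them:
# jars referenced by mapreduce.application.classpath
ls /apprun/hadoop/share/hadoop/mapreduce/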
3.4 Edit yarn-site.xml
Modify the file as follows:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop01</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME</value>
</property>
</configuration>
3.5 Edit workers
Modify etc/hadoop/workers (in Hadoop 3 this file replaces the old slaves file) as follows:
hadoop01
hadoop02
hadoop03
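One way to write the file in a single step (a sketch; run from $HADOOP_HOME on the Master node):
# list every node that should run a DataNode/NodeManager
cat > etc/hadoop/workers <<EOF
hadoop01
hadoop02
hadoop03
EOF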
4. Configure the Slave (worker) nodes
Package the Hadoop directory configured on the Master node and send it to the other two nodes:
# package the hadoop directory (relative to /apprun so it extracts back into place)
tar -czf hadoop.tar.gz -C /apprun hadoop
# copy it to the other two nodes
scp hadoop.tar.gz root@hadoop02:~
scp hadoop.tar.gz root@hadoop03:~
On the other nodes, extract the Hadoop package into the /apprun directory:
sudo tar -xzvf hadoop.tar.gz -C /apprun/
Configure the Hadoop environment variables on the two Slave nodes (hadoop02 and hadoop03) in the same way:
[root@hadoop02 hadoop]# vi /etc/profile
export HADOOP_HOME=/apprun/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
[root@hadoop02 apprun]# source /etc/profile
[root@hadoop02 ~]# echo $HADOOP_HOME
/apprun/hadoop
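To confirm that every node now resolves HADOOP_HOME the same way, a small loop over the hosts can help (a sketch; it assumes root SSH access as set up earlier):
# print HADOOP_HOME as seen on each node
for h in hadoop01 hadoop02 hadoop03; do
  ssh root@$h 'source /etc/profile; echo $(hostname): $HADOOP_HOME'
done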
5. Start the cluster
5.1 Format the HDFS filesystem
Go into the Hadoop directory on the Master node and run the following:
bin/hdfs namenode -format
This formats the namenode. It only needs to be done once, before the very first startup.
If a line like the following appears in the log output, the namenode was formatted successfully.
common.Storage: Storage directory /apprun/elk/hadoop_repo/dfs/name has been successfully formatted.
5.2 Start the Hadoop cluster
Start the services
[root@hadoop01 hadoop]# sbin/start-all.sh
Starting namenodes on [hadoop01]
Starting datanodes
Starting secondary namenodes [hadoop02]
2020-08-21 14:08:09,698 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Starting nodemanagers
[root@hadoop01 hadoop]#
[root@hadoop01 hadoop]# jps
31188 NodeManager
30439 NameNode
30615 DataNode
31531 Jps
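To check that all three NodeManagers registered with the ResourceManager and that the web UIs respond (port 50070 comes from dfs.http.address above; 8088 is YARN's default ResourceManager web port), something like the following can be run:
# NodeManagers known to YARN; should list hadoop01, hadoop02 and hadoop03
yarn node -list
# HTTP status of the NameNode and ResourceManager web UIs
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop01:50070
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop01:8088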
Check the service status
[root@hadoop01 hadoop]# hadoop dfsadmin -report
WARNING: Use of this script to execute dfsadmin is deprecated.
WARNING: Attempting to execute replacement "hdfs dfsadmin" instead.
2020-08-21 14:13:48,047 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 303759667200 (282.90 GB)
Present Capacity: 261995925504 (244.00 GB)
DFS Remaining: 261995827200 (244.00 GB)
DFS Used: 98304 (96 KB)
DFS Used%: 0.00%
Replicated Blocks:
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
Erasure Coded Block Groups:
Low redundancy block groups: 0
Block groups with corrupt internal blocks: 0
Missing block groups: 0
Low redundancy blocks with highest priority to recover: 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (3):
Name: 10.20.1.188:9866 (hadoop01)
Hostname: hadoop01
Decommission Status : Normal
Configured Capacity: 101253222400 (94.30 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 6546034688 (6.10 GB)
DFS Remaining: 89563734016 (83.41 GB)
DFS Used%: 0.00%
DFS Remaining%: 88.46%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Aug 21 14:13:47 CST 2020
Last Block Report: Fri Aug 21 14:08:05 CST 2020
Num of Blocks: 0
Name: 10.20.1.189:9866 (hadoop02)
Hostname: hadoop02
Decommission Status : Normal
Configured Capacity: 101253222400 (94.30 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 10085851136 (9.39 GB)
DFS Remaining: 86023917568 (80.12 GB)
DFS Used%: 0.00%
DFS Remaining%: 84.96%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Aug 21 14:13:47 CST 2020
Last Block Report: Fri Aug 21 14:08:05 CST 2020
Num of Blocks: 0
Name: 10.20.1.190:9866 (hadoop03)
Hostname: hadoop03
Decommission Status : Normal
Configured Capacity: 101253222400 (94.30 GB)
DFS Used: 32768 (32 KB)
Non DFS Used: 9701593088 (9.04 GB)
DFS Remaining: 86408175616 (80.47 GB)
DFS Used%: 0.00%
DFS Remaining%: 85.34%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Aug 21 14:13:47 CST 2020
Last Block Report: Fri Aug 21 14:08:05 CST 2020
Num of Blocks: 0
[root@hadoop01 hadoop]#
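With HDFS and YARN up, the bundled MapReduce example job makes a simple end-to-end check; a sketch, run from $HADOOP_HOME on the Master node (the examples jar ships with the 3.2.1 distribution):
# estimate pi with 2 map tasks and 10 samples per map
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 2 10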
Stop the services
[root@hadoop01 hadoop]# sbin/stop-all.sh
Stopping namenodes on [hadoop01]
Stopping datanodes
Stopping secondary namenodes [hadoop02]
2020-08-21 14:06:07,290 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping nodemanagers
hadoop01: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
hadoop02: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
hadoop03: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
Stopping resourcemanager
[root@hadoop01 hadoop]#