The previous article covered the basic server configuration; this article covers the Hadoop setup. (The JDK has already been downloaded, unpacked to its target directory, and added to the environment variables, so that is not repeated here.)
1 Create the directories used for data storage
Run the following commands on each of the three machines:
mkdir /data/hadoop
mkdir /data/hadoop/hdfs
mkdir /data/hadoop/hdfs/nn
mkdir /data/hadoop/hdfs/dn
mkdir /data/hadoop/hdfs/snn
mkdir /data/hadoop/hdfs/tmp
mkdir /data/hadoop/yarn
mkdir /data/hadoop/yarn/nm
If you do not have permission to create them, run the commands with sudo (e.g. sudo mkdir /data/hadoop). In that case the directories will be owned by root, so change their owner to the ubuntu user and grant read/write permissions with the following commands:
cd /data
sudo chown -R ubuntu:root *
sudo chmod 766 *
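The directory creation above can be collapsed into a single command with mkdir -p, which also creates any missing parents; a minimal sketch (make_hadoop_dirs is a hypothetical helper, not part of Hadoop):

```shell
# Hypothetical helper: create the HDFS and YARN data directories under a prefix.
make_hadoop_dirs() {
  mkdir -p "$1"/hadoop/hdfs/nn "$1"/hadoop/hdfs/dn \
           "$1"/hadoop/hdfs/snn "$1"/hadoop/hdfs/tmp \
           "$1"/hadoop/yarn/nm
}

# On each node this would be run as root, e.g.:
# sudo mkdir -p ... , or make_hadoop_dirs /data from a root shell.
```

Because mkdir -p succeeds even when a directory already exists, the helper is safe to re-run on a partially prepared node.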
2 Edit the Hadoop configuration files on the master machine
The seven files below are all in the /usr/local/hadoop-2.9.2/etc/hadoop/ directory.
2.1 The masters file
ubuntu@master
2.2 The slaves file
ubuntu@slave1
ubuntu@slave2
2.3 The hadoop-env.sh file
Add the following line to this file:
export JAVA_HOME=/usr/local/jdk1.8.0_261
2.4 The core-site.xml file
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:///data/hadoop/hdfs/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>
    <description>The number of seconds between two periodic checkpoints</description>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.spark.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>fs.checkpoint.txns</name>
    <value>1000000</value>
  </property>
</configuration>
2.5 The hdfs-site.xml file
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop-cluster</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/hdfs/dn</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>slave1:50090</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50011</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
2.6 The mapred-site.xml file
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
2.7 The yarn-site.xml file
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.webapp.address</name>
    <value>master:8042</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///data/hadoop/yarn/nm</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
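A typo while hand-editing these XML files usually only surfaces when a daemon fails to start, so a quick well-formedness check is worthwhile; a sketch assuming python3 is installed (check_xml is a hypothetical helper):

```shell
# Hypothetical helper: fail if the given file is not well-formed XML.
check_xml() {
  python3 -c 'import sys, xml.dom.minidom as m; m.parse(sys.argv[1])' "$1"
}

# Example: check every *-site.xml under the Hadoop config directory.
# for f in /usr/local/hadoop-2.9.2/etc/hadoop/*-site.xml; do check_xml "$f"; done
```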
3 Edit the Hadoop configuration on slave1:
Six of the seven files on slave1 are identical to those on master. In the seventh file (yarn-site.xml), two properties take different values on slave1 than on master: yarn.nodemanager.hostname and yarn.nodemanager.webapp.address. The full file is:
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>slave1</value>
  </property>
  <property>
    <name>yarn.nodemanager.webapp.address</name>
    <value>slave1:8042</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///data/hadoop/yarn/nm</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
4 Edit the Hadoop configuration on slave2:
Six of the seven files on slave2 are identical to those on master. In the seventh file (yarn-site.xml), the same two properties, yarn.nodemanager.hostname and yarn.nodemanager.webapp.address, take slave2-specific values. The full file is:
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>slave2</value>
  </property>
  <property>
    <name>yarn.nodemanager.webapp.address</name>
    <value>slave2:8042</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///data/hadoop/yarn/nm</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
5 Format the NameNode and start the cluster
On the master machine, run the following commands to format the NameNode:
cd /usr/local/hadoop-2.9.2/bin
hdfs namenode -format
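A successful format writes a current/VERSION file into the NameNode name directory configured earlier; a sketch of a quick sanity check (formatted_ok is a hypothetical helper):

```shell
# Hypothetical helper: true if NN_DIR looks like a formatted NameNode directory.
formatted_ok() { test -f "$1/current/VERSION"; }

# Example: formatted_ok /data/hadoop/hdfs/nn && echo "NameNode formatted"
```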
The commands to start the cluster are:
cd /usr/local/hadoop-2.9.2/sbin
./start-dfs.sh
./start-yarn.sh
./mr-jobhistory-daemon.sh start historyserver
./hadoop-daemons.sh start secondarynamenode
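After the start scripts finish, jps on each node should list the expected daemons (NameNode, ResourceManager, and JobHistoryServer on master; DataNode and NodeManager on the slaves). A sketch of an automated check (expect_procs is a hypothetical helper):

```shell
# Hypothetical helper: verify each name in $2 appears in the process list $1.
expect_procs() {
  for p in $2; do
    printf '%s\n' "$1" | grep -qw "$p" || { echo "missing: $p" >&2; return 1; }
  done
}

# Example on master:
# expect_procs "$(jps)" "NameNode ResourceManager JobHistoryServer"
```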
The commands to stop the cluster are:
cd /usr/local/hadoop-2.9.2/sbin
./stop-dfs.sh
./stop-yarn.sh
./mr-jobhistory-daemon.sh stop historyserver
./hadoop-daemons.sh stop secondarynamenode
Hadoop is now deployed, and cluster status can be viewed through the web pages:
# HDFS web UI
http://<master public IP>:50070/
# YARN web UI
http://<master public IP>:8088/
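To confirm the web UIs are reachable without opening a browser, a simple TCP probe suffices; a sketch assuming bash and its /dev/tcp feature (port_open is a hypothetical helper):

```shell
# Hypothetical helper: true if a TCP connection to HOST:PORT succeeds (bash only).
port_open() { (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; }

# Example: port_open master 50070 && echo "HDFS web UI reachable"
```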
Warning: exposing the YARN web UI port 8088 to the public internet is dangerous. The server used in this example was compromised this way and turned into a cryptomining zombie, so never open the YARN address to the public network!