Prerequisites
- Software versions: hadoop-2.7.7, hbase-2.1.5, jdk1.8.0_211, zookeeper-3.4.10, apache-phoenix-5.0.0-HBase-2.0-bin.tar.gz
- At least 3 CentOS servers, with the hostnames hadoop0001, hadoop0002 and hadoop0003
- All software here is installed under the hadoop user's /home/hadoop/app directory
- Configure hosts on every server
[hadoop@hadoop0001 ~]$ sudo vim /etc/hosts
The hosts file contents are as follows:
# 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
# ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.2.1.102 hadoop0001
10.2.1.103 hadoop0002
10.2.1.104 hadoop0003
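To confirm the entries are in place on each server, a small check along these lines may help. It is only a sketch: the function name `missing_hosts` is hypothetical, and it assumes the three IP/hostname pairs listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any tool): print the cluster entries
# that are absent from a hosts file. Defaults to /etc/hosts.
missing_hosts() {
  local hosts_file="${1:-/etc/hosts}"
  local entry ip name
  for entry in "10.2.1.102 hadoop0001" "10.2.1.103 hadoop0002" "10.2.1.104 hadoop0003"; do
    ip=${entry%% *}     # text before the first space
    name=${entry##* }   # text after the last space
    # A line counts only if its first field is the IP and a later field is
    # the hostname; commented-out lines start with "#" and never match.
    if ! awk -v ip="$ip" -v name="$name" \
        '$1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
         END { exit !found }' "$hosts_file"; then
      echo "$entry"
    fi
  done
}
```

Calling `missing_hosts` with no argument checks /etc/hosts. Empty output means all three entries are present.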
- Passwordless SSH login (this step is optional, but without it Hadoop asks for a password on every start)
Run the following commands in a terminal on hadoop0001:
[hadoop@hadoop0001 ~]$ ssh-keygen -t rsa -P "" # press Enter at every prompt
[hadoop@hadoop0001 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop0001 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop0002
[hadoop@hadoop0001 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop0003
Run the following commands in a terminal on hadoop0002:
[hadoop@hadoop0002 ~]$ ssh-keygen -t rsa -P "" # press Enter at every prompt
[hadoop@hadoop0002 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop0002 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop0001
[hadoop@hadoop0002 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop0003
Run the following commands in a terminal on hadoop0003:
[hadoop@hadoop0003 ~]$ ssh-keygen -t rsa -P "" # press Enter at every prompt
[hadoop@hadoop0003 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop0003 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop0001
[hadoop@hadoop0003 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop0002
Verify passwordless login
[hadoop@hadoop0001 ~]$ ssh localhost
Last login: Fri Jan 4 13:45:54 2019 # this output means passwordless login succeeded
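Logging in to localhost covers only one host. A loop like the one below, assuming the three hostnames above and the `hadoop` user, could check every node in one pass. `BatchMode=yes` is a standard OpenSSH option that makes ssh fail instead of prompting for a password. The `SSH_CMD` variable and the function name are hypothetical and exist only so the loop can be dry-run.

```shell
#!/usr/bin/env bash
# Nodes taken from the prerequisites above.
NODES="hadoop0001 hadoop0002 hadoop0003"
# Hypothetical override so the loop can be dry-run; defaults to the real ssh.
SSH_CMD="${SSH_CMD:-ssh}"

# Print one status line per node; return non-zero if any key login fails.
check_passwordless() {
  local host failed=0
  for host in $NODES; do
    # Run the no-op command `true` remotely; with BatchMode a missing key
    # produces an error instead of a password prompt.
    if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "hadoop@$host" true 2>/dev/null; then
      echo "$host: ok"
    else
      echo "$host: key login failed"
      failed=1
    fi
  done
  return "$failed"
}
```

Since each node needs to reach the others, it may be worth running `check_passwordless` on all three servers rather than only on hadoop0001.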
- JDK installation
JDK version:
Linux: jdk-8u192-linux-x64.tar.gz
JDK environment variable configuration:
# In the user's home directory
[hadoop@hadoop0001 ~]$ vim .bashrc
Add the following:
export JAVA_HOME=/home/hadoop/app/jdk1.8.0_192
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$JAVA_HOME/bin:$HOME/bin:$HOME/.local/bin:$PATH
Finally, apply the environment variables:
# In the user's home directory
[hadoop@hadoop0001 ~]$ . .bashrc
Verify the JDK:
java -version
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)
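The same check can be scripted, which is handy once the JDK is on several servers. The sketch below extracts the quoted version string from the first line of the output shown above. Both function names are hypothetical. `java -version` writes to stderr, hence the `2>&1`.

```shell
#!/usr/bin/env bash
# Read `java -version` output on stdin and print the quoted version string.
java_version_from() {
  sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only if the java on PATH reports the expected version string.
check_java_version() {
  local expected="$1" actual
  actual=$(java -version 2>&1 | java_version_from)
  [ "$actual" = "$expected" ]
}
```

For example, `check_java_version 1.8.0_192 && echo "JDK ok"` would print the message only when the expected build is first on the PATH.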
Copy the JDK from hadoop0001 to the other servers
[hadoop@hadoop0001 app]$ scp -r jdk1.8.0_192/ hadoop@hadoop0002:~/app/jdk1.8.0_192/
[hadoop@hadoop0001 app]$ scp -r jdk1.8.0_192/ hadoop@hadoop0003:~/app/jdk1.8.0_192/
[hadoop@hadoop0001 ~]$ scp ~/.bashrc hadoop@hadoop0002:~/.bashrc
[hadoop@hadoop0001 ~]$ scp ~/.bashrc hadoop@hadoop0003:~/.bashrc
- NTP service setup
Install ntp on every server
[hadoop@hadoop0001 ~]$ sudo yum install -y ntp
Configure ntp on hadoop0001
[hadoop@hadoop0001 ~]$ sudo vim /etc/ntp.conf
Add the following configuration:
restrict 10.2.1.0 mask 255.255.255.0 nomodify notrap
logfile /var/log/ntpd.log
server ntp1.aliyun.com
server ntp2.aliyun.com
server ntp3.aliyun.com
server 127.0.0.1
fudge 127.0.0.1 stratum 10
Complete configuration file (ntp.conf):
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
logfile /var/log/ntpd.log
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
restrict 10.2.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server ntp1.aliyun.com
server ntp2.aliyun.com
server ntp3.aliyun.com
server 127.0.0.1
fudge 127.0.0.1 stratum 10
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
# Disable the monitoring facility to prevent amplification attacks using ntpdc
# monlist command when default restrict does not include the noquery flag. See
# CVE-2013-5211 for more details.
# Note: Monitoring will not be disabled with the limited restriction flag.
disable monitor
For other time servers, see: https://www.pool.ntp.org/zone/asia
Synchronize the time:
[hadoop@hadoop0001 ~]$ sudo ntpdate -u ntp1.aliyun.com
16 Jul 16:46:39 ntpdate[12700]: adjust time server 120.25.115.20 offset -0.002546 sec
Start the time service:
[hadoop@hadoop0001 ~]$ sudo systemctl start ntpd
Enable the time service at boot:
[hadoop@hadoop0001 ~]$ sudo systemctl enable ntpd
Configure the ntp client on hadoop0002 and hadoop0003
Add the following line to /etc/ntp.conf
server hadoop0001
Check whether ntp is synchronized
The following output means it is not yet synchronized
[root@hadoop0002 ~]# ntpstat
unsynchronised
time server re-starting
polling server every 8 s
The following output means it is synchronized
[root@hadoop0001 ~]# ntpstat
synchronised to NTP server (120.25.115.20) at stratum 3
time correct to within 976 ms
polling server every 64 s
Note: synchronization takes about 10 minutes
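Because synchronization can take a while, polling may be more convenient than rerunning `ntpstat` by hand. The sketch below keys off the first line of the output shown above. Note that "unsynchronised" contains "synchronised", so the match is anchored to the start of the line. The function names, the `NTPSTAT_CMD` override and the default of 20 checks at 30-second intervals (roughly 10 minutes) are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical override so the loop can be exercised without ntpd running.
NTPSTAT_CMD="${NTPSTAT_CMD:-ntpstat}"

# Read ntpstat output on stdin; succeed only if the first line starts
# with "synchronised".
ntp_is_synced() {
  head -n 1 | grep -q '^synchronised'
}

# Poll until the clock reports synchronised or the attempts run out.
# $1 = number of checks (default 20), $2 = seconds between checks (default 30).
wait_for_ntp_sync() {
  local attempts="${1:-20}" interval="${2:-30}" i=1
  while [ "$i" -le "$attempts" ]; do
    if $NTPSTAT_CMD 2>/dev/null | ntp_is_synced; then
      echo "synchronised after $i check(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "still unsynchronised after $attempts check(s)"
  return 1
}
```

Running `wait_for_ntp_sync` on hadoop0002 and hadoop0003 after starting ntpd would block until each client has locked onto hadoop0001, or give up after the configured number of checks.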
Hadoop installation
Download Hadoop
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
Extract Hadoop
tar -zxvf hadoop-2.7.7.tar.gz
Configure hadoop-env.sh
# Adjust to your actual workload
export HADOOP_HEAPSIZE=1024
Configure mapred-env.sh
export JAVA_HOME=${JAVA_HOME}
Configure yarn-env.sh
# Adjust to your actual workload
JAVA_HEAP_MAX=-Xmx512m
YARN_HEAPSIZE=1024
Configure core-site.xml
<!-- HDFS address and port -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop0001:8020</value>
</property>
<!-- Hadoop temporary data directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/application/hadoop-2.7.7/data</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>14400</value>
</property>
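The `fs.trash.interval` value is expressed in minutes, so the 14400 configured above keeps deleted files in the trash for 10 days. The arithmetic as a quick sanity check (the helper name is hypothetical):

```shell
#!/usr/bin/env bash
# Convert a retention period given in minutes to whole days.
minutes_to_days() {
  echo $(( $1 / 60 / 24 ))
}

minutes_to_days 14400   # prints 10
```

Once Hadoop is on the PATH, `hdfs getconf -confKey fs.trash.interval` should echo back the value that the cluster actually picked up.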
Configure yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop0001</value>
<description>Address of the YARN ResourceManager</description>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>Enable log aggregation</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>How reducers fetch data</description>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
<description>Retain logs for 7 days</description>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>15000</value>
<description>Memory available on each node, in MB</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb