1. Change the hostname: vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node1
2. Edit the hostname mappings: vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 // optional, can be kept or removed
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 // optional, can be kept or removed
192.168.10.11 node1
192.168.10.12 node2
192.168.10.13 node3
192.168.10.14 node4
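Every node needs the same mappings, so push the edited hosts file to the other nodes as well. A minimal sketch, assuming the node names above and root access (scp prompts for passwords until step 6 sets up key-based login):
scp /etc/hosts node2:/etc/hosts
scp /etc/hosts node3:/etc/hosts
scp /etc/hosts node4:/etc/hosts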
3. Set up time synchronization:
1) yum install ntp // install if the server does not already have it
1.1) chkconfig ntpd on // start ntpd automatically at boot
2) ntpdate ntp.api.bz // sync once against a time server
3) service ntpd start/stop/restart/reload
4) Schedule periodic sync: crontab -e
*/10 * * * * ntpdate time.nist.gov // sync every 10 minutes
4.1) Check whether the cron service is enabled with: chkconfig --list | grep cron
crond 0:off 1:off 2:on 3:on 4:on 5:on 6:off
If the system runlevel is 2-5, the cron service starts automatically at boot.
4.2) Enable crond at boot: chkconfig crond on
4.3) crontab options
-e [-u UserName]: edit the crontab in the default text editor (vi)
-r [-u UserName]: remove the current crontab
-l [-u UserName]: list the current crontab
(on Linux the target user is given with -u; a -v "job status" option exists only on some Unix variants)
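To confirm the job was registered and to trigger a sync right away, a quick check (hwclock -w additionally writes the synced time back to the hardware clock):
crontab -l                             # should show the */10 ntpdate line
ntpdate time.nist.gov && hwclock -w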
4. Disable the firewall: chkconfig iptables off
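chkconfig only affects the next boot; to stop the firewall that is already running on CentOS 6 and confirm it is off:
service iptables stop
service iptables status    # should report that the firewall is not running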
5. Disable SELinux: vi /etc/selinux/config
SELINUX=disabled
SELINUXTYPE=targeted
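The config file change only takes effect after a reboot; to switch SELinux off for the current session as well:
setenforce 0    # drop to permissive mode immediately
getenforce      # prints Permissive now, Disabled after the next reboot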
6. Passwordless SSH login
1)yum list | grep ssh
2)yum install -y openssh-server openssh-clients
3)service sshd start
4)chkconfig sshd on
5) ssh-keygen // generate a key pair
6) ssh-copy-id node1 // after this, the current server can SSH to node1 without a password
Set up the namenode and resourcemanager servers so they can log in to all servers (namenode + datanode) without a password; see the loop sketch below.
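A minimal sketch of that distribution, run on node1 (and likewise on the resourcemanager hosts), assuming the four node names above; each ssh-copy-id prompts once for the target node's password:
for h in node1 node2 node3 node4; do
    ssh-copy-id $h
done
ssh node2 hostname    # quick check: should print node2 without asking for a password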
7. Fully distributed Hadoop cluster setup:
1) Configuration files
1.1 vi + /etc/profile
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_171
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.6.5
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
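The new variables are not visible to the current shell until the file is re-read; a quick check, assuming the JDK and Hadoop are unpacked at the paths above:
source /etc/profile
java -version
hadoop version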
1.2 hadoop-env.sh mapred-env.sh yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_171
1.3 core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://node1:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/data/hadoop</value>
</property>
1.4 hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node2:50090</value>
</property>
1.5 slaves
node2
node3
node4
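All nodes must share the same configuration, so copy the whole Hadoop directory to the other nodes after editing. A sketch, assuming the install path used in /etc/profile above:
scp -r /opt/module/hadoop-2.6.5 node2:/opt/module/
scp -r /opt/module/hadoop-2.6.5 node3:/opt/module/
scp -r /opt/module/hadoop-2.6.5 node4:/opt/module/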
1.6 Format the filesystem: ./bin/hdfs namenode -format
Show help: ./bin/hdfs namenode -h
1.7 Start the cluster: ./sbin/start-dfs.sh
1.8 View the web UI at IP:50070, e.g.:
node1:50070
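A quick sanity check after start-dfs.sh is to run jps on each node; with the layout above you would expect roughly:
jps
# node1: NameNode
# node2: DataNode, SecondaryNameNode
# node3, node4: DataNode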
1.9 Help:
hdfs
hdfs dfs
Create a directory: hdfs dfs -mkdir -p /user/root
List a directory: hdfs dfs -ls /
Upload a file: hdfs dfs -put hadoop-2.6.5.tar.gz /user/root
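To confirm the upload worked, list the directory and read the file back (the local file name here is just an example):
hdfs dfs -ls /user/root
hdfs dfs -get /user/root/hadoop-2.6.5.tar.gz ./hadoop-copy.tar.gz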
1.10 Stop the cluster: ./sbin/stop-dfs.sh
8. Hadoop HA setup
1) Configuration files
1.1 vi + /etc/profile
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_171
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.6.5
#ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.6
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin
1.2 hadoop-env.sh mapred-env.sh yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_171
1.3 core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/data/hadoop</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node2:2181,node3:2181,node4:2181</value>
</property>
1.4 hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>node1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>node1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>node2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<!-- if the key file is id_dsa, change the path below to id_dsa accordingly -->
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/data/hadoop/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
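After editing, you can verify that the HA keys are actually being picked up; hdfs getconf reads the effective configuration:
hdfs getconf -confKey dfs.nameservices              # should print mycluster
hdfs getconf -confKey dfs.ha.namenodes.mycluster    # should print nn1,nn2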
1.5 slaves
node2
node3
node4
1.6 ZooKeeper cluster setup
zoo.cfg
tickTime=2000
dataDir=/opt/data/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=node2:2888:3888
server.2=node3:2888:3888
server.3=node4:2888:3888
/opt/data/zookeeper/myid contains 1, 2, or 3 on the respective node (see the sketch below)
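A sketch of creating the myid files; the number must match the server.N entry for that host in zoo.cfg:
mkdir -p /opt/data/zookeeper
echo 1 > /opt/data/zookeeper/myid    # on node2
echo 2 > /opt/data/zookeeper/myid    # on node3
echo 3 > /opt/data/zookeeper/myid    # on node4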
1.7 Run on every ZooKeeper node: zkServer.sh start
Check whether it started successfully: zkServer.sh status
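On a healthy three-node ensemble the status output shows one leader and two followers, roughly:
zkServer.sh status
# Mode: follower   (on two of the nodes)
# Mode: leader     (on the remaining node)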
1.8 Run on every journalnode node: hadoop-daemon.sh start journalnode // the journalnodes must be running before the Hadoop cluster is started
1.9 Synchronize the edit logs
If a cluster already exists with a single namenode:
hdfs namenode -initializeSharedEdits (run on the namenode that has already been formatted)
hadoop-daemon.sh start namenode
hdfs namenode -bootstrapStandby (run on the namenode that has not been formatted)
If this is a brand-new cluster:
hdfs namenode -format
hadoop-daemon.sh start namenode
hdfs namenode -bootstrapStandby (run on the namenode that has not been formatted)
1.10 Format ZooKeeper and start
hdfs zkfc -formatZK (formatting on one of the namenode nodes is sufficient)
hadoop-daemon.sh start zkfc (start on both zkfc nodes, i.e. the namenodes), or simply start everything with start-dfs.sh
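Once everything is up, you can check which namenode became active with the standard HA admin tool:
hdfs haadmin -getServiceState nn1    # prints active or standby
hdfs haadmin -getServiceState nn2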
9. YARN setup
1) Configuration files
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node3</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node4</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node2:2181,node3:2181,node4:2181</value>
</property>
2) Start
start-yarn.sh (in this layout it only starts the nodemanagers)
yarn-daemon.sh start resourcemanager (run on both resourcemanager nodes)
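To confirm ResourceManager HA is working, query the state of both RMs:
yarn rmadmin -getServiceState rm1    # prints active or standby
yarn rmadmin -getServiceState rm2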
3) Test with wordcount
hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /user/jqbai/test.txt /user/jqbai/wordcount
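The job writes its result to the output directory given as the last argument; reducer output files are typically named part-r-00000:
hdfs dfs -ls /user/jqbai/wordcount
hdfs dfs -cat /user/jqbai/wordcount/part-r-00000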
10. Set up the Windows development environment
Add environment variables:
1) HADOOP_USER_NAME=root
2) HADOOP_HOME=D:\software\hadoop-2.6.5 (Windows-specific)