1. First, prepare four machines with internet access, named shizhan01, shizhan02, shizhan03, and shizhan04.
2. Map hostnames to IPs with vim /etc/hosts, as follows.
This must be configured on every machine:
192.168.137.200 shizhan01
192.168.137.201 shizhan02
192.168.137.202 shizhan03
192.168.137.203 shizhan04
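A quick way to confirm the mapping works (run from any of the machines; shizhan02 here is just an example):
#Should resolve to 192.168.137.201 and get a reply
ping -c 1 shizhan02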
3. Disable the firewall
#Check the firewall status
service iptables status
#Stop the firewall
service iptables stop
#Check whether the firewall starts on boot
chkconfig iptables --list
#Disable the firewall from starting on boot
chkconfig iptables off
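Note: the commands above target the iptables service on CentOS 6. If your machines run CentOS 7 or later (an assumption; the original uses the older service/chkconfig tools), the firewalld equivalents are:
#Stop the firewall and disable it on boot
systemctl stop firewalld
systemctl disable firewalld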
4. Give the hadoop user root privileges
vim /etc/sudoers and add the hadoop line shown below, after the existing root entry:
root ALL=(ALL) ALL
hadoop ALL=(ALL) ALL
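A safer way to make this edit (optional; the original edits the file directly) is visudo, which checks /etc/sudoers for syntax errors before saving:
#Opens /etc/sudoers with syntax validation on save
visudo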
5. Reboot Linux
reboot
6. Install the JDK
6.1 Upload: press Alt+P to open an sftp window (SecureCRT's shortcut), then run put d:\xxx\yy\ll\jdk-7u55-linux-i586.tar.gz (the Windows path)
Extract the JDK
#Create a directory
mkdir /home/hadoop/app
#Extract
tar -zxvf jdk-7u55-linux-i586.tar.gz -C /home/hadoop/app
7. Add Java to the environment variables
vim /etc/profile
#Append at the end of the file
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_55
export PATH=$PATH:$JAVA_HOME/bin
Reload the environment configuration:
source /etc/profile
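A quick check that the JDK is now on the PATH (the exact version string depends on the build):
#Should print the installed JDK version
java -version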
8. Install Hadoop 2.6.x
First upload the Hadoop installation package to /home/hadoop/ on the server.
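The package then needs to be extracted before the steps below (a step implied by the paths used later; the tarball name is assumed from the hadoop-2.6.4 directory referenced throughout):
#Extract into the app directory
tar -zxvf hadoop-2.6.4.tar.gz -C /home/hadoop/app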
Note: in Hadoop 2.x the configuration files are under $HADOOP_HOME/etc/hadoop.
A distributed setup requires modifying five configuration files.
8.1 Configure Hadoop
First: hadoop-env.sh
vim hadoop-env.sh
#Line 27
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_55
Second: core-site.xml
<!-- Specify the default file system schema (URI) used by Hadoop: the address of the HDFS master (NameNode) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://shizhan01:9000</value>
</property>
<!-- Specify the storage directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hdpdata</value>
</property>
Third: hdfs-site.xml
<!-- Specify the number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Web UI available once the SecondaryNameNode has started, at http://shizhan02:50090 -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>shizhan02:50090</value>
</property>
Fourth: mapred-site.xml
mv mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Fifth: yarn-site.xml
<!-- Specify the address of the YARN master (ResourceManager) -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>shizhan01</value>
</property>
<!-- The auxiliary service through which reducers fetch map output -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
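One more file matters on a multi-node cluster: start-dfs.sh and start-yarn.sh only start daemons on the machines listed in $HADOOP_HOME/etc/hadoop/slaves, and they also assume passwordless SSH from shizhan01 to the workers (set up with ssh-keygen and ssh-copy-id). Neither step appears in the original notes, so treat this as an assumed prerequisite; with shizhan02-04 as workers the slaves file would contain:
shizhan02
shizhan03
shizhan04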
8.2 Add Hadoop to the environment variables
vim /etc/profile
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_55
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.4
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
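A quick check that Hadoop is now on the PATH (output details vary by build):
#Should print something like: Hadoop 2.6.4
hadoop version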
8.3 Format the NameNode (this initializes it)
hdfs namenode -format
8.4 Start Hadoop
First start HDFS:
start-dfs.sh
Then start YARN (I forgot this step in my own run; the resulting error is covered below):
start-yarn.sh
8.5 Verify that startup succeeded
Use the jps command. The listing below combines two jps runs (the master daemons and the worker daemons), which is why Jps appears twice:
4809 ResourceManager
4670 SecondaryNameNode
4487 NameNode
7075 Jps
3542 Jps
2779 NodeManager (this will be missing if YARN was not started)
2665 DataNode
http://shizhan01:50070 (HDFS web UI)
9. Run wordcount
Create an input directory in HDFS with hadoop fs -mkdir -p /wordcount/input (-p also creates the parent /wordcount).
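Some input files also need to be uploaded, or the job has nothing to process (the job log below reports two input paths; the local file names here are hypothetical):
#Upload two local text files as job input
hadoop fs -put words1.txt words2.txt /wordcount/input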
To run wordcount, cd to /home/hadoop/app/hadoop-2.6.4/share/hadoop/mapreduce, then run:
hadoop jar hadoop-mapreduce-examples-2.6.4.jar wordcount /wordcount/input /wordcount/output
This failed with the following error:
18/07/22 01:00:46 INFO client.RMProxy: Connecting to ResourceManager at shizhan01/192.168.137.200:8032
18/07/22 01:00:47 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/07/22 01:00:48 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/07/22 01:00:50 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/07/22 01:00:51 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/07/22 01:00:52 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/07/22 01:00:53 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 5 time(s); retry
Cause: YARN was not running. Running start-yarn.sh fixed this, but then the following error appeared:
192694875_0001 failed 2 times due to Error launching appattempt_1532192694875_0001_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1532352626322 found 1532193319198
Note: System times on machines may be out of sync. Check system time and time zones.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:251)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Cause: the cluster clocks were not synchronized, i.e. the times on the Hadoop NameNode and DataNodes disagreed.
Solution:
Synchronize the DataNodes with the NameNode by running the following two commands on every server
(Asia/Shanghai is used as the timezone here):
1) cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
2) ntpdate pool.ntp.org
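ntpdate is a one-off sync, so the clocks can drift apart again over time. One way to keep them aligned (an assumption; not part of the original fix) is a periodic cron entry on each machine:
#Resync against pool.ntp.org every 10 minutes (add via crontab -e)
*/10 * * * * /usr/sbin/ntpdate pool.ntp.org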
Then run the job again:
hadoop jar hadoop-mapreduce-examples-2.6.4.jar wordcount /wordcount/input /wordcount/output
This time it succeeded:
18/07/28 14:52:26 INFO client.RMProxy: Connecting to ResourceManager at shizhan01/192.168.137.200:8032
18/07/28 14:52:27 INFO input.FileInputFormat: Total input paths to process : 2
18/07/28 14:52:27 INFO mapreduce.JobSubmitter: number of splits:2
18/07/28 14:52:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1532760644619_0001
18/07/28 14:52:28 INFO impl.YarnClientImpl: Submitted application application_1532760644619_0001
18/07/28 14:52:28 INFO mapreduce.Job: The url to track the job: http://shizhan01:8088/proxy/application_1532760644619_0001/
18/07/28 14:52:28 INFO mapreduce.Job: Running job: job_1532760644619_0001
18/07/28 14:52:36 INFO mapreduce.Job: Job job_1532760644619_0001 running in uber mode : false
18/07/28 14:52:36 INFO mapreduce.Job: map 0% reduce 0%
18/07/28 14:52:54 INFO mapreduce.Job: map 50% reduce 0%
18/07/28 14:52:57 INFO mapreduce.Job: map 100% reduce 0%
18/07/28 14:53:03 INFO mapreduce.Job: map 100% reduce 100%
18/07/28 14:53:03 INFO mapreduce.Job: Job job_1532760644619_0001 completed successfully
18/07/28 14:53:03 INFO mapreduce.Job: Counters: 49
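To inspect the result, read the job's output file (part-r-00000 is the standard name for a single-reducer job):
#Print the word counts
hadoop fs -cat /wordcount/output/part-r-00000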