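Setting Up the Hadoop Runtime Environment
Author: Yin Zhengjie (尹正杰)
Copyright notice: This is an original work. Reproduction is not permitted, and violations will be pursued legally.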
I. Install the JDK
Recommended reading: https://www.cnblogs.com/yinzhengjie/p/12199413.html
II. Install Hadoop
1>. Open the official Apache Hadoop website and click "Download"
Recommended reading:
    http://hadoop.apache.org/
    https://hadoop.apache.org/docs/r2.10.0/index.html
    http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/release/2.10.0/CHANGES.2.10.0.html
    https://hadoop.apache.org/docs/r3.1.3/index.html
    https://hadoop.apache.org/docs/r3.1.3/hadoop-project-dist/hadoop-common/release/3.1.3/CHANGES.3.1.3.html
2>. Choose the Hadoop version to download
Apache Hadoop releases download page: https://hadoop.apache.org/releases.html
3>. Download the Apache Hadoop package
Download URLs:
    https://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
    https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
    https://downloads.apache.org/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
[root@hadoop101.yinzhengjie.org.cn ~]# wget https://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
--2020-03-10 18:24:27--  https://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
Resolving mirror.bit.edu.cn (mirror.bit.edu.cn)... 114.247.56.117, 2001:da8:204:1205::22
Connecting to mirror.bit.edu.cn (mirror.bit.edu.cn)|114.247.56.117|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 392115733 (374M) [application/octet-stream]
Saving to: ‘hadoop-2.10.0.tar.gz’

100%[========================================================================================>] 392,115,733 11.3MB/s in 34s

2020-03-10 18:25:01 (11.0 MB/s) - ‘hadoop-2.10.0.tar.gz’ saved [392115733/392115733]

[root@hadoop101.yinzhengjie.org.cn ~]#
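Because the archive comes from a third-party mirror, it may be worth comparing its SHA-512 digest with the value published on the official Apache download page before extracting it. The following is a minimal sketch of that check. The helper name verify_sha512 is hypothetical (it is not part of Hadoop or wget), and the expected digest has to be copied by hand from the official page; no real digest is shown here.

```shell
# verify_sha512 is a hypothetical helper, not part of Hadoop or wget.
# Usage: verify_sha512 <file> <expected-hex-digest>
# Returns 0 on match, 1 on mismatch, 2 if the file does not exist.
verify_sha512() {
    local file="$1" expected="$2" actual
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 2
    fi
    # sha512sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(sha512sum "$file" | awk '{print $1}')"
    # Published digests are sometimes upper-case, so normalise to lower-case.
    expected="$(printf '%s' "$expected" | tr 'A-F' 'a-f')"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

A typical call would be verify_sha512 hadoop-2.10.0.tar.gz followed by the digest copied from the official download page.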
4>. Extract the archive into the target directory
[root@hadoop101.yinzhengjie.org.cn ~]# tar -zxf hadoop-2.10.0.tar.gz -C /yinzhengjie/softwares/
[root@hadoop101.yinzhengjie.org.cn ~]#
[root@hadoop101.yinzhengjie.org.cn ~]# ll /yinzhengjie/softwares/hadoop-2.10.0/
total 128
drwxr-xr-x 2 12334 systemd-journal    194 Oct 23 03:23 bin          # Hadoop's most basic management and usage scripts. The management scripts under sbin are built on top of these, and users can call them directly to manage and use Hadoop.
drwxr-xr-x 3 12334 systemd-journal     20 Oct 23 03:23 etc          # Hadoop configuration files.
drwxr-xr-x 2 12334 systemd-journal    106 Oct 23 03:23 include      # Header files for the programming libraries Hadoop exposes (the dynamic and static libraries themselves live in lib). They are defined in C++ and are typically used by C++ programs that access HDFS or implement MapReduce jobs.
drwxr-xr-x 3 12334 systemd-journal     20 Oct 23 03:23 lib          # Dynamic and static programming libraries exposed by Hadoop, used together with the headers in include.
drwxr-xr-x 2 12334 systemd-journal    239 Oct 23 03:23 libexec      # Shell configuration files for the individual services, used to set basics such as the log output directory and startup parameters (for example JVM options).
-rw-r--r-- 1 12334 systemd-journal 106210 Oct 23 03:23 LICENSE.txt
-rw-r--r-- 1 12334 systemd-journal  15841 Oct 23 03:23 NOTICE.txt
-rw-r--r-- 1 12334 systemd-journal   1366 Oct 23 03:23 README.txt
drwxr-xr-x 3 12334 systemd-journal   4096 Oct 23 03:23 sbin         # Scripts that start or stop the Hadoop services.
drwxr-xr-x 4 12334 systemd-journal     31 Oct 23 03:23 share        # Hadoop's dependency jars, documentation, and official examples.
[root@hadoop101.yinzhengjie.org.cn ~]#
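A quick way to confirm that the archive extracted completely is to check for the directories shown in the listing above. The sketch below does only that. The function name check_hadoop_layout is hypothetical, and the directory list is taken from the hadoop-2.10.0 listing in this post, so other releases may need a different list.

```shell
# check_hadoop_layout is a hypothetical helper, not a Hadoop command.
# Usage: check_hadoop_layout <hadoop-install-dir>
# Returns 0 if every expected directory exists, 1 otherwise.
check_hadoop_layout() {
    local home="$1" missing=0 d
    # Directory names as they appear in the hadoop-2.10.0 listing.
    for d in bin etc include lib libexec sbin share; do
        if [ ! -d "$home/$d" ]; then
            echo "missing: $home/$d" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For the installation in this post the call would be check_hadoop_layout /yinzhengjie/softwares/hadoop-2.10.0.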
5>. Add Hadoop to the environment variables
[root@hadoop101.yinzhengjie.org.cn ~]# cat /etc/profile.d/hadoop.sh
#Add ${HADOOP_HOME} by yinzhengjie
export HADOOP_HOME=/yinzhengjie/softwares/hadoop-2.10.0
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
[root@hadoop101.yinzhengjie.org.cn ~]#
[root@hadoop101.yinzhengjie.org.cn ~]#
[root@hadoop101.yinzhengjie.org.cn ~]# source /etc/profile.d/hadoop.sh
[root@hadoop101.yinzhengjie.org.cn ~]#
[root@hadoop101.yinzhengjie.org.cn ~]# echo $HADOOP_HOME
/yinzhengjie/softwares/hadoop-2.10.0
[root@hadoop101.yinzhengjie.org.cn ~]#
[root@hadoop101.yinzhengjie.org.cn ~]# vim /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/hadoop-env.sh
[root@hadoop101.yinzhengjie.org.cn ~]#
[root@hadoop101.yinzhengjie.org.cn ~]# grep ^export /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop/hadoop-env.sh | grep JAVA_HOME
export JAVA_HOME=/yinzhengjie/softwares/jdk1.8.0_201
[root@hadoop101.yinzhengjie.org.cn ~]#
[root@hadoop101.yinzhengjie.org.cn ~]# hadoop version
Hadoop 2.10.0
Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
Compiled by jhung on 2019-10-22T19:10Z
Compiled with protoc 2.5.0
From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
This command was run using /yinzhengjie/softwares/hadoop-2.10.0/share/hadoop/common/hadoop-common-2.10.0.jar
[root@hadoop101.yinzhengjie.org.cn ~]#
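Scripts in /etc/profile.d are read at login, and the script here is also sourced by hand, so a plain PATH=$PATH:... line appends the same directories again each time it runs. The following is a sketch of an alternative hadoop.sh that only appends a directory when it is not already on PATH. The helper name path_append is hypothetical; the HADOOP_HOME value is the one used throughout this post.

```shell
# path_append is a hypothetical helper: append a directory to PATH
# only if it is not already one of the colon-separated entries.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

# Same install location as in this post.
export HADOOP_HOME=/yinzhengjie/softwares/hadoop-2.10.0
path_append "${HADOOP_HOME}/bin"
path_append "${HADOOP_HOME}/sbin"
export PATH
```

Sourcing this file repeatedly leaves PATH unchanged after the first time, and exporting HADOOP_HOME makes it visible to child processes as well as to the current shell.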
6>. Create a symbolic link (so that the different Hadoop run modes can coexist)
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# ll
total 0
drwxr-xr-x 9 12334 systemd-journal 149 Oct 23 03:23 hadoop-2.10.0
drwxr-xr-x 7    10             143 245 Dec 16  2018 jdk1.8.0_201
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]#
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# cp -r hadoop-2.10.0 local-mode        # for local mode
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]#
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# cp -r hadoop-2.10.0 pseudo-mode       # for pseudo-distributed mode
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]#
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# cp -r hadoop-2.10.0 fully-mode        # for fully distributed mode
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]#
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# rm -rf hadoop-2.10.0
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]#
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# ln -sv pseudo-mode hadoop-2.10.0
‘hadoop-2.10.0’ -> ‘pseudo-mode’
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]#
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]# ll
total 0
drwxr-xr-x 9 root root 149 Mar 10 23:41 fully-mode
lrwxrwxrwx 1 root root  11 Mar 10 23:42 hadoop-2.10.0 -> pseudo-mode
drwxr-xr-x 7   10  143 245 Dec 16  2018 jdk1.8.0_201
drwxr-xr-x 9 root root 149 Mar 10 23:38 local-mode
drwxr-xr-x 9 root root 149 Mar 10 23:41 pseudo-mode
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares]#
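Since HADOOP_HOME points at the hadoop-2.10.0 link, switching run modes later only means repointing that link. The sketch below wraps this in a small function. The name switch_hadoop_mode is hypothetical; the directory names (local-mode, pseudo-mode, fully-mode) and the default link name are the ones created above. It refuses to touch hadoop-2.10.0 if that path is a real directory rather than a symbolic link.

```shell
# switch_hadoop_mode is a hypothetical helper, not a Hadoop command.
# Usage: switch_hadoop_mode <base-dir> <mode-dir> [link-name]
switch_hadoop_mode() {
    local base="$1" mode="$2" link="${3:-hadoop-2.10.0}"
    if [ ! -d "$base/$mode" ]; then
        echo "no such mode directory: $base/$mode" >&2
        return 1
    fi
    # Do not overwrite a real directory by accident.
    if [ -e "$base/$link" ] && [ ! -L "$base/$link" ]; then
        echo "refusing: $base/$link exists and is not a symbolic link" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing link to a directory as a file, not as a target directory.
    ln -sfn "$mode" "$base/$link"
}
```

For example, switch_hadoop_mode /yinzhengjie/softwares local-mode repoints the link to the local-mode copy, and the same call with pseudo-mode switches it back.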
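III. Deploy a Hadoop Cluster
Hadoop supports three run modes: local mode (Local (Standalone) Mode), pseudo-distributed mode (Pseudo-Distributed Mode), and fully distributed mode (Fully-Distributed Mode).

Local mode:
    HDFS is not used for storage; the local operating system's filesystem is used instead.
    YARN is not used to request resources; the local operating system handles resource scheduling.
    MapReduce also runs on the local operating system.
    In short, local mode starts no Hadoop daemons, and both storage and computation use the resources of the local operating system. By default, Hadoop is configured to run in non-distributed mode as a single Java process, which is useful for debugging.

Pseudo-distributed mode:
    Same as local mode: everything runs on a single node.
    Different from local mode: each Hadoop daemon runs in its own Java process. In other words, Hadoop processes are started on the operating system, but all of them are placed on the same node.

Fully distributed mode:
    Same as pseudo-distributed mode: Hadoop processes have to be started.
    Different from pseudo-distributed mode: Hadoop runs across multiple nodes, with the daemons spread over separate nodes. In other words, the Hadoop processes are started on different operating system instances rather than all on one node.

Recommended reading:
    http://hadoop.apache.org/docs/r2.10.0/
    http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html
    http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/ClusterSetup.html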
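One rough way to see whether an installation is still in local mode is to look at its default filesystem. The fs.defaultFS property defaults to file:///, and the pseudo-distributed and fully distributed setups normally change it to an hdfs:// URI. The sketch below classifies such a URI. The function name fs_kind is hypothetical, and the URI alone cannot distinguish pseudo-distributed from fully distributed mode, since that depends on where the daemons actually run.

```shell
# fs_kind is a hypothetical helper: classify a default-filesystem URI.
# Prints "local" for file: URIs, "hdfs" for hdfs: URIs, "other" otherwise.
fs_kind() {
    case "$1" in
        file:*) echo "local" ;;
        hdfs:*) echo "hdfs" ;;
        *)      echo "other" ;;
    esac
}

# Usage on a host where the Hadoop commands are on PATH:
#   fs_kind "$(hdfs getconf -confKey fs.defaultFS)"
```

A result of local suggests the untouched standalone configuration, while hdfs indicates that core-site.xml has been edited for one of the distributed modes.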
1>. Local (standalone) mode
Recommended reading:
    https://www.cnblogs.com/yinzhengjie2020/p/12423980.html
    http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation
2>. Pseudo-distributed mode
Recommended reading:
    https://www.cnblogs.com/yinzhengjie2020/p/12424154.html
    http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation
3>. Fully distributed mode
Recommended reading:
    https://www.cnblogs.com/yinzhengjie2020/p/12424192.html
    http://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html#Fully-Distributed_Operation