  • 【Hadoop】Standalone, Pseudo-Distributed, and Fully Distributed Cluster Setup

    Setting Up Hadoop Local Mode

    Local mode simply means installing Hadoop on a single machine.

    Installing Hadoop

    Upload the Hadoop package

    Upload the Hadoop package to the /opt/soft/ directory via WinSCP:

    [root@bigdata111 soft]# ls
    hadoop-2.8.4.tar.gz  jdk-8u144-linux-x64.tar.gz
    

    Extract Hadoop

    Extract Hadoop into /opt/module/:

    [root@bigdata111 soft]# tar -zxvf hadoop-2.8.4.tar.gz -C /opt/module/
    [root@bigdata111 soft]# cd /opt/module/
    [root@bigdata111 module]# ls
    hadoop-2.8.4  jdk1.8.0_144
    

    Set the Hadoop environment variables

    [root@bigdata111 module]# vi /etc/profile
    

    Append the following at the end of the file, then save and exit:

    export HADOOP_HOME=/opt/module/hadoop-2.8.4
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    

    Reload the profile

    [root@bigdata111 module]# source /etc/profile
    

    Check whether Hadoop installed successfully

    [root@bigdata111 module]# hadoop
    Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
      CLASSNAME            run the class named CLASSNAME
     or
      where COMMAND is one of:
      fs                   run a generic filesystem user client
      version              print the version
      jar <jar>            run a jar file
                           note: please use "yarn jar" to launch
                                 YARN applications, not this command.
      checknative [-a|-h]  check native hadoop and compression libraries availability
      distcp <srcurl> <desturl> copy file or directories recursively
      archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
      classpath            prints the class path needed to get the
                           Hadoop jar and the required libraries
      credential           interact with credential providers
      daemonlog            get/set the log level for each daemon
      trace                view and modify Hadoop tracing settings
    
    Most commands print help when invoked w/o parameters.
    
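    As an extra check (assuming the PATH change above took effect), the version command reports the unpacked release on its first output line:

    [root@bigdata111 module]# hadoop version | head -1
    Hadoop 2.8.4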

    Testing a Hadoop Example

    Create a test file

    Create a testdoc file in the module directory and enter some text:

    [root@bigdata111 module]# cd /opt/module
    [root@bigdata111 module]# touch testdoc
    [root@bigdata111 module]# vi testdoc
    [root@bigdata111 module]# cat testdoc
    this is a test page!
    chinese is the best country
    this is a ceshi page!
    i love china
    listen to the music
    and son on
    

    Change to the jar directory

    Change to the directory that holds Hadoop's example jars:

    [root@bigdata111 module]# cd /opt/module/hadoop-2.8.4/share/hadoop/mapreduce/
    [root@bigdata111 mapreduce]# ls
    hadoop-mapreduce-client-app-2.8.4.jar     hadoop-mapreduce-client-core-2.8.4.jar  hadoop-mapreduce-client-hs-plugins-2.8.4.jar  hadoop-mapreduce-client-jobclient-2.8.4-tests.jar  hadoop-mapreduce-examples-2.8.4.jar  lib           sources
    hadoop-mapreduce-client-common-2.8.4.jar  hadoop-mapreduce-client-hs-2.8.4.jar    hadoop-mapreduce-client-jobclient-2.8.4.jar   hadoop-mapreduce-client-shuffle-2.8.4.jar          jdiff                                lib-examples
    

    Run the wordcount program

    [root@bigdata111 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.8.4.jar wordcount /opt/module/testdoc /opt/module/out
    [root@bigdata111 mapreduce]# ls /opt/module/out
    part-r-00000  _SUCCESS
    [root@bigdata111 mapreduce]# cat /opt/module/out/part-r-00000
    a	2
    and	1
    best	1
    ceshi	1
    china	1
    chinese	1
    country	1
    i	1
    is	3
    listen	1
    love	1
    music	1
    on	1
    page!	2
    son	1
    test	1
    the	2
    this	2
    to	1
    
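    Note that MapReduce refuses to write into an output directory that already exists. To re-run the example (a hypothetical second run, not part of the original steps), delete the output directory first:

    [root@bigdata111 mapreduce]# rm -rf /opt/module/out
    [root@bigdata111 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.8.4.jar wordcount /opt/module/testdoc /opt/module/out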

    Setting Up Hadoop Pseudo-Distributed Mode

    Pseudo-distributed mode runs the distributed setup on a single machine.

    Inspect the Hadoop executables

    [root@bigdata111 mapreduce]# cd /opt/module/hadoop-2.8.4/
    [root@bigdata111 hadoop-2.8.4]# ls
    bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
    [root@bigdata111 hadoop-2.8.4]# cd bin
    [root@bigdata111 bin]# ls
    container-executor  hadoop  hadoop.cmd  hdfs  hdfs.cmd  mapred  mapred.cmd  rcc  test-container-executor  yarn  yarn.cmd
    [root@bigdata111 bin]# cd ..
    [root@bigdata111 hadoop-2.8.4]# cd sbin
    [root@bigdata111 sbin]# ls
    distribute-exclude.sh  hadoop-daemons.sh  hdfs-config.sh  kms.sh                   refresh-namenodes.sh  start-all.cmd  start-balancer.sh  start-dfs.sh         start-yarn.cmd  stop-all.cmd  stop-balancer.sh  stop-dfs.sh         stop-yarn.cmd  yarn-daemon.sh
    hadoop-daemon.sh       hdfs-config.cmd    httpfs.sh       mr-jobhistory-daemon.sh  slaves.sh             start-all.sh   start-dfs.cmd      start-secure-dns.sh  start-yarn.sh   stop-all.sh   stop-dfs.cmd      stop-secure-dns.sh  stop-yarn.sh   yarn-daemons.sh
    

    Change to the configuration directory

    Enter the Hadoop configuration directory /opt/module/hadoop-2.8.4/etc/hadoop/:

    [root@bigdata111 hadoop]# cd /opt/module/hadoop-2.8.4/etc/hadoop/
    [root@bigdata111 hadoop]# ls
    capacity-scheduler.xml  core-site.xml   hadoop-metrics2.properties  hdfs-site.xml            httpfs-signature.secret  kms-env.sh            log4j.properties  mapred-queues.xml.template  ssl-client.xml.example  yarn-env.sh
    configuration.xsl       hadoop-env.cmd  hadoop-metrics.properties   httpfs-env.sh            httpfs-site.xml          kms-log4j.properties  mapred-env.cmd    mapred-site.xml.template    ssl-server.xml.example  yarn-site.xml
    container-executor.cfg  hadoop-env.sh   hadoop-policy.xml           httpfs-log4j.properties  kms-acls.xml             kms-site.xml          mapred-env.sh     slaves                      yarn-env.cmd
    
    

    Configure core-site.xml

    [root@bigdata111 hadoop]# vi core-site.xml
    
    <configuration>
    
        <!-- Address of the HDFS NameNode -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://bigdata111:9000</value>
        </property>
    
        <!-- Directory for files Hadoop generates at runtime -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/module/hadoop-2.8.4/data/tmp</value>
        </property>
    
    </configuration>
    
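    To confirm the setting was picked up, hdfs getconf can read the key back (a quick sanity check, assuming the file above was saved):

    [root@bigdata111 hadoop]# hdfs getconf -confKey fs.defaultFS
    hdfs://bigdata111:9000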

    Configure hdfs-site.xml

    [root@bigdata111 hadoop]# vi hdfs-site.xml
    
    <configuration>
    
        <!-- Replication factor -->
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    
    </configuration>
    

    Configure yarn-site.xml

    [root@bigdata111 hadoop]# vi yarn-site.xml 
    
    <configuration>
    
    <!-- Site specific YARN configuration properties -->
    
        <!-- How reducers fetch data -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    
        <!-- Hostname of the YARN ResourceManager -->
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>bigdata111</value>
        </property>
    
        <!-- Enable log aggregation -->
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
    
        <!-- Retain logs for 7 days (in seconds) -->
        <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>604800</value>
        </property>
    
    </configuration>
    

    Configure mapred-site.xml

    Rename mapred-site.xml.template to mapred-site.xml, then edit it:

    [root@bigdata111 hadoop]# mv mapred-site.xml.template mapred-site.xml
    [root@bigdata111 hadoop]# ls
    capacity-scheduler.xml  core-site.xml   hadoop-metrics2.properties  hdfs-site.xml            httpfs-signature.secret  kms-env.sh            log4j.properties  mapred-queues.xml.template  ssl-client.xml.example  yarn-env.sh
    configuration.xsl       hadoop-env.cmd  hadoop-metrics.properties   httpfs-env.sh            httpfs-site.xml          kms-log4j.properties  mapred-env.cmd    mapred-site.xml             ssl-server.xml.example  yarn-site.xml
    container-executor.cfg  hadoop-env.sh   hadoop-policy.xml           httpfs-log4j.properties  kms-acls.xml             kms-site.xml          mapred-env.sh     slaves                      yarn-env.cmd
    [root@bigdata111 hadoop]# vi mapred-site.xml
    
    <configuration>
    
        <!-- Run MapReduce on YARN -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    
        <!-- Address of the job history server -->
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>bigdata111:10020</value>
        </property>
    
        <!-- Address of the job history server web UI -->
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>bigdata111:19888</value>
        </property>
    
    </configuration>
    

    Configure hadoop-env.sh

    Set JAVA_HOME to an absolute path, then save and exit:

    [root@bigdata111 hadoop]# vi hadoop-env.sh
    
    export JAVA_HOME=/opt/module/jdk1.8.0_144
    
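    If the JDK path is not known offhand, one way to derive it (a sketch, assuming java is already on the PATH; on some systems this resolves into the jre/ subdirectory instead) is:

    [root@bigdata111 hadoop]# dirname $(dirname $(readlink -f $(which java)))
    /opt/module/jdk1.8.0_144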

    Format the NameNode

    With the configuration done, format the NameNode (only needed the first time):

    [root@bigdata111 hadoop]# hadoop namenode -format
    
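    After a successful format, the metadata directory under hadoop.tmp.dir should contain an initial fsimage (an illustrative listing; exact file names vary with the transaction id):

    [root@bigdata111 hadoop]# ls /opt/module/hadoop-2.8.4/data/tmp/dfs/name/current
    fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION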

    Why format?

    The NameNode manages the metadata of the entire distributed file system's namespace (essentially its directories and files), and for reliability it also keeps an operation log; the NameNode persists all of this data to the local file system. When using HDFS for the first time, the -format command must be run before the NameNode service can start normally.

    What does formatting do?

    The NameNode has two critical paths, one storing metadata and one storing the operation log. Both come from the configuration file, via the properties dfs.name.dir and dfs.name.edits.dir, and both default to /tmp/hadoop/dfs/name. During formatting, the NameNode clears all files in these two directories and then recreates the initial metadata files (such as fsimage) under dfs.name.dir.

    Because both of these paths derive from hadoop.tmp.dir, setting that one property places the dfs.name.dir and dfs.name.edits.dir files under a single directory.

    Start the HDFS and YARN services

    When the NameNode and the ResourceManager run on the same machine, use:

    [root@bigdata111 hadoop]# start-all.sh
    

    When they run on different machines, use:

    [root@bigdata111 hadoop]# start-dfs.sh
    [root@bigdata111 hadoop]# start-yarn.sh
    
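    Either way, once startup finishes, jps should list all five daemons (illustrative output; PIDs will differ):

    [root@bigdata111 hadoop]# jps
    2377 NameNode
    2512 DataNode
    2645 SecondaryNameNode
    2788 ResourceManager
    2901 NodeManager
    3015 Jps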

    Access the HDFS web UI

    Default port: 50070

    http://192.168.1.111:50070
    

    Access the YARN web UI

    Default port: 8088

    http://192.168.1.111:8088
    
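    If no browser is handy, curl can confirm both UIs respond (a hypothetical check run from the VM; -L follows the dashboard redirects):

    [root@bigdata111 hadoop]# curl -sL -o /dev/null -w '%{http_code}\n' http://192.168.1.111:50070
    200
    [root@bigdata111 hadoop]# curl -sL -o /dev/null -w '%{http_code}\n' http://192.168.1.111:8088
    200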

    Setting Up a Fully Distributed Hadoop Cluster

    Using VMware's clone feature, clone two more machines from the 111 machine as the template.

    Change the hostname and IP

    Change the hostname and IP address on the two cloned machines so Xshell can connect (shown for bigdata112; repeat on bigdata113 with 192.168.1.113):

    [root@bigdata112 ~]# vi /etc/hostname
    [root@bigdata112 ~]# vi /etc/sysconfig/network-scripts/ifcfg-eno16777736
    [root@bigdata112 ~]# service network restart
    [root@bigdata112 ~]# ip addr
    
    TYPE=Ethernet
    BOOTPROTO=static
    DEFROUTE=yes
    PEERDNS=yes
    PEERROUTES=yes
    IPV4_FAILURE_FATAL=no
    IPV6INIT=yes
    IPV6_AUTOCONF=yes
    IPV6_DEFROUTE=yes
    IPV6_PEERDNS=yes
    IPV6_PEERROUTES=yes
    IPV6_FAILURE_FATAL=no
    NAME=eno16777736
    UUID=24bbe130-f59a-4b25-9df6-cf5857c89699
    DEVICE=eno16777736
    ONBOOT=yes
    IPADDR=192.168.1.112
    GATEWAY=192.168.1.2
    DNS1=8.8.8.8
    
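    On CentOS 7 the hostname can also be changed without editing /etc/hostname by hand (an equivalent alternative, not part of the original steps):

    [root@bigdata112 ~]# hostnamectl set-hostname bigdata112
    [root@bigdata112 ~]# hostname
    bigdata112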

    Delete the data directory

    Delete the data directory under /opt/module/hadoop-2.8.4 so the distributed cluster starts from a clean state.

    [root@bigdata111 hadoop-2.8.4]# cd /opt/module/hadoop-2.8.4/
    [root@bigdata111 hadoop-2.8.4]# rm -rf data/
    

    Configure hosts

    Map each IP to its hostname in /etc/hosts:

    [root@bigdata111 hadoop-2.8.4]# vi /etc/hosts
    
    192.168.1.111 bigdata111
    192.168.1.112 bigdata112
    192.168.1.113 bigdata113
    
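    A quick way to confirm the mappings resolve (a sanity check, assuming the hosts entries above):

    [root@bigdata111 hadoop-2.8.4]# getent hosts bigdata112 bigdata113
    192.168.1.112   bigdata112
    192.168.1.113   bigdata113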

    SCP to the other machines

    Send the hosts file configured on the first machine to the other two:

    [root@bigdata111 hadoop-2.8.4]# scp /etc/hosts root@bigdata112:/etc/
    [root@bigdata111 hadoop-2.8.4]# scp /etc/hosts root@bigdata113:/etc/
    

    Configure passwordless SSH login

    1. Using Xshell's "Send input to all sessions" feature, generate a key pair on all three machines:
    [root@bigdata111 hadoop-2.8.4]# ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa): 
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.
    The key fingerprint is:
    cc:47:37:5a:93:0f:77:38:53:af:a3:57:47:55:27:59 root@bigdata111
    The key's randomart image is:
    +--[ RSA 2048]----+
    |              .oE|
    |             ..++|
    |          . B = +|
    |       o . + * * |
    |        S o   + o|
    |         .   . o.|
    |            . .  |
    |             .   |
    |                 |
    +-----------------+
    
    2. Again using "Send input to all sessions", add the key to the authorized key store of every machine in the cluster:
    [root@bigdata111 hadoop-2.8.4]# ssh-copy-id bigdata111
    [root@bigdata111 hadoop-2.8.4]# ssh-copy-id bigdata112
    [root@bigdata111 hadoop-2.8.4]# ssh-copy-id bigdata113
    
    3. Check that the authorized key store exists:
    [root@bigdata111 .ssh]# cd /root/.ssh
    [root@bigdata111 .ssh]# ls
    authorized_keys  id_rsa  id_rsa.pub  known_hosts
    [root@bigdata111 .ssh]# cat authorized_keys 
    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7cSXZDdNJ0Cg+1wyVoCn4pWEAxy/13/ekg//YVkGwEsR6HO4XaYxxstVBij5JoTEEjSDNmz2HifTZDB098py3x882ZLVHJllJWzXYX4gVof/tmdmk5AJbhIlX3SoauTrrrzFiMtuXKdu6slvzhs9IbDp68xCUNiVI06OnWFSuhQc8Td+tekwlFPfm+v3W/PqUUgQAd+OAqOUC2vEjjnACQNw/wgGvF/lqrXDv5ZIFmYCBlB7YxwP9RykOvAzEe7w2W7TOt0K8V8oKKTui4aZuahWDbsGwlD7TAQRkilXkG59XG48AWOQoU/XFxph+XECqJzjmdxYedzY8inYW/Lfx root@bigdata111
    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDYyMVfLaL9w9sGz5hQG96ksUN5ih2RHdwsiXBpL/ZRG7LasKS+OQcszmc61TJfV0Vjad7kuL9wlg2YqlVIQJvaIUQCw4+5BrO0vCy4JBrz/FiDjzxKx0Ba+ILziuMxl35RxDCVGph17i2jpYfy6jGLejYK9kpJH4ueIj8mm+4LTKabRZTcjdNNI0kYM+Tr08wEIuQ45adqVU9MpZc/j6i1FIr4R/RabyuO1FhEh0+Oc5Xbm3jSAYH0MgEvK1cuG9wmX7SaB/opO00Ts+nW/P4umeZQUy51IQSRdUF6BlMrshnCSlKHnuLv2eSCx9yv3QuQMWHnL/SOXUgTnIuzbrv9 root@bigdata112
    ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDoBOAT/n1QCnaVJtRS1Q9GeoP665gIayWxpSWbjEFus4DL4as5S9jAIhBQWrTnvZzm+Skb4dxGPgdPYLaMFX9tdDYPPsnnRR92sLpRw9gwvG5ROL5XPpV2X+Yxl6yACmlMT0JP1uk+Ekm623n6wtBSBP1BDtJ/fhXkRX6bo2kuXs4BvmP76cikdGBDygKNIEMPTcs6p2lfOnuVdQLSCGm+Q9NswKSBVElNyywNl5J9L/5kIzGXnoGtwhQtdrOjZ+c1tyiwhCz42I3c4z0Sb/zH3OFtHCvRG7cF72uDFxe1QwVJ4h1hJ1dmtwVCckNMbmmgK72PsN8Zg4Y8XtBXgX8n root@bigdata113
    
    
    
    4. Verify that passwordless SSH login works (a loop-based check is sketched after this list):
    [root@bigdata111 .ssh]# ssh bigdata112
    Last login: Mon Aug  5 09:23:11 2019 from bigdata112
    [root@bigdata112 ~]# ssh bigdata111
    Last login: Mon Aug  5 09:09:23 2019 from 192.168.1.1
    
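    With all keys exchanged, a one-line loop verifies passwordless login to every node (a hypothetical check; no password prompt should appear):

    [root@bigdata111 .ssh]# for h in bigdata111 bigdata112 bigdata113; do ssh $h hostname; done
    bigdata111
    bigdata112
    bigdata113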

    Deploy the JDK and Hadoop

    1. Uncheck "Send input to all sessions", then send the module folder from bigdata111 to /opt/ on the other two machines:
    [root@bigdata111 module]# scp -r /opt/module/ root@bigdata112:/opt/
    [root@bigdata111 module]# scp -r /opt/module/ root@bigdata113:/opt/
    
    2. Send the environment-variable file /etc/profile to the other two machines:
    [root@bigdata111 module]# scp -r /etc/profile root@bigdata112:/etc/
    [root@bigdata111 module]# scp -r /etc/profile root@bigdata113:/etc/
    
    3. Switch to the other two machines and reload the environment variables:
    [root@bigdata112 module]# source /etc/profile
    [root@bigdata112 module]# jps
    2775 Jps
    [root@bigdata113 module]# source /etc/profile
    [root@bigdata113 module]# jps
    2820 Jps
    

    Configure the cluster XML files

    Check "Send input to all sessions" and configure hdfs-site.xml, yarn-site.xml, and mapred-site.xml (an scp-based alternative is sketched after the list):

    1. hdfs-site.xml is configured as follows (the SecondaryNameNode is placed on 113):
    <configuration>
    
        <!-- Replication factor -->
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
    
        <!-- Address of the SecondaryNameNode -->
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>bigdata113:50090</value>
        </property>
    
        <!-- Disable permission checking -->
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>
    
    </configuration>
    
    
    2. yarn-site.xml is configured as follows (YARN's ResourceManager is placed on 112):
    <configuration>
    
    <!-- Site specific YARN configuration properties -->
    
        <!-- How reducers fetch data -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    
        <!-- Hostname of the YARN ResourceManager -->
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>bigdata112</value>
        </property>
    
        <!-- Enable log aggregation -->
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
    
        <!-- Retain logs for 7 days (in seconds) -->
        <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>604800</value>
        </property>
    
    </configuration>
    
    3. mapred-site.xml is configured as follows:
    <configuration>
    
        <!-- Run MapReduce on YARN -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    
        <!-- Address of the job history server -->
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>bigdata112:10020</value>
        </property>
    
        <!-- Address of the job history server web UI -->
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>bigdata112:19888</value>
        </property>
    
    </configuration>
    
    
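    If you would rather edit on one node than use Xshell's broadcast, the finished files can be pushed out with scp instead (an alternative approach, not the author's):

    [root@bigdata111 hadoop]# scp /opt/module/hadoop-2.8.4/etc/hadoop/{core,hdfs,yarn,mapred}-site.xml root@bigdata112:/opt/module/hadoop-2.8.4/etc/hadoop/
    [root@bigdata111 hadoop]# scp /opt/module/hadoop-2.8.4/etc/hadoop/{core,hdfs,yarn,mapred}-site.xml root@bigdata113:/opt/module/hadoop-2.8.4/etc/hadoop/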

    Configure the DataNodes in slaves

    [root@bigdata111 ~]# cd /opt/module/hadoop-2.8.4/etc/hadoop/
    [root@bigdata111 hadoop]# ls
    capacity-scheduler.xml  core-site.xml   hadoop-metrics2.properties  hdfs-site.xml            httpfs-signature.secret  kms-env.sh            log4j.properties  mapred-queues.xml.template  ssl-client.xml.example  yarn-env.sh
    configuration.xsl       hadoop-env.cmd  hadoop-metrics.properties   httpfs-env.sh            httpfs-site.xml          kms-log4j.properties  mapred-env.cmd    mapred-site.xml             ssl-server.xml.example  yarn-site.xml
    container-executor.cfg  hadoop-env.sh   hadoop-policy.xml           httpfs-log4j.properties  kms-acls.xml             kms-site.xml          mapred-env.sh     slaves                      yarn-env.cmd
    [root@bigdata111 hadoop]# vi slaves
    
    bigdata111
    bigdata112
    bigdata113
    

    Format the NameNode

    Using Xshell's "Send input to all sessions" feature, format on all three machines:

    [root@bigdata111 hadoop]# hadoop namenode -format
    [root@bigdata112 hadoop]# hadoop namenode -format
    [root@bigdata113 hadoop]# hadoop namenode -format
    
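    If a machine was formatted or started before, leftover data directories can cause a clusterID mismatch that keeps DataNodes from starting. Clearing them before re-formatting avoids this (a precaution, assuming the data path set in core-site.xml):

    [root@bigdata111 hadoop]# rm -rf /opt/module/hadoop-2.8.4/data /opt/module/hadoop-2.8.4/logs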

    Start HDFS on 111

    [root@bigdata111 hadoop]# start-dfs.sh
    

    Start YARN on 112

    [root@bigdata112 hadoop]# start-yarn.sh
    

    Show the jps processes on all three machines

    [root@bigdata111 hadoop]# jps
    2512 DataNode
    2758 NodeManager
    2377 NameNode
    2894 Jps
    
    [root@bigdata112 ~]# jps
    2528 NodeManager
    2850 Jps
    2294 DataNode
    2413 ResourceManager
    
    [root@bigdata113 ~]# jps
    2465 NodeManager
    2598 Jps
    2296 DataNode
    2398 SecondaryNameNode
    
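    With all daemons up, an HDFS report run from any node should show the three DataNodes registered (illustrative; only the summary line is shown):

    [root@bigdata111 ~]# hdfs dfsadmin -report | grep 'Live datanodes'
    Live datanodes (3):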
  • Original article: https://www.cnblogs.com/ShadowFiend/p/11332382.html