zoukankan      html  css  js  c++  java
  • hadoop安装文档

    一、准备

      该准备工作在三台机器上都需要进行,首先使用 vmvare 创建 1 个虚拟机,这台虚拟机是 master,一会需要把 master 克隆出两台 slave

        

        

        

         

        

        

        

        

        

        

        点确定然后开启此虚拟机

        

        

        

        

        

        

        

        然后添加/boot 分区,大小为 1G,文件系统选 ext4 

        

        然后添加 swap 分区,注意,swap 分区为内存的 2 倍,文件系统则选择为 swap 

        

        然后点完成 

        

        

        

        

        

        

        然后等待安装完成,然后点重启

        

        

        到此系统安装就完成了,然后设置网络 

        

        

        点完确定后,然后再进去查看下网关

        

        

        点取消,记住这个网关 

        1、我先换下主机名 

    [root@localhost ~]# hostnamectl set-hostname wangmaster 
    [root@localhost ~]# hostname wangmaster 
    [root@localhost ~]# exit

        2、重新登录,然后设置网卡 

    [root@wangmaster ~]# vi /etc/sysconfig/network-scripts/ifcfg-eno16777736  
    TYPE=Ethernet 
    BOOTPROTO=static 
    DEFROUTE=yes 
    PEERDNS=yes 
    PEERROUTES=yes 
    IPV4_FAILURE_FATAL=no 
    IPV6INIT=yes 
    IPV6_AUTOCONF=yes 
    IPV6_DEFROUTE=yes 
    IPV6_PEERDNS=yes 
    IPV6_PEERROUTES=yes 
    IPV6_FAILURE_FATAL=no 
    NAME=eno16777736 
    DEVICE=eno16777736 
    ONBOOT=yes  //启用网卡 
    IPADDR=192.168.225.100  //设置 IP 
    NETMASK=255.255.255.0  //设置掩码 
    GATEWAY=192.168.225.2  //设置网关,就是记住的网关 
    DNS1=114.114.114.114 //设置 DNS 
    DNS2=114.114.114.115   //设置备用 DNS 
    [root@wangmaster ~]# systemctl restart network.service //重启网络

        3、设置网络 YUM 源 

        

        选择远程登录工具登录操作 (可以用Xshell)

        

        点击文件传输按钮,进入Xftp软件,进行传输文件。

        

        将上面保存的文件传入/etc/yum.repos.d文件夹下。

            

    [root@wangmaster ~]# cd /etc/yum.repos.d/ 
    [root@wangmaster yum.repos.d]# ls
    CentOS7-Base-163.repo  CentOS-Debuginfo.repo  CentOS-Sources.repo 
    CentOS-Base.repo       CentOS-fasttrack.repo  CentOS-Vault.repo 
    CentOS-CR.repo         CentOS-Media.repo 
    [root@wangmaster yum.repos.d]# mv CentOS-Base.repo CentOS-Base.repo.bak //使原来的 yum 失效 
    [root@wangmaster yum.repos.d]# yum clean all  //清除 yum 缓存 
    已加载插件:fastestmirror 
    正在清理软件源: base extras updates 
    Cleaning up everything 
    [root@wangmaster yum.repos.d]# yum repolist  //更新 yum 库 
    已加载插件:fastestmirror 
    base                                                     | 3.6 kB     00:00      
    extras                                                   | 3.4 kB     00:00      
    updates                                                  | 3.4 kB     00:00      
    (1/4): base/7/x86_64/group_gz                              | 155 kB   00:00      
    (2/4): extras/7/x86_64/primary_db                          | 139 kB   00:00      
    (3/4): base/7/x86_64/primary_db                            | 5.6 MB   00:09      
    (4/4): updates/7/x86_64/primary_db                         | 3.9 MB   00:11      
    Determining fastest mirrors 
    源标识                         源名称                                      状态 
    base/7/x86_64                  CentOS-7 - Base - 163.com                 9,363 
    extras/7/x86_64                CentOS-7 - Extras - 163.com               311 
    updates/7/x86_64               CentOS-7 - Updates - 163.com              1,126 
    repolist: 10,800 
    [root@wangmaster yum.repos.d]# 
    [root@wangmaster yum.repos.d]# yum install -y vim //安装 VIM 工具 

        4、关闭 selinux

    [root@wangmaster yum.repos.d]# vim /etc/selinux/config  
    # This file controls the state of SELinux on the system. 
    # SELINUX= can take one of these three values: 
    #     enforcing - SELinux security policy is enforced. 
    #     permissive - SELinux prints warnings instead of enforcing. 
    #     disabled - No SELinux policy is loaded. 
    SELINUX=disabled 
    # SELINUXTYPE= can take one of three two values: 
    #     targeted - Targeted processes are protected, 
    #     minimum - Modification of targeted policy. Only selected processes are 
    pro 
    tected.  
    #     mls - Multi Level Security protection. 
    SELINUXTYPE=targeted

        5、停止防火墙功能

    [root@wangmaster ~]# systemctl stop firewalld.service 停止防火墙 
    [root@wangmaster ~]# systemctl disable firewalld.service  停止防火墙开机自启动 
    [root@wangmaster ~]# systemctl status firewalld 查看防火墙状态

        6、规划为 3 个虚拟机,分别为 master,slave1,slave2,在/etx/hosts 文件中修改 

    [root@wangmaster ~]# vim /etc/hosts 
    127.0.0.1   localhost localhost.localdomain localhost4 
    localhost4.localdomain4 
    ::1         localhost localhost.localdomain localhost6 
    localhost6.localdomain6 
     
    192.168.225.100 wangmaster 
    192.168.225.101 wangslave1 
    192.168.225.102 wangslave2 

        然后重启虚拟机(一定要重启,因为 selinux 设置重启才生效)

        (注意:在所有三台虚拟机中都进行这样的修改,ip 地址根据实际情况进行修改)

        7、使用如下命令在线安装 

    [root@wangmaster ~]$ yum install –y wget 
    [root@wangmaster ~]$ yum install –y net-tools 

        8、创建目录

    [root@wangmaster ~]# mkdir /opt/bigdata

        9、将 jdk 拷贝到 192.168.225.100 的 opt 的 bigdata 目录 

    [root@wangmaster bigdata]# ls 
    hadoop-2.7.3.tar.gz  jdk1.8.tar.gz 
    这里我提前在 bigdata 中传入了所需要的软件

        10、在 master 中创建用户 hadoop 

    [root@wangmaster bigdata]# useradd hadoop 
    [root@wangmaster bigdata]# id hadoop 
    uid=1000(hadoop) gid=1000(hadoop) 组=1000(hadoop) 
    [root@wangmaster ~]# passwd hadoop 
    更改用户 hadoop 的密码 。我设置的密码是 123456,需要打两遍 
    新的 密码: 
    无效的密码: 密码少于 8 个字符 
    重新输入新的 密码: 
    passwd:所有的身份验证令牌已经成功更新。 
    [root@wangmaster ~]# 

        11、使用户成为 sudoers,以 root 用户修改文件/etc/sudoers,修改方式如下: 

    [root@wangmaster bigdata]# vim /etc/sudoers 
    ## Allow root to run any commands anywhere 
    root    ALL=(ALL)       ALL 
    hadoop  ALL=(ALL)       ALL 

        12、修改/opt/bigdata 文件夹的权限

    [root@wangmaster ~]# chmod -R 777 /opt/bigdata 
    [root@wangmaster ~]# chown -R hadoop.hadoop /opt/bigdata 
    [root@wangmaster ~]# ll /opt 
    总用量 4 
    drwxrwxrwx. 2 hadoop hadoop 4096 49 05:28 bigdata 

        13、安装 JDK 运行环境 

    [hadoop@wangmaster bigdata]# tar -zxvf jdk1.8.tar.gz  
    [hadoop@wangmaster bigdata]# mv /opt/bigdata/jdk1.8 /opt/bigdata/ 
    hadoop@wangmaster bigdata]# ls 
    hadoop-2.7.3.tar.gz  jdk1.8  jdk1.8.tar.gz  opt 

        14、在修改/etc/profile 文件,配置 java 环境: 

    [root@wangmaster ~]$ vim /etc/profile 
    #java configuration 
    JAVA_HOME=/opt/bigdata/jdk1.8 
    JAVA_BIN=/opt/bigdata/jdk1.8/bin 
    PATH=$PATH:$JAVA_HOME/bin 
    CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar 
    export JAVA_HOME 
    export JAVA_BIN  
    export PATH  
    export CLASSPATH 
    [hadoop@wangmaster ~]# source /etc/profile 
    [hadoop@wangmaster bigdata]$ java -version 
    java version "1.8.0_111" 
    Java(TM) SE Runtime Environment (build 1.8.0_111-b14) 
    Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode) 
    [hadoop@wangmaster bigdata]$ javac -version 
    javac 1.8.0_111 

        (注:上图中,JAVA_HOME 为你安装的 JDK 路径) 

        15、安装 hadoop

    [hadoop@wangmaster bigdata]$ tar -zxvf hadoop-2.7.3.tar.gz  
    [hadoop@wangmaster bigdata]$ ll 
    总用量 386500 
    drwxr-xr-x. 9 hadoop hadoop      4096 818 2016 hadoop-2.7.3 
    -rwxrwxrwx. 1 hadoop hadoop 214092195 313 19:16 hadoop-2.7.3.tar.gz 
    drwxrwxrwx. 8 hadoop hadoop      4096 313 00:14 jdk1.8 
    -rwxrwxrwx. 1 hadoop hadoop 181668321 322 23:31 jdk1.8.tar.gz

        16、在 hadoop 目录下建立 tmp 目录,并将权限设定为 777 

    [hadoop@wangmaster bigdata]$ cd hadoop-2.7.3  
    [hadoop@wangmaster hadoop-2.7.3]$ mkdir tmp 
    [hadoop@wangmaster hadoop-2.7.3]$ chmod 777 tmp 
    [hadoop@wangmaster hadoop-2.7.3]$ mkdir dfs 
    [hadoop@wangmaster hadoop-2.7.3]$ mkdir dfs/name 
    [hadoop@wangmaster hadoop-2.7.3]$ mkdir dfs/data

     二、 Hadoop 安装与配置 

        Hadoop 是大数据生态圈的基石,下面首先以 Hadoop 的安装与配置开始。 

        1、进入安装目录 

    [hadoop@wangmaster ~]$ cd /opt/bigdata/hadoop-2.7.3

        2、环境配置 

    [hadoop@wangmaster hadoop-2.7.3]$ cd etc/hadoop 
    [hadoop@wangmaster hadoop]$ vim yarn-env.sh 
    # some Java parameters 
    export JAVA_HOME=/opt/bigdata/jdk1.8 

        3、core 配置

    [hadoop@wangmaster hadoop]$ vim core-site.xml  
    <property> 
            <name>hadoop.tmp.dir</name> 
    <value>/opt/bigdata/hadoop-2.7.3/tmp</value> 
    </property> 
    <property> 
     <name>fs.default.name</name> 
     <value>hdfs://wangmaster:9000</value> 
    </property> 
    <property> 
            <name>hadoop.proxyuser.hadoop.hosts</name> 
            <value>*</value> 
    </property> 
    <property> 
           <name>hadoop.proxyuser.hadoop.groups</name> 
            <value>*</value> 
    </property> 

        4、hdfs 配置 

    [hadoop@wangmaster hadoop]$ vim hdfs-site.xml 
    <configuration> 
    <property> 
        <name>dfs.replication</name>   
        <value>3</value> 
    </property> 
    <property> 
      <name>dfs.namenode.name.dir</name> <value>/opt/bigdata/hadoop-2.7.3/dfs/name</value> </property> <property>   <name>dfs.datanode.data.dir</name> <value>/opt/bigdata/hadoop-2.7.3/dfs/data</value>
    </property> <property>   <name>dfs.web.ugi</name> <value>hdfs,hadoop</value> </property> <property>   <name>dfs.permissions</name>   <value>false</value> </property> </configuration>

        5、yarn 配置 

    [hadoop@wangmaster hadoop]$ vim yarn-site.xml 
    <configuration> 
     
    <!-- Site specific YARN configuration properties --> 
    <property> 
        <name>yarn.resourcemanager.hostname</name> 
        <value>wangmaster</value> 
    </property> 
     
    <property> 
        <name>yarn.resourcemanager.webapp.address</name> 
        <value>wangmaster:8088</value> 
    </property> 
     
    <property> 
        <name>yarn.resourcemanager.scheduler.address</name> 
        <value>wangmaster:8081</value> 
    </property>
    <property> 
        <name>yarn.resourcemanager.resource-tracker.address</name> 
        <value>wangmaster:8082</value> 
    </property> 
     
    <property> 
        <name>yarn.nodemanager.aux-services</name> 
        <value>mapreduce_shuffle</value> 
    </property> 
    <property> 
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> 
        <value>org.apache.hadoop.mapred.ShuffleHandler</value> 
    </property> 
    <property> 
        <name>yarn.web-proxy.address</name> 
        <value>wangmaster:54315</value> 
    </property> 
     
    </configuration> 

        6、mapreduce 配置

    [hadoop@wangmaster hadoop]$ vim mapred-site.xml 
    <configuration> 
    <property> 
        <name>mapreduce.framework.name</name> 
        <value>yarn</value> 
    </property> 
    <property> 
        <name>mapred.job.tracker</name>   
        <value>wangmaster:9001</value> 
    </property> 
    <property>    
          <name>mapreduce.jobhistory.address</name>    
          <value>wangmaster:10020</value>    
    </property> 
    </configuration> 

        7、slaves 配置(master、slave1 和 slave2 均作为 datanode) 

    [hadoop@wangmaster hadoop]$ vim slaves  
    wangmaster 
    wangslave1 
    wangslave2
    master02 
    slave01 
    slave02 
    slave03

        8、配置系统环境 

    [root@wangmaster bin]# vim /etc/profile  
    末尾添加这两句 
    export HADOOP_HOME=/opt/bigdata/hadoop-2.7.3 
    export PATH=$HADOOP_HOME/bin:$PATH 

        使配置生效:

    [hadoop@wangmaster hadoop]$ source /etc/profile 

        9、将虚拟机复制成 wangslave1 和 wangslave2 上。

        

        把 CentOS 64 位 Minimal 重名为 wangmaster,克隆 wangmaster,建立 wangslave1 和 wangslave2 节点,克隆教程

        首先给 wangmaster 关机

        

        

        

        复制过后的虚拟机不能直接使用,需要进行如下操作:

        

        

        

     [root@wangmaster ~]# vim /etc/sysconfig/network-scripts/ifcfg-eno16777736 
    HWADDR=00:50:56:36:BF:60  //修改这个里 mac 地址,改成刚才生成的 
    TYPE="Ethernet" 
    BOOTPROTO="static" 
    DEFROUTE="yes" 
    PEERDNS="yes" 
    PEERROUTES="yes" 
    IPV4_FAILURE_FATAL="no" 
    IPV6INIT="yes" 
    IPV6_AUTOCONF="yes" 
    IPV6_DEFROUTE="yes" 
    IPV6_PEERDNS="yes" 
    IPV6_PEERROUTES="yes" 
    IPV6_FAILURE_FATAL="no" 
    NAME="eno16777736" 
    DEVICE="eno16777736" 
    ONBOOT="yes" 
    IPADDR=192.168.225.101 //这里改成 wangslave1 的 ip 192.168.225.101 
    NETMASK=255.255.255.0 
    GATEWAY=192.168.225.2 
    DNS1=114.114.114.114 
    DNS2=114.114.114.115 
    [root@wangmaster ~]# systemctl restart network.service //重启网络

        然后修改主机名 

    [root@wangmaster ~]# hostnamectl set-hostname wangslave1 
    [root@wangmaster ~]# hostname wangslave1 
    [root@wangmaster ~]# exit 
    重新登录 
    测试 ping 自己 
    [root@wangslave1 ~]# ping wangslave1 
    PING wangslave1 (192.168.225.101) 56(84) bytes of data. 
    64 bytes from wangslave1 (192.168.225.101): icmp_seq=1 ttl=64 time=0.012ms

        根据上面操作,克隆出 wangslave2

        10、需要在 hadoop 用户下进行 master 和 slave 之间的免密 

        Master 给自己和 slave1,slave2 发证书 

    [hadoop@wangmaster ~]$ ssh-keygen 
    [hadoop@wangmaster ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@wangslave1 
    [hadoop@wangmaster ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@wangslave2 
    [hadoop@wangmaster ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@wangmaster 
    这步完成后,正常情况下就可以无密码登录本机了,即 ssh localhost,无需输入密码。

        然后 slave1 给 master 发证书 

    [hadoop@ wangslave1 ~]$ ssh-keygen 
    [hadoop@ wangslave1 ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@wangmaste

        然后 slave2 给 master 发证书 (略)

        11、首次启动之前格式化 hdfs: 

    [hadoop@wangmaster ~]$ hdfs namenode -format

        启动各组件 

    [hadoop@wangmaster ~]$ cd /opt/bigdata/hadoop-2.7.3/sbin

        全部启动: 

    [hadoop@wangmaster sbin]$ ./start-all.sh  
    验证 
    [hadoop@wangmaster sbin]$ jps 
    1666 DataNode 
    2099 NodeManager 
    2377 Jps 
    1853 SecondaryNameNode 
    1998 ResourceManager 
    1567 NameNode 
    [hadoop@wangslave1 ~]$ jps 
    1349 NodeManager 
    1452 Jps 
    1245 DataNode 
    [hadoop@wangslave2 ~]$ jps 
    1907 Jps 
    1703 DataNode 
    1807 NodeManager 

        在浏览器里输入 http://192.168.225.100:50070 ,判断是否启动成功

        12、全部停止

    [hadoop@wangmaster sbin]$ ./stop-all.sh

    三、实验

        1、使用

    [hadoop@wangmaster ~]$ hadoop fs -mkdir /wang 
    [hadoop@wangmaster ~]$ cd /opt/bigdata/hadoop-2.7.3 
    [hadoop@wangmaster hadoop-2.7.3]$ hadoop fs -put LICENSE.txt  /wang 
    [hadoop@wangmaster ~]$ hadoop fs -ls /wang

        2、实验,进行 wordcount 程序(可选)。一个统计文本单词个数的程序,它会统计放入文件夹内的文本的总共单词的出现个数

    [hadoop@wangmaster hadoop-2.7.3]$ hadoop fs -ls /wang 查看                      
    Found 1 items 
    -rw-r--r--   3 hadoop supergroup      84854 2017-04-09 07:34 
    /wang/LICENSE.txt  
    [hadoop@wangmaster hadoop-2.7.3]$ cd 
    /opt/bigdata/hadoop-2.7.3/share/hadoop/mapreduce 
    [hadoop@wangmaster mapreduce]$ hadoop jar 
    hadoop-mapreduce-examples-2.7.3.jar wordcount /wang /output  执行程序 
    [hadoop@wangmaster mapreduce]$ hadoop fs -ls /output  查看程序运行的输出文件 
    Found 2 items 
    -rw-r--r--   3 hadoop supergroup          0 2017-04-09 07:38 
    /output/_SUCCESS 
    -rw-r--r--   3 hadoop supergroup      22002 2017-04-09 07:38 
    /output/part-r-00000 [hadoop@wangmaster mapreduce]$ hadoop fs -cat 
    /output/part-r-00000  查看结果 
  • 相关阅读:
    SpringBoot实现原理
    常见Http状态码大全
    forward(转发)和redirect(重定向)有什么区别
    1094. Car Pooling (M)
    0980. Unique Paths III (H)
    1291. Sequential Digits (M)
    0121. Best Time to Buy and Sell Stock (E)
    1041. Robot Bounded In Circle (M)
    0421. Maximum XOR of Two Numbers in an Array (M)
    0216. Combination Sum III (M)
  • 原文地址:https://www.cnblogs.com/g-smile/p/9106261.html
Copyright © 2011-2022 走看看