  • Installing Hadoop in Fully Distributed Mode

      In a typical cluster, the NameNode runs on one machine and the JobTracker on another, and these act as the master nodes; the remaining machines run a DataNode and a TaskTracker and act as slave nodes.

      My machine does not have much memory, so I only virtualized three Linux systems: the NameNode and JobTracker are deployed on the same machine, which serves as the master node, while the other two run a DataNode and a TaskTracker as slave nodes.

                   Hostname  IP
    Master node    h1        192.168.0.129
    Slave node     h2        192.168.0.130
    Slave node     h3        192.168.0.131

      

      1. Node configuration

        1) Make sure the hostnames and IP addresses of all machines resolve correctly by editing the /etc/hosts file.

          If the machine is a master node, add the IP addresses and corresponding hostnames of every machine in the cluster to the file;

          If the machine is a slave node, you only need to add its own IP address and hostname, plus the master node's IP address and hostname.

          The /etc/hosts file on all nodes is configured as follows:

    [root@h1 ~]# vi /etc/hosts
    192.168.0.128 centoshadoop
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.0.129 h1
    192.168.0.130 h2
    192.168.0.131 h3

          Turn off the firewall on every node:

    [root@h1 ~]# service iptables stop
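
          Note that service iptables stop only turns the firewall off for the current session; on a CentOS 6-style system the service comes back after a reboot. A minimal optional addition, assuming the standard iptables init script, is to also disable it at boot:

    [root@h1 ~]# chkconfig iptables off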

          Disable SELinux on every node:

    [root@h1 ~]# cat /etc/selinux/config
    
    # This file controls the state of SELinux on the system.
    # SELINUX= can take one of these three values:
    #     enforcing - SELinux security policy is enforced.
    #     permissive - SELinux prints warnings instead of enforcing.
    #     disabled - No SELinux policy is loaded.
    SELINUX=disabled
    # SELINUXTYPE= can take one of these two values:
    #     targeted - Targeted processes are protected,
    #     mls - Multi Level Security protection.
    SELINUXTYPE=targeted 
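
          Editing /etc/selinux/config only takes effect after a reboot. To avoid rebooting right away, SELinux can also be switched to permissive mode for the current session (a small optional step, not part of the original walkthrough):

    [root@h1 ~]# setenforce 0
    [root@h1 ~]# getenforce
    Permissive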

        2) Create the same user, coder, on all machines

    [root@h1 ~]# useradd coder
    [root@h1 ~]# passwd coder
    Changing password for user coder.
    New password: 
    BAD PASSWORD: it is based on a dictionary word
    Retype new password: 
    passwd: all authentication tokens updated successfully.
    [root@h1 ~]# 

        3) Log in to every node as the coder account and go to coder's home directory

    [coder@h1 ~]$ pwd
    /home/coder
    [coder@h1 ~]$ 

        4) SSH configuration

          4.1) Generate a key pair on every node.

    [coder@h1 ~]$ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/coder/.ssh/id_rsa): 
    Created directory '/home/coder/.ssh'.
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /home/coder/.ssh/id_rsa.
    Your public key has been saved in /home/coder/.ssh/id_rsa.pub.
    The key fingerprint is:
    29:1c:df:59:0d:5f:ee:28:07:c0:57:21:15:af:a3:88 coder@h1
    The key's randomart image is:
    +--[ RSA 2048]----+
    |         .. oo=o.|
    |          ...= + |
    |      .    .o o o|
    |     . o o o . + |
    |      o S o . = .|
    |       . . . + . |
    |        E . .    |
    |                 |
    |                 |
    +-----------------+
    [coder@h1 ~]$ 

          4.2) Then go into the .ssh directory and copy the public key into authorized_keys

    [coder@h1 ~]$ cd .ssh
    [coder@h1 .ssh]$ ls
    id_rsa  id_rsa.pub
    [coder@h1 .ssh]$ cp id_rsa.pub authorized_keys
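
          sshd is usually strict about permissions on the key files; if passwordless login later fails, it is worth making sure the .ssh directory and authorized_keys are not group- or world-writable (a common fix, not shown in the original session):

    [coder@h1 .ssh]$ chmod 700 ~/.ssh
    [coder@h1 .ssh]$ chmod 600 ~/.ssh/authorized_keys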

          4.3) Distribute the SSH public keys to every node: merge the contents of each node's authorized_keys into the authorized_keys of every other node, so that all nodes can ssh into each other without a password.

            You can use scp to send the authorized_keys files from h2 and h3 to the master node h1, merge them into a single authorized_keys file there, and then send that file back to every node.

    [root@h2 .ssh]# scp authorized_keys h1:/softs

            To merge them into one file, append each copied key file with cat, for example cat /softs/authorized_keys >> ~/.ssh/authorized_keys; once merged, copy authorized_keys back to the corresponding location on every node.

              The final merged file looks like this:

    [root@h1 .ssh]# cat authorized_keys
    ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAwVeTrgPPwMlI5l8cBVafmx3SSnDF/ad62xdJgnRFtzcBbwGpDcciAcPTLrjwbrodhDS/jWk1CIwQegPziHK94+Z9D9hGmyzJg3qRc9iH9tJF8BxZnsiM5zaXvU921mHdbQO/eeXGROvlX1VmkeoAZFXamzfPSXPL/ooyWxBvNiG8j8G4mxd2Cm/UpaaEI/C+gBB5hgerKJCCpyHudNuiqwz7SDZxIOOCU1hEG4xnMZJtbZg39QMPuLOYcodSMI3cGHb+zdwct62IxMMV/ZupQW2h5rXN0SmVTNDB5dsd5NIDCdsGEJU59ZSid2d71yek8iUk9t497cwZvKwrd7lVTw== coder@h1
    ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA8mrNyz17fHLzqyPWcmDhNwCu3iQPOS9gH4yK/10kNNu0Lly6G+OxzfB93+mfOQK2ejGjmYwNYKeSkDqyvHBw/A7Gyb+r2WZBlq/hNwrQsulUo57EUPLvF4g3cTlPAznhBGu4fFgSE7VXR1YZ6R0qUBwLRqvnZODhHzOklIH4Jasyc3oE1bHkEeixwo+V9MuwlnTLmy2R9HgixFCCzNQqiaIRgdi+/10FLQH18QGYTP2CQMpvtWoFXmOLL3VbHQXMlosfVYSXg3/wJA1X6KqYXrWkX5FAPMpeVyFl1OpHC+oH1SNf7FcVsAJ2E8QjQZ3UQxjN+wOzwe8AauLkyNhnbw== coder@h2
    ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAtQkOcpp9/s3v+jyX3T8jO7XiTqW0KxoUGl9ZdIcToish57tzJq1ajkKDMFDVBXXb8h3m+T9dPc6lhYtb1r6WBF6HB0fQ8Cwt8Sg6WxkhJDGhNzZTYL6U1rLWDLXQ6Y0NVub5mktu1ToDzHJw8GHri125b0RuRjwx12eo1kt0E3hP6DCEFtQfEyO/24dFOlbVTqF+/LT5HIA7lJFwlWZcRx0WrpB/w3lzQ3qKShAqo5MiCMJ7F5oEzgIeNcTQIqn4TJxci3NVG3VLga/MR2K9O2OZQjKhBUxMKPaZUlQefkbrxPBcKSfS1khqdAuXyTYfeSD0QPzrtSBxo9bLB7+urQ== coder@h3
    [root@h1 .ssh]# 
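
              One way to finish the key exchange is sketched below; the exact paths are an assumption, the idea is simply to copy the merged file into ~/.ssh on every node and then confirm that passwordless ssh works:

    [coder@h1 .ssh]$ scp authorized_keys coder@h2:/home/coder/.ssh/authorized_keys
    [coder@h1 .ssh]$ scp authorized_keys coder@h3:/home/coder/.ssh/authorized_keys
    [coder@h1 .ssh]$ ssh h2 hostname
    h2
    [coder@h1 .ssh]$ ssh h3 hostname
    h3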

     

      2. Install and configure Hadoop

        1) Unpack Hadoop into /home/coder/, for example:
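
          A minimal sketch, assuming the hadoop-0.20.2.tar.gz tarball has been copied to /softs (the source path is an assumption):

    [coder@h1 ~]$ tar -zxf /softs/hadoop-0.20.2.tar.gz -C /home/coder/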

        2) Configuration on the master node

          2.1) hadoop-env.sh: find export JAVA_HOME and set it to the JDK installation path

     export JAVA_HOME=/usr/java/jdk1.6.0_38

          2.2) core-site.xml: be sure to set the IP address or hostname of the master node where the NameNode runs

    [coder@h1 conf]$ vi core-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
            <property>
                    <name>fs.default.name</name>
                    <value>hdfs://192.168.0.129:9000</value>
            </property>
    </configuration>
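
          Optionally, since the format log further below shows the metadata going under /tmp/hadoop-coder (which may be cleared on reboot), a hadoop.tmp.dir entry can also be added inside the configuration element of core-site.xml; the directory used here is only an example:

            <property>
                    <name>hadoop.tmp.dir</name>
                    <value>/home/coder/hadoop-tmp</value>
            </property>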

          2.3) hdfs-site.xml: with two slave nodes, the replication factor can be set to 2

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
            <property>
                    <name>dfs.replication</name>
                    <value>2</value>
            </property>
    </configuration>

          2.4) mapred-site.xml: set it to the IP address or hostname of the master node where the JobTracker runs

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
            <property>
                    <name>mapred.job.tracker</name>
                    <value>192.168.0.129:9001</value>
            </property>
    </configuration>

          2.5) masters: set it to the master node hostname(s), one hostname per line.

    [coder@h1 conf]$ cat masters
    h1
    [coder@h1 conf]$

          2.6) slaves: add the hostnames of all slave nodes

    [coder@h1 conf]$ cat slaves
    h2
    h3
    [coder@h1 conf]$

        3) Configuration on the slave nodes

          Simply copy the fully configured Hadoop installation directory from the master node to each slave node, keeping the path identical, using scp, for example:
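
          A sketch (assuming the install lives at /home/coder/hadoop-0.20.2; scp -r copies the whole tree directly, which achieves the same result as packing it first):

    [coder@h1 ~]$ scp -r /home/coder/hadoop-0.20.2 coder@h2:/home/coder/
    [coder@h1 ~]$ scp -r /home/coder/hadoop-0.20.2 coder@h3:/home/coder/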

      3. Running Hadoop

        1) Format the file system. Run this on the master node from the Hadoop installation root directory:

    [coder@h1 hadoop-0.20.2]$ bin/hadoop namenode -format
    13/03/27 23:03:59 INFO namenode.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = h1/192.168.0.129
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 0.20.2
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
    ************************************************************/
    13/03/27 23:04:00 INFO namenode.FSNamesystem: fsOwner=coder,coder
    13/03/27 23:04:00 INFO namenode.FSNamesystem: supergroup=supergroup
    13/03/27 23:04:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
    13/03/27 23:04:00 INFO common.Storage: Image file of size 95 saved in 0 seconds.
    13/03/27 23:04:00 INFO common.Storage: Storage directory /tmp/hadoop-coder/dfs/name has been successfully formatted.
    13/03/27 23:04:00 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at h1/192.168.0.129
    ************************************************************/

        2) Start the Hadoop daemons. Run this on the master node:

    [coder@h1 hadoop-0.20.2]$ bin/start-all.sh

        3) Use the jps command to check that everything started

          On the master node:

    [coder@h1 hadoop-0.20.2]$ jps
    2610 NameNode
    2843 Jps
    2736 SecondaryNameNode
    2775 JobTracker
    [coder@h1 hadoop-0.20.2]$

          On the slave nodes:

    [coder@h2 conf]$ jps
    2748 DataNode
    2854 Jps
    2792 TaskTracker
    [coder@h2 conf]$

        4) Hadoop has started successfully.

    A few things to note:

    When transferring files with scp, you may need to switch to the root account;

    for files handled as root, such as the transferred authorized_keys file, it is best to change the owner back to coder once merging and copying are done, for example:
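
    A sketch of restoring ownership on a slave node (paths assumed to match the layout above):

    [root@h2 ~]# chown coder:coder /home/coder/.ssh/authorized_keys
    [root@h2 ~]# chmod 600 /home/coder/.ssh/authorized_keys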

      4. Testing

        4.1) First create a directory and put two files in it.

    [coder@h1 ~]$ mkdir input
    [coder@h1 ~]$ cd input
    [coder@h1 input]$ echo "hey man" > test1.txt
    [coder@h1 input]$ echo "hey hadoop" > test2.txt
    [coder@h1 input]$ ls
    test1.txt  test2.txt
    [coder@h1 input]$ cat test1.txt
    hey man
    [coder@h1 input]$ cat test2.txt
    hey hadoop
    [coder@h1 input]$ 

        4.2) Put the directory just created into the Hadoop file system

    [coder@h1 hadoop-0.20.2]$ bin/hadoop dfs -put ../input in

        Check that the upload succeeded by inspecting the in directory in the Hadoop file system:

    [coder@h1 hadoop-0.20.2]$ bin/hadoop dfs -ls ./in/*
    -rw-r--r--   2 coder supergroup          8 2013-03-28 21:28 /user/coder/in/test1.txt
    -rw-r--r--   2 coder supergroup         11 2013-03-28 21:28 /user/coder/in/test2.txt
    [coder@h1 hadoop-0.20.2]$ 

        4.3) Run the word-count example that ships with Hadoop and write the results to the out directory

    [coder@h1 hadoop-0.20.2]$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out
    13/03/28 21:34:47 INFO input.FileInputFormat: Total input paths to process : 2
    13/03/28 21:34:48 INFO mapred.JobClient: Running job: job_201303282119_0001
    13/03/28 21:34:49 INFO mapred.JobClient:  map 0% reduce 0%
    13/03/28 21:35:02 INFO mapred.JobClient:  map 50% reduce 0%
    13/03/28 21:35:08 INFO mapred.JobClient:  map 100% reduce 0%
    13/03/28 21:35:11 INFO mapred.JobClient:  map 100% reduce 16%
    13/03/28 21:35:20 INFO mapred.JobClient:  map 100% reduce 100%
    13/03/28 21:35:22 INFO mapred.JobClient: Job complete: job_201303282119_0001
    13/03/28 21:35:22 INFO mapred.JobClient: Counters: 17
    13/03/28 21:35:22 INFO mapred.JobClient:   Job Counters 
    13/03/28 21:35:22 INFO mapred.JobClient:     Launched reduce tasks=1
    13/03/28 21:35:22 INFO mapred.JobClient:     Launched map tasks=2
    13/03/28 21:35:22 INFO mapred.JobClient:     Data-local map tasks=2
    13/03/28 21:35:22 INFO mapred.JobClient:   FileSystemCounters
    13/03/28 21:35:22 INFO mapred.JobClient:     FILE_BYTES_READ=37
    13/03/28 21:35:22 INFO mapred.JobClient:     HDFS_BYTES_READ=19
    13/03/28 21:35:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=144
    13/03/28 21:35:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=23
    13/03/28 21:35:22 INFO mapred.JobClient:   Map-Reduce Framework
    13/03/28 21:35:22 INFO mapred.JobClient:     Reduce input groups=2
    13/03/28 21:35:22 INFO mapred.JobClient:     Combine output records=2
    13/03/28 21:35:22 INFO mapred.JobClient:     Map input records=2
    13/03/28 21:35:22 INFO mapred.JobClient:     Reduce shuffle bytes=43
    13/03/28 21:35:22 INFO mapred.JobClient:     Reduce output records=2
    13/03/28 21:35:22 INFO mapred.JobClient:     Spilled Records=4
    13/03/28 21:35:22 INFO mapred.JobClient:     Map output bytes=27
    13/03/28 21:35:22 INFO mapred.JobClient:     Combine input records=2
    13/03/28 21:35:22 INFO mapred.JobClient:     Map output records=2
    13/03/28 21:35:22 INFO mapred.JobClient:     Reduce input records=2
    [coder@h1 hadoop-0.20.2]$ 

        4.4) Look at the contents of the out directory in the Hadoop file system

    [coder@h1 hadoop-0.20.2]$ bin/hadoop dfs -ls
    Found 2 items
    drwxr-xr-x   - coder supergroup          0 2013-03-28 21:28 /user/coder/in
    drwxr-xr-x   - coder supergroup          0 2013-03-28 21:35 /user/coder/out
    [coder@h1 hadoop-0.20.2]$ bin/hadoop dfs -ls ./out
    Found 2 items
    drwxr-xr-x   - coder supergroup          0 2013-03-28 21:34 /user/coder/out/_logs
    -rw-r--r--   2 coder supergroup         23 2013-03-28 21:35 /user/coder/out/part-r-00000
    [coder@h1 hadoop-0.20.2]$ bin/hadoop dfs -cat ./out/*
    hadoop  1
    hey   2
    man     1
    cat: Source must be a file.
    [coder@h1 hadoop-0.20.2]$
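
        The "cat: Source must be a file." message appears because the ./out/* wildcard also matches the _logs directory; reading the part file directly avoids it:

    [coder@h1 hadoop-0.20.2]$ bin/hadoop dfs -cat ./out/part-r-00000
    hadoop  1
    hey     2
    man     1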

        4.5) You can monitor the JobTracker by pointing a browser at port 50030 on the JobTracker node

      

        4.6) You can monitor the cluster by pointing a browser at port 50070 on the NameNode node, for example:
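
        With the master node at 192.168.0.129, the default web addresses for Hadoop 0.20 would be:

    http://192.168.0.129:50030/    # JobTracker web UI
    http://192.168.0.129:50070/    # NameNode / HDFS web UI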
