  • Installing Hadoop 2.2 on CentOS 6.5

    1. Configure passwordless SSH login between the cluster machines
    (1) Generate a DSA key pair with an empty passphrase:
    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    (2) Append the public key id_dsa.pub to the authorized keys file; authorized_keys is the file sshd consults for public-key authentication:
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    (3)
    This adds the local machine's key to the authorized keys, so logging in to the local machine no longer asks for a password. The cluster machines still cannot log in to each other without passwords, though; we also need to add the other machines' id_dsa.pub keys to authorized_keys.
    Our cluster consists of 3 machines: master, slave1 and slave2. Run the two commands above on all 3 hosts so that each generates its own id_dsa.pub, then append the contents of slave1's and slave2's id_dsa.pub files to master's authorized_keys. After that, master's authorized_keys file looks like this:
    ssh-dss AAAAB3NzaC1kc3MAAACBAKpCe9woQHMTehKLJA+6GdseAMPGnykirGIzbqqwhU/dHVNMyaxwGrK42c0Sxrtg9Q/zeaAmvbtxjmtVIJ9EImWGH7U0/ijs+PVspGpp1RZoI+5eSBwCUDRF93yT9/hVm/X9mP+k/bETwC7zi1mei+ai/V6re6fTelwS9dkiYHsfAAAAFQCoai5Gh74xcauX8ScXqCZK8FOHVwAAAIAajMwOhEnRSANAtjfFo0Fx2Dhq8VZqGlJzT2xqKQv0VkxqJgE8WNv4IMIIehdhl0kSFE6640zi3B2CZ3muTQxNOK4kxWxi36HhffvLpzcVrme6HVhOGnZFrbqpmo0cLZdK99aMF/TkEF2UhRb6pL2QWAyZgIrZbWm5iGq8W47UsgAAAIAGB3DfhF9GjnrZKIIsIeSrETo1ebJfZK1z7hf3CIHWb51I+gNHVtLZuuljeLIS8oTtKu0IZcI3zvCWWGi+anAhAK+9N/VWppzC75q7Tp+XPw0OAwHeC7OjHnj4oIUYnV8+QQDgK51njl8pwQNcW5ytAr1GXMxfPnq1Do29JW5FDQ== root@master
    ssh-dss AAAAB3NzaC1kc3MAAACBAJN2NYZap/VXLECMgCFXWyvz2uY9ciLwhOhTqnLeX5giJUWfEvvlzpuxzhrMmJdo40Rn6h/ggf2qgrCDo0NM7aaoo3nG2cW3e1mrpkDgpI+qYrNUwtdZ6a2jWs//gourBa359v/8NQgkdPZXw1JCnE3qzLxJQ2YfTPLFMmV7yv01AAAAFQDoIbKLeHjrtgHuCCT6CHbmV69jJwAAAIEAgj9piFkKUDAVeP60YQy3+CI2RSaU1JBopXOuzLJcYZcsZm+z1+b4HKgF23MsK0nEpl0UgnlicGk6GgiulBHTAMoq/GO6Hn5I1tEtXjDKlWG1PaGoH8Wua6GlziyxrZ/0OKjTdJaOirctVFnD/yyoO3xE8jpGzJwqWuScW44W3zQAAACADGFDYzG34Jr3M+BUkB11vGcv6NKeyU/CP/OSx5LGjQwwwD2f0UdSYEAuqvvkccNB9MB10H0OJCSFNGtbULA8kpDXM03q2VkJcJXQcRx+C9QoHCtF1EaM7GFmSuAEegzvv2UR122qXsxsxZIiJXhKZKzbznTIoipm0KEAqp0cz48= root@slave1
    ssh-dss AAAAB3NzaC1kc3MAAACBAOLxtxe3HLhc01szJFXktBJUfjnQwan/EvXcalvHv/DX9jsp5OroEclNE9NLzeL+NU9Ax0Jh7zYbyvQ2xK/lW9syfkJWntdwXcpeTBRrH1NX+dV1LentHyvgAj411LHZLfnkYaztXPWB/ux8JK9F6GB16uVWTG1KjCQwo44q5MtFAAAAFQDw/590kNub5MXnQCMBe4ggfK8dmQAAAIAg2GEhEPak+ETd9UekWL/k5168ng9SmA7sWvABs/dVePFdpP2WY+WNOOmyryvvtpsBfEyAM/NCaTsrMWcGorOdAJ4IKyMDl3QLTolelnjBaC8pcHEZ1igKR2JPGDIQSSlBkvB/Q8+qVmwYlHIQnEoYgGOoEokdtmHVMwOR053/hAAAAIB/kGh9FN4ie+5zRmQLiYTDES3ztm/Ik3UU0fOoNWkdeTVAXvp1xXotkQIkeh3bGFHwGfDUjNtTlrS+qqvAQqCpcj8LR8+pQh0UbxT2rZ1AsGviUVoK8mbosJ3eUjcigCCbF3SChy8TYIU7fsAynavqFubsbmV/6HpbHJNyC1+MAA== root@slave2
    Then copy master's merged authorized_keys file over the authorized_keys file in ~/.ssh/ on slave1 and slave2, so every host in the cluster can log in to every other one without a password. Reboot, pick any host, and ssh into the other two; if you get a shell without being prompted for a password, the setup succeeded.
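    The collect-and-redistribute steps above can be scripted. The sketch below is run on master; the host names follow this post, and the DRY_RUN switch is an illustration device so the commands can be inspected before anything is copied:

```shell
#!/bin/sh
# Sketch of the key-distribution steps, run on master.
# DRY_RUN=1 (the default here) only prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

for host in slave1 slave2; do
    # Pull each slave's public key and append it to master's authorized_keys.
    run scp root@$host:.ssh/id_dsa.pub /tmp/id_dsa.$host.pub
    run sh -c "cat /tmp/id_dsa.$host.pub >> ~/.ssh/authorized_keys"
done

for host in slave1 slave2; do
    # Push the merged file back out to every slave.
    run scp ~/.ssh/authorized_keys root@$host:.ssh/authorized_keys
done
```

    Set DRY_RUN=0 only once the printed plan looks right.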
    2. Edit Hadoop's configuration files
    Unpack the Hadoop archive into the /cloud directory.
    (1) Edit hadoop-env.sh and set JAVA_HOME
    First check where JAVA_HOME points:
    echo $JAVA_HOME
    In our case it prints:
    /usr/lib/jvm/java-1.7.0-openjdk.x86_64
    vi /cloud/hadoop-2.2/etc/hadoop/hadoop-env.sh
    In that file, set the export line to: export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
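    If you would rather not open vi on every host, the same edit can be done non-interactively; the paths below follow this post's layout:

```shell
#!/bin/sh
# Rewrite the JAVA_HOME line in hadoop-env.sh non-interactively (GNU sed,
# as shipped on CentOS). HADOOP_CONF can be overridden for testing.
HADOOP_CONF=${HADOOP_CONF:-/cloud/hadoop-2.2/etc/hadoop}
JAVA_HOME_DIR=/usr/lib/jvm/java-1.7.0-openjdk.x86_64

set_java_home() {
    # Replace any existing "export JAVA_HOME=..." line in the given file.
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$JAVA_HOME_DIR|" "$1"
}

# On a real node you would run:
#   set_java_home "$HADOOP_CONF/hadoop-env.sh"
```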
    (2) Edit core-site.xml and add the following:
    vi /cloud/hadoop-2.2/etc/hadoop/core-site.xml

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/cloud/hadoopData</value>
      </property>
    </configuration>

    This sets the HDFS address to hdfs://master:9000 (fs.default.name still works in 2.2 but is the deprecated alias of fs.defaultFS) and the temporary-file directory to /cloud/hadoopData; be sure to create that directory. The replication factor is configured in hdfs-site.xml in the next step, so it is not duplicated here.
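    Hadoop does not create hadoop.tmp.dir for you; a minimal sketch of the directory setup, using the path from the config above:

```shell
#!/bin/sh
# Create the directory configured as hadoop.tmp.dir; run this on every node.
make_hadoop_dirs() {
    mkdir -p "$1"
}

# On each cluster node you would run:
#   make_hadoop_dirs /cloud/hadoopData
```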
    (3) Edit hdfs-site.xml
    vi /cloud/hadoop-2.2/etc/hadoop/hdfs-site.xml
    Add the following:
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>/cloud/hadoopData/name</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/cloud/hadoopData/data</value>
      </property>
    </configuration>
    (4) Edit yarn-site.xml
    vi /cloud/hadoop-2.2/etc/hadoop/yarn-site.xml
    Add the following:
    <?xml version="1.0"?>
     
    <configuration>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
        <description>host is the hostname of the resource manager and
        port is the port on which the NodeManagers contact the Resource Manager.
        </description>
      </property>
     
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
        <description>host is the hostname of the resourcemanager and port is the port
        on which the Applications in the cluster talk to the Resource Manager.
        </description>
      </property>
     
      <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
        <description>In case you do not want to use the default scheduler</description>
      </property>
     
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
        <description>the host is the hostname of the ResourceManager and the port is the port on
        which the clients can talk to the Resource Manager. </description>
      </property>
     
      <property>
        <name>yarn.nodemanager.address</name>
        <value>0.0.0.0:8034</value>
        <description>the nodemanagers bind to this port</description>
      </property>
     
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>10240</value>
        <description>the amount of memory available to containers on the NodeManager, in MB (10240 MB = 10 GB)</description>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>shuffle service that needs to be set for Map Reduce to run </description>
      </property>
    </configuration>
     
    (5) Edit the slaves file
    Change its contents to:
    slave1
    slave2
     
    3. Add Hadoop to the environment variables
    Append the following to /etc/profile:
    export HADOOP_HOME=/cloud/hadoop-2.2
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    Then reload the file so the changes take effect:
    source /etc/profile
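    A quick way to check that the PATH change took effect; the directory layout matches this post, and path_has is just an illustrative helper:

```shell
#!/bin/sh
# Verify that the Hadoop bin/sbin directories ended up on PATH,
# mirroring the export lines added to /etc/profile above.
HADOOP_HOME=${HADOOP_HOME:-/cloud/hadoop-2.2}
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

path_has() {
    # True if PATH contains $1 as one of its colon-separated entries.
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

path_has "$HADOOP_HOME/bin" && echo "hadoop bin is on PATH"
```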
     
    4. Copy the configured Hadoop installation to the other cluster hosts
    cd /cloud
    scp -r hadoop-2.2 root@slave1:/cloud
    scp -r hadoopData root@slave1:/cloud
     
    scp -r hadoop-2.2 root@slave2:/cloud
    scp -r hadoopData root@slave2:/cloud
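    The four scp commands collapse naturally into a loop; DRY_RUN is an illustration switch so the copy plan can be inspected first:

```shell
#!/bin/sh
# Distribute the Hadoop install and data directory to every slave.
DRY_RUN=${DRY_RUN:-1}   # 1 = just collect and print the commands

plan=""
for host in slave1 slave2; do
    for dir in hadoop-2.2 hadoopData; do
        cmd="scp -r /cloud/$dir root@$host:/cloud"
        plan="$plan$cmd
"
        [ "$DRY_RUN" = "1" ] || $cmd
    done
done
printf '%s' "$plan"
```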
    5. Format the HDFS file system
    Run the following on master (it only needs to be run once):
    cd /cloud/hadoop-2.2/bin
    ./hdfs namenode -format
    6. Start the Hadoop services on each node
    On master:
    cd /cloud/hadoop-2.2/sbin
    ./start-dfs.sh
    ./start-yarn.sh
    On slave1 and slave2:
    In Hadoop 2.x, a MapReduce job needs no extra daemon process: when a job starts, a NodeManager launches a MapReduce Application Master (roughly a slimmed-down JobTracker), which is shut down automatically when the job finishes. The start scripts run on master also launch the DataNode and NodeManager daemons on the slaves over SSH, so no commands need to be run on slave1 or slave2.
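    A common sanity check after starting the services is jps on each host. The helper below inspects a jps-style listing; the sample output is illustrative, not captured from a real cluster:

```shell
#!/bin/sh
# Check a jps-style process listing for the daemons a given role should run.
# On a real host you would pass "$(jps)" instead of the sample text.
expects_all() {
    listing=$1; shift
    for proc in "$@"; do
        case "$listing" in
            *"$proc"*) ;;
            *) echo "missing: $proc"; return 1 ;;
        esac
    done
    echo "all expected daemons present"
}

# Illustrative master-side jps output (PIDs are made up):
master_jps="2101 NameNode
2312 SecondaryNameNode
2480 ResourceManager
2655 Jps"

expects_all "$master_jps" NameNode SecondaryNameNode ResourceManager
```

    On slave1 and slave2 you would expect DataNode and NodeManager instead.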
     
    7. Test the Hadoop cluster

    Open the web UIs of the NameNode, the ResourceManager and each NodeManager in a browser:

        - NameNode web UI: http://master:50070/
        - ResourceManager web UI: http://master:8088/
        - NodeManager web UI: http://slave1:8042/

    8. (Optional) Run the JobHistory Server

    You can also start the JobHistory Server to browse the cluster's past jobs through a web page; run:

    mr-jobhistory-daemon.sh start historyserver

    It listens on port 19888 by default; visit http://master:19888/ to view the job history.

    To stop the JobHistory Server, run:

    mr-jobhistory-daemon.sh stop historyserver

    9. Run the wordcount example

    hdfs dfs -mkdir /user

    hdfs dfs -mkdir /user/root    # create the user's home directory; relative HDFS paths are resolved under it

    hdfs dfs -put ./test.txt input    # copy the local test.txt into the user's home directory as a file named input

    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input output

    hdfs dfs -cat output/*
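    To sanity-check the job's result, the same tally can be computed locally with standard tools, since wordcount splits on whitespace just as this pipeline does (the file name test.txt follows the example above):

```shell
#!/bin/sh
# Count words locally with tr/sort/uniq to cross-check the wordcount output.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | sort | uniq -c
}

# Illustrative input file:
printf 'hello world\nhello hadoop\n' > /tmp/test.txt
local_wordcount /tmp/test.txt
```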

    10. Stop the Hadoop cluster

    On master, run:

    cd /cloud/hadoop-2.2/sbin

    ./stop-yarn.sh

    ./stop-dfs.sh



  • Original post: https://www.cnblogs.com/zhoudayang/p/5233557.html