  • Hadoop Distributed Installation (0.20.205.0)

    Part 1: Preparation

    Four machines in the cluster:

    192.168.1.31   192.168.1.32   192.168.1.33   192.168.1.34

    As root, edit the /etc/hosts file on all four machines and add the following entries:

             192.168.1.31   hadoop1  # namenode + jobtracker

             192.168.1.32   hadoop2  # datanode

             192.168.1.33   hadoop3  # datanode

             192.168.1.34   hadoop4  # datanode
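
    A minimal sketch of adding those entries; the IPs and hostnames are the ones listed above:

    # Run as root on every node; appends the cluster host entries once.
    echo "192.168.1.31   hadoop1" >> /etc/hosts
    echo "192.168.1.32   hadoop2" >> /etc/hosts
    echo "192.168.1.33   hadoop3" >> /etc/hosts
    echo "192.168.1.34   hadoop4" >> /etc/hosts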

    Create an identical hadoop user on each of the four servers:

    1. useradd hadoop   # add the user
    2. passwd hadoop    # set its password
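
    If root can already SSH to the other machines, a sketch of running those two commands on all four nodes from one place (the -t flag gives passwd an interactive terminal; you will be prompted for each password):

    # Run as root from a machine that can reach all four nodes over SSH.
    for host in hadoop1 hadoop2 hadoop3 hadoop4; do
        ssh -t root@$host "useradd hadoop && passwd hadoop"
    done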

    Part 2: Set Up Passwordless SSH Login

    As the hadoop user on the namenode, check whether ssh localhost works without a password. If it does not, run the commands below.

      Generate the key pair:

        

    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    

      

      When it finishes, id_dsa and id_dsa.pub are created under ~/.ssh/.

      Append the contents of id_dsa.pub to the authorized_keys file:

      

    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    

      

      This creates authorized_keys under ~/.ssh/.

      Delete the id_dsa.pub file:

      

    rm -f ~/.ssh/id_dsa.pub

      Copy authorized_keys to each datanode.

      Create the .ssh directory on each slave node:

    mkdir /home/hadoop/.ssh

    Copy the key file to each machine (you will be prompted for a password):

    scp authorized_keys hadoop2:/home/hadoop/.ssh/
    
    scp authorized_keys hadoop3:/home/hadoop/.ssh/
    
    scp authorized_keys hadoop4:/home/hadoop/.ssh/

      On each datanode, change the permissions of the .ssh directory to 700 and of authorized_keys to 600.
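
      A minimal sketch of that step, run as the hadoop user on each of hadoop2, hadoop3 and hadoop4 (sshd ignores keys whose files are more widely readable):

    # Run as the hadoop user on each datanode.
    chmod 700 /home/hadoop/.ssh
    chmod 600 /home/hadoop/.ssh/authorized_keys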

      After that, passwordless login should work. On hadoop1, run ssh hadoop2, ssh hadoop3 and ssh hadoop4 in turn; if none of them prompts for a password, the setup succeeded. A quick test is sketched below.
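
      A quick test from hadoop1 as the hadoop user; each command should print the remote hostname without asking for a password:

    # Run on hadoop1 as the hadoop user; no iteration should prompt.
    for host in hadoop2 hadoop3 hadoop4; do
        ssh $host hostname
    done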

         If it still does not work, run the following command (restorecon restores the SELinux security context on the key files):

    restorecon -r -vv ~/.ssh
    

      

    Part 3: Install the JDK

         Install as root. Download a .bin installer; mine was jdk-6u23-linux-i586-rpm.bin. Run ./jdk-6u23-linux-i586-rpm.bin to install it; the default path is /usr/java/jdk1.6.0_23/. Do the same on all four machines. Note: the installation path must be identical on every machine.
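
         A sketch of the install and a version check on one machine, assuming the same installer file and default path as above:

    # Run as root in the directory holding the downloaded installer.
    chmod +x jdk-6u23-linux-i586-rpm.bin
    ./jdk-6u23-linux-i586-rpm.bin
    /usr/java/jdk1.6.0_23/bin/java -version   # verify the install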

    Part 4: Install and Configure Hadoop

         As the hadoop user, download hadoop-0.20.205.0.tar.gz and place it under /home/hadoop/.

         Change the permissions of hadoop-0.20.205.0.tar.gz: chmod 700 hadoop-0.20.205.0.tar.gz

         Extract it: tar zxvf hadoop-0.20.205.0.tar.gz. This creates the hadoop-0.20.205.0 directory; go into hadoop-0.20.205.0/conf/.

         Modify the following files as shown:

    1. Namenode configuration: core-site.xml:
    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/tmp/hadooptmp</value>
            <description>A base for other temporary directories.</description>
        </property>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://hadoop1:9000</value>
            <description>The name of the default file system.  A URI whose  scheme and authority determine the FileSystem implementation.  The  uri's scheme determines the config property (fs.SCHEME.impl) naming  the FileSystem implementation class.  The uri's authority is used to determine the host, port, etc. for a filesystem.
            </description>
        </property>
    </configuration>    
    2. JobTracker configuration: mapred-site.xml:
    <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>hadoop1:9001</value>
            <description> The host and port that the MapReduce job tracker runs at.  If "local", then jobs are run in-process as a single map and reduce task.
            </description>
        </property>
        <property>
            <name>mapred.hosts</name>
            <value>slaves.include</value>
            <description>show slaves node</description>
        </property>
    </configuration>    
    3. HDFS configuration: hdfs-site.xml:

      

    <configuration>
        <property>
            <name>dfs.name.dir</name>
            <value>/home/hadoop/dfsnamedir</value>
            <description>namenode dir</description>
        </property>
        <property>
            <name>dfs.data.dir</name>
            <value>/home/hadoop/dfsdatadir</value>
            <description>datanode dir</description>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
            <description>number of block replicas</description>
        </property>
    </configuration>
    4. Cluster configuration

    Edit the slaves file so that it lists the datanode hostnames, one per line (a sketch of writing it follows the list):

             hadoop2

             hadoop3

             hadoop4
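
    A sketch of writing that file on hadoop1 (hostnames as above):

    # Run in hadoop-0.20.205.0/conf/ on hadoop1; one datanode hostname per line.
    printf "hadoop2\nhadoop3\nhadoop4\n" > slaves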

            

    5. Hadoop environment configuration:

        hadoop-env.sh

        Remove the leading "#" from the export JAVA_HOME line and set it to the JDK path installed on this machine, as sketched below.
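
        A sketch of the resulting line, assuming the JDK path from Part 3:

    # conf/hadoop-env.sh: uncommented and pointed at the locally installed JDK.
    export JAVA_HOME=/usr/java/jdk1.6.0_23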

        At this point the Hadoop configuration is complete.

        Distribute Hadoop to each slave node:

          On the hadoop1 (namenode) node, package the configured Hadoop directory:

                    tar -vczf hadoop-0.20.205.0.tar.gz hadoop-0.20.205.0

    6. Distribute the configured archive hadoop-0.20.205.0.tar.gz to each slave node:

        

    scp hadoop-0.20.205.0.tar.gz hadoop2:/home/hadoop/
    scp hadoop-0.20.205.0.tar.gz hadoop3:/home/hadoop/
    scp hadoop-0.20.205.0.tar.gz hadoop4:/home/hadoop/

      Extract hadoop-0.20.205.0.tar.gz on each node (sketched below).
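
      A sketch of running the extraction on all three datanodes from hadoop1 (the passwordless SSH from Part 2 makes this non-interactive):

    # Run on hadoop1 as the hadoop user; the archive was copied to /home/hadoop above.
    for host in hadoop2 hadoop3 hadoop4; do
        ssh $host "cd /home/hadoop && tar zxf hadoop-0.20.205.0.tar.gz"
    done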

    7. Back on the hadoop1 (namenode) node, run Hadoop.

        Go into /home/hadoop/hadoop-0.20.205.0/bin/:

               Format the namenode: ./hadoop namenode -format

        Start Hadoop: ./start-all.sh

        Check cluster status: http://192.168.1.31:50070/dfshealth.jsp

        Stop Hadoop: ./stop-all.sh
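
        One quick way to confirm the daemons came up after ./start-all.sh is jps from the JDK (path assumed from Part 3):

    # hadoop1 should list NameNode, SecondaryNameNode and JobTracker.
    /usr/java/jdk1.6.0_23/bin/jps
    # hadoop2..4 should each list DataNode and TaskTracker.
    for host in hadoop2 hadoop3 hadoop4; do
        ssh $host /usr/java/jdk1.6.0_23/bin/jps
    done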

    8. Run the wordcount example.

        ./hadoop fs -put $LOCALFILE $HDFS    # copy a local file into HDFS

        ./hadoop fs -put nginx.log input

        ./hadoop jar $JARFILE $CLASSNAME $HDFSINPUT $OUTPUT    # run a jar file

        ./hadoop jar WordCount.jar wordcount input output    # output must not already exist
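
        Once the job finishes, a sketch of inspecting the result (file names as used above):

    # Run from hadoop-0.20.205.0/bin/ on hadoop1.
    ./hadoop fs -ls output
    ./hadoop fs -cat 'output/part-*'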

       Note: if you see the following error when starting up:

    The authenticity of host 192.168.0.xxx can't be established
    

      Run the following command:

    ssh -o StrictHostKeyChecking=no 192.168.0.xxx
    

      
