  • [Hadoop Learning] CDH 5.2 Installation and Deployment

    [Date] November 19, 2014

    [Platform] CentOS 6.5

    [Tool] scp

    [Software] jdk-7u67-linux-x64.rpm

        CDH5.2.0-hadoop2.5.0

    [Steps]

        1. Prerequisites

          (1) Cluster planning

    Role     IP address       Hostname
    master   192.168.50.10    master.hadoop.com
    slave1   192.168.50.11    slave1.hadoop.com
    slave2   192.168.50.12    slave2.hadoop.com
    slave3   192.168.50.13    slave3.hadoop.com

       

          (2) Log in to the operating system as root

          (3) On every host in the cluster, run the following command to set the hostname, replacing * with that host's short name (master, slave1, ...); a concrete example for the master node follows this step.

              hostname *.hadoop.com 

              Then edit the file /etc/sysconfig/network so that it contains

              HOSTNAME=*.hadoop.com 
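
              For example, on the master node one might run the following (a minimal sketch; repeat on each slave with its own name):

              hostname master.hadoop.com
              sed -i 's/^HOSTNAME=.*/HOSTNAME=master.hadoop.com/' /etc/sysconfig/network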

          (4) Edit /etc/hosts as follows

             192.168.50.10 master.hadoop.com
             192.168.50.11 slave1.hadoop.com
             192.168.50.12 slave2.hadoop.com
             192.168.50.13 slave3.hadoop.com

              Run the following command to copy the hosts file to every host in the cluster (the wildcard stands for each slave's IP; a loop sketch follows this step)

              scp /etc/hosts 192.168.50.*:/etc/hosts 
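
              Since scp does not expand a wildcard in the remote-host field, a simple loop over the slave IPs can be used instead (a sketch, assuming root can ssh to each slave):

              for ip in 192.168.50.11 192.168.50.12 192.168.50.13; do
                  scp /etc/hosts root@${ip}:/etc/hosts
              done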

          (5) Install the JDK

              rpm -ivh jdk-7u67-linux-x64.rpm 

              Create the environment file and load it

              echo -e "export JAVA_HOME=/usr/java/default\nexport PATH=\$JAVA_HOME/bin:\$PATH" > /etc/profile.d/java-env.sh 

              . /etc/profile.d/java-env.sh 
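
              To confirm the JDK and the environment variables are in place (a quick check):

              java -version        # should report java version "1.7.0_67"
              echo $JAVA_HOME      # should print /usr/java/default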

          (6) Disable iptables

             service iptables stop  

             chkconfig iptables off 

          (7) Disable SELinux: edit /etc/selinux/config as below, then reboot the operating system

             SELINUX=disabled
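
             The same change can be scripted; setenforce 0 switches SELinux to permissive mode for the current session, while the config edit takes effect after the reboot (a sketch):

             sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
             setenforce 0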

        2. Installation (with YARN)

          (1) On master.hadoop.com, run

               yum install hadoop-yarn-resourcemanager hadoop-mapreduce-historyserver hadoop-yarn-proxyserver hadoop-hdfs-namenode

             yum install hadoop-hdfs-secondarynamenode    # optional; do not install this package if you plan to use HA

          (2) On every slave*.hadoop.com host, run

               yum install hadoop-yarn-nodemanager hadoop-mapreduce hadoop-hdfs-datanode

        3. Configuration. After editing the following files, copy them to every host in the cluster with scp.

          (1) Create the configuration directory

    cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
    alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
    alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
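
              To verify that the new configuration directory is the active one (a quick check):

    alternatives --display hadoop-conf    # the current link should point to /etc/hadoop/conf.my_cluster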

          (2) Create the necessary local directories (these are the paths referenced by hdfs-site.xml and yarn-site.xml in the next step)

             On the NameNode (master):

    mkdir -p /data/1/dfs/nn
    chown -R hdfs:hdfs /data/1/dfs/nn
    chmod 700 /data/1/dfs/nn

             On every slave (DataNode / NodeManager):

    mkdir -p /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
    chown -R hdfs:hdfs /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn /data/4/dfs/dn
    mkdir -p /data/1/yarn/local /data/2/yarn/local /data/3/yarn/local /data/4/yarn/local
    mkdir -p /data/1/yarn/logs /data/2/yarn/logs /data/3/yarn/logs /data/4/yarn/logs
    chown -R yarn:yarn /data/1/yarn/local /data/2/yarn/local /data/3/yarn/local /data/4/yarn/local /data/1/yarn/logs /data/2/yarn/logs /data/3/yarn/logs /data/4/yarn/logs

          (3) Edit the configuration files

            1)core-site.xml

      <property>
         <name>fs.defaultFS</name>
         <value>hdfs://master.hadoop.com:8020</value>
      </property>
    
      <property>
         <name>fs.trash.interval</name>
         <value>1440</value>
      </property>
    
      <property>
         <name>fs.trash.checkpoint.interval</name>
         <value>720</value>
      </property>
    
      <property>
         <name>hadoop.proxyuser.mapred.groups</name>
         <value>*</value>
      </property>
    
      <property>
         <name>hadoop.proxyuser.mapred.hosts</name>
         <value>*</value>
      </property>
    
      <property>
         <name>io.compression.codecs</name>
         <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
      </property>

            2)hdfs-site.xml

      <property>
         <name>dfs.permissions.superusergroup</name>
         <value>hadoop</value>
      </property>
    
      <property>
         <name>dfs.namenode.name.dir</name>
         <value>file:///data/1/dfs/nn</value>
      </property>
    
      <property>
         <name>dfs.datanode.data.dir</name>
         <value>file:///data/1/dfs/dn,file:///data/2/dfs/dn,file:///data/3/dfs/dn,file:///data/4/dfs/dn</value>
      </property>
    
      <property>
         <name>dfs.datanode.failed.volumes.tolerated</name>
         <value>3</value>
      </property>
    
      <property>
         <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
         <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
      </property>
    
      <property>
         <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
         <value>10737418240</value>
      </property>
    
      <property>
         <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
         <value>0.75</value>
      </property>
    
      <property>
         <name>dfs.webhdfs.enabled</name>
         <value>true</value>
      </property>
    
      <property>
         <name>dfs.webhdfs.user.provider.user.pattern</name>
         <value>^[A-Za-z0-9_][A-Za-z0-9._-]*[$]?$</value>
      </property>

            3)yarn-site.xml

      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master.hadoop.com</value>
      </property>
    
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
    
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
    
      <property>
        <description>List of directories to store localized files in.</description>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/data/1/yarn/local,/data/2/yarn/local,/data/3/yarn/local,/data/4/yarn/local</value>
      </property>
    
      <property>
        <description>Where to store container logs.</description>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/data/1/yarn/logs,/data/2/yarn/logs,/data/3/yarn/logs,/data/4/yarn/logs</value>
      </property>
    
      <property>
        <description>Where to aggregate logs to.</description>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>hdfs://master.hadoop.com:8020/var/log/hadoop-yarn/apps</value>
      </property>
    
      <property>
        <description>Classpath for typical applications.</description>
         <name>yarn.application.classpath</name>
         <value>
            $HADOOP_CONF_DIR,
            $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
            $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
            $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
            $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
         </value>
      </property>
    
      <property>
        <name>yarn.web-proxy.address</name>
        <value>master.hadoop.com</value>
      </property>
    
      <property>
        <description>It's not the memory the physical machine totally has, but that allocated to containers</description>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>5120</value>
      </property>
    
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
      </property>
    
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>10240</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>512</value>
      </property>
    
      <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx512m</value>
      </property>
    
      <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
      </property>
    
      <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
      </property>
    
      <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
      </property>
    
      <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>10</value>
      </property>
    
      <property>
        <name>yarn.scheduler.increment-allocation-mb</name>
        <value>512</value>
      </property>
    
      <property>
        <name>yarn.scheduler.increment-allocation-vcores</name>
        <value>1</value>
      </property>

            4)mapred-site.xml

      <property>
         <name>mapreduce.framework.name</name>
         <value>yarn</value>
      </property>
    
      <property>
         <name>mapreduce.jobhistory.address</name>
         <value>master.hadoop.com:10020</value>
      </property>
    
      <property>
         <name>mapreduce.jobhistory.webapp.address</name>
         <value>master.hadoop.com:19888</value>
      </property>
    
      <property>
         <name>yarn.app.mapreduce.am.staging-dir</name>
         <value>/user/history</value>
      </property>
    
      <property>
         <name>mapreduce.jobhistory.intermediate-done-dir</name>
         <value>/user/history/intermediate-done-dir</value>
      </property>
    
      <property>
         <name>mapreduce.jobhistory.done-dir</name>
         <value>/user/history/done-dir</value>
      </property>

          (4) Copy the configuration files to every host in the cluster (the wildcard again stands for each host's IP; a loop sketch follows)

              scp /etc/hadoop/conf.my_cluster/*-site.xml  192.168.50.*:/etc/hadoop/conf.my_cluster/ 
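
              As with /etc/hosts, the wildcard must be expanded manually; a loop over the slave IPs works (a sketch, assuming root ssh access and that step 3(1) has already been run on every host):

              for ip in 192.168.50.11 192.168.50.12 192.168.50.13; do
                  scp /etc/hadoop/conf.my_cluster/*-site.xml root@${ip}:/etc/hadoop/conf.my_cluster/
              done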

        4. Format HDFS (run on the NameNode, master.hadoop.com)

           sudo -u hdfs hdfs namenode -format 

        5. Start HDFS (run on every host in the cluster)

           for x in `cd /etc/init.d ; ls hadoop-hdfs-*`; do service $x start; done 
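
           Once the daemons are up, a quick health check from the master might look like this (a sketch):

           sudo -u hdfs hdfs dfsadmin -report | head -n 20    # live DataNode count and capacity
           jps                                                # lists the running Hadoop JVMs on this host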

        6. Create the necessary directories in HDFS

    sudo -u hdfs hadoop fs -mkdir -p /tmp && sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
    sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn && sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn
    sudo -u hdfs hadoop fs -mkdir -p /tmp/hadoop-yarn/staging/history/done_intermediate && sudo -u hdfs hadoop fs -chown -R mapred:mapred /tmp/hadoop-yarn/staging && sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
    sudo -u hdfs hadoop fs -mkdir -p /var
    sudo -u hdfs hadoop fs -mkdir -p /var/log && sudo -u hdfs hadoop fs -chmod -R 1775 /var/log && sudo -u hdfs hadoop fs -chown yarn:mapred /var/log
    sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn/apps && sudo -u hdfs hadoop fs -chmod -R 1777 /var/log/hadoop-yarn/apps && sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn/apps
    sudo -u hdfs hadoop fs -mkdir -p /user
    sudo -u hdfs hadoop fs -mkdir -p /user/history && sudo -u hdfs hadoop fs -chown mapred /user/history
    sudo -u hdfs hadoop fs -mkdir -p /user/test && sudo -u hdfs hadoop fs -chmod -R 777 /user/test && sudo -u hdfs hadoop fs -chown test /user/test
    sudo -u hdfs hadoop fs -mkdir -p /user/root && sudo -u hdfs hadoop fs -chmod -R 777 /user/root && sudo -u hdfs hadoop fs -chown root /user/root

        7. Operate YARN

           On each host, run the commands below for the services installed on it (ResourceManager, HistoryServer, and ProxyServer run on the master; NodeManager runs on the slaves):

          (1) Start

    service hadoop-yarn-resourcemanager start;service hadoop-mapreduce-historyserver start;service hadoop-yarn-proxyserver start;service hadoop-yarn-nodemanager start

          (2) Check status

    service hadoop-yarn-resourcemanager status;service hadoop-mapreduce-historyserver status;service hadoop-yarn-proxyserver status;service hadoop-yarn-nodemanager status

          (3) Stop

    service hadoop-yarn-resourcemanager stop;service hadoop-mapreduce-historyserver stop;service hadoop-yarn-proxyserver stop;service hadoop-yarn-nodemanager stop 

          (4) Restart

    service hadoop-yarn-resourcemanager restart;service hadoop-mapreduce-historyserver restart;service hadoop-yarn-proxyserver restart;service hadoop-yarn-nodemanager restart
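
           After starting the services, YARN can be checked from the command line or via the ResourceManager web UI at http://master.hadoop.com:8088 (a sketch):

    yarn node -list           # NodeManagers that have registered with the ResourceManager
    yarn application -list    # running applications (empty right after startup)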

        8. Install a Hadoop client

          (1) Install CentOS 6.5

          (2) Log in as root and run the following commands:

    rpm -ivh jdk-7u67-linux-x64.rpm
    
    yum install hadoop-client
    
    cp -r /etc/hadoop/conf.empty /etc/hadoop/conf.my_cluster
    alternatives --install /etc/hadoop/conf hadoop-conf /etc/hadoop/conf.my_cluster 50
    alternatives --set hadoop-conf /etc/hadoop/conf.my_cluster
    
    scp 192.168.50.10:/etc/hadoop/conf.my_cluster/*-site.xml /etc/hadoop/conf.my_cluster/
    scp 192.168.50.10:/etc/hosts /etc/
    scp 192.168.50.10:/etc/profile.d/hadoop-env.sh /etc/profile.d/
    . /etc/profile
    
    useradd -u 700 -g hadoop test
    passwd test          # enter the password for user test when prompted
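
           A quick check that the client can reach the cluster (a sketch, run as the test user created above):

    su - test -c "hadoop fs -ls /"    # should list /tmp, /user and /var created in step 6
    hadoop version                    # should report the CDH 5.2.0 build of Hadoop 2.5.0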

        9. Test Hadoop with YARN

    su - test
    
    # prepare input data in the test user's HDFS home directory
    hadoop fs -mkdir input
    hadoop fs -put /etc/hadoop/conf/*.xml input
    
    # run the wordcount example
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount input output
    
    # compute Pi
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 100
    
    # run a grep job (remove the previous output directory first, otherwise the job fails)
    hadoop fs -rm -r output
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output 'dfs[a-z.]+'
    hadoop fs -ls output
    hadoop fs -cat output/part-r-00000 | head

    [References]

        1) Cloudera official installation guide (CDH 5 command-line installation): http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_command_line.html

      
