Ubuntu 12.04 x64: Compiling Hadoop 2.2.0 and Installing a Hadoop 2.2.0 Cluster

This article recompiles the Hadoop-2.2.0 source (on a 64-bit OS the stock build has native-library problems unless you recompile), builds a Hadoop-2.2.0 cluster, generates the Hadoop-2.2.0 plugin for the Eclipse environment, and verifies everything with a test run.
1. Install Maven, libssl-dev, CMake, and the JDK

Install the native-library build dependencies (see http://wiki.apache.org/hadoop/HowToContribute):

    sudo apt-get -y install maven build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev
    sudo tar -zxvf jdk-7u51-linux-x64.tar.gz -C /usr/lib/jvm/
    sudo gedit /etc/profile

    For RHEL (and hence also CentOS):

    yum -y install  lzo-devel  zlib-devel  gcc autoconf automake libtool openssl-devel

You also need cmake installed, otherwise the build fails with: Cannot run program "cmake" (in directory "/opt/hadoop-2.5.1-src/hadoop-common-project/hadoop-common/target/native"): error=2, No such file or directory

    yum install cmake

Then add:

    #set Java Environment
    
    export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51
    export CLASSPATH=.:$JAVA_HOME/lib:$CLASSPATH
    export PATH=$JAVA_HOME/bin:$PATH

Set the default JDK version:

    sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.7.0_51/bin/java 300
    sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.7.0_51/bin/javac 300
    sudo update-alternatives --config java

Apply the settings:

    source /etc/profile
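To confirm the JDK is active (the version string assumes the jdk1.7.0_51 install above):

    java -version
    # expected: java version "1.7.0_51"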

2. Install Protocol Buffers

Download https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz

    tar -zxvf protobuf-2.5.0.tar.gz
    cd protobuf-2.5.0
    sudo ./configure
    sudo make
    sudo make check
    sudo make install
    sudo ldconfig
    protoc --version

On CentOS you can alternatively use:

    yum install protobuf
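Hadoop 2.2.0 requires protoc 2.5.0 exactly, so verify the version before building:

    protoc --version
    # expected: libprotoc 2.5.0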

3. Install FindBugs

Download http://sourceforge.net/projects/findbugs/files/findbugs/2.0.3/findbugs-2.0.3.tar.gz

Extract it:

    tar -zxvf findbugs-2.0.3.tar.gz
    sudo gedit /etc/profile

Add:

    #set Findbugs Environment
    
    export FINDBUGS_HOME=/home/hadoop/findbugs-2.0.3
    export PATH=$FINDBUGS_HOME/bin:$PATH
    source /etc/profile
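To check that FindBugs is on the PATH (assuming the 2.0.3 install above):

    findbugs -version
    # expected: 2.0.3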

4. Install Maven (binary distribution)

Download http://mirrors.cnnic.cn/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
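A minimal extraction sketch, assuming the tarball is unpacked to /opt and renamed to match the M2_HOME setting below:

    sudo tar -zxvf apache-maven-3.2.3-bin.tar.gz -C /opt/
    sudo mv /opt/apache-maven-3.2.3 /opt/maven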

Then configure /etc/profile:

    #Set Maven
    export M2_HOME=/opt/maven
    export PATH=$PATH:$M2_HOME/bin
    source /etc/profile
    mvn -v

5. Compile Hadoop-2.2.0

① Download the source tarball http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz

② Extract:

    tar -zxvf hadoop-2.2.0-src.tar.gz

③ Add the missing jetty-util dependency (a known 2.2.0 build bug, HADOOP-10110). In hadoop-common-project/hadoop-auth/pom.xml, after the existing block

    <dependency>
        <groupId>org.mortbay.jetty</groupId>
        <artifactId>jetty</artifactId>
        <scope>test</scope>
    </dependency>

add:

    <dependency>
        <groupId>org.mortbay.jetty</groupId>
        <artifactId>jetty-util</artifactId>
        <scope>test</scope>
    </dependency>

④ Run the build:

    cd hadoop-2.2.0-src
    mvn clean package -Pdist,native -DskipTests -Dtar -e -X

(clean and the -e -X debugging flags are optional)
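If the build succeeds, the finished distribution lands under hadoop-dist/target/, both as hadoop-2.2.0.tar.gz and as an unpacked hadoop-2.2.0/ directory. A quick check that the native library really is 64-bit:

    file hadoop-dist/target/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0
    # expected: ELF 64-bit LSB shared object, x86-64 ...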

6. Install the Hadoop Cluster

① This cluster uses two hosts in total; create the same user, hadoop, on both. Configure the hostname and IP address of each host: open /etc/hosts on every machine and enter the following (substitute each host's actual IP address; static IPs can be configured, and any IPv6 entries can be removed; afterwards ping the slave's address from the master to confirm the two can communicate):

    127.0.0.1        localhost
    192.168.116.133  master
    192.168.116.134  slave1

Set the contents of /etc/hostname on each machine to master and slave1 respectively.
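For example, to verify connectivity from the master:

    ping -c 3 slave1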

② Install SSH and set up passwordless SSH login (JDK installation is not covered here):

    sudo apt-get install ssh
    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    ssh -V
    ssh localhost

Note: appending node A's public key to node B's authorized_keys lets node A log in to node B without a password.

Copy the file from the master into the same folder on the slave host:

    cd .ssh
    scp authorized_keys slave1:~/.ssh/

(Run ssh slave1 to check that the master can now log in to slave1 without a password. If it works you are switched to the slave1 host; type exit to return to the master.)

③ Extract the compiled Hadoop:

    tar -zxvf hadoop-2.2.0.tar.gz

Add the environment variables:

    sudo gedit /etc/profile

Add:

    export HADOOP_HOME=/home/hadoop/hadoop-2.2.0
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Apply the configuration:

    source /etc/profile
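To confirm the variables took effect:

    hadoop version
    # the first line should read: Hadoop 2.2.0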

④ Edit the configuration files

These settings follow the official documentation.

    cd ~/hadoop-2.2.0/etc/hadoop

File hadoop-env.sh

Replace export JAVA_HOME=${JAVA_HOME} with your own JDK installation directory:

    export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51

File mapred-env.sh

Add:

    export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51

File yarn-env.sh

Add:

    export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51

File core-site.xml

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///home/hadoop/hadoop-2.2.0/tmp</value>
    </property>

File hdfs-site.xml

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>${hadoop.tmp.dir}/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>${hadoop.tmp.dir}/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

File mapred-site.xml

    cp mapred-site.xml.template mapred-site.xml

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.system.dir</name>
        <value>${hadoop.tmp.dir}/mapred/system</value>
    </property>
    <property>
        <name>mapreduce.cluster.local.dir</name>
        <value>${hadoop.tmp.dir}/mapred/local</value>
    </property>
    <property>
        <name>mapreduce.cluster.temp.dir</name>
        <value>${hadoop.tmp.dir}/mapred/temp</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.address</name>
        <value>master:9001</value>
    </property>

File yarn-site.xml

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
        <description>host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
        <description>host is the hostname of the resourcemanager and port is the port on which the Applications in the cluster talk to the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
        <description>In case you do not want to use the default scheduler</description>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
        <description>the host is the hostname of the ResourceManager and the port is the port on which the clients can talk to the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>${hadoop.tmp.dir}/nm-local-dir</value>
        <description>the local directories used by the nodemanager</description>
    </property>
    <property>
        <name>yarn.nodemanager.address</name>
        <value>0.0.0.0:8034</value>
        <description>the nodemanagers bind to this port</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>10240</value>
        <description>the amount of memory on the NodeManager in MB</description>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>${hadoop.tmp.dir}/app-logs</value>
        <description>directory on hdfs where the application logs are moved to</description>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>${hadoop.tmp.dir}/userlogs</value>
        <description>the directories used by Nodemanagers as log directories</description>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>shuffle service that needs to be set for Map Reduce to run</description>
    </property>

File slaves (one worker hostname per line):

    slave1

⑤ With configuration on the master complete, copy the Hadoop directory to the slave1 host:

    scp -r hadoop-2.2.0 slave1:~/

Format the NameNode on the master:

    cd hadoop-2.2.0
    bin/hdfs namenode -format
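A successful format ends with a log line similar to the following (the path follows the hadoop.tmp.dir set in core-site.xml above):

    common.Storage: Storage directory /home/hadoop/hadoop-2.2.0/tmp/dfs/name has been successfully formatted.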

     

Start the cluster on the master (start-all.sh is deprecated in 2.x but still works; start-dfs.sh followed by start-yarn.sh is the equivalent):

    sbin/start-all.sh
    jps
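With the configuration above, jps on the master should list roughly the following daemons (slave1 should show DataNode and NodeManager):

    NameNode
    SecondaryNameNode
    ResourceManager
    Jps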

Open http://master:50070 in a browser to check the NameNode status.

Open http://master:50090 to check the SecondaryNameNode status.

Open http://master:8088 to check job status (this replaces the 1.x JobTracker UI at http://master:50030).
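As a final smoke test, a sketch that runs the bundled pi example against the cluster (run from HADOOP_HOME; the jar path is standard in a 2.2.0 distribution):

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 10

The job should show up in the 8088 UI and print an estimated value of Pi on completion.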

7. Generate the Eclipse Plugin

Download hadoop2x-eclipse-plugin-master from https://github.com/winghc/hadoop2x-eclipse-plugin, then:

    cd src/contrib/eclipse-plugin
    
    ant jar -Dversion=2.2.0 -Declipse.home=/home/hadoop/eclipse -Dhadoop.home=/home/hadoop/hadoop-2.2.0

The jar is generated (relative to the hadoop2x-eclipse-plugin-master directory) at:

    build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.2.0.jar
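To use it, copy the jar into Eclipse's plugins directory and restart Eclipse (the Eclipse path matches the -Declipse.home value above):

    cp build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.2.0.jar /home/hadoop/eclipse/plugins/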

I have shared the compiled Hadoop-2.2.0 build, the Eclipse plugin, and the configuration files at http://pan.baidu.com/s/1gdsvYz5 (extraction code: slgy).
