  • Setting up a single-node Hadoop test environment on Ubuntu Kylin 14.04

    Hadoop version used: 2.2.0

    I won't dwell on downloading and installing; let's jump straight to configuration:

    Hadoop needs the following files configured, all located in $HADOOP_HOME/etc/hadoop/:

    hadoop-env.sh: it contains a JAVA_HOME variable; point it at your JDK's location.
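For example, the line in hadoop-env.sh might look like this (the JDK path below is just an assumption for Ubuntu's OpenJDK 7 package; substitute your own):

```shell
# In hadoop-env.sh -- replace the default ${JAVA_HOME} reference with an
# absolute path to your JDK (example path shown; adjust to your install)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
```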

    core-site.xml: insert the following between the <configuration> tags (note: fs.default.name still works in 2.2.0, but it has been deprecated in favor of fs.defaultFS):

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/home/username/kit/hadoop/data/temp/</value>
    </property>

    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
      <final>true</final>
    </property>

    hdfs-site.xml: the settings are as follows:

    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///home/username/kit/hadoop/namenode/</value>
      <final>true</final>
    </property>

    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///home/username/kit/hadoop/datanode/</value>
      <final>true</final>
    </property>

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

    <property>
      <name>dfs.permissions.enabled</name>
      <value>false</value>
    </property>
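The directories referenced in core-site.xml and hdfs-site.xml should exist before you format the NameNode. A quick sketch (using a stand-in base path; substitute your own layout):

```shell
# Stand-in for /home/username/kit/hadoop -- adjust to your own layout
BASE=/tmp/hadoop-dirs-demo
mkdir -p "$BASE/data/temp" "$BASE/namenode" "$BASE/datanode"
ls "$BASE"
```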

    mapred-site.xml: create this file by copying mapred-site.xml.template, renaming the copy, and then adding the following:

      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
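The copy-and-rename step is just a `cp` inside $HADOOP_HOME/etc/hadoop. Sketched here in a scratch directory so it runs anywhere:

```shell
# Simulated in a scratch directory; in practice, run the cp inside
# $HADOOP_HOME/etc/hadoop where the real template lives
conf=/tmp/hadoop-conf-demo
mkdir -p "$conf"
printf '<configuration>\n</configuration>\n' > "$conf/mapred-site.xml.template"
cp "$conf/mapred-site.xml.template" "$conf/mapred-site.xml"
ls "$conf"
```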

    yarn-site.xml: this one is longer. Some of these settings may not be strictly necessary (I copied them from another guide), but I've included them all:

      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
        <description>hostname of RM</description>
      </property>

      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:5274</value>
        <description>host is the hostname of the resource manager and
        port is the port on which the NodeManagers contact the Resource Manager.
        </description>
      </property>
    
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:5273</value>
        <description>host is the hostname of the resourcemanager and port is the port
        on which the Applications in the cluster talk to the Resource Manager.
        </description>
      </property>
    
      <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
        <description>In case you do not want to use the default scheduler</description>
      </property>
    
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:5271</value>
        <description>the host is the hostname of the ResourceManager and the port is the port on
        which the clients can talk to the Resource Manager. </description>
      </property>
    
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value></value>
        <description>the local directories used by the nodemanager</description>
      </property>
    
      <property>
        <name>yarn.nodemanager.address</name>
        <value>localhost:5272</value>
        <description>the nodemanagers bind to this port</description>
      </property>  
    
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>10240</value>
        <description>the amount of memory available on the NodeManager, in MB</description>
      </property>
     
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/app-logs</value>
        <description>directory on hdfs where the application logs are moved to </description>
      </property>
    
       <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value></value>
        <description>the directories used by Nodemanagers as log directories</description>
      </property>
    
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>shuffle service that needs to be set for Map Reduce to run </description>
      </property>

    Once these files are set up, you're basically done.

    If your system is 64-bit, you need to replace the files in $HADOOP_HOME/lib/native/ with 64-bit builds. You can compile them yourself from the Hadoop source (search online for detailed instructions), or use prebuilt native libraries that others have shared.

    Next, install ssh. The system already ships with openssh-client, so installing openssh-server is enough.
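On Ubuntu 14.04 that's a single apt command:

```shell
sudo apt-get install openssh-server
```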

    ssh can be configured for passwordless login, which saves a huge amount of hassle. The setup below applies to a single machine only:

    $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

    $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

    Note that in the middle of the first line, those are two single quotes!
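After running the two commands above, logging into localhost should no longer ask for a password; you can verify with:

```shell
# First run may ask you to confirm the host key; there should be
# no password prompt. If there still is, check that
# ~/.ssh/authorized_keys has permissions 600.
ssh localhost exit
```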

    Then add the following lines to /etc/profile:

    export HADOOP_HOME=/home/shizhida/kit/hadoop-2.2.0
    export PATH=$HADOOP_HOME/bin:$PATH
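Reload the profile (or log out and back in) so the change takes effect in your current shell:

```shell
source /etc/profile
hadoop version   # should now resolve from $HADOOP_HOME/bin
```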

    Adding Hadoop's path to the environment variables saves a ton of hassle.

    With that, the installation is basically complete. Reboot, then run:

    $ hadoop namenode -format

    to perform the initial format. After that, you're all set~
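Once formatting succeeds, the daemons can be started with the scripts under $HADOOP_HOME/sbin/. A typical sequence for 2.2.0 (jps comes with the JDK):

```shell
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
jps   # expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
```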

  • Original post: https://www.cnblogs.com/Ayanami-Blob/p/3675561.html