  • Standalone and pseudo-distributed configuration of hadoop-2.6.0 on Ubuntu 14.04

    Note: before installing, it is best to delete everything under hadoop-2.6.0/dfs/data to avoid all kinds of problems, especially if the tarball was not freshly downloaded from the official site.

    If you need to recompile Hadoop from source, see: http://blog.csdn.net/ggz631047367/article/details/42460589

    Creating the hadoop group and user on Ubuntu

    The Hadoop administrator should ideally be the same user who will later log into the desktop environment and run Eclipse; otherwise permission-denied errors will appear later. (If it is not, there are still ways to work around this.)

    1. Create the hadoop group:

    sudo addgroup hadoop

    2. Create the hadoop user:

    sudo adduser --ingroup hadoop hadoop

    3. Grant the hadoop user sudo privileges by opening the /etc/sudoers file:

    sudo gedit /etc/sudoers

    Under the line root ALL=(ALL:ALL) ALL, add: hadoop ALL=(ALL:ALL) ALL
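    After the edit, the relevant part of /etc/sudoers should look like this (using visudo instead of gedit is safer, since it syntax-checks the file before saving):

    root    ALL=(ALL:ALL) ALL
    hadoop  ALL=(ALL:ALL) ALL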

    Installing the JDK on Ubuntu

    For details, see: http://blog.csdn.net/ggz631047367/article/details/42366687

    //JAVA_HOME=/usr/lib/jvm/jdk1.8.0_25

    Installing the SSH service

    sudo apt-get install ssh openssh-server

    Setting up passwordless SSH login to localhost

    Switch to the hadoop user and run the following commands:

    su - hadoop

    ssh-keygen can generate RSA or DSA keys; RSA is the default.

    1. Create an SSH key; here we use RSA:

    ssh-keygen -t rsa -P ""   (Note: after pressing Enter, two files are generated under ~/.ssh/: id_rsa and id_rsa.pub, which form a key pair.)
    

    2. Change into ~/.ssh/ and append id_rsa.pub to the authorized_keys file (authorized_keys does not exist initially):

    cd ~/.ssh
    cat id_rsa.pub >> authorized_keys   (After this you can log into this machine without a password.)

    3. Log into localhost:

    ssh localhost
    

    4. Log out again:

    exit
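    If ssh localhost still prompts for a password, the usual culprit is over-permissive file modes; OpenSSH requires the key files to be private (this check is a general SSH tip, not specific to Hadoop):

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys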

    Installing Hadoop

    Download: http://apache.fayea.com/hadoop/common/stable/hadoop-2.6.0.tar.gz

    Because this setup has been revised several times, the actual Hadoop path may differ from the Hadoop path written in the configuration files, which can cause errors; double-check the paths.

    1. Unpack Hadoop into /usr/local:

    sudo tar -zxvf hadoop-2.6.0.tar.gz
    sudo mv hadoop-2.6.0 /usr/local/hadoop
    sudo chmod -R 775 /usr/local/hadoop
    sudo chown -R hadoop:hadoop /usr/local/hadoop  // otherwise later steps will fail with permission-denied errors
    

    2. Configuration

    Edit ~/.bashrc:

    sudo gedit ~/.bashrc

    Append at the end of the file:

    #HADOOP VARIABLES START
    
    export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_25
    
    export HADOOP_INSTALL=/usr/local/hadoop
    
    export PATH=$PATH:$HADOOP_INSTALL/bin
    export PATH=$PATH:$JAVA_HOME/bin 
    export PATH=$PATH:$HADOOP_INSTALL/sbin
    export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
    export HADOOP_COMMON_HOME=$HADOOP_INSTALL
    export HADOOP_HDFS_HOME=$HADOOP_INSTALL
    export YARN_HOME=$HADOOP_INSTALL
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
    #HADOOP VARIABLES END
    
    
    
    
    
    If you do not know JAVA_HOME, you can find it with this command:
    

    update-alternatives --config java
    Take the path only up to the JDK root directory (strip the trailing jre/bin/java part).
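    For example, one way to resolve the JDK root from the java binary on the PATH (the jdk1.8.0_25 path is just the one used in this guide):

    readlink -f "$(which java)"
    # prints e.g. /usr/lib/jvm/jdk1.8.0_25/jre/bin/java
    # JAVA_HOME is the directory above jre/bin/java, i.e. /usr/lib/jvm/jdk1.8.0_25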


    Run the following to make the changes take effect:

    source ~/.bashrc

    Edit hadoop-env.sh:

    sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

    Find the JAVA_HOME line and set it to the value above, as shown below.
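    The edited line in hadoop-env.sh should end up as an absolute path, for example (matching the JDK path used above):

    export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_25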


    Testing

    • Verify the installation by running the bundled WordCount example

    Create an input folder under /usr/local/hadoop and copy a test file into it:

    mkdir input
    cp README.txt input
    
    Run WordCount from the Hadoop directory (note: use the examples jar, not the -sources jar, which contains only .java source files and cannot be executed):
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount input output
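    In standalone mode the output is written to the local filesystem; a quick check (the part-r-00000 name assumes the default single reducer):

    ls output
    cat output/part-r-00000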

    Pseudo-distributed Hadoop configuration

    sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml

    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/usr/local/hadoop/tmp</value>
            <description>Abase for other temporary directories.</description>
        </property>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>

     sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml

    <configuration>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>master</value>
        </property>
        <property>
            <description>The address of the applications manager interface in the RM.</description>
            <name>yarn.resourcemanager.address</name>
            <value>${yarn.resourcemanager.hostname}:8032</value>
        </property>
        <property>
            <description>The address of the scheduler interface.</description>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>${yarn.resourcemanager.hostname}:8030</value>
        </property>
        <property>
            <description>The http address of the RM web application.</description>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>${yarn.resourcemanager.hostname}:8088</value>
        </property>
        <property>
            <description>The https address of the RM web application.</description>
            <name>yarn.resourcemanager.webapp.https.address</name>
            <value>${yarn.resourcemanager.hostname}:8090</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>${yarn.resourcemanager.hostname}:8031</value>
        </property>
        <property>
            <description>The address of the RM admin interface.</description>
            <name>yarn.resourcemanager.admin.address</name>
            <value>${yarn.resourcemanager.hostname}:8033</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>
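    The values above refer to the hostname master. On a single machine this name must still resolve; one option is to map it to the loopback address in /etc/hosts (this mapping is an assumption for a one-node setup, adjust as needed):

    sudo gedit /etc/hosts
    127.0.0.1   localhost master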


    
    

    sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml  // not strictly required for pseudo-distributed mode
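    Note: Hadoop 2.6.0 ships only mapred-site.xml.template; if mapred-site.xml does not exist yet, create it from the template first:

    cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml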

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>master:10020</value>
            <description>MapReduce JobHistory Server IPC host:port</description>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>master:19888</value>
            <description>MapReduce JobHistory Server Web UI host:port</description>
        </property>
    </configuration>

     sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/usr/local/hadoop/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/usr/local/hadoop/dfs/data</value>
        </property>
        <!-- This property prevents the permission-denied errors Eclipse would otherwise hit later. -->
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>
    </configuration>

    sudo gedit /usr/local/hadoop/etc/hadoop/masters   and add: localhost

    sudo gedit /usr/local/hadoop/etc/hadoop/slaves    and add: localhost


    A note on the configuration: strictly speaking, only fs.defaultFS and dfs.replication are needed to run. However, if hadoop.tmp.dir is not set, Hadoop reportedly defaults to /tmp/hadoop-hadoop as its temporary directory, and since /tmp is wiped on every reboot, you would have to re-run the format step each time (not verified here). It is therefore best to set it in a pseudo-distributed configuration.

    Once configuration is done, first create the required directories under the Hadoop directory:

    cd /usr/local/hadoop
    mkdir tmp dfs dfs/name dfs/data



    Change the pid directory location (optional)

    vi etc/hadoop/hadoop-env.sh
    export HADOOP_PID_DIR=/usr/local/hadoop/pid
    vi etc/hadoop/mapred-env.sh
    export HADOOP_MAPRED_PID_DIR=/usr/local/hadoop/pid
    vi etc/hadoop/yarn-env.sh
    export YARN_PID_DIR=/usr/local/hadoop/pid
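    If you relocate the pid files, make sure the directory exists and is writable by the hadoop user (a minimal sketch, assuming the /usr/local/hadoop install path used in this guide):

    mkdir -p /usr/local/hadoop/pid
    sudo chown hadoop:hadoop /usr/local/hadoop/pid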


    Next, initialize the HDFS filesystem:
    bin/hdfs namenode -format   // clear dfs/data/ before each run of this command
    On success, the tail of the output contains Exitting with status 0; Exitting with status 1 indicates an error.

    sbin/start-dfs.sh
    sbin/start-yarn.sh
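    Once both scripts have run, jps should list the pseudo-distributed daemons (pids will differ):

    jps
    # 12341 NameNode
    # 12342 DataNode
    # 12343 SecondaryNameNode
    # 12344 ResourceManager
    # 12345 NodeManager
    # 12346 Jps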
    

    Starting the JobHistory server

    sbin/mr-jobhistory-daemon.sh start historyserver
    http://master:19888/


    If you see the warning "Unable to load native-hadoop library for your platform", there are two fixes:
    1. Recompile Hadoop from source and replace the cluster's original lib/native with the newly built one.
    2. Edit hadoop-env.sh and add:
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib:$HADOOP_PREFIX/lib/native"

    NameNode information: open http://localhost:50070 to view Hadoop's status.

    All Applications: http://2xx.81.8x.1xx:8088/ (replace 2xx.81.8x.1xx with your actual IP address).


    Running an example:

    1. First create directories on HDFS:

    bin/hdfs dfs -mkdir -p /user/ha1/input
    bin/hdfs dfs -mkdir -p /user/ha1/output

    2. Upload some files; this copies the local etc/hadoop directory into /user/ha1/input on HDFS:

    bin/hdfs dfs -put etc/hadoop/  /user/ha1/input

    3. Run the job:

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep /user/ha1/input/hadoop  /user/ha1/output/temp 'dfs[a-z.]+'
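    MapReduce refuses to write to an output path that already exists, so if you re-run the job, delete the previous output first:

    bin/hdfs dfs -rm -r /user/ha1/output/temp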


    Error 1:

    WARN hdfs.DFSClient: DataStreamer Exception
    org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/ha1/input/hadoop/yarn-env.sh._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.

    // Fix: delete everything under dfs/data/, stop all services, re-run bin/hdfs namenode -format, and repeat steps 1 and 2; see the sketch below.
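    A sketch of that recovery sequence, run from /usr/local/hadoop (warning: this wipes all HDFS data):

    sbin/stop-yarn.sh
    sbin/stop-dfs.sh
    rm -rf dfs/data/*
    bin/hdfs namenode -format
    sbin/start-dfs.sh
    sbin/start-yarn.sh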

    Error 2:

    org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot delete /user/root/grep-temp-23493900. Name node is in safe mode.
    The reported blocks 188 has reached the threshold 0.9990 of total blocks 188. The number of live datanodes 2 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 10 seconds.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1364)

    This happens when you act too quickly after startup, before the NameNode has left safe mode; simply wait for it to exit safe mode.
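    Alternatively, you can force the NameNode out of safe mode by hand:

    bin/hdfs dfsadmin -safemode leave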

    4. View the results:

    bin/hdfs dfs -cat /user/ha1/output/temp/*

    8	dfs.audit.logger
    4	dfs.class
    3	dfs.server.namenode.
    2	dfs.audit.log.maxbackupindex
    2	dfs.period
    2	dfs.audit.log.maxfilesize
    1	dfsmetrics.log
    1	dfsadmin
    1	dfs.servers
    1	dfs.replication
    1	dfs.file
    1	dfs.datanode.data.dir
    1	dfs.namenode.name.dir
    Killing a job:

    hadoop job -kill job_1447903602796_0001
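    hadoop job is deprecated in Hadoop 2.x; the equivalent commands are mapred job -kill for the job id, or yarn application -kill for the corresponding application id (the application id below is hypothetical, derived from the job id above):

    mapred job -kill job_1447903602796_0001
    yarn application -kill application_1447903602796_0001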
    
    
    