Installing and using a Hadoop distributed cluster in virtual machines (detailed version)

      Cluster layout:

      one master   192.168.85.2

      one slave    192.168.85.3

      JDK          jdk1.8.0_74 (the exact version is not important; use whichever you like)

      Hadoop       2.7.2 (the exact version is not important; any 2.x release is much the same)

      This post builds a Hadoop cluster step by step, starting right after the installation of Ubuntu 14.04.

      A quick word on setting up the Linux virtual machines:

      Because we are building a Hadoop cluster we need at least two virtual machines. There is no rush, though: we can set up one machine, clone it to get the other, and then adjust a few settings on the clone.

      The image I used is ubuntu-14.04.3-server-amd64.iso. To get both host access and outside network access I gave the VM two network adapters; see my other post on virtual machine networking for the details. During installation just make sure the OpenSSH server gets installed, so that afterwards you can ssh into the VM from a terminal tool, which is much more convenient. It is also easiest to set the hostname to master and the username to hadoop at install time.
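
      If the OpenSSH option was missed during installation, it can be added afterwards. A minimal sketch for Ubuntu 14.04 (standard Ubuntu package names assumed):

    # install and start the SSH server so the VM can be reached from the host
    sudo apt-get update
    sudo apt-get install -y openssh-server
    # the service starts automatically on 14.04; confirm it is running
    sudo service ssh status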

      1. Install the JDK

      Check whether a JDK is already installed:

    java -version

      If it is not installed, see my other post on removing an old JDK and installing a new one on Ubuntu (including the permission denied problem); if a JDK is already installed, skip this step.
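
      For reference, a minimal sketch of a tarball install that matches the paths used later in this post (the archive name and the use of ~/.bashrc are assumptions; adjust to the JDK package you actually downloaded):

    # unpack the JDK into the hadoop user's home directory (archive name assumed)
    tar zxvf jdk-8u74-linux-x64.tar.gz -C /home/hadoop/
    # point JAVA_HOME at it and put java on the PATH
    echo 'export JAVA_HOME=/home/hadoop/jdk1.8.0_74' >> ~/.bashrc
    echo 'export PATH=$PATH:$JAVA_HOME/bin' >> ~/.bashrc
    source ~/.bashrc
    java -version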

      2. Download Hadoop

      I downloaded it from http://mirror.bit.edu.cn/apache/hadoop/common/. There are plenty of versions to choose from; for trying things out they are all much the same, so pick any one of them.
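
      If the VM has direct network access you can also fetch the release straight onto the VM. A sketch (the exact mirror path for 2.7.2 is an assumption; check the mirror listing):

    # download the 2.7.2 release directly onto the VM
    wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz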

      Otherwise, transfer the archive to the VM via FTP or scp, then unpack it:

    tar zxvf hadoop-2.7.2.tar.gz

      Rename the directory:

    mv hadoop-2.7.2 hadoop

      Check the installation directory:

    hadoop@master:~/hadoop$ pwd
    /home/hadoop/hadoop

      Next we edit a number of configuration files. They all live under etc/hadoop inside the installation directory, so switch to that directory to make things easier; the contents I used are pasted below.
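
      All of the edits below happen in this directory:

    cd /home/hadoop/hadoop/etc/hadoop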

      The slaves file:

    vi slaves

      List the hostnames of the machines that should run a DataNode. For the setup in this post (the DataNode runs only on the slave, as the jps output later confirms), change the contents to:

    slave1

      The core-site.xml file:

    vi core-site.xml

      Add the following inside the <configuration> tags (fs.default.name is the older name of the property; Hadoop 2.x prefers fs.defaultFS, but both still work):

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>

      hdfs-site.xml

    vi hdfs-site.xml

      Add the following (dfs.name.dir and dfs.data.dir are the older property names; 2.x prefers dfs.namenode.name.dir and dfs.datanode.data.dir, and with a single DataNode the replication factor is set to 1):

    <property>
        <name>dfs.name.dir</name>
        <value>/home/hadoop/hadoop/dfs/name</value>
        <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
    </property>
    
    <property>
        <name>dfs.data.dir</name>
        <value>/home/hadoop/hadoop/dfs/data</value>
        <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

      mapred-site.xml (this file has to be created from the template first):

    cp mapred-site.xml.template mapred-site.xml 
    vi mapred-site.xml

      Add:

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

      yarn-site.xml

    vi yarn-site.xml

      Add:

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

      In theory Hadoop can already run at this point, but the startup scripts may still complain about missing environment settings, so based on my experience I configure everything up front to avoid trouble later.

      First, JAVA_HOME in hadoop-env.sh:

    vi hadoop-env.sh

      Change the JAVA_HOME line to:

    export JAVA_HOME=/home/hadoop/jdk1.8.0_74

      Then add the Hadoop environment variables to ~/.bashrc (right below the existing Java settings is fine):

    export HADOOP_DEV_HOME=/home/hadoop/hadoop
    export PATH=$PATH:$HADOOP_DEV_HOME/bin
    export PATH=$PATH:$HADOOP_DEV_HOME/sbin
    export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
    export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
    export YARN_HOME=${HADOOP_DEV_HOME}
    export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
    export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
    export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop

      Save the changes and reload the shell configuration:

    source ~/.bashrc

      Format HDFS (run from the Hadoop installation directory):

    bin/hdfs namenode -format

      Before cloning the VM, set up the hosts file (edit it as root or with sudo):

    vi /etc/hosts

      Change it to:

    127.0.0.1       localhost
    #127.0.1.1      master
    192.168.85.2    master
    192.168.85.3    slave1

      Clone the VM, start the clone, and change its hostname and IP (again as root or with sudo):

    vi /etc/hostname

      Change the contents to slave1, then edit the network configuration:

    vi /etc/network/interfaces

      which currently reads:

    # This file describes the network interfaces available on your system
    # and how to activate them. For more information, see interfaces(5).
    
    # The loopback network interface
    auto lo
    iface lo inet loopback
    
    # The primary network interface
    auto eth0
    iface eth0 inet dhcp
    auto eth1
    iface eth1 inet static
    address 192.168.85.2
    netmask 255.255.255.0

      Change it to:

    # This file describes the network interfaces available on your system
    # and how to activate them. For more information, see interfaces(5).
    
    # The loopback network interface
    auto lo
    iface lo inet loopback
    
    # The primary network interface
    auto eth0
    iface eth0 inet dhcp
    auto eth1
    iface eth1 inet static
    address 192.168.85.3
    netmask 255.255.255.0

      Reboot the machine.

      At this point both machines of the Hadoop cluster are in place.
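
      Before going further it is worth checking that the two machines can reach each other by the names configured in /etc/hosts. A quick check from the master:

    # make sure the slave resolves and responds
    ping -c 3 slave1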

      After both machines are installed, the master needs to be able to log in to the slave over SSH without a password. On the master:

    ssh localhost
    cd ~/.ssh 
    ssh-keygen -t rsa 

      Just press Enter at every prompt.
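
      Equivalently, the key can be generated without any prompts (a sketch using standard ssh-keygen options: empty passphrase, default key location):

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa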

      The master node must also be able to ssh into itself without a password; this step is still run on the master:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

      When that is done you can verify with ssh master. Next, transfer the public key to the slave1 node:

    scp ~/.ssh/id_rsa.pub hadoop@slave1:/home/hadoop/

      scp will prompt for the hadoop user's password on slave1 (hadoop in this setup); once it is entered, it reports that the transfer is complete.

      Then, on the slave1 node, append the public key to the authorized keys (create the directory first with mkdir ~/.ssh if it does not exist yet):

    cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
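
      Back on the master, confirm that the login now works without a password:

    ssh slave1
    exit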

      Now let's test whether the cluster runs. On the master, from the Hadoop installation directory:

    sbin/start-dfs.sh
    sbin/start-yarn.sh

      These commands start the daemons on both the master and the slave; use jps to check what is running. On the master:

    hadoop@master:~$ jps
    1632 SecondaryNameNode
    4581 Jps
    1782 ResourceManager
    1402 NameNode

      Run jps on slave1:

    4586 Jps
    3210 DataNode
    3356 NodeManager 

      Open http://192.168.85.2:50070/ to see the master and slave layout and their status.
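
      The same information is available from the command line (run from the Hadoop installation directory on the master):

    bin/hdfs dfsadmin -report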

      Now run the classic wordcount example.

      Create a couple of small test files locally (test1.txt below; a second file, test2.txt, was created the same way) and create an input directory on HDFS:

    cd
    mkdir input
    cd input
    echo "hello world" > test1.txt
    hadoop fs -mkdir input

      The last command may fail with an error saying the directory cannot be created. That is because a relative path is resolved against the user's home directory on HDFS (/user/hadoop), which does not exist yet on a freshly formatted filesystem; using an absolute path with a leading / works:

    hadoop fs -mkdir /input
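
      Alternatively, the relative form also works if missing parent directories are created with -p (this puts the directory under /user/hadoop rather than at the HDFS root):

    hadoop fs -mkdir -p input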

      List the files:

    hadoop@master:~/hadoop$ hadoop fs -ls /
    Found 1 items
    drwxr-xr-x   - hadoop supergroup          0 2016-05-10 10:36 /input

      Upload the files into /input and check:

    hadoop@master:~/hadoop$ hadoop fs -put ../input/*.txt /input
    hadoop@master:~/hadoop$ hadoop fs -ls /
    Found 1 items
    drwxr-xr-x - hadoop supergroup 0 2016-05-10 10:38 /input
    hadoop@master:~/hadoop$ hadoop fs -ls /input
    Found 2 items
    -rw-r--r-- 1 hadoop supergroup 12 2016-05-10 10:38 /input/test1.txt
    -rw-r--r-- 1 hadoop supergroup 13 2016-05-10 10:38 /input/test2.txt

      Next, use the example jar that ships with Hadoop to run wordcount on the file and count the words:

    hadoop/bin/hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input/test1.txt output2

      The format is: the hadoop script + the jar command + the examples jar + the program name (wordcount) + input path + output path. Note that a relative output path such as output2 ends up under /user/hadoop on HDFS.

      The job starts and prints output like this:

    16/05/10 10:44:14 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.85.2:8032
    16/05/10 10:44:15 INFO input.FileInputFormat: Total input paths to process : 1
    16/05/10 10:44:15 INFO mapreduce.JobSubmitter: number of splits:1
    16/05/10 10:44:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1462879083278_0001
    16/05/10 10:44:16 INFO impl.YarnClientImpl: Submitted application application_1462879083278_0001
    16/05/10 10:44:16 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1462879083278_0001/
    16/05/10 10:44:16 INFO mapreduce.Job: Running job: job_1462879083278_0001
    16/05/10 10:44:30 INFO mapreduce.Job: Job job_1462879083278_0001 running in uber mode : false
    16/05/10 10:44:30 INFO mapreduce.Job:  map 0% reduce 0%
    16/05/10 10:44:40 INFO mapreduce.Job:  map 100% reduce 0%
    16/05/10 10:44:47 INFO mapreduce.Job:  map 100% reduce 100%
    16/05/10 10:44:47 INFO mapreduce.Job: Job job_1462879083278_0001 completed successfully
    16/05/10 10:44:47 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=30
            FILE: Number of bytes written=234875
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=111
            HDFS: Number of bytes written=16
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters 
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=7470
            Total time spent by all reduces in occupied slots (ms)=4602
            Total time spent by all map tasks (ms)=7470
            Total time spent by all reduce tasks (ms)=4602
            Total vcore-milliseconds taken by all map tasks=7470
            Total vcore-milliseconds taken by all reduce tasks=4602
            Total megabyte-milliseconds taken by all map tasks=7649280
            Total megabyte-milliseconds taken by all reduce tasks=4712448
        Map-Reduce Framework
            Map input records=1
            Map output records=2
            Map output bytes=20
            Map output materialized bytes=30
            Input split bytes=99
            Combine input records=2
            Combine output records=2
            Reduce input groups=2
            Reduce shuffle bytes=30
            Reduce input records=2
            Reduce output records=2
            Spilled Records=4
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=229
            CPU time spent (ms)=2730
            Physical memory (bytes) snapshot=298352640
            Virtual memory (bytes) snapshot=3748110336
            Total committed heap usage (bytes)=139145216
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=12
        File Output Format Counters 
            Bytes Written=16

      While the job is running, its status can be followed at http://master:8088/proxy/application_1462879083278_0001/. Seeing map 100% reduce 100% means it ran successfully; http://192.168.85.2:8088/cluster shows the full details.
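
      Once the job has finished, the word counts can be read back from HDFS (assuming, as above, that the relative output path resolved to /user/hadoop/output2):

    hadoop fs -cat output2/part-*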

      Finally, shut the cluster down:

    sbin/stop-dfs.sh
    sbin/stop-yarn.sh

      Pretty simple, isn't it?
