  • Setting Up Hadoop 2.6.4 in Pseudo-Distributed Mode

    Preparations

    Operating System

    CentOS 7

    Software Environment

    1. JDK 1.7.0_79 (download link)
    2. SSH, which is normally preinstalled; if it is missing, install it yourself (see the sketch just below)
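
    If SSH does turn out to be missing, on CentOS 7 it can be checked and installed roughly as follows (the package and service names are the stock CentOS ones; this step is not part of the original article):

    #check which openssh packages are installed
    [root@localhost ~]# rpm -qa | grep openssh
    #install client and server if needed, then start sshd
    [root@localhost ~]# yum install -y openssh-server openssh-clients
    [root@localhost ~]# systemctl start sshd.service
    [root@localhost ~]# systemctl enable sshd.service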

    Disable the Firewall

    systemctl stop firewalld.service #stop firewalld
    systemctl disable firewalld.service #prevent firewalld from starting at boot
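
    To confirm the firewall is really off, query its state (a quick check, not part of the original steps):

    #should report "inactive" once firewalld has been stopped
    [root@localhost ~]# systemctl is-active firewalld.service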

    Set the Hostname

    [root@localhost ~]# hostname localhost
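
    Note that the hostname command above only changes the name for the current session. On CentOS 7 the persistent way is hostnamectl, and it is worth confirming that the name resolves locally (an optional step, not in the original):

    #set the hostname persistently (survives a reboot)
    [root@localhost ~]# hostnamectl set-hostname localhost
    #confirm that "localhost" resolves via /etc/hosts
    [root@localhost ~]# grep localhost /etc/hosts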

    Installation

    Install the JDK

    [root@localhost ~]# tar -xzvf jdk-7u79-linux-x64.tar.gz

    Configure the Java environment variables

    [root@localhost ~]# vi /etc/profile
    #add the following lines
    JAVA_HOME=/root/jdk1.7.0_79
    PATH=$JAVA_HOME/bin:$PATH
    CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    
    export JAVA_HOME
    export PATH
    export CLASSPATH
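
    Changes to /etc/profile only apply to new login shells; to pick them up in the current shell (do the same again after adding the Hadoop variables later), reload the file:

    [root@localhost ~]# source /etc/profile
    #quick sanity check
    [root@localhost ~]# echo $JAVA_HOME
    /root/jdk1.7.0_79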
    

    Verify Java

    [root@localhost ~]# java -version
    java version "1.7.0_79"
    Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

    If you see output like the above, Java has been installed and configured successfully.

    Install Hadoop

    Download Hadoop 2.6.4

    Install Hadoop 2.6.4

    [root@localhost ~]# tar -xzvf hadoop-2.6.4.tar.gz

    Configure the Hadoop environment variables

    [root@localhost ~]# vim /etc/profile
    #add the following lines
    export HADOOP_HOME=/root/hadoop-2.6.4
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    
    
    [root@localhost ~]# vim /root/hadoop-2.6.4/etc/hadoop/hadoop-env.sh
    #modify the following setting
    # The only required environment variable is JAVA_HOME.  All others are
    # optional.  When running a distributed configuration it is best to
    # set JAVA_HOME in this file, so that it is correctly defined on
    # remote nodes.
    
    # The java implementation to use.
    export JAVA_HOME=/root/jdk1.7.0_79

    Verify Hadoop

    [root@localhost ~]# hadoop version
    Hadoop 2.6.4
    Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 5082c73637530b0b7e115f9625ed7fac69f937e6
    Compiled by jenkins on 2016-02-12T09:45Z
    Compiled with protoc 2.5.0
    From source with checksum 8dee2286ecdbbbc930a6c87b65cbc010
    This command was run using /root/hadoop-2.6.4/share/hadoop/common/hadoop-common-2.6.4.jar

    Modify the Hadoop configuration files

    All of the configuration files live in /root/hadoop-2.6.4/etc/hadoop.

    <!-- core-site.xml-->
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>
    
    
    <!-- hdfs-site.xml -->
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>
    
    
    <!-- mapred-site.xml -->
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
    
    
    <!-- yarn-site.xml -->
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>
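
    Note that a freshly unpacked Hadoop 2.6.4 ships only mapred-site.xml.template; if etc/hadoop has no mapred-site.xml yet, copy it from the template before adding the property shown above:

    [root@localhost ~]# cd /root/hadoop-2.6.4/etc/hadoop
    [root@localhost ~]# cp mapred-site.xml.template mapred-site.xml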

    Passwordless SSH Login

    [root@localhost ~]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    [root@localhost ~]# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
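
    If ssh still prompts for a password after this, the usual culprit is over-permissive modes on ~/.ssh, which sshd rejects by default; tightening them is a common fix (not part of the original steps):

    [root@localhost ~]# chmod 700 ~/.ssh
    [root@localhost ~]# chmod 600 ~/.ssh/authorized_keys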

    Run the following command; if it does not prompt for a password, the configuration succeeded:

    [root@localhost ~]# ssh localhost
    Last login: Fri May  6 05:17:32 2016 from 192.168.154.1

    Running Hadoop

    Format HDFS

    [root@localhost ~]# hdfs namenode -format
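
    By default HDFS keeps its NameNode and DataNode data under hadoop.tmp.dir, which falls back to /tmp/hadoop-${user.name} and may be cleared on reboot. An optional tweak, not part of the original article, is to point it at a persistent directory in core-site.xml (the path below is just an example) and re-run the format:

    <!-- optional addition to core-site.xml -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoop-2.6.4/tmp</value>
    </property>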

    Start the NameNode, DataNode and YARN

    [root@localhost ~]# start-dfs.sh
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /root/hadoop-2.6.4/logs/hadoop-root-namenode-localhost.out
    localhost: starting datanode, logging to /root/hadoop-2.6.4/logs/hadoop-root-datanode-localhost.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /root/hadoop-2.6.4/logs/hadoop-root-secondarynamenode-localhost.out
    
    [root@localhost ~]# start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /root/hadoop-2.6.4/logs/yarn-root-resourcemanager-localhost.out
    localhost: starting nodemanager, logging to /root/hadoop-2.6.4/logs/yarn-root-nodemanager-localhost.out
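
    Once both scripts return, jps should show five Hadoop daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The NameNode web UI listens on port 50070 and the ResourceManager UI on port 8088. A quick check (not shown in the original):

    #list running Java processes; all five daemons should appear alongside Jps itself
    [root@localhost ~]# jps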

    Upload test files to HDFS

    First, create test1.txt and test2.txt under /root/test, containing "hello world" and "hello hadoop" respectively, and save them.
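
    For example, one way to create them (the trailing newline from echo accounts for the 12- and 13-byte sizes in the listing below):

    [root@localhost ~]# mkdir -p /root/test
    [root@localhost ~]# echo "hello world" > /root/test/test1.txt
    [root@localhost ~]# echo "hello hadoop" > /root/test/test2.txt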

    Use the following commands to upload the files into the input directory on HDFS:

    [root@localhost ~]# hadoop fs -put /root/test/ input
    [root@localhost ~]# hadoop fs -ls input
    Found 2 items
    -rw-r--r--   1 root supergroup         12 2016-05-06 06:35 input/test1.txt
    -rw-r--r--   1 root supergroup         13 2016-05-06 06:35 input/test2.txt

    Run the wordcount demo

    Run the following command and wait for it to finish:

    [root@localhost ~]# hadoop jar /root/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount input output
    16/05/06 06:44:15 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    16/05/06 06:44:16 INFO input.FileInputFormat: Total input paths to process : 2
    16/05/06 06:44:17 INFO mapreduce.JobSubmitter: number of splits:2
    16/05/06 06:44:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1462530786445_0001
    16/05/06 06:44:18 INFO impl.YarnClientImpl: Submitted application application_1462530786445_0001
    16/05/06 06:44:18 INFO mapreduce.Job: The url to track the job: http://server1:8088/proxy/application_1462530786445_0001/
    16/05/06 06:44:18 INFO mapreduce.Job: Running job: job_1462530786445_0001
    16/05/06 06:44:33 INFO mapreduce.Job: Job job_1462530786445_0001 running in uber mode : false
    16/05/06 06:44:33 INFO mapreduce.Job:  map 0% reduce 0%
    16/05/06 06:44:52 INFO mapreduce.Job:  map 50% reduce 0%
    16/05/06 06:44:53 INFO mapreduce.Job:  map 100% reduce 0%
    16/05/06 06:45:03 INFO mapreduce.Job:  map 100% reduce 100%
    16/05/06 06:45:03 INFO mapreduce.Job: Job job_1462530786445_0001 completed successfully
    16/05/06 06:45:04 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=55
                    FILE: Number of bytes written=320242
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=249
                    HDFS: Number of bytes written=25
                    HDFS: Number of read operations=9
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters 
                    Launched map tasks=2
                    Launched reduce tasks=1
                    Data-local map tasks=2
                    Total time spent by all maps in occupied slots (ms)=34487
                    Total time spent by all reduces in occupied slots (ms)=7744
                    Total time spent by all map tasks (ms)=34487
                    Total time spent by all reduce tasks (ms)=7744
                    Total vcore-milliseconds taken by all map tasks=34487
                    Total vcore-milliseconds taken by all reduce tasks=7744
                    Total megabyte-milliseconds taken by all map tasks=35314688
                    Total megabyte-milliseconds taken by all reduce tasks=7929856
            Map-Reduce Framework
                    Map input records=2
                    Map output records=4
                    Map output bytes=41
                    Map output materialized bytes=61
                    Input split bytes=224
                    Combine input records=4
                    Combine output records=4
                    Reduce input groups=3
                    Reduce shuffle bytes=61
                    Reduce input records=4
                    Reduce output records=3
                    Spilled Records=8
                    Shuffled Maps =2
                    Failed Shuffles=0
                    Merged Map outputs=2
                    GC time elapsed (ms)=364
                    CPU time spent (ms)=3990
                    Physical memory (bytes) snapshot=515538944
                    Virtual memory (bytes) snapshot=2588155904
                    Total committed heap usage (bytes)=296755200
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters 
                    Bytes Read=25
            File Output Format Counters 
                    Bytes Written=25

    View the results

    [root@localhost ~]# hadoop fs -ls output
    Found 2 items
    -rw-r--r--   1 root supergroup          0 2016-05-06 06:45 output/_SUCCESS
    -rw-r--r--   1 root supergroup         25 2016-05-06 06:45 output/part-r-00000
    [root@localhost ~]# hadoop fs -cat output/part-r-00000
    hadoop  1
    hello   2
    world   1
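
    To run the example again, remove the output directory first; MapReduce refuses to start a job whose output directory already exists:

    [root@localhost ~]# hadoop fs -rm -r output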

    At this point, the pseudo-distributed setup is complete.
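
    To shut the daemons down cleanly, the matching stop scripts are in the same sbin directory as the start scripts:

    [root@localhost ~]# stop-yarn.sh
    [root@localhost ~]# stop-dfs.sh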

    For a fully distributed setup, see here.

    This is an original article. When reposting, please attribute it to xdlysk's blog.

    Permalink: Setting up Hadoop pseudo-distributed mode [http://www.xdlysk.com/article/572c956642c817300e0f7ab1]
