  • Hadoop Learning Journey (Part 4: Running a Real MapReduce Program)

    The previous program only operated on the file system. This time we run a real MapReduce program.

    The program is wordcount, one of the official example programs. It plays the same role for Hadoop that hello world plays for other languages.

    1. First, make sure Hadoop starts normally: run start-all.sh

    2. Run the jps command and check that NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager have all started. Here I ran into a problem: NodeManager did not start. The error message was:

    2014-01-07 13:46:21,442 FATAL org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Failed to initialize mapreduce.shuffle
    java.lang.IllegalArgumentException: The ServiceName: mapreduce.shuffle set in yarn.nodemanager.aux-services is invalid.The valid service name should only contain a-zA-Z0-9_ and can not start with numbers
            at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
            at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:98)
            at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
            at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
            at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:218)
            at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
            at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:188)
            at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:338)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:386)
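
    The exception states the naming rule itself: a service name may only contain a-zA-Z0-9_ and cannot start with a digit. The Java class below is a minimal sketch that re-implements that rule as a regular expression, written from the wording of the message. AuxServiceNameCheck and its pattern are our own illustration, not Hadoop's AuxServices source. It shows why mapreduce.shuffle is rejected and mapreduce_shuffle is accepted.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper: re-implements the rule quoted in the NodeManager error
 * ("should only contain a-zA-Z0-9_ and can not start with numbers").
 * Written from the message text; this is not Hadoop's own validation code.
 */
final class AuxServiceNameCheck {
    // First character: letter or underscore; remaining characters: letters, digits, underscore.
    private static final Pattern VALID = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValid(String serviceName) {
        // matches() requires the whole string to fit the pattern, so any '.' fails.
        return VALID.matcher(serviceName).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"mapreduce.shuffle", "mapreduce_shuffle"}) {
            System.out.println(name + " -> " + (isValid(name) ? "valid" : "invalid"));
        }
    }
}
```

    Running main prints mapreduce.shuffle -> invalid and mapreduce_shuffle -> valid. The dot is the only character outside the allowed set.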

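    The cause was a mistake in the configuration file. The value mapreduce.shuffle contains a dot, which the naming rule in the exception (only a-zA-Z0-9_) does not allow. Change the property in yarn-site.xml to the following: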

        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
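
    After this change, jps should list all five daemons. The check from step 2 can also be done in code. The sketch below is hypothetical: JpsCheck, missingDaemons and the sample PIDs are made up for illustration. It assumes jps prints one "<pid> <main class>" pair per line, which is its default output format.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper: compares the text printed by `jps` against the
 * daemons a single-node Hadoop 2.x setup is expected to run.
 */
final class JpsCheck {
    static final List<String> EXPECTED = List.of(
            "NameNode", "DataNode", "SecondaryNameNode", "ResourceManager", "NodeManager");

    /** Returns the expected daemons that do not appear in the jps output, in EXPECTED order. */
    static List<String> missingDaemons(String jpsOutput) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]); // second column is the main class name
            }
        }
        List<String> missing = new ArrayList<>();
        for (String daemon : EXPECTED) {
            if (!running.contains(daemon)) {
                missing.add(daemon);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up listing that reproduces the failure above (no NodeManager).
        String sample = "2101 NameNode\n2203 DataNode\n2390 SecondaryNameNode\n"
                + "2530 ResourceManager\n2702 Jps\n";
        System.out.println("Missing: " + missingDaemons(sample));
    }
}
```

    With the sample listing, main prints Missing: [NodeManager]. The Jps line is ignored because jps lists itself but is not one of the expected daemons.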

    3. Prepare the data: add /input/file1.txt and /input/file2.txt to the Hadoop file system (HDFS).

    [root@dbserver mapreduce]# hadoop fs -ls /input
    Found 2 items
    -rw-r--r--   1 root supergroup         12 2013-12-06 16:22 /input/file1.txt
    -rw-r--r--   1 root supergroup         13 2013-12-06 16:22 /input/file2.txt
    [root@dbserver mapreduce]# hadoop fs -cat /input/file1.txt
    Hello World
    [root@dbserver mapreduce]# hadoop fs -cat /input/file2.txt
    Hello Hadoop

    4. The example program is located at: /hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar

    From that directory, run:

    hadoop jar ./hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output

    Console output:

    14/01/07 14:00:37 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
    14/01/07 14:00:38 INFO input.FileInputFormat: Total input paths to process : 2
    14/01/07 14:00:38 INFO mapreduce.JobSubmitter: number of splits:2
    14/01/07 14:00:38 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
    14/01/07 14:00:38 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    14/01/07 14:00:38 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
    14/01/07 14:00:38 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
    14/01/07 14:00:38 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
    14/01/07 14:00:38 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
    14/01/07 14:00:38 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
    14/01/07 14:00:38 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
    14/01/07 14:00:38 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
    14/01/07 14:00:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    14/01/07 14:00:38 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
    14/01/07 14:00:38 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
    14/01/07 14:00:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1389074273046_0001
    14/01/07 14:00:38 INFO impl.YarnClientImpl: Submitted application application_1389074273046_0001 to ResourceManager at localhost/127.0.0.1:8032
    14/01/07 14:00:38 INFO mapreduce.Job: The url to track the job: http://dbserver:8088/proxy/application_1389074273046_0001/
    14/01/07 14:00:38 INFO mapreduce.Job: Running job: job_1389074273046_0001
    14/01/07 14:00:48 INFO mapreduce.Job: Job job_1389074273046_0001 running in uber mode : false
    14/01/07 14:00:48 INFO mapreduce.Job:  map 0% reduce 0%
    14/01/07 14:00:58 INFO mapreduce.Job:  map 100% reduce 0%
    14/01/07 14:01:04 INFO mapreduce.Job:  map 100% reduce 100%
    14/01/07 14:01:05 INFO mapreduce.Job: Job job_1389074273046_0001 completed successfully
    14/01/07 14:01:05 INFO mapreduce.Job: Counters: 43
            File System Counters
                    FILE: Number of bytes read=55
                    FILE: Number of bytes written=236870
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=229
                    HDFS: Number of bytes written=25
                    HDFS: Number of read operations=9
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters
                    Launched map tasks=2
                    Launched reduce tasks=1
                    Data-local map tasks=2
                    Total time spent by all maps in occupied slots (ms)=15178
                    Total time spent by all reduces in occupied slots (ms)=4384
            Map-Reduce Framework
                    Map input records=2
                    Map output records=4
                    Map output bytes=41
                    Map output materialized bytes=61
                    Input split bytes=204
                    Combine input records=4
                    Combine output records=4
                    Reduce input groups=3
                    Reduce shuffle bytes=61
                    Reduce input records=4
                    Reduce output records=3
                    Spilled Records=8
                    Shuffled Maps =2
                    Failed Shuffles=0
                    Merged Map outputs=2
                    GC time elapsed (ms)=108
                    CPU time spent (ms)=2200
                    Physical memory (bytes) snapshot=568229888
                    Virtual memory (bytes) snapshot=2566582272
                    Total committed heap usage (bytes)=392298496
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters
                    Bytes Read=25
            File Output Format Counters
                    Bytes Written=25
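
    The record counters are easier to read once the data flow is traced by hand. The following in-memory Java sketch is hypothetical: WordCountSketch models only record counts, and partitioning, shuffle and serialization are left out. It treats each input file as one split and emits (word, 1) per whitespace-separated word, as the mapper does. It then sums the pairs within each split, as the combiner does, and merges everything by key, as the reducer does.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of wordcount's map -> combine -> reduce flow.
 * Only record counts are tracked; this is not Hadoop code.
 */
final class WordCountSketch {
    /** Final word counts plus the counters being compared with the job output. */
    static final class Result {
        final Map<String, Integer> counts = new TreeMap<>(); // sorted by key, like the reducer output
        int mapOutputRecords;     // "Map output records"
        int combineOutputRecords; // "Combine output records"
        int reduceInputRecords;   // "Reduce input records"
    }

    static Result run(List<String> splits) {
        Result r = new Result();
        for (String split : splits) {
            Map<String, Integer> local = new TreeMap<>(); // combiner state for this split only
            for (String word : split.trim().split("\\s+")) {
                if (word.isEmpty()) continue;
                r.mapOutputRecords++;               // map emits (word, 1)
                local.merge(word, 1, Integer::sum); // combiner sums pairs within the split
            }
            r.combineOutputRecords += local.size();
            for (Map.Entry<String, Integer> e : local.entrySet()) {
                r.reduceInputRecords++;             // each combined pair reaches the reducer
                r.counts.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Result r = run(List.of("Hello World", "Hello Hadoop")); // file1.txt, file2.txt
        System.out.println("Map output records=" + r.mapOutputRecords);
        System.out.println("Combine output records=" + r.combineOutputRecords);
        System.out.println("Reduce input records=" + r.reduceInputRecords);
        System.out.println("Reduce input groups=" + r.counts.size());
        System.out.println(r.counts);
    }
}
```

    The sketch reports 4 map output records, 4 combine output records, 4 reduce input records and 3 groups, which matches the counters above. The combiner cannot shrink anything here because no word repeats within a single file. Hello appears once in each file, so the reducer receives two Hello records and merges them into one of the 3 groups.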

    5. Check the results:

    [root@dbserver mapreduce]# hadoop fs -ls /output
    Found 2 items
    -rw-r--r--   1 root supergroup          0 2014-01-07 14:01 /output/_SUCCESS
    -rw-r--r--   1 root supergroup         25 2014-01-07 14:01 /output/part-r-00000
    [root@dbserver mapreduce]# hadoop fs -cat /output/part-r-00000
    Hadoop  1
    Hello   2
    World   1
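
    The byte sizes can be checked by hand too. This sketch assumes each input file ends with a newline, which the sizes 12 and 13 suggest, since the lines are 11 and 12 characters long. It also assumes the reducer output uses TextOutputFormat's default tab separator. PartFileSize is a hypothetical helper that renders the three result lines and counts the bytes on both sides.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for checking the file sizes reported by HDFS. */
final class PartFileSize {
    /** One "word<TAB>count" line per key, in key order, as in part-r-00000. */
    static String render(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        counts.forEach((word, n) -> sb.append(word).append('\t').append(n).append('\n'));
        return sb.toString();
    }

    /** Size in bytes; all text here is ASCII, so this equals the character count. */
    static int byteLength(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Assumed contents of file1.txt and file2.txt: one line each, newline-terminated.
        int inputBytes = 0;
        for (String file : List.of("Hello World\n", "Hello Hadoop\n")) {
            inputBytes += byteLength(file);
        }
        Map<String, Integer> counts = new TreeMap<>(Map.of("Hadoop", 1, "Hello", 2, "World", 1));
        String part = render(counts);
        System.out.print(part);
        System.out.println("input bytes=" + inputBytes + ", output bytes=" + byteLength(part));
    }
}
```

    Both sides come to 25 bytes: 12 + 13 on the input side, and 9 + 8 + 8 on the output side. This lines up with Bytes Read=25 and Bytes Written=25 in the counters and with the 25-byte part-r-00000 in the listing. The output total only works out to 25 with a single-byte separator, which supports the tab assumption.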

  • Original article: https://www.cnblogs.com/hutou/p/Hadoop4.html