  • [Linux][Hadoop] Running the WordCount Example

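    Following on from the previous post: with Hadoop installed and running, it is time to run an example, and the simplest, most direct one is WordCount, the "Hello World" of Hadoop.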

    I followed this blog post: http://xiejianglei163.blog.163.com/blog/static/1247276201443152533684/

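    First create a directory containing two files. The location is up to you. The structure looks like this (the listing calls the directory examples, while the put command further down reads from ./data_input, so use one name consistently):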

    examples

    --file1.txt

    --file2.txt

    The file contents can be anything; I copied a passage of English text from a news article.
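
    If no text is at hand, the two input files could be created as in the sketch below. The directory name data_input is taken from the put command in this post; the INPUT_DIR variable and the sentences written into the files are placeholders, not the text used in the original run.

```shell
# Create the input directory and two small text files.
# INPUT_DIR and the sample sentences are assumptions for illustration.
INPUT_DIR=${INPUT_DIR:-./data_input}
mkdir -p "$INPUT_DIR"
printf '%s\n' 'Hello Hadoop, hello world.' > "$INPUT_DIR/file1.txt"
printf '%s\n' 'Hello again, Hadoop.' > "$INPUT_DIR/file2.txt"
ls -l "$INPUT_DIR"
```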

    Run the following commands:

    hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -mkdir /data    # create /data in HDFS to hold the input data; this directory lives in Hadoop's file system, not under the Linux root directory
    hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -put -f ./data_input/* /data # copy the two files created earlier into /data


    Run the WordCount job and watch its output:

    hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop jar ./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.4.1-sources.jar org.apache.hadoop.examples.WordCount /data /output
    14/07/22 22:34:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/07/22 22:34:27 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    14/07/22 22:34:29 INFO input.FileInputFormat: Total input paths to process : 2
    14/07/22 22:34:29 INFO mapreduce.JobSubmitter: number of splits:2
    14/07/22 22:34:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406038146260_0001
    14/07/22 22:34:32 INFO impl.YarnClientImpl: Submitted application application_1406038146260_0001
    14/07/22 22:34:32 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1406038146260_0001/
    14/07/22 22:34:32 INFO mapreduce.Job: Running job: job_1406038146260_0001
    14/07/22 22:34:58 INFO mapreduce.Job: Job job_1406038146260_0001 running in uber mode : false
    14/07/22 22:34:58 INFO mapreduce.Job:  map 0% reduce 0%
    14/07/22 22:35:34 INFO mapreduce.Job:  map 100% reduce 0%
    14/07/22 22:35:52 INFO mapreduce.Job:  map 100% reduce 100%
    14/07/22 22:35:52 INFO mapreduce.Job: Job job_1406038146260_0001 completed successfully
    14/07/22 22:35:53 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=2521
                    FILE: Number of bytes written=283699
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=2280
                    HDFS: Number of bytes written=1710
                    HDFS: Number of read operations=9
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters 
                    Launched map tasks=2
                    Launched reduce tasks=1
                    Data-local map tasks=2
                    Total time spent by all maps in occupied slots (ms)=71182
                    Total time spent by all reduces in occupied slots (ms)=13937
                    Total time spent by all map tasks (ms)=71182
                    Total time spent by all reduce tasks (ms)=13937
                    Total vcore-seconds taken by all map tasks=71182
                    Total vcore-seconds taken by all reduce tasks=13937
                    Total megabyte-seconds taken by all map tasks=72890368
                    Total megabyte-seconds taken by all reduce tasks=14271488
            Map-Reduce Framework
                    Map input records=29
                    Map output records=274
                    Map output bytes=2814
                    Map output materialized bytes=2527
                    Input split bytes=202
                    Combine input records=274
                    Combine output records=195
                    Reduce input groups=190
                    Reduce shuffle bytes=2527
                    Reduce input records=195
                    Reduce output records=190
                    Spilled Records=390
                    Shuffled Maps =2
                    Failed Shuffles=0
                    Merged Map outputs=2
                    GC time elapsed (ms)=847
                    CPU time spent (ms)=6410
                    Physical memory (bytes) snapshot=426119168
                    Virtual memory (bytes) snapshot=1953292288
                    Total committed heap usage (bytes)=256843776
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters 
                    Bytes Read=2078
            File Output Format Counters 
                    Bytes Written=1710
    hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$
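
    The Map-Reduce Framework counters in this log can be related with a little arithmetic. The sketch below copies three values from the log and derives how many (word, count) pairs were merged at each stage. The interpretation assumes one map task per input file and a single combiner pass per map task, which the log suggests (2 input paths, 2 splits, Combine input records equal to Map output records) but does not state outright.

```shell
# Counter values copied from the job log above.
map_output_records=274      # tokens emitted by the two map tasks
combine_output_records=195  # (word, count) pairs left after the per-map combiner
reduce_output_records=190   # distinct words in the final result

# Occurrences folded together by the combiner: repeats of a word within one file.
merged_in_combiner=$((map_output_records - combine_output_records))
# Pairs folded together by the reducer: words that appear in both files
# (under the one-map-per-file, one-combiner-pass assumption).
merged_in_reducer=$((combine_output_records - reduce_output_records))

echo "merged by combiner: $merged_in_combiner"
echo "merged by reducer:  $merged_in_reducer"
```

    Under those assumptions, the two files contain 274 whitespace-separated tokens and 190 distinct ones, and 5 tokens occur in both files.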

    The log above shows the details of the WordCount run. Next, print the output file to see the word counts:

    hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -cat /output/part-r-00000
    14/07/22 22:38:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    "as     1
    "atrocious,"    1
    -       1
    10-day  1
    13      1
    18      1
    20,     1
    2006.   1
    3,000   1
    432     1
    65      1
    7.4.52  1
    :help   2
    :help<Enter>    1
    :q<Enter>       1
    <F1>    1
    Already,        1
    Ban     1
    Benjamin        1

    Many more lines of output are omitted here. That completes the WordCount run.
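
    Entries such as `"as` and `20,` keep their punctuation because the stock WordCount example splits each line on whitespace only (it uses Java's StringTokenizer with default delimiters). For small files, the result can be cross-checked locally with a rough shell equivalent. The function name local_wordcount is made up for this sketch, and it is only an approximation of the job: it runs on local files, not on HDFS.

```shell
# Local approximation of WordCount for small files (not a Hadoop command).
# Splits on whitespace only, so punctuation stays attached to words,
# and sorts in byte order, matching the order seen in part-r-00000.
local_wordcount() {
  cat "$@" \
    | tr -s '[:space:]' '\n' \
    | sed '/^$/d' \
    | LC_ALL=C sort \
    | LC_ALL=C uniq -c \
    | awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

    For example, `local_wordcount ./data_input/*.txt | head` prints the first few word and count pairs, which should line up with the start of the `hadoop fs -cat` output for the same input.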

  • Original post: https://www.cnblogs.com/garinzhang/p/linux_hadoop_demo_wordcount.html