  • [Linux][Hadoop] Running the WordCount Example

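    Following on from the previous post: with Hadoop installed and running, it is time to run one of the bundled examples. The simplest and most direct one is WordCount, the "Hello World" of Hadoop.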

    I followed this blog post to run it: http://xiejianglei163.blog.163.com/blog/static/1247276201443152533684/

    First create a folder containing two files. The location is up to you; the layout looks like this:

    examples

    --file1.txt

    --file2.txt

    The file contents can be anything; I used a passage of English text copied from a news article.
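For anyone who would rather script this step, here is a minimal sketch that creates the two input files. The function name `make_inputs` and the sample sentences are placeholders invented for illustration; the news passage used in the original post is not reproduced here.

```python
from pathlib import Path

# Placeholder contents (an assumption for illustration, not the text
# used in the original post).
SAMPLE_TEXT = {
    "file1.txt": "hello hadoop hello world\n",
    "file2.txt": "hello wordcount\n",
}


def make_inputs(root):
    """Create file1.txt and file2.txt under `root` and return their paths."""
    folder = Path(root)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, text in SAMPLE_TEXT.items():
        path = folder / name
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `make_inputs("data_input")` from the Hadoop directory would produce a local folder whose name matches the `./data_input/*` path used in the upload command.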

    Run the following commands:

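    hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -mkdir /data    # create the /data directory in HDFS to hold the input data; this is a directory inside Hadoop's file system, not one under the Linux root directory
    hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -put -f ./data_input/* /data # copy the two files created above into /data (the local folder is ./data_input here; substitute the name of your own folder)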


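    Run the WordCount job and look at the result. The output directory /output must not exist yet; the job fails if it does.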

    hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop jar ./share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.4.1-sources.jar org.apache.hadoop.examples.WordCount /data /output
    14/07/22 22:34:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/07/22 22:34:27 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    14/07/22 22:34:29 INFO input.FileInputFormat: Total input paths to process : 2
    14/07/22 22:34:29 INFO mapreduce.JobSubmitter: number of splits:2
    14/07/22 22:34:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406038146260_0001
    14/07/22 22:34:32 INFO impl.YarnClientImpl: Submitted application application_1406038146260_0001
    14/07/22 22:34:32 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1406038146260_0001/
    14/07/22 22:34:32 INFO mapreduce.Job: Running job: job_1406038146260_0001
    14/07/22 22:34:58 INFO mapreduce.Job: Job job_1406038146260_0001 running in uber mode : false
    14/07/22 22:34:58 INFO mapreduce.Job:  map 0% reduce 0%
    14/07/22 22:35:34 INFO mapreduce.Job:  map 100% reduce 0%
    14/07/22 22:35:52 INFO mapreduce.Job:  map 100% reduce 100%
    14/07/22 22:35:52 INFO mapreduce.Job: Job job_1406038146260_0001 completed successfully
    14/07/22 22:35:53 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=2521
                    FILE: Number of bytes written=283699
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=2280
                    HDFS: Number of bytes written=1710
                    HDFS: Number of read operations=9
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters 
                    Launched map tasks=2
                    Launched reduce tasks=1
                    Data-local map tasks=2
                    Total time spent by all maps in occupied slots (ms)=71182
                    Total time spent by all reduces in occupied slots (ms)=13937
                    Total time spent by all map tasks (ms)=71182
                    Total time spent by all reduce tasks (ms)=13937
                    Total vcore-seconds taken by all map tasks=71182
                    Total vcore-seconds taken by all reduce tasks=13937
                    Total megabyte-seconds taken by all map tasks=72890368
                    Total megabyte-seconds taken by all reduce tasks=14271488
            Map-Reduce Framework
                    Map input records=29
                    Map output records=274
                    Map output bytes=2814
                    Map output materialized bytes=2527
                    Input split bytes=202
                    Combine input records=274
                    Combine output records=195
                    Reduce input groups=190
                    Reduce shuffle bytes=2527
                    Reduce input records=195
                    Reduce output records=190
                    Spilled Records=390
                    Shuffled Maps =2
                    Failed Shuffles=0
                    Merged Map outputs=2
                    GC time elapsed (ms)=847
                    CPU time spent (ms)=6410
                    Physical memory (bytes) snapshot=426119168
                    Virtual memory (bytes) snapshot=1953292288
                    Total committed heap usage (bytes)=256843776
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters 
                    Bytes Read=2078
            File Output Format Counters 
                    Bytes Written=1710
    hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$
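The counters in the "Map-Reduce Framework" group are easier to read with the three stages in mind. Below is a minimal local sketch in plain Python, not the Hadoop API. It assumes tokens are split on whitespace and that the combiner runs exactly once per map task, which Hadoop does not guarantee in general. The function name `simulate_wordcount` and the sample strings are invented for illustration.

```python
from collections import Counter


def simulate_wordcount(splits):
    """Simulate map -> combine -> reduce for a list of input split texts."""
    counters = {}

    # Map: emit one (token, 1) pair per whitespace-separated token.
    map_outputs = [[(tok, 1) for tok in text.split()] for text in splits]
    counters["Map output records"] = sum(len(pairs) for pairs in map_outputs)

    # Combine: sum the pairs locally, once per map task (an assumption).
    combined = []
    for pairs in map_outputs:
        local = Counter()
        for tok, n in pairs:
            local[tok] += n
        combined.append(local)
    counters["Combine output records"] = sum(len(c) for c in combined)

    # Reduce: merge the per-map partial sums into one total per word.
    totals = Counter()
    for local in combined:
        totals.update(local)
    counters["Reduce input groups"] = len(totals)
    counters["Reduce output records"] = len(totals)

    return dict(totals), counters


if __name__ == "__main__":
    totals, counters = simulate_wordcount(["a b a c", "b c d"])
    print(totals)
    print(counters)
```

Read this way, the log reports 274 tokens emitted by the two map tasks, 195 records left after local combining, and 190 distinct words after the reduce. Under the once-per-map assumption, the difference of 5 would correspond to words that occur in both input files.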

    The log above shows the details of the WordCount run. Next, print the result file to see the word counts:

    hadoop@ubuntu:/usr/local/gz/hadoop-2.4.1$ ./bin/hadoop fs -cat /output/part-r-00000
    14/07/22 22:38:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    "as     1
    "atrocious,"    1
    -       1
    10-day  1
    13      1
    18      1
    20,     1
    2006.   1
    3,000   1
    432     1
    65      1
    7.4.52  1
    :help   2
    :help<Enter>    1
    :q<Enter>       1
    <F1>    1
    Already,        1
    Ban     1
    Benjamin        1

    Many more entries follow and are omitted here. That completes the WordCount run.
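Two things stand out in the listing: punctuation stays attached to words (`"as`, `20,`, `Already,`), and the keys appear in byte order, so quotes and digits come before uppercase letters. The sketch below reproduces that shape locally. It is plain Python, not the Hadoop API, and it assumes whitespace tokenization and byte-wise key ordering; the function name `format_output` and the sample sentence (built from tokens in the listing) are for illustration only.

```python
from collections import Counter


def format_output(counts):
    """Return 'word<TAB>count' lines, with keys sorted by their UTF-8 bytes."""
    ordered = sorted(counts, key=lambda word: word.encode("utf-8"))
    return [f"{word}\t{counts[word]}" for word in ordered]


if __name__ == "__main__":
    # Tokens are split on whitespace only, so punctuation is kept.
    text = 'Already, Ban "as - 10-day :help :help <F1>'
    for line in format_output(Counter(text.split())):
        print(line)
```

To get counts for bare words, the input would need to be cleaned first or the mapper changed to strip punctuation. The stock example does neither, which is why tokens such as `2006.` show up as separate keys.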

  • Original article: https://www.cnblogs.com/garinzhang/p/linux_hadoop_demo_wordcount.html