  Compiling and Running the Hadoop WordCount Example

    First, make sure Hadoop is correctly installed and running.

    Copy WordCount.java out of the Hadoop source tree:

    $ cp ./src/examples/org/apache/hadoop/examples/WordCount.java /home/hadoop/

    Create a directory under the current directory to hold the compiled class files:

    $ mkdir class

    Compile WordCount.java:

    $ javac -classpath /usr/local/hadoop/hadoop-core-0.20.203.0.jar:/usr/local/hadoop/lib/commons-cli-1.2.jar WordCount.java -d class

    After compilation, an org directory appears under class:

    $ ls class
    org

    Package the compiled classes into a jar:

    $ cd class
    $ jar cvf WordCount.jar *
    added manifest
    adding: org/(in = 0) (out= 0)(stored 0%)
    adding: org/apache/(in = 0) (out= 0)(stored 0%)
    adding: org/apache/hadoop/(in = 0) (out= 0)(stored 0%)
    adding: org/apache/hadoop/examples/(in = 0) (out= 0)(stored 0%)
    adding: org/apache/hadoop/examples/WordCount$TokenizerMapper.class(in = 1790) (out= 765)(deflated 57%)
    adding: org/apache/hadoop/examples/WordCount$IntSumReducer.class(in = 1793) (out= 746)(deflated 58%)
    adding: org/apache/hadoop/examples/WordCount.class(in = 1911) (out= 996)(deflated 47%)
    

    This completes the compilation of the Java source.


    Prepare a test file and start Hadoop.

    Because the input path passed to a Hadoop job must refer to files in HDFS, the test file first has to be copied from the local file system into HDFS:
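    The original transcript does not show how the test file was created; any small text file works. A sketch (the filename `file` matches the transcript, but its contents here are an assumption):

    ```shell
    # Create a small local test file to feed the job.
    # These lines are illustrative; the original file's contents are unknown.
    printf 'Hello World Bye World\nHello Hadoop Goodbye Hadoop\n' > file
    wc -w file    # total number of words the job will count
    ```

    After this, `hadoop fs -put file input` uploads it into HDFS as shown below.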

    $ hadoop fs -mkdir input
    $ hadoop fs -ls
    Found 1 items
    drwxr-xr-x   - hadoop supergroup          0 2014-03-26 10:39 /user/hadoop/input
    $ hadoop fs -put file input
    $ hadoop fs -ls input
    Found 1 items
    -rw-r--r--   2 hadoop supergroup         75 2014-03-26 10:40 /user/hadoop/input/file

    Run the program:

    $ cd class
    $ ls
    org  WordCount.jar
    $ hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
    14/03/26 10:57:39 INFO input.FileInputFormat: Total input paths to process : 1
    14/03/26 10:57:40 INFO mapred.JobClient: Running job: job_201403261015_0001
    14/03/26 10:57:41 INFO mapred.JobClient:  map 0% reduce 0%
    14/03/26 10:57:54 INFO mapred.JobClient:  map 100% reduce 0%
    14/03/26 10:58:06 INFO mapred.JobClient:  map 100% reduce 100%
    14/03/26 10:58:11 INFO mapred.JobClient: Job complete: job_201403261015_0001
    14/03/26 10:58:11 INFO mapred.JobClient: Counters: 25
    14/03/26 10:58:11 INFO mapred.JobClient:   Job Counters 
    14/03/26 10:58:11 INFO mapred.JobClient:     Launched reduce tasks=1
    14/03/26 10:58:11 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=12321
    14/03/26 10:58:11 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    14/03/26 10:58:11 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    14/03/26 10:58:11 INFO mapred.JobClient:     Launched map tasks=1
    14/03/26 10:58:11 INFO mapred.JobClient:     Data-local map tasks=1
    14/03/26 10:58:11 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10303
    14/03/26 10:58:11 INFO mapred.JobClient:   File Output Format Counters 
    14/03/26 10:58:11 INFO mapred.JobClient:     Bytes Written=51
    14/03/26 10:58:11 INFO mapred.JobClient:   FileSystemCounters
    14/03/26 10:58:11 INFO mapred.JobClient:     FILE_BYTES_READ=85
    14/03/26 10:58:11 INFO mapred.JobClient:     HDFS_BYTES_READ=184
    14/03/26 10:58:11 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=42541
    14/03/26 10:58:11 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=51
    14/03/26 10:58:11 INFO mapred.JobClient:   File Input Format Counters 
    14/03/26 10:58:11 INFO mapred.JobClient:     Bytes Read=75
    14/03/26 10:58:11 INFO mapred.JobClient:   Map-Reduce Framework
    14/03/26 10:58:11 INFO mapred.JobClient:     Reduce input groups=7
    14/03/26 10:58:11 INFO mapred.JobClient:     Map output materialized bytes=85
    14/03/26 10:58:11 INFO mapred.JobClient:     Combine output records=7
    14/03/26 10:58:11 INFO mapred.JobClient:     Map input records=1
    14/03/26 10:58:11 INFO mapred.JobClient:     Reduce shuffle bytes=0
    14/03/26 10:58:11 INFO mapred.JobClient:     Reduce output records=7
    14/03/26 10:58:11 INFO mapred.JobClient:     Spilled Records=14
    14/03/26 10:58:11 INFO mapred.JobClient:     Map output bytes=131
    14/03/26 10:58:11 INFO mapred.JobClient:     Combine input records=14
    14/03/26 10:58:11 INFO mapred.JobClient:     Map output records=14
    14/03/26 10:58:11 INFO mapred.JobClient:     SPLIT_RAW_BYTES=109
    14/03/26 10:58:11 INFO mapred.JobClient:     Reduce input records=7

    Check the result:

    $ hadoop fs -ls
    Found 2 items
    drwxr-xr-x   - hadoop supergroup          0 2014-03-26 10:40 /user/hadoop/input
    drwxr-xr-x   - hadoop supergroup          0 2014-03-26 10:58 /user/hadoop/output

    An output directory has appeared in HDFS; list the files inside it:

    $ hadoop fs -ls output
    Found 3 items
    -rw-r--r--   2 hadoop supergroup          0 2014-03-26 11:04 /user/hadoop/output/_SUCCESS
    drwxr-xr-x   - hadoop supergroup          0 2014-03-26 11:04 /user/hadoop/output/_logs
    -rw-r--r--   2 hadoop supergroup         65 2014-03-26 11:04 /user/hadoop/output/part-r-00000

    View the job output:

    $ hadoop fs -cat output/part-r-00000
    Bye	3
    Hello	3
    Word	1
    World	3
    bye	1
    hello	2
    world	1

    This completes the WordCount example run on Hadoop.
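    As a sanity check (not part of the original post), the same kind of counts can be reproduced on a local copy of the input with standard shell tools, since WordCount simply tokenizes on whitespace, groups equal tokens, and sums:

    ```shell
    # Recreate a sample input (illustrative contents) and count words the
    # way WordCount does: one token per line, sort to group, count per group.
    printf 'Hello World Bye World\nHello Hadoop Goodbye Hadoop\n' > sample.txt
    tr -s ' \t' '\n' < sample.txt | sort | uniq -c | awk '{print $2 "\t" $1}'
    # Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2
    ```

    The output is word-per-line and alphabetically sorted, the same shape as part-r-00000.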

    To run the job again, the output directory must be deleted first; otherwise Hadoop aborts with an exception like this:

    14/03/26 11:41:30 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201403261015_0003
    Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory output already exists
    	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:134)
    	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:830)
    	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at javax.security.auth.Subject.doAs(Subject.java:415)
    	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
    	at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
    	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:494)
    	at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:601)
    	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

    Delete the output directory as follows:

    $ hadoop fs -rmr output
    Deleted hdfs://localhost:9000/user/hadoop/output
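    A rerun-safe pattern (a sketch, not from the original post) wraps the delete and the submission together, so a stale output directory never blocks a new run:

    ```shell
    # Remove any stale output directory, then resubmit the job.
    # '|| true' keeps the script going when the directory does not exist yet.
    rerun_wordcount() {
        hadoop fs -rmr output > /dev/null 2>&1 || true
        hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
    }
    ```

    Calling `rerun_wordcount` repeatedly then behaves the same on every invocation.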


    Alternatively, run the precompiled examples jar that ships with Hadoop:

    $ hadoop jar /usr/local/hadoop/hadoop-examples-0.20.203.0.jar wordcount input output
    14/03/28 17:02:33 INFO input.FileInputFormat: Total input paths to process : 2
    14/03/28 17:02:33 INFO mapred.JobClient: Running job: job_201403281439_0004
    14/03/28 17:02:34 INFO mapred.JobClient:  map 0% reduce 0%
    14/03/28 17:02:49 INFO mapred.JobClient:  map 100% reduce 0%
    14/03/28 17:03:01 INFO mapred.JobClient:  map 100% reduce 100%
    14/03/28 17:03:06 INFO mapred.JobClient: Job complete: job_201403281439_0004
    14/03/28 17:03:06 INFO mapred.JobClient: Counters: 25
    14/03/28 17:03:06 INFO mapred.JobClient:   Job Counters 
    14/03/28 17:03:06 INFO mapred.JobClient:     Launched reduce tasks=1
    14/03/28 17:03:06 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=17219
    14/03/28 17:03:06 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    14/03/28 17:03:06 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    14/03/28 17:03:06 INFO mapred.JobClient:     Launched map tasks=2
    14/03/28 17:03:06 INFO mapred.JobClient:     Data-local map tasks=2
    14/03/28 17:03:06 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10398
    14/03/28 17:03:06 INFO mapred.JobClient:   File Output Format Counters 
    14/03/28 17:03:06 INFO mapred.JobClient:     Bytes Written=65
    14/03/28 17:03:06 INFO mapred.JobClient:   FileSystemCounters
    14/03/28 17:03:06 INFO mapred.JobClient:     FILE_BYTES_READ=131
    14/03/28 17:03:06 INFO mapred.JobClient:     HDFS_BYTES_READ=343
    14/03/28 17:03:06 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=63840
    14/03/28 17:03:06 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=65
    14/03/28 17:03:06 INFO mapred.JobClient:   File Input Format Counters 
    14/03/28 17:03:06 INFO mapred.JobClient:     Bytes Read=124
    14/03/28 17:03:06 INFO mapred.JobClient:   Map-Reduce Framework
    14/03/28 17:03:06 INFO mapred.JobClient:     Reduce input groups=9
    14/03/28 17:03:06 INFO mapred.JobClient:     Map output materialized bytes=137
    14/03/28 17:03:06 INFO mapred.JobClient:     Combine output records=11
    14/03/28 17:03:06 INFO mapred.JobClient:     Map input records=2
    14/03/28 17:03:06 INFO mapred.JobClient:     Reduce shuffle bytes=85
    14/03/28 17:03:06 INFO mapred.JobClient:     Reduce output records=9
    14/03/28 17:03:06 INFO mapred.JobClient:     Spilled Records=22
    14/03/28 17:03:06 INFO mapred.JobClient:     Map output bytes=216
    14/03/28 17:03:06 INFO mapred.JobClient:     Combine input records=23
    14/03/28 17:03:06 INFO mapred.JobClient:     Map output records=23
    14/03/28 17:03:06 INFO mapred.JobClient:     SPLIT_RAW_BYTES=219
    14/03/28 17:03:06 INFO mapred.JobClient:     Reduce input records=11


    References:

    http://www.cnblogs.com/aukle/p/3214984.html
    http://blog.csdn.net/turkeyzhou/article/details/8121601
    http://www.cnblogs.com/xia520pi/archive/2012/05/16/2504205.html


    Original post: https://www.cnblogs.com/liushaobo/p/4373737.html