  • Compiling and Running the Hadoop WordCount Example

    First, make sure Hadoop is correctly installed and running.

    Copy WordCount.java out of the Hadoop source tree:

    $ cp ./src/examples/org/apache/hadoop/examples/WordCount.java /home/hadoop/

    Create a directory in the current directory to hold the compiled WordCount class files:

    $ mkdir class

    Compile WordCount.java:

    $ javac -classpath /usr/local/hadoop/hadoop-core-0.20.203.0.jar:/usr/local/hadoop/lib/commons-cli-1.2.jar WordCount.java -d class

    After compilation, an org directory appears under class:

    $ ls class
    org

    Package the compiled classes into a jar:

    $ cd class
    $ jar cvf WordCount.jar *
    added manifest
    adding: org/(in = 0) (out= 0)(stored 0%)
    adding: org/apache/(in = 0) (out= 0)(stored 0%)
    adding: org/apache/hadoop/(in = 0) (out= 0)(stored 0%)
    adding: org/apache/hadoop/examples/(in = 0) (out= 0)(stored 0%)
    adding: org/apache/hadoop/examples/WordCount$TokenizerMapper.class(in = 1790) (out= 765)(deflated 57%)
    adding: org/apache/hadoop/examples/WordCount$IntSumReducer.class(in = 1793) (out= 746)(deflated 58%)
    adding: org/apache/hadoop/examples/WordCount.class(in = 1911) (out= 996)(deflated 47%)
    

    At this point, compilation of the Java source is complete.
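    For reference, the core logic inside WordCount.java is small: the mapper tokenizes each input line and emits a (word, 1) pair per token, and the reducer sums the values for each word. The sketch below restates that logic in plain Java collections (no Hadoop dependencies), purely for illustration; the real example uses Hadoop's Mapper/Reducer classes and Text/IntWritable types.

    ```java
    import java.util.*;

    public class WordCountSketch {
        // Mapper logic: tokenize one line, emit a (word, 1) pair per token.
        static List<Map.Entry<String, Integer>> map(String line) {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            StringTokenizer itr = new StringTokenizer(line); // same tokenizer WordCount uses
            while (itr.hasMoreTokens()) {
                out.add(new AbstractMap.SimpleEntry<>(itr.nextToken(), 1));
            }
            return out;
        }

        // Reducer logic: sum the values for each key.
        static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
            Map<String, Integer> counts = new TreeMap<>();
            for (Map.Entry<String, Integer> p : pairs) {
                counts.merge(p.getKey(), p.getValue(), Integer::sum);
            }
            return counts;
        }

        public static void main(String[] args) {
            Map<String, Integer> r = reduce(map("Hello World Bye World"));
            System.out.println(r); // {Bye=1, Hello=1, World=2}
        }
    }
    ```

    In the real job, the framework performs the shuffle between the two phases, grouping all (word, 1) pairs by key before they reach the reducer.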


    Prepare a test file and start Hadoop.

    Because the input paths given to a Hadoop job must be in HDFS, the test file has to be copied from the local file system into HDFS first:

    $ hadoop fs -mkdir input
    $ hadoop fs -ls
    Found 1 items
    drwxr-xr-x   - hadoop supergroup          0 2014-03-26 10:39 /user/hadoop/input
    $ hadoop fs -put file input
    $ hadoop fs -ls input
    Found 1 items
    -rw-r--r--   2 hadoop supergroup         75 2014-03-26 10:40 /user/hadoop/input/file

    Run the program:

    $ cd class
    $ ls
    org  WordCount.jar
    $ hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
    14/03/26 10:57:39 INFO input.FileInputFormat: Total input paths to process : 1
    14/03/26 10:57:40 INFO mapred.JobClient: Running job: job_201403261015_0001
    14/03/26 10:57:41 INFO mapred.JobClient:  map 0% reduce 0%
    14/03/26 10:57:54 INFO mapred.JobClient:  map 100% reduce 0%
    14/03/26 10:58:06 INFO mapred.JobClient:  map 100% reduce 100%
    14/03/26 10:58:11 INFO mapred.JobClient: Job complete: job_201403261015_0001
    14/03/26 10:58:11 INFO mapred.JobClient: Counters: 25
    14/03/26 10:58:11 INFO mapred.JobClient:   Job Counters 
    14/03/26 10:58:11 INFO mapred.JobClient:     Launched reduce tasks=1
    14/03/26 10:58:11 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=12321
    14/03/26 10:58:11 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    14/03/26 10:58:11 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    14/03/26 10:58:11 INFO mapred.JobClient:     Launched map tasks=1
    14/03/26 10:58:11 INFO mapred.JobClient:     Data-local map tasks=1
    14/03/26 10:58:11 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10303
    14/03/26 10:58:11 INFO mapred.JobClient:   File Output Format Counters 
    14/03/26 10:58:11 INFO mapred.JobClient:     Bytes Written=51
    14/03/26 10:58:11 INFO mapred.JobClient:   FileSystemCounters
    14/03/26 10:58:11 INFO mapred.JobClient:     FILE_BYTES_READ=85
    14/03/26 10:58:11 INFO mapred.JobClient:     HDFS_BYTES_READ=184
    14/03/26 10:58:11 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=42541
    14/03/26 10:58:11 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=51
    14/03/26 10:58:11 INFO mapred.JobClient:   File Input Format Counters 
    14/03/26 10:58:11 INFO mapred.JobClient:     Bytes Read=75
    14/03/26 10:58:11 INFO mapred.JobClient:   Map-Reduce Framework
    14/03/26 10:58:11 INFO mapred.JobClient:     Reduce input groups=7
    14/03/26 10:58:11 INFO mapred.JobClient:     Map output materialized bytes=85
    14/03/26 10:58:11 INFO mapred.JobClient:     Combine output records=7
    14/03/26 10:58:11 INFO mapred.JobClient:     Map input records=1
    14/03/26 10:58:11 INFO mapred.JobClient:     Reduce shuffle bytes=0
    14/03/26 10:58:11 INFO mapred.JobClient:     Reduce output records=7
    14/03/26 10:58:11 INFO mapred.JobClient:     Spilled Records=14
    14/03/26 10:58:11 INFO mapred.JobClient:     Map output bytes=131
    14/03/26 10:58:11 INFO mapred.JobClient:     Combine input records=14
    14/03/26 10:58:11 INFO mapred.JobClient:     Map output records=14
    14/03/26 10:58:11 INFO mapred.JobClient:     SPLIT_RAW_BYTES=109
    14/03/26 10:58:11 INFO mapred.JobClient:     Reduce input records=7
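    The framework counters above are internally consistent: Map output records=14 is the total number of tokens in the input, and because there is a single map task, the combiner collapses duplicates down to one record per distinct word, so Combine output records, Reduce input records, and Reduce output records are all 7. The actual contents of the test file are not shown in this walkthrough, so the check below uses a hypothetical input line that is merely consistent with those counters:

    ```java
    import java.util.*;

    public class CounterCheck {
        public static void main(String[] args) {
            // Hypothetical input consistent with the counters and final word
            // counts (the real test file's contents are not shown).
            String line = "Hello World Hello World Hello World Bye Bye Bye hello hello bye world Word";
            List<String> tokens = Arrays.asList(line.split("\\s+"));
            Set<String> distinct = new TreeSet<>(tokens);

            // Map output records: one (word, 1) pair per token.
            System.out.println("Map output records=" + tokens.size());       // 14
            // One map task, so the combiner collapses duplicates to one
            // record per distinct word; reduce input/output match it.
            System.out.println("Combine output records=" + distinct.size()); // 7
            System.out.println("Reduce output records=" + distinct.size());  // 7
        }
    }
    ```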

    Check the results:

    $ hadoop fs -ls
    Found 2 items
    drwxr-xr-x   - hadoop supergroup          0 2014-03-26 10:40 /user/hadoop/input
    drwxr-xr-x   - hadoop supergroup          0 2014-03-26 10:58 /user/hadoop/output

    An output directory has appeared in HDFS; list its contents:

    $ hadoop fs -ls output
    Found 3 items
    -rw-r--r--   2 hadoop supergroup          0 2014-03-26 11:04 /user/hadoop/output/_SUCCESS
    drwxr-xr-x   - hadoop supergroup          0 2014-03-26 11:04 /user/hadoop/output/_logs
    -rw-r--r--   2 hadoop supergroup         65 2014-03-26 11:04 /user/hadoop/output/part-r-00000

    View the word counts:

    $ hadoop fs -cat output/part-r-00000
    Bye	3
    Hello	3
    Word	1
    World	3
    bye	1
    hello	2
    world	1
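    Note that the counts are case-sensitive: Bye/bye and Hello/hello appear as separate keys, because the mapper tokenizes on whitespace without normalizing case. If merged counts were wanted, each token could be lowercased before being emitted (in the real mapper, something like word.set(itr.nextToken().toLowerCase()) — a hypothetical one-line change, not part of the shipped example). A plain-Java sketch of that case-insensitive counting:

    ```java
    import java.util.*;

    public class LowercaseCount {
        // Count words case-insensitively by lowercasing each token before
        // it becomes a key, mirroring a lowercasing mapper.
        static Map<String, Integer> count(String text) {
            Map<String, Integer> counts = new TreeMap<>();
            StringTokenizer itr = new StringTokenizer(text);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken().toLowerCase(), 1, Integer::sum);
            }
            return counts;
        }

        public static void main(String[] args) {
            System.out.println(count("Bye bye Hello hello")); // {bye=2, hello=2}
        }
    }
    ```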

    This completes the WordCount example run on Hadoop.

    To run the job again, the output directory must be deleted first; otherwise the job fails with the following exception:

    14/03/26 11:41:30 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201403261015_0003
    Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory output already exists
    	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:134)
    	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:830)
    	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at javax.security.auth.Subject.doAs(Subject.java:415)
    	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
    	at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
    	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:494)
    	at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:601)
    	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

    Delete the output directory like this:

    $ hadoop fs -rmr output
    Deleted hdfs://localhost:9000/user/hadoop/output


    Alternatively, run the precompiled examples jar that ships with Hadoop. (The log below reports two input paths and two map tasks, so a second file had evidently been added to input since the first run.)

    $ hadoop jar /usr/local/hadoop/hadoop-examples-0.20.203.0.jar wordcount input output
    14/03/28 17:02:33 INFO input.FileInputFormat: Total input paths to process : 2
    14/03/28 17:02:33 INFO mapred.JobClient: Running job: job_201403281439_0004
    14/03/28 17:02:34 INFO mapred.JobClient:  map 0% reduce 0%
    14/03/28 17:02:49 INFO mapred.JobClient:  map 100% reduce 0%
    14/03/28 17:03:01 INFO mapred.JobClient:  map 100% reduce 100%
    14/03/28 17:03:06 INFO mapred.JobClient: Job complete: job_201403281439_0004
    14/03/28 17:03:06 INFO mapred.JobClient: Counters: 25
    14/03/28 17:03:06 INFO mapred.JobClient:   Job Counters 
    14/03/28 17:03:06 INFO mapred.JobClient:     Launched reduce tasks=1
    14/03/28 17:03:06 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=17219
    14/03/28 17:03:06 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    14/03/28 17:03:06 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    14/03/28 17:03:06 INFO mapred.JobClient:     Launched map tasks=2
    14/03/28 17:03:06 INFO mapred.JobClient:     Data-local map tasks=2
    14/03/28 17:03:06 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10398
    14/03/28 17:03:06 INFO mapred.JobClient:   File Output Format Counters 
    14/03/28 17:03:06 INFO mapred.JobClient:     Bytes Written=65
    14/03/28 17:03:06 INFO mapred.JobClient:   FileSystemCounters
    14/03/28 17:03:06 INFO mapred.JobClient:     FILE_BYTES_READ=131
    14/03/28 17:03:06 INFO mapred.JobClient:     HDFS_BYTES_READ=343
    14/03/28 17:03:06 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=63840
    14/03/28 17:03:06 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=65
    14/03/28 17:03:06 INFO mapred.JobClient:   File Input Format Counters 
    14/03/28 17:03:06 INFO mapred.JobClient:     Bytes Read=124
    14/03/28 17:03:06 INFO mapred.JobClient:   Map-Reduce Framework
    14/03/28 17:03:06 INFO mapred.JobClient:     Reduce input groups=9
    14/03/28 17:03:06 INFO mapred.JobClient:     Map output materialized bytes=137
    14/03/28 17:03:06 INFO mapred.JobClient:     Combine output records=11
    14/03/28 17:03:06 INFO mapred.JobClient:     Map input records=2
    14/03/28 17:03:06 INFO mapred.JobClient:     Reduce shuffle bytes=85
    14/03/28 17:03:06 INFO mapred.JobClient:     Reduce output records=9
    14/03/28 17:03:06 INFO mapred.JobClient:     Spilled Records=22
    14/03/28 17:03:06 INFO mapred.JobClient:     Map output bytes=216
    14/03/28 17:03:06 INFO mapred.JobClient:     Combine input records=23
    14/03/28 17:03:06 INFO mapred.JobClient:     Map output records=23
    14/03/28 17:03:06 INFO mapred.JobClient:     SPLIT_RAW_BYTES=219
    14/03/28 17:03:06 INFO mapred.JobClient:     Reduce input records=11




  • Original post: https://www.cnblogs.com/liushaobo/p/4373737.html