  • Running the Hadoop WordCount Example Program (Running Hadoop Example)

     

     
    In the last post we installed Hadoop 2.2.0 on Ubuntu. Now we'll see how to launch an example MapReduce job on Hadoop.

    In the Hadoop directory (/opt/hadoop-2.2.0 in the commands below) you can find a JAR containing some examples: the exact path is $HADOOP_COMMON_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar.
    This JAR contains several example MapReduce programs. We'll launch WordCount, which is the equivalent of "Hello, world" for MapReduce: it simply counts the occurrences of every word in the file(s) given as input.
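    As a rough mental model, the logic of the job can be sketched locally in plain Python. This is only an illustration, not the code shipped in the examples JAR (which is written in Java): the function names map_phase, reduce_phase and word_count are hypothetical, and words are assumed to be separated by whitespace only, with no stripping of punctuation.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_phase(pairs):
    """Group the pairs by word and sum the counts of each group."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    # Sort by word so the result is easy to read.
    return dict(sorted(totals.items()))


def word_count(lines):
    """Run the two phases one after the other on an iterable of lines."""
    return reduce_phase(map_phase(lines))


if __name__ == "__main__":
    sample = ["the square and the line", "the line."]
    for word, count in word_count(sample).items():
        print(f"{word}\t{count}")
```

    On a real cluster the two phases run as separate tasks, possibly on different machines, and the framework takes care of grouping the pairs by word between them; the sketch collapses all of that into a single process.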
    Before running the example, a little preparation is needed. We assume the HDFS service is running; if we haven't created a home directory for our user yet, we have to do it now (assuming the Hadoop user we're using is mapred):
    $ hadoop fs -mkdir -p /user/mapred
    
    When we pass "fs" as the first argument to the hadoop command, we tell Hadoop to operate on the HDFS filesystem; in this case, we used the "-mkdir" switch to create a new directory on HDFS, with "-p" creating any missing parent directories.
    Now that our user has a home directory, we can create a directory to hold the input file for the MapReduce program:
    $ hadoop fs -mkdir inputdir
    
    We can check the result by issuing an "ls" command on HDFS:
    $ hadoop fs -ls 
    Found 1 items
    drwxr-xr-x   - mapred mrusers        0 2014-02-11 22:54 inputdir
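    Notice that inputdir was given without a leading slash, and that "-ls" was called with no path at all. HDFS resolves relative paths against the user's home directory, which is why /user/mapred had to exist first. A minimal sketch of that rule, assuming the default /user/<username> home layout; resolve_hdfs_path is a hypothetical helper written for illustration, not part of any Hadoop API:

```python
import posixpath


def resolve_hdfs_path(path, user, home_root="/user"):
    """Return the absolute HDFS path that a (possibly relative) path refers to."""
    if path.startswith("/"):
        # Absolute paths are used as they are.
        return posixpath.normpath(path)
    # Relative (or empty) paths are taken relative to the user's home directory.
    return posixpath.normpath(posixpath.join(home_root, user, path))


if __name__ == "__main__":
    print(resolve_hdfs_path("inputdir", "mapred"))
    print(resolve_hdfs_path("", "mapred"))
    print(resolve_hdfs_path("/tmp/data", "mapred"))
```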
    
    Now we can decide which file to count the words of; in this example, I'll use the text of the novella Flatland by Edwin A. Abbott, which is freely available for download from Project Gutenberg:
    $ wget http://www.gutenberg.org/cache/epub/201/pg201.txt
    
    Now we can copy this file to HDFS, more precisely into the inputdir directory we created a moment ago:
    $ hadoop fs -put pg201.txt inputdir
    
    The "-put" switch tells Hadoop to copy the file from the machine's local filesystem to HDFS. We can check that the file is really there:
    $ hadoop fs -ls inputdir
    Found 1 items
    -rw-r--r--   1 mapred mrusers     227368 2014-02-11 22:59 inputdir/pg201.txt
    

    Now we're ready to execute the MapReduce program. As mentioned above, the Hadoop tarball ships with a JAR containing the WordCount example; we launch it by passing these parameters to the hadoop command:
    • jar: we're telling Hadoop we want to execute a mapreduce program contained in a JAR
    • /opt/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar: this is the absolute path and filename of the JAR
    • wordcount: tells Hadoop which of the many examples contained in the JAR to run
    • inputdir: the directory on HDFS in which Hadoop can find the input file(s)
    • outputdir: the directory on HDFS to which Hadoop must write the result of the program (it must not already exist, or the job will fail)
    $ hadoop jar /opt/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount inputdir outputdir
    
    and the output is:
    14/02/11 23:16:19 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    14/02/11 23:16:20 INFO input.FileInputFormat: Total input paths to process : 1
    14/02/11 23:16:20 INFO mapreduce.JobSubmitter: number of splits:1
    14/02/11 23:16:21 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
    14/02/11 23:16:21 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    14/02/11 23:16:21 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
    14/02/11 23:16:21 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
    14/02/11 23:16:21 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
    14/02/11 23:16:21 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
    14/02/11 23:16:21 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
    14/02/11 23:16:21 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
    14/02/11 23:16:21 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
    14/02/11 23:16:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    14/02/11 23:16:21 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
    14/02/11 23:16:21 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
    14/02/11 23:16:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1392155226604_0001
    14/02/11 23:16:22 INFO impl.YarnClientImpl: Submitted application application_1392155226604_0001 to ResourceManager at /0.0.0.0:8032
    14/02/11 23:16:23 INFO mapreduce.Job: The url to track the job: http://hadoop-VirtualBox:8088/proxy/application_1392155226604_0001/
    14/02/11 23:16:23 INFO mapreduce.Job: Running job: job_1392155226604_0001
    14/02/11 23:16:38 INFO mapreduce.Job: Job job_1392155226604_0001 running in uber mode : false
    14/02/11 23:16:38 INFO mapreduce.Job:  map 0% reduce 0%
    14/02/11 23:16:47 INFO mapreduce.Job:  map 100% reduce 0%
    14/02/11 23:16:57 INFO mapreduce.Job:  map 100% reduce 100%
    14/02/11 23:16:58 INFO mapreduce.Job: Job job_1392155226604_0001 completed successfully
    14/02/11 23:16:58 INFO mapreduce.Job: Counters: 43
     File System Counters
      FILE: Number of bytes read=121375
      FILE: Number of bytes written=401139
      FILE: Number of read operations=0
      FILE: Number of large read operations=0
      FILE: Number of write operations=0
      HDFS: Number of bytes read=227485
      HDFS: Number of bytes written=88461
      HDFS: Number of read operations=6
      HDFS: Number of large read operations=0
      HDFS: Number of write operations=2
     Job Counters 
      Launched map tasks=1
      Launched reduce tasks=1
      Data-local map tasks=1
      Total time spent by all maps in occupied slots (ms)=7693
      Total time spent by all reduces in occupied slots (ms)=7383
     Map-Reduce Framework
      Map input records=4239
      Map output records=37680
      Map output bytes=366902
      Map output materialized bytes=121375
      Input split bytes=117
      Combine input records=37680
      Combine output records=8341
      Reduce input groups=8341
      Reduce shuffle bytes=121375
      Reduce input records=8341
      Reduce output records=8341
      Spilled Records=16682
      Shuffled Maps =1
      Failed Shuffles=0
      Merged Map outputs=1
      GC time elapsed (ms)=150
      CPU time spent (ms)=5490
      Physical memory (bytes) snapshot=399077376
      Virtual memory (bytes) snapshot=1674149888
      Total committed heap usage (bytes)=314048512
     Shuffle Errors
      BAD_ID=0
      CONNECTION=0
      IO_ERROR=0
      WRONG_LENGTH=0
      WRONG_MAP=0
      WRONG_REDUCE=0
     File Input Format Counters 
      Bytes Read=227368
     File Output Format Counters 
      Bytes Written=88461
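    Some of these counters can be cross-checked against each other. The sketch below copies a few values from the log above and verifies relations that hold for this particular run. The dictionary keys and the check_counters function are hypothetical names, and the explanations in the comments are an interpretation of the numbers, not statements taken from the Hadoop documentation.

```python
# Values copied from the job log above.
counters = {
    "hdfs_bytes_read": 227485,
    "input_file_bytes_read": 227368,  # File Input Format Counters: Bytes Read
    "input_split_bytes": 117,
    "map_output_records": 37680,
    "combine_input_records": 37680,
    "combine_output_records": 8341,
    "reduce_input_records": 8341,
    "reduce_output_records": 8341,
    "spilled_records": 16682,
}


def check_counters(c):
    """Return a dict of named consistency checks (True means the relation holds)."""
    return {
        # Interpretation: HDFS reads = input file size + input split metadata.
        "hdfs_read": c["hdfs_bytes_read"]
        == c["input_file_bytes_read"] + c["input_split_bytes"],
        # Every record emitted by the mapper was fed to the combiner.
        "combiner_in": c["combine_input_records"] == c["map_output_records"],
        # The combiner output is what reaches the reducer: one record per
        # distinct word, since there is a single map task.
        "combiner_out": c["combine_output_records"]
        == c["reduce_input_records"]
        == c["reduce_output_records"],
        # Interpretation: each combined record was spilled to disk once on
        # the map side and once on the reduce side.
        "spilled": c["spilled_records"] == 2 * c["combine_output_records"],
    }


if __name__ == "__main__":
    for name, ok in check_counters(counters).items():
        print(name, "ok" if ok else "MISMATCH")
    ratio = counters["map_output_records"] / counters["combine_output_records"]
    print(f"the combiner shrank the map output by a factor of {ratio:.1f}")
```

    In other words, the text contains 37680 whitespace-separated tokens but only 8341 distinct ones, and the combiner already collapses the former into the latter before anything is sent to the reducer.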
    
    The last part of the output is a summary of the execution of the MapReduce program; just before it, we can spot the "Job job_1392155226604_0001 completed successfully" line, which tells us the job ran without errors. As requested, Hadoop wrote the output to outputdir on HDFS; let's see what's inside this directory:
    $ hadoop fs -ls outputdir
    Found 2 items
    -rw-r--r--   1 mapred mrusers          0 2014-02-11 23:16 outputdir/_SUCCESS
    -rw-r--r--   1 mapred mrusers      88461 2014-02-11 23:16 outputdir/part-r-00000
    
    The presence of the _SUCCESS file confirms the successful execution of the job; Hadoop wrote the result to the part-r-00000 file. We can copy that file to the local filesystem of our machine using the "-get" switch:
    $ hadoop fs -get outputdir/part-r-00000 .
    
    Now we can see the content of the file (this is a small subset of the whole file):
    ...
    leading 2
    leagues 1
    leaning 1
    leap    1
    leaped  1
    learn   7
    learned 1
    least   23
    least.  1
    leave   3
    leaves  3
    leaving 2
    lecture 1
    led     4
    left    9
    ...
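    Each line of part-r-00000 holds a word and its count separated by a tab character, which is the default layout of Hadoop's text output. Assuming that layout, the downloaded file can be post-processed locally, for example to list the most frequent words. parse_counts and top_words are hypothetical helper names, and the sample string stands in for the real file:

```python
def parse_counts(text):
    """Parse 'word<TAB>count' lines into a list of (word, count) tuples."""
    pairs = []
    for line in text.split("\n"):
        if not line.strip():
            continue  # skip blank lines, e.g. after the final newline
        # Split on the last tab only: the count is always the final field.
        word, count = line.rsplit("\t", 1)
        pairs.append((word, int(count)))
    return pairs


def top_words(pairs, n=3):
    """Return the n most frequent words, highest count first."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


if __name__ == "__main__":
    # To use the real file instead of the sample:
    #   text = open("part-r-00000", encoding="utf-8").read()
    sample = "learn\t7\nlearned\t1\nleast\t23\nleast.\t1\nleft\t9\n"
    for word, count in top_words(parse_counts(sample)):
        print(word, count)
```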
    
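    The wordcount program simply counts the occurrences of every word and outputs the totals. Words are not normalized in any way, which is why "least" and "least." appear as separate entries above.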
    Well, we've successfully run our first mapreduce job on our Hadoop installation!
     
    from: http://andreaiacono.blogspot.com/2014/02/running-hadoop-example.html
  • Original article: https://www.cnblogs.com/GarfieldEr007/p/5281254.html