  • MapReduce_wordcount

    Test data:

    [hadoop@h201 mapreduce]$ more counttext.txt
    hello mama
    hello baba
    hello word
    cai wen wei
    mama baba jiejie gege
    gege jiejie didi
    meimei jiejie
    didi mama
    ayi shushu
    ayi mama
    hello mama
    hello baba
    hello word
    cai wen wei
    mama baba jiejie gege
    gege jiejie didi
    meimei jiejie
    didi mama
    ayi shushu
    ayi mama
    hello mama
    hello baba
    hello word
    cai wen wei
    mama baba jiejie gege
    gege jiejie didi
    meimei jiejie
    didi mama
    ayi shushu
    ayi mama
    hello mama
    hello baba
    hello word
    cai wen wei
    mama baba jiejie gege
    gege jiejie didi
    meimei jiejie
    didi mama
    ayi shushu
    ayi mama
    hello mama
    hello baba
    hello word
    cai wen wei
    mama baba jiejie gege
    gege jiejie didi
    meimei jiejie
    didi mama
    ayi shushu
    ayi mama
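    The file is the same ten lines repeated five times. That gives 50 lines, which matches "Map input records=50" in the job log further down. As a sanity check, you can compute the expected result locally in plain Java without Hadoop. This is a minimal sketch and is not part of the original post. The class name LocalWordCount is hypothetical. The sketch hard-codes the ten-line block instead of reading counttext.txt. It needs Java 8 or later for Map.merge, unlike the JDK 1.7 used below.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts words locally (no Hadoop) to predict the job output.
class LocalWordCount {
    // The ten distinct lines of counttext.txt; the file repeats this block five times.
    static final String[] BLOCK = {
        "hello mama", "hello baba", "hello word", "cai wen wei",
        "mama baba jiejie gege", "gege jiejie didi", "meimei jiejie",
        "didi mama", "ayi shushu", "ayi mama"
    };

    // Splits each line on single spaces, as WordCount2Mapper does, and adds 1 per token.
    static Map<String, Integer> count(String[] lines, int repeats) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like the reducer output
        for (int r = 0; r < repeats; r++) {
            for (String line : lines) {
                for (String word : line.split(" ")) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints "ayi 10", "baba 10", ..., "word 5", one tab-separated pair per line.
        for (Map.Entry<String, Integer> e : count(BLOCK, 5).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```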

    vim WordCount2.java

    // No package declaration: the class is compiled in place and launched as plain
    // "WordCount2" by the hadoop jar command below.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount2 {
        // Hard-coded HDFS paths; only used by the commented-out lines in main().
        private static final String INPUT_PATH = "hdfs://h201:9000/user/hadoop/counttext.txt";
        private static final String OUTPUT_PATH = "hdfs://h201:9000/user/hadoop/output";

        // Mapper: emits (word, 1) for every space-separated token of a line.
        public static class WordCount2Mapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
                String[] words = value.toString().split(" ");
                for (String str : words) {
                    word.set(str);
                    context.write(word, one);
                }
            }
        }

        // Reducer: adds up the counts received for each word.
        public static class WordCount2Reducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
                int total = 0;
                for (IntWritable val : values) {
                    // Sum the values rather than counting them (total++), so the result
                    // stays correct if a combiner ever pre-aggregates the 1s.
                    total += val.get();
                }
                context.write(key, new IntWritable(total));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Deprecated key; the job log suggests mapreduce.job.jar instead.
            conf.set("mapred.jar", "wc1.jar");
            // Deprecated constructor (hence the javac note); Job.getInstance(conf, "wordcount") is the current form.
            Job job = new Job(conf, "wordcount");
            job.setJarByClass(WordCount2.class);
            job.setMapperClass(WordCount2Mapper.class);
            job.setReducerClass(WordCount2Reducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Input and output paths come from the command line (args[0], args[1]).
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Alternative: use the hard-coded paths (addInputPaths accepts several comma-separated paths).
            //FileInputFormat.addInputPath(job, new Path(INPUT_PATH));
            //FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
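    A reducer can either count the values it receives (total++) or sum them (total += val.get()). Both approaches give the same word counts only while every value is 1. That holds in this job because no combiner is configured. The difference appears once partial sums arrive, for example if job.setCombinerClass(WordCount2Reducer.class) were added. The plain-Java sketch below contrasts the two strategies on lists that stand in for the Iterable<IntWritable> a reducer receives. It makes some assumptions: the class ReduceSemantics is hypothetical, no Hadoop classes are used, and the partial sums 8 and 12 are made-up illustrative values.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of two reduce strategies; no Hadoop dependency.
class ReduceSemantics {
    // Counts how many values arrived, ignoring what they are.
    static int countValues(List<Integer> values) {
        int total = 0;
        for (int ignored : values) {
            total++;
        }
        return total;
    }

    // Adds the values themselves, which also works for pre-aggregated input.
    static int sumValues(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // No combiner: "mama" reaches the reducer as twenty 1s.
        List<Integer> raw = Collections.nCopies(20, 1);
        // Hypothetical combiner output: the same 20 occurrences as two partial sums.
        List<Integer> combined = Arrays.asList(8, 12);
        System.out.println(countValues(raw) + " " + sumValues(raw));           // 20 20
        System.out.println(countValues(combined) + " " + sumValues(combined)); // 2 20
    }
}
```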

    [hadoop@h201 mapreduce]$ /usr/jdk1.7.0_25/bin/javac WordCount2.java
    Note: WordCount2.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    [hadoop@h201 mapreduce]$ ls
    counttext.txt  WordCount2.class  WordCount2.java  WordCount2$WordCount2Mapper.class  WordCount2$WordCount2Reducer.class
    [hadoop@h201 mapreduce]$ /usr/jdk1.7.0_25/bin/jar cvf wc1.jar WordCount2*class
    added manifest
    adding: WordCount2.class(in = 1531) (out= 815)(deflated 46%)
    adding: WordCount2$WordCount2Mapper.class(in = 1831) (out= 783)(deflated 57%)
    adding: WordCount2$WordCount2Reducer.class(in = 1623) (out= 670)(deflated 58%)
    [hadoop@h201 mapreduce]$ ls
    counttext.txt  wc1.jar  WordCount2.class  WordCount2.java  WordCount2$WordCount2Mapper.class  WordCount2$WordCount2Reducer.class
    [hadoop@h201 mapreduce]$ hadoop jar wc1.jar WordCount2 hdfs://h201:9000/user/hadoop/counttext.txt hdfs://h201:9000/user/hadoop/output
    18/03/09 23:33:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    18/03/09 23:33:39 INFO client.RMProxy: Connecting to ResourceManager at h201/192.168.121.132:8032
    18/03/09 23:33:55 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    18/03/09 23:34:05 INFO input.FileInputFormat: Total input paths to process : 1
    18/03/09 23:34:06 INFO mapreduce.JobSubmitter: number of splits:1
    18/03/09 23:34:06 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    18/03/09 23:34:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1516635595760_0001
    18/03/09 23:34:21 INFO impl.YarnClientImpl: Submitted application application_1516635595760_0001
    18/03/09 23:34:21 INFO mapreduce.Job: The url to track the job: http://h201:8088/proxy/application_1516635595760_0001/
    18/03/09 23:34:21 INFO mapreduce.Job: Running job: job_1516635595760_0001
    18/03/09 23:35:32 INFO mapreduce.Job: Job job_1516635595760_0001 running in uber mode : false
    18/03/09 23:35:32 INFO mapreduce.Job:  map 0% reduce 0%
    18/03/09 23:36:33 INFO mapreduce.Job:  map 100% reduce 0%
    18/03/09 23:36:45 INFO mapreduce.Job:  map 100% reduce 100%
    18/03/09 23:36:47 INFO mapreduce.Job: Job job_1516635595760_0001 completed successfully
    18/03/09 23:36:47 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=1366
                    FILE: Number of bytes written=221143
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=747
                    HDFS: Number of bytes written=101
                    HDFS: Number of read operations=6
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters
                    Launched map tasks=1
                    Launched reduce tasks=1
                    Data-local map tasks=1
                    Total time spent by all maps in occupied slots (ms)=55286
                    Total time spent by all reduces in occupied slots (ms)=8704
                    Total time spent by all map tasks (ms)=55286
                    Total time spent by all reduce tasks (ms)=8704
                    Total vcore-seconds taken by all map tasks=55286
                    Total vcore-seconds taken by all reduce tasks=8704
                    Total megabyte-seconds taken by all map tasks=56612864
                    Total megabyte-seconds taken by all reduce tasks=8912896
            Map-Reduce Framework
                    Map input records=50
                    Map output records=120
                    Map output bytes=1120
                    Map output materialized bytes=1366
                    Input split bytes=107
                    Combine input records=0
                    Combine output records=0
                    Reduce input groups=13
                    Reduce shuffle bytes=1366
                    Reduce input records=120
                    Reduce output records=13
                    Spilled Records=240
                    Shuffled Maps =1
                    Failed Shuffles=0
                    Merged Map outputs=1
                    GC time elapsed (ms)=1264
                    CPU time spent (ms)=4210
                    Physical memory (bytes) snapshot=223772672
                    Virtual memory (bytes) snapshot=2148155392
                    Total committed heap usage (bytes)=136712192
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters
                    Bytes Read=640
            File Output Format Counters
                    Bytes Written=101
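    You can reproduce several of these counters by hand from the test data, which helps show what they measure. The sketch below recomputes Map input records, Bytes Read, Map output records, Map output bytes and Reduce input groups. The class CounterCheck is hypothetical. The sketch rests on assumptions that the log does not state:
    - the file uses '\n' line endings and ends with a newline;
    - the words are ASCII, so String.length() equals the UTF-8 byte count;
    - each map output pair is counted at its serialized size, with a Text key written as a 1-byte vint length plus its bytes and an IntWritable value written as 4 bytes.
    Under those assumptions the results are 50, 640, 120, 1120 and 13, the same as the counters above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper that re-derives job counters from the test data.
// Assumes '\n' line endings with a trailing newline, ASCII words, and
// serialized sizes of 1 (vint length) + bytes for Text and 4 for IntWritable.
class CounterCheck {
    static final String[] BLOCK = {
        "hello mama", "hello baba", "hello word", "cai wen wei",
        "mama baba jiejie gege", "gege jiejie didi", "meimei jiejie",
        "didi mama", "ayi shushu", "ayi mama"
    };
    static final int REPEATS = 5; // counttext.txt is BLOCK written out five times

    // Map input records: one record per line.
    static int mapInputRecords() {
        return BLOCK.length * REPEATS;
    }

    // File Input Format Counters, Bytes Read: each line plus its newline.
    static long inputBytes() {
        long bytes = 0;
        for (String line : BLOCK) bytes += line.length() + 1;
        return bytes * REPEATS;
    }

    // Map output records: one (word, 1) pair per token.
    static long mapOutputRecords() {
        long records = 0;
        for (String line : BLOCK) records += line.split(" ").length;
        return records * REPEATS;
    }

    // Map output bytes: serialized key (1 + word bytes) plus value (4) per pair.
    static long mapOutputBytes() {
        long bytes = 0;
        for (String line : BLOCK) {
            for (String w : line.split(" ")) bytes += 1 + w.length() + 4;
        }
        return bytes * REPEATS;
    }

    // Reduce input groups: number of distinct keys.
    static int reduceInputGroups() {
        Set<String> distinct = new HashSet<>();
        for (String line : BLOCK) {
            for (String w : line.split(" ")) distinct.add(w);
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        System.out.println("Map input records   = " + mapInputRecords());   // 50
        System.out.println("Bytes Read          = " + inputBytes());        // 640
        System.out.println("Map output records  = " + mapOutputRecords()); // 120
        System.out.println("Map output bytes    = " + mapOutputBytes());    // 1120
        System.out.println("Reduce input groups = " + reduceInputGroups()); // 13
    }
}
```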
    [hadoop@h201 mapreduce]$ hadoop fs -lsr /user/hadoop/output
    lsr: DEPRECATED: Please use 'ls -R' instead.
    18/03/09 23:37:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    -rw-r--r--   2 hadoop supergroup          0 2018-03-09 23:36 /user/hadoop/output/_SUCCESS
    -rw-r--r--   2 hadoop supergroup        101 2018-03-09 23:36 /user/hadoop/output/part-r-00000
    [hadoop@h201 mapreduce]$ hadoop fs -cat /user/hadoop/output/part-r-00000
    18/03/09 23:39:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    ayi     10
    baba    10
    cai     5
    didi    10
    gege    10
    hello   15
    jiejie  15
    mama    20
    meimei  5
    shushu  5
    wei     5
    wen     5
    word    5
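    The result file is plain text with one word and its count per line. A small parser lets you cross-check it against the counters. There are 13 lines, which matches "Reduce output records=13". The counts add up to 120, which matches "Map output records=120", because every token contributes exactly one 1. The sketch below hard-codes the lines shown above, and the class name PartFileCheck is hypothetical. It assumes a tab between word and count, which is TextOutputFormat's default separator; the terminal copy renders that tab as spaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical checker for the part-r-00000 lines printed by hadoop fs -cat.
class PartFileCheck {
    // Output lines as shown above; the tab separator is an assumption.
    static final String[] PART_R_00000 = {
        "ayi\t10", "baba\t10", "cai\t5", "didi\t10", "gege\t10", "hello\t15",
        "jiejie\t15", "mama\t20", "meimei\t5", "shushu\t5", "wei\t5", "wen\t5", "word\t5"
    };

    // Splits each "key<TAB>value" line and keeps the file order.
    static Map<String, Integer> parse(String[] lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] kv = line.split("\t", 2);
            counts.put(kv[0], Integer.parseInt(kv[1].trim()));
        }
        return counts;
    }

    // Sum of all counts, i.e. the total number of tokens in the input.
    static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int c : counts.values()) sum += c;
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = parse(PART_R_00000);
        System.out.println(counts.size() + " words, " + total(counts) + " tokens"); // 13 words, 120 tokens
    }
}
```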

  • Original article: https://www.cnblogs.com/jieran/p/8537012.html