  • Large-Scale Database Technology, Lab 6: MapReduce Example, WordCount

    An e-commerce site records which products its users have favorited, including the product id and the date each item was favorited. The dataset is named buyer_favorite1.

    buyer_favorite1 contains three fields: buyer id, product id, and favorite date, separated by whitespace. Sample data and format:

    buyer id   product id   favorite date  

    10181   1000481   2010-04-04 16:54:31  

    20001   1001597   2010-04-07 15:07:52  

    20001   1001560   2010-04-07 15:08:27  

    20042   1001368   2010-04-08 08:20:30  

    20067   1002061   2010-04-08 16:45:33  

    20056   1003289   2010-04-12 10:50:55  

    20056   1003290   2010-04-12 11:57:35  

    20056   1003292   2010-04-12 12:05:29  

    20054   1002420   2010-04-14 15:24:12  

    20055   1001679   2010-04-14 19:46:04  

    20054   1010675   2010-04-14 15:23:53  

    20054   1002429   2010-04-14 17:52:45  

    20076   1002427   2010-04-14 19:35:39  

    20054   1003326   2010-04-20 12:54:44  

    20056   1002420   2010-04-15 11:24:49  

    20064   1002422   2010-04-15 11:35:54  

    20056   1003066   2010-04-15 11:43:01  

    20056   1003055   2010-04-15 11:43:06  

    20056   1010183   2010-04-15 11:45:24  

    20056   1002422   2010-04-15 11:45:49  

    20056   1003100   2010-04-15 11:45:54  

    20056   1003094   2010-04-15 11:45:57  

    20056   1003064   2010-04-15 11:46:04  

    20056   1010178   2010-04-15 16:15:20  

    20076   1003101   2010-04-15 16:37:27  

    20076   1003103   2010-04-15 16:37:05  

    20076   1003100   2010-04-15 16:37:18  

    20076   1003066   2010-04-15 16:37:31  

    20054   1003103   2010-04-15 16:40:14  

    20054   1003100   2010-04-15 16:40:16  

    Write a MapReduce program that counts how many products each buyer has favorited.

    The expected output is:

    buyer id   product count  
    10181   1  
    20001   2  
    20042   1  
    20054   6  
    20055   1  
    20056   12  
    20064   1  
    20067   1  
    20076   5  
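
    To sanity-check these counts before running the job, a plain-Java sketch such as the one below can tally the first field of each line, with no Hadoop dependency. It assumes the sample has been saved locally as buyer_favorite1.txt (a hypothetical file name, not part of the lab setup); TreeMap keeps the output key-sorted, matching how MapReduce sorts its Text keys.

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Map;
    import java.util.TreeMap;

    public class LocalCount {
        public static void main(String[] args) throws Exception {
            Map<String, Integer> counts = new TreeMap<>();
            // Hypothetical local copy of the sample data shown above.
            for (String line : Files.readAllLines(Paths.get("buyer_favorite1.txt"))) {
                String[] fields = line.trim().split("\\s+"); // buyer id, product id, date, time
                if (!fields[0].isEmpty()) {
                    counts.merge(fields[0], 1, Integer::sum); // one favorite per line
                }
            }
            counts.forEach((buyer, n) -> System.out.println(buyer + "\t" + n));
        }
    }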
    package mapreduce;

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "WordCount");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(doMapper.class);
            job.setReducerClass(doReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Input is the buyer_favorite1 dataset described above; the output
            // directory must not already exist when the job is submitted.
            Path in = new Path("hdfs://localhost:9000/mymapreduce1/in/buyer_favorite1");
            Path out = new Path("hdfs://localhost:9000/mymapreduce1/out");
            FileInputFormat.addInputPath(job, in);
            FileOutputFormat.setOutputPath(job, out);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }

        // Mapper: emit (buyer id, 1) for every input line.
        public static class doMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable one = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                // The no-delimiter StringTokenizer splits on any whitespace
                // (spaces or tabs), so the buyer id is the first token.
                StringTokenizer tokenizer = new StringTokenizer(value.toString());
                if (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reducer: sum the 1s for each buyer id to get the favorite count.
        public static class doReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
    }
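
    One optional tweak, not part of the lab handout: because the reducer just sums integers (an associative, commutative operation), it can double as a combiner that pre-aggregates map output locally before the shuffle, reducing network traffic. A sketch of the extra driver line, which would go in main before waitForCompletion:

    // Reuse doReducer as a combiner to shrink the data sent over the network.
    job.setCombinerClass(doReducer.class);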

    Experiment screenshots: (omitted)

  • Original post: https://www.cnblogs.com/zlc364624/p/11767108.html