今日进度 - 走看看

zoukankan html css js c++ java

今日进度
现有某电商关于商品点击情况的数据文件，表名为goods_click，包含两个字段（商品分类，商品点击次数），分隔符“\t”，由于数据很大，所以为了方便统计我们只截取它的一部分数据，内容如下：
1. 商品分类商品点击次数
2. 52127 5
3. 52120 93
4. 52092 93
5. 52132 38
6. 52006 462
7. 52109 28
8. 52109 43
9. 52132 0
10. 52132 34
11. 52132 9
12. 52132 30
13. 52132 45
14. 52132 24
15. 52009 2615
16. 52132 25
17. 52090 13
18. 52132 6
19. 52136 0
20. 52090 10
21. 52024 347
要求使用mapreduce统计出每类商品的平均点击次数。

结果数据如下：
1. 商品分类商品平均点击次数
2. 52006 462
3. 52009 2615
4. 52024 347
5. 52090 11
6. 52092 93
7. 52109 35
8. 52120 93
9. 52127 5
10. 52132 23
11. 52136 0
实验步骤

1.切换到/apps/hadoop/sbin目录下，开启Hadoop。
1. cd /apps/hadoop/sbin
2. ./start-all.sh
2.在Linux本地新建/data/mapreduce4目录。
1. mkdir -p /data/mapreduce4
3.在Linux中切换到/data/mapreduce4目录下，用wget命令从http://192.168.1.100:60000/allfiles/mapreduce4/goods_click网址上下载文本文件goods_click。
1. cd /data/mapreduce4
2. wget http://192.168.1.100:60000/allfiles/mapreduce4/goods_click
然后在当前目录下用wget命令从http://192.168.1.100:60000/allfiles/mapreduce4/hadoop2lib.tar.gz网址上下载项目用到的依赖包。
1. wget http://192.168.1.100:60000/allfiles/mapreduce4/hadoop2lib.tar.gz
将hadoop2lib.tar.gz解压到当前目录下。
1. tar zxvf hadoop2lib.tar.gz
4.首先在HDFS上新建/mymapreduce4/in目录，然后将Linux本地/data/mapreduce4目录下的goods_click文件导入到HDFS的/mymapreduce4/in目录中。
1. hadoop fs -mkdir -p /mymapreduce4/in
2. hadoop fs -put /data/mapreduce4/goods_click /mymapreduce4/in
5.新建Java Project项目，项目名为mapreduce4。

在mapreduce4项目下新建包，包名为mapreduce。

在mapreduce包下新建类，类名为MyAverage。

6.添加项目所需依赖的jar包，右键点击mapreduce4，新建一个文件夹，名为hadoop2lib，用于存放项目所需的jar包。

将/data/mapreduce4目录下，hadoop2lib目录中的jar包，拷贝到eclipse中mapreduce4项目的hadoop2lib目录下。

选中hadoop2lib目录下所有jar包，并添加到Build Path中。

7.编写Java代码并描述其设计思路。

Mapper代码
1. public static class Map extends Mapper<Object , Text , Text , IntWritable>{
2. private static Text newKey=new Text();
3. //实现map函数
4. public void map(Object key,Text value,Context context) throws IOException, InterruptedException{
5. // 将输入的纯文本文件的数据转化成String
6. String line=value.toString();
7. System.out.println(line);
8. String arr[]=line.split("\t");
9. newKey.set(arr[0]);
10. int click=Integer.parseInt(arr[1]);
11. context.write(newKey, new IntWritable(click));
12. }
13. }
map端在采用Hadoop的默认输入方式之后，将输入的value值通过split()方法截取出来，我们把截取的商品点击次数字段转化为IntWritable类型并将其设置为value，把商品分类字段设置为key,然后直接输出key/value的值。

Reducer代码
1. public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>{
2. //实现reduce函数
3. public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException, InterruptedException{
4. int num=0;
5. int count=0;
6. for(IntWritable val:values){
7. num+=val.get(); //每个元素求和num
8. count++; //统计元素的次数count
9. }
10. int avg=num/count; //计算平均数
12. context.write(key,new IntWritable(avg));
13. }
14. }
map的输出<key,value>经过shuffle过程集成<key,values>键值对，然后将<key,values>键值对交给reduce。reduce端接收到values之后，将输入的key直接复制给输出的key，将values通过for循环把里面的每个元素求和num并统计元素的次数count，然后用num除以count 得到平均值avg，将avg设置为value，最后直接输出<key,value>就可以了。

完整代码
1. package mapreduce;
2. import java.io.IOException;
3. import org.apache.hadoop.conf.Configuration;
4. import org.apache.hadoop.fs.Path;
5. import org.apache.hadoop.io.IntWritable;
6. import org.apache.hadoop.io.NullWritable;
7. import org.apache.hadoop.io.Text;
8. import org.apache.hadoop.mapreduce.Job;
9. import org.apache.hadoop.mapreduce.Mapper;
10. import org.apache.hadoop.mapreduce.Reducer;
11. import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
12. import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
13. import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
14. import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
15. public class MyAverage{
16. public static class Map extends Mapper<Object , Text , Text , IntWritable>{
17. private static Text newKey=new Text();
18. public void map(Object key,Text value,Context context) throws IOException, InterruptedException{
19. String line=value.toString();
20. System.out.println(line);
21. String arr[]=line.split("\t");
22. newKey.set(arr[0]);
23. int click=Integer.parseInt(arr[1]);
24. context.write(newKey, new IntWritable(click));
25. }
26. }
27. public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>{
28. public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException, InterruptedException{
29. int num=0;
30. int count=0;
31. for(IntWritable val:values){
32. num+=val.get();
33. count++;
34. }
35. int avg=num/count;
36. context.write(key,new IntWritable(avg));
37. }
38. }
39. public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException{
40. Configuration conf=new Configuration();
41. System.out.println("start");
42. Job job =new Job(conf,"MyAverage");
43. job.setJarByClass(MyAverage.class);
44. job.setMapperClass(Map.class);
45. job.setReducerClass(Reduce.class);
46. job.setOutputKeyClass(Text.class);
47. job.setOutputValueClass(IntWritable.class);
48. job.setInputFormatClass(TextInputFormat.class);
49. job.setOutputFormatClass(TextOutputFormat.class);
50. Path in=new Path("hdfs://localhost:9000/mymapreduce4/in/goods_click");
51. Path out=new Path("hdfs://localhost:9000/mymapreduce4/out");
52. FileInputFormat.addInputPath(job,in);
53. FileOutputFormat.setOutputPath(job,out);
54. System.exit(job.waitForCompletion(true) ? 0 : 1);
56. }
57. }
8.在MyAverage类文件中，右键并点击=>Run As=>Run on Hadoop选项，将MapReduce任务提交到Hadoop中。

9.待执行完毕后，进入命令模式下，在HDFS上/mymapreduce4/out中查看实验结果。
1. hadoop fs -ls /mymapreduce4/out
2. hadoop fs -cat /mymapreduce4/out/part-r-00000
运行结果
查看全文

相关阅读:
101. Symmetric Tree（js）
100. Same Tree（js）
99. Recover Binary Search Tree（js）
98. Validate Binary Search Tree（js）
97. Interleaving String（js）
96. Unique Binary Search Trees（js）
95. Unique Binary Search Trees II（js）
94. Binary Tree Inorder Traversal（js）
93. Restore IP Addresses（js）
92. Reverse Linked List II（js）

原文地址：https://www.cnblogs.com/hanmy/p/14882529.html