Map
public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable>
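Extend the Mapper class.
The type parameters <LongWritable, Text, Text, IntWritable> mean:
- LongWritable: the map function's input key, the byte offset of the start of the line
- Text: the map function's input value, the content of the line
- Text: the map output key type, defined by the business logic
- IntWritable: the map output value type, which is also the reduce input value type; defined by the business logic
After extending Mapper, override the map function: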
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
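To make the mapper's contract concrete, here is a plain-Java sketch of what a WordCount-style map function does with one input record. It assumes word counting as the business logic (the job in main is named wordcount). The class MapSketch is hypothetical and has no Hadoop dependency: long, String and Integer stand in for LongWritable, Text and IntWritable, and a returned list stands in for context.write().

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical plain-Java stand-in for a WordCount mapper (not the Hadoop API).
class MapSketch {
    // offset ~ LongWritable key, line ~ Text value; each returned pair ~ one context.write(word, 1)
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // The byte offset identifies the record but is not needed for counting words.
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            // Emit (word, 1) for every whitespace-separated token.
            out.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }
}
```

For example, MapSketch.map(0L, "hello world hello") yields three pairs, ("hello", 1), ("world", 1), ("hello", 1); summing them per word is left to the reduce step.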
Reduce
public static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable>
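Extend the Reducer class.
The type parameters <Text, IntWritable, Text, IntWritable> mean:
- Text: the reduce input key (the map output key)
- IntWritable: the reduce input value (the map output value)
- Text: the reduce output key type
- IntWritable: the reduce output value type
After extending Reducer, override the reduce function: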
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
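Between map and reduce, the framework groups all values that share a key, so reduce receives one key together with an Iterable of its values. The plain-Java sketch below mimics that for word counting. ReduceSketch is hypothetical and Hadoop-free: a TreeMap stands in for the framework's sort-and-group (shuffle), and String/Integer stand in for Text/IntWritable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java stand-in for shuffle + a WordCount reducer (not the Hadoop API).
class ReduceSketch {
    // Group map output values by key, sorted by key, roughly as the shuffle phase does.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Counterpart of reduce(Text key, Iterable<IntWritable> values, Context context):
    // add up every count that arrived for this key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }
}
```

For the pairs ("b", 1), ("a", 1), ("b", 1), shuffle produces a=[1] and b=[1, 1], and reduce returns 1 for "a" and 2 for "b".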
main
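The job is configured in the main function. The driver below is the classic WordCount example written against the older org.apache.hadoop.mapred API (JobConf, JobClient), so its Map and Reduce classes are old-API classes, not the new-API MyMapper and MyReduce above; with the new API the driver uses org.apache.hadoop.mapreduce.Job instead.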
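public static void main(String[] args) throws Exception {
    // Create the job configuration and name the job
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    // Types of the final output key and value
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    // Mapper, combiner (reuses the reducer) and reducer classes
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    // How input records are read and output records are written
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    // Input and output paths taken from the command line
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    // Submit the job and wait for it to finish
    JobClient.runJob(conf);
}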
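In main, you should be familiar with the various job and file-system (fs) operations:
1. main creates a JobConf to initialize the MapReduce job and calls setJobName() to name it.
2. Set the output key and value types:
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
Text roughly corresponds to Java's String and IntWritable to int; both are Hadoop Writable types.
3. Set the Map, Combiner and Reduce classes.
4. Call setInputFormat() and setOutputFormat() to choose the input and output formats, and FileInputFormat.setInputPaths() and FileOutputFormat.setOutputPath() to set the input and output paths.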
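Step 3 registers Reduce as both the combiner and the reducer. That is safe for WordCount because summing is associative and commutative: partial sums computed on each map task's output give the same final counts. The plain-Java sketch below checks this; CombinerSketch and its methods are hypothetical, each inner list stands for the words emitted by one map task, and no Hadoop classes are involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: why the WordCount reducer can double as the combiner.
class CombinerSketch {
    // The reduce logic: merge count maps by adding the counts for each word.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> part : partials) {
            part.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    // One map task's output: a (word, 1) record per word, each as a one-entry map.
    static List<Map<String, Integer>> mapOutput(List<String> words) {
        List<Map<String, Integer>> records = new ArrayList<>();
        for (String w : words) {
            records.add(Map.of(w, 1));
        }
        return records;
    }

    // Combiner step: run the same reduce logic locally on one map task's output.
    static Map<String, Integer> combine(List<String> words) {
        return reduce(mapOutput(words));
    }

    // With a combiner: each task ships its partial sums, then reduce merges them.
    static Map<String, Integer> runWithCombiner(List<List<String>> tasks) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (List<String> task : tasks) {
            partials.add(combine(task));
        }
        return reduce(partials);
    }

    // Without a combiner: every (word, 1) record goes straight to reduce.
    static Map<String, Integer> runWithoutCombiner(List<List<String>> tasks) {
        List<Map<String, Integer>> all = new ArrayList<>();
        for (List<String> task : tasks) {
            all.addAll(mapOutput(task));
        }
        return reduce(all);
    }
}
```

For tasks ["a", "b", "a"] and ["a"], both runs give a=3, b=1, while the first task ships two combined records instead of three. Hadoop may run a combiner zero, one or several times, which is why its logic must not change the result.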
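Input splits and records
- JobClient uses the specified input format to generate the data splits (InputSplit).
- A split is not the data itself but a reference to it; for a file, a split records the file, a start offset and a length.
- The InputFormat is responsible for generating the splits, and also for providing the RecordReader that turns each split into key/value records for the map function.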
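To show that splits are only offsets over a file, here is a simplified sketch modeled on how the new-API FileInputFormat cuts a single file: the split size is max(minSize, min(maxSize, blockSize)), and the last split may run up to 10% over that size instead of leaving a tiny tail. SplitSketch is hypothetical, ignores block locations and non-splittable files, and the older mapred API computes the size slightly differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sketch of file splitting; a split is {start offset, length}, not data.
class SplitSketch {
    // Allow the final split to be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Split size = max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {start, length} pairs that together cover a file of fileLength bytes.
    static List<long[]> getSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long bytesRemaining = fileLength;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - bytesRemaining, splitSize});
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            // Remaining tail (at most SPLIT_SLOP * splitSize bytes) becomes the last split.
            splits.add(new long[] {fileLength - bytesRemaining, bytesRemaining});
        }
        return splits;
    }
}
```

With a 128 MB block size and minSize 1, maxSize Long.MAX_VALUE, a 300 MB file yields three splits (0 to 128 MB, 128 to 256 MB, and a 44 MB tail starting at 256 MB), while a 130 MB file stays a single split because 130/128 is below the 1.1 slop.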