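Suppose we have the following two files: one table maps each company to an address number, and the other maps each address number to an address name. Each line begins with a tag (A: or B:) that identifies its table, and the fields on a line are separated by a tab, which is what the code below splits on.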
Table 1:
A:Beijing Red Star              1
A:Shenzhen Thunder              3
A:Guangzhou Honda               2
A:Beijing Rising                1
A:Guangzhou Development Bank    2
A:Tencent                       3
A:Back of Beijing               1
Table 2:
B:1    Beijing
B:2    Guangzhou
B:3    Shenzhen
B:4    Xian
The MapReduce code is as follows:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// The job driver (main method) is not shown in the original listing.
public class MTJoin {
    // Tags that record which table a value came from.
    private static final Text typeA = new Text("A:");
    private static final Text typeB = new Text("B:");
    private static Log log = LogFactory.getLog(MTJoin.class);

    public static class Map extends Mapper<Object, Text, Text, MapWritable> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String valueStr = value.toString();
            if (valueStr.length() < 2) {
                return; // too short to carry a table tag
            }
            String type = valueStr.substring(0, 2);
            String content = valueStr.substring(2);
            log.info(content);
            String[] contentArray = content.split("\t");
            if (contentArray.length < 2) {
                return; // skip malformed lines
            }
            if (type.equals("A:")) {
                // Table 1: company name, address number
                String company = contentArray[0];
                String adrNum = contentArray[1];
                MapWritable map = new MapWritable();
                map.put(typeA, new Text(company));
                context.write(new Text(adrNum), map);
            } else if (type.equals("B:")) {
                // Table 2: address number, address name
                String adrNum = contentArray[0];
                String adrName = contentArray[1];
                MapWritable map = new MapWritable();
                map.put(typeB, new Text(adrName));
                context.write(new Text(adrNum), map);
            }
        }
    }

    public static class Reduce extends Reducer<Text, MapWritable, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<MapWritable> values, Context context)
                throws IOException, InterruptedException {
            List<Text> companyList = new ArrayList<Text>();
            List<Text> adrList = new ArrayList<Text>();
            // Separate the values for this address number by source table.
            for (MapWritable map : values) {
                if (map.containsKey(typeA)) {
                    companyList.add((Text) map.get(typeA));
                } else if (map.containsKey(typeB)) {
                    adrList.add((Text) map.get(typeB));
                }
            }
            // Cartesian product: every company paired with every address name.
            for (Text company : companyList) {
                for (Text adrName : adrList) {
                    context.write(company, adrName);
                }
            }
        }
    }
}
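The idea is simple. The mapper emits the address number as the key, so records from both tables that share an address number arrive at the same reduce call. In the reducer, the company names go into one list and the address names into another, and the Cartesian product of the two lists gives the result. An address number that appears in only one table (such as 4, Xian) leaves one of the lists empty and therefore produces no output.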
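The grouping and Cartesian-product steps can be tried without a cluster. The following is a minimal sketch in plain Java, not the Hadoop job itself: the class name ReduceSideJoinSketch, the Group helper, and the join method are hypothetical names introduced only for illustration, and a TreeMap stands in for the shuffle phase that groups values by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the reduce-side join; no Hadoop involved.
// All names in this class are hypothetical, not part of the original job.
class ReduceSideJoinSketch {

    // Holds the two lists the reducer builds for one address number.
    static final class Group {
        final List<String> companies = new ArrayList<>();
        final List<String> addressNames = new ArrayList<>();
    }

    // Takes tagged, tab-separated lines ("A:company\tnum" or "B:num\tname")
    // and returns the joined "company\taddressName" rows.
    static List<String> join(List<String> lines) {
        // Stand-in for map + shuffle: group records by address number.
        // TreeMap keeps the string keys sorted.
        Map<String, Group> groups = new TreeMap<>();
        for (String line : lines) {
            if (line.length() < 2) {
                continue; // too short to carry a table tag
            }
            String type = line.substring(0, 2);
            String[] fields = line.substring(2).split("\t");
            if (fields.length < 2) {
                continue; // skip malformed lines
            }
            if (type.equals("A:")) {
                groups.computeIfAbsent(fields[1], k -> new Group())
                      .companies.add(fields[0]);
            } else if (type.equals("B:")) {
                groups.computeIfAbsent(fields[0], k -> new Group())
                      .addressNames.add(fields[1]);
            }
        }

        // Stand-in for reduce: Cartesian product of the two lists per key.
        List<String> joined = new ArrayList<>();
        for (Group g : groups.values()) {
            for (String company : g.companies) {
                for (String name : g.addressNames) {
                    joined.add(company + "\t" + name);
                }
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "A:Beijing Red Star\t1",
            "A:Shenzhen Thunder\t3",
            "A:Guangzhou Honda\t2",
            "B:1\tBeijing",
            "B:2\tGuangzhou",
            "B:3\tShenzhen",
            "B:4\tXian");
        for (String row : join(lines)) {
            System.out.println(row);
        }
    }
}
```

Because the inner loops run only when both lists are non-empty, the sketch behaves as an inner join: the B:4 record has no matching company and contributes no row.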
The output is as follows:
Beijing Red Star              Beijing
Beijing Rising                Beijing
Back of Beijing               Beijing
Guangzhou Honda               Guangzhou
Guangzhou Development Bank    Guangzhou
Shenzhen Thunder              Shenzhen
Tencent                       Shenzhen