  • Hadoop MapReduce multi-table join

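    Suppose we have the two files below. Table 1 maps each company to an address number, and table 2 maps each address number to an address name. Every line starts with a two-character tag ("A:" or "B:") identifying its table, and the fields are separated by a tab.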

    Table 1:

    A:Beijing Red Star	1
    A:Shenzhen Thunder	3
    A:Guangzhou Honda	2
    A:Beijing Rising	1
    A:Guangzhou Development Bank	2
    A:Tencent	3
    A:Back of Beijing	1


    Table 2:

    B:1	Beijing
    B:2	Guangzhou
    B:3	Shenzhen
    B:4	Xian


    The MapReduce code is as follows:

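    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MTJoin {

        // Tags marking which table a record came from; also used as MapWritable keys.
        private static final Text typeA = new Text("A:");
        private static final Text typeB = new Text("B:");

        private static final Log log = LogFactory.getLog(MTJoin.class);

        public static class Map extends Mapper<Object, Text, Text, MapWritable> {

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String valueStr = value.toString();
                if (valueStr.length() < 2) {
                    return; // skip blank or truncated lines
                }
                String type = valueStr.substring(0, 2);
                String[] fields = valueStr.substring(2).split("\t");
                if (fields.length < 2) {
                    log.debug("Skipping malformed line: " + valueStr);
                    return;
                }
                MapWritable map = new MapWritable();
                if (type.equals("A:")) {
                    // Table 1: fields[0] = company name, fields[1] = address number
                    map.put(typeA, new Text(fields[0]));
                    context.write(new Text(fields[1]), map);
                } else if (type.equals("B:")) {
                    // Table 2: fields[0] = address number, fields[1] = address name
                    map.put(typeB, new Text(fields[1]));
                    context.write(new Text(fields[0]), map);
                }
            }
        }

        public static class Reduce extends Reducer<Text, MapWritable, Text, Text> {

            @Override
            public void reduce(Text key, Iterable<MapWritable> values, Context context)
                    throws IOException, InterruptedException {
                List<Text> companyList = new ArrayList<Text>();
                List<Text> addressList = new ArrayList<Text>();
                // Split the values for this address number by source table.
                for (MapWritable map : values) {
                    if (map.containsKey(typeA)) {
                        companyList.add((Text) map.get(typeA));
                    } else if (map.containsKey(typeB)) {
                        addressList.add((Text) map.get(typeB));
                    }
                }
                // Cartesian product: every company paired with every address name.
                for (Text company : companyList) {
                    for (Text address : addressList) {
                        context.write(company, address);
                    }
                }
            }
        }

        // Minimal driver, not part of the original listing (an assumed setup):
        // args[0] = input directory holding both files, args[1] = output directory.
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "MTJoin");
            job.setJarByClass(MTJoin.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(MapWritable.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }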

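    The principle is simple. The map stage emits the address number as the key, so records from both tables that share an address number arrive at the same reduce call. The reducer puts the company names into one list and the address names into another, and the Cartesian product of the two lists is the result. An address number with no matching company (4, Xian) has an empty company list and therefore produces no output.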

    The output is as follows:

    Beijing Red Star	Beijing
    Beijing Rising	Beijing
    Back of Beijing	Beijing
    Guangzhou Honda	Guangzhou
    Guangzhou Development Bank	Guangzhou
    Shenzhen Thunder	Shenzhen
    Tencent	Shenzhen
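
    The join logic described above can also be tried without a cluster. The following is a minimal plain-Java sketch, not the author's code: it uses no Hadoop classes, a TreeMap stands in for the shuffle that groups values by key, and the names ReduceSideJoinSketch, Group, shuffle and join are hypothetical, chosen only for illustration. The sample lines are a subset of the two tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the reduce-side join; no Hadoop classes are used.
class ReduceSideJoinSketch {

    // Hypothetical holder for the two lists the reducer builds per address number.
    static final class Group {
        final List<String> companies = new ArrayList<>();
        final List<String> addressNames = new ArrayList<>();
    }

    // "Map" step: route every tagged line to the group of its address number.
    static Map<String, Group> shuffle(List<String> lines) {
        Map<String, Group> groups = new TreeMap<>(); // keys kept in sorted order
        for (String line : lines) {
            String tag = line.substring(0, 2);
            String[] fields = line.substring(2).split("\t");
            if (tag.equals("A:")) {
                // Table 1: company name, address number
                groups.computeIfAbsent(fields[1], k -> new Group()).companies.add(fields[0]);
            } else if (tag.equals("B:")) {
                // Table 2: address number, address name
                groups.computeIfAbsent(fields[0], k -> new Group()).addressNames.add(fields[1]);
            }
        }
        return groups;
    }

    // "Reduce" step: Cartesian product of the two lists within each group.
    static List<String> join(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (Group g : shuffle(lines).values()) {
            for (String company : g.companies) {
                for (String name : g.addressNames) {
                    out.add(company + "\t" + name);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "A:Beijing Red Star\t1",
            "A:Tencent\t3",
            "B:1\tBeijing",
            "B:3\tShenzhen",
            "B:4\tXian");
        join(lines).forEach(System.out::println);
    }
}
```

    Because a pair is written only when both lists are non-empty, this behaves like an inner join: the "B:4" line ends up in a group with no companies and contributes nothing.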



  • Original article: https://www.cnblogs.com/javawebsoa/p/3065705.html