  • 1125 Daily Blog

    MapReduce example: single-table join

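    This section uses the buyer1(buyer_id, friends_id) table from this experiment to explain how a single-table join works. A single-table (self) join matches the buyer_id column of a left table against the friends_id column of a right table, where the left and right tables are the same table. In the map phase, after each input line is split into buyer_id and friends_id, the mapper emits buyer_id as the key and friends_id as the value; this copy serves as the left table. It then emits the same pair a second time with friends_id as the key and buyer_id as the value; this copy serves as the right table. To tell the two copies apart, the mapper tags each value with its table of origin, for example by prefixing the value string with the character 1 for the left table and 2 for the right table. The shuffle then groups all records that share a key, so left-table and right-table records with the same id reach the same reduce call. For each key, the value list therefore holds the friends_id values of rows in which the key appears as buyer_id (tagged 1) and the buyer_id values of rows in which the key appears as friends_id (tagged 2). The reducer parses each value list, puts the tag-1 values into one collection and the tag-2 values into another, and the Cartesian product of the two collections is the final result.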
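    The whole pipeline can be sketched without Hadoop as a small simulation. This is a hedged illustration, not the author's program: a TreeMap stands in for the shuffle's grouping and sorting by key, and the class name SelfJoinSimulation, the method selfJoin and the sample rows are hypothetical. It assumes tab-separated buyer_id/friends_id lines and the "1+"/"2+" value tags used by the program in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the single-table (self) join; no Hadoop required.
// Class/method names and sample rows are hypothetical illustrations.
public class SelfJoinSimulation {

    // Runs map -> shuffle -> reduce over "buyer_id\tfriends_id" rows
    // and returns the joined pairs as "left\tright" strings.
    public static List<String> selfJoin(List<String> rows) {
        // Map + shuffle: group tagged values by key (TreeMap keeps keys sorted).
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String row : rows) {
            String[] arr = row.split("\t");
            if (arr.length < 2) {
                continue; // skip malformed lines
            }
            // Left table: key = buyer_id, value = "1+" + friends_id.
            grouped.computeIfAbsent(arr[0], k -> new ArrayList<>()).add("1+" + arr[1]);
            // Right table: key = friends_id, value = "2+" + buyer_id.
            grouped.computeIfAbsent(arr[1], k -> new ArrayList<>()).add("2+" + arr[0]);
        }

        // Reduce: split each value list by tag, then take the Cartesian product.
        List<String> result = new ArrayList<>();
        for (List<String> values : grouped.values()) {
            List<String> left = new ArrayList<>();
            List<String> right = new ArrayList<>();
            for (String record : values) {
                if (record.charAt(0) == '1') {
                    left.add(record.substring(2));
                } else if (record.charAt(0) == '2') {
                    right.add(record.substring(2));
                }
            }
            for (String l : left) {
                for (String r : right) {
                    if (!l.equals(r)) { // skip pairs that link an id to itself
                        result.add(l + "\t" + r);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("10001\t10002", "10002\t10003", "10003\t10004");
        selfJoin(rows).forEach(System.out::println);
    }
}
```

    With these rows, key 10002 receives 1+10003 and 2+10001, so the reducer pairs 10003 with 10001: each output line links a friend of a friend to the original buyer.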

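    The map input is a plain text file. Before the map function runs, the InputFormat divides the data set into smaller InputSplits, and a RecordReader parses each split into <key, value> pairs for the map function. The map function uses split("\t") to break each line into the array arr[], assigns arr[0] to mapkey and arr[1] to mapvalue, and then calls context.write() twice to emit two copies of the data, tagging the value of each copy with a relationtype of 1 or 2.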
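    A minimal sketch of the map step for a single line, written without Hadoop types so it runs on its own. MapStepSketch and mapLine are hypothetical names; a java.util.Map.Entry stands in for one <key, value> pair written through context.write(), and skipping lines without two tab-separated fields is an added assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-alone sketch of the map step for one input line (no Hadoop types).
public class MapStepSketch {

    // Returns the two tagged pairs emitted for one "buyer_id\tfriends_id" line,
    // or an empty list if the line lacks two tab-separated fields.
    public static List<Map.Entry<String, String>> mapLine(String line) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        String[] arr = line.split("\t");
        if (arr.length < 2) {
            return out;
        }
        String mapkey = arr[0];   // buyer_id
        String mapvalue = arr[1]; // friends_id
        out.add(new SimpleEntry<>(mapkey, "1+" + mapvalue)); // left-table copy
        out.add(new SimpleEntry<>(mapvalue, "2+" + mapkey)); // right-table copy
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : mapLine("10001\t10002")) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

    Running main prints 10001 -> 1+10002 and 10002 -> 2+10001: the same row appears once under each of its two ids.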

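    By the time the reduce side receives the map output, all values that share a key have already been grouped into the Iterable values. The reduce function first creates two collections, buyer and friends, to hold the two kinds of map output. It then walks through values with an Iterator (hasNext() and next() in a while loop), assigning each value to record. It reads the first character with charAt(0) into relationtype: if relationtype is 1, it takes record.substring(2), which drops the two-character tag such as "1+", and stores the result in buyer; if relationtype is 2, it stores the extracted value in friends. Finally, two nested for loops output every <key, value> pair whose key is an element of buyer and whose value is an element of friends, skipping pairs in which both ids are the same.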
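    The reduce logic for one key can likewise be isolated as a plain method, assuming the same "1+"/"2+" tags. ReduceStepSketch and reduceValues are hypothetical names, and the value list in main is made-up sample data. The self-pair check uses equals(): in Java, != on two String objects compares references, and strings produced by substring() are generally distinct objects even when their contents match.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch of the reduce step for one key (no Hadoop types).
public class ReduceStepSketch {

    // Splits tagged values into two lists and returns their Cartesian product
    // as "buyer\tfriend" strings, skipping pairs whose two ids are equal.
    public static List<String> reduceValues(Iterable<String> values) {
        List<String> buyer = new ArrayList<>();   // values tagged '1'
        List<String> friends = new ArrayList<>(); // values tagged '2'
        for (String record : values) {
            if (record.length() < 2) {
                continue; // too short to hold a tag and its separator
            }
            char relationtype = record.charAt(0);
            if (relationtype == '1') {
                buyer.add(record.substring(2));
            } else if (relationtype == '2') {
                friends.add(record.substring(2));
            }
        }
        List<String> out = new ArrayList<>();
        for (String b : buyer) {
            for (String f : friends) {
                // equals() compares content; != would compare object references.
                if (!b.equals(f)) {
                    out.add(b + "\t" + f);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical value list for a single key.
        List<String> values = List.of("1+10003", "1+10005", "2+10001", "2+10003");
        reduceValues(values).forEach(System.out::println);
    }
}
```

    For this sample list the pair 10003/10003 is skipped and the other three combinations are returned.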

    The code is as follows:

    package exper;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class DanJoin {
        // Emits each (buyer_id, friends_id) row twice: as the left table and as the right table.
        public static class Map extends Mapper<Object, Text, Text, Text> {
            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] arr = value.toString().split("\t");
                if (arr.length < 2) {
                    return; // skip blank or malformed lines
                }
                String mapkey = arr[0];   // buyer_id
                String mapvalue = arr[1]; // friends_id
                // Left table: key = buyer_id, value tagged with relationtype 1
                context.write(new Text(mapkey), new Text("1+" + mapvalue));
                // Right table: key = friends_id, value tagged with relationtype 2
                context.write(new Text(mapvalue), new Text("2+" + mapkey));
            }
        }

        // Separates the tagged values of each key and outputs their Cartesian product.
        public static class Reduce extends Reducer<Text, Text, Text, Text> {
            @Override
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                // Lists grow as needed, so keys with many values are handled.
                List<String> buyer = new ArrayList<>();   // values tagged '1'
                List<String> friends = new ArrayList<>(); // values tagged '2'
                Iterator<Text> ite = values.iterator();
                while (ite.hasNext()) {
                    String record = ite.next().toString();
                    if (record.length() < 2) {
                        continue; // too short to hold a tag and its separator
                    }
                    char relationtype = record.charAt(0);
                    if (relationtype == '1') {
                        buyer.add(record.substring(2));   // strip the "1+" tag
                    } else if (relationtype == '2') {
                        friends.add(record.substring(2)); // strip the "2+" tag
                    }
                }
                for (String b : buyer) {
                    for (String f : friends) {
                        // Skip pairs where both ids are the same (compare content, not references).
                        if (!b.equals(f)) {
                            context.write(new Text(b), new Text(f));
                        }
                    }
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String inPath = "D:\\mapreduce\\4in\\buyer1.txt";
            String outPath = "file:///D:/mapreduce/4out";
            Job job = Job.getInstance(conf, "Single table join");
            job.setJarByClass(DanJoin.class);
            job.setMapperClass(Map.class);
            job.setReducerClass(Reduce.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(inPath));
            FileOutputFormat.setOutputPath(job, new Path(outPath));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

  • Original article: https://www.cnblogs.com/ruangongwangxiansheng/p/14568354.html