  • MapReduce inverted index

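    An inverted index is mainly used for full-text search: it builds a word -> URL structure, mapping each word to the documents that contain it. Take three input files: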
    file1:
    MapReduce is simple
    file2:
    MapReduce is powerful is simple
    file3:
    Hello MapReduce bye MapReduce
    After inverted indexing, the result is:
    Hello     file3.txt:1;
    MapReduce     file3.txt:2;file1.txt:1;file2.txt:1;
    bye     file3.txt:1;
    is     file1.txt:1;file2.txt:2;
    powerful     file2.txt:1;
    simple     file2.txt:1;file1.txt:1;
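To make the target structure concrete before moving to Hadoop, here is a minimal in-memory sketch in plain Java that builds the same word -> file:count lists for the three sample files. The class and method names (`InvertedIndexSketch`, `build`, `format`) are made up for illustration and are not part of the job described in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class InvertedIndexSketch {

    // Builds word -> (filename -> occurrence count).
    // TreeMap keeps both words and filenames in sorted order.
    public static Map<String, Map<String, Integer>> build(Map<String, String> files) {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            StringTokenizer st = new StringTokenizer(file.getValue());
            while (st.hasMoreTokens()) {
                index.computeIfAbsent(st.nextToken(), w -> new TreeMap<>())
                     .merge(file.getKey(), 1, Integer::sum);
            }
        }
        return index;
    }

    // Formats one posting list as "file:count;file:count;"
    public static String format(Map<String, Integer> postings) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> p : postings.entrySet()) {
            sb.append(p.getKey()).append(':').append(p.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("file1.txt", "MapReduce is simple");
        files.put("file2.txt", "MapReduce is powerful is simple");
        files.put("file3.txt", "Hello MapReduce bye MapReduce");
        build(files).forEach((word, postings) ->
                System.out.println(word + "\t" + format(postings)));
    }
}
```

This sketch sorts the files within each line by name, so the order of entries may differ from what a real MapReduce run prints. The counts should be the same.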
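    The main approach: walk through the contents of each file and emit key = word + ":" + filename with value = 1. The combiner then adds up the values that share a key, which gives the number of times the word occurs in that file, and moves the filename from the key into the value. Finally, the reducer only has to join the filename:count values for each word.

    There is a small bug here, though. A combiner is essentially a local reduce over the output of a single map task, but I am relying on it to count occurrences across a whole file. If a file is large, its contents will probably be divided into several input splits handled on different nodes, and the same file would then show up more than once in a word's list, each time with a partial count. Hadoop also treats the combiner as an optional optimization and does not guarantee how many times it runs. So this version is only suitable for small files.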
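One possible way around the large-file limitation is for the reduce step to add up the partial counts per filename itself, so that the result no longer depends on how many splits a file was cut into or on whether the combine step ran. The following is a sketch of that merge logic in plain Java rather than the Hadoop API. `PostingMergeSketch` and `mergePostings` are hypothetical names, and the input values are assumed to have the form filename:count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PostingMergeSketch {

    // Sums "filename:count" values per filename and returns "file:count;file:count;"
    public static String mergePostings(List<String> values) {
        Map<String, Integer> perFile = new TreeMap<>();
        for (String v : values) {
            int sep = v.lastIndexOf(':');              // count follows the last ':'
            String filename = v.substring(0, sep);
            int count = Integer.parseInt(v.substring(sep + 1));
            perFile.merge(filename, count, Integer::sum);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : perFile.entrySet()) {
            sb.append(e.getKey()).append(':').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Values for the word "is" when file2.txt was read as two splits,
        // each contributing one occurrence
        List<String> values = Arrays.asList("file2.txt:1", "file1.txt:1", "file2.txt:1");
        System.out.println("is\t" + mergePostings(values));  // prints: is	file1.txt:1;file2.txt:2;
    }
}
```

In an actual job this would probably mean having the mapper emit the word as the key and filename:1 as the value, so that the key never changes between combiner and reducer and both can apply the same summing logic. That is a design suggestion, not what the code in this post does.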
    PS: to get the name of the file being processed, use String filename = ((FileSplit) context.getInputSplit()).getPath().getName(); I don't think there is anything else worth pointing out.
    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

        private static final Text ONE = new Text("1");
        private final Text outKey = new Text();

        @Override
        public void map(LongWritable ikey, Text ivalue, Context context)
                throws IOException, InterruptedException {
            // Name of the file that the current input split belongs to
            String filename = ((FileSplit) context.getInputSplit()).getPath().getName();
            StringTokenizer st = new StringTokenizer(ivalue.toString());
            while (st.hasMoreTokens()) {
                // Emit "word:filename" -> "1" for every token on the line
                outKey.set(st.nextToken() + ":" + filename);
                context.write(outKey, ONE);
            }
        }

    }
     
     
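    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class MyCombiner extends Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text _key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Every incoming value is "1", so the sum is the word's count in this file
            int sum = 0;
            for (Text val : values) {
                sum += Integer.parseInt(val.toString());
            }
            // Assumes the key still has the form "word:filename", i.e. the combiner
            // sees raw map output exactly once. Splitting at the last ':' keeps
            // words that themselves contain ':' intact.
            String composite = _key.toString();
            int sep = composite.lastIndexOf(':');
            String word = composite.substring(0, sep);
            String filename = composite.substring(sep + 1);
            // Re-key by word; the value becomes "filename:count"
            context.write(new Text(word), new Text(filename + ":" + sum));
        }

    }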
     
     
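    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class MyReducer extends Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text _key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Join every "filename:count" value for this word into one line,
            // e.g. "file3.txt:2;file1.txt:1;file2.txt:1;"
            StringBuilder filelist = new StringBuilder();
            for (Text val : values) {
                filelist.append(val.toString()).append(';');
            }
            context.write(_key, new Text(filelist.toString()));
        }

    }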
  • Original article: https://www.cnblogs.com/sunrye/p/4543365.html