  • Sorting MapReduce output in descending order by key

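    While learning MapReduce as part of my recent Hadoop studies, I ran into a problem: finding the top 10 values in a given data set. As we know, MapReduce has a default sort mechanism, and it orders keys in ascending order, from smallest to largest.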

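    The top-10 problem, however, needs the keys in descending order. It took me a long time searching online before I got it working. The solution is as follows: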

    Define a custom comparator that extends the WritableComparator class. The code is as follows:

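    package Sort; // assumed: same package as SortRunner, so the driver can reference this class
    
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.WritableComparator;
    
    public class DescSort extends WritableComparator {
    
        public DescSort() {
            // Register LongWritable as the key class; true lets the comparator create key instances
            super(LongWritable.class, true);
        }
    
        // Comparison on the serialized (raw byte) form of the keys
        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return -super.compare(b1, s1, l1, b2, s2, l2); // the minus sign turns ascending into descending
        }
    
        // Comparison on deserialized key objects
        @Override
        public int compare(Object a, Object b) {
            return -super.compare(a, b); // the minus sign turns ascending into descending
        }
    }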
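
    The effect of the minus sign can be checked without a cluster. The following is a minimal sketch in plain Java, assuming nothing from Hadoop: the class DescendingOrderDemo and its members are hypothetical names used only for illustration, and Long::compare stands in for the default ascending key comparison.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, not part of the original post; it uses no Hadoop APIs.
class DescendingOrderDemo {

    // Ascending comparison, standing in for the default key ordering.
    static final Comparator<Long> ASCENDING = Long::compare;

    // Negating the ascending result reverses the order, which is what DescSort does.
    static final Comparator<Long> DESCENDING = (a, b) -> -ASCENDING.compare(a, b);

    // Returns a new list holding the input values sorted from largest to smallest.
    static List<Long> sortDescending(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        sorted.sort(DESCENDING);
        return sorted;
    }

    public static void main(String[] args) {
        // prints [42, 19, 7, 3]
        System.out.println(sortDescending(List.of(3L, 42L, 7L, 19L)));
    }
}
```

    Negation is safe only when the underlying comparison never returns Integer.MIN_VALUE, because negating that value overflows back to itself. Swapping the two arguments instead of negating the result is the more defensive variant.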

    In the main function, register the comparator class on the job before running it. The code is as follows:

    package Sort;
    
    import java.io.IOException;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class SortRunner {
    
        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://192.168.252.200:9000");
            Job job = Job.getInstance(conf);
            job.setJarByClass(SortRunner.class);
    
            // Register the descending comparator used to sort the map output keys
            job.setSortComparatorClass(DescSort.class);
            // SortMapper and SortReducer are not shown in this post
            job.setMapperClass(SortMapper.class);
            job.setReducerClass(SortReducer.class);
    
            job.setMapOutputKeyClass(LongWritable.class);
            job.setMapOutputValueClass(NullWritable.class);
    
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(NullWritable.class);
    
            // Input and output paths
            FileInputFormat.setInputPaths(job, new Path("/sort/srcdata/"));
            FileOutputFormat.setOutputPath(job, new Path("/sort/output3"));
    
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

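    Note: the line job.setSortComparatorClass(DescSort.class); (highlighted in red in the original post) is where the comparator is registered.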

    With this in place, the job writes its output in descending key order.
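
    Once keys arrive in descending order, the top 10 is simply the first ten keys seen. The sketch below shows that counting step in plain Java, assuming all keys pass through a single reducer (each reducer sorts only its own partition). TopNCollector is a hypothetical helper, not the SortReducer from this post, which is not shown; a reducer could apply the same counter before writing each key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the original post: keeps only the first `limit` keys offered.
class TopNCollector {
    private final int limit;
    private final List<Long> kept = new ArrayList<>();

    TopNCollector(int limit) {
        this.limit = limit;
    }

    // Call once per key, in the descending order in which the keys arrive.
    // Returns true if the key was kept, false once the limit has been reached.
    boolean offer(long key) {
        if (kept.size() >= limit) {
            return false;
        }
        kept.add(key);
        return true;
    }

    // Returns a copy of the keys kept so far, in arrival order.
    List<Long> result() {
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        TopNCollector top = new TopNCollector(10);
        for (long key = 15; key >= 1; key--) { // keys already in descending order
            top.offer(key);
        }
        // prints [15, 14, 13, 12, 11, 10, 9, 8, 7, 6]
        System.out.println(top.result());
    }
}
```

    Whether repeated values should each count toward the ten is a design choice this sketch leaves open.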

    There are many examples online of sorting by custom key types, so I won't go into that here. I hope this helps!

  • Original post: https://www.cnblogs.com/ljysy/p/10053326.html