  • Analyzing Mobile Internet Logs with MapReduce (Sorting)

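    1. Background

    1.1 Workflow

      This post implements sorting; grouping was already done with a Partitioner in the previous post.

      The steps: implement the interface and let the IDE generate its method stubs, declare the fields, generate getters and setters, serialize and deserialize the fields, write the comparison method, and override toString. I first added a parameterized constructor for convenience, but with a constructor the map method has to create a new object for every record. LongWritable and Text both have a set method, so the bean gets one as well, and a default (no-argument) constructor is generated.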

    	// surplus is derived from income and expense, so it is not passed in
    	public void set(String account,double income,double expense) {
    		this.account = account;
    		this.income = income;
    		this.expense = expense;
    		this.surplus = income-expense;
    	}
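    Reusing one bean through set() is safe because Hadoop serializes the key and value when context.write is called, instead of holding on to the object. The sketch below illustrates the idea with plain java.io streams. MiniBean and the sample accounts are hypothetical stand-ins, not the InfoBean from this post. One instance is reused for two records and both come back intact, as long as readFields reads the fields in the order write wrote them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for InfoBean that needs only java.io (no Hadoop).
class MiniBean {
    String account;
    double income;
    double expense;
    double surplus;

    void set(String account, double income, double expense) {
        this.account = account;
        this.income = income;
        this.expense = expense;
        this.surplus = income - expense;
    }

    void write(DataOutput out) throws IOException {
        out.writeUTF(account);
        out.writeDouble(income);
        out.writeDouble(expense);
        out.writeDouble(surplus);
    }

    // Must read the fields in the same order write() emitted them.
    void readFields(DataInput in) throws IOException {
        account = in.readUTF();
        income = in.readDouble();
        expense = in.readDouble();
        surplus = in.readDouble();
    }

    // Writes two records through ONE reused bean, then reads both back.
    static MiniBean[] roundTrip() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            MiniBean reused = new MiniBean();
            reused.set("account-a", 100.0, 30.0);
            reused.write(out);
            reused.set("account-b", 50.0, 80.0);
            reused.write(out);

            DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            MiniBean first = new MiniBean();
            MiniBean second = new MiniBean();
            first.readFields(in);
            second.readFields(in);
            return new MiniBean[] { first, second };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    Calling MiniBean.roundTrip() returns two distinct records even though only one bean was ever filled on the writing side.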
    

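    1.2 Dataset

    To build on the previous post, the dataset has been changed, but the file name is the same.

      Below is the output. MapReduce does sort keys automatically, but string keys are sorted in dictionary (lexicographic) order.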
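    The gap between dictionary order and numeric order is easy to reproduce in plain Java. The following is a small illustration only (SortOrderDemo and the sample values are made up for this example); it shows why numeric fields kept as strings do not come out in numeric order.

```java
import java.util.Arrays;

class SortOrderDemo {
    // Sorts the values as strings: dictionary order, character by character.
    static String[] asStrings(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Parses the same values and sorts them as numbers.
    static double[] asNumbers(String[] values) {
        double[] nums = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            nums[i] = Double.parseDouble(values[i]);
        }
        Arrays.sort(nums);
        return nums;
    }

    public static void main(String[] args) {
        String[] values = { "20", "3", "100" };
        System.out.println(Arrays.toString(asStrings(values))); // [100, 20, 3]
        System.out.println(Arrays.toString(asNumbers(values))); // [3.0, 20.0, 100.0]
    }
}
```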

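    2. Theory

      String concatenation: I wrote about this before and am revisiting it here, http://www.cnblogs.com/hxsyl/archive/2012/10/18/2729112.html

      A short summary with some extensions: the String class is final, so it cannot be subclassed, and String objects are immutable. Every "change" to a String therefore creates a new String object and points the reference at it. Strings whose content changes often are better not kept in a String, because each new object costs something, and once many unreferenced objects pile up in memory the JVM's garbage collector has to step in, which can slow the program down noticeably.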

     

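      Suppose a for loop runs 10,000 times. Each string += "hello"; takes the content of the object that string refers to, concatenates "hello" to it, stores the result in a new String object, and points string at that new object. The decompiled bytecode shows this clearly (for javac before Java 9): every iteration creates a new StringBuilder, calls append, and gets a String back from toString. So by the time the loop finishes it has created 10,000 StringBuilder objects, plus a new String per iteration. Until these are collected they waste memory, and the repeated allocation can make the system stall. The bytecode also shows that string += "hello" is in effect rewritten by the compiler (not by the JVM at run time) as: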

      StringBuilder str = new StringBuilder(string);

      str.append("hello");

      string = str.toString();

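      If you create a single StringBuilder before the loop and only call append inside the loop, just one builder is created, which is more efficient.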
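      As a minimal sketch of the two styles (ConcatDemo is a name made up for this example), both methods below build the same string; the second one allocates a single StringBuilder for the whole loop.

```java
class ConcatDemo {
    // Creates a new String (and, before Java 9, a new StringBuilder) per iteration.
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "hello";
        }
        return s;
    }

    // One StringBuilder for the whole loop, one String at the end.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("hello");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus(3));    // hellohellohello
        System.out.println(withBuilder(3)); // hellohellohello
    }
}
```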

      StringBuffer is thread-safe: its methods are declared synchronized, so when several threads use the same buffer they access it one at a time.
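      A small way to observe this, sketched with made-up names (BufferDemo, appendFromThreads): several threads append to one shared StringBuffer, and because append is synchronized no append is lost, so the final length is exactly threads × perThread. With an unsynchronized StringBuilder the same experiment is not guaranteed to give that result.

```java
class BufferDemo {
    // Appends one character at a time from several threads into one StringBuffer
    // and returns the final length.
    static int appendFromThreads(int threads, int perThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.append('x'); // synchronized on the buffer
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromThreads(4, 10000)); // 40000
    }
}
```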

     Reference: http://blog.csdn.net/loveyaozu/article/details/47037957

    3. Entity Class

      Records are sorted by income from high to low; when incomes are equal, by expense from low to high.

    package cn.app.hadoop.mr.sort;
    
    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    
    import org.apache.hadoop.io.WritableComparable;
    
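    // Writable is Hadoop's serialization interface.
    // The type parameter is InfoBean: as when comparing student records (score, gender, ...), the fields are wrapped in one bean.
    // WritableComparable already covers serialization and deserialization, since it extends Writable.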
    public class InfoBean implements WritableComparable<InfoBean>{
    	
    	
    	private String account;
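    	// Money should really be a BigDecimal because double loses precision. I was not sure how to serialize that type below, so double is used for now; writeUTF on its string form would probably work.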
    	private double income;
    	private double expense;
    	private double surplus;
    	
    	
    	public String getAccount() {
    		return account;
    	}
    	public void setAccount(String account) {
    		this.account = account;
    	}
    	public double getIncome() {
    		return income;
    	}
    	public void setIncome(double income) {
    		this.income = income;
    	}
    	public double getExpense() {
    		return expense;
    	}
    	public void setExpense(double expense) {
    		this.expense = expense;
    	}
    	public double getSurplus() {
    		return surplus;
    	}
    	public void setSurplus(double surplus) {
    		this.surplus = surplus;
    	}
    	@Override
    	public void readFields(DataInput in) throws IOException {
    		// Read the fields in exactly the order write() emits them.
    		this.account = in.readUTF();
    		this.income = in.readDouble();
    		this.expense = in.readDouble();
    		this.surplus = in.readDouble();
    	}
    	@Override
    	public void write(DataOutput out) throws IOException {
    		out.writeUTF(account);
    		out.writeDouble(income);
    		out.writeDouble(expense);
    		out.writeDouble(surplus);
    	}
    	
    	public void set(String account,double income,double expense) {
    		this.account = account;
    		this.income = income;
    		this.expense = expense;
    		this.surplus = income - expense;
    	}
    	
    
    	// Hadoop creates the bean reflectively when deserializing, so a no-argument constructor is required.
    	public InfoBean() {
    		super();
    	}
    	@Override
    	public String toString() {
    		return "InfoBean [income=" + income + ", expense=" + expense
    				+ ", surplus=" + surplus + "]";
    	}
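    	@Override
    	public int compareTo(InfoBean o) {
    		// Income from high to low; ties broken by expense from low to high.
    		int byIncome = Double.compare(o.getIncome(), this.income);
    		if (byIncome != 0) {
    			return byIncome;
    		}
    		int byExpense = Double.compare(this.expense, o.getExpense());
    		if (byExpense != 0) {
    			return byExpense;
    		}
    		// Added tie-breaker: returns 0 only for the same account, which keeps
    		// the ordering consistent and keeps different accounts in separate groups.
    		return this.account.compareTo(o.getAccount());
    	}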
    }
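    The ordering rule (income descending, then expense ascending) can be tried out without Hadoop. This is only an illustration with hypothetical names and data: OrderDemo sorts rows of {income, expense} with a java.util.Comparator that encodes the same rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OrderDemo {
    // Each row is {income, expense}. Returns a sorted copy:
    // income from high to low, then expense from low to high.
    static List<double[]> sorted(List<double[]> rows) {
        List<double[]> copy = new ArrayList<>(rows);
        copy.sort(Comparator
                .comparingDouble((double[] r) -> r[0]).reversed()
                .thenComparingDouble(r -> r[1]));
        return copy;
    }
}
```

    For the rows {100, 50}, {200, 10}, {100, 20} the result is {200, 10}, {100, 20}, {100, 50}: the highest income first, and the two rows with income 100 ordered by the smaller expense.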

    4. First Implementation

    4.1 Mapper

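    import java.io.IOException;
    
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    // For text input the first type parameter (the input key) is usually LongWritable or Object.
    // Each line of text arrives as Text.
    // The output key is the account (phone number), so it is Text.
    // The output value is InfoBean, which must implement the Writable interface.
    public class InfoSortMapper extends Mapper<LongWritable, Text, Text, InfoBean> {
    
    	// Reused across map() calls
    	private InfoBean v = new InfoBean();
    	private Text k = new Text();
    	
    	@Override
    	public void map(LongWritable key, Text value, Context context)
    			throws IOException, InterruptedException {
    		String line = value.toString();
    		// Fields are tab-separated: account, income, expense
    		String[] fields = line.split("\t");
    		String account = fields[0];
    		double in = Double.parseDouble(fields[1]);
    		double out = Double.parseDouble(fields[2]);
    		
    		// No need to create new objects for every record; even if no reference is kept, that wastes resources
    		k.set(account);
    		v.set(account, in, out);
    		
    		context.write(k, v);
    	}
    }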
    

    4.2 Reducer

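    import java.io.IOException;
    
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    
    public class InfoSortReducer extends Reducer<Text, InfoBean, Text, InfoBean> {
    
    	// The output key is the incoming key, so no separate key field is needed
    	private InfoBean v = new InfoBean();
    	@Override
    	public void reduce(Text key, Iterable<InfoBean> value, Context context)
    			throws IOException, InterruptedException {
    		// Sum income and expense over all records of this account
    		double incomeSum = 0;
    		double expenseSum = 0;
    		for (InfoBean o : value) {
    			incomeSum += o.getIncome();
    			expenseSum += o.getExpense();
    		}
    		v.set(key.toString(), incomeSum, expenseSum);
    		// InfoBean's toString is called automatically when the value is written as text
    		context.write(key,v);
    	}
    }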
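    What this first job computes can be mimicked in plain Java, which helps when checking results on a small sample. This is a sketch only: AggregateDemo and the sample lines are made up, and a TreeMap stands in for the shuffle that groups records by key. Each tab-separated line is parsed the way the mapper does, and income and expense are summed per account the way the reducer does.

```java
import java.util.Map;
import java.util.TreeMap;

class AggregateDemo {
    // Returns account -> {incomeSum, expenseSum, surplus}, keys in sorted order.
    static Map<String, double[]> sumByAccount(String[] lines) {
        Map<String, double[]> totals = new TreeMap<>();
        for (String line : lines) {
            // "map" step: split the tab-separated record
            String[] fields = line.split("\t");
            double income = Double.parseDouble(fields[1]);
            double expense = Double.parseDouble(fields[2]);
            // "reduce" step: accumulate per account
            double[] t = totals.computeIfAbsent(fields[0], k -> new double[3]);
            t[0] += income;
            t[1] += expense;
            t[2] = t[0] - t[1];
        }
        return totals;
    }
}
```

    For the lines "a\t100\t30", "b\t50\t80" and "a\t20\t10", account a ends up with income 120, expense 40 and surplus 80, and account b with 50, 80 and -30.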
    

    5. Second Implementation

    5.1 Mapper

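    import java.io.IOException;
    
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    // Sort by InfoBean: it is the map output key (k2)
    public class SortMapper extends Mapper<LongWritable, Text, InfoBean, NullWritable> {
    
    	// Reused across map() calls
    	private InfoBean k = new InfoBean();
    	@Override
    	public void map(LongWritable key, Text value, Context context)
    			throws IOException, InterruptedException {
    		String line = value.toString();
    		// Fields are tab-separated: account, income, expense
    		String[] fields = line.split("\t");
    		String account = fields[0];
    		double in = Double.parseDouble(fields[1]);
    		double out = Double.parseDouble(fields[2]);
    		
    		// No need to create a new object for every record; even if no reference is kept, that wastes resources
    		k.set(account, in, out);
    		// The value must be NullWritable.get(); a bare NullWritable is a type name, not a value, and does not compile
    		context.write(k, NullWritable.get());
    	}
    }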
    

    5.2 Reducer

    The listing posted under this heading repeats SortMapper from 5.1, so the reducer code itself is not shown. Going by the job settings in the closing section, the reducer receives InfoBean keys with NullWritable values from the mapper and writes Text keys with InfoBean values.

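    6. Closing Notes

      If the mapper's output types (k2, v2) differ from the reducer's output types (k4, v4), the map output classes must also be set explicitly in Main. The second implementation above is such a case.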

    job.setMapOutputKeyClass(InfoBean.class);
    job.setMapOutputValueClass(NullWritable.class);
    
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(InfoBean.class);
    

      Otherwise Java reports no error in the code; the type mismatch only becomes visible in the log once log4j is added.

  • Original post: https://www.cnblogs.com/hxsyl/p/6165176.html