Recommending friends from second-degree connections with Hadoop

https://www.jianshu.com/p/8707cd015ba1

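Problem description:

Given the QQ friend relations below, recommend friends. For example, 老王 and 二狗 are friends, and 二狗 is friends with both 春子 and 花朵, so 老王 may well also know 春子 or 花朵; either or both can be recommended to 老王 as friends.

Note: the two names on each line below are separated by a tab character (the Tab key), so the program splits each line on \t.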

老王	二狗
老王	二毛
二狗	春子
二狗	花朵
老王	花朵
花朵	老王
春子	菊花
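
The tab splitting described above can be sketched in plain Java as follows. This is only an illustration: the class `FriendLineParser` and its `parse` method are hypothetical names that do not appear in the original post, and the names are romanized (LaoWang for 老王, ErGou for 二狗) purely to keep the source ASCII. The sketch assumes a valid line holds exactly two non-empty names.

```java
import java.util.Optional;

class FriendLineParser {
    /** Splits "a<TAB>b" into {a, b}; returns empty for malformed lines. */
    static Optional<String[]> parse(String line) {
        String[] parts = line.split("\t");
        if (parts.length != 2
                || parts[0].trim().isEmpty()
                || parts[1].trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { parts[0].trim(), parts[1].trim() });
    }

    public static void main(String[] args) {
        // A well-formed line: two names separated by a tab
        System.out.println(parse("LaoWang\tErGou").isPresent()); // prints true
        // The tab is missing, so the line is rejected
        System.out.println(parse("LaoWangErGou").isPresent());   // prints false
    }
}
```

Rejecting malformed lines up front matters because a line without a tab would otherwise cause an ArrayIndexOutOfBoundsException when the second name is read.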

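Analysis:
primary ---> secondary
secondary ---> primary
List every relation as given, then list each one again in reverse (secondary --> primary).
After deduplication, each person has a set of direct friends. Taking the Cartesian product of that set with itself (skipping pairs of a person with themselves) yields the possible relations.
For example:
老王 --> 二狗
二狗 --> 老王
This is one relation written in both directions.
A set can then be built for 二狗.
Listing everything that involves 二狗:
老王	二狗
二狗	老王
二狗	春子
二狗	花朵
After merging, 二狗's set consists of 老王, 春子 and 花朵; the Cartesian product over the elements of this set gives the candidate pairs.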
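
The two steps of the analysis can be tried out in memory before moving to MapReduce. The following is a minimal sketch, not the author's code: the class `SecondDegreeSketch` and its methods are hypothetical, the names are romanized (LaoWang, ErGou, ChunZi, HuaDuo) to keep the source ASCII, and sorted collections are used only so that the printed order is predictable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class SecondDegreeSketch {
    /** Step 1: record every relation in both directions, deduplicated per person. */
    static Map<String, Set<String>> friendSets(List<String[]> relations) {
        Map<String, Set<String>> sets = new TreeMap<>();
        for (String[] r : relations) {
            sets.computeIfAbsent(r[0], k -> new TreeSet<>()).add(r[1]);
            sets.computeIfAbsent(r[1], k -> new TreeSet<>()).add(r[0]);
        }
        return sets;
    }

    /** Step 2: all ordered pairs of distinct members of one person's friend set. */
    static List<String> pairs(Set<String> friends) {
        List<String> out = new ArrayList<>();
        for (String a : friends) {
            for (String b : friends) {
                if (!a.equals(b)) {
                    out.add(a + "\t" + b);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> relations = Arrays.asList(
            new String[] {"LaoWang", "ErGou"},
            new String[] {"ErGou", "ChunZi"},
            new String[] {"ErGou", "HuaDuo"});
        Map<String, Set<String>> sets = friendSets(relations);
        // ErGou's merged friend set
        System.out.println(sets.get("ErGou")); // prints [ChunZi, HuaDuo, LaoWang]
        // Six ordered candidate pairs drawn from that set
        pairs(sets.get("ErGou")).forEach(System.out::println);
    }
}
```

In the MapReduce version, step 1 corresponds to the mapper plus the shuffle that groups values by key, and step 2 corresponds to the reducer.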

Implementation:

Mapper: split each relation into primary -> secondary and secondary -> primary


package com.topwqp.mr;

import java.io.IOException;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class QQMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Split on the tab character
        String[] lineDatas = line.split("\t");
        // Skip blank or malformed lines that do not contain two names
        if (lineDatas.length < 2) {
            return;
        }
        // Emit the relation in both directions
        context.write(new Text(lineDatas[0]), new Text(lineDatas[1]));
        context.write(new Text(lineDatas[1]), new Text(lineDatas[0]));
    }
}

Reducer: deduplicate and take the Cartesian product


package com.topwqp.mr;

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.io.Text;

public class QQReduce extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // First deduplicate the friends of this key
        Set<String> set = new HashSet<String>();
        for (Text t : values) {
            set.add(t.toString());
        }
        // Take the Cartesian product of the set with itself. A set with a
        // single element yields no pairs, so nothing is emitted for it.
        for (String name : set) {
            for (String other : set) {
                // Exclude pairing a person with themselves
                if (!name.equals(other)) {
                    context.write(new Text(name), new Text(other));
                }
            }
        }
    }
}
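
One thing the reducer above does not check is whether the two people in an emitted pair are already direct friends. In the sample data 老王 and 花朵 are both in 二狗's set and are also friends of each other, so that pair is still emitted, and every pair appears in both orders. A possible follow-up step, sketched here in plain Java rather than as a second MapReduce job, drops direct friends and keeps each unordered pair once. The class `RecommendationFilter` and its methods are hypothetical and not part of the original post; names are romanized to keep the source ASCII, and the sketch assumes names never contain a tab.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RecommendationFilter {
    /** Canonical key for an unordered pair: smaller name first, tab-separated. */
    static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "\t" + b : b + "\t" + a;
    }

    /** Keeps candidate pairs that are not direct friends, once per unordered pair. */
    static Set<String> filter(List<String[]> candidates, List<String[]> directFriends) {
        Set<String> direct = new HashSet<>();
        for (String[] d : directFriends) {
            direct.add(key(d[0], d[1]));
        }
        Set<String> result = new LinkedHashSet<>();
        for (String[] c : candidates) {
            String k = key(c[0], c[1]);
            if (!direct.contains(k)) {
                result.add(k);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> direct = Arrays.asList(
            new String[] {"LaoWang", "ErGou"},
            new String[] {"ErGou", "ChunZi"},
            new String[] {"ErGou", "HuaDuo"},
            new String[] {"LaoWang", "HuaDuo"});
        // Pairs as the reducer would emit them for ErGou's friend set
        List<String[]> candidates = Arrays.asList(
            new String[] {"ChunZi", "HuaDuo"},
            new String[] {"ChunZi", "LaoWang"},
            new String[] {"HuaDuo", "ChunZi"},
            new String[] {"HuaDuo", "LaoWang"},
            new String[] {"LaoWang", "ChunZi"},
            new String[] {"LaoWang", "HuaDuo"});
        // Two lines remain: ChunZi/HuaDuo and ChunZi/LaoWang
        filter(candidates, direct).forEach(System.out::println);
    }
}
```

Sorting the two names inside `key` is what lets one set lookup cover both orderings of a pair.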

Writing the job runner (JobRun)


package com.topwqp.mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class QQJobRun {
    public static void main(String[] args) {
        // The key/value pairs set on the Configuration should match those in conf/mapred-site.xml
        Configuration conf = new Configuration();
        conf.set("mapred.job.tracker", "localhost:9001");
        conf.addResource(new Path("/Users/wangqiupeng/Documents/xplan/bigdata/hadoop-1.2.1/conf/core-site.xml"));
        conf.addResource(new Path("/Users/wangqiupeng/Documents/xplan/bigdata/hadoop-1.2.1/conf/hdfs-site.xml"));
        conf.set("mapred.jar", "/Users/wangqiupeng/Downloads/qq.jar");
        try {
            Job job = new Job(conf);
            job.setJobName("qq");
            // This class is the entry point of the job
            job.setJarByClass(QQJobRun.class);
            // Mapper class
            job.setMapperClass(QQMapper.class);
            // Reducer class
            job.setReducerClass(QQReduce.class);
            // Map output types
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            // Final output types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            // Number of reduce tasks; the default is one
            job.setNumReduceTasks(1);
            // Directory or file holding the MapReduce input data
            FileInputFormat.addInputPath(job, new Path("/Users/wangqiupeng/Documents/xplan/bigdata/data/hadoop-1.2.1/input/qq/"));
            // Directory for the output data. Missing parent directories are created
            // automatically, but the output directory itself must not already exist.
            FileOutputFormat.setOutputPath(job, new Path("/Users/wangqiupeng/Documents/xplan/bigdata/data/hadoop-1.2.1/output/qq/"));

            // Wait for the job to finish, then exit
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            e.printStackTrace();
            // Report failure instead of falling through with exit code 0
            System.exit(1);
        }
    }
}

Execution result:

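Author: topwqp
Link: https://www.jianshu.com/p/8707cd015ba1
Source: Jianshu (简书)
Copyright belongs to the author. For commercial reprints, contact the author for authorization; for non-commercial reprints, credit the source.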


Original address: https://www.cnblogs.com/xiaohanlin/p/8908124.html
