Example: Running Code on a Hadoop 1.x Cluster

The cluster has one master and two slaves, with the IP addresses 192.168.1.2, 192.168.1.3, and 192.168.1.4 respectively. The Hadoop version is 1.2.1.

I. Start Hadoop

Go to Hadoop's bin directory and start the daemons (on Hadoop 1.x this is typically done with start-all.sh).

II. Create the data file and upload it to HDFS

1. Under /home/hadoop, create a directory named file, and inside it create a file named hadoop_02

cd /home/hadoop

mkdir file

cd file

2. Write the data:

The data format is:

2012-3-1 a

2012-3-2 b

2012-3-3 c

2012-3-4 d

2012-3-5 a

2012-3-6 b

2012-3-7 c

2012-3-3 c

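You can copy and paste these lines repeatedly to increase the data volume.

(No data for learning Hadoop? Crawl it with Nutch, collect it with a paid scraping tool, or generate simulated data as needed.)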
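As an alternative to copying and pasting, the "generate simulated data" option can be sketched in plain Java. This is a minimal sketch, not part of the original walkthrough: the class name SampleDataGenerator, the fixed seed, the default of 1000 lines, and the value ranges (days 1 to 7, letters a to d) are all assumptions chosen to mimic the sample above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SampleDataGenerator {

    // Builds `count` lines shaped like "2012-3-5 a": a day in March 2012 plus a letter.
    // The value ranges and the seed are illustrative assumptions.
    static List<String> generate(int count, long seed) {
        Random random = new Random(seed);
        List<String> lines = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int day = 1 + random.nextInt(7);                 // days 1..7, as in the sample
            char letter = (char) ('a' + random.nextInt(4));  // letters a..d
            lines.add("2012-3-" + day + " " + letter);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // The output file name and the line count are hypothetical defaults.
        Path target = Paths.get(args.length > 0 ? args[0] : "hadoop_02");
        int count = args.length > 1 ? Integer.parseInt(args[1]) : 1000;
        Files.write(target, generate(count, 42L), StandardCharsets.UTF_8);
        System.out.println("Wrote " + count + " lines to " + target);
    }
}
```

Because only 28 distinct lines are possible with these ranges, any larger file is guaranteed to contain duplicates, which is what a deduplication job needs as input.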

3. Upload to HDFS

(1) If HDFS has no input directory, create one

hadoop fs -mkdir input

(2) List the files in HDFS

hadoop fs -ls

(3) Upload hadoop_02 into input

hadoop fs -put ~/file/hadoop_02 input

(4) List the contents of input

hadoop fs -ls input

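4. In Eclipse, open the file hadoop_02 that was just uploaded to HDFS and check its contents.

5. Create a MapReduce project and add the code:

The data deduplication code is as follows: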

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Dedup {

    // Shared empty value; only the keys matter for deduplication
    private static final Text EMPTY = new Text("");

    // map copies the input value (one line of text) to the output key and emits it directly
    public static class Map extends Mapper<Object, Text, Text, Text> {

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, EMPTY);
        }
    }

    // reduce copies the input key to the output key and emits it directly;
    // identical lines arrive grouped under one key, so each line is written once
    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, EMPTY);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // This line is essential: it points the job at the JobTracker on the master
        conf.set("mapred.job.tracker", "192.168.1.2:9001");

        // The HDFS input and output paths are hard-coded here, so the command-line
        // arguments in args are ignored; change them to match your HDFS directories
        String[] ioArgs = new String[] {"dedup_in", "dedup_out"};
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Data Deduplication <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "Data Deduplication");
        job.setJarByClass(Dedup.class);

        // Set the Map, Combine, and Reduce classes
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        // Set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
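To see why this job removes duplicates, it may help to imitate the three phases without a cluster. The following is a rough sketch in plain Java, not the way Hadoop is implemented: the class LocalDedupSketch and its method names are hypothetical, and a TreeMap stands in for the shuffle step, which groups equal keys together and sorts them before they reach the reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalDedupSketch {

    // Map phase: emit each input line as a key with an empty value.
    // Shuffle (simulated): the TreeMap groups equal keys and keeps them sorted.
    static Map<String, List<String>> mapAndShuffle(List<String> inputLines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : inputLines) {
            grouped.computeIfAbsent(line, k -> new ArrayList<>()).add("");
        }
        return grouped;
    }

    // Reduce phase: write each key once and ignore its list of values.
    static List<String> reduce(Map<String, List<String>> grouped) {
        return new ArrayList<>(grouped.keySet());
    }

    static List<String> dedup(List<String> inputLines) {
        return reduce(mapAndShuffle(inputLines));
    }

    public static void main(String[] args) {
        // The eight sample lines from step 2; "2012-3-3 c" appears twice.
        List<String> input = Arrays.asList(
                "2012-3-1 a", "2012-3-2 b", "2012-3-3 c", "2012-3-4 d",
                "2012-3-5 a", "2012-3-6 b", "2012-3-7 c", "2012-3-3 c");
        for (String line : dedup(input)) {
            System.out.println(line);
        }
    }
}
```

Run on the eight sample lines, the sketch prints seven distinct lines, because the repeated "2012-3-3 c" collapses into a single key. The same grouping is why Reduce can also serve as the combiner in the job above: writing a key once per group is safe to apply more than once.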

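6. Run the code

Right-click the project class to run it

Set the HDFS input and output paths

Check the job's progress in the console output

View the hadoop_22 file in output to see the result

7. Shut down Hadoop (on Hadoop 1.x this is typically done with stop-all.sh in the bin directory)

The code run is now complete.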

Original article: https://www.cnblogs.com/baolibin528/p/4004707.html