  • [Hadoop Test Program] Writing a MapReduce Job to Test the Hadoop Environment


    • We use the Hadoop environment set up earlier; see:
    《【Hadoop环境搭建】Centos6.8搭建hadoop伪分布模式》http://www.cnblogs.com/ssslinppp/p/5923793.html
    • The example program is the max-temperature example from Hadoop: The Definitive Guide, 3rd Edition.

    Data Preparation

    The input data file is sample.txt:


    0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
    0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
    0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
    0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
    0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999

    Upload sample.txt to HDFS:


    hadoop fs -put /home/hadoop/ncdcData/sample.txt input
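    To verify the upload, you can list the directory; this assumes the default HDFS home directory /user/hadoop, so the relative path input resolves to /user/hadoop/input:

    hadoop fs -ls input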


    Project Structure
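    The original screenshot of the project layout is not reproduced here; based on the package name and the pom.xml below, a standard Maven layout would look like this (a reconstruction, not the original image):

    MapReduceTest/
      pom.xml
      src/main/java/com/ll/maxTemperature/
        MaxTemperature.java
        MaxTemperatureMapper.java
        MaxTemperatureReducer.java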


    The MaxTemperatureMapper Class

    package com.ll.maxTemperature;

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MaxTemperatureMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        private static final int MISSING = 9999;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String year = line.substring(15, 19);
            int airTemperature;
            if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
                airTemperature = Integer.parseInt(line.substring(88, 92));
            } else {
                airTemperature = Integer.parseInt(line.substring(87, 92));
            }
            String quality = line.substring(92, 93);
            if (airTemperature != MISSING && quality.matches("[01459]")) {
                context.write(new Text(year), new IntWritable(airTemperature));
            }
        }
    }
    // ^^ MaxTemperatureMapper
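    The fixed-width offsets come from the NCDC record format: characters 15-18 hold the year, 87-91 the signed air temperature in tenths of a degree Celsius, and character 92 the quality code. A minimal plain-Java sketch (no Hadoop required; the class name is illustrative) to sanity-check the parsing against the first sample record:

    public class ParseCheck {
        public static void main(String[] args) {
            String line = "0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999";
            String year = line.substring(15, 19);              // "1950"
            int airTemperature = line.charAt(87) == '+'
                    ? Integer.parseInt(line.substring(88, 92)) // skip the leading '+'
                    : Integer.parseInt(line.substring(87, 92));
            String quality = line.substring(92, 93);           // "1" = reading passed quality checks
            System.out.println(year + " " + airTemperature + " q=" + quality); // prints: 1950 0 q=1
        }
    }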

    The MaxTemperatureReducer Class

    package com.ll.maxTemperature;

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class MaxTemperatureReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int maxValue = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            context.write(key, new IntWritable(maxValue));
        }
    }
    // ^^ MaxTemperatureReducer
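    Applied to the five sample records, the mapper emits (1950, 0), (1950, 22), (1950, -11), (1949, 111), and (1949, 78), with temperatures in tenths of a degree Celsius. The reducer keeps the maximum per year, so the job should output:

    1949    111
    1950    22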

    The MaxTemperature Class (Main Entry Point)

    package com.ll.maxTemperature;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MaxTemperature {
        public static void main(String[] args) throws Exception {
            if (args.length != 2) {
                // fall back to default input/output paths when none are given
                args = new String[] {
                        "hdfs://localhost:9000/user/hadoop/input/sample.txt",
                        "hdfs://localhost:9000/user/hadoop/out2" };
            }
            Job job = new Job(); // defines the job specification
            job.setJarByClass(MaxTemperature.class);
            job.setJobName("Max temperature");
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // where the reduce output is written
            job.setMapperClass(MaxTemperatureMapper.class);
            job.setCombinerClass(MaxTemperatureReducer.class); // max is associative, so the reducer doubles as combiner
            job.setReducerClass(MaxTemperatureReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    // ^^ MaxTemperature
    Notes:
    The input path is hdfs://localhost:9000/user/hadoop/input/sample.txt, which consists of two parts:
    1. hdfs://localhost:9000/;
    2. /user/hadoop/input/sample.txt
    The hdfs://localhost:9000/ part is set in core-site.xml:
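    The original post showed this configuration as a screenshot; for a pseudo-distributed Hadoop 1.x setup, the relevant core-site.xml entry typically looks like this (a sketch, not the exact file from the post):

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>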

    The /user/hadoop/input/sample.txt part is the HDFS path where sample.txt was stored during data preparation above.

    The output path is hdfs://localhost:9000/user/hadoop/out2.
    Note that this output path must not already exist when the MapReduce job runs; otherwise the job fails.
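    If a previous run left the output directory behind, it can be removed first (Hadoop 1.x shell syntax; later versions use hadoop fs -rm -r):

    hadoop fs -rmr hdfs://localhost:9000/user/hadoop/out2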

    pom.xml

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com.ll</groupId>
      <artifactId>MapReduceTest</artifactId>
      <version>0.0.1-SNAPSHOT</version>
      <packaging>jar</packaging>
      <name>MapReduceTest</name>
      <url>http://maven.apache.org</url>
      <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <hadoopVersion>1.2.1</hadoopVersion>
        <junit.version>3.8.1</junit.version>
      </properties>
      <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>${junit.version}</version>
          <scope>test</scope>
        </dependency>
        <!-- Hadoop -->
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-core</artifactId>
          <version>${hadoopVersion}</version>
        </dependency>
      </dependencies>
    </project>

    Testing the Program

    Preparing the Hadoop Environment

    We use the Hadoop environment set up earlier; see:
    《【Hadoop环境搭建】Centos6.8搭建hadoop伪分布模式》http://www.cnblogs.com/ssslinppp/p/5923793.html

    Generating the jar

    Build the project into a jar, named mc.jar in the commands below (the original post showed this step as a series of IDE screenshots):
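    Since the project is Maven-based, an equivalent command-line build is possible; a sketch, assuming the default Maven artifact name is then copied to mc.jar:

    mvn clean package
    cp target/MapReduceTest-0.0.1-SNAPSHOT.jar mc.jar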




    Uploading to the Server and Running the Test



    Using the default input/output paths:

    hadoop jar mc.jar


    Specifying the input and output paths explicitly:

    hadoop jar /home/hadoop/jars/mc.jar hdfs://localhost:9000/user/hadoop/input/sample.txt hdfs://localhost:9000/user/hadoop/out5
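    After the job finishes, the result can be inspected; with a single reducer, the new-API output file is named part-r-00000:

    hadoop fs -cat hdfs://localhost:9000/user/hadoop/out5/part-r-00000

    For the sample data this should print the per-year maxima shown earlier (1949 111, 1950 22).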





