zoukankan      html  css  js  c++  java
  • Hadoop2.4.1入门实例:MaxTemperature


    注意:以下内容在2.x版本与1.x版本同样适用,已在2.4.1与1.2.0进行测试。

    一、前期准备

    1、创建伪分布Hadoop环境,请参考官方文档。或者http://blog.csdn.net/jediael_lu/article/details/38637277

    2、准备数据文件如下sample.txt:

    123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356
    123456798676231190101234567986762311901012345679867623119010123456798676231190101234562+01122934567890456
    123456798676231190201234567986762311901012345679867623119010123456798676231190101234562+02120234567893456
    123456798676231190401234567986762311901012345679867623119010123456798676231190101234561+00321234567803456
    123456798676231190101234567986762311902012345679867623119010123456798676231190101234561+00429234567903456
    123456798676231190501234567986762311902012345679867623119010123456798676231190101234561+01021134568903456
    123456798676231190201234567986762311902012345679867623119010123456798676231190101234561+01124234578903456
    123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+04121234678903456
    123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+00821235678903456


    二、编写代码

    1、创建Map

    package org.jediael.hadoopDemo.maxtemperature;
    
    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    public class MaxTemperatureMapper extends
    		Mapper<LongWritable, Text, Text, IntWritable> {
    	private static final int MISSING = 9999;
    
    	@Override
    	public void map(LongWritable key, Text value, Context context)
    			throws IOException, InterruptedException {
    		String line = value.toString();
    		String year = line.substring(15, 19);
    		int airTemperature;
    		if (line.charAt(87) == '+') { // parseInt doesn't like leading plus
    										// signs
    			airTemperature = Integer.parseInt(line.substring(88, 92));
    		} else {
    			airTemperature = Integer.parseInt(line.substring(87, 92));
    		}
    		String quality = line.substring(92, 93);
    		if (airTemperature != MISSING && quality.matches("[01459]")) {
    			context.write(new Text(year), new IntWritable(airTemperature));
    		}
    	}
    }
    

    2、创建Reduce

    package org.jediael.hadoopDemo.maxtemperature;
    
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    
    public class MaxTemperatureReducer extends
    		Reducer<Text, IntWritable, Text, IntWritable> {
    	@Override
    	public void reduce(Text key, Iterable<IntWritable> values, Context context)
    			throws IOException, InterruptedException {
    		int maxValue = Integer.MIN_VALUE;
    		for (IntWritable value : values) {
    			maxValue = Math.max(maxValue, value.get());
    		}
    		context.write(key, new IntWritable(maxValue));
    	}
    }

    3、创建main方法

    package org.jediael.hadoopDemo.maxtemperature;
    
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class MaxTemperature {
    	public static void main(String[] args) throws Exception {
    		if (args.length != 2) {
    			System.err
    					.println("Usage: MaxTemperature <input path> <output path>");
    			System.exit(-1);
    		}
    		Job job = new Job();
    		job.setJarByClass(MaxTemperature.class);
    		job.setJobName("Max temperature");
    		FileInputFormat.addInputPath(job, new Path(args[0]));
    		FileOutputFormat.setOutputPath(job, new Path(args[1]));
    		job.setMapperClass(MaxTemperatureMapper.class);
    		job.setReducerClass(MaxTemperatureReducer.class);
    		job.setOutputKeyClass(Text.class);
    		job.setOutputValueClass(IntWritable.class);
    		System.exit(job.waitForCompletion(true) ? 0 : 1);
    	}
    }
    

    4、导出成MaxTemp.jar,并上传至运行程序的服务器。


    三、运行程序

    1、创建input目录并将sample.txt复制到input目录

    hadoop fs -put sample.txt /

    2、运行程序

    export HADOOP_CLASSPATH=MaxTemp.jar

     hadoop org.jediael.hadoopDemo.maxtemperature.MaxTemperature /sample.txt output10

    注意输出目录不能已经存在,否则会创建失败。

    3、查看结果

    (1)查看结果

    [jediael@jediael44 code]$  hadoop fs -cat output10/*
    14/07/09 14:51:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    1901    42
    1902    212
    1903    412
    1904    32
    1905    102

    (2)运行时输出

    [jediael@jediael44 code]$  hadoop org.jediael.hadoopDemo.maxtemperature.MaxTemperature /sample.txt output10
    14/07/09 14:50:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/07/09 14:50:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    14/07/09 14:50:42 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    14/07/09 14:50:43 INFO input.FileInputFormat: Total input paths to process : 1
    14/07/09 14:50:43 INFO mapreduce.JobSubmitter: number of splits:1
    14/07/09 14:50:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1404888618764_0001
    14/07/09 14:50:44 INFO impl.YarnClientImpl: Submitted application application_1404888618764_0001
    14/07/09 14:50:44 INFO mapreduce.Job: The url to track the job: http://jediael44:8088/proxy/application_1404888618764_0001/
    14/07/09 14:50:44 INFO mapreduce.Job: Running job: job_1404888618764_0001
    14/07/09 14:50:57 INFO mapreduce.Job: Job job_1404888618764_0001 running in uber mode : false
    14/07/09 14:50:57 INFO mapreduce.Job:  map 0% reduce 0%
    14/07/09 14:51:05 INFO mapreduce.Job:  map 100% reduce 0%
    14/07/09 14:51:15 INFO mapreduce.Job:  map 100% reduce 100%
    14/07/09 14:51:15 INFO mapreduce.Job: Job job_1404888618764_0001 completed successfully
    14/07/09 14:51:16 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=94
                    FILE: Number of bytes written=185387
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=1051
                    HDFS: Number of bytes written=43
                    HDFS: Number of read operations=6
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters 
                    Launched map tasks=1
                    Launched reduce tasks=1
                    Data-local map tasks=1
                    Total time spent by all maps in occupied slots (ms)=5812
                    Total time spent by all reduces in occupied slots (ms)=7023
                    Total time spent by all map tasks (ms)=5812
                    Total time spent by all reduce tasks (ms)=7023
                    Total vcore-seconds taken by all map tasks=5812
                    Total vcore-seconds taken by all reduce tasks=7023
                    Total megabyte-seconds taken by all map tasks=5951488
                    Total megabyte-seconds taken by all reduce tasks=7191552
            Map-Reduce Framework
                    Map input records=9
                    Map output records=8
                    Map output bytes=72
                    Map output materialized bytes=94
                    Input split bytes=97
                    Combine input records=0
                    Combine output records=0
                    Reduce input groups=5
                    Reduce shuffle bytes=94
                    Reduce input records=8
                    Reduce output records=5
                    Spilled Records=16
                    Shuffled Maps =1
                    Failed Shuffles=0
                    Merged Map outputs=1
                    GC time elapsed (ms)=154
                    CPU time spent (ms)=1450
                    Physical memory (bytes) snapshot=303112192
                    Virtual memory (bytes) snapshot=1685733376
                    Total committed heap usage (bytes)=136515584
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters 
                    Bytes Read=954
            File Output Format Counters 
                    Bytes Written=43


  • 相关阅读:
    win10 visual studio 2017环境中安装CUDA8
    UEFI+GPT电脑Win10下安装openSUSE Leap 42.2双系统
    CentOS7安装PPTP
    Python基础-三元运算
    Python基础-字典dict
    Python基础-元组tuple
    Python基础-列表list
    Python基础-str类型
    Python基础-int类型方法
    Python基础-查看对象的类或对象所具备的功能
  • 原文地址:https://www.cnblogs.com/jediael/p/4304100.html
Copyright © 2011-2022 走看看