  • 0006. The Underlying Principles of HDFS: Proxy Objects and RPC


    05-01-MapReduce Review


    05-02-Analyzing the WordCount Data Processing Flow


    Analyzing the WordCount data processing flow
    =============Map phase===============
    k1,v1	the input of the Map
    	k1 the byte offset of the line in the file :LongWritable
    	v1 the line of data read in :Text
    	
    k2,v2	the output of the Map
    	k2 a single word :Text
    	v2 a count of 1 recorded for each occurrence of the word :IntWritable
    
    =============Map phase===============
    //identical k2 keys are handled by the same Reduce call
    //this processing logic effectively implements SQL's
    1. group by (grouping)
    2. distinct (deduplication)
    //the element type of v3 is the same as v2; v3 is a collection whose values are the v2's
    =============Reduce phase============
    	k3,v3	the input of the Reduce
    	k3 :Text
    	v3 :IntWritable (delivered as an Iterable<IntWritable>)
    
    k4,v4	the output of the Reduce
    	k4 :Text
    	v4 :IntWritable
    	
    =============Reduce phase============
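
    To make the four stages concrete, here is a hedged trace of the sample run shown later in this note (the three input lines are inferred from the final word counts, and the k1 byte offsets are illustrative):

    Input file (3 lines):
    	I love Beijing
    	I love China
    	Beijing is the capital of China

    Map input    (k1,v1): (0, "I love Beijing")  (15, "I love China")  (28, "Beijing is the capital of China")
    Map output   (k2,v2): (I,1) (love,1) (Beijing,1) (I,1) (love,1) (China,1) (Beijing,1) (is,1) (the,1) (capital,1) (of,1) (China,1)
    Shuffle: identical k2 keys are grouped and routed to the same reduce call
    Reduce input (k3,v3): (Beijing,[1,1]) (China,[1,1]) (I,[1,1]) (capital,[1]) (is,[1]) (love,[1,1]) (of,[1]) (the,[1])
    Reduce output(k4,v4): (Beijing,2) (China,2) (I,2) (capital,1) (is,1) (love,2) (of,1) (the,1)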
    

    05-03-Developing the WC Mapper and Reducer

    Add the required dependency jars:
    	/root/training/hadoop-2.7.3/share/hadoop/common/*.jar
    	/root/training/hadoop-2.7.3/share/hadoop/common/lib/*.jar
    	
    	/root/training/hadoop-2.7.3/share/hadoop/mapreduce/*.jar
    	/root/training/hadoop-2.7.3/share/hadoop/mapreduce/lib/*.jar
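
    A hedged sketch of compiling and packaging the three classes directly on the server, with the jars above on the classpath (this assumes the classes sit in the default package; the jar name wc.jar matches the run transcript below):

    javac -cp "/root/training/hadoop-2.7.3/share/hadoop/common/*:/root/training/hadoop-2.7.3/share/hadoop/common/lib/*:/root/training/hadoop-2.7.3/share/hadoop/mapreduce/*:/root/training/hadoop-2.7.3/share/hadoop/mapreduce/lib/*" WordCount*.java

    # cfe creates the jar and sets Main-Class to WordCountMain in the manifest,
    # so that "hadoop jar wc.jar <input> <output>" can run without naming the class
    jar cvfe wc.jar WordCountMain WordCount*.class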
    

    05-04-Developing the WC Main Program

    
    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    //Implements the Map logic
    //                                           k1           v1      k2      v2
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    
    	@Override
    	protected void map(LongWritable key1, Text value1, Context context)
    			throws IOException, InterruptedException {
    		/*
    		 * context: the context of the Map
    		 * upstream:   HDFS
    		 * downstream: the Reducer
    		 */
    		//get the line of text, e.g.   I love Beijing
    		String data = value1.toString();
    		
    		//split into words
    		String[] words = data.split(" ");
    		
    		//emit    k2     v2
    		for(String w:words){
    			//             k2            v2
    			context.write(new Text(w), new IntWritable(1));
    		}
    	}
    }
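
    A small optional refinement, not part of the original code, just a common Hadoop idiom: since context.write serializes the key and value immediately, the Mapper can reuse one Text and one IntWritable instead of allocating new objects for every word. A minimal sketch (the class name WordCountMapperReuse is made up for illustration):

    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    //Variant of WordCountMapper that reuses its output objects
    public class WordCountMapperReuse extends Mapper<LongWritable, Text, Text, IntWritable> {
    
    	//reusable output objects; context.write serializes their current contents on each call
    	private final Text word = new Text();
    	private final IntWritable one = new IntWritable(1);
    
    	@Override
    	protected void map(LongWritable key1, Text value1, Context context)
    			throws IOException, InterruptedException {
    		for(String w : value1.toString().split(" ")){
    			word.set(w);
    			context.write(word, one);
    		}
    	}
    }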
    
    
    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    
    //Implements the Reducer logic
    //                                             k3      v3         k4       v4
    public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    
    	@Override
    	protected void reduce(Text k3, Iterable<IntWritable> v3,Context context) throws IOException, InterruptedException {
    		/*
    		 * context is the context of the Reducer
    		 * upstream:   the Map
    		 * downstream: HDFS
    		 */
    		int total = 0;
    		for(IntWritable v:v3){
    			//sum the counts
    			total = total + v.get();
    		}
    		
    		//emit  k4  v4
    		context.write(k3, new IntWritable(total));
    	}
    }
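
    An optional addition, not in the original driver: because summing counts is associative, this same Reducer class can also be registered as a combiner, so each map task pre-aggregates its (k2,v2) pairs before the shuffle. The job counters further down show Combine input records=0, i.e. no combiner was configured in this run. One extra line in the main program would enable it:

    		//optional: pre-aggregate word counts on the map side before the shuffle
    		job.setCombinerClass(WordCountReducer.class);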
    
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    //Main program (job driver)
    public class WordCountMain {
    
    	public static void main(String[] args) throws Exception {
    		//1. Create a job
    		Job job = Job.getInstance(new Configuration());
    		job.setJarByClass(WordCountMain.class); //entry point of the job
    		
    		//2. Set the job's Mapper and the Map output types
    		job.setMapperClass(WordCountMapper.class);
    		job.setMapOutputKeyClass(Text.class);  //type of k2
    		job.setMapOutputValueClass(IntWritable.class);  //type of v2
    		
    		//3. Set the job's Reducer and the Reduce output types
    		job.setReducerClass(WordCountReducer.class);
    		job.setOutputKeyClass(Text.class); //type of k4
    		job.setOutputValueClass(IntWritable.class); //type of v4
    		
    		//4. Set the job's input path and output path
    		FileInputFormat.setInputPaths(job, new Path(args[0]));
    		FileOutputFormat.setOutputPath(job, new Path(args[1]));
    		
    		//5. Run the job
    		job.waitForCompletion(true);
    	}
    }
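
    The job log below prints a warning: "Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner...". A hedged sketch of the same driver rewritten that way (the class name WordCountTool is made up for illustration); ToolRunner parses Hadoop's generic options such as -D key=value before run() is called:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;
    
    public class WordCountTool extends Configured implements Tool {
    
    	@Override
    	public int run(String[] args) throws Exception {
    		Job job = Job.getInstance(getConf()); //getConf() carries options parsed by ToolRunner
    		job.setJarByClass(WordCountTool.class);
    
    		job.setMapperClass(WordCountMapper.class);
    		job.setMapOutputKeyClass(Text.class);
    		job.setMapOutputValueClass(IntWritable.class);
    
    		job.setReducerClass(WordCountReducer.class);
    		job.setOutputKeyClass(Text.class);
    		job.setOutputValueClass(IntWritable.class);
    
    		FileInputFormat.setInputPaths(job, new Path(args[0]));
    		FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
    		return job.waitForCompletion(true) ? 0 : 1;
    	}
    
    	public static void main(String[] args) throws Exception {
    		System.exit(ToolRunner.run(new Configuration(), new WordCountTool(), args));
    	}
    }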
    
    
    root@bigdata00:~# cd /root/temp/input
    root@bigdata00:~/temp/input# ls
    data.txt  wc.jar
    root@bigdata00:~/temp/input# hadoop jar /input/data.txt /output/mwc
    Not a valid JAR: /input/data.txt
    root@bigdata00:~/temp/input# start-all.sh
    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
    Starting namenodes on [192.168.16.143]
    192.168.16.143: starting namenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-namenode-bigdata00.out
    localhost: starting datanode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-datanode-bigdata00.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /root/training/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-bigdata00.out
    starting yarn daemons
    starting resourcemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-resourcemanager-bigdata00.out
    localhost: starting nodemanager, logging to /root/training/hadoop-2.7.3/logs/yarn-root-nodemanager-bigdata00.out
    root@bigdata00:~/temp/input# jps
    2992 SecondaryNameNode
    2662 DataNode
    3384 NodeManager
    3706 Jps
    2429 NameNode
    3149 ResourceManager
    root@bigdata00:~/temp/input# hadoop jar wc.jar /input/data.txt /output/wc
    20/10/18 07:01:19 INFO client.RMProxy: Connecting to ResourceManager at /192.168.16.143:8032
    20/10/18 07:01:20 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    20/10/18 07:01:21 INFO input.FileInputFormat: Total input paths to process : 1
    20/10/18 07:01:22 INFO mapreduce.JobSubmitter: number of splits:1
    20/10/18 07:01:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1602975652710_0001
    20/10/18 07:01:24 INFO impl.YarnClientImpl: Submitted application application_1602975652710_0001
    20/10/18 07:01:24 INFO mapreduce.Job: The url to track the job: http://192.168.16.143:8088/proxy/application_1602975652710_0001/
    20/10/18 07:01:24 INFO mapreduce.Job: Running job: job_1602975652710_0001
    20/10/18 07:01:48 INFO mapreduce.Job: Job job_1602975652710_0001 running in uber mode : false
    20/10/18 07:01:48 INFO mapreduce.Job:  map 0% reduce 0%
    20/10/18 07:02:01 INFO mapreduce.Job:  map 100% reduce 0%
    20/10/18 07:02:15 INFO mapreduce.Job:  map 100% reduce 100%
    20/10/18 07:02:17 INFO mapreduce.Job: Job job_1602975652710_0001 completed successfully
    20/10/18 07:02:17 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=138
                    FILE: Number of bytes written=237505
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=166
                    HDFS: Number of bytes written=55
                    HDFS: Number of read operations=6
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters 
                    Launched map tasks=1
                    Launched reduce tasks=1
                    Data-local map tasks=1
                    Total time spent by all maps in occupied slots (ms)=10530
                    Total time spent by all reduces in occupied slots (ms)=10790
                    Total time spent by all map tasks (ms)=10530
                    Total time spent by all reduce tasks (ms)=10790
                    Total vcore-milliseconds taken by all map tasks=10530
                    Total vcore-milliseconds taken by all reduce tasks=10790
                    Total megabyte-milliseconds taken by all map tasks=10782720
                    Total megabyte-milliseconds taken by all reduce tasks=11048960
            Map-Reduce Framework
                    Map input records=3
                    Map output records=12
                    Map output bytes=108
                    Map output materialized bytes=138
                    Input split bytes=106
                    Combine input records=0
                    Combine output records=0
                    Reduce input groups=8
                    Reduce shuffle bytes=138
                    Reduce input records=12
                    Reduce output records=8
                    Spilled Records=24
                    Shuffled Maps =1
                    Failed Shuffles=0
                    Merged Map outputs=1
                    GC time elapsed (ms)=350
                    CPU time spent (ms)=4560
                    Physical memory (bytes) snapshot=287481856
                    Virtual memory (bytes) snapshot=4438601728
                    Total committed heap usage (bytes)=138268672
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters 
                    Bytes Read=60
            File Output Format Counters 
                    Bytes Written=55
    root@bigdata00:~/temp/input# hdfs dfs -cat /output/wc
    cat: `/output/wc': Is a directory
    root@bigdata00:~/temp/input# hdfs dfs -cat /output/wc/part-r-00000
    Beijing 2
    China   2
    I       2
    capital 1
    is      1
    love    2
    of      1
    the     1
    
    
    Source code: day0907HelloMap
    (figure: 打包.png, a screenshot of packaging the jar; image not included)


    05-05-How Yarn Schedules a MapReduce Job

    (figure: Yarn调度MapReduce任务的过程.png, how Yarn schedules a MapReduce job; image not included)
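
    The figure itself is not reproduced here. As a rough outline (general YARN/MapReduce background, not taken from the original notes), the scheduling flow is:
    1. The client submits the job to the ResourceManager (the RMProxy connection at the start of the log above).
    2. The ResourceManager asks a NodeManager to launch a container running the MRAppMaster for this job.
    3. The MRAppMaster requests containers for the map and reduce tasks from the ResourceManager.
    4. NodeManagers launch those task containers, which run the Mapper and Reducer code.
    5. The tasks report progress to the MRAppMaster, which reports job status back to the ResourceManager and the client.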
