  • Hadoop: The Definitive Guide, Chapter 2: MapReduce

    MapReduce

    MapReduce is a programming model for data processing. The model is simple, yet not too simple to express useful programs in. Hadoop can run MapReduce programs written in various languages; in this chapter, we shall look at the same program expressed in Java, Ruby, Python, and C++. Most important, MapReduce programs are inherently parallel, thus putting very large-scale data analysis into the hands of anyone with enough machines at their disposal. MapReduce comes into its own for large datasets, so let's start by looking at one.

    2.1 Analyzing the Data with Hadoop

    To take advantage of the parallel processing that Hadoop provides, we need to express our query as a MapReduce job. After some local, small-scale testing, we will be able to run it on a cluster of machines.

    Note: the query has to be written as a MapReduce job to benefit from Hadoop's parallel processing; once it passes a small local test, it can be run on a cluster.

    2.2 Map and Reduce

    MapReduce works by breaking the processing into two phases: the map phase and the reduce phase.

    Each phase has key-value pairs as input and output, the types of which may be chosen by the programmer. The programmer also specifies two functions: the map function and the reduce function.

    Key points: the map function and the reduce function; key-value pairs as both input and output.
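
    The two phases can be illustrated with a minimal sketch in plain Python. It is not Hadoop code and does not use any Hadoop API: the word-count task, the sample records, and the names map_fn, reduce_fn, and run_mapreduce are all assumptions made for illustration. The grouping step in the middle is a stand-in for the work a real framework does between the map and reduce phases.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map function: (line offset, line text) -> (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce function: (word, list of counts) -> (word, total)."""
    yield key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical single-process driver, for illustration only."""
    # Map phase: every input key-value pair produces intermediate pairs.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))

    # Group intermediate values by key (stand-in for the framework's work).
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: each key and its grouped values produce output pairs.
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output


# Made-up input records: (byte offset of the line, line text).
records = [
    (0, "the quick brown fox"),
    (20, "the lazy dog"),
    (33, "The fox"),
]
result = run_mapreduce(records, map_fn, reduce_fn)
print(result)
```

    The programmer chooses the types on both sides: here the map input is (int, str), the intermediate and output pairs are (str, int).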

    2.3 Scaling Out

    Data Flow

    A MapReduce job is a unit of work that the client wants to be performed: it consists of the input data, the MapReduce program, and configuration information. Hadoop runs the job by dividing it into tasks, of which there are two types: map tasks and reduce tasks.

    A job is a unit of work run by the client. It consists of the input data, the program, and configuration information.
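
    As a rough sketch of the idea, a job can be modeled as those three parts and then divided into map tasks and reduce tasks. Everything below is a toy, single-process model in plain Python, not Hadoop's implementation: the Job class, the run_job and partition helpers, the config keys split_size and num_reducers, and the sample readings are hypothetical names and data chosen for illustration. Real Hadoop runs its tasks across the machines of a cluster; here they run one after another in a loop.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Job:
    input_data: list                             # the input data
    mapper: Callable                             # the program: map function
    reducer: Callable                            # the program: reduce function
    config: dict = field(default_factory=dict)   # configuration information


def partition(key, num_reducers):
    """Pick a reduce task for a key (deterministic across runs)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def run_job(job):
    # Hypothetical config keys, not real Hadoop settings.
    split_size = job.config.get("split_size", 2)
    num_reducers = job.config.get("num_reducers", 2)

    # Divide the input into slices; each slice becomes one map task.
    splits = [
        job.input_data[i:i + split_size]
        for i in range(0, len(job.input_data), split_size)
    ]

    # One bucket of grouped values per reduce task.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:                 # each iteration models one map task
        for record in split:
            for key, value in job.mapper(record):
                buckets[partition(key, num_reducers)][key].append(value)

    output = {}
    for bucket in buckets:               # each iteration models one reduce task
        for key, values in bucket.items():
            output[key] = job.reducer(key, values)
    return len(splits), num_reducers, output


# Made-up readings: (sensor id, value). The job finds the maximum per sensor.
job = Job(
    input_data=[("a", 3), ("b", 7), ("a", 9), ("b", 1), ("c", 4)],
    mapper=lambda record: [record],
    reducer=lambda key, values: max(values),
    config={"split_size": 2, "num_reducers": 2},
)
map_tasks, reduce_tasks, output = run_job(job)
print(map_tasks, reduce_tasks, output)
```

    With five records and a split size of 2, this toy job is divided into three map tasks and two reduce tasks.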

  • Original article: https://www.cnblogs.com/bhlsheji/p/4079804.html