zoukankan      html  css  js  c++  java
  • MapReduce(2): How does Mapper work

    In the previous post, we've illustrated how Hadoop MapReduce prepares input for Mappers. Long story short, InputSplit convert physical storaged data into many logical unit, and each one will be processed by a RecordReader, who will generate input (K,V) pairs for Mapper. I used to be confused about how (K,V) pairs are generated, but actually it just breaks a 128M file into single lines (just an example), and each line is a (K,V) pair. A mapper process these pairs one by one untill the end of the file.

    A user-defined mapper, takes input (K,V) pairs from RecordReader, generate new key/value pair set at the output side.Usually we call the new (K,V) pairs as 'immediate (K,V) pairs'. For example: in the post (Using MapReduce on Azure), we define a Mapper as following:

    #!/usr/bin/env python
    """mapper.py"""
    
    import sys
    
    # input comes from STDIN (standard input)
    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        # split the line into words
        words = line.split()
        # increase counters
        for word in words:
            # write the results to STDOUT (standard output);
            # what we output here will be the input for the
            # Reduce step, i.e. the input for reducer.py
            #
            # tab-delimited; the trivial word count is 1
            print '%s	%s' % (word, 1)
    

     We can see, this mapper just breaks a line into words set, and ouput immediate (K,V) pairs, in which key is the word and value is 1.

    A funny but intuitive illustration for this process is cutting a car into pieces:

  • 相关阅读:
    版本控制:SVN中Branch/tag的使用 -摘自网络
    安卓手机修改hosts攻略-摘自网络
    Web Api 2 怎么支持 Session
    几种判断asp.net中session过期方法的比较
    MSDN在线
    JS监听关闭浏览器事件
    VS调试Ajax
    SQL Server LEFT Functions
    Sql Server REPLACE函数的使用;SQL中 patindex函数的用法
    EXCEL公式测试使用Substitute
  • 原文地址:https://www.cnblogs.com/rhyswang/p/10946727.html
Copyright © 2011-2022 走看看