  • MapReduce(2): How does Mapper work

    In the previous post, we illustrated how Hadoop MapReduce prepares input for Mappers. Long story short, InputSplits divide the physically stored data into many logical units, and each unit is processed by a RecordReader, which generates the input (K,V) pairs for a Mapper. I used to be confused about how the (K,V) pairs are generated, but in the common line-oriented case it simply breaks, say, a 128 MB file into single lines, and each line becomes one (K,V) pair. A mapper processes these pairs one by one until the end of the file.
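    To make the "each line is a (K,V) pair" idea concrete, here is a minimal sketch in plain Python. It is not Hadoop code: `read_records` is a hypothetical helper that only imitates what a line-oriented RecordReader does, on the assumption (true of Hadoop's default text input) that the key is the byte offset of the line and the value is the line's text.

```python
import io


def read_records(stream):
    """Yield (byte_offset, line_text) pairs, one per line of a binary stream.

    Hypothetical stand-in for a line-oriented RecordReader: the key is the
    offset at which the line starts, the value is the line without its
    trailing newline.
    """
    offset = 0
    for raw in stream:
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # the next line starts right after this one


# A tiny in-memory "split" standing in for a chunk of a large file.
split = io.BytesIO(b"hello world\nfoo bar\n")
records = list(read_records(split))

for key, value in records:
    print(key, value)
```

    With text input the key is usually ignored by the mapper; only the value, the line itself, is of interest.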

    A user-defined mapper takes input (K,V) pairs from the RecordReader and generates a new set of key/value pairs on the output side. We usually call these new (K,V) pairs 'intermediate (K,V) pairs'. For example, in the post (Using MapReduce on Azure), we defined a Mapper as follows:

    #!/usr/bin/env python
    """mapper.py"""
    
    import sys
    
    # input comes from STDIN (standard input)
    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        # split the line into words
        words = line.split()
        # increase counters
        for word in words:
            # write the results to STDOUT (standard output);
            # what we output here will be the input for the
            # Reduce step, i.e. the input for reducer.py
            #
            # tab-delimited; the trivial word count is 1
            print('%s\t%s' % (word, 1))
    

    We can see that this mapper just breaks a line into a set of words and outputs intermediate (K,V) pairs, in which the key is the word and the value is 1.
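    The same logic can be tried without a Hadoop cluster. The sketch below wraps the per-line work in a function, `map_line`, which is a name introduced here for illustration and is not part of the original script, and returns the pairs as a list instead of printing them.

```python
def map_line(line):
    """Return the intermediate (word, 1) pairs for one input line."""
    return [(word, 1) for word in line.strip().split()]


pairs = map_line("the quick the lazy\n")

for word, count in pairs:
    # Same tab-delimited format the streaming mapper writes to STDOUT.
    print('%s\t%s' % (word, count))
```

    Note that the repeated word "the" yields two separate ("the", 1) pairs. The mapper does no counting of its own; adding up the 1s for each key is left to the Reduce step.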

    A funny but intuitive illustration of this process is cutting a car into pieces.

  • Original article: https://www.cnblogs.com/rhyswang/p/10946727.html