  • MapReduce(2): How does Mapper work

    In the previous post, we illustrated how Hadoop MapReduce prepares input for Mappers. Long story short, InputSplits divide the physically stored data into many logical units, and each one is processed by a RecordReader, which generates the input (K,V) pairs for the Mapper. I used to be confused about how the (K,V) pairs are generated, but it actually just breaks, for example, a 128M file into single lines, and each line becomes a (K,V) pair. A mapper processes these pairs one by one until the end of the file.
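    To make the (K,V) generation concrete, here is a minimal Python sketch, not Hadoop's actual Java code. It assumes the default text input format, where the key is the byte offset of the line within the file and the value is the line itself. The name `read_records` and the sample data are hypothetical, used only for illustration.

```python
import io


def read_records(stream):
    """Yield (byte_offset, line) pairs from a binary stream, one pair per line.

    Simplified stand-in for a line-oriented RecordReader (an assumption,
    not Hadoop's real implementation).
    """
    offset = 0
    for raw in stream:
        # The key is where the line starts; the value is the line's text.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


# A tiny in-memory "split" standing in for a block of a large file.
split = io.BytesIO(b"hello world\nhello hadoop\n")
records = list(read_records(split))
for key, value in records:
    print(key, value)
```

    Each pair produced this way is handed to the mapper in turn. A word-count mapper only needs the value (the line) and ignores the key.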

    A user-defined mapper takes input (K,V) pairs from the RecordReader and generates a new set of key/value pairs on the output side. These new (K,V) pairs are usually called 'intermediate (K,V) pairs'. For example, in the post (Using MapReduce on Azure), we defined a Mapper as follows:

    #!/usr/bin/env python
    """mapper.py"""
    
    import sys
    
    # input comes from STDIN (standard input)
    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        # split the line into words
        words = line.split()
        # increase counters
        for word in words:
            # write the results to STDOUT (standard output);
            # what we output here will be the input for the
            # Reduce step, i.e. the input for reducer.py
            #
            # tab-delimited; the trivial word count is 1
                print('%s\t%s' % (word, 1))
    

    We can see that this mapper just breaks a line into a set of words and outputs intermediate (K,V) pairs, in which the key is the word and the value is 1.
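    As a quick check of what those intermediate pairs look like, the same per-line logic can be wrapped in a function and run on a sample line. This is only a sketch: `map_line` and the sample sentence are hypothetical and not part of the original mapper.py.

```python
def map_line(line):
    """Return the intermediate (word, 1) pairs for a single input line."""
    return [(word, 1) for word in line.strip().split()]


pairs = map_line("the quick the lazy\n")
for word, count in pairs:
    # Same tab-delimited format the mapper writes to STDOUT.
    print('%s\t%s' % (word, count))
```

    Note that the repeated word "the" is emitted twice, each time with the value 1. The mapper does no counting itself; adding up the values per key is left to the Reduce step.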

    A funny but intuitive way to picture this process is cutting a car into pieces.

  • Original article: https://www.cnblogs.com/rhyswang/p/10946727.html