MapReduce(2): How does Mapper work

In the previous post, we illustrated how Hadoop MapReduce prepares input for Mappers. Long story short, InputSplits divide the physically stored data into many logical units, and each unit is processed by a RecordReader, which generates the input (K,V) pairs for a Mapper. I used to be confused about how the (K,V) pairs are generated, but it actually just breaks a file (say, 128 MB) into single lines, and each line becomes one (K,V) pair. A mapper processes these pairs one by one until the end of the file.
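To make the line-to-pair step concrete, here is a minimal sketch in plain Python. It assumes the behavior of Hadoop's default text input, where the key is the byte offset of the line and the value is the line's text. The name `read_records` and the sample data are hypothetical, and this only imitates the idea; it is not Hadoop's actual RecordReader.

```python
import io


def read_records(stream):
    """Yield (byte_offset, line) pairs, one per line of a binary stream."""
    offset = 0
    for raw in stream:
        # The key is where the line starts; the value is the line itself,
        # without its trailing newline.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


# A tiny in-memory stand-in for one input split (hypothetical data).
split = io.BytesIO(b"hello world\nhello mapreduce\n")
records = list(read_records(split))

for key, value in records:
    print(key, value)
```

Each printed pair is what a mapper would receive as one input record. For word counting the key is usually ignored and only the line text matters.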

A user-defined mapper takes the input (K,V) pairs from the RecordReader and generates a new set of key/value pairs on the output side. We usually call these new (K,V) pairs 'intermediate (K,V) pairs'. For example, in the post (Using MapReduce on Azure), we defined a Mapper as follows:
    #!/usr/bin/env python
    """mapper.py"""

    import sys

    # input comes from STDIN (standard input)
    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        # split the line into words
        words = line.split()
        # increase counters
        for word in words:
            # write the results to STDOUT (standard output);
            # what we output here will be the input for the
            # Reduce step, i.e. the input for reducer.py
            #
            # tab-delimited; the trivial word count is 1
            print('%s\t%s' % (word, 1))
As we can see, this mapper just breaks a line into a set of words and outputs intermediate (K,V) pairs, in which the key is the word and the value is 1.
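The same logic can be tried without Hadoop by wrapping it in a function and feeding it one line. This is only a sketch: `map_line` and the sample sentence are hypothetical names and data, not part of the original script.

```python
def map_line(line):
    """Return the intermediate (word, 1) pairs for one input line."""
    return [(word, 1) for word in line.strip().split()]


# One sample input line (hypothetical data).
pairs = map_line("the quick the lazy")

for word, count in pairs:
    # Same tab-delimited format the mapper writes to STDOUT.
    print('%s\t%s' % (word, count))
```

Note that "the" is emitted twice, each time with a count of 1. The mapper does not add counts together; summing the values per key is left to the Reduce step.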

A funny but intuitive illustration of this process is cutting a car into pieces.

Original article: https://www.cnblogs.com/rhyswang/p/10946727.html