Implementing a Hadoop MapReduce Program in Python
Original article:
http://blog.csdn.net/zhaoyl03/article/details/8657031/
Below, mapper.py and reducer.py were simply run on Windows; they were not tested in a Hadoop environment.
Environment setup:
- Windows 7, 32-bit
- Install GnuWin32, so that Linux commands can be run from cmd
- Install IDLE (Python GUI), so that Python scripts can be executed
- Add the Python installation path to the Windows environment variables, so that after switching to the directory containing a Python script in a cmd window, the script can be run directly by typing its name
My Python is installed at: C:\Python27\python.exe
The test scripts are in: E:\PythonTest
Added to the Windows environment variables: C:\Python27
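As a quick check that the PATH change took effect, the directory can also be appended for the current cmd session only (a sketch, not from the original post; a permanent change is made through the System Properties dialog as described above):

set PATH=%PATH%;C:\Python27
python -V
REM should report the installed version, e.g. Python 2.7.x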
mapper.py :
#!/usr/bin/env python
import sys

# input comes from STDIN (standard input)
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # increase counters
    for word in words:
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        print '%s\t%s' % (word, 1)
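The mapper can be tried on its own from cmd before anything else is wired up; the sample words below are made up for illustration:

cd /d E:\PythonTest
echo foo foo quux | python mapper.py

Each input word should come back on its own line as word<TAB>1, i.e. foo 1, foo 1, quux 1.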
reducer.py :
#!/usr/bin/env python
from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

# input comes from STDIN
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()

    # parse the input we got from mapper.py
    word, count = line.split('\t', 1)

    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently
        # ignore/discard this line
        continue

    # this IF-switch only works because Hadoop sorts map output
    # by key (here: word) before it is passed to the reducer
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # write result to STDOUT
            print '%s\t%s' % (current_word, current_count)
        current_count = count
        current_word = word

# do not forget to output the last word if needed!
if current_word == word:
    print '%s\t%s' % (current_word, current_count)
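Because both scripts read STDIN and write STDOUT, the whole Hadoop Streaming flow can be imitated locally by chaining them with sort (GnuWin32's sort or the Windows built-in both work here); the sample sentence is made up for illustration:

echo the quick brown fox jumps over the lazy dog the | python mapper.py | sort | python reducer.py

The sort step in the middle is essential: reducer.py only accumulates a count while consecutive lines share the same word, which mirrors how Hadoop sorts map output by key before handing it to the reducer.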
Output: