  • Python & MapReduce

    Implementing a Hadoop MapReduce program in Python

    For the original article, see:

    http://blog.csdn.net/zhaoyl03/article/details/8657031/

    Here I only ran mapper.py and reducer.py once on Windows; nothing was tested in an actual Hadoop environment.

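    Environment setup:

    1. Windows 7 (32-bit)
    2. Install GnuWin32 so that Linux commands can be run from cmd
    3. Install IDLE (Python GUI) so that Python scripts can be run
    4. Add the Python installation directory to the Windows PATH environment variable, so that after changing to the script's directory in a cmd window, a Python script can be run directly by typing its name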

    Python is installed at: C:\Python27\python.exe

    The test scripts are in: E:\PythonTest

    Added to the Windows PATH environment variable: C:\Python27
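
    A quick way to confirm that the PATH change took effect is to look for the directory among the PATH entries. This is only a minimal sketch: `dir_on_path` is a hypothetical helper name, and the case-insensitive comparison assumes Windows-style paths.

```python
import os


def dir_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if `directory` is one of the entries of a PATH-style string.

    `path_value` defaults to the current PATH; `sep` defaults to the
    platform separator (";" on Windows).
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Ignore trailing slashes and letter case (assumption: Windows paths).
    wanted = directory.rstrip("\\/").lower()
    entries = [p.strip().rstrip("\\/").lower() for p in path_value.split(sep)]
    return wanted in entries


if __name__ == "__main__":
    # Prints True once C:\Python27 has been added to PATH.
    print(dir_on_path(r"C:\Python27"))
```

    Open a new cmd window before checking, since windows that were already open keep the old value.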

    mapper.py :

     

    #!/usr/bin/env python

    import sys

    # input comes from STDIN (standard input)
    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        # split the line into words
        words = line.split()
        # emit one record per word
        for word in words:
            # write the results to STDOUT (standard output);
            # what we output here will be the input for the
            # Reduce step, i.e. the input for reducer.py
            #
            # tab-delimited; the trivial word count is 1
            # (the parenthesized print runs on Python 2.7 and 3)
            print('%s\t%s' % (word, 1))
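
    To see what the mapper emits without piping anything through stdin, the same per-line logic can be wrapped in a function. This is a minimal sketch; `map_line` is a hypothetical helper name and is not part of the original script.

```python
def map_line(line):
    """Return the tab-delimited records the mapper would print for one line."""
    records = []
    for word in line.strip().split():
        # Each word is emitted with a count of 1; no aggregation happens here.
        records.append('%s\t%s' % (word, 1))
    return records


if __name__ == "__main__":
    for record in map_line("foo bar foo"):
        print(record)
```

    Repeated words are emitted once per occurrence; adding them up is left entirely to the reducer.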

     

     

    reducer.py :

     

    #!/usr/bin/env python

    import sys

    current_word = None
    current_count = 0

    # input comes from STDIN
    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()

        # parse the input we got from mapper.py and
        # convert count (currently a string) to int
        try:
            word, count = line.split('\t', 1)
            count = int(count)
        except ValueError:
            # the line had no tab or count was not a number,
            # so silently ignore/discard this line
            continue

        # this IF-switch only works because Hadoop sorts map output
        # by key (here: word) before it is passed to the reducer
        if current_word == word:
            current_count += count
        else:
            if current_word is not None:
                # write result to STDOUT
                print('%s\t%s' % (current_word, current_count))
            current_count = count
            current_word = word

    # do not forget to output the last word if needed!
    if current_word is not None:
        print('%s\t%s' % (current_word, current_count))
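
    Because the reducer depends on its input being sorted by word, a local test run without Hadoop has to put a sort step between the two scripts. The sketch below simulates that map, sort, reduce chain in a single process. It is only an illustration under assumptions: the function names and the sample lines are made up, and it groups with `itertools.groupby` instead of the manual `current_word` bookkeeping used in reducer.py.

```python
from itertools import groupby


def map_words(lines):
    """Map step: yield a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reduce_sorted(pairs):
    """Reduce step: sum the counts of consecutive pairs that share a word.

    Like reducer.py, this only gives correct totals when the pairs
    arrive sorted by word.
    """
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, sort, reduce; sorted() stands in for Hadoop's sort by key."""
    return list(reduce_sorted(sorted(map_words(lines))))


if __name__ == "__main__":
    sample = ["foo foo quux", "labs foo bar quux"]  # made-up input
    for word, count in word_count(sample):
        print('%s\t%s' % (word, count))
```

    On the command line the same chain is usually obtained by piping the mapper's output through a sort command and then into the reducer.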

    Output: (screenshot not preserved)

  • Original article: https://www.cnblogs.com/kevin-yuan/p/4485143.html