zoukankan      html  css  js  c++  java
  • Python & MapReduce

    使用Python实现Hadoop MapReduce程序

     

    原文请参考:

    http://blog.csdn.net/zhaoyl03/article/details/8657031/

    下面只是将mapper.py和reducer.py在windows上运行了一遍,没有用Hadoop的环境去测试。

    环境准备:

    1. Window 7 – 32
    2. 安装GunWin32,使得Linux命令可以在cmd上执行
    3. 安装IDLE (Python GUI),使得Python脚本可以执行
    4. 将Python的安装路径添加到windows的环境变量中,使得在cmd窗口中切换到Python脚本所在目录时,通过输入脚本名,可以直接执行Python脚本

    我的Python安装在: C:Python27python.exe下

    测试脚本放在: E:PythonTest下

    windows环境变量中增加:C:Python27

    mapper.py :

     

    #!/usr/bin/env python  
      
    import sys  
      
    # input comes from STDIN (standard input)  
    for line in sys.stdin:  
        # remove leading and trailing whitespace  
        line = line.strip()  
        # split the line into words  
        words = line.split()  
        # increase counters  
        for word in words:  
            # write the results to STDOUT (standard output);  
            # what we output here will be the input for the  
            # Reduce step, i.e. the input for reducer.py  
            #  
            # tab-delimited; the trivial word count is 1  
            print '%s	%s' % (word, 1)  

     

     

    reducer.py :

     

    #!/usr/bin/env python  
      
    from operator import itemgetter  
    import sys  
      
    current_word = None  
    current_count = 0  
    word = None  
      
    # input comes from STDIN  
    for line in sys.stdin:  
        # remove leading and trailing whitespace  
        line = line.strip()  
      
        # parse the input we got from mapper.py  
        word, count = line.split('	', 1)  
      
        # convert count (currently a string) to int  
        try:  
            count = int(count)  
        except ValueError:  
            # count was not a number, so silently  
            # ignore/discard this line  
            continue  
      
        # this IF-switch only works because Hadoop sorts map output  
        # by key (here: word) before it is passed to the reducer  
        if current_word == word:  
            current_count += count  
        else:  
            if current_word:  
                # write result to STDOUT  
                print '%s	%s' % (current_word, current_count)  
            current_count = count  
            current_word = word  
      
    # do not forget to output the last word if needed!  
    if current_word == word:  
        print '%s	%s' % (current_word, current_count) 

    输出结果:

  • 相关阅读:
    飞入飞出效果
    【JSOI 2008】星球大战 Starwar
    POJ 1094 Sorting It All Out
    POJ 2728 Desert King
    【ZJOI 2008】树的统计 Count
    【SCOI 2009】生日快乐
    POJ 3580 SuperMemo
    POJ 1639 Picnic Planning
    POJ 2976 Dropping Tests
    SPOJ QTREE
  • 原文地址:https://www.cnblogs.com/kevin-yuan/p/4485143.html
Copyright © 2011-2022 走看看