  • Part 3. Hadoop-based nginx access log analysis: computing hourly PV

    Code:

    # cat pv_hour.py 
    #!/usr/bin/env python
    # coding=utf-8
    
    from mrjob.job import MRJob
    from nginx_accesslog_parser import NginxLineParser
    
    class PvDay(MRJob):
    
        nginx_line_parser = NginxLineParser()
    
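        def mapper(self, _, line):
            # Parse one raw access-log line; the parser exposes time_local.
            self.nginx_line_parser.parse(line)
            # str(time_local) is expected to look like "<date> <HH:MM:SS>".
            _date, tm = str(self.nginx_line_parser.time_local).split()
            hour = tm.split(':')[0]
            yield hour, 1  # one page view for this hour of the day
    
        def reducer(self, key, values):
            # Sum the 1s emitted for each hour to get that hour's PV.
            yield key, sum(values)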
    
    def main():
        PvDay.run()
    
    if __name__ == '__main__':
        main()
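
    The job above relies on the nginx_accesslog_parser module, which is not shown here. As a rough, standalone sketch of the same map and reduce logic, the following uses only the standard library. It assumes the timestamp appears in brackets as in nginx's default "combined" log format; the names TIME_LOCAL_RE, hour_of and hourly_pv and the sample lines are made up for illustration and are not part of the original code.

```python
import re
from collections import Counter

# Assumption: the timestamp looks like [27/Dec/2016:14:03:21 +0800],
# as in nginx's default "combined" log format.
TIME_LOCAL_RE = re.compile(
    r'\[\d{2}/[A-Za-z]{3}/\d{4}:(\d{2}):\d{2}:\d{2} [+-]\d{4}\]'
)


def hour_of(line):
    """Return the two-digit hour of a log line, or None if no timestamp is found."""
    match = TIME_LOCAL_RE.search(line)
    return match.group(1) if match else None


def hourly_pv(lines):
    """Count page views per hour, skipping lines without a recognizable timestamp."""
    counts = Counter()
    for line in lines:
        hour = hour_of(line)      # "map" step: line -> hour
        if hour is not None:
            counts[hour] += 1     # "reduce" step: sum per hour
    return counts


if __name__ == '__main__':
    # Made-up sample lines, for illustration only.
    sample = [
        '10.0.0.1 - - [27/Dec/2016:14:03:21 +0800] "GET / HTTP/1.1" 200 612 "-" "curl/7.29.0"',
        '10.0.0.2 - - [27/Dec/2016:14:59:59 +0800] "GET /a HTTP/1.1" 200 128 "-" "curl/7.29.0"',
        '10.0.0.3 - - [27/Dec/2016:15:00:00 +0800] "GET /b HTTP/1.1" 404 162 "-" "curl/7.29.0"',
    ]
    for hour, pv in sorted(hourly_pv(sample).items()):
        print(hour, pv)
```

    This only suits files that fit on one machine; the mrjob version is what lets the same counting run on Hadoop.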

    Output:

    # python3 pv_hour.py access_all.log-20161227 
    No configs found; falling back on auto-configuration
    Creating temp directory /tmp/pv_hour.root.20161228.025503.341576
    Running step 1 of 1...
    Streaming final output from /tmp/pv_hour.root.20161228.025503.341576/output...
    "14"    21158
    "15"    20958
    "16"    16080
    "17"    14194
    "18"    13114
    "19"    16898
    "20"    18870
    "21"    14067
    "22"    14053
    "23"    12683
    "00"    13185
    "01"    14785
    "02"    12449
    "03"    7364
    "04"    3628
    "05"    9074
    "06"    9317
    "07"    11887
    "08"    13492
    "09"    19564
    "10"    18390
    "11"    15697
    "12"    17518
    "13"    18785
    Removing temp directory /tmp/pv_hour.root.20161228.025503.341576...
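
    The hours in this run are not listed in chronological order. A small sketch for sorting the result and picking the busiest hour is given below. It assumes each result line is a JSON key and a JSON value separated by whitespace, which is how mrjob's default output protocol writes them; the helper name parse_output is hypothetical, and the three rows in the example are copied from the output above.

```python
import json


def parse_output(lines):
    """Parse lines like '"14"<TAB>21158' into a list of (hour, pv) sorted by hour.

    Lines that are not a JSON string followed by a JSON integer
    (for example log messages) are skipped.
    """
    pairs = []
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        try:
            key = json.loads(parts[0])
            value = json.loads(parts[1])
        except ValueError:
            continue
        if isinstance(key, str) and isinstance(value, int):
            pairs.append((key, value))
    return sorted(pairs)


if __name__ == '__main__':
    # Three rows taken from the output above.
    rows = ['"14"\t21158', '"03"\t7364', '"04"\t3628']
    result = parse_output(rows)
    for hour, pv in result:
        print(hour, pv)
    busiest = max(result, key=lambda pair: pair[1])
    print('busiest hour:', busiest[0])
```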
  • Original article: https://www.cnblogs.com/xiaoming279/p/6228622.html