  • [100] awk arithmetic: solving enterprise PV/IP statistics problems

    awk arithmetic

    Running awk as a script

    #!/bin/awk -f
    # a.sh: build a two-element array in BEGIN and print every key/value pair
    BEGIN{
        arr[1]="maotai";
        arr[2]="maotai";
        for(k in arr)
            print k,arr[k]
    }
    
    [root@n1 ~]# awk -f a.sh 
    1 maotai
    2 maotai
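One caveat about the loop above: awk does not define the order in which `for (k in arr)` visits keys, so the lines can come out in any order. A minimal sketch that pins the order down by piping through `sort` (the function name `print_arr_sorted` is made up for illustration):

```shell
# Hypothetical helper: build the same array as above and print it in key order.
print_arr_sorted() {
    awk 'BEGIN {
        arr[1] = "maotai"
        arr[2] = "maotai"
        for (k in arr)          # traversal order is unspecified in awk
            print k, arr[k]
    }' | sort -n                # sort numerically on the key to fix the order
}

print_arr_sorted
```

Sorting outside awk keeps the example portable; it does not rely on any implementation-specific ordering feature.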
    

    Running awk from the command line

    [root@n1 ~]# awk 'BEGIN{arr[1]="maotai"; arr[2]="maotai";} END{for(k in arr) print k,arr[k]}' /etc/passwd
    1 maotai
    2 maotai
    [root@n1 ~]# cat /etc/hosts|awk 'BEGIN{arr[1]="maotai"; arr[2]="maotai";} END{for(k in arr) print k,arr[k]}'
    1 maotai
    2 maotai
    

    Loading a file into an awk array and printing it

    [root@n1 ~]# cat t3.log 
    1 maotai
    2 maomao
    
    [root@n1 ~]# awk '{S[$1]=$2}END{for(k in S) print k,S[k]}' t3.log 
    1 maotai
    2 maomao
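Since `S[$1]=$2` is a plain assignment, a key that appears on more than one line keeps only its last value. A small sketch of that behavior, using made-up input with a repeated key (the function name `load_pairs` and the value `moutai` are hypothetical):

```shell
# Made-up input: key 1 appears twice, so S[1] ends up holding the later value.
load_pairs() {
    printf '1 maotai\n2 maomao\n1 moutai\n' |
        awk '{ S[$1] = $2 } END { for (k in S) print k, S[k] }' |
        sort -n                 # for-in order is unspecified, so sort by key
}

load_pairs
```

If every value should be kept, the array would need to accumulate instead of assign, which is what the counting examples later in this post do with `++` and `+=`.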
    

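    Applying awk

    Top 10 URL statistics: rank how often each host (for example www.maotai.com) appears

    Approach: first extract the host from each URL, then count either with awk arithmetic or with sort + uniq.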

    [root@n1 ~]# cat url.log 
    http://www.maotai.com/index.html
    http://www.maotai.com/1.html
    http://post.maotai.com/index.html
    http://mp3.maotai.com/3.html
    http://www.maotai.com/1.html
    http://post.maotai.com/2.html
    
    [root@n1 ~]# awk -F '/' '{print $3}' url.log |sort|uniq -c|sort -rn
          3 www.maotai.com
          2 post.maotai.com
          1 mp3.maotai.com
    
    Skeleton of the pure-awk version (count in the main block, print in END):
    awk -F "/" '{}END{}' url.log
    
    [root@n1 ~]# awk -F '/' '{s[$3]=s[$3]+1}END{for(k in s) print s[k],k}' url.log |sort -rn
    3 www.maotai.com
    2 post.maotai.com
    1 mp3.maotai.com
    
    - Top 10 URL statistics (sort -rn compares the counts numerically)
    [root@n1 ~]# awk -F '/' '{s[$3]++}END{for(k in s) print s[k],k}' url.log |sort -rn|head
    3 www.maotai.com
    2 post.maotai.com
    1 mp3.maotai.com
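The pipeline above can be wrapped in a small function so the cutoff is a parameter. This is only a sketch: the name `top_hosts` and its argument convention are assumptions, not part of the original post.

```shell
# Hypothetical helper: top_hosts [N] reads URLs on stdin and prints
# "count host" for the N most frequent hosts (default 10).
top_hosts() {
    awk -F '/' '{ s[$3]++ } END { for (k in s) print s[k], k }' |
        sort -rn |              # numeric, descending on the count
        head -n "${1:-10}"
}

# Sample data copied from url.log above.
printf '%s\n' \
    'http://www.maotai.com/index.html' \
    'http://www.maotai.com/1.html' \
    'http://post.maotai.com/index.html' \
    'http://mp3.maotai.com/3.html' \
    'http://www.maotai.com/1.html' \
    'http://post.maotai.com/2.html' |
    top_hosts 2
```

On a real file the call would look like `top_hosts 10 < url.log`.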
    

    Top 10 IP statistics: rank the number of requests per client IP in a web log (useful for spotting crawlers)

    access.log

    10.0.0.41 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.43 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.42 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.46 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.42 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.47 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.41 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.47 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.41 - - [03/Dec/2010:23:27:03 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.46 - - [03/Dec/2010:23:27:03 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    
    [root@n1 ~]# awk '{s[$1]++}END{for(k in s) print k,s[k]}' access.log |sort -rnk2|head
    10.0.0.41 3
    10.0.0.47 2
    10.0.0.46 2
    10.0.0.42 2
    10.0.0.43 1
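When hunting for crawlers it can help to list only clients above some request count rather than a fixed top 10. A minimal sketch under that assumption; the function name `busy_ips` and the threshold argument are hypothetical, and the sample lines reuse the IPs from access.log above with the timestamp simplified to a single value.

```shell
# Hypothetical helper: busy_ips MIN reads an access log on stdin and prints
# "ip count" for every client with at least MIN requests, busiest first.
busy_ips() {
    awk -v min="${1:-1}" '
        { s[$1]++ }                                  # count requests per client IP
        END { for (k in s) if (s[k] >= min) print k, s[k] }
    ' | sort -k2,2nr -k1,1      # count descending, then IP as a tie-breaker
}

# printf repeats the format once per argument, giving one log line per IP.
printf '%s - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -\n' \
    10.0.0.41 10.0.0.43 10.0.0.42 10.0.0.46 10.0.0.42 \
    10.0.0.47 10.0.0.41 10.0.0.47 10.0.0.41 10.0.0.46 |
    busy_ips 2
```

The explicit sort keys make the order of equal counts deterministic, which a bare `sort -rn` on the whole line does not guarantee in an obvious way.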
    

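    TCP connection state statistics

    Count how many connections are in each state on a high-concurrency production Linux server (a.log here is presumably saved netstat -an output).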

    awk '/^tcp/' a.log
    

    Method 1:

    [root@n1 ~]# awk '/^tcp/ {print $NF}' a.log |sort|uniq -c |sort -rn|head
         79 ESTABLISHED
          7 CLOSE_WAIT
          2 TIME_WAIT
    
    

    Method 2:

    [root@n1 ~]# awk '/^tcp/ {S[$NF]++}END{for(k in S) print S[k],k}' a.log |sort -rn|head
    79 ESTABLISHED
    7 CLOSE_WAIT
    2 TIME_WAIT
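The same count can be tried on a few lines without a live server. The input below is invented sample data in the general shape of `netstat -an` output, and `tcp_states` is a hypothetical name; note that the pattern `/^tcp/` also matches lines starting with `tcp6`.

```shell
# Hypothetical helper: tcp_states reads netstat-style lines on stdin and
# prints "count STATE", most frequent state first.
tcp_states() {
    awk '/^tcp/ { S[$NF]++ } END { for (k in S) print S[k], k }' |
        sort -k1,1nr -k2,2      # count descending, then state name
}

# Invented sample input; the udp line is skipped by the /^tcp/ pattern.
printf '%s\n' \
    'tcp        0      0 0.0.0.0:22      0.0.0.0:*        LISTEN' \
    'tcp        0      0 10.0.0.1:22     10.0.0.2:51000   ESTABLISHED' \
    'tcp        0      0 10.0.0.1:80     10.0.0.3:51001   ESTABLISHED' \
    'tcp        0      0 10.0.0.1:80     10.0.0.4:51002   TIME_WAIT' \
    'udp        0      0 0.0.0.0:68      0.0.0.0:*' |
    tcp_states
```

On a real host the input would come from `netstat -an | tcp_states` or from a saved file.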
    

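    Analyze the image server log: rank entries by (number of requests for each image × image size) and take the top 10. In other words, compute the total bytes served for each URL.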

    69.33.22.101 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299 "http://www.cnblogs.com/iiiiher/static/web/column/17/index.shtml?courseId=43" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
    59.33.26.105 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299 "http://www.cnblogs.com/iiiiher/static/web/column/17/index.shtml?courseId=43" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
    59.33.26.105 - - [08/Dec/2010:15:44:02 +0800] "GET /static/flex/vedioLoading.swf HTTP/1.1" 200 3583 "http://www.cnblogs.com/iiiiher/static/flex/AdobeVideoPlayer.swf?width=590&height=328&url=/`DYNAMIC`/2" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
    124.115.4.18 - - [08/Dec/2010:15:44:15 +0800] "GET /?= HTTP/1.1" 200 46232 "-" "-"
    124.115.4.18 - - [08/Dec/2010:15:44:25 +0800] "GET /static/js/web_js.js HTTP/1.1" 200 4460 "-" "-"
    124.115.4.18 - - [08/Dec/2010:15:44:25 +0800] "GET /static/js/jquery.lazyload.js HTTP/1.1" 200 1627 "-" "-"
    

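    This makes it easy to track down a sudden bandwidth spike in an IDC data center.

    Output format: [requests × file size] [requests] filename (may include the URL)

    • Print the URL and size
    [root@n1 ~]# awk '{print $7"\t"$10}' image.log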
    /static/images/photos/2.jpg	11299
    /static/images/photos/2.jpg	11299
    /static/flex/vedioLoading.swf	3583
    /?=	46232
    /static/js/web_js.js	4460
    /static/js/jquery.lazyload.js	1627
    
    
    • Count how many times each output line occurs
    [root@n1 ~]# awk '{print $7"\t"$10}' image.log|sort | uniq -c
          1 /?=	46232
          1 /static/flex/vedioLoading.swf	3583
          2 /static/images/photos/2.jpg	11299
          1 /static/js/jquery.lazyload.js	1627
          1 /static/js/web_js.js	4460
    
    • Compute the columns with awk and print them in the required format: Method 1
    [root@n1 ~]# awk '{print $7"\t"$10}' image.log |sort|uniq -c|awk '{print $1*$3"\t"$1"\t"$2}'|sort -rn|head
    46232	1	/?=
    22598	2	/static/images/photos/2.jpg
    4460	1	/static/js/web_js.js
    3583	1	/static/flex/vedioLoading.swf
    1627	1	/static/js/jquery.lazyload.js
    

    Method 2:

    [root@n1 ~]# awk '{print $7"\t"$10}' image.log|awk '{S[$1]+=$2;S1[$1]+=1}END{for(i in S) print S[i],S1[i],i}'|sort -rn|head -n 10
    46232 1 /?=
    22598 2 /static/images/photos/2.jpg
    4460 1 /static/js/web_js.js
    3583 1 /static/flex/vedioLoading.swf
    1627 1 /static/js/jquery.lazyload.js
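Method 2 can also be written as a single awk pass that reads the log directly, since fields 7 and 10 are already available there. A minimal sketch; the function name `top_bytes` is hypothetical, and the sample lines are the log entries above with the referrer and user-agent fields cut off (they come after field 10, so they do not affect the result).

```shell
# Hypothetical helper: top_bytes [N] reads an access log on stdin and prints
# "total_bytes requests url" for the N URLs that served the most bytes.
top_bytes() {
    awk '{ bytes[$7] += $10; hits[$7]++ }
         END { for (u in bytes) print bytes[u], hits[u], u }' |
        sort -rn |
        head -n "${1:-10}"
}

# Abbreviated copies of the image.log lines shown above.
printf '%s\n' \
    '69.33.22.101 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299' \
    '59.33.26.105 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299' \
    '124.115.4.18 - - [08/Dec/2010:15:44:15 +0800] "GET /?= HTTP/1.1" 200 46232' \
    '124.115.4.18 - - [08/Dec/2010:15:44:25 +0800] "GET /static/js/web_js.js HTTP/1.1" 200 4460' |
    top_bytes 2
```

Summing `$10` per URL gives the same total as count × size whenever every request for a URL returns the same size, and it stays correct when the sizes differ.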
    
    

    Summing a column

    [root@n1 sh]# cat /etc/passwd|awk -F ':' '{s+=$3}END{print s}'
    2717
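A small generalization of the column sum, offered as a sketch: the column number and separator become arguments, and `s + 0` makes the result print as 0 instead of an empty line when there is no input. The function name `sum_col` and its argument order are assumptions.

```shell
# Hypothetical helper: sum_col COL [SEP] sums column COL of stdin,
# splitting fields on SEP (default ":").
sum_col() {
    awk -F "${2:-:}" -v col="$1" '
        { s += $col }           # $col selects the field whose number is in col
        END { print s + 0 }     # + 0 forces a numeric 0 on empty input
    '
}

# Made-up passwd-style input: the third field is summed.
printf 'a:x:1\nb:x:2\nc:x:3\n' | sum_col 3
```

For the case in this post the call would be `sum_col 3 < /etc/passwd`.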
    
    

    Common awk pitfall: quote the awk program with single quotes

    - Correct: single quotes
    netstat -an|awk '/^tcp/ {S[$NF]++}END{for(k in S) print S[k],k}'
    
    - Wrong: double quotes (this fails, because the shell expands $NF before awk ever sees it)
    netstat -an|awk "/^tcp/ {s[$NF]++}END{for(k in s) print s[k],k}"
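Double quotes are usually tempting when a shell variable has to reach the awk program. A sketch of the usual alternative: keep the single quotes and pass the value in with `-v`. The function name `count_state` and the sample lines are invented for illustration.

```shell
# Hypothetical helper: count_state STATE counts tcp lines on stdin whose
# last field equals STATE. The shell value arrives in awk as the variable want.
count_state() {
    awk -v want="$1" '
        /^tcp/ && $NF == want { n++ }
        END { print n + 0 }     # + 0 prints 0 when nothing matched
    '
}

# Invented netstat-style sample input.
printf '%s\n' \
    'tcp 0 0 10.0.0.1:80 10.0.0.2:51000 ESTABLISHED' \
    'tcp 0 0 10.0.0.1:80 10.0.0.3:51001 TIME_WAIT' \
    'tcp 0 0 10.0.0.1:80 10.0.0.4:51002 ESTABLISHED' |
    count_state ESTABLISHED
```

Because the program stays in single quotes, `$NF` is interpreted by awk, and the shell only touches `"$1"`.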
    
    

  • Original article: https://www.cnblogs.com/iiiiher/p/8576537.html