  • [100] awk arithmetic: solving enterprise PV/IP statistics problems

    awk arithmetic

    Running awk as a script

    #!/bin/awk -f
    # Build a two-element array in BEGIN (no input is read) and print it.
    BEGIN {
        arr[1] = "maotai"
        arr[2] = "maotai"
        for (k in arr)
            print k, arr[k]
    }
    
    [root@n1 ~]# awk -f a.sh 
    1 maotai
    2 maotai
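
    Note that awk does not specify the order in which `for (k in arr)` visits the keys, so the order of the lines can differ between awk implementations. A minimal sketch of one way to get a fixed order, assuming the keys are the consecutive integers 1..n (the count `n` and the shell variable `out` are illustrative names, not part of the original script):

```shell
# Walk the array by numeric index instead of relying on for-in order.
out=$(awk 'BEGIN {
    arr[1] = "maotai"
    arr[2] = "maomao"          # sample values, chosen to be distinguishable
    n = 2                      # element count, tracked by hand (assumption)
    for (i = 1; i <= n; i++)
        print i, arr[i]
}')
printf '%s\n' "$out"
```

    For keys that are not consecutive integers, piping the output through sort is the simpler option.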
    

    Running awk from the command line

    [root@n1 ~]# awk 'BEGIN{arr[1]="maotai"; arr[2]="maotai";} END{for(k in arr) print k,arr[k]}' /etc/passwd
    1 maotai
    2 maotai
    [root@n1 ~]# cat /etc/hosts|awk 'BEGIN{arr[1]="maotai"; arr[2]="maotai";} END{for(k in arr) print k,arr[k]}'
    1 maotai
    2 maotai
    

    Loading a file into an awk array and printing it

    [root@n1 ~]# cat t3.log 
    1 maotai
    2 maomao
    
    [root@n1 ~]# awk '{S[$1]=$2}END{for(k in S) print k,S[k]}' t3.log 
    1 maotai
    2 maomao
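
    Because the array is keyed by the first column, a key that appears on more than one line keeps only its last value. The sketch below uses made-up input (key 1 occurs twice) to show the overwrite; this is why the counting examples further down use `++` rather than plain assignment.

```shell
# Made-up input for illustration: key 1 appears on two lines.
out=$(printf '1 maotai\n2 maomao\n1 moutai\n' |
    awk '{ S[$1] = $2 }    # a later line overwrites an earlier one
         END { for (k in S) print k, S[k] }' |
    sort -n)               # for-in order is unspecified, so sort for display
printf '%s\n' "$out"
```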
    

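    awk in practice

    Top 10 URLs: rank the hosts in a URL list (for example www.maotai.com) by number of occurrences

    Approach: first extract the host from each URL, then count either with awk arithmetic or with sort + uniq.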

    [root@n1 ~]# cat url.log 
    http://www.maotai.com/index.html
    http://www.maotai.com/1.html
    http://post.maotai.com/index.html
    http://mp3.maotai.com/3.html
    http://www.maotai.com/1.html
    http://post.maotai.com/2.html
    
    [root@n1 ~]# awk -F '/' '{print $3}' url.log |sort|uniq -c|sort -rn
          3 www.maotai.com
          2 post.maotai.com
          1 mp3.maotai.com
    
    Skeleton of the awk version: the main block counts per line, END prints the totals.
    awk -F '/' '{ }END{ }' url.log
    
    [root@n1 ~]# awk -F '/' '{s[$3]=s[$3]+1}END{for(k in s) print s[k],k}' url.log |sort -rn
    3 www.maotai.com
    2 post.maotai.com
    1 mp3.maotai.com
    
    - Top 10 URLs (s[$3]++ is shorthand for s[$3]=s[$3]+1; head keeps the first 10 lines)
    [root@n1 ~]# awk -F '/' '{s[$3]++}END{for(k in s) print s[k],k}' url.log |sort -rn|head
    3 www.maotai.com
    2 post.maotai.com
    1 mp3.maotai.com
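
    With the count in the first column, the sort has to be numeric. A plain `sort -r` compares text, so as soon as a count reaches two digits it ranks 9 above 10. A small sketch with made-up counts (the variable names are illustrative):

```shell
# Made-up "count host" lines, in the form the awk step prints them.
counts='9 www.maotai.com
10 post.maotai.com
2 mp3.maotai.com'

lexical=$(printf '%s\n' "$counts" | sort -r)    # text order: "9" > "2" > "10"
numeric=$(printf '%s\n' "$counts" | sort -rn)   # numeric order: 10 > 9 > 2

printf 'sort -r:\n%s\n\nsort -rn:\n%s\n' "$lexical" "$numeric"
```

    On the six-line url.log sample every count is a single digit, so both variants happen to give the same ranking.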
    

    Top 10 IPs: rank client IPs in a web log by request count (useful for spotting crawlers)

    Sample access.log:

    10.0.0.41 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.43 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.42 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.46 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.42 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.47 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.41 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.47 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.41 - - [03/Dec/2010:23:27:03 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    10.0.0.46 - - [03/Dec/2010:23:27:03 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
    
    [root@n1 ~]# awk '{s[$1]++}END{for(k in s) print k,s[k]}' access.log |sort -rnk2|head
    10.0.0.41 3
    10.0.0.47 2
    10.0.0.46 2
    10.0.0.42 2
    10.0.0.43 1
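
    When the goal is to flag likely crawlers rather than to read a ranking, the same count can be filtered against a cutoff. The following is a sketch under assumptions: the `threshold` value and the sample lines are made up, and the shell value is handed to awk with `-v`.

```shell
# Hypothetical cutoff; pick a value that suits the real traffic.
threshold=2

# Made-up sample in the same layout as access.log (client IP is field 1).
log='10.0.0.41 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.42 - - [03/Dec/2010:23:27:01 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.41 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.43 - - [03/Dec/2010:23:27:02 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.42 - - [03/Dec/2010:23:27:03 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -
10.0.0.41 - - [03/Dec/2010:23:27:03 +0800] "HEAD /checkstatus.jsp HTTP/1.0" 200 -'

# Count requests per IP, keep only IPs at or above the cutoff, busiest first.
out=$(printf '%s\n' "$log" |
    awk -v min="$threshold" '
        { hits[$1]++ }
        END { for (ip in hits) if (hits[ip] >= min) print hits[ip], ip }' |
    sort -rn)
printf '%s\n' "$out"
```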
    

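    Counting TCP connection states

    Count the number of connections in each TCP state on a busy, high-concurrency Linux server.

    First keep only the TCP lines (a.log is presumably saved `netstat -an` output):

    awk '/^tcp/' a.log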
    

    Method 1: print the state column, then count with sort + uniq

    [root@n1 ~]# awk '/^tcp/ {print $NF}' a.log |sort|uniq -c |sort -rn|head
         79 ESTABLISHED
          7 CLOSE_WAIT
          2 TIME_WAIT
    
    

    Method 2: count inside awk with an array keyed by state

    [root@n1 ~]# awk '/^tcp/ {S[$NF]++}END{for(k in S) print S[k],k}' a.log |sort -rn|head
    79 ESTABLISHED
    7 CLOSE_WAIT
    2 TIME_WAIT
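
    Since a.log itself is not shown, here is a self-contained sketch that runs both methods on a few made-up lines in the layout of netstat -an output and checks that they agree. The addresses and ports are invented; the only thing that matters is that the state is the last field and that non-TCP lines are skipped.

```shell
# Made-up lines in the layout of netstat -an output (state is the last field).
sample='tcp        0      0 10.0.0.7:80        10.0.0.41:51234    ESTABLISHED
tcp        0      0 10.0.0.7:80        10.0.0.42:51235    TIME_WAIT
tcp        0      0 0.0.0.0:22         0.0.0.0:*          LISTEN
udp        0      0 0.0.0.0:68         0.0.0.0:*
tcp        0      0 10.0.0.7:80        10.0.0.43:51236    ESTABLISHED
tcp        0      0 10.0.0.7:80        10.0.0.44:51237    ESTABLISHED
tcp        0      0 10.0.0.7:80        10.0.0.45:51238    TIME_WAIT'

# Method 1: sort + uniq -c, with the padding from uniq -c normalized away.
m1=$(printf '%s\n' "$sample" | awk '/^tcp/ {print $NF}' | sort | uniq -c |
    sort -rn | awk '{print $1, $2}')

# Method 2: count inside awk.
m2=$(printf '%s\n' "$sample" |
    awk '/^tcp/ {S[$NF]++} END {for (k in S) print S[k], k}' | sort -rn)

printf '%s\n' "$m2"
```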
    

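    Analyze an image server's access log: rank the URLs by total bytes served (hits × file size) and take the top 10. In other words, compute the total transfer size for each URL.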

    69.33.22.101 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299 "http://www.cnblogs.com/iiiiher/static/web/column/17/index.shtml?courseId=43" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
    59.33.26.105 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299 "http://www.cnblogs.com/iiiiher/static/web/column/17/index.shtml?courseId=43" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
    59.33.26.105 - - [08/Dec/2010:15:44:02 +0800] "GET /static/flex/vedioLoading.swf HTTP/1.1" 200 3583 "http://www.cnblogs.com/iiiiher/static/flex/AdobeVideoPlayer.swf?width=590&height=328&url=/`DYNAMIC`/2" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
    124.115.4.18 - - [08/Dec/2010:15:44:15 +0800] "GET /?= HTTP/1.1" 200 46232 "-" "-"
    124.115.4.18 - - [08/Dec/2010:15:44:25 +0800] "GET /static/js/web_js.js HTTP/1.1" 200 4460 "-" "-"
    124.115.4.18 - - [08/Dec/2010:15:44:25 +0800] "GET /static/js/jquery.lazyload.js HTTP/1.1" 200 1627 "-" "-"
    

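    This kind of report makes it easy to find the cause of a sudden bandwidth spike in an IDC data center.

    Output format: [hits × file size] [hits] file name (optionally with the URL)

    • Print the URL and the response size
    [root@n1 ~]# awk '{print $7"\t"$10}' image.log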
    /static/images/photos/2.jpg	11299
    /static/images/photos/2.jpg	11299
    /static/flex/vedioLoading.swf	3583
    /?=	46232
    /static/js/web_js.js	4460
    /static/js/jquery.lazyload.js	1627
    
    
    • Count how often each URL/size pair occurs
    [root@n1 ~]# awk '{print $7"\t"$10}' image.log|sort | uniq -c
          1 /?=	46232
          1 /static/flex/vedioLoading.swf	3583
          2 /static/images/photos/2.jpg	11299
          1 /static/js/jquery.lazyload.js	1627
          1 /static/js/web_js.js	4460
    
    • Compute the columns with awk and print them in the required format: Method 1
    [root@n1 ~]# awk '{print $7"\t"$10}' image.log |sort|uniq -c|awk '{print $1*$3"\t"$1"\t"$2}'|sort -rn|head
    46232	1	/?=
    22598	2	/static/images/photos/2.jpg
    4460	1	/static/js/web_js.js
    3583	1	/static/flex/vedioLoading.swf
    1627	1	/static/js/jquery.lazyload.js
    

    Method 2: sum the sizes and count the hits in awk arrays

    [root@n1 ~]# awk '{print $7"\t"$10}' image.log|awk '{S[$1]+=$2;S1[$1]+=1}END{for(i in S) print S[i],S1[i],i}'|sort -rn|head -10
    46232 1 /?=
    22598 2 /static/images/photos/2.jpg
    4460 1 /static/js/web_js.js
    3583 1 /static/flex/vedioLoading.swf
    1627 1 /static/js/jquery.lazyload.js
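
    The two awk stages of Method 2 can also be folded into a single pass over the log. This is a sketch, not the original author's command: the array names `bytes` and `hits` are illustrative, and the log lines are shortened, made-up copies in the same field layout. Summing field 10 directly equals hits × file size only when every response for a path has the same size.

```shell
# Made-up, shortened log lines in the same field layout as image.log:
# field 7 is the request path, field 10 is the response size in bytes.
log='59.33.26.105 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299
69.33.22.101 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299
59.33.26.105 - - [08/Dec/2010:15:44:02 +0800] "GET /static/flex/vedioLoading.swf HTTP/1.1" 200 3583
124.115.4.18 - - [08/Dec/2010:15:44:15 +0800] "GET /?= HTTP/1.1" 200 46232'

# One awk pass: sum bytes and count hits per path, then rank by total bytes.
out=$(printf '%s\n' "$log" |
    awk '{ bytes[$7] += $10; hits[$7]++ }
         END { for (u in bytes) printf "%d\t%d\t%s\n", bytes[u], hits[u], u }' |
    sort -rn | head -n 10)
printf '%s\n' "$out"
```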
    
    

    Summing a column (here the UID field of /etc/passwd; the result depends on the machine)

    [root@n1 sh]# cat /etc/passwd|awk -F ':' '{s+=$3}END{print s}'
    2717
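
    The same accumulate-then-print pattern extends to other aggregates. A minimal sketch, using three made-up passwd-style lines instead of the real /etc/passwd so that the result is predictable:

```shell
# Made-up passwd-style lines (UID is field 3); real files will differ.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin'

# Sum, maximum and average of field 3; NR is the number of lines read.
out=$(printf '%s\n' "$sample" |
    awk -F ':' '
        { s += $3; if (NR == 1 || $3 + 0 > max) max = $3 + 0 }
        END { if (NR > 0) printf "sum=%d max=%d avg=%.2f\n", s, max, s / NR }')
printf '%s\n' "$out"
```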
    
    

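    Common awk pitfall: quote the program with single quotes

    - Correct: single quotes
    netstat -an|awk '/^tcp/ {S[$NF]++}END{for(k in S) print S[k],k}'
    
    - Wrong: double quotes (awk reports a syntax error, because the shell expands $NF before awk sees the program)
    netstat -an|awk "/^tcp/ {S[$NF]++}END{for(k in S) print S[k],k}"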
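
    Double quotes are usually reached for when a shell variable has to get into the awk program. A sketch of the safer alternative, assuming a hypothetical shell variable `state`: keep the program in single quotes and pass the value in with `-v`.

```shell
state='ESTABLISHED'    # hypothetical value coming from the calling shell

# Made-up lines in the layout of netstat -an output.
sample='tcp 0 0 10.0.0.7:80 10.0.0.41:51234 ESTABLISHED
tcp 0 0 10.0.0.7:80 10.0.0.42:51235 TIME_WAIT
tcp 0 0 10.0.0.7:80 10.0.0.43:51236 ESTABLISHED'

# The program stays in single quotes; -v copies the shell value into awk.
out=$(printf '%s\n' "$sample" |
    awk -v want="$state" '/^tcp/ && $NF == want { n++ } END { print n + 0, want }')
printf '%s\n' "$out"
```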
    
    

  • Original post: https://www.cnblogs.com/iiiiher/p/8576537.html