  • Linux, Part 5: awk, one of the three essential Linux text tools (grep, sed, awk) every tester should know

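    awk = "Aho, Weinberger, and Kernighan", the first letters of its three authors' surnames
    awk is both a Linux command and an interpreter for its own small language
    awk has the features of a full programming language, such as running external commands (some implementations, for example gawk, also support network requests)
    Fluency in awk is an essential skill for anyone who works on Linux
    Syntax: awk 'pattern{action}'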

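    awk pattern syntax

    In principle, awk can replace grep
    In awk 'pattern{action}', fields are separated by whitespace by default. The part outside the braces is the pattern (often a regular expression) and the part inside the braces is the action. Several pattern{action} pairs can be written one after another, but they must all sit inside one pair of single quotes ('')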
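
    The forms described above can be sketched as follows. This is only an illustration: the `sample` text and the variable names are made up and do not come from the original article.

```shell
# Hypothetical sample input, not from the original article.
sample='alpha 1
beta 2
gamma 3'

# Pattern only: the default action prints the whole line (like grep).
pattern_only=$(printf '%s\n' "$sample" | awk '/beta/')

# Action only: the action runs for every line.
action_only=$(printf '%s\n' "$sample" | awk '{print $2}')

# Two pattern{action} pairs inside one pair of single quotes.
two_pairs=$(printf '%s\n' "$sample" |
    awk '/alpha/{print "first:", $1} $2>2{print "big:", $1}')

echo "$pattern_only"
echo "$action_only"
echo "$two_pairs"
```

    Note that a pattern does not have to be a regular expression: `$2>2` is an ordinary comparison on the second field.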
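    Common built-in variables

    • FS: input field separator; equivalent to the -F command-line option
    • NF: number of fields (columns) in the current record
    • NR: number of records (lines) read so far
    • ARGC: number of command-line arguments
    • OFS: output field separator
    • ORS: output record separator
    • RS: input record separator; BEGIN{RS="|"} splits the input into records at each |
    • ARGV: array of command-line arguments
    • ENVIRON: array that gives access to system environment variables
    • FILENAME: name of the file awk is currently reading
    • FNR: record number within the current file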
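
    A few of these variables in action, as a small sketch. The input text and the environment variable name `DEMO_USER` are assumptions chosen for illustration.

```shell
# Hypothetical two-line, colon-separated input.
records='a:b:c
d:e'

# FS splits fields on ":", OFS joins the printed values with "-".
# NR is the current line number, NF the number of fields on that line,
# and $NF the last field.
vars_out=$(printf '%s\n' "$records" |
    awk 'BEGIN{FS=":"; OFS="-"} {print NR, NF, $NF}')
echo "$vars_out"

# ENVIRON is an array indexed by environment variable name.
# DEMO_USER is a made-up variable set only for this one command.
env_out=$(DEMO_USER=tester awk 'BEGIN{print ENVIRON["DEMO_USER"]}')
echo "$env_out"
```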
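    awk 'BEGIN{}END{}' runs code before the first line and after the last line
    awk '/Running/' regular-expression match
    awk '/aa/,/bb/' range selection
    awk '$2~/xxx/' field match: selects the lines whose second field contains xxx
    awk 'NR==2' selects the second line
    awk 'NR>1' drops the first line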
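
    The pattern forms listed above can be tried on a few lines of made-up text. The `lines` variable and its contents are assumptions for illustration only.

```shell
# Hypothetical sample input.
lines='header
aa start
middle
bb end
tail Running'

# BEGIN runs before any input is read, END after the last line.
begin_end=$(printf '%s\n' "$lines" | awk 'BEGIN{print "start"} END{print "lines:", NR}')

# Range: from the first line matching /aa/ through the next line matching /bb/.
range=$(printf '%s\n' "$lines" | awk '/aa/,/bb/')

# Field match: only the second field is tested against the regex.
field_match=$(printf '%s\n' "$lines" | awk '$2~/Run/')

# Line-number patterns.
second_line=$(printf '%s\n' "$lines" | awk 'NR==2')
without_first=$(printf '%s\n' "$lines" | awk 'NR>1' | wc -l | tr -d ' ')

echo "$begin_end"; echo "$range"; echo "$field_match"
echo "$second_line"; echo "$without_first"
```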
    

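    Processing field data with awk

    • -F sets the field separator
    • BEGIN{FS="_"} sets the field separator as well
    • $0 is the whole original line
    • $1 is the first field
    • $N is the Nth field
    • $NF is the last field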
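
    A short sketch of the field references above, using a made-up underscore-separated record (the `record` value is an assumption for illustration). It also shows that -F and BEGIN{FS=...} give the same result.

```shell
# Hypothetical record.
record='user_1001_admin'

# Same separator set in two ways.
with_flag=$(echo "$record" | awk -F '_' '{print $1, $NF}')
with_begin=$(echo "$record" | awk 'BEGIN{FS="_"} {print $1, $NF}')

# $0 is the untouched line; $(NF-1) is the second-to-last field.
whole=$(echo "$record" | awk -F '_' '{print $0}')
middle=$(echo "$record" | awk -F '_' '{print $(NF-1)}')

echo "$with_flag"; echo "$with_begin"; echo "$whole"; echo "$middle"
```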

    An example

    chenshifengdeMacBook-Pro:~ chenshifeng$ echo "111 222|333 444|555 666"|awk 'BEGIN{RS="|"}{print $0}'
    111 222
    333 444
    555 666
    

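    The following task, counting the requests in nginx.log whose response status code is not 200, demonstrates basic awk usage

    There is an nginx.log file (link: https://files.cnblogs.com/files/feng0815/nginx.log.tar.gz ). After unpacking, its contents look like this: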

    220.181.108.111 - - [05/Dec/2018:00:11:42 +0000] "GET /topics/15225/show_wechat HTTP/1.1" 200 1684 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" 0.029 0.029 .
    216.244.66.241 - - [05/Dec/2018:00:11:42 +0000] "GET /topics/10052/replies/85845/reply_suggest HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" 0.016 0.016 .
    216.244.66.241 - - [05/Dec/2018:00:11:42 +0000] "GET /topics/10040?order_by=created_at HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" 0.002 0.002 .
    216.244.66.241 - - [05/Dec/2018:00:11:42 +0000] "GET /topics/10043/replies/85544/reply_suggest HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" 0.001 0.001 .
    216.244.66.241 - - [05/Dec/2018:00:11:44 +0000] "GET /topics/10075/replies/89029/edit HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" 0.001 0.001 .
    216.244.66.241 - - [05/Dec/2018:00:11:44 +0000] "GET /topics/10075/replies/89631/edit HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" 0.001 0.001 .
    216.244.66.241 - - [05/Dec/2018:00:11:45 +0000] "GET /topics/10075?order_by=created_at HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" 0.000 0.000 .
    216.244.66.241 - - [05/Dec/2018:00:11:45 +0000] "GET /topics/10075?order_by=like HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" 0.001 0.001 .
    223.71.41.98 - - [05/Dec/2018:00:11:46 +0000] "GET /cable HTTP/1.1" 101 60749 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0" 2608.898 2608.898 .
    113.87.161.17 - - [05/Dec/2018:00:11:39 +0000] "GET /cable HTTP/1.1" 101 3038 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36" 112.418 112.418 .
    216.244.66.241 - - [05/Dec/2018:00:11:46 +0000] "GET /topics/10079/replies/119591/edit HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" 0.001 0.001 .
    216.244.66.241 - - [05/Dec/2018:00:11:46 +0000] "GET /topics/10089?locale=zh-TW HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)" 0.002 0.002 .
    

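    In nginx.log, the key fields are separated by spaces. When a field's own content contains spaces, it is delimited by other characters such as square brackets or double quotes.
    Looking at the log with a space as the separator, the status code is the ninth field. Here we use awk to test the ninth field against the regular expression 200 and print the status codes that do not match. Command: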

    chenshifengdeMacBook-Pro:~ chenshifeng$ awk '$9!~/200/{print $9}' nginx.log
    

    301
    301
    301
    301
    301
    301
    301
    301
    ......# remaining output omitted
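
    One detail worth checking: `$9!~/200/` is a regular-expression test, so it rejects any value that merely contains 200, while `$9!=200` is an exact comparison. For three-digit status codes the two behave the same. The sketch below uses made-up log lines in the same layout (the `log` variable and its contents are assumptions) to compare them.

```shell
# Hypothetical lines in the nginx.log layout; the status code is field 9.
log='1.1.1.1 - - [05/Dec/2018:00:11:42 +0000] "GET /a HTTP/1.1" 200 10
2.2.2.2 - - [05/Dec/2018:00:11:43 +0000] "GET /b HTTP/1.1" 301 5
3.3.3.3 - - [05/Dec/2018:00:11:44 +0000] "GET /c HTTP/1.1" 404 7'

regex_version=$(printf '%s\n' "$log" | awk '$9!~/200/{print $9}')
exact_version=$(printf '%s\n' "$log" | awk '$9!=200{print $9}')
echo "$regex_version"
echo "$exact_version"

# Where they differ: a value that contains 200 but is not equal to it.
differs=$(echo '2000' |
    awk '$1!=200{print "exact keeps", $1} $1!~/200/{print "regex keeps", $1}')
echo "$differs"
```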

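    Next, sort the extracted values, collapse duplicates with counts, and order the result by count in descending numeric order. Command: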

    awk '$9!~/200/{print $9}' nginx.log | sort | uniq -c | sort -nr
    

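    What the commands mean:
    sort: sorts in ascending order
    uniq -c: collapses adjacent duplicate lines and prefixes each with its number of occurrences
    -nr: sorts numerically in reverse (descending) order
    -n: sorts numerically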
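
    Because uniq only collapses adjacent duplicates, the first sort is what makes the counts correct. A small sketch with made-up status codes (the `codes` variable is an assumption for illustration):

```shell
# Hypothetical, deliberately interleaved status codes.
codes='301
404
301
404
301'

# Without sorting, no two equal lines are adjacent, so nothing is collapsed.
unsorted_groups=$(printf '%s\n' "$codes" | uniq -c | wc -l | tr -d ' ')

# With sorting, equal lines become adjacent and are counted together.
# The trailing awk only normalizes the padding that uniq -c adds.
sorted_counts=$(printf '%s\n' "$codes" | sort | uniq -c | sort -nr | awk '{print $1, $2}')

echo "$unsorted_groups"
echo "$sorted_counts"
```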

    Result:

    chenshifengdeMacBook-Pro:~ chenshifeng$ awk '$9!~/200/{print $9}' nginx.log | sort | uniq -c | sort -nr
        433 101
        304 301
        266 404
        152 302
          7 401
          5 304
          2 499
          2 422
          1 500
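
    As an alternative sketch, the tally can be kept inside awk with an associative array, so that only one sort is needed. The function name `count_non_200` and the inline sample lines are assumptions for illustration, and the function uses an exact comparison (`$9!=200`) rather than the regex test. It has not been run against the real nginx.log here.

```shell
# Hypothetical helper: count non-200 status codes with an awk array.
count_non_200() {
    awk '$9!=200{count[$9]++} END{for (code in count) print count[code], code}' "$@" |
        sort -nr   # for-in order is unspecified, so sort the result
}

# Made-up sample in the same layout as nginx.log (status is field 9).
printf '%s\n' \
    '1.1.1.1 - - [05/Dec/2018:00:11:42 +0000] "GET /a HTTP/1.1" 200 10' \
    '2.2.2.2 - - [05/Dec/2018:00:11:43 +0000] "GET /b HTTP/1.1" 301 5' \
    '2.2.2.2 - - [05/Dec/2018:00:11:44 +0000] "GET /c HTTP/1.1" 301 5' \
    '3.3.3.3 - - [05/Dec/2018:00:11:45 +0000] "GET /d HTTP/1.1" 404 7' |
    count_non_200
```

    With a file argument, for example `count_non_200 nginx.log`, awk reads the file instead of standard input.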
    

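    Next, an example that numbers the users on the machine shows how awk 'BEGIN{}END{}' can be used

    cat /etc/passwd lists the local users. We want to extract the user names and print each with a sequence number, like this: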

    1 nobody
    2 root
    3 daemon
    4 _uucp
    5 _taskgated
    6 _networkd
    7 _installassistant
    8 _lp
    9 _postfix
    ......
    

    User information:

    chenshifengdeMacBook-Pro:~ chenshifeng$ cat /etc/passwd 
    ##
    # User Database
    # 
    # Note that this file is consulted directly only when the system is running
    # in single-user mode.  At other times this information is provided by
    # Open Directory.
    #
    # See the opendirectoryd(8) man page for additional information about
    # Open Directory.
    ##
    nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
    root:*:0:0:System Administrator:/var/root:/bin/sh
    daemon:*:1:1:System Services:/var/root:/usr/bin/false
    _uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
    _taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
    _networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
    _installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
    _lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
    _postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
    _scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false
    _ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false
    _appstore:*:33:33:Mac App Store Service:/var/db/appstore:/usr/bin/false
    _mcxalr:*:54:54:MCX AppLaunch:/var/empty:/usr/bin/false
    _appleevents:*:55:55:AppleEvents Daemon:/var/empty:/usr/bin/false
    _geod:*:56:56:Geo Services Daemon:/var/db/geod:/usr/bin/false
    _devdocs:*:59:59:Developer Documentation:/var/empty:/usr/bin/false
    ......omitted
    

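    Approach:

    • Use sed to delete the first 10 lines of comments
    • Use awk to print the line number and the first column (the user name)
    • Note: the comments at the top of the cat /etc/passwd output must be skipped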
    [root@chenshifengdeLinuxServer ~]#  sed '1,10d' /etc/passwd| awk -F ':' '{print NR,$1}' 
    # or [root@chenshifengdeLinuxServer ~]#  cat /etc/passwd|awk 'BEGIN{FS=":"}NR>10 {print NR-10,$1}'
    1 nobody
    2 root
    3 daemon
    4 _uucp
    5 _taskgated
    6 _networkd
    7 _installassistant
    8 _lp
    9 _postfix
    10 _scsd
    11 _ces
    12 _appstore
    13 _mcxalr
    14 _appleevents
    .......
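
    Both commands above hard-code the number of comment lines. A possible variation, sketched here on a short excerpt modeled on the listing above (the `passwd_sample` variable is an assumption for illustration), skips every line that starts with # and keeps its own counter instead:

```shell
# Short excerpt in /etc/passwd format, including comment lines.
passwd_sample='##
# User Database
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh'

# !/^#/ selects lines that do not start with "#".
# n is our own counter, so the numbering does not depend on how many
# comment lines precede the user entries.
numbered=$(printf '%s\n' "$passwd_sample" | awk -F ':' '!/^#/{print ++n, $1}')
echo "$numbered"
```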
    

    Enumerating fields with awk

    [root@chenshifengdeLinuxServer ~]# awk 'NR==1{for(i=1;i<=NF;i++) {print i"="$i} }' nginx.log
    1=223.104.7.59
    2=-
    3=-
    4=[05/Dec/2018:00:00:01
    5=+0000]
    6="GET
    7=/topics/17112
    8=HTTP/2.0"
    9=200
    10=9874
    11="https://www.googleapis.com/auth/chrome-content-suggestions"
    12="Mozilla/5.0
    13=(iPhone;
    14=CPU
    15=iPhone
    16=OS
    17=12_1
    18=like
    19=Mac
    20=OS
    21=X)
    22=AppleWebKit/605.1.15
    23=(KHTML,
    24=like
    25=Gecko)
    26=CriOS/70.0.3538.75
    27=Mobile/15E148
    28=Safari/605.1"
    29=0.040
    30=0.040
    31=.
    

    Make good use of less
    Find the IPs with the most requests: count and analyze, then take the top 3

    [root@chenshifengdeLinuxServer ~]# awk '{print $1}' nginx.log |less
    [root@chenshifengdeLinuxServer ~]# awk '{print $1}' nginx.log | sort | less
    [root@chenshifengdeLinuxServer ~]# awk '{print $1}' nginx.log | sort | uniq | less
    [root@chenshifengdeLinuxServer ~]# awk '{print $1}' nginx.log | sort | uniq -c | less
    [root@chenshifengdeLinuxServer ~]# awk '{print $1}' nginx.log | sort | uniq -c | sort | less
    [root@chenshifengdeLinuxServer ~]# awk '{print $1}' nginx.log | sort | uniq -c | sort -n | less
    [root@chenshifengdeLinuxServer ~]# awk '{print $1}' nginx.log | sort | uniq -c | sort -nr | less
    [root@chenshifengdeLinuxServer ~]# awk '{print $1}' nginx.log | sort | uniq -c | sort -nr | head -3 | less
    

    Find the average response time of /topics; the response time is the second-to-last field

    url_avg_time(){
    # readable and easy to modify
    awk '$7=="/topics"' nginx.log | awk '{print $(NF-1)}' | awk '{t+=$1}END{print t/NR }'
    # faster: a single awk process
    awk '$7=="/topics"{total+=$(NF-1);count+=1}END{print total/count}'  nginx.log
    }
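
    Both variants divide by the number of matching lines, which is zero when nothing matches. A hedged sketch of a guarded version follows; the function name `url_avg_time_safe`, its two parameters, and the sample lines are assumptions for illustration, not part of the original.

```shell
# Hypothetical variant: $1 is the exact request path, $2 the log file.
url_avg_time_safe() {
    awk -v path="$1" '
        $7==path {total+=$(NF-1); count++}
        END {
            if (count > 0) print total/count
            else print "no matching requests"
        }' "$2"
}

# Made-up sample in the nginx.log layout; the response time is the
# second-to-last field.
demo_log=$(mktemp)
printf '%s\n' \
    '1.1.1.1 - - [05/Dec/2018:00:11:42 +0000] "GET /topics HTTP/1.1" 200 10 "-" "UA" 0.010 0.010 .' \
    '1.1.1.1 - - [05/Dec/2018:00:11:43 +0000] "GET /topics HTTP/1.1" 200 10 "-" "UA" 0.030 0.030 .' \
    '1.1.1.1 - - [05/Dec/2018:00:11:44 +0000] "GET /other HTTP/1.1" 200 10 "-" "UA" 0.500 0.500 .' \
    > "$demo_log"

url_avg_time_safe /topics "$demo_log"
url_avg_time_safe /missing "$demo_log"
rm -f "$demo_log"
```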
    

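    Performance statistics: perf_avg
    Collect the CPU and memory usage of the aliyundun process.

    • Take 10 samples, one second apart
    • Print the average CPU and memory usage at the end
    • Separate fields with tabs, and leave a blank line between the samples and the averages
    • Accept a process pattern as input so that different processes can be measured
    top -b -d1 -n10|grep --color=auto --line-buffered -i aliyundun$|awk 'BEGIN{OFS="\t"}{cpu+=$9;mem+=$10;print $9,$10}END{print "";print "avg:",cpu/NR, mem/NR}'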
    

    Written as a function

    [root@chenshifengdeLinuxServer ~]# perf_avg() {
    >     top -b -d 1 -n $2 |
    >         grep -i "$1" \
    >             --color=auto \
    >             --line-buffered |
    >         awk '
    >         BEGIN{OFS="\t"}
    >         {
    >             cpu+=$9;
    >             mem+=$10;
    >             print $9,$10
    >         }
    >         END{
    >             print "";
    >             print "avg:",cpu/NR, mem/NR
    >         }
    >         '
    > }
    [root@chenshifengdeLinuxServer ~]# type perf_avg
    perf_avg is a function
    perf_avg () 
    { 
        top -b -d 1 -n $2 | grep --color=auto -i "$1" --color=auto --line-buffered | awk '
            BEGIN{OFS="\t"}
            {
                cpu+=$9;
                mem+=$10;
                print $9,$10
            }
            END{
                print "";
                print "avg:",cpu/NR, mem/NR
            }
            '
    }
    [root@chenshifengdeLinuxServer ~]# perf_avg aliyundun$ 10
    6.2	0.7
    0.0	0.7
    0.0	0.7
    0.0	0.7
    1.0	0.7
    0.0	0.7
    1.0	0.7
    0.0	0.7
    0.0	0.7
    0.0	0.7
    

    avg: 0.82 0.7
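
    The averaging step can be checked without a running process by feeding awk a few canned rows. This is only a sketch: the rows below are made up, and they assume the default `top -b` column layout in which column 9 is %CPU and column 10 is %MEM.

```shell
# Hypothetical rows shaped like top -b process lines (values are made up).
top_rows='1234 root 20 0 10000 2000 1000 S 6.0 0.7 0:01.00 aliyundun
1234 root 20 0 10000 2000 1000 S 0.0 0.7 0:01.01 aliyundun
1234 root 20 0 10000 2000 1000 S 3.0 0.7 0:01.02 aliyundun'

# Sum columns 9 and 10, then divide by the number of rows read (NR).
# The NR>0 guard avoids a division by zero when grep matched nothing.
avg_line=$(printf '%s\n' "$top_rows" |
    awk 'BEGIN{OFS="\t"} {cpu+=$9; mem+=$10} END{if (NR>0) print "avg:", cpu/NR, mem/NR}')
echo "$avg_line"
```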

    Given the following file, a dump of the WeChat Moments page:

    <node index="0" text="随风" resource-id="com.tencent.mm:id/e3x" class="android.widget.TextView" package="com.tencent.mm" content-desc="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" scrollable="false" long-clickable="false" password="false" selected="false" bounds="[166,743][1040,805]" /></node>
    <node index="1" text="哈哈哈哈哈哈" resource-id="com.tencent.mm:id/b_e" class="android.widget.LinearLayout" package="com.tencent.mm" content-desc="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" scrollable="false" long-clickable="false" password="false" selected="false" bounds="[166,813][1048,867]">
    

    Extract the user name and the text of the Moments post

    $ demo='<node index="0" text="随风" resource-id="com.tencent.mm:id/e3x" class="android.widget.TextView" package="com.tencent.mm" content-desc="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" scrollable="false" long-clickable="false" password="false" selected="false" bounds="[166,743][1040,805]" /></node>
    <node index="1" text="哈哈哈哈哈哈" resource-id="com.tencent.mm:id/b_e" class="android.widget.LinearLayout" package="com.tencent.mm" content-desc="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" scrollable="false" long-clickable="false" password="false" selected="false" bounds="[166,813][1048,867]">'
    $ echo "$demo"|sed 's#><#>|<#g' |awk 'BEGIN{RS="|"}{print $0}' |awk -F\" 'BEGIN{OFS="\t"}/e3x/{name=$4}/b_e/{msg=$4;print name,"|",msg}'
    随风    |    哈哈哈哈哈哈
    
    friend() {
        while true; do
            # dump the UI and extract the nickname and the Moments text
            adb shell 'uiautomator dump && cat /sdcard/window_dump.xml' |
                sed 's#><#>|<#g' |
                awk 'BEGIN{RS="|"}{print $0}' |
                awk -F\" '
                BEGIN{OFS="	"}
                /e3x/{name=$4}
                /b_e/{msg=$4;print name,"|",msg}
                '
            # compute approximate swipe coordinates
            distance=$(
                adb shell wm size |
                    awk -F' |x' '
                    {
                        width=$(NF-1);
                        height=$NF;
                        print width*0.5, height*0.8, width*0.5, height*0.2}'
            )
            # swipe the screen directly with input
            adb shell input swipe $distance
        done
    }
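
    The two field-separator tricks used in friend() can be tried without a device. In this sketch the `node` line and the `size_line` resolution are made-up values; `adb shell wm size` typically prints a line of the form shown, but the exact output on a given device is an assumption.

```shell
# Hypothetical UI node: splitting on the double quote puts each
# attribute value in an even-numbered field, so text="..." is $4.
node='<node index="0" text="hello" resource-id="com.example:id/e3x" />'
text_value=$(echo "$node" | awk -F'"' '{print $4}')
echo "$text_value"

# Hypothetical screen size line. The separator " |x" is a regex that
# splits on a space or on the letter x, leaving width and height as
# the last two fields.
size_line='Physical size: 1080x1920'
swipe_args=$(echo "$size_line" |
    awk -F' |x' '{w=$(NF-1); h=$NF; print w*0.5, h*0.8, w*0.5, h*0.2}')
echo "$swipe_args"
```

    The four numbers are the start and end coordinates of a vertical swipe along the middle of the screen, from 80% of the height up to 20%.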
    
  • Original article: https://www.cnblogs.com/R-bear/p/15027125.html