简介
awk是一个强大的文本分析工具,是一种编程语言,相对于grep,它在数据分析方面表现更为强大,用于linux下对数据和文本进行处理。
awk命令形式:awk '{pattern+action}' {filenames}
awk是行处理器,它每接收文件的一行,就执行相应的处理命令。
应用示例
1.取出前五行,打印第一个数据项
[root@www ~]#last -n 5 root pts/1 192.168.1.100 Tue Feb 10 11:21 still logged in root pts/1 192.168.1.100 Tue Feb 10 00:46 - 02:28 (01:41) root pts/1 192.168.1.100 Mon Feb 9 11:41 - 18:30 (06:48) dmtsai pts/1 192.168.1.100 Mon Feb 9 11:41 - 11:41 (00:00) root tty1 Fri Sep 5 14:09 - 14:10 (00:01)
[root@www ~]#last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root
awk工作流程是这样的:读入有' '换行符分割的一条记录,然后将记录按指定的域分隔符划分域,填充域,$0则表示所有域,$1表示第一个域,$n表示第n个域。默认域分隔符是"空白键" 或 "[tab]键",所以$1表示登录用户,$3表示登录用户ip,以此类推。
2.统计PV、UV、IP等
从nginx的access.log中分析统计出来:
192.168.40.2 - [02/Nov/2016:15:44:35 +0800] "GET /wcm/app/main/refresh.jsp?r=1478072325778 HTTP/1.1" - 200 "User_Cookie:7F00000122A5597C46607B1C0A7EC016" 192.168.40.2 - [02/Nov/2016:15:44:35 +0800] "GET /webpic/W0201611/W020161102/W020161102566715167404.jpg HTTP/1.1" - 200 "User_Cookie:7F00000122A5597C46607B1C0A7EC016" 119.255.31.109 - [02/Nov/2016:15:44:36 +0800] "GET /wcm/app/main/refresh.jsp?r=1478072510132 HTTP/1.1" - 200 "User_Cookie:7F000001237921BE9237838AEC65704D" 119.255.31.109 - [02/Nov/2016:15:44:36 +0800] "GET /wcm/app/message/message_query_service.jsp?READFLAG=0&MSGTYPES=1%2C2%2C3 HTTP/1.1" - 200 "User_Cookie:7F000001237921BE9237838AEC65704D" 192.168.40.2 - [02/Nov/2016:15:44:37 +0800] "GET /wcm/app/message/message_query_service.jsp?READFLAG=0&MSGTYPES=1%2C2%2C3 HTTP/1.1" - 200 "User_Cookie:7F00000123D3BF2345115EAAC21F71E0" 192.168.40.2 - [02/Nov/2016:15:44:37 +0800] "GET /wcm/app/message/message_query_service.jsp?READFLAG=0&MSGTYPES=1%2C2%2C3 HTTP/1.1" - 200 "User_Cookie:7F00000123EF73896DF98EDA9950944E" 192.168.40.2 - [02/Nov/2016:15:44:37 +0800] "GET /wcm/app/message/message_query_service.jsp?READFLAG=0&MSGTYPES=1%2C2%2C3 HTTP/1.1" - 200 "User_Cookie:7F00000123FE0F9C397E1A8F0C4F044B" 192.168.40.2 - [02/Nov/2016:15:44:37 +0800] "GET /wcm/app/main/refresh.jsp?r=1478072511427 HTTP/1.1" - 200 "User_Cookie:7F00000123A465B7EA1DE0AF0AE671B7" 119.255.31.109 - [02/Nov/2016:15:44:38 +0800] "GET /wcm/app/message/message_query_service.jsp?READFLAG=0&MSGTYPES=1%2C2%2C3 HTTP/1.1" - 200 "User_Cookie:7F00000123D89B11302DF80AE773C900"
PV统计
--总PV量
awk '{print $6}' access.log | wc -l
--独立IP
awk '{print $1}' access.log | sort -r |uniq -c | wc -l
--UV统计
awk '{print $10}' access.log | sort -r |uniq -c |wc -l