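awk is a text-processing programming language and tool. Short awk programs can process standard input or files, sort data, perform calculations, generate reports, and more.
Basic command syntax: awk option 'pattern {action}' file
Here pattern is what awk looks for in the data, and action is the series of commands that run when a match is found. The braces group a series of statements under a particular pattern.
awk works much like a database: it processes records and fields, which grep and sed do not provide directly.
By default awk treats each line of a text file as one record and reads the records into memory one at a time for processing; each part of a line is a field of that record. Fields are numbered 1, 2, 3, ... in order. A field is referenced with $ followed by its number, such as $1 or $2, and $0 refers to the whole line. Several fields can be listed separated by commas, as in print $1,$2.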
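The record and field model described above can be tried without any file. The following is a minimal sketch; the sample line and the variable names line and summary are made up for illustration.

```shell
# Hypothetical sample record; fields are separated by whitespace by default.
line='alice 30 engineer'

# $1 is the first field, $3 the third, NF the number of fields, $0 the whole record.
summary=$(printf '%s\n' "$line" | awk '{ print $1, $3, NF; print $0 }')
printf '%s\n' "$summary"
```

The commas in the print statement make awk put the output field separator (a single space by default) between the items.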
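On many Linux distributions (for example RHEL and CentOS) the default awk is gawk, the GNU implementation of awk. Listing the binary shows which implementation awk points to: ls -l /bin/awk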
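Options

Option                     Description
-f program-file            Read the awk program source from a file
-F fs                      Use fs as the input field separator
-v var=value               Assign a value to a variable
--posix                    Strict POSIX compatibility mode, including POSIX regular expression behavior
--dump-variables[=file]    Write the global variables of the awk run to a file; the default file is awkvars.out
--profile[=file]           Write a formatted copy of the awk program to a file; the default file is awkprof.out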
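Patterns
Common patterns:

Pattern                    Description
BEGIN { }                  Sets up the initial state; runs before any input is processed
END { }                    Cleanup work that runs after all input has been processed
/regular expression/       Matches each input record against the regular expression
pattern && pattern         Logical AND: both patterns must match
pattern || pattern         Logical OR: at least one pattern must match
! pattern                  Logical NOT: the pattern must not match
pattern1, pattern2         Range pattern: matches every record from one that matches pattern1 through the next one that matches pattern2, inclusive

The actions are the print statements, flow control, I/O statements and so on covered below.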
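The pattern types listed above can be combined in one program. The following is a rough sketch on made-up input (the variable names sample and report and the data are assumptions, not taken from /etc/services):

```shell
# Hypothetical sample input standing in for a real file.
sample='alpha tcp
beta udp
gamma tcp
delta udp'

report=$(printf '%s\n' "$sample" | awk '
BEGIN            { print "start" }          # runs once, before any input is read
/tcp/ && !/^g/   { print "tcp:", $1 }       # regex AND negated regex
/^beta/,/^gamma/ { print "range:", $1 }     # from the first match through the second
END              { print "records:", NR }   # runs once, after the last record
')
printf '%s\n' "$report"
```

Every rule is tested against every record, so one record can trigger more than one action.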
Examples:
1) Read the awk program from a file
# vi test.awk
{print $2}
# tail -n3 /etc/services |awk -f test.awk
48619/tcp
48619/udp
49000/tcp
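The same -f workflow can be scripted without an editor. This is a minimal sketch: the temporary file comes from mktemp, and the two input lines are invented in the style of /etc/services.

```shell
# Write a small awk program to a temporary file (the path comes from mktemp).
prog=$(mktemp)
cat > "$prog" <<'EOF'
# Print the second field of every record.
{ print $2 }
EOF

# Hypothetical input lines in the style of /etc/services.
out=$(printf 'svc-a 1000/tcp\nsvc-b 2000/udp\n' | awk -f "$prog")
rm -f "$prog"
printf '%s\n' "$out"
```

Keeping the program in a file avoids shell quoting problems and allows comments inside longer programs.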
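2) Specify the separator and print selected fields
Print the second field; by default fields are separated by whitespace (spaces or tabs):
# tail -n3 /etc/services |awk '{print $2}'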
48619/tcp
48619/udp
49000/tcp
Use a colon as the separator and print the first field:
# awk -F ':' '{print $1}' /etc/passwd
root
bin
daemon
adm
lp
sync
......
Several characters can also be given as separators; each of them then acts as a field separator:
# tail -n3 /etc/services |awk -F'[/#]' '{print $1}'
iqobject 48619
iqobject 48619
matahari 49000
# tail -n3 /etc/services |awk -F'[/#]' '{print $2}'
tcp
udp
tcp
# tail -n3 /etc/services |awk -F'[/#]' '{print $3}'
iqobject
iqobject
MatahariBroker
# tail -n3 /etc/services |awk -F'[ /]+' '{print $2}'
48619
48619
49000
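The [] metacharacter matches any one of the characters inside it, so a new field starts at every / or #. With [ /]+, a run of consecutive spaces and slashes counts as a single separator. Using several separators makes fields easier to handle.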
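The difference between the two separator styles above is easier to see on a single line. A small sketch, assuming one made-up line in the style of /etc/services (the names line, by_char and by_run are illustrative):

```shell
# Hypothetical line in the style of /etc/services.
line='iqobject 48619/tcp # iqobject'

# A bracket expression splits on every single "/" or "#"; surrounding spaces stay in the field.
by_char=$(printf '%s\n' "$line" | awk -F'[/#]' '{ print "[" $2 "]" }')

# With "+", a run of spaces and slashes counts as one separator.
by_run=$(printf '%s\n' "$line" | awk -F'[ /]+' '{ print $2 "|" $3 }')

printf '%s\n%s\n' "$by_char" "$by_run"
```

The brackets printed around the first result make the trailing space visible, which is why a separator such as [ /]+ is often more convenient when the data is padded with blanks.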
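3) Variable assignment
# awk -v a=123 'BEGIN{print a}'
123
Use a shell variable as the value of an awk variable:
# a=123
# awk -v a="$a" 'BEGIN{print a}'
123
Or close the single quotes around the shell variable so that the shell expands it into the program text (this only works when the value is valid awk code, such as a number):
# awk 'BEGIN{print '$a'}'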
123
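Splicing a shell variable into the program text fails once the value contains spaces, because the shell splits it into several words. The following sketch shows two safer ways to pass such a value; the variable names name, via_v and via_env are made up for illustration.

```shell
# A shell value containing a space (hypothetical example data).
name='hello world'

# -v passes the value as an awk string; quoting "$name" keeps it as one shell word.
via_v=$(awk -v s="$name" 'BEGIN { print s }')

# ENVIRON reads environment variables; here name is exported for this one command only.
via_env=$(name="$name" awk 'BEGIN { print ENVIRON["name"] }')

printf '%s\n%s\n' "$via_v" "$via_env"
```

Values given with -v go through escape-sequence processing, so ENVIRON is the more predictable choice for text that may contain backslashes.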
4) Write awk's global variables to a file
# seq 5 |awk --dump-variables '{print $0}'
1
2
3
4
5
# cat awkvars.out
ARGC:number (1)
ARGIND:number (0)
ARGV:array, 1 elements
BINMODE:number (0)
CONVFMT:string ("%.6g")
ERRNO:number (0)
FIELDWIDTHS:string ("")
FILENAME:string ("-")
FNR:number (5)
FS:string (" ")
IGNORECASE:number (0)
LINT:number (0)
NF:number (1)
NR:number (5)
OFMT:string ("%.6g")
OFS:string (" ")
ORS:string ("\n")
RLENGTH:number (0)
RS:string ("\n")
RSTART:number (0)
RT:string ("\n")
SUBSEP:string ("\034")
TEXTDOMAIN:string ("messages")
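5) BEGIN and END
The BEGIN pattern runs its action before any input is processed. It is commonly used to modify built-in variables, assign variables, and print a header or title.
Example: print a header
# tail /etc/services |awk 'BEGIN{print "Service Port Description\n==="}{print $0}'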
Service Port Description
===
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
matahari 49000/tcp # Matahari Broker
The END pattern runs its action only after all input has been processed.
Example: print a footer
# tail /etc/services |awk '{print $0}END{print "===\nEND......"}'
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
matahari 49000/tcp # Matahari Broker
===
END......
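BEGIN and END are also useful together, for instance to frame a report and print a total at the end. A minimal sketch on invented data (the file names, sizes and the variable names data and report are assumptions):

```shell
# Hypothetical "name size" pairs.
data='a.log 120
b.log 300
c.log 80'

report=$(printf '%s\n' "$data" | awk '
BEGIN { print "File Size"; print "===" }                 # header, printed before any input
      { print $1, $2; total += $2 }                      # body, runs for every record
END   { print "==="; print NR " files, total " total }   # footer with a running total
')
printf '%s\n' "$report"
```

The variable total needs no initialisation: an unset awk variable counts as 0 in arithmetic, and NR still holds the number of records read when the END action runs.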
6) Write a formatted copy of the awk program to a file
# tail /etc/services |awk --profile 'BEGIN{print "Service Port Description\n==="}{print $0}END{print "===\nEND......"}'
Service Port Description
===
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast ServiceProtocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
===
END......
# cat awkprof.out
# gawk profile, created Sat Jan 7 19:45:22 2017
# BEGIN block(s)
BEGIN {
    print "Service Port Description\n==="
}
# Rule(s)
{
    print $0
}
# END block(s)
END {
    print "===\nEND......"
}
7) /re/ regular expression matching
Match lines that contain tcp:
# tail /etc/services |awk '/tcp/{print $0}'
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service
isnetserv 48128/tcp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
iqobject 48619/tcp # iqobject
matahari 49000/tcp # Matahari Broker
Match lines that start with blp5:
# tail /etc/services |awk '/^blp5/{print $0}'
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
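Match lines whose first field is exactly eight lowercase letters or digits (gawk versions before 4.0 need --re-interval or --posix for the {8} interval expression):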
# tail /etc/services |awk '/^[a-z0-9]{8} /{print $0}'
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
matahari 49000/tcp # Matahari Broker
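A whole-record regex such as the one above depends on the exact separator that follows the first field. Testing the field itself avoids that. The sketch below uses invented input; the variable names data, by_length and by_regex are illustrative.

```shell
# Hypothetical input: first fields of different lengths.
data='iqobject 48619/tcp
blp5 48129/tcp
matahari 49000/tcp
com-bardac-dw 48556/tcp'

# length($1) == 8 tests the first field directly, whatever characters it holds.
by_length=$(printf '%s\n' "$data" | awk 'length($1) == 8 { print $1 }')

# $1 ~ /re/ applies a regular expression to one field instead of the whole record.
by_regex=$(printf '%s\n' "$data" | awk '$1 ~ /^[a-z0-9]+$/ { print $1 }')

printf '%s\n---\n%s\n' "$by_length" "$by_regex"
```

Neither form uses an interval expression, so both also work on awk implementations that do not enable {n} by default.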
8) Logical AND, OR and NOT
Match records that contain both blp5 and tcp:
# tail /etc/services |awk '/blp5/ && /tcp/{print $0}'
blp5 48129/tcp # Bloomberg locator
Match records that contain blp5 or tcp:
# tail /etc/services |awk '/blp5/ || /tcp/{print $0}'
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service
isnetserv 48128/tcp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
iqobject 48619/tcp # iqobject
matahari 49000/tcp # Matahari Broker
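Exclude lines that start with # and empty lines:
# awk '! /^#/ && ! /^$/{print $0}' /etc/httpd/conf/httpd.conf
or
# awk '! /^#|^$/' /etc/httpd/conf/httpd.conf
or, since [^#] must match at least one character and therefore never matches an empty line:
# awk '/^[^#]/' /etc/httpd/conf/httpd.conf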
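The commands above only catch comments that start in the first column and lines that are completely empty. A field-based test also handles indented comments and lines that contain only blanks. This is a sketch on made-up configuration text (the variable names conf and active and the sample content are assumptions):

```shell
# Hypothetical config text with a comment, a blank line and an indented comment.
conf='# main settings
Listen 80

  # indented comment
ServerName example.test'

# NF is 0 for blank lines; $1 is the first non-blank token of the line.
active=$(printf '%s\n' "$conf" | awk 'NF && $1 !~ /^#/')
printf '%s\n' "$active"
```

With no action given, awk prints every record for which the pattern is true.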
9) Range matching
# tail /etc/services |awk '/^blp5/,/^com/'
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
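A range pattern can switch on more than once in the same input. The following sketch uses an invented log with two START/END sections (the markers and the variable names log and sections are hypothetical):

```shell
# Hypothetical log with two START/END sections.
log='noise
START a
work 1
END a
noise
START b
END b'

# The range turns on at a line matching /^START/ and off after a line matching /^END/.
sections=$(printf '%s\n' "$log" | awk '/^START/,/^END/ { print NR ": " $0 }')
printf '%s\n' "$sections"
```

Both boundary lines are part of the range, and the lines between the two sections are skipped.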
To be continued...