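Lesson 12: Regular Expressions
1. Introduction to regular expressions
2. grep
3. sed
4. awk
5. Extras
1. Introduction to Regular Expressions
A regular expression (RE) is a pattern of characters used to match specified characters during a search.
Metacharacters are characters that represent something other than their literal meaning. They are interpreted by the programs that perform pattern matching, such as vi, grep, sed, and awk.
Basic metacharacters recognized by all UNIX/Linux pattern-matching tools
Metacharacter | Function | Example | What it matches |
---|---|---|---|
^ | Beginning-of-line anchor | /^love/ | Lines beginning with love |
$ | End-of-line anchor | /love$/ | Lines ending with love |
. | Matches any single character | /l..e/ | Lines containing an l, followed by two characters, followed by an e |
* | Matches zero or more of the character preceding the * | / *love/ | Lines containing zero or more spaces followed by love |
[] | Matches any one character in the set | /[Ll]ove/ | Lines containing love or Love |
[x-y] | Matches one character in the given range | /[A-Z]ove/ | An uppercase letter followed by ove |
[^] | Matches any one character not in the set | /[^A-Z]/ | Any one character not in the range A-Z |
\ | Escapes a metacharacter | /love\./ | Lines containing love followed by a literal period |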
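The difference between a bare period and an escaped one is easy to check. The sketch below is a minimal illustration; the three sample lines are invented for it and are not part of the lesson's files.

```shell
# Hypothetical sample lines, invented for this illustration.
sample=$(printf '%s\n' 'I love. You' 'I loved it' 'glove box')

# Unescaped period: "love" followed by ANY single character.
any_char=$(printf '%s\n' "$sample" | grep -c 'love.')

# Escaped period: "love" followed by a literal period only.
literal_dot=$(printf '%s\n' "$sample" | grep -c 'love\.')

printf '%s\n' "unescaped period matches $any_char lines"
printf '%s\n' "escaped period matches $literal_dot line"
```

All three lines match the unescaped form (the period stands for ".", "d", and a space respectively), while only the first line matches the escaped form.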
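Extended metacharacters, supported by many (but not necessarily all) UNIX/Linux pattern-matching programs that use RE metacharacters
Metacharacter | Function | Example | What it matches |
---|---|---|---|
\< | Beginning-of-word anchor | /\<love/ | Lines containing a word that begins with love |
\> | End-of-word anchor | /love\>/ | Lines containing a word that ends with love |
\(..\) | Tags matched characters for later use | /\(love\)able \1er/ | Up to nine tags may be used; the leftmost one in the pattern is tag 1. In this example, love is saved as tag 1 and referenced as \1 |
x\{m\} or x\{m,\} or x\{m,n\} | Repetition of character x: exactly m times, at least m times, or at least m but no more than n times | /o\{5,10\}/ | Lines containing 5 to 10 consecutive o's |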
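The three interval forms can be compared side by side. This is a small sketch with an invented word list; the l and t on either side of the interval keep each match exact.

```shell
# Hypothetical word list, invented for this illustration.
words=$(printf '%s\n' 'lot' 'loot' 'looot' 'loooot')

# x\{m\}: exactly two o's between l and t.
exactly_two=$(printf '%s\n' "$words" | grep 'lo\{2\}t')

# x\{m,\}: at least three o's between l and t.
at_least_three=$(printf '%s\n' "$words" | grep 'lo\{3,\}t')

# x\{m,n\}: two or three o's between l and t.
two_to_three=$(printf '%s\n' "$words" | grep 'lo\{2,3\}t')

printf 'exactly two:\n%s\n' "$exactly_two"
printf 'at least three:\n%s\n' "$at_least_three"
printf 'two to three:\n%s\n' "$two_to_three"
```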
Sample file for the basic metacharacters
// Demonstrated with grep
root@lanquark:~/unixshellbysample/chap03# cat picnic
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
Simple regular expression search
root@lanquark:~/demo# grep 'love' picnic
I had a lovely time on our little picnic.
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
Beginning-of-line anchor (^)
root@lanquark:~/demo# grep '^love' picnic
love, how much I adore you. Do you know
End-of-line anchor ($)
root@lanquark:~/demo# grep 'love$' picnic
clover. Did you see them? I can only hope love
Any single character (.)
root@lanquark:~/demo# grep 'l.ve' picnic
I had a lovely time on our little picnic.
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
Zero or more of the preceding character (*)
root@lanquark:~/demo# grep 'o*ve' picnic
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
A set of characters ([ ])
root@lanquark:~/demo# grep '[Ll]ove' picnic
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
A range of characters ([ - ])
root@lanquark:~/demo# grep 'ove[a-z]' picnic
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
Characters not in the set ([^ ])
root@lanquark:~/demo# grep 'ove[^a-zA-Z0-9]' picnic
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
groove.
Sample file for the extended metacharacters
// Demonstrated with grep or sed
root@lanquark:~/unixshellbysample/chap03# cat textfile
Unusual occurrences happened at the fair.
Patty won fourth place in the 50 yard dash square and fair.
Occurrences like this are rare.
The winning ticket is 55222.
The ticket I got is 54333 and Dee got 55544.
Guy fell down while running around the south bend in his last event.
Beginning-of-word (\<) and end-of-word (\>) anchors
root@lanquark:~/demo# grep '\<fourth\>' textfile
Patty won fourth place in the 50 yard dash square and fair.
Remembering patterns with \( and \)
// Replace occurrence with occurence, and Occurrence with Occurence
root@lanquark:~/unixshellbysample/chap03# sed 's#\([Oo]ccur\)rence#\1ence#' textfile
Unusual occurences happened at the fair.
Patty won fourth place in the 50 yard dash square and fair.
Occurences like this are rare.
The winning ticket is 55222.
The ticket I got is 54333 and Dee got 55544.
Guy fell down while running around the south bend in his last event.
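2. grep
The name grep comes from the ed command g/re/p: globally search for a regular expression and print the matching lines.
grep never modifies its input files.
Command format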
grep word filename
root@lanquark:~# grep hjm /etc/passwd
hjm:x:5000:5000:hjm:/home/hjm:/bin/bash
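Regular expression metacharacters used by grep
Metacharacter | Function | Example | What it matches |
---|---|---|---|
^ | Beginning-of-line anchor | '^love' | Lines beginning with love |
$ | End-of-line anchor | 'love$' | Lines ending with love |
. | Matches any single character | 'l..e' | Lines containing an l, followed by two characters, followed by an e |
* | Matches zero or more of the character preceding the * | ' *love' | Lines containing zero or more spaces followed by love |
[ ] | Matches any one character in the set | '[Ll]ove' | Lines containing love or Love |
[^] | Matches any one character not in the set | '[^A-K]' | Any one character not in the range A-K |
\ | Escapes a metacharacter | 'love\.' | Lines containing love followed by a literal period |
\< | Beginning-of-word anchor | '\<love' | Lines containing a word that begins with love |
\> | End-of-word anchor | 'love\>' | Lines containing a word that ends with love |
\(..\) | Tags matched characters for later use | '\(love\)ing' | Up to nine tags may be used; the leftmost one in the pattern is tag 1. In this example, love is saved as tag 1 and referenced as \1 |
x\{m\} or x\{m,\} or x\{m,n\} | Repetition of character x: exactly m times, at least m times, or at least m but no more than n times | 'o\{5,10\}' | Lines containing 5 to 10 consecutive o's |
// Sample file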
root@lanquark:~/demo# cat datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
Print all lines containing NW
root@lanquark:~/demo# grep NW datafile
northwest NW Charles Main 3.0 .98 3 34
Print lines beginning with the letter n
root@lanquark:~/demo# grep '^n' datafile
northwest NW Charles Main 3.0 .98 3 34
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
Print lines ending with the digit 4
root@lanquark:~/demo# grep '4$' datafile
northwest NW Charles Main 3.0 .98 3 34
Print lines beginning with the letter w or e
root@lanquark:~/demo# grep '^[we]' datafile
western WE Sharon Gray 5.3 .97 5 23
eastern EA TB Savage 4.4 .84 5 20
Print all lines containing at least one non-digit character
root@lanquark:~/demo# grep '[^0-9]' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
Print all lines containing an s, followed by zero or more s's and a space
root@lanquark:~/demo# grep 'ss* ' datafile
northwest NW Charles Main 3.0 .98 3 34
southwest SW Lewis Dalsass 2.7 .8 2 18
Print lines containing at least nine consecutive lowercase letters
root@lanquark:~/demo# grep '[a-z]\{9\}' datafile
northwest NW Charles Main 3.0 .98 3 34
southwest SW Lewis Dalsass 2.7 .8 2 18
southeast SE Patricia Hemenway 4.0 .7 4 17
northeast NE AM Main Jr. 5.1 .94 3 13
Print lines containing a 3 followed by a period and a digit, then any number of characters, then another 3
root@lanquark:~/demo# grep '\(3\)\.[0-9].*\1' datafile
northwest NW Charles Main 3.0 .98 3 34
Print all lines containing a word that begins with north
root@lanquark:~/demo# grep '\<north' datafile
northwest NW Charles Main 3.0 .98 3 34
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
root@lanquark:~/demo# grep '\<north\>' datafile
north NO Margot Weber 4.5 .89 5 9
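Common grep options
Option | Function |
---|---|
-c | Prints a count of the matching lines instead of the lines themselves |
-i | Ignores case when comparing characters |
-l | Lists only the names of the files that contain matching lines |
-n | Prefixes each line with its line number within the file |
-v | Inverts the match: prints only the lines that do not match |
-w | Matches the expression as a whole word, as if it were surrounded by \< and \> |
-A n | Also prints the n lines after each matching line |
-B n | Also prints the n lines before each matching line |
-C n | Also prints the n lines before and after each matching line |
-R | Recursively reads and processes all files under each listed directory, including its subdirectories |
Sample file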
root@lanquark:~/demo# cat datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
-c prints the number of lines that begin with south
root@lanquark:~/demo# grep -c '^south' datafile
3
-i ignores case
root@lanquark:~/demo# grep -i 'pat' datafile
southeast SE Patricia Hemenway 4.0 .7 4 17
-l prints only the names of the files containing the pattern, not the matching text
root@lanquark:~/demo# grep -l 'SE' *
datafile
temp
-n prefixes each matching line with its line number
root@lanquark:~/demo# grep -n '^south' datafile
3:southwest SW Lewis Dalsass 2.7 .8 2 18
4:southern SO Suan Chin 5.1 .95 4 15
5:southeast SE Patricia Hemenway 4.0 .7 4 17
-v inverts the match
root@lanquark:~/demo# grep -v 'Suan Chin' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
-w matches the pattern only as a whole word, not as part of a word
root@lanquark:~/demo# grep -w 'north' datafile
north NO Margot Weber 4.5 .89 5 9
-A 2 also prints the two lines after the matching line
root@lanquark:~/demo# grep -A 2 'NE' datafile
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
-B 2 also prints the two lines before the matching line
root@lanquark:~/demo# grep -B 2 'NE' datafile
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
-C 2 also prints the two lines before and after the matching line
root@lanquark:~/demo# grep -C 2 'NE' datafile
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
-R searches recursively for the pattern
root@lanquark:~/demo# grep -R 'central' *
datafile:central CT Ann Stephens 5.7 .94 5 13
test.dir/datafile:central CT Ann Stephens 5.7 .94 5 13
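grep exit status
grep is useful in scripts because it always returns an exit status: 0 means the pattern was found, 1 means the pattern was not found, and 2 means an error occurred, for example the file to be searched could not be found.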
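The three exit statuses can be observed directly. The following is a minimal sketch that assumes a throwaway two-line file created with mktemp; the file contents and the "pacific" pattern are invented for the illustration.

```shell
# Hypothetical two-line data file created just for this sketch.
data=$(mktemp)
printf '%s\n' 'north NO Margot Weber' 'central CT Ann Stephens' > "$data"

# Capture each exit status; the "|| var=$?" form also works under "set -e".
found=0;   grep 'north'   "$data" > /dev/null 2>&1 || found=$?
missing=0; grep 'pacific' "$data" > /dev/null 2>&1 || missing=$?
error=0;   grep 'north'   "$data.no-such-file" > /dev/null 2>&1 || error=$?

# Typical use in a script: branch on whether the pattern was found.
if grep 'central' "$data" > /dev/null; then
    verdict='central region is listed'
else
    verdict='central region is not listed'
fi

rm -f "$data"
echo "found=$found missing=$missing error=$error"
echo "$verdict"
```

Because the status is what matters here, the matching lines themselves are discarded by redirecting them to /dev/null.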
grep can read its input from files and from pipes
// List only the regular files in a directory
root@lanquark:~/demo# ls -l | grep '^-'
-rw-r--r-- 1 root root 5 Jun 1 04:30 1111
-rw-r--r-- 1 root root 1066 May 31 20:56 1.txt
-rw-r--r-- 1 root root 351 Jun 4 23:04 datafile
-rw-r--r-- 1 root root 18 Jun 4 21:54 id.txt
-rw-r--r-- 1 root root 876 May 31 21:05 ipconfig.txt
-rw-r--r-- 1 root root 338 Jun 4 23:09 picnic
-rw-r--r--+ 1 root root 18065 May 24 21:00 temp
-rw-r--r-- 1 root root 0 Jun 1 04:25 test1.txt
-rw-r--r-- 1 root root 277 Jun 4 23:17 textfile
-rw-r--r--+ 1 root root 572 Jun 1 04:29 tt.txt
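Extended grep: egrep
Invocation: egrep or grep -E
Regular expression metacharacters used by egrep
Metacharacter | Function | Example | What it matches |
---|---|---|---|
^ | Beginning-of-line anchor | '^love' | Lines beginning with love |
$ | End-of-line anchor | 'love$' | Lines ending with love |
. | Matches any single character | 'l..e' | Lines containing an l, followed by two characters, followed by an e |
* | Matches zero or more of the character preceding the * | ' *love' | Lines containing zero or more spaces followed by love |
[ ] | Matches any one character in the set | '[Ll]ove' | Lines containing love or Love |
[^] | Matches any one character not in the set | '[^A-K]' | Any one character not in the range A-K |
+ | Matches one or more of the character preceding the + | '[a-z]+ove' | One or more lowercase letters followed by ove |
? | Matches zero or one of the preceding character | 'lo?ve' | An l followed by zero or one o, followed by ve |
a|b | Alternation: matches either a or b | 'love|hate' | Either of the two expressions love or hate |
() | Groups characters | 'love(able|ly)', '(ov)+' | The first matches loveable or lovely; the second matches one or more occurrences of ov |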
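One way to see what -E adds is to run the same pattern with and without it. The sketch below uses four invented words; it relies on the fact that in basic regular expressions a bare + is an ordinary character.

```shell
# Hypothetical word list, invented for this illustration.
lines=$(printf '%s\n' 'love' 'hate' 'lve' 'loove')

# Alternation: love or hate.
ere_alt=$(printf '%s\n' "$lines" | grep -E -c 'love|hate')

# ?: zero or one o, so love and lve match.
ere_opt=$(printf '%s\n' "$lines" | grep -E -c '^lo?ve$')

# +: one or more o, so love and loove match.
ere_plus=$(printf '%s\n' "$lines" | grep -E -c '^lo+ve$')

# Without -E the + is a literal plus sign, so nothing matches.
# "|| true" keeps the non-zero exit status of a failed search from aborting a script.
bre_plus=$(printf '%s\n' "$lines" | grep -c '^lo+ve$' || true)

echo "alt=$ere_alt opt=$ere_opt plus=$ere_plus bre_plus=$bre_plus"
```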
Sample file
root@lanquark:~/demo# cat datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
Print lines containing NW or EA
root@lanquark:~/demo# egrep 'NW|EA' datafile
northwest NW Charles Main 3.0 .98 3 34
eastern EA TB Savage 4.4 .84 5 20
Print all lines containing one or more 3s
root@lanquark:~/demo# egrep '3+' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
northeast NE AM Main Jr. 5.1 .94 3 13
central CT Ann Stephens 5.7 .94 5 13
Print all lines containing a 2, followed by zero or one period, followed by a digit
root@lanquark:~/demo# egrep '2\.?[0-9]' datafile
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
eastern EA TB Savage 4.4 .84 5 20
Print lines containing one or more consecutive occurrences of the pattern no
root@lanquark:~/demo# egrep '(no)+' datafile
northwest NW Charles Main 3.0 .98 3 34
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
Print all lines containing the letter S followed by h or u
root@lanquark:~/demo# egrep 'S(h|u)' datafile
western WE Sharon Gray 5.3 .97 5 23
southern SO Suan Chin 5.1 .95 4 15
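3. sed
sed is a streamlined, non-interactive editor. By default it does not modify the original file.
sed processes a file (or other input) one line at a time and sends its output to the screen. It keeps the line it is currently processing in a temporary buffer called the pattern space. Once sed has finished processing the line in the pattern space, it sends the line to the screen, removes it from the pattern space, and reads the next line into the pattern space.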
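The cycle described above (read a line, process it, print it) can be checked with a small experiment. This sketch assumes a three-line temporary file invented for the purpose; it counts output lines with and without the default print and confirms that the file itself is left untouched.

```shell
# Hypothetical three-line input file created just for this sketch.
file=$(mktemp)
printf '%s\n' 'north' 'south' 'north' > "$file"
before=$(cat "$file")

# Default behaviour: every line is printed once at the end of its cycle,
# and p prints matching lines a second time (3 + 2 = 5 lines).
with_default=$(sed '/north/p' "$file" | wc -l | tr -d ' ')

# -n turns off the automatic print, so only the explicit p produces output.
only_matches=$(sed -n '/north/p' "$file" | wc -l | tr -d ' ')

# The input file is unchanged after both runs.
after=$(cat "$file")
rm -f "$file"

echo "default=$with_default with -n=$only_matches"
```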
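sed commands and options
Command | Function |
---|---|
a\ | Appends one or more lines of text after the current line |
c\ | Changes (replaces) the text of the current line with new text |
d | Deletes lines |
i\ | Inserts text before the current line |
h | Copies the contents of the pattern space to the hold buffer |
H | Appends the contents of the pattern space to the hold buffer |
g | Copies the contents of the hold buffer into the pattern space, overwriting what was there |
G | Appends the contents of the hold buffer to the pattern space, after what was there |
l | Lists nonprinting characters |
p | Prints lines |
n | Reads the next input line and continues processing it with the next command rather than the first command |
q | Quits sed |
r | Reads lines from a file |
! | Applies the command to all lines except the selected ones |
s | Substitutes one string for another |
x | Exchanges the contents of the hold buffer and the pattern space |
y | Translates one character to another (regular expressions cannot be used with y) |
Substitution flags (used with s)
Flag | Function |
---|---|
g | Replaces globally within the line |
p | Prints the line |
w | Writes the line to a file |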
sed options
Option | Function |
---|---|
-e | Allows multiple edits |
-f | Specifies the name of a sed script file |
-n | Suppresses the default output |
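The -f and -e options are two ways of supplying the same commands. The sketch below is illustrative only: the script file is a temporary file and the two input lines are a cut-down, invented version of the sample data.

```shell
# Hypothetical sed script file holding two commands, one per line.
script=$(mktemp)
printf '%s\n' '1d' 's/Hemenway/Jones/' > "$script"

# Hypothetical two-line input.
input=$(printf '%s\n' 'northwest NW Charles Main' 'southeast SE Patricia Hemenway')

# -f reads the commands from the script file.
from_file=$(printf '%s\n' "$input" | sed -f "$script")

# -e supplies the same commands on the command line.
from_args=$(printf '%s\n' "$input" | sed -e '1d' -e 's/Hemenway/Jones/')

# -n with the p flag prints only the line where a substitution happened.
with_n=$(printf '%s\n' "$input" | sed -n -e 's/Hemenway/Jones/p')

rm -f "$script"
printf '%s\n' "$from_file" "$from_args" "$with_n"
```

All three invocations print the same single line here, which makes a script file a convenient home for edits that grow beyond a couple of -e expressions.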
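sed metacharacters
Metacharacter | Function | Example | What it matches |
---|---|---|---|
^ | Beginning-of-line anchor | /^love/ | Lines beginning with love |
$ | End-of-line anchor | /love$/ | Lines ending with love |
. | Matches any single character | /l..e/ | Lines containing an l, followed by two characters, followed by an e |
* | Matches zero or more of the character preceding the * | / *love/ | Lines containing zero or more spaces followed by love |
[ ] | Matches any one character in the set | /[Ll]ove/ | Lines containing love or Love |
[^] | Matches any one character not in the set | /[^A-KM-Z]ove/ | Lines containing ove preceded by a character that is not in the range A-K or M-Z |
\(..\) | Saves matched characters | s/\(love\)able/\1er/ | Tags the pattern between the markers and saves it as tag 1, which can later be referenced as \1. Up to nine tags can be defined, numbered from the left. |
& | Saves the search string so it can be referenced in the replacement | s/love/aa&aa/ | & stands for the search string; love is replaced by itself with aa added on each side, so love becomes aaloveaa |
\< | Beginning-of-word anchor | /\<love/ | Lines containing a word that begins with love |
\> | End-of-word anchor | /love\>/ | Lines containing a word that ends with love |
x\{m\} | Exactly m consecutive x's | /o\{5\}/ | Five consecutive o's |
x\{m,\} | At least m x's | /o\{5,\}/ | At least five consecutive o's |
x\{m,n\} | At least m but no more than n x's | /o\{5,10\}/ | At least 5 and at most 10 consecutive o's |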
Sample file
root@lanquark:~/demo# cat datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
The print command: p
root@lanquark:~/demo# sed '/north/p' datafile
northwest NW Charles Main 3.0 .98 3 34
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
Suppressing the default output: -n
root@lanquark:~/demo# sed -n '/north/p' datafile
northwest NW Charles Main 3.0 .98 3 34
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
The delete command: d
// Delete line 3
root@lanquark:~/demo# sed '3d' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
// Delete from line 3 through the last line
root@lanquark:~/demo# sed '3,$d' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
// Delete the last line
root@lanquark:~/demo# sed '$d' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
// Delete the lines containing the pattern north
root@lanquark:~/demo# sed '/north/d' datafile
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
central CT Ann Stephens 5.7 .94 5 13
The substitute command: s
// Replace west with north; the g flag makes the replacement global within each line
root@lanquark:~/demo# sed 's#west#north#g' datafile
northnorth NW Charles Main 3.0 .98 3 34
northern WE Sharon Gray 5.3 .97 5 23
southnorth SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
// & stands for the matched text
root@lanquark:~/demo# sed 's#[0-9][0-9]$#&.5#' datafile
northwest NW Charles Main 3.0 .98 3 34.5
western WE Sharon Gray 5.3 .97 5 23.5
southwest SW Lewis Dalsass 2.7 .8 2 18.5
southern SO Suan Chin 5.1 .95 4 15.5
southeast SE Patricia Hemenway 4.0 .7 4 17.5
eastern EA TB Savage 4.4 .84 5 20.5
northeast NE AM Main Jr. 5.1 .94 3 13.5
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13.5
// Suppress the default output; only lines where a substitution occurred are printed
root@lanquark:~/demo# sed -n 's#Hemenway#Jones#gp' datafile
southeast SE Patricia Jones 4.0 .7 4 17
// Save matched characters with \( and \)
root@lanquark:~/demo# sed -n 's#\(Mar\)got#\1iance#p' datafile
north NO Mariance Weber 4.5 .89 5 9
Specifying a range of lines: the comma
// A range delimited by two regular expressions
root@lanquark:~/demo# sed -n '/west/,/east/p' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
// A range delimited by a line number and a regular expression
root@lanquark:~/demo# sed -n '5,/^northeast/p' datafile
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
// A range delimited by two line numbers
root@lanquark:~/demo# sed -n '1,4p' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
Multiple edits: the -e option
root@lanquark:~/demo# sed -e '1,3d' -e 's#Hemenway#Jones#' datafile
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Jones 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
Reading from a file: the r command
root@lanquark:~/demo# cat newfile
______________________________________
| *** SUAN HAS LEFT THE COMPANY *** |
|____________________________________|
root@lanquark:~/demo# sed '/Suan/r newfile' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
______________________________________
| *** SUAN HAS LEFT THE COMPANY *** |
|____________________________________|
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
Writing to a file: the w command
root@lanquark:~/demo# sed -n '/north/w newfile1' datafile
root@lanquark:~/demo# cat newfile1
northwest NW Charles Main 3.0 .98 3 34
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
Appending: the a command
root@lanquark:~/demo# sed '/^north/a\--->THE NORTH SALES DISTRICT HAS MOVED<---' datafile
northwest NW Charles Main 3.0 .98 3 34
--->THE NORTH SALES DISTRICT HAS MOVED<---
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
--->THE NORTH SALES DISTRICT HAS MOVED<---
north NO Margot Weber 4.5 .89 5 9
--->THE NORTH SALES DISTRICT HAS MOVED<---
central CT Ann Stephens 5.7 .94 5 13
Inserting: the i command
root@lanquark:~/demo# sed '/eastern/i\--->NEW ENGLIST REGION<---' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
--->NEW ENGLIST REGION<---
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
Changing: the c command
root@lanquark:~/demo# sed '/eastern/c\THE EASTERN REGION HAS TEMPORARLLY CLOSED' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
THE EASTERN REGION HAS TEMPORARLLY CLOSED
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
Getting the next line: the n command
root@lanquark:~/demo# sed -n '/eastern/{n;s#AM#Archie#p;}' datafile
northeast NE Archie Main Jr. 5.1 .94 3 13
Translating: the y command
root@lanquark:~/demo# sed '1,3y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' datafile
NORTHWEST NW CHARLES MAIN 3.0 .98 3 34
WESTERN WE SHARON GRAY 5.3 .97 5 23
SOUTHWEST SW LEWIS DALSASS 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
Quitting: the q command
// Quit after printing line 5
root@lanquark:~/demo# sed '5q' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
// When the pattern matches, substitute first and then quit
root@lanquark:~/demo# sed '/Lewis/{s#Lewis#Joseph#;q;}' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Joseph Dalsass 2.7 .8 2 18
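Holding and getting: the h and g commands
// The northeast line is printed twice; G appends the hold buffer to the pattern space (the arrows mark the affected lines and are not part of sed's output)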
root@lanquark:~/demo# sed -e '/northeast/h' -e '$G' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
→northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
→northeast NE AM Main Jr. 5.1 .94 3 13
// The WE line is printed only once: it is deleted from its original position and appended after the CT line
root@lanquark:~/demo# sed -e '/WE/{h;d;}' -e '/CT/{G;}' datafile
northwest NW Charles Main 3.0 .98 3 34
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
→western WE Sharon Gray 5.3 .97 5 23
// g overwrites the pattern space, so the CT line is replaced by the WE line
root@lanquark:~/demo# sed -e '/WE/{h;d;}' -e '/CT/{g;}' datafile
northwest NW Charles Main 3.0 .98 3 34
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
western WE Sharon Gray 5.3 .97 5 23
Holding and exchanging
// x exchanges the pattern space and the hold buffer
root@lanquark:~/demo# sed -e '/Patricia/h' -e /Margot/x datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
→southeast SE Patricia Hemenway 4.0 .7 4 17
central CT Ann Stephens 5.7 .94 5 13
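The h and G commands can also be combined to reverse the order of lines. This is a well-known sed idiom rather than something from the lesson's source material, shown here as a sketch on three invented lines.

```shell
# 1!G : on every line except the first, append the hold buffer to the pattern space
# h   : copy the (growing) pattern space back to the hold buffer
# $p  : on the last line, print the accumulated text
reversed=$(printf '%s\n' 'one' 'two' 'three' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```

After each cycle the hold buffer contains all lines read so far in reverse order, so printing it at the last line yields three, two, one.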
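4. awk
awk is a UNIX programming language for processing data and generating reports; gawk is the GNU version, commonly used on Linux.
Format: an awk instruction consists of a pattern, an action, or a combination of a pattern and an action.
awk accepts input from files, pipes, or standard input.
1. Input from a file
Format: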
awk 'pattern' filename
awk '{action}' filename
awk 'pattern{action}' filename
// Sample file
[root@lanquark demo]# cat employees
Tom Jones 4424 5/12/66 54335
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 65000
Billy Black 1683 9/23/44 33650
// Pattern only
[root@lanquark demo]# awk '/Mary/' employees
Mary Adams 5346 11/4/63 28765
// Action only
[root@lanquark demo]# awk '{print $1}' employees
Tom
Mary
Sally
Billy
// Pattern and action combined
[root@lanquark demo]# awk '/Sally/{print $1,$2}' employees
Sally Chang
2. Input from a command
Format:
command | awk 'pattern'
command | awk '{action}'
command | awk 'pattern{action}'
// Pattern only
[root@lanquark demo]# cat employees | awk '/Mary/'
Mary Adams 5346 11/4/63 28765
// Pattern and action
[root@lanquark demo]# cat employees | awk '/Mary/{print $1,$2}'
Mary Adams
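Regular expression metacharacters used by awk
Metacharacter | Description |
---|---|
^ | Matches at the beginning of a line |
$ | Matches at the end of a line |
. | Matches any single character |
* | Matches zero or more of the preceding character |
+ | Matches one or more of the preceding character |
? | Matches zero or one of the preceding character |
[ABC] | Matches any one character in the set A, B, C |
[^ABC] | Matches any one character not in the set A, B, C |
[A-Z] | Matches any one character in the range A to Z |
A|B | Matches A or B |
(AB)+ | Matches one or more occurrences of AB, such as AB, ABAB, ABABAB |
\* | Matches a literal asterisk |
& | Used in the replacement string to stand for the text matched by the search string |
Sample file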
[root@lanquark demo]# cat datafile1
northwest NW Joel Craig 3.0 .98 3 4
western WE Sharon Kelly 5.3 .97 5 23
southwest SW Chris Foster 2.7 .8 2 18
southern SO May Chin 5.1 .95 4 15
southeast SE Derek Johnson 4.0 .7 4 17
eastern EA Susan Beal 4.4 .84 5 20
northeast NE TJ Nichols 5.1 .94 3 13
north NO Val Shultz 4.5 .89 5 9
central CT Sheri Watson 5.7 .94 5 13
Simple pattern matching
[root@lanquark demo]# awk '/west/' datafile1
northwest NW Joel Craig 3.0 .98 3 4
western WE Sharon Kelly 5.3 .97 5 23
southwest SW Chris Foster 2.7 .8 2 18
Matching the beginning of a line (^)
[root@lanquark demo]# awk '/^north/' datafile1
northwest NW Joel Craig 3.0 .98 3 4
northeast NE TJ Nichols 5.1 .94 3 13
north NO Val Shultz 4.5 .89 5 9
Matching lines that begin with no or so (|)
[root@lanquark demo]# awk '/^(no|so)/' datafile1
northwest NW Joel Craig 3.0 .98 3 4
southwest SW Chris Foster 2.7 .8 2 18
southern SO May Chin 5.1 .95 4 15
southeast SE Derek Johnson 4.0 .7 4 17
northeast NE TJ Nichols 5.1 .94 3 13
north NO Val Shultz 4.5 .89 5 9
Simple actions
[root@lanquark demo]# awk '{print $3,$2}' datafile1
Joel NW
Sharon WE
Chris SW
May SO
Derek SE
Susan EA
TJ NE
Val NO
Sheri CT
[root@lanquark demo]# awk '{print "number of fields:",NF}' datafile1
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
Regular expressions in pattern and action combinations
[root@lanquark demo]# awk '/northeast/{print $3,$2}' datafile1
TJ NE
[root@lanquark demo]# awk '/^[ns]/{print $1}' datafile1
northwest
southwest
southern
southeast
northeast
north
The match operator (~)
[root@lanquark demo]# awk '$5~/\.[7-9]+/' datafile
southwest SW Lewis Dalsass 2.7 .8 2 18
central CT Ann Stephens 5.7 .94 5 13
The input field separator (-F)
// No separator specified; the default is whitespace, so the whole line is $1
[root@lanquark demo]# head -n 5 /etc/passwd | awk '{print $1}'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
// Use a colon as the separator
[root@lanquark demo]# head -n 5 /etc/passwd | awk -F: '{print $1}'
root
bin
daemon
adm
lp
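Comparison expressions
Relational operators
Operator | Meaning | Example |
---|---|---|
< | Less than | x < y |
<= | Less than or equal to | x <= y |
== | Equal to | x == y |
!= | Not equal to | x != y |
>= | Greater than or equal to | x >= y |
> | Greater than | x > y |
~ | Matches the regular expression | x ~ /y/ |
!~ | Does not match the regular expression | x !~ /y/ |
Sample file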
[root@lanquark demo]# cat employees
Tom Jones 4424 5/12/66 54335
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 65000
Billy Black 1683 9/23/44 33650
[root@lanquark demo]# awk '$3 == 5346' employees
Mary Adams 5346 11/4/63 28765
[root@lanquark demo]# awk '$3>5000{print $1}' employees
Mary
[root@lanquark demo]# awk '$2~/Adam/' employees
Mary Adams 5346 11/4/63 28765
[root@lanquark demo]# awk '$2!~/Adam/' employees
Tom Jones 4424 5/12/66 54335
Sally Chang 1654 7/22/54 65000
Billy Black 1683 9/23/44 33650
Arithmetic
Arithmetic operators
Operator | Meaning | Example |
---|---|---|
+ | Addition | x + y |
- | Subtraction | x - y |
* | Multiplication | x * y |
/ | Division | x / y |
% | Modulus | x % y |
^ | Exponentiation | x ^ y |
[root@lanquark demo]# cat emp.data
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
[root@lanquark demo]# awk '$3>0{print $2*$3}' emp.data
40
100
121
76.5
Logical and compound operators
Operator | Meaning | Example |
---|---|---|
&& | Logical AND | a&&b |
|| | Logical OR | a||b |
! | Logical NOT | !a |
[root@lanquark demo]# cat emp.data
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
[root@lanquark demo]# awk '$3>10 && $3<22' emp.data
Mark 5.00 20
Susie 4.25 18
Assignment operators
[root@lanquark demo]# awk '$3=="Chris"{$3="Christian";print}' datafile1
southwest SW Christian Foster 2.7 .8 2 18
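Built-in variables
Variable | Meaning |
---|---|
ARGC | Number of command-line arguments |
ARGIND | Index in ARGV of the file currently being processed (gawk) |
ARGV | Array of command-line arguments |
CONVFMT | Conversion format for numbers, %.6g by default |
ENVIRON | Array containing the values of the current environment variables |
ERRNO | System error message set when a redirection fails during a getline read or when close fails (gawk) |
FIELDWIDTHS | Whitespace-separated list of field widths, used instead of FS to split fixed-width records (gawk) |
FILENAME | Name of the current input file |
FNR | Record number within the current file |
FS | Input field separator, a space by default |
IGNORECASE | When nonzero, regular expression and string matching ignore case (gawk) |
NF | Number of fields in the current record |
NR | Number of records read so far |
OFMT | Output format for numbers |
OFS | Output field separator |
ORS | Output record separator |
RLENGTH | Length of the string matched by the match function |
RS | Input record separator |
RSTART | Offset of the string matched by the match function |
RT | Record terminator; gawk sets RT to the input text that matched the character or regex specified by RS |
SUBSEP | Array subscript separator |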
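A few of these variables are enough for a small report. The sketch below sticks to variables available in any POSIX awk (FS, OFS, NR, NF) and uses two invented colon-separated records modelled loosely on the lesson's employee data.

```shell
# Hypothetical colon-separated records, invented for this illustration.
records=$(printf '%s\n' 'Tom Jones:4424:54335' 'Mary Adams:5346:28765')

# FS splits the input on colons; OFS is placed between the items printed with commas.
# NR is the record number, NF the number of fields, and $NF the last field.
report=$(printf '%s\n' "$records" |
    awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1, $NF }')

printf '%s\n' "$report"
```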
[root@lanquark demo]# cat employees2
Tom Jones:4424:5/12/66:54335
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:65000
Billy Black:1683:9/23/44:33650
[root@lanquark demo]# awk -F: '$1=="Mary Adams"{print NR,$1,$2,$NF}' employees2
2 Mary Adams 5346 28765
[root@lanquark demo]# awk -F: 'BEGIN{IGNORECASE=1};$1=="mary adams"{print NR,$1,$2,$NF}' employees2
2 Mary Adams 5346 28765
The BEGIN pattern
[root@lanquark demo]# awk 'BEGIN{FS=":";OFS="\t";ORS="\n\n"}{print $1,$2,$3}' employees2
Tom Jones 4424 5/12/66
Mary Adams 5346 11/4/63
Sally Chang 1654 7/22/54
Billy Black 1683 9/23/44
[root@lanquark demo]# awk 'BEGIN{print "Make Year"}'
Make Year
The END pattern
[root@lanquark demo]# awk 'END{print "The number of records is",NR}' employees2
The number of records is 4
[root@lanquark demo]# awk '/Mary/{count++}END{print "Mary was found",count,"times"}' employees2
Mary was found 1 times
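BEGIN and END are often used together to initialise and then report a running total. The sketch below is one possible arrangement; the three records (name, hourly rate, hours) are invented in the style of the emp.data file.

```shell
# Hypothetical records: name, hourly rate, hours worked.
summary=$(printf '%s\n' 'Kathy 4.00 10' 'Mark 5.00 20' 'Beth 4.00 0' |
    awk 'BEGIN  { total = 0 }                      # runs once, before any input is read
         $3 > 0 { total += $2 * $3; n++ }          # runs for each record with hours > 0
         END    { print n, total }')               # runs once, after the last record

echo "$summary"
```

The output gives the number of employees who worked and their combined pay: 2 and 140 for this data.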
Redirection and pipes
Output redirection (> truncates the file first; >> appends without truncating)
[root@lanquark demo]# awk '$1=="Tom"{print $1}' employees2
Tom
[root@lanquark demo]# awk '$1=="Tom"{print $1>"passing_file"}' employees2
[root@lanquark demo]# cat passing_file
Tom
Input redirection (getline)
[root@lanquark demo]# awk 'BEGIN{"date"|getline d;print d}'
Tue Jun 5 22:53:24 EDT 2018
[root@lanquark demo]# awk 'BEGIN{"date" | getline d;split(d,mon);print mon[2]}'
Jun
[root@lanquark demo]# awk 'BEGIN{while("ls" | getline) print}'
1111
1.txt
datafile
datafile1
emp.data
employees
employees2
id.txt
ipconfig.txt
lab5.data
names
newfile
newfile1
passing_file
picnic
temp
test1.txt
test.dir
textfile
tt.txt
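Pipes
A pipe opened in awk stays open until it is closed with close() or awk exits; close it before opening another pipe, since older awk implementations allow only one open pipe at a time. The command to the right of the pipe symbol is enclosed in double quotes.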
[root@lanquark demo]# cat names
john smith
alice cheba
george goldberg
susan goldberg
tony tram
barbara nguyen
elizabeth lone
dan savage
eliza goldberg
john goldenrod
[root@lanquark demo]# awk '{print $1,$2 | "sort -r +1 -2 +0 -1"}' names
tony tram
john smith
dan savage
barbara nguyen
elizabeth lone
john goldenrod
susan goldberg
george goldberg
eliza goldberg
alice cheba
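// Closing the pipe: without close(), "game over" is printed before sort writes its output; with close(), it comes last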
[root@lanquark demo]# awk '{print $1,$2 | "sort -r +1 -2 +0 -1"}END{print "game over"}' names
game over
tony tram
john smith
dan savage
barbara nguyen
elizabeth lone
john goldenrod
susan goldberg
george goldberg
eliza goldberg
alice cheba
[root@lanquark demo]# awk '{print $1,$2 | "sort -r +1 -2 +0 -1"}END{close("sort -r +1 -2 +0 -1");print "game over"}' names
tony tram
john smith
dan savage
barbara nguyen
elizabeth lone
john goldenrod
susan goldberg
george goldberg
eliza goldberg
alice cheba
game over
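The `+1 -2 +0 -1` key syntax used above is obsolete and may be rejected by current versions of sort. A sketch of the same pipeline using the `-k` form follows; it assumes that `-k2,2 -k1,1` is the intended equivalent (second field first, then first field), and the four names are a shortened, partly invented version of the names file.

```shell
# Hypothetical shortened name list.
sorted=$(printf '%s\n' 'john smith' 'alice cheba' 'susan goldberg' 'eliza goldberg' |
    awk 'BEGIN { cmd = "sort -r -k2,2 -k1,1" }   # keep the command in one variable
               { print $1, $2 | cmd }            # send every record to sort
         END   { close(cmd)                      # wait for sort to finish writing
                 print "game over" }')

printf '%s\n' "$sorted"
```

Storing the command string in a variable guarantees that close() receives exactly the same string that opened the pipe, which is what awk uses to identify it.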
5. Extras
Recursive filtering:
For example, to find the lines containing eval in all *.php files under the /data directory:
grep -r --include="*.php" 'eval' /data/
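The effect of --include can be tried safely on a scratch directory. The sketch below assumes a grep that supports --include (GNU grep does); the directory layout and file contents are invented, and -l is added so that only file names are reported.

```shell
# Hypothetical scratch tree standing in for /data.
root=$(mktemp -d)
mkdir -p "$root/app/lib"
printf '%s\n' '<?php eval($code); ?>'      > "$root/app/index.php"
printf '%s\n' '<?php echo "ok"; ?>'        > "$root/app/lib/safe.php"
printf '%s\n' 'eval is mentioned here too' > "$root/app/notes.txt"

# Search recursively, but only inside files whose names end in .php.
hits=$(grep -r -l --include='*.php' 'eval' "$root")

rm -rf "$root"
printf '%s\n' "$hits"
```

Only index.php is reported: safe.php does not contain eval, and notes.txt is skipped because its name does not match the --include pattern.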
Exercises