The Three Musketeers of Linux Text Processing: grep

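Text processing on Linux relies heavily on regular expressions, which come in two flavors:
- Basic regular expressions (BRE): grep, or egrep -G
- Extended regular expressions (ERE): egrep, or grep -E
The three musketeers of Linux text processing:
- sed: stream editor, a tool that edits text as it streams through.
- awk: a tool for formatting text; on Linux it is gawk.
- grep: Global search Regular expression and print out the line
  
  Commands that use basic regular expressions:
  
  grep
  
  egrep -G
  
  fgrep -G
  
  Commands that use extended regular expressions:
  
  grep -E
  
  egrep
  
  fgrep -E
  
  A command that uses no regular expressions at all and treats the pattern as a fixed string, which is generally faster:
  
  fgrep
  
  grep is a text search tool: it scans the target text line by line against the search condition given by the user and prints every line that matches.
  
  The search condition is written as a regular expression.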
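The difference between the three matching modes is easiest to see by running one pattern through all of them. The following is a minimal sketch, assuming GNU grep; the three sample lines are made up for illustration.

```shell
# Three sample lines that react differently to the pattern 'a+b'.
sample=$(printf 'aab\na+b\nab\n')

# BRE (plain grep): '+' is an ordinary character, so only the literal "a+b" matches.
bre=$(printf '%s\n' "$sample" | grep 'a+b')

# ERE (grep -E): '+' means "one or more of the preceding item".
ere=$(printf '%s\n' "$sample" | grep -E 'a+b')

# Fixed string (grep -F, the same matching mode as fgrep): no regular expression at all.
fixed=$(printf '%s\n' "$sample" | grep -F 'a+b')

printf 'BRE:\n%s\nERE:\n%s\nfixed:\n%s\n' "$bre" "$ere" "$fixed"
```

With this input, the BRE and fixed-string searches should select only the line a+b, while the ERE search should select aab and ab.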
I. Using grep:
- Syntax:
  
  grep [OPTIONS] PATTERN [FILE...]
  
  grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]
- The most basic example: search for "UUID" in /etc/fstab
  
      # grep "UUID" /etc/fstab
      UUID=3d3b316a-529e-484a-9895-e785fdde5365 /boot xfs defaults 0 0
- The letters in the search condition are case-sensitive by default; the option that makes the search case-insensitive: -i
  
      # grep "UUiD" /etc/fstab
      # echo $?
      1
      # grep -i "UUiD" /etc/fstab
      UUID=3d3b316a-529e-484a-9895-e785fdde5365 /boot xfs defaults 0 0
- Print only the matched text itself instead of the whole matching line: -o
  
      # grep -o "UUID" /etc/fstab
      UUID
- Print the lines that do not match: -v
  
      # grep -v "UUID" /etc/fstab
      /dev/mapper/centos-root / xfs defaults 0 0
- Print nothing and report only through the exit status whether there was a match (0 means a match, 1 means none): -q
  
      # grep -q "UUID" /etc/fstab
      # echo $?
      0
      # grep -q "UUIDa" /etc/fstab
      # echo $?
      1
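Because -q reports only through the exit status, it fits naturally into an if statement. A small sketch follows; the temporary file is a made-up stand-in for /etc/fstab, and the variable names are arbitrary.

```shell
# Hypothetical sample file standing in for /etc/fstab.
fstab=$(mktemp)
printf '%s\n' \
  '/dev/mapper/centos-root / xfs defaults 0 0' \
  'UUID=3d3b316a-529e-484a-9895-e785fdde5365 /boot xfs defaults 0 0' > "$fstab"

# grep -q prints nothing; the if statement reads its exit status (0 = match).
if grep -q 'UUID' "$fstab"; then
    result_found="has UUID entry"
else
    result_found="no UUID entry"
fi

if grep -q 'UUIDa' "$fstab"; then
    result_missing="has UUIDa entry"
else
    result_missing="no UUIDa entry"
fi

echo "$result_found"
echo "$result_missing"
rm -f "$fstab"
```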
- Use extended regular expressions: -E
- Show the line number of each matching line: -n
  
      # grep -n "UUID" /etc/fstab
      10:UUID=3d3b316a-529e-484a-9895-e785fdde5365 /boot xfs defaults 0 0
- Also show the N lines after each matching line: -A N, where N is a number
  
      # grep -nA1 gentoo /etc/passwd
      49:gentoo:x:1004:1004::/tmp/gentoo:/bin/bash
      50-fedora:x:1005:1005::/tmp/fedora:/bin/bash
- Also show the N lines before each matching line: -B N, where N is a number
  
      # grep -nB2 gentoo /etc/passwd
      47-za2:x:1002:1003::/home/za2:/bin/bash
      48-mysql:x:1003:979::/home/mysql:/sbin/nologin
      49:gentoo:x:1004:1004::/tmp/gentoo:/bin/bash
- Also show the N lines before and after each matching line: -C N, where N is a number
  
      # grep -nC1 gentoo /etc/passwd
      48-mysql:x:1003:979::/home/mysql:/sbin/nologin
      49:gentoo:x:1004:1004::/tmp/gentoo:/bin/bash
      50-fedora:x:1005:1005::/tmp/fedora:/bin/bash
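The context options can be tried without touching the real /etc/passwd. The sketch below assumes GNU grep and uses a four-line sample file made up for illustration; with -n, a colon after the line number marks a matching line and a hyphen marks a context line.

```shell
# Hypothetical four-line sample in /etc/passwd format.
passwd=$(mktemp)
printf '%s\n' \
  'za2:x:1002:1003::/home/za2:/bin/bash' \
  'mysql:x:1003:979::/home/mysql:/sbin/nologin' \
  'gentoo:x:1004:1004::/tmp/gentoo:/bin/bash' \
  'fedora:x:1005:1005::/tmp/fedora:/bin/bash' > "$passwd"

after=$(grep -n -A1 '^gentoo' "$passwd")    # the match plus one line after it
before=$(grep -n -B1 '^gentoo' "$passwd")   # one line before the match plus the match
around=$(grep -n -C1 '^gentoo' "$passwd")   # one line on each side of the match

printf 'after:\n%s\nbefore:\n%s\naround:\n%s\n' "$after" "$before" "$around"
rm -f "$passwd"
```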
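- Character matching
  
  `.`: matches any single character
  
      # grep -n "f..ora" /etc/passwd
      50:fedora:x:1005:1005::/tmp/fedora:/bin/bash
      # grep "f.ora" /etc/passwd
      #
  
  `[]`: matches any single character from the listed set or range; the members are written without commas between them
  
  `[^]`: matches any single character that is not in the listed set or range
  
  Character classes, which are written inside a bracket expression (for example `[[:digit:]]`): [:digit:], [:lower:], [:upper:], [:alpha:], [:alnum:], [:punct:], [:space:]
  
  Example: lines in which an r and a t are separated by exactly two letters.
  
      # grep "r[[:alpha:]][[:alpha:]]t" /etc/passwd
      root:x:0:0:root:/root:/bin/bash
      operator:x:11:0:operator:/root:/sbin/nologin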
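Bracket expressions and character classes can be checked against a handful of words. This is a small sketch with made-up input; it shows a class, a negated class, and a search for any digit.

```shell
# Made-up test words: only "root" has two letters between r and t.
words=$(printf 'root\nr2d2t\nr__t\nrat\n')

# r, two letters, t
letters=$(printf '%s\n' "$words" | grep 'r[[:alpha:]][[:alpha:]]t')

# r, two characters that are NOT letters, t
non_letters=$(printf '%s\n' "$words" | grep 'r[^[:alpha:]][^[:alpha:]]t')

# any line that contains at least one digit
digits=$(printf '%s\n' "$words" | grep '[[:digit:]]')

printf 'letters: %s\nnon-letters: %s\ndigits: %s\n' "$letters" "$non_letters" "$digits"
```

Note that the class name always sits inside an outer pair of brackets: `[[:alpha:]]`, not `[:alpha:]`.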
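- Repetition: matching is greedy by default, so a quantifier takes as many characters as it can and stops only when nothing more can be matched.
  
  The example below matches "x*y". The line xxxxy contains several x characters; greedy matching consumes all of them, so the match is xxxxy rather than just xy.
  
  `*`: matches the preceding character any number of times, including zero.
  
  In the output below, square brackets mark the matched text.
  
      # cat t1
      abxy
      aby
      xxxxy
      yab
      asdf
      # grep "x*y" t1
      ab[xy]
      ab[y]
      [xxxxy]
      [y]ab
  
  Match r itself and every character after it.
  
      # grep "r.*" /etc/passwd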
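Greedy matching becomes visible when -o prints only the matched text. The sketch below uses a made-up line with two quoted strings; the usual way to get the shorter matches is to exclude the closing delimiter from the repeated set.

```shell
line='say "hi" and "bye"'

# Greedy: ".*" runs as far right as it can, so the match spans from the
# first double quote to the last one.
greedy=$(printf '%s\n' "$line" | grep -o '".*"')

# Excluding the quote from the repeated characters stops each match at the
# nearest closing quote, which yields two separate matches.
tight=$(printf '%s\n' "$line" | grep -o '"[^"]*"')

printf 'greedy: %s\ntight:\n%s\n' "$greedy" "$tight"
```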
  
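  In basic regular expressions the following quantifiers are written with backslashes (`\?` and `\+` are GNU extensions):
  
  `\?`: matches the preceding character zero times or once.
  
  `\+`: matches the preceding character one or more times.
  
  `\{m\}`: matches the preceding character exactly m times.
  
  `\{m,n\}`: matches the preceding character at least m and at most n times.
  
  `\{m,\}`: matches the preceding character at least m times.
  
  `\{0,n\}`: matches the preceding character at most n times. GNU grep also accepts the shorthand `\{,n\}`.
  
  In the output below, square brackets mark the matched text.
  
      # cat t1
      abxy
      aby
      xxxxy
      yab
      asdf
      # grep "x\?y" t1
      ab[xy]
      ab[y]
      xxx[xy]
      [y]ab
      # grep "x\+y" t1
      ab[xy]
      [xxxxy]
      # grep "x\{1\}y" t1
      ab[xy]
      xxx[xy]
      # grep "x\{2\}y" t1
      xx[xxy]
      # grep "x\{2,3\}y" t1
      x[xxxy]
      # grep "x\{1,2\}y" t1
      ab[xy]
      xx[xxy]
      # grep "x\{1,\}y" t1
      ab[xy]
      [xxxxy]
      # grep "x\{,2\}y" t1
      ab[xy]
      ab[y]
      xx[xxy]
      [y]ab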
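The same quantifier is spelled differently in the two regex flavors: `\{m,n\}` and `\+` in basic regular expressions, `{m,n}` and `+` in extended ones. The following sketch, assuming GNU grep, recreates the five-line t1 file in a temporary location and checks that both spellings select the same text.

```shell
t1=$(mktemp)
printf 'abxy\naby\nxxxxy\nyab\nasdf\n' > "$t1"

# Two or three x followed by y, printed with -o so only the match is shown.
bre_match=$(grep -o 'x\{2,3\}y' "$t1")      # basic syntax
ere_match=$(grep -E -o 'x{2,3}y' "$t1")     # extended syntax

# Number of lines with at least one x directly before a y.
bre_count=$(grep -c 'x\+y' "$t1")
ere_count=$(grep -E -c 'x+y' "$t1")

echo "BRE match: $bre_match, ERE match: $ere_match"
echo "BRE count: $bre_count, ERE count: $ere_count"
rm -f "$t1"
```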
- Position anchors
  
  `^`: anchors the match at the start of the line
  
  `$`: anchors the match at the end of the line
  
  `^PATTERN$`: PATTERN must match the entire line.
  
  `^$`: an empty line with nothing on it at all.
  
  `^[[:space:]]\+$`: a line that consists only of whitespace characters.
  
  Word: a run of consecutive letters, digits, and underscores.
  
  `\<` or `\b`: start-of-word anchor, placed on the left side of the word pattern
  
  `\>` or `\b`: end-of-word anchor, placed on the right side of the word pattern
  
  `\<word\>`: matches a whole word.
  
  In the output below, square brackets mark the matched text. In this version of t1, line 3 is empty and line 5 contains only whitespace.
  
      # grep root /etc/passwd
      [root]:x:0:0:[root]:/[root]:/bin/bash
      operator:x:11:0:operator:/[root]:/sbin/nologin
      [root]kit:x:1006:1006::/home/[root]kit:/bin/bash
      user4:x:1007:1007::/home/user4:/bin/ch[root]
      ch[root]er:x:1008:1008::/home/ch[root]er:/bin/bash
      # grep "^root" /etc/passwd
      [root]:x:0:0:root:/root:/bin/bash
      [root]kit:x:1006:1006::/home/rootkit:/bin/bash
      # grep "root$" /etc/passwd
      user4:x:1007:1007::/home/user4:/bin/ch[root]
      # grep "^root$" /etc/passwd
      # echo $?
      1
      # cat t1
      abxy
      aby
      
      xxxxy
         
      yab
      asdf
      a
      # grep -n "^$" t1
      3:
      # grep -n "^[[:space:]]*$" t1
      3:
      5:
      # grep -n "^[[:space:]]\+$" t1
      5:
      # grep "\<root" /etc/passwd
      [root]:x:0:0:[root]:/[root]:/bin/bash
      operator:x:11:0:operator:/[root]:/sbin/nologin
      [root]kit:x:1006:1006::/home/[root]kit:/bin/bash
      # grep "root\>" /etc/passwd
      [root]:x:0:0:[root]:/[root]:/bin/bash
      operator:x:11:0:operator:/[root]:/sbin/nologin
      user4:x:1007:1007::/home/user4:/bin/ch[root]
      # grep "\<root\>" /etc/passwd
      [root]:x:0:0:[root]:/[root]:/bin/bash
      operator:x:11:0:operator:/[root]:/sbin/nologin
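Anchors are often used to count or drop blank lines and to match whole words. The sketch below assumes GNU grep and a made-up five-line file; it also uses the -w option, which restricts matches to whole words in much the same way as wrapping the pattern in `\<` and `\>`.

```shell
# Made-up sample: line 2 is empty, line 4 holds three spaces.
sample=$(mktemp)
printf 'root\n\nrootkit\n   \nchroot\n' > "$sample"

empty=$(grep -c '^$' "$sample")                 # truly empty lines
blank=$(grep -c '^[[:space:]]*$' "$sample")     # empty or whitespace-only lines
content=$(grep -v '^[[:space:]]*$' "$sample")   # everything else

whole=$(grep '\<root\>' "$sample")              # "root" as a whole word
whole_w=$(grep -w 'root' "$sample")             # same idea with -w

echo "empty=$empty blank=$blank"
printf '%s\n' "$content"
echo "whole word: $whole / $whole_w"
rm -f "$sample"
```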
  
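  Exercise 1: show the lines of /etc/passwd that do not end with /bin/bash
  
      # grep -nv "/bin/bash$" /etc/passwd
  
  Exercise 2: find the two-digit or three-digit numbers that stand alone as words in /etc/passwd.
  
      # grep -n "\<[[:digit:]]\{2,3\}\>" /etc/passwd
  
  Exercise 3: find the lines of /etc/grub2.cfg that begin with at least one whitespace character followed by a non-whitespace character.
  
      # grep -n "^[[:space:]]\{1,\}[^[:space:]]" /etc/grub2.cfg
  
  Exercise 4: in the output of "netstat -tan", find the lines that end with "LISTEN" followed by zero or more whitespace characters. The `$` anchor is what ties the match to the end of the line.
  
      # netstat -tan | grep -n "LISTEN[[:space:]]*$"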
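- Grouping and back-references
  
  Grouping with `\(` and `\)`: binds one or more characters together so that they are matched as a single unit.
  
  Back-references: the text matched by each group is saved and can be referred to later in the same pattern.
  
  `\1`: the text matched by the first group
  
  `\2`: the text matched by the second group
  
  `\N`: the text matched by the Nth group
  
  Exercise: match a group that is followed later on the line by the same string. Square brackets mark the matched text.
  
      # cat t2
      He likes his lover.
      He loves his lover.
      She likes her liker.
      She loves her liker.
      # grep "l..e.*l..e" t2
      He [likes his love]r.
      He [loves his love]r.
      She [likes her like]r.
      She [loves her like]r.
      # grep "\(l..e\).*\1" t2
      He [loves his love]r.
      She [likes her like]r.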
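A common use of a back-reference is spotting a word that was typed twice in a row. The following is a rough sketch, assuming GNU grep (for `\+`, `\<` and `\>` in a basic regular expression); the three input lines are made up.

```shell
text=$(printf 'this is is a test\nnothing repeated here\nsay bye bye now\n')

# \([[:alpha:]]\+\) captures one whole word; "\1" then demands the same
# text again after a single space. -o prints only the doubled words.
doubled=$(printf '%s\n' "$text" | grep -o '\<\([[:alpha:]]\+\) \1\>')

printf '%s\n' "$doubled"
```

The word anchors keep the pattern from matching inside longer words, such as the "is" at the end of "this".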
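II. Using egrep:

The grep options described above work the same way in egrep.
- Character matching: same as grep
- Repetition (written without backslashes in extended regular expressions)
  
  `?`: matches the preceding character zero times or once.
  
  `+`: matches the preceding character one or more times.
  
  `{m}`: matches the preceding character exactly m times.
  
  `{m,n}`: matches the preceding character at least m and at most n times.
  
  `{m,}`: matches the preceding character at least m times.
  
  `{0,n}`: matches the preceding character at most n times.
- Position anchors: same as grep
- Grouping and back-references
  
  Grouping with `(` and `)`: binds one or more characters together so that they are matched as a single unit.
  
  Back-references: same as grep (`\1`, `\2`, and so on)
- Alternation
  
  `a|b`: a or b
  
  `C|cat`: C or cat, not Cat or cat
  
  `(C|c)at`: cat or Cat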
  
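  Exercise 1: find every line of /proc/meminfo that begins with an uppercase or lowercase s. Three ways to do it:
  
      # egrep "^(s|S)" /proc/meminfo
  
      # grep -ni "^s" /proc/meminfo
  
      # grep "^[sS]" /proc/meminfo
  
  Exercise 2: find the two-digit or three-digit numbers that stand alone as words in /etc/passwd.
  
      # egrep -n "\<[[:digit:]]{2,3}\>" /etc/passwd
  
  Exercise 3: find the lines of /etc/grub2.cfg that begin with at least one whitespace character followed by a non-whitespace character.
  
      # egrep -n "^[[:space:]]{1,}[^[:space:]]" /etc/grub2.cfg
  
  Exercise 4: find the lines of /etc/rc.d/init.d/functions in which a word is followed by a pair of parentheses. The command below uses grep with a basic regular expression, where unescaped parentheses are literal characters; with egrep they would have to be escaped as `\(\)`.
  
      # grep "\<.*\>[[:space:]]*()" /etc/rc.d/init.d/functions
  
  Exercise 5: print an absolute path with echo and extract its base name with egrep. The first command extracts the directory part, the second the base name.
  
      # echo /etc/rc.d/init.d/functions | grep -o "^/.*/"
      /etc/rc.d/init.d/
      # echo /etc/rc.d/init.d/functions | egrep -o "[^/]+$"
      functions
  
  Exercise 6: find the numbers between 1 and 255 in the output of ifconfig. The alternatives are wrapped in a group so that the word anchors apply to all of them and not only to the first and the last.
  
      # ifconfig | grep -E "\<([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>"
  
  Exercise 7: find the IP addresses in the output of ifconfig. The dots are escaped so that they match only a literal dot.
  
      # ifconfig | egrep -n "\<[0-9]+\>\.\<[0-9]+\>\.\<[0-9]+\>\.\<[0-9]+\>"
  
  Exercise 8: find the users whose user name is the same as the name of their shell.
  
      # egrep "^([^:]+\>).*\1$" /etc/passwd
      sync:x:5:0:sync:/sbin:/bin/sync
      shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
      halt:x:7:0:halt:/sbin:/sbin/halt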
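The range check from exercise 6 and the dotted layout from exercise 7 can be combined into one pattern that accepts only well-formed IPv4 addresses. This is a sketch under a few assumptions: each octet may be 0 to 255 (not 1 to 255 as in exercise 6), leading zeros are rejected, and the variable names octet and ipv4 are arbitrary.

```shell
# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Four octets separated by literal dots, anchored to the whole line.
ipv4="^${octet}(\.${octet}){3}\$"

candidates=$(printf '10.247.236.19\n255.255.254.0\n256.1.1.1\n1.2.3\n01.2.3.4\n')
valid=$(printf '%s\n' "$candidates" | grep -E "$ipv4")

printf '%s\n' "$valid"
```

Only the first two candidates should pass; 256.1.1.1 is out of range, 1.2.3 has too few octets, and 01.2.3.4 has a leading zero.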
III. Using fgrep

When the pattern is a plain string and no regular expression is needed, fgrep (equivalent to grep -F) performs better.
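Besides speed, fixed-string matching avoids surprises when the search text contains regex metacharacters. A short sketch with made-up log lines follows; it uses grep -F, which selects the same matching mode as fgrep.

```shell
log=$(mktemp)
printf '%s\n' \
  'host 10.0.0.1 up' \
  'host 10a0b0c1 up' \
  'price: $5.00 (approx.)' > "$log"

# As a regex, each "." matches any character, so the second line matches too.
regex_hits=$(grep -c '10.0.0.1' "$log")

# As a fixed string, only the literal address matches.
fixed_hits=$(grep -c -F '10.0.0.1' "$log")

# Characters such as $ . ( ) need no escaping with -F.
literal=$(grep -F '$5.00 (approx.)' "$log")

echo "regex: $regex_hits, fixed: $fixed_hits"
echo "$literal"
rm -f "$log"
```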

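IV. Tools for viewing and processing text

1. wc: count lines, words, bytes, and characters
- -l: number of lines
- -w: number of words
- -c: number of bytes
- -m: number of characters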
```
# wc /etc/fstab
 12  60 541 /etc/fstab
# wc -l /etc/fstab
12 /etc/fstab
# wc -w /etc/fstab
60 /etc/fstab
# wc -c /etc/fstab
541 /etc/fstab
# wc -m /etc/fstab
541 /etc/fstab
```
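When only the number is wanted, for example in a script, wc can read the file from standard input; it then has no file name to print. A minimal sketch with a made-up two-line file:

```shell
f=$(mktemp)
printf 'one two\nthree\n' > "$f"

# Redirecting the file into wc leaves only the count in the output.
lines=$(wc -l < "$f")
words=$(wc -w < "$f")
bytes=$(wc -c < "$f")

echo "lines=$lines words=$words bytes=$bytes"
rm -f "$f"
```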
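2. cut: remove sections (columns) from each line of files

Text on Linux usually has a format as well: a recognizable delimiter separates the fields, and that delimiter can be used to split the text into columns.

For example, the contents of /etc/passwd are delimited by colons.
- Syntax: cut OPTION... [FILE]...
- Specify the colon as the delimiter: -d:
  
  Only a single character can be given as the delimiter.
- Select the columns to keep: -f1-3,5,7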
```
# cut -d: -f1-3,5,7 /etc/passwd
rootkit:x:1006::/bin/bash
user4:x:1007::/bin/chroot
# wc -l /etc/rc.d/init.d/functions
712 /etc/rc.d/init.d/functions
# wc -l /etc/rc.d/init.d/functions | cut -d' ' -f1
712
```
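Because cut treats every single delimiter character as a field boundary, runs of spaces produce empty fields. One common workaround is to squeeze the runs with tr -s first. The sketch below uses a made-up line with irregular spacing.

```shell
line='enp0s3   UP    10.247.236.19'

# Every space is its own delimiter, so field 2 is the empty string
# between the first and the second space.
naive=$(printf '%s\n' "$line" | cut -d' ' -f2)

# tr -s ' ' collapses each run of spaces into one before cut sees the line.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

echo "naive='$naive' squeezed='$squeezed'"
```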
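3. sort: sort text by one of its columns.

sort splits each line into columns at the given delimiter and orders the lines by the chosen column, much like sorting by column in Microsoft Excel.
- Syntax: sort [OPTION]... [FILE]...
- Specify the delimiter: -t
- Specify the number of the column to sort by: -k
- Compare by numeric value rather than character by character: -n
- Sort in reverse order: -r
- Ignore case: -f
- Keep only one line out of each group of duplicate lines: -u
Split on ":" and sort in descending order by the numeric value of column 3.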
```
# sort -t: -k3 -nr /etc/passwd
```
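Split on ":", sort in ascending alphabetical order by column 7, and drop duplicates. With a key given, -u treats lines with equal keys as duplicates, so one line is kept per distinct shell.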
```
# sort -t: -k7 -u /etc/passwd
```
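The effect of -n is easiest to see on a few lines where character order and numeric order disagree. This is a small sketch with made-up data in a colon-separated layout; the key is written as -k3,3 so that it covers column 3 only.

```shell
data=$(printf 'mysql:x:1003\nroot:x:0\ndaemon:x:2\nuser:x:10\n')

# Character-by-character comparison: "10" and "1003" sort before "2".
lexical=$(printf '%s\n' "$data" | sort -t: -k3,3 | cut -d: -f3)

# Numeric comparison on the same key.
numeric=$(printf '%s\n' "$data" | sort -t: -k3,3n | cut -d: -f3)

printf 'lexical:\n%s\nnumeric:\n%s\n' "$lexical" "$numeric"
```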
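4. uniq: remove repeated lines

Prerequisite: uniq only compares adjacent lines, so the input must be sorted first.
- Syntax: uniq [OPTION]... [INPUT [OUTPUT]]
- Show how many times each line occurs: -c
- Show only the lines that are not repeated: -u
- Show only the lines that are repeated: -d
Check which shells are in use.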
```
# cut -d: -f7 /etc/passwd | sort |uniq -c
      7 /bin/bash
      1 /bin/chroot
      1 /bin/csh
      1 /bin/false
      1 /bin/sync
      1 /sbin/halt
     40 /sbin/nologin
      1 /sbin/shutdown
# cut -d: -f7 /etc/passwd | sort |uniq -u
/bin/chroot
/bin/csh
/bin/false
/bin/sync
/sbin/halt
/sbin/shutdown
# cut -d: -f7 /etc/passwd | sort |uniq -d
/bin/bash
/sbin/nologin
```
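The need for sorting first can be demonstrated on a short, unsorted list. The sketch below uses made-up shell names; it also adds a second sort to rank the counts, a common follow-up to uniq -c.

```shell
shells=$(printf '/bin/bash\n/sbin/nologin\n/bin/bash\n/sbin/nologin\n/sbin/nologin\n/bin/sync\n')

# Without sorting, only the two adjacent /sbin/nologin lines are merged.
unsorted_count=$(printf '%s\n' "$shells" | uniq | wc -l)

# After sorting, all equal lines are adjacent and each value appears once.
sorted_count=$(printf '%s\n' "$shells" | sort | uniq | wc -l)

# Rank by frequency: count, sort the counts numerically in reverse, take the top.
top=$(printf '%s\n' "$shells" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

echo "unsorted=$unsorted_count sorted=$sorted_count top=$top"
```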
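5. diff: compare files line by line; whole directories can be compared as well
- Syntax: diff [OPTION]... FILES
- Redirect the output to create a difference (patch) file.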
```
# diff t1 t2
# diff t1 t2 > patch1
```
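diff also reports its result through the exit status: 0 when the files are identical and 1 when they differ, which makes it usable in scripts. The sketch below works in a temporary directory with made-up file names (old, new, same) and looks at the default output format, in which removed lines start with "<" and added lines with ">".

```shell
dir=$(mktemp -d)
printf 'abxy\naby\n' > "$dir/old"
printf 'abxy\nabz\n' > "$dir/new"
cp "$dir/old" "$dir/same"

# Exit status 0: no differences.
same_status=0
diff "$dir/old" "$dir/same" > /dev/null || same_status=$?

# Exit status 1: the files differ; the differences go into patch1.
diff_status=0
diff "$dir/old" "$dir/new" > "$dir/patch1" || diff_status=$?

cat "$dir/patch1"

# Count the lines that were removed ("<") or added (">").
changed=$(grep -c '^[<>]' "$dir/patch1")

echo "same=$same_status differ=$diff_status changed=$changed"
rm -rf "$dir"
```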
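6. patch: apply the difference file produced by diff to the source file
- Modify the old file to bring it up to date (apply the patch). The file after -i is the one created by redirecting the output of diff.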
```
# patch -i patch1 t1
```
- If the patch was applied by mistake, restore the old file: -R
```
# patch -R -i patch1 t1
```
Exercise: extract the IP address of a network interface.
```
# ifconfig enp0s3
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.247.236.19  netmask 255.255.254.0  broadcast 10.247.237.255
        inet6 fe80::b497:5ec:1efb:72b5  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:10:c2:53  txqueuelen 1000  (Ethernet)
        RX packets 32057  bytes 5882570 (5.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5324  bytes 1032770 (1008.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
# ifconfig enp0s3 | grep "\<inet\>" | cut -d' '  -f10
10.247.236.19
```
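Counting space-separated fields depends on the exact indentation of the ifconfig output, so a variant that does not count spaces may be more robust. The sketch below works on a copy of three lines of the output shown above, stored in a variable, and shows two alternatives: grep -o with an extended regular expression, and awk, which splits on runs of whitespace.

```shell
# First three lines of the ifconfig output shown above.
output='enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.247.236.19  netmask 255.255.254.0  broadcast 10.247.237.255
        inet6 fe80::b497:5ec:1efb:72b5  prefixlen 64  scopeid 0x20<link>'

# -w keeps the "inet" line and skips "inet6"; -o then prints every dotted
# quad on that line, and head keeps the first one (the address itself).
ip_grep=$(printf '%s\n' "$output" | grep -w 'inet' \
          | grep -E -o '([0-9]{1,3}\.){3}[0-9]{1,3}' | head -n 1)

# awk ignores the leading spaces: the address is simply the second field.
ip_awk=$(printf '%s\n' "$output" | awk '$1 == "inet" {print $2}')

echo "$ip_grep"
echo "$ip_awk"
```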
Original article: https://www.cnblogs.com/xiaoshiwang/p/12084180.html
