  • Shell and Scripting 3: Regular Expressions

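    I. Regular Expressions

    1.1. What is a regular expression

       A regular expression is a way of processing strings line by line. With the help of a few special symbols, it lets the user easily find, delete, or replace a particular string.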

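    1.2. Regular expressions versus wildcards

      The following is quoted from another user's explanation, which seems reasonable:

      Wildcards are handled at the system (shell) level and are mostly applied to file names, for example with find, ls, cp, and so on.

      Regular expressions need support from the tool itself: egrep, awk, vi, perl. Text-filtering tools such as awk and sed all use regular expressions, and they operate on the contents of files. Not every tool (command) supports regular expressions.

      Put simply, some commands support regular expressions and some do not.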
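
    To make the distinction concrete, here is a minimal sketch. The scratch directory and the file names apple.txt, avocado.txt, and banana.txt are invented for illustration. The shell expands the wildcard a* into file names before the command runs, whereas grep applies the regular expression ^a to each line of text it reads.

```shell
# Scratch directory with three hypothetical files (names are illustrative).
demo_dir=$(mktemp -d)
touch "$demo_dir/apple.txt" "$demo_dir/avocado.txt" "$demo_dir/banana.txt"

# Wildcard: the shell itself expands a* to the matching file names.
glob_matches=$(cd "$demo_dir" && echo a*)
echo "$glob_matches"

# Regular expression: grep filters lines of text, here the output of ls.
regex_matches=$(ls "$demo_dir" | grep -c '^a')
echo "$regex_matches"

rm -rf "$demo_dir"
```

    Both approaches select the same two names here, but only the second one could be pointed at the contents of a file.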

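    1.3. How the locale affects regular expressions

      Different locales order characters differently, so a range such as [a-z] may cover different characters depending on the locale. For example:

      LANG=C, order: 0,1,2,3,4....A,B,C,D......Z,a,b,c,d....z

      LANG=zh_CN, order: 0,1,2,3,4....a,A,b,B,c,C,d,D......z,Z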
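
    The C ordering can be checked with a short sketch. It only exercises LC_ALL=C, because which other locales are installed (and how they order letters) varies from system to system. The sample words are made up.

```shell
# Force the C locale so ordering follows raw byte values:
# all uppercase letters sort before all lowercase letters.
sorted_c=$(printf '%s\n' b A a B | LC_ALL=C sort | tr -d '\n')
echo "$sorted_c"

# In the C locale the range [A-Z] covers uppercase letters only.
upper_only=$(printf '%s\n' apple Banana cherry | LC_ALL=C grep '^[A-Z]')
echo "$upper_only"
```

    Pinning LC_ALL=C for a single command like this is one way to get predictable ranges in scripts.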

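    1.4. Some special symbols

      Special symbols (character classes) avoid the locale problem. Some common ones:

        [:alnum:]    all letters and digits: 0-9, A-Z, a-z
        [:alpha:]    all letters: A-Z, a-z
        [:blank:]    horizontal whitespace: space and TAB
        [:cntrl:]    all control characters: CR, LF, TAB, DEL, etc.
        [:digit:]    all digits: 0-9
        [:graph:]    all visible characters, excluding the space and TAB
        [:lower:]    all lowercase letters: a-z
        [:print:]    all printable characters, including the space
        [:punct:]    all punctuation characters
        [:space:]    all horizontal or vertical whitespace: space, TAB, CR, LF, etc.
        [:upper:]    all uppercase letters: A-Z
        [:xdigit:]   all hexadecimal digits: 0-9, A-F, a-f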
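
    A small sketch of how these classes are used with grep. The sample lines are invented. Note that a class name goes inside a bracket expression, so the pattern is written [[:digit:]] rather than [:digit:].

```shell
# Sample lines (invented for this sketch).
sample=$(printf '%s\n' 'abc' 'ABC' '123' 'a1 b2' 'ff0A' '!?')

# Count the lines that contain at least one digit.
digit_lines=$(printf '%s\n' "$sample" | grep -c '[[:digit:]]')

# Count the lines that contain at least one uppercase letter.
upper_lines=$(printf '%s\n' "$sample" | grep -c '[[:upper:]]')

# Print the lines made up of punctuation only.
punct_only=$(printf '%s\n' "$sample" | grep '^[[:punct:]]*$')

# Count the lines made up of hexadecimal digits only.
hex_only=$(printf '%s\n' "$sample" | grep -c '^[[:xdigit:]]*$')

echo "$digit_lines $upper_lines $punct_only $hex_only"
```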

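    II. Basic Regular Expressions

    2.1. Practice with grep

    2.1.1. Advanced grep options

    grep [-A n] [-B n] 'search string' filename

    -A: "after" plus a number n. Besides the matching line, also print the n lines after it. Written -An; a space (-A n) is also accepted.

    -B: "before" plus a number n. Besides the matching line, also print the n lines before it. Written -Bn; a space (-B n) is also accepted.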

    :/$ dmesg | grep -n 'eth'
    1564:[    2.427478] e1000 0000:02:01.0 eth0: (PCI:66MHz:32-bit) 00:0c:29:93:15:12
    1565:[    2.427489] e1000 0000:02:01.0 eth0: Intel(R) PRO/1000 Network Connection
    1569:[    2.433153] e1000 0000:02:01.0 ens33: renamed from eth0
    :/$ dmesg | grep -n -A3 -B2 'eth'  # the number follows -A and -B directly here, with no space
    1562-[    2.364718] Console: switching to colour frame buffer device 100x37
    1563-[    2.395386] [drm] Initialized vmwgfx 2.9.0 20150810 for 0000:00:0f.0 on minor 0
    1564:[    2.427478] e1000 0000:02:01.0 eth0: (PCI:66MHz:32-bit) 00:0c:29:93:15:12
    1565:[    2.427489] e1000 0000:02:01.0 eth0: Intel(R) PRO/1000 Network Connection
    1566-[    2.427823] ahci 0000:02:05.0: version 3.0
    1567-[    2.428781] ahci 0000:02:05.0: AHCI 0001.0300 32 slots 30 ports 6 Gbps 0x3fffffff impl SATA mode
    1568-[    2.428784] ahci 0000:02:05.0: flags: 64bit ncq clo only 
    1569:[    2.433153] e1000 0000:02:01.0 ens33: renamed from eth0
    1570-[    2.445075] scsi host3: ahci
    1571-[    2.445243] scsi host4: ahci
    1572-[    2.445375] scsi host5: ahci
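
    The same options can be tried on a tiny input that does not depend on dmesg. This is a sketch with made-up lines. With -n, grep marks matching lines with a colon and context lines with a hyphen, as in the output above.

```shell
# One line of context before and after the match, with line numbers.
context=$(printf '%s\n' one two three four five | grep -n -A1 -B1 'three')
echo "$context"

# The number may also be separated from the option by a space.
context_spaced=$(printf '%s\n' one two three four five | grep -n -A 1 -B 1 'three')
echo "$context_spaced"
```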

    2.1.2. Basic regular expression exercises

    These use VBird's (鸟哥) example file, regular_express.txt.

    2.1.2.1  Finding a specific string and inverting the match

    :~/test$ grep -n 'the' regular_express.txt 
    8:I can't finish the test.^M
    12:the symbol '*' is represented as start.
    15:You are the best is mean you are the no. 1.
    16:The world <Happy> is the same with "glad".
    18:google is the best tools for search keyword.
    :~/test$ grep -vn 'the' regular_express.txt   # invert the match
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    4:this dress doesn't fit me.
    5:However, this dress is about $ 3183 dollars.^M
    6:GNU is free air not free beer.^M
    7:Her hair is very beauty.^M
    9:Oh! The soup taste good.^M
    10:motorcycle is cheap than car.
    11:This window is clear.
    13:Oh! My god!
    14:The gd software is a library for drafting programs.^M
    17:I like dog.
    19:goooooogle yes!
    20:go! go! Let's go.
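
    A compact sketch of the same idea on three invented lines: -v selects exactly the lines that the plain search rejects, so the two counts always add up to the total number of lines. The search is case-sensitive, so "The" does not count as a match for 'the'.

```shell
# Three sample lines (invented for this sketch).
lines=$(printf '%s\n' 'the cat' 'The dog' 'a bird')

# Lines containing the lowercase string "the".
with_the=$(printf '%s\n' "$lines" | grep -c 'the')

# Inverted match: lines that do NOT contain "the".
without_the=$(printf '%s\n' "$lines" | grep -vc 'the')

echo "$with_the $without_the"
```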

    2.1.2.2 Using [] to match a set of characters

    :~/test$ grep -n 't[ae]st'  regular_express.txt   # [ae] matches exactly 1 character, either a or e
    8:I can't finish the test.^M
    9:Oh! The soup taste good.^M
    :~/test$ grep -n 'oo'  regular_express.txt 
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    9:Oh! The soup taste good.^M
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    :~/test$ grep -n '[^g]oo' regular_express.txt    # [^g] means "not g": find oo not preceded by g; lines 1 and 9 are gone
    2:apple is my favorite food.
    3:Football game is not use feet only.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    :~/test$ grep -n '[^a-z]oo' regular_express.txt     # find oo not preceded by a lowercase letter
    3:Football game is not use feet only.
    :~/test$ grep -n '[^[:lower:]]oo' regular_express.txt # [:lower:] is another way to write lowercase letters
    3:Football game is not use feet only.
    :~/test$ grep -n '[0-9]' regular_express.txt         # find digits
    5:However, this dress is about $ 3183 dollars.^M
    15:You are the best is mean you are the no. 1.
    :~/test$ grep -n '[[:digit:]]' regular_express.txt      # [:digit:] is another way to write digits
    5:However, this dress is about $ 3183 dollars.^M
    15:You are the best is mean you are the no. 1.
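
    The bracket rules can be verified without the example file. The word list below is made up. The point worth noticing is that [^g]oo needs three characters, so "good" fails because the only character in front of its "oo" is a g.

```shell
# Sample words (invented for this sketch).
words=$(printf '%s\n' test taste tast tost good Football)

# t, then a or e, then st: matches test, taste and tast, but not tost.
set_count=$(printf '%s\n' "$words" | grep -c 't[ae]st')

# oo preceded by any character other than g: only Football qualifies.
not_g=$(printf '%s\n' "$words" | grep '[^g]oo')

# Same idea using a character class instead of a range.
not_lower=$(printf '%s\n' "$words" | grep '[^[:lower:]]oo')

echo "$set_count $not_g $not_lower"
```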

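    2.1.2.3 Line start ^ and line end $

    Note: inside brackets, [^] inverts the selection; outside brackets, ^[] anchors the match to the start of the line.

    :~/test$ grep -n '^the'  regular_express.txt   # lines that start with "the"
    12:the symbol '*' is represented as start.
    :~/test$ grep -n '^[a-z]'  regular_express.txt   # lines that start with a lowercase letter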
    2:apple is my favorite food.
    4:this dress doesn't fit me.
    10:motorcycle is cheap than car.
    12:the symbol '*' is represented as start.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    20:go! go! Let's go.
    :~/test$ grep -n '^[^a-zA-Z]'  regular_express.txt   # lines that do not start with a letter
    1:"Open Source" is a good mechanism to develop programs.
    :~/test$ grep -n '\.$'  regular_express.txt     # lines that end with a dot; the dot is escaped because . is itself a special character
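    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    4:this dress doesn't fit me.
    10:motorcycle is cheap than car.
    11:This window is clear.
    12:the symbol '*' is represented as start.
    15:You are the best is mean you are the no. 1.
    16:The world <Happy> is the same with "glad".
    17:I like dog.
    18:google is the best tools for search keyword.
    20:go! go! Let's go.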
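
    A self-contained sketch of the two anchors on invented lines. Combining them as ^$ matches empty lines, which is a common use that follows directly from the two definitions.

```shell
# Five sample lines, the fourth one empty (invented for this sketch).
text=$(printf '%s\n' 'the start' 'not the start' 'ends with dot.' '' 'Upper first')

# ^ outside brackets anchors the match to the start of the line.
starts_the=$(printf '%s\n' "$text" | grep -n '^the')

# \. is a literal dot, $ anchors it to the end of the line.
ends_dot=$(printf '%s\n' "$text" | grep -n '\.$')

# Start immediately followed by end: an empty line.
blank=$(printf '%s\n' "$text" | grep -n '^$')

# ^ inside brackets negates: first character is not a lowercase letter.
non_lower=$(printf '%s\n' "$text" | grep '^[^[:lower:]]')

printf '%s\n' "$starts_the" "$ends_dot" "$blank" "$non_lower"
```

    The empty line is not reported by the last pattern, because a bracket expression always has to consume one character.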

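    2.1.2.4 Any single character (.) and the repetition character (*)

    * : repeats the preceding character zero or more times. For example, a* means "anywhere from no a at all to infinitely many a". This differs from wildcards, where * stands for zero or more arbitrary characters, so a* there means a followed by any characters (including none).

    . : exactly one arbitrary character must be present

    :~/test$ grep -n 'g..d'  regular_express.txt         # g..d: exactly 2 arbitrary characters between g and d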
    1:"Open Source" is a good mechanism to develop programs.
    9:Oh! The soup taste good.^M
    16:The world <Happy> is the same with "glad".
    
    :~/test$ grep -n 'ooo*'  regular_express.txt         # ooo*: 2 or more o
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    9:Oh! The soup taste good.^M
    18:google is the best tools for search keyword.
    19:goooooogle yes!

    :~/test$ grep -n 'g.*g'  regular_express.txt         # g.*g: a string that starts with g and ends with g; .* means zero or more arbitrary characters, like * in wildcards
    1:"Open Source" is a good mechanism to develop programs.
    14:The gd software is a library for drafting programs.^M
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    20:go! go! Let's go.
    :~/test$ grep -n 'g*g'  regular_express.txt         # g*g: not necessarily a g at both ends, because g* means zero or more g, so any line containing a g matches
    1:"Open Source" is a good mechanism to develop programs.
    3:Football game is not use feet only.
    9:Oh! The soup taste good.^M
    13:Oh!     My god!
    14:The gd software is a library for drafting programs.^M
    16:The world <Happy> is the same with "glad".
    17:I like dog.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    20:go! go! Let's go.
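
    The four patterns behave quite differently on a short word list, which the following sketch counts. The words are made up for the purpose.

```shell
# Sample words (invented for this sketch).
words=$(printf '%s\n' gd god good goood glad gg g)

# g, exactly two arbitrary characters, d: good and glad.
two_between=$(printf '%s\n' "$words" | grep -c 'g..d')

# g, one o, zero or more further o, d: god, good and goood.
one_or_more_o=$(printf '%s\n' "$words" | grep -c 'goo*d')

# g, anything (including nothing), d: gd, god, good, goood and glad.
anything_between=$(printf '%s\n' "$words" | grep -c 'g.*d')

# Zero or more g followed by g: every line that contains a g.
any_g=$(printf '%s\n' "$words" | grep -c 'g*g')

echo "$two_between $one_or_more_o $anything_between $any_g"
```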

    2.1.2.5 Limiting the repetition count with {}

    :~/test$ grep -n 'go\{2,5\}g'  regular_express.txt   # \{2,5\}: 2 to 5 o
    18:google is the best tools for search keyword.
    :~/test$ grep -n 'go\{2\}g'  regular_express.txt    # \{2\}: exactly 2 o
    18:google is the best tools for search keyword.
    :~/test$ grep -n 'go\{2,\}g'  regular_express.txt   # \{2,\}: 2 or more o
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    :~/test$ grep -n 'go\{,5\}g'  regular_express.txt   # \{,5\}: 5 or fewer o
    18:google is the best tools for search keyword.
    :~/test$ grep -n 'go\{,10\}g'  regular_express.txt   # \{,10\}: 10 or fewer o
    18:google is the best tools for search keyword.
    19:goooooogle yes!
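
    A short sketch of the interval forms on invented words. It writes the lower bound explicitly as \{0,1\}, on the assumption that the \{,n\} shorthand used above is a GNU grep convenience and may not be accepted by every grep.

```shell
# Sample words with 0, 1, 2, 3 and 6 o's between the g's (invented).
words=$(printf '%s\n' gg gog goog gooog goooooog)

# Exactly two o.
exactly_two=$(printf '%s\n' "$words" | grep 'go\{2\}g')

# Two or three o: goog and gooog.
two_to_three=$(printf '%s\n' "$words" | grep -c 'go\{2,3\}g')

# Two or more o: goog, gooog and goooooog.
two_or_more=$(printf '%s\n' "$words" | grep -c 'go\{2,\}g')

# At most one o, written with an explicit lower bound: gg and gog.
at_most_one=$(printf '%s\n' "$words" | grep -c 'go\{0,1\}g')

echo "$exactly_two $two_to_three $two_or_more $at_most_one"
```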

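    2. Summary of basic regular expression characters

     ^word: word at the start of a line

     word$: word at the end of a line

    . : exactly one arbitrary character

    \ : escape

    * : zero or more of the preceding character

    [list] : one character from list

    [n1-n2] : one character within the range

    [^list] : inverted selection, one character not in list

    \{n1,n2\} : n1 to n2 consecutive occurrences of the preceding character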

    3. The sed command

    A stream editor that works well in pipelines. It can replace, delete, and add data, and select specific lines.
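
    Each of the four operations can be sketched in one line. The input text is invented, and the append uses the traditional two-line a\ form, which should be the most portable spelling.

```shell
# Three sample lines (invented for this sketch).
input=$(printf '%s\n' 'line one' 'line two' 'line three')

# Replace: substitute the first "line" on every line with "row".
replaced=$(printf '%s\n' "$input" | sed 's/line/row/')

# Delete: drop line 2.
deleted=$(printf '%s\n' "$input" | sed '2d')

# Add: append a new line after line 1.
appended=$(printf '%s\n' "$input" | sed '1a\
added line')

# Select: -n suppresses default output, p prints only lines 2 to 3.
picked=$(printf '%s\n' "$input" | sed -n '2,3p')

printf '%s\n' "$replaced" "$deleted" "$appended" "$picked"
```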

    4. The awk command

    Processes text field by field.
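
    A minimal sketch of field-based processing. The records (name, age, role) are invented. By default awk splits each line on whitespace and exposes the pieces as $1, $2, and so on.

```shell
# Whitespace-separated records: name, age, role (invented data).
records=$(printf '%s\n' 'alice 30 dev' 'bob 25 ops' 'carol 41 dev')

# Print the first and third field of every record.
name_role=$(printf '%s\n' "$records" | awk '{print $1, $3}')

# Print the name only when the second field is greater than 28.
older=$(printf '%s\n' "$records" | awk '$2 > 28 {print $1}')

# Count the records whose third field is exactly "dev".
dev_count=$(printf '%s\n' "$records" | awk '$3 == "dev" {n++} END {print n}')

printf '%s\n' "$name_role" "$older" "$dev_count"
```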

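    5. Extended regular expressions

    +      one or more of the preceding character

    ?      zero or one of the preceding character

    |      alternation (or), e.g. 'glad|good'

    ()     grouping, e.g. g(la|oo)d

    ()+    one or more repetitions of the group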
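
    These operators need extended mode, which this sketch selects with grep -E (the standard spelling of egrep). The word list is made up.

```shell
# Sample words (invented for this sketch).
words=$(printf '%s\n' gd god good glad gladglad xyzxyz)

# + : one or more o between g and d: god and good.
plus_count=$(printf '%s\n' "$words" | grep -E -c 'go+d')

# ? : zero or one o between g and d: gd and god.
optional_count=$(printf '%s\n' "$words" | grep -E -c 'go?d')

# | : either alternative: good, glad and gladglad.
either_count=$(printf '%s\n' "$words" | grep -E -c 'glad|good')

# () : grouping, equivalent to the alternation above.
group_count=$(printf '%s\n' "$words" | grep -E -c 'g(la|oo)d')

# ()+ : the whole line is one or more repetitions of the group xyz.
repeated_group=$(printf '%s\n' "$words" | grep -E '^(xyz)+$')

echo "$plus_count $optional_count $either_count $group_count $repeated_group"
```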

  • Original post: https://www.cnblogs.com/liuwanpeng/p/6226656.html