  Chapter 12 Regular Expressions and File Formatting

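    Basic regular expressions

    Effect of locale on regular expressions

    Under different locales (language settings), characters may be encoded and ordered differently.

    LANG=C: 0 1 2 ... A B C ... a b c ...

    LANG=zh_CN: 0 1 2 ... a A b B ...

    Therefore a range such as [A-Z] can select different characters. Under LANG=C it matches only the uppercase letters. Under an ordering like aAbB..., it may also pick up lowercase letters. The character classes below avoid this problem because their meaning does not depend on the collation order.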

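    Special symbol   Meaning
    [:alnum:]        Upper- and lowercase letters and digits: 0-9, A-Z, a-z
    [:alpha:]        Upper- and lowercase English letters
    [:blank:]        Space and Tab
    [:cntrl:]        Control characters: CR, LF, Tab, Del, etc.
    [:digit:]        Digits 0-9
    [:graph:]        All printable characters except whitespace (space and Tab)
    [:lower:]        Lowercase letters
    [:print:]        Printable characters (including space)
    [:punct:]        Punctuation characters: " ' ? ; : # $
    [:upper:]        Uppercase letters
    [:space:]        Any character that produces whitespace, e.g. space, Tab, CR, LF
    [:xdigit:]       Hexadecimal digits: 0-9, A-F, a-f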
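A quick way to see these classes at work is to count matching lines in a small throwaway input. The sketch below makes two assumptions: the sample lines and the `sample` function are hypothetical, and `LC_ALL=C` is set so that the `[A-Z]` range behaves as plain ASCII whatever the system locale is. A class name is only valid inside a bracket expression, so `[:digit:]` from the table is written as `[[:digit:]]` in a pattern.

```shell
# Hypothetical sample input: five short lines, one of them containing a Tab.
sample() { printf 'abc\nABC\n123\ntab\there\n!?#\n'; }

# grep -c prints the number of matching lines. LC_ALL=C pins the locale,
# so the [A-Z] range covers exactly the ASCII uppercase letters.
upper=$(sample | LC_ALL=C grep -c '[[:upper:]]')   # "ABC"
range=$(sample | LC_ALL=C grep -c '[A-Z]')         # also only "ABC" under LC_ALL=C
digit=$(sample | LC_ALL=C grep -c '[[:digit:]]')   # "123"
blank=$(sample | LC_ALL=C grep -c '[[:blank:]]')   # the line with the Tab
alnum=$(sample | LC_ALL=C grep -c '[[:alnum:]]')   # every line except "!?#"

echo "upper=$upper range=$range digit=$digit blank=$blank alnum=$alnum"
# prints: upper=1 range=1 digit=1 blank=1 alnum=4
```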

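    Some advanced grep options

    Besides the basic usage introduced in the previous chapter, grep has a few more advanced options.

    grep [-A n] [-B n] [--color=auto] 'search_string' filename

    Options:

    -A: followed by a number n ("after"). Besides the matching line, grep also lists the n lines after it.

    -B: followed by a number n ("before"). Besides the matching line, grep also lists the n lines before it.

    --color=auto: highlight the matched text in color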

    //-n displays line numbers
    [root@localhost 桌面]# dmesg | grep -n --color=auto 'eth'
    1730:[   10.210383] e1000 0000:02:01.0 eth0: (PCI:66MHz:32-bit) 00:0c:29:7f:dd:91
    1731:[   10.210404] e1000 0000:02:01.0 eth0: Intel(R) PRO/1000 Network Connection

    Note: when grep finds the search string, it always prints the entire line that contains it.
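The `-A` and `-B` options are easiest to see on a tiny input. The sketch below uses a hypothetical five-line input in which only line 3 contains `eth`. With `-n`, GNU grep marks matched lines with `:` after the line number and context lines with `-`.

```shell
# Hypothetical five-line input; only line 3 contains "eth".
lines() { printf 'one\ntwo\neth0 up\nfour\nfive\n'; }

# -A1: the match plus one line after it; -B1: one line before it plus the match.
after=$(lines | grep -n -A1 'eth')
before=$(lines | grep -n -B1 'eth')

printf '%s\n' "$after"    # prints "3:eth0 up" then "4-four"
printf '%s\n' "$before"   # prints "2-two" then "3:eth0 up"
```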

     

    Basic regular expression exercises

    The practice text is as follows:

    [root@localhost 桌面]# cat regular_express.txt
    "Open Source" is a good mechanism to develop programs.
    apple is my favorite food.
    Football game is not use feet only.
    this dress doesn't fit me.
    However, this dress is about $ 3183 dollars.
    GNU is free air not free beer.
    Her hair is very beauty.
    I can't finish the test.
    Oh! The soup taste good.
    motorcycle is cheap than car.
    This window is clear.
    the symbol '*' is represented as start.
    Oh!    My god!
    The gd software is a library for drafting programs.
    You are the best is mean you are the no. 1.
    The world <Happy> is the same with "glad".
    I like dog.
    google is the best tools for search keyword.
    goooooogle yes!
    go! go! Let's go.
    # I am VBird
    
    [root@localhost 桌面]# 

    Example 1: Finding a specific string

    //Find lines containing "the"
    [root@localhost 桌面]# grep -n 'the' regular_express.txt
    8:I can't finish the test.
    12:the symbol '*' is represented as start.
    15:You are the best is mean you are the no. 1.
    16:The world <Happy> is the same with "glad".
    18:google is the best tools for search keyword.
    
    //Find lines not containing "the" (-v inverts the match)
    [root@localhost 桌面]# grep -vn 'the' regular_express.txt
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    4:this dress doesn't fit me.
    5:However, this dress is about $ 3183 dollars.
    6:GNU is free air not free beer.
    7:Her hair is very beauty.
    9:Oh! The soup taste good.
    10:motorcycle is cheap than car.
    11:This window is clear.
    13:Oh!    My god!
    14:The gd software is a library for drafting programs.
    17:I like dog.
    19:goooooogle yes!
    20:go! go! Let's go.
    21:# I am VBird
    22:
    [root@localhost 桌面]# 

    Example 2: Using brackets [] to match a set of characters

    //Find the string tast or test
    [root@localhost 桌面]# grep -n 't[ae]st' regular_express.txt
    8:I can't finish the test.
    9:Oh! The soup taste good.
    
    //Find "oo" that is not preceded by g
    [root@localhost 桌面]# grep -n '[^g]oo' regular_express.txt
    2:apple is my favorite food.
    3:Football game is not use feet only.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    
    //Find lines containing digits
    [root@localhost 桌面]# grep -n '[0-9]' regular_express.txt
    5:However, this dress is about $ 3183 dollars.
    15:You are the best is mean you are the no. 1.
    
    //Find "oo" that is not preceded by a lowercase letter
    [root@localhost 桌面]# grep -n '[^[:lower:]]oo' regular_express.txt
    3:Football game is not use feet only.
    [root@localhost 桌面]# 

    Example 3: Line-start and line-end characters ^ and $

    //Lines that begin with "the"
    [root@localhost 桌面]# grep -n '^the' regular_express.txt
    12:the symbol '*' is represented as start.
    
    //Lines that begin with a lowercase letter
    [root@localhost 桌面]# grep -n '^[a-z]' regular_express.txt
    2:apple is my favorite food.
    4:this dress doesn't fit me.
    10:motorcycle is cheap than car.
    12:the symbol '*' is represented as start.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    20:go! go! Let's go.
    
    //Lines ending with a period (the . must be escaped as \., otherwise it means "any character")
    [root@localhost 桌面]# grep -n '\.$' regular_express.txt
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    4:this dress doesn't fit me.
    10:motorcycle is cheap than car.
    11:This window is clear.
    12:the symbol '*' is represented as start.
    15:You are the best is mean you are the no. 1.
    16:The world <Happy> is the same with "glad".
    17:I like dog.
    18:google is the best tools for search keyword.
    20:go! go! Let's go.
    
    //Find blank lines
    [root@localhost 桌面]# grep -n '^$' regular_express.txt
    22:
    [root@localhost 桌面]# 
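In the `'\.$'` search above, some lines that visibly end with a period (for example lines 5 and 8) are not listed. One common cause is a hidden Windows carriage return (`\r`) before the line end, because then `$` follows the `\r` rather than the period. This cause is assumed here and not confirmed for this particular file. GNU `cat -A` shows such characters as `^M`. The sketch below reproduces the effect with two hypothetical lines and removes the `\r` with `tr` before matching.

```shell
# Two hypothetical lines: the first ends in LF, the second in CR+LF.
sample() { printf 'unix line.\ndos line.\r\n'; }

# '\.$' needs a literal period immediately before the end of the line.
plain=$(sample | grep -c '\.$')                # the CR hides the second period
fixed=$(sample | tr -d '\r' | grep -c '\.$')   # delete CRs first, then match

echo "plain=$plain fixed=$fixed"   # prints: plain=1 fixed=2
```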

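    Example 4: Any character . and repetition *

    . (period): exactly one arbitrary character

    *: the preceding character repeated zero or more times

    //Find strings that start with g, end with d, and have exactly two characters in between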
    [root@localhost 桌面]# grep -n 'g..d' regular_express.txt
    1:"Open Source" is a good mechanism to develop programs.
    9:Oh! The soup taste good.
    16:The world <Happy> is the same with "glad".
    
    //Find at least two o's: "oo" followed by zero or more additional o's
    [root@localhost 桌面]# grep -n 'ooo*' regular_express.txt
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    9:Oh! The soup taste good.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    [root@localhost 桌面]# 
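Because `*` allows zero repetitions, a single character followed by `*` matches every line, including lines that do not contain that character. For that reason the example above writes `ooo*` when it wants at least two o's. The sketch below uses three hypothetical lines.

```shell
# Hypothetical input: the middle line contains no "o" at all.
sample() { printf 'good\nbad\nfox\n'; }

any=$(sample | grep -c 'o*')     # zero or more o: matches all three lines
one=$(sample | grep -c 'oo*')    # at least one o: "good" and "fox"
two=$(sample | grep -c 'ooo*')   # at least two o's in a row: only "good"

echo "any=$any one=$one two=$two"   # prints: any=3 one=2 two=1
```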

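    Example 5: Limiting the number of repetitions with {}

    In basic regular expressions the braces must be escaped as \{ and \}. Unescaped { and } are treated as ordinary characters.

    //Find two consecutive o's
    [root@localhost 桌面]# grep -n 'o\{2\}' regular_express.txt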
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    9:Oh! The soup taste good.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    
    //Find o repeated 2 to 5 times
    [root@localhost 桌面]# grep -n 'o\{2,5\}' regular_express.txt
    1:"Open Source" is a good mechanism to develop programs.
    2:apple is my favorite food.
    3:Football game is not use feet only.
    9:Oh! The soup taste good.
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    
    //Find g, followed by two or more o's, followed by g
    [root@localhost 桌面]# grep -n 'go\{2,\}g' regular_express.txt
    18:google is the best tools for search keyword.
    19:goooooogle yes!
    [root@localhost 桌面]# 
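You can check the difference between escaped and unescaped braces directly. The sketch below uses hypothetical input and assumes GNU grep's basic regular expression syntax. In that syntax `o\{2\}` is an interval (two o's), while an unescaped `o{2}` matches the literal text `o{2}`.

```shell
# Hypothetical input: one line with "oo", one with a single o, one with literal braces.
sample() { printf 'google\ngo\no{2}\n'; }

interval=$(sample | grep 'o\{2\}')   # escaped: two o's in a row
literal=$(sample | grep 'o{2}')      # unescaped: the characters o { 2 }

printf '%s\n' "$interval"   # prints: google
printf '%s\n' "$literal"    # prints: o{2}
```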

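    Basic regular expression characters

    The five examples above can be summarized into the following basic regular expression characters:

    RE character   Meaning
    ^word          The string to find is at the start of a line
    word$          The string to find is at the end of a line
    .              Exactly one arbitrary character
    \              Escape character: removes the special meaning of the symbol that follows
    *              The preceding character repeated zero or more times
    [list]         One character from the set in list
    [n1-n2]        One character in the range n1 to n2
    [^list]        One character that is not in list
    \{n,m\}        The preceding character repeated n to m times; \{n\} means exactly n times, \{n,\} n or more times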

     

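    The sed tool

    sed is also a pipeline command. It can process data from standard input, and it can replace, delete, add, and select specific lines.

    sed [-nefri] [action]

    Options:

    -n: quiet mode. Normally every line sed reads from STDIN is printed to the screen. With -n, only lines that a sed command explicitly prints (for example with p) are shown.

    -e: give the sed action directly on the command line

    -f: put the sed actions in a file; -f filename runs the actions stored in filename

    -r: let sed actions use extended regular expressions (the default is basic regular expressions)

    -i: modify the file that is read directly, instead of printing the result to the screen

    Action format: [n1[,n2]]function

    n1 and n2 are optional. They usually give the line range the action applies to.

    function can be one of the following:

    a: append. a can be followed by a string, which appears on a new line after the current line.

    c: change. c can be followed by a string, which replaces lines n1 through n2.

    d: delete

    i: insert. i can be followed by a string, which appears on a new line before the current line.

    p: print the selected lines

    s: substitute, usually combined with a regular expression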

    //Original text
    [root@localhost 桌面]# cat -n test.txt
         1    this a test text!
         2    i like linux !
         3    today is monday!
         4    my name is fw.
         5    
    
    //Delete lines 2-3
    [root@localhost 桌面]# cat -n test.txt | sed '2,3d'
         1    this a test text!
         4    my name is fw.
         5    
    
    //Delete line 3 through the last line ($ stands for the last line)
    [root@localhost 桌面]# cat -n test.txt | sed '3,$d'
         1    this a test text!
         2    i like linux !
    
    //Append (after line 2)
    [root@localhost 桌面]# cat -n test.txt | sed '2a this line is new'
         1    this a test text!
         2    i like linux !
    this line is new
         3    today is monday!
         4    my name is fw.
         5    
    
    //Insert (before line 2)
    [root@localhost 桌面]# cat -n test.txt | sed '2i this line is new'
         1    this a test text!
    this line is new
         2    i like linux !
         3    today is monday!
         4    my name is fw.
         5    
    
    //Change (replace line 2 with new text)
    [root@localhost 桌面]# cat -n test.txt | sed '2c this line is new'
         1    this a test text!
    this line is new
         3    today is monday!
         4    my name is fw.
         5    
    
    //Print only lines 2-4 (-n suppresses the default output)
    [root@localhost 桌面]# cat -n test.txt | sed -n '2,4p'
         2    i like linux !
         3    today is monday!
         4    my name is fw.
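The session above does not use the `-e` and `-f` options described earlier. The sketch below assumes a POSIX `sed`. Several `-e` actions are applied in order to each line. The same actions can also be kept in a script file, here a hypothetical temporary file, and loaded with `-f`.

```shell
# Hypothetical four-line input.
sample() { printf 'one\ntwo\nthree\nfour\n'; }

# -e twice: delete line 2, then turn every "o" into "0" on the remaining lines.
inline=$(sample | sed -e '2d' -e 's/o/0/g')

# The same two actions stored one per line in a temporary script file.
script=$(mktemp)
printf '2d\ns/o/0/g\n' > "$script"
from_file=$(sample | sed -f "$script")
rm -f "$script"

printf '%s\n' "$inline"   # prints 0ne, three, f0ur (one per line)
```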

    Find and replace: sed 's/old_string/new_string/g'

    The string to search for can be a regular expression.

    //View the original text
    [root@localhost 桌面]# cat -n test.txt
         1    this a test text!
         2    i like linux !
         3    today is monday!
         4    my name is fw.
         5    
    
    //Replace this with that
    [root@localhost 桌面]# cat -n test.txt | sed 's/this/that/g'
         1    that a test text!
         2    i like linux !
         3    today is monday!
         4    my name is fw.
         5    
    
    //Replace a ! at the end of a line with a period
    [root@localhost 桌面]# cat -n test.txt | sed 's/!$/./g'
         1    this a test text.
         2    i like linux .
         3    today is monday.
         4    my name is fw.
         5    
    
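    //Delete everything from the start of the line up to and including "this" (because cat -n adds line numbers, the number prefix of line 1 is removed as well)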
    [root@localhost 桌面]# cat -n test.txt | sed 's/^.*this//g'
     a test text!
         2    i like linux !
         3    today is monday!
         4    my name is fw.
         5    
    [root@localhost 桌面]# 
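The trailing `g` in `s/old/new/g` means "every match on the line". Without it, only the first match on each line is replaced. The sketch below uses a hypothetical line. It also shows why the `s/^.*this//` example above removes so much: `.*` is greedy and extends to the last "this" on the line, whereas `^this` removes only a "this" at the very start.

```shell
line='this and this'

first=$(printf '%s\n' "$line" | sed 's/this/that/')     # first match only
all=$(printf '%s\n' "$line" | sed 's/this/that/g')      # every match
anchored=$(printf '%s\n' "$line" | sed 's/^this//')     # only a "this" at line start
greedy=$(printf '%s\n' "$line" | sed 's/^.*this//')     # .* runs to the LAST "this"

printf '%s\n' "$first"      # prints: that and this
printf '%s\n' "$all"        # prints: that and that
printf '[%s]\n' "$anchored" # prints: [ and this]
printf '[%s]\n' "$greedy"   # prints: []
```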

    Modifying the file directly:

    the -i option

    //View the original file
    [root@localhost 桌面]# cat test.txt
    this a test text!
    i like linux !
    today is monday!
    my name is fw.
    
    //Replace this with that and write the result back to the original file
    [root@localhost 桌面]# sed -i 's/this/that/g' test.txt
    
    //View the file again
    [root@localhost 桌面]# cat test.txt
    that a test text!
    i like linux !
    today is monday!
    my name is fw.
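Because `-i` overwrites the file, it can be worth keeping a backup. GNU sed accepts a suffix attached to `-i` (for example `-i.bak`) and saves the original file under that name. Other sed implementations may handle `-i` differently, so treat this as a GNU-oriented sketch. The file is created in a hypothetical temporary directory.

```shell
dir=$(mktemp -d)
printf 'this a test text!\ni like linux !\n' > "$dir/test.txt"

# Edit in place, keeping the untouched original as test.txt.bak.
sed -i.bak 's/this/that/g' "$dir/test.txt"

edited=$(cat "$dir/test.txt")
backup=$(cat "$dir/test.txt.bak")
rm -r "$dir"

printf '%s\n' "$edited"   # prints "that a test text!" then "i like linux !"
```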

    Extended regular expressions

    This part is skipped for now.

     

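    File formatting and related processing

    Formatted printing: printf

      printf 'format' actual_content

    Parameters:

    Some special escape sequences that can be used in the format:

    \a: alert (bell) sound

    \b: backspace

    \f: form feed (clear the screen)

    \n: start a new line

    \r: carriage return (the Enter key)

    \t: horizontal Tab

    \v: vertical Tab

    \xNN: NN is a two-digit hexadecimal number; the code is converted into the corresponding character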
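The sketch below uses a few of these escapes. `\x45` is read as hexadecimal 45, which is decimal 69, the ASCII code of `E`. Some shell builtins (dash's, for example) do not support `\xNN`. For that reason the sketch assumes an external `printf` utility such as GNU coreutils and calls it through `env`.

```shell
# \xNN: NN is hexadecimal; 0x45 = 69 = "E". env runs the external printf
# instead of the shell builtin.
letter=$(env printf '\x45')

# \t puts a horizontal tab between fields; \n ends the line.
row=$(printf 'Tom\t80\n')

# Make the tab visible by translating it to "|".
visible=$(printf '%s\n' "$row" | tr '\t' '|')

printf '%s\n' "$letter"    # prints: E
printf '%s\n' "$visible"   # prints: Tom|80
```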

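    Common variable formats from the C language:

    %ns: n is a number and s stands for string; the field is n characters wide

    %ni: n is a number and i stands for integer; the field is n digits wide

    %N.nf: N and n are numbers and f stands for float; N is the total field width and n the number of digits after the decimal point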
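The sketch below applies the width and precision fields to one hypothetical row. `%10s` right-aligns a string in 10 columns, `%5i` an integer in 5 columns, and `%8.3f` a number in 8 columns with 3 digits after the decimal point. A leading minus, as in `%-10s`, left-aligns the value instead. The `|` marks where each output ends.

```shell
row=$(printf '%10s %5i %8.3f|' Tom 80 77.33)
left=$(printf '%-10s|' Tom)

printf '%s\n' "$row"    # prints:        Tom    80   77.330|
printf '%s\n' "$left"   # prints: Tom       |
```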

    //View the original text
    [root@localhost 桌面]# cat test.txt
    Name    Chinese    English    Math    Average
    Tom    80    60    92    77.33
    Sherry    75    55    80    70.00
    John    60    90    70    73.33
    
    
    [root@localhost 桌面]# printf '%s\t %s\t %s\t %s\t %s\t \n' $(cat test.txt)
    Name     Chinese     English     Math     Average     
    Tom     80     60     92     77.33     
    Sherry     75     55     80     70.00     
    John     60     90     70     73.33     
    
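    //The header row contains words, which %i and %f cannot convert, so bash reports invalid numbers and prints 0 for them
    [root@localhost 桌面]# printf '%10s %5i %5i %5i %8.3f \n' $(cat test.txt)
    bash: printf: Chinese: invalid number
    bash: printf: English: invalid number
    bash: printf: Math: invalid number
    bash: printf: Average: invalid number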
          Name     0     0     0    0.000 
           Tom    80    60    92   77.330 
        Sherry    75    55    80   70.000 
          John    60    90    70   73.330 
    
    //Output the character whose hexadecimal code is 45 (decimal 69, the letter E)
    [root@localhost 桌面]# printf '\x45\n'
    E
    [root@localhost 桌面]# 

    awk: a handy data processing tool

    awk 'pattern1{action1} pattern2{action2} ...' filename

    [root@localhost 桌面]# last -n 5
    root     pts/0        :0               Mon Jul 18 14:19   still logged in   
    root     :0           :0               Mon Jul 18 14:10   still logged in   
    (unknown :0           :0               Mon Jul 18 14:08 - 14:10  (00:01)    
    reboot   system boot  3.10.0-327.el7.x Mon Jul 18 14:08 - 16:00  (01:52)    
    root     pts/0        :0               Sun Jul 17 15:44 - crash  (22:23)    
    
    wtmp begins Mon Apr 25 13:36:45 2016
    
    [root@localhost 桌面]# last -n 5 | awk '{print $1 "\t" $4}'
    root    Mon
    root    Mon
    (unknown    Mon
    reboot    3.10.0-327.el7.x
    root    Sun
        
    wtmp    Apr
    [root@localhost 桌面]# 

    awk splits each line into fields on spaces or tabs and assigns the pieces, in order, to the variables $1, $2, ...; $0 holds the whole line.
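The sketch below shows how `awk` splits a line. It uses a hypothetical input with the same shape as the `last` output. Runs of spaces or tabs count as one separator, `$0` is the whole line, and a pattern in front of `{...}` restricts which lines the action runs on.

```shell
# Hypothetical input: uneven spacing and a Tab on the first line.
sample() { printf 'root   pts/0\tMon\nreboot system Tue\n'; }

# Print the first and third fields of every line.
fields=$(sample | awk '{print $1 "-" $3}')

# Pattern + action: print the whole line ($0) only when field 1 is "root".
roots=$(sample | awk '$1 == "root" {print $0}')

printf '%s\n' "$fields"   # prints "root-Mon" then "reboot-Tue"
```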

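    awk built-in variables

    NF: the number of fields in the current line

    NR: the number of the line (record) awk is currently processing

    FS: the current field separator. The default is a space, which awk treats as any run of spaces or tabs.

    [root@localhost 桌面]# last -n 5 | awk '{print $1 "\t lines:" NR "\t columns:" NF}'
    root     lines:1     columns:10
    root     lines:2     columns:10
    (unknown     lines:3     columns:10
    reboot     lines:4     columns:11
    root     lines:5     columns:10
         lines:6     columns:0
    wtmp     lines:7     columns:7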
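The three built-in variables can be combined in one program. The sketch below uses hypothetical colon-separated input. `FS` is set in a `BEGIN` block, which runs before the first line is read, so the separator already applies to line 1. `NR` numbers the lines, and `$NF` is the last field.

```shell
sample() { printf 'a:b:c\nd:e\n'; }

# print with commas separates the values with a space.
out=$(sample | awk 'BEGIN {FS=":"} {print NR, NF, $NF}')

printf '%s\n' "$out"   # prints "1 3 c" then "2 2 e"
```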

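    awk comparison operators

    >: greater than

    <: less than

    >=: greater than or equal to

    <=: less than or equal to

    ==: equal to (note the double equals sign)

    !=: not equal to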

    [root@localhost 桌面]# cat /etc/passwd
    root:x:0:0:root:/root:/bin/bash
    bin:x:1:1:bin:/bin:/sbin/nologin
    daemon:x:2:2:daemon:/sbin:/sbin/nologin
    adm:x:3:4:adm:/var/adm:/sbin/nologin
    lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
    sync:x:5:0:sync:/sbin:/bin/sync
    shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
    halt:x:7:0:halt:/sbin:/sbin/halt
    mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
    operator:x:11:0:operator:/root:/sbin/nologin
    games:x:12:100:games:/usr/games:/sbin/nologin
    ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
    
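    //Below, ":" is used as the separator, but it has no effect on the first line: that line has already been split with the default separator before the action {FS=":"} runs
    [root@localhost 桌面]# cat /etc/passwd | 
    > awk '{FS=":"} $3<10 {print $1 "\t" $3}'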
    root:x:0:0:root:/root:/bin/bash    
    bin    1
    daemon    2
    adm    3
    lp    4
    sync    5
    shutdown    6
    halt    7
    mail    8
    
    //Setting the variable beforehand in a BEGIN block (which runs before any line is read) makes it work for the first line too
    [root@localhost 桌面]# cat /etc/passwd | 
    > awk 'BEGIN {FS=":"} $3<10 {print $1 "\t" $3}'
    root    0
    bin    1
    daemon    2
    adm    3
    lp    4
    sync    5
    shutdown    6
    halt    7
    mail    8
    [root@localhost 桌面]# 
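POSIX awk, like most awk implementations, also accepts `-F` on the command line. Like the `BEGIN` block above, it sets `FS` before the first line is read. The sketch below combines it with the comparison operators. It runs on hypothetical passwd-style lines instead of the real `/etc/passwd`, and `&&` (logical AND) joins two conditions.

```shell
# Hypothetical passwd-style lines: name:password:UID.
sample() { printf 'root:x:0\nbin:x:1\nalice:x:1000\nbob:x:1001\n'; }

# -F: sets the separator; the UID field is compared numerically.
system=$(sample | awk -F: '$3 < 10 {print $1}')
users=$(sample | awk -F: '$3 >= 1000 && $1 != "bob" {print $1}')

printf '%s\n' "$system"   # prints "root" then "bin"
printf '%s\n' "$users"    # prints "alice"
```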

    awk calculations

    //View the original text
    [root@localhost 桌面]# cat pay.txt
    Name    1st    2nd    3th
    Tom    2300    3200    1200
    Sherry    3400    1200    7400
     
    //Inside awk, variables are used directly without $; when an awk {} action contains several commands, separate them with ";"
    [root@localhost 桌面]# cat pay.txt | 
    > awk 'NR==1{printf "%10s %10s %10s %10s %10s \n",$1,$2,$3,$4,"Total"}
    > NR>=2{total=$2+$3+$4;printf "%10s %10d %10d %10d %10.2f \n",$1,$2,$3,$4,total}'
          Name        1st        2nd        3th      Total 
           Tom       2300       3200       1200    6700.00 
        Sherry       3400       1200       7400   12000.00 
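Going one step further, an `END` block runs once after the last line. That makes it a natural place to print column totals. The sketch below works on a hypothetical copy of the `pay.txt` data and skips the header line with `NR > 1`.

```shell
sample() { printf 'Name 1st 2nd 3th\nTom 2300 3200 1200\nSherry 3400 1200 7400\n'; }

# Accumulate each numeric column, then print the sums once at the end.
totals=$(sample | awk 'NR > 1 {sum2 += $2; sum3 += $3; sum4 += $4}
    END {printf "%s %d %d %d\n", "Total", sum2, sum3, sum4}')

printf '%s\n' "$totals"   # prints: Total 5700 4400 8600
```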

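    File comparison tools

    diff

    diff compares two similar files line by line. In its default output, a code such as 1,2c1 means lines 1-2 of the first file are changed (c) into line 1 of the second file. 3a3 means line 3 of the second file is added (a) after line 3 of the first, and d marks lines deleted from the first file. Lines prefixed with < come from the first file, and lines prefixed with > come from the second.

    diff [-bBi] fileA fileB

    Options:

    -b: ignore differences in the amount of whitespace within a line (e.g. one space vs. several)

    -B: ignore differences that consist only of blank lines

    -i: ignore differences in letter case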

    [root@localhost 桌面]# vim fileA
    [root@localhost 桌面]# cp fileA fileB
    [root@localhost 桌面]# vim fileB
    [root@localhost 桌面]# cat fileA
    this is fileA
    
    
    [root@localhost 桌面]# cat fileB
    this is fileB
    
    ok
    [root@localhost 桌面]# diff fileA fileB
    1,2c1
    < this is fileA
    < 
    ---
    > this is fileB
    3a3
    > ok
    [root@localhost 桌面]# 
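The sketch below demonstrates the `-b` and `-i` options and assumes GNU diff. It creates two hypothetical files in a temporary directory that differ only in spacing and letter case. `diff` exits with status 0 when it considers the files identical and 1 when they differ, so the sketch records that status instead of printing the diff.

```shell
dir=$(mktemp -d)
printf 'this is  a file\nok\n' > "$dir/a"   # two spaces, lowercase "this"
printf 'This is a file\nok\n' > "$dir/b"    # one space, capital "This"

# Record diff's exit status: 0 = no differences, 1 = differences found.
diff "$dir/a" "$dir/b" > /dev/null; plain=$?
diff -b "$dir/a" "$dir/b" > /dev/null; spaces=$?
diff -b -i "$dir/a" "$dir/b" > /dev/null; relaxed=$?
rm -r "$dir"

echo "plain=$plain spaces=$spaces relaxed=$relaxed"   # prints: plain=1 spaces=1 relaxed=0
```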

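    patch

    This command is closely tied to diff. Suppose fileA and fileB are two versions of the same file and you want to update fileA to match fileB. First use diff to compare the two files and save the differences as a patch file. Then apply the patch file to the old file.

    patch -pN < patchFile      <== update

    patch -R -pN < patchFile   <== restore

    Options:

    -p: followed by N, the number of leading directory levels to strip from the file names in the patch file

    -R: reverse the patch, i.e. restore the original file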

    [root@localhost 桌面]# cat fileA
    this is fileA
    
    
    [root@localhost 桌面]# cat fileB
    this is fileB
    
    ok
    
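    //Create the patch file (-N: treat absent files as empty, -a: treat all files as text, -u: unified output format, -r: recurse into directories)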
    [root@localhost 桌面]# diff -Naur fileA fileB > file.patch
    [root@localhost 桌面]# cat file.patch
    --- fileA    2016-07-18 16:36:24.371373349 +0800
    +++ fileB    2016-07-18 16:37:31.523401652 +0800
    @@ -1,3 +1,3 @@
    -this is fileA
    -
    +this is fileB
     
    +ok
    
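    //Apply the patch to update the old file; the files are in the current directory (the names in the patch have no directory part), so N is 0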
    [root@localhost 桌面]# patch -p0 < file.patch
    patching file fileA
    [root@localhost 桌面]# cat fileA
    this is fileB
    
    ok
    [root@localhost 桌面]# 

     

     

    Original article: https://www.cnblogs.com/wuchaodzxx/p/5678709.html