  • Notes on Chapter 16 of 《Ruby基础教程第五版》 (Ruby Basic Tutorial, 5th edition): The Regexp Class

    The Regexp class

    A Regexp instance can be created in three ways:

    /R..y/    Regexp.new("R..y")    %r(R..y)

    irb(main):001:0> /R..y/
    => /R..y/
    irb(main):002:0> /R..y/ =~ "Ruby"
    => 0
    irb(main):003:0> Regexp.new("R..y") =~ "Ruby"
    => 0
    irb(main):004:0> %r(R..y) =~ "Ruby"
    => 0
    irb(main):005:0> %r(R..y) == Regexp.new("R..y")
    => true
    

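    Patterns and matching

    Use =~ to match. On success it returns the index at which the match starts; on failure it returns nil. Since only nil and false are falsy in Ruby, a match at index 0 still counts as true. !~ tests that a pattern does not match; I suspect it is used less often.

    Matching ordinary characters

    When the pattern contains only letters and digits, the match simply tests whether the target string contains the characters of the pattern.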

    irb(main):007:0> /ABC/ =~ "123Abc"
    => nil
    irb(main):008:0> /ABC/ =~ "123ABC"
    => 3
    irb(main):009:0> /ABC/i =~ "123Abc"
    => 3
    irb(main):010:0> 
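
    Because =~ returns an index or nil, it can be used directly as a condition. The following is a minimal sketch of that idea; describe_match is a hypothetical helper name introduced only for this illustration.

```ruby
# describe_match is a hypothetical helper, not part of Ruby itself.
# =~ returns an Integer index on success and nil on failure.
def describe_match(pattern, text)
  index = pattern =~ text
  if index  # 0 is truthy in Ruby, so a match at the very start still counts
    "matched at #{index}"
  else
    "no match"
  end
end

puts describe_match(/ABC/, "ABC123")  # matched at 0
puts describe_match(/ABC/, "123ABC")  # matched at 3
puts describe_match(/ABC/, "123abc")  # no match

# !~ is the negation: true when the pattern does not match.
puts(/ABC/ !~ "123abc")               # true
```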
    

    Characters such as ^ and $, which carry a special meaning instead of matching themselves, are called metacharacters.

    irb(main):010:0> /^123$/ =~ "123bbb\n123ccc"
    => nil
    irb(main):011:0> /^123/ =~ "123bbb\n123ccc"
    => 0
    irb(main):012:0> /^123/ =~ "12bbb\n123ccc"
    => 6
    irb(main):013:0> /^123$/ =~ "123"
    => 0
    irb(main):014:0> /bb$/ =~ "12bbb\n123ccc"
    => 3
    irb(main):015:0> /$/ =~ "12bbb\n123ccc"
    => 5
    irb(main):016:0> /c$/ =~ "12bbb\n123ccc"
    => 11
    irb(main):017:0> 
    

    ^ and $ match the start and end of a line, not the start and end of the whole string. To match the start and end of the string, use \A and \z.

    irb(main):017:0> p "a\n".gsub(/\z/, "!")
    "a\n!"
    => "a\n!"
    irb(main):018:0> p "a\n".gsub(/\Z/, "!")
    "a!\n!"
    => "a!\n!"
    irb(main):019:0> p "a".gsub(/\Z/, "!")
    "a!"
    => "a!"
    irb(main):020:0> p "a".gsub(/\z/, "!")
    "a!"
    => "a!"
    irb(main):021:0> 
    

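     The uppercase \Z matches in two places when the string ends with \n: just before the trailing \n, and at the very end after it.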
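
    The difference between line anchors and string anchors matters most when a whole string has to be checked. The sketch below assumes a simple "digits only" check; digits_by_line? and digits_whole_string? are hypothetical names used only here.

```ruby
# Hypothetical helpers: both try to accept only strings made of digits.
def digits_by_line?(text)
  !!(text =~ /^\d+$/)    # ^ and $ anchor to any single line inside the string
end

def digits_whole_string?(text)
  !!(text =~ /\A\d+\z/)  # \A and \z anchor to the whole string
end

input = "123\nabc"
p digits_by_line?(input)         # true: the first line alone satisfies ^\d+$
p digits_whole_string?(input)    # false
p digits_whole_string?("123")    # true
p digits_whole_string?("123\n")  # false: \z does not skip a trailing newline
```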

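    Specifying a range of characters

    Use [] for this. [abc] matches any one of a, b, or c; [a-z] matches any lowercase letter and [0-9] matches any digit. To match a literal -, put it first or last inside the brackets, as in [-a-z] or [a-z-].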

    irb(main):024:0> /[a-z-]/ =~ "-"
    => 0
    irb(main):025:0> /[-a-z]/ =~ "-"
    => 0
    irb(main):026:0> /[-a-z]/ =~ "ha"
    => 0
    irb(main):027:0> /[-a-z]/ =~ "123ha"
    => 3
    irb(main):028:0> 
    

     [^a-z] matches any character other than a through z.

    irb(main):028:0> /[^A-Z][A-Z]/ =~ "1A2b3c"
    => 0
    irb(main):029:0> /[^0-9][^A-z]/ =~ "1A2b3c4D"
    => 1
    irb(main):030:0> 
    

    Matching any character

    A single . matches any one character (by default, any character other than a newline).

    irb(main):032:0> /^...$/ =~ "123"
    => 0
    irb(main):033:0> /^...$/ =~ "1234"
    => nil
    irb(main):034:0> /^...$/ =~ "12"
    => nil
    irb(main):035:0> 
    

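     Combined with ^ and $, this matches a line of exactly the given length.

    Patterns that use a backslash

    \s stands for whitespace: it matches a space, a tab, a form feed, and also line breaks.

    irb(main):035:0> /abc\s123/ =~ "abc\t1"
    => nil
    irb(main):036:0> /abc\s123/ =~ "abc\t1234"
    => 0
    irb(main):037:0> /abc\s123/ =~ "abc 1234"
    => 0
    irb(main):038:0> /abc\s123/ =~ "abc1234"
    => nil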
    irb(main):039:0> 
    

    \d matches a digit from 0 to 9, the same as [0-9].

    irb(main):039:0> /\d\d-\d\d/ =~ "12-456"
    => 0
    irb(main):040:0> /\d\d-\d\d/ =~ "aa12-456"
    => 2
    irb(main):041:0> /\d\d-\d\d/ =~ "aa1a2-456"
    => nil
    irb(main):042:0> 
    

    \w matches letters and digits. It also matches the underscore, so it is equivalent to [a-zA-Z0-9_].

    irb(main):042:0> /\w\w\w/ =~ "23dd"
    => 0
    irb(main):043:0> /\w\w\w/ =~ "\n23dd"
    => 1
    irb(main):044:0> /\w\w\w/ =~ "\n  23dd"
    => 3
    irb(main):045:0> /\w\w\w/ =~ "\n  23  dd"
    => nil
    irb(main):046:0> 
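
    The underscore is easy to overlook with \w. A short sketch comparing \w with the explicit class [a-zA-Z0-9]; the sample strings are made up for this illustration.

```ruby
# \w includes the underscore; the explicit class below does not.
p(/\w/ =~ "_")             # 0
p(/[a-zA-Z0-9]/ =~ "_")    # nil

# \w+ therefore runs through underscores but stops at a hyphen.
word = "snake_case-name"[/\w+/]
p word                     # "snake_case"
```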
    

    \A matches the start of the string and \z matches the end of the string.

    irb(main):046:0> /\AABC/ =~ "ABC"
    => 0
    irb(main):047:0> /\AABC/ =~ "ABCdf"
    => 0
    irb(main):048:0> /\AABC/ =~ "123\nABCdf"
    => nil
    irb(main):049:0> 
    

     \z matches only at the very end of the string, so a trailing newline prevents a match:

    irb(main):049:0> /ABC\z/ =~"ABC"
    => 0
    irb(main):050:0> /ABC\z/ =~"123ABC"
    => 3
    irb(main):051:0> /ABC\z/ =~"123ABC\n"
    => nil
    irb(main):052:0> /ABC/ =~"123ABC\n"
    => 3
    irb(main):053:0> /ABC\z/ =~"123\nABC"
    => 4
    irb(main):054:0> /ABC\z/ =~"123ABC\nAB"
    => nil
    irb(main):055:0> /ABC\z/i =~"123ABC\nAB"
    => nil
    irb(main):056:0> 
    

    To match special characters such as ^, $, [ and ] literally, escape them with \.

    irb(main):056:0> /\]/ =~"[]"
    => 1
    irb(main):057:0> /[\]^]/ =~"[]^"
    => 1
    irb(main):058:0> /[\]^]/ =~"[12^"
    => 3
    irb(main):059:0> 
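
    The same escaping applies to every metacharacter, including . and $. A brief sketch with made-up sample strings, showing how an unescaped metacharacter can match more than intended:

```ruby
# An unescaped . matches any character, so /3.14/ is looser than it looks.
p(/3.14/  =~ "3x14")       # 0
p(/3\.14/ =~ "3x14")       # nil
p(/3\.14/ =~ "pi=3.14")    # 3

# $ must be escaped to match a literal dollar sign.
p(/\$\d+/ =~ "cost: $25")  # 6

# A literal backslash is written as \\ inside the pattern.
p(/a\\b/ =~ 'a\b')         # 0
```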
    

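    Repetition

    * repeats 0 or more times; + repeats 1 or more times; ? repeats 0 or 1 time; {n} repeats exactly n times; {n,m} repeats n to m times; {n,} repeats at least n times; {,n} repeats at most n times.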

    irb(main):059:0> /a{2}/ =~ "aaa"
    => 0
    irb(main):060:0> /a{2}/ =~ "312aaa"
    => 3
    irb(main):061:0> /a{2}/ =~ "312a"
    => nil
    irb(main):062:0> /a{2}/ =~ "312abaa"
    => 5
    irb(main):063:0> 
    

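    Shortest match: quantifiers are greedy by default and match as much as possible. Writing *? or +? makes them match as little as possible (lazy matching).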
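
    The notes give no example for this, so here is a small sketch of the difference. The sample text is an assumption chosen for illustration; String#[] with a Regexp returns the matched substring.

```ruby
text = "<b>bold</b> and <i>italic</i>"

greedy = text[/<.+>/]   # .+ takes as much as it can, up to the last >
lazy   = text[/<.+?>/]  # .+? stops at the first > it can reach

p greedy  # "<b>bold</b> and <i>italic</i>"
p lazy    # "<b>"
```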

    Use () to group several characters so that the whole group can be repeated:

    /(abc){2,}/

    /(abc)?/
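
    A short sketch of how these two grouped patterns behave; the target strings and the surrounding x and y in the second pattern are assumptions added for illustration.

```ruby
repeated = /(abc){2,}/  # "abc" at least twice in a row
p(repeated =~ "abc")           # nil: only one copy
p(repeated =~ "xabcabcabc")    # 1
p $&                           # "abcabcabc": the whole matched text

optional = /x(abc)?y/   # "abc" may appear once or not at all
p(optional =~ "xy")            # 0
p(optional =~ "xabcy")         # 0
p(optional =~ "xabcabcy")      # nil: two copies are too many
```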

    Alternation: write the alternatives inside parentheses, separated by |.

    irb(main):068:0> /(123|321|23)/ =~ "312abaa"
    => nil
    irb(main):069:0> /(123|321|23|ba)/ =~ "312abaa"
    => 4
    irb(main):070:0> /(123|321|23|ba)?/ =~ "312abaa"
    => 0
    irb(main):071:0> /(123|321|23|ba)+/ =~ "312abaa"
    => 4
    

    Regular expressions built with quote

    To escape every metacharacter in a pattern, use Regexp.quote.

    irb(main):075:0> re1 = Regexp.new("abc*def")
    => /abc*def/
    irb(main):076:0> re2 = Regexp.new(Regexp.quote("abc*def"))
    => /abc\*def/
    irb(main):077:0> re1 =~ "abc*def"
    => nil
    irb(main):078:0> re2 =~ "abc*def"
    => 0
    irb(main):079:0> 
    

    Regexp options: //i ignores case, and //m lets . match a newline as well.
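
    A minimal sketch of both options on an assumed sample string. The last line uses the Regexp::IGNORECASE and Regexp::MULTILINE constants, which are how the same options are passed to Regexp.new.

```ruby
text = "Hello\nworld"

p(/hello/  =~ text)  # nil: the case differs
p(/hello/i =~ text)  # 0

p(/o.w/  =~ text)    # nil: . does not match "\n" by default
p(/o.w/m =~ text)    # 4

# Options can be combined with | when using Regexp.new.
both = Regexp.new("O.W", Regexp::IGNORECASE | Regexp::MULTILINE)
p(both =~ text)      # 4
```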

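    Captures

    A capture extracts part of the text matched by the regular expression. The captured substrings are available through variables of the form $1, $2, and so on.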

    irb(main):079:0> /(.)(.)(.)/ =~ "abcd"
    => 0
    irb(main):080:0> p $1
    "a"
    => "a"
    irb(main):081:0> p $2
    "b"
    => "b"
    irb(main):082:0> p $3
    "c"
    => "c"
    irb(main):083:0> p $4
    nil
    => nil
    

    Use (?: ) for groups that do not need to be captured.

    >> /(.)(\d\d)+(.)/ =~ "123456"
    => 0
    >> $1
    => "1"
    >> $2
    => "45"
    >> $3
    => "6"
    >> /(.)(?:\d\d)+(.)/ =~ "123456"
    => 0
    >> $1
    => "1"
    >> $2
    => "6"
    >> $3
    => nil
    >> 
    

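    Besides $ followed by a number, $`, $& and $' hold the text before the match, the matched text itself, and the text after the match, respectively.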

    >> /C./ =~ "ABCDEF"
    => 2
    >> $`
    => "AB"
    >> $&
    => "CD"
    >> $'
    => "EF"
    >> 
    

    $~ holds the MatchData object of the last match, which gives access to all of the match results.

    >> /(C.)/ =~ "ABCDEF"
    => 2
    >> $~
    => #<MatchData "CD" 1:"CD">
    >> $~[1]
    => "CD"
    >> 
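
    The same information can be read from a MatchData object without the $ variables. The sketch below uses Regexp#match, which returns a MatchData or nil; the two-group pattern is an assumption for illustration.

```ruby
m = /(C)(.)/.match("ABCDEF")  # MatchData on success, nil on failure

p m[0]          # "CD"  (same as $&)
p m[1]          # "C"   (same as $1)
p m[2]          # "D"   (same as $2)
p m.captures    # ["C", "D"]
p m.pre_match   # "AB"  (same as $`)
p m.post_match  # "EF"  (same as $')
p m.begin(0)    # 2: index where the match starts

p(/XYZ/.match("ABCDEF"))  # nil
```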
    

    Methods that use regular expressions

    sub replaces only the first match; gsub replaces every match.

    >> str = "abc   def g   hi"
    => "abc   def g   hi"
    >> str.sub(/\s+/, ' ')
    => "abc def g   hi"
    >> str
    => "abc   def g   hi"
    >> str.gsub(/\s+/, ' ')
    => "abc def g hi"
    >> str
    => "abc   def g   hi"
    >> 
    

    sub and gsub can also take a block; the matched text is passed to the block, and the block's return value becomes the replacement.


    irb(main):001:0> str = "abracatabra"
    => "abracatabra"
    irb(main):002:0> nstr = str.sub(/.a/) do |matched|
    irb(main):003:1*   '<'+matched.upcase+'>'
    irb(main):004:1> end
    => "ab<RA>catabra"
    irb(main):007:0> nstr
    => "ab<RA>catabra"
    irb(main):008:0> nstr = str.gsub(/.a/) do |matched|
    irb(main):009:1*   '<'+matched.upcase+'>'
    irb(main):010:1> end
    => "ab<RA><CA><TA>b<RA>"
    irb(main):011:0> 
    

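     In the block, the fixed parts are written in single quotes and joined to the matched text with +.

    sub! and gsub! modify the receiver itself.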
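
    A small sketch of the difference between gsub and gsub!. Note that the bang versions return nil when nothing was replaced, so their return value should not be chained blindly. The string is duplicated with dup so that it is safe to modify in place.

```ruby
str = "abc   def g   hi".dup

copy = str.gsub(/\s+/, " ")      # returns a new string
p str                            # "abc   def g   hi" (unchanged)
p copy                           # "abc def g hi"

result = str.gsub!(/\s+/, " ")   # modifies str in place
p str                            # "abc def g hi"
p result.equal?(str)             # true: the receiver itself is returned

no_change = str.gsub!(/\d+/, "#")
p no_change                      # nil: nothing matched, nothing replaced
```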

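    The scan method

    scan collects every match; without a block it returns them as an Array. In the terminal listings that follow, each script's output appears above the cat command that prints its source.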

    "ra"
    "ca"
    "ta"
    "ra"
    shijianzhongdeMacBook-Pro:chapter_16 shijianzhong$ cat scan1.rb 
    "abracatabra".scan(/.a/) do |matched|  
      p matched
    end
    

    With () in the pattern, each match is passed as an array of the captured groups.

    ["r", "a"]
    ["c", "a"]
    ["t", "a"]
    ["r", "a"]
    shijianzhongdeMacBook-Pro:chapter_16 shijianzhong$ cat scan2.rb 
    "abracatabra".scan(/(.)(a)/) do |matched|
      p matched
    end
    

    The groups captured by () can also be received in several block variables.

    "r-a"
    "c-a"
    "t-a"
    "r-a"
    [["r", "a"], ["c", "a"], ["t", "a"], ["r", "a"]]
    shijianzhongdeMacBook-Pro:chapter_16 shijianzhong$ cat scan3.rb 
    "abracatabra".scan(/(.)(a)/) do |a, b|
      p a+"-"+b
    end
    
    p "abracatabra".scan(/(.)(a)/)
    shijianzhongdeMacBook-Pro:chapter_16 shijianzhong$ 
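
    Without a block, the Array that scan returns can be processed like any other. A minimal sketch with made-up input strings:

```ruby
# Without groups, scan returns an array of matched strings.
numbers = "2020-06-08 15:04".scan(/\d+/)
p numbers        # ["2020", "06", "08", "15", "04"]

# With groups, scan returns an array of arrays, one inner array per match.
pairs = "a=1, b=2".scan(/(\w)=(\d)/)
p pairs          # [["a", "1"], ["b", "2"]]

# An array of two-element arrays converts directly to a Hash.
settings = pairs.to_h
p settings["b"]  # "2"
```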
    

    A regular expression example

    Matching a URL

    server address: www.ruby-lang.org
    shijianzhongdeMacBook-Pro:chapter_16 shijianzhong$ cat url_match.rb
    str = "http://www.ruby-lang.org/ja/"
    %r|http://([^/]*)| =~str
    print "server address: ", $1, "\n"
    shijianzhongdeMacBook-Pro:chapter_16 shijianzhong$ 
    

    Exercises

    1. Store the account name of an email address in $1 and the domain name in $2.

    email = "china@163.com"
    
    re = %r|(\w+)@([a-zA-Z0-9.]*)|
    re =~ email
    p $1
    p $2
    

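    2. Using gsub, change the string "正则表达式真难啊,怎么这么难懂" ("Regular expressions are really hard, why are they so hard to understand") into "正则表达式真简单啊,怎么这么易懂" ("Regular expressions are really simple, why are they so easy to understand").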

    "正则表达式真简单啊,怎么这么易懂!"
    shijianzhongdeMacBook-Pro:exercises shijianzhong$ cat e2.rb 
    str = "正则表达式真难啊,怎么这么难懂!"
    
    str = str.gsub(/真难/, "真简单")
    str = str.gsub(/难懂/, "易懂")
    
    p str
    shijianzhongdeMacBook-Pro:exercises shijianzhong$ 
    

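    3. Define a method word_capitalize that, given an English string joined by hyphens, capitalizes each hyphen-separated part (first letter uppercase, the rest lowercase).

    def word_capitalize(string)
      # Capitalize each run of word characters; the hyphens between them are left as-is.
      string.gsub(/\w+/) do |matched|
        matched.capitalize
      end
    end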
    
    p word_capitalize("in-reply-to")
    p word_capitalize("X-MAILER")
    
  • Original post: https://www.cnblogs.com/sidianok/p/13066687.html