  • Jul_31 PYTHON REGULAR EXPRESSIONS

    1. Special Symbols and Characters

    1.1 single regex 1

    .  ,Match any character (except \n)

    ^  ,Match start of string

    $  ,Match end of string

    *  ,Match 0 or more occurrences of the preceding regex

    +  ,Match 1 or more occurrences of the preceding regex

    ?  ,Match 0 or 1 occurrence of the preceding regex

    {N}  ,Match N occurrences of the preceding regex

    {M,N}  ,Match from M to N occurrences of the preceding regex

    [...]  ,Match any single character from character class

    [..x-y..]  ,Match any single character in the range from x to y; e.g. ["-a]: in an ASCII system, all characters that fall between '"' and 'a', that is, between ordinals 34 and 97.

    [^...]  ,Do not match any character from the character class, including any ranges, if present

    (*|+|?|{})?  ,Apply "non-greedy" versions of the above occurrence/repetition symbols; by default *, +, ? and {} are greedy, and appending '?' makes them non-greedy.

    (...)  ,Match enclosed regex and save as subgroup.
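
A minimal sketch of the symbols listed in 1.1, run against made-up sample strings (the strings and variable names are illustrative and not from the original notes):

```python
import re

text = 'a ab abbb'

star = re.findall(r'ab*', text)         # 'a' then zero or more 'b'
plus = re.findall(r'ab+', text)         # 'a' then one or more 'b'
optional = re.findall(r'ab?', text)     # 'a' then at most one 'b'
counted = re.findall(r'ab{2,3}', text)  # 'a' then two or three 'b'

# Character classes: a range, and a negated class.
lower_runs = re.findall(r'[a-c]+', 'abcXcba9')
non_digits = re.findall(r'[^0-9]+', 'ab12cd')

# ^ and $ anchor the match; (...) saves each part as a subgroup.
m = re.match(r'^([a-z]+)@([a-z]+)$', 'user@host')
saved = m.groups()

print(star, plus, optional, counted)
print(lower_runs, non_digits, saved)
```

Each quantifier applies only to the single item just before it, which is why `ab*` repeats the `b` and not the whole `ab`.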

    1.2 single regex 2

    \d  ,Match any decimal digit, same as [0-9] for ASCII text (\D is the inverse of \d: do not match any numeric digit)

    \w  ,Match any alphanumeric character, same as [A-Za-z0-9_] for ASCII text (\W is the inverse of \w)

    \s  ,Match any whitespace character, same as [ \n\t\r\v\f] (\S is the inverse of \s)

    \b  ,Match any word boundary (\B is the inverse of \b)

    \N  ,Match saved subgroup N (see (...) above); e.g. \1, \3, \16

    \c  ,Match the special character c literally, without its special meaning; e.g. \., \\, \*

    \A (\Z)  ,Match start (end) of string (also see ^ and $ above)
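
The escape sequences in 1.2 can be tried out as follows. This is only a sketch; the sample sentence and the variable names are assumptions chosen for illustration:

```python
import re

line = 'call 555 1234 now now'

digits = re.findall(r'\d+', line)   # runs of decimal digits
words = re.findall(r'\w+', line)    # runs of word characters
fields = re.split(r'\s+', line)     # split on runs of whitespace

# \b keeps 'now' from matching inside 'known'.
whole_word = re.findall(r'\bnow\b', 'now known now')

# \1 refers back to whatever group 1 matched: a doubled word.
repeated = re.search(r'\b(\w+) \1\b', line).group(1)

# \. is a literal dot; \A and \Z anchor to the very start and end.
dots = re.findall(r'\.', 'a.b.c')
anchored = bool(re.search(r'\Acall', line)) and bool(re.search(r'now\Z', line))

print(digits, words, whole_word, repeated, dots, anchored)
```

Writing the patterns as raw strings (`r'...'`) keeps Python from interpreting sequences such as `\b` before the re module sees them.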

    1.3 complex regex

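    (?=...)  ,Positive lookahead assertion. It succeeds if the contained regex (written here as ...) matches at the current position and fails otherwise. The assertion consumes no characters: once the contained regex has been tried, the rest of the pattern continues from the position where the assertion started. Example: love(?=FishC) matches "love" only when it is immediately followed by "FishC".

    (?!...)  ,Negative lookahead assertion. The opposite of the positive lookahead (it succeeds when the contained regex does not match and fails when it does). Example: FishC(?!\.com) matches "FishC" only when it is not followed by ".com".

    (?<=...)  ,Positive lookbehind assertion. Like the positive lookahead, but in the opposite direction. Example: (?<=love)FishC matches "FishC" only when it is immediately preceded by "love".

    (?<!...)  ,Negative lookbehind assertion. Like the negative lookahead, but in the opposite direction. Example: (?<!FishC)\.com matches ".com" only when it is not preceded by "FishC".

    (?:...)  ,Non-capturing group: the string matched by this subgroup cannot be retrieved later.

    (?(id/name)yes-pattern|no-pattern)  ,1. If the subgroup with the given number or name exists (that is, it took part in the match), try yes-pattern; otherwise try no-pattern.

                     2. no-pattern is optional.

                        Example: (<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) is a regex for email-style addresses. Anchored at the start of the string (as with re.match), it matches '<user@fishc.com>' and 'user@fishc.com', but not '<user@fishc.com' or 'user@fishc.com>'.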
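
The lookaround assertions and the non-capturing group can be checked with a short sketch like the one below. The helper `starts` and both sample strings are hypothetical and exist only for this illustration:

```python
import re

text = 'loveFishC loveYou FishC.com FishC.org'

def starts(pattern, string):
    """Return the start offset of every match of pattern in string."""
    return [m.start() for m in re.finditer(pattern, string)]

ahead = starts(r'love(?=FishC)', text)       # 'love' followed by 'FishC'
not_ahead = starts(r'FishC(?!\.com)', text)  # 'FishC' not followed by '.com'
behind = starts(r'(?<=love)FishC', text)     # 'FishC' preceded by 'love'
not_behind = starts(r'(?<!FishC)\.com', 'FishC.com Python.com')

# A lookahead consumes nothing: the matched text is still just 'love'.
matched = re.search(r'love(?=FishC)', text).group()

# (?:...) groups without capturing, so only '(c)' is saved.
groups = re.match(r'(?:ab)+(c)', 'ababc').groups()

print(ahead, not_ahead, behind, not_behind, matched, groups)
```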

    1.4 Example: matching an email address

    import re

    data = 'z843248880@163.com'
    data1 = '<z843248880@163.com>'
    data2 = '<z843248880@163.com'
    data3 = 'z843248880@163.com>'
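    p1 = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)'
    p2 = r'\w+@\w+\.\w+'
    p3 = r'(<)?\w+@\w+\.\w+(?(1)>|$)'
    m1 = re.match(p3, data3)
    # data3 has a trailing '>' but no leading '<', so p3 fails and m1 is None
    print(m1.group() if m1 else None)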
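    PS: In p1, (?:\.\w+) means that the text matched by "\.\w+" is not saved for later retrieval. "(?(1)>|$)" in p1 means: if a "<" was matched earlier, match ">" here; if there was no "<", match the end of the string "$". The "(1)" refers to the first parenthesized group, that is, "(<)". p1 and p3 behave the same on the sample strings above; p2 cannot rule out the cases that have only "<" or only ">" (for example, re.match(p2, data3) still succeeds).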
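
To compare the three patterns, one option is to run each of them against all four sample strings with re.match. The helper `accepts` and the `results` table are illustrative names, not part of the original notes:

```python
import re

samples = [
    'z843248880@163.com',
    '<z843248880@163.com>',
    '<z843248880@163.com',
    'z843248880@163.com>',
]
patterns = {
    'p1': r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)',
    'p2': r'\w+@\w+\.\w+',
    'p3': r'(<)?\w+@\w+\.\w+(?(1)>|$)',
}

def accepts(pattern, text):
    """True if pattern matches at the start of text."""
    return re.match(pattern, text) is not None

# One row of booleans per pattern, in the order of `samples`.
results = {name: [accepts(p, s) for s in samples] for name, p in patterns.items()}

# Group 2 of p1 holds the address without the angle brackets.
address = re.match(patterns['p1'], samples[1]).group(2)

for name, row in results.items():
    print(name, row)
print(address)
```

With re.match, p2 rejects the strings that begin with "<" only because the match is anchored at the start; it still accepts the string with a stray trailing ">", which p1 and p3 reject.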

    1.5 The re Module: Core Functions and Methods

    match(pattern,string,flags=0)  ,Attempt to match pattern at the start of string, with optional flags; return a match object on success, None on failure.

    search(pattern,string,flags=0)  ,Search for the first occurrence of pattern anywhere in string, with optional flags; return a match object on success, None on failure.

    findall(pattern,string[,flags=0])  ,Look for all non-overlapping occurrences of pattern in string; return a list of matches.

    finditer(pattern,string[,flags=0])  ,Same as findall(), except that it returns an iterator instead of a list; for each match, the iterator yields a match object.

    split(pattern,string,maxsplit=0)  ,Split string into a list using the regex pattern as the delimiter and return the list of resulting pieces, splitting at most maxsplit times (splitting at every occurrence is the default).
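
A small sketch that puts the five functions side by side on one made-up string (the string and variable names are assumptions for illustration):

```python
import re

text = 'foo1 bar22 baz333'

m = re.match(r'\d+', text)   # None: the string does not start with a digit
s = re.search(r'\d+', text)  # first run of digits anywhere in the string
first = s.group()

all_numbers = re.findall(r'\d+', text)                   # list of strings
spans = [mo.span() for mo in re.finditer(r'\d+', text)]  # match objects
parts = re.split(r'\s+', text, maxsplit=1)               # split only once

print(m, first, all_numbers, spans, parts)
```

Because match() and search() return None on failure, calling .group() on the result without a check raises AttributeError when nothing matched.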

    1.6 the usage of "?i" and "?m"

    >>> import re
    >>> re.findall(r'(?i)yes','yes Yes YES')
    ['yes', 'Yes', 'YES']
    >>> re.findall(r'(?i)th\w+','The quickest way is through to this tunnel.')
    ['The', 'through', 'this']
    >>> re.findall(r'(?im)(^th[\w ]+)','''
    ... this line is the first,
    ... another line,
    ... that line,it's the best.
    ... ''')
    ['this line is the first', 'that line']
    >>> re.findall(r'(?i)(^th[\w ]+)','''
    ... this line is the first,
    ... another line,
    ... that line ,it's the best.
    ... ''')
    []
    >>> re.findall(r'(?i)(^th[\w ,]+)','''
    ... this line is th,
    ... anonjkl line,
    ... that line,it the best.
    ... ''')
    []

    By using "multiline" we can perform the search across multiple lines of the target string rather than treating the entire string as a single entity.
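
The inline "(?i)" and "(?m)" markers correspond to the re.IGNORECASE and re.MULTILINE flags, which can be passed as the flags argument instead. A brief sketch using the same text as the session above (variable names are illustrative):

```python
import re

text = "\nthis line is the first,\nanother line,\nthat line,it's the best.\n"

# Inline flags inside the pattern.
inline = re.findall(r'(?im)(^th[\w ]+)', text)

# The same flags passed as an argument.
flagged = re.findall(r'(^th[\w ]+)', text, re.IGNORECASE | re.MULTILINE)

# Without MULTILINE, ^ matches only at the very start of the string,
# which here is an empty first line, so nothing is found.
single = re.findall(r'(^th[\w ]+)', text, re.IGNORECASE)

print(inline, flagged, single)
```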

    1.7 the usage of split

    re.split(r'\s\s+',eachline)  ,split on at least two whitespace characters.

    re.split(r'\s\s+|\t',eachline.rstrip())  ,split on at least two whitespace characters or on a single tab; rstrip() removes the trailing '\n'.
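
As a sketch of the difference between the two calls, here they are applied to a made-up line (the content of `eachline` is hypothetical; the notes do not show the real input):

```python
import re

# Hypothetical input: three spaces, then a tab, then a newline at the end.
eachline = 'alice   pts/0\t2016-07-31 10:02\n'

# Only the run of three spaces is a delimiter here.
two_spaces = re.split(r'\s\s+', eachline)

# A single tab is also a delimiter, and rstrip() drops the newline first.
spaces_or_tab = re.split(r'\s\s+|\t', eachline.rstrip())

print(two_spaces)
print(spaces_or_tab)
```

The single space inside the timestamp is not a delimiter in either call, so the date and time stay in one field.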

    1.8 one example

    from random import randrange,choice
    from string import ascii_lowercase as lc
    from sys import maxsize
    from time import ctime

    tlds = ('com','org','net','gov','edu')


    for i in range(randrange(5,11)):
      dtint= randrange(1469880872)
      dstr = ctime(dtint)
      llen = randrange(4,8)
      login = ''.join(choice(lc) for j in range(llen))
      dlen = randrange(llen,13)
      dom = ''.join(choice(lc) for j in range(dlen))
      print('%s::%s@%s.%s::%d-%d-%d' % (dstr,login,dom,choice(tlds),dtint,llen,dlen))

    result:

    Sat Nov 7 01:09:06 1998::hbtua@yzhnjyjanwuq.gov::910372146-5-12
    Sat Oct 17 09:27:56 2015::djbljsf@uidicjppd.gov::1445045276-7-9
    Sun Nov 18 06:10:07 1979::fkobvlf@zlnlyjej.org::311724607-7-8
    Wed Jul 23 17:23:03 1986::hovwgi@wiidgvnng.net::522490983-6-9
    Tue Feb 24 02:15:27 1998::xnuab@sgahgahv.gov::888257727-5-8
    Thu Jun 1 14:20:55 1989::rdwqhu@xzazufffut.net::612681655-6-10
    Mon Mar 6 14:36:59 1978::qabkezi@sehnxqcuxexf.net::258014219-7-12
    Sun Apr 11 15:01:56 1982::agzp@sygikhagdasq.gov::387356516-4-12
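
One way to take such a line apart again is a single pattern with one group per field. This is a sketch under the assumption that every line has exactly the "date::login@domain.tld::int-len-len" layout shown above; the pattern and variable names are not from the original notes:

```python
import re

line = 'Sat Nov 7 01:09:06 1998::hbtua@yzhnjyjanwuq.gov::910372146-5-12'

# Non-greedy date, then login@domain.tld, then the three integers.
pattern = r'^(.+?)::(\w+)@(\w+)\.(\w+)::(\d+)-(\d+)-(\d+)$'
m = re.match(pattern, line)
date, login, domain, tld, stamp, llen, dlen = m.groups()

# The last two integers record the login and domain lengths.
lengths_ok = len(login) == int(llen) and len(domain) == int(dlen)

print(date, login, domain, tld, stamp, lengths_ok)
```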

    1.9 Matching a string

    import re
    data = 'Wed Jul 22 08:42:15 2015::qaolc@ombddhysxuv.com::1437525736-347-28'
    #pat_old = '^Mon|^Tue|^Wed|^Thu|^Fri|^Sat|^Sun'
    pat = '^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)'
    m = re.match(pat, data)
    print(type(m))
    print(m.group(0))

    pat2 = r'^(\w{3})'
    m2 = re.match(pat2, data)
    print(type(m2))
    print(m2.group(1))

    pa3 = r'.+(\d+-\d+-\d+)'
    m3 = re.search(pa3, data)
    print(type(m3))
    print(m3.group())
    m4 = re.match(pa3, data)
    print(m4.group(1))

    pa4 = r'.+?(\d+-\d+-\d+)'
    m5 = re.match(pa4, data)
    print(m5.group(1))

    pa5 = r'.+::(\d+-\d+-\d+)'
    m6 = re.match(pa5, data)
    print(m6.group(1))

    result:

    <class '_sre.SRE_Match' at 0x89df00>
    Wed
    <class '_sre.SRE_Match' at 0x89df00>
    Wed
    <class '_sre.SRE_Match' at 0x89df00>
    Wed Jul 22 08:42:15 2015::qaolc@ombddhysxuv.com::1437525736-347-28
    6-347-28                                  //greedy: '.+' takes as much as it can and leaves only one digit for the first \d+

    1437525736-347-28                               //non-greedy: the '?' after '.+' makes it match as little as possible (see 1.1 above)

    1437525736-347-28

    1.10 greedy and non-greedy

    '.+' is greedy; '.+?' is not greedy.
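
A minimal sketch of the difference, reusing the data string from 1.9 (the variable names and the short '12345' string are illustrative):

```python
import re

data = 'Wed Jul 22 08:42:15 2015::qaolc@ombddhysxuv.com::1437525736-347-28'

# Greedy: '.+' grabs as much as possible, so the group gets one leading digit.
greedy = re.match(r'.+(\d+-\d+-\d+)', data).group(1)

# Non-greedy: '.+?' stops as early as possible, so the group gets every digit.
lazy = re.match(r'.+?(\d+-\d+-\d+)', data).group(1)

# The same applies to {M,N}: greedy takes up to N, non-greedy stops at M.
greedy_count = re.match(r'\d{2,4}', '12345').group()
lazy_count = re.match(r'\d{2,4}?', '12345').group()

print(greedy, lazy, greedy_count, lazy_count)
```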

    2. Use a regex to extract the letters and the digits, and print them separately

    import re
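    text = 'python1班'  # renamed from str to avoid shadowing the built-in

    print(re.search(r'(\w+)(\d)', text).group(0))  # the whole match
    print(re.search(r'(\w+)(\d)', text).group(1))  # what the first group matched
    print(re.search(r'(\w+)(\d)', text).group(2))  # what the second group matched

    result: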
    python1
    python
    1
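
As an alternative sketch, named groups or two separate findall() calls give the same split. The group names `letters` and `digits` are assumptions for illustration; an explicit [A-Za-z] class is used because in Python 3 \w also matches characters such as '班':

```python
import re

text = 'python1班'

# Named groups make the two parts available by name.
m = re.search(r'(?P<letters>[A-Za-z]+)(?P<digits>\d+)', text)
parts = m.groupdict()

# Or collect each kind of character independently.
letters = re.findall(r'[A-Za-z]+', text)
digits = re.findall(r'\d+', text)

print(parts, letters, digits)
```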
  • Original article: https://www.cnblogs.com/fuckily/p/python_regular_expresssions.html