  • Python's re regular expression module, part 2

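    13. Compilation flags

    You can pass flags such as re.I or re.M as arguments, or embed them directly in the expression with "(?iLmsux)".

    *s: single-line (DOTALL); "." matches every character, including newlines

    *i: ignore case

    *L: makes "\w" match locale-dependent characters; support for Chinese appears to be poor

    *m: multi-line; "^" and "$" also match at the start and end of each line

    *x: verbose; ignores extra whitespace so the expression is easier to read

    *u: Unicode

    Example: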

    >>> re.findall(r"[a-z]+","%123Abc%45xyz&")
    ['bc', 'xyz']
    >>> re.findall(r"[a-z]+","%123Abc%45xyz&",re.I)
    ['Abc', 'xyz']
    >>> 
    >>> re.findall(r"(?i)[a-z]+","%123Abc%45xyz&")
    ['Abc', 'xyz']
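
The s and m flags are listed above but not demonstrated. A minimal sketch of their effect, assuming Python 3; the sample text and variable names are illustrative and not from the original notes:

```python
import re

text = "first line\nsecond line"

# Without re.M, "^" only matches at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# With re.M, "^" also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, re.M)

# Without re.S, "." stops at the newline.
dot_default = re.match(r".+", text).group()
# With re.S, "." also matches the newline, so the whole text is consumed.
dot_single = re.match(r".+", text, re.S).group()

print(starts_default)     # ['first']
print(starts_multiline)   # ['first', 'second']
print(repr(dot_default))  # 'first line'
print(repr(dot_single))   # 'first line\nsecond line'
```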

    A more readable format:

    >>> pattern=r"""
    ...     (\d+) #number
    ...     ([a-z]+) #letter
    ... """
    >>> 
    >>> re.findall(pattern,"%123Abc\n%45xyz&",re.i | re.S |re.x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'module' object has no attribute 'i'
    # As the error shows, the flag names must be uppercase
    >>> re.findall(pattern,"%123Abc\n%45xyz&",re.I | re.S |re.X)
    [('123', 'Abc'), ('45', 'xyz')]
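
Since these are compilation flags, they can also be attached once with re.compile and the resulting pattern object reused. A small sketch, assuming Python 3; the names pair, pairs and inline are hypothetical:

```python
import re

# Compile once with the flags attached, then reuse the pattern object.
pair = re.compile(r"""
    (\d+)     # number
    ([a-z]+)  # letters
""", re.I | re.X)

pairs = pair.findall("%123Abc\n%45xyz&")
print(pairs)  # [('123', 'Abc'), ('45', 'xyz')]

# The same flags can instead be embedded at the very start of the pattern.
inline = re.findall(r"(?ix) (\d+) ([a-z]+)", "%123Abc%45xyz&")
print(inline)  # [('123', 'Abc'), ('45', 'xyz')]
```

Compiling is mostly a convenience when the same pattern is applied many times; the module-level functions accept the same flags.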

    Group operations

    Named groups: (?P<name>...)

    >>> for m in re.finditer(r"(?P<digit>(\d+))(?P<letter>([a-z]+))","%123Abc%45xyz&",re.I):
    ...     print(m.groupdict())
    ... 
    {'digit': '123', 'letter': 'Abc'}
    {'digit': '45', 'letter': 'xyz'}
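
Besides groupdict(), a named group can be read by name or by position. A short sketch, assuming Python 3; the pattern is a simplified version of the one above and the variable names are illustrative:

```python
import re

m = re.search(r"(?P<digit>\d+)(?P<letter>[a-z]+)", "%123Abc%45xyz&", re.I)

by_name = m.group("digit")          # look the group up by its name
by_index = m.group(2)               # named groups are still numbered
both = m.group("digit", "letter")   # several groups at once, as a tuple

print(by_name)   # 123
print(by_index)  # Abc
print(both)      # ('123', 'Abc')
```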

    Non-capturing groups: (?:...) take part in the match but are not returned:

    >>> for m in re.finditer(r"(?:\d+)(?P<letter>([a-z]+))","%123Abc%45xyz&",re.I):
    ...     print(m.groupdict())
    ... 
    {'letter': 'Abc'}
    {'letter': 'xyz'}

    Backreferences: \<number> or (?P=name) refers to an earlier group:

    >>> for m in re.finditer(r"<(\w)>\w+</(\1)>","%<a>123Abc</a>%<b>45xyz</b>&%"):
    ...     print(m.group())
    ... 
    <a>123Abc</a>
    <b>45xyz</b>
    >>> for m in re.finditer(r"<(?P<tag>\w)>\w+</(?P=tag)>","%<a>123Abc</a>%<b>45xyz</b>&%"):
    ...     print(m.group())
    ... 
    <a>123Abc</a>
    <b>45xyz</b>
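
Backreferences are also useful outside of tag matching. A sketch, assuming Python 3, that finds immediately repeated words with \1 and reuses captured groups in a replacement string via \g<name> and \2; the sample sentence and variable names are illustrative:

```python
import re

# \1 must match the same text that group 1 captured,
# so this finds a word that is immediately repeated.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test case")
print(doubled)  # ['is', 'test']

# In the replacement string, \g<tag> refers to the named group
# and \2 to the second (numbered) group.
rewritten = re.sub(r"<(?P<tag>\w)>(\w+)</(?P=tag)>", r"[\g<tag>:\2]",
                   "%<a>123Abc</a>%<b>45xyz</b>&%")
print(rewritten)  # %[a:123Abc]%[b:45xyz]&%
```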

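    Positive lookahead (?=...): the group's content must appear to the right; it is not returned

    Negative lookahead (?!...): the group's content must not appear to the right; it is not returned

    Positive lookbehind (?<=...): the group's content must appear to the left; it is not returned

    Negative lookbehind (?<!...): the group's content must not appear to the left; it is not returned

    >>> for m in re.finditer(r"\d+(?=[ab])","%123Abc%45xyz%780b&",re.I):
    ...     print(m.group())
    ... 
    123
    780
    >>> for m in re.finditer(r"(?<!\d)[a-z]{3,}","%123Abc%45xyz%bysc&",re.I):
    ...     print(m.group())
    ... 
    bysc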
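
The two assertions not shown above, positive lookbehind and negative lookahead, behave as follows. A sketch, assuming Python 3; the variable names are illustrative. Note the extra \d inside the negative lookahead: without it the engine could backtrack to a shorter digit run (for example "12" in "123A") and still report a match.

```python
import re

text = "%123Abc%45xyz%780b&"

# Positive lookbehind: digit runs that directly follow a "%" sign.
after_percent = re.findall(r"(?<=%)\d+", text)
print(after_percent)  # ['123', '45', '780']

# Negative lookahead: digit runs NOT followed by "a" or "b".
# The \d in the lookahead stops the match from ending in the middle of a run.
not_ab = re.findall(r"\d+(?![ab\d])", text, re.I)
print(not_ab)  # ['45']
```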

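    Modification

    split: splits the string using pattern as the delimiter. If you write "(pattern)", the delimiters are returned as well.

    >>> re.split(r"\W","abc,123,x")
    ['abc', '123', 'x']
    >>> re.split(r"(\W)","abc,123,x")
    ['abc', ',', '123', ',', 'x']
    # Wrapping the pattern in parentheses also returns the delimiters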
    split(pattern, string, maxsplit=0)
        Split the source string by the occurrences of the pattern,
        returning a list containing the resulting substrings.
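
The maxsplit parameter in the signature above limits how many splits are made; the rest of the string comes back as the last element. A brief sketch, assuming Python 3, with illustrative variable names:

```python
import re

# Split at most once; the remainder stays in one piece.
limited = re.split(r"\W", "abc,123,x", maxsplit=1)
print(limited)  # ['abc', '123,x']

# With a capturing group, the single delimiter is returned too.
with_sep = re.split(r"(\W)", "abc,123,x", maxsplit=1)
print(with_sep)  # ['abc', ',', '123,x']
```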

    sub: replaces substrings; the number of replacements can be limited:

    >>> re.sub(r"[a-z]+","*","abc,123,x")
    '*,123,*'
    >>> 
    >>> re.sub(r"[a-z]+","*","abc,123,x",count=1)
    '*,123,x'
    sub(pattern, repl, string, count=0)
        Return the string obtained by replacing the leftmost
        non-overlapping occurrences of the pattern in string by the
        replacement repl.  repl can be either a string or a callable;
        if a string, backslash escapes in it are processed.  If it is
        a callable, it's passed the match object and must return
        a replacement string to be used.

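    subn() works much like sub(), but returns a "(new string, number of replacements)" tuple:

    >>> re.subn(r"\W","*","abc,123,x")
    ('abc*123*x', 2)

    The replacement can also be a function, so that different matches get different replacements:

    >>> def repl(m):
    ...     print(m.group())
    ...     return "*" * len(m.group())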
    ... 
    >>> re.subn(r"[a-z]+",repl,"abc,123,x")
    abc
    x
    ('***,123,*', 2)
    >>> 
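
A replacement function can also compute its result from the matched text. A sketch, assuming Python 3; the function name double and the sample string are hypothetical:

```python
import re

def double(m):
    # m is the match object; return the replacement text for this match.
    return str(int(m.group()) * 2)

# Every run of digits is replaced by twice its numeric value.
doubled_text, count = re.subn(r"\d+", double, "abc,123,x,7")
print(doubled_text)  # abc,246,x,14
print(count)         # 2
```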
  • Original article: https://www.cnblogs.com/gsblog/p/3369186.html