Python Regular Expressions Series (3): Built-in Regex Attributes

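This article summarizes how to use the built-in attributes of Python regular expressions, i.e. the compilation flags. The examples use Python 2 syntax (the print statement).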

1. Compilation flags: flags

First, let's look at the signature of re.findall:
```
import re 
print('【Output】')
help(re.findall)
```
```
【Output】
Help on function findall in module re:

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.
    
    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.
    
    Empty matches are included in the result.
```
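The last parameter of re.findall is flags, with a default value of 0. This parameter holds the compilation flags, i.e. the built-in attributes of the regex. Different flags change how the pattern matches. Which values can flags take? Run help(re) and look at the DATA section of the re module: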
```
help(re)

# 【Output】
'''
...
DATA
    DOTALL = 16
    I = 2
    IGNORECASE = 2
    L = 4
    LOCALE = 4
    M = 8
    MULTILINE = 8
    S = 16
    U = 32
    UNICODE = 32
    VERBOSE = 64
    X = 64
...
'''
```
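Every flag value listed above is a distinct power of two. This is what allows several flags to be packed into the single flags integer. Below is a minimal Python 3 sketch that checks this. In Python 3 the flags are re.RegexFlag members, so int() is used to show the plain number. The FLAG_NAMES list is only an illustrative helper.

```python
import re

# Illustrative list of (short name, long name); both names refer to the same flag.
FLAG_NAMES = [('I', 'IGNORECASE'), ('L', 'LOCALE'), ('M', 'MULTILINE'),
              ('S', 'DOTALL'), ('U', 'UNICODE'), ('X', 'VERBOSE')]

for short, long_name in FLAG_NAMES:
    value = int(getattr(re, short))
    # The short and long names are aliases with the same value.
    assert value == int(getattr(re, long_name))
    # Exactly one bit is set, so flags can be OR-ed together without clashing.
    assert value & (value - 1) == 0
    print(short, long_name, value)
```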
The sections below try out the effect of each of these flags.

2. DOTALL, S

Makes "." match every character, including the newline "\n" (by default "." does not match "\n"). Example:
```
p = r'me.com'
print '【Output】'
print re.findall(p,'me.com')
print re.findall(p,'me\ncom')
print re.findall(p,'me\ncom',re.DOTALL)
print re.findall(p,'me\ncom',re.S)
```
```
【Output】
['me.com']
[]
['me\ncom']
['me\ncom']
```
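Note: flags cannot be passed to re.findall() together with a pattern that was already compiled with re.compile(). Doing so raises a ValueError. For a compiled pattern, pass the flag to re.compile() itself, e.g. re.compile(p, re.S).
re.S changes only the behavior of "."; it does not affect "^" and "$". Without re.M, "^" already matches only at the start of the string, and "$" only at the end of the string (or just before a final newline). To make them match at the start and end of every line, use MULTILINE (section 5).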
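Below is a quick Python 3 check of two things: how re.S is used with a compiled pattern, and the fact that re.S leaves the anchors "^" and "$" alone. It is a minimal sketch; the variable names are only illustrative.

```python
import re

text = 'me\ncom'

# When the pattern is compiled first, the flag is given to re.compile().
pattern = re.compile(r'me.com', re.S)
print(pattern.findall(text))  # ['me\ncom']

# Passing a flag together with an already compiled pattern raises ValueError.
rejected = False
try:
    re.findall(pattern, text, re.S)
except ValueError:
    rejected = True
print(rejected)  # True

# re.S only changes ".": "^" still matches only at the start of the string.
print(re.findall(r'^\w+', text, re.S))  # ['me']
# re.M is the flag that makes "^" match at the start of every line.
print(re.findall(r'^\w+', text, re.M))  # ['me', 'com']
```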

3. IGNORECASE, I

Makes matching case-insensitive. Example:
```
p = r'a'
print '【Output】'
print re.findall(p,'A')
print re.findall(p,'A',re.IGNORECASE)
print re.findall(p,'A',re.I)
```
```
【Output】
[]
['A']
['A']
```
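Case-insensitivity is not limited to literal letters. The following Python 3 sketch shows re.I applied to a character range.

```python
import re

# Without re.I the range [a-c] only covers lowercase letters.
print(re.findall(r'[a-c]+', 'ABCabc'))        # ['abc']
# With re.I the same range also accepts the uppercase letters.
print(re.findall(r'[a-c]+', 'ABCabc', re.I))  # ['ABCabc']
```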
4. LOCALE, L

Locale-aware matching: with this flag, the meaning of \w, \W, \b, \B, \s and \S depends on the current locale.
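Python 3 behaves differently here. Since Python 3.6, re.L may only be used with bytes patterns, and combining it with a str pattern raises ValueError. A minimal sketch, assuming Python 3.6 or later. The sample bytes contain only ASCII letters and digits, which count as word characters in common locales.

```python
import re

# With a bytes pattern, re.L makes \w follow the current locale.
words = re.findall(rb'\w+', b'abc 123', re.L)
print(words)  # [b'abc', b'123']

# With a str pattern, re.L is rejected at compile time.
str_pattern_accepted = True
try:
    re.compile(r'\w+', re.L)
except ValueError:
    str_pattern_accepted = False
print(str_pattern_accepted)  # False
```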

5. MULTILINE, M

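Enables multi-line matching, which affects "^" and "$". With re.M, "^" also matches at the start of every line and "$" at the end of every line, not only at the start and end of the whole string. Example: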
```
s = """
aa bb cc
bb aa
aa ccd
"""
p1 = r'^aa'
p2 = r'cc$'
print '【Output】'
print re.findall(p1,s)
print re.findall(p1,s,re.M)

print re.findall(p2,s)
print re.findall(p2,s,re.M)
```
```
【Output】
[]
['aa', 'aa']
[]
['cc']
```
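A common use of re.M is to pick out the first or last word of every line. Below is a Python 3 sketch with an illustrative three-line string.

```python
import re

text = 'aa bb cc\nbb aa\naa ccd'

# Without re.M, ^ and $ anchor to the whole string.
print(re.findall(r'^\w+', text))        # ['aa']
print(re.findall(r'\w+$', text))        # ['ccd']

# With re.M, they anchor to each line: first and last word of every line.
print(re.findall(r'^\w+', text, re.M))  # ['aa', 'bb', 'aa']
print(re.findall(r'\w+$', text, re.M))  # ['cc', 'aa', 'ccd']
```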
6. VERBOSE, X

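Enables verbose mode. Whitespace in the pattern is ignored (except inside a character class or when escaped with a backslash), and everything from an unescaped # to the end of the line is treated as a comment. This lets a long pattern be spread over several lines for clarity. Example: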
```
p = r"""
\d{3,4}
-?
\d{7,8}
"""
tel = '010-12345678'
print '【Output】'
print re.findall(p,tel)
print re.findall(p,tel,re.X)
```
```
【Output】
[]
['010-12345678']
```
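Since re.X also treats # as the start of a comment, each part of the phone-number pattern above can be annotated in place. Below is a Python 3 sketch. The two test numbers are illustrative only.

```python
import re

# Same phone-number pattern as above, annotated with verbose-mode comments.
tel_pattern = re.compile(r"""
    \d{3,4}   # area code: 3 or 4 digits
    -?        # optional hyphen
    \d{7,8}   # local number: 7 or 8 digits
""", re.VERBOSE)

print(tel_pattern.findall('010-12345678'))  # ['010-12345678']
print(tel_pattern.findall('01012345678'))   # ['01012345678']
```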
7. UNICODE, U

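Matches using Unicode character properties. For example, with this flag \s also matches the Chinese full-width space \u3000. The results without and with the flag compare as follows: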
```
s = u'\u3000'
p = r'\s'
print '【Output】'
print re.findall(p,s)
print re.findall(p,s,re.U)
```
```
【Output】
[]
[u'\u3000']
```
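In Python 3, str patterns use Unicode matching by default, so re.U makes no difference there. The opposite behavior is requested with re.ASCII, which limits \s to ASCII whitespace, much like the Python 2 example above without re.U. A Python 3 sketch:

```python
import re

s = '\u3000'  # IDEOGRAPHIC SPACE, the full-width space

# Unicode rules are the default for str patterns, with or without re.U.
print(re.findall(r'\s', s))            # ['\u3000']
print(re.findall(r'\s', s, re.U))      # ['\u3000']

# re.ASCII restricts \s to ASCII whitespace, so the full-width space no longer matches.
print(re.findall(r'\s', s, re.ASCII))  # []
```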
8. How to use several flags at once?

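Sometimes several flags are needed at once, for example to ignore case and also let "." match the newline "\n". The flags argument takes a single value, but flags can be combined with the bitwise OR operator, e.g. re.I | re.S. Alternatively, the flags can be written inline in the pattern.

Inline method: add (?iLmsux) to the pattern, i.e. "(?" followed by one or more of the letters i, L, m, s, u, x and then ")". Put it at the start of the pattern. Python 2 accepts it anywhere, but since Python 3.6 a non-leading position is deprecated, and since Python 3.11 it is an error.

Here i corresponds to re.I, L to re.L, m to re.M, s to re.S, u to re.U, and x to re.X. Example: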
```
s = 'Abc\ncom'
p = r'(?is)abc.com'  # inline flags (?is) = re.I + re.S, placed at the start of the pattern
print '【Output】'
print re.findall(p,s)
```
```
【Output】
['Abc\ncom']
```
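Besides inline flags, the flags argument accepts several flags joined with the bitwise OR operator |, because each flag occupies its own bit. Below is a Python 3 sketch comparing both forms on the same input. It also passes the combined flags to re.compile().

```python
import re

s = 'Abc\ncom'

# Option 1: combine the flag constants with bitwise OR.
combined = re.findall(r'abc.com', s, re.I | re.S)

# Option 2: inline flags at the start of the pattern (i = re.I, s = re.S).
inline = re.findall(r'(?is)abc.com', s)

print(combined)  # ['Abc\ncom']
print(inline)    # ['Abc\ncom']

# The OR-ed flags can also be given to re.compile().
pattern = re.compile(r'abc.com', re.IGNORECASE | re.DOTALL)
print(pattern.findall(s))  # ['Abc\ncom']
```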

Original article: https://www.cnblogs.com/jiayongji/p/7118950.html
