python re

(Reference: http://funhacks.net/2016/12/27/regular_expression/)

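In Python, regular expressions are provided by the re module, which you load with import re.

Regular expressions use the backslash to escape special characters, but Python string literals also treat the backslash as an escape character, so a pattern written as an ordinary string quickly becomes hard to read. Prefixing the string literal with r makes it a raw string, in which backslashes are kept exactly as written, so you do not have to escape them a second time. (The r prefix is part of Python's string literal syntax, not a parameter of the re module.)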
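A minimal sketch of what the raw-string prefix buys you. The sample text and variable names here are made up for illustration.

```python
import re

# Without the r prefix, every backslash meant for the regex must be doubled.
plain = "\\d+\\.\\d+"
# With the r prefix, the pattern is written exactly as the regex engine sees it.
raw = r"\d+\.\d+"

# Both literals produce the same pattern text.
same = plain == raw
print(same)

# Hypothetical sample text: find the first "digits.digits" sequence.
version = re.search(raw, "release 3.11 is out").group()
print(version)
```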

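I. General steps for using the re module

1 Use the compile function to compile the string form of a regular expression into a Pattern object.

2 Use the methods of the Pattern object to match or search the text, obtaining a Match object as the result.

3 Use the attributes and methods of the Match object to extract information.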
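The three steps above can be sketched end to end as follows. The pattern and the sample text are assumptions chosen for illustration, not taken from the original article.

```python
import re

# Step 1: compile the expression into a Pattern object.
pattern = re.compile(r"(\d{4})-(\d{2})")

# Step 2: use a Pattern method to obtain a Match object (or None).
match = pattern.search("logged on 2019-09 by admin")

# Step 3: read the results from the Match object.
if match is not None:
    whole = match.group()          # the entire matched substring
    year, month = match.groups()   # the captured groups, as a tuple
    span = match.span()            # (start, end) indices of the match
    print(whole, year, month, span)
```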

II. The compile function

pattern = re.compile(r'\d{3}') compiles the expression into a Pattern object that matches three consecutive digit characters in the text.
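The backslash in \d is what makes the pattern mean "digit". As a small sketch with made-up sample text, compare it with the same pattern written without the backslash, which matches the letter d instead:

```python
import re

digits = re.compile(r"\d{3}")   # three consecutive digit characters
letters = re.compile(r"d{3}")   # three consecutive letter "d" characters

sample = "abc12345ddd"
a = digits.search(sample).group()
b = letters.search(sample).group()
print(a, b)
```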

III. Methods of the Pattern object

A Pattern object provides methods for matching and searching text. The commonly used ones are listed below.

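1 match(string[, pos[, endpos]])

match tries to match at the beginning of the string by default. The optional pos and endpos arguments restrict the range of the string that is examined. It performs a single match: it returns as soon as one match is found and does not look for further matches.

On success it returns a Match object; otherwise it returns None.

match.group() or match.group(0) returns the entire matched substring, as a str.

match.groups() returns a tuple of all the substrings captured by groups, that is, the parts matched by the () in the regular expression.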
    import re

    if __name__ == "__main__":
        pattern = re.compile(r"(\d{1,3})aaa")
        match = pattern.match("345aaa888test")
        print(match.group(0))   # the whole match
        print(match.groups())   # the captured groups

Output:

    345aaa
    ('345',)
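The optional pos and endpos arguments can be sketched like this. The text variable is a hypothetical variation of the sample above with two extra leading characters.

```python
import re

pattern = re.compile(r"(\d{1,3})aaa")
text = "xx345aaa888test"

# By default matching starts at index 0, where "x" is not a digit.
at_start = pattern.match(text)

# pos=2 moves the starting point to the "3".
from_two = pattern.match(text, 2)

# endpos=7 hides everything from index 7 on, so only "345aa" is visible
# and the trailing "aaa" can no longer be completed.
clipped = pattern.match(text, 2, 7)

print(at_start, from_two.group(), from_two.span(), clipped)
```

Note that the indices reported by span() refer to the original string, not to the restricted range.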
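2 search(string[, pos[, endpos]])

search scans the string and can match at any position, not only at the beginning. Like match, it performs a single match: it returns as soon as the first match is found and does not look for further matches.

On success it returns a Match object; otherwise it returns None.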
    import re

    if __name__ == "__main__":
        pattern = re.compile(r"(\d{1,3})aaa")
        match = pattern.search("aaa0908aaa98bbb")
        print(match.group())
        print(match.groups())

Output:

    908aaa
    ('908',)

If match were used instead of search in this example, matching would start at the beginning of the text and fail, so the result would be None.
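Because both methods return None when nothing matches, calling group() directly on the result raises AttributeError in that case. The following sketch contrasts the two methods on the same text and wraps the None check in a helper; first_prefix is a hypothetical name used only for this illustration.

```python
import re

pattern = re.compile(r"(\d{1,3})aaa")
text = "aaa0908aaa98bbb"

anchored = pattern.match(text)    # only tries index 0, so this is None
anywhere = pattern.search(text)   # scans forward and finds "908aaa"


def first_prefix(s):
    """Return the digits before the first 'aaa', or None when nothing matches."""
    m = pattern.search(s)
    return m.group(1) if m else None


print(anchored, anywhere.group())
print(first_prefix(text), first_prefix("no digits here"))
```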

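3 findall(string[, pos[, endpos]])

match and search both return after the first successful match. Most of the time, however, we want to search the whole string and collect every match.

findall returns all matching substrings as a list. If there is no match, it returns an empty list.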
    import re

    if __name__ == "__main__":
        pattern = re.compile(r"\d{1,3}aaa")
        result = pattern.findall("5aaa0908aaa98bbb")
        print(result)

Output:

    ['5aaa', '908aaa']

If the pattern contains a capturing group, findall returns what the group matched instead of the whole match.

    import re

    if __name__ == "__main__":
        pattern = re.compile(r"(\d{1,3})aaa")
        result = pattern.findall("5aaa0908aaa98bbb")
        print(result)

Output:

    ['5', '908']
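Two related cases are worth a quick sketch: with more than one capturing group findall returns a list of tuples, and a non-capturing group (?:...) keeps the grouping while still returning the whole match. The patterns below are variations made up for this illustration.

```python
import re

text = "5aaa0908aaa98bbb"

# Two capturing groups: each result is a tuple with one entry per group.
pairs = re.compile(r"(\d{1,3})(a+)").findall(text)

# A non-capturing group does not change what findall returns.
whole = re.compile(r"(?:\d{1,3})aaa").findall(text)

print(pairs)
print(whole)
```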
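4 finditer(string[, pos[, endpos]])

finditer behaves much like findall: it also searches the whole string for every match. Instead of a list, however, it returns an iterator that yields each result in order as a Match object.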
    import re

    if __name__ == "__main__":
        pattern = re.compile(r"(\d{1,3})aaa")
        result = pattern.finditer("5aaa0908aaa98bbb")
        print(type(result))

        # First way to traverse the iterator: call next() until it is exhausted
        while True:
            try:
                match = next(result)
                print(match.group())
                print(match.groups())
            except StopIteration:
                break

        # Second way: a for loop. The iterator was already exhausted above,
        # so this loop prints nothing; call finditer again to traverse it anew.
        for match in result:
            print(match.group())
            print(match.groups())

Output:

    <class 'callable_iterator'>
    5aaa
    ('5',)
    908aaa
    ('908',)
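Since finditer yields Match objects, each result also carries its position, which findall cannot report. A small sketch using the same pattern and text as above; it also shows that the iterator can only be consumed once.

```python
import re

pattern = re.compile(r"(\d{1,3})aaa")
text = "5aaa0908aaa98bbb"

# Collect the captured digits together with the start and end index of each match.
spans = [(m.group(1), m.start(), m.end()) for m in pattern.finditer(text)]
print(spans)

# An iterator is single-use: the second pass finds nothing left.
it = pattern.finditer(text)
first_pass = [m.group() for m in it]
second_pass = [m.group() for m in it]
print(first_pass, second_pass)
```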
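5 split(string[, maxsplit])

split uses every match of the pattern as a separator (the whole string is searched for all matches) and returns the resulting pieces as a list. maxsplit sets the maximum number of splits; if it is omitted, the string is split at every match.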

Example 1

    import re

    if __name__ == "__main__":
        pattern = re.compile(r"aaa")
        result = pattern.split("5aaa0908aaa98bbb")
        if result is not None:
            print(result)

Output:

    ['5', '0908', '98bbb']
Example 2

    import re

    if __name__ == "__main__":
        # Split on any run of whitespace, commas, or semicolons
        pattern = re.compile(r"[\s,;]+")
        result = pattern.split("我, 你;; 他 他们")
        if result is not None:
            print(result)

Output:

    ['我', '你', '他', '他们']
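The maxsplit argument, and the effect of putting the separator in a capturing group, can be sketched as follows. The second pattern is a variation introduced here for illustration.

```python
import re

text = "5aaa0908aaa98bbb"

# maxsplit=1: split only at the first match, leave the rest untouched.
limited = re.compile(r"aaa").split(text, maxsplit=1)

# With a capturing group, the separators are kept in the result list.
with_seps = re.compile(r"(aaa)").split(text)

print(limited)
print(with_seps)
```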
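6 sub(repl, string[, count])

The sub method performs replacement. repl can be either a string or a function.

If repl is a string, every substring matched by the pattern is replaced with repl, and the resulting string is returned.

If repl is a function, it must accept a single argument, the Match object, and return the string to use as the replacement. The returned string is inserted literally, so group references inside it are not expanded.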

Example 1: replace the substrings "aaa989" and "aaa8" matched in "aaa989739383aaa8bbb" with the string "aaa". The resulting string is aaa739383aaabbb.

    import re

    if __name__ == "__main__":
        pattern = re.compile(r"aaa\d{1,3}")
        newstr = pattern.sub("aaa", "aaa989739383aaa8bbb")
        print(newstr)

Output:

    aaa739383aaabbb
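Example 2: the pattern finds two substrings in "aaa989739383 aaa8bbb", namely aaa989 and aaa8, and captures the digits 989 and 8 from them as a group. The function repSome receives each Match object and returns the new string used as its replacement.

    import re


    def repSome(m):
        # m is the Match object; group(1) holds the captured digits
        return "after " + m.group(1)


    if __name__ == "__main__":
        pattern = re.compile(r"aaa(\d{1,3})")
        newstr = pattern.sub(repSome, "aaa989739383 aaa8bbb")
        print(newstr)

Output:

    after 989739383 after 8bbb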
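As a further sketch on the same sample text: a string repl may refer to captured groups with \1, the count argument limits how many matches are replaced, and a string returned by a function is used literally. The replacement strings below are made up for illustration.

```python
import re

pattern = re.compile(r"aaa(\d{1,3})")
text = "aaa989739383 aaa8bbb"

# In a string repl, \1 is expanded to the first captured group.
tagged = pattern.sub(r"<\1>", text)

# count=1 replaces only the first match.
once = pattern.sub("X", text, count=1)

# A function's return value is inserted as-is; the \1 is not expanded.
literal = pattern.sub(lambda m: r"\1", text)

print(tagged)
print(once)
print(literal)
```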
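7 subn(repl, string[, count])

subn is similar to sub and also performs replacement. It returns a tuple of two elements: the first is the string that sub would have returned, and the second is the number of replacements made.

(sub(repl, string[, count]), number_of_replacements)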
    import re


    def repSome(m):
        return "after " + m.group(1)


    if __name__ == "__main__":
        pattern = re.compile(r"aaa(\d{1,3})")
        newstr = pattern.subn(repSome, "aaa989739383 aaa8bbb")
        print(newstr)

Output:

    ('after 989739383 after 8bbb', 2)
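The returned tuple is usually unpacked directly. A minimal sketch, assuming a plain string replacement "X" and a second input that contains no match:

```python
import re

pattern = re.compile(r"aaa(\d{1,3})")

# Unpack the new string and the number of replacements in one step.
new_text, n = pattern.subn("X", "aaa989739383 aaa8bbb")

# With no match, the input comes back unchanged and the count is 0.
unchanged, zero = pattern.subn("X", "nothing here")

print(new_text, n)
print(unchanged, zero)
```

The count makes it easy to tell whether anything was replaced without comparing the two strings.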
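IV. Other functions of re

Most of the methods of the Pattern object produced by compile have counterparts among the module-level functions of re that behave in much the same way. The module-level functions, however, cannot specify a search range with pos and endpos; they take the pattern as their first argument and accept an optional flags argument instead.

For example, pattern.match(string[, pos[, endpos]]) corresponds to re.match(pattern, string[, flags]).

Example: comparing the Pattern methods with the re functions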

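1 Matching with re.match

    import re

    if __name__ == "__main__":
        patternstr = r"aaa(\d{1,3})"
        result = re.match(patternstr, "aaa989739383 aaa8bbb")
        print(result.group())
        print(result.groups())

Output:

    aaa989
    ('989',)

2 An improved re.match call. If the same pattern is used in many places, it is better to compile it once and reuse the Pattern object. Passing the pattern string every time makes re look up or recompile the expression on each call (re keeps a cache of recently compiled patterns, but each call still pays for the lookup), which is less efficient.

    import re

    if __name__ == "__main__":
        pattern = re.compile(r"aaa(\d{1,3})")
        result = re.match(pattern, "aaa989739383 aaa8bbb")
        print(result.group())
        print(result.groups())

Output:

    aaa989
    ('989',)

3 Matching with the Pattern object's match method

    import re

    if __name__ == "__main__":
        pattern = re.compile(r"aaa(\d{1,3})")
        result = pattern.match("aaa989739383 aaa8bbb")
        print(result.group())
        print(result.groups())

Output:

    aaa989
    ('989',)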
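The practical differences between the two styles can be sketched as follows: only the Pattern method accepts pos, so the module-level function needs a slice (which shifts the reported indices), and flags are passed per call to re.match but once at compile time for a Pattern. The sample strings are assumptions made up for this illustration.

```python
import re

text = "xx345aaa888test"
pattern = re.compile(r"(\d{1,3})aaa")

# Pattern method: start at index 2; indices refer to the original string.
via_pos = pattern.match(text, 2)

# Module-level function: no pos argument, so slice the string instead.
# The span is now relative to the slice.
via_slice = re.match(r"(\d{1,3})aaa", text[2:])

# Flags: given on each call for re.match, given once to re.compile.
call_flag = re.match(r"AAA(\d{1,3})", "aaa989", re.IGNORECASE)
compiled_flag = re.compile(r"AAA(\d{1,3})", re.IGNORECASE).match("aaa989")

print(via_pos.span(), via_slice.span())
print(call_flag.group(1), compiled_flag.group(1))
```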
Original article: https://www.cnblogs.com/mydesky2012/p/11596692.html
