
Modules: the re module in detail

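Introduction to regular-expression special characters

^           Matches the start of the string (and of each line with re.MULTILINE)
$           Matches the end of the string (and of each line with re.MULTILINE)
.           Matches any single character (except a newline, by default)
[]          Matches any one of the characters inside the brackets
[^]         Matches any one character not listed inside the brackets
[-]         Matches any single character within the given range
?           Matches the preceding item 0 or 1 times
+           Matches the preceding item 1 or more times
*           Matches the preceding item 0 or more times
{n}         Matches the preceding item exactly n times
{m,n}       Matches the preceding item at least m and at most n times
{n,}        Matches the preceding item at least n times
|           Matches the expression to the left or to the right of |
(...)       Grouping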

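\A    Matches only at the start of the string (same effect as ^ without re.MULTILINE)
\Z    Matches only at the end of the string (similar to $)
\d    Matches a digit, 0-9
\D    Matches a non-digit
\w    Matches a word character, [A-Za-z0-9_]
\W    Matches a non-word character, [^A-Za-z0-9_]
\s    Matches a whitespace character: space, \t, \n, \r
(?P<name>...)   Named group
Note: ?P is fixed syntax. The character ranges above are the ASCII ones; in Python 3, str patterns also match the Unicode equivalents unless re.ASCII is set.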
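
As a small sketch of how several of these symbols combine, the snippet below uses \A, \w, \s, \d, and named groups together. The sample string and the group names `lang`, `major`, and `minor` are illustrative assumptions, not part of the original article.

```python
import re

# Hypothetical sample text, chosen only for illustration
text = "python 3.11 released"

# \A anchors at the start of the string, \w+ matches a word,
# \s matches one whitespace character, \d+ matches digits,
# and (?P<name>...) captures a group that can be looked up by name
pattern = re.compile(r"\A(?P<lang>\w+)\s(?P<major>\d+)\.(?P<minor>\d+)")

m = pattern.search(text)
print(m.group("lang"))   # python
print(m.group("major"))  # 3
print(m.group("minor"))  # 11
print(m.groupdict())     # {'lang': 'python', 'major': '3', 'minor': '11'}
```

Note the raw-string prefix r"...": it keeps the backslashes intact so that \d and \. reach the regex engine unchanged.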

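Introduction to the re module's methods

1. Matching methods

a. The findall method

# findall searches the string for the pattern and returns every matching substring as a list. If nothing in the text matches, it returns an empty list;
# if one substring matches, it returns a list with one element. So whatever the outcome, we can iterate over the result of findall directly without
# an error, which means less special-case handling and simpler code logic.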

# re.findall() returns all substrings that match the pattern
import re

re_str = "hello this is python 2.7.13 and python 3.4.5"

# Use a raw string and escape the dot so that it matches a literal "."
pattern = r"python [0-9]\.[0-9]\.[0-9]"
res = re.findall(pattern=pattern, string=re_str)
print(res)

# ['python 2.7.1', 'python 3.4.5']

pattern = r"python [0-9]\.[0-9]\.[0-9]{2,}"
res = re.findall(pattern=pattern, string=re_str)
print(res)

# ['python 2.7.13']


# No space after "python", so nothing matches
pattern = r"python[0-9]\.[0-9]\.[0-9]{2,}"
res = re.findall(pattern=pattern, string=re_str)
print(res)

# []

# re.findall() returns a list: the matched substrings if there are any, otherwise an empty list

re_str = "hello this is python 2.7.13 and Python 3.4.5"

pattern = r"python [0-9]\.[0-9]\.[0-9]"
res = re.findall(pattern=pattern, string=re_str, flags=re.IGNORECASE)
print(res)

# ['python 2.7.1', 'Python 3.4.5']

# The flag flags=re.IGNORECASE makes the match case-insensitive

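b. Using regular expressions in compiled form

# We usually use Python's re module in compiled form. When the same pattern is applied to a large amount of data, compiling it once
# can improve performance; readers can benchmark this themselves.
re_str = "hello this is python 2.7.13 and Python 3.4.5"
re_obj = re.compile(pattern=r"python [0-9]\.[0-9]\.[0-9]", flags=re.IGNORECASE)
res = re_obj.findall(re_str)
print(res)
# ['python 2.7.1', 'Python 3.4.5']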
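
To try the benchmark suggested above, a minimal sketch might look like the following. The helper names `with_module_function` and `with_compiled_object`, the list size, and the repeat count are assumptions made for illustration. Because the re module keeps an internal cache of recently compiled patterns, the measured difference may be modest, and the timings depend on the machine and Python version, so no figures are quoted here.

```python
import re
import timeit

# Hypothetical workload: the same line repeated 1000 times
lines = ["hello this is python 2.7.13 and Python 3.4.5"] * 1000
pattern = r"python [0-9]\.[0-9]\.[0-9]"


def with_module_function():
    # The pattern is looked up (or compiled) on every call
    return [re.findall(pattern, line, flags=re.IGNORECASE) for line in lines]


# Compile once, reuse many times
compiled = re.compile(pattern, flags=re.IGNORECASE)


def with_compiled_object():
    return [compiled.findall(line) for line in lines]


# Both approaches return identical results
assert with_module_function() == with_compiled_object()

t_module = timeit.timeit(with_module_function, number=20)
t_compiled = timeit.timeit(with_compiled_object, number=20)
print(f"module-level: {t_module:.4f}s, compiled: {t_compiled:.4f}s")
```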

c. The match method

# The match method is similar to the string method startswith, but because it works with regular expressions it is more powerful and more
# expressive. match tests the beginning of the string: if the pattern matches, it returns a match object (re.Match in Python 3.7+, shown as
# _sre.SRE_Match in older versions); if not, it returns None. For a plain prefix check its usage is almost identical to startswith.
# For example, to test whether a string starts with "what", and whether it starts with digits:

s_true = "what is a boy"
s_false = "What is a boy"
re_obj = re.compile("what")

print(re_obj.match(string=s_true))
# <re.Match object; span=(0, 4), match='what'>

print(re_obj.match(string=s_false))
# None

s_true = "123what is a boy"
s_false = "what is a boy"

re_obj = re.compile(r"\d+")

print(re_obj.match(s_true))
# <re.Match object; span=(0, 3), match='123'>

print(re_obj.match(s_true).start())
# 0
print(re_obj.match(s_true).end())
# 3
print(re_obj.match(s_true).string)
# 123what is a boy
print(re_obj.match(s_true).group())
# 123


print(re_obj.match(s_false))
# None

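d. The search method

# When the pattern matches, search also returns a match object. The difference between search and match is that match only matches at the
# start of the string, while search can match starting at any position. What they share: both return a match object on success and None on
# failure. Note that search finds only the first match: even if the string contains several matches of the pattern, only the first is returned.
# To get every result, the simplest option is findall; finditer also works.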
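
The original gives no example for search, so here is a minimal sketch that contrasts it with match and findall. It reuses the sample string from the finditer section; the variable name `m` is an assumption for illustration.

```python
import re

re_str = "what is a different between python 2.7.14 and python 3.5.4"
re_obj = re.compile(r"\d+\.\d+\.\d+")

# match only tries at the start of the string, so it fails here
print(re_obj.match(re_str))
# None

# search scans forward and returns the first match only
m = re_obj.search(re_str)
print(m.group())
# 2.7.14
print(m.span())
# (35, 41)

# findall returns every match
print(re_obj.findall(re_str))
# ['2.7.14', '3.5.4']
```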

e. The finditer method

# finditer returns an iterator; iterating over it yields one match object per match, as in the example below

re_str = "what is a different between python 2.7.14 and python 3.5.4"

re_obj = re.compile(r"\d{1,}\.\d{1,}\.\d{1,}")

for i in re_obj.finditer(re_str):
    print(i)

# <re.Match object; span=(35, 41), match='2.7.14'>
# <re.Match object; span=(53, 58), match='3.5.4'>

2. Modifying methods

a. The sub method

# The re module's sub method is similar to the string method replace, except that sub supports regular expressions, so it covers many more use cases

re_str = "what is a different between python 2.7.14 and python 3.5.4"

re_obj = re.compile(r"\d{1,}\.\d{1,}\.\d{1,}")

# count limits the number of replacements
print(re_obj.sub("a.b.c", re_str, count=1))
# what is a different between python a.b.c and python 3.5.4

print(re_obj.sub("a.b.c", re_str, count=2))
# what is a different between python a.b.c and python a.b.c

# Without count, every match is replaced
print(re_obj.sub("a.b.c", re_str))
# what is a different between python a.b.c and python a.b.c

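b. The split method

# The re module's split method does the same job as the string method split: both break a string into a list of substrings. The difference is
# that re's split can use a regular expression as the separator.
# In the example below, the string is split on ".", space, ":" and "!", and a list is returned

re_str = "what is a different between python 2.7.14 and python 3.5.4 USA:NewYork!Zidan.FRA"

# Inside a character class, "." is a literal dot and needs no escaping
re_obj = re.compile("[. :!]")

print(re_obj.split(re_str))
# ['what', 'is', 'a', 'different', 'between', 'python', '2', '7', '14', 'and', 'python', '3', '5', '4', 'USA', 'NewYork', 'Zidan', 'FRA']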

c. Case-insensitive matching

# 3. Case insensitivity: pass the flag when compiling the pattern

# re.compile(pattern, flags=re.IGNORECASE)
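
A minimal sketch of the flag in use, reusing the "What is a boy" string from the match section; the names `case_sensitive` and `case_insensitive` are chosen here for illustration.

```python
import re

s = "What is a boy"

case_sensitive = re.compile("what")
case_insensitive = re.compile("what", flags=re.IGNORECASE)

print(case_sensitive.match(s))
# None
print(case_insensitive.match(s).group())
# What

# re.I is a short alias for re.IGNORECASE
print(re.findall("python", "Python PYTHON python", flags=re.I))
# ['Python', 'PYTHON', 'python']
```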

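d. Non-greedy matching

# 4. Non-greedy matching. A greedy quantifier always matches the longest possible string; a non-greedy one matches the shortest. To make a
# quantifier non-greedy, just add a ? after it

# In the example below, note the two periods in the string
s = "Beautiful is better than ugly.Explicit is better than implicit."


# Greedy: .* runs on to the last period
re_obj = re.compile(r"Beautiful.*\.")

print(re_obj.findall(s))
# ['Beautiful is better than ugly.Explicit is better than implicit.']

# Non-greedy: .*? stops at the first period
re_obj = re.compile(r"Beautiful.*?\.")

print(re_obj.findall(s))
# ['Beautiful is better than ugly.']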
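
The same ? suffix works on the other quantifiers too (+? and {m,n}?). The sketch below illustrates this with a made-up HTML-like string and a made-up digit string; neither comes from the original article.

```python
import re

# Hypothetical sample text for illustration
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ extends to the last ">" in the string
print(re.findall(r"<.+>", html))
# ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .+? stops at the nearest ">"
print(re.findall(r"<.+?>", html))
# ['<b>', '</b>', '<i>', '</i>']

# {m,n} takes as many repetitions as possible; {m,n}? takes as few as possible
print(re.findall(r"\d{2,4}", "123456"))
# ['1234', '56']
print(re.findall(r"\d{2,4}?", "123456"))
# ['12', '34', '56']
```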

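e. What happens when you put parentheses in a pattern?

To match literal parentheses, they must be escaped. Look closely at the example below, and note that calling group(1) on the object returned by search raises an error, because the pattern defines no group.

import re
s = "=aa1239d&&& 0a ()--"

# Escaped parentheses match the literal characters "(" and ")"
obj = re.compile(r"\(\)")
# search
rep = obj.search(s)
print(rep)
# <re.Match object; span=(15, 17), match='()'>
# print(rep.group(1))
# IndexError: no such group
print(rep.group())
# ()

# findall

rep = obj.findall(s)
print(rep)
# ['()']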

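If instead you want to get back the text matched inside the parentheses, leave the parentheses unescaped. findall then returns the text matched by the group inside the parentheses; search(...).group() returns the text matched by the whole pattern; search(...).group(1) returns the text matched by the first parenthesized group, search(...).group(2) the text matched by the second, and so on.

s = "=aa1239d&&& 0a ()--"
rep = re.compile(r"\w+(&+)")

print(rep.findall(s))
# ['&&&']
print(rep.search(s).group())
# aa1239d&&&
print(rep.search(s).group(1))
# &&&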
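
The example above has only one group. As a sketch of the group(2) case mentioned in the text, the snippet below uses a pattern with three groups; the sample string is an assumption adapted from the earlier examples.

```python
import re

s = "python 2.7.14 and python 3.5.4"
rep = re.compile(r"(\d+)\.(\d+)\.(\d+)")

m = rep.search(s)
print(m.group())    # 2.7.14  (the whole match)
print(m.group(1))   # 2       (first group)
print(m.group(2))   # 7       (second group)
print(m.groups())   # ('2', '7', '14')

# With more than one group, findall returns a list of tuples,
# one tuple per match
print(rep.findall(s))
# [('2', '7', '14'), ('3', '5', '4')]
```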

Reposted from: https://www.cnblogs.com/bainianminguo/p/10657631.html


Original article: https://www.cnblogs.com/featherwit/p/13284323.html