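A regular expression (Chinese: 正则表达式, also 正规表示式, 正规表示法, 正规表达式, 规则表达式, or 常规表示法; in code usually shortened to regex, regexp, or RE) is a concept from computer science. A regular expression is a single string that describes, and is used to match, a set of strings that follow some syntactic rule. In many text editors, regular expressions are used to search for and replace text that matches a pattern.
Many programming languages support string manipulation with regular expressions. Perl, for example, has a powerful regular expression engine built in. The concept was first popularized by Unix tools such as sed and grep. "Regular expression" is usually abbreviated "regex"; singular forms include regexp and regex, and plural forms include regexps, regexes, and regexen.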
Quoted from Wikipedia: https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F
The text above is taken from https://www.cnblogs.com/chuxiuhong/p/5885073.html
Regular expressions are used to match strings.
The match() method of the re module matches only at the beginning of the string:
import re

d = re.match('abc', 'abcdfaff')
print(d)
# Output (Python 3.6): <_sre.SRE_Match object; span=(0, 3), match='abc'>
# Python 3.7 and later print <re.Match object; span=(0, 3), match='abc'>
# To see the text that was matched, call .group(0) on the returned match object:
import re

d = re.match('abc', 'abcdfaff')
print(d.group(0))
# Output: abc
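match() returns None when the pattern does not match at the start of the string, so calling .group() directly on the result raises AttributeError in that case. The following is a minimal sketch of guarding against that; the helper name `match_prefix` is hypothetical and exists only for this illustration.

```python
import re

def match_prefix(pattern, text):
    """Return the text matched at the start of `text`, or None if there is no match."""
    m = re.match(pattern, text)
    # re.match() returns None on failure, so check before calling .group()
    return m.group(0) if m else None

hit = match_prefix('abc', 'abcdfaff')    # pattern matches at position 0
miss = match_prefix('dfa', 'abcdfaff')   # 'dfa' occurs later, but not at the start
print(hit)   # abc
print(miss)  # None
```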
The findall() method of re finds matches at any position in the string:
# Match a run of 0 to 10 digits, then a run of 1 to 10 digits:
import re

text = '123456ab789cdfGFFDaff'
print(re.findall('[0-9]{0,10}', text))
# Output: ['123456', '', '', '789', '', '', '', '', '', '', '', '', '', '', '']
# {0,10} also matches zero digits, so every position without a digit yields ''.
print(re.findall('[0-9]{1,10}', text))
# Output: ['123456', '789']
# Match a run of 0 to 10 letters (lower or upper case), then a run of 1 to 10 letters:
import re

text = '123456ab789cdfGFFDaff'
print(re.findall('[a-zA-Z]{0,10}', text))
# Output: ['', '', '', '', '', '', 'ab', '', '', '', 'cdfGFFDaff', '']
print(re.findall('[a-zA-Z]{1,10}', text))
# Output: ['ab', 'cdfGFFDaff']
# Match runs of one or more letters:
import re

d = re.findall('[a-zA-Z]+', '123_456ab7.89c~dfGFFDaff')
if d:
    print(d)
# Output: ['ab', 'c', 'dfGFFDaff']
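The empty strings in the {0,10} results above appear because a quantifier with a lower bound of 0 is allowed to match nothing. As a small sketch (the variable names are illustrative only), the empty matches can be filtered out afterwards, and `+` can be checked to behave like `{1,}`:

```python
import re

text = '123456ab789cdfGFFDaff'

with_empties = re.findall('[0-9]{0,10}', text)
non_empty = [m for m in with_empties if m]   # drop zero-length matches
print(non_empty)  # ['123456', '789']

# '+' is shorthand for '{1,}', so both patterns give the same result
plus_form = re.findall('[a-zA-Z]+', text)
brace_form = re.findall('[a-zA-Z]{1,}', text)
print(plus_form == brace_form)  # True
```

In practice it is usually simpler to pick a quantifier with a lower bound of 1 than to filter afterwards.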
The search() method of re:
# Match one or more digits. search() scans from the start and stops at the first match:
import re

d = re.search(r'\d+', 'def123_456ab7.89c~dfGFFDaff')
if d:
    print(d.group())
# Output: 123
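To make the difference between match() and search() concrete, here is a short sketch that runs both on the same string. The variable names are illustrative only.

```python
import re

text = 'def123_456ab7.89c~dfGFFDaff'

at_start = re.match(r'\d+', text)    # only tries position 0, where 'd' is not a digit
anywhere = re.search(r'\d+', text)   # scans forward and stops at the first match

print(at_start)           # None
print(anywhere.group())   # 123
print(anywhere.span())    # (3, 6)
```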
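The sub() method of re, used for replacement:
# Replace the digits with '<'. The two calls use '\d' (each digit) and '\d+' (each run of digits):
import re

text = 'def123_456ab7.89c~dfGFFDaff'
print(re.sub(r'\d', '<', text))
# Output: def<<<_<<<ab<.<<c~dfGFFDaff
print(re.sub(r'\d+', '<', text))
# Output: def<_<ab<.<c~dfGFFDaff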
The sub() method of re with count, for replacing only some of the matches:
# Replace only the first two runs of digits:
import re

d = re.sub(r'\d+', '<', 'def123_456ab7.89c~dfGFFDaff', count=2)
if d:
    print(d)
# Output: def<_<ab7.89c~dfGFFDaff
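When it is useful to know how many replacements were actually made, the standard library's re.subn() takes the same arguments as re.sub() and returns a (new_string, count) tuple. A brief sketch on the same sample string, with illustrative variable names:

```python
import re

text = 'def123_456ab7.89c~dfGFFDaff'

# re.subn() returns a (new_string, number_of_replacements) tuple
all_result, all_n = re.subn(r'\d+', '<', text)
two_result, two_n = re.subn(r'\d+', '<', text, count=2)

print(all_result, all_n)  # def<_<ab<.<c~dfGFFDaff 4
print(two_result, two_n)  # def<_<ab7.89c~dfGFFDaff 2
```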
Finding strings that start with digits and end with digits:
import re

text = '987654321ABCdef123_456ab7.89c~dfGFFDaff555'

# A single digit at the start of the string:
print(re.search(r'^\d', text))
# Output (Python 3.6): <_sre.SRE_Match object; span=(0, 1), match='9'>

# The run of digits at the start of the string:
print(re.search(r'^\d+', text))
# Output (Python 3.6): <_sre.SRE_Match object; span=(0, 9), match='987654321'>

# A string that consists of digits from start to end:
print(re.search(r'^\d+$', text))
# Output: None
# The result is None because ^\d+$ requires every character between the start
# and the end to be a digit, and this string is not made up only of digits.
# By contrast, re.search(r'^\d+$', '987654321') returns:
# <_sre.SRE_Match object; span=(0, 9), match='987654321'>
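As an alternative to anchoring with ^ and $, the re module also provides re.fullmatch() (Python 3.4+), which succeeds only if the whole string matches. The sketch below wraps it in a hypothetical helper, `is_all_digits`, and also shows one difference worth knowing: `$` matches just before a trailing newline, while fullmatch() does not allow one.

```python
import re

def is_all_digits(text):
    """Return True only if `text` consists entirely of one or more digits."""
    return re.fullmatch(r'\d+', text) is not None

mixed = is_all_digits('987654321ABCdef123_456ab7.89c~dfGFFDaff555')
pure = is_all_digits('987654321')
print(mixed)  # False
print(pure)   # True

# '$' also matches just before a newline at the end of the string
trailing_newline = '123\n'
anchored = bool(re.search(r'^\d+$', trailing_newline))
full = bool(re.fullmatch(r'\d+', trailing_newline))
print(anchored)  # True
print(full)      # False
```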
More about the findall() function:
# findall() returns every match of the pattern as a list of strings
import re

s1 = re.findall('org', 'https://docs.python.org/3/whatsnew/3.6.html')
print(s1)
# Output: ['org']
# With ^ in front of the pattern, findall() matches only at the start of the
# string and returns the matched text in a list
import re

s = re.findall('^https', 'https://docs.python.org/3/whatsnew/3.6.html')
print(s)
# Output: ['https']
# With $ after the pattern, findall() matches only at the end of the string
# and returns the matched text in a list
import re

s = re.findall("html$", "https://docs.python.org/3/whatsnew/3.6.html")
print(s)
# Output: ['html']
# [...] matches any single character listed inside the brackets.
# Characters are listed without separators: a comma inside [...] would itself
# be treated as one of the characters to match, so [tw] is used here.
# findall() returns every match in a list
import re

s = re.findall('[tw]h', 'https://docs.python.org/3/whatsnew/3.6.html')
print(s)
# Output: ['th', 'wh']
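Inside a character class, a comma is an ordinary character rather than a separator, so `[t,w]` matches `t`, `,`, or `w`. A small sketch on a made-up sample string (not from the original post) shows the difference:

```python
import re

sample = 'th,h wh'   # hypothetical test string containing ',h'

with_comma = re.findall('[t,w]h', sample)     # the comma is part of the class
without_comma = re.findall('[tw]h', sample)   # only 't' or 'w' before 'h'

print(with_comma)     # ['th', ',h', 'wh']
print(without_comma)  # ['th', 'wh']
```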
# \d matches a single digit, so findall() returns each digit in the string
# as a separate list item.
# Repeating it, as in \d\d\d, matches that many consecutive digits as one string
import re

s1 = re.findall(r"\d", "https://docs.python.org/3/whatsnew/3.6.html")
s2 = re.findall(r"\d\d\d", "https://docs.python.org/3/whatsnew/3.6.html/1234")
print(s1)
print(s2)
# Output: ['3', '3', '6']
# Output: ['123']
# \D matches any character that is not a digit, so the digits are skipped.
# findall() returns each remaining character as a separate list item
import re

s = re.findall(r'\D', 'good 123_ mornin_g!')
print(s)
# Output: ['g', 'o', 'o', 'd', ' ', '_', ' ', 'm', 'o', 'r', 'n', 'i', 'n', '_', 'g', '!']
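A short exercise:
import re

print(re.match('Liudehua', 'Liudehua演戏很好!').group())  # a literal matches itself: Liudehua
print(re.match('.', 'Liudehua演戏很好!').group())  # . matches any single character: L
print(re.match('.*', 'Liudehua演戏很好!').group())  # * repeats the preceding item zero or more times: Liudehua演戏很好!
print(re.match(r'\.', '.Liudehua演戏很好!').group())  # a backslash before a metacharacter removes its special meaning: .
print(re.match('的+', '的的的LLLLLiudehua演戏很好!').group())  # + matches one or more times: 的的的
print(re.match('的?', '的的的iudehua演戏很好!').group())  # ? matches zero or one time: 的
print(re.match('^开头', '开头Hiudehua演戏很好!').group())  # ^ matches the start of the string: 开头
print(re.match('!末尾$', 'Hiudehua演戏很好!末尾'))  # $ matches the end of the string, but match() only tries at the start: None
print(re.match('的|H', 'Hiudehua演戏很好!').group())  # | matches either of the two expressions: H
print(re.match('P{3}', 'PPPPPPiudehua演戏很好!').group())  # {3} matches exactly three times: PPP
print(re.match('.*P{3}', 'uuu(PPPPPP)dehua演戏很好!').group())  # greedy .* followed by three P: uuu(PPPPPP
print(re.match(r'\d+', '123nihao').group(0))  # \d is equivalent to [0-9] for ASCII text: 123
print(re.match(r'\D', '飞雪123nihao').group())  # \D matches a non-digit, equivalent to [^\d]: 飞
print(re.match(r'\D*\s\d', '月下舞 123nihao').group(0))  # \s matches any whitespace character: 月下舞 1
print(re.match(r'\S', '月下舞 123nihao').group(0))  # \S matches any non-whitespace character, equivalent to [^\s]: 月
print(re.match(r'\w*', '月下舞_987 123nihao').group(0))  # \w matches letters, digits, and underscore: 月下舞_987
print(re.match(r'\W*', '***** &&月下舞_987 123nihao').group(0))  # \W matches anything that \w does not: ***** &&
print(re.match(r'\Aqin', 'qin月下舞_987 123nihao').group(0))  # \A matches only at the start of the string, like ^: qin
print(re.match('hao$', 'qin月下舞_987 123nihao'))  # match() only tries at the start, so this is None; re.search() would find 'hao'
print(re.findall('\btina', 'tian tinaaaa'))  # without r'', '\b' is a backspace character: []
print(re.findall(r'\btina', 'tian tinaaaa'))  # r'\b' matches a word boundary: ['tina']
print(re.findall(r'\btina', 'tian#tinaaaa'))  # '#' is not a word character, so a boundary follows it: ['tina']
print(re.findall(r'\btina\b', 'tian#tina@aaa'))  # boundaries on both sides: ['tina']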
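The word-boundary case is a good reason to write regex patterns as raw strings. In a normal Python string literal, `\b` is the backspace character, so the regex engine never sees a word-boundary token. A minimal sketch of the difference, with illustrative variable names:

```python
import re

text = 'tian tinaaaa'

plain = re.findall('\btina', text)   # '\b' in a normal string is a backspace character
raw = re.findall(r'\btina', text)    # r'\b' reaches the regex engine as a word boundary

print(plain)  # []
print(raw)    # ['tina']

# The two literals are different strings: one character versus two
plain_len = len('\b')
raw_len = len(r'\b')
print(plain_len, raw_len)  # 1 2
```

Escapes such as `\d` happen to survive in normal strings because Python does not define them, but using the r prefix for every pattern avoids having to remember which ones do.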