  • Regular Expressions

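    A regular expression (often abbreviated in code as regex, regexp, or RE) is a concept from computer science. A regular expression uses a single string to describe and match a set of strings that conform to some syntactic rule. In many text editors, regular expressions are used to search for and replace text that matches a pattern.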

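    Many programming languages support string manipulation with regular expressions. Perl, for example, has a powerful regular expression engine built in. The concept was first popularized by Unix tools such as sed and grep. "Regular expression" is usually abbreviated "regex"; singular forms include regexp and regex, and plural forms include regexps, regexes, and regexen.
    Quoted from Wikipedia: https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F

    The above is from https://www.cnblogs.com/chuxiuhong/p/5885073.html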

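    Regular expressions are used to match strings.

    The match() function of the re module tries the pattern only at the beginning of the string: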

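    import re
    d = re.match('abc','abcdfaff')
    print(d)
    # Output (Python 3.6): <_sre.SRE_Match object; span=(0, 3), match='abc'>
    # Python 3.7+ prints <re.Match object; span=(0, 3), match='abc'>

    # To see the text that was matched, call .group(0) on the returned match object:
    import re
    d = re.match('abc','abcdfaff')
    print(d.group(0))
    # Output: abc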
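
To make the "matches from the beginning" behaviour concrete, here is a minimal sketch that compares match() with search() and fullmatch() on the same string. The variable names are only illustrative.

```python
import re

text = 'abcdfaff'

# match() tries the pattern only at position 0.
at_start = re.match('abc', text)
not_at_start = re.match('dfa', text)   # 'dfa' occurs later, so this is None

# search() scans forward and returns the first match anywhere in the string.
anywhere = re.search('dfa', text)

# fullmatch() succeeds only if the pattern covers the entire string.
whole = re.fullmatch('abc.*', text)

print(at_start.group())   # abc
print(not_at_start)       # None
print(anywhere.span())    # (3, 6)
print(whole.group())      # abcdfaff
```

Because match() and search() return None when nothing matches, it is safer to test the result before calling .group().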

    The findall() function of re finds matches at any position in the string:

    # Match runs of 0 to 10 digits, then runs of 1 to 10 digits:
    import re
    d = re.findall('[0-9]{0,10}','123456ab789cdfGFFDaff')
    #d = re.findall('[0-9]{1,10}','123456ab789cdfGFFDaff')
    if d:
        print(d)
    # Output with {0,10}: ['123456', '', '', '789', '', '', '', '', '', '', '', '', '', '', '']
    # Output with {1,10}: ['123456', '789']

    # Match runs of 0 to 10 letters (lower or upper case), then runs of 1 to 10 letters:
    import re
    #d = re.findall('[a-zA-Z]{0,10}','123456ab789cdfGFFDaff')
    d = re.findall('[a-zA-Z]{1,10}','123456ab789cdfGFFDaff')
    if d:
        print(d)
    # Output with {0,10}: ['', '', '', '', '', '', 'ab', '', '', '', 'cdfGFFDaff', '']
    # Output with {1,10}: ['ab', 'cdfGFFDaff']

    # Match runs of one or more letters:
    import re
    d = re.findall('[a-zA-Z]+','123_456ab7.89c~dfGFFDaff')
    if d:
        print(d)
    # Output: ['ab', 'c', 'dfGFFDaff']
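
The empty strings in the {0,10} results come from positions where zero repetitions is the only way to match. As a small sketch (using a shortened input string chosen for illustration), finditer() shows where each match starts and ends:

```python
import re

text = '123456ab789'

# {0,10} allows zero repetitions, so the pattern also succeeds with an
# empty match at every position where no digit starts.
spans = [m.span() for m in re.finditer('[0-9]{0,10}', text)]
print(spans)

# Filtering out the empty strings gives the same result as using {1,10}.
non_empty = [s for s in re.findall('[0-9]{0,10}', text) if s]
print(non_empty)
```

A span such as (6, 6) is an empty match: it starts and ends at the same index.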

    The search() function of re:

    # Match one or more digits: scan from the start and stop at the first run found:
    import re
    d = re.search(r'\d+','def123_456ab7.89c~dfGFFDaff')
    if d:
        print(d.group())
    # Output: 123
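
The backslash in \d matters: without it, the pattern is just the letter d. The following sketch, with illustrative variable names, shows the difference and why a raw string (the r prefix) is the usual way to write patterns:

```python
import re

text = 'def123_456ab7.89c~dfGFFDaff'

# r'\d+' uses the digit class; the raw string keeps the backslash intact.
digits = re.search(r'\d+', text)

# 'd+' has no backslash, so it matches the literal letter 'd'.
letter_d = re.search('d+', text)

print(digits.group(), digits.span())      # 123 (3, 6)
print(letter_d.group(), letter_d.span())  # d (0, 1)
```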

    The sub() function of re, used for replacement:

    # Replace digits with '<'; the two outputs below are for r'\d' and r'\d+' respectively:
    import re
    #d = re.sub(r'\d','<','def123_456ab7.89c~dfGFFDaff')
    d = re.sub(r'\d+','<','def123_456ab7.89c~dfGFFDaff')
    if d:
        print(d)
    # Output with \d:  def<<<_<<<ab<.<<c~dfGFFDaff
    # Output with \d+: def<_<ab<.<c~dfGFFDaff

    The sub() function of re, used for partial replacement:

    # Replace only the first two runs of digits:
    import re
    d = re.sub(r'\d+','<','def123_456ab7.89c~dfGFFDaff',count=2)
    if d:
        print(d)
    # Output: def<_<ab7.89c~dfGFFDaff
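
Two related features of the standard re module may be useful here: subn(), which also reports how many replacements were made, and passing a function as the replacement. This is a brief sketch on the same input string; the lambda is just one possible replacement function.

```python
import re

text = 'def123_456ab7.89c~dfGFFDaff'

# subn() works like sub() but returns (new_string, number_of_replacements).
replaced, n = re.subn(r'\d+', '<', text)
print(replaced, n)

# count limits the number of replacements, scanning left to right.
first_two = re.sub(r'\d+', '<', text, count=2)
print(first_two)

# The replacement can be a function that receives each match object;
# here every run of digits is replaced by its length.
lengths = re.sub(r'\d+', lambda m: str(len(m.group())), text)
print(lengths)
```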

    Finding strings that start with digits and end with digits:

    # Find a single digit at the start of the string; returns that digit:
    import re
    d = re.search(r'^\d','987654321ABCdef123_456ab7.89c~dfGFFDaff555')
    if d:
        print(d)
    # Output: <_sre.SRE_Match object; span=(0, 1), match='9'>
    # (Python 3.7+ prints re.Match instead of _sre.SRE_Match)

    # Find the run of digits at the start of the string; returns the whole run:
    import re
    d = re.search(r'^\d+','987654321ABCdef123_456ab7.89c~dfGFFDaff555')
    if d:
        print(d)
    # Output: <_sre.SRE_Match object; span=(0, 9), match='987654321'>

    # Require digits from the start of the string all the way to the end:
    import re
    d = re.search(r'^\d+$','987654321ABCdef123_456ab7.89c~dfGFFDaff555')
    print(d)
    # Output: None
    # The result is None because ^\d+$ requires every character between the start
    # and the end to be a digit, and this string also contains letters and punctuation.
    # By contrast, d = re.search(r'^\d+$','987654321')
    # returns: <_sre.SRE_Match object; span=(0, 9), match='987654321'>
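
If the goal is a string that merely starts with a digit and ends with a digit, with anything in between, the pattern needs a wildcard in the middle. The sketch below shows one way to write that, alongside fullmatch() as an alternative to wrapping a pattern in ^ and $. The variable names are illustrative.

```python
import re

s = '987654321ABCdef123_456ab7.89c~dfGFFDaff555'

leading = re.search(r'^\d+', s)    # run of digits at the start
trailing = re.search(r'\d+$', s)   # run of digits at the end

# Starts with a digit AND ends with a digit, any characters in between.
both_ends = re.search(r'^\d.*\d$', s)

# fullmatch() requires the pattern to cover the whole string.
all_digits = re.fullmatch(r'\d+', '987654321')
mixed = re.fullmatch(r'\d+', s)

print(leading.group())       # 987654321
print(trailing.group())      # 555
print(both_ends.group() == s)  # True
print(all_digits.group())    # 987654321
print(mixed)                 # None
```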

    More on the findall() function:

    # findall() returns every match of the pattern as a list of strings
    import re
    s1 = re.findall('org','https://docs.python.org/3/whatsnew/3.6.html')
    print(s1)
    # Output: ['org']

    # If ^ is placed before the pattern, it matches only at the start of the string;
    # findall() still returns the result as a list
    import re
    s = re.findall('^https','https://docs.python.org/3/whatsnew/3.6.html')
    print(s)
    # Output: ['https']

    # If $ is placed after the pattern, it matches only at the end of the string;
    # findall() again returns the result as a list
    import re
    s = re.findall("html$","https://docs.python.org/3/whatsnew/3.6.html")
    print(s)
    # Output: ['html']

    # [...] matches any single character from the set inside the brackets.
    # Characters in the set are written without separators: a comma inside
    # the brackets would itself be treated as a character to match.
    import re
    s = re.findall('[tw]h','https://docs.python.org/3/whatsnew/3.6.html')
    print(s)
    # Output: ['th', 'wh']

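
A short sketch of why commas do not belong inside a character class: every character between the brackets is a member of the set, so '[t,w]' also matches a literal comma. The sample string here is made up for illustration.

```python
import re

sample = 'a,h th wh'

# The comma is part of the set, so ',h' is matched as well.
with_comma = re.findall('[t,w]h', sample)

# Listing only the intended characters avoids the accidental match.
without_comma = re.findall('[tw]h', sample)

print(with_comma)     # [',h', 'th', 'wh']
print(without_comma)  # ['th', 'wh']
```
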
    # \d matches a single digit.
    # findall() returns each matched digit as a one-character string in a list.
    # Repeating \d several times matches that many consecutive digits as one string.
    import re
    s1 = re.findall(r"\d","https://docs.python.org/3/whatsnew/3.6.html")
    s2 = re.findall(r"\d\d\d","https://docs.python.org/3/whatsnew/3.6.html/1234")
    print(s1)
    print(s2)
    # Output: ['3', '3', '6']
    # Output: ['123']

    # \D matches any single character that is not a digit, so digits are skipped.
    # findall() returns each matched character as a one-character string in a list.
    import re
    s = re.findall(r'\D','good 123_ mornin_g!')
    print(s)
    # Output: ['g', 'o', 'o', 'd', ' ', '_', ' ', 'm', 'o', 'r', 'n', 'i', 'n', '_', 'g', '!']
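
Since \d and \D are complements, adding + to each splits a string into alternating runs of digits and non-digits. A minimal sketch on the same input, with illustrative variable names:

```python
import re

text = 'good 123_ mornin_g!'

digit_runs = re.findall(r'\d+', text)      # consecutive digits
non_digit_runs = re.findall(r'\D+', text)  # everything between the digits

# split() on the digit runs yields the same non-digit pieces.
pieces = re.split(r'\d+', text)

print(digit_runs)      # ['123']
print(non_digit_runs)  # ['good ', '_ mornin_g!']
print(pieces)          # ['good ', '_ mornin_g!']
```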

    Exercises:

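    import re
    print(re.match('Liudehua','Liudehua演戏很好!').group())  # literal text matches itself
    print(re.match('.','Liudehua演戏很好!').group())  # . matches any single character
    print(re.match('.*','Liudehua演戏很好!').group())  # * repeats the preceding item zero or more times
    print(re.match(r'\.','.Liudehua演戏很好!').group())  # a backslash before a metacharacter removes its special meaning (\. is a literal dot)
    print(re.match('的+','的的的LLLLLiudehua演戏很好!').group())  # + matches one or more times
    print(re.match('的?','的的的iudehua演戏很好!').group())  # ? matches zero or one time
    print(re.match('^开头','开头Hiudehua演戏很好!').group())  # ^ matches the start of the string
    print(re.search('!末尾$','Hiudehua演戏很好!末尾').group())  # $ matches the end of the string; search() is needed because match() only tries position 0
    print(re.match('的|H','Hiudehua演戏很好!').group())  # | matches either of the two alternatives
    print(re.match('P{3}','PPPPPPiudehua演戏很好!').group())  # {3} matches exactly three times
    print(re.match('.*P{3}','uuu(PPPPPP)dehua演戏很好!').group())  # greedy .* followed by exactly three P
    print(re.match(r'\d+','123nihao').group(0))  # \d is equivalent to [0-9] for ASCII digits
    print(re.match(r'\D','飞雪123nihao').group())  # \D matches a non-digit, equivalent to [^\d]
    print(re.match(r'\D*\s\d','月下舞   123nihao').group(0))  # \s matches any whitespace character
    print(re.match(r'\S','月下舞   123nihao').group(0))  # \S matches any non-whitespace character, equivalent to [^\s]
    print(re.match(r'\w*','月下舞_987   123nihao').group(0))  # \w matches letters, digits, and underscore
    print(re.match(r'\W*','***** &&月下舞_987   123nihao').group(0))  # \W matches anything that is not a letter, digit, or underscore
    print(re.match(r'\Aqin','qin月下舞_987   123nihao').group(0))  # \A matches only at the start of the string, like ^
    print(re.search(r'hao\Z','qin月下舞_987  123nihao').group())  # \Z matches only at the end of the string, like $
    print(re.findall('\btina','tian tinaaaa'))  # without r, '\b' is a backspace character, so this prints []
    print(re.findall(r'\btina','tian tinaaaa'))  # \b matches a word boundary: ['tina']
    print(re.findall(r'\btina','tian#tinaaaa'))  # '#' is not a word character, so a boundary follows it: ['tina']
    print(re.findall(r'\btina\b','tian#tina@aaa'))  # boundaries on both sides: ['tina']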
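
The '.*P{3}' exercise above depends on greedy matching: .* first consumes as much as possible and then backs off until P{3} can match. As a brief sketch, comparing it with the non-greedy form .*? (standard re syntax, not used in the exercises) makes the difference visible. The subject string is an ASCII-only variant of the one above.

```python
import re

text = 'uuu(PPPPPP)dehua'

# Greedy: .* takes as much as it can, so P{3} matches the LAST three P's.
greedy = re.match(r'.*P{3}', text).group()

# Non-greedy: .*? takes as little as it can, so P{3} matches the FIRST three P's.
lazy = re.match(r'.*?P{3}', text).group()

print(greedy)  # uuu(PPPPPP
print(lazy)    # uuu(PPP
```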
  • Original article: https://www.cnblogs.com/yibeimingyue/p/9334759.html