  • Python regular expressions

    1. Finding a pattern in text: re.search()

    import re
    
    text = "you are python"
    match = re.search('are',text)
    
    # Index of the first character of 'are' in text (counted from 0)
    print(match.start())
    
    # Index just past the last character of 'are' (also counted from 0, exclusive)
    print(match.end())

    Output:

    4
    7
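
    Because end() is exclusive, the two offsets can be used directly as a slice: text[match.start():match.end()] recovers the matched substring. The sketch below illustrates this and also shows that re.search() returns None when nothing matches. The helper name find_span is hypothetical and exists only for this illustration.

```python
import re

def find_span(pattern, text):
    """Return (start, end, matched_text), or None if pattern is not found."""
    match = re.search(pattern, text)
    if match is None:
        # re.search() returns None when there is no match
        return None
    start, end = match.start(), match.end()
    # end is exclusive, so slicing with it yields exactly the match
    return start, end, text[start:end]

found = find_span('are', 'you are python')
missing = find_span('java', 'you are python')
print(found)    # (4, 7, 'are')
print(missing)  # None
```

    Checking for None before calling start() or end() avoids an AttributeError on text that does not contain the pattern.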

     

    2. Compiling expressions: compile()

    import re
    
    text = "you are SB"
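    # compile() converts a pattern string into a compiled pattern object
    # (a RegexObject; named re.Pattern in recent Python 3 versions).
    # The patterns are stored in a set, which is unordered, so the order
    # of the output lines may vary between runs.
    regexes = {re.compile(p) for p in ['SB','NB','MB']}
    for regex in regexes:
        if regex.search(text):
            print('seeking ' + regex.pattern + ' >> match')
        else:
            print('seeking ' + regex.pattern + ' >> not match')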

    Output:

    seeking NB >> not match
    seeking SB >> match
    seeking MB >> not match
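
    Compiling pays off mainly when the same expression is applied to many strings. The following is a minimal sketch of that use, under two assumptions that are not in the original: the patterns are kept in a list so that the order is deterministic, and the second sample string is made up.

```python
import re

# A list keeps the patterns in the order they were written
regexes = [re.compile(p) for p in ['SB', 'NB', 'MB']]

# Hypothetical sample inputs
texts = ['you are SB', 'you are NB']

results = []
for text in texts:
    # Reuse each compiled pattern; record which patterns matched
    matched = [regex.pattern for regex in regexes if regex.search(text)]
    results.append((text, matched))

for text, matched in results:
    print(text, '>>', matched)
```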

    3. Multiple matches: findall(), finditer()

    findall() returns every non-overlapping substring of the input that matches the pattern.

    import re
    
    text = "ab aa ab bb ab ba"
    
    # Find every occurrence of 'ab' in text
    print(re.findall('ab',text))

    Output:

    ['ab', 'ab', 'ab']
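
    "Non-overlapping" means the scan resumes where the previous match ended. A small sketch with made-up input strings shows the effect:

```python
import re

# 'aaaa' contains 'aa' at offsets 0, 1 and 2, but matches may not overlap,
# so the scan resumes after each match and only offsets 0 and 2 are reported
non_overlapping = re.findall('aa', 'aaaa')
print(non_overlapping)  # ['aa', 'aa']

# With no match at all, findall() returns an empty list rather than None
no_match = re.findall('zz', 'ab aa ab bb ab ba')
print(no_match)  # []
```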

    finditer() returns an iterator that yields match objects instead of the strings returned by findall().

    import re
    
    text = "ab aa ab bb ab ba"
    
    for match in re.finditer('ab',text):
        s = match.start()
        e = match.end()
        print('found {} at {} to {}'.format('"ab"',str(s),str(e)))

    Output:

    found "ab" at 0 to 2
    found "ab" at 6 to 8
    found "ab" at 12 to 14
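
    Each match object also offers group() for the matched text and span() for the (start, end) pair, which can shorten the loop above. A minimal sketch on the same text:

```python
import re

text = "ab aa ab bb ab ba"

# Collect (matched_text, (start, end)) for every match
spans = [(m.group(), m.span()) for m in re.finditer('ab', text)]
print(spans)  # [('ab', (0, 2)), ('ab', (6, 8)), ('ab', (12, 14))]
```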

    4. Pattern syntax

    import re
    
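    def test_patterns(text, patterns=()):
        # patterns is a sequence of (pattern, description) pairs:
        # pattern receives the first element of each pair, desc the second.
        # Every pattern is searched for in text and each match is reported.
        # Printed labels: 匹配方式 = pattern, 说明 = description, 文本 = text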
        for pattern,desc in patterns:
            print('匹配方式:' + pattern, '说明:' + desc)
            print('文本:' + text)
            for match in re.finditer(pattern,text):
                s = match.start()
                e = match.end()
                print('found {} at {} to {}, result:{}'.format(pattern,str(s),str(e),(text[s:e])))

    Calling the function:

    # Match every 'a' immediately followed by a 'b'
    
    >>> test_patterns('a ab abb abbb ',[('ab','a followed by b')])
    
    匹配方式:ab 说明:a followed by b
    文本:a ab abb abbb 
    found ab at 2 to 4, result:ab
    found ab at 5 to 7, result:ab
    found ab at 9 to 11, result:ab
    >>>

    Repetition

    A pattern can express repetition in five ways (*, +, ?, {m} and {m,n}), shown one by one below:

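    # * means the preceding element repeats zero or more times
    # (zero times means the match succeeds even if the element is absent)
    # Match 'a' followed by zero or more 'b', so every 'a' matches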
    
    >>> test_patterns('a ac ab abb abbb ',[('ab*','a followed by zero or more b')])
    
    匹配方式:ab* 说明:a followed by zero or more b
    文本:a ac ab abb abbb 
    found ab* at 0 to 1, result:a
    found ab* at 2 to 3, result:a
    found ab* at 5 to 7, result:ab
    found ab* at 8 to 11, result:abb
    found ab* at 12 to 16, result:abbb
    >>>
    # + means the preceding element appears at least once
    # Match 'a' followed by one or more 'b'
    
    >>> test_patterns('a ab abb abbb ',[('ab+','a followed by one or more b')])
    
    匹配方式:ab+ 说明:a followed by one or more b
    文本:a ab abb abbb 
    found ab+ at 2 to 4, result:ab
    found ab+ at 5 to 8, result:abb
    found ab+ at 9 to 13, result:abbb
    >>>
    # ? means the preceding element appears zero times or once
    # Match 'a' followed by zero or one 'b', so every 'a' matches
    
    >>> test_patterns('a ab abb abbb ',[('ab?','a followed by zero or one b')])
    
    匹配方式:ab? 说明:a followed by zero or one b
    文本:a ab abb abbb 
    found ab? at 0 to 1, result:a
    found ab? at 2 to 4, result:ab
    found ab? at 5 to 7, result:ab
    found ab? at 9 to 11, result:ab
    >>>
    # {3} means the preceding element appears exactly three times
    # Match 'a' followed by three 'b'
    
    >>> test_patterns('a ab abb abbb ',[('ab{3}','a followed by three b')])
    
    匹配方式:ab{3} 说明:a followed by three b
    文本:a ab abb abbb 
    found ab{3} at 9 to 13, result:abbb
    >>>
    # {2,3} means the preceding element appears two or three times
    # Match 'a' followed by two or three 'b'
    
    >>> test_patterns('a ab abb abbb ',[('ab{2,3}','a followed by two or three b')])
    
    匹配方式:ab{2,3} 说明:a followed by two or three b
    文本:a ab abb abbb 
    found ab{2,3} at 5 to 8, result:abb
    found ab{2,3} at 9 to 13, result:abbb
    >>>
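
    For a side-by-side view, the five repetition forms can be run against the same text with findall(). This is only a compact sketch; the variable names quantifiers and summary are chosen for illustration.

```python
import re

text = 'a ab abb abbb '

# One pattern per repetition form described above
quantifiers = ['ab*', 'ab+', 'ab?', 'ab{3}', 'ab{2,3}']

summary = {p: re.findall(p, text) for p in quantifiers}
for pattern, matches in summary.items():
    print('{:8} {}'.format(pattern, matches))

# ab*      ['a', 'ab', 'abb', 'abbb']
# ab+      ['ab', 'abb', 'abbb']
# ab?      ['a', 'ab', 'ab', 'ab']
# ab{3}    ['abbb']
# ab{2,3}  ['abb', 'abbb']
```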

    Character sets

    A character set is a group of characters, any one of which may match at that position in the pattern. For example, [ab] matches either a or b.

    # Match every 'a' or 'b'
    
    >>> test_patterns('abca',[('[ab]','either a or b')])
    
    匹配方式:[ab] 说明:either a or b
    文本:abca
    found [ab] at 0 to 1, result:a
    found [ab] at 1 to 2, result:b
    found [ab] at 3 to 4, result:a
    >>>
    # Match 'a' followed by a single 'a' or 'b'
    
    >>> test_patterns('a aa ab ac',[('a[ab]','a followed by a or b')])
    
    匹配方式:a[ab] 说明:a followed by a or b
    文本:a aa ab ac
    found a[ab] at 2 to 4, result:aa
    found a[ab] at 5 to 7, result:ab
    >>>
    # Match 'a' followed by one or more characters that are each 'a' or 'b'
    
    >>> test_patterns('a aaaaa abbb accc',[('a[ab]+','a followed by 1 or more a or b')])
    
    匹配方式:a[ab]+ 说明:a followed by 1 or more a or b
    文本:a aaaaa abbb accc
    found a[ab]+ at 2 to 7, result:aaaaa
    found a[ab]+ at 8 to 12, result:abbb
    >>>
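
    Two closely related set forms that the examples above do not cover are negation with ^ and ranges with -. The sketch below reuses the last sample text to contrast them with a plain set; the variable names are illustrative only.

```python
import re

text = 'a aaaaa abbb accc'

# [ab] matches a single a or b at that position
either = re.findall('a[ab]+', text)
print(either)   # ['aaaaa', 'abbb']

# [^ab ] is a negated set: any single character except a, b or a space
negated = re.findall('a[^ab ]+', text)
print(negated)  # ['accc']

# [a-c] is a range, equivalent to [abc]
ranged = re.findall('a[a-c]+', text)
print(ranged)   # ['aaaaa', 'abbb', 'accc']
```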
  • Original article: https://www.cnblogs.com/singeldiego/p/5487442.html