  • Python: Regular Expressions

    1. Overview of Regular Expressions

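      A regular expression (regex for short) is a string of ordinary characters and special symbols that describes a text pattern: which characters may appear and how many times they may repeat.

      In other words, a single regex can match many different strings.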


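      Most of the example code and explanations are taken from Automate the Boring Stuff with Python (Chinese edition: 《Python编程快速上手 让繁琐工作自动化》).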

    2. Writing Regular Expressions


    # Regular expression syntax
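      You can try out a few of the most common metacharacters directly. The sketch below uses re.fullmatch(), which succeeds only if the whole string matches the pattern. The sample strings are made up for illustration.

```python
import re

# Each pair is (pattern, a sample string that the whole pattern matches)
examples = [
    (r'\d\d\d', '415'),       # \d: one digit 0-9
    (r'\w+', 'Agent_007'),    # \w: one letter, digit or underscore
    (r'a\sb', 'a b'),         # \s: one whitespace character
    (r'c.t', 'cat'),          # . : any one character except a newline
    (r'[aeiou]+', 'eau'),     # [...]: one character from the set
    (r'[^0-9]+', 'abc'),      # [^...]: one character NOT in the set
]

for pattern, text in examples:
    # fullmatch() returns a Match object only if the entire string matches
    print(pattern, bool(re.fullmatch(pattern, text)))

# ^ and $ anchor the pattern to the start and the end of the string
print(bool(re.search(r'^Hello', 'Hello world')))  # True
print(bool(re.search(r'world$', 'Hello world')))  # True
print(bool(re.search(r'^world', 'Hello world')))  # False
```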

    3. Creating a Regex Object

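      A regular expression written as a plain string does not match anything by itself. To match it against text, you first create a Regex (pattern) object by passing the pattern to re.compile(). The pattern is usually written as a raw string, r'...', so that backslashes such as the one in \d reach the regex engine unchanged instead of being treated as Python string escapes.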

    >>> # re.compile() returns (creates) a Regex pattern object
    >>> import re
    >>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
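      Compiling is not strictly required. The module-level function re.search(pattern, string) accepts the pattern directly. A compiled object is mainly convenient when the same pattern is reused. A small comparison, using the phone number from the examples in this post:

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# Method on the compiled Regex object
mo1 = phoneNumRegex.search('My number is 415-555-4242.')
# Module-level function: the pattern is passed as the first argument
mo2 = re.search(r'\d\d\d-\d\d\d-\d\d\d\d', 'My number is 415-555-4242.')

print(mo1.group())  # 415-555-4242
print(mo2.group())  # 415-555-4242

# Without the r prefix, every backslash in the pattern would have to be doubled
print(r'\d' == '\\d')  # True
```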

    4. Common Regex Patterns

    4.1 Grouping with parentheses

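    >>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')   # create a Regex object with two groups
    >>> mo = phoneNumRegex.search('My number is 415-555-4242.')     # returns a Match object
    >>> mo.group()          # the Match object's group() returns the whole matched text
    '415-555-4242'
    >>> mo.group(1)         # text matched by the first group
    '415'
    >>> mo.group(2)         # text matched by the second group
    '555-4242'
    >>> mo.group(0)         # group 0 is the whole match, the same as group()
    '415-555-4242'
    >>> mo.groups()         # groups() returns a tuple of all groups
    ('415', '555-4242')
    >>> a, b = mo.groups()  # so the groups can be unpacked with multiple assignment
    >>> a
    '415'
    >>> b
    '555-4242'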
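      Because parentheses create groups, you must escape them as \( and \) to match literal parentheses in the text. This is a sketch following the same book, with a made-up sample sentence:

```python
import re

# \( and \) match literal parentheses; the unescaped ( ) still create groups
phoneRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = phoneRegex.search('My phone number is (415) 555-4242.')
print(mo.group(1))  # (415)
print(mo.group(2))  # 555-4242
```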

    4.2 Matching one of several alternatives with the pipe

    >>> heroRegex = re.compile(r'Batman|Tina Fey')   # | means "either ... or ..."
    >>> mo1 = heroRegex.search('Batman and Tina Fey.')
    >>> mo1.group()        # when both occur, search() returns the first occurrence
    'Batman'
    >>> mo2 = heroRegex.search('Tina Fey and Batman.')
    >>> mo2.group()
    'Tina Fey'
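      You can also put the pipe inside a group so that the alternatives share a common prefix. Because search() reports only the first match, you need findall() to collect every alternative that occurs. A minimal sketch with made-up sample text:

```python
import re

# The group (man|mobile|copter|bat) shares the prefix "Bat"
batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = batRegex.search('Batmobile lost a wheel')
print(mo.group())   # Batmobile
print(mo.group(1))  # mobile

# findall() returns every match, not just the first
heroRegex = re.compile(r'Batman|Tina Fey')
print(heroRegex.findall('Batman and Tina Fey.'))  # ['Batman', 'Tina Fey']
```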

    4.3 Optional matching with the question mark

    >>> # ? makes the group (wo) optional; without the parentheses, Batwo?man would make only the 'o' optional
    >>> batRegex = re.compile(r'Bat(wo)?man')
    >>> mo1 = batRegex.search('The Adventures of Batman')
    >>> mo1.group()
    'Batman'
    >>> mo2 = batRegex.search('The Adventures of Batwoman')
    >>> mo2.group()
    'Batwoman'
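      The same ? also works on a longer group. Here is a sketch that reuses the phone-number pattern and assumes the area code may be missing:

```python
import re

# The group (\d\d\d-) may appear zero times or once
phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
mo1 = phoneRegex.search('My number is 415-555-4242')
mo2 = phoneRegex.search('My number is 555-4242')
print(mo1.group())   # 415-555-4242
print(mo2.group())   # 555-4242
print(mo2.group(1))  # None: the optional group took no part in the match
```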

    4.4 Matching zero or more times with the star

    >>> batRegex = re.compile(r'Bat(wo)*man')   # to match a literal '*', use \*
    >>> mo1 = batRegex.search('The Adventures of Batman')
    >>> mo1.group()
    'Batman'
    >>> mo2 = batRegex.search('The Adventures of Batwoman')
    >>> mo2.group()
    'Batwoman'
    >>> mo3 = batRegex.search('The Adventures of Batwowowowoman')
    >>> mo3.group()
    'Batwowowowoman'

    4.5 Matching one or more times with the plus

    >>> batRegex = re.compile(r'Bat(wo)+man')   # to match a literal '+', use \+
    >>> mo1 = batRegex.search('The Adventures of Batwoman')
    >>> mo1.group()
    'Batwoman'
    >>> mo2 = batRegex.search('The Adventures of Batwowowowoman')
    >>> mo2.group()
    'Batwowowowoman'
    >>> mo3 = batRegex.search('The Adventures of Batman')   # (wo)+ needs at least one 'wo'
    >>> mo3 is None
    True
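      The difference between * and + is only the minimum count. Escaping either one with a backslash turns it back into an ordinary character. A short sketch, where the arithmetic string is a made-up sample:

```python
import re

# * allows zero repetitions of the group, + requires at least one
starRegex = re.compile(r'Bat(wo)*man')
plusRegex = re.compile(r'Bat(wo)+man')
print(starRegex.fullmatch('Batman') is not None)  # True
print(plusRegex.fullmatch('Batman') is not None)  # False

# Escaped with a backslash, + and * match the literal characters
exprRegex = re.compile(r'\d+ \+ \d+ \* \d+')
print(exprRegex.search('Result: 2 + 3 * 4').group())  # 2 + 3 * 4
```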

    4.6 Matching a specific number of repetitions with braces

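      (Ha){3,5} matches 3 to 5 repetitions of Ha. By default the match is greedy and takes as many repetitions as possible. A ? placed after the braces, as in the second pattern below, makes the match non-greedy, so it takes as few as possible. The question mark therefore has two entirely unrelated meanings in a regex: it can mark a non-greedy match, or it can make the preceding group optional.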

    >>> greedyHaRegex = re.compile(r'(Ha){3,5}')   # to match a literal '{', use \{
    >>> mo1 = greedyHaRegex.search('HaHaHaHaHa')
    >>> mo1.group()
    'HaHaHaHaHa'
    >>> nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
    >>> mo2 = nongreedyHaRegex.search('HaHaHaHaHa')
    >>> mo2.group()
    'HaHaHa'
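      Braces also accept a single count or an open-ended range. A quick sketch of the other forms:

```python
import re

# {n}: exactly n times
print(re.fullmatch(r'(Ha){3}', 'HaHaHa') is not None)  # True
print(re.fullmatch(r'(Ha){3}', 'HaHa') is not None)    # False
# {n,}: n or more times (greedy, so as many as possible)
print(re.search(r'(Ha){2,}', 'HaHaHaHa').group())      # HaHaHaHa
# {,m}: 0 to m times
print(re.search(r'(Ha){,2}', 'HaHaHaHa').group())      # HaHa
```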

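    5. Greedy and Non-Greedy Matching

      Python regular expressions are greedy by default: when several matches are possible, they take the longest one. In <.*>, the .* runs on to the last > in the text. The non-greedy version <.*?> stops at the first > it finds.
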
    >>> nongreedyRegex = re.compile(r'<.*?>')
    >>> mo = nongreedyRegex.search('<To serve man> for dinner.>')
    >>> mo.group()
    '<To serve man>'
    >>> greedyRegex = re.compile(r'<.*>')
    >>> mo = greedyRegex.search('<To serve man> for dinner.>')
    >>> mo.group()
    '<To serve man> for dinner.>'
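      The difference is even clearer with findall(). The greedy pattern produces one long match, while the non-greedy pattern produces one match per tag. The HTML-like string below is a made-up sample, and this is not meant as a real HTML parser:

```python
import re

html = '<b>bold</b> and <i>italic</i>'
# Greedy: one match from the first < to the last >
print(re.findall(r'<.*>', html))    # ['<b>bold</b> and <i>italic</i>']
# Non-greedy: each match stops at the nearest >
print(re.findall(r'<.*?>', html))   # ['<b>', '</b>', '<i>', '</i>']
```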

    6. Common Methods of Regex Objects


    6.1 search(), group(), groups()

      search() scans the string and returns a Match object for the first match, or None if the pattern is not found. On the Match object, group() (or group(0)) returns the whole matched text, group(n) returns the text of the n-th group, and groups() returns a tuple of all groups. The phone-number example in section 4.1 demonstrates all three.
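      Because search() returns None when nothing matches, calling group() on the result without checking it first raises AttributeError. A minimal sketch of the check, where find_phone is a hypothetical helper written only for this illustration:

```python
import re

phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')

def find_phone(text):
    """Hypothetical helper: return the groups of the first phone number, or None."""
    mo = phoneNumRegex.search(text)
    if mo is None:  # search() returns None when there is no match
        return None
    return mo.groups()

print(find_phone('My number is 415-555-4242.'))  # ('415', '555-4242')
print(find_phone('No phone number here.'))       # None
```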

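    6.2 findall()

      search() returns only the first match, but findall() returns every match in the string. If the pattern has no groups, the result is a list of matched strings. If it has groups, the result is a list of tuples with one string per group.
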
    >>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')   # has no groups
    >>> phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')
    ['415-555-9999', '212-555-0000']
    >>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')   # has groups
    >>> phoneNumRegex.findall('Cell: 415-555-9999 Work: 212-555-0000')
    [('415', '555', '9999'), ('212', '555', '0000')]
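      With exactly one group, findall() returns plain strings for that group. If you need the whole match and its groups together, finditer() yields Match objects instead. A sketch on the same sample text:

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

# Exactly one group: findall() returns a list of strings for that group
areaRegex = re.compile(r'(\d\d\d)-\d\d\d-\d\d\d\d')
print(areaRegex.findall(text))  # ['415', '212']

# finditer() yields Match objects, so the whole match, groups and position are all available
phoneRegex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
for mo in phoneRegex.finditer(text):
    print(mo.group(), mo.groups(), mo.start())
```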

    6.3 sub()

      sub() replaces every match with the given string. An optional third argument, count, limits how many matches are replaced.

    >>> namesRegex = re.compile(r'Agent \w+')
    >>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
    'CENSORED gave the secret documents to CENSORED.'
    >>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.', 1)   # count=1: replace only the first match
    'CENSORED gave the secret documents to Agent Bob.'
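      The replacement string can also refer back to groups of the match: \1 inserts the text of group 1. This sketch, adapted from the same book, keeps only each agent's initial:

```python
import re

# \1 in the replacement inserts the text matched by group 1 (the first letter)
agentNamesRegex = re.compile(r'Agent (\w)\w*')
result = agentNamesRegex.sub(
    r'\1****',
    'Agent Alice told Agent Carol that Agent Eve knew Agent Bob was a double agent.')
print(result)
# A**** told C**** that E**** knew B**** was a double agent.
```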


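    7. Flags for re.compile()

      To make a regex case-insensitive, pass re.IGNORECASE (or its short form re.I) as the second argument to re.compile().

      By default, the dot . matches any character except a newline. Passing re.DOTALL as the second argument to re.compile() makes it match every character, including newlines.

      To add comments to a regex spread over several lines, pass re.VERBOSE as the second argument to re.compile(). Whitespace in the pattern is then ignored and # starts a comment.

      re.compile() takes only one flags argument, so you combine several flags with the bitwise OR operator |: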

    >>> someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE)
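      Each flag in action. This is a sketch with made-up sample strings; re.search() accepts the same flags as an optional third argument:

```python
import re

# re.I: case-insensitive matching
robocop = re.compile(r'robocop', re.I)
print(robocop.search('RoboCop is part man, part machine.').group())  # RoboCop

# Without re.DOTALL, . stops at a newline; with it, . matches newlines too
text = 'Serve the public trust.\nProtect the innocent.'
print(re.search(r'.*', text).group())                     # Serve the public trust.
print(re.search(r'.*', text, re.DOTALL).group() == text)  # True

# re.VERBOSE: whitespace in the pattern is ignored and # starts a comment
phoneRegex = re.compile(r'''
    (\d{3})         # area code
    -               # separator
    (\d{3}-\d{4})   # number
''', re.VERBOSE)
print(phoneRegex.search('Call 415-555-4242').groups())  # ('415', '555-4242')
```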

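    8. Non-capturing groups: (?:…)

      (?:…) groups part of a pattern, for example so that a quantifier applies to it, without capturing the matched text. In the example below, findall() therefore returns only the one capturing group, the domain name:
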
    >>> re.findall(r'http://(?:\w+\.)*(\w+\.com)', 'http://google.com http://www.google.com http://code.google.com')
    ['google.com', 'google.com', 'google.com']
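      Comparing a capturing and a non-capturing group directly shows the effect on findall() and on group numbering. The strings are made-up samples:

```python
import re

text = 'ababab xab'

# Capturing group: findall() returns only the group,
# and a repeated group keeps just its last repetition
print(re.findall(r'(ab)+', text))    # ['ab', 'ab']

# Non-capturing group: no groups, so findall() returns the whole matches
print(re.findall(r'(?:ab)+', text))  # ['ababab', 'ab']

# (?:...) is not numbered, so group(1) is the domain
mo = re.search(r'http://(?:\w+\.)*(\w+\.com)', 'see http://www.google.com')
print(mo.group(1))  # google.com
```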


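    # Exercise (reading and writing files): Mad Libs, 疯狂填词2.py
    Create a Mad Libs program that reads in a text file and lets the user add their own text
    anywhere the word ADJECTIVE, NOUN, ADVERB, or VERB appears in the file. For example, a text file may look like this:
    The ADJECTIVE panda walked to the NOUN and then VERB. A nearby NOUN was
    unaffected by these events.
    The program would find these occurrences and prompt the user to replace them:
    Enter an adjective:
    silly
    Enter a noun:
    chandelier
    Enter a verb:
    screamed
    Enter a noun:
    pickup truck
    It would then produce the following text:
    The silly panda walked to the chandelier and then screamed. A nearby pickup truck was unaffected by these events.
    The result should be printed to the screen and saved to a new text file.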
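    import re

    def mad_libs(filename_path, save_path):
        with open(filename_path, 'r') as f:   # a relative path is resolved against the current directory
            words = f.read()
        # Match only the whole placeholder words; \b keeps VERB inside ADVERB from matching
        placeholderRegex = re.compile(r'\b(?:ADJECTIVE|NOUN|ADVERB|VERB)\b')
        for placeholder in placeholderRegex.findall(words):
            article = 'an' if placeholder[0] in 'AEIOU' else 'a'
            replacement = input('Enter {} {}: '.format(article, placeholder.lower()))
            # Replace only the first remaining placeholder (count=1). The result must be assigned back
            # to words, otherwise only the last replacement survives (a stack diagram makes this clear).
            # A function as the replacement inserts the user's text literally, without backslash escapes.
            words = placeholderRegex.sub(lambda mo: replacement, words, count=1)
        # No f.close() needed: the with statement closes the file automatically
        print(words)   # print the result to the screen
        with open(save_path, 'a') as txt:   # 'a' appends, so each run adds a new line
            txt.write(words + '\n')
        # Equivalent without a with statement:
        # save_txt = open('保存疯狂填词文档.txt', 'a')
        # save_txt.write(words)
        # save_txt.close()

    if __name__ == '__main__':
        filename_path = input('Path of the text file to fill in: ')   # e.g. '疯狂填词原始文档.txt'
        save_path = input('Path of the file to save to (including the file name): ')   # e.g. '保存疯狂填词文档.txt'
        mad_libs(filename_path, save_path)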


  • Original article: https://www.cnblogs.com/ydkh/p/14688029.html