  • Python re regular expressions: simple and good enough

    1. References

    Python Regular Expression Guide (Python正则表达式指南)

    https://docs.python.org/2/library/re.html

    https://docs.python.org/2/howto/regex.html

    https://docs.python.org/3/library/re.html

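    str method                               re function                                        notes
    -                                        re.match(pattern, string, flags=0)                 match only at the start of the string
    S.find(sub [,start [,end]]) -> int       re.search(pattern, string, flags=0)                scan through the string looking for a match
    -                                        re.findall(pattern, string, flags=0)               all non-overlapping matches, as a list
    -                                        re.finditer(pattern, string, flags=0)              all non-overlapping matches, as an iterator of Match objects
    S.replace(old, new[, count]) -> string   re.sub(pattern, repl, string, count=0, flags=0)    see section 4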
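    A minimal sketch of the functions in the table, run on a made-up sample string (the text and patterns are illustrative only). It also contrasts the plain S.find / S.replace methods with their regex counterparts.

```python
import re

text = "cat concat cat"  # illustrative sample string

# re.match only tries position 0
print(re.match(r"con", text))            # None: text starts with "cat"

# re.search scans the whole string and returns the first match
m = re.search(r"con", text)
print(m.start(), m.group())              # 4 con

# str.find is the plain-string counterpart; it returns an index, or -1 if absent
print(text.find("con"))                  # 4

# re.findall returns every non-overlapping match as a list of strings
print(re.findall(r"cat", text))          # ['cat', 'cat', 'cat']

# re.finditer yields Match objects, handy when positions are needed
print([mo.start() for mo in re.finditer(r"cat", text)])  # [0, 7, 11]

# str.replace works on literal text; re.sub works on a pattern
print(text.replace("cat", "dog"))        # dog condog dog
print(re.sub(r"\bcat\b", "dog", text))   # dog concat dog
```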

    2. Groups: m.group()

    In [560]: m.group?
    Docstring:
    group([group1, ...]) -> str or tuple.
    Return subgroup(s) of the match by indices or names.
    For 0 returns the entire match.
    Type:      builtin_function_or_method
    
    In [542]: m=re.search(r'(-{1,2}(gr))','pro---gram-files')
    
    In [543]: m.group()  # always available, even without parentheses; same as m.group(0)
    Out[543]: '--gr'
    
    In [544]: m.group(0)  # group 0 is the entire match ("For 0 returns the entire match"); by contrast, m.string is the full original string that was searched
    Out[544]: '--gr'
    
    In [545]: m.group(1)
    Out[545]: '--gr'
    
    In [546]: m.group(2)
    Out[546]: 'gr'
    
    In [547]: m.group(3)  # there is one group per '(' in the pattern; asking for a group that does not exist raises an error
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-547-71a2c7935517> in <module>()
    ----> 1 m.group(3)
    
    IndexError: no such group
    
    In [548]: m.group(1,2)  # several groups at once, returned as a tuple
    Out[548]: ('--gr', 'gr')
    
    In [549]: m.groups()  # all groups
    Out[549]: ('--gr', 'gr')
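    A standalone version of the session above (Python 3), contrasting group(0) with m.string. span(), start() and end() are standard Match methods that the original session does not use.

```python
import re

m = re.search(r"(-{1,2}(gr))", "pro---gram-files")

# group() and group(0) both return the whole match
print(m.group(), m.group(0))         # --gr --gr
# m.string is the complete text that was searched, not just the match
print(m.string)                      # pro---gram-files
# span() is the (start, end) slice of m.string covered by the match
print(m.span())                      # (4, 8)
print(m.string[m.start():m.end()])   # --gr

# groups are numbered by their '(' from left to right; group 3 does not exist
try:
    m.group(3)
except IndexError as exc:
    print("IndexError:", exc)        # IndexError: no such group
```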

    m.groupdict() is used with named groups

    In [557]: m.groupdict?
    Docstring:
    groupdict([default=None]) -> dict.
    Return a dictionary containing all the named subgroups of the match,
    keyed by the subgroup name. The default argument is used for groups
    that did not participate in the match
    Type:      builtin_function_or_method
    
    In [558]: m=re.search(r'(-{1,2}(?P<GR>gr))','pro---gram-files')
    
    In [559]: m.groupdict()
    Out[559]: {'GR': 'gr'}
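    A sketch of groupdict()'s default argument, which applies to named groups that did not participate in the match. The pattern and the key/value group names are hypothetical, chosen only for illustration.

```python
import re

# hypothetical pattern: a key, optionally followed by "=value"
pat = re.compile(r"(?P<key>\w+)(?:=(?P<value>\w+))?")

m1 = pat.match("debug=on")
print(m1.groupdict())                # {'key': 'debug', 'value': 'on'}
# named groups still have numbers as well
print(m1.group("key"), m1.group(1))  # debug debug

m2 = pat.match("verbose")
# "value" did not take part in the match, so it maps to the default (None)
print(m2.groupdict())                # {'key': 'verbose', 'value': None}
# a different default can be passed positionally
print(m2.groupdict(""))              # {'key': 'verbose', 'value': ''}
```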

    3. Extraction: re.findall()

    re.findall(pattern, string, flags=0)

    In [97]: text = "He was carefully disguised but captured quickly by police."
    
    In [98]: re.findall(r"\w+ly", text)  # no groups: each item corresponds to m.group(0)
    Out[98]: ['carefully', 'quickly']
    
    In [99]: re.findall(r"(\w+)ly", text)  # one pair of parentheses limits what is returned: each item corresponds to m.group(1)
    Out[99]: ['careful', 'quick']
    
    In [100]: re.findall(r"((\w+)(ly))", text)  # several groups, numbered by counting '(' from left to right: each item corresponds to m.groups()
    Out[100]: [('carefully', 'careful', 'ly'), ('quickly', 'quick', 'ly')]

    In [102]: re.findall(r"((1\w+)(ly))", text)  # no word in text starts with the digit 1, so nothing matches
    Out[102]: []
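    How findall's result shape follows the number of groups, checked against finditer on the same sentence. The (?:...) non-capturing form is an addition not covered in the original post.

```python
import re

text = "He was carefully disguised but captured quickly by police."

# with a capturing group, findall returns the group, not the whole match
print(re.findall(r"(\w+)ly", text))       # ['careful', 'quick']
# a non-capturing group (?:...) groups without capturing,
# so findall goes back to returning whole matches
print(re.findall(r"(?:\w+)ly", text))     # ['carefully', 'quickly']

# finditer yields Match objects, so the whole match, the groups
# and the position all stay available
for mo in re.finditer(r"(\w+)ly", text):
    print(mo.group(0), mo.group(1), mo.start())
# carefully careful 7
# quickly quick 40

# with several groups, findall returns exactly m.groups() of each match
pattern = r"((\w+)(ly))"
print(re.findall(pattern, text) == [mo.groups() for mo in re.finditer(pattern, text)])  # True
```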

    4. Substitution: re.sub()

    re.sub(pattern, repl, string, count=0, flags=0)

    Backreferences in repl: "Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern." The same can also be done by passing a function as repl.

    Note: MySQL REGEXP does not support backreferences such as \1.

    https://stackoverflow.com/questions/4122393/negative-backreferences-in-mysql-regexp mentions the exception: "unless you can install/use LIB_MYSQLUDF_PREG".

    https://stackoverflow.com/questions/7058209/reference-to-groups-in-a-mysql-regex

    In [158]: def func(m):
         ...:     return m.group('DEF')+' '+m.group(2)  # refer to a group by its name
         ...:
    
    In [159]: re.sub(r'(?P<DEF>def)\s+([a-z]+)\s*\(\s*\):', func, 'def func(): def f():')
    Out[159]: 'def func def f'
    
    In [160]: re.sub(r'(?P<DEF>def)\s+([a-z]+)\s*\(\s*\):', r'\1 \2', 'def func(): def f():')  # numbered references; in a repl string a named group is written \g<DEF>, not by its bare name
    Out[160]: 'def func def f'
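    The three ways to build the replacement, side by side as a plain script. \g<name> and \g<number> are the repl syntax Python's re module documents for referring to groups; the lambda is just an inline version of func above.

```python
import re

src = "def func(): def f():"
pat = r"(?P<DEF>def)\s+([a-z]+)\s*\(\s*\):"

# numbered backreferences in the replacement string
print(re.sub(pat, r"\1 \2", src))            # def func def f
# named groups are referenced in repl as \g<name>; \g<2> is the numbered form
print(re.sub(pat, r"\g<DEF> \g<2>", src))    # def func def f
# a function receives each Match object and returns the replacement text
print(re.sub(pat, lambda mo: mo.group("DEF").upper() + " " + mo.group(2), src))  # DEF func DEF f
```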

    5. Backreferences inside the pattern

    5.1 Finding a pair (as in playing cards)

    In [204]: re.search(r'(.).*\1','ab123')  # no output: returns None because no character repeats
    
    In [205]: re.search(r'(.).*\1','ab121')
    Out[205]: <_sre.SRE_Match at 0x71ca120>
    
    In [206]: _.group()
    Out[206]: '121'
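    The pair search wrapped in a small helper. first_pair is a hypothetical name; it returns the leftmost character that appears again later in the string, since re.search tries start positions from left to right.

```python
import re

def first_pair(cards):
    # hypothetical helper: (.) captures one card, .* skips ahead,
    # and \1 requires the same card to show up again later
    m = re.search(r"(.).*\1", cards)
    return m.group(1) if m else None

print(first_pair("ab123"))  # None
print(first_pair("ab121"))  # 1
print(first_pair("KQ7Q2"))  # Q
```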

    5.2 Several identical characters in a row

    In [207]: re.search(r'.{3}','1122')  # wrong: matches any three characters
    Out[207]: <_sre.SRE_Match at 0x71b94a8>
    
    In [208]: re.search(r'(.){3}','1122')  # wrong: repeats the group's pattern, not the captured character
    Out[208]: <_sre.SRE_Match at 0x71ca198>
    
    In [209]: re.search(r'(.)\1\1','1122')  # correct: no output, because no character occurs three times in a row (returns None)
    
    In [210]: re.search(r'(.)\1\1','1112')
    Out[210]: <_sre.SRE_Match at 0x71ca210>
    
    In [211]: re.search(r'(.)\1{2}','1112')
    Out[211]: <_sre.SRE_Match at 0x71ca288>
    
    In [212]: _.group()
    Out[212]: '111'
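    Extending 5.2 to runs of any length: \1{2,} asks for at least two more copies of the captured character. The sample strings are made up; the last lines show why the two "wrong" patterns above match '112'.

```python
import re

# (.)\1{2,} : one character followed by at least two more copies of itself
run = re.compile(r"(.)\1{2,}")

print([mo.group() for mo in run.finditer("1112233334")])  # ['111', '3333']
print(run.search("1122"))                                  # None

# the wrong attempts from 5.2: both match '112', because neither ties
# the repeats to the captured character
print(re.search(r".{3}", "1122").group())     # 112
print(re.search(r"(.){3}", "1122").group())   # 112
# a repeated group keeps only its last capture
print(re.search(r"(.){3}", "1122").group(1))  # 2
```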
  • Original article: https://www.cnblogs.com/my8100/p/python_re.html