  • [Python] Regular Expressions

    1. regular expression

    A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are widely used in the UNIX world.

    2. re module

    The re module supports Perl-like regular expressions.

    The re module raises the exception re.error if an error occurs while compiling or using a regular expression.
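As a minimal sketch of the error handling described above, the snippet below wraps re.compile in a helper. The name try_compile is hypothetical and exists only for illustration; the only assumption about the library is that an invalid pattern raises re.error.

```python
import re

def try_compile(pattern):
    """Return a compiled pattern, or None if the pattern is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        # The exception text describes what the regex parser rejected.
        print("invalid pattern %r: %s" % (pattern, exc))
        return None

good = try_compile(r'(\d+)-(\d+)')   # valid pattern
bad = try_compile(r'(\d+')           # unbalanced parenthesis
```

Here good is a compiled pattern object and bad is None, so the caller can decide how to recover instead of crashing.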

    To avoid confusion between Python's string escapes and regular expression escapes, write patterns as raw strings: r'expression'.
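A short illustration of why raw strings help, assuming nothing beyond the standard re module. The sample text and variable names are made up for this sketch.

```python
import re

text = "total: 42 items"

# With a raw string, the backslash reaches the regex engine unchanged.
raw_pattern = r'\d+'
# Without the r prefix, the backslash has to be doubled by hand.
escaped_pattern = '\\d+'
same = (raw_pattern == escaped_pattern)

number = re.search(raw_pattern, text).group()

# In a normal string, '\b' is a backspace character, not a word boundary,
# so the first search looks for backspace characters and finds nothing.
plain_b = re.search('\bitems\b', text)
raw_b = re.search(r'\bitems\b', text)

print(same, number, plain_b, raw_b.group())
```

The two spellings of the digit pattern are the same string, but the word-boundary case shows how a forgotten r prefix can silently change what a pattern means.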

    3. match function

    Syntax:
    re.match(pattern, string, flags=0)
    
    pattern  # the regular expression to be matched
    string   # the string to test; the pattern must match at the beginning of it
    flags    # modifiers; combine several flags with bitwise OR (|)
    

      

    re.match returns a match object on success and None on failure.

    Example:

    import re
    
    line = "Cats are smarter than dogs"
    
    # re.I ignores case; re.M is harmless here because the pattern has no ^ or $
    matchObj = re.match(r'(.*) are (.*?) .*', line, re.M|re.I)
    
    if matchObj:
        print("matchObj.group() : ", matchObj.group())
        print("matchObj.group(1) : ", matchObj.group(1))
        print("matchObj.group(2) : ", matchObj.group(2))
    else:
        print("No match!!")
    
    # group() is a method of the match object
    # group() or group(0) returns the entire matched text
    # group(1) returns the text captured by the first parenthesized group, (.*)
    # group(2) returns the text captured by the second parenthesized group, (.*?)
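To see what the capture groups in the example actually hold, the sketch below compares the original non-greedy group (.*?) with a greedy variant (.*). The greedy pattern is not from the original example; it is added here only for contrast, and groups() is used to show all captures at once.

```python
import re

line = "Cats are smarter than dogs"

# Non-greedy: the second group stops at the first space it can.
lazy = re.match(r'(.*) are (.*?) .*', line)
# Greedy: the second group takes as much as it can while still
# leaving one space and the trailing .* something to match.
greedy = re.match(r'(.*) are (.*) .*', line)

print(lazy.groups())    # ('Cats', 'smarter')
print(greedy.groups())  # ('Cats', 'smarter than')
```

In both cases group() returns the whole line, because the trailing .* consumes the rest of the string.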
    

      

    4. search function

    #Syntax:
    re.search(pattern, string, flags=0)
    #pattern: This is the regular expression to be matched.
    #string: This is the string, which would be searched to match the pattern anywhere in the string.
    #flags: the same as match()  
    

      

    re.search returns a match object on success and None on failure.

    The returned match object has the same group() method as the one returned by match.

    import re
    
    line = "Cats are smarter than dogs."
    
    searchObj = re.search(r'(.*) are (.*?) .*', line, re.M|re.I)
    
    if searchObj:
        print("searchObj.group(): ", searchObj.group())
        print("searchObj.group(1): ", searchObj.group(1))
        print("searchObj.group(2): ", searchObj.group(2))
    else:
        print("no match")
    

      

    5. Match VS Search

    match checks for a match only at the beginning of the string, while search scans the whole string and returns the first match it finds anywhere.

    import re
    
    line = "Cats are smarter than dogs."
    
    searchObj = re.search(r'dogs', line, re.M|re.I)
    matchObj = re.match(r'dogs', line, re.M|re.I)
    
    # search finds "dogs" near the end of the line
    if searchObj:
        print("searchObj.group(): ", searchObj.group())
    else:
        print("no match")
    
    # match fails because the line does not start with "dogs"
    if matchObj:
        print("matchObj.group(): ", matchObj.group())
    else:
        print("no match")
    

      

    When the code is executed, it produces the following result:

    searchObj.group():  dogs
    no match
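One detail worth checking, since the examples pass re.M to match: the flag changes where ^ can match, but it does not make match look past the start of the string. A small sketch with a made-up two-line string:

```python
import re

text = "Cats are smarter\ndogs are loyal"

# With re.M, ^ also matches right after a newline, and search scans that far.
found = re.search(r'^dogs', text, re.M)
# match still only tries position 0, even with re.M.
at_start = re.match(r'dogs', text, re.M)
# Without re.M, ^ matches only at the very start of the string.
no_flag = re.search(r'^dogs', text)

print(found.group(), at_start, no_flag)
```

Only the first call succeeds; the other two return None.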
    

      

    6. sub

    #syntax:
    re.sub(pattern, repl, string, count=0, flags=0)
    #This method replaces occurrences of the RE pattern in string with repl.
    #All occurrences are replaced unless count is given, in which case
    #at most count replacements are made, starting from the left.
    #This method returns the modified string.
    

      

    Example:

    import re
    
    phone = "32580-110-517 #nhmhhh"
    
    # Delete the Python-style comment (everything from # to the end)
    num = re.sub(r'#.*$', "", phone)
    print("phone num:", num)
    
    # Delete all non-digit characters (\D matches anything that is not a digit)
    num = re.sub(r'\D', "", phone)
    print("phone num:", num)
    

      

    When the above code is executed, it produces the following result:

    phone num: 32580-110-517 
    phone num: 32580110517
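The optional limit on the number of replacements can be illustrated with a small sketch. It passes the limit by keyword as count, which is the parameter name in the standard library's re.sub; the phone string reuses the digits from the example above.

```python
import re

phone = "32580-110-517"

# Default: every hyphen is removed.
all_removed = re.sub(r'-', "", phone)
# count=1: only the leftmost hyphen is removed.
first_only = re.sub(r'-', "", phone, count=1)

print(all_removed)   # 32580110517
print(first_only)    # 32580110-517
```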
    

      

    7. Regular Expression Modifiers: Option flags

    You can combine multiple modifiers using bitwise OR (|).

    re.I #Performs case-insensitive matching.
    re.L #Makes \w, \W, \b, \B and case-insensitive matching depend on the current locale (bytes patterns only in Python 3).
    re.M #Makes $ match the end of a line
    #(not just the end of the string)
    #makes ^ match the start of any line
    #(not just the start of the string)
    re.S #Makes a period (dot) match any character, including a newline.
    re.U #Interprets letters according to the Unicode character set (the default for str patterns in Python 3).
    re.X #Permits more readable ("verbose") regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats unescaped # as a comment marker.
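The flags above can be tried out with a short sketch. The sample text and the variable names are made up for illustration, and re.findall is used only because it makes the effect of each flag easy to see.

```python
import re

text = "First line\nsecond LINE"

# re.I: case is ignored, so both spellings are found.
ignore_case = re.findall(r'line', text, re.I)
# re.M: ^ matches at the start of every line.
line_starts = re.findall(r'^\w+', text, re.M)
# re.S: the dot also matches the newline between the two lines.
without_s = re.search(r'First.*LINE', text)
with_s = re.search(r'First.*LINE', text, re.S)
# Flags are combined with bitwise OR.
combined = re.findall(r'^second line$', text, re.I | re.M)

# re.X: whitespace and # comments inside the pattern are ignored.
phone_re = re.compile(r'''
    (\d{5})   # first block of digits
    -
    (\d{3})   # middle block
    -
    (\d{3})   # last block
''', re.X)
parts = phone_re.match("32580-110-517").groups()

print(ignore_case, line_starts, without_s, with_s.group())
print(combined, parts)
```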
    

      

    8. Regular Expression Patterns

    For the full table of pattern syntax, see https://www.tutorialspoint.com/python/python_reg_expressions.htm

      

  • Original article: https://www.cnblogs.com/KennyRom/p/6368991.html