1. regular expression
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are widely used in the UNIX world.
2. re module
The re module supports Perl-like regular expressions.
The re module raises the exception re.error if an error occurs while compiling or using a regular expression.
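As a minimal sketch of catching that exception, the helper below (try_compile is a hypothetical name chosen for illustration, not part of re) compiles a pattern and returns None when the pattern is invalid:

```python
import re

def try_compile(pattern):
    """Return a compiled pattern, or None if the pattern is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        # re.error carries a message describing what is wrong with the pattern
        print("invalid pattern %r: %s" % (pattern, exc))
        return None

ok = try_compile(r'(\d+)-(\d+)')  # valid pattern
bad = try_compile(r'(\d+')        # unbalanced parenthesis, raises re.error
```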
To avoid confusion over backslashes when writing regular expressions, use raw strings, written as r'expression'.
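A small illustration of why raw strings matter, using an example sentence made up for this sketch: in a normal string Python converts \b to a backspace character before re sees it, while in a raw string \b reaches re unchanged and means "word boundary".

```python
import re

text = "a cat sat"

# Without r'', Python turns \b into a backspace character before re sees it.
no_raw = re.search('\bcat\b', text)
# With r'', the backslash reaches re intact and \b means "word boundary".
with_raw = re.search(r'\bcat\b', text)

print(no_raw)            # None
print(with_raw.group())  # cat
```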
3. match function
Syntax: re.match(pattern, string, flags=0)
pattern #the regular expression to be matched
string #the string that is searched; the pattern must match at the beginning of the string
flags #modifiers. You can combine different flags using bitwise OR (|).
Returns a match object on success, None on failure.
Example:
import re

line = "Cats are smarter than dogs"

matchObj = re.match(r'(.*) are (.*?) .*', line, re.M|re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

#group() is a match object method
#group() returns the whole matched text
#group(1) returns the text captured by the first parenthesized group, here "Cats"
#group(2) returns the text captured by the second parenthesized group, here "smarter"
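The second group in the pattern above is written (.*?) rather than (.*). The sketch below, which reuses the same sentence, suggests why: the non-greedy form stops at the first following space, while the greedy form runs on to the last one.

```python
import re

line = "Cats are smarter than dogs"

lazy = re.match(r'(.*) are (.*?) .*', line)   # non-greedy second group
greedy = re.match(r'(.*) are (.*) .*', line)  # greedy second group

print(lazy.group(2))    # smarter
print(greedy.group(2))  # smarter than
```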
4. search function
Syntax: re.search(pattern, string, flags=0)
pattern #the regular expression to be matched
string #the string that is searched; the pattern may match anywhere in the string
flags #the same as for match()
Returns a match object on success, None on failure.
The group() method of the returned match object works the same way as for match().
import re

line = "Cats are smarter than dogs."

searchObj = re.search(r'(.*) are (.*?) .*', line, re.M|re.I)

if searchObj:
    print("searchObj.group(): ", searchObj.group())
    print("searchObj.group(1): ", searchObj.group(1))
    print("searchObj.group(2): ", searchObj.group(2))
else:
    print("no match")
5. Match VS Search
match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string.
import re

line = "Cats are smarter than dogs."

searchObj = re.search(r'dogs', line, re.M|re.I)
matchObj = re.match(r'dogs', line, re.M|re.I)

#search finds "dogs" even though it is not at the start of the string
if searchObj:
    print("searchObj.group(): ", searchObj.group())
else:
    print("no match")

#match fails because the string does not begin with "dogs"
if matchObj:
    print("matchObj.group(): ", matchObj.group())
else:
    print("no match")
When the code is executed, it produces the following result:
searchObj.group():  dogs
no match
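The re.M flag does not change this behaviour of match. As a sketch with a made-up two-line string: match still looks only at the very beginning of the string, whereas search with ^ and re.M can find a match at the start of a later line.

```python
import re

text = "Cats are smarter\ndogs are loyal"

# match only tries the beginning of the whole string, even with re.M
m = re.match(r'dogs', text, re.M)
# search with ^ and re.M can match at the start of the second line
s = re.search(r'^dogs', text, re.M)

print(m)          # None
print(s.group())  # dogs
```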
6. sub
Syntax: re.sub(pattern, repl, string, count=0)
This method replaces occurrences of the RE pattern in string with repl.
All occurrences are replaced unless count is given, in which case at most count replacements are made.
This method returns the modified string.
Example:
import re

phone = "32580-110-517 #nhmhhh"

#Delete the Python-style comment
num = re.sub(r'#.*$', "", phone)
print("phone num:", num)

#Delete non-digit characters (\D matches any character that is not a digit)
num = re.sub(r'\D', "", phone)
print("phone num:", num)
When the above code is executed, it produces the following result:
phone num: 32580-110-517
phone num: 32580110517
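To limit the number of replacements, re.sub accepts a count argument. A minimal sketch, reusing the phone number from the example above without its comment:

```python
import re

phone = "32580-110-517"

# count=1 replaces only the first hyphen
first_only = re.sub(r'-', "", phone, count=1)
# with the default count=0, every hyphen is replaced
all_hyphens = re.sub(r'-', "", phone)

print(first_only)   # 32580110-517
print(all_hyphens)  # 32580110517
```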
7. Regular Expression Modifiers: Option flags
You can provide multiple modifiers by combining them with bitwise OR (|).
re.I #Performs case-insensitive matching.
re.L #Interprets words according to the current locale.
re.M #Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string).
re.S #Makes a period (dot) match any character, including a newline.
re.U #Interprets letters according to the Unicode character set.
re.X #Permits "cuter" regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats unescaped # as a comment marker.
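The effect of several of these flags can be sketched as follows. The sample text and the variable names are made up for illustration; only the re functions and flags themselves come from the module.

```python
import re

text = "Cats are smarter\nthan dogs"

# re.I: 'cats' matches 'Cats'
ci = re.search(r'cats', text, re.I).group()

# re.M: ^ matches at the start of every line, not only the start of the string
starts = re.findall(r'^\w+', text, re.M)

# re.S: without it the dot does not match the newline between the two lines
dot_default = re.search(r'smarter.than', text)
dot_all = re.search(r'smarter.than', text, re.S)

# Flags are combined with bitwise OR
combined = re.findall(r'^[ct]\w+', text, re.I | re.M)

# re.X: whitespace and # comments inside the pattern are ignored
phone_re = re.compile(r"""
    (\d+)   # first block of digits
    -       # a literal hyphen
    (\d+)   # second block of digits
""", re.X)
verbose = phone_re.search("call 110-517 now").groups()

print(ci)        # Cats
print(starts)    # ['Cats', 'than']
print(combined)  # ['Cats', 'than']
print(verbose)   # ('110', '517')
```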
8. Regular Expression Patterns
https://www.tutorialspoint.com/python/python_reg_expressions.htm