Python Regular Expressions


Regular expressions
prog = re.compile(pattern)
result = prog.match(string)
pattern: the regular expression
string: the string to match the regular expression against

re.compile(pattern) returns a regular expression object.
Call regex.search(string[, pos[, endpos]]) or regex.match(string[, pos[, endpos]]) on it to match string.

Alternatively, in a single call: result = re.match(pattern, string)
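As a quick check that the two forms give the same result, here is a minimal sketch. The pattern and the sample string are made up for illustration and do not come from the original post.

```python
import re

# Hypothetical pattern and input, chosen only for illustration.
pattern = r"\d+"
string = "2014 was the year"

# Form 1: compile once, then call match() on the compiled object.
prog = re.compile(pattern)
result_compiled = prog.match(string)

# Form 2: let the re module compile the pattern for this one call.
result_direct = re.match(pattern, string)

print(result_compiled.group(0))  # 2014
print(result_direct.group(0))    # 2014
```

Compiling first is mainly worthwhile when the same pattern is reused many times in one program.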
1. Greedy and non-greedy modes
  
  # coding=utf-8
  import re

  rep = """eth0 Link encap:Ethernet HWaddr 78:2B:CB:11:2E:19
  inet addr:10.180.45.1 Bcast:10.180.47.255 Mask:255.255.240.0
  Interrupt:98 Memory:da000000-da012800
  eth1 Link encap:Ethernet HWaddr 78:2B:CB:11:2E:1A
  inet addr:200.1.1.1 Bcast:200.255.255.255 Mask:255.0.0.0
  Interrupt:106 Memory:dc000000-dc012800
  lo Link encap:Local Loopback
  inet addr:127.0.0.1 Mask:255.0.0.0
  """

  # Greedy mode: .+ takes as much as it can, so the match runs to the last "Interrupt:"
  regx = re.compile(r'.+Interrupt:(\d{1,10})', re.S)

  # Non-greedy mode: .+? stops at the first "Interrupt:"
  # (this assignment replaces the one above; comment it out to see the greedy result)
  regx = re.compile(r'.+?Interrupt:(\d{1,3})', re.S)

  match = regx.match(rep)

  if match:
      print(match.group(0))
  
  Output:
  # Greedy mode
  eth0 Link encap:Ethernet HWaddr 78:2B:CB:11:2E:19
  inet addr:10.180.45.1 Bcast:10.180.47.255 Mask:255.255.240.0
  Interrupt:98 Memory:da000000-da012800
  eth1 Link encap:Ethernet HWaddr 78:2B:CB:11:2E:1A
  inet addr:200.1.1.1 Bcast:200.255.255.255 Mask:255.0.0.0
  Interrupt:106

  # Non-greedy mode
  eth0 Link encap:Ethernet HWaddr 78:2B:CB:11:2E:19
  inet addr:10.180.45.1 Bcast:10.180.47.255 Mask:255.255.240.0
  Interrupt:98
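The same difference shows up on a much shorter input. The sketch below uses a made-up one-line string (an assumption for illustration) and prints only the captured group, so it is easy to see which "Interrupt:" each pattern ends on.

```python
import re

# Hypothetical one-line sample, not real ifconfig output.
text = "a Interrupt:98 b Interrupt:106 c"

# Greedy: .+ grabs everything, then backtracks to the LAST "Interrupt:".
greedy = re.match(r".+Interrupt:(\d+)", text)

# Non-greedy: .+? grows one character at a time and stops at the FIRST "Interrupt:".
lazy = re.match(r".+?Interrupt:(\d+)", text)

print(greedy.group(1))  # 106
print(lazy.group(1))    # 98
```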
2. re.compile(pattern, flags=0)
  
  flags: several flags can be combined with bitwise OR, e.g. re.A | re.I
  
  Returns a regular expression object.
  
  regx = re.compile(r'.+?Interrupt:(\d{1,3})', re.S)
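To illustrate combining flags, here is a small sketch that uses re.S together with re.I. The two-line sample text and the pattern are assumptions made up for this example.

```python
import re

# Hypothetical two-line sample.
text = "first line\nSECOND line"

# Without flags, "." stops at a newline and matching is case-sensitive.
no_flags = re.match(r"first.+second", text)
print(no_flags)  # None

# re.S lets "." match newlines; re.I ignores case. Combine them with "|".
regx = re.compile(r"first.+second", re.S | re.I)
with_flags = regx.match(text)
print(repr(with_flags.group(0)))  # 'first line\nSECOND'
```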
3. match Object
  
  regex.search(string[, pos[, endpos]])
  regex.match(string[, pos[, endpos]])
  Both return a match object.
  
  The attributes, methods, and example below are taken from AstralWind's blog:
  
  http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html
  
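  Attributes:
  
      string: the text that was matched against.
      re: the Pattern object used for the match.
      pos: the index in the text where the regex engine started searching. Same value as the pos argument of Pattern.match() and Pattern.search().
      endpos: the index in the text where the regex engine stopped searching. Same value as the endpos argument of Pattern.match() and Pattern.search().
      lastindex: the number of the last capturing group that matched (a group index, not a position in the text). None if no group matched.
      lastgroup: the name of the last capturing group that matched. None if that group has no name or if no group matched.
  
  Methods:
  
      group([group1, ...]):
          Returns the string captured by one or more groups; with several arguments the result is a tuple. group1 may be a group number or a group name; number 0 stands for the whole match. With no arguments, returns group(0). A group that captured nothing yields None; a group that captured several times yields its last capture.
      groups([default]):
          Returns a tuple of the strings captured by all groups, equivalent to group(1, 2, ..., last). default is the value used for groups that captured nothing; it defaults to None.
      groupdict([default]):
          Returns a dictionary that maps each named group's name to the substring it captured. Unnamed groups are not included. default has the same meaning as above.
      start([group]):
          Returns the start index in string of the substring captured by the given group (the index of its first character). group defaults to 0.
      end([group]):
          Returns the end index in string of the substring captured by the given group (the index of its last character + 1). group defaults to 0.
      span([group]):
          Returns (start(group), end(group)).
      expand(template):
          Substitutes the captured groups into template and returns the result. template may refer to groups as \id, \g<id>, or \g<name>. \0 is not a group reference; use \g<0> for the whole match.
  
      \id and \g<id> are equivalent, but \10 is read as group 10. To write group 1 followed by the character '0', use \g<1>0.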
  
  import re

  m = re.match(r'(\w+) (\w+)(?P<sign>.*)', 'hello world!')

  print("m.string:", m.string)
  print("m.re:", m.re)
  print("m.pos:", m.pos)
  print("m.endpos:", m.endpos)
  print("m.lastindex:", m.lastindex)
  print("m.lastgroup:", m.lastgroup)

  print("m.group(1,2):", m.group(1, 2))
  print("m.groups():", m.groups())
  print("m.groupdict():", m.groupdict())
  print("m.start(2):", m.start(2))
  print("m.end(2):", m.end(2))
  print("m.span(2):", m.span(2))
  print(r"m.expand(r'\2 \1\3'):", m.expand(r'\2 \1\3'))

  ### output ###
  # m.string: hello world!
  # m.re: re.compile('(\\w+) (\\w+)(?P<sign>.*)')
  # m.pos: 0
  # m.endpos: 12
  # m.lastindex: 3
  # m.lastgroup: sign
  # m.group(1,2): ('hello', 'world')
  # m.groups(): ('hello', 'world', '!')
  # m.groupdict(): {'sign': '!'}
  # m.start(2): 6
  # m.end(2): 11
  # m.span(2): (6, 11)
  # m.expand(r'\2 \1\3'): world hello!
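The default argument and the \g<1>0 form are easier to see when a group does not take part in the match. The following sketch assumes a hypothetical variant of the pattern above in which the third group is optional.

```python
import re

# Hypothetical pattern: the named third group is optional and does not match here.
m = re.match(r"(\w+) (\w+)(?P<sign>!)?", "hello world")

print(m.groups())          # ('hello', 'world', None)
print(m.groups("n/a"))     # ('hello', 'world', 'n/a')
print(m.groupdict("n/a"))  # {'sign': 'n/a'}
print(m.lastindex)         # 2, because group 3 did not participate

# \g<1>0 means "group 1, then a literal 0"; \10 would mean group 10.
print(m.expand(r"\g<1>0"))  # hello0
```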
  
  　
4. regular expression object
  
  A regular expression object is a compiled regular expression. Its methods are used to match and search text.
  
  A regular expression object cannot be instantiated directly; it must be built with re.compile().


regex.search(string[, pos[, endpos]])

Scan through string looking for a location where this regular expression produces a match,
and return a corresponding match object. Return None if no position in the string matches
the pattern; note that this is different from finding a zero-length match at some point in the string.

The optional second parameter pos gives an index in the string where the search is to start;
it defaults to 0. This is not completely equivalent to slicing the string; the '^' pattern character
matches at the real beginning of the string and at positions just after a newline, but not necessarily at
the index where the search is to start.
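The difference between passing pos and slicing the string can be seen in a short sketch. The pattern and inputs below are made up for illustration.

```python
import re

regx = re.compile(r"^b")

# pos=1 starts the search at "b", but "^" still refers to the real start of the string.
with_pos = regx.search("ab", 1)
print(with_pos)  # None

# Slicing creates a new string whose real start is "b", so "^" matches.
sliced = regx.search("ab"[1:])
print(sliced.span())  # (0, 1)

# In MULTILINE mode "^" also matches just after a newline.
multi = re.compile(r"^b", re.M)
after_newline = multi.search("a\nb", 2)
print(after_newline.span())  # (2, 3)
```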

regex.match(string[, pos[, endpos]])
If zero or more characters at the beginning of string match this regular expression, return a
match object. Return None if the string does not match the pattern; note that this is different
from a zero-length match.
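A zero-length match and a failed match are easy to confuse, so here is a minimal sketch of both cases using made-up patterns and input.

```python
import re

# "\d*" can match zero digits, so this is a successful zero-length match.
zero_length = re.match(r"\d*", "abc")
print(zero_length is not None, repr(zero_length.group(0)), zero_length.span())
# True '' (0, 0)

# "\d+" needs at least one digit at the start, so there is no match at all.
no_match = re.match(r"\d+", "abc")
print(no_match)  # None
```

A match object is always truthy, so a test such as `if zero_length:` succeeds even though the matched text is empty.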

match VS search
Python offers two different primitive operations based on regular expressions: re.match() checks
for a match only at the beginning of the string, while re.search() checks for a match anywhere in the
string (this is what Perl does by default).
>>> re.match("c", "abcdef")    # No match
>>> re.search("c", "abcdef")   # Match
<_sre.SRE_Match object; span=(2, 3), match='c'>
Regular expressions beginning with '^' can be used with search() to restrict the match at the beginning of the string:
>>> re.match("c", "abcdef")    # No match
>>> re.search("^c", "abcdef")  # No match
>>> re.search("^a", "abcdef")  # Match
<_sre.SRE_Match object; span=(0, 1), match='a'>
Note however that in MULTILINE mode match() only matches at the beginning of the string, whereas using search() with a regular expression beginning with '^' will match at the beginning of each line.
>>> re.match('X', 'A\nB\nX', re.MULTILINE)    # No match
>>> re.search('^X', 'A\nB\nX', re.MULTILINE)  # Match
<_sre.SRE_Match object; span=(4, 5), match='X'>
Original article: https://www.cnblogs.com/AlexBai326/p/4090663.html