
Regular Expressions (Part 1)

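The regular expression metacharacters are . ^ $ * + ? { } [ ] \ | ( )

. matches any character except a newline (in DOTALL mode it matches a newline as well).

$ matches at the end of the string, or just before a newline at the very end of the string. For example, the regular expression weasel$ matches the end of the string "He's a weasel", but it does not match the string "They are a bunch of weasels."

In MULTILINE mode, "$" also matches just before each newline.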
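As a quick check of the $ behaviour described above, here is a minimal sketch. The two-line sample text is made up for illustration and is not from the original notes.

```python
import re

# "$" anchors at the end of the string, so the trailing "s." blocks the match.
singular = re.search(r"weasel$", "He's a weasel")
plural = re.search(r"weasel$", "They are a bunch of weasels.")

# Illustrative two-line text (an assumption, not from the original post).
text = "a weasel\nanother weasel"
default_hits = re.findall(r"weasel$", text)                 # end of string only
multiline_hits = re.findall(r"weasel$", text, re.MULTILINE)  # end of every line

print(singular is not None, plural is None)
print(len(default_hits), len(multiline_hits))
```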

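[] matches a specified character class. A character class is the set of characters you want to match, and the characters in the set are alternatives (an "or" relationship). For example, [adgk] matches any one of a, d, g, or k.

( ) groups a subexpression. Combined with |, as in (red|blue|green), it matches any one of the listed alternatives.

^ as the first character inside [] negates the class: [^5] matches any character except 5.

At the start of a pattern, ^ matches the beginning of the string: "^ab" matches strings that start with ab.

Inside [], a ^ that is not the first character stands for itself: [5^] matches either 5 or a literal ^.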
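The three roles of ^, together with character classes and alternation, can be tried out with a short sketch like the one below. The sample strings are illustrative assumptions.

```python
import re

# A class matches one character from the set a, d, g, k.
class_hits = re.findall(r"[adgk]", "badge")

# A leading ^ negates the class: everything except "5".
not_five = re.findall(r"[^5]", "1555a")

# A ^ that is not first inside the class is a literal caret.
caret_or_five = re.findall(r"[5^]", "2^5")

# Outside a class, ^ anchors the match to the start of the string.
anchored = re.search(r"^ab", "abc")
unanchored = re.search(r"^ab", "cab")

# Alternation inside a group picks one of the listed options.
colour = re.search(r"(red|blue|green)", "a blue door").group(1)

print(class_hits, not_five, caret_or_five, colour)
```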

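Metacharacters that repeat (quantifiers):

* repeats the preceding item (a character, class, or group) 0 or more times

+ repeats the preceding item 1 or more times

? repeats the preceding item 0 or 1 times

{m,n} repeats the preceding item between m and n times, trying to match as many copies as possible (n is tried first). Note that {0,} = *, {1,} = +, and {0,1} = ?

{m,n}? matches m to n copies of the preceding regular expression, trying to match as few copies as possible

{m} repeats the preceding item exactly m times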
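The difference between the greedy and the non-greedy forms, and the equivalences listed above, can be verified with a small sketch. The test strings are illustrative.

```python
import re

sample = "aaaaaaaa"  # eight a's

# Greedy: each match takes as many copies as allowed (4, then 4).
greedy = re.findall(r"a{2,4}", sample)
# Non-greedy: each match takes as few copies as allowed (2 at a time).
lazy = re.findall(r"a{2,4}?", sample)

# The shorthand quantifiers behave like their {m,n} spellings.
probe = "a ab abb"
star_same = re.findall(r"ab*", probe) == re.findall(r"ab{0,}", probe)
plus_same = re.findall(r"ab+", probe) == re.findall(r"ab{1,}", probe)
opt_same = re.findall(r"ab?", probe) == re.findall(r"ab{0,1}", probe)

# {m} demands exactly m copies.
exactly_three = re.fullmatch(r"a{3}", "aaa")
not_three = re.fullmatch(r"a{3}", "aaaa")

print(greedy, lazy, star_same, plus_same, opt_same)
```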

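\d matches any decimal digit; it is equivalent to the class [0-9].

\D matches any non-digit character; it is equivalent to the class [^0-9].

\s matches any whitespace character; it is equivalent to the class [ \t\n\r\f\v].

\S matches any non-whitespace character; it is equivalent to the class [^ \t\n\r\f\v].

\w matches any alphanumeric character; it is equivalent to the class [a-zA-Z0-9_].

\W matches any non-alphanumeric character; it is equivalent to the class [^a-zA-Z0-9_].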

http://www.w3school.com.cn/jsref/jsref_regexp_onemore.asp
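A minimal sketch of the shorthand classes in Python follows. The sample string is an ASCII-only illustration, which is the case where the class equivalences listed above apply directly.

```python
import re

sample = "id 42\tok_1!"  # illustrative ASCII text with a space and a tab

digits = re.findall(r"\d", sample)
spaces = re.findall(r"\s", sample)
words = re.findall(r"\w+", sample)
non_words = re.findall(r"\W+", sample)
non_spaces = re.findall(r"\S+", sample)

# On ASCII input the shorthands agree with the explicit classes.
same_digits = digits == re.findall(r"[0-9]", sample)
same_words = words == re.findall(r"[a-zA-Z0-9_]+", sample)

print(digits, spaces, words, non_words, non_spaces)
```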

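Note:

How to write a pattern: the expected character + a quantity.

. any character is acceptable here

\s a whitespace character is required here

\s+ one or more whitespace characters are required here

.{0,10} 0 to 10 characters of any kind are matched here

When a character in the target string is itself a metacharacter, prefix it with \ so that it stands for the literal character. For example, to match the value [main], write \[main\] in the regular expression search.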
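To see why the backslashes matter for [main], here is a small sketch. The log line is a made-up example; re.escape is the standard-library helper that adds the backslashes for you.

```python
import re

line = "[main] started"  # illustrative input

# Without escaping, [main] is a character class: one of m, a, i, n.
unescaped = re.search(r"[main]", line).group()

# With escaping, the brackets are matched literally.
escaped = re.search(r"\[main\]", line).group()

# re.escape builds the escaped pattern from the literal text.
auto = re.search(re.escape("[main]"), line).group()

print(repr(unescaped), repr(escaped), repr(auto))
```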

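\[.{40}\]\s+Method.{0,100}groups.{0,12}config.{0,2} Replacing matches of this pattern with nothing deletes the corresponding lines, which cleans up a log so that it is easier to read.

\[.{40}\]\s+Checking.{0,75}groups.{0,12}config.{0,2}

\[2015.{40,100}\]\s+Waiting\s+for\s+(Input|element)\s+(xpath=|id=).{0,100}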
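The log clean-up idea can be sketched in Python roughly as follows. The helper name drop_noise and the sample log lines are hypothetical, and the pattern is the first one above with its backslashes written out (an assumption about the intended escaping). The notes describe a search-and-replace in an editor; this sketch filters lines instead.

```python
import re

# First pattern from the notes, assuming \[ \] and \s were intended.
METHOD_NOISE = re.compile(r"\[.{40}\]\s+Method.{0,100}groups.{0,12}config.{0,2}")


def drop_noise(log_text, pattern=METHOD_NOISE):
    """Return log_text without the lines that the pattern matches."""
    kept = [line for line in log_text.splitlines() if not pattern.search(line)]
    return "\n".join(kept)


# Hypothetical log: the bracketed prefix is padded to exactly 40 characters.
prefix = "[" + "2015-01-01 00:00:00 worker".ljust(40, ".") + "]"
sample_log = "\n".join([
    prefix + " Method load reads groups from config",
    prefix + " Test step passed",
])

cleaned = drop_noise(sample_log)
print(cleaned)
```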

The following shows how to use Python's re module with regular expressions:

# coding=utf-8
'''
Created on 2014-06-4
@author: jennifer.huang
'''

import re


print("****re.search()** search and match return at the first match instead of finding every match, and match only matches at the start of the string ***")
m = re.search(r"^ab+", "asdfabbbb")  # ^ anchors to the start, so this is None
print(1, m)
m = re.search(r"ab+", "asdfabbbb")
print(2, m)
print(3, m.group())
m = re.search(r"[^abc]", "abcd")  # ^ negates the class
print(4, m.group())


m = re.match(r"ab+", "asdfabbbb")  # unlike search, match only tries position 0
print(5, m)
m = re.match(r"ab+", "abbbb")
print(6, m.group())

m = re.search(r"^a\w+", "abcdfa\na1b2c3", re.MULTILINE)  # returns the first match, unlike findall below
print(7, m.group())
m = re.search(r"foo.$", "foo1\nfoo2\n", re.MULTILINE)
print(8, m.group())
m = re.findall(r"^a\w+", "abcdfa\na1b2c3", re.MULTILINE)  # returns a list
print(9, m)
m = re.findall(r"foo.$", "foo1\nfoo2\n", re.MULTILINE)  # in MULTILINE mode $ also matches before each newline
print(10, m)
m = re.findall(r"foo.$", "foo1\nfoo2\n")  # by default $ matches only at the end of the string
print(11, m)

print(12, re.findall(r"a{2,4}", "aaaaaaaa"))   # m to n copies, as many as possible
print(13, re.findall(r"a{2,4}?", "aaaaaaaa"))  # m to n copies, as few as possible

print(14, re.match(r".", "\n"))  # by default "." matches any character except a newline
print(15, re.match(r".", "\n", re.DOTALL).group())  # in DOTALL mode "." matches newlines too


m = re.match(r"(\w+) (\w+)", "abcd efgh, chaj")
print(16, m.group())      # the whole match
print(17, m.group(1))     # the first parenthesized subgroup
print(18, m.group(2))
print(19, m.group(1, 2))  # several arguments return a tuple
m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Jennifer Huang")
print(20, m.group("first_name"))  # fetch a named subgroup with group()
print(21, m.group("last_name"))
m = re.match(r"\w+ \w+", "abcd efgh, chaj")  # same pattern without the parentheses
print(22, m.group())
# print(m.group(1, 2))  # IndexError: no such group; compare with 19, which has the parentheses

m = re.match(r"(\d+)\.(\d+)", "23.123")
print(23, m.groups())


m = re.match(r"(\w+) (\w+)", "hello world")
print(24, m.groupdict())  # groupdict() ignores unnamed subgroups, so this is {}
m = re.match(r"(?P<first>\w+) (?P<second>\w+)", "hello world")
print(25, m.groupdict())

print(26, re.split(r"\W+", "words,words,works", maxsplit=1))
print(27, re.split(r"[a-z]", "OA3b9z", flags=re.IGNORECASE))  # flags must be passed by keyword

print(28, re.sub(r"\d", "RE", "abc1def2hijk"))
print(29, re.subn(r"\d", "RE", "abc1def2hijk"))
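One detail of re.split is easy to trip over: its third positional parameter is maxsplit, not flags, so a flag such as re.IGNORECASE passed positionally would be read as a split count. The sketch below, using the same sample strings as the script, passes both by keyword to make the intent explicit.

```python
import re

# re.IGNORECASE is an integer-valued flag, which is why it can be
# silently mistaken for a maxsplit count when passed positionally.
flag_value = int(re.IGNORECASE)

# Case-insensitive split: every letter is a separator.
ignore_case = re.split(r"[a-z]", "OA3b9z", flags=re.IGNORECASE)

# Case-sensitive split: only the lowercase letters separate.
case_sensitive = re.split(r"[a-z]", "OA3b9z")

# Limit the number of splits with an explicit maxsplit keyword.
limited = re.split(r"\W+", "words,words,works", maxsplit=1)

print(flag_value, ignore_case, case_sensitive, limited)
```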

Original article: https://www.cnblogs.com/jenniferhuang/p/3768059.html