Regular Expressions

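I have been running into regular expressions in quite a few places lately, and I have never been very comfortable with them. So today I am writing down what I know, partly to summarize what I have learned and partly so that I can review it quickly when I forget.

To take advantage of regular expressions in Java, you first need to understand two classes from the java.util.regex package. Both are introduced below: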

I. Pattern

API description:

A compiled representation of a regular expression.

A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.

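
The point that many matchers can share one pattern can be illustrated with a short sketch. The class name SharedPatternDemo and the helper isThreeLetters are made up for this illustration; only Pattern.compile, Pattern.matcher and Matcher.matches come from the API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SharedPatternDemo {
    // Compile once; the compiled Pattern holds no match state and can be reused.
    static final Pattern THREE_LETTERS = Pattern.compile("[a-z]{3}");

    // Hypothetical helper: each call creates its own Matcher, which holds the match state.
    static boolean isThreeLetters(String input) {
        return THREE_LETTERS.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Two independent matchers created from the same compiled pattern.
        Matcher first = THREE_LETTERS.matcher("abc");
        Matcher second = THREE_LETTERS.matcher("abcd");
        System.out.println(first.matches());   // true
        System.out.println(second.matches());  // false
    }
}
```

Keeping the compiled Pattern in a constant avoids recompiling the same expression on every call.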

II. Matcher

API description:

A matcher is created from a pattern by invoking the pattern's matcher method. Once created, a matcher can be used to perform three different kinds of match operations:
- The matches method attempts to match the entire input sequence against the pattern.
- The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.
- The find method scans the input sequence looking for the next subsequence that matches the pattern.
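
The difference between the three match operations is easiest to see side by side. The following is a minimal sketch; the class MatchKindsDemo and its helper methods are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchKindsDemo {
    static final Pattern WORD = Pattern.compile("[a-z]+");

    // matches(): the entire input must match the pattern.
    static boolean wholeInput(String s) {
        return WORD.matcher(s).matches();
    }

    // lookingAt(): the match must start at the beginning, but may stop early.
    static boolean prefix(String s) {
        return WORD.matcher(s).lookingAt();
    }

    // find(): scans forward and returns each matching subsequence in turn.
    static List<String> allMatches(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "abc 123 def";
        System.out.println(wholeInput(input));  // false
        System.out.println(prefix(input));      // true
        System.out.println(allMatches(input));  // [abc, def]
    }
}
```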
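Using regular expressions:

Start with a String that holds the regular expression, then compile it and create a matcher:

1. String regular = "[a-z]{3}"; // a string of exactly three letters from a to z

2. Pattern p = Pattern.compile(regular); // compile the string into a pattern

3. Matcher m = p.matcher("asd"); // create a matcher for the input "asd"; all match state lives in the returned Matcher object

The returned Matcher object can then be used for a whole range of operations.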
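
Put together, the three steps might look like the sketch below. The class ThreeStepsDemo and the method matchesThreeLetters are illustrative names. The last line uses the static Pattern.matches(regex, input) shortcut, which compiles and matches in one call and is handy for one-off checks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeStepsDemo {
    static boolean matchesThreeLetters(String input) {
        String regular = "[a-z]{3}";           // step 1: the expression as a String
        Pattern p = Pattern.compile(regular);  // step 2: compile it into a Pattern
        Matcher m = p.matcher(input);          // step 3: create a Matcher for the input
        return m.matches();                    // true only if the whole input matches
    }

    public static void main(String[] args) {
        System.out.println(matchesThreeLetters("asd"));          // true
        System.out.println(matchesThreeLetters("asdf"));         // false
        System.out.println(Pattern.matches("[a-z]{3}", "asd"));  // true
    }
}
```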

Code examples:

1. Basic use of the Matcher class
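package regularexpression;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression {

    public static void main(String[] args) {
        Pattern p = Pattern.compile("cat");
        Matcher m = p.matcher("one cat two cats in the yard");

        // matches() succeeds only if the entire input matches the pattern, so this prints false.
        pr("matches(): does the whole string match? " + m.matches());

        // Reset so that find() scans from the start of the input regardless of the call above.
        m.reset();

        // find() looks for the next matching subsequence and returns false at the end of the input.
        while (m.find()) {
            pr("find(): found a subsequence that matches the pattern");
            pr("group(): the matched substring: " + m.group());
            pr("start() and end(): start and end index of the match in the input: "
                    + m.start() + "->" + m.end());
        }
    }

    public static void pr(String str) {
        System.out.println(str);
    }
}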
2. Advanced use: replacing and modifying strings
Pattern p = Pattern.compile("cat");
Matcher m = p.matcher("one cat two cats in the yard");
pr(m.replaceAll("dog")); // prints: one dog two dogs in the yard
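replaceAll(String) is simple but inflexible, because it always replaces every match with the same string. Replacing only some of the matches, or replacing different matches with different strings, is hard to do with it. For that, use the more flexible replacement methods:

appendReplacement() and appendTail() work together to replace matches one at a time.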
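Pattern p = Pattern.compile("cat");
Matcher m = p.matcher("one cat two cats in the yard");
int index = 0;
StringBuffer sb = new StringBuffer();
while (m.find()) {
    if (index == 0) {
        m.appendReplacement(sb, "dog");   // the first match becomes "dog"
        index++;
    } else {
        m.appendReplacement(sb, "duck");  // every later match becomes "duck"
    }
}
m.appendTail(sb);    // append the text after the last match to sb
pr(sb.toString());   // prints: one dog two ducks in the yard
This makes replacement flexible: each match can be handled on its own.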
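
The same loop generalizes to other selective replacements. As a sketch, the helper below replaces only the n-th match and leaves the others untouched. The class ReplaceNthDemo and the method replaceNth are hypothetical names; Matcher.quoteReplacement is used so that any '$' or '\' in the inserted text is taken literally instead of being read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceNthDemo {
    // Replaces only the n-th match (1-based) of regex in input; other matches stay as they are.
    static String replaceNth(String input, String regex, int n, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        int count = 0;
        while (m.find()) {
            count++;
            // For matches that should not change, put the matched text back.
            String text = (count == n) ? replacement : m.group();
            m.appendReplacement(sb, Matcher.quoteReplacement(text));
        }
        m.appendTail(sb);  // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: one cat two dogs in the yard
        System.out.println(replaceNth("one cat two cats in the yard", "cat", 2, "dog"));
    }
}
```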

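3. Finally, here is a small code-counting tool I wrote. It counts code lines, blank lines and comment lines (only // comments are handled; /* */ comments are not).
CodeCount.java

package codecount;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CodeCount {

    // A line that holds only a // comment, optionally indented.
    public static final String REGULARS_ANNOTATION = "^[ \\t]*[/]{2}.*";
    // A line that is empty or holds only spaces and tabs.
    public static final String REGULARS_BLANK = "[ \\t]*";
    // A line with no '/' characters, except optionally a single one at the very end.
    public static final String REGULARS_CODE = "[ \\t]*[^/]+[/]?";

    // Returns true if the whole string matches the regular expression.
    public static boolean judge(String str, String regex) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(str);
        return m.matches();
    }
}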
Test.java

package codecount;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class Test {

    public static void main(String[] args) {
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader br = new BufferedReader(
                new FileReader("C:\\Users\\Java\\Desktop\\code.java"))) {
            String str;
            int blank = 0;
            int code = 0;
            int annotation = 0;
            while (null != (str = br.readLine())) {
                // else-if puts each line in at most one category; a whitespace-only
                // line would otherwise match REGULARS_CODE as well as REGULARS_BLANK.
                // Lines that match none of the patterns (for example code followed
                // by a // comment) are not counted.
                if (CodeCount.judge(str, CodeCount.REGULARS_ANNOTATION)) {
                    annotation++;
                } else if (CodeCount.judge(str, CodeCount.REGULARS_BLANK)) {
                    blank++;
                } else if (CodeCount.judge(str, CodeCount.REGULARS_CODE)) {
                    code++;
                    System.out.println(str);
                }
            }
            System.out.println("annotation=" + annotation + " line.");
            System.out.println("blank=" + blank + " line.");
            System.out.println("code=" + code + " line.");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
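
To try the classification without a file on disk, the idea can be sketched on in-memory lines. This is a simplified variant and not the tool above: the class LineClassifierDemo, the enum Kind and the method classify are invented for the example, and it assumes that every line that is neither a // comment nor blank counts as code.

```java
import java.util.regex.Pattern;

class LineClassifierDemo {
    enum Kind { COMMENT, BLANK, CODE }

    // Optional indentation followed by a // comment.
    static final Pattern COMMENT_LINE = Pattern.compile("[ \\t]*//.*");
    // Empty, or only spaces and tabs.
    static final Pattern BLANK_LINE = Pattern.compile("[ \\t]*");

    // Assumption of this sketch: anything that is not a comment or blank line is code.
    static Kind classify(String line) {
        if (COMMENT_LINE.matcher(line).matches()) {
            return Kind.COMMENT;
        }
        if (BLANK_LINE.matcher(line).matches()) {
            return Kind.BLANK;
        }
        return Kind.CODE;
    }

    public static void main(String[] args) {
        String[] lines = {
            "// header", "", "   ", "int a = 1;", "\t// note", "int b = a / 2; // half"
        };
        int[] counts = new int[Kind.values().length];
        for (String line : lines) {
            counts[classify(line).ordinal()]++;
        }
        // prints: comment=2 blank=2 code=2
        System.out.println("comment=" + counts[Kind.COMMENT.ordinal()]
                + " blank=" + counts[Kind.BLANK.ordinal()]
                + " code=" + counts[Kind.CODE.ordinal()]);
    }
}
```

Checking the comment pattern first and falling through to code last means each line lands in exactly one category.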
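Regular expression syntax reference:

Characters
x    The character x
\\    The backslash character
\0n    The character with octal value 0n (0 <= n <= 7)
\0nn    The character with octal value 0nn (0 <= n <= 7)
\0mnn    The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh    The character with hexadecimal value 0xhh
\uhhhh    The character with hexadecimal value 0xhhhh
\t    The tab character ('\u0009')
\n    The newline (line feed) character ('\u000A')
\r    The carriage-return character ('\u000D')
\f    The form-feed character ('\u000C')
\a    The alert (bell) character ('\u0007')
\e    The escape character ('\u001B')
\cx    The control character corresponding to x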
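
One practical detail when using these escapes from Java source: the backslash is also the escape character of Java string literals, so each regex backslash has to be written twice. A small sketch (the class EscapeDemo and the helper matches are illustrative names):

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Thin wrapper around Pattern.matches, only to keep the lines below short.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Four backslashes in source give a two-backslash regex, which matches one backslash.
        System.out.println(matches("\\\\", "\\"));    // true
        // The regex escape for a tab matches a real tab character.
        System.out.println(matches("\\t", "\t"));     // true
        // Hexadecimal 41, the same value as a four-digit escape, and octal 101 all denote 'A'.
        System.out.println(matches("\\x41", "A"));    // true
        System.out.println(matches("\\u0041", "A"));  // true
        System.out.println(matches("\\0101", "A"));   // true
    }
}
```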

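Character classes
[abc]    a, b, or c (simple class)
[^abc]    Any character except a, b, or c (negation)
[a-zA-Z]    a through z or A through Z, inclusive (range)
[a-d[m-p]]    a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)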
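
Union, intersection and subtraction are easy to check on single characters. The sketch below uses an illustrative class CharClassDemo with a helper named is.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True if the single-character string s is accepted by the character class.
    static boolean is(String charClass, String s) {
        return Pattern.matches(charClass, s);
    }

    public static void main(String[] args) {
        System.out.println(is("[^abc]", "d"));        // true: negation
        System.out.println(is("[a-d[m-p]]", "n"));    // true: union
        System.out.println(is("[a-d[m-p]]", "e"));    // false: e is in neither range
        System.out.println(is("[a-z&&[def]]", "e"));  // true: intersection
        System.out.println(is("[a-z&&[^bc]]", "b"));  // false: b is subtracted
        System.out.println(is("[a-z&&[^bc]]", "a"));  // true
    }
}
```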

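Predefined character classes
.    Any character (may or may not match line terminators)
\d    A digit: [0-9]
\D    A non-digit: [^0-9]
\s    A whitespace character: [ \t\n\x0B\f\r]
\S    A non-whitespace character: [^\s]
\w    A word character: [a-zA-Z_0-9]
\W    A non-word character: [^\w]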
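
The predefined classes also work in the String methods that take a regular expression, such as split and replaceAll. A brief sketch, with PredefinedDemo and its three helpers as made-up names:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PredefinedDemo {
    // One or more digits and nothing else.
    static boolean isDigits(String s) {
        return Pattern.matches("\\d+", s);
    }

    // Split on runs of whitespace (spaces, tabs, newlines).
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // Remove every character that is not a letter, digit or underscore.
    static String stripNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(isDigits("2016"));                         // true
        System.out.println(isDigits("20a6"));                         // false
        System.out.println(Arrays.toString(words("one  cat\ttwo")));  // [one, cat, two]
        System.out.println(stripNonWord("a-b_c!"));                   // ab_c
    }
}
```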

Greedy quantifiers
X?    X, once or not at all
X*    X, zero or more times
X+    X, one or more times
X{n}    X, exactly n times
X{n,}    X, at least n times
X{n,m}    X, at least n but not more than m times
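
A few quick checks make the quantifiers concrete, and the last call shows what "greedy" means: the quantified part takes as much input as it can while still letting the rest of the pattern match. QuantifierDemo, matches and firstMatch are names chosen for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Returns the first subsequence found by find(), or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(matches("colou?r", "color"));    // true: u is optional
        System.out.println(matches("ab*", "a"));            // true: zero b's allowed
        System.out.println(matches("ab+", "a"));            // false: at least one b needed
        System.out.println(matches("\\d{3}", "12"));        // false: exactly three digits
        System.out.println(matches("\\d{2,}", "12345"));    // true: two or more digits
        System.out.println(matches("\\d{2,4}", "12345"));   // false: at most four digits
        // Greedy: .* runs on to the last "cat" it can reach.
        System.out.println(firstMatch(".*cat", "one cat two cats"));  // one cat two cat
    }
}
```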


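Boundary matchers
^    The beginning of a line
$    The end of a line
\b    A word boundary
\B    A non-word boundary
\A    The beginning of the input
\G    The end of the previous match
\Z    The end of the input but for the final terminator, if any
\z    The end of the input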
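
Two boundary matchers come up often in practice. \b restricts a match to whole words, and ^ matches only at the start of the input by default; with the Pattern.MULTILINE flag it also matches at the start of each line. The sketch below uses invented names (BoundaryDemo, replaceWholeWord, countLinesStartingWithSlashes).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Replaces "cat" only where it stands as a whole word, so "cats" is left alone.
    static String replaceWholeWord(String input) {
        return input.replaceAll("\\bcat\\b", "dog");
    }

    // Counts the lines that begin with "//"; MULTILINE lets ^ match at every line start.
    static int countLinesStartingWithSlashes(String text) {
        Matcher m = Pattern.compile("^//", Pattern.MULTILINE).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // prints: one dog two cats in the yard
        System.out.println(replaceWholeWord("one cat two cats in the yard"));
        // prints: 2
        System.out.println(countLinesStartingWithSlashes("// a\nint x;\n// b"));
    }
}
```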


Original article: https://www.cnblogs.com/lfjjava/p/5468503.html