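Groups are regular expressions delimited by parentheses, and a whole group can be referred to later by its group number. Group 0 is the entire expression, group 1 is the group enclosed by the first pair of parentheses, and so on. More precisely, groups are numbered by counting their opening parentheses from left to right. So the expression
A(B(C))D
contains three groups: group 0 is ABCD, group 1 is BC, and group 2 is C.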
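The numbering rule can be checked directly with the standard java.util.regex API. The sketch below assumes Java 9 or later. The class name NestedGroups and the helper groups are hypothetical, chosen only for illustration. It matches A(B(C))D against "ABCD" and collects every group of the first match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedGroups {
    // Hypothetical helper: returns group(0) .. group(groupCount()) of the
    // first match of regex in input, or an empty list if there is no match.
    static List<String> groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parenthesis, left to right.
        List<String> g = groups("A(B(C))D", "ABCD");
        for (int i = 0; i < g.size(); i++) {
            System.out.println("group " + i + ": " + g.get(i));
        }
    }
}
```

This prints group 0: ABCD, group 1: BC and group 2: C, which matches the description above.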
Example:
//: strings/Groups.java
import java.util.regex.*;
public class Groups {
  static public final String POEM =
    "Twas brillig, and the slithy toves\n" +
    "Did gyre and gimble in the wabe.\n" +
    "All mimsy were the borogoves,\n" +
    "And the mome raths outgrabe.\n" +
    "Beware the Jabberwock, my son,\n" +
    "The jaws that bite, the claws that catch.\n" +
    "Beware the Jubjub bird, and shun\n" +
    "The frumious Bandersnatch.";
  public static void main(String[] args) {
    // (?m) enables MULTILINE mode, so $ matches at the end of every line.
    // Backslashes are doubled because the pattern is a Java string literal.
    Matcher m =
      Pattern.compile("(?m)(\\S+)\\s+((\\S+)\\s+(\\S+))$")
        .matcher(POEM);
    while(m.find()) {
      // group(0) is the whole match; groups 1..groupCount() are the parenthesized ones.
      for(int j = 0; j <= m.groupCount(); j++)
        System.out.print("[" + m.group(j) + "]");
      System.out.println();
    }
  }
} /* Output:
[the slithy toves][the][slithy toves][slithy][toves]
[in the wabe.][in][the wabe.][the][wabe.]
[were the borogoves,][were][the borogoves,][the][borogoves,]
[mome raths outgrabe.][mome][raths outgrabe.][raths][outgrabe.]
[Jabberwock, my son,][Jabberwock,][my son,][my][son,]
[claws that catch.][claws][that catch.][that][catch.]
[bird, and shun][bird,][and shun][and][shun]
[The frumious Bandersnatch.][The][frumious Bandersnatch.][frumious][Bandersnatch.]
*///:~
Analysis:
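First, in the regex "(?m)(\S+)\s+((\S+)\s+(\S+))$", (?m) turns on multiline mode. In this mode $ matches at the end of every line (just before the line terminator) as well as at the end of the input, so the part of the pattern in front of $ must match text that ends a line.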
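A small sketch of the difference (?m) makes to $. The class MultilineDollar and its helper findAll are hypothetical names used only for this illustration. The input "a b\nc d" has two lines. Without (?m), $ only matches at the end of the input, so only the last word is found. With (?m), $ also matches before the line terminator, so the last word of each line is found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineDollar {
    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "a b\nc d";
        // Without (?m), $ matches only at the end of the whole input.
        System.out.println(findAll("\\S+$", text));      // [d]
        // With (?m), $ also matches just before each line terminator.
        System.out.println(findAll("(?m)\\S+$", text));  // [b, d]
    }
}
```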
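The purpose of this regex is to capture the last three words of each line. The output shows five groups per match: (\S+)\s+((\S+)\s+(\S+))$ contains four pairs of parentheses, and together with group(0), the whole match, that makes five. Note that groupCount() returns 4, because it does not count group 0; that is why the loop condition is j <= m.groupCount().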
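The sketch below makes the off-by-one relationship between groupCount() and the valid group indices explicit. It reuses the pattern from Groups.java. The class GroupCountDemo and the helper allGroups are hypothetical names for this illustration. Asking for one group past groupCount() after a successful match throws IndexOutOfBoundsException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Same pattern as in Groups.java.
    static final Pattern LAST_THREE =
        Pattern.compile("(?m)(\\S+)\\s+((\\S+)\\s+(\\S+))$");

    // Hypothetical helper: returns group(0) .. group(groupCount()) of the
    // first match in text, or null if there is no match.
    static String[] allGroups(String text) {
        Matcher m = LAST_THREE.matcher(text);
        if (!m.find()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher m = LAST_THREE.matcher("in the wabe.");
        // groupCount() is available before matching and excludes group 0.
        System.out.println("groupCount() = " + m.groupCount());
        if (m.find()) {
            try {
                m.group(m.groupCount() + 1); // one past the last valid index
            } catch (IndexOutOfBoundsException e) {
                System.out.println("group(" + (m.groupCount() + 1) + ") does not exist");
            }
        }
        System.out.println(String.join(" | ", allGroups("in the wabe.")));
    }
}
```

Running main prints groupCount() = 4, then group(5) does not exist, then in the wabe. | in | the wabe. | the | wabe., the same five groups as the second line of the Groups.java output.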
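Groups 0 through 4 are numbered by the same rule as above, by the position of their opening parentheses from left to right. Group 1 is the first of the three words, group 2 is the last two words, group 3 is the second-to-last word, and group 4 is the last word. The output above confirms this.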