  • Using Regular Expression Capture Groups in Java

    Original: http://blog.csdn.net/just4you/article/details/70767928

    -----------------------------------------------------------------------------------------------

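    Types of capture groups

    1. Ordinary (numbered) capture group: (Expression)
    2. Named capture group: (?<name>Expression)

    Ordinary capture groups

    Scanning the regular expression from left to right, each opening parenthesis "(" of a capturing group starts a new group. Group numbers start at 1; group 0 stands for the entire expression.

    For the date string 2017-04-25, the expression is: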

    (\d{4})-((\d{2})-(\d{2}))

    There are four opening parentheses, so there are four groups:

    No.  Capture group               Match
    0    (\d{4})-((\d{2})-(\d{2}))   2017-04-25
    1    (\d{4})                     2017
    2    ((\d{2})-(\d{2}))           04-25
    3    (\d{2})                     04
    4    (\d{2})                     25

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class CommonGroupDemo {
        public static final String DATE_STRING = "2017-04-25";
        // Backslashes must be doubled inside a Java string literal.
        public static final String P_COMM = "(\\d{4})-((\\d{2})-(\\d{2}))";

        public static void main(String[] args) {
            Pattern pattern = Pattern.compile(P_COMM);
            Matcher matcher = pattern.matcher(DATE_STRING);
            matcher.find(); // required: group() throws IllegalStateException before a match is attempted
            System.out.printf("%nmatcher.group(0) value:%s", matcher.group(0));
            System.out.printf("%nmatcher.group(1) value:%s", matcher.group(1));
            System.out.printf("%nmatcher.group(2) value:%s", matcher.group(2));
            System.out.printf("%nmatcher.group(3) value:%s", matcher.group(3));
            System.out.printf("%nmatcher.group(4) value:%s", matcher.group(4));
        }
    }
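The numbering rule can also be checked programmatically. The following is a minimal sketch, not part of the original post: the class name GroupNumberingSketch and the helper allGroups are made up for illustration. It relies on Matcher.groupCount(), which reports the number of capture groups excluding group 0, to walk through every group of the first match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberingSketch {
    // Same pattern as P_COMM, with backslashes escaped for a Java string literal.
    static final Pattern DATE = Pattern.compile("(\\d{4})-((\\d{2})-(\\d{2}))");

    // Hypothetical helper: returns groups 0..groupCount() of the first match,
    // or an empty list when the input contains no match.
    static List<String> allGroups(String input) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = DATE.matcher(input);
        if (matcher.find()) {
            // groupCount() does not include group 0, hence the <= bound.
            for (int i = 0; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints [2017-04-25, 2017, 04-25, 04, 25]
        System.out.println(allGroups("2017-04-25"));
    }
}
```

Checking the return value of find() before calling group() avoids the IllegalStateException that is thrown when no match is available.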

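    Named capture groups

    In a named capture group, the opening parenthesis is immediately followed by "?<name>", and only then by the sub-expression.

    For the date string 2017-04-25, the expression is: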

    (?<year>\d{4})-(?<md>(?<month>\d{2})-(?<date>\d{2}))

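    There are four named capture groups:

    No.  Name    Capture group                                          Match
    0    (none)  (?<year>\d{4})-(?<md>(?<month>\d{2})-(?<date>\d{2}))   2017-04-25
    1    year    (?<year>\d{4})                                         2017
    2    md      (?<md>(?<month>\d{2})-(?<date>\d{2}))                  04-25
    3    month   (?<month>\d{2})                                        04
    4    date    (?<date>\d{2})                                         25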

    Named capture groups can also be retrieved by number:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class NamedGroupDemo {
        // Backslashes must be doubled inside a Java string literal.
        public static final String P_NAMED = "(?<year>\\d{4})-(?<md>(?<month>\\d{2})-(?<date>\\d{2}))";
        public static final String DATE_STRING = "2017-04-25";

        public static void main(String[] args) {
            Pattern pattern = Pattern.compile(P_NAMED);
            Matcher matcher = pattern.matcher(DATE_STRING);
            matcher.find();
            System.out.printf("%n===========Retrieve by name=============");
            System.out.printf("%nmatcher.group(0) value:%s", matcher.group(0));
            System.out.printf("%nmatcher.group('year') value:%s", matcher.group("year"));
            System.out.printf("%nmatcher.group('md') value:%s", matcher.group("md"));
            System.out.printf("%nmatcher.group('month') value:%s", matcher.group("month"));
            System.out.printf("%nmatcher.group('date') value:%s", matcher.group("date"));
            matcher.reset(); // discard the match state so find() starts over from the beginning
            System.out.printf("%n===========Retrieve by number=============");
            matcher.find();
            System.out.printf("%nmatcher.group(0) value:%s", matcher.group(0));
            System.out.printf("%nmatcher.group(1) value:%s", matcher.group(1));
            System.out.printf("%nmatcher.group(2) value:%s", matcher.group(2));
            System.out.printf("%nmatcher.group(3) value:%s", matcher.group(3));
            System.out.printf("%nmatcher.group(4) value:%s", matcher.group(4));
        }
    }
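Named groups are also usable in replacement strings, where ${name} refers to the text captured by that group. The sketch below is an illustration only: the class name NamedGroupSketch, the helper toDayFirst, and the day-first output format are assumptions, not something taken from the original post.

```java
import java.util.regex.Pattern;

public class NamedGroupSketch {
    // Same pattern as P_NAMED, with backslashes escaped for a Java string literal.
    static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<md>(?<month>\\d{2})-(?<date>\\d{2}))");

    // Hypothetical helper: rewrites every yyyy-MM-dd date in the input as dd/MM/yyyy.
    static String toDayFirst(String input) {
        // ${name} in the replacement string is substituted with the named group's match.
        return DATE.matcher(input).replaceAll("${date}/${month}/${year}");
    }

    public static void main(String[] args) {
        // prints: released on 25/04/2017
        System.out.println(toDayFirst("released on 2017-04-25"));
    }
}
```

Referring to groups by name keeps the replacement string valid even if another group is later added to the pattern and the numbering shifts.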

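    PS: Non-capturing groups

    Placing "?:" immediately after the opening parenthesis, followed by the sub-expression, forms a non-capturing group: (?:Expression).

    For the date string 2017-04-25, the expression is: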

    (?:\d{4})-((\d{2})-(\d{2}))

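    Although this expression has four opening parentheses, it does not have four capture groups: the first group, (?:\d{4}), is non-capturing and receives no number, so only groups 1 to 3 exist. Calling matcher.group(4) throws an IndexOutOfBoundsException.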

    No.  Capture group                 Match
    0    (?:\d{4})-((\d{2})-(\d{2}))   2017-04-25
    1    ((\d{2})-(\d{2}))             04-25
    2    (\d{2})                       04
    3    (\d{2})                       25

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class NonCapturingGroupDemo {
        // Backslashes must be doubled inside a Java string literal.
        public static final String P_UNCAP = "(?:\\d{4})-((\\d{2})-(\\d{2}))";
        public static final String DATE_STRING = "2017-04-25";

        public static void main(String[] args) {
            Pattern pattern = Pattern.compile(P_UNCAP);
            Matcher matcher = pattern.matcher(DATE_STRING);
            matcher.find();
            System.out.printf("%nmatcher.group(0) value:%s", matcher.group(0));
            System.out.printf("%nmatcher.group(1) value:%s", matcher.group(1));
            System.out.printf("%nmatcher.group(2) value:%s", matcher.group(2));
            System.out.printf("%nmatcher.group(3) value:%s", matcher.group(3));

            // Throws: Exception in thread "main" java.lang.IndexOutOfBoundsException: No group 4
            System.out.printf("%nmatcher.group(4) value:%s", matcher.group(4));
        }
    }
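One way to avoid the exception is to compare the requested index with Matcher.groupCount() before calling group(). This is a minimal sketch under that assumption; the class name NonCapturingSketch and the helper groupOrNull are hypothetical and do not appear in the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingSketch {
    // Same pattern as P_UNCAP, with backslashes escaped for a Java string literal.
    static final Pattern DATE = Pattern.compile("(?:\\d{4})-((\\d{2})-(\\d{2}))");

    // Hypothetical helper: returns the requested group of the first match,
    // or null when there is no match or the pattern has no such group.
    static String groupOrNull(String input, int index) {
        Matcher matcher = DATE.matcher(input);
        if (!matcher.find() || index < 0 || index > matcher.groupCount()) {
            return null;
        }
        return matcher.group(index);
    }

    public static void main(String[] args) {
        System.out.println(DATE.matcher("").groupCount()); // prints 3: the (?:...) group is not counted
        System.out.println(groupOrNull("2017-04-25", 1));  // prints 04-25
        System.out.println(groupOrNull("2017-04-25", 4));  // prints null instead of throwing
    }
}
```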

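    Summary

    1. Ordinary capture groups are convenient to use;
    2. Named capture groups are clearer to read;
    3. Non-capturing groups have not yet found a use in my projects.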
  • Source: https://www.cnblogs.com/oxspirt/p/8037101.html