  • .NET Regular Expressions: Regex and Balanced Matching (repost)

    One question that comes up a lot is how to match balanced parentheses. Given a string like “(aa (bbb) (bbb) aa)”, people want to match from the opening parenthesis to its matching closing parenthesis. In general this is not possible with regular expressions: the language is simply not expressive enough to track arbitrary nesting. For the longest time, that is how I answered these questions when they came to me.

    However, in .NET this is actually possible with a construct called a balancing group definition. The construct generally looks like (?<name1-name2>). The following is what MSDN has to say about it:

    Balancing group definition. Deletes the definition of the previously defined group name2 and stores in group name1 the interval between the previously defined name2 group and the current group. If no group name2 is defined, the match backtracks. Because deleting the last definition of name2 reveals the previous definition of name2, this construct allows the stack of captures for group name2 to be used as a counter for keeping track of nested constructs such as parentheses. In this construct, name1 is optional. You can use single quotes instead of angle brackets; for example, (?'name1-name2').
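The pop behaviour described in the quote can be tried in isolation. The following is a minimal sketch, not taken from the original article: the class name `PopDemo`, the method `Matches`, and the `a`/`b` pattern are made up for illustration. It omits name1, so the group only pops a capture from `Open`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PopDemo
{
    // Each 'a' pushes a capture onto Open; each 'b' must pop one.
    // (?<-Open>b) is the balancing group with name1 left out.
    private static readonly Regex Pairs =
        new Regex(@"^(?<Open>a)+(?<-Open>b)+$");

    public static bool Matches(string input) => Pairs.IsMatch(input);

    public static void Main()
    {
        foreach (var s in new[] { "aabb", "abb", "aab" })
        {
            Console.WriteLine($"{s}: {Matches(s)}");
        }
        // Prints:
        // aabb: True
        // abb: False   (second 'b' finds the Open stack empty)
        // aab: True    (a leftover Open capture is not checked here)
    }
}
```

Note that "aab" still matches: popping alone only guarantees that no `b` appears without an earlier `a`, not the other way around.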

    The following expression matches a pair of angle brackets (<>) whose nested angle brackets are balanced. Angle brackets are used because, unlike parentheses, they do not require escaping, which makes the expression a little easier to read:

        <
        [^<>]*
        (
            (
                (?<Open><)
                [^<>]*
            )+
            (
                (?<Close-Open>>)
                [^<>]*
            )+
        )*
        (?(Open)(?!))
        >

    The outermost part of the expression just matches an opening angle bracket, followed by anything that is not an angle bracket, followed by a closing angle bracket. I will explain “(?(Open)(?!))” later.
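To try the expression as laid out above, it can be compiled with `RegexOptions.IgnorePatternWhitespace`, which lets the pattern keep its line breaks and indentation. This is only a sketch: the class name `BalancedBrackets`, the method `FirstBalanced`, and the sample inputs are assumptions, not part of the original article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedBrackets
{
    // The article's pattern, kept in its multi-line layout.
    private const string Pattern = @"
        <
        [^<>]*
        (
            ( (?<Open><) [^<>]* )+
            ( (?<Close-Open>>) [^<>]* )+
        )*
        (?(Open)(?!))
        >";

    private static readonly Regex Balanced =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);

    // Returns the first balanced <...> span, or "" if there is none.
    public static string FirstBalanced(string input) =>
        Balanced.Match(input).Value;

    public static void Main()
    {
        Console.WriteLine(FirstBalanced("x <aa <bbb> <bbb> aa> y"));
        // Prints: <aa <bbb> <bbb> aa>

        Console.WriteLine(FirstBalanced("<a<b>"));
        // Prints: <b>
    }
}
```

Because the pattern is not anchored, an unbalanced input such as "<a<b>" still yields a match on the balanced substring "<b>". To test a whole string, the pattern would need `^` and `$` around it.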

    The inner group does all of the interesting angle bracket matching. The Open group matches only an opening angle bracket, and the part of the expression that follows it matches anything that is not an angle bracket. So the first inner group basically matches everything up to the first closing angle bracket.

    It is best to think of a group as a stack of captures, where the top of the stack is the most recent capture. (?<Close-Open>>) matches “>” and pops a capture off the Open group’s capture stack. This match can succeed only if the Open group’s capture stack is not empty. That is a fancy way of saying that for every match of this group there must be an earlier match of the Open group.
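The stack behaviour can be observed through the `Captures` collections of the two groups after a match. The sketch below is illustrative only: `CaptureStackDemo`, `CloseCaptures`, and `OpenLeft` are hypothetical names, and the input is made up. Each successful pop also stores in Close the text between the popped opening bracket and the closing bracket.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureStackDemo
{
    // Same expression as in the article, written on one line.
    private static readonly Regex Balanced = new Regex(
        @"<[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))>");

    // Text recorded in Close each time an Open capture was popped.
    public static List<string> CloseCaptures(string input)
    {
        var result = new List<string>();
        foreach (Capture c in Balanced.Match(input).Groups["Close"].Captures)
        {
            result.Add(c.Value);
        }
        return result;
    }

    // Number of captures still on the Open stack after the match.
    public static int OpenLeft(string input) =>
        Balanced.Match(input).Groups["Open"].Captures.Count;

    public static void Main()
    {
        const string input = "<a<b<c>>>";
        Console.WriteLine(string.Join(" | ", CloseCaptures(input)));
        // Prints: c | b<c>
        Console.WriteLine(OpenLeft(input));
        // Prints: 0
    }
}
```

The outermost pair is matched by the literal `<` and `>` in the expression, so it never appears on the Open stack; only the nested brackets are pushed and popped.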

    So now we know that for every closing angle bracket there must have been an opening angle bracket. However, we still have done nothing to assert that every opening angle bracket has a matching closing angle bracket. That is where the (?(Open)(?!)) part of the expression comes into play. It tells Regex to match (?!) if the Open group still contains a capture (i.e. there were more opening angle brackets than closing angle brackets). (?!) is an empty negative lookahead, so trying to match it always fails. Basically, this is a way of making the expression fail if the Open group still contains a capture.
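The effect of (?(Open)(?!)) is easiest to see by comparing the expression with and without it. The following is a sketch under assumptions: the class `ConditionalDemo` and the methods `Strict` and `Loose` are hypothetical, and `^` and `$` anchors are added (they are not in the article's expression) so that the whole input has to be balanced.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    // Shared part of the article's expression, up to the conditional.
    private const string Body =
        @"<[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*";

    // Anchored, with the leftover-Open check.
    private static readonly Regex WithCheck =
        new Regex("^" + Body + @"(?(Open)(?!))>$");

    // Anchored, without the check.
    private static readonly Regex WithoutCheck =
        new Regex("^" + Body + ">$");

    public static bool Strict(string s) => WithCheck.IsMatch(s);
    public static bool Loose(string s) => WithoutCheck.IsMatch(s);

    public static void Main()
    {
        // Three opening brackets, only two closing ones.
        const string unbalanced = "<a<b<c>>";
        Console.WriteLine(Strict(unbalanced)); // Prints: False
        Console.WriteLine(Loose(unbalanced));  // Prints: True

        const string balanced = "<a<b>>";
        Console.WriteLine(Strict(balanced));   // Prints: True
        Console.WriteLine(Loose(balanced));    // Prints: True
    }
}
```

Without the conditional, the unbalanced input is accepted because one opening bracket is simply left on the Open stack when the final `>` is matched.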

  • Original article: https://www.cnblogs.com/netcorner/p/2912102.html