  • A Predictive-Parsing-Based ABNF Parser (10): Numeric Values (num-val) in the AbnfParser Grammar Parser

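    ABNF has three kinds of numeric values: binary, decimal and hexadecimal. A numeric value can be a dot-separated sequence of numbers or a range. For example, %d11.22.33.44.55 denotes the five decimal numbers 11, 22, 33, 44, 55 in that order, while %x80-ff denotes a single byte whose value lies between 0x80 and 0xff inclusive.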
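To make the two forms concrete, here is a minimal sketch that turns the text of a num-val into the integers it denotes. It is not part of AbnfParser: the class NumValDemo and its methods radixOf and decode are hypothetical names used only for illustration, and the sketch does no validation beyond the leading percent sign.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of AbnfParser.
final class NumValDemo {
    // Map the base letter (b, d, x; case-insensitive) to a radix.
    static int radixOf(char base) {
        switch (Character.toLowerCase(base)) {
            case 'b': return 2;
            case 'd': return 10;
            case 'x': return 16;
            default: throw new IllegalArgumentException("unknown base: " + base);
        }
    }

    // For a sequence, returns every value in order.
    // For a range, returns the lower and upper bound.
    static List<Integer> decode(String numVal) {
        if (numVal.length() < 3 || numVal.charAt(0) != '%') {
            throw new IllegalArgumentException("not a num-val: " + numVal);
        }
        int radix = radixOf(numVal.charAt(1));
        List<Integer> values = new ArrayList<>();
        // '.' separates sequence members, '-' separates range bounds.
        for (String digits : numVal.substring(2).split("[.-]")) {
            values.add(Integer.parseInt(digits, radix));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(decode("%d11.22.33.44.55")); // [11, 22, 33, 44, 55]
        System.out.println(decode("%x80-ff"));          // [128, 255]
    }
}
```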

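    I defined the dot-separated sequence as NumVal and the range as RangedNumVal. Both classes implement Element. In hindsight, it would probably be cleaner to define an interface NumVal (extending Element) with two implementations, SerialNumVal and RangedNumVal. As a perfectionist I find the current definition painful to look at; I will revisit it when I have time.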
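The alternative layout described above might look roughly like the sketch below. None of it exists in the project as posted: Element is reduced to an empty stand-in because its real members are not shown here, getBase() is an assumed accessor, and the toString() format is only inferred from the unit tests in this post, which expect strings such as "%b1.1".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for the project's Element interface (real members not shown in this post).
interface Element { }

// Hypothetical common supertype for both kinds of numeric value.
interface NumVal extends Element {
    String getBase(); // "b", "d" or "x"
}

// A dot-separated sequence such as %d11.22.33.
final class SerialNumVal implements NumVal {
    private final String base;
    private final List<String> values = new ArrayList<>();

    SerialNumVal(String base) { this.base = base; }

    void addValue(String value) { values.add(value); }
    List<String> getValues() { return Collections.unmodifiableList(values); }

    @Override public String getBase() { return base; }
    @Override public String toString() { return "%" + base + String.join(".", values); }
}

// A range such as %x80-ff.
final class RangedNumVal implements NumVal {
    private final String base;
    private final String from;
    private final String to;

    RangedNumVal(String base, String from, String to) {
        this.base = base;
        this.from = from;
        this.to = to;
    }

    @Override public String getBase() { return base; }
    @Override public String toString() { return "%" + base + from + "-" + to; }
}
```

With this shape, code that only needs the base can accept a NumVal without caring which of the two forms it holds.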

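    Binary, decimal and hexadecimal values are built the same way; only the base character (b, d, x) and the digit set (01, 0123456789, 0123456789abcdef) differ. To avoid writing three nearly identical methods, I took a shortcut and defined a Matcher interface whose only job is to decide whether a character belongs to a given symbol set. There is nothing sophisticated about it; the code makes it clear.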
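The Matcher interface itself is not listed in this post. Judging from how it is used, it needs only two methods, so a plausible sketch is the one below. The Matchers factory and its anyOf method are hypothetical additions for illustration, and the sketch assumes that the input stream reports end of input as -1.

```java
// Sketch of the Matcher contract, inferred from its use in bin_val(), dec_val() and hex_val().
interface Matcher {
    boolean match(int value); // true if the character belongs to the symbol set
    String expected();        // description of the set, used in error messages
}

// Hypothetical factory, not in the original code: builds a Matcher from a string of allowed characters.
final class Matchers {
    static Matcher anyOf(final String symbols) {
        return new Matcher() {
            @Override
            public boolean match(int value) {
                // Assumption: a negative value means end of input and never matches.
                return value >= 0 && symbols.indexOf(value) >= 0;
            }

            @Override
            public String expected() {
                return "one of \"" + symbols + "\"";
            }
        };
    }

    static final Matcher BIT = anyOf("01");
    static final Matcher DIGIT = anyOf("0123456789");
    static final Matcher HEXDIG = anyOf("0123456789abcdefABCDEF");
}
```

Writing each set out as a literal string makes it harder to widen a set by accident, at the cost of a linear scan per character.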

    Here is the parsing code:

    /*
        This file is one of the component a Context-free Grammar Parser Generator,
        which accept a piece of text as the input, and generates a parser
        for the inputted context-free grammar.
        Copyright (C) 2013, Junbiao Pan (Email: panjunbiao@gmail.com)
    
        This program is free software: you can redistribute it and/or modify
        it under the terms of the GNU General Public License as published by
        the Free Software Foundation, either version 3 of the License, or
        any later version.
    
        This program is distributed in the hope that it will be useful,
        but WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
        GNU General Public License for more details.
    
        You should have received a copy of the GNU General Public License
        along with this program.  If not, see <http://www.gnu.org/licenses/>.
     */
    
        //		        bin-val        =  "b" 1*BIT
    //		                          [ 1*("." 1*BIT) / ("-" 1*BIT) ]
    //  BIT            =  "0" / "1"
    //  Binary value parser
        protected Element bin_val() throws IOException, MatchException {
    //  The real work is done by val(); we only hand it the binary symbol set {0, 1} through a Matcher instance.
            return val('b', new Matcher() {
                @Override
                public boolean match(int value) {
    //              Match if the character is 0 or 1
                    return value == '0' || value == '1';
                }
    
                @Override
                public String expected() {
    //              Describes the expected symbol set (used only in the exception message)
                    return "['0', '1']";
                }
            });
        }
    
    //		        dec-val        =  "d" 1*DIGIT
    //		                          [ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ]
        protected Element dec_val() throws IOException, MatchException {
    //      Same as above: pass the decimal symbol set 0-9 to val()
            return val('d', new Matcher() {
                @Override
                public boolean match(int value) {
    //              Only while writing this post did I notice that this check also accepted A-F and a-f,
    //              which is wrong for decimal, and yet the unit tests passed. So much for my test quality!
    //              PS: I wrote the unit tests myself...
                    return value >= 0x30 && value <= 0x39;
                }
    
                @Override
                public String expected() {
    //              This message used to list the hex letters as well. Speechless...
                    return "['0'-'9']";
                }
            });
        }
    
    //		        hex-val        =  "x" 1*HEXDIG
    //		                          [ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ]
        protected Element hex_val() throws IOException, MatchException {
    //      Pass the hexadecimal symbol set to val() through a Matcher instance
            return val('x', new Matcher() {
                @Override
                public boolean match(int value) {
                    return (value >= 0x30 && value <= 0x39) || (value >= 'A' && value <= 'F') || (value >= 'a' && value <= 'f');
                }
    
                @Override
                public String expected() {
                    return "['0'-'9', 'A'-'F', 'a'-'f']";
                }
            });
        }
    
    //  Parses a value in any of the three bases
        protected Element val(char base, Matcher matcher) throws IOException, MatchException {
    //      Check the base character
            assertMatch(is.peek(), base);
            int baseValue = is.read();
            String from = "";
            String val = "";
    
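    //      The first character after the base character must belong to the Matcher's symbol set; otherwise throw
            if (matcher.match(is.peek())) {
    //          Keep reading matching characters to build the first value.
                while (matcher.match(is.peek())) {
                    from += (char)is.read();
                }
    //          A dot after the first value means a sequence (NumVal); a hyphen means a range (RangedNumVal); anything else means a single value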
                if (match(is.peek(), '.')) {
                    NumVal numval = new NumVal(String.valueOf((char)baseValue));
    //              Add the value just matched as the first value of the NumVal to be returned
                    numval.addValue(from);
    //              While a dot follows, keep appending values to the NumVal
                    while (match(is.peek(), '.')) {
                        int next = is.peek(1);
                        if (!(matcher.match(next))) {
                            break;
                        }
                        is.read();
                        val = "";
                        while (matcher.match(is.peek())) {
                            val += (char)is.read();
                        }
                        numval.addValue(val);
                    }
    //              No more dots: the sequence is complete, return it
                    return numval;
                } else if (match(is.peek(), '-')) {
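    //              Look two characters ahead here, so that if the hyphen is not followed by a digit we can still return a single value and leave the hyphen for the rest of the parser
    //              This is one of the few places in this program that can effectively backtrack (strictly speaking, it is two-character lookahead), hehe.
                    int next = is.peek(1);
                    if (!(matcher.match(next))) {
    //                  The hyphen is not followed by a digit: do not consume it, return a single value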
                        NumVal numval = new NumVal(String.valueOf((char)baseValue));
                        numval.addValue(from);
                        return numval;
                    }
    //              Otherwise a value follows the hyphen: read it and return a RangedNumVal
                    is.read();
                    val ="";
                    val += (char)is.read();
                    while (matcher.match(is.peek())) {
                        val += (char)is.read();
                    }
                    return new RangedNumVal(String.valueOf((char)baseValue), from, val);
                } else {
    //              Neither a dot nor a hyphen follows the first value: return a single value
                    NumVal numval = new NumVal(String.valueOf((char)baseValue));
                    numval.addValue(from);
                    return numval;
                }
            } else {
                throw new MatchException(matcher.expected(), is.peek(), is.getPos(), is.getLine());
            }
    
        }
    
    //  num-val        =  "%" (bin-val / dec-val / hex-val)
    //  Parses num-val
        protected Element num_val() throws IOException, MatchException {
    //      Must start with a percent sign
            assertMatch(is.peek(), '%');
            is.read();
    //      Dispatch on the base character to the matching parse method
            switch ((char)is.peek()) {
                case 'b': case 'B': return bin_val();
                case 'd': case 'D': return dec_val();
                case 'x': case 'X': return hex_val();
                default: throw new MatchException("['b', 'd', 'x']", is.peek(), is.getPos(), is.getLine());
            }
        }
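
The two-character lookahead used in val() can be shown in isolation. The sketch below is not the author's input class, which is not listed in this post: PeekableInput, rest() and LookaheadDemo are made-up names, and the sketch assumes that peek(n) returns -1 past the end of the input. The hyphen is consumed only when a digit follows it.

```java
// Hypothetical stand-in for the parser's input stream.
final class PeekableInput {
    private final String text;
    private int pos = 0;

    PeekableInput(String text) { this.text = text; }

    int peek() { return peek(0); }

    // Look `offset` characters ahead without consuming anything; -1 past the end.
    int peek(int offset) {
        int i = pos + offset;
        return i < text.length() ? text.charAt(i) : -1;
    }

    // Consume and return the next character, or -1 at the end.
    int read() {
        int c = peek();
        if (c >= 0) pos++;
        return c;
    }

    // Everything not yet consumed.
    String rest() { return text.substring(pos); }
}

final class LookaheadDemo {
    static boolean isDigit(int c) { return c >= '0' && c <= '9'; }

    static String digits(PeekableInput in) {
        StringBuilder sb = new StringBuilder();
        while (isDigit(in.peek())) sb.append((char) in.read());
        return sb.toString();
    }

    // Reads "12" or "12-34"; leaves a hyphen that is not followed by a digit in the input.
    static String numberOrRange(PeekableInput in) {
        String from = digits(in);
        if (in.peek() == '-' && isDigit(in.peek(1))) {
            in.read(); // consume '-'
            return from + "-" + digits(in);
        }
        return from;
    }

    public static void main(String[] args) {
        PeekableInput a = new PeekableInput("12-34");
        System.out.println(numberOrRange(a) + " | rest: \"" + a.rest() + "\""); // 12-34 | rest: ""
        PeekableInput b = new PeekableInput("12-x");
        System.out.println(numberOrRange(b) + " | rest: \"" + b.rest() + "\""); // 12 | rest: "-x"
    }
}
```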
    

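    Now for the unit tests. I will not go through them in detail; one comment in them explains why the dec_val mistake noted above slipped through testing: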

    /*
        This file is one of the component a Context-free Grammar Parser Generator,
        which accept a piece of text as the input, and generates a parser
        for the inputted context-free grammar.
        Copyright (C) 2013, Junbiao Pan (Email: panjunbiao@gmail.com)
    
        This program is free software: you can redistribute it and/or modify
        it under the terms of the GNU General Public License as published by
        the Free Software Foundation, either version 3 of the License, or
        any later version.
    
        This program is distributed in the hope that it will be useful,
        but WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
        GNU General Public License for more details.
    
        You should have received a copy of the GNU General Public License
        along with this program.  If not, see <http://www.gnu.org/licenses/>.
     */
    
        //		        bin-val        =  "b" 1*BIT
    //		                          [ 1*("." 1*BIT) / ("-" 1*BIT) ]
    //  BIT            =  "0" / "1"
    //  Tests parsing of binary values
        @Test
        public void testBin_val() throws Exception {
            Tester<String> tester = new Tester<String>() {
                @Override
                public String test(AbnfParser parser) throws MatchException, IOException {
                    return parser.bin_val().toString();
                }
            };
            String input;
            input = "b1";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).bin_val().toString());
            input = "b1010";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).bin_val().toString());
            input = "B1";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).bin_val().toString());
            input = "b1.1";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).bin_val().toString());
            input = "b0101.1111";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).bin_val().toString());
            input = "b0000-1111";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).bin_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input+".00").bin_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input+"-1234").bin_val().toString());
            input = "b00.11.00.01.10.00.11.00.11";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).bin_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input+".").bin_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input+"..").bin_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input+".bb").bin_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input+"-00").bin_val().toString());
    
            Assertion.assertMatchException("", tester, 1, 1);
            Assertion.assertMatchException("b", tester, 2,1);
            Assertion.assertMatchException("bg", tester, 2, 1);
            Assertion.assertMatchException("b.", tester, 2, 1);
            Assertion.assertMatchException("b-", tester, 2, 1);
        }
    
        //		        dec-val        =  "d" 1*DIGIT
    //		                          [ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ]
    //  Tests parsing of decimal values
        @Test
        public void testDec_val() throws Exception {
            Tester<String> tester = new Tester<String>() {
                @Override
                public String test(AbnfParser parser) throws MatchException, IOException {
                    return parser.dec_val().toString();
                }
            };
    
            String input;
            input = "d1";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).dec_val().toString());
            input = "d1234";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).dec_val().toString());
            input = "D1";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).dec_val().toString());
            input = "d1.2";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).dec_val().toString());
            input = "d1234.5678";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).dec_val().toString());
            input = "d1234-5678";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).dec_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input+".00").dec_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input+"-1234").dec_val().toString());
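            input = "d12.34.56.78.90";
    //      Look here to see why the unit tests never caught the a-f problem in dec_val: this case used to be
    //      "d12.34.56.78.9a.bc.de.f0". A test case that was itself wrong!!!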
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).dec_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input+".").dec_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input+"..").dec_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input+".##").dec_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input+"-00").dec_val().toString());
    
            Assertion.assertMatchException("", tester, 1, 1);
            Assertion.assertMatchException("d", tester, 2, 1);
            Assertion.assertMatchException("dg", tester, 2, 1);
            Assertion.assertMatchException("d.", tester, 2, 1);
            Assertion.assertMatchException("d-", tester, 2, 1);
        }
    
        //		        hex-val        =  "x" 1*HEXDIG
    //		                          [ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ]
    //  Tests parsing of hexadecimal values
        @Test
        public void testHex_val() throws Exception {
            Tester<String> tester = new Tester<String>() {
                @Override
                public String test(AbnfParser parser) throws MatchException, IOException {
                    return parser.hex_val().toString();
                }
            };
    
            String input;
            input = "x1";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).hex_val().toString());
            input = "x1234";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).hex_val().toString());
            input = "X1";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).hex_val().toString());
            input = "x1.2";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).hex_val().toString());
            input = "x1234.5678";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).hex_val().toString());
            input = "xabcd.ef";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).hex_val().toString());
            input = "xA1.2B";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).hex_val().toString());
            input = "x1234-abCD";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).hex_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input+"-").hex_val().toString());
            input = "x12.34.56.78.9a.bc.de.f0";
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input).hex_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input + ".").hex_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input + ".g0").hex_val().toString());
            Assert.assertEquals("%" + input, AbnfParserFactory.newInstance(input + "-00").hex_val().toString());
    
            Assertion.assertMatchException("", tester, 1, 1);
            Assertion.assertMatchException("x", tester, 2, 1);
            Assertion.assertMatchException("xg", tester, 2, 1);
            Assertion.assertMatchException("x.", tester, 2, 1);
            Assertion.assertMatchException("x-", tester, 2, 1);
    
        }
    
        //		        num-val        =  "%" (bin-val / dec-val / hex-val)
    //  Tests the combined cases
        @Test
        public void testNum_val() throws Exception {
            String input;
            input = "%b0101";
            Assert.assertEquals(input, AbnfParserFactory.newInstance(input).num_val().toString());
            input = "%b0101.1010.1111";
            Assert.assertEquals(input, AbnfParserFactory.newInstance(input).num_val().toString());
            input = "%b0101-1111";
            Assert.assertEquals(input, AbnfParserFactory.newInstance(input).num_val().toString());
            input = "%d1234";
            Assert.assertEquals(input, AbnfParserFactory.newInstance(input).num_val().toString());
            input = "%d0123.4567.8901";
            Assert.assertEquals(input, AbnfParserFactory.newInstance(input).num_val().toString());
            input = "%d12345-67890";
            Assert.assertEquals(input, AbnfParserFactory.newInstance(input).num_val().toString());
            input = "%x0123";
            Assert.assertEquals(input, AbnfParserFactory.newInstance(input).num_val().toString());
            input = "%x0123.4567.89ab.CDEF";
            Assert.assertEquals(input, AbnfParserFactory.newInstance(input).num_val().toString());
            input = "%x0123456789-ABCDEFabcdef09";
            Assert.assertEquals(input, AbnfParserFactory.newInstance(input).num_val().toString());
        }
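
A check that feeds hex letters to the decimal symbol set would have exposed the dec_val problem. The sketch below is independent of AbnfParser and its test helpers: DecDigitCheck, TOO_WIDE, DIGIT and leadingRun are hypothetical names. It compares the overly wide set with the set the grammar asks for, on an input where the two disagree.

```java
import java.util.function.IntPredicate;

final class DecDigitCheck {
    // The set that is too wide for decimal: digits plus hex letters.
    static final IntPredicate TOO_WIDE =
        c -> (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');

    // The set the grammar asks for: decimal digits only.
    static final IntPredicate DIGIT = c -> c >= '0' && c <= '9';

    // Returns the longest prefix of `input` whose characters all satisfy `set`.
    static String leadingRun(String input, IntPredicate set) {
        int end = 0;
        while (end < input.length() && set.test(input.charAt(end))) end++;
        return input.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(leadingRun("9a", TOO_WIDE)); // 9a
        System.out.println(leadingRun("9a", DIGIT));    // 9
    }
}
```

A case that passes under both sets proves nothing about the difference between them, so a useful test input has to contain a character that only one of the sets accepts.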
    

    Index of this series: A Predictive ABNF Grammar Parser

  • Original article: https://www.cnblogs.com/snake-hand/p/3141231.html