  • Temporary notes: interesting things

    A brief introduction to some compiler theory, plus recent progress in modern parser research.

    http://www.antlr.org/article/needlook.html

    http://citeseer.comp.nus.edu.sg/440034.html

    Tomita (GLR) parser

    Packrat parser (uses TDPL)

    http://java.sun.com/docs/books/jls/first_edition/html/19.doc.html

    http://www.cs.berkeley.edu/~smcpeak/elkhound/

    http://www.mollypages.org/page/grammar/index.mp

    http://lambda.uta.edu/cse5317/notes/node20.html

    http://pages.cpsc.ucalgary.ca/~robin/class/411/LR.1.html

    http://en.wikipedia.org/wiki/Memoization

    http://en.wikipedia.org/wiki/Comparison_of_parser_generators

    http://en.wikipedia.org/wiki/Parsing_expression_grammar



    > The author states that he wrote the GLR parser generator solely to
    > handle C++ language spec [and someone lapped it up to handle Java].
    >
    > What exactly is it about OO languages that an LALR(1) parser cannot
    > handle?

    As the moderator noted, there is nothing about "OO" languages that
    LALR(1) parsers cannot handle, but C++ itself is problematic. There
    are LALR(1) and LL grammars for Java.

    One of the problems with C++ is that expressions and declarations can
    look exactly the same (technically, any grammar containing the "or" of
    those two productions is ambiguous). C++ gets around that by saying:
    if it looks like a declaration, it is a declaration. This forces the
    "or" to be resolved in one particular direction, which resolves the
    ambiguity. However, that resolution is not expressed grammatically.
    One cannot take two arbitrary context-free rules, take their
    difference, and expect the result to be a context-free language, yet
    that is what the C++ ambiguity resolution requires.
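    To make the declaration/expression overlap concrete, here is a small C++ sketch. The names T, a and which_ctor are made up for illustration. The statement T(a); could be read as an expression, namely a functional cast of a to T. It could also be read as a declaration of a variable a with redundant parentheses. The language picks the declaration, so inside the function a names a new, default-constructed T:

```cpp
#include <string>

// Illustrative type: records which constructor built it.
struct T {
    std::string how;
    T() : how("default") {}
    explicit T(int) : how("from int") {}
};

int a = 42;  // a global int named a

std::string which_ctor() {
    // As an expression this would be the functional cast T(a), i.e. T(42).
    // C++ prefers the declaration reading, "T a;" with redundant parentheses,
    // so a new local T named a (shadowing the global) is default-constructed.
    T(a);
    return a.how;  // "a" refers to the local T here
}
```

    Calling which_ctor() returns "default", and the global a is left untouched. Some compilers may warn about the redundant parentheses in this statement.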

    In contrast, GLR grammars are not required to be unambiguous. Any
    ambiguity is resolved by producing a parse forest that represents all
    the potential ambiguous choices. A later "semantic" pass then has to
    choose which parse tree in the forest is the desired one. Thus, with a
    GLR parser, one can disambiguate the C++ problem by selecting the parse
    tree that treats all the ambiguous expression/declaration subtrees as
    declarations.
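    As a rough sketch of that later semantic pass, assume a hypothetical forest representation in which an "ambiguous" node simply lists its alternative subtrees. Choosing declarations then amounts to one recursive walk over the forest. Real GLR tools use their own, more compact forest structures. The Node type and prefer_declarations below are invented for illustration:

```cpp
#include <memory>
#include <string>
#include <vector>

// Simplified forest node invented for this sketch: the children of an
// "ambiguous" node are alternative readings of the same input span; the
// children of any other node are ordinary subtrees.
struct Node {
    std::string kind;  // e.g. "statement", "declaration", "expression", "ambiguous"
    std::vector<std::shared_ptr<Node>> kids;
};
using NodeP = std::shared_ptr<Node>;

// Semantic pass: copy the forest into a single tree, replacing every ambiguous
// node with its "declaration" alternative (mirroring "if it looks like a
// declaration, it is a declaration"); if there is none, keep the first one.
NodeP prefer_declarations(const NodeP& n) {
    if (n->kind == "ambiguous" && !n->kids.empty()) {
        for (const auto& alt : n->kids)
            if (alt->kind == "declaration") return prefer_declarations(alt);
        return prefer_declarations(n->kids.front());
    }
    auto copy = std::make_shared<Node>(Node{n->kind, {}});
    for (const auto& k : n->kids) copy->kids.push_back(prefer_declarations(k));
    return copy;
}
```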

    The only problem with GLR as a technology is that there are no
    "warnings" from the grammar processing tool that the language is
    ambiguous. Well, there are warnings that the language is not LR (or
    LALR, or whatever technology the GLR parser uses as a base). However,
    some of those grammars will actually not be ambiguous and some of them
    will be. In any case, once your GLR generator has given a warning, you
    must either prove that the language actually isn't ambiguous, or write
    your semantic phase assuming that it is ambiguous and disambiguate the
    resulting forest.

    It is worth mentioning that there are other ways of handling ambiguous
    grammars. In particular, one can use predicates to resolve
    ambiguities. Predicates allow one to take the difference of two
    productions in a controlled manner. For example, it is possible to
    write a syntactic rule that says: try to parse this as a declaration,
    and if it isn't one, parse it as an expression. The difference between
    the predicated and the GLR solutions is that predicated grammars are
    still deterministic. There are no hidden ambiguities in a predicated
    grammar. If your predicated parser generator gives you an error, you
    still have an unresolved ambiguity. If it doesn't, the resulting
    parser will always construct a parse tree (and not a forest).
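    A hand-written recursive-descent parser can express that kind of syntactic predicate. It tries the declaration rule first, rewinds the input position if that fails, and only then tries the expression rule. The sketch below uses a made-up toy grammar in which "a * b ;" is both a pointer declaration and a multiplication. The declaration reading wins because it is tried first, and the parser always returns exactly one answer rather than a forest. StmtParser and its rules are hypothetical:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy statement grammar invented for this sketch (tokens are pre-split):
//   decl ::= IDENT '*'? IDENT ';'          e.g. "T * p ;" or "int x ;"
//   expr ::= term ('*' term)* ';'          term ::= IDENT | NUMBER
// "a * b ;" matches both rules; the predicate resolves it as a declaration.
class StmtParser {
public:
    explicit StmtParser(std::vector<std::string> toks) : t_(std::move(toks)) {}

    // Parses the whole token list as one statement and returns
    // "declaration", "expression", or "error".
    std::string parse_statement() {
        pos_ = 0;
        if (decl()) return "declaration";  // syntactic predicate: try decl first...
        pos_ = 0;                          // ...and rewind if it did not match
        if (expr()) return "expression";
        return "error";
    }

private:
    std::vector<std::string> t_;
    std::size_t pos_ = 0;

    bool at_end() const { return pos_ >= t_.size(); }
    static bool starts_alpha(const std::string& s) {
        return !s.empty() && std::isalpha(static_cast<unsigned char>(s[0])) != 0;
    }
    static bool starts_digit(const std::string& s) {
        return !s.empty() && std::isdigit(static_cast<unsigned char>(s[0])) != 0;
    }
    bool eat(const std::string& lit) {
        if (!at_end() && t_[pos_] == lit) { ++pos_; return true; }
        return false;
    }
    bool ident() {
        if (!at_end() && starts_alpha(t_[pos_])) { ++pos_; return true; }
        return false;
    }
    bool term() {
        if (!at_end() && (starts_alpha(t_[pos_]) || starts_digit(t_[pos_]))) { ++pos_; return true; }
        return false;
    }
    bool decl() {
        if (!ident()) return false;
        eat("*");  // optional pointer declarator
        return ident() && eat(";") && at_end();
    }
    bool expr() {
        if (!term()) return false;
        while (eat("*"))
            if (!term()) return false;
        return eat(";") && at_end();
    }
};
```

    With these rules, "a * b ;" comes back as a declaration, while "a * 2 ;" fails the declaration attempt and is accepted as an expression.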

    I would be remiss if I did not also point out backtracking parsers,
    which are another solution to the problem. In fact, all the
    implementations of predicated parsers that I know of use some form of
    backtracking in their implementation. General backtracking parsers
    share with GLR parsers the ability to parse ambiguous grammars.
    Backtracking parsers generally also produce a parse tree (although in
    theory they could also produce a forest). Backtracking parsers have
    their own deficits, though. Many backtracking parsers will loop forever
    on some ambiguous grammars. (Predicated backtracking parsers do not
    generally have this problem, although they do not make the same
    linear-time guarantees that pure LL and LR parsers do (see note). Of
    course, any parser generator that can handle a significant class of
    ambiguous grammars must be inherently non-linear for some grammars, and
    GLR parsers have a cubic worst case, the same as Earley parsers.) In
    addition, most backtracking parsers resolve ambiguities by selecting one
    parse tree out of the forest to return. This is generally done by the
    order of the rules in the grammar, which determines the order in which
    the rules are tried in ambiguous cases. If one looks closely, this is
    very similar to using predicates "implicitly" in the grammar. The key
    difference is that the tool inserts the predicates rather than the
    user. It does so without warning and usually without run-time
    termination guarantees.
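    The effect of rule order can be seen on a deliberately ambiguous toy grammar, which is hypothetical and not from the discussion: S ::= A B, where A and B each match "a" or "aa". The string "aaa" then has two parses. The sketch enumerates them in the order a backtracking parser would try the alternatives. A parser that stops at the first success returns whichever parse the rule order happens to put first:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Deliberately ambiguous toy grammar (invented for this sketch):
//   S ::= A B      A ::= 'a' | 'a' 'a'      B ::= 'a' | 'a' 'a'
// Each match is (matched text, end position), listed in the order tried.
using Matches = std::vector<std::pair<std::string, std::size_t>>;

// Matches for A (or B) at `pos`; `short_first` selects the rule order.
Matches a_or_aa(const std::string& in, std::size_t pos, bool short_first) {
    const bool one = pos < in.size() && in[pos] == 'a';
    const bool two = one && pos + 1 < in.size() && in[pos + 1] == 'a';
    Matches m;
    if (short_first) {
        if (one) m.push_back(std::make_pair(std::string("a"), pos + 1));
        if (two) m.push_back(std::make_pair(std::string("aa"), pos + 2));
    } else {
        if (two) m.push_back(std::make_pair(std::string("aa"), pos + 2));
        if (one) m.push_back(std::make_pair(std::string("a"), pos + 1));
    }
    return m;
}

// Every complete parse of S, written "(A)(B)", in the order backtracking reaches them.
std::vector<std::string> parse_S(const std::string& in, bool short_first) {
    std::vector<std::string> forest;
    for (const auto& left : a_or_aa(in, 0, short_first))
        for (const auto& right : a_or_aa(in, left.second, short_first))
            if (right.second == in.size())
                forest.push_back("(" + left.first + ")(" + right.first + ")");
    return forest;
}

// What a typical backtracking parser returns: the first complete parse in try
// order (computed here from the full list, for brevity).
std::string first_parse(const std::string& in, bool short_first) {
    const std::vector<std::string> forest = parse_S(in, short_first);
    return forest.empty() ? std::string() : forest.front();
}
```

    For "aaa", both rule orders find the same two parses. Trying 'a' before 'a' 'a' returns "(a)(aa)" first, while the reverse order returns "(aa)(a)". The grammar author never sees a warning that a choice was made.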

    I would like to mention that it is possible to build a predicated
    parser using GLR technology, although I don't know of anyone
    attempting to do so right now. From thought experiments I have done
    while considering whether to implement such a tool, it seems there
    would be some advantages to building one.

    Again, I do not want to imply that these are the only techniques for
    dealing with ambiguity. For example, Ralph Boland is pursuing a
    generalization of LR technology that, I gather, will handle a wider
    class of languages, and I don't think his technique is any of the
    above.

    Note: Bryan Ford recently published a paper on a "predicated" parsing
    technique that made extensive use of memoization and lazy evaluation
    to achieve (if I recall correctly) a linear-time guarantee. His
    technique shares a characteristic with general backtracking parsers:
    the order of the rules determines what is matched, and the entire
    tree is disambiguated that way. He uses an "ordered" or clause
    (ordered choice) to implement this.
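    For readers who have not seen packrat parsing, here is a minimal C++ sketch of the idea. It uses a small arithmetic grammar written with ordered choice ("/"). Each (rule, position) result is cached. When an alternative fails and the parser backtracks, work already done at that position is reused rather than repeated, and this caching is what bounds the running time linearly. The grammar and the Packrat class are illustrative only, not taken from Ford's paper:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Packrat sketch for a small PEG ('/' is ordered choice; no left recursion):
//   Additive  <- Multitive '+' Additive / Multitive
//   Multitive <- Primary '*' Multitive / Primary
//   Primary   <- '(' Additive ')' / Digit
class Packrat {
public:
    explicit Packrat(std::string in) : in_(std::move(in)) {}

    // True if the whole input is an Additive; its value is stored in `value`.
    bool parse(long& value) {
        Res r = additive(0);
        if (!r.ok || r.next != in_.size()) return false;
        value = r.val;
        return true;
    }

    int evaluations = 0;  // rule bodies actually run (memo-table misses)

private:
    struct Res { bool ok; long val; std::size_t next; };  // success, value, next position
    enum Rule { ADD, MUL, PRIM };

    std::string in_;
    std::map<std::pair<int, std::size_t>, Res> memo_;  // (rule, position) -> result

    // Return the cached result for (rule, pos); run `body` only on a miss.
    Res memo(Rule rule, std::size_t pos, const std::function<Res()>& body) {
        const auto key = std::make_pair(static_cast<int>(rule), pos);
        auto it = memo_.find(key);
        if (it != memo_.end()) return it->second;
        ++evaluations;
        Res res = body();
        memo_[key] = res;
        return res;
    }

    bool at(std::size_t p, char c) const { return p < in_.size() && in_[p] == c; }

    Res additive(std::size_t p) {
        return memo(ADD, p, [this, p]() -> Res {
            Res l = multitive(p);
            if (l.ok && at(l.next, '+')) {
                Res r = additive(l.next + 1);
                if (r.ok) return Res{true, l.val + r.val, r.next};
            }
            return multitive(p);  // second alternative: answered from the memo table
        });
    }

    Res multitive(std::size_t p) {
        return memo(MUL, p, [this, p]() -> Res {
            Res l = primary(p);
            if (l.ok && at(l.next, '*')) {
                Res r = multitive(l.next + 1);
                if (r.ok) return Res{true, l.val * r.val, r.next};
            }
            return primary(p);  // second alternative: answered from the memo table
        });
    }

    Res primary(std::size_t p) {
        return memo(PRIM, p, [this, p]() -> Res {
            if (at(p, '(')) {
                Res a = additive(p + 1);
                if (a.ok && at(a.next, ')')) return Res{true, a.val, a.next + 1};
            }
            if (p < in_.size() && in_[p] >= '0' && in_[p] <= '9')
                return Res{true, in_[p] - '0', p + 1};
            return Res{false, 0, p};
        });
    }
};
```

    Parsing "2*(3+4)" yields 14. Since each of the three rules runs at most once per input position, the evaluations counter stays at or below three times the number of positions.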

    Hope this helps,
    -Chris


    Chris Clark said (in part):

    > It is worth mentioning that there are other ways of handling ambiguous
    > grammars. In particular, one can use predicates to resolve
    > ambiguities. Predicates allow one to take the difference of two
    > productions in a controlled manner. In particular, it is possible to
    > write a syntactic rules that says, try to parse this as a declaration
    > and if it isn't parse it as an expression. The difference between the
    > predicated and the GLR solution is that predicated grammars are still
    > deterministic. There are no hidden ambiguities in a predicated
    > grammar. If your predicated parser generator gives you an error, you
    > still have an unresolved ambiguity and if it doesn't the resulting
    > parser will always construct a parse tree (and not a forest)

    I agree with Chris about the use of predicates.

    Interestingly, the use of predicates alone can add some Type 1 power to
    a grammar.

    For instance:

    L1 = {a^n b^n c+} // clearly a type 2 language
    L2 = {a+ b^n c^n} // clearly a type 2 language

    L1 intersect L2 = {a^n b^n c^n} // a type 1 language
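    The same intersection trick can be written directly with a PEG-style and-predicate, which looks ahead without consuming input. The sketch below is a hand-coded recognizer, and the names in it are invented for illustration. The predicate checks that the a's and b's balance, and the main rule then checks that the b's and c's balance. Each check on its own is context-free, and requiring both gives {a^n b^n c^n}:

```cpp
#include <cstddef>
#include <string>

// PEG for { a^n b^n c^n : n >= 1 }, hand-coded below:
//   S <- &(A 'c') 'a'+ B !.     &(...) is an and-predicate: look ahead, consume nothing
//   A <- 'a' A? 'b'             matches a^k b^k
//   B <- 'b' B? 'c'             matches b^k c^k
const std::size_t kFail = std::string::npos;

// X <- open X? close, starting at p; returns the end position or kFail.
// As with PEG's '?', a failed nested X just matches empty (no further backtracking).
std::size_t match_pair(const std::string& s, std::size_t p, char open, char close) {
    if (p >= s.size() || s[p] != open) return kFail;
    std::size_t q = match_pair(s, p + 1, open, close);
    if (q == kFail) q = p + 1;
    if (q >= s.size() || s[q] != close) return kFail;
    return q + 1;
}

bool anbncn(const std::string& s) {
    // &(A 'c'): the a's and b's balance and a 'c' follows; the position is not advanced.
    std::size_t q = match_pair(s, 0, 'a', 'b');
    if (q == kFail || q >= s.size() || s[q] != 'c') return false;
    // 'a'+ : consume the a's, starting again from the beginning.
    std::size_t p = 0;
    while (p < s.size() && s[p] == 'a') ++p;
    if (p == 0) return false;
    // B !. : the b's and c's balance and nothing is left over.
    std::size_t r = match_pair(s, p, 'b', 'c');
    return r != kFail && r == s.size();
}
```

    With this sketch, "aabbcc" is accepted, "aabbc" fails the main rule, and "aabbbcc" fails the predicate.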

    Current research in the area of this class of grammars can be found here:

    http://www.cs.queensu.ca/home/okhotin/

    See the section on "Boolean grammars." Intersection can add quite a bit of
    power to a formalism.

    My most recent paper deals with several difficult-to-parse languages of the
    classical sort, including this particularly nasty one:

    L = {a^m b^n c^mn}

    The only grammar I've seen for that one in classical form is Type 0,
    because of a contracting (length-decreasing) production:

      (1) <S> ::= <H><S> | <H><B>
      (2) <B> ::= <B><B> | <C>
      (3) <H><B> ::= <A><X><N><B>
      (4) <N><B> ::= <B><N>
      (5) <B><M> ::= <M><B>
      (6) <N><C> ::= <M>c
      (7) <N>c ::= <M>cc
      (8) <X><M><B><B> ::= <B><X><N><B>
      (9) <X><B><M>c ::= <B>c
    (10) <H><A> ::= <A><H>
    (11) <A> ::= a
    (12) <B> ::= b

    Because production (9) is contracting (its left-hand side is longer than
    its right-hand side), the grammar is in Type 0 form, even though the
    language itself is Type 1. I'd like to see that grammar normalized to
    Type 1 form, but I haven't been able to find one.

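    The longest derivation I've been able to do with that by hand is aabcc.
    It is shown below as a reduction from the string back to <S>, where each
    step applies the numbered production from right to left: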

    (11) aabcc --> <A>abcc
    (11) <A>abcc --> <A><A>bcc
    (12) <A><A>bcc --> <A><A><B>cc
      (9) <A><A><B>cc --> <A><A><X><B><M>cc
      (7) <A><A><X><B><M>cc --> <A><A><X><B><N>c
      (4) <A><A><X><B><N>c --> <A><A><X><N><B>c
      (3) <A><A><X><N><B>c --> <A><H><B>c
      (9) <A><H><B>c --> <A><H><X><B><M>c
      (6) <A><H><X><B><M>c --> <A><H><X><B><N><C>
      (4) <A><H><X><B><N><C> --> <A><H><X><N><B><C>
      (2) <A><H><X><N><B><C> --> <A><H><X><N><B><B>
      (2) <A><H><X><N><B><B> --> <A><H><X><N><B>
    (10) <A><H><X><N><B> --> <H><A><X><N><B>
      (3) <H><A><X><N><B> --> <H><H><B>
      (1) <H><H><B> --> <H><S>
      (1) <H><S> --> <S>
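    Derivations like this are easy to get lost in, so it can help to check them mechanically. The sketch below encodes each nonterminal as one upper-case letter. This encoding is chosen for the sketch only: <C> becomes C, while the terminal c stays c. The sketch lists productions (1) to (12) as written above and checks that each step of a forward derivation from <S> rewrites exactly one occurrence of one production's left-hand side. It also counts the contracting productions, meaning those whose left side is longer than the right, which a Type 1 (non-contracting) grammar may not contain:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Encoding assumed for this sketch: one character per symbol. Nonterminals
// <S> <H> <A> <B> <C> <M> <N> <X> become S H A B C M N X; terminals are a b c.
using Production = std::pair<std::string, std::string>;  // lhs ::= rhs

const std::vector<Production> kGrammar = {
    {"S", "HS"}, {"S", "HB"},  // (1)
    {"B", "BB"}, {"B", "C"},   // (2)
    {"HB", "AXNB"},            // (3)
    {"NB", "BN"},              // (4)
    {"BM", "MB"},              // (5)
    {"NC", "Mc"},              // (6)
    {"Nc", "Mcc"},             // (7)
    {"XMBB", "BXNB"},          // (8)
    {"XBMc", "Bc"},            // (9)
    {"HA", "AH"},              // (10)
    {"A", "a"},                // (11)
    {"B", "b"},                // (12)
};

// True if `to` follows from `from` by rewriting one occurrence of some lhs as its rhs.
bool one_step(const std::string& from, const std::string& to) {
    for (const Production& p : kGrammar) {
        const std::string& lhs = p.first;
        for (std::size_t i = from.find(lhs); i != std::string::npos; i = from.find(lhs, i + 1))
            if (from.substr(0, i) + p.second + from.substr(i + lhs.size()) == to) return true;
    }
    return false;
}

// Checks a forward derivation S => ... => w, exactly one production per step.
bool valid_derivation(const std::vector<std::string>& forms) {
    if (forms.empty() || forms.front() != "S") return false;
    for (std::size_t k = 1; k < forms.size(); ++k)
        if (!one_step(forms[k - 1], forms[k])) return false;
    return true;
}

// Number of contracting productions (|lhs| > |rhs|).
int count_contracting() {
    int n = 0;
    for (const Production& p : kGrammar)
        if (p.first.size() > p.second.size()) ++n;
    return n;
}
```

    Replaying the aabcc derivation forwards (S, HS, HHB, HAXNB, ..., aabcc) with one production per line passes the check. count_contracting() reports a single such production, rule (9). Because each step must use exactly one production, a line that applies two rules at once, such as rules (2) and (4) together, is rejected.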

    I started on aaabbcccccc -- but got lost in the shuffle. :-(

    If anyone wants to try a purely "predicate" approach to the above
    language -- I'd love to see it. (Or if anyone would care to post the
    derivation of aaabbcccccc, I'd really love to see that, too.)

    The $-grammar I wrote for that one accepts strings in O(n^2.3), and has 6
    productions and 2 of those are predicates, but also makes use of 2
    phi-expressions, which implies a total of 4 predicates (since there is an
    implied predicate with every phi-expression) and at least 2 name-indexed
    tries.

    Also of note is that I allow for a-expressions in predicates, which allows
    for substrings that have been parsed to be concatenated to form entirely new
    input that is then passed to the predicates. (The $-grammar for a^m b^n c^mn
    uses this.) This is similar to what is known as "length-increasing".

    Anyway, a few nights ago, I was asking myself about this construct in C++:

    class Foo
    {
        int inline_function(int x)
        {
            return y_ * x; // y_ is used before being seen
        }

        int y_;
    }; // this class is legal

    class Bar
    {
        int inline_function(int x)
        {
            return y_ * x; // y_ is an undeclared variable
        }
    }; // this class is not legal because y_ never gets declared

    $-calculus was able to handle this ... without any code, accepting Foo and
    rejecting Bar -- using all of the above mentioned techniques.
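    For comparison, a conventional compiler does not need predicates here. C++ treats member function bodies as a complete-class context, so name lookup inside them sees every member of the class, including members declared later. Front ends typically get that effect by deferring those bodies until the class's closing brace. The sketch below reduces the idea to two passes over a hypothetical, pre-tokenized class body: collect the member names first, then check each use. It is a simplification for illustration, not how any particular compiler is structured:

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, pre-digested view of a class body in source order: either a
// data-member declaration or a name used inside an inline member function body.
struct Item {
    enum Kind { DeclareMember, UseName } kind;
    std::string name;
};

// Pass 1 collects every member. Pass 2 resolves each use against the members
// and the function's own parameters/locals. Because lookup happens only after
// the whole body has been seen, a use may precede the declaration (as in Foo).
// Returns the names that never resolve; an empty result means the body is accepted.
std::vector<std::string> undeclared_uses(const std::vector<Item>& body,
                                         const std::set<std::string>& locals) {
    std::set<std::string> members;
    for (const Item& it : body)
        if (it.kind == Item::DeclareMember) members.insert(it.name);

    std::vector<std::string> missing;
    for (const Item& it : body)
        if (it.kind == Item::UseName && members.count(it.name) == 0 &&
            locals.count(it.name) == 0)
            missing.push_back(it.name);
    return missing;
}
```

    For a Foo-like body, the member is used before it is declared and nothing is reported. For a Bar-like body, the member name comes back as undeclared.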

    Although the resulting $-grammar uses more than 1 explicit predicate, it
    does make use of phi-expressions, and these have implied predicates -- so I
    was not able to do it without extensive overall use of predication.

    Anyway, I wrote it up and am looking for a venue that would take a
    3.5-page paper on such things. Any thoughts?

    [BTW -- Chris -- direct email to you bounces from my account. Is that a spam
    guard?]
  • Original post: https://www.cnblogs.com/guaiguai/p/1227082.html