    4.7 More Powerful LR Parsers

    In this section, we shall extend the previous LR parsing techniques to use one symbol of lookahead on the input. There are two different methods:

    1. The "canonical-LR" or just "LR" method, which makes full use of the lookahead symbol(s). This method uses a large set of items, called the LR(1) items.
    2. The "lookahead-LR" or "LALR" method, which is based on the LR(0) sets of items, and has many fewer states than typical parsers based on the LR(1) items. By carefully introducing lookaheads into the LR(0) items, we can handle many more grammars with the LALR method than with the SLR method, and build parsing tables that are no bigger than the SLR tables. LALR is the method of choice in most situations.

    After introducing both these methods, we conclude with a discussion of how to compact LR parsing tables for environments with limited memory.

    4.7.1 Canonical LR(1) Items

    We shall now present the most general technique for constructing an LR parsing table from a grammar. Recall that in the SLR method, state i calls for reduction by A -> α if the set of items Ii contains item [A -> α@] and the next input symbol a is in FOLLOW(A). In some situations, however, when state i appears on top of the stack, the viable prefix βα on the stack is such that βA cannot be followed by a in any right-sentential form. Thus, the reduction by A -> α should be invalid on input a.

    Example 4.51: Let us reconsider Example 4.48, where in state 2 we had item R -> L@, which could correspond to A -> α above, and a could be the = sign, which is in FOLLOW(R). Thus, the SLR parser calls for reduction by R -> L in state 2 with = as the next input (the shift action is also called for, because of item S -> L@=R in state 2). However, there is no right-sentential form of the grammar in Example 4.48 that begins R = ... . Thus state 2, which is the state corresponding to viable prefix L only, should not really call for reduction of that L to R.
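
    The claim that = is in FOLLOW(R) can be checked mechanically. The sketch below assumes that the grammar of Example 4.48 is the usual augmented grammar S' -> S, S -> L = R | R, L -> * R | id, R -> L (it is not reproduced in this section), and the helper names first_sets and follow_sets are illustrative only. It relies on the grammar having no ε-productions.

```python
# Assumed grammar of Example 4.48 (not shown in this section):
#   S' -> S,  S -> L = R | R,  L -> * R | id,  R -> L
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("L", "=", "R")),
    ("S", ("R",)),
    ("L", ("*", "R")),
    ("L", ("id",)),
    ("R", ("L",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def first_sets():
    """FIRST for each nonterminal; valid only because no body is empty."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            x = body[0]
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first


def follow_sets(start="S'"):
    """FOLLOW for each nonterminal, iterated to a fixed point."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            for i, x in enumerate(body):
                if x not in NONTERMINALS:
                    continue
                if i + 1 < len(body):
                    nxt = body[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    # x ends the body, so whatever follows head follows x.
                    new = follow[head]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow


if __name__ == "__main__":
    print(sorted(follow_sets()["R"]))
```

    Under that assumption FOLLOW(R) contains both = and $, which is why the SLR table asks for a reduction by R -> L on = even though no right-sentential form begins R = ... .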

    It is possible to carry more information in the state that will allow us to rule out some of these invalid reductions by A -> α. By splitting states when necessary, we can arrange to have each state of an LR parser indicate exactly which input symbols can follow a handle α for which there is a possible reduction to A.

    The extra information is incorporated into the state by redefining items to include a terminal symbol as a second component. The general form of an item becomes [A -> α@β, a], where A -> αβ is a production and a is a terminal or the right endmarker $. We call such an object an LR(1) item. The 1 refers to the length of the second component, called the lookahead of the item. [6] The lookahead has no effect in an item of the form [A -> α@β, a], where β is not ε, but an item of the form [A -> α@, a] calls for a reduction by A -> α only if the next input symbol is a. Thus, we are compelled to reduce by A -> α only on those input symbols a for which [A -> α@, a] is an LR(1) item in the state on top of the stack. The set of such a's will always be a subset of FOLLOW(A), but it could be a proper subset, as in Example 4.51.

    [6]: Lookaheads that are strings of length greater than one are possible, of course, but we shall not consider such lookaheads here.
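
    One possible way to represent an LR(1) item in code is a four-field record, with the reduce decision read directly off the lookahead component. This is a minimal sketch: the Item type, the reduce_actions helper, and the contents of STATE_2 are assumptions made for illustration. STATE_2 is meant to stand for the LR(1) counterpart of state 2 from Example 4.51, under the assumed grammar S -> L = R | R, L -> * R | id, R -> L.

```python
from typing import NamedTuple


class Item(NamedTuple):
    """An LR(1) item [head -> body[:dot] @ body[dot:], lookahead]."""
    head: str
    body: tuple
    dot: int
    lookahead: str

    def is_complete(self):
        # True when the dot is at the right end, i.e. the item is [A -> α@, a].
        return self.dot == len(self.body)


def reduce_actions(state):
    """Map each lookahead to the productions the state may reduce by on it."""
    actions = {}
    for item in state:
        # The lookahead matters only for complete items.
        if item.is_complete():
            actions.setdefault(item.lookahead, []).append((item.head, item.body))
    return actions


# Hypothetical LR(1) version of state 2: [S -> L@=R, $] and [R -> L@, $].
STATE_2 = {
    Item("S", ("L", "=", "R"), 1, "$"),
    Item("R", ("L",), 1, "$"),
}

if __name__ == "__main__":
    print(reduce_actions(STATE_2))
```

    With these items, the reduction by R -> L is called for only on $, a proper subset of FOLLOW(R), so the shift on = no longer conflicts with it.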

    Formally, we say LR(1) item [A -> α@β, a] is valid for a viable prefix γ if there is a rightmost derivation S =>* δAω => δαβω, where

    1. γ = δα, and
    2. Either a is the first symbol of ω, or ω is ε and a is $.

    Example 4.52: Let us consider the grammar

    S->BB

    B->aB|b

    There is a rightmost derivation S =>* aaBab => aaaBab. We see that item [B -> a@B, a] is valid for a viable prefix γ = aaa by letting δ = aa, A = B, ω = ab, α = a, and β = B in the above definition. There is also a rightmost derivation S =>* BaB => BaaB. From this derivation we see that item [B -> a@B, $] is valid for viable prefix Baa.
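
    The two derivations in this example can be confirmed by brute force. The sketch below enumerates right-sentential forms of the grammar S -> BB, B -> aB | b by always expanding the rightmost nonterminal. The function names and the length bound are illustrative choices; the bound is safe here only because the grammar has no ε-productions, so forms never shrink.

```python
from collections import deque

# Grammar of Example 4.52; every symbol is a single character.
GRAMMAR = {"S": ["BB"], "B": ["aB", "b"]}


def rightmost_steps(form):
    """Yield each form obtained from `form` by one rightmost derivation step."""
    for i in range(len(form) - 1, -1, -1):
        if form[i] in GRAMMAR:
            for body in GRAMMAR[form[i]]:
                yield form[:i] + body + form[i + 1:]
            return  # only the rightmost nonterminal may be expanded


def right_sentential_forms(start="S", max_len=6):
    """Collect all right-sentential forms of length at most max_len."""
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        for nxt in rightmost_steps(form):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    forms = right_sentential_forms()
    print("aaBab" in forms, "aaaBab" in set(rightmost_steps("aaBab")))
    print("BaB" in forms, "BaaB" in set(rightmost_steps("BaB")))
```

    Both checks succeed: aaBab and BaB are right-sentential forms, and one further rightmost step with B -> aB yields aaaBab and BaaB respectively.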

    4.7.2 Constructing LR(1) Sets of Items

    The method for building the collection of sets of valid LR(1) items is essentially the same as the one for building the canonical collection of sets of LR(0) items. We need only to modify the two procedures CLOSURE and GOTO.

    Figure 4.40: Sets-of-LR(1)-items construction for grammar G'

    To appreciate the new definition of the CLOSURE operation, in particular, why b must be in FIRST(βa), consider an item of the form [A -> α@Bβ, a] in the set of items valid for some viable prefix γ. Then there is a rightmost derivation S =>* δAax => δαBβax, where γ = δα. Suppose βax derives terminal string by. Then for each production of the form B -> η for some η, we have derivation S =>* γBby => γηby. Thus, [B -> @η, b] is valid for γ. Note that b can be the first terminal derived from β, or it is possible that β derives ε in the derivation βax =>* by, and b can therefore be a. To summarize both possibilities we say that b can be any terminal in FIRST(βax), where FIRST is the function from Section 4.4. Note that x cannot contain the first terminal of by, so FIRST(βax) = FIRST(βa). We now give the LR(1) sets of items construction.
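
    The closure rule argued for above (for each item [A -> α@Bβ, a], add [B -> @η, b] for every production B -> η and every b in FIRST(βa)) can be sketched in a few lines. This is an illustrative sketch, not a reproduction of Figure 4.40: items are plain tuples (head, body, dot, lookahead), the grammar is the augmented form of Example 4.52, and the helper names first, closure, goto, and state_for are assumptions. Because this grammar has no ε-productions, FIRST(βa) is determined by the first symbol of βa alone.

```python
# Augmented grammar of Example 4.52: S' -> S, S -> BB, B -> aB | b.
GRAMMAR = {
    "S'": [("S",)],
    "S": [("B", "B")],
    "B": [("a", "B"), ("b",)],
}


def first(symbol, seen=frozenset()):
    """FIRST of one symbol, assuming the grammar has no ε-productions."""
    if symbol not in GRAMMAR:
        return {symbol}          # a terminal or the endmarker $
    if symbol in seen:
        return set()             # guard against left recursion
    result = set()
    for body in GRAMMAR[symbol]:
        result |= first(body[0], seen | {symbol})
    return result


def closure(items):
    """Close a set of LR(1) items under the FIRST(βa) rule."""
    items = set(items)
    work = list(items)
    while work:
        head, body, dot, la = work.pop()
        if dot == len(body) or body[dot] not in GRAMMAR:
            continue             # dot at the end, or before a terminal
        nonterminal = body[dot]
        beta_a = body[dot + 1:] + (la,)
        for b in first(beta_a[0]):
            for prod in GRAMMAR[nonterminal]:
                new = (nonterminal, prod, 0, b)
                if new not in items:
                    items.add(new)
                    work.append(new)
    return frozenset(items)


def goto(items, symbol):
    """Move the dot over `symbol` in every applicable item, then close."""
    moved = {(h, body, dot + 1, la)
             for h, body, dot, la in items
             if dot < len(body) and body[dot] == symbol}
    return closure(moved)


def state_for(prefix):
    """Set of items reached from the initial state on the symbols of prefix."""
    state = closure({("S'", ("S",), 0, "$")})
    for sym in prefix:
        state = goto(state, sym)
    return state


if __name__ == "__main__":
    print(("B", ("a", "B"), 1, "a") in state_for("aaa"))
    print(("B", ("a", "B"), 1, "$") in state_for("Baa"))
```

    Under these assumptions the set reached on aaa contains [B -> a@B, a], and the set reached on Baa contains [B -> a@B, $], in agreement with Example 4.52.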

    Figure 4.41: The GOTO graph for grammar (4.55)

  • Original article: https://www.cnblogs.com/cuishengli/p/2700275.html