4.7 More Powerful LR Parsers


In this section, we shall extend the previous LR parsing techniques to use one symbol of lookahead on the input. There are two different methods:
1. The "canonical-LR" or just "LR" method, which makes full use of the lookahead symbol(s). This method uses a large set of items, called the LR(1) items.
2. The "lookahead-LR" or "LALR" method, which is based on the LR(0) sets of items, and has many fewer states than typical parsers based on the LR(1) items. By carefully introducing lookaheads into the LR(0) items, we can handle many more grammars with the LALR method than with the SLR method, and build parsing tables that are no bigger than the SLR tables. LALR is the method of choice in most situations.
After introducing both these methods, we conclude with a discussion of how to compact LR parsing tables for environments with limited memory.

4.7.1 Canonical LR(1) Items

We shall now present the most general technique for constructing an LR parsing table from a grammar. Recall that in the SLR method, state i calls for reduction by A -> α if the set of items I_i contains item [A -> α@] and the next input symbol a is in FOLLOW(A). In some situations, however, when state i appears on top of the stack, the viable prefix βα on the stack is such that βA cannot be followed by a in any right-sentential form. Thus, the reduction by A -> α should be invalid on input a.

Example 4.51: Let us reconsider Example 4.48, where in state 2 we had item R -> L@, which could correspond to A -> α above, and a could be the = sign, which is in FOLLOW(R). Thus, the SLR parser calls for reduction by R -> L in state 2 with = as the next input (the shift action is also called for, because of item S -> L@=R in state 2). However, there is no right-sentential form of the grammar in Example 4.48 that begins with R = ... Thus state 2, which is the state corresponding to viable prefix L only, should not really call for reduction of that L to R.
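To see why the SLR rule admits this reduction, it may help to compute FOLLOW(R) directly. The sketch below assumes the grammar of Example 4.48 is S -> L = R | R, L -> * R | id, R -> L; that grammar is not reproduced in this section, so treat it as an assumption. The function names are illustrative, and the FIRST computation relies on the grammar having no ε-productions.

```python
from collections import defaultdict

# Assumed grammar of Example 4.48 (not shown in this section):
#   S -> L = R | R,   L -> * R | id,   R -> L
GRAMMAR = [
    ("S", ["L", "=", "R"]),
    ("S", ["R"]),
    ("L", ["*", "R"]),
    ("L", ["id"]),
    ("R", ["L"]),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def first_sets(grammar):
    """FIRST sets, valid only because no production body is empty."""
    first = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            x = body[0]
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[head]:
                first[head] |= new
                changed = True
    return first


def follow_sets(grammar, start):
    """FOLLOW sets by fixpoint iteration; '$' is the right endmarker."""
    first = first_sets(grammar)
    follow = defaultdict(set)
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            for i, x in enumerate(body):
                if x not in NONTERMINALS:
                    continue
                if i + 1 < len(body):
                    nxt = body[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    # x ends the body, so whatever follows head follows x.
                    new = follow[head]
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR, "S")
print(sorted(FOLLOW["R"]))  # ['$', '=']
```

Under that assumption, = lands in FOLLOW(R) only because of the production L -> * R: an R nested inside an L on the left of the assignment can be followed by =, while an R standing alone at the start of a sentential form cannot.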

It is possible to carry more information in the state that will allow us to rule out some of these invalid reductions by A -> α. By splitting states when necessary, we can arrange to have each state of an LR parser indicate exactly which input symbols can follow a handle α for which there is a possible reduction to A.

The extra information is incorporated into the state by redefining items to include a terminal symbol as a second component. The general form of an item becomes [A -> α@β, a], where A -> αβ is a production and a is a terminal or the right endmarker $. We call such an object an LR(1) item. The 1 refers to the length of the second component, called the lookahead of the item. [6] The lookahead has no effect in an item of the form [A -> α@β, a], where β is not ε, but an item of the form [A -> α@, a] calls for a reduction by A -> α only if the next input symbol is a. Thus, we are compelled to reduce by A -> α only on those input symbols a for which [A -> α@, a] is an LR(1) item in the state on top of the stack. The set of such a's will always be a subset of FOLLOW(A), but it could be a proper subset, as in Example 4.51.

[6] Lookaheads that are strings of length greater than one are possible, of course, but we shall not consider such lookaheads here.
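As a minimal sketch of how the lookahead restricts reductions, an LR(1) item can be modeled as a production, a dot position, and one lookahead terminal. The `Item` class, the `reductions` helper, and the two-item state below are illustrative names and data, not part of the text; the state is what the LR(1) construction would give for viable prefix L, assuming the grammar of Example 4.48 is S -> L = R | R, L -> * R | id, R -> L.

```python
from typing import List, NamedTuple, Tuple


class Item(NamedTuple):
    """An LR(1) item: production head -> body, dot index, lookahead terminal."""
    head: str
    body: Tuple[str, ...]
    dot: int
    lookahead: str

    def is_complete(self) -> bool:
        # The dot is at the right end, as in [A -> α@, a].
        return self.dot == len(self.body)


def reductions(state: List[Item], next_input: str) -> List[Tuple[str, Tuple[str, ...]]]:
    """Productions the state allows reducing by when next_input is seen."""
    return [(it.head, it.body) for it in state
            if it.is_complete() and it.lookahead == next_input]


# Hypothetical state for viable prefix L: [S -> L@=R, $] and [R -> L@, $].
STATE = [
    Item("S", ("L", "=", "R"), 1, "$"),
    Item("R", ("L",), 1, "$"),
]

print(reductions(STATE, "="))  # []
print(reductions(STATE, "$"))  # [('R', ('L',))]
```

The lookahead of the incomplete item [S -> L@=R, $] plays no role in the lookup, which matches the remark that the lookahead matters only when the dot is at the right end.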

Formally, we say LR(1) item [A -> α@β, a] is valid for a viable prefix γ if there is a rightmost derivation S =>* δAω => δαβω (where =>* denotes zero or more derivation steps), such that
1. γ = δα, and
2. Either a is the first symbol of ω, or ω is ε and a is $.

Example 4.52: Let us consider the grammar

S -> B B
B -> a B | b

There is a rightmost derivation S =>* aaBab => aaaBab. We see that item [B -> a@B, a] is valid for a viable prefix γ = aaa by letting δ = aa, A = B, ω = ab, α = a, and β = B in the above definition. There is also a rightmost derivation S =>* BaB => BaaB. From this derivation we see that item [B -> a@B, $] is valid for viable prefix Baa.
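The two conditions of the definition can be checked mechanically once δ, A, α, β, and ω are chosen. The helper below is only an illustration under a simplifying assumption: every grammar symbol is a single character, as in the grammar of Example 4.52. The name `validity_witness` is hypothetical, and the function does not verify that δAω is actually reachable from S by a rightmost derivation; that part is taken on trust from the example.

```python
def validity_witness(delta, head, alpha, beta, omega):
    """Return the last derivation step, the viable prefix γ, and the lookahead.

    Arguments are strings of single-character grammar symbols.
    """
    before = delta + head + omega          # the sentential form δAω
    after = delta + alpha + beta + omega   # the sentential form δαβω
    gamma = delta + alpha                  # condition 1: γ = δα
    lookahead = omega[0] if omega else "$" # condition 2: first symbol of ω, or $
    return before, after, gamma, lookahead


# First derivation of Example 4.52: item [B -> a@B, a], viable prefix aaa.
print(validity_witness("aa", "B", "a", "B", "ab"))
# ('aaBab', 'aaaBab', 'aaa', 'a')

# Second derivation: item [B -> a@B, $], viable prefix Baa.
print(validity_witness("Ba", "B", "a", "B", ""))
# ('BaB', 'BaaB', 'Baa', '$')
```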

4.7.2 Constructing LR(1) Sets of Items

The method for building the collection of sets of valid LR(1) items is essentially the same as the one for building the canonical collection of sets of LR(0) items. We need only to modify the two procedures CLOSURE and GOTO.

Figure 4.40: Sets-of-LR(1)-items construction for grammar G'

To appreciate the new definition of the CLOSURE operation, in particular, why b must be in FIRST(βa), consider an item of the form [A -> α@Bβ, a] in the set of items valid for some viable prefix γ. Then there is a rightmost derivation S =>* δAax => δαBβax, where γ = δα. Suppose βax derives the terminal string by. Then for each production of the form B -> η for some η, we have the derivation S =>* γBby => γηby. Thus, [B -> @η, b] is valid for γ. Note that b can be the first terminal derived from β, or it is possible that β derives ε in the derivation βax =>* by, and b can therefore be a. To summarize both possibilities we say that b can be any terminal in FIRST(βax), where FIRST is the function from Section 4.4. Note that x cannot contain the first terminal of by, so FIRST(βax) = FIRST(βa). We now give the LR(1) sets of items construction.
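The FIRST(βa) rule explained above can be sketched in a few lines. This is not the construction of Figure 4.40 itself but a minimal illustration under stated assumptions: it uses the grammar of Example 4.52 augmented with a new start production S' -> S, items are plain tuples (head, body, dot, lookahead), and because the grammar has no ε-productions, FIRST of a symbol string is decided by its first symbol alone. All function names are illustrative.

```python
# Grammar of Example 4.52, augmented with S' -> S (the augmentation is assumed).
GRAMMAR = {
    "S'": [("S",)],
    "S": [("B", "B")],
    "B": [("a", "B"), ("b",)],
}


def first_sets(grammar):
    """FIRST of each nonterminal; valid only without epsilon-productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                x = body[0]
                new = first[x] if x in grammar else {x}
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)


def first_of(symbols):
    """FIRST of a nonempty symbol string such as βa."""
    x = symbols[0]
    return FIRST[x] if x in GRAMMAR else {x}


def closure(items):
    """For [A -> α@Bβ, a] add [B -> @η, b] for every b in FIRST(βa)."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot, a = work.pop()
        if dot == len(body) or body[dot] not in GRAMMAR:
            continue  # dot at the end, or in front of a terminal
        nonterminal, beta = body[dot], body[dot + 1:]
        for b in first_of(beta + (a,)):
            for eta in GRAMMAR[nonterminal]:
                item = (nonterminal, eta, 0, b)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol`, keep each lookahead, then close the set."""
    moved = {(h, body, dot + 1, a) for h, body, dot, a in items
             if dot < len(body) and body[dot] == symbol}
    return closure(moved)


I0 = closure({("S'", ("S",), 0, "$")})
for item in sorted(I0):
    print(item)
print(len(I0), len(goto(I0, "B")))  # 6 3
```

In the initial set, the items for B carry lookaheads a and b because B is followed by another B in S -> B B, so FIRST(B$) = {a, b}. After GOTO on B, the dot sits before the last B, β is empty, and the new B items inherit the lookahead $.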

Figure 4.41: The GOTO graph for grammar (4.55)

Original article: https://www.cnblogs.com/cuishengli/p/2700275.html
