Proj. THUIoTFuzz Java Tool: ANTLR

First Example
```
grammar Hello;
r  : 'hello' ID ;         // match keyword hello followed by an identifier
ID : [a-z]+ ;             // match lower-case identifiers
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
```
Comments

Comments are C-style: block comments /* ... */ or line comments //.

Identifiers

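Identifiers may contain letters and digits ([a-zA-Z0-9]), underscores, and a wide range of Unicode characters. Token (lexer rule) names must start with an uppercase letter; parser rule names must start with a lowercase letter.
For example: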
```
ID, LPAREN, RIGHT_CURLY // token names/lexer rules
expr, simpleDeclarator, d2, header_file // parser rule names
```
ANTLR also allows Unicode characters in identifiers:
```
grammar 外;
a : '外';
```
Literals

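ANTLR does not distinguish between character and string literals; both are written in single quotes, for example ' ' and '>='.
Unicode characters can be written as escape sequences: \uXXXX for code points up to U+FFFF, or \u{XXXXXX} for any code point, e.g. \u{1F4A9}.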
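As a small illustration of the literal forms described above, here is a sketch of a grammar that uses a multi-character literal and both escape styles. The grammar name Lit and the rule names are made up for this example, and the \u{...} form assumes a reasonably recent ANTLR 4 release.

```
grammar Lit;                  // hypothetical grammar name
cmp    : ID '>=' ID ;         // multi-character literal, still in single quotes
accent : '\u00E9' ;           // code point U+00E9 written as \uXXXX
emoji  : '\u{1F4A9}' ;        // code point above U+FFFF written as \u{XXXXXX}
ID     : [a-z]+ ;
WS     : [ \t\r\n]+ -> skip ;
```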

Actions

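ANTLR does not parse an action as target-language code; it treats the action as arbitrary text enclosed in {}. Braces inside the action need no escaping when they appear in a string or a comment, such as "{" or /*}*/, or when they are balanced (properly nested pairs). A dangling brace must be escaped with a backslash: { becomes \{ and } becomes \}.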
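The brace rules can be sketched with a small grammar. This is only an illustration under assumptions: the grammar name Braces is hypothetical and the action bodies assume the Java target.

```
grammar Braces;                                         // hypothetical grammar name
r : ID  { System.out.println("}"); }                    // brace inside a string: no escape needed
  | INT { /* } */ }                                     // brace inside a comment: no escape needed
  | '!' { if (true) { System.out.println("ok"); } }     // balanced nested braces: no escape needed
  ;
// a dangling brace outside strings and comments would have to be written \{ or \}
ID  : [a-z]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```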

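Embedded code can also appear in the @header and @members named actions, in parser and lexer rules, in exception-catching specifications, and in the attribute sections of parser rules (arguments, return values, and locals). Some rule element options, currently predicates, can contain code as well.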

Keywords
```
import, fragment, lexer, parser, grammar, returns,
locals, throws, catch, finally, mode, options, tokens
```
In addition, avoid using keywords of the target language as rule names.

Basic Grammar Structure
```
/** Optional javadoc style comment */
grammar Name;
options {...}
import ... ;
 	
tokens {...}
channels {...} // lexer only
@actionName {...}
 	 
rule1 // parser and lexer rules, possibly intermingled
...
ruleN
```
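The only required parts are the grammar Name; header and at least one rule. The file name must match the grammar name, so the grammar above belongs in Name.g4.
grammar Name; declares a combined grammar, from which ANTLR generates both a parser and a lexer. To get only one of the two, write parser grammar Name; or lexer grammar Name; instead.
Only lexer grammars can contain mode specifications and custom channels sections. For example: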
```
channels {
  WHITESPACE_CHANNEL,
  COMMENTS_CHANNEL
}
```
Once declared, these channels can be used like enum values inside lexer rules.
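A minimal sketch of how the channels declared above might be referenced from lexer rules follows. The grammar name ChannelDemo and the COMMENT rule are assumptions added for illustration.

```
lexer grammar ChannelDemo;    // hypothetical grammar name

channels {
  WHITESPACE_CHANNEL,
  COMMENTS_CHANNEL
}

ID      : [a-z]+ ;
WS      : [ \t\r\n]+    -> channel(WHITESPACE_CHANNEL) ; // kept, but off the default channel
COMMENT : '//' ~[\r\n]* -> channel(COMMENTS_CHANNEL) ;
```

Unlike skip, sending tokens to a channel keeps them in the token stream, so a tool that needs whitespace or comments can still retrieve them.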

Grammar imports

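ANTLR treats an imported grammar much like a superclass: the importing grammar inherits all of its rules, tokens specifications, and named actions. Rules in the main grammar override rules of the same name from imported grammars.

If the main grammar or any imported grammar contains modes, ANTLR imports those modes and merges their rules wherever they are not overridden. A mode that ends up empty because all of its rules were overridden is discarded.
tokens specifications are merged, channels specifications are merged, and named actions such as @members are merged as well.
Imported grammars can themselves import other grammars; ANTLR processes the imports depth-first.
Note that a lexer grammar can only import lexer grammars, a parser grammar can only import parser grammars, and a combined grammar can import parser grammars or lexer grammars without modes.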
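The override behaviour can be sketched with two small grammar files, closely modeled on the example in the ANTLR documentation. They are shown in one block for brevity but would live in separate files.

```
// ELang.g4: the imported grammar
grammar ELang;
stat : (expr ';')+ ;
expr : INT ;
WS   : [ \r\t\n]+ -> skip ;
ID   : [a-z]+ ;

// MyELang.g4: the main grammar
grammar MyELang;
import ELang;
expr : INT | ID ;   // overrides expr inherited from ELang
INT  : [0-9]+ ;     // ELang uses INT without defining it
```

Here MyELang inherits stat, WS, and ID from ELang, while its own expr replaces the imported one.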

Tokens Section

The tokens section declares token types that have no associated lexer rule. For example:
```
// explicitly define keyword token types to avoid implicit definition warnings
tokens { BEGIN, END, IF, THEN, WHILE }
 
@lexer::members { // keywords map used in lexer to assign token types
Map<String,Integer> keywords = new HashMap<String,Integer>() {{
	put("begin", KeywordsParser.BEGIN);
	put("end", KeywordsParser.END);
	...
}};
}
```
Printing the generated xxx.tokens file shows the token types that were assigned:
```
BEGIN=1
END=2
...
```
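The keywords map declared above is only useful if some lexer rule consults it. One possible sketch is an identifier rule whose action swaps the token type when the matched text is a keyword. This assumes the Java target and that the rule lives in the same grammar as the @lexer::members block; the rule name ID is an assumption.

```
ID : [a-zA-Z]+
     {
       // if the matched text is a keyword, use the keyword's token type instead of ID
       if ( keywords.containsKey(getText()) ) {
           setType(keywords.get(getText()));
       }
     }
   ;
```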
Actions at the Grammar level

@header injects code before the recognizer class definition. @members injects code into the recognizer class definition; in effect it adds fields and methods to the recognizer.
```
grammar Count;

 
@header {
package foo;
}
 
@members {
int count = 0;
}
 
list
@after {System.out.println(count+" ints");}
: INT {count++;} (',' INT {count++;} )*
;
 
INT : [0-9]+ ;
WS : [ \r\t\n]+ -> skip ;
```
Because the @header action places the generated classes in package foo, the grammar should sit in a directory named foo so that the generated files land there:
```
$ cd foo
$ antlr4 Count.g4 # generates code in the current directory (foo)
$ ls
Count.g4		CountLexer.java	CountParser.java
Count.tokens	CountLexer.tokens
CountBaseListener.java CountListener.java
$ javac *.java
$ cd ..
$ grun foo.Count list
=> 	9, 10, 11
=> 	EOF
<= 	3 ints
```
ParserRule
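1. (x|y|z): match exactly one of the alternatives. e.g: returnType : (type | 'void') ;
2. (x|y|z)?: match nothing or one of the alternatives. e.g: classDeclaration : 'class' ID (typeParameters)? ('extends' type)? ('implements' typeList)? classBody ;
3. (x|y|z)*: match the alternatives zero or more times.
4. (x|y|z)+: match the alternatives one or more times.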
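The four subrule forms listed above can be seen side by side in a small grammar. This is a sketch only; the grammar name Subrules and all rule names are invented for illustration.

```
grammar Subrules;                                   // hypothetical grammar name
field : ('public' | 'private')? type names ';' ;    // (..)? : optional modifier
type  : ('int' | 'long') ;                          // (..)  : exactly one alternative
names : ID (',' ID)* ;                              // (..)* : zero or more repetitions
path  : ('/' ID)+ ;                                 // (..)+ : one or more repetitions
ID    : [a-z]+ ;
WS    : [ \t\r\n]+ -> skip ;
```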
Alternative Labels

Use # to attach a label to each outermost alternative of a rule. The same label may be used on more than one alternative.
Within one rule, either every alternative is labeled or none is.
```
e   : e '*' e # Mult
    | e '+' e # Add
    | INT # Int
    ;
```
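If an alternative label conflicts with a rule name, ANTLR reports an error such as: error(124): A.g4:5:23: rule alt label e conflicts with rule e

Reusing one label on several alternatives makes the parse-tree walker trigger the same event for each of them: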
```
e : e '*' e # BinaryOp
  | e '+' e # BinaryOp
  | INT # Int
  ;
```
For this rule ANTLR generates the following listener methods:
```
void enterBinaryOp(AParser.BinaryOpContext ctx);
void exitBinaryOp(AParser.BinaryOpContext ctx);
void enterInt(AParser.IntContext ctx);
void exitInt(AParser.IntContext ctx);
```
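For the label conflict mentioned earlier in this section, a minimal grammar that should trigger the error might look like the sketch below. The grammar name A is taken from the error message; the line and column reported by ANTLR depend on the actual file layout.

```
grammar A;
e : e '*' e # e     // label "e" clashes with the rule name e: ANTLR reports error(124)
  | INT     # Int
  ;
INT : [0-9]+ ;
```

Renaming the label, for example to Mult, resolves the conflict.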

Original article: https://www.cnblogs.com/xuesu/p/14389914.html
