zoukankan      html  css  js  c++  java
  • antlr解析hive语句

    hive是使用antlr来解析的

    parser要做的事情,是从无结构的字符串里面,解码产生有结构的数据结构(a parser is a function accepting strings as input and returning some structure as output),参考 Parser_combinator wiki

    parser分成两种,一种是parser combinator,一种是parser generator,区别可以参考 王垠的文章——对 Parser 的误解

    parser combinator是需要手写parser,a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output

    parser generator是需要你用某种指定的描述语言来表示出语法,然后自动把他们转换成parser的代码,比如Antlr里面的g4语法文件,以及calcite的ftl语法文件,缺点是由于代码是生成的,排错比较困难

    使用了Antlr的parser有Hive,Presto,Spark SQL

    美团点评的文章

    https://tech.meituan.com/2014/02/12/hive-sql-to-mapreduce.html
    

    以及hive源码的测试用例

    https://github.com/apache/hive/blob/branch-1.1/ql/src/test/org/apache/hadoop/hive/ql/parse/TestHiveDecimalParse.java
    

    hive的g4文件如下

    老版本的hive

    https://github.com/apache/hive/blob/59d8665cba4fe126df026f334d35e5b9885fc42c/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
    

    新版本的hive

    https://github.com/apache/hive/blob/master/hplsql/src/main/antlr4/org/apache/hive/hplsql/Hplsql.g4
    

    spark的g4文件如下

    https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
    

    Presto的g4文件如下

    https://github.com/prestodb/presto/blob/master/presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
    

    使用了Apache Calcite的parser有Apache Flink,Mybatis,Apache Storm等

    参考:Apache Calcite 为什么能这么流行

    Flink的ftl文件如下

    https://github.com/apache/flink/blob/master/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl
    

    Mybatis的mapper模板生成

    https://github.com/abel533/Mapper/blob/master/generator/src/main/resources/generator/mapper.ftl
    

    Storm的ftl文件如下

    https://github.com/apache/storm/blob/master/sql/storm-sql-core/src/codegen/includes/parserImpls.ftl
    

    以hive的Hplsql.g4为例,来解析一句sql

    antlr4 Hplsql.g4
    javac Hplsql*.java
    

    解析select语句

    grun Hplsql r -tokens
    Warning: TestRig moved to org.antlr.v4.gui.TestRig; calling automatically
    select * from db1.tb1;
    [@0,0:5='select',<T_SELECT>,1:0]
    [@1,7:7='*',<'*'>,1:7]
    [@2,9:12='from',<T_FROM>,1:9]
    [@3,14:16='db1',<L_ID>,1:14]
    [@4,17:17='.',<'.'>,1:17]
    [@5,18:20='tb1',<L_ID>,1:18]
    [@6,21:21=';',<';'>,1:21]
    [@7,23:22='<EOF>',<EOF>,2:0]
    No method for rule r or it has arguments
    

    可以看到打印出token流

    解析建表语句

    grun Hplsql r -tokens
    Warning: TestRig moved to org.antlr.v4.gui.TestRig; calling automatically
    CREATE TABLE IF NOT EXISTS db1.tb1 (
      `f1` string,
      `f2` bigint,
      `f3` string,
      `f4` string,
      `f5` string)
    partitioned by(ds string)
    stored as parquet
    TBLPROPERTIES ("parquet.compression"="SNAPPY");
    [@0,0:5='CREATE',<T_CREATE>,1:0]
    [@1,7:11='TABLE',<T_TABLE>,1:7]
    [@2,13:14='IF',<T_IF>,1:13]
    [@3,16:18='NOT',<T_NOT>,1:16]
    [@4,20:25='EXISTS',<T_EXISTS>,1:20]
    [@5,27:29='db1',<L_ID>,1:27]
    [@6,30:30='.',<'.'>,1:30]
    [@7,31:33='tb1',<L_ID>,1:31]
    [@8,35:35='(',<'('>,1:35]
    [@9,39:42='`f1`',<L_ID>,2:2]
    [@10,44:49='string',<T_STRING>,2:7]
    [@11,50:50=',',<','>,2:13]
    [@12,54:57='`f2`',<L_ID>,3:2]
    [@13,59:64='bigint',<T_BIGINT>,3:7]
    [@14,65:65=',',<','>,3:13]
    [@15,69:72='`f3`',<L_ID>,4:2]
    [@16,74:79='string',<T_STRING>,4:7]
    [@17,80:80=',',<','>,4:13]
    [@18,84:87='`f4`',<L_ID>,5:2]
    [@19,89:94='string',<T_STRING>,5:7]
    [@20,95:95=',',<','>,5:13]
    [@21,99:102='`f5`',<L_ID>,6:2]
    [@22,104:109='string',<T_STRING>,6:7]
    [@23,110:110=')',<')'>,6:13]
    [@24,112:122='partitioned',<L_ID>,7:0]
    [@25,124:125='by',<T_BY>,7:12]
    [@26,126:126='(',<'('>,7:14]
    [@27,127:128='ds',<L_ID>,7:15]
    [@28,130:135='string',<T_STRING>,7:18]
    [@29,136:136=')',<')'>,7:24]
    [@30,138:143='stored',<T_STORED>,8:0]
    [@31,145:146='as',<T_AS>,8:7]
    [@32,148:154='parquet',<L_ID>,8:10]
    [@33,156:168='TBLPROPERTIES',<L_ID>,9:0]
    [@34,170:170='(',<'('>,9:14]
    [@35,171:191='"parquet.compression"',<L_ID>,9:15]
    [@36,192:192='=',<'='>,9:36]
    [@37,193:200='"SNAPPY"',<L_ID>,9:37]
    [@38,201:201=')',<')'>,9:45]
    [@39,202:202=';',<';'>,9:46]
    [@40,204:203='<EOF>',<EOF>,10:0]
    No method for rule r or it has arguments
    
  • 相关阅读:
    Visual Studio Code插件安装步骤
    JS省城级联
    JS省城级联
    JS省城级联
    JS省城级联
    【JAVA零基础入门系列】Day9 Java中的那个大数值
    【JAVA零基础入门系列】Day9 Java中的那个大数值
    【JAVA零基础入门系列】Day9 Java中的那个大数值
    [js插件开发教程]实现一个比较完整的开源级选项卡插件
    VS2017桌面应用程序打包成.msi或者.exe
  • 原文地址:https://www.cnblogs.com/tonglin0325/p/12212866.html
Copyright © 2011-2022 走看看