zoukankan      html  css  js  c++  java
  • 手写一个简单的ElasticSearch SQL转换器(一)

       一.前言

       之前有个需求,是使ElasticSearch支持使用SQL进行简单查询,较新版本的ES已经支持该特性(不过貌似还是实验性质的?) ,而且git上也有elasticsearch-sql

    插件,之所以决定手写一个,主要有两点原因:

          1. 目前用的ES版本较老

          2. elasticsearch-sql虽好,但比较复杂,代码也不易维护

          3. 练练手

     二.技术选型

       目前主流软件中通常使用ANTLR做词法语法分析,诸如著名的Hibernate,Spark,Hive等项目,之前因为工作原因也有所接触,不过如果只是解析标准SQL的话,

     其实还有更好的选择,如使用Hibernate或阿里巴巴的数据库Druid(Druid采用了手写词法语法分析器的方案,这种方式当然比自动ANTLR生成的解析器性能高得多), 这里

     我选择了第二种方案。

         开始之前先看下我们可以通过Druid拿到的SQL语言的抽象语法树:

        

                                                      图片:https://www.jianshu.com/p/437aa22ea3ca

     三.技术实现

         首先我们创建一个SqlParser类,主流程都在parse方法中,该方法负责将一个SQL字符串解析(顺便说一句,Druid支持多种SQL方言,这里我选择了MySQL),

     并返回SearchSourceBuilder对象,这是一个ElasticSearch提供的DSL构建器,以该对象作为参数,ES client端即可发起对ES 服务端搜索请求。

        

     1 /**
     2  * 
     3  * @author fred
     4  *
     5  */
     6 public class SqlParser {
     7     private final static String dbType = JdbcConstants.MYSQL;
     8     private final static Logger logger = LoggerFactory.getLogger(SqlParser.class);
     9     private SearchSourceBuilder builder;
    10 
    11     public SqlParser(SearchSourceBuilder builder) {
    12         this.builder = builder;
    13     }
    14     /**
    15      * 将SQL解析为ES查询
    16      */
    17     public SearchSourceBuilder parse(String sql) throws Exception {
    18         if (Objects.isNull(sql)) {
    19             throw new IllegalArgumentException("输入语句不得为空");
    20         }
    21         sql = sql.trim().toLowerCase();
    22         List<SQLStatement> stmtList = SQLUtils.parseStatements(sql, dbType);
    23         if (Objects.isNull(stmtList) || stmtList.size() != 1) {
    24             throw new IllegalArgumentException("必须输入一句查询语句");
    25         }
    26         // 使用Parser解析生成AST
    27         SQLStatement stmt = stmtList.get(0);
    28         if (!(stmt instanceof SQLSelectStatement)) {
    29             throw new IllegalArgumentException("输入语句须为Select语句");
    30         }
    31         SQLSelectStatement sqlSelectStatement = (SQLSelectStatement) stmt;
    32         SQLSelectQuery sqlSelectQuery = sqlSelectStatement.getSelect().getQuery();
    33         SQLSelectQueryBlock sqlSelectQueryBlock = (SQLSelectQueryBlock) sqlSelectQuery;
    34 
    35         SQLExpr whereExpr = sqlSelectQueryBlock.getWhere();
    36 
    37         // 生成ES查询条件
    38         BoolQueryBuilder bridge = QueryBuilders.boolQuery();
    39         bridge.must();
    40 
    41         QueryBuilder whereBuilder = whereHelper(whereExpr); // 处理where
    42         bridge.must(whereBuilder);
    43         SQLOrderBy orderByExpr = sqlSelectQueryBlock.getOrderBy(); // 处理order by
    44         if (Objects.nonNull(orderByExpr)) {
    45             orderByHelper(orderByExpr, bridge);
    46         }
    47         builder.query(bridge);
    48         return builder;
    49     }

         

        主流程很简单,拿到SQL字符串后,直接通过Druid API将其转换为抽象语法树,我们要求输入语句必须为Select语句。接下来是对where语句和order by语句的处理,

      目前的难点其实主要在于如何将where语句映射到ES查询中。

         先从简单的看起,如何处理order by呢?SQL语句中 order by显然可以允许用户根据多字段排序,所以排序字段肯定是一个List<排序字段>,我们要做的就是将这个List映射到

    SearchSourceBuilder对象中。见下面代码:

        

     1     /**
     2      * 处理所有order by字段
     3      * 
     4      * @param orderByExpr
     5      */
     6     private void orderByHelper(SQLOrderBy orderByExpr, BoolQueryBuilder bridge) {
     7         List<SQLSelectOrderByItem> orderByList = orderByExpr.getItems(); // 待排序的列
     8         for (SQLSelectOrderByItem sqlSelectOrderByItem : orderByList) {
     9             if (sqlSelectOrderByItem.getType() == null) {
    10                 sqlSelectOrderByItem.setType(SQLOrderingSpecification.ASC); // 默认升序
    11             }
    12             String orderByColumn = sqlSelectOrderByItem.getExpr().toString();
    13             builder.sort(orderByColumn,
    14                     sqlSelectOrderByItem.getType().equals(SQLOrderingSpecification.ASC) ? SortOrder.ASC
    15                             : SortOrder.DESC);
    16         }
    17     }

       通过Druid的API,我们很容易拿到了SQL语句中所有的排序字段,我们逐个遍历这些字段,拿到排序的列名字面量和顺序,传递给SearchSourceBuilder的sort方法,需注意的

    是, 如果原始SQL中没有指定字段是顺序,我们默认升序。

       

        接下来我们处理稍微有点麻烦的where语句,因为SQL语句被解析成了语法树,很自然的我们想到使用递归方式进行处理。 而通常在处理递归问题的时候,

      我习惯于从递归的base case开始考虑,where语句中的运算符根据Druid API中的定义主要分为以下三种:

        1. 简单二元运算符:包括逻辑处理,如and, or 和大部分关系运算(后续会详细讲)

        2. between或not between运算符:我们可以简单的将其映射成ES中的Range Query

        3. in, not in 运算符: 可以简单的映射成ES中的Term Query

       通过Druid,我们可以很方便的获取每种运算中的运算符与操作数

     1 /**
     2      * 递归遍历“where”子树
     3      * 
     4      * @return
     5      */
     6     private QueryBuilder whereHelper(SQLExpr expr) throws Exception {
     7         if (Objects.isNull(expr)) {
     8             throw new NullPointerException("节点不能为空!");
     9         }
    10         BoolQueryBuilder bridge = QueryBuilders.boolQuery();
    11         if (expr instanceof SQLBinaryOpExpr) { // 二元运算
    12             SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // 获取运算符
    13             if (operator.isLogical()) { // and,or,xor
    14                 return handleLogicalExpr(expr);
    15             } else if (operator.isRelational()) { // 具体的运算,位于叶子节点
    16                 return handleRelationalExpr(expr);
    17             }
    18         } else if (expr instanceof SQLBetweenExpr) { // between运算
    19             SQLBetweenExpr between = ((SQLBetweenExpr) expr);
    20             boolean isNotBetween = between.isNot(); // between or not between ?
    21             String testExpr = between.testExpr.toString();
    22             String fromStr = formatSQLValue(between.beginExpr.toString());
    23             String toStr = formatSQLValue(between.endExpr.toString());
    24             if (isNotBetween) {
    25                 bridge.must(QueryBuilders.rangeQuery(testExpr).lt(fromStr).gt(toStr));
    26             } else {
    27                 bridge.must(QueryBuilders.rangeQuery(testExpr).gte(fromStr).lte(toStr));
    28             }
    29             return bridge;
    30         } else if (expr instanceof SQLInListExpr) { // SQL的 in语句,ES中对应的是terms
    31             SQLInListExpr siExpr = (SQLInListExpr) expr;
    32             boolean isNotIn = siExpr.isNot(); // in or not in?
    33             String leftSide = siExpr.getExpr().toString();
    34             List<SQLExpr> inSQLList = siExpr.getTargetList();
    35             List<String> inList = new ArrayList<>();
    36             for (SQLExpr in : inSQLList) {
    37                 String str = formatSQLValue(in.toString());
    38                 inList.add(str);
    39             }
    40             if (isNotIn) {
    41                 bridge.mustNot(QueryBuilders.termsQuery(leftSide, inList));
    42             } else {
    43                 bridge.must(QueryBuilders.termsQuery(leftSide, inList));
    44             }
    45             return bridge;
    46         }
    47         return bridge;
    48     }

       上述第一种情况比较复杂,首先我们先看看运算符是逻辑运算的情况:

        如下面的代码所示,如果运算符是逻辑运算符,我们需要对左右操作数分别递归,然后根据运算符类型归并结果:or可以映射成ES 中的Should,而and则映射成Must.

       

        /**
         * 逻辑运算符,目前支持and,or
         * 
         * @return
         * @throws Exception
         */
        private QueryBuilder handleLogicalExpr(SQLExpr expr) throws Exception {
            BoolQueryBuilder bridge = QueryBuilders.boolQuery();
            SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // 获取运算符
            SQLExpr leftExpr = ((SQLBinaryOpExpr) expr).getLeft();
            SQLExpr rightExpr = ((SQLBinaryOpExpr) expr).getRight();
    
            // 分别递归左右子树,再根据逻辑运算符将结果归并
            QueryBuilder leftBridge = whereHelper(leftExpr);
            QueryBuilder rightBridge = whereHelper(rightExpr);
            if (operator.equals(SQLBinaryOperator.BooleanAnd)) {
                bridge.must(leftBridge).must(rightBridge);
            } else if (operator.equals(SQLBinaryOperator.BooleanOr)) {
                bridge.should(leftBridge).should(rightBridge);
            }
            return bridge;
        }

       下面来讨论下第一种情况中,如果运算符是关系运算符的情况,我们知道,SQL中的关系运算主要就是一些比较运算符,诸如大于,小于,等于,Like等,这里我还加上了

    正则搜索(不过貌似性能比较差,ES对正则搜索的限制颇多,不太建议使用)。

      

    /**
         * 大于小于等于正则
         * 
         * @param expr
         * @return
         */
        private QueryBuilder handleRelationalExpr(SQLExpr expr) {
            SQLExpr leftExpr = ((SQLBinaryOpExpr) expr).getLeft();
            if (Objects.isNull(leftExpr)) {
                throw new NullPointerException("表达式左侧不得为空");
            }
            String leftExprStr = leftExpr.toString();
            String rightExprStr = formatSQLValue(((SQLBinaryOpExpr) expr).getRight().toString()); // TODO:表达式右侧可以后续支持方法调用
            SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // 获取运算符
            QueryBuilder queryBuilder;
            switch (operator) {
            case GreaterThanOrEqual:
                queryBuilder = QueryBuilders.rangeQuery(leftExprStr).gte(rightExprStr);
                break;
            case LessThanOrEqual:
                queryBuilder = QueryBuilders.rangeQuery(leftExprStr).lte(rightExprStr);
                break;
            case Equality:
                queryBuilder = QueryBuilders.boolQuery();
                TermQueryBuilder eqCond = QueryBuilders.termQuery(leftExprStr, rightExprStr);
                ((BoolQueryBuilder) queryBuilder).must(eqCond);
                break;
            case GreaterThan:
                queryBuilder = QueryBuilders.rangeQuery(leftExprStr).gt(rightExprStr);
                break;
            case LessThan:
                queryBuilder = QueryBuilders.rangeQuery(leftExprStr).lt(rightExprStr);
                break;
            case NotEqual:
                queryBuilder = QueryBuilders.boolQuery();
                TermQueryBuilder notEqCond = QueryBuilders.termQuery(leftExprStr, rightExprStr);
                ((BoolQueryBuilder) queryBuilder).mustNot(notEqCond);
                break;
            case RegExp: // 对应到ES中的正则查询
                queryBuilder = QueryBuilders.boolQuery();
                RegexpQueryBuilder regCond = QueryBuilders.regexpQuery(leftExprStr, rightExprStr);
                ((BoolQueryBuilder) queryBuilder).mustNot(regCond);
                break;
            case NotRegExp:
                queryBuilder = QueryBuilders.boolQuery();
                RegexpQueryBuilder notRegCond = QueryBuilders.regexpQuery(leftExprStr, rightExprStr);
                ((BoolQueryBuilder) queryBuilder).mustNot(notRegCond);
                break;
            case Like:
                queryBuilder = QueryBuilders.boolQuery();
                MatchPhraseQueryBuilder likeCond = QueryBuilders.matchPhraseQuery(leftExprStr,
                        rightExprStr.replace("%", ""));
                ((BoolQueryBuilder) queryBuilder).must(likeCond);
                break;
            case NotLike:
                queryBuilder = QueryBuilders.boolQuery();
                MatchPhraseQueryBuilder notLikeCond = QueryBuilders.matchPhraseQuery(leftExprStr,
                        rightExprStr.replace("%", ""));
                ((BoolQueryBuilder) queryBuilder).mustNot(notLikeCond);
                break;
            default:
                throw new IllegalArgumentException("暂不支持该运算符!" + operator.toString());
            }
            return queryBuilder;
        }

        到这里我们就完成了SQL转ES DSL的功能了(其实只是简单查询的转换),下面我们写几个Junit测试一下吧:

        首先是简单的比较运算:

    public void normalSQLTest() {
            String sql = "select * from test where time>= 1";
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
             try {
                 searchSourceBuilder = new SqlParser(searchSourceBuilder).parse(sql);
            } catch (Exception e) {
                e.printStackTrace();
            }
             System.out.println(searchSourceBuilder);
             SearchSourceBuilder builderToCompare = new SearchSourceBuilder();
             QueryBuilder whereBuilder = QueryBuilders.rangeQuery("time").gte("1");
             BoolQueryBuilder briage = QueryBuilders.boolQuery();
             briage.must();
             briage.must(whereBuilder);
             builderToCompare.query(briage);
             assertEquals(searchSourceBuilder,builderToCompare);
        }

      下面是输出的ES 查询语句:

    {
      "query" : {
        "bool" : {
          "must" : [
            {
              "range" : {
                "time" : {
                  "from" : "1",
                  "to" : null,
                  "include_lower" : true,
                  "include_upper" : true,
                  "boost" : 1.0
                }
              }
            }
          ],
          "disable_coord" : false,
          "adjust_pure_negative" : true,
          "boost" : 1.0
        }
      }
    }

      再来个带排序的:

       

        @Test
        public void normalSQLWithOrderByTest() {
            String sql = "select * from test where time>= 1 order by time desc";
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
             try {
                 searchSourceBuilder = new SqlParser(searchSourceBuilder).parse(sql);
            } catch (Exception e) {
                e.printStackTrace();
            }
             System.out.println(searchSourceBuilder);
             SearchSourceBuilder builderToCompare = new SearchSourceBuilder();
             QueryBuilder whereBuilder = QueryBuilders.rangeQuery("time").gte("1");
             BoolQueryBuilder briage = QueryBuilders.boolQuery();
             briage.must();
             briage.must(whereBuilder);
             builderToCompare.sort("time",SortOrder.DESC);
             builderToCompare.query(briage);
             assertEquals(searchSourceBuilder,builderToCompare);
        }

       between, in这些没什么区别,就不贴代码了,最后看看稍微复杂点儿,带逻辑运算的查询:

      

    @Test
        public void sqlLogicTest() {
            String sql = "select * from test where raw_log not like"+"'%aaa' && b=1 or c=0";
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
             try {
                 searchSourceBuilder = new SqlParser(searchSourceBuilder).parse(sql);
            } catch (Exception e) {
                e.printStackTrace();
            }
             System.out.println(searchSourceBuilder);
             SearchSourceBuilder builderToCompare = new SearchSourceBuilder();
             QueryBuilder builder =QueryBuilders.matchPhraseQuery("raw_log","aaa"); 
             
             BoolQueryBuilder briage1 = QueryBuilders.boolQuery();//raw log not like
             briage1.mustNot(builder);  
             
             BoolQueryBuilder briage2 = QueryBuilders.boolQuery();  //b=1
             briage2.must(QueryBuilders.termQuery("b","1"));
             
             BoolQueryBuilder briage3 = QueryBuilders.boolQuery();   // not like and b=1
             briage3.must(briage1).must(briage2);
             
             BoolQueryBuilder briage4 = QueryBuilders.boolQuery();    //c =0
             briage4.must(QueryBuilders.termQuery("c","0"));
             
             BoolQueryBuilder briage5 = QueryBuilders.boolQuery();  // not like and b =1 or c =0
             briage5.should(briage3).should(briage4);
             
             
             
             BoolQueryBuilder briage6 = QueryBuilders.boolQuery();
             briage6.must();
             briage6.must(briage5);
             builderToCompare.query(briage6);
             assertEquals(searchSourceBuilder,builderToCompare);
        }

     下面是生成的查询语句:

       

    {
      "query" : {
        "bool" : {
          "must" : [
            {
              "bool" : {
                "should" : [
                  {
                    "bool" : {
                      "must" : [
                        {
                          "bool" : {
                            "must_not" : [
                              {
                                "match_phrase" : {
                                  "raw_log" : {
                                    "query" : "aaa",
                                    "slop" : 0,
                                    "boost" : 1.0
                                  }
                                }
                              }
                            ],
                            "disable_coord" : false,
                            "adjust_pure_negative" : true,
                            "boost" : 1.0
                          }
                        },
                        {
                          "bool" : {
                            "must" : [
                              {
                                "term" : {
                                  "b" : {
                                    "value" : "1",
                                    "boost" : 1.0
                                  }
                                }
                              }
                            ],
                            "disable_coord" : false,
                            "adjust_pure_negative" : true,
                            "boost" : 1.0
                          }
                        }
                      ],
                      "disable_coord" : false,
                      "adjust_pure_negative" : true,
                      "boost" : 1.0
                    }
                  },
                  {
                    "bool" : {
                      "must" : [
                        {
                          "term" : {
                            "c" : {
                              "value" : "0",
                              "boost" : 1.0
                            }
                          }
                        }
                      ],
                      "disable_coord" : false,
                      "adjust_pure_negative" : true,
                      "boost" : 1.0
                    }
                  }
                ],
                "disable_coord" : false,
                "adjust_pure_negative" : true,
                "boost" : 1.0
              }
            }
          ],
          "disable_coord" : false,
          "adjust_pure_negative" : true,
          "boost" : 1.0
        }
      }
    }

         

       四.总结

         本篇文章主要讲述了如何使用Druid实现SQL语句转换ES DSL进行搜索的功能,后续文章中会陆续完善这个功能,实现诸如聚合查询,分页查询等功能。

  • 相关阅读:
    【BZOJ 4581】【Usaco2016 Open】Field Reduction
    【BZOJ 4582】【Usaco2016 Open】Diamond Collector
    【BZOJ 4580】【Usaco2016 Open】248
    【BZOJ 3754】Tree之最小方差树
    【51Nod 1501】【算法马拉松 19D】石头剪刀布威力加强版
    【51Nod 1622】【算法马拉松 19C】集合对
    【51Nod 1616】【算法马拉松 19B】最小集合
    【51Nod 1674】【算法马拉松 19A】区间的价值 V2
    【BZOJ 2541】【Vijos 1366】【CTSC 2000】冰原探险
    【BZOJ 1065】【Vijos 1826】【NOI 2008】奥运物流
  • 原文地址:https://www.cnblogs.com/showing/p/11774719.html
Copyright © 2011-2022 走看看