  • Calcite (2): Converting a list into a tree, part 1

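      A core goal of syntax parsing is to build an abstract syntax tree. A parsing framework can easily recognize the structure of each node, but we still have to convert the result into the tree shape we need before it becomes convenient to use.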

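      Nearly all expressions are nested and combined, which is where their power comes from. Parsing, however, usually proceeds linearly, so what we get first is a flat list of tokens. Converting that flat list into a tree is therefore an important topic. Today we look at one very small piece of Calcite: how it converts a list into a tree.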

    1. Utility class entry point

      In Calcite, the list -> tree conversion is a relatively self-contained step, so it lives in a utility class. The final result is carried by a SqlNode.

      // org.apache.calcite.sql.parser.SqlParserUtil#toTree
      /**
       * Converts a list of {expression, operator, expression, ...} into a tree,
       * taking operator precedence and associativity into account.
       */
      public static @Nullable SqlNode toTree(List<@Nullable Object> list) {
        if (list.size() == 1
            && list.get(0) instanceof SqlNode) {
          // Short-cut for the simple common case
          return (SqlNode) list.get(0);
        }
        LOGGER.trace("Attempting to reduce {}", list);
        final OldTokenSequenceImpl tokenSequence = new OldTokenSequenceImpl(list);
        final SqlNode node = toTreeEx(tokenSequence, 0, 0, SqlKind.OTHER);
        LOGGER.debug("Reduced {}", node);
        return node;
      }
      // org.apache.calcite.sql.parser.SqlParserUtil#toTreeEx
      /**
       * Converts a list of {expression, operator, expression, ...} into a tree,
       * taking operator precedence and associativity into account.
       *
       * @param list        List of operands and operators. This list is modified as
       *                    expressions are reduced.
       * @param start       Position of first operand in the list. Anything to the
       *                    left of this (besides the immediately preceding operand)
       *                    is ignored. Generally use value 1.
       * @param minPrec     Minimum precedence to consider. If the method encounters
       *                    an operator of lower precedence, it doesn't reduce any
       *                    further.
       * @param stopperKind If not {@link SqlKind#OTHER}, stop reading the list if
       *                    we encounter a token of this kind.
       * @return the root node of the tree which the list condenses into
       */
      public static SqlNode toTreeEx(SqlSpecialOperator.TokenSequence list,
          int start, final int minPrec, final SqlKind stopperKind) {
        PrecedenceClimbingParser parser = list.parser(start,
            token -> {
              if (token instanceof PrecedenceClimbingParser.Op) {
                PrecedenceClimbingParser.Op tokenOp = (PrecedenceClimbingParser.Op) token;
                final SqlOperator op = ((ToTreeListItem) tokenOp.o()).op;
                return stopperKind != SqlKind.OTHER
                    && op.kind == stopperKind
                    || minPrec > 0
                    && op.getLeftPrec() < minPrec;
              } else {
                return false;
              }
            });
        final int beforeSize = parser.all().size();
        // Convert the flat list of tokens into a tree of tokens
        parser.partialParse();
        final int afterSize = parser.all().size();
        // Convert the token tree into its SqlNode representation
        final SqlNode node = convert(parser.all().get(0));
        // Collapse the slots of the consumed tokens and put the SqlNode in the first position
        list.replaceSublist(start, start + beforeSize - afterSize + 1, node);
        return node;
      }
    
      // org.apache.calcite.sql.parser.SqlParserUtil#convert
      private static SqlNode convert(PrecedenceClimbingParser.Token token) {
        switch (token.type) {
        case ATOM:
          return requireNonNull((SqlNode) token.o);
        case CALL:
          final PrecedenceClimbingParser.Call call =
              (PrecedenceClimbingParser.Call) token;
          final List<@Nullable SqlNode> list = new ArrayList<>();
          for (PrecedenceClimbingParser.Token arg : call.args) {
            list.add(convert(arg));
          }
          final ToTreeListItem item = (ToTreeListItem) call.op.o();
          if (list.size() == 1) {
            SqlNode firstItem = list.get(0);
            if (item.op == SqlStdOperatorTable.UNARY_MINUS
                && firstItem instanceof SqlNumericLiteral) {
              return SqlLiteral.createNegative((SqlNumericLiteral) firstItem,
                  item.pos.plusAll(list));
            }
            if (item.op == SqlStdOperatorTable.UNARY_PLUS
                && firstItem instanceof SqlNumericLiteral) {
              return firstItem;
            }
          }
          return item.op.createCall(item.pos.plusAll(list), list);
        default:
          throw new AssertionError(token);
        }
      }

      That is the framework code for converting a list into a tree. The key ideas are precedence, conversion, and so on.
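The convert step shown above can be pictured with a small self-contained sketch. The `TokenToNode` and `Tok` names are hypothetical stand-ins, not Calcite classes, and nodes are rendered as plain strings instead of SqlNode objects. The sketch only mirrors the shape of the logic: atoms pass through, calls convert their arguments recursively, and a unary minus or plus applied to a numeric literal is folded into the literal itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the token-to-node step; none of these are Calcite classes. */
final class TokenToNode {
  /** Either an atom (op == null) or a call of op on args. */
  static final class Tok {
    final String op;
    final String atom;
    final List<Tok> args;

    private Tok(String op, String atom, List<Tok> args) {
      this.op = op;
      this.atom = atom;
      this.args = args;
    }

    static Tok atom(String value) {
      return new Tok(null, value, Collections.<Tok>emptyList());
    }

    static Tok call(String op, Tok... args) {
      return new Tok(op, null, Arrays.asList(args));
    }
  }

  /** Converts a token tree into a string node, folding unary +/- into numeric literals. */
  static String convert(Tok token) {
    if (token.op == null) {
      return token.atom; // atoms are already finished nodes
    }
    List<String> operands = new ArrayList<>();
    for (Tok arg : token.args) {
      operands.add(convert(arg)); // convert children first
    }
    if (operands.size() == 1 && isNumericLiteral(operands.get(0))) {
      String literal = operands.get(0);
      if (token.op.equals("-")) {
        // Fold the sign into the literal instead of creating a call node.
        return literal.startsWith("-") ? literal.substring(1) : "-" + literal;
      }
      if (token.op.equals("+")) {
        return literal; // unary plus on a literal is a no-op
      }
    }
    return token.op + "(" + String.join(", ", operands) + ")";
  }

  private static boolean isNumericLiteral(String s) {
    return s.matches("-?\\d+(\\.\\d+)?");
  }
}
```

With this sketch, a unary minus over the atom `5` yields the literal `-5`, while a unary minus over the identifier `x` stays a call, `-(x)`.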

    2. The actual list -> tree process

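      Building the tree is mainly a matter of merging list elements. Roughly, each operator takes the elements before and after it as its operands, in order of operator precedence, and the three are merged into one node. For example, a > 1 or b < 2 is built up as >a1 or b < 2, then >a1 or <b2, then (or)(>a1)(<b2). The right approach is to pick the highest-precedence element and start merging from there.

      [Figure: how the highest-precedence element is determined]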
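The merging order described above can be traced with a minimal sketch. `FlatListReducer` is a hypothetical name, the list is a plain list of strings rather than Calcite tokens, and the precedence values are illustrative (the value for `or` in particular is an assumption). The sketch repeatedly picks the leftmost operator with the highest precedence and folds it together with its two neighbors, recording the list after each step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: folds a flat {operand, operator, operand, ...} list step by step. */
final class FlatListReducer {
  // Illustrative precedences (assumed values): comparisons bind tighter than "or".
  private static final Map<String, Integer> PREC = new HashMap<>();
  static {
    PREC.put(">", 30);
    PREC.put("<", 30);
    PREC.put("or", 22);
  }

  private static int prec(String op) {
    Integer p = PREC.get(op);
    if (p == null) {
      throw new IllegalArgumentException("unknown operator: " + op);
    }
    return p;
  }

  /** Returns the state of the list after every merge; the last entry is the whole tree. */
  static List<String> reduceSteps(List<String> tokens) {
    List<String> list = new ArrayList<>(tokens);
    List<String> steps = new ArrayList<>();
    while (list.size() > 1) {
      // Operators always sit at odd indices; find the leftmost one with the highest precedence.
      int best = 1;
      for (int i = 3; i < list.size(); i += 2) {
        if (prec(list.get(i)) > prec(list.get(best))) {
          best = i;
        }
      }
      String node = "(" + list.get(best) + " "
          + list.get(best - 1) + " " + list.get(best + 1) + ")";
      // Replace operand, operator, operand with the single merged node.
      list.subList(best - 1, best + 2).clear();
      list.add(best - 1, node);
      steps.add(String.join(" ", list));
    }
    return steps;
  }

  public static void main(String[] args) {
    for (String step : reduceSteps(Arrays.asList("a", ">", "1", "or", "b", "<", "2"))) {
      System.out.println(step);
    }
  }
}
```

Scanning the whole list on every round is simple but quadratic. It is only meant to make the merge order visible, not to mirror how Calcite tracks the candidates.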

       The detailed parsing process is as follows:

        // org.apache.calcite.sql.parser.SqlParserUtil.OldTokenSequenceImpl#parser
        @Override public PrecedenceClimbingParser parser(int start,
            Predicate<PrecedenceClimbingParser.Token> predicate) {
          final PrecedenceClimbingParser.Builder builder =
              new PrecedenceClimbingParser.Builder();
          for (Object o : Util.skip(list, start)) {
            if (o instanceof ToTreeListItem) {
              final ToTreeListItem item = (ToTreeListItem) o;
              final SqlOperator op = item.getOperator();
              if (op instanceof SqlPrefixOperator) {
                builder.prefix(item, op.getLeftPrec());
              } else if (op instanceof SqlPostfixOperator) {
                builder.postfix(item, op.getRightPrec());
              } else if (op instanceof SqlBinaryOperator) {
                builder.infix(item, op.getLeftPrec(),
                    op.getLeftPrec() < op.getRightPrec());
              } else if (op instanceof SqlSpecialOperator) {
                builder.special(item, op.getLeftPrec(), op.getRightPrec(),
                    (parser, op2) -> {
                      final List<PrecedenceClimbingParser.Token> tokens =
                          parser.all();
                      final SqlSpecialOperator op1 =
                          (SqlSpecialOperator) requireNonNull((ToTreeListItem) op2.o, "op2.o").op;
                      SqlSpecialOperator.ReduceResult r =
                          op1.reduceExpr(tokens.indexOf(op2),
                              new TokenSequenceImpl(parser));
                      return new PrecedenceClimbingParser.Result(
                          tokens.get(r.startOrdinal),
                          tokens.get(r.endOrdinal - 1),
                          parser.atom(r.node));
                    });
              } else {
                throw new AssertionError();
              }
            } else {
              builder.atom(requireNonNull(o, "o"));
            }
          }
          return builder.build();
        }
    
      // org.apache.calcite.util.PrecedenceClimbingParser#partialParse
      public void partialParse() {
        for (;;) {
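          // Each iteration finds one operator and collapses part of the list; if none is found, the tree is complete
          // By precedence, operators such as > < = are reduced first, then AND, OR, and so on
          // For example, a > 1 or b < 2 becomes >a1 or b < 2, then >a1 or <b2, then (or)(>a1)(<b2)
          // So the precedence definitions matter a lot; they are fixed when each operator is defined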
          Op op = highest();
          if (op == null) {
            return;
          }
          final Token t;
          switch (op.type) {
          case POSTFIX: {
            Token previous = requireNonNull(op.previous, () -> "previous of " + op);
            t = call(op, ImmutableList.of(previous));
            replace(t, previous.previous, op.next);
            break;
          }
          case PREFIX: {
            Token next = requireNonNull(op.next, () -> "next of " + op);
            t = call(op, ImmutableList.of(next));
            replace(t, op.previous, next.next);
            break;
          }
          case INFIX: {
            Token previous = requireNonNull(op.previous, () -> "previous of " + op);
            // Build the token relation, e.g. = a b
            Token next = requireNonNull(op.next, () -> "next of " + op);
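            // Re-link the neighbors on both ends
            // call() here creates a token whose left and right are -1, so later scans do not pick this node again
            // replace() points the call's previous and next at the neighbors of the consumed tokens, so the call takes their place and the chain shrinks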
            t = call(op, ImmutableList.of(previous, next));
            replace(t, previous.previous, next.next);
            // break out of the switch and go on to the next iteration of the for loop
            break;
          }
          case SPECIAL: {
            Result r = ((SpecialOp) op).special.apply(this, (SpecialOp) op);
            requireNonNull(r, "r");
            replace(r.replacement, r.first.previous, r.last.next);
            break;
          }
          default:
            throw new AssertionError();
          }
          // debug: System.out.println(this);
        }
      }
      // org.apache.calcite.util.PrecedenceClimbingParser#replace
      private void replace(Token t, @Nullable Token previous, @Nullable Token next) {
        t.previous = previous;
        t.next = next;
        // If there is no previous node, t becomes the head; otherwise the previous node's next points to t
        if (previous == null) {
          first = t;
        } else {
          previous.next = t;
        }
        if (next == null) {
          last = t;
        } else {
          next.previous = t;
        }
      }
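The replace method above is a doubly-linked-list splice. Here is a minimal sketch of the same idea on a toy chain. `TokenChain`, `Node`, `foldInfix`, `of` and `find` are hypothetical names introduced for illustration; only the body of `replace` follows the listing.

```java
/** Hypothetical toy chain that mimics how an infix call is spliced into the token list. */
final class TokenChain {
  static final class Node {
    final String text;
    Node previous;
    Node next;

    Node(String text) {
      this.text = text;
    }
  }

  Node first;
  Node last;

  /** Builds a chain from the given token texts. */
  static TokenChain of(String... texts) {
    TokenChain chain = new TokenChain();
    for (String text : texts) {
      Node n = new Node(text);
      n.previous = chain.last;
      if (chain.last == null) {
        chain.first = n;
      } else {
        chain.last.next = n;
      }
      chain.last = n;
    }
    return chain;
  }

  /** Returns the first node with the given text, or null if there is none. */
  Node find(String text) {
    for (Node n = first; n != null; n = n.next) {
      if (n.text.equals(text)) {
        return n;
      }
    }
    return null;
  }

  /** Splices t between previous and next; whatever used to sit between them drops out. */
  void replace(Node t, Node previous, Node next) {
    t.previous = previous;
    t.next = next;
    if (previous == null) {
      first = t;
    } else {
      previous.next = t;
    }
    if (next == null) {
      last = t;
    } else {
      next.previous = t;
    }
  }

  /** Folds an infix operator and its two neighbors into one node. */
  void foldInfix(Node op) {
    Node left = op.previous;
    Node right = op.next;
    Node call = new Node("(" + op.text + " " + left.text + " " + right.text + ")");
    replace(call, left.previous, right.next);
  }

  @Override public String toString() {
    StringBuilder sb = new StringBuilder();
    for (Node n = first; n != null; n = n.next) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(n.text);
    }
    return sb.toString();
  }
}
```

Each fold removes three nodes from the chain and adds one, so the chain shrinks by two per step until a single node is both `first` and `last`.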
    
      // org.apache.calcite.sql.parser.SqlParserUtil#replaceSublist
      /**
       * Replaces a range of elements in a list with a single element. For
       * example, if list contains <code>{A, B, C, D, E}</code> then <code>
       * replaceSublist(list, X, 1, 4)</code> returns <code>{A, X, E}</code>.
       */
      public static <T> void replaceSublist(
          List<T> list,
          int start,
          int end,
          T o) {
        requireNonNull(list, "list");
        Preconditions.checkArgument(start < end);
        // Remove from back to front so the remaining indices stay valid
        for (int i = end - 1; i > start; --i) {
          list.remove(i);
        }
        list.set(start, o);
      }
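A quick way to check the index arithmetic is to run the method on small lists. The sketch below copies the loop from the listing into a hypothetical `ReplaceSublistDemo` class and swaps the Guava precondition for a plain check so that it has no dependencies. The second helper applies the `start + beforeSize - afterSize + 1` end index from toTreeEx to a seven-token list that has been reduced to one token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical demo class; the loop body matches the listing above. */
final class ReplaceSublistDemo {
  static <T> void replaceSublist(List<T> list, int start, int end, T o) {
    if (start >= end) {
      throw new IllegalArgumentException("start must be less than end");
    }
    // Walk backwards so that removing an element never shifts one we still need to remove.
    for (int i = end - 1; i > start; --i) {
      list.remove(i);
    }
    list.set(start, o);
  }

  /** {A, B, C, D, E} with the range [1, 4) replaced by X. */
  static List<String> javadocExample() {
    List<String> list = new ArrayList<>(Arrays.asList("A", "B", "C", "D", "E"));
    replaceSublist(list, 1, 4, "X");
    return list;
  }

  /** Seven tokens reduced to one: end = start + beforeSize - afterSize + 1 = 7. */
  static List<String> wholeListExample() {
    List<String> list =
        new ArrayList<>(Arrays.asList("a", ">", "1", "or", "b", "<", "2"));
    int start = 0;
    int beforeSize = 7;
    int afterSize = 1;
    replaceSublist(list, start, start + beforeSize - afterSize + 1, "tree");
    return list;
  }
}
```

The first helper leaves the list as A, X, E, and the second leaves only the single `tree` element.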

      

    3. Operator definitions

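      Precedence is fixed when each operator is defined, so that it can be used later while building the tree. Most operators are defined in SqlStdOperatorTable. Taking the four arithmetic operators as an example, addition and subtraction share one precedence, while multiplication and division share another, higher one.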

      // org.apache.calcite.sql.fun.SqlStdOperatorTable#AND
      public static final SqlBinaryOperator AND =
          new SqlBinaryOperator(
              "AND",
              SqlKind.AND,
              24,        // AND has precedence 24
              true,
              ReturnTypes.BOOLEAN_NULLABLE_OPTIMIZED,
              InferTypes.BOOLEAN,
              OperandTypes.BOOLEAN_BOOLEAN);
      /**
       * Arithmetic division operator, '<code>/</code>'.
       */
      public static final SqlBinaryOperator DIVIDE =
          new SqlBinaryOperator(
              "/",
              SqlKind.DIVIDE,
              60,        // division has a relatively high precedence
              true,
              ReturnTypes.QUOTIENT_NULLABLE,
              InferTypes.FIRST_KNOWN,
              OperandTypes.DIVISION_OPERATOR);
      /**
       * Arithmetic multiplication operator, '<code>*</code>'.
       */
      public static final SqlBinaryOperator MULTIPLY =
          new SqlMonotonicBinaryOperator(
              "*",
              SqlKind.TIMES,
              60,
              true,
              ReturnTypes.PRODUCT_NULLABLE,
              InferTypes.FIRST_KNOWN,
              OperandTypes.MULTIPLY_OPERATOR);
      /**
       * Infix arithmetic minus operator, '<code>-</code>'.
       *
       * <p>Its precedence is less than the prefix {@link #UNARY_PLUS +}
       * and {@link #UNARY_MINUS -} operators.
       */
      public static final SqlBinaryOperator MINUS =
          new SqlMonotonicBinaryOperator(
              "-",
              SqlKind.MINUS,
              40,
              true,
    
              // Same type inference strategy as sum
              ReturnTypes.NULLABLE_SUM,
              InferTypes.FIRST_KNOWN,
              OperandTypes.MINUS_OPERATOR);
      /**
       * Logical equals operator, '<code>=</code>'.
       */
      public static final SqlBinaryOperator EQUALS =
          new SqlBinaryOperator(
              "=",
              SqlKind.EQUALS,
              30,            // '=' has a relatively low precedence
              true,
              ReturnTypes.BOOLEAN_NULLABLE,
              InferTypes.FIRST_KNOWN,
              OperandTypes.COMPARABLE_UNORDERED_COMPARABLE_UNORDERED);
    
      /**
       * Logical less-than-or-equal operator, '<code>&lt;=</code>'.
       */
      public static final SqlBinaryOperator LESS_THAN_OR_EQUAL =
          new SqlBinaryOperator(
              "<=",
              SqlKind.LESS_THAN_OR_EQUAL,
              30,
              true,
              ReturnTypes.BOOLEAN_NULLABLE,
              InferTypes.FIRST_KNOWN,
              OperandTypes.COMPARABLE_ORDERED_COMPARABLE_ORDERED);
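To see how the precedence numbers in these definitions drive grouping, here is a minimal sketch that feeds them into a classic recursive precedence-climbing parser. This is the textbook technique, not Calcite's linked-list implementation, and `PrecedenceDemo` is a hypothetical name. It assumes that the `true` argument in each definition marks the operator as left-associative, and it prints trees in prefix form.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: recursive precedence climbing over a pre-tokenized expression. */
final class PrecedenceDemo {
  // Precedence values copied from the operator definitions quoted above.
  private static final Map<String, Integer> PREC = new HashMap<>();
  static {
    PREC.put("and", 24);
    PREC.put("=", 30);
    PREC.put("<=", 30);
    PREC.put("-", 40);
    PREC.put("*", 60);
    PREC.put("/", 60);
  }

  private final List<String> tokens;
  private int pos = 0;

  private PrecedenceDemo(List<String> tokens) {
    this.tokens = tokens;
  }

  /** Parses alternating operand/operator tokens into a prefix-form string. */
  static String parse(String... tokens) {
    return new PrecedenceDemo(Arrays.asList(tokens)).parseExpr(0);
  }

  private String parseExpr(int minPrec) {
    String left = tokens.get(pos++); // every expression starts with an operand
    while (pos < tokens.size()) {
      String op = tokens.get(pos);
      Integer prec = PREC.get(op);
      if (prec == null) {
        throw new IllegalArgumentException("unknown operator: " + op);
      }
      if (prec < minPrec) {
        break; // this operator belongs to an enclosing, looser expression
      }
      pos++;
      // Assumed left-associative: the right side may only contain tighter operators.
      String right = parseExpr(prec + 1);
      left = "(" + op + " " + left + " " + right + ")";
    }
    return left;
  }

  public static void main(String[] args) {
    System.out.println(parse("a", "-", "b", "*", "c"));
    System.out.println(parse("a", "=", "b", "and", "c", "<=", "d", "/", "e"));
  }
}
```

Because `*` (60) outranks `-` (40), `a - b * c` groups as `(- a (* b c))`. Because `and` (24) is lower than the comparisons (30), both comparisons are built before `and` joins them.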

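       An example of the final tree:

       [Figure: example of the resulting tree]

      It is a tree made of operands and operators. It can be viewed as a stack-based compilation structure, and it expresses the semantics fairly clearly.

      In practice, a tree is only one form of representation. Different situations call for different structures, and you need to switch between them flexibly. For example, a whole SQL statement is represented differently again in Calcite's tree structure.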

  • Original article: https://www.cnblogs.com/yougewe/p/15202032.html