  • Calcite (2): Converting a list to a tree, part 1

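      The core goal of syntax parsing is to build an abstract syntax tree. A parser framework can easily recognize the structure of each node, but we still have to convert the result into the tree shape we need before it is convenient to use.

      Almost all expressions are nested and composed; that is where their power comes from. Parsing, however, is usually linear, so what we get first is inevitably a flat list of tokens. How to turn that flat list into a tree is therefore an important topic. Today we look at one very small piece of Calcite: how it converts a list into a tree.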

    1. Utility-class entry point

      In Calcite, the list -> tree conversion is a fairly self-contained process, so it lives in a utility class. The final result is carried by a SqlNode.

      // org.apache.calcite.sql.parser.SqlParserUtil#toTree
      /**
       * Converts a list of {expression, operator, expression, ...} into a tree,
       * taking operator precedence and associativity into account.
       */
      public static @Nullable SqlNode toTree(List<@Nullable Object> list) {
        if (list.size() == 1
            && list.get(0) instanceof SqlNode) {
          // Short-cut for the simple common case
          return (SqlNode) list.get(0);
        }
        LOGGER.trace("Attempting to reduce {}", list);
        final OldTokenSequenceImpl tokenSequence = new OldTokenSequenceImpl(list);
        final SqlNode node = toTreeEx(tokenSequence, 0, 0, SqlKind.OTHER);
        LOGGER.debug("Reduced {}", node);
        return node;
      }
      // org.apache.calcite.sql.parser.SqlParserUtil#toTreeEx
      /**
       * Converts a list of {expression, operator, expression, ...} into a tree,
       * taking operator precedence and associativity into account.
       *
       * @param list        List of operands and operators. This list is modified as
       *                    expressions are reduced.
       * @param start       Position of first operand in the list. Anything to the
       *                    left of this (besides the immediately preceding operand)
       *                    is ignored. Generally use value 1.
       * @param minPrec     Minimum precedence to consider. If the method encounters
       *                    an operator of lower precedence, it doesn't reduce any
       *                    further.
       * @param stopperKind If not {@link SqlKind#OTHER}, stop reading the list if
       *                    we encounter a token of this kind.
       * @return the root node of the tree which the list condenses into
       */
      public static SqlNode toTreeEx(SqlSpecialOperator.TokenSequence list,
          int start, final int minPrec, final SqlKind stopperKind) {
        PrecedenceClimbingParser parser = list.parser(start,
            token -> {
              if (token instanceof PrecedenceClimbingParser.Op) {
                PrecedenceClimbingParser.Op tokenOp = (PrecedenceClimbingParser.Op) token;
                final SqlOperator op = ((ToTreeListItem) tokenOp.o()).op;
                return stopperKind != SqlKind.OTHER
                    && op.kind == stopperKind
                    || minPrec > 0
                    && op.getLeftPrec() < minPrec;
              } else {
                return false;
              }
            });
        final int beforeSize = parser.all().size();
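        // Convert the tokens from list form into tree form
        parser.partialParse();
        final int afterSize = parser.all().size();
        // Convert the token tree into its SqlNode representation
        final SqlNode node = convert(parser.all().get(0));
        // Clear the slots of the tokens that were consumed and put the SqlNode in the first position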
        list.replaceSublist(start, start + beforeSize - afterSize + 1, node);
        return node;
      }
    
      // org.apache.calcite.sql.parser.SqlParserUtil#convert
      private static SqlNode convert(PrecedenceClimbingParser.Token token) {
        switch (token.type) {
        case ATOM:
          return requireNonNull((SqlNode) token.o);
        case CALL:
          final PrecedenceClimbingParser.Call call =
              (PrecedenceClimbingParser.Call) token;
          final List<@Nullable SqlNode> list = new ArrayList<>();
          for (PrecedenceClimbingParser.Token arg : call.args) {
            list.add(convert(arg));
          }
          final ToTreeListItem item = (ToTreeListItem) call.op.o();
          if (list.size() == 1) {
            SqlNode firstItem = list.get(0);
            if (item.op == SqlStdOperatorTable.UNARY_MINUS
                && firstItem instanceof SqlNumericLiteral) {
              return SqlLiteral.createNegative((SqlNumericLiteral) firstItem,
                  item.pos.plusAll(list));
            }
            if (item.op == SqlStdOperatorTable.UNARY_PLUS
                && firstItem instanceof SqlNumericLiteral) {
              return firstItem;
            }
          }
          return item.op.createCall(item.pos.plusAll(list), list);
        default:
          throw new AssertionError(token);
        }
      }

      That is the skeleton of the list-to-tree conversion. The key words are: precedence, conversion, ...
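
      The convert step above can be illustrated in isolation. The following is a minimal sketch, not Calcite code: Tok, atom, call and isNumber are hypothetical stand-ins for PrecedenceClimbingParser.Token and SqlNode, and the output is a plain string rather than a SqlNode. It only shows the recursion over call arguments and the special handling of a unary sign applied to a numeric literal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for the convert step; not Calcite code. */
class ConvertSketch {
  /** Illustrative token: an atom (op == null) or a call with arguments. */
  static final class Tok {
    final String op;      // null for an atom
    final String atom;    // null for a call
    final List<Tok> args;

    private Tok(String op, String atom, List<Tok> args) {
      this.op = op;
      this.atom = atom;
      this.args = args;
    }

    static Tok atom(String text) {
      return new Tok(null, text, new ArrayList<>());
    }

    static Tok call(String op, Tok... args) {
      return new Tok(op, null, Arrays.asList(args));
    }
  }

  /** Assumption of this sketch: a numeric literal is an unsigned decimal string. */
  static boolean isNumber(String s) {
    return s.matches("\\d+(\\.\\d+)?");
  }

  /** Recursively turns a token tree into a prefix-notation string. */
  static String convert(Tok t) {
    if (t.op == null) {
      return t.atom;                         // ATOM: return the operand as-is
    }
    List<String> args = new ArrayList<>();
    for (Tok arg : t.args) {
      args.add(convert(arg));                // CALL: convert every argument first
    }
    if (args.size() == 1 && isNumber(args.get(0))) {
      if (t.op.equals("-")) {
        return "-" + args.get(0);            // fold unary minus into the literal
      }
      if (t.op.equals("+")) {
        return args.get(0);                  // unary plus on a literal is a no-op
      }
    }
    return t.op + "(" + String.join(", ", args) + ")";
  }

  public static void main(String[] args) {
    System.out.println(convert(Tok.call("-", Tok.atom("5"))));   // -5
    System.out.println(convert(Tok.call("-", Tok.atom("x"))));   // -(x)
  }
}
```

      As in the original method, the sign is folded only when the single argument is a numeric literal; every other case becomes an ordinary call.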

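    2. The list -> tree process in detail

      Building the tree is mainly a matter of merging list elements. Roughly, each operator takes the elements before and after it as its operands and is merged with them, in order of precedence. For example, a > 1 or b < 2 is built up as: >a1 or b < 2, then >a1 or <b2, then (or)(>a1)(<b2). The correct approach is to pick the element with the highest precedence and start merging from there.

      [Figure: how the highest-precedence element is determined]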
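
      The merge order described above can be reproduced with a small, self-contained sketch. This is not how Calcite implements it (Calcite works on a linked list of tokens, shown further down); ListToTreeSketch, reduce and the PREC table are hypothetical names, and the precedence values are illustrative only. The sketch assumes a well-formed list that alternates operand, operator, operand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for the list -> tree reduction; not Calcite code. */
class ListToTreeSketch {
  // Illustrative precedences: comparisons bind tighter than "or".
  static final Map<String, Integer> PREC = Map.of(">", 30, "<", 30, "or", 22);

  /** Repeatedly folds the highest-precedence operator with its two neighbours. */
  static String reduce(List<String> tokens) {
    List<String> list = new ArrayList<>(tokens);
    while (list.size() > 1) {
      int best = -1;
      // Operators sit at the odd positions of an operand/operator/operand list.
      for (int i = 1; i < list.size(); i += 2) {
        if (best < 0 || PREC.get(list.get(i)) > PREC.get(list.get(best))) {
          best = i;   // strict ">" keeps the leftmost operator on ties
        }
      }
      String node = "(" + list.get(best) + " "
          + list.get(best - 1) + " " + list.get(best + 1) + ")";
      // Replace {left, op, right} with the single combined node.
      list.subList(best - 1, best + 2).clear();
      list.add(best - 1, node);
    }
    return list.get(0);
  }

  public static void main(String[] args) {
    // Prints (or (> a 1) (< b 2))
    System.out.println(reduce(List.of("a", ">", "1", "or", "b", "<", "2")));
  }
}
```

      Each pass shortens the list by two elements, so the loop ends with a single element that is the root of the tree.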

       The parsing proceeds as follows:

        // org.apache.calcite.sql.parser.SqlParserUtil.OldTokenSequenceImpl#parser
        @Override public PrecedenceClimbingParser parser(int start,
            Predicate<PrecedenceClimbingParser.Token> predicate) {
          final PrecedenceClimbingParser.Builder builder =
              new PrecedenceClimbingParser.Builder();
          for (Object o : Util.skip(list, start)) {
            if (o instanceof ToTreeListItem) {
              final ToTreeListItem item = (ToTreeListItem) o;
              final SqlOperator op = item.getOperator();
              if (op instanceof SqlPrefixOperator) {
                builder.prefix(item, op.getLeftPrec());
              } else if (op instanceof SqlPostfixOperator) {
                builder.postfix(item, op.getRightPrec());
              } else if (op instanceof SqlBinaryOperator) {
                builder.infix(item, op.getLeftPrec(),
                    op.getLeftPrec() < op.getRightPrec());
              } else if (op instanceof SqlSpecialOperator) {
                builder.special(item, op.getLeftPrec(), op.getRightPrec(),
                    (parser, op2) -> {
                      final List<PrecedenceClimbingParser.Token> tokens =
                          parser.all();
                      final SqlSpecialOperator op1 =
                          (SqlSpecialOperator) requireNonNull((ToTreeListItem) op2.o, "op2.o").op;
                      SqlSpecialOperator.ReduceResult r =
                          op1.reduceExpr(tokens.indexOf(op2),
                              new TokenSequenceImpl(parser));
                      return new PrecedenceClimbingParser.Result(
                          tokens.get(r.startOrdinal),
                          tokens.get(r.endOrdinal - 1),
                          parser.atom(r.node));
                    });
              } else {
                throw new AssertionError();
              }
            } else {
              builder.atom(requireNonNull(o, "o"));
            }
          }
          return builder.build();
        }
    
      // org.apache.calcite.util.PrecedenceClimbingParser#partialParse
      public void partialParse() {
        for (;;) {
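          // Each iteration finds one operator and collapses it into a call; if none is found, the whole tree has been built
          // Following precedence, operators such as > < = are folded first, then and, or, and so on
          // For example: a > 1 or b < 2 becomes >a1 or b < 2, then >a1 or <b2, then (or)(>a1)(<b2)
          // So the precedence definition matters a lot; it is fixed when each operator is defined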
          Op op = highest();
          if (op == null) {
            return;
          }
          final Token t;
          switch (op.type) {
          case POSTFIX: {
            Token previous = requireNonNull(op.previous, () -> "previous of " + op);
            t = call(op, ImmutableList.of(previous));
            replace(t, previous.previous, op.next);
            break;
          }
          case PREFIX: {
            Token next = requireNonNull(op.next, () -> "next of " + op);
            t = call(op, ImmutableList.of(next));
            replace(t, op.previous, next.next);
            break;
          }
          case INFIX: {
            Token previous = requireNonNull(op.previous, () -> "previous of " + op);
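            // Build the token relationship, e.g. = a b
            Token next = requireNonNull(op.next, () -> "next of " + op);
            // Relink the neighbours on both sides
            // The token created by call() has left and right set to -1, so later passes will not select it again
            // replace points the call's previous and next at the neighbouring nodes, so the call takes the place of {previous, op, next} and the list shrinks
            t = call(op, ImmutableList.of(previous, next));
            replace(t, previous.previous, next.next);
            // break out of the switch and continue with the next iteration of the for loop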
            break;
          }
          case SPECIAL: {
            Result r = ((SpecialOp) op).special.apply(this, (SpecialOp) op);
            requireNonNull(r, "r");
            replace(r.replacement, r.first.previous, r.last.next);
            break;
          }
          default:
            throw new AssertionError();
          }
          // debug: System.out.println(this);
        }
      }
      // org.apache.calcite.util.PrecedenceClimbingParser#replace
      private void replace(Token t, @Nullable Token previous, @Nullable Token next) {
        t.previous = previous;
        t.next = next;
        // If there is no previous node, t becomes the head; otherwise link the previous node's next to t
        if (previous == null) {
          first = t;
        } else {
          previous.next = t;
        }
        if (next == null) {
          last = t;
        } else {
          next.previous = t;
        }
      }
    
      // org.apache.calcite.sql.parser.SqlParserUtil#replaceSublist
      /**
       * Replaces a range of elements in a list with a single element. For
       * example, if list contains <code>{A, B, C, D, E}</code> then <code>
       * replaceSublist(list, X, 1, 4)</code> returns <code>{A, X, E}</code>.
       */
      public static <T> void replaceSublist(
          List<T> list,
          int start,
          int end,
          T o) {
        requireNonNull(list, "list");
        Preconditions.checkArgument(start < end);
        // Remove from back to front so that the remaining indices stay valid
        for (int i = end - 1; i > start; --i) {
          list.remove(i);
        }
        list.set(start, o);
      }
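
      Both replace helpers above collapse a run of elements into a single element: replace does it on the parser's linked list of tokens, and replaceSublist does it on the caller's list. The sketch below mirrors the two in isolation. CollapseSketch, Node, of, foldInfix, find and render are hypothetical names introduced for illustration; only the bodies of replace and replaceSublist follow the code quoted above, with Preconditions swapped for a plain exception. Note that, with the signature shown above, the replacement element is the last argument, so the Javadoc example corresponds to replaceSublist(list, 1, 4, X).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the two "collapse a run into one element" helpers; not Calcite code. */
class CollapseSketch {
  /** Illustrative list cell standing in for a parser token. */
  static final class Node {
    final String text;
    Node previous;
    Node next;

    Node(String text) {
      this.text = text;
    }
  }

  Node first;
  Node last;

  /** Builds a linked list from token texts. */
  static CollapseSketch of(String... texts) {
    CollapseSketch s = new CollapseSketch();
    for (String text : texts) {
      s.replace(new Node(text), s.last, null);   // appending: no right-hand neighbour
    }
    return s;
  }

  /** Links t between previous and next; whatever sat between them is dropped. */
  void replace(Node t, Node previous, Node next) {
    t.previous = previous;
    t.next = next;
    if (previous == null) {
      first = t;
    } else {
      previous.next = t;
    }
    if (next == null) {
      last = t;
    } else {
      next.previous = t;
    }
  }

  /** Folds an infix operator and its two neighbours into a single node. */
  void foldInfix(Node op) {
    Node left = op.previous;
    Node right = op.next;
    Node call = new Node("(" + op.text + " " + left.text + " " + right.text + ")");
    replace(call, left.previous, right.next);
  }

  /** Returns the first node with the given text, or null if there is none. */
  Node find(String text) {
    for (Node n = first; n != null; n = n.next) {
      if (n.text.equals(text)) {
        return n;
      }
    }
    return null;
  }

  /** Joins the node texts from first to last with single spaces. */
  String render() {
    StringBuilder sb = new StringBuilder();
    for (Node n = first; n != null; n = n.next) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(n.text);
    }
    return sb.toString();
  }

  /** List counterpart: the elements in [start, end) become the single element o. */
  static <T> void replaceSublist(List<T> list, int start, int end, T o) {
    if (start >= end) {
      throw new IllegalArgumentException("start must be less than end");
    }
    for (int i = end - 1; i > start; --i) {
      list.remove(i);   // back to front, so earlier indices are unaffected
    }
    list.set(start, o);
  }

  public static void main(String[] args) {
    CollapseSketch s = of("a", "=", "1", "and", "b");
    s.foldInfix(s.find("="));
    System.out.println(s.render());   // (= a 1) and b
    s.foldInfix(s.find("and"));
    System.out.println(s.render());   // (and (= a 1) b)

    List<String> list = new ArrayList<>(List.of("A", "B", "C", "D", "E"));
    replaceSublist(list, 1, 4, "X");
    System.out.println(list);         // [A, X, E]
  }
}
```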

      

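    3. Operator definitions

      Precedence is fixed when an operator is defined, so that it can be used later while building the tree. Most operators are defined in SqlStdOperatorTable. Taking the four arithmetic operators as an example, addition and subtraction share one precedence, while multiplication and division share another, higher one.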

      // org.apache.calcite.sql.fun.SqlStdOperatorTable#AND
      public static final SqlBinaryOperator AND =
          new SqlBinaryOperator(
              "AND",
              SqlKind.AND,
              24,        // AND has precedence 24
              true,
              ReturnTypes.BOOLEAN_NULLABLE_OPTIMIZED,
              InferTypes.BOOLEAN,
              OperandTypes.BOOLEAN_BOOLEAN);
      /**
       * Arithmetic division operator, '<code>/</code>'.
       */
      public static final SqlBinaryOperator DIVIDE =
          new SqlBinaryOperator(
              "/",
              SqlKind.DIVIDE,
              60,        // division has a relatively high precedence
              true,
              ReturnTypes.QUOTIENT_NULLABLE,
              InferTypes.FIRST_KNOWN,
              OperandTypes.DIVISION_OPERATOR);
      /**
       * Arithmetic multiplication operator, '<code>*</code>'.
       */
      public static final SqlBinaryOperator MULTIPLY =
          new SqlMonotonicBinaryOperator(
              "*",
              SqlKind.TIMES,
              60,
              true,
              ReturnTypes.PRODUCT_NULLABLE,
              InferTypes.FIRST_KNOWN,
              OperandTypes.MULTIPLY_OPERATOR);
      /**
       * Infix arithmetic minus operator, '<code>-</code>'.
       *
       * <p>Its precedence is less than the prefix {@link #UNARY_PLUS +}
       * and {@link #UNARY_MINUS -} operators.
       */
      public static final SqlBinaryOperator MINUS =
          new SqlMonotonicBinaryOperator(
              "-",
              SqlKind.MINUS,
              40,
              true,
    
              // Same type inference strategy as sum
              ReturnTypes.NULLABLE_SUM,
              InferTypes.FIRST_KNOWN,
              OperandTypes.MINUS_OPERATOR);
      /**
       * Logical equals operator, '<code>=</code>'.
       */
      public static final SqlBinaryOperator EQUALS =
          new SqlBinaryOperator(
              "=",
              SqlKind.EQUALS,
              30,            // = has a relatively low precedence
              true,
              ReturnTypes.BOOLEAN_NULLABLE,
              InferTypes.FIRST_KNOWN,
              OperandTypes.COMPARABLE_UNORDERED_COMPARABLE_UNORDERED);
    
      /**
       * Logical less-than-or-equal operator, '<code>&lt;=</code>'.
       */
      public static final SqlBinaryOperator LESS_THAN_OR_EQUAL =
          new SqlBinaryOperator(
              "<=",
              SqlKind.LESS_THAN_OR_EQUAL,
              30,
              true,
              ReturnTypes.BOOLEAN_NULLABLE,
              InferTypes.FIRST_KNOWN,
              OperandTypes.COMPARABLE_ORDERED_COMPARABLE_ORDERED);
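
      To see what these numbers do to the grouping, here is a minimal sketch that uses the precedence values from the definitions above (24 for AND, 30 for = and <=, 40 for -, 60 for * and /). It is a textbook recursive precedence-climbing parser, not the linked-list reduction Calcite uses, and PrecedenceSketch, parse and expr are hypothetical names. It treats every operator as left-associative, on the assumption that the `true` argument in the definitions above marks left associativity, and it expects whitespace-separated tokens containing only the operators listed in PREC.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical precedence-climbing sketch using the precedences quoted above; not Calcite code. */
class PrecedenceSketch {
  static final Map<String, Integer> PREC =
      Map.of("AND", 24, "=", 30, "<=", 30, "-", 40, "*", 60, "/", 60);

  private final List<String> tokens;
  private int pos;

  private PrecedenceSketch(List<String> tokens) {
    this.tokens = tokens;
  }

  /** Parses a whitespace-separated expression into a fully parenthesized string. */
  static String parse(String text) {
    return new PrecedenceSketch(Arrays.asList(text.trim().split("\\s+"))).expr(0);
  }

  /** Parses operand (operator operand)*, consuming only operators of at least minPrec. */
  private String expr(int minPrec) {
    String left = tokens.get(pos++);
    while (pos < tokens.size()) {
      String op = tokens.get(pos);
      int prec = PREC.get(op);          // fails on an operator missing from PREC
      if (prec < minPrec) {
        break;                          // a weaker operator belongs to an outer level
      }
      pos++;
      // prec + 1 stops an equal-precedence operator from nesting to the right,
      // which makes the grouping left-associative.
      String right = expr(prec + 1);
      left = "(" + left + " " + op + " " + right + ")";
    }
    return left;
  }

  public static void main(String[] args) {
    System.out.println(parse("a = b - c * d AND e"));   // ((a = (b - (c * d))) AND e)
    System.out.println(parse("8 - 4 - 2"));             // ((8 - 4) - 2)
  }
}
```

      The higher the number, the earlier the operator is grouped: * is folded before -, - before =, and = before AND.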

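       [Figure: example of the resulting tree]

      That is, a tree made up of operands and operators. It can be regarded as a stack-based compilation structure, and it expresses the semantics fairly clearly.

      In practice, a tree is just one form of representation. Different situations call for different structures, and being able to switch between them flexibly is what makes the work go smoothly. A complete SQL statement, for example, is not represented this way in Calcite's tree.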

  • Original article: https://www.cnblogs.com/yougewe/p/15202032.html