zoukankan      html  css  js  c++  java
  • Calcite分析

    Calcite源码分析,参考:

    http://matt33.com/2019/03/07/apache-calcite-process-flow/

    https://matt33.com/2019/03/17/apache-calcite-planner/

    Rule作为Calcite查询优化的核心,

    具体看几个有代表性的Rule,看看是如何实现的

    最简单的例子,Join结合律,JoinAssociateRule

    首先所有的Rule都继承RelOptRule类

    /**
     * Planner rule that changes a join based on the associativity rule.
     *
     * <p>((a JOIN b) JOIN c) &rarr; (a JOIN (b JOIN c))</p>
     *
     * <p>We do not need a rule to convert (a JOIN (b JOIN c)) &rarr;
     * ((a JOIN b) JOIN c) because we have
     * {@link JoinCommuteRule}.
     *
     * @see JoinCommuteRule
     */
    public class JoinAssociateRule extends RelOptRule {

    RelOptRule用于transform expression

    它维护的Operand tree表明,该rule可以适用于何种树结构,具体看下面的例子

    /**
     * A <code>RelOptRule</code> transforms an expression into another. It has a
     * list of {@link RelOptRuleOperand}s, which determine whether the rule can be
     * applied to a particular section of the tree.
     *
     * <p>The optimizer figures out which rules are applicable, then calls
     * {@link #onMatch} on each of them.</p>
     */
    public abstract class RelOptRule {
     
      /**
       * Root of operand tree.
       */
      private final RelOptRuleOperand operand;
    
      /** Factory for a builder for relational expressions.
       *
       * <p>The actual builder is available via {@link RelOptRuleCall#builder()}. */
      public final RelBuilderFactory relBuilderFactory;
    
      /**
       * Flattened list of operands.
       */
      public final List<RelOptRuleOperand> operands;
    
      //~ Constructors -----------------------------------------------------------
    
      /**
       * Creates a rule with an explicit description.
       *
       * @param operand     root operand, must not be null
       * @param description Description, or null to guess description
       * @param relBuilderFactory Builder for relational expressions
       */
      public RelOptRule(RelOptRuleOperand operand,
          RelBuilderFactory relBuilderFactory, String description) {
        this.operand = Objects.requireNonNull(operand);
        this.relBuilderFactory = Objects.requireNonNull(relBuilderFactory);this.description = description;
        this.operands = flattenOperands(operand);
        assignSolveOrder();
      }

    比如对于Join结合律,调用super,即RelOptRule的构造函数

      /**
       * Creates a JoinAssociateRule.
       */
      public JoinAssociateRule(RelBuilderFactory relBuilderFactory) {
        super(
            operand(Join.class,
                operand(Join.class, any()),
                operand(RelSubset.class, any())),
            relBuilderFactory, null);
      }

    operand也是一个树形结构,Top的Operand的类型是Join,他有两个children,其中一个也是join,另一个是RelSubset

    他们的children是any()

      /**
       * Creates a list of child operands that signifies that the operand matches
       * any number of child relational expressions.
       *
       * @return List of child operands that signifies that the operand matches
       *   any number of child relational expressions
       */
      public static RelOptRuleOperandChildren any() {
        return RelOptRuleOperandChildren.ANY_CHILDREN;
      }

    然后要理解Rule如果被使用的,来看看HepPlaner里面的代码

    因为HepPlaner逻辑比较简单,就是遍历所有的HepRelVertex,看看RelOptRule是否可以匹配,如果匹配就应用Rule

    所以会调用到applyRule,输入包含Rule和Vertex

      private HepRelVertex applyRule(
          RelOptRule rule,
          HepRelVertex vertex,
          boolean forceConversions) {
    
        final List<RelNode> bindings = new ArrayList<>();
        final Map<RelNode, List<RelNode>> nodeChildren = new HashMap<>();
        boolean match =
            matchOperands(
                rule.getOperand(),
                vertex.getCurrentRel(),
                bindings,
                nodeChildren);
    
        if (!match) {
          return null;
        }
    
        HepRuleCall call =
            new HepRuleCall(
                this,
                rule.getOperand(),
                bindings.toArray(new RelNode[0]),
                nodeChildren,
                parents);
    
        // Allow the rule to apply its own side-conditions.
        if (!rule.matches(call)) {
          return null;
        }
    
        fireRule(call);
    
        if (!call.getResults().isEmpty()) {
          return applyTransformationResults(
              vertex,
              call,
              parentTrait);
        }
        return null;
      }

    1. 首先调用,matchOperands,看看是否match,

      private boolean matchOperands(
          RelOptRuleOperand operand,  //Rule中的Operand
          RelNode rel,  //vertex中的RelNode
          List<RelNode> bindings,
          Map<RelNode, List<RelNode>> nodeChildren) {
        if (!operand.matches(rel)) { //先比较top node和operand是否match
          return false;
        }
        bindings.add(rel); //如果match,把top relnode加入bindings
        //接着来比较child是否match
        //child有几种类型:
        //Any,随意,直接返回true
        //Unordered,无序,对于每个operand只要有任何一个child RelNode可满足即可
        //Default,Some,Operand和RelNode严格有序匹配
        List<HepRelVertex> childRels = (List) rel.getInputs();
        switch (operand.childPolicy) {
        case ANY:
          return true;
        case UNORDERED:
          // For each operand, at least one child must match. If
          // matchAnyChildren, usually there's just one operand.
          for (RelOptRuleOperand childOperand : operand.getChildOperands()) {
            boolean match = false;
            for (HepRelVertex childRel : childRels) {
              match =
                  matchOperands( //递归调用matchOperands
                      childOperand,
                      childRel.getCurrentRel(),
                      bindings,
                      nodeChildren);
              if (match) {
                break;
              }
            }
            if (!match) {
              return false;
            }
          }
          final List<RelNode> children = new ArrayList<>(childRels.size());
          for (HepRelVertex childRel : childRels) {
            children.add(childRel.getCurrentRel());
          }
          nodeChildren.put(rel, children);
          return true;
        default:
          int n = operand.getChildOperands().size();
          if (childRels.size() < n) {
            return false;
          }
          //一一按顺序对应match
          for (Pair<HepRelVertex, RelOptRuleOperand> pair
              : Pair.zip(childRels, operand.getChildOperands())) {
            boolean match =
                matchOperands(
                    pair.right,
                    pair.left.getCurrentRel(),
                    bindings,
                    nodeChildren);
            if (!match) {
              return false;
            }
          }
          return true;
        }
      }

    逻辑是先比较自身,然后再递归比较children

    比较函数,

    可以看出,从类型,Trait,Predicate上来比较,是否匹配

     /**
       * Returns whether a relational expression matches this operand. It must be
       * of the right class and trait.
       */
      public boolean matches(RelNode rel) {
        if (!clazz.isInstance(rel)) {
          return false;
        }
        if ((trait != null) && !rel.getTraitSet().contains(trait)) {
          return false;
        }
        return predicate.test(rel);
      }

    child的类型分为,

    /**
     * Policy by which operands will be matched by relational expressions with
     * any number of children.
     */
    public enum RelOptRuleOperandChildPolicy {
      /**
       * Signifies that operand can have any number of children.
       */
      ANY,
    
      /**
       * Signifies that operand has no children. Therefore it matches a
       * leaf node, such as a table scan or VALUES operator.
       *
       * <p>{@code RelOptRuleOperand(Foo.class, NONE)} is equivalent to
       * {@code RelOptRuleOperand(Foo.class)} but we prefer the former because
       * it is more explicit.</p>
       */
      LEAF,
    
      /**
       * Signifies that the operand's children must precisely match its
       * child operands, in order.
       */
      SOME,
    
      /**
       * Signifies that the rule matches any one of its parents' children.
       * The parent may have one or more children.
       */
      UNORDERED,
    }

    不同的类型按照注释中的规则进行match

    2. 如果match,继续会把当前结果封装成HepRuleCall,继承自RelOptRuleCall

    是RelOptRule的一次invocation,即会记录调用中用到的上下文数据和结果

    /**
     * A <code>RelOptRuleCall</code> is an invocation of a {@link RelOptRule} with a
     * set of {@link RelNode relational expression}s as arguments.
     */
    public abstract class RelOptRuleCall {
    
      /**
       * Generator for {@link #id} values.
       */
      private static int nextId = 0;
    
      //~ Instance fields --------------------------------------------------------
    
      public final int id;
      protected final RelOptRuleOperand operand0; //Rule的Operand树的root
      protected Map<RelNode, List<RelNode>> nodeInputs; //所有的RelNode和他们的inputs
      public final RelOptRule rule;
      public final RelNode[] rels; //match到的所有RelNodes
      private final RelOptPlanner planner;
      private final List<RelNode> parents; //对于Top RelNodes而言的parents

    3. 调用rule.matches(call)

    默认实现是,return true,即不检查,这里允许Rule加一下特定的match检测

    4. 调用fireRule(call)

    而其中主要是调用,ruleCall.getRule().onMatch(ruleCall);

    所以做的事情是需要在Rule的onMatch里面定义的

    下面看下JoinAssociateRule的实现,

      public void onMatch(final RelOptRuleCall call) {
        final Join topJoin = call.rel(0);
        final Join bottomJoin = call.rel(1);
        final RelNode relA = bottomJoin.getLeft();
        final RelNode relB = bottomJoin.getRight();
        final RelSubset relC = call.rel(2);
        final RelOptCluster cluster = topJoin.getCluster();
        final RexBuilder rexBuilder = cluster.getRexBuilder();
    
        if (relC.getConvention() != relA.getConvention()) {
          // relC could have any trait-set. But if we're matching say
          // EnumerableConvention, we're only interested in enumerable subsets.
          return;
        }
    
        //        topJoin
        //        /     
        //   bottomJoin  C
        //    /    
        //   A      B
    
        final int aCount = relA.getRowType().getFieldCount();
        final int bCount = relB.getRowType().getFieldCount();
        final int cCount = relC.getRowType().getFieldCount();
        final ImmutableBitSet aBitSet = ImmutableBitSet.range(0, aCount);
        final ImmutableBitSet bBitSet =
            ImmutableBitSet.range(aCount, aCount + bCount);

    这部分都是在初始化,上面好理解,下面算这些count是干啥?

    其实是在算RexInputRef,

    RexNode和RelNode的共同点是,他们都代表一个树

    差别是他们代表的东西不同,RelNode代表关系代数算子组成的数,而RexNode表示表达式树

    比如RexLiteral表示常量,RexVariable表示变量,RexCall表示操作来连接Literal和Variable

    RexVariable往往表示输入的某个field,为了效率,这里只会记录field的id,即这个RexInputRef

    /**
     * Variable which references a field of an input relational expression.
     *
     * <p>Fields of the input are 0-based. If there is more than one input, they are
     * numbered consecutively. For example, if the inputs to a join are</p>
     *
     * <ul>
     * <li>Input #0: EMP(EMPNO, ENAME, DEPTNO) and</li>
     * <li>Input #1: DEPT(DEPTNO AS DEPTNO2, DNAME)</li>
     * </ul>
     *
     * <p>then the fields are:</p>
     *
     * <ul>
     * <li>Field #0: EMPNO</li>
     * <li>Field #1: ENAME</li>
     * <li>Field #2: DEPTNO (from EMP)</li>
     * <li>Field #3: DEPTNO2 (from DEPT)</li>
     * <li>Field #4: DNAME</li>
     * </ul>
     *
     * <p>So <code>RexInputRef(3, Integer)</code> is the correct reference for the
     * field DEPTNO2.</p>
     */
    public class RexInputRef extends RexSlot {

    而RexInputRef的index不是固定的,是按顺序排的,

    所以按上面把fieldcount都算出来,就可以排出对应的index 

    接着做的一堆都是为了调整conditions的位置和相对应的index

        // Goal is to transform to
        //
        //       newTopJoin
        //        /     
        //       A   newBottomJoin
        //               /    
        //              B      C
    
        // Split the condition of topJoin and bottomJoin into a conjunctions. A
        // condition can be pushed down if it does not use columns from A.
        final List<RexNode> top = new ArrayList<>();
        final List<RexNode> bottom = new ArrayList<>();
        JoinPushThroughJoinRule.split(topJoin.getCondition(), aBitSet, top, bottom);
        JoinPushThroughJoinRule.split(bottomJoin.getCondition(), aBitSet, top,
            bottom);
    
        // Mapping for moving conditions from topJoin or bottomJoin to
        // newBottomJoin.
        // target: | B | C      |
        // source: | A       | B | C      |
        final Mappings.TargetMapping bottomMapping =
            Mappings.createShiftMapping(
                aCount + bCount + cCount,
                0, aCount, bCount,
                bCount, aCount + bCount, cCount);
        final List<RexNode> newBottomList = new ArrayList<>();
        new RexPermuteInputsShuttle(bottomMapping, relB, relC)
            .visitList(bottom, newBottomList);
        RexNode newBottomCondition =
            RexUtil.composeConjunction(rexBuilder, newBottomList);

    1. 把原先TopJoin和BottomJoin里面的conditions,分成和A相关的和无关的

    因为调整完以后,和A相关的conditions要放到TopJoin,而和A无关的需要放到BottomJoin

    aBitSet里面记录了所有A的fields的index,

      /**
       * Splits a condition into conjunctions that do or do not intersect with
       * a given bit set.
       */
      static void split(
          RexNode condition,
          ImmutableBitSet bitSet,
          List<RexNode> intersecting,
          List<RexNode> nonIntersecting) {
        for (RexNode node : RelOptUtil.conjunctions(condition)) {//把conjunction的条件拆分开
          ImmutableBitSet inputBitSet = RelOptUtil.InputFinder.bits(node); //找出condition所用到的fields
          if (bitSet.intersects(inputBitSet)) {
            intersecting.add(node);
          } else {
            nonIntersecting.add(node);
          }
        }
      }

    如何找出condition所用到的fields?

    如下,在递归的过程做回把所有碰到的inputRef记录下来

        /**
         * Returns a bit set describing the inputs used by an expression.
         */
        public static ImmutableBitSet bits(RexNode node) {
          return analyze(node).inputBitSet.build();
        }
    
      /** Returns an input finder that has analyzed a given expression. */
        public static InputFinder analyze(RexNode node) {
          final InputFinder inputFinder = new InputFinder();
          node.accept(inputFinder); //Visitor模式
          return inputFinder;
        }
    
       //如果RexNode是InputRef,就记录下该Ref的index
        public Void visitInputRef(RexInputRef inputRef) {
          inputBitSet.set(inputRef.getIndex());
          return null;
        }
      
      //如果RexNode是Call,继续递归accept该Call的相关Operands
      public R visitCall(RexCall call) {
        if (!deep) {
          return null;
        }
    
        R r = null;
        for (RexNode operand : call.operands) {
          r = operand.accept(this);
        }
        return r;
      }

    2. 由于树结构变了,会导致整个field的index发生改变

    对于原先的,bottomJoin,A的field从0开始,B是从aCount开始
    而对于变换后的,BottomJoin,B的field从0开始,C是从bCount开始

    所以需要完成Ref的修正,

    createShiftMapping,生成的mapping,包含原先的index,和当前新的index

    第一个参数是index的范围,一共aCount + bCount + cCount,所以index需要小于这个值

    然后,只有B,C的index需要调整,A的index不需要调整

    其中,B的原先是从aCount开始,当前从0开始,一共bCount个fields;C的原先是从aCount + bCount开始,当前从bCount开始,一共cCount个fields

    接着,RexPermuteInputsShuttle做具体的更新

    /**
     * Shuttle which applies a permutation to its input fields.
     *
     * @see RexPermutationShuttle
     * @see RexUtil#apply(org.apache.calcite.util.mapping.Mappings.TargetMapping, RexNode)
     */
    public class RexPermuteInputsShuttle extends RexShuttle {
      //~ Instance fields --------------------------------------------------------
    
      private final Mappings.TargetMapping mapping;
      private final ImmutableList<RelDataTypeField> fields;
    
      //对于每个InputRef,根据Mapping更新index
      @Override public RexNode visitInputRef(RexInputRef local) {
        final int index = local.getIndex();
        int target = mapping.getTarget(index);
        return new RexInputRef(
            target,
            local.getType());
      }

    最后,构建新的join,

        RexNode newBottomCondition =
            RexUtil.composeConjunction(rexBuilder, newBottomList);
    
        final Join newBottomJoin =
            bottomJoin.copy(bottomJoin.getTraitSet(), newBottomCondition, relB,
                relC, JoinRelType.INNER, false);
    
        // Condition for newTopJoin consists of pieces from bottomJoin and topJoin.
        // Field ordinals do not need to be changed.
        RexNode newTopCondition = RexUtil.composeConjunction(rexBuilder, top);
        @SuppressWarnings("SuspiciousNameCombination")
        final Join newTopJoin =
            topJoin.copy(topJoin.getTraitSet(), newTopCondition, relA,
                newBottomJoin, JoinRelType.INNER, false);
    
        call.transformTo(newTopJoin);
  • 相关阅读:
    [Tutorial] How to check and kill running processes in Ubuntu
    [Tutorial] Getting started with Gazebo in ROS
    Linux基础命令
    Linux安装系统
    vue 前后端数据交互问题解决
    如何在cmd中启动MongoDB服务器和客户端
    selenuim模块的使用 解析库
    beautifhulsoup4的使用
    浅谈scrapy框架安装使用
    自动登录 点赞 评论 抽屉网
  • 原文地址:https://www.cnblogs.com/fxjwind/p/11279080.html
Copyright © 2011-2022 走看看