  • Predicate Pushdown Rules in Hive

    Source: https://blog.csdn.net/strongyoung88/article/details/81156271

    The concept of predicate pushdown

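    Predicate Pushdown (PPD): in short, it means applying filter conditions as early as possible, provided the result is unchanged. Once a predicate is pushed down, the filter runs on the map side, which reduces the map output, cuts the amount of data transferred across the cluster, saves cluster resources, and improves job performance.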
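    To make the idea concrete, here is a minimal Python sketch. It is a toy model, not Hive code: the sample rows in `E` and `D` and the helper `inner_join` are made up for illustration. It filters once after the join and once before it, and shows that the result is the same while fewer rows enter the join.

```python
# Toy model of predicate pushdown on an inner join (illustrative only, not Hive internals).
E = [
    {"eid": "HZ001", "ename": "Ann", "dept_id": "D001"},
    {"eid": "HZ002", "ename": "Bob", "dept_id": "D001"},
    {"eid": "HZ003", "ename": "Cid", "dept_id": "D002"},
]
D = [
    {"dept_id": "D001", "dept_name": "Sales"},
    {"dept_id": "D002", "dept_name": "Ops"},
]


def inner_join(left, right):
    """Nested-loop inner join on dept_id; returns the merged rows."""
    return [{**l, **r} for l in left for r in right if l["dept_id"] == r["dept_id"]]


# Without pushdown: join every row, then filter the joined output.
late = [row for row in inner_join(E, D) if row["eid"] == "HZ001"]

# With pushdown: filter E first, so only the matching row enters the join.
e_filtered = [row for row in E if row["eid"] == "HZ001"]
early = inner_join(e_filtered, D)

print(early == late)            # True
print(len(E), len(e_filtered))  # 3 1
```

    In a real job, the rows that never enter the join are rows that never have to be shuffled from the map side to the reduce side.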

    PPD configuration

    The parameter that controls PPD: hive.optimize.ppd

    • Default Value: true
    • Added In: Hive 0.4.0

    Pushed: the predicate is pushed down; read this as "optimized"
    Not Pushed: the predicate is not pushed down; read this as "not optimized"

    Experiment

    Results as a list:

    Pushed or Not   SQL
    Pushed          select ename,dept_name from E join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
    Pushed          select ename,dept_name from E join D on E.dept_id = D.dept_id where E.eid='HZ001';
    Pushed          select ename,dept_name from E join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
    Pushed          select ename,dept_name from E join D on E.dept_id = D.dept_id where D.dept_id='D001';
    Not Pushed      select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
    Pushed          select ename,dept_name from E left outer join D on E.dept_id = D.dept_id where E.eid='HZ001';
    Pushed          select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
    Not Pushed      select ename,dept_name from E left outer join D on E.dept_id = D.dept_id where D.dept_id='D001';
    Pushed          select ename,dept_name from E right outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
    Not Pushed      select ename,dept_name from E right outer join D on E.dept_id = D.dept_id where E.eid='HZ001';
    Not Pushed      select ename,dept_name from E right outer join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
    Pushed          select ename,dept_name from E right outer join D on E.dept_id = D.dept_id where D.dept_id='D001';
    Not Pushed      select ename,dept_name from E full outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
    Not Pushed      select ename,dept_name from E full outer join D on E.dept_id = D.dept_id where E.eid='HZ001';
    Not Pushed      select ename,dept_name from E full outer join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
    Not Pushed      select ename,dept_name from E full outer join D on E.dept_id = D.dept_id where D.dept_id='D001';

    Results as a table:

                     Join (Inner Join)         Left Outer Join           Right Outer Join          Full Outer Join
                     Left Table   Right Table  Left Table   Right Table  Left Table   Right Table  Left Table   Right Table
    Join Predicate   Pushed       Pushed       Not Pushed   Pushed       Pushed       Not Pushed   Not Pushed   Not Pushed
    Where Predicate  Pushed       Pushed       Pushed       Not Pushed   Not Pushed   Pushed       Not Pushed   Not Pushed

    This table is, in effect, the PPD rule table.
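    The table can be condensed into two rules, which is also how the Hive documentation on outer join behavior phrases them: a join (ON) predicate cannot be pushed past a preserved-row table, and a WHERE predicate cannot be pushed past a null-supplying table. The Python sketch below encodes just that and reproduces the table. The function name `is_pushed` and the string labels are hypothetical, chosen for this illustration; this is not how Hive implements the check.

```python
# Which side must keep all of its rows (preserved) and which side is padded
# with NULLs for unmatched rows (null-supplying), per join type.
PRESERVED = {"inner": set(), "left": {"left"}, "right": {"right"}, "full": {"left", "right"}}
NULL_SUPPLYING = {"inner": set(), "left": {"right"}, "right": {"left"}, "full": {"left", "right"}}


def is_pushed(join_type, table_side, clause):
    """Return True if a single-table predicate is pushed down according to the table.

    join_type:  "inner", "left", "right" or "full"
    table_side: "left" or "right" (the table the predicate refers to)
    clause:     "on" (join predicate) or "where" (where predicate)
    """
    if clause == "on":
        return table_side not in PRESERVED[join_type]
    if clause == "where":
        return table_side not in NULL_SUPPLYING[join_type]
    raise ValueError(f"unknown clause: {clause!r}")


# Print one line per cell of the table.
for clause in ("on", "where"):
    for join_type in ("inner", "left", "right", "full"):
        for side in ("left", "right"):
            verdict = "Pushed" if is_pushed(join_type, side, clause) else "Not Pushed"
            print(f"{clause:5} {join_type:5} {side:5} {verdict}")
```

    For example, `is_pushed("left", "right", "where")` is False, which matches the "Not Pushed" cell for a WHERE predicate on the right table of a left outer join.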

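    Conclusions

    1. For Join (Inner Join) and Full Outer Join, it makes no difference to performance whether a condition is written in the ON clause or in the WHERE clause: for an inner join it is pushed down either way, and for a full outer join it is pushed down in neither position;
    2. For Left Outer Join, performance improves when conditions on the right table are written in the ON clause and conditions on the left table are written in the WHERE clause;
    3. For Right Outer Join, performance improves when conditions on the left table are written in the ON clause and conditions on the right table are written in the WHERE clause;
    4. When the conditions are spread across both tables, pushdown follows conclusions 2 and 3 in any combination, as follows:

    SQL    Filter timing
    select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001' and D.dept_id = 'D001');    dept_id is filtered on the map side, eid on the reduce side
    select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and D.dept_id = 'D001') where E.eid='HZ001';    dept_id and eid are both filtered on the map side
    select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001') where D.dept_id = 'D001';    dept_id and eid are both filtered on the reduce side
    select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id ) where E.eid='HZ001' and D.dept_id = 'D001';    dept_id is filtered on the reduce side, eid on the map side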
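    The reason an ON-clause condition on the left table of a left outer join stays on the reduce side is semantic: every left row must appear in the output, so the condition only decides whether a row finds a match, not whether it survives. The Python sketch below illustrates this with made-up rows and a hypothetical helper `left_outer_join`; it is a toy model, not Hive code. Filtering E before the join gives a different result from evaluating the same condition inside the ON clause.

```python
E = [{"eid": "HZ001", "dept_id": "D001"}, {"eid": "HZ002", "dept_id": "D001"}]
D = [{"dept_id": "D001", "dept_name": "Sales"}]


def left_outer_join(left, right, on):
    """Keep every left row; emit dept_name=None when no right row satisfies `on`."""
    out = []
    for l in left:
        matches = [r for r in right if on(l, r)]
        if matches:
            out.extend((l["eid"], r["dept_name"]) for r in matches)
        else:
            out.append((l["eid"], None))
    return out


# E.eid = 'HZ001' evaluated as part of the ON clause: HZ002 survives, padded with None.
in_on = left_outer_join(
    E, D, lambda l, r: l["dept_id"] == r["dept_id"] and l["eid"] == "HZ001"
)

# The same condition applied to E before the join: HZ002 disappears.
pre_filtered = left_outer_join(
    [l for l in E if l["eid"] == "HZ001"], D, lambda l, r: l["dept_id"] == r["dept_id"]
)

print(in_on)         # [('HZ001', 'Sales'), ('HZ002', None)]
print(pre_filtered)  # [('HZ001', 'Sales')]
```

    Because the two results differ, moving the condition between ON and WHERE in an outer join is only a safe tuning step when the intended result is the one the new position produces.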

    Note: if an expression contains a non-deterministic function, none of the predicates in that expression are pushed down. For example:

    select a.* 
    from a join b on a.id = b.id
    where a.ds = '2019-10-09' and a.create_time = unix_timestamp();

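    Because unix_timestamp is a non-deterministic function whose value cannot be known at compile time, the whole expression is not pushed down, which means ds='2019-10-09' is not filtered early either. rand() is another non-deterministic function of this kind.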
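    One possible way around this, sketched here as an assumption and not taken from the original article, is to evaluate the timestamp once on the client and send it to Hive as a literal, so that the WHERE clause no longer contains a non-deterministic function. The helper `build_query` below is hypothetical. Whether the rewritten predicate is actually pushed down should be confirmed by running EXPLAIN on the generated query.

```python
import time


def build_query(ds: str, now: int) -> str:
    """Return the example query with the timestamp inlined as a literal (hypothetical helper)."""
    # int() ensures only a number is interpolated; ds is assumed to come from a trusted source.
    return (
        "select a.*\n"
        "from a join b on a.id = b.id\n"
        f"where a.ds = '{ds}' and a.create_time = {int(now)};"
    )


# Evaluate "now" once on the client instead of calling unix_timestamp() in the query.
query = build_query("2019-10-09", int(time.time()))
print(query)
```

    Note that this changes when the timestamp is evaluated (on the client, before submission), which may or may not be acceptable for a given job.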

  • Original article: https://www.cnblogs.com/LIAOBO/p/14236648.html