  • Predicate Pushdown Rules in Hive

    Source: https://blog.csdn.net/strongyoung88/article/details/81156271

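    The concept of predicate pushdown

    Predicate pushdown (PPD), in short, means applying filter conditions as early as possible without changing the query result. Once a predicate is pushed down, the filter runs on the map side, which reduces the map output, cuts the amount of data transferred across the cluster, saves cluster resources, and improves job performance.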
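
    The idea can be sketched outside Hive with a toy inner join in plain Python. The rows, the ename and dept_name values, and the helper names below are made up for illustration; only the table names E and D and the eid filter come from the queries in this post. The sketch shows that filtering before the join returns the same rows while sending fewer rows into the join.

```python
# Toy rows standing in for tables E (employees) and D (departments).
# All values are hypothetical sample data.
E = [
    {"eid": "HZ001", "ename": "Ann", "dept_id": "D001"},
    {"eid": "HZ002", "ename": "Bob", "dept_id": "D001"},
    {"eid": "HZ003", "ename": "Cid", "dept_id": "D002"},
]
D = [
    {"dept_id": "D001", "dept_name": "Sales"},
    {"dept_id": "D002", "dept_name": "Ops"},
]


def inner_join(left, right):
    """Inner join on dept_id, keeping every column of both sides."""
    return [{**l, **r} for l in left for r in right if l["dept_id"] == r["dept_id"]]


def is_hz001(row):
    """The filter predicate E.eid = 'HZ001'."""
    return row["eid"] == "HZ001"


# Without pushdown: every row of E reaches the join, then the filter runs.
late = [(r["ename"], r["dept_name"]) for r in inner_join(E, D) if is_hz001(r)]
rows_into_join_late = len(E)

# With pushdown: the filter runs first, so fewer rows reach the join.
E_filtered = [r for r in E if is_hz001(r)]
early = [(r["ename"], r["dept_name"]) for r in inner_join(E_filtered, D)]
rows_into_join_early = len(E_filtered)

print(late == early, rows_into_join_late, rows_into_join_early)
```

    In Hive the saving shows up as less map output to shuffle, not as a shorter Python list, but the reasoning is the same.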

    PPD configuration

    PPD is controlled by the parameter hive.optimize.ppd:

    • Default Value: true
    • Added In: Hive 0.4.0

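    Pushed: the predicate is pushed down, which can be read as "optimized".
    Not Pushed: the predicate is not pushed down, which can be read as "not optimized".

    Experiment

    Results in list form: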

    Pushed or Not | SQL
    Pushed        | select ename,dept_name from E join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
    Pushed        | select ename,dept_name from E join D on E.dept_id = D.dept_id where E.eid='HZ001';
    Pushed        | select ename,dept_name from E join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
    Pushed        | select ename,dept_name from E join D on E.dept_id = D.dept_id where D.dept_id='D001';
    Not Pushed    | select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
    Pushed        | select ename,dept_name from E left outer join D on E.dept_id = D.dept_id where E.eid='HZ001';
    Pushed        | select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
    Not Pushed    | select ename,dept_name from E left outer join D on E.dept_id = D.dept_id where D.dept_id='D001';
    Pushed        | select ename,dept_name from E right outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
    Not Pushed    | select ename,dept_name from E right outer join D on E.dept_id = D.dept_id where E.eid='HZ001';
    Not Pushed    | select ename,dept_name from E right outer join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
    Pushed        | select ename,dept_name from E right outer join D on E.dept_id = D.dept_id where D.dept_id='D001';
    Not Pushed    | select ename,dept_name from E full outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001');
    Not Pushed    | select ename,dept_name from E full outer join D on E.dept_id = D.dept_id where E.eid='HZ001';
    Not Pushed    | select ename,dept_name from E full outer join D on ( E.dept_id = D.dept_id and D.dept_id='D001');
    Not Pushed    | select ename,dept_name from E full outer join D on E.dept_id = D.dept_id where D.dept_id='D001';

    Results in table form:

                    | Join (inner join)         | Left Outer Join           | Right Outer Join          | Full Outer Join
                    | Left Table  | Right Table | Left Table  | Right Table | Left Table  | Right Table | Left Table  | Right Table
    Join Predicate  | Pushed      | Pushed      | Not Pushed  | Pushed      | Pushed      | Not Pushed  | Not Pushed  | Not Pushed
    Where Predicate | Pushed      | Pushed      | Pushed      | Not Pushed  | Not Pushed  | Pushed      | Not Pushed  | Not Pushed

    This table is in effect the PPD rule table for the queries listed above.
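
    For quick lookups, the table can be encoded as a small Python dictionary. This is only a convenience sketch of the experimental results above, not Hive's own logic; the names JOIN_TYPES, PPD_RULES and is_pushed are hypothetical.

```python
# Join types in the same left-to-right order as the table columns.
JOIN_TYPES = ("inner", "left outer", "right outer", "full outer")

# One tuple per table row; entries alternate (left table, right table)
# for each join type. True means "Pushed", False means "Not Pushed".
_TABLE_ROWS = {
    "on":    (True, True, False, True, True, False, False, False),
    "where": (True, True, True, False, False, True, False, False),
}

# Key: (join type, clause holding the predicate, table the predicate refers to).
PPD_RULES = {}
for clause, flags in _TABLE_ROWS.items():
    for i, join_type in enumerate(JOIN_TYPES):
        PPD_RULES[(join_type, clause, "left")] = flags[2 * i]
        PPD_RULES[(join_type, clause, "right")] = flags[2 * i + 1]


def is_pushed(join_type, clause, table_side):
    """Look up whether a predicate is pushed down according to the table."""
    return PPD_RULES[(join_type.lower(), clause.lower(), table_side.lower())]


# A left-table predicate in the ON clause of a left outer join stays unpushed,
# while the same predicate in the WHERE clause is pushed.
print(is_pushed("left outer", "on", "left"), is_pushed("left outer", "where", "left"))
```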

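    Conclusions

    1. For Join (inner join) and Full Outer Join, it makes no performance difference whether a condition is written in the ON clause or in the WHERE clause;
    2. For Left Outer Join, performance improves when conditions on the right table are written in the ON clause and conditions on the left table are written in the WHERE clause;
    3. For Right Outer Join, performance improves when conditions on the left table are written in the ON clause and conditions on the right table are written in the WHERE clause;
    4. When the conditions are spread across both tables, the pushdown behavior follows from combining conclusions 2 and 3, as follows: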

    SQL | Filter timing
    select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001' and D.dept_id = 'D001'); | dept_id is filtered on the map side, eid on the reduce side
    select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and D.dept_id = 'D001') where E.eid='HZ001'; | dept_id and eid are both filtered on the map side
    select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id and E.eid='HZ001') where D.dept_id = 'D001'; | dept_id and eid are both filtered on the reduce side
    select ename,dept_name from E left outer join D on ( E.dept_id = D.dept_id ) where E.eid='HZ001' and D.dept_id = 'D001'; | dept_id is filtered on the reduce side, eid on the map side
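
    One way to see why the left outer join cases split this way is to check whether filtering a table before the join would change the result. The sketch below uses Python's sqlite3 module as a stand-in for Hive, so it says nothing about Hive's planner; it only compares query results on hypothetical sample rows. A mismatch shows that early filtering would be wrong for that case, while a match on one data set is merely consistent with pushdown being safe.

```python
import sqlite3

# SQLite stands in for Hive here; only the query results matter, not the plan.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table E (eid text, ename text, dept_id text);
    create table D (dept_id text, dept_name text);
    insert into E values ('HZ001', 'Ann', 'D001'),
                         ('HZ002', 'Bob', 'D001'),
                         ('HZ003', 'Cid', 'D002');
    insert into D values ('D001', 'Sales'), ('D002', 'Ops');
""")

SELECT = "select ename, dept_name from "
JOIN = "E left outer join D on E.dept_id = D.dept_id"
# Subqueries that apply the filter before the join (what pushdown would do).
E_FILTERED = "(select * from E where eid = 'HZ001') as E"
D_FILTERED = "(select * from D where dept_id = 'D001') as D"


def rows(sql):
    """Run a query and return its rows in a stable order."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


def prefiltered(left, right):
    """Left outer join of two (possibly pre-filtered) inputs on dept_id."""
    return rows(f"{SELECT}{left} left outer join {right} on E.dept_id = D.dept_id")


# name: (rows of the query as written, rows if the filter ran before the join)
cases = {
    "on, left table": (rows(f"{SELECT}{JOIN} and E.eid = 'HZ001'"),
                       prefiltered(E_FILTERED, "D")),
    "where, left table": (rows(f"{SELECT}{JOIN} where E.eid = 'HZ001'"),
                          prefiltered(E_FILTERED, "D")),
    "on, right table": (rows(f"{SELECT}{JOIN} and D.dept_id = 'D001'"),
                        prefiltered("E", D_FILTERED)),
    "where, right table": (rows(f"{SELECT}{JOIN} where D.dept_id = 'D001'"),
                           prefiltered("E", D_FILTERED)),
}

# True where filtering early gives the same rows as the query as written.
same_result = {name: written == early for name, (written, early) in cases.items()}
for name, same in same_result.items():
    print(f"{name}: early filtering keeps the result = {same}")
```

    On this data the two cases where early filtering changes the result are exactly the two Not Pushed cells of the Left Outer Join column.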

    Note: if an expression contains a nondeterministic function, the predicates of the whole expression are not pushed down. For example:

    select a.* 
    from a join b on a.id = b.id
    where a.ds = '2019-10-09' and a.create_time = unix_timestamp();

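    Because unix_timestamp is a nondeterministic function whose value cannot be known at compile time, the whole expression is not pushed down, so ds='2019-10-09' is not filtered early either. rand() is another nondeterministic function of this kind.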
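
    A possible workaround, not taken from the original experiments, is to compute the timestamp on the client and inline it as a literal, so that the WHERE clause no longer contains a nondeterministic call. The sketch below only builds the query text; build_query and its parameters are hypothetical names, and whether the rewritten query is actually pushed down should be confirmed with EXPLAIN on the Hive version in use.

```python
import time
from datetime import datetime


def build_query(ds, now_ts=None):
    """Return the example query with the timestamp inlined as an integer literal.

    ds is validated as YYYY-MM-DD before being placed into the SQL text.
    now_ts defaults to the client's current Unix time (an assumption: the
    client clock is an acceptable substitute for unix_timestamp()).
    """
    datetime.strptime(ds, "%Y-%m-%d")  # raises ValueError on malformed input
    if now_ts is None:
        now_ts = int(time.time())
    return (
        "select a.*\n"
        "from a join b on a.id = b.id\n"
        f"where a.ds = '{ds}' and a.create_time = {int(now_ts)}"
    )


print(build_query("2019-10-09", now_ts=1570579200))
```

    The resulting statement compares create_time with a constant, so both conditions are ordinary deterministic predicates.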

  • Original article: https://www.cnblogs.com/LIAOBO/p/14236648.html