  • pgsql: SQL query efficiency optimization

    A 5-table join query ran slowly in pgsql. This post tracks down the cause.

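    Environment
    There are 5 foreign tables. The table behind the temporary table in the WITH clause is large, about 200 million rows in total, and a time-series model is used to speed up queries against it.
    The other 4 tables, the right-hand tables of the joins, are very small: the largest has no more than 10,000 rows.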

    Running explain in a pgsql instance with no tuning applied shows an access plan that contains many nested loop joins:

     Aggregate  (cost=99723528.30..99723528.31 rows=1 width=0)
       CTE f_acct_vchr_1_tmp
         ->  Foreign Scan on hdmp_pri5_fdm_f_acct_vchr vo_1  (cost=0.00..99722420.16 rows=1 width=1448)
               Filter: ((posting_dt >= '2015-12-01'::date) AND (posting_dt <= '2015-12-31'::date) AND (trans_no ~~ '301%'::text) AND (amt = 1000::double precision) AND ((posting_flg = 'Y'::text) OR (gl_acc_id = '99900'::text)))
               Foreign Namespace: hdmp_pri5_fdm.f_acct_vchr
       ->  Nested Loop Left Join  (cost=0.00..1108.15 rows=1 width=0)
             Join Filter: (vo.calc_trans_action = d3.trans_action_cd)
             ->  Nested Loop Left Join  (cost=0.00..902.53 rows=1 width=32)
                   Join Filter: (vo.trans_action_cd = d2.trans_action_cd)
                   ->  Nested Loop Left Join  (cost=0.00..696.92 rows=1 width=64)
                         Join Filter: (vo.fund_tnl_cd = f1.prod_cd)
                         ->  Nested Loop Left Join  (cost=0.00..360.10 rows=1 width=96)
                               Join Filter: (vo.calc_unit_id = u1.calc_unit_id)
                               ->  Nested Loop Left Join  (cost=0.00..352.15 rows=1 width=104)
                                     Join Filter: (vo.modl_id = d1.modl_id)
                                     ->  Nested Loop Left Join  (cost=0.00..336.84 rows=1 width=112)
                                           Join Filter: (vo.prod_cd = p.prod_cd)
                                           ->  CTE Scan on f_acct_vchr_1_tmp vo  (cost=0.00..0.02 rows=1 width=144)
                                           ->  Foreign Scan on d_prod p  (cost=0.00..336.22 rows=48 width=32)
                                                 Filter: (eff_flg = 'Y'::text)
                                                 Foreign Namespace: hdmp_pri5_fdm.d_prod
                                     ->  Foreign Scan on d_modl d1  (cost=0.00..13.36 rows=156 width=8)
                                           Foreign Namespace: hdmp_pri5_fdm.d_modl
                               ->  Foreign Scan on d_calc_unit u1  (cost=0.00..7.93 rows=1 width=8)
                                     Filter: (eff_flg = 'Y'::text)
                                     Foreign Namespace: hdmp_pri5_fdm.d_calc_unit
                         ->  Foreign Scan on d_prod f1  (cost=0.00..336.22 rows=48 width=32)
                               Filter: (eff_flg = 'Y'::text)
                               Foreign Namespace: hdmp_pri5_fdm.d_prod

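    Analyzing the SQL further, we found that the temporary table (the one in the WITH clause) returns 350 rows: not many, but not trivial either.
    If we remove one of the WHERE conditions on the temporary table (the amt = 1000.00 condition) so that its result set grows to 8,700 rows, and run explain again, the access plan changes to the following: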

     Aggregate  (cost=99723547.48..99723547.49 rows=1 width=0)
       CTE f_acct_vchr_1_tmp
         ->  Foreign Scan on hdmp_pri5_fdm_f_acct_vchr vo_1  (cost=0.00..99722428.03 rows=127 width=1448)
               Filter: ((posting_dt >= '2015-12-01'::date) AND (posting_dt <= '2015-12-31'::date) AND (trans_no ~~ '301%'::text) AND ((posting_flg = 'Y'::text) OR (gl_acc_id = '99900'::text)))
               Foreign Namespace: hdmp_pri5_fdm.f_acct_vchr
       ->  Hash Left Join  (cost=771.19..1119.14 rows=127 width=0)
             Hash Cond: (vo.fund_tnl_cd = f1.prod_cd)
             ->  Nested Loop Left Join  (cost=434.36..780.90 rows=127 width=32)
                   Join Filter: (vo.calc_unit_id = u1.calc_unit_id)
                   ->  Hash Right Join  (cost=434.36..771.07 rows=127 width=40)
                         Hash Cond: (p.prod_cd = vo.prod_cd)
                         ->  Foreign Scan on d_prod p  (cost=0.00..336.22 rows=48 width=32)
                               Filter: (eff_flg = 'Y'::text)
                               Foreign Namespace: hdmp_pri5_fdm.d_prod
                         ->  Hash  (cost=432.78..432.78 rows=127 width=72)
                               ->  Hash Left Join  (cost=226.27..432.78 rows=127 width=72)
                                     Hash Cond: (vo.calc_trans_action = d3.trans_action_cd)
                                     ->  Hash Right Join  (cost=20.65..226.20 rows=127 width=104)
                                           Hash Cond: (d2.trans_action_cd = vo.trans_action_cd)
                                           ->  Foreign Scan on d_trans_action d2  (cost=0.00..205.28 rows=27 width=32)
                                                 Filter: (eff_flg = 'Y'::text)
                                                 Foreign Namespace: hdmp_pri5_fdm.d_trans_action
                                           ->  Hash  (cost=19.06..19.06 rows=127 width=136)
                                                 ->  Hash Right Join  (cost=4.13..19.06 rows=127 width=136)
                                                       Hash Cond: (d1.modl_id = vo.modl_id)
                                                       ->  Foreign Scan on d_modl d1  (cost=0.00..13.36 rows=156 width=8)
                                                             Foreign Namespace: hdmp_pri5_fdm.d_modl
                                                       ->  Hash  (cost=2.54..2.54 rows=127 width=144)
                                                             ->  CTE Scan on f_acct_vchr_1_tmp vo  (cost=0.00..2.54 rows=127 width=144)

    There are fewer nl joins, and the query runs correspondingly faster.

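    Next, look at the right-hand tables in the SQL.
    A simple count shows that they are all very small: the largest has only about 10,000 rows.
    This explains why the query whose temporary table has only 350 rows is slower than the one whose temporary table has 8,700 rows.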
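
    The sizes quoted above can be checked with plain count queries. This is a minimal sketch using the table names from the queries at the end of this post. The last statement reuses the WHERE conditions of the first query to count the temporary table's rows.

    ```sql
    -- Row counts of the small right-hand tables.
    SELECT count(*) FROM d_prod;
    SELECT count(*) FROM d_modl;
    SELECT count(*) FROM d_calc_unit;
    SELECT count(*) FROM d_trans_action;

    -- Row count of the temporary table from the first query (350 in this post).
    SELECT count(*)
    FROM hdmp_pri5_fdm_f_acct_vchr vo
    WHERE posting_dt >= '2015-12-01'
      AND posting_dt <= '2015-12-31'
      AND trans_no LIKE '301%'
      AND amt = 1000.00
      AND (posting_flg = 'Y' OR vo.gl_acc_id = '99900');
    ```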

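    In the first SQL almost every join is an nl join, so the right-hand tables are accessed over and over (each one is a Foreign Scan here), and with 5 tables joined at once this is extremely slow.
    In the second SQL the temporary table has 8,700 rows, which is comparatively many, so the pgsql planner automatically switches most joins to hash join and leaves only a few as nl join. The estimates in the plans show the same thing: the CTE scan is estimated at rows=1 in the first plan and rows=127 in the second.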

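    From the way database joins work, every join in a query like this should be a hash join for best efficiency, because the temporary table's result set is not large and all the right-hand tables are small as well.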

    The command to turn off nl join in pgsql:

    set enable_nestloop=off;
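
    A plain set changes the setting for the rest of the session. As a sketch, assuming this pgsql behaves like standard PostgreSQL here, the change can be limited to one transaction with SET LOCAL, or undone afterwards with RESET:

    ```sql
    -- Limit the setting to a single transaction.
    BEGIN;
    SET LOCAL enable_nestloop = off;
    -- run the slow query (or its explain) here
    COMMIT;  -- enable_nestloop returns to its previous value

    -- Alternatively, after a session-level set, restore the default:
    RESET enable_nestloop;
    ```

    In standard PostgreSQL this setting does not forbid nested loops outright. It only discourages the planner from choosing one when another join method is available.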

    With nl join turned off, run explain again to see the access plan:

     Aggregate  (cost=99723457.95..99723457.96 rows=1 width=0)
       CTE f_acct_vchr_1_tmp
         ->  Foreign Scan on hdmp_pri5_fdm_f_acct_vchr vo_1  (cost=0.00..99722420.16 rows=1 width=1448)
               Filter: ((posting_dt >= '2015-12-01'::date) AND (posting_dt <= '2015-12-31'::date) AND (trans_no ~~ '301%'::text) AND (amt = 1000::double precision) AND ((posting_flg = 'Y'::text) OR (gl_acc_id = '99900'::text)))
               Foreign Namespace: hdmp_pri5_fdm.f_acct_vchr
       ->  Hash Left Join  (cost=724.37..1037.79 rows=1 width=0)
             Hash Cond: (vo.calc_unit_id = (u1.calc_unit_id)::double precision)
             ->  Hash Right Join  (cost=716.42..1029.83 rows=1 width=8)
                   Hash Cond: (f1.prod_cd = vo.fund_tnl_cd)
                   ->  Foreign Scan on hdmp_pri5_fdm_d_prod f1  (cost=0.00..313.22 rows=48 width=32)
                         Filter: (eff_flg = 'Y'::text)
                         Foreign Namespace: hdmp_pri5_fdm.d_prod
                   ->  Hash  (cost=716.41..716.41 rows=1 width=40)
                         ->  Hash Right Join  (cost=403.00..716.41 rows=1 width=40)
                               Hash Cond: (p.prod_cd = vo.prod_cd)
                               ->  Foreign Scan on hdmp_pri5_fdm_d_prod p  (cost=0.00..313.22 rows=48 width=32)
                                     Filter: (eff_flg = 'Y'::text)
                                     Foreign Namespace: hdmp_pri5_fdm.d_prod
                               ->  Hash  (cost=402.98..402.98 rows=1 width=72)
                                     ->  Hash Right Join  (cost=208.60..402.98 rows=1 width=72)
                                           Hash Cond: (d3.trans_action_cd = vo.calc_trans_action)
                                           ->  Foreign Scan on hdmp_pri5_fdm_d_trans_action d3  (cost=0.00..194.28 rows=27 width=32)
                                                 Filter: (eff_flg = 'Y'::text)
                                                 Foreign Namespace: hdmp_pri5_fdm.d_trans_action
                                           ->  Hash  (cost=208.58..208.58 rows=1 width=104)
                                                 ->  Hash Right Join  (cost=14.20..208.58 rows=1 width=104)
                                                       Hash Cond: (d2.trans_action_cd = vo.trans_action_cd)
                                                       ->  Foreign Scan on hdmp_pri5_fdm_d_trans_action d2  (cost=0.00..194.28 rows=27 width=32)

    Every join is now a hash join, and the query time dropped from the original 120 s to 800 ms.

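    Summary
    When a SQL query performs poorly, make good use of the explain command to locate the problem. In this case it shows that too many nl joins caused the performance problem.
    SQL tuning is systematic work and sometimes takes careful observation. For example, WITH also works well with foreign tables in pg, and is worth studying when you have time.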
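
    explain alone prints only the planner's estimates. As a sketch, again assuming standard PostgreSQL behavior, EXPLAIN ANALYZE executes the statement and prints actual row counts and timings beside the estimates, which makes a gap such as an estimated rows=1 against 350 rows actually returned easy to spot. The query below is a shortened version of the first SQL with a single join.

    ```sql
    -- Note: EXPLAIN ANALYZE really runs the query, so it takes as long as the query itself.
    EXPLAIN ANALYZE
    WITH f_acct_vchr_1_tmp AS (
        SELECT *
        FROM hdmp_pri5_fdm_f_acct_vchr vo
        WHERE posting_dt >= '2015-12-01'
          AND posting_dt <= '2015-12-31'
          AND trans_no LIKE '301%'
          AND amt = 1000.00
          AND (posting_flg = 'Y' OR vo.gl_acc_id = '99900')
    )
    SELECT count(1)
    FROM f_acct_vchr_1_tmp vo
    LEFT JOIN d_prod p
        ON vo.prod_cd = p.prod_cd
       AND p.eff_flg = 'Y';
    ```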

    ************************************

    First SQL: the temporary table returns 350 rows

    explain WITH
        f_acct_vchr_1_tmp AS
        (
            SELECT
                *
            FROM
                hdmp_pri5_fdm_f_acct_vchr vo
            WHERE
                1=1
            AND posting_dt >= '2015-12-01'
            AND posting_dt <= '2015-12-31'
            AND trans_no LIKE '301%'
            and amt = 1000.00
            AND (
                    posting_flg = 'Y'
                OR  vo.gl_acc_id = '99900')
                
                       
        )
    SELECT
       count(1)
    FROM
        F_ACCT_VCHR_1_tmp vo
    LEFT JOIN
        d_prod p
    ON
        vo.prod_cd=p.prod_cd
    AND p.eff_flg = 'Y'
    LEFT JOIN
        d_modl d1
    ON
        vo.modl_id=d1.modl_id
    LEFT JOIN
        d_calc_unit u1
    ON
        vo.calc_unit_id=u1.calc_unit_id
    AND u1.eff_flg = 'Y'
    LEFT JOIN
        d_prod f1
    ON
        vo.fund_tnl_cd=f1.prod_cd
    AND f1.eff_flg = 'Y'
    LEFT JOIN
        d_trans_action d2
    ON
        vo.trans_action_cd=d2.trans_action_cd
    AND d2.eff_flg = 'Y'
    LEFT JOIN
        d_trans_action d3
    ON
        vo.calc_trans_action=d3.trans_action_cd
    AND d3.eff_flg = 'Y'


    ####################
    Second SQL: the temporary table returns 8,700 rows

    explain WITH
        f_acct_vchr_1_tmp AS
        (
            SELECT
                *
            FROM
                hdmp_pri5_fdm_f_acct_vchr vo
            WHERE
                1=1
            AND posting_dt >= '2015-12-01'
            AND posting_dt <= '2015-12-31'
            AND trans_no LIKE '301%'
    
            AND (
                    posting_flg = 'Y'
                OR  vo.gl_acc_id = '99900')
                
                       
        )
    SELECT
       count(1)
    FROM
        F_ACCT_VCHR_1_tmp vo
    LEFT JOIN
        d_prod p
    ON
        vo.prod_cd=p.prod_cd
    AND p.eff_flg = 'Y'
    LEFT JOIN
        d_modl d1
    ON
        vo.modl_id=d1.modl_id
    LEFT JOIN
        d_calc_unit u1
    ON
        vo.calc_unit_id=u1.calc_unit_id
    AND u1.eff_flg = 'Y'
    LEFT JOIN
        d_prod f1
    ON
        vo.fund_tnl_cd=f1.prod_cd
    AND f1.eff_flg = 'Y'
    LEFT JOIN
        d_trans_action d2
    ON
        vo.trans_action_cd=d2.trans_action_cd
    AND d2.eff_flg = 'Y'
    LEFT JOIN
        d_trans_action d3
    ON
        vo.calc_trans_action=d3.trans_action_cd
    AND d3.eff_flg = 'Y'
  • Original article: https://www.cnblogs.com/chenfool/p/5332399.html