zoukankan      html  css  js  c++  java
  • [转]ORACLE小小说之武朝迷案

    作者:梁敬彬? wabjtam123?
    地址:ORACLE小小说之武朝迷案
    整理:killkill
    下载:原文及附件

    1、武朝迷案

    最近网管组遇到了一个困扰很长时间的棘手问题,就是如下执行频率极高的语句(用于权限管理),在生产环境中运行比较缓慢,执行需要200秒以上,急需优化,最好是控制在10秒以内。(测试环境服务器也是如此状况)

    据同事另外反映两点疑惑:

    1、生产环境及测试服务器环境运行时快时慢
    2、模拟的本机笔记本环境居然跑的比生产快几十倍!

    Select *
       From (Select Rownum Sid, a.*
               From (Select a.Ne_Alarm_List_Id Id,
                            a.Flow_Id, /*流程ID*/
                            a.Oprt_State Oprtstate, /*操作状态*/
                            a.Alarm_State Alarmstate, /*告警状态*/
                            a.Alarm_Level Alarmlevel, /*告警级别*/
                            a.Perf_Msg_Id, /*性能消息ID*/
                            a.Alarm_Class Alarmclass, /*告警类别*/
                            To_Char(a.Generate_Time, 'yyyy-mm-dd') Generatedate, /*产生日期*/
                            To_Char(a.Last_Generate_Time, 'yyyy-mm-dd') Lastdate, /*最后产生日期*/
                            B1.List_Label Alarm_Type,
                            C1.Ne_Name,
                            d.Kpi_Name,
                            a.Kpi_Value,
                            Decode(C1.Ne_Flag, 6, e.Region_Name, C2.Ne_Name) Datasource,
                            '' ||
                            Decode(a.Alarm_Level,
                                   '1',
                                   '严重',
                                   '2',
                                   '重要',
                                   '3',
                                   '一般',
                                   '未知') || '' Alarm_Level,
                            To_Char(a.Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Generate_Time,
                            To_Char(a.Last_Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Last_Generate_Time,
                            Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator,
                            a.Alarm_Times,
                            Decode(a.Oprt_State,
                                   '20',
                                   To_Char(a.Confirm_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '25',
                                   To_Char(a.Suspend_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '30',
                                   To_Char(a.Clear_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '40',
                                   To_Char(a.Delete_Time, 'yyyy-mm-dd hh24:mi:ss')) Executetim,
                            B2.List_Label Alarm_State,
                            Decode(a.Flow_Id, '', '未派单', '已派单') Work_State /*自定义列项*/
                       From Ne_Alarm_List a,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_NE_ALARM_TYPE') B1,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_ALARM_STATE') B2,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_DR_ID_FLAG') B3,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_ALARM_OPRT_STATE') B4,
                            Net_Element C1,
                            Net_Element C2,
                            Kpi_Code_List d,
                            Manage_Region e,
                            Kpi_Mapping_Cfg f,
                            Ne_Trans_Alarm Nta,
                            (Select t.Primary_Id "$PRIMARY_ID",
                                    Sum(t.Has_Read) "$HAS_READ"
                               From Tree_Privilege t
                              Where t.Tree_Cfg_Name = 'NET_ELEMENT'
                                And t.Assign_Object In ('STAFF_3', 'ORG_2', 'STATION_22')
                         Group By t.Primary_Id) "$PRI_VIEW"
                 Where Nvl(a.Config_Ne_Id, a.Ne_Id) ="$PRI_VIEW"."$PRIMARY_ID"(+)
                        And Nvl("$PRI_VIEW"."$HAS_READ", 0) > 0
                        And B1.List_Value = a.Alarm_Type
                        And a.Ne_Id = C1.Ne_Id
                        And B2.List_Value = a.Alarm_State
                        And B3.List_Value = a.Dr_Id || ''
                        And B4.List_Value = a.Oprt_State
                        And a.Config_Ne_Id = C2.Ne_Id(+)
                        And a.Kpi_Id = d.Kpi_Id
                        And a.Kpi_Id = f.Kpi_Id(+)
                        And Nvl(a.Alarm_Region_Origin, '-1') =
                            To_Char(e.Region_Id(+))
                        And a.Ne_Alarm_List_Id = Nta.Ne_Alarm_List_Id(+)
                        And (Select Path || '/'
                               From Net_Element
                              Where Ne_Id = a.Ne_Id) Not Like
                            (Select Path || '/%'
                               From Net_Element
                              Where Ne_Type_Id = 30
                                And Ne_Flag = '6'
                                And State = '0SA') /*非业务系统*/
                        And To_Number(a.Dr_Id) = 0
                      Order By B2.Sort_Id,
                               B4.Sort_Id,
                               a.Alarm_Type,
                               a.Ne_Id,
                               Nvl(a.Last_Send_Time, a.Create_Time) Desc) a) b
      Where b.Sid >= 1
        And b.Sid <= 15;
    

    首先自己确认了一下这个情况,确实,测试环境中执行该语句,需要跑230多秒,继而在别人和自己的笔记本环境上实验(有将语句涉及到的数据同步进自己的环境),10秒左右结果就出来了。另外随着如下子查询IN值的变化,测试环境服务器上该脚本的执行速度也在变化,当该子查询的结果集很小的时候,查的很快。越大越慢!

    Select 
    		t.Primary_Id "$PRIMARY_ID",
    	 	Sum(t.Has_Read) "$HAS_READ"
        From Tree_Privilege t
       Where t.Tree_Cfg_Name = 'NET_ELEMENT'
         And t.Assign_Object In ('STAFF_3', 'ORG_2', 'STATION_22')
    Group By t.Primary_Id) "$PRI_VIEW"	
    

    2、疑云重生

    表面情况都了解清楚了,现在要考虑深入其中,探索真相!

    首先想到的就是,为何在本机上可以跑的很快而在服务器上跑的很慢,不是涉及到的表数据量都一样吗?应该是执行计划出了问题,于是,各自做了个set autotrace traceonly来查看执行计划的差异.

    一分析,发现执行计划是有少许差异,但是逻辑读却有天壤之别,笔记本上788910 consistent gets 而服务器上是37494576 consistent gets 。递归调用对比更是离谱 :笔记本为 0 recursive calls而服务器为1305552 recursive calls 。

    服务器上的执行计划及统计信息:

    -----------------------------------------------------------------------------------------------------------------
    | Id  | Operation                             | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
    -----------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                      |                         |     1 |  2574 |  4426   (1)| 00:00:54 |
    |*  1 |  VIEW                                 |                         |     1 |  2574 |  4426   (1)| 00:00:54 |
    |   2 |   COUNT                               |                         |       |       |            |          |
    |*  3 |    FILTER                             |                         |       |       |            |          |
    |   4 |     VIEW                              |                         |     1 |  2574 |  4424   (1)| 00:00:54 |
    |   5 |      SORT ORDER BY                    |                         |     1 |   368 |  4424   (1)| 00:00:54 |
    |*  6 |       FILTER                          |                         |       |       |            |          |
    |*  7 |        HASH JOIN OUTER                |                         |     1 |   368 |  4423   (1)| 00:00:54 |
    |   8 |         NESTED LOOPS                  |                         |     1 |   349 |  3373   (1)| 00:00:41 |
    |   9 |          NESTED LOOPS                 |                         |     1 |   328 |  3372   (1)| 00:00:41 |
    |  10 |           NESTED LOOPS OUTER          |                         |     1 |   282 |  3371   (1)| 00:00:41 |
    |  11 |            NESTED LOOPS OUTER         |                         |     1 |   263 |  3370   (1)| 00:00:41 |
    |* 12 |             HASH JOIN OUTER           |                         |     1 |   255 |  3370   (1)| 00:00:41 |
    |* 13 |              HASH JOIN                |                         |     1 |   245 |  3367   (1)| 00:00:41 |
    |* 14 |               HASH JOIN               |                         |     2 |   420 |  3361   (1)| 00:00:41 |
    |* 15 |                TABLE ACCESS FULL      | TP_DOMAIN_LISTVALUES    |     3 |    87 |     5   (0)| 00:00:01 |
    |* 16 |                HASH JOIN              |                         |    94 | 17014 |  3355   (1)| 00:00:41 |
    |* 17 |                 TABLE ACCESS FULL     | TP_DOMAIN_LISTVALUES    |     1 |    38 |     5   (0)| 00:00:01 |
    |* 18 |                 HASH JOIN RIGHT OUTER |                         |  1144 |   159K|  3350   (1)| 00:00:41 |
    |  19 |                  TABLE ACCESS FULL    | NE_TRANS_ALARM          |     1 |    13 |     3   (0)| 00:00:01 |
    |* 20 |                  HASH JOIN            |                         |  1144 |   145K|  3346   (1)| 00:00:41 |
    |* 21 |                   TABLE ACCESS FULL   | TP_DOMAIN_LISTVALUES    |     3 |    78 |     5   (0)| 00:00:01 |
    |* 22 |                   TABLE ACCESS FULL   | NE_ALARM_LIST           | 99419 |     9M|  3340   (1)| 00:00:41 |
    |* 23 |               TABLE ACCESS FULL       | TP_DOMAIN_LISTVALUES    |    22 |   770 |     5   (0)| 00:00:01 |
    |  24 |              TABLE ACCESS FULL        | MANAGE_REGION           |   237 |  2370 |     3   (0)| 00:00:01 |
    |* 25 |             INDEX UNIQUE SCAN         | PK_KPI_MAPPING_CFG      |     1 |     8 |     0   (0)| 00:00:01 |
    |  26 |            TABLE ACCESS BY INDEX ROWID| NET_ELEMENT             |     1 |    19 |     1   (0)| 00:00:01 |
    |* 27 |             INDEX UNIQUE SCAN         | PK_NET_ELEMENT          |     1 |       |     0   (0)| 00:00:01 |
    |  28 |           TABLE ACCESS BY INDEX ROWID | KPI_CODE_LIST           |     1 |    46 |     1   (0)| 00:00:01 |
    |* 29 |            INDEX UNIQUE SCAN          | PK_KPI_CODE_LIST        |     1 |       |     0   (0)| 00:00:01 |
    |  30 |          TABLE ACCESS BY INDEX ROWID  | NET_ELEMENT             |     1 |    21 |     1   (0)| 00:00:01 |
    |* 31 |           INDEX UNIQUE SCAN           | PK_NET_ELEMENT          |     1 |       |     0   (0)| 00:00:01 |
    |  32 |         VIEW                          |                         |   170 |  3230 |  1050   (1)| 00:00:13 |
    |  33 |          HASH GROUP BY                |                         |   170 |  5100 |  1050   (1)| 00:00:13 |
    |* 34 |           TABLE ACCESS FULL           | TREE_PRIVILEGE          |   175 |  5250 |  1049   (1)| 00:00:13 |
    |  35 |     TABLE ACCESS BY INDEX ROWID       | NET_ELEMENT             |     1 |    38 |     2   (0)| 00:00:01 |
    |* 36 |      INDEX UNIQUE SCAN                | PK_NET_ELEMENT          |     1 |       |     1   (0)| 00:00:01 |
    |* 37 |      TABLE ACCESS BY INDEX ROWID      | NET_ELEMENT             |     1 |    43 |     2   (0)| 00:00:01 |
    |* 38 |       INDEX RANGE SCAN                | IDX_NE_ELEM_NET_TYPE_ID |     3 |       |     1   (0)| 00:00:01 |
    -----------------------------------------------------------------------------------------------------------------
    
    Statistics
    ----------------------------------------------------------
        1305552  recursive calls
       37494576  consistent gets
          16030  physical reads
    

    笔记本上的计划及统计信息:

    ---------------------------------------------------------------------------------------------------
    | Id  | Operation                                  | Name                 | Rows  | Bytes | Cost  |
    ---------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                           |                      |     1 |  2574 |  1275 |
    |*  1 |  VIEW                                      |                      |     1 |  2574 |  1275 |
    |   2 |   COUNT                                    |                      |       |       |       |
    |   3 |    VIEW                                    |                      |     1 |  2561 |  1275 |
    |   4 |     SORT ORDER BY                          |                      |     1 |   660 |  1275 |
    |*  5 |      FILTER                                |                      |       |       |       |
    |*  6 |       HASH JOIN                            |                      |     1 |   660 |  1271 |
    |*  7 |        HASH JOIN OUTER                     |                      |     1 |   635 |  1267 |
    |*  8 |         FILTER                             |                      |       |       |       |
    |*  9 |          HASH JOIN OUTER                   |                      |     1 |   625 |  1264 |
    |* 10 |           HASH JOIN                        |                      |     1 |   485 |  1256 |
    |* 11 |            HASH JOIN                       |                      |     1 |   451 |  1252 |
    |* 12 |             HASH JOIN                      |                      |     1 |   423 |  1248 |
    |  13 |              NESTED LOOPS                  |                      |     1 |   386 |  1244 |
    |  14 |               NESTED LOOPS OUTER           |                      |     1 |   366 |  1243 |
    |  15 |                NESTED LOOPS                |                      |     1 |   348 |  1242 |
    |  16 |                 NESTED LOOPS OUTER         |                      |  6914 |  2133K|  1242 |
    |  17 |                  NESTED LOOPS OUTER        |                      |  6914 |  2079K|  1242 |
    |  18 |                   TABLE ACCESS FULL        | NE_ALARM_LIST        |  6914 |  1991K|  1242 |
    |* 19 |                   INDEX RANGE SCAN         | IDX_NE_TRAN_LIST_ID  |     1 |    13 |       |
    |* 20 |                  INDEX UNIQUE SCAN         | PK_KPI_MAPPING_CFG   |     1 |     8 |       |
    |  21 |                 TABLE ACCESS BY INDEX ROWID| KPI_CODE_LIST        |     1 |    32 |       |
    |* 22 |                  INDEX UNIQUE SCAN         | PK_KPI_CODE_LIST     |     1 |       |       |
    |  23 |                TABLE ACCESS BY INDEX ROWID | NET_ELEMENT          |     1 |    18 |     1 |
    |* 24 |                 INDEX UNIQUE SCAN          | PK_NET_ELEMENT       |     1 |       |       |
    |  25 |               TABLE ACCESS BY INDEX ROWID  | NET_ELEMENT          |     1 |    20 |     1 |
    |* 26 |                INDEX UNIQUE SCAN           | PK_NET_ELEMENT       |     1 |       |       |
    |* 27 |              TABLE ACCESS FULL             | TP_DOMAIN_LISTVALUES |     4 |   148 |     3 |
    |* 28 |             TABLE ACCESS FULL              | TP_DOMAIN_LISTVALUES |     4 |   112 |     3 |
    |* 29 |            TABLE ACCESS FULL               | TP_DOMAIN_LISTVALUES |    26 |   884 |     3 |
    |  30 |           VIEW                             |                      |    62 |  8680 |     7 |
    |  31 |            SORT GROUP BY                   |                      |    62 |  2418 |     7 |
    |* 32 |             TABLE ACCESS FULL              | TREE_PRIVILEGE       |    63 |  2457 |     4 |
    |  33 |         TABLE ACCESS FULL                  | MANAGE_REGION        |   237 |  2370 |     2 |
    |* 34 |        TABLE ACCESS FULL                   | TP_DOMAIN_LISTVALUES |     4 |   100 |     3 |
    |  35 |       TABLE ACCESS BY INDEX ROWID          | NET_ELEMENT          |     1 |    24 |     2 |
    |* 36 |        INDEX UNIQUE SCAN                   | PK_NET_ELEMENT       |     1 |       |     1 |
    |* 37 |        TABLE ACCESS FULL                   | NET_ELEMENT          |   115 |  3565 |   140 |
    ---------------------------------------------------------------------------------------------------
    
    统计信息
    ----------------------------------------------------------
              0  recursive calls
         788910  consistent gets
              0  physical reads
    

    究竟是什么原因导致如此大的差异呢?执行计划虽有差异,但是却并不感觉特别离谱,限入沉思中。。。。


    3、无所适从

    3.1统计信息不全?

    查询后发现系统确有收集统计信息,心中略有不甘,为防止收集不准确,继续做全收集如下(无分区表,所以不用 dbms_stats包了)

    analyze table Ne_Alarm_List        
            compute statistics for table for all indexes for all indexed columns;
    analyze table Tp_Domain_Listvalues 
            compute statistics for table for all indexes for all indexed columns;
    analyze table Net_Element          
            compute statistics for table for all indexes for all indexed columns;
    analyze table Kpi_Code_List        
            compute statistics for table for all indexes for all indexed columns;
    analyze table Manage_Region        
            compute statistics for table for all indexes for all indexed columns;
    analyze table Kpi_Mapping_Cfg      
            compute statistics for table for all indexes for all indexed columns;
    analyze table Ne_Trans_Alarm       
            compute statistics for table for all indexes for all indexed columns;
    analyze table Tree_Privilege       
            compute statistics for table for all indexes for all indexed columns;
    

    收集完毕,发现执行依旧缓慢,执行计划有变,但是罗极读和递归依旧大到惊人!

    3.2 物理分布OR碎片?

    于是检查如下:

    UNIX服务器上表和索引的情况

     select table_name,num_rows,blocks from user_tables where lower(table_name) in
        ('ne_alarm_list'
        ,'tp_domain_listvalues'
        ,'net_element'
        ,'kpi_code_list'
        ,'manage_region'
        ,'kpi_mapping_cfg'
        ,'ne_trans_alarm'
        ,'tree_privilege')
       order by table_name;
    
    TABLE_NAME                       NUM_ROWS     BLOCKS
    ------------------------------ ---------- ----------
    KPI_CODE_LIST                       64018       1378
    KPI_MAPPING_CFG                      4063         20
    MANAGE_REGION                         237          5
    NET_ELEMENT                        103485       1882
    NE_ALARM_LIST                      138278      15197
    NE_TRANS_ALARM                          0          5
    TP_DOMAIN_LISTVALUES                 1121         13
    TREE_PRIVILEGE                       2443       4780
    
     select table_name,index_name,blevel,leaf_blocks from user_indexes where lower(table_name) in
        ('ne_alarm_list'
        ,'tp_domain_listvalues'
        ,'net_element'
        ,'kpi_code_list'
        ,'manage_region'
        ,'kpi_mapping_cfg'
        ,'ne_trans_alarm'
        ,'tree_privilege')
       order by table_name;
    
    TABLE_NAME                     INDEX_NAME                         BLEVEL LEAF_BLOCKS
    ------------------------------ ------------------------------ ---------- -----------
    KPI_CODE_LIST                  PK_KPI_CODE_LIST                        1         170
    KPI_MAPPING_CFG                PK_KPI_MAPPING_CFG                      1          19
    MANAGE_REGION                  PK_MANAGE_REGION                        0           1
    NET_ELEMENT                    IDX_NE_ELEM_PARENT_NE_ID                1         241
    NET_ELEMENT                    IDX_NE_ELEM_NE_NAME                     2         367
    NET_ELEMENT                    IDX_STATE_PATH                          2         768
    NET_ELEMENT                    PK_NET_ELEMENT                          1         369
    NET_ELEMENT                    IDX_NE_ELEM_STATE                       1         234
    NET_ELEMENT                    IDX_NE_PATH                             2         656
    NET_ELEMENT                    IDX_NEID_NETYPEID                       1         317
    NET_ELEMENT                    IDX_NE_ELEM_NET_TYPE_ID                 1         237
    NE_ALARM_LIST                  PK_NE_ALARM_LIST                        1         338
    NE_TRANS_ALARM                 PK_NE_TRANS_ALARM                       0           0
    TREE_PRIVILEGE                 PK_TREE_PRIVILEGE                       2        5059
    

    笔记本上表和索引的情况

     select table_name,num_rows,blocks from user_tables where lower(table_name) in
        ('ne_alarm_list'
        ,'tp_domain_listvalues'
        ,'net_element'
        ,'kpi_code_list'
        ,'manage_region'
        ,'kpi_mapping_cfg'
        ,'ne_trans_alarm'
        ,'tree_privilege')
       order by table_name;
    
    TABLE_NAME                       NUM_ROWS     BLOCKS
    ------------------------------ ---------- ----------
    KPI_CODE_LIST                       64018       1378
    KPI_MAPPING_CFG                      4063         20
    MANAGE_REGION                         237          5
    NET_ELEMENT                        103485       1440
    NE_ALARM_LIST                      138278      12901
    NE_TRANS_ALARM                          0          0
    TP_DOMAIN_LISTVALUES                 1121         13
    TREE_PRIVILEGE                       2443         21
    
     select table_name,index_name,blevel,leaf_blocks from user_indexes where lower(table_name) in
       ('ne_alarm_list'
        ,'tp_domain_listvalues'
        ,'net_element'
        ,'kpi_code_list'
        ,'manage_region'
        ,'kpi_mapping_cfg'
        ,'ne_trans_alarm'
        ,'tree_privilege')
       order by table_name;
    
    TABLE_NAME                     INDEX_NAME                         BLEVEL LEAF_BLOCKS
    ------------------------------ ------------------------------ ---------- -----------
    KPI_CODE_LIST                  PK_KPI_CODE_LIST                        1         230
    KPI_MAPPING_CFG                PK_KPI_MAPPING_CFG                      1          16
    MANAGE_REGION                  PK_MANAGE_REGION                        0           1
    MANAGE_REGION                  IDX_REGION_141                          0           1
    NET_ELEMENT                    PK_NET_ELEMENT                          1         332
    NET_ELEMENT                    IDX_NE_ELEM_PARENT_NE_ID                1         365
    NE_ALARM_LIST                  IDX_CREATE_TIME                         2         544
    NE_ALARM_LIST                  IDX_ALARM_LIST_NE_ID                    1         435
    NE_ALARM_LIST                  IDX_ALARM_NE_CONFIG_ID                  1         339
    NE_ALARM_LIST                  PK_NE_ALARM_LIST                        1         425
    NE_ALARM_LIST                  IDX_ALARM_LIST_FLOW_ID                  1         377
    NE_TRANS_ALARM                 IDX_NE_TRAN_LIST_ID                     0           0
    NE_TRANS_ALARM                 PK_NE_TRANS_ALARM                       0           0
    TREE_PRIVILEGE                 PK_TREE_PRIVILEGE                       1          13
    

    心头一喜,好像看出TREE_PRIVILEGE 的表和索引的统计信息明显有问题,块是4780对21,叶子的高度是2对1 ,差异还算明显啊!

    于是操刀如下,从数据字典中捞出如下两批语句,分别执行,做好了表重组和索引重建两大工作

    select table_name,index_name,blevel,leaf_blocks
    ,'alter table '||table_name||' move;','alter index '||index_name||' rebuild;'
     from user_indexes where lower(table_name) in 
    ('ne_alarm_list'           
    ,'tp_domain_listvalues'  
    ,'net_element'  
    ,'kpi_code_list'  
    ,'manage_region'  
    ,'kpi_mapping_cfg'  
    ,'ne_trans_alarm'  
    ,'tree_privilege') 
    order by table_name;
    

    操作完以后刚才的4780对21和2对1的情况不见了(具体就不贴出了),变得非常接近了,我感觉问题好像已经搞定了。。。

    欢天喜地一跑脚本,天,依旧奇慢无比,再一看autotrace 跟踪的结果,心凉了半截,执行计划又略为小变动了一下,但是巨大惊人的逻辑读和递归调用形同鬼魅,如影随形,挥之不去!

    3.3 版本,BUG?

    为什么会有这么多逻辑读,为什么我的笔记本没有这么多逻辑读?我们的环境可是一摸一样啊,并且在别的同事的机器上也是跑的和我笔记本一样顺畅,百思不得其解。。。。。

    会不会是版本问题,BUG? 跳起身来,进行了以下查询

    服务器版本为

    SQL> show parameter feat
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    optimizer_features_enable            string      10.2.0.4
    

    我笔记本版本为

    SQL> show parameter feat
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    optimizer_features_enable            string      10.2.0.1
    

    和我笔记本一样OK的别人环境数据库版本为

    SQL> show parameter feat
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    optimizer_features_enable            string      10.2.0.3
    

    眼睛一亮,看来版本问题啊,版本不一样,怪不得莫名奇妙,心跳加速,心里寻思:离谱真的源于BUG?

    猜测不如动手,咋确认呢?来狠的,直接把服务器的版本降下来

    SQL> alter system set optimizer_features_enable='10.2.0.3';
    System altered.
    

    继续执行服务器环境SQL语句,发现没效果,重启数据库,依然如故!

    3.4 分页调优?

    大家对分页调优写法都有了解,本案例中由于如下子查询语句和整个结果集关联,并有 And Nvl("$PRI_VIEW"."$HAS_READ", 0) > 0的条件,导致无法将整个结果集先分页再和子查询关联,因为这样只取ROWNUM范围15行的值参与关联,结果就错了!

    所以本案例中,分页调优难以凑效!

    Select t.Primary_Id "$PRIMARY_ID",
           Sum(t.Has_Read) "$HAS_READ"
        From Tree_Privilege t
       Where t.Tree_Cfg_Name = 'NET_ELEMENT'
         And t.Assign_Object In ('STAFF_3', 'ORG_2', 'STATION_22')
    Group By t.Primary_Id) "$PRI_VIEW"
    

    所以本案例中,分页调优难以凑效!

    唉,长叹一声,无所适从!


    4、峰回路转

    为什么会有那么多次的逻辑读和递归?看来看执行计划还是太表面了,难以回答ORACLE内部在做啥小动作,不查出问题所在,不将谜团大白于天下,誓不罢休!

    试试10046trace吧,看看整个执行开始到结束,到底在执行什么,等待什么?玩啥猫腻!

    分别做了两个10046 TRACE

    服务器上的10046 TRACE

    ********************************************************************************
    count    = number of times OCI procedure was executed
    cpu      = cpu time in seconds executing 
    elapsed  = elapsed time in seconds executing
    disk     = number of physical reads of buffers from disk
    query    = number of buffers gotten for consistent read
    current  = number of buffers gotten in current mode (usually for update)
    rows     = number of rows processed by the fetch or execute call
    ********************************************************************************
    
    Select *
       From (Select Rownum Sid, a.*
               From (Select a.Ne_Alarm_List_Id Id,
                            a.Flow_Id, /*流程ID*/
                            a.Oprt_State Oprtstate, /*操作状态*/
                            a.Alarm_State Alarmstate, /*告警状态*/
                            a.Alarm_Level Alarmlevel, /*告警级别*/
                            a.Perf_Msg_Id, /*性能消息ID*/
                            a.Alarm_Class Alarmclass, /*告警类别*/
                            To_Char(a.Generate_Time, 'yyyy-mm-dd') Generatedate, /*产生日期*/
                            To_Char(a.Last_Generate_Time, 'yyyy-mm-dd') Lastdate, /*最后产生日期*/
                            B1.List_Label Alarm_Type,
                            C1.Ne_Name,
                            d.Kpi_Name,
                            a.Kpi_Value,
                            Decode(C1.Ne_Flag, 6, e.Region_Name, C2.Ne_Name) Datasource,
                            '' ||
                            Decode(a.Alarm_Level,
                                   '1',
                                   '严重',
                                   '2',
                                   '重要',
                                   '3',
                                   '一般',
                                   '未知') || '' Alarm_Level,
                            To_Char(a.Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Generate_Time,
                            To_Char(a.Last_Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Last_Generate_Time,
                            Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator,
                            a.Alarm_Times,
                            Decode(a.Oprt_State,
                                   '20',
                                   To_Char(a.Confirm_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '25',
                                   To_Char(a.Suspend_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '30',
                                   To_Char(a.Clear_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '40',
                                   To_Char(a.Delete_Time, 'yyyy-mm-dd hh24:mi:ss')) Executetim,
                            B2.List_Label Alarm_State,
                            Decode(a.Flow_Id, '', '未派单', '已派单') Work_State /*自定义列项*/
                       From Ne_Alarm_List a,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_NE_ALARM_TYPE') B1,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_ALARM_STATE') B2,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_DR_ID_FLAG') B3,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_ALARM_OPRT_STATE') B4,
                            Net_Element C1,
                            Net_Element C2,
                            Kpi_Code_List d,
                            Manage_Region e,
                            Kpi_Mapping_Cfg f,
                            Ne_Trans_Alarm Nta,
                            (Select t.Primary_Id "$PRIMARY_ID",
                                    Sum(t.Has_Read) "$HAS_READ"
                               From Tree_Privilege t
                              Where t.Tree_Cfg_Name = 'NET_ELEMENT'
                                And t.Assign_Object In ('STAFF_3', 'ORG_2', 'STATION_22')
                         Group By t.Primary_Id) "$PRI_VIEW"
                 Where Nvl(a.Config_Ne_Id, a.Ne_Id) ="$PRI_VIEW"."$PRIMARY_ID"(+)
                        And Nvl("$PRI_VIEW"."$HAS_READ", 0) > 0
                        And B1.List_Value = a.Alarm_Type
                        And a.Ne_Id = C1.Ne_Id
                        And B2.List_Value = a.Alarm_State
                        And B3.List_Value = a.Dr_Id || ''
                        And B4.List_Value = a.Oprt_State
                        And a.Config_Ne_Id = C2.Ne_Id(+)
                        And a.Kpi_Id = d.Kpi_Id
                        And a.Kpi_Id = f.Kpi_Id(+)
                        And Nvl(a.Alarm_Region_Origin, '-1') =
                            To_Char(e.Region_Id(+))
                        And a.Ne_Alarm_List_Id = Nta.Ne_Alarm_List_Id(+)
                        And (Select Path || '/'
                               From Net_Element
                              Where Ne_Id = a.Ne_Id) Not Like
                            (Select Path || '/%'
                               From Net_Element
                              Where Ne_Type_Id = 30
                                And Ne_Flag = '6'
                                And State = '0SA') /*非业务系统*/
                        And To_Number(a.Dr_Id) = 0
                      Order By B2.Sort_Id,
                               B4.Sort_Id,
                               a.Alarm_Type,
                               a.Ne_Id,
                               Nvl(a.Last_Send_Time, a.Create_Time) Desc) a) b
      Where b.Sid >= 1
        And b.Sid <= 15
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.58       0.59          0          0          0           0
    Execute      1      0.00       0.00          0          0          0           0
    Fetch        2     49.06      49.32      12758     665549          0          15
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        4     49.65      49.91      12758     665549          0          15
    
    Misses in library cache during parse: 1
    Optimizer mode: ALL_ROWS
    Parsing user id: 58  
    
    Rows     Row Source Operation
    -------  ---------------------------------------------------
         15  VIEW  (cr=37490604 pr=12759 pw=0 time=272081968 us)
      65251   COUNT  (cr=37490604 pr=12759 pw=0 time=272342955 us)
      65251    FILTER  (cr=37490604 pr=12759 pw=0 time=272212450 us)
      65339     VIEW  (cr=37490592 pr=12759 pw=0 time=272270985 us)
      65339      SORT ORDER BY (cr=37490592 pr=12759 pw=0 time=272205640 us)
      65339       HASH JOIN OUTER (cr=665537 pr=12758 pw=0 time=5914370 us)
      65339        HASH JOIN  (cr=665534 pr=12758 pw=0 time=5616725 us)
      65339         FILTER  (cr=665518 pr=12758 pw=0 time=5507572 us)
      94744          HASH JOIN OUTER (cr=665518 pr=12758 pw=0 time=5595780 us)
      94744           NESTED LOOPS OUTER (cr=665485 pr=12758 pw=0 time=5642876 us)
      94744            NESTED LOOPS  (cr=486659 pr=12758 pw=0 time=4600678 us)
      94744             NESTED LOOPS  (cr=297169 pr=12758 pw=0 time=3463708 us)
      94744              NESTED LOOPS OUTER (cr=107679 pr=12758 pw=0 time=2231991 us)
      94744               HASH JOIN  (cr=12933 pr=12758 pw=0 time=1758206 us)
      94744                HASH JOIN  (cr=12917 pr=12758 pw=0 time=1454846 us)
     138299                 NESTED LOOPS OUTER (cr=12901 pr=12758 pw=0 time=1661499 us)
     138299                  HASH JOIN  (cr=12899 pr=12758 pw=0 time=969972 us)
          2                   TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=292 us)
     138299                   TABLE ACCESS FULL NE_ALARM_LIST (cr=12883 pr=12758 pw=0 time=554956 us)
          0                  INDEX RANGE SCAN IDX_NE_TRAN_LIST_ID (cr=2 pr=0 pw=0 time=389048 us)(object id 105412)
         24                 TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=177 us)
          5                TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=78 us)
        845               INDEX UNIQUE SCAN PK_KPI_MAPPING_CFG (cr=94746 pr=0 pw=0 time=311122 us)(object id 66011)
      94744              TABLE ACCESS BY INDEX ROWID KPI_CODE_LIST (cr=189490 pr=0 pw=0 time=888405 us)
      94744               INDEX UNIQUE SCAN PK_KPI_CODE_LIST (cr=94746 pr=0 pw=0 time=393651 us)(object id 66007)
      94744             TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=189490 pr=0 pw=0 time=896002 us)
      94744              INDEX UNIQUE SCAN PK_NET_ELEMENT (cr=94746 pr=0 pw=0 time=383605 us)(object id 66108)
      89412            TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=178826 pr=0 pw=0 time=836400 us)
      89412             INDEX UNIQUE SCAN PK_NET_ELEMENT (cr=89414 pr=0 pw=0 time=373511 us)(object id 66108)
        195           VIEW  (cr=33 pr=0 pw=0 time=3003 us)
        195            HASH GROUP BY (cr=33 pr=0 pw=0 time=2612 us)
        195             TABLE ACCESS BY INDEX ROWID TREE_PRIVILEGE (cr=33 pr=0 pw=0 time=1030 us)
        195              INDEX RANGE SCAN PK_TREE_PRIVILEGE (cr=12 pr=0 pw=0 time=433 us)(object id 100132)
          3         TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=287 us)
        237        TABLE ACCESS FULL MANAGE_REGION (cr=3 pr=0 pw=0 time=548 us)
          3     TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=9 pr=0 pw=0 time=6953 us)
          3      INDEX UNIQUE SCAN PK_NET_ELEMENT (cr=5 pr=0 pw=0 time=109 us)(object id 66108)
          1     TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=3 pr=0 pw=0 time=56 us)
          1      INDEX RANGE SCAN IDX_NE_ELEM_NET_TYPE_ID (cr=2 pr=0 pw=0 time=40 us)(object id 79075)
    
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      SQL*Net message to client                       2        0.00          0.00
      db file scattered read                        860        0.00          0.16
      db file sequential read                         9        0.00          0.00
      SQL*Net message from client                     2      201.06        201.06
    ********************************************************************************
    
    SELECT PERSON 
    FROM
     TACHE WHERE FLOW_ID = :B1 AND STATE <> 'F'
    
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute  65339      0.78       0.75          0          0          0           0
    Fetch   130670    148.79     145.17          0   32714110          0       65331
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   196010    149.58     145.93          0   32714110          0       65331
    
    Misses in library cache during parse: 1
    Misses in library cache during execute: 1
    Optimizer mode: ALL_ROWS
    Parsing user id: 59     (recursive depth: 1)
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      latch: cache buffers chains                     1        0.00          0.00
    ********************************************************************************
    
    SELECT PKP_FLOW_FUNCTION.GETSTAFFNAMEBYID(LTRIM(RTRIM(TO_CHAR(:B1 )))) 
    FROM
     DUAL
    
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute 522102      7.43       6.78          0          0          0           0
    Fetch   522102     42.59      42.17          0          0          0      522102
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   1044205     50.03      48.96          0          0          0      522102
    
    Misses in library cache during parse: 1
    Misses in library cache during execute: 1
    Optimizer mode: ALL_ROWS
    Parsing user id: 59     (recursive depth: 1)
    ********************************************************************************
    
    SELECT NAME 
    FROM
     STAFF_INFO WHERE STAFF_ID = :B1 
    
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute 522102      8.51       7.78          0          0          0           0
    Fetch   522102     21.35      20.21          0    4110939          0      456771
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   1044205     29.86      27.99          0    4110939          0      456771
    
    Misses in library cache during parse: 1
    Misses in library cache during execute: 1
    Optimizer mode: ALL_ROWS
    Parsing user id: 59     (recursive depth: 2)
    ********************************************************************************
    
    ********************************************************************************
    
    OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        2      0.58       0.59          0          0          0           0
    Execute      2      0.00       0.00          0          0          0           0
    Fetch        2     49.06      49.32      12758     665549          0          15
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        6     49.65      49.91      12758     665549          0          15
    
    Misses in library cache during parse: 1
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      SQL*Net message to client                       3        0.00          0.00
      SQL*Net message from client                     3      201.06        213.83
      SQL*Net more data from client                   2        0.00          0.00
      db file scattered read                        860        0.00          0.16
      db file sequential read                         9        0.00          0.00
    
    
    OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse       11      0.00       0.01          0          0          0           0
    Execute 1109552     16.74      15.33          0          0          0           0
    Fetch   1174883    212.73     207.57          1   36825069          0     1044211
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   2284446    229.48     222.92          1   36825069          0     1044211
    
    Misses in library cache during parse: 6
    Misses in library cache during execute: 6
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      db file sequential read                         1        0.00          0.00
      latch: cache buffers chains                     1        0.00          0.00
    
        5  user  SQL statements in session.
        9  internal SQL statements in session.
       14  SQL statements in session.
    ********************************************************************************
    

    笔记本上的10046 TRACE

    ********************************************************************************
    count    = number of times OCI procedure was executed
    cpu      = cpu time in seconds executing 
    elapsed  = elapsed time in seconds executing
    disk     = number of physical reads of buffers from disk
    query    = number of buffers gotten for consistent read
    current  = number of buffers gotten in current mode (usually for update)
    rows     = number of rows processed by the fetch or execute call
    ********************************************************************************
    
    alter session set events '10046 trace name context forever , level 8'
    
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        0      0.00       0.00          0          0          0           0
    Execute      1      0.00       0.04          0          0          0           0
    Fetch        0      0.00       0.00          0          0          0           0
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        1      0.00       0.04          0          0          0           0
    
    Misses in library cache during parse: 0
    Misses in library cache during execute: 1
    Optimizer mode: CHOOSE
    Parsing user id: 56  
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      SQL*Net message to client                       1        0.00          0.00
      SQL*Net message from client                     1        0.00          0.00
    ********************************************************************************
    
    Select *
       From (Select Rownum Sid, a.*
               From (Select a.Ne_Alarm_List_Id Id,
                            a.Flow_Id, /*流程ID*/
                            a.Oprt_State Oprtstate, /*操作状态*/
                            a.Alarm_State Alarmstate, /*告警状态*/
                            a.Alarm_Level Alarmlevel, /*告警级别*/
                            a.Perf_Msg_Id, /*性能消息ID*/
                            a.Alarm_Class Alarmclass, /*告警类别*/
                            To_Char(a.Generate_Time, 'yyyy-mm-dd') Generatedate, /*产生日期*/
                            To_Char(a.Last_Generate_Time, 'yyyy-mm-dd') Lastdate, /*最后产生日期*/
                            B1.List_Label Alarm_Type,
                            C1.Ne_Name,
                            d.Kpi_Name,
                            a.Kpi_Value,
                            Decode(C1.Ne_Flag, 6, e.Region_Name, C2.Ne_Name) Datasource,
                            '' ||
                            Decode(a.Alarm_Level,
                                   '1',
                                   '严重',
                                   '2',
                                   '重要',
                                   '3',
                                   '一般',
                                   '未知') || '' Alarm_Level,
                            To_Char(a.Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Generate_Time,
                            To_Char(a.Last_Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Last_Generate_Time,
                            Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator,
                            a.Alarm_Times,
                            Decode(a.Oprt_State,
                                   '20',
                                   To_Char(a.Confirm_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '25',
                                   To_Char(a.Suspend_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '30',
                                   To_Char(a.Clear_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '40',
                                   To_Char(a.Delete_Time, 'yyyy-mm-dd hh24:mi:ss')) Executetim,
                            B2.List_Label Alarm_State,
                            Decode(a.Flow_Id, '', '未派单', '已派单') Work_State /*自定义列项*/
                       From Ne_Alarm_List a,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_NE_ALARM_TYPE') B1,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_ALARM_STATE') B2,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_DR_ID_FLAG') B3,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_ALARM_OPRT_STATE') B4,
                            Net_Element C1,
                            Net_Element C2,
                            Kpi_Code_List d,
                            Manage_Region e,
                            Kpi_Mapping_Cfg f,
                            Ne_Trans_Alarm Nta,
                            (Select t.Primary_Id "$PRIMARY_ID",
                                    Sum(t.Has_Read) "$HAS_READ"
                               From Tree_Privilege t
                              Where t.Tree_Cfg_Name = 'NET_ELEMENT'
                                And t.Assign_Object In ('STAFF_3', 'ORG_2', 'STATION_22')
                         Group By t.Primary_Id) "$PRI_VIEW"
                 Where Nvl(a.Config_Ne_Id, a.Ne_Id) ="$PRI_VIEW"."$PRIMARY_ID"(+)
                        And Nvl("$PRI_VIEW"."$HAS_READ", 0) > 0
                        And B1.List_Value = a.Alarm_Type
                        And a.Ne_Id = C1.Ne_Id
                        And B2.List_Value = a.Alarm_State
                        And B3.List_Value = a.Dr_Id || ''
                        And B4.List_Value = a.Oprt_State
                        And a.Config_Ne_Id = C2.Ne_Id(+)
                        And a.Kpi_Id = d.Kpi_Id
                        And a.Kpi_Id = f.Kpi_Id(+)
                        And Nvl(a.Alarm_Region_Origin, '-1') =
                            To_Char(e.Region_Id(+))
                        And a.Ne_Alarm_List_Id = Nta.Ne_Alarm_List_Id(+)
                        And (Select Path || '/'
                               From Net_Element
                              Where Ne_Id = a.Ne_Id) Not Like
                            (Select Path || '/%'
                               From Net_Element
                              Where Ne_Type_Id = 30
                                And Ne_Flag = '6'
                                And State = '0SA') /*非业务系统*/
                        And To_Number(a.Dr_Id) = 0
                      Order By B2.Sort_Id,
                               B4.Sort_Id,
                               a.Alarm_Type,
                               a.Ne_Id,
                               Nvl(a.Last_Send_Time, a.Create_Time) Desc) a) b
      Where b.Sid >= 1
        And b.Sid <= 15
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.28       1.03          0          0          0           0
    Execute      1      0.00       0.00          0          0          0           0
    Fetch        2     40.96      41.37          0    1388865          0          15
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        4     41.25      42.40          0    1388865          0          15
    
    Misses in library cache during parse: 1
    Optimizer mode: CHOOSE
    Parsing user id: 56  
    
    Rows     Row Source Operation
    -------  ---------------------------------------------------
         15  VIEW  (cr=1454108 pr=0 pw=0 time=44434215 us)
      65249   COUNT  (cr=1454108 pr=0 pw=0 time=45608654 us)
      65249    VIEW  (cr=1454108 pr=0 pw=0 time=45151896 us)
      65249     SORT ORDER BY (cr=1454108 pr=0 pw=0 time=44760386 us)
      65249      FILTER  (cr=1388865 pr=0 pw=0 time=24109934 us)
      65332       HASH JOIN  (cr=1387411 pr=0 pw=0 time=23581540 us)
      65332        HASH JOIN  (cr=1387395 pr=0 pw=0 time=23224331 us)
      65332         HASH JOIN  (cr=1387379 pr=0 pw=0 time=23125008 us)
      65332          HASH JOIN  (cr=1387363 pr=0 pw=0 time=22935658 us)
      65332           HASH JOIN OUTER (cr=1387347 pr=0 pw=0 time=22771025 us)
      65332            FILTER  (cr=1387340 pr=0 pw=0 time=22949387 us)
     138272             HASH JOIN OUTER (cr=1387340 pr=0 pw=0 time=22856756 us)
     138272              NESTED LOOPS OUTER (cr=1387320 pr=0 pw=0 time=26824900 us)
     138272               NESTED LOOPS OUTER (cr=1387318 pr=0 pw=0 time=24059444 us)
     138272                NESTED LOOPS  (cr=1119088 pr=0 pw=0 time=17975481 us)
     138272                 NESTED LOOPS  (cr=704272 pr=0 pw=0 time=11200155 us)
     138272                  NESTED LOOPS OUTER (cr=289456 pr=0 pw=0 time=4286553 us)
     138272                   TABLE ACCESS FULL NE_ALARM_LIST (cr=12912 pr=0 pw=0 time=829731 us)
      44379                   INDEX UNIQUE SCAN PK_KPI_MAPPING_CFG (cr=276544 pr=0 pw=0 time=1871242 us)(object id 52751)
     138272                  TABLE ACCESS BY INDEX ROWID KPI_CODE_LIST (cr=414816 pr=0 pw=0 time=5009011 us)
     138272                   INDEX UNIQUE SCAN PK_KPI_CODE_LIST (cr=276544 pr=0 pw=0 time=2152340 us)(object id 52747)
     138272                 TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=414816 pr=0 pw=0 time=5000605 us)
     138272                  INDEX UNIQUE SCAN PK_NET_ELEMENT (cr=276544 pr=0 pw=0 time=2149586 us)(object id 52861)
      89410                TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=268230 pr=0 pw=0 time=3695667 us)
      89410                 INDEX UNIQUE SCAN PK_NET_ELEMENT (cr=178820 pr=0 pw=0 time=1808726 us)(object id 52861)
          0               INDEX RANGE SCAN IDX_NE_TRAN_LIST_ID (cr=2 pr=0 pw=0 time=1327629 us)(object id 52966)
        195              VIEW  (cr=20 pr=0 pw=0 time=2866 us)
        195               SORT GROUP BY (cr=20 pr=0 pw=0 time=1690 us)
        195                TABLE ACCESS FULL TREE_PRIVILEGE (cr=20 pr=0 pw=0 time=1047 us)
        237            TABLE ACCESS FULL MANAGE_REGION (cr=7 pr=0 pw=0 time=1302 us)
          5           TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=118 us)
          3          TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=267 us)
          2         TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=115 us)
         24        TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=183 us)
          3       TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=9 pr=0 pw=0 time=161 us)
          3        INDEX UNIQUE SCAN PK_NET_ELEMENT (cr=6 pr=0 pw=0 time=79 us)(object id 52861)
          1       TABLE ACCESS FULL NET_ELEMENT (cr=1445 pr=0 pw=0 time=6876 us)
    
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      SQL*Net message to client                       2        0.00          0.00
      SQL*Net message from client                     2        0.54          0.54
    ********************************************************************************
    
    SELECT PERSON 
    FROM
     TACHE WHERE FLOW_ID = :B1 AND STATE <> 'F'
    
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute  65249      2.26       1.90          0          0          0           0
    Fetch    65249      1.23       1.30          0      65243          0           0
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   130499      3.50       3.20          0      65243          0           0
    
    Misses in library cache during parse: 1
    Misses in library cache during execute: 1
    Optimizer mode: CHOOSE
    Parsing user id: 57     (recursive depth: 1)
    ********************************************************************************
    
    ********************************************************************************
    
    OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        2      0.29       1.03          0          0          0           0
    Execute      3      0.00       0.04          0          0          0           0
    Fetch        2     40.96      41.37          0    1388865          0          15
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        7     41.26      42.45          0    1388865          0          15
    
    Misses in library cache during parse: 2
    Misses in library cache during execute: 1
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      SQL*Net message to client                       5        0.00          0.00
      SQL*Net message from client                     5       22.57         31.68
      SQL*Net more data from client                   2        0.00          0.00
    
    
    OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse       30      0.00       0.00          0          0          0           0
    Execute  65278      2.26       1.92          0          0          0           0
    Fetch    65278      1.23       1.30          0      65301          0          29
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   130586      3.50       3.23          0      65301          0          29
    
    Misses in library cache during parse: 2
    Misses in library cache during execute: 2
    
        4  user  SQL statements in session.
       29  internal SQL statements in session.
       33  SQL statements in session.
    ********************************************************************************
    

    仔细观察,终于有重大发现了,笔记本中环境10046 TRACE

    SELECT PERSON 
    FROM
     TACHE WHERE FLOW_ID = :B1 AND STATE <> 'F'
     
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute  65249      2.26       1.90          0          0          0           0
    Fetch    65249      1.23       1.30          0      65243          0           0
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   130499      3.50       3.20          0      65243          0           0
    

    这里怎么跑出一个这样的语句,Eeecute6万多次最终获取0行

    而服务器上10046 TRACE中类似的地方确实如下,rows显示为65331和前者为0!有所发现

    SELECT PERSON 
    FROM
     TACHE WHERE FLOW_ID = :B1 AND STATE <> 'F'
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute  65339      0.78       0.75          0          0          0           0
    Fetch   130670    148.79     145.17          0   32714110          0       65331
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   196010    149.58     145.93          0   32714110          0       65331
    

    继续分析,发现笔记本上的10046 TRACE根本找不到如下两处调用!

    SELECT PKP_FLOW_FUNCTION.GETSTAFFNAMEBYID(LTRIM(RTRIM(TO_CHAR(:B1 )))) 
    FROM
     DUAL
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute 522102      7.43       6.78          0          0          0           0
    Fetch   522102     42.59      42.17          0          0          0      522102
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   1044205     50.03      48.96          0          0          0     522102
    
    SELECT NAME 
    FROM
     STAFF_INFO WHERE STAFF_ID = :B1 
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute 522102      8.51       7.78          0          0          0           0
    Fetch   522102     21.35      20.21          0    4110939          0      456771
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   1044205     29.86      27.99          0    4110939          0     456771
    

    仔细观察,还有新发现,服务器的执行计划有递归调用,而笔记本上的执行计划没有递归调用。

    RECURSIVE STATEMENTS
    OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse       11      0.00       0.01          0          0          0           0
    Execute 1109552     16.74      15.33         0          0          0           0
    Fetch   1174883    212.73     207.57         1   36825069          0     1044211
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   2284446    229.48     222.92         1   36825069          0     1044211
    

    看来,这个递归毫无疑问来自函数的调用,仔细回到复杂的SQL脚本一看,发现了原先SQL中有函数 Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator, 查看这个函数调用的代码写法

    FUNCTION getFlowCurStaffName(in_flow_id in number) RETURN VARCHAR2 IS
        ss         VARCHAR2(255);
        person1    VARCHAR2(255);
        v_tabArray PKP_STRING_UTIL.varArray;
        CURSOR c2 IS
          SELECT person
            FROM tache
           WHERE flow_id = in_flow_id
             AND STATE <> 'F';
      BEGIN
        --循环从表中取得
        FOR rel1 IN c2 LOOP
          v_tabArray := PKP_STRING_UTIL.split(rel1.person, ',');
          FOR i IN 1 .. v_tabArray.COUNT() LOOP
            SELECT getStaffNameById(LTRIM(RTRIM(TO_CHAR(v_tabArray(i)))))
              INTO person1
              FROM dual;
            ss := ss || person1 || ' ';
          END LOOP;
        END LOOP;
    
        RETURN(ss);
      END getFlowCurStaffName;
    

    其中getFlowCurStaffName函数还调用了getStaffNameById函数,具体如下:

    FUNCTION getStaffNameById(in_staff_id IN varchar2) return varchar2 is
      ret varchar2(255);
    begin
      select NAME INTO ret FROM STAFF_INFO WHERE STAFF_ID = in_staff_id;
      return(ret);
    end getStaffNameById;
    

    而这些语句都出现在刚才的服务器的10046 TRACE文件中,看来问题就出在这个函数调用中了! 的是函数中的语句为什么与部分没有出现在笔记本的10046 TRACE文件中呢?这个问题我还无法回答。

    不过我相信。。。。。。真相已经离我们很近了!


    5、大白天下

    .

    目标既锁定函数为“元凶巨恶”,那就除恶务尽!

    在剔除了Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator 后,发现服务器环境的SQL从原先的230秒瞬间降低为3秒,乖乖,料到有效果不假,但没料到这么有效果!

    再一看autotrace ,不得不相信真能有这样的提升,逻辑读为: 82617 consistent gets 而原先为:37494576 consistent gets

    到此其实还有一事未明,就是为什么说笔记本环境没有那么多递归调用和逻辑读呢,在前面的基础上继续探讨,问题就变的容易了,因为前面的函数中

    CURSOR c2 IS
       SELECT person
         FROM tache
        WHERE flow_id = in_flow_id
          AND STATE <> 'F';
    

    这个 tache表在笔记本及其他环境中是没记录的,而服务器及生产是有2万多条记录!

    发现这个以后,一切谜团都解开了!

    1、为什么笔记本环境没有看到递归调用?

    这个外循环没值,递啥归啊?

    2、为什么服务器环境看到大量逻辑读?

    一层套一层,在C2的CURSOR不是空的情况下,自然就调用到了getFlowCurStaffName,加上外面的6万次循环,能不产生大量逻辑读吗?笔记本环境C2是空的,做啥啊,呵呵。

    3、为什么生产服务器环境时快时慢

    存在函数调用,由于函数调用次数和外层结果集的记录有关,所以时快时慢,笔记本环境C2游标是空的,当然稳定!


    6、乘胜追击

    现在题是如何解决这个问题,因为去除该函数毕竟不是解决问题的正道!

    先实验一个自定义普通函数

    (其中set_clinet_info函数这个是为了测试最终该函数被调用多少次,实用技术,简单易学,值得推广)

    create or replace function f(id in number)
      Return  number
    is
    begin
      dbms_application_info.set_client_info(userenv('client_info')+1); 
      return id;
    end a;
    /
    

    接着改写代码,将 Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator 修改为f(a.Flow_Id) Flow_Operator,

    发现整体运行时间非常短,大致5秒,说明Pkp_Flow_Function确实是做太多事了。

    通过开始exec dbms_application_info.set_client_info(0);

    后续执行select userenv('client_info') from dual;来观察,发现该函数F(a.Flow_Id) 是执行了65339次,和10046 trace的一致。具体见附件《自定义函数测试调用次数.txt》 另外Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id)也是执行了65339次 ,具体见附件《测试原函数调用次数.txt》 而该函数的子函数却调用了456771次!!!原因在于 FOR i IN 1 .. v_tabArray.COUNT() LOOP这循环是对表的字段里的内容进行循环,(字段中的内容用逗号分割符进行分割,如何拆开循环) 具体见附件《测试原函数的子函数调用次数.txt》

    看来解决的思路如下

    1、尽量减少外部调用,如何能避免调用65339次

    经过分析,由于分页的缘故,每次都只会展现15条记录,而这个函数经确认只是取得姓名的字段,完全可以放在最后执行,修改为将 Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator放置于整个结果集的外面,这样,调用立减为15次。性能瞬间提高仅4-5秒完成任务!优化的最终代码见《终极解决.sql》,其中分页其实并无效果,只是写写而已。

    Select b.*,Pkp_Flow_Function.Getflowcurstaffname(b.Flow_Id) Flow_Operator
      From (Select Rownum Sid, a.*
              From (Select a.Ne_Alarm_List_Id Id,
                           a.Flow_Id, /*流程ID*/
                           a.Oprt_State Oprtstate, /*操作状态*/
                           a.Alarm_State Alarmstate, /*告警状态*/
                           a.Alarm_Level Alarmlevel, /*告警级别*/
                           a.Perf_Msg_Id, /*性能消息ID*/
                           a.Alarm_Class Alarmclass, /*告警类别*/
                           To_Char(a.Generate_Time, 'yyyy-mm-dd') Generatedate, /*产生日期*/
                           To_Char(a.Last_Generate_Time, 'yyyy-mm-dd') Lastdate, /*最后产生日期*/
                           B1.List_Label Alarm_Type,
                           C1.Ne_Name,
                           d.Kpi_Name,
                           a.Kpi_Value,
                           Decode(C1.Ne_Flag, 6, e.Region_Name, C2.Ne_Name) Datasource,
                           '' ||
                           Decode(a.Alarm_Level,
                                  '1',
                                  '严重',
                                  '2',
                                  '重要',
                                  '3',
                                  '一般',
                                  '未知') || '' Alarm_Level,
                           To_Char(a.Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Generate_Time,
                           To_Char(a.Last_Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Last_Generate_Time,
                           a.Alarm_Times,
                           Decode(a.Oprt_State,
                                  '20',
                                  To_Char(a.Confirm_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                  '25',
                                  To_Char(a.Suspend_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                  '30',
                                  To_Char(a.Clear_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                  '40',
                                  To_Char(a.Delete_Time, 'yyyy-mm-dd hh24:mi:ss')) Executetim,
                           B2.List_Label Alarm_State,
                           Decode(a.Flow_Id, '', '未派单', '已派单') Work_State /*自定义列项*/
                      From Ne_Alarm_List a,
                           (Select *
                              From Tp_Domain_Listvalues
                             Where Domain_Code = 'DOMAIN_NE_ALARM_TYPE') B1,
                           (Select *
                              From Tp_Domain_Listvalues
                             Where Domain_Code = 'DOMAIN_ALARM_STATE') B2,
                           (Select *
                              From Tp_Domain_Listvalues
                             Where Domain_Code = 'DOMAIN_DR_ID_FLAG') B3,
                           (Select *
                              From Tp_Domain_Listvalues
                             Where Domain_Code = 'DOMAIN_ALARM_OPRT_STATE') B4,
                           Net_Element C1,
                           Net_Element C2,
                           Kpi_Code_List d,
                           Manage_Region e,
                           Kpi_Mapping_Cfg f,
                           Ne_Trans_Alarm Nta,
                           (Select t.Primary_Id "$PRIMARY_ID",
                                   Sum(t.Has_Read) "$HAS_READ"
                              From Tree_Privilege t
                             Where t.Tree_Cfg_Name = 'NET_ELEMENT'
                               And t.Assign_Object In
                                   ('STAFF_3', 'ORG_2', 'STATION_22')
                             Group By t.Primary_Id) "$PRI_VIEW"
                     Where Nvl(a.Config_Ne_Id, a.Ne_Id) =
                           "$PRI_VIEW"."$PRIMARY_ID"(+)
                       And Nvl("$PRI_VIEW"."$HAS_READ", 0) > 0
                       And B1.List_Value = a.Alarm_Type
                       And a.Ne_Id = C1.Ne_Id
                       And B2.List_Value = a.Alarm_State
                       And B3.List_Value = a.Dr_Id || ''
                       And B4.List_Value = a.Oprt_State
                       And a.Config_Ne_Id = C2.Ne_Id(+)
                       And a.Kpi_Id = d.Kpi_Id
                       And a.Kpi_Id = f.Kpi_Id(+)
                       And Nvl(a.Alarm_Region_Origin, '-1') =
                           To_Char(e.Region_Id(+))
                       And a.Ne_Alarm_List_Id = Nta.Ne_Alarm_List_Id(+)
                       And (Select Path || '/'
                              From Net_Element
                             Where Ne_Id = a.Ne_Id) Not Like
                           (Select Path || '/%'
                              From Net_Element
                             Where Ne_Type_Id = 30
                               And Ne_Flag = '6'
                               And State = '0SA') /*非业务系统*/
                       And To_Number(a.Dr_Id) = 0
                     Order By B2.Sort_Id,
                              B4.Sort_Id,
                              a.Alarm_Type,
                              a.Ne_Id,
                              Nvl(a.Last_Send_Time, a.Create_Time) Desc) a Where Rownum<=15) b
     Where b.Sid >= 1
    

    2、进一步完善,Pkp_Flow_Function调用次数过多,可考虑用表关联改写该函数的代码,减少不必要的递归和调用(这个待后续再考虑完善)

    另外细心的朋友在分析《10046trace_service.txt》的时候或许注意到

    SELECT PERSON 
    FROM
     TACHE WHERE FLOW_ID = :B1 AND STATE <> 'F' 
    

    该语句有着惊人的逻辑读!32714110/65331=500意味着每获取一行需要进行500次的逻辑读,由此可见,该语句应该可以优化,果然,当前语句是全表扫描。 建如下索引后,继续执行,立即发现代价从原先的111缩短到2,收效相当明显!

    create index idx_union on tache (flow_id,person,state)
    

    收效相当明显,在我未进行上述调整函数调用的时候,优化的效果是:从每获取一行需要进行500次的逻辑读到每获取一行需要3次逻辑读。

    ELECT PERSON 
    FROM
     TACHE WHERE FLOW_ID = :B1 AND STATE <> 'F'
     
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute  65339      0.85       0.79          0          0          0           0
    Fetch   130670      1.27       1.11          3     195993          0       65331
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   196010      2.13       1.90          3     195993          0       65331
    

    大家可以看到由于函数调用这个硬伤未处理,递归依然极大,但是逻辑读变小确实不争的事实。事实上是,这个语句在66秒完成,虽说不如意,但总比以前的230多秒好,加上致命硬伤函数调用一解决,最终在3秒多执行完毕。(只改函数调用的致命伤而不优化这个语句,执行是在5秒完成)


    7、余音绕梁

    事情算是最终解决了,想想,有如下几点总结

    1、开发人员尽量避免写过多的函数调用,这样将有可能导致惊人的递归调用和逻辑读,此外在SQL中大量调用函数,也容易导致SQL引擎和PLSQL引擎进行大量的上下文切换,这样性能必定收到影响。

    2、查问题不能想当然,其实我们的笔记本环境和服务器环境大多记录是一样的,唯独tache表是不同,笔记本是空记录,就这一处差别,让我们限入迷茫中。

    3、解决问题要所有的疑惑都解决了,才是真正解决问题,比如我们考虑改写SQL,分页调优,即便真的变快了,能回答为什么笔记本不需要改写SQL就能快吗?

    4、解决问题可以从 看执行计划----看逻辑读递归-------10046trace 这种方式深入。 当然中间少不了各种猜想。

    比如降低数据库版本,在本案例中无用,或许在下一个案例中,就是你发现问题的一个方法。

    比如分页查询,在本案例中或许无用,但是在别的场合,就有可能会有用武之地。

    比如看统计信息收集情况及表物理索引物理分布,这些或许就是在下一案例中的有力武器。

    此外优化的思路莫过于减少访问量和增大吞吐量(吞吐量有些可遇不可求),而访问量是由单次访问量和执行的次数决定的,本次优化处理函数写法就是减少了执行次数,而修改那条SQL则是减少单次的访问量!

    5、其实本案例并未完,该过程的调用写法一定不是最佳的写法,所以要有精益求精的精神,后续,考虑优化这个过程。因为如果不是需求每次只要展现15页,优化这个函数就已成必然!

  • 相关阅读:
    Data Structure and Algorithm
    Data Structure and Algorithm
    Data Structure and Algorithm
    Data Structure and Algorithm
    Data Structure and Algorithm
    Data Structure and Algorithm
    Data Structure and Algorithm
    Data Structure and Algorithm
    Data Structure and Algorithm
    Data Structure and Algorithm
  • 原文地址:https://www.cnblogs.com/killkill/p/1687920.html
Copyright © 2011-2022 走看看