  • Statistics gathering strategy for partitioned tables

    #####1 If a new partition is created every day

    1. On a 10g database, if a daily-partitioned table loads 200,000 to 300,000 rows per day, consider copying partition statistics to ease the I/O peak caused by the 22:00 nightly statistics job on 10g.
    For the partition-copy method, see https://orahow.com/how-to-gather-statistics-on-large-partitioned-tables-in-oracle/
    2. On an 11g database, use Incremental Statistics to gather only the changed data.
    (Per http://blog.itpub.net/53956/viewspace-1372944) this makes the SYSAUX tablespace grow: a classic space-for-time trade-off.
    The space cost is SYSAUX growth;
    the time saved is that global and partition statistics no longer need a full re-gather each time: only the changed partitions are gathered.
     
    -> The problem we hit in our environment: partition-level statistics (dba_tab_partitions) were not refreshed in time, so the execution plan used the local partitioned index instead of the global index.
    See sample 2 below.
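    The two approaches above can be sketched as follows. This is a minimal sketch only: the schema, table, and partition names (SH, SALES, SALES_P_...) are illustrative assumptions, not names from our environment.

    ```sql
    -- 10g approach: copy statistics from the previous (fully loaded)
    -- partition to the newly added one before data arrives
    -- (partition names are assumed for illustration):
    EXEC DBMS_STATS.COPY_TABLE_STATS('SH','SALES','SALES_P_20190620','SALES_P_20190621', FORCE => TRUE);

    -- 11g approach: enable incremental statistics, after which a normal
    -- gather scans only changed partitions and derives global statistics
    -- from partition-level synopses:
    EXEC DBMS_STATS.SET_TABLE_PREFS('SH','SALES','INCREMENTAL','TRUE');
    EXEC DBMS_STATS.GATHER_TABLE_STATS('SH','SALES');
    ```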
     
     


    http://blog.itpub.net/53956/viewspace-1372944
    https://orahow.com/how-to-gather-statistics-on-large-partitioned-tables-in-oracle/
    http://oracledoug.com/serendipity/index.php?/archives/1565-Statistics-on-Partitioned-Tables-Part-3.html


    --- Without incremental mode, gathering statistics on the whole partitioned table after adding a partition rescans every partition, including the unchanged ones
    --- In full mode, even the statistics of unchanged partitions are re-gathered

    -- incremental mode
    --- After gathering, checking the global and partition-level statistics shows that only the two newly created partitions and the global statistics were updated


    --table status

    select dbms_stats.get_prefs('granularity','dddd','aaa') from dual;
    DBMS_STATS.GET_PREFS('GRANULARITY','DDDD','AAA')
    --------------------------------------------------------------------------------------------------------------------------------------------
    AUTO


    select dbms_stats.get_prefs('incremental','dddd','aaa') from dual;
    DBMS_STATS.GET_PREFS('INCREMENTAL','DDDD','AAA')
    --------------------------------------------------------------------------------------------------------------------------------------------
    FALSE


    select dbms_stats.get_prefs('estimate_percent','dddd','aaa') from dual;


    DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT','DDDD','AAA')
    --------------------------------------------------------------------------------------------------------------------------------------------
    DBMS_STATS.AUTO_SAMPLE_SIZE

     
    ######sample 2

    In summary: if the partition-level statistics (dba_tab_partitions) of a partitioned table cannot be refreshed in time, then for a query whose predicate values could be served by either the local index or the global index,

    Oracle may fail to choose the better-performing global index and instead fall back to the slower local index.

    --global index
    select * from dba_indexes where index_name='RTH_I4';
    2019/6/19 23:28:59

    --local index

    select * from dba_indexes where index_name='RTH_PK';
    2019/6/19 23:29:10

    select count(*) from dd.aa where TRAN_DATE = to_date('05/01/2019 00:00:00', 'MM/dd/YYYY HH24:MI:SS')
    230522

    select count(*) from dd.aa where REFERENCE = 'RB99988800000162307240'
    0

    select count(*) from dd.aa where TRAN_DATE = to_date('06/01/2019 00:00:00', 'MM/dd/YYYY HH24:MI:SS')
    239525


    select count(*) from dd.aa where REFERENCE = 'RB99988800000162307240'
    0

    ### Global statistics were gathered; the global gather time was 2019/6/19
    select table_name, global_stats, last_analyzed, num_rows
    from dba_tables
    where table_name='aa'
    and owner='dd'
    order by 1, 2, 4 desc nulls last;

    aa YES 2019/6/19 23:28:47 162623888


    ### Partition-level statistics clearly lag behind: the 201906 partition was gathered around 2019/6/22, about 4 days late.
    select table_name, partition_name, global_stats, last_analyzed, num_rows
    from dba_tab_partitions
    where table_name='aa'
    and table_owner='dd'
    order by 1, 2, 4 desc nulls last;

    30 aa RTH_PART_201906 YES 2019/6/22 0:10:12 5557206

    Analysis approach (performed on 2019/06/23):

    ####good: use gather_plan_statistics plus a run of digits in the hint to force a hard parse, then check the runtime statistics to see why the good plan is chosen. It is chosen because the statistics for partition 201906 were refreshed on 2019/6/22.

    A brief explanation: this is an INDEX RANGE SCAN on the global index, which is cheap. The global index is scanned only once (Starts = 1), so the SQL cost is low.


    var B2 varchar2(50);
    var B1 varchar2(50);
    BEGIN :B2:=to_date('05/01/2019 00:00:00', 'MM/dd/YYYY HH24:MI:SS'); END;

    BEGIN :B1:='RB99988800000162307240'; END;

    SELECT /*+ 321 gather_plan_statistics */ *
    FROM
    dd.aa WHERE TRAN_DATE = :B2 AND REFERENCE = :B1;
    SELECT /*+ gather_plan_statistics */ *
    FROM
    dd.aa WHERE TRAN_DATE = :B2 AND REFERENCE = :B1;


    select sql_id, child_number, sql_text
    from v$sql
    where sql_text like '%gather_plan_statistics%'


    select * from table(dbms_xplan.display_cursor('f0jxv1k4kn240', 0, 'ALLSTATS LAST'))


    SQL> /

    PLAN_TABLE_OUTPUT
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    SQL_ID f0jxv1k4kn240, child number 0
    -------------------------------------
    SELECT /*+ gather_plan_statistics */ * FROM dd.aa
    WHERE TRAN_DATE = :B2 AND REFERENCE = :B1

    Plan hash value: 3374353779

    -------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
    -------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 4 |
    |* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| aa | 1 | 1 | 0 |00:00:00.01 | 4 |
    |* 2 | INDEX RANGE SCAN | RTH_I4 | 1 | 1 | 0 |00:00:00.01 | 4 |
    -------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("TRAN_DATE"=:B2)
    2 - access("REFERENCE"=:B1)

    #####bad: use gather_plan_statistics plus a run of digits to force a hard parse, then check the runtime statistics to see why the bad plan is chosen. The statistics for partition 201907 had not been refreshed (as of 2019/06/23), so the optimizer could not estimate accurately and fell back to the primary-key (local) index.

    A brief explanation: this is an INDEX RANGE SCAN on the local partitioned index, which is expensive. Roughly every local index partition is probed once, adding up to about Starts = 100, so the SQL cost is high.

    var B2 varchar2(50);
    var B1 varchar2(50);
    BEGIN :B2:=to_date('07/22/2019 00:00:00', 'MM/dd/YYYY HH24:MI:SS'); END;

    /
    BEGIN :B1:='RB99988800000162307240'; END;

    /
    SELECT /*+ gather_plan_statistics 1235689 */ *
    FROM
    dd.aa WHERE TRAN_DATE = :B2 AND REFERENCE = :B1;

    SELECT /*+ gather_plan_statistics peng */ * FROM dd.aa WHERE TRAN_DATE = :B2 AND REFERENCE = :B1

    select sql_id, child_number, sql_text
    from v$sql
    where sql_text like '%gather_plan_statistics%';


    select * from table(dbms_xplan.display_cursor('45txzjv81vxpq', 0, 'ALLSTATS LAST'));



    Plan hash value: 7605104

    --------------------------------------------------------------------------------------------------------------
    | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
    --------------------------------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 100 |
    | 1 | PARTITION RANGE SINGLE | | 1 | 1 | 0 |00:00:00.01 | 100 |
    | 2 | PARTITION HASH ALL | | 1 | 1 | 0 |00:00:00.01 | 100 |
    |* 3 | TABLE ACCESS BY LOCAL INDEX ROWID| aa | 100 | 1 | 0 |00:00:00.01 | 100 |
    |* 4 | INDEX RANGE SCAN | RTH_PK | 100 | 1 | 0 |00:00:00.01 | 100 |
    --------------------------------------------------------------------------------------------------------------

     
    ################
     
    https://blogs.oracle.com/optimizer/maintaining-statistics-on-large-partitioned-tables

    Maintaining statistics on large partitioned tables

    We have gotten a lot of questions recently regarding how to gather and maintain optimizer statistics on large partitioned tables. The majority of these questions can be summarized into two topics:

    1. When queries access a single partition with stale or non-existent partition level statistics I get a sub optimal plan due to "Out of Range" values
    2. Global statistics collection is extremely expensive in terms of time and system resources

    This article will describe both of these issues and explain how you can address them both.

    This is big topic so I recommend that you also check out the three-part series of posts on maintaining incremental statistics in partitioned tables.

    Out of Range
    Large tables are often decomposed into smaller pieces called partitions in order to improve query performance and ease of data management. The Oracle query optimizer relies on both the statistics of the entire table (global statistics) and the statistics of the individual partitions (partition statistics) to select a good execution plan for a SQL statement. If the query needs to access only a single partition, the optimizer uses only the statistics of the accessed partition. If the query accesses more than one partition, it uses a combination of global and partition statistics.

    "Out of Range" means that the value supplied in a where clause predicate is outside the domain of values represented by the [minimum, maximum] column statistics. The optimizer prorates the selectivity based on the distance between the predicate value and the maximum value (assuming the value is higher than the max), that is, the farther the value is from the maximum value, the lower the selectivity will be. This situation occurs most frequently in tables that are range partitioned by a date column, a new partition is added, and then queried while rows are still being loaded in the new partition. The partition statistics will be stale very quickly due to the continuous trickle feed load even if the statistics get refreshed periodically. The maximum value known to the optimizer is not correct leading to the "Out of Range" condition. The under-estimation of selectivity often leads the query optimizer to pick a sub optimal plan. For example, the query optimizer would pick an index access path while a full scan is a better choice.
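    The proration described above can be sketched roughly as follows. This is a simplified model of the linear decay, not Oracle's exact internal formula:

    ```latex
    % For a predicate value v above the column maximum, the in-range
    % selectivity is scaled down linearly with the distance beyond max,
    % over a window equal to the column's value range (max - min):
    sel(v) \;\approx\; sel_{\mathrm{in\text{-}range}} \cdot \max\!\left(0,\; 1 - \frac{v - \mathrm{max}}{\mathrm{max} - \mathrm{min}}\right)
    ```

    So the farther v drifts past the stale maximum, the smaller the estimated selectivity, until it reaches zero; this is what drives the optimizer toward an index access path even when a full scan would be better.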

    The "Out of Range" condition can be prevented by using the new copy table statistics procedure available in Oracle Database 10.2.0.4 and 11g. This procedure copies the statistics of the source [sub] partition to the destination [sub] partition. It also copies the statistics of the dependent objects: columns, local (partitioned) indexes etc. It adjusts the minimum and maximum values of the partitioning column as follows: it uses the high bound partitioning value as the maximum value of the first partitioning column (it is possible to have concatenated partition columns) and the high bound partitioning value of the previous partition as the minimum value of the first partitioning column for a range-partitioned table. It can optionally scale some of the other statistics like the number of blocks, number of rows etc. of the destination partition.

    Assume we have a table called SALES that is ranged partitioned by quarter on the SALES_DATE column. At the end of every day data is loaded into latest partition. However, statistics are only gathered at the end of every quarter when the partition is fully loaded. Assuming global and partition level statistics (for all fully loaded partitions) are up to date, use the following steps in order to prevent getting a sub-optimal plan due to "out of range". 

    1. Lock the table statistics using LOCK_TABLE_STATS procedure in DBMS_STATS. This is to avoid interference from auto statistics job.
    EXEC DBMS_STATS.LOCK_TABLE_STATS('SH','SALES');
    2. Before beginning the initial load into each new partition (say SALES_Q4_2000) copy the statistics from the previous partition (say SALES_Q3_2000) using COPY_TABLE_STATS. You need to specify FORCE=>TRUE to override the statistics lock.
    EXEC DBMS_STATS.COPY_TABLE_STATS ('SH', 'SALES', 'SALES_Q3_2000', 'SALES_Q4_2000', FORCE=>TRUE);

    Expensive global statistics collection

    In data warehouse environments it is very common to do a bulk load directly into one or more empty partitions. This will make the partition statistics stale and may also make the global statistics stale. Re-gathering statistics for the affected partitions and for the entire table can be very time consuming. Traditionally, statistics collection is done in a two-pass approach:

    • In the first pass we will scan the table to gather the global statistics
    • In the second pass we will scan the partitions that have been changed to gather their partition level statistics.

    The full scan of the table for global statistics collection can be very expensive depending on the size of the table. Note that the scan of the entire table is done even if we change a small subset of partitions.

    We avoid scanning the whole table when computing global statistics by deriving the global statistics from the partition statistics. Some of the statistics can be derived easily and accurately from partition statistics. For example, the number of rows at the global level is the sum of the number of rows of the partitions. Even global histograms can be derived from partition histograms. But the number of distinct values (NDV) of a column cannot be derived from partition level NDVs. So, Oracle maintains another structure called a synopsis for each column at the partition level. A synopsis can be considered a sample of distinct values. The NDV can be accurately derived from synopses. We can also merge multiple synopses into one. The global NDV is derived from the synopsis generated by merging all of the partition level synopses. To summarize:

    1. Gather statistics and create synopses for the changed partitions only
    2. Oracle automatically merges partition level synopses into a global synopsis
    3. The global statistics are automatically derived from the partition level statistics and global synopses

    The incremental maintenance feature is disabled by default. It can be enabled by changing the INCREMENTAL table preference to true. It can also be enabled for a particular schema or at the database level.

    Assume we have a table called SALES that is range partitioned by day on the SALES_DATE column. At the end of every day data is loaded into the latest partition and partition statistics are gathered. Global statistics are only gathered at the end of every month because gathering them is very time and resource intensive. Use the following steps in order to maintain global statistics after every load.

    Turn on incremental feature for the table. 

    EXEC DBMS_STATS.SET_TABLE_PREFS('SH','SALES','INCREMENTAL','TRUE');
    At the end of every load gather table statistics using the GATHER_TABLE_STATS command. You don't need to specify the partition name. Also, do not specify the granularity parameter. The command will collect statistics for partitions where data has changed or statistics are missing and update the global statistics based on the partition level statistics and synopses.
    EXEC DBMS_STATS.GATHER_TABLE_STATS('SH','SALES');  (11g)

    Note that the incremental maintenance feature was introduced in Oracle Database 11g Release 1. However, we also provide a solution in Oracle Database 10g Release 2 (10.2.0.4) that simulates the same behavior. The 10g solution is a new value, 'APPROX_GLOBAL AND PARTITION', for the GRANULARITY parameter of the GATHER_TABLE_STATS procedures. It behaves the same as the incremental maintenance feature except that we don't update the NDV for non-partitioning columns and the number of distinct keys of the index at the global level. For the partitioning column we update the NDV as the sum of the NDVs at the partition level. Also we set the NDV of columns of unique indexes to the number of rows of the table. In general, non-partitioning column NDV at the global level becomes stale less often. It may be possible to collect global statistics less frequently than the default (when the table changes 10%) since the APPROX_GLOBAL option maintains most of the global statistics accurately.
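    The 10g workaround described above might look like the following sketch; the schema, table, and partition names continue the SALES example and are assumptions for illustration:

    ```sql
    -- 10.2.0.4 workaround: gather the newly loaded partition and let
    -- Oracle approximate the global statistics from partition statistics
    -- instead of rescanning the whole table:
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname     => 'SH',
        tabname     => 'SALES',
        partname    => 'SALES_Q4_2000',
        granularity => 'APPROX_GLOBAL AND PARTITION');
    END;
    /
    ```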

     

     

     

     
    #########2 11g new features:
     
    http://blog.itpub.net/90618/viewspace-1296970/
     
     

    Category: Linux OS

    2014-10-13 11:17:09

     

    In the CBO, statistics are critically important for the optimizer. Precise and timely statistics truly reflect the data distribution and volume, and lead to a wiser SQL Execution Plan (SEP). After years of improvement, the CBO is widely accepted as the default and desirable Oracle optimizer.

     

    As I mentioned in many earlier posts, statistics are the raw material for the CBO. Without accurate statistics, it's impossible to generate an optimal SEP. So in the CBO era, DBAs' concern is how to collect statistics and how to find the proper collection frequency.

     

    Since 10g, Oracle has shipped an automatic statistics collection job. The job runs at a daily interval and provides statistics on most database objects to the CBO. This solves the problem in most cases; the statistics collection job and mature dynamic sampling removed the main barriers to CBO adoption.

     

    1. Big Partition Table Statistics

     

    Frankly speaking, Oracle's built-in facilities are already enough for ordinary system requirements. But in some special cases, life is tougher.

     

    Many OLTP/OLAP systems load bulk data into some extremely large tables (mostly partitioned tables) at night or during other business-free windows, and then run the processing work. The problem is that Oracle has little chance to collect statistics between the load and the processing, so the data volume and distribution information is stale.

     

    This finally leads to bad performance in the subsequent processing work. In production environments we often receive complaints that some job is extremely slow. After careful examination, stale statistics usually turn out to be the main reason. After re-gathering, the job runs better; then at the next load the problem happens again.

     

    If we insert a statistics gathering step between loading and processing, there is a time-consuming drawback for big partitioned tables. For most partitions in these tables the data is stable, and the load only touches one or two partitions. But a normal partition-level gather makes Oracle collect all partitions, including the inactive ones, which consumes a lot of resources.

     

    In 11g, Oracle introduces a new feature named "Incremental Statistics", which collects only the partitions that are new or have undergone large data changes. It saves the time of collecting inactive partitions and makes the gathering work shorter.

     

    2. Environment Introduction

     

    The "Incremental Statistics" feature first appeared in the 10.2.0.4 patch set and became official in Oracle 11gR1, so we chose Oracle 11gR2 as the test environment.

     

     

    SQL> select * from v$version;

    BANNER

    --------------------------------------------------------------------------------

    Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production

    PL/SQL Release 11.2.0.1.0 - Production

    CORE        11.2.0.1.0         Production

     

     

    I prepare a normal list partition table named T_PART.

     

     

    SQL> create table t_part

      2  ( owner varchar2(100),

      3    object_id number,

      4    object_name varchar2(100),

      5    create_date date

      6  )

      7  partition by list(owner)

      8  (partition t_part_pub values ('PUBLIC'),

      9   partition t_part_sys values ('SYS'),

     10   partition t_part_big1 values ('APEX_030200','SYSMAN'),

     11   partition t_part_big2 values ('ORDSYS','MDSYS'),

     12   partition t_part_other values(default));

     

    Table created

     

     

    3. Non-Incremental Statistics Behavior

     

    First, let's see the default Oracle behavior for partition statistics.

     

     

    SQL> insert into t_part select owner, object_id, object_name, sysdate from dba_objects where owner in ('PUBLIC','ORDSYS');

    30228 rows inserted

     

    SQL> commit;

    Commit complete

     

    SQL> exec dbms_stats.gather_table_stats(user,'T_PART',cascade => true);

    PL/SQL procedure successfully completed

     

    SQL> select partition_name, NUM_ROWS, BLOCKS, LAST_ANALYZED, global_stats from dba_tab_partitions where table_owner='SYS' and table_name='T_PART';

     

    PARTITION_NAME    NUM_ROWS     BLOCKS LAST_ANALYZED        GLOBAL_STATS

    --------------- ---------- ---------- -------------------- ------------

    T_PART_PUB           27696        197 2012/10/9 22:08:11   YES

    T_PART_SYS               0          0 2012/10/9 22:08:11   YES

    T_PART_BIG1              0          0 2012/10/9 22:08:10   YES

    T_PART_BIG2           2532         20 2012/10/9 22:08:10   YES

    T_PART_OTHER             0          0 2012/10/9 22:08:10   YES

     

     

    The default behavior is that Oracle collects all partitions, as shown by the last_analyzed column in the dba_tab_partitions view.

     

    Now modify only some partitions and see the collection results.

     

     

    SQL> insert into t_part select owner, object_id, object_name, sysdate from dba_objects where owner in ('SYS','APEX_030200');

     

    33241 rows inserted

     

    SQL> commit;

    Commit complete

     

    SQL> exec dbms_stats.gather_table_stats(user,'T_PART',cascade => true);

    PL/SQL procedure successfully completed

     

    SQL> select partition_name, NUM_ROWS, BLOCKS, LAST_ANALYZED, global_stats from dba_tab_partitions where table_owner='SYS' and table_name='T_PART';

     

    PARTITION_NAME    NUM_ROWS     BLOCKS LAST_ANALYZED        GLOBAL_STATS

    --------------- ---------- ---------- -------------------- ------------

    T_PART_PUB           27696        197 2012/10/9 22:13:31   YES

    T_PART_SYS           30835        197 2012/10/9 22:13:31   YES

    T_PART_BIG1           2406         20 2012/10/9 22:13:30   YES

    T_PART_BIG2           2532         20 2012/10/9 22:13:31   YES

    T_PART_OTHER             0          0 2012/10/9 22:13:31   YES

     

     

    The insert statement only affects partitions T_PART_SYS and T_PART_BIG1, but when we run the gather statement, every partition gets involved in the collection work.

     

    4. Incremental Statistics Settings

     

    It's easy to change the statistics strategy for a big partitioned table in Oracle 11g; the dbms_stats package is used to set the preferences.

     

    By default, Oracle partitioned tables do not use the incremental setting. Only three preferences affect the behavior.

     

     

    SQL> select dbms_stats.get_prefs('PUBLISH','SYS','T_PART') from dual;

     

    DBMS_STATS.GET_PREFS('PUBLISH'

    --------------------------------------------------------------------------------

    TRUE

     

    SQL> select dbms_stats.get_prefs('INCREMENTAL','SYS','T_PART') from dual;

     

    DBMS_STATS.GET_PREFS('INCREMEN

    --------------------------------------------------------------------------------

    FALSE

     

    SQL> select dbms_stats.get_prefs('GRANULARITY','SYS','T_PART') from dual;

     

    DBMS_STATS.GET_PREFS('GRANULAR

    --------------------------------------------------------------------------------

    AUTO

     

     

    The "PUBLISH" preference determines whether the CBO uses newly gathered statistics as soon as they arrive. It relates to the pending statistics feature in most cases, and we need to ensure its value is TRUE (the default).
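    A minimal sketch of how PUBLISH interacts with pending statistics, using the T_PART table from this test; the sequence below is an illustration of the documented DBMS_STATS calls, not part of the original test run:

    ```sql
    -- With PUBLISH = FALSE, newly gathered statistics are kept as
    -- "pending" and the CBO continues to use the current statistics:
    EXEC DBMS_STATS.SET_TABLE_PREFS(user,'T_PART','PUBLISH','FALSE');
    EXEC DBMS_STATS.GATHER_TABLE_STATS(user,'T_PART');

    -- After validating the pending statistics, make them visible:
    EXEC DBMS_STATS.PUBLISH_PENDING_STATS(user,'T_PART');

    -- Restore the default so future statistics publish immediately:
    EXEC DBMS_STATS.SET_TABLE_PREFS(user,'T_PART','PUBLISH','TRUE');
    ```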

     

    The "INCREMENTAL" preference defaults to FALSE, which means the incremental statistics strategy is not adopted by default.

     

    The "GRANULARITY" preference is not tied to the incremental strategy in Oracle 11g (it is in Oracle 10gR2), so we keep it at "AUTO" in an 11g environment.

     

    ESTIMATE_PERCENT should be kept at its AUTO default (AUTO_SAMPLE_SIZE) in order to maintain incremental statistics.
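    This can be sketched as below for the T_PART test table. The requirement itself is documented behavior (a fixed sampling percentage prevents the synopses that incremental statistics rely on from being built); the exact calls here are illustrative:

    ```sql
    -- Keep ESTIMATE_PERCENT at AUTO_SAMPLE_SIZE so synopses can be built:
    EXEC DBMS_STATS.SET_TABLE_PREFS(user,'T_PART','ESTIMATE_PERCENT','DBMS_STATS.AUTO_SAMPLE_SIZE');

    -- Verify the preference:
    SELECT DBMS_STATS.GET_PREFS('ESTIMATE_PERCENT','SYS','T_PART') FROM DUAL;
    ```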

     

    In the next section, we will see the effect of the new feature.

     

    5. Incremental Statistics Behavior

     

    First, we need to change the setting for the big partitioned table.

     

     

    SQL> exec dbms_stats.set_table_prefs(user,'T_PART',pname => 'INCREMENTAL',pvalue => 'TRUE');

     

    PL/SQL procedure successfully completed

     

    SQL> select dbms_stats.get_prefs('INCREMENTAL','SYS','T_PART') from dual;

     

    DBMS_STATS.GET_PREFS('INCREMEN

    --------------------------------------------------------------------------------

    TRUE

     

     

    Now modify T_PART and see the effect on collection.

     

     

    SQL> select partition_name, NUM_ROWS, BLOCKS, LAST_ANALYZED, global_stats from dba_tab_partitions where table_owner='SYS' and table_name='T_PART';

     

    PARTITION_NAME    NUM_ROWS     BLOCKS LAST_ANALYZED        GLOBAL_STATS

    --------------- ---------- ---------- -------------------- ------------

    T_PART_PUB           27696        197 2012/10/9 22:13:31   YES

    T_PART_SYS           30835        197 2012/10/9 22:13:31   YES

    T_PART_BIG1           2406         20 2012/10/9 22:13:30   YES

    T_PART_BIG2           2532         20 2012/10/9 22:13:31   YES

    T_PART_OTHER             0          0 2012/10/9 22:13:31   YES

     

     

    SQL> insert into t_part select owner, object_id, object_name, sysdate from dba_objects where owner not in ('SYS','APEX_030200','PUBLIC','SYSMAN','ORDSYS','MDSYS');

    3465 rows inserted

     

    SQL> insert into t_part select owner, object_id, object_name, sysdate from dba_objects where owner in ('SYS');

     

    30835 rows inserted

     

    SQL> commit;

    Commit complete

     

    SQL> exec dbms_stats.gather_table_stats(user,'T_PART',cascade => true);

    PL/SQL procedure successfully completed

     

    SQL> select partition_name, NUM_ROWS, BLOCKS, LAST_ANALYZED, global_stats from dba_tab_partitions where table_owner='SYS' and table_name='T_PART';

     

    PARTITION_NAME    NUM_ROWS     BLOCKS LAST_ANALYZED        GLOBAL_STATS

    --------------- ---------- ---------- -------------------- ------------

    T_PART_PUB           27696        197 2012/10/9 22:29:26   YES

    T_PART_SYS           61670        398 2012/10/9 22:29:21   YES

    T_PART_BIG1           2406         20 2012/10/9 22:29:29   YES

    T_PART_BIG2           2532         20 2012/10/9 22:29:22   YES

    T_PART_OTHER          3465         23 2012/10/9 22:29:28   YES

     

     

    The result is odd. Only two partitions were affected by the inserts, but after gathering we find all partitions were collected. Not how things should be~

     

    Let's continue.

     

    --Delete only One Partition Data

    SQL> delete t_part where owner='SYS';

    61670 rows deleted

     

    SQL> commit;

    Commit complete

     

    SQL> exec dbms_stats.gather_table_stats(user,'T_PART');

    PL/SQL procedure successfully completed

     

    SQL> select partition_name, NUM_ROWS, BLOCKS, LAST_ANALYZED, global_stats from dba_tab_partitions where table_owner='SYS' and table_name='T_PART';

     

    PARTITION_NAME    NUM_ROWS     BLOCKS LAST_ANALYZED        GLOBAL_STATS

    --------------- ---------- ---------- -------------------- ------------

    T_PART_PUB           27696        197 2012/10/9 22:29:26   YES

    T_PART_SYS               0        398 2012/10/9 22:42:47   YES

    T_PART_BIG1           2406         20 2012/10/9 22:29:29   YES

    T_PART_BIG2           2532         20 2012/10/9 22:29:22   YES

    T_PART_OTHER          3465         23 2012/10/9 22:29:28   YES

     

     

    The desired result comes out. Only the affected partition is collected, according to the last_analyzed column, and the table's analyzed time has also changed.

     

     

    SQL> select table_name, last_analyzed from dba_tables where owner='SYS' and table_name='T_PART';

     

    TABLE_NAME                     LAST_ANALYZED

    ------------------------------ --------------------

    T_PART                         2012/10/9 22:42:49

     

     

    What happens if we affect more than one partition at once?

     

     

    SQL> insert into t_part select owner, object_id, object_name, sysdate from dba_objects where owner in ('PUBLIC');

    27696 rows inserted

     

    SQL> insert into t_part select owner, object_id, object_name, sysdate from dba_objects where owner in ('APEX_030200');

    2406 rows inserted

     

    SQL> commit;

    Commit complete

     

    SQL> exec dbms_stats.gather_table_stats(user,'T_PART',cascade => true);

    PL/SQL procedure successfully completed

     

    SQL> select partition_name, NUM_ROWS, BLOCKS, LAST_ANALYZED, global_stats from dba_tab_partitions where table_owner='SYS' and table_name='T_PART';

     

    PARTITION_NAME    NUM_ROWS     BLOCKS LAST_ANALYZED        GLOBAL_STATS

    --------------- ---------- ---------- -------------------- ------------

    T_PART_PUB           55392        388 2012/10/9 22:47:07   YES

    T_PART_SYS           30874        398 2012/10/9 22:45:21   YES

    T_PART_BIG1           4812         36 2012/10/9 22:47:08   YES

    T_PART_BIG2           2532         20 2012/10/9 22:29:22   YES

    T_PART_OTHER          3465         23 2012/10/9 22:29:28   YES

     

     

    A truncate statement affects all partitions; let's see the results.

     

     

    SQL> truncate table t_part;

    Table truncated

     

    SQL> exec dbms_stats.gather_table_stats(user,'T_PART',cascade => true);

    PL/SQL procedure successfully completed

     

    SQL> select partition_name, NUM_ROWS, BLOCKS, LAST_ANALYZED, global_stats from dba_tab_partitions where table_owner='SYS' and table_name='T_PART';

     

    PARTITION_NAME    NUM_ROWS     BLOCKS LAST_ANALYZED        GLOBAL_STATS

    --------------- ---------- ---------- -------------------- ------------

    T_PART_PUB               0          0 2012/10/9 22:55:31   YES

    T_PART_SYS               0          0 2012/10/9 22:55:31   YES

    T_PART_BIG1              0          0 2012/10/9 22:55:31   YES

    T_PART_BIG2              0          0 2012/10/9 22:55:31   YES

    T_PART_OTHER             0          0 2012/10/9 22:55:31   YES

     

     

    SQL> insert into t_part select owner, object_id, object_name, sysdate from dba_objects where owner in ('SYS','APEX_030200');

    33280 rows inserted

     

    SQL> commit;

    Commit complete

     

    SQL> exec dbms_stats.gather_table_stats(user,'T_PART',cascade => true);

     

    PL/SQL procedure successfully completed

     

    SQL> select partition_name, NUM_ROWS, BLOCKS, LAST_ANALYZED, global_stats from dba_tab_partitions where table_owner='SYS' and table_name='T_PART';

     

    PARTITION_NAME    NUM_ROWS     BLOCKS LAST_ANALYZED        GLOBAL_STATS

    --------------- ---------- ---------- -------------------- ------------

    T_PART_PUB               0          0 2012/10/9 22:55:31   YES

    T_PART_SYS           30874        202 2012/10/9 22:58:05   YES

    T_PART_BIG1           2406         20 2012/10/9 22:58:07   YES

    T_PART_BIG2              0          0 2012/10/9 22:55:31   YES

    T_PART_OTHER             0          0 2012/10/9 22:55:31   YES

     

     

    Ok, the result is correct.

     

    One thing is still puzzling: we first switch the table to incremental mode and then modify only some partitions, yet the next gather collects statistics for all partitions; incremental behavior only starts from the following collection.

     

    A reasonable explanation: like Oracle's incremental backup strategy, where the first incremental backup is effectively a full one, incremental statistics behave similarly. Oracle first needs one full gather to build the baseline, and only then does it collect just the increments.

     

    6. Conclusion

     

    The advantage of incremental statistics is that only the active partitions' data is scanned, which shortens collection time. Inactive partitions without significant data modifications are not involved in the collection work.

     

    For load-then-process batch jobs, this is the desirable collection strategy.
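    For such jobs, incremental mode can be enabled once as a statistics preference; note that incremental statistics only take effect together with AUTO sample size and AUTO granularity (a hedged sketch):

```sql
-- Enable incremental statistics database-wide (11g+)
EXEC DBMS_STATS.SET_GLOBAL_PREFS('INCREMENTAL', 'TRUE');

-- Incremental mode requires these (default) settings to take effect
EXEC DBMS_STATS.SET_GLOBAL_PREFS('ESTIMATE_PERCENT', 'DBMS_STATS.AUTO_SAMPLE_SIZE');
EXEC DBMS_STATS.SET_GLOBAL_PREFS('GRANULARITY', 'AUTO');
```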

     

     
     
    ##############3  gather_stats_job - GRANULARITY AUTO for Partitions and Subpartitions
    Hi,
     
    we have a few tables with partitions and subpartitions and use the "auto optimizer stats collection" to collect the stats. 
     
    My question is about the "GRANULARITY" parameter:
    SELECT   DBMS_STATS.get_prefs ('GRANULARITY') GRANULARITY FROM DUAL;
    GRANULARITY
    ------------------------------
    AUTO
     
    The documentation says: "determines the granularity based on the partitioning type. This is the default value." What exactly does "AUTO" mean: does Oracle gather statistics at the global, partition, and subpartition level if they exist?
     
    See my other parameters:
     
    SELECT   DBMS_STATS.get_prefs ('cascade') CASCADE FROM DUAL;
    CASCADE
    ------------------------------
    DBMS_STATS.AUTO_CASCADE
     
    SELECT   DBMS_STATS.get_prefs ('estimate_percent') ESTIMATE_PERCENT FROM DUAL;
    ESTIMATE_PERCENT
    ------------------------------
    DBMS_STATS.AUTO_SAMPLE_SIZE
     
    SELECT DBMS_STATS.get_prefs ('stale_percent') STALE_PERCENT FROM DUAL;
    STALE_PERCENT
    ------------------------------
    10
     
    Regards,
    Sebastian S.
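    Regarding the question above: the documentation indicates that AUTO picks the levels from the partitioning type, gathering global and partition statistics, with subpartition statistics depending on the subpartitioning scheme. One way to verify the behavior on a specific table locally (T_COMP is a placeholder name):

```sql
-- Gather with the default AUTO granularity
EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'T_COMP');

-- Check which levels actually got (re)analyzed
SELECT 'GLOBAL' AS lvl, MAX(last_analyzed) AS last_analyzed
  FROM user_tab_statistics
 WHERE table_name = 'T_COMP' AND partition_name IS NULL
UNION ALL
SELECT 'PARTITION', MAX(last_analyzed)
  FROM user_tab_partitions WHERE table_name = 'T_COMP'
UNION ALL
SELECT 'SUBPARTITION', MAX(last_analyzed)
  FROM user_tab_subpartitions WHERE table_name = 'T_COMP';
```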
     

       

      ###########3

      An in-depth look at incremental statistics collection for partitioned tables

       http://blog.itpub.net/53956/viewspace-1372944

      ############5

      Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
      Information in this document applies to any platform.

      SYMPTOMS

      • A large database (RAC or non-RAC) was upgraded with a large AWR repository
      • The DBMS_STATS.GATHER_DICTIONARY_STATS() command was run and took many hours without generating data dictionary statistics
      • Stats gathering appears to be stuck in the following command on the "SYS"."WRH$_ACTIVE_SESSION_HISTORY" table, as shown by 10046 event tracing at level 12 on the session:

        select substrb(dump(val,16,0,32),1,120) ep, cnt
        from
        (select /*+ no_expand_table(t) index_rs(t) no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring no_substrb_pad */max(substrb("CLIENT_ID",1,32)) val,count(*) cnt
        from "SYS"."WRH$_ACTIVE_SESSION_HISTORY" sample ( .3361644492) t
        where substrb("CLIENT_ID",1,32) is not null ...

          

      CAUSE

      Depending on how much AWR data has been kept in the database, the "SYS"."WRH$_ACTIVE_SESSION_HISTORY" table can be one of the largest SYS objects in some databases. As a result, gathering statistics for this table can take a long time.
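      To confirm that the AWR history tables really are among the largest SYS segments, a quick check (a sketch; requires SELECT privilege on dba_segments):

```sql
-- Top 10 AWR repository segments by size
SELECT * FROM (
  SELECT segment_name, ROUND(SUM(bytes)/1024/1024) AS size_mb
    FROM dba_segments
   WHERE owner = 'SYS' AND segment_name LIKE 'WRH$%'
   GROUP BY segment_name
   ORDER BY size_mb DESC
) WHERE ROWNUM <= 10;
```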

      SOLUTION

      1. Exclude any AWR repository table, such as WRH$_ACTIVE_SESSION_HISTORY, that is known to take a long time to gather statistics:

      $ sqlplus / as sysdba

      EXEC DBMS_STATS.LOCK_TABLE_STATS('SYS', 'WRH$_ACTIVE_SESSION_HISTORY');

      2. Gather stats only for the objects with 10% or more data changes, using the options => 'GATHER STALE' option:

      $ sqlplus / as sysdba

      EXEC DBMS_STATS.GATHER_DICTIONARY_STATS ( options => 'GATHER STALE');

      ####sample   

      The GATHER STALE option of dbms_stats.gather_schema_stats

      http://blog.itpub.net/15415488/viewspace-674679/

      STALE

        dbms_stats.gather_schema_stats can be run with options => 'GATHER STALE' to analyze only tables whose statistics are stale. So how does Oracle know that a table's statistics are stale (STALE)? That is the question this article explores.

        First, Oracle identifies tables with stale statistics through the stale_stats column of dba_tab_statistics; by default, STALE means the table has changed by more than 10% since it was last analyzed. So how does Oracle know how much each table has changed?

        The answer: before 10g this required alter table ... monitoring; from 10g onward, as long as AWR is enabled (statistics_level = typical or all), every table is monitored automatically. The collected change information is exposed in the dba_tab_modifications view, so the stale_stats column of dba_tab_statistics is derived by analyzing dba_tab_modifications.

        However, a problem we recently ran into: dba_tab_statistics.stale_stats showed no table with stale statistics, yet running dbms_stats.gather_schema_stats with options => 'GATHER STALE' still analyzed quite a few tables.

        It turns out the cause is that the dba_tab_modifications view is not refreshed in real time. The documentation states:

      "For performance reasons, the Oracle Database does not populate this view immediately when the actual modifications occur. Run the FLUSH_DATABASE_MONITORING_INFO procedure in the DBMS_STATS PL/SQL package to populate this view with the latest information."

        Meanwhile, dbms_stats.gather_schema_stats with options => 'GATHER STALE' effectively performs a FLUSH_DATABASE_MONITORING_INFO automatically before it starts analyzing.

        So the fix is to add dbms_stats.FLUSH_DATABASE_MONITORING_INFO(); to the script first, and only then check dba_tab_statistics.stale_stats.

         

        In summary, the dependency chain is as follows:

      dbms_stats.gather_schema_stats's options 'GATHER STALE'

      =======>   dba_tab_statistics.stale_stats = 'YES'   

      =======>   dba_tab_modifications has 10% change

      =======>   9i(alter table monitoring),>=10G(AWR enabled)

      =======>   dbms_stats.FLUSH_DATABASE_MONITORING_INFO(); can help get the most current info; otherwise it is not accurate.

      The experiment is as follows:

      1. user_tab_statistics.stale_stats is based on user_tab_modifications
      SQL> insert into AAA select * from AAA;

      65536 rows created.

      SQL> commit;

      Commit complete.

      SQL> select TABLE_NAME,INSERTS,UPDATES,DELETES from user_tab_modifications;

      no rows selected

      SQL> select TABLE_NAME from user_tab_statistics where stale_stats = 'YES';

      no rows selected

      SQL> exec dbms_stats.FLUSH_DATABASE_MONITORING_INFO();

      PL/SQL procedure successfully completed.

      SQL> select TABLE_NAME,INSERTS,UPDATES,DELETES from user_tab_modifications;

      TABLE_NAME                        INSERTS    UPDATES    DELETES
      ------------------------------ ---------- ---------- ----------
      AAA                                 65536          0          0

      SQL> select TABLE_NAME from user_tab_statistics where stale_stats = 'YES';

      TABLE_NAME
      ------------------------------
      AAA


      2. gather_schema_stats will analyze the actually 'STALE' tables even if they didn't show up in user_tab_statistics.

      SQL> insert into AAA select * from AAA;

      262144 rows created.

      SQL> commit;

      Commit complete.

      SQL> select TABLE_NAME,INSERTS,UPDATES,DELETES from user_tab_modifications;

      no rows selected

      SQL> select TABLE_NAME from user_tab_statistics where stale_stats = 'YES';

      no rows selected

      SQL> select to_char(LAST_ANALYZED,'yyyymmdd hh24:mi:ss') from user_Tables where table_name='AAA';

      TO_CHAR(LAST_ANAL
      -----------------
      20100921 20:29:12
        
      SQL> BEGIN
        2  dbms_stats.gather_schema_stats(ownname => 'HAOZHU',options => 'GATHER STALE',estimate_percent => 100,method_opt=> 'for all columns size 1',  cascade=>true,no_invalidate => false, degree=>1);
        3  END;
        4  /
        
      PL/SQL procedure successfully completed.

      SQL> select to_char(LAST_ANALYZED,'yyyymmdd hh24:mi:ss') from user_Tables where table_name='AAA';

      TO_CHAR(LAST_ANAL
      -----------------
      20100921 20:30:43

    • Original source: https://www.cnblogs.com/feiyun8616/p/9193034.html