zoukankan      html  css  js  c++  java
  • Asktom: Thanks for the question regarding "Partitioning question", version 8.1.7

    Brian -- Thanks for the question regarding "Partitioning question", version 8.1.7

    Submitted on 12-Oct-2001 13:16 Central time zone
    Last updated 8-Oct-2011 6:53

    You Asked

    Let's say we have a table of documents with key of doc_id.  This table contains millions 
    of documents(rows).  There is also a 
    
    table that assigns one or more industry codes to each doc:
    
    create table doc_ind
    ( doc_id    integer,
      ind_code      varchar2(10)
    )
    
    This table will therefore also contain millions of rows.
    
    The industry code is a part of the default set of industry codes used by our business.
    
    Now let's say that we sign deals with clients that want to use their own industry 
    classifications.  I don't want to create new tables for each client (there could 
    eventually be hundreds of clients and therefore hundreds of new tables) and have some 
    scheme where logic is built into the application to derive the correct table_name to 
    query based on the client.
    
    My first thought is to create a table like:
    
    create table doc_client_ind
    ( client_id     varchar2(10),
      doc_id    integer,
      ind_code      varchar2(10)
    )
    
    The client id could be any value - usually resembling the client name.
    
    Because each new client would add millions of rows to this table, I want to use 
    partitioning.  After reading up on partitioning in your book, my situation does not seem 
    to fit perfectly into standard scenarios.  At first, it seemed I would need to use hash 
    partitioning.  However, the number of distinct client_id values in this table will be 
    very small to start with and grow over time.  It is not clear how I would easily manage 
    hash partitioning well over time.
    
    In order to use range partitioning, it seems to me that client_id's would need to be 
    assigned with values that have a natural ascending sort value (i.e., c0001, c0002, etc.). 
     Then I could have one partition per client and create them on an ongoing as 
    
    needed basis.
    
    I have three questions:
    
    1. Are there other options (with or without partitioning) I don't see to manage this?
    2. Is there a way to manage this effectively using the original essentially random client 
    id values?
    3. What do you suggest given the limited info above?
    4. Given your suggestion in question 3, what are the disadvantages, caveats I need to be 
    aware of?
    
    Thanks - Brian
    
     

    and we said...

    Well, in 9i, there is LIST partitioning as well, that maybe something to consider.  Here 
    each client_id could be targeted to a specific partition.
    
    
    If you just want to achieve a nice distribution of data across many disks/partitions, you 
    could hash right now on DOC_ID.  There will be millions of unique values for that from 
    the get go -- that will hash very nicely.  Since you'll always join on doc_id and 
    client_id, partition elimination will take place just as effectively.
    
    So, either look at 9i with list partitioning if you feel you would gain some advantage to 
    partition by client_id (one would be that if you drop a client, gettng rid of their data 
    is trivial -- drop the partition).
    
    Or, hash partition on the doc_id to achieve a good overall spread of data.
    
     I wanted to know if client id was a small domain, something manageable.  If not, they would not be 
    able to range partition by it as it would be an ongoing job just to split and partition the table.
    
    50 -- they could do it.
    500,000 -- they would have a hard time at it (but hash partitioning would start to come into play).
    
    values in between with ranges/hashes -- not practical at some point.  
    
    
    Index key compression is a feature whereby repeating fields on the leading edge of the index are 
    removed from the leaf block row entries.  Compresses the data.
    
    If you create an index on (owner,object_type,object_type) on a copy of all_objects - with compress 
    2, we would store the owner, object type ONCE per leaf block, rather then once per ROW per leaf 
    block -- leading to significant savings in space.
    
     
    couple of questions for you,
    
    1.What if the column that i'm partitioning contains null entries ? is this allowed ? how will 
    oracle know where to place the data ?
    
    2. in Page 620 of the expert one on one book an error 01502 was raised because the 
    local_nonprefixed index was unusable why ? are "queries" on Partitioned Tables (PT) index dependent 
    ?
    
    3. say i have a table
    create table detail
    ( cust_id      number
    , cust_balance number
    , some other columns.....
    )
    
    and cust_id has 65 distinct values over 7 million records. Will partitioning hash ( cust_id ) be of 
    any use assuming that the Front-End apps issues out a dynamic query that will not use any indexed 
    columns on any of its predicate ?
    
    one advantage that i see in using PT's is data will be forced group by partition.
    
     

    Followup June 27, 2003 - 9am Central time zone:

    1) yes allowed, it'll put them where it "wants".  
    
    ops$tkyte@ORA920> create table t
      2  (
      3    object_id        number ,
      4    object_type      varchar2(18)
      5  )
      6  partition by hash(object_id) partitions 8
      7  /
    
    Table created.
    
    ops$tkyte@ORA920>
    ops$tkyte@ORA920> insert into t select object_id, object_type from all_objects;
    
    31589 rows created.
    
    ops$tkyte@ORA920> insert into t select NULL, 'xxx' from all_users;
    
    47 rows created.
    
    ops$tkyte@ORA920>
    ops$tkyte@ORA920> declare
      2      l_cnt number;
      3  begin
      4      for x in ( select partition_name
      5                   from user_tab_partitions
      6                  where table_name = 'T' )
      7      loop
      8          execute immediate
      9          'select count(*)
     10             from t partition(' || x.partition_name || ')'
     11             into l_cnt;
     12          dbms_output.put_line
     13          ( x.partition_name || ' ' || l_cnt || ' rows...' );
     14
     15          execute immediate
     16          'select count(*)
     17             from t partition(' || x.partition_name || ')
     18            where object_id is null'
     19            into l_cnt;
     20          dbms_output.put_line
     21          ( x.partition_name || ' ' || l_cnt || ' NULL rows...' );
     22      end loop;
     23  end;
     24  /
    SYS_P347 3965 rows...
    SYS_P347 47 NULL rows...
    SYS_P348 3948 rows...
    SYS_P348 0 NULL rows...
    SYS_P349 4042 rows...
    SYS_P349 0 NULL rows...
    SYS_P350 3930 rows...
    SYS_P350 0 NULL rows...
    SYS_P351 3868 rows...
    SYS_P351 0 NULL rows...
    SYS_P352 3895 rows...
    SYS_P352 0 NULL rows...
    SYS_P353 4072 rows...
    SYS_P353 0 NULL rows...
    SYS_P354 3916 rows...
    SYS_P354 0 NULL rows...
    
    PL/SQL procedure successfully completed.
    
    
    for example -- they all went into the "first" partition in that case.
    
    2) page 640 maybe?  it was raised because
    
       a) the index was unusable
       b) the query plan said "i need to read that index"
    
    queries against partitioned tables are no more or less "index dependent" then queries against 
    unpartitioned tables.  same thing would happen against a "normal" table if its indexes were 
    unusable and query plan said "i want to use that index"
    
    3) 65 is too small in all likelyhood, unless you are really lucky.  It would be far too easy to 
    have this happen:
    
    ops$tkyte@ORA920> select count(distinct object_id) from t;
    
    COUNT(DISTINCTOBJECT_ID)
    ------------------------
                          65
    
    ops$tkyte@ORA920>
    ops$tkyte@ORA920> declare
      2      l_cnt number;
      3  begin
      4      for x in ( select partition_name
      5                   from user_tab_partitions
      6                  where table_name = 'T' )
      7      loop
      8          execute immediate
      9          'select count(*)
     10             from t partition(' || x.partition_name || ')'
     11             into l_cnt;
     12          dbms_output.put_line
     13          ( x.partition_name || ' ' || l_cnt || ' rows...' );
     14      end loop;
     15  end;
     16  /
    SYS_P395 65 rows...
    SYS_P396 0 rows...
    SYS_P397 0 rows...
    SYS_P398 0 rows...
    SYS_P399 0 rows...
    SYS_P400 0 rows...
    SYS_P401 0 rows...
    SYS_P402 0 rows...
    
    PL/SQL procedure successfully completed.
    
    
    All 7 million rows could be in the same hash partition.  You would use range or list instead. 
  • 相关阅读:
    列举 spring 支持的事务管理类型?
    memcached 能够更有效地使用内存吗?
    Redis 集群方案什么情况下会导致整个集群不可用?
    详细描述一下 Elasticsearch 更新和删除文档的过程?
    Redis 常见性能问题和解决方案?
    如果有大量的 key 需要设置同一时间过期,一般需要注意什么?
    synchronized 和 ReentrantLock 的区别?
    Redis 支持的 Java 客户端都有哪些?官方推荐用哪个?
    memcached 最大的优势是什么?
    memcached 是原子的吗?
  • 原文地址:https://www.cnblogs.com/tracy/p/2208637.html
Copyright © 2011-2022 走看看