  • Measure the size of a PostgreSQL table row

    Q:
     

    I have a PostgreSQL table. select * is very slow whereas select id is nice and quick. I think it may be that the size of the row is very large and it's taking a while to transport, or it may be some other factor.

    I need all of the fields (or nearly all of them), so selecting just a subset isn't a quick fix. Selecting the fields that I want is still slow.

    Here's my table schema minus the names:

    integer                  | not null default nextval('core_page_id_seq'::regclass)
    character varying(255)   | not null
    character varying(64)    | not null
    text                     | default '{}'::text
    character varying(255)   | 
    integer                  | not null default 0
    text                     | default '{}'::text
    text                     | 
    timestamp with time zone | 
    integer                  | 
    timestamp with time zone | 
    integer                  | 

    The text fields may be any size, but still no more than a few kilobytes in the worst case.

    Questions

    1. Is there anything about this that screams 'crazy inefficient'?
    2. Is there a way to measure page size at the Postgres command-line to help me debug this?
     
     
    A1:

    Q2: way to measure page size

    PostgreSQL provides a number of Database Object Size Functions you can use. I packed the most interesting ones into this query and added some Statistics Access Functions.

    This demonstrates that the various methods to measure the "size of a row" can lead to very different results. It all depends on what exactly you want to measure.

    Replace public.tbl with your (optionally schema-qualified) table name to get a compact view of collected statistics about the size of your rows.

    WITH x AS (
       SELECT count(*)               AS ct
            , sum(length(t::text))   AS txt_len  -- length in characters
            , 'public.tbl'::regclass AS tbl  -- provide (qualified) table name here
       FROM   public.tbl t  -- ... and here
       )
    , y AS (
       SELECT ARRAY [pg_relation_size(tbl)
                   , pg_relation_size(tbl, 'vm')
                   , pg_relation_size(tbl, 'fsm')
                   , pg_table_size(tbl)
                   , pg_indexes_size(tbl)
                   , pg_total_relation_size(tbl)
                   , txt_len
                 ] AS val
            , ARRAY ['core_relation_size'
                   , 'visibility_map'
                   , 'free_space_map'
                   , 'table_size_incl_toast'
                   , 'indexes_size'
                   , 'total_size_incl_toast_and_indexes'
                   , 'live_rows_in_text_representation'
                 ] AS name
       FROM   x
       )
    SELECT unnest(name)                AS what
         , unnest(val)                 AS "bytes/ct"
         , pg_size_pretty(unnest(val)) AS bytes_pretty
         , unnest(val) / ct            AS bytes_per_row
    FROM   x, y
    
    UNION ALL SELECT '------------------------------', NULL, NULL, NULL
    UNION ALL SELECT 'row_count', ct, NULL, NULL FROM x
    UNION ALL SELECT 'live_tuples', pg_stat_get_live_tuples(tbl), NULL, NULL FROM x
    UNION ALL SELECT 'dead_tuples', pg_stat_get_dead_tuples(tbl), NULL, NULL FROM x;

    I only pack the values into arrays and unnest() again so I don't have to spell out the calculations for every single row repeatedly.
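
    A minimal illustration of that trick with demo values (not part of the original query): equally long arrays expanded by unnest() in the SELECT list line up in parallel.

    SELECT unnest(ARRAY['a', 'b']) AS name
         , unnest(ARRAY[10, 20])   AS val;

    -- name | val
    -- a    |  10
    -- b    |  20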

    General row count statistics are appended at the end with unconventional SQL foo to get everything in one query. You could wrap it into a PL/pgSQL function for repeated use, hand in the table name as a parameter and use EXECUTE, as sketched below.
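
    A minimal sketch of such a wrapper, assuming a function name (table_size_stats) and a reduced set of measurements of my own choosing:

    CREATE OR REPLACE FUNCTION table_size_stats(_tbl regclass)
      RETURNS TABLE (what text, nr bigint)
      LANGUAGE plpgsql AS
    $func$
    DECLARE
       _ct bigint;
    BEGIN
       -- Only the row count needs dynamic SQL; %s with a regclass argument
       -- emits a safely quoted, schema-qualified table name.
       EXECUTE format('SELECT count(*) FROM %s', _tbl) INTO _ct;

       RETURN QUERY
       VALUES ('core_relation_size'               , pg_relation_size(_tbl))
            , ('table_size_incl_toast'            , pg_table_size(_tbl))
            , ('indexes_size'                     , pg_indexes_size(_tbl))
            , ('total_size_incl_toast_and_indexes', pg_total_relation_size(_tbl))
            , ('row_count'                        , _ct);
    END
    $func$;

    Call it with: SELECT * FROM table_size_stats('public.tbl');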

    Result:

                  what                | bytes/ct | bytes_pretty | bytes_per_row
    -----------------------------------+----------+--------------+---------------
     core_relation_size                | 44138496 | 42 MB        |            91
     visibility_map                    |        0 | 0 bytes      |             0
     free_space_map                    |    32768 | 32 kB        |             0
     table_size_incl_toast             | 44179456 | 42 MB        |            91
     indexes_size                      | 33128448 | 32 MB        |            68
     total_size_incl_toast_and_indexes | 77307904 | 74 MB        |           159
     live_rows_in_text_representation  | 29987360 | 29 MB        |            62
     ------------------------------    |          |              |
     row_count                         |   483424 |              |
     live_tuples                       |   483424 |              |
     dead_tuples                       |     2677 |              |

    The additional module pgstattuple provides more useful functions.
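
    For instance, a minimal sketch (pgstattuple ships with the standard contrib package; note that it scans the entire table):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    -- Exact physical statistics: table_len, tuple_count, tuple_len,
    -- dead_tuple_count, free_space and more.
    SELECT * FROM pgstattuple('public.tbl');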

    Update for Postgres 9.3+

    We could use the new form of unnest() in Postgres 9.4, which takes multiple parameters to unnest arrays in parallel (demonstrated after the query below).
    But using LATERAL and a VALUES expression, this can be simplified further, plus some other improvements:

    SELECT l.what, l.nr AS "bytes/ct"
         , CASE WHEN is_size THEN pg_size_pretty(nr) END AS bytes_pretty
         , CASE WHEN is_size THEN nr / x.ct END          AS bytes_per_row
    FROM  (
       SELECT min(tableoid)        AS tbl      -- same as 'public.tbl'::regclass::oid
            , count(*)             AS ct
            , sum(length(t::text)) AS txt_len  -- length in characters
       FROM   public.tbl t  -- provide table name *once*
       ) x
     , LATERAL (
       VALUES
          (true , 'core_relation_size'               , pg_relation_size(tbl))
        , (true , 'visibility_map'                   , pg_relation_size(tbl, 'vm'))
        , (true , 'free_space_map'                   , pg_relation_size(tbl, 'fsm'))
        , (true , 'table_size_incl_toast'            , pg_table_size(tbl))
        , (true , 'indexes_size'                     , pg_indexes_size(tbl))
        , (true , 'total_size_incl_toast_and_indexes', pg_total_relation_size(tbl))
        , (true , 'live_rows_in_text_representation' , txt_len)
        , (false, '------------------------------'   , NULL)
        , (false, 'row_count'                        , ct)
        , (false, 'live_tuples'                      , pg_stat_get_live_tuples(tbl))
        , (false, 'dead_tuples'                      , pg_stat_get_dead_tuples(tbl))
       ) l(is_size, what, nr);

    Same result.
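
    For illustration, the multi-parameter form of unnest() mentioned above would look like this (demo values only, taken from the result table):

    -- Postgres 9.4+: arrays are unnested in parallel;
    -- shorter arrays are padded with NULLs.
    SELECT *
    FROM   unnest(ARRAY['row_count', 'live_tuples']
                , ARRAY[483424, 483424]::bigint[]) AS t(what, nr);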

    Q1: anything inefficient?

    You could optimize the column order to save some bytes per row that are currently wasted on alignment padding:

    integer                  | not null default nextval('core_page_id_seq'::regclass)
    integer                  | not null default 0
    character varying(255)   | not null
    character varying(64)    | not null
    text                     | default '{}'::text
    character varying(255)   | 
    text                     | default '{}'::text
    text                     |
    timestamp with time zone |
    timestamp with time zone |
    integer                  |
    integer                  |

    This saves between 8 and 18 bytes per row. I call it "column tetris".
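
    To see where the padding comes from, you can look up each column's storage size and alignment requirement in the system catalogs. A sketch (not part of the original answer), assuming the table name public.tbl:

    SELECT a.attname, t.typname
         , t.typlen   -- size in bytes; -1 means variable length (varlena)
         , t.typalign -- required alignment: c = 1, s = 2, i = 4, d = 8 bytes
    FROM   pg_attribute a
    JOIN   pg_type      t ON t.oid = a.atttypid
    WHERE  a.attrelid = 'public.tbl'::regclass
    AND    a.attnum > 0
    AND    NOT a.attisdropped
    ORDER  BY a.attnum;

    Grouping columns with the same alignment requirement, larger first, avoids filler bytes between them.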

    A2:
     
    An approximation of the size of a row, including the TOAST'ed contents, is easy to get by querying the length of the TEXT representation of the entire row:

    SELECT octet_length(t.*::text) FROM tablename AS t WHERE primary_key=:value;

    This is a close approximation to the number of bytes that will be retrieved client-side when executing:

    SELECT * FROM tablename WHERE primary_key=:value;

    ...assuming that the caller of the query is requesting results in text format, which is what most programs do (binary format is possible, but it's not worth the trouble in most cases).

    The same technique could be applied to locate the N "biggest-in-text" rows of tablename:

    SELECT primary_key, octet_length(t.*::text) FROM tablename AS t
       ORDER BY 2 DESC LIMIT :N;
    Note:
    1. octet_length(string) returns the number of bytes in the string, whereas length(string) returns the number of characters.
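
    A quick demonstration of the difference, assuming a UTF-8 database encoding:

    SELECT length('héllo')       AS characters  -- 5 characters
         , octet_length('héllo') AS bytes;      -- 6 bytes: 'é' takes 2 bytes in UTF-8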
     
     
     
     
     
  • Original: https://www.cnblogs.com/xiaotengyi/p/7794064.html