  • Where PostgreSQL Processes SQL Queries, Part 22

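    Continuing from the previous part.

    Back to the call chain:

    estimate_rel_size -> RelationGetNumberOfBlocks -> RelationGetNumberOfBlocksInFork

    -> smgrnblocks -> mdnblocks ...

    All of that work is just to estimate the size of one table.

    So what unit is the "block" that we get back?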

    BlockNumber
    mdnblocks(SMgrRelation reln, ForkNumber forknum)
    {
        MdfdVec    *v = mdopen(reln, forknum, EXTENSION_FAIL);
        BlockNumber nblocks;
        BlockNumber segno = 0;
    
        /*
         * Skip through any segments that aren't the last one, to avoid redundant
         * seeks on them.  We have previously verified that these segments are
         * exactly RELSEG_SIZE long, and it's useless to recheck that each time.
         *
         * NOTE: this assumption could only be wrong if another backend has
         * truncated the relation.    We rely on higher code levels to handle that
         * scenario by closing and re-opening the md fd, which is handled via
         * relcache flush.    (Since the checkpointer doesn't participate in
         * relcache flush, it could have segment chain entries for inactive
         * segments; that's OK because the checkpointer never needs to compute
         * relation size.)
         */
        while (v->mdfd_chain != NULL)
        {
            segno++;
            v = v->mdfd_chain;
        }
    
        for (;;)
        {
            nblocks = _mdnblocks(reln, forknum, v);
            /* Debug output added for this experiment; not part of PostgreSQL. */
            fprintf(stderr, "%u blocks by process %d\n\n", nblocks, (int) getpid());
    
            if (nblocks > ((BlockNumber) RELSEG_SIZE))
                elog(FATAL, "segment too big");
            if (nblocks < ((BlockNumber) RELSEG_SIZE))
                return (segno * ((BlockNumber) RELSEG_SIZE)) + nblocks;
    
            /*
             * If segment is exactly RELSEG_SIZE, advance to next one.
             */
            segno++;
    
            if (v->mdfd_chain == NULL)
            {
                /*
                 * Because we pass O_CREAT, we will create the next segment (with
                 * zero length) immediately, if the last segment is of length
                 * RELSEG_SIZE.  While perhaps not strictly necessary, this keeps
                 * the logic simple.
                 */
                v->mdfd_chain = _mdfd_openseg(reln, forknum, segno, O_CREAT);
                if (v->mdfd_chain == NULL)
                    ereport(ERROR,
                            (errcode_for_file_access(),
                             errmsg("could not open file \"%s\": %m",
                                    _mdfd_segpath(reln, forknum, segno))));
            }
    
            v = v->mdfd_chain;
        }
    }
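The return expression above is the core of the function: every full segment contributes RELSEG_SIZE blocks, and the first segment that is not full contributes its own block count. Here is a minimal sketch of that arithmetic alone, with no file access. The name `sketch_total_blocks`, the array input, and the constant 131072 (1 GB segments of 8 KB blocks, the default build) are assumptions made for illustration, not PostgreSQL's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: default build, 1 GB segments of 8 KB blocks. */
#define SKETCH_RELSEG_SIZE 131072u

/*
 * seg_blocks[i] is the block count of segment file i (hypothetical input;
 * the real code obtains it from _mdnblocks()). Returns the total block
 * count using the same rule as the loop in mdnblocks.
 */
uint32_t
sketch_total_blocks(const uint32_t *seg_blocks, size_t nsegs)
{
    uint32_t segno = 0;
    size_t   i;

    for (i = 0; i < nsegs; i++)
    {
        uint32_t nblocks = seg_blocks[i];

        /* Mirrors the "segment too big" check. */
        assert(nblocks <= SKETCH_RELSEG_SIZE);

        /* First segment that is not full: it is the last one. */
        if (nblocks < SKETCH_RELSEG_SIZE)
            return segno * SKETCH_RELSEG_SIZE + nblocks;

        /* Exactly full: advance to the next segment. */
        segno++;
    }

    /* Every listed segment was full; treat the next one as empty. */
    return segno * SKETCH_RELSEG_SIZE;
}
```

For a table that fits in one segment, such as the one in the experiment below, `segno` stays 0 and the result is simply the block count of that single file.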

    Let's verify this with an experiment.

    First, create a table:

    postgres=# create table tst01(id integer);
    CREATE TABLE
    postgres=# 
    
    postgres=# select oid from pg_class where relname='tst01';
      oid  
    -------
     16384
    (1 row)

    As far as I know, an integer value in PostgreSQL takes 4 bytes in each row.

    So I figured that 4 bytes × 2048 rows = 8192 bytes, i.e. 8 KB.

    What actually happens?

    [root@lex base]# ls ./12788/16384
    ./12788/16384
    
    postgres=# insert into tst01 values(generate_series(1,2048));
    INSERT 0 2048
    postgres=# 
    
    [root@lex base]# ls -lrt ./12788/16384
    -rw------- 1 postgres postgres 81920 May 28 11:54 ./12788/16384
    [root@lex base]# ls -lrt -kb ./12788/16384
    -rw------- 1 postgres postgres 80 May 28 11:54 ./12788/16384
    [root@lex base]# 

    Not 8 KB, but 80 KB!

    What happens if we double the amount of data?

    postgres=# insert into tst01 values(generate_series(2049,4096));
    INSERT 0 2048
    postgres=#
    
    
    [root@lex base]# ls -lrt -kb ./12788/16384
    -rw------- 1 postgres postgres 152 May 28 11:56 ./12788/16384
    [root@lex base]# 

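    I had assumed that only a small part of each 8 KB block is overhead (such as headers), but that turns out not to be the case.

    I asked an expert and got this answer: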

    postgres=# select pg_column_size(id) from tst01 limit 1;
     pg_column_size
    ----------------
                  4
    (1 row)
    
    
    postgres=# select pg_column_size(t) from tst01 t limit 1;
     pg_column_size
    ----------------
                 28
    (1 row)
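The 28 bytes reported for a whole row explain the file sizes seen earlier. Below is a rough sketch of the arithmetic, under assumptions that are not stated in the original: a default 8 KB page with a 24-byte page header, a 4-byte line pointer per row, a 24-byte tuple header (which matches 28 - 4 above), 8-byte alignment as on a typical 64-bit build, and fixed-width rows without NULLs. All `SK_` and `sk_` names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout constants (default 8 KB pages, 64-bit build). */
#define SK_PAGE_SIZE    8192u
#define SK_PAGE_HEADER  24u
#define SK_LINE_POINTER 4u
#define SK_TUPLE_HEADER 24u
#define SK_MAXALIGN     8u

/* Round len up to the next multiple of SK_MAXALIGN. */
uint32_t
sk_align(uint32_t len)
{
    return (len + SK_MAXALIGN - 1) / SK_MAXALIGN * SK_MAXALIGN;
}

/* Rows that fit on one page when each row carries data_bytes of user data. */
uint32_t
sk_rows_per_page(uint32_t data_bytes)
{
    /* Aligned tuple (header + data) plus its line pointer. */
    uint32_t per_row = sk_align(SK_TUPLE_HEADER + data_bytes) + SK_LINE_POINTER;

    return (SK_PAGE_SIZE - SK_PAGE_HEADER) / per_row;
}

/* Pages needed to hold the given number of rows (rounded up). */
uint32_t
sk_pages_needed(uint32_t rows, uint32_t data_bytes)
{
    uint32_t rpp = sk_rows_per_page(data_bytes);

    assert(rpp > 0);
    return (rows + rpp - 1) / rpp;
}
```

With a 4-byte integer column each row costs 32 + 4 = 36 bytes, so 226 rows fit on a page. `sk_pages_needed(2048, 4)` gives 10 pages (81920 bytes, the 80 KB seen above) and `sk_pages_needed(4096, 4)` gives 19 pages (152 KB).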
    
    

    Now let's look at how the program handles blocks:

    postgres=# select count(*) from tst01;
     count 
    -------
      4096
    (1 row)
    
    postgres=# 

    At this point the backend prints:

    19 blocks by process 4920

    What does 19 mean?

    [root@lex 12788]# ls -lrt 16384
    -rw------- 1 postgres postgres 155648 May 28 11:58 16384
    [root@lex 12788]# 
    
    155648 / 8192 = 19

    It matches exactly. So the block count that mdnblocks returns in the PostgreSQL source is the number of 8 KB data blocks.
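The same check can be done outside the server by dividing a file's length by the block size. The sketch below assumes the default 8 KB block size; `sk_bytes_to_blocks` and `sk_file_blocks` are hypothetical helper names, and `stat()` is used here only for convenience (my understanding is that md.c measures the segment by seeking to its end instead).

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>

/* Assumption: default block size of 8 KB. */
#define SK_BLCKSZ 8192

/* Number of whole blocks in a file of the given length. */
int64_t
sk_bytes_to_blocks(int64_t bytes)
{
    assert(bytes >= 0);
    return bytes / SK_BLCKSZ;
}

/* Block count of the file at path, or -1 if it cannot be examined. */
int64_t
sk_file_blocks(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return sk_bytes_to_blocks((int64_t) st.st_size);
}
```

For the file in this experiment, `sk_bytes_to_blocks(155648)` returns 19, the number printed by the backend.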

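    The experiment also shows that when a row holds very little data, the per-row header overhead takes up a large share of the space.

    So to estimate accurately how much space a table really occupies, you basically have to rely on sampling.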
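One way to read "sampling" here: count the rows on a few pages, compute the average density, and scale it by the block count that mdnblocks reports. The function below is only an illustration of that idea; the name `sk_estimate_rows` and its parameters are hypothetical, and this is not the code PostgreSQL itself runs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extrapolate a row count from a sample: density (rows per page) observed
 * on sampled_pages, multiplied by the table's total page count and rounded
 * to the nearest integer.
 */
uint64_t
sk_estimate_rows(uint64_t sampled_rows, uint64_t sampled_pages,
                 uint64_t total_pages)
{
    double density;

    assert(sampled_pages > 0);
    density = (double) sampled_rows / (double) sampled_pages;
    return (uint64_t) (density * (double) total_pages + 0.5);
}
```

If two full pages of the test table were sampled at 226 rows each, `sk_estimate_rows(452, 2, 19)` would give 4294, slightly above the real 4096 because the last page is only partly filled.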

  • Original article: https://www.cnblogs.com/gaojian/p/3103274.html