zoukankan      html  css  js  c++  java
  • pg_pathman 的初步使用

    os: centos 7.4
    postgresql: 9.6

    pg_pathman是postgresql管理分区插件,postgresql 9.6、10 的内置分区管理也一直都在完善。使用哪种方式来管理,用户自己决定。不过pg_pathman 确实很方便。
    由于pg_pathman使用了custom scan provider api,所以只支持PostgreSQL 9.5以及以上的版本

    The pg_pathman module provides optimized partitioning mechanism and functions to manage partitions.

    The extension is compatible with:

    PostgreSQL 9.5, 9.6, 10;
    Postgres Pro Standard 9.5, 9.6;
    Postgres Pro Enterprise;
    

    PostgreSQL <= 10 supports partitioning via table inheritance: each partition must be created as a child table with CHECK CONSTRAINT:

    CREATE TABLE test (id SERIAL PRIMARY KEY, title TEXT);
    CREATE TABLE test_1 (CHECK ( id >= 100 AND id < 200 )) INHERITS (test);
    CREATE TABLE test_2 (CHECK ( id >= 200 AND id < 300 )) INHERITS (test);

    PostgreSQL 10 provides native partitioning:

    CREATE TABLE test(id int4, value text) PARTITION BY RANGE(id);
    CREATE TABLE test_1 PARTITION OF test FOR VALUES FROM (1) TO (10);
    CREATE TABLE test_2 PARTITION OF test FOR VALUES FROM (10) TO (20);

    下载

    $ git clone https://github.com/postgrespro/pg_pathman.git
    $ git branch -a
    $ git checkout PGPRO
    

    安装

    export PGHOME=/usr/pgsql-9.6 
    export PGDATA=/var/lib/pgsql/9.6/main
    export MANPATH=$PGHOME/share/man:$MANPATH
    export LD_LIBRARY_PATH=$PGHOME/lib
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
    export PATH=$PGHOME/bin:$PATH
    
    $ make USE_PGXS=1
    $ make install USE_PGXS=1
    $ ls -l /usr/pgsql-9.6/lib/ |grep -i path
    -rwxr-xr-x 1 postgres postgres 253688 Jul  3 17:02 pg_pathman.so
    $ ls -l /usr/pgsql-9.6/share/extension/ |grep -i path
    -rw-r--r-- 1 postgres postgres 55882 Jul  3 17:02 pg_pathman--1.0--1.1.sql
    -rw-r--r-- 1 postgres postgres 36973 Jul  3 17:02 pg_pathman--1.1--1.2.sql
    -rw-r--r-- 1 postgres postgres 29154 Jul  3 17:02 pg_pathman--1.2--1.3.sql
    -rw-r--r-- 1 postgres postgres 43021 Jul  3 17:02 pg_pathman--1.3--1.4.sql
    -rw-r--r-- 1 postgres postgres 57125 Jul  3 17:02 pg_pathman--1.4.sql
    -rw-r--r-- 1 postgres postgres   131 Jul  3 17:02 pg_pathman.control
    
    
    $ vi postgresql.conf
    
    shared_preload_libraries = 'pg_pathman, pg_stat_statements'
    
    $ psql
    postgres=# c peiybdb
    You are now connected to database "peiybdb" as user "postgres".
    peiybdb=# show shared_preload_libraries;
        shared_preload_libraries    
    --------------------------------
     pg_pathman, pg_stat_statements
    (1 row)
    
    peiybdb=# select * from pg_available_extensions where name like '%path%';
        name    | default_version | installed_version |             comment              
    ------------+-----------------+-------------------+----------------------------------
     pg_pathman | 1.4             | 1.4               | Partitioning tool for PostgreSQL
    (1 row)
    
    peiybdb=# CREATE EXTENSION pg_pathman;
    CREATE EXTENSION
    
    peiybdb=# select * from pg_extension where 1=1 and extname like '%path%';
      extname   | extowner | extnamespace | extrelocatable | extversion |   extconfig   | extcondition 
    ------------+----------+--------------+----------------+------------+---------------+--------------
     pg_pathman |       10 |         2200 | f              | 1.4        | {16403,16414} | {"",""}
    (1 row)
    

    更新

    ALTER EXTENSION pg_pathman UPDATE TO "X.Y";
    SET pg_pathman.enable = t;

    使用

    range 分区

    peiybdb=# create table tmp_t1(id int, name text, insert_timestamp timestamp not null);
    CREATE TABLE
    peiybdb=# insert into tmp_t1 select id,md5(random()::text),current_date - id from generate_series(1,10000) t(id);
    INSERT 0 10000
    peiybdb=# select min(insert_timestamp),max(insert_timestamp) from tmp_t1;
             min         |         max         
    ---------------------+---------------------
     1991-02-15 00:00:00 | 2018-07-02 00:00:00
    (1 row)
    
    

    分区表有一些限制条件
    1、分区列not null约束
    2、分区个数必须能覆盖已有的所有记录

    创建1000个分区,每个分区包含一个月的数据。

    peiybdb=# select create_range_partitions(
    'tmp_t1'::regclass,                 -- 主表oid
    'insert_timestamp',                 -- 分区字段,一定要not null约束
    '1990-01-01 00:00:00'::timestamp,   -- 开始时间
    interval '1 month',                 -- 分区间隔,一个月
    1000,                               -- 分区表数量
    false                               -- 不立即将数据从主表迁移到分区
    );

    执行非堵塞迁移

    peiybdb=# df+ partition_table_concurrently
    List of functions
    -[ RECORD 1 ]-------+--------------------------------------------------------------------------------------------
    Schema              | public
    Name                | partition_table_concurrently
    Result data type    | void
    Argument data types | relation regclass, batch_size integer DEFAULT 1000, sleep_time double precision DEFAULT 1.0
    Type                | normal
    Volatility          | volatile
    Parallel            | unsafe
    Owner               | postgres
    Security            | invoker
    Access privileges   | 
    Language            | c
    Source code         | partition_table_concurrently
    Description         |
    
    peiybdb=# partition_table_concurrently(
    relation   REGCLASS,              -- 主表OID
    batch_size INTEGER DEFAULT 1000,  -- 一个事务批量迁移多少记录
    sleep_time FLOAT8 DEFAULT 1.0     -- 获得行锁失败时,休眠多久再次获取,重试60次退出任务
    )    
    
    peiybdb=# select partition_table_concurrently('tmp_t1'::regclass,1000,1.0);
    

    禁用主表

    peiybdb=# select set_enable_parent('tmp_t1'::regclass, false);

    迁移后,查看数据

    peiybdb=# select count(*) from  tmp_t1;
     count 
    -------
     10000
    (1 row)
    
    peiybdb=# select count(*) from only tmp_t1;
     count 
    -------
         0
    (1 row)

    pathman_partition_list 展示的是具体的分区表

    peiybdb=# select * from pathman_partition_list limit 100;
     parent | partition  | parttype |       expr       |      range_min      |      range_max      
    --------+------------+----------+------------------+---------------------+---------------------
     tmp_t1 | tmp_t1_1   |        2 | insert_timestamp | 1990-01-01 00:00:00 | 1990-02-01 00:00:00
     tmp_t1 | tmp_t1_2   |        2 | insert_timestamp | 1990-02-01 00:00:00 | 1990-03-01 00:00:00
     tmp_t1 | tmp_t1_3   |        2 | insert_timestamp | 1990-03-01 00:00:00 | 1990-04-01 00:00:00
     tmp_t1 | tmp_t1_4   |        2 | insert_timestamp | 1990-04-01 00:00:00 | 1990-05-01 00:00:00
     tmp_t1 | tmp_t1_5   |        2 | insert_timestamp | 1990-05-01 00:00:00 | 1990-06-01 00:00:00
     tmp_t1 | tmp_t1_6   |        2 | insert_timestamp | 1990-06-01 00:00:00 | 1990-07-01 00:00:00
     tmp_t1 | tmp_t1_7   |        2 | insert_timestamp | 1990-07-01 00:00:00 | 1990-08-01 00:00:00
     tmp_t1 | tmp_t1_8   |        2 | insert_timestamp | 1990-08-01 00:00:00 | 1990-09-01 00:00:00
     tmp_t1 | tmp_t1_9   |        2 | insert_timestamp | 1990-09-01 00:00:00 | 1990-10-01 00:00:00
     tmp_t1 | tmp_t1_10  |        2 | insert_timestamp | 1990-10-01 00:00:00 | 1990-11-01 00:00:00

    这是如果插入的分区值不在分区表的范围内,pg_pathman会自动创建相关的分区表,这个确实很实用、很方便。

    hash 分区

    peiybdb=# create table tmp_hash(id int not null, name text, insert_timestamp timestamp not null);
    CREATE TABLE
    peiybdb=# insert into tmp_hash select id,md5(random()::text),current_date - id from generate_series(1,10000) t(id);
    INSERT 0 10000
    peiybdb=# select min(id),max(id) from tmp_hash;
     min |  max  
    -----+-------
       1 | 10000
    (1 row)
    

    创建50个hash分区。

    peiybdb=# select create_hash_partitions(
    'tmp_hash'::regclass,               -- 主表oid
    'id',                               -- 分区字段,一定要not null约束
    50,                                 -- 分区表数量
    false                               -- 不立即将数据从主表迁移到分区
    );

    执行非堵塞迁移

    peiybdb=# partition_table_concurrently(
    relation   REGCLASS,              -- 主表OID
    batch_size INTEGER DEFAULT 1000,  -- 一个事务批量迁移多少记录
    sleep_time FLOAT8 DEFAULT 1.0     -- 获得行锁失败时,休眠多久再次获取,重试60次退出任务
    )    
    
    peiybdb=# select partition_table_concurrently('tmp_hash'::regclass,1000,1.0);
    

    禁用主表

    peiybdb=# select set_enable_parent('tmp_hash'::regclass, false);

    查看结果

    peiybdb=# select count(1) from tmp_hash;
     count 
    -------
     10000
    (1 row)
    
    peiybdb=# select count(1) from only tmp_hash;
     count 
    -------
         0
    (1 row)

    查看一些相关信息

    peiybdb=# d+
                                       List of relations
     Schema |             Name              |   Type   |  Owner   |    Size    | Description 
    --------+-------------------------------+----------+----------+------------+-------------
     public | pathman_cache_stats           | view     | postgres | 0 bytes    | 
     public | pathman_concurrent_part_tasks | view     | postgres | 0 bytes    | 
     public | pathman_config                | table    | postgres | 16 kB      | 
     public | pathman_config_params         | table    | postgres | 16 kB      | 
     public | pathman_partition_list        | view     | postgres | 0 bytes    |
    

    建议

    1. 分区列必须有not null约束
    2. 分区个数必须能覆盖已有的所有记录
    3. 建议使用非堵塞式迁移接口(select partition_table_concurrently(‘tmp_t1’::regclass,1000,1.0);)
    4. 建议数据迁移完成后,禁用主表 (select set_enable_parent(‘tmp_t1’::regclass, false);)

    参考:
    https://github.com/postgrespro/pg_pathman
    https://github.com/postgrespro/pg_pathman/wiki

    https://github.com/postgrespro/pg_pathman_build

  • 相关阅读:
    感谢那些给予我无偿帮助的人!
    软件工程总结
    《暗时间》部分感想!
    四个数混合运算,数据库存题,程序集构建三层建构
    三个数混合运算和三层架构
    需求
    数据库实现,以及工厂方法模式实现
    WPF中实现
    git简单操作
    git操作??
  • 原文地址:https://www.cnblogs.com/ctypyb2002/p/9792952.html
Copyright © 2011-2022 走看看