PostgreSQL Partitions

zoukankan html css js c++ java

PostgreSQL Partitions

why we need partitions

The first and most demanding reason to use partitions in a database is to increase the performance of the database. This is achieved by partition-wise joins; if a user’s queries perform a lot of full-table scans, partitioning will help vastly, because partitions will limit the scope of this search. The second important reason to partition is ease of managing large tables.

Partitioning always helps manage large objects. Although it is applicable to objects of any size, the advantages are more apparent in large tables. When a user recreates an index on a nonpartitioned table, the only option is to build the entire index in one statement. On the other hand, if the table is partitioned, the user can rebuild partitions of the local indexes one at a time.

PostgreSQL supports table partitioning through table inheritance, which means every

partition will be created as a child table of a single parent table. Partitioning is performed in such a way that every child table inherits a single parent table. The parent table will be empty; it exists just to describe the whole dataset. Currently in PostgreSQL, partitioning can be implemented in range partitioning or list partitioning.

Range partitioning can be done, for example, by record number ranges (such as record 0 to 100 or 100 to 200) or even using date ranges (such as from 2014-11-01 to 2014-11-30).

List partitioning can be done, for example, by a list of cities (such as New York, Boston,

Chicago, and Houston) or list of departments (such as HR, finance, administration, and so on).

There are five simple steps used to create a partition in PostgreSQL, which are as follows:

1. Create the master table.

2. Create multiple child tables without having an overlapped table constraint.

3. Create indexes.

4. Create a trigger function to insert data into child tables.

5. Enable the constraint exclusion.

Range partition

The range partition is the partition in which we partition a table into ranges defined by a single column or multiple columns. When defining the ranges, the user will have to take care that ranges should be connected and not overlapping with each other; moreover, ranges must be defined using the < value operator. For instance, one can partition by date ranges or ranges of identifiers for specific business objects.

Creating the master table

[postgres@minion1 bin]$ ./psql

psql (9.4.5)

Type "help" for help.

postgres=# create database partition_db;

CREATE DATABASE

postgres=# c partition_db

You are now connected to database "partition_db" as user "postgres".

partition_db=# create database warehouse_db;

CREATE DATABASE

partition_db=# c warehouse_db

You are now connected to database "warehouse_db" as user "postgres".

warehouse_db=# CREATE TABLE sales_record

warehouse_db-# (

warehouse_db(# id NUMERIC PRIMARY KEY,

warehouse_db(# sales_amount NUMERIC,

warehouse_db(# sales_date DATE NOT NULL DEFAULT CURRENT_DATE

warehouse_db(# );

CREATE TABLE

Creating a range partition table

Each partition will have two months’ data.

Create the sales_record_m1_to_m2 child table as follows:

warehouse_db=# CREATE TABLE sales_record_m1_to_m2

warehouse_db-# (

warehouse_db(# PRIMARY KEY (id, sales_date),

warehouse_db(# CHECK (sales_date >= DATE '2015-01-01'

warehouse_db(# AND sales_date < DATE '2015-03-01')

warehouse_db(# )

warehouse_db-# INHERITS (sales_record);

CREATE TABLE

This child table will contain data of January and February.

Now, create the sales_record_m3_to_m4 child table as follows:

warehouse_db=# CREATE TABLE sales_record_m3_to_m4

warehouse_db-# (

warehouse_db(# PRIMARY KEY (id, sales_date),

warehouse_db(# CHECK (sales_date >= DATE '2015-03-01'

warehouse_db(# AND sales_date < DATE '2015-05-01')

warehouse_db(# )

warehouse_db-# INHERITS (sales_record);

CREATE TABLE

This child table will contain data of March and April.

Create the sales_record_m5_to_m6 child table as follows:

warehouse_db=# CREATE TABLE sales_record_m5_to_m6

warehouse_db-# (

warehouse_db(# PRIMARY KEY (id, sales_date),

warehouse_db(# CHECK (sales_date >= DATE '2015-05-01'

warehouse_db(# AND sales_date < DATE '2015-07-01')

warehouse_db(# )

warehouse_db-# INHERITS (sales_record);

CREATE TABLE

This child table will contain data of May and June.

Create the sales_record_m7_to_m8 child table as follows:

warehouse_db=# CREATE TABLE sales_record_m7_to_m8

warehouse_db-# (

warehouse_db(# PRIMARY KEY (id, sales_date),

warehouse_db(# CHECK (sales_date >= DATE '2015-07-01'

warehouse_db(# AND sales_date < DATE '2015-09-01')

warehouse_db(# )

warehouse_db-# INHERITS (sales_record);

CREATE TABLE

This child table will contain data of July and August.

Create the sales_record_m9_to_m10 child table as follows:

warehouse_db=# CREATE TABLE sales_record_m9_to_m10

warehouse_db-# (

warehouse_db(# PRIMARY KEY (id, sales_date),

warehouse_db(# CHECK (sales_date >= DATE '2015-09-01'

warehouse_db(# AND sales_date < DATE '2015-11-01')

warehouse_db(# )

warehouse_db-# INHERITS (sales_record);

CREATE TABLE

This child table will contain data of September and October.

Now, create the sales_record_m11_to_m12 child table as follows:

warehouse_db=# CREATE TABLE sales_record_m11_to_m12

warehouse_db-# (

warehouse_db(# PRIMARY KEY (id, sales_date),

warehouse_db(# CHECK (sales_date >= DATE '2015-11-01'

warehouse_db(# AND sales_date < DATE '2016-01-01')

warehouse_db(# )

warehouse_db-# INHERITS (sales_record);

CREATE TABLE

This child table will contain data of November and December.

Verifing that the tables are linked and the partition is successfully created using the following query:

warehouse_db=# d+ sales_record

                                     Table "public.sales_record"

    Column    | Type   |              Modifiers               | Storage | Stats target | Description

--------------+---------+--------------------------------------+---------+--------------+-------------

id           | numeric | not null                             | main    |              |

sales_amount | numeric |                                      | main    |              |

sales_date   | date    | not null default ('now'::text)::date | plain   |              |

Indexes:

    "sales_record_pkey" PRIMARY KEY, btree (id)

Child tables: sales_record_m11_to_m12,

              sales_record_m1_to_m2,

              sales_record_m3_to_m4,

              sales_record_m5_to_m6,

              sales_record_m7_to_m8,

              sales_record_m9_to_m10

Creating an index on child tables

Now, we are going to create indexes on child tables to speed up the sales_day field usage, using almost all queries (INSERT, SELECT, and UPDATE) on the date field.

Create the m1_to_m2_sales_date index on the sales_record_m1_to_m2 child table as follows:

warehouse_db=# CREATE INDEX m1_to_m2_sales_date ON sales_record_m1_to_m2

warehouse_db-# (sales_date);

CREATE INDEX

Create the m3_to_m4_sales_date index on the sales_record_m3_to_m4 child table as

follows:

warehouse_db=# CREATE INDEX m3_to_m4_sales_date ON sales_record_m3_to_m4

warehouse_db-# (sales_date);

CREATE INDEX

Now, let’s create the m5_to_m6_sales_date index on the sales_record_m5_to_m6 child table:

warehouse_db=# CREATE INDEX m5_to_m6_sales_date ON sales_record_m5_to_m6

warehouse_db-# (sales_date);

CREATE INDEX

Create the m7_to_m8_sales_date index on the sales_record_m7_to_m8 child table as

follows:

warehouse_db=# CREATE INDEX m7_to_m8_sales_date ON sales_record_m7_to_m8

warehouse_db-# (sales_date);

CREATE INDEX

Then, create the m9_to_m10_sales_date index on the sales_record_m9_to_m10 child table as follows:

warehouse_db=# CREATE INDEX m9_to_m10_sales_date ON sales_record_m9_to_m10

warehouse_db-# (sales_date);

CREATE INDEX

Create the m11_to_m12_sales_date index on the sales_record_m11_to_m12 child table as follows:

warehouse_db=# CREATE INDEX m11_to_m12_sales_date ON

warehouse_db-# sales_record_m11_to_m12 (sales_date);

CREATE INDEX

Creating a trigger on the master table

Conditions must be exactly the same as what the child tables check.

creating a trigger function using the following syntax:

warehouse_db=# CREATE OR REPLACE FUNCTION sales_record_insert()

warehouse_db-# RETURNS TRIGGER AS $$

warehouse_db$# BEGIN

warehouse_db$# IF (NEW.sales_date >= DATE '2015-01-01' AND NEW.sales_date < DATE '2015-03-01') THEN

warehouse_db$# INSERT INTO sales_record_m1_to_m2 VALUES (NEW.*);

warehouse_db$# ELSEIF (NEW.sales_date >= DATE '2015-03-01' AND NEW.sales_date < DATE '2015-05-01') THEN

warehouse_db$# INSERT INTO sales_record_m3_to_m4 VALUES (NEW.*);

warehouse_db$# ELSEIF (NEW.sales_date >= DATE '2015-05-01' AND NEW.sales_date < DATE '2015-07-01') THEN

warehouse_db$# INSERT INTO sales_record_m5_to_m6 VALUES (NEW.*);

warehouse_db$# ELSEIF (NEW.sales_date >= DATE '2015-07-01' AND NEW.sales_date < DATE '2015-09-01') THEN

warehouse_db$# INSERT INTO sales_record_m7_to_m8 VALUES (NEW.*);

warehouse_db$# ELSEIF (NEW.sales_date >= DATE '2015-09-01' AND NEW.sales_date < DATE '2015-11-01') THEN

warehouse_db$# INSERT INTO sales_record_m9_to_m10 VALUES (NEW.*);

warehouse_db$# ELSEIF (NEW.sales_date >= DATE '2015-11-01' AND NEW.sales_date < DATE '2016-01-01') THEN

warehouse_db$# INSERT INTO sales_record_m11_to_m12 VALUES (NEW.*);

warehouse_db$# ELSE

warehouse_db$# RAISE EXCEPTION 'Date is out of range. Something is wrong with sales_record_insert_trigger() function';

warehouse_db$# END IF;

warehouse_db$# RETURN NULL;

warehouse_db$# END;

warehouse_db$# $$LANGUAGE plpgsql;

CREATE FUNCTION

This function will simply populate the data in the respective child table on the basis of the sales_date condition.

Now, the supportive trigger will call the preceding trigger function automatically

whenever a user uses the INSERT, UPDATE, or DELETE operations on data in the master table. Creating the supportive trigger in the following manner:

warehouse_db=# CREATE TRIGGER sales_day_trigger

warehouse_db-# BEFORE INSERT ON sales_record

warehouse_db-# FOR EACH ROW

warehouse_db-# EXECUTE PROCEDURE sales_record_insert();

CREATE TRIGGER

Enabling the constraint exclusion

The final step in implementation of partitioning is enabling the constraint exclusion.Constraint exclusion is backed by the CHECK constraints that we have used in our child table’s CREATE syntax. Do remember that if the constraint exclusion is disabled, then our

queries will not use CHECK constraints, and as a result, every query scan will be done on all the child tables and thus will reduce performance; this is why constraint exclusion to is very important when using partitioned tables.

The steps to enable constraint exclusion are as follows:

1. Open the postgresql.conf file that is present in the data directory on your default installation path. In most cases, it is /usr/local/pgsql/data.

2. Set constraint exclusion on with the following row in postgresql.conf:

constraint_exclusion = on

Alternatively, you can set constraint exclusion to on using the following command on

psql:

warehouse_db=# SET constraint_exclusion = on;

Congratulations! Finally, the master table is available for the DML and DDL operations,and all the INSERT, SELECT and DELETE operations go to the child tables by date.

Performing DML operations on a partition table

When a user inserts a row in the master table, our trigger sales_day_trigger will be triggered, and it will call our sales_record_insert() trigger function, and based on sales_date, the insertion will be made to a specific child table. Now, let’s insert a few records into the sales_record table.We will then perform the SELECT statements to verify that records are properly populated in child tables. This can be done in the following manner:

warehouse_db=# INSERT INTO sales_record

(id, sales_amount, sales_date)

VALUES

(1, 500, TO_DATE('02/12/2015','MM/DD/YYYY'));

INSERT 0 0

warehouse_db=# INSERT INTO sales_record

warehouse_db-# (id, sales_amount, sales_date)

warehouse_db-# VALUES

warehouse_db-# (2, 1500, TO_DATE('03/10/2015','MM/DD/YYYY'));

INSERT 0 0

warehouse_db=# INSERT INTO sales_record

warehouse_db-# (id, sales_amount, sales_date)

warehouse_db-# VALUES

warehouse_db-# (3, 2500, TO_DATE('05/15/2015','MM/DD/YYYY'));

INSERT 0 0

warehouse_db=# INSERT INTO sales_record

warehouse_db-# (id, sales_amount, sales_date)

warehouse_db-# VALUES

warehouse_db-# (4, 2000, TO_DATE('07/25/2015','MM/DD/YYYY'));

INSERT 0 0

warehouse_db=# INSERT INTO sales_record

warehouse_db-# (id, sales_amount, sales_date)

warehouse_db-# VALUES

warehouse_db-# (5, 2200, TO_DATE('09/15/2015','MM/DD/YYYY'));

INSERT 0 0

warehouse_db=# INSERT INTO sales_record

warehouse_db-# (id, sales_amount, sales_date)

warehouse_db-# VALUES

warehouse_db-# (6, 1200, TO_DATE('11/15/2015','MM/DD/YYYY'));

INSERT 0 0

We have inserted six records, and we are now going to perform the SELECT queries on our child tables to verify that our child tables get the right data. Let’s do a SELECT operation on the sales_record_m3_to_m4 table in the following manner:

warehouse_db=# SELECT * FROM sales_record_m3_to_m4;

id | sales_amount | sales_date

----+--------------+------------

2 |         1500 | 2015-03-10

(1 row)

Let’s do a select operation on the sales_record_m9_to_m10 table in the following

manner:

warehouse_db=# SELECT * FROM sales_record_m9_to_m10;

id | sales_amount | sales_date

----+--------------+------------

5 |         2200 | 2015-09-15

(1 row)

As you can see in the preceding result, the sales_record_m9_to_m10 child table contains the record that has sales_date for September and October.

A SELECT operation on the parent/master table will grab data from all child tables. This can be seen using the following statement:

warehouse_db=# SELECT * FROM sales_record;

id | sales_amount | sales_date

----+--------------+------------

1 |          500 | 2015-02-12

2 |         1500 | 2015-03-10

3 |         2500 | 2015-05-15

4 |         2000 | 2015-07-25

5 |         2200 | 2015-09-15

6 |         1200 | 2015-11-15

(6 rows)

Handling the UPDATE and DELETE statements on a partition table

We don’t need any UPDATE or DELETE triggers, the INSERT trigger is self-sufficient to handle the UPDATE or DELETE statement as well. First, let’s perform an UPDATE action and verify the impact on the child table in the following manner:

warehouse_db=# UPDATE sales_record SET sales_date='2015-9-13' WHERE id = 5;

UPDATE 1

The preceding query will update the record of the date 2015-9-15 with id = 5.

Let’s perform a SELECT operation to verify the update of the record in the following

manner:

warehouse_db=# SELECT * FROM sales_record_m9_to_m10;

id | sales_amount | sales_date

----+--------------+------------

5 |         2200 | 2015-09-13

(1 row)

We can now verify that the record is updated and sales_date is changed to a new date,that is, 2015-09-13.

An update on a table that can cause row movement between children requires an UPDATE trigger.

Now, let’s perform a simple DELETE operation and verify the impact on partitioned tables:

warehouse_db=# DELETE FROM sales_record where sales_date = '2015-9-13';

DELETE 1

warehouse_db=# SELECT * FROM sales_record_m9_to_m10;

id | sales_amount | sales_date

----+--------------+------------

(0 rows)

Since the SELECT statement did not return any record, this confirms that we have successfully deleted the record with sales_date = '2015-9-13'.

List partition

List partition is very much similar to range partition. The table is partitioned by explicitly listing which key values appear in each partition. In list partition, each partition is defined and designated based on a column value in one set of value lists, instead of one set of adjoining ranges of values. This will be done by defining each partition by means of the values IN (value_list) syntax, where value_list is a comma-separated list of values.

We will create a master table that will have a sales record along with the city information. The list partition will use the city column as a base to create the child partitions. Let’s create a master table first in the following manner:

warehouse_db=# CREATE TABLE sales_record_listpart

warehouse_db-# (

warehouse_db(# id NUMERIC primary key,

warehouse_db(# sales_date date,

warehouse_db(# sales_amount NUMERIC,

warehouse_db(# city text

warehouse_db(# );

CREATE TABLE

Now, let’s create the child tables on the basis of the city list.

Create the sales_record_list1 table in the following manner:

warehouse_db=# CREATE TABLE sales_record_list1

warehouse_db-# (

warehouse_db(# PRIMARY KEY (id, city),

warehouse_db(# CHECK (city IN ('new york', 'sydney'))

warehouse_db(# )

warehouse_db-# INHERITS (sales_record_listpart);

CREATE TABLE

Now, create the sales_record_list2 table in the following manner:

warehouse_db=# CREATE TABLE sales_record_list2

warehouse_db-# (

warehouse_db(# PRIMARY KEY (id, city),

warehouse_db(# CHECK (city IN ('Islamabad', 'Boston', 'London'))

warehouse_db(# )

warehouse_db-# INHERITS (sales_record_listpart);

CREATE TABLE

Let’s create the index for the sales_record_list1 table:

warehouse_db=# CREATE INDEX list1_index ON sales_record_list1(city);

CREATE INDEX

Let’s create the index for the sales_record_list2 table:

warehouse_db=# CREATE INDEX list2_index ON sales_record_list2 (city);

CREATE INDEX

Now, create the trigger function in the following manner:

warehouse_db=# CREATE OR REPLACE FUNCTION sales_record_list_insert()

warehouse_db-# RETURNS TRIGGER AS $$

warehouse_db$# BEGIN

warehouse_db$# IF (NEW.city IN ('new york', 'sydney')) THEN

warehouse_db$# INSERT INTO sales_record_list1 VALUES (NEW.*);

warehouse_db$# ELSEIF (NEW.city IN ('Islamabad', 'Boston', 'London')) THEN

warehouse_db$# INSERT INTO sales_record_list2 VALUES (NEW.*);

warehouse_db$# ELSE

warehouse_db$# RAISE EXCEPTION 'CITY not present in this lists';

warehouse_db$# END IF;

warehouse_db$# RETURN NULL;

warehouse_db$# END;

warehouse_db$# $$LANGUAGE plpgsql;

CREATE FUNCTION

In the end, we need to create the supporting trigger in the following manner:

warehouse_db=# CREATE TRIGGER sales_day_trigger

warehouse_db-# BEFORE INSERT ON sales_record_listpart

warehouse_db-# FOR EACH ROW

warehouse_db-# EXECUTE PROCEDURE sales_record_list_insert();

CREATE TRIGGER

Verifing that the partition is linked with the master table using the following

command:

warehouse_db=# d+ sales_record_listpart

                    Table "public.sales_record_listpart"

    Column    | Type   | Modifiers | Storage | Stats target | Description

--------------+---------+-----------+----------+--------------+-------------

id           | numeric | not null | main     |             |

sales_date   | date    |           | plain    |              |

sales_amount | numeric |           | main     |              |

city         | text    |           | extended |              |

Indexes:

    "sales_record_listpart_pkey" PRIMARY KEY, btree (id)

Triggers:

    sales_day_trigger BEFORE INSERT ON sales_record_listpart FOR EACH ROW EXECUTE PROCEDURE sales_record_list_insert()

Child tables: sales_record_list1,

           sales_record_list2

Now, let’s do some quick inserts and verify that our list partition is also working how we expect it do so:

warehouse_db=# INSERT INTO sales_record_listpart

warehouse_db-# (id, sales_date, sales_amount, city)

warehouse_db-# VALUES

warehouse_db-# (1,'15-APR-2008',1200,'sydney');

INSERT 0 0

warehouse_db=# INSERT INTO sales_record_listpart

warehouse_db-# (id, sales_date, sales_amount, city)

warehouse_db-# VALUES

warehouse_db-# (2,'15-APR-2008',1500,'Boston');

INSERT 0 0

warehouse_db=# INSERT INTO sales_record_listpart

warehouse_db-# (id, sales_date, sales_amount, city)

warehouse_db-# VALUES

warehouse_db-# (3,'16-APR-2008',1800,'Islamabad');

INSERT 0 0

warehouse_db=# INSERT INTO sales_record_listpart

warehouse_db-# (id, sales_date, sales_amount, city)

warehouse_db-# VALUES

warehouse_db-# (4,'20-APR-2008',1300,'new york');

INSERT 0 0

When you perform the preceding INSERT statements, you will observe that the INSERT query returns the INSERT 0 0 message; this is because the record is inserted in the child tables instead of the master tables.

Perform SELECT on salse_record_list1 to verify that the record is inserted as expected in the following manner:

warehouse_db=# SELECT * FROM sales_record_list1;

id | sales_date | sales_amount |   city

----+------------+--------------+----------

1 | 2008-04-15 |         1200 | sydney

4 | 2008-04-20 |         1300 | new york

(2 rows)

Perform SELECT on sales_record_list2 to verify that the record is inserted as expected in the following manner:

warehouse_db=# SELECT * FROM sales_record_list2;

id | sales_date | sales_amount |   city

----+------------+--------------+-----------

2 | 2008-04-15 |         1500 | Boston

3 | 2008-04-16 |         1800 | Islamabad

(2 rows)

Perform SELECT on sales_record_listpart to verify that the record is inserted as expected in the following manner:

warehouse_db=# SELECT * FROM sales_record_listpart;

id | sales_date | sales_amount |   city

----+------------+--------------+-----------

1 | 2008-04-15 |         1200 | sydney

4 | 2008-04-20 |         1300 | new york

2 | 2008-04-15 |         1500 | Boston

3 | 2008-04-16 |         1800 | Islamabad

(4 rows)

查看全文

相关阅读:
Spark IMF传奇行动第19课：spark排序总结
 Spark IMF传奇行动第18课：RDD持久化、广播、累加器总结
 Spark IMF传奇行动第17课Transformations实战总结
 Spark IMF传奇行动第16课RDD实战总结
 Spark3000门徒第15课RDD创建内幕彻底解密总结
 Spark3000门徒第14课spark RDD解密总结
 Spark3000门徒第13课Spark内核架构解密总结
 Spark3000门徒第12课Spark HA实战总结
 Spark3000门徒第11课彻底解密WordCount运行原理总结
 Spark3000门徒第10课Java开发Spark实战总结

原文地址：https://www.cnblogs.com/songyuejie/p/5091293.html