Friday, 26 May 2017

PostgreSQL Partitioning Concepts


Partitioning refers to splitting the large table into smaller physical pieces or Chunks. PostgreSQL supports basic table partitioning. Partitioning can provide several benefits: 


  • Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory. When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that partition instead of using an index and random access reads scattered across the whole table.


  • Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE NO INHERIT and DROP TABLE are both far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.


  • Seldom-used data can be migrated to cheaper and slower storage media.

  • Currently, PostgreSQL supports partitioning via table inheritance. Each partition must be created as a child table of a single parent table. The parent table itself is normally empty; it exists just to represent the entire data set. In PostgreSQL, a table can be partitioned in 5 steps 


  • Create master table


  • Create child tables without overlapping table constraints


  • Create indexes


  • Create trigger function to inserting data to child tables


  • Enable constraint exclusion

  • The following forms of partitioning can be implemented in PostgreSQL 


  • Range Partitioning


  • List Partitioning


  • Range Partitioning

  • The table is partitioned into "ranges" defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example one might partition by date ranges, or by ranges of identifiers for particular business objects. 
    Range partitioning can be done for example by ID ranges (like 0-10000, 10001-20000, 20001-30000 etc.) or Date ranges (like 2009-11-01 - 2009-11-30, 2009-12-01 - 2009-12-31 etc.) . 

    In PostgreSQL, a table can be partitioned in 5 steps. 


  • Create master table


  • Create child tables without overlapping table constraints


  • Create indexes


  • Create trigger function to inserting data to child tables


  • Enable constraint exclusion

  • Below example describes how to create range partition in PostgreSQL. 


  • Create a Master table to Store the Transaction Trace Log details

    1. CREATE TABLE log_details    
    2. (    
    3. log_id numeric(20),    
    4. log_module varchar(30),    
    5. log_time timestamp without time zone,    
    6. log_err_code varchar(100),    
    7. log_message text,    
    8. PRIMARY KEY(log_id)    
    9. );  


  • Create child tables, which inherits the master table and adds checks for dates, because we want ensure that we have only right data on each partition.The partition schema to be defined based on the requirement. Here I have created a partion with monthy partition scheme.

    1. CREATE TABLE log_details_y2017_m1  
    2. (  
    3. -- Inherited from table log_details:  log_id number(20),  
    4. -- Inherited from table log_details:  log_module varchar(30),  
    5. -- Inherited from table log_details:  log_time timestamp without timezone,  
    6. -- Inherited from table log_details:  log_err_code varchar(100),  
    7. -- Inherited from table log_details:  log_message text,  
    8.   CONSTRAINT log_details_y2017_m1_pk PRIMARY KEY (log_id),  
    9.   CONSTRAINT log_details_y2017_m1_log_time_check CHECK (log_time >= '2017-01-01 00:00:00'::timestamp without time zone AND log_time < '2017-02-01 00:00:00'::timestamp without time zone)  
    10. )  
    11. INHERITS (log_details);  
    12.   
    13. CREATE TABLE log_details_y2017_m2  
    14. (  
    15. -- Inherited from table log_details:  log_id number(20),  
    16. -- Inherited from table log_details:  log_module varchar(30),  
    17. -- Inherited from table log_details:  log_time timestamp without timezone,  
    18. -- Inherited from table log_details:  log_err_code varchar(100),  
    19. -- Inherited from table log_details:  log_message text,  
    20.   CONSTRAINT log_details_y2017_m1_pk PRIMARY KEY (log_id),  
    21.   CONSTRAINT log_details_y2017_m1_log_time_check CHECK (log_time >= '2017-02-01 00:00:00'::timestamp without time zone AND log_time < '2017-03-01 00:00:00'::timestamp without time zone)  
    22. )  
    23. INHERITS (log_details);  
    24.   
    25. CREATE TABLE log_details_y2017_m3  
    26. (  
    27. -- Inherited from table log_details:  log_id number(20),  
    28. -- Inherited from table log_details:  log_module varchar(30),  
    29. -- Inherited from table log_details:  log_time timestamp without timezone,  
    30. -- Inherited from table log_details:  log_err_code varchar(100),  
    31. -- Inherited from table log_details:  log_message text,  
    32.   CONSTRAINT log_details_y2017_m1_pk PRIMARY KEY (log_id),  
    33.   CONSTRAINT log_details_y2017_m1_log_time_check CHECK (log_time >= '2017-03-01 00:00:00'::timestamp without time zone AND log_time < '2017-04-01 00:00:00'::timestamp without time zone)  
    34. )  
    35. INHERITS (log_details);  


  • Create indexes to child tables to speed up the query on the log_time field.

    1. CREATE INDEX log_details_y2017_m1  
    2.   ON log_details_y2017_m1  
    3.   USING btree  
    4.   (log_time);  
    5.   
    6. CREATE INDEX log_details_y2017_m2  
    7.   ON log_details_y2017_m2  
    8.   USING btree  
    9.   (log_time);  
    10.   
    11. CREATE INDEX log_details_y2017_m3  
    12.   ON log_details_y2017_m3  
    13.   USING btree  
    14.   (log_time);  


  • Create trigger function to master table. Conditions must be exactly the same as what the child tables checks.

    1. CREATE OR REPLACE FUNCTION log_details_partition_fnc()  
    2.   RETURNS trigger AS  
    3. $BODY$  
    4. DECLARE  
    5.     _tablename text;  
    6.     _startdate text;  
    7.     _month text;  
    8.     _year text;  
    9.     text_var1 text:=NULL;  
    10.     text_var2 text:=NULL;  
    11.     text_var3 text:=NULL;  
    12.     createTableQuery text:=NULL;  
    13. BEGIN  
    14.     _startdate := cast(date_trunc('month', NEW.log_time) as date);  
    15.     _month := date_part('month', NEW.log_time)::TEXT;  
    16.     _year := date_part('year', NEW.log_time)::TEXT;  
    17.     _tablename := 'log_details'||'_y'||_year||'_m'||_month;  
    18.   
    19.     --RAISE NOTICE '_tablename is %',_tablename ;  
    20.     -- Check if the partition needed for the current record exists  
    21.     PERFORM 1  
    22.     FROM   pg_catalog.pg_class c  
    23.     JOIN   pg_catalog.pg_namespace n ON n.oid = c.relnamespace  
    24.     WHERE  c.relkind = 'r'  
    25.     AND    c.relname = LOWER(_tablename)  
    26.     AND    n.nspname = 'public';  
    27.     IF NOT FOUND   
    28.     THEN  
    29.         BEGIN     
    30.             RAISE NOTICE 'inside IF';  
    31.       
    32.             EXECUTE 'INSERT INTO log_details_default VALUES ($1.*)' USING NEW;  
    33.   
    34.         EXCEPTION   
    35.             WHEN OTHERS THEN  
    36.                 RAISE NOTICE 'Inside exception';  
    37.                 GET STACKED DIAGNOSTICS text_var1 = MESSAGE_TEXT,  
    38.                     text_var2 = PG_EXCEPTION_DETAIL,  
    39.                     text_var3 = PG_EXCEPTION_HINT;  
    40.                 RAISE NOTICE 'Exception Details MESSAGE_TEXT = % PG_EXCEPTION_DETAIL= % PG_EXCEPTION_HINT= %',text_var1,text_var2,text_var3;  
    41.         END;  
    42.     ELSE  
    43.         BEGIN     
    44.             -- Insert the current record into the correct partition, which we are sure will now exist.  
    45.             EXECUTE 'INSERT INTO ' || _tablename || ' VALUES ($1.*)' USING NEW;  
    46.         END;  
    47.     END IF;  
    48. RETURN NULL;  
    49. END;  
    50. $BODY$  
    51.   LANGUAGE plpgsql VOLATILE  
    52.   COST 100;  


  • Define the Trigger

    1. CREATE TRIGGER log_details_trigger  
    2.   BEFORE INSERT  
    3.   ON log_details  
    4.   FOR EACH ROW  
    5.   EXECUTE PROCEDURE log_details_partition_fnc();  


  • Define the Trigger

    1. CREATE TRIGGER log_details_trigger  
    2.   BEFORE INSERT  
    3.   ON log_details  
    4.   FOR EACH ROW  
    5.   EXECUTE PROCEDURE log_details_partition_fnc();  


  • Make sure that Constraint Exclusion is enabled. Constraint exclusion is driven by CHECK constraints. If constraint exclusion is disabled then query is not using check constraints and every query scans thru whole all child tables. So constraint exclusion is very important when using partitioned tables. This can be set in the postgresql.conf file located in the cluster data directory.

    1. constraint_exclusion = on 

    After this setting is done, reload the configuration by executing the following command.


    1. pg_ctl reload   

    Now if data is inserted, deleted or updated, it is applicable only on the right partition. No data is stored in the master table and all the data is stored in the corresponding child table(partition). 


  • List Partitioning

  • In list partitioning, the table is partitioned by explicitly listing which key values appear in each partition. Example is given below. 

    1. --Master Table  
    2. CREATE TABLE transfer_log (  
    3.     log_id  
    4.     log_time TIMESTAMP WITHOUT TIME ZONE DEFAULT NOW(),  
    5.     country_code char(2),  
    6.     data text,  
    7.     primary key(log_id)  
    8. );  
    9. --Create Index  
    10. CREATE INDEX idx_transfer_log ON action_log USING btree (country_code);  
    11. On next step, create child tables and trigger:  
    12.   
    13. --- Child tables  
    14. CREATE TABLE transfer_log_usa ( CHECK ( country_code = 'USA') ) INHERITS (transfer_log);  
    15. CREATE TABLE transfer_uk ( CHECK ( country_code = 'UK' ) ) INHERITS (transfer_log);  
    16.   
    17. CREATE INDEX idx_transfer_log_usa ON transfer_log_usa USING btree (country_code);  
    18. CREATE INDEX idx_transfer_log_uk ON transfer_log_uk USING btree (country_code);  
    19.   
    20. --- Trigger function  
    21. CREATE OR REPLACE FUNCTION transfer_log_insert_fnc() RETURNS TRIGGER AS $$  
    22. BEGIN  
    23.     IF ( NEW.country_code = 'USA' ) THEN  
    24.         INSERT INTO transfer_log_usa VALUES (NEW.*);  
    25.     ELSIF ( NEW.country_code = 'UK' ) THEN  
    26.         INSERT INTO transfer_log_uk VALUES (NEW.*);  
    27.     ELSE  
    28.         RAISE EXCEPTION 'Invalid Country code';  
    29.     END IF;  
    30.   
    31.     RETURN NULL;  
    32. END;  
    33. $$ LANGUAGE plpgsql;  
    34.   
    35. --- Attach trigger function to table  
    36. CREATE TRIGGER transfer_log_insert_trg  
    37.     BEFORE INSERT ON transfer_log  
    38.     FOR EACH ROW EXECUTE PROCEDURE transfer_log_insert_fnc();  

    At this point, you can insert some data into transfer_log table. You will see what your data is automaticaly redirected to correponding child table and this is totally transparent for user. 






    No comments:

    Post a Comment