Technical Discussions: May 2017

Partitioning refers to splitting the large table into smaller physical pieces or Chunks. PostgreSQL supports basic table partitioning. Partitioning can provide several benefits:

Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory. When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that partition instead of using an index and random access reads scattered across the whole table.

Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE NO INHERIT and DROP TABLE are both far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.

Seldom-used data can be migrated to cheaper and slower storage media.

Currently, PostgreSQL supports partitioning via table inheritance. Each partition must be created as a child table of a single parent table. The parent table itself is normally empty; it exists just to represent the entire data set. In PostgreSQL, a table can be partitioned in 5 steps

Create master table

Create child tables without overlapping table constraints

Create indexes

Create trigger function to inserting data to child tables

Enable constraint exclusion

The following forms of partitioning can be implemented in PostgreSQL

Range Partitioning

List Partitioning

Range Partitioning

The table is partitioned into "ranges" defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example one might partition by date ranges, or by ranges of identifiers for particular business objects.
Range partitioning can be done for example by ID ranges (like 0-10000, 10001-20000, 20001-30000 etc.) or Date ranges (like 2009-11-01 - 2009-11-30, 2009-12-01 - 2009-12-31 etc.) .

In PostgreSQL, a table can be partitioned in 5 steps.

Create master table

Create child tables without overlapping table constraints

Create indexes

Create trigger function to inserting data to child tables

Enable constraint exclusion

Below example describes how to create range partition in PostgreSQL.

Create a Master table to Store the Transaction Trace Log details

CREATE TABLE log_details
(
log_id numeric(20),
log_module varchar(30),
log_time timestamp without time zone,
log_err_code varchar(100),
log_message text,
PRIMARY KEY(log_id)
);

Create child tables, which inherits the master table and adds checks for dates, because we want ensure that we have only right data on each partition.The partition schema to be defined based on the requirement. Here I have created a partion with monthy partition scheme.

CREATE TABLE log_details_y2017_m1
(
-- Inherited from table log_details: log_id number(20),
-- Inherited from table log_details: log_module varchar(30),
-- Inherited from table log_details: log_time timestamp without timezone,
-- Inherited from table log_details: log_err_code varchar(100),
-- Inherited from table log_details: log_message text,
CONSTRAINT log_details_y2017_m1_pk PRIMARY KEY (log_id),
CONSTRAINT log_details_y2017_m1_log_time_check CHECK (log_time >= '2017-01-01 00:00:00'::timestamp without time zone AND log_time < '2017-02-01 00:00:00'::timestamp without time zone)
)
INHERITS (log_details);
CREATE TABLE log_details_y2017_m2
(
-- Inherited from table log_details: log_id number(20),
-- Inherited from table log_details: log_module varchar(30),
-- Inherited from table log_details: log_time timestamp without timezone,
-- Inherited from table log_details: log_err_code varchar(100),
-- Inherited from table log_details: log_message text,
CONSTRAINT log_details_y2017_m1_pk PRIMARY KEY (log_id),
CONSTRAINT log_details_y2017_m1_log_time_check CHECK (log_time >= '2017-02-01 00:00:00'::timestamp without time zone AND log_time < '2017-03-01 00:00:00'::timestamp without time zone)
)
INHERITS (log_details);
CREATE TABLE log_details_y2017_m3
(
-- Inherited from table log_details: log_id number(20),
-- Inherited from table log_details: log_module varchar(30),
-- Inherited from table log_details: log_time timestamp without timezone,
-- Inherited from table log_details: log_err_code varchar(100),
-- Inherited from table log_details: log_message text,
CONSTRAINT log_details_y2017_m1_pk PRIMARY KEY (log_id),
CONSTRAINT log_details_y2017_m1_log_time_check CHECK (log_time >= '2017-03-01 00:00:00'::timestamp without time zone AND log_time < '2017-04-01 00:00:00'::timestamp without time zone)
)
INHERITS (log_details);

Create indexes to child tables to speed up the query on the log_time field.

CREATE INDEX log_details_y2017_m1
ON log_details_y2017_m1
USING btree
(log_time);
CREATE INDEX log_details_y2017_m2
ON log_details_y2017_m2
USING btree
(log_time);
CREATE INDEX log_details_y2017_m3
ON log_details_y2017_m3
USING btree
(log_time);

Create trigger function to master table. Conditions must be exactly the same as what the child tables checks.

CREATE OR REPLACE FUNCTION log_details_partition_fnc()
RETURNS trigger AS
$BODY$
DECLARE
_tablename text;
_startdate text;
_month text;
_year text;
text_var1 text:=NULL;
text_var2 text:=NULL;
text_var3 text:=NULL;
createTableQuery text:=NULL;
BEGIN
_startdate := cast(date_trunc('month', NEW.log_time) as date);
_month := date_part('month', NEW.log_time)::TEXT;
_year := date_part('year', NEW.log_time)::TEXT;
_tablename := 'log_details'||'_y'||_year||'_m'||_month;
--RAISE NOTICE '_tablename is %',_tablename ;
-- Check if the partition needed for the current record exists
PERFORM 1
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
AND c.relname = LOWER(_tablename)
AND n.nspname = 'public';
IF NOT FOUND
THEN
BEGIN
RAISE NOTICE 'inside IF';
EXECUTE 'INSERT INTO log_details_default VALUES ($1.*)' USING NEW;
EXCEPTION
WHEN OTHERS THEN
RAISE NOTICE 'Inside exception';
GET STACKED DIAGNOSTICS text_var1 = MESSAGE_TEXT,
text_var2 = PG_EXCEPTION_DETAIL,
text_var3 = PG_EXCEPTION_HINT;
RAISE NOTICE 'Exception Details MESSAGE_TEXT = % PG_EXCEPTION_DETAIL= % PG_EXCEPTION_HINT= %',text_var1,text_var2,text_var3;
END;
ELSE
BEGIN
-- Insert the current record into the correct partition, which we are sure will now exist.
EXECUTE 'INSERT INTO ' || _tablename || ' VALUES ($1.*)' USING NEW;
END;
END IF;
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;

Define the Trigger

CREATE TRIGGER log_details_trigger
BEFORE INSERT
ON log_details
FOR EACH ROW
EXECUTE PROCEDURE log_details_partition_fnc();

Define the Trigger

CREATE TRIGGER log_details_trigger
BEFORE INSERT
ON log_details
FOR EACH ROW
EXECUTE PROCEDURE log_details_partition_fnc();

Make sure that Constraint Exclusion is enabled. Constraint exclusion is driven by CHECK constraints. If constraint exclusion is disabled then query is not using check constraints and every query scans thru whole all child tables. So constraint exclusion is very important when using partitioned tables. This can be set in the postgresql.conf file located in the cluster data directory.

constraint_exclusion = on

After this setting is done, reload the configuration by executing the following command.

pg_ctl reload

Now if data is inserted, deleted or updated, it is applicable only on the right partition. No data is stored in the master table and all the data is stored in the corresponding child table(partition).

List Partitioning

In list partitioning, the table is partitioned by explicitly listing which key values appear in each partition. Example is given below.

--Master Table
CREATE TABLE transfer_log (
log_id
log_time TIMESTAMP WITHOUT TIME ZONE DEFAULT NOW(),
country_code char(2),
data text,
primary key(log_id)
);
--Create Index
CREATE INDEX idx_transfer_log ON action_log USING btree (country_code);
On next step, create child tables and trigger:
--- Child tables
CREATE TABLE transfer_log_usa ( CHECK ( country_code = 'USA') ) INHERITS (transfer_log);
CREATE TABLE transfer_uk ( CHECK ( country_code = 'UK' ) ) INHERITS (transfer_log);
CREATE INDEX idx_transfer_log_usa ON transfer_log_usa USING btree (country_code);
CREATE INDEX idx_transfer_log_uk ON transfer_log_uk USING btree (country_code);
--- Trigger function
CREATE OR REPLACE FUNCTION transfer_log_insert_fnc() RETURNS TRIGGER AS $$
BEGIN
IF ( NEW.country_code = 'USA' ) THEN
INSERT INTO transfer_log_usa VALUES (NEW.*);
ELSIF ( NEW.country_code = 'UK' ) THEN
INSERT INTO transfer_log_uk VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'Invalid Country code';
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
--- Attach trigger function to table
CREATE TRIGGER transfer_log_insert_trg
BEFORE INSERT ON transfer_log
FOR EACH ROW EXECUTE PROCEDURE transfer_log_insert_fnc();

At this point, you can insert some data into transfer_log table. You will see what your data is automaticaly redirected to correponding child table and this is totally transparent for user.

Technical Discussions

Wednesday, 31 May 2017

Unix Tips and Tricks: Script to Generate Random Values

Friday, 26 May 2017

Query to Find the List of Table names and the Record Count

PostgreSQL Partitioning Concepts