-->

Wednesday, August 3, 2016

pgBucket beta2 is ready

Hi Everyone,

I would like to inform to you all that, pgBucket beta2[Simple concurrent job scheduler for postgresql] version is ready with more stability.

Thank you all in advance for your inputs/comments/suggestions.

--Dinesh

Monday, July 4, 2016

pgBucket beta version is ready

Hi Everyone,

I would like to inform to you all that, pgBucket[Simple concurrent job scheduler for postgresql] beta version is ready with enhanced architecture and new features.

It would be more great if you could share your inputs and suggestions on this, which will help me to make this tool as stable.

Thank you all in advance.

--Dinesh

Friday, June 3, 2016

pgBucket - A new concurrent job scheduler

Hi All,

I'm so excited to announce about my first contribution tool for postgresql. I have been working with PostgreSQL from 2011 and I'm really impressed with such a nice database.

I started few projects in last 2 years like pgHawk[A beautiful report generator for Openwatch] , pgOwlt [CUI monitoring. It is still under development, incase you are interested to see what it is, attaching the image here for you ],


pgBucket [Which I'm gonna talk about] and learned a lot and lot about PostgreSQL/Linux internals.

Using pgBucket we can schedule jobs easily and we can also maintain them using it's CLI options. We can update/insert/delete jobs at online. And here is its architecture which gives you a basic idea about how it works.


Yeah, I know there are other good job schedulers available for PostgreSQL. I haven't tested them and not comparing them with this, as I implemented it in my way.

Features are:
  • OS/DB jobs
  • Cron style sytax
  • Online job modifications
  • Required cli options

Dependencies:
  • C++11
Here is the link for the source/build instructions, which hopefully helpful for you.

Let me know your inputs/suggestions/comments, which will help me to improve this tool.

Thanks as always.

--Dinesh Kumar



Thursday, July 9, 2015

Parallel Operations With pl/pgSQL


Hi,

I am pretty sure that, there will be a right heading for this post. For now, i am going with this. If you could suggest me proper heading, i will update it :-)

OK. let me explain the situation. Then will let you know what i am trying to do here, and how i did it.

Situation here is,

We have a table, which we need to run update on “R” no.of records. The update query is using some joins to get the desired result, and do update the table. 
 To process these “R” no.of records, it is taking “H” no.of hours. That too, it’s giving load on the production server. So, we planned to run this UPDATE as batch process. 
Per a batch process, we took “N” no.or records. To process this batch UPDATE, it is taking “S” no.of seconds.
With the above batch process, production server is pretty stable, and doing great. So, we planned to run these Batch updates parallel. 
 I mean, “K” sessions, running different record UPDATEs. Of-course, we can also increase the Batch size here. 
But we want to use much cpus to complete all these UPDATES as soon as possible.
Problem here is, So, as i said, we need to run multiple UPDATEs on multiple records in parallel. But, how can one session is going to communicate with other sessions on this batch records.
I mean, If one session is running updates on 1 to 1000, how could the second session knows that the other session was processing from 1 to 1000.
If the second session knows this information, this will start from 1001 to 2000 in parallel. This is the problem i am trying to solve here.

I am not sure whether this is the optimal solution, but as per my requirement it’s working. :-)  Let me know if you see any problems in it.

Object Definitions
                      Table "public.test"
 Column |  Type   |                     Modifiers
--------+---------+----------------------------------------------------
 t      | text    |
 i      | boolean |
 seq    | bigint  | not null default nextval('test_seq_seq'::regclass)

postgres=# INSERT INTO test VALUES(generate_series(1, 9000), false, generate_series(1, 9000));
INSERT 0 9000

postgres=# \ds testing
           List of relations
 Schema |  Name   |   Type   |  Owner
--------+---------+----------+----------
 public | testing | sequence | postgres
(1 row)


CREATE OR REPLACE FUNCTION public.update_test_parallel(batch integer)
 RETURNS void
 LANGUAGE plpgsql
AS $function$
DECLARE
VAR BIGINT;
DUMMY TEXT;
BEGIN

-- Adding this for Demo
--

SELECT pg_sleep(10) INTO DUMMY;

SELECT pg_advisory_lock(-1234) INTO DUMMY;

        SELECT nextval('testing') INTO VAR;
        EXECUTE E'SELECT nextval(\'testing\') FROM generate_series('||VAR||','||VAR+BATCH||')';

        -- We need to decrease the sequence value by one, since we executed nextval expression once
        -- Otherwise, it will affect the other session''s  execution.
        --
        SELECT setval('testing', currval('testing')-1) INTO DUMMY;

SELECT pg_advisory_unlock(-1234) INTO DUMMY;

        -- I want to update the test table of the column "I" with value "true".
        --
UPDATE test SET I=true WHERE SEQ BETWEEN VAR AND (VAR+BATCH);


RAISE NOTICE 'VAR IS %, VAR+BATCH IS %', VAR, (VAR+BATCH);
RAISE NOTICE 'CURRENT SEQ VALUE IS %', currval('testing');

EXCEPTION WHEN OTHERS THEN

        -- If there is an exception, we need to reset the sequence to it''s start position again.
        -- So that, the other sessions, will try with the same sequence numbers.
        --
        SELECT setval('testing', VAR-1) INTO DUMMY;
        SELECT pg_advisory_unlock(-1234) INTO DUMMY;
        RAISE EXCEPTION '%', SQLERRM;
END;
$function$;
Session 1
 
postgres=# SELECT public.update_test_parallel(3000);
NOTICE:  VAR IS 1, VAR+BATCH IS 3001
NOTICE:  CURRENT SEQ VALUE IS 3000
update_test_parallel
----------------------

(1 row)
Session 2
 
postgres=# SELECT public.update_test_parallel(3000);
NOTICE:  VAR IS 3001, VAR+BATCH IS 6001
NOTICE:  CURRENT SEQ VALUE IS 6000
update_test_parallel
----------------------

(1 row)
Session 3
 
postgres=# SELECT public.update_test_parallel(3000);
NOTICE:  VAR IS 6001, VAR+BATCH IS 9001
NOTICE:  CURRENT SEQ VALUE IS 9000
update_test_parallel
----------------------

(1 row)
Desired result
 
postgres=# SELECT COUNT(*) FROM test WHERE i is true;
count
-------
  9000
(1 row)

In the above implementation, i took "sequence" for the session's parallel execution with the help of advisory locks. Hope this helps to others as well.

Thanks as always for reading it, and welcome your inputs.
 --Dinesh Kumar

Tuesday, December 30, 2014

Heterogeneous Database Sync

Hi

As a part of ORACLE to PostgreSQL Migration, I come across to implement a trigger on Oracle, which sync it's data to PostgreSQL. I have tried with
a simple table as below, which is hopefully helpful to others.

Find this link to configure the heterogeneous dblink to postgres.

I believe, the below approach works effectively with the Primary Key tables of Oracle Database.
If we don't have primary key in a table, then the UPDATE,DELETE statements going to fire multiple times in Postgres, which leads performance issues.

ORACLE
CREATE TABLE test(t INT PRIMARY KEY);

CREATE OR REPLACE TRIGGER testref AFTER INSERT OR UPDATE OR DELETE ON test
FOR EACH ROW
DECLARE
 PRAGMA AUTONOMOUS_TRANSACTION;
 C number;
 N number;
BEGIN
 c:=DBMS_HS_PASSTHROUGH.OPEN_CURSOR@pglink;
 IF INSERTING THEN
  DBMS_HS_PASSTHROUGH.PARSE@pglink(c, 'INSERT INTO test VALUES('||:NEW.t||');');
  n:=DBMS_HS_PASSTHROUGH.EXECUTE_NON_QUERY@pglink(c);
 ELSIF DELETING THEN
  DBMS_HS_PASSTHROUGH.PARSE@pglink(c, 'DELETE FROM test WHERE t='||:OLD.t);
  n:=DBMS_HS_PASSTHROUGH.EXECUTE_NON_QUERY@pglink(c);
 ELSIF UPDATING THEN
  DBMS_HS_PASSTHROUGH.PARSE@pglink(c, 'UPDATE test SET t='||:NEW.T||' WHERE t='||:OLD.T);
  n:=DBMS_HS_PASSTHROUGH.EXECUTE_NON_QUERY@pglink(c);
 END IF;
 DBMS_HS_PASSTHROUGH.CLOSE_CURSOR@pglink(c);
COMMIT;
END;
/
SQL> INSERT INTO test VALUES(1);
SQL> INSERT INTO test VALUES(2);
SQL> INSERT INTO test VALUES(3);


SQL> DELETE FROM test WHERE t=2;

SQL> UPDATE test SET t=10 WHERE t=1;
SQL> COMMIT;
Commit complete.

PostgreSQL

CREATE TABLE test(t INT PRIMARY KEY);
postgres=# SELECT * FROM test;
t
----
3
10
(2 rows)

NOTE
Oracle's DG4ODBC don't support the 2 phase commit with PostgreSQL. In the above case, if we issue ROLLBACK in oracle, the data in postgresql will remain, which will lead to inconsistency. To fix this, i believe we have to use Asynchronous data copy by scheduling a job in Oracle as below.

CREATE SEQUENCE test_seq ORDER;
CREATE TABLE test_backup(SEQ INT, EVT VARCHAR(2), t INT, PRIMARY KEY(SEQ, EVT, t));
-- Trigger, which records the test table's information into test_backup.
CREATE OR REPLACE TRIGGER testref AFTER INSERT OR UPDATE OR DELETE ON test
FOR EACH ROW
DECLARE
BEGIN
IF INSERTING THEN
  INSERT INTO test_backup VALUES(test_seq.nextval, 'I', :NEW.t);
ELSIF DELETING THEN
  INSERT INTO test_backup VALUES(test_seq.nextval, 'D', :OLD.t);
ELSIF UPDATING THEN
    INSERT INTO test_backup VALUES(test_seq.nextval, 'UD', :OLD.t);
  INSERT INTO test_backup VALUES(test_seq.nextval, 'UI', :NEW.t);
END IF;
END;
/

-- Procedure, which sync the test_backup table to postgres test table.
create or replace PROCEDURE SYNCORATOPG
IS
REC TEST_BACKUP%ROWTYPE;
C NUMBER;
N NUMBER;
PRAGMA AUTONOMOUS_TRANSACTION;
TYPE SYNCID IS TABLE OF INTEGER;
SYNCNUM SYNCID;
CNT INT:=0;
BEGIN
  
    BEGIN
        SYNCNUM:=SYNCID();
        FOR REC IN (SELECT * FROM test_backup ORDER BY SEQ ASC)
        LOOP
        SYNCNUM.EXTEND;
        CNT:=CNT+1;
        SYNCNUM(CNT):=REC.SEQ;
        c:=DBMS_HS_PASSTHROUGH.OPEN_CURSOR@TOPG;
        IF REC.EVT = 'I' OR REC.EVT='UI' THEN
           DBMS_HS_PASSTHROUGH.PARSE@TOPG(c, 'INSERT INTO test VALUES('||REC.t||');');
        ELSIF REC.EVT = 'D' OR REC.EVT='UD' THEN
           DBMS_HS_PASSTHROUGH.PARSE@TOPG(c, 'DELETE FROM test WHERE t='||REC.T);
        END IF;
      
         n:=DBMS_HS_PASSTHROUGH.EXECUTE_NON_QUERY@TOPG(c);
         DBMS_HS_PASSTHROUGH.CLOSE_CURSOR@TOPG(c);
        END LOOP;
        
        COMMIT;
    END;
    
    BEGIN
      FOR i IN 1..SYNCNUM.COUNT
      LOOP
        DELETE FROM test_backup WHERE SEQ=SYNCNUM(i);
        SYNCNUM.DELETE(i);
      END LOOP;
       COMMIT;
    END;
  EXCEPTION
    WHEN OTHERS THEN
      RAISE_APPLICATION_ERROR(-20000, 'Fatal error.'||SQLERRM);
END;

--Scheduling a JOB in Oracle, which call the above SYNC procedure for every 1 minute.
BEGIN
DBMS_JOB.isubmit (
job => 113,
what => 'BEGIN SYNCORATOPG; END;',
next_date => SYSDATE,
interval => 'SYSDATE + 1/(24*60) /* 1 Minute */');

COMMIT;
END;
/
-- Inserting Sample Data In Oracle
INSERT INTO test VALUES(-1);
INSERT INTO test VALUES(-2);
INSERT INTO test VALUES(-3);

UPDATE test SET t=-20 WHERE t=-2;
DELETE FROM test WHERE t=-1;
INSERT INTO test VALUES(-2);
COMMIT;

-- PostgreSQL
postgres=# SELECT * FROM test;
 t
----
  -3
 -20
  -2
(3 rows)

As usual, welcome your inputs.

 --Dinesh Kumar