Skip to main content

Seq scan vs Index Scan

Hi,

Today, let me discuss some thing about PG Optimizer while taking a better execution plan.

Seq Scan (vs) Index Scan
----------------------------------

Both scans having its own pros and cons. Sometimes, seq scan will give you the better execution time and sometimes Index scan.

Let's find out..

Step 1:-

postgres=# CREATE TABLE TEST(T INT PRIMARY KEY);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
CREATE TABLE

postgres=# INSERT INTO TEST VALUES(GENERATE_SERIES(1,100));
INSERT 0 100

Step 2:-

Check the no.of pages occupied by the table "test" from pg_class.

postgres=# ANALYZE TEST;
ANALYZE
postgres=# SELECT RELNAME,RELPAGES FROM PG_CLASS WHERE relname like 'test';
relname | relpages
---------+----------
test | 1


Step 3:-

Find the below execution plans.

postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=10;
QUERY PLAN
----------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..2.25 rows=1 width=4) (actual time=0.032..0.063 rows=1 loops=1)
Filter: (t = 10)
Total runtime: 0.110 ms

postgres=# show seq_page_cost ;
seq_page_cost
---------------
1
(1 row)

postgres=# show random_page_cost ;
random_page_cost
------------------
4
(1 row)

Let's make it to take Index scan.

postgres=# set seq_page_cost to 4;
SET
postgres=# set random_page_cost to 1;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Index Scan using test_pkey on test (cost=0.00..2.27 rows=1 width=4) (actual time=0.049..0.053 rows=1 loops=1)
Index Cond: (t = 10)
Total runtime: 0.113 ms

In this case, we need index_scan as well heap_scan(Fetching data from the pages). That's why we have a bit overhead. However, in sequential scan, we don't have any overhead.

So, depends on no. Of pages occupied by the table and the page costs,the optimizer will take the better execution plans.

Step 4:

Let's increase no.of pages for the table (by inserting some data).

postgres=# INSERT INTO TEST VALUES(generate_series(1,1000));
INSERT 0 1000
postgres=# ANALYZE TEST;
ANALYZE

postgres=# ANALYZE TEST;
ANALYZE
postgres=# SELECT RELNAME,RELPAGES FROM PG_CLASS WHERE relname like 'test';
relname | relpages
---------+----------
test | 4

postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Index Scan using test_pkey on test (cost=0.00..8.27 rows=1 width=4) (actual time=0.019..0.023 rows=1 loops=1)
Index Cond: (t = 10)
Total runtime: 0.078 ms

postgres=# show random_page_cost ;
random_page_cost
------------------
4
(1 row)

postgres=# show seq_page_cost ;
seq_page_cost
---------------
1
(1 row)

Let's make this to use sequentail scan.

postgres=# set random_page_cost to 10;
SET
postgres=# set seq_page_cost to 1;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=10;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..16.50 rows=1 width=4) (actual time=0.022..0.301 rows=1 loops=1)
Filter: (t = 10)
Total runtime: 0.337 ms
(3 rows)


Let's find out another way of sequential scan and index scans.

Step 1

> drop table test;

> CREATE TABLE TEST(T INT);

> INSERT INTO TEST SELECT 2 FROM GENERATE_SERIES(1,1);

> INSERT INTO TEST SELECT 1 FROM GENERATE_SERIES(1,10000);

> CREATE INDEX IDX ON TEST(T);

Step 2

Let's find out the most_comm_frequencies from pg_stats.

postgres=# SELECT * FROM PG_STATS WHERE TABLENAME='test' and schemaname='public';

postgres=# SELECT * FROM PG_STATS WHERE TABLENAME='test' and schemaname='public';
-[ RECORD 1 ]-----+-----------
schemaname | public
tablename | test
attname | t
inherited | f
null_frac | 0
avg_width | 4
n_distinct | 2
most_common_vals | {1}
most_common_freqs | {0.999833}
histogram_bounds |
correlation | 0.999001

Step 3

postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..181.64 rows=11009 width=4) (actual time=0.215..94.610 rows=10000 loops=1)
Filter: (t = 1)
Total runtime: 180.054 ms

Here Seq Scan, is the best plan. Because, 99% of the table entries having "1".

Let's make it to use index scan which leads more over head.

postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Scan using idx on test (cost=0.00..17393.65 rows=11009 width=4) (actual time=0.210..264.606 rows=10000 loops=1)
Index Cond: (t = 1)
Total runtime: 388.713 ms

See how much difference is here when compare to Seq Scan. Almost we spend 3 times of actual execution.


Step 4

postgres=# set seq_page_cost to 1;
SET
postgres=# set random_page_cost to 4;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=2;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Index Scan using idx on test (cost=0.00..8.29 rows=2 width=4) (actual time=1.567..1.572 rows=1 loops=1)
Index Cond: (t = 2)
Total runtime: 1.853 ms
(3 rows)


In this case, this plan is good. Because, we have only one entry in the table with the value 2.

Let's make this to use Seq Scan, which leads to take more time.

postgres=# set random_page_cost to 400; ("Made this to big value for picking the seq scan, instead of index scan).
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM TEST WHERE T=2;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..181.64 rows=2 width=4) (actual time=0.447..32.853 rows=1 loops=1)
Filter: (t = 2)
Total runtime: 32.995 ms

See the difference, we spend nearly 32 times of prior execution.

So, as per out test cases Optimizer is always including "seq_page_cost","random_page_cost","no.of pages" and "most_ common_ values'" for making better a execution plan for the given query.

Will keep on posting on Optimizer ....

-Dinesh

Comments

  1. his brilliant site really has the entirety of the data I needed concerning this subject and didn't have a clue who to inquire.
    best interiors

    ReplyDelete

Post a Comment

Popular posts from this blog

Parallel Operations With pl/pgSQL

Hi, I am pretty sure that, there will be a right heading for this post. For now, i am going with this. If you could suggest me proper heading, i will update it :-) OK. let me explain the situation. Then will let you know what i am trying to do here, and how i did it. Situation here is, We have a table, which we need to run update on “R” no.of records. The update query is using some joins to get the desired result, and do update the table.  To process these “R” no.of records, it is taking “H” no.of hours. That too, it’s giving load on the production server. So, we planned to run this UPDATE as batch process.  Per a batch process, we took “N” no.or records. To process this batch UPDATE, it is taking “S” no.of seconds. With the above batch process, production server is pretty stable, and doing great. So, we planned to run these Batch updates parallel.  I mean, “K” sessions, running different record UPDATEs. Of-course, we can also increase the Batch size here.  But

How To Send E-Mail From PostgreSQL

Hi , If you want to send E-Mails from PostgreSQL, then use the below Python 3.2 Script as below. I have used ActivePython 3.2 with PostgreSQL 9.1 for sending E-Mails from PostgreSQL. If you want to configure the Python 3.2 with PostgreSQL 9.1 then, please refer the below steps. http://manojadinesh.blogspot.in/2012/06/fatal-python-error-pyinitialize-unable.html Once, your Python 3.2 successful then follow the below steps to send an e-mail. Step 1 ===== postgres=# CREATE OR REPLACE FUNCTION public.send_email(_from Text,_password Text,smtp Text,port INT,receiver text, subject text, send_message text) RETURNS TEXT  LANGUAGE plpython3u AS $function$ import smtplib sender = _from receivers = receiver message = ("From: %s\nTo: %s\nSubject: %s\n\n %s"  % (_from,receiver,subject,send_message)) try:   smtpObj = smtplib.SMTP(smtp,port)   smtpObj.starttls()   smtpObj.login(_from, _password)   smtpObj.sendmail(sender, receivers,message)   print ('Successf

::Pipelined in Oracle as well in PostgreSQL::

Pipelined Table Functions:- [ORACLE] =========================== If you want to return multiple rows to the calling environment, then piplined table functions is prefred. It will increase the dbperformance as well. Ex:- Step 1: ----------- CREATE TABLE EMP(EMPNO INT,ENAME VARCHAR2(10),SAL INT); Step 2: ----------- Insert sample data. Step 3: ----------- Create an object for the row type casting. CREATE OR REPLACE TYPE emp_row AS OBJECT ( empno INT, ename VARCHAR2(20), SAL INT ); Step 4: ----------- Create a Return Type for the pipelined function. CREATE OR REPLACE TYPE emp_table_type AS TABLE OF emp_row; Step 5: ----------- CREATE OR REPLACE FUNCTION emp_pipe_function RETURN emp_table_type PIPELINED IS BEGIN FOR rec in (select * from emp) LOOP PIPE ROW (emp_row(rec.empno,rec.ename,rec.sal)); END LOOP; RETURN; END; Step 6: ---------- SQL> select * from table(emp_pipe_function); EMPNO ENAME SAL ---------- ----