PostgreSQL DBA : Indexes in PostgreSQL

Indexes in PostgreSQL

PostgreSQL: Script to find the unused and duplicate index

As I am preparing important scripts for PostgreSQL DBA, so here I am also sharing one more important script to find the unused and duplicate index in PostgreSQL.

The management and maintenance of database index is a day to day exercise for a Database Administrator, and unused index can create a performance issues for the whole database system.

At every insert and update, the data of an index are also changing, and it requires some IO operations. Better to find unused index and drop it.

Sometimes, I found that duplicate indexes on the same table, e.g. same table, same columns, same order of columns and created with a different name. Internally this will also impact to our database performance.

I am sharing two different scripts for finding the unused and duplicate index in PostgreSQL.

Script to find the unused indexes in PostgreSQL:

SELECT

PSUI.indexrelid::regclass AS IndexName

,PSUI.relid::regclass AS TableName

FROM pg_stat_user_indexes AS PSUI

JOIN pg_index AS PI

ON PSUI.IndexRelid = PI.IndexRelid

WHERE PSUI.idx_scan = 0

AND PI.indisunique IS FALSE;

Script to find the duplicate indexes in PostgreSQL:

SELECT

indrelid::regclass AS TableName

,array_agg(indexrelid::regclass) AS Indexes

FROM pg_index

GROUP BY

indrelid

,indkey

HAVING COUNT(*) > 1;

PostgreSQL: Script to find a Missing Indexes of the schema

In this post, I am sharing a script to find missing indexes in the schema of PostgreSQL.

The full table scanning is always creating a performance issue for any database.

As a Database Professional, you are adding the database indexes on a table but periodically the volume of data is changing so we need to analyze the old indexes, or we should find missing indexes which may require for the better query performance.

On the other hand, Database Administrator may also require a report on missing indexes which they can share with developers and users so that they can modify indexes accordingly.

SELECT

relname AS TableName

,seq_scan-idx_scan AS TotalSeqScan

,CASE WHEN seq_scan-idx_scan > 0

THEN 'Missing Index Found'

ELSE 'Missing Index Not Found'

END AS MissingIndex

,pg_size_pretty(pg_relation_size(relname::regclass)) AS TableSize

,idx_scan AS TotalIndexScan

FROM pg_stat_all_tables

WHERE schemaname='public'

AND pg_relation_size(relname::regclass)>100000

ORDER BY 2 DESC;

PostgreSQL: How we can create Index on Expression?

If you are searching about, What is Expression Index, this is the one of the right article for you.

We know about the different types of PostgreSQL Index.

If you don’t about this, please visit below few links.

What is BRIN (Block Range Index )?

What is Partial Index?

What is BTree Index?

Most of the Database Administrators or Developers are doing a common mistake,

When you create any Index on Table Column, dose not mean that you can use the Indexed column in any expression and Index will work as per the expectation.

When you used Indexed column in any kind of expression, query planner simply skips the scanning of Indexes of that column.

For example,

We have created one index on Date column and in WHERE are extracting days from this Date column.

In this situation, the Index will not work on Date column and for such specific requirement, we have to create the Expression Index of PostgreSQL.

Let me demonstrate this.

First create one sample table:

CREATE TABLE tbl_ItemTransactions

(

TranID SERIAL

,TransactionDate TIMESTAMPTZ

,TransactionName TEXT

);

Generate sample data for testing the performance of Indexes:

INSERT INTO tbl_ItemTransactions

(TransactionDate, TransactionName)

SELECT x, 'dbrnd'

FROM generate_series('2015-01-01 00:00:00'::timestamptz, '2016-08-01 00:00:00'::timestamptz,'2 seconds'::interval) a(x);

Total inserted record count is 24969601:

SELECT COUNT(1) FROM tbl_ItemTransactions;

Now create index on TransactionDate column:

CREATE INDEX idx_tbl_ItemTransactions_TransactionDate

ON tbl_ItemTransactions (TransactionDate);

Lets see the plan of query with date filter:

EXPLAIN ANALYZE

SELECT *FROM

tbl_ItemTransactions

WHERE TransactionDate BETWEEN '20150808' AND '20160108';

"Index Scan using idx_tbl_itemtransactions_transactiondate on tbl_itemtransactions

(cost=0.44..249472.54 rows=6681405 width=18)

(actual time=61.558..17765.979 rows=6609601 loops=1)"

"Index Cond: ((transactiondate >= '2015-08-08 00:00:00+05:30'::timestamp with time zone) AND

(transactiondate <= '2016-01-08 00:00:00+05:30'::timestamp with time zone))"

"Planning time: 87.751 ms"

"Execution time: 29691.279 ms"

Check Index usage by above query

(Result is one index scan):

SELECT *

FROM pg_stat_user_indexes

WHERE indexrelname='idx_tbl_itemtransactions_transactiondate';

Lets see the plan of query with date function filter:

EXPLAIN ANALYZE

SELECT *FROM

tbl_ItemTransactions

WHERE EXTRACT(day FROM TransactionDate) = 8;

"Seq Scan on tbl_itemtransactions

(cost=0.00..533587.00 rows=124848 width=18)

(actual time=1246.093..37028.883 rows=820800 loops=1)"

" Filter: (date_part('day'::text, transactiondate) = '8'::double precision)"

" Rows Removed by Filter: 24148801"

"Planning time: 0.396 ms"

"Execution time: 58230.847 ms"

Check Index usage by above query:

The Result is one index scan, means above query processed without scaning any index.

We created one index on TransactionDate, but when we use this column with any of default function, planner skips the index fot that column.

SELECT *

FROM pg_stat_user_indexes

WHERE indexrelname='idx_tbl_itemtransactions_transactiondate';

Now we should create one expression Index:

Please provide your require timezone otherwise It produces error like:

ERROR: functions in index expression must be marked IMMUTABLE

CREATE INDEX idx_tbl_ItemTransactions_TransactionDate_day

ON tbl_ItemTransactions ((EXTRACT(day FROM TransactionDate AT TIME ZONE 'UTC')));

Lets execute same query with date function filter:

EXPLAIN ANALYZE

SELECT *FROM

tbl_ItemTransactions

WHERE EXTRACT(day FROM TransactionDate AT TIME ZONE 'UTC') = 8;

"Bitmap Heap Scan on tbl_itemtransactions (cost=2340.01..160893.01 rows=124848 width=18)

(actual time=367.653..2181.489 rows=820800 loops=1)"

" Recheck Cond: (date_part('day'::text,

(transactiondate)::timestamp without time zone) = '8'::double precision)"

" Heap Blocks: exact=5248"

" -> Bitmap Index Scan on idx_tbl_itemtransactions_transactiondate_day

(cost=0.00..2308.80 rows=124848 width=0)

(actual time=365.386..365.386 rows=820800 loops=1)"

" Index Cond: (date_part('day'::text,

(transactiondate)::timestamp without time zone) = '8'::double precision)"

"Planning time: 0.297 ms"

"Execution time: 3608.713 ms"

Check created new Expression Index usage by above query

(Result is one index scan):

SELECT *

FROM pg_stat_user_indexes

WHERE indexrelname='idx_tbl_itemtransactions_transactiondate_day';

PostgreSQL 9.5: Introduced BRIN – Block Range Index with Performance Report

PostgreSQL 9.5 introduced the powerful BRIN Index, which is performance much faster than the regular BTREE Index.

The most important two lines of the BRIN are: It stores only minimum and maximum value per block so it does not require more space. For extremely large table It runs faster than any other Indexes.

In this post, I am going to show the example of BRIN index with the full performance report (testing over the 6gb of Table Data).

Below are steps:

First create one sample table:

CREATE TABLE tbl_ItemTransactions

(

TranID SERIAL

,TransactionDate TIMESTAMPTZ

,TransactionName TEXT

);

Insert Millions of data to test the performance of BRIN Index:

INSERT INTO tbl_ItemTransactions

(TransactionDate, TransactionName)

SELECT x, 'dbrnd'

FROM generate_series('2008-01-01 00:00:00'::timestamptz, '2016-08-01 00:00:00'::timestamptz,'2 seconds'::interval) a(x);

Check the total size of table:

SELECT pg_size_pretty(pg_total_relation_size('tbl_ItemTransactions')) AS TableSize;

TableSize

------------------

6741 MB

Now Check the performance without any Index:

EXPLAIN ANALYSE

SELECT COUNT(1) FROM tbl_ItemTransactions

WHERE TransactionDate BETWEEN '2012-01-01 00:00:00' and '2014-08-08 08:08:08';

--Result:

QueryPlan

-------------------------------------------------------------------------

Aggregate (cost=2997896.81..2997896.82 rows=1 width=0)

(actual time=40651.793..40651.793 rows=1 loops=1)

-> Seq Scan on tbl_itemtransactions (cost=0.00..2894105.00 rows=41516724 width=0)

(actual time=0.009..38726.686 rows=41054645 loops=1)

Filter: ((transactiondate >= '2012-01-01 00:00:00+05:30'::timestamp with time zone)

AND (transactiondate <= '2014-08-08 08:08:08+05:30'::timestamp with time zone))

Rows Removed by Filter: 94377356

Planning time: 0.860 ms

Execution time: 80651.837 ms

Create BRIN index on TransactionDate Column:

CREATE INDEX idx_tbl_ItemTransactions_TransactionDate

ON tbl_ItemTransactions

USING BRIN (TransactionDate);

Now Check the performance of the same query which has BRIN index:

EXPLAIN ANALYSE

SELECT COUNT(1) FROM tbl_ItemTransactions

WHERE TransactionDate BETWEEN '2012-01-01 00:00:00' and '2014-08-08 08:08:08';

QueryPlan

---------------------------------------------------------------------------

Aggregate (cost=2014834.09..2014834.10 rows=1 width=0)

(actual time=7108.998..7108.998 rows=1 loops=1)

-> Bitmap Heap Scan on tbl_itemtransactions (cost=425666.42..1911042.28 rows=41516724 width=0)

(actual time=16.995..5415.086 rows=41054645 loops=1)

Recheck Cond: ((transactiondate >= '2012-01-01 00:00:00+05:30'::timestamp with time zone)

AND (transactiondate <= '2014-08-08 08:08:08+05:30'::timestamp with time zone))

Rows Removed by Index Recheck: 21579

Heap Blocks: lossy=261632

-> Bitmap Index Scan on idx_tbl_itemtransactions_transactiondate

(cost=0.00..415287.24 rows=41516724 width=0) (actual time=15.547..15.547 rows=2616320 loops=1)

Index Cond: ((transactiondate >= '2012-01-01 00:00:00+05:30'::timestamp with time zone)

AND (transactiondate <= '2014-08-08 08:08:08+05:30'::timestamp with time zone))

Planning time: 0.059 ms

Execution time: 7109.060 ms

Now, you can see the difference between the result of the above two queries.

With the BRIN index same query took only 7 seconds and without BRIN it took around 80 seconds.

Create Partial BRIN index on TransactionDate Column:

You can also create Partial BRIN index for your individual range of data. The Partial BRIN index is also faster than normal BRIN index, but we should apply proper filter based on created Partial BRIN index.

CREATE INDEX idx_tbl_ItemTransactions_TransactionDate_2012

ON tbl_ItemTransactions

USING BRIN (TransactionDate)

WHERE TransactionDate BETWEEN '2012-01-01' AND '2012-12-31';

CREATE INDEX idx_tbl_ItemTransactions_TransactionDate_2013

ON tbl_ItemTransactions

USING BRIN (TransactionDate)

WHERE TransactionDate BETWEEN '2013-01-01' AND '2013-12-31';

PostgreSQL DBA

Wednesday, 14 August 2024

Indexes in PostgreSQL

No comments:

Post a Comment

Master and Slave - Sync check - PostgreSQL