Wednesday, 14 August 2024

Tuning Input/Output (I/O) Operations for PostgreSQL

 

Tuning PostgreSQL I/O:

Tuning PostgreSQL I/O is imperative for building a highly performant and scalable database architecture. Let us look at various factors impacting I/O performance:

Indexing
Partitioning
Checkpoints
VACUUM, ANALYZE (with FILLFACTOR)
Other I/O problems
PostgreSQL I/O on Cloud
Tools

Indexing
Indexing is one of the core tuning techniques and plays an imperative role in improving database I/O performance. This applies to any database, really. PostgreSQL supports various index types which can speed up read operations to a great extent, yielding enhanced scalability for applications. Whilst creating indexes is fairly simple and straightforward, it is essential for DBAs and developers to know which type of index to choose, and on which columns. The latter depends on various factors like query complexity, data type, data cardinality, volume of writes, data size, disk architecture and infrastructure (public cloud, private cloud or on-premises).

Whilst indexing can dramatically improve query read performance, it can also slow down the writes hitting the indexed columns. Let us look at an example:

Impact of Indexes on READ operations
A table called emp with around 1 million rows.

READ Performance without an Index
postgres=# select * from emp where eid=10;

 eid | ename | peid | did |    doj
-----+-------+------+-----+------------
  10 | emp   |      |   1 | 2018-06-06
(1 row)
 
Time: 70.020 ms => took about 70 milliseconds to respond with one row
READ Performance with an Index
Let us put an index on the eid column and see the difference:

postgres=# create index indx001 on emp ( eid );
CREATE INDEX

postgres=# select * from emp where eid=10;

 eid | ename | peid | did |    doj
-----+-------+------+-----+------------
  10 | emp   |      |   1 | 2018-06-06
(1 row)
 
Time: 0.454 ms => about 0.4 milliseconds! That is a huge difference, isn't it?
So, indexing is important.

Impact of Indexes on WRITE operations
Indexes slow down the performance of writes. Whilst indexes have an impact on all types of write operations, let us look at some analysis of the impact of indexes on INSERTs.

Inserting 1 million rows into a Table without indexes
postgres=# do $$
postgres$# declare
postgres$# i integer;
postgres$# begin
postgres$# for i in 1..1000000 loop
postgres$# insert into emp values (i,'emp',null,1,current_date);
postgres$# end loop;
postgres$# end $$;
DO

Time: 4818.470 ms (00:04.818) => Takes about 4.8 seconds
Inserting the same 1 million rows with an Index
Let us create an Index first

postgres=# create index indx001 on emp ( eid );
CREATE INDEX

postgres=# do $$
postgres$# declare
postgres$# i integer;
postgres$# begin
postgres$# for i in 1..1000000 loop
postgres$# insert into emp values (i,'emp',null,1,current_date);
postgres$# end loop;
postgres$# end $$;
DO

Time: 7825.494 ms (00:07.825) =>  Takes about 7.8 seconds
So, as we can observe, the INSERT time increased by more than 60% with just one index, and it can take much longer to finish when there are multiple indexes. It can get even worse with function-based indexes. That is what DBAs have to live with! Indexes reduce write performance. There are ways to tackle this problem, though they are disk-architecture dependent. If the database server is using multiple disk file systems, then the indexes and tables can be placed across multiple tablespaces sitting on multiple disk file systems. In this way, better I/O performance can be achieved.

Index management TIPS
Understand the need for indexes. Intelligent indexing is key.
Avoid creating too many indexes, and definitely no unnecessary indexes; these can really degrade write performance.
Monitor the usage of indexes and drop any unused indexes (see the sample query after this list).
When indexed columns are subjected to data changes, indexes get bloated as well. So, regularly reorganize indexes.
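For the monitoring tip above, a minimal sketch of a query against the pg_stat_user_indexes statistics view (adjust the filter to your environment; counters reflect activity since statistics were last reset):

-- Indexes that have never been scanned since statistics were last reset
SELECT schemaname, relname AS table_name, indexrelname AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

Indexes backing primary key or unique constraints may show zero scans and still be required, so review the list before dropping anything.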
Partitioning
An effective partitioning strategy can reduce I/O performance problems to a great extent. Large tables can be partitioned based on business logic. PostgreSQL supports table partitioning, and although it does not yet support every feature one might want, it can already help with many real-world use cases. In PostgreSQL, partitioned child tables are largely independent of the master table, which can be a bottleneck. E.g., constraints created on the master table cannot be automatically inherited by the child tables.

However, from an I/O balancing perspective, partitioning can really help. All the child partitions can be split across multiple tablespaces and disk file systems. Queries with a date range in the WHERE clause hitting a table partitioned by date range can benefit by scanning just one or two partitions instead of the full table.
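As an illustration, here is a minimal sketch of a declaratively range-partitioned table (PostgreSQL 10 or later) whose partitions live in different tablespaces; the table, column and tablespace names are hypothetical:

-- Parent table partitioned by date range
CREATE TABLE sales (
    sale_id   bigint,
    sale_date date NOT NULL,
    amount    numeric
) PARTITION BY RANGE (sale_date);

-- Partitions placed on different tablespaces (and hence different disks)
CREATE TABLE sales_2018_h1 PARTITION OF sales
    FOR VALUES FROM ('2018-01-01') TO ('2018-07-01') TABLESPACE ts_disk1;

CREATE TABLE sales_2018_h2 PARTITION OF sales
    FOR VALUES FROM ('2018-07-01') TO ('2019-01-01') TABLESPACE ts_disk2;

-- A query constrained on sale_date only scans the matching partition
SELECT sum(amount) FROM sales
WHERE sale_date BETWEEN '2018-02-01' AND '2018-02-28';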

Checkpointing
Checkpoints define a consistent state of the database. They are critical, and it is important that checkpoints occur regularly enough to ensure data changes are permanently saved to disk and the database is in a consistent state at all times. That being said, improper configuration of checkpoints can lead to I/O performance issues. DBAs must be meticulous about configuring checkpoints to ensure there are no I/O spikes; this also depends on how good the disks are and how well the data file layout is architected.

What does a checkpoint do?
In simple terms, checkpoints will ensure:

All the committed data is written to the data files on the disk.
Commit log (clog, now pg_xact) files are updated with the commit status.
Transaction log files in pg_xlog (now pg_wal) directory are recycled.
That explains how I/O-intensive checkpoints are. There are parameters in postgresql.conf which can be configured / tuned to control checkpoint behaviour: max_wal_size, min_wal_size, checkpoint_timeout and checkpoint_completion_target. These parameters decide how frequently checkpoints should occur and how much time the checkpoints have to finish in.

How to understand what configuration is better for checkpoints? How to tune them?

Here are some tips:

Evaluate the database TPS. Evaluate the total volume of transactions occurring in the database in a business day and also identify at what time the highest number of transactions hits the database.
Discuss with application developers and other technical teams regularly to understand the database transaction rate statistics as well as future transaction growth.
This can be done from the database end as well:
Monitor the database and evaluate the number of transactions occurring during the day. This can be done by querying pg_catalog views like pg_stat_user_tables.

Evaluate the number of WAL archive files generated per day.

Monitor how the checkpoints are performing by enabling the log_checkpoints parameter.

2018-06-06 15:03:16.446 IST [2111] LOG:  checkpoint starting: xlog
2018-06-06 15:03:22.734 IST [2111] LOG:  checkpoint complete: wrote 12112 buffers (73.9%); 0 WAL file(s) added, 0 removed, 25 recycled; write=6.058 s, sync=0.218 s, total=6.287 s; sync files=4, longest=0.178 s, average=0.054 s; distance=409706 kB, estimate=412479 kB
Understand whether the current checkpoint configuration is good enough for the database. Configure the checkpoint_warning parameter (30 seconds by default) to see warnings like the below in the postgres log files.

2018-06-06 15:02:42.295 IST [2111] LOG:  checkpoints are occurring too frequently (11 seconds apart)
2018-06-06 15:02:42.295 IST [2111] HINT:  Consider increasing the configuration parameter "max_wal_size".
What does the above warning mean?
Checkpoints generally occur whenever max_wal_size (1 GB by default, which means 64 WAL files) worth of log files is filled up, or when checkpoint_timeout (5 minutes by default) is reached. The above warning means the configured max_wal_size is not adequate and checkpoints are occurring every 11 seconds, which in turn means the 64 WAL files in the pg_wal directory are getting filled up in just 11 seconds; that is too frequent. In other words, if transactions are less frequent, checkpoints will occur every 5 minutes. So, as the hint suggests, increase max_wal_size to a higher value; min_wal_size can be increased to the same value or one lower than max_wal_size.

Another critical parameter to consider from I/O performance perspective is checkpoint_completion_target which is by default configured to 0.5.

checkpoint_completion_target x checkpoint_timeout = 0.5 x 5 minutes = 2.5 minutes

That means checkpoints have 2.5 minutes to sync the dirty blocks to disk. Is 2.5 minutes enough? That needs to be evaluated. If the number of dirty blocks to be written is very high, then 2.5 minutes can be very aggressive, and that is when an I/O spike can be observed. Configuring checkpoint_completion_target must be done based on the max_wal_size and checkpoint_timeout values. If these parameters are raised to a higher value, consider raising checkpoint_completion_target accordingly.
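As an illustration only (the right values depend entirely on your WAL volume, disk throughput and recovery-time requirements), a postgresql.conf snippet for a write-heavy server might look like this:

# postgresql.conf -- illustrative values, not a recommendation
max_wal_size = 8GB                    # allow more WAL between checkpoints
min_wal_size = 2GB
checkpoint_timeout = 15min            # time-based checkpoint at least every 15 minutes
checkpoint_completion_target = 0.9    # spread checkpoint writes over ~13.5 minutes
checkpoint_warning = 30s
log_checkpoints = on                  # verify the effect in the server log

After changing these, reload the configuration and watch the log_checkpoints output to confirm checkpoints are now driven by checkpoint_timeout rather than by WAL volume.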

VACUUM, ANALYZE (with FILLFACTOR)
VACUUM is one of the most powerful features of PostgreSQL. It can be used to remove bloat (fragmented space generated by transactions) within tables and indexes. The database must be VACUUMed regularly to ensure healthy maintenance and better performance; not VACUUMing the database regularly can lead to serious performance problems. ANALYZE must be performed along with VACUUM (VACUUM ANALYZE) to ensure up-to-date statistics for the query planner.

VACUUM ANALYZE can be performed manually, automatically, or both. In a real-world production environment, it is generally both. Automatic VACUUM is enabled by the parameter “autovacuum”, which is configured to “on” by default. With autovacuum enabled, PostgreSQL automatically starts VACUUMing tables periodically. The candidate tables in need of vacuuming are picked up by autovacuum processes based on thresholds set by the various autovacuum_* parameters; these parameters can be tweaked / tuned to ensure table bloat is cleared periodically. Let us look at some parameters and their use –

Autovacuum parameters
autovacuum = on This parameter is used to enable / disable autovacuum. Default is “on”.
log_autovacuum_min_duration = -1 Logs autovacuum activity running at least this long (in milliseconds); -1, the default, disables the logging and 0 logs every autovacuum action. This is important to understand how long autovacuum processes run.
autovacuum_max_workers = 3 Number of autovacuum worker processes. This depends on how aggressive database transactions are, and how many CPUs you can offer for autovacuum processes.
autovacuum_naptime = 1min Rest time between autovacuum runs.
Parameters defining threshold for Autovacuum process to kick off
Autovacuum job(s) kick off when a certain threshold is reached. Below are the parameters which can be used to set a certain threshold, based on which, the autovacuum process will start.

autovacuum_vacuum_threshold = 50 The table becomes a vacuum candidate once at least 50 rows have been updated / deleted in it.
autovacuum_analyze_threshold = 50 The table becomes an analyze candidate once at least 50 rows have changed in it.
autovacuum_vacuum_scale_factor = 0.2 The table will be vacuumed when at least 20% of its rows have been updated / deleted.
autovacuum_analyze_scale_factor = 0.1 The table will be analyzed when at least 10% of its rows have changed.
The above threshold parameters can be modified based on database behaviour. DBAs need to analyze and identify the hot tables and ensure those tables are vacuumed as frequently as possible to ensure good performance. Arriving at the right values for these parameters can be a challenge in a high-transaction environment where data changes happen every second. Many a time I have noticed that autovacuum processes take quite long to complete, ending up consuming too many resources on production systems.

I would suggest not depending completely on the autovacuum process; the best way is to schedule a nightly VACUUM ANALYZE job so that the burden on autovacuum is reduced. To start with, consider manually VACUUMing big tables with a high transaction rate.
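For such hot tables, the global thresholds can also be overridden per table. Below is a minimal sketch with a hypothetical table name; as a reminder, the effective vacuum trigger is autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number of rows in the table:

-- Vacuum the hot table after roughly 1% of its rows change instead of the global 20%
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01,
                        autovacuum_vacuum_threshold = 1000,
                        autovacuum_analyze_scale_factor = 0.02);

-- Verify the per-table overrides
SELECT relname, reloptions FROM pg_class WHERE relname = 'orders';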

VACUUM FULL
VACUUM FULL helps reclaim the bloated space in tables and indexes. It cannot be used while the database is serving traffic, as it takes an exclusive lock on the table; tables must be subjected to VACUUM FULL only when the applications are shut down. Indexes are also reorganized along with tables during VACUUM FULL.

Let us take a look at the impact of VACUUM ANALYZE

Bloats: How to identify bloats? When are bloats generated?
Here are some tests:

I have got a table of size 1 GB with 10 million rows.

postgres=# select pg_relation_size('pgbench_accounts')/1024/1024/1024;

 ?column? 
----------------
        1

postgres=# select count(*) From pgbench_accounts ;
  count   
-----------------
 10000000
Let us look at the impact of bloats on a simple query: select * from pgbench_accounts;

Below is the explain plan for the query:

postgres=# explain analyze select * from pgbench_accounts;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on pgbench_accounts  (cost=0.00..263935.00 rows=10000000 width=97) 
 (actual time=0.033..1054.257 rows=10000000 loops=1)
 Planning time: 0.255 ms
 Execution time: 1494.448 ms
Now, let us update all the rows in the table and see the impact of the above SELECT query.

postgres=# update pgbench_accounts set abalance=1;
UPDATE 10000000

postgres=# select count(*) From pgbench_accounts ;
  count   
-----------------
 10000000
Below is the EXPLAIN PLAN of the query post UPDATE execution.

postgres=# explain analyze select * from pgbench_accounts;

QUERY PLAN                                                             
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on pgbench_accounts  (cost=0.00..527868.39 rows=19999939 width=97) 
 (actual time=404.474..1520.175 rows=10000000 loops=1)
 Planning time: 0.051 ms
 Execution time: 1958.532 ms
The size of the table increased to 2 GB after the UPDATE

postgres=# select pg_relation_size('pgbench_accounts')/1024/1024/1024;

 ?column? 
-----------------
        2
If you compare the cost numbers with the earlier EXPLAIN PLAN, there is a huge difference: the cost has increased by a big margin. More importantly, if you observe carefully, the number of rows being scanned after the UPDATE (just over 19 million) is almost twice the number of rows actually present (10 million). That means there are 9+ million bloated rows; the actual time increased as well, and the execution time went up from about 1.5 seconds to about 2 seconds.

So, that is the impact of not VACUUMing the table after the UPDATE. The above EXPLAIN PLAN numbers mean, precisely, that the table is bloated.

How to identify if the table is bloated? Use pgstattuple contrib module:

postgres=# select * from pgstattuple('pgbench_accounts');
 table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent 
------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 2685902848 |    10000000 | 1210000000 |         45.05 |          9879891 |     1195466811 |              44.51 |   52096468 |         1.94
The above numbers indicate that almost half of the table is bloated (dead_tuple_percent is 44.51).

Let us VACUUM ANALYZE the table and see the impact now:

postgres=# VACUUM ANALYZE pgbench_accounts ;
VACUUM

postgres=# explain analyze select * from pgbench_accounts;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on pgbench_accounts  (cost=0.00..428189.05 rows=10032005 width=97) 
 (actual time=400.023..1472.118 rows=10000000 loops=1)
 Planning time: 4.374 ms
 Execution time: 1913.541 ms
After VACUUM ANALYZE, the cost numbers have decreased. The number of rows scanned now shows up close to 10 million, but the actual time and the execution time did not change much. That is because, though the bloat in the table has vanished, the size of the table to be scanned remains the same. Below is the pgstattuple output post VACUUM ANALYZE.

postgres=# select * from pgstattuple('pgbench_accounts');

 table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent 
------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 2685902848 |    10000000 | 1210000000 |         45.05 |             0 |              0 |                  0 | 1316722516 |        49.02
The above numbers indicate that all the bloat (dead tuples) has vanished.

Let us look at the impact of VACUUM FULL ANALYZE and see what happens:

postgres=# vacuum full analyze pgbench_accounts ;
VACUUM

postgres=# explain analyze select * from pgbench_accounts;

                            QUERY PLAN                                                            
---------------------------------------------------------------------------
 Seq Scan on pgbench_accounts  (cost=0.00..263935.35 rows=10000035 width=97) 
(actual time=0.015..1089.726 rows=10000000 loops=1)
 Planning time: 0.148 ms
 Execution time: 1532.596 ms
If you observe, the actual time and the execution time numbers are similar to the numbers before UPDATE. Also, the size of the table has now decreased from 2 GB to 1 GB.

postgres=# select pg_relation_size('pgbench_accounts')/1024/1024/1024;

 ?column? 
-----------------
        1
That is the impact of VACUUM FULL.

FILLFACTOR
FILLFACTOR is a very important attribute which can make a real difference to the database maintenance strategy at table and index level. It indicates the percentage of space within a data block that INSERTs may use. FILLFACTOR defaults to 100, which means INSERTs can utilize all the space available in a data block; it also means no space is left over for UPDATEs. The value can be decreased for heavily updated tables.

This parameter can be configured per table and per index. If FILLFACTOR is configured to an optimal value, you can see a real difference in VACUUM performance and query performance too. In short, optimal FILLFACTOR values ensure an unnecessary number of blocks is not allocated.

Let us look at the same example above –

The table has ten million rows

postgres=# select count(*) From pgbench_accounts ;
  count   
-----------------
 10000000
Before the update, the size of the table is 1 GB

postgres=# select pg_relation_size('pgbench_accounts')/1024/1024/1024;

?column? 
--------
   1

postgres=# update pgbench_accounts set abalance=1;
UPDATE 10000000
After the UPDATE, the size of the table increased to 2 GB

postgres=# select pg_relation_size('pgbench_accounts')/1024/1024/1024;

?column? 
---------
    2
That means the number of blocks allocated to the table has increased by 100%. If a lower FILLFACTOR had been configured, the size of the table may not have increased by that margin.

How do you know what value to configure for FILLFACTOR?
It all depends on which columns are being updated and the size of the updated columns. In general, it is good to evaluate the FILLFACTOR value by testing it out in UAT databases. If the columns being updated are, say, 10% of the whole table, then consider configuring a fillfactor of 90% or 80%.

Important Note: If you change the FILLFACTOR value of an existing table containing data, you will need to do a VACUUM FULL or a reorganization of the table to ensure the FILLFACTOR value takes effect for the existing data.
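For example (re-using the pgbench table; the table and index names are only illustrative), FILLFACTOR can be set at creation time or changed later:

-- Leave ~20% of every block free for future UPDATEs at creation time
CREATE TABLE accounts (aid int, abalance int) WITH (fillfactor = 80);

-- Change an existing table and its index, then rewrite so it takes effect
ALTER TABLE pgbench_accounts SET (fillfactor = 80);
ALTER INDEX pgbench_accounts_pkey SET (fillfactor = 80);
VACUUM FULL pgbench_accounts;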

VACUUMing TIPS
As mentioned above, consider running a VACUUM ANALYZE job manually every night on heavily used tables, even when autovacuum is enabled.
Consider running VACUUM ANALYZE on tables after a bulk INSERT. This is important, as many believe that VACUUMing may not be needed after INSERTs.
Monitor highly active tables to ensure they are getting VACUUMed regularly, by querying pg_stat_user_tables (see the sample query after this list).
Use the pgstattuple contrib module to identify the size of the bloated space within the table segments.
The VACUUM FULL utility cannot be used on production database systems. Consider tools like pg_reorg or pg_repack, which help reorganize tables and indexes online without locks.
Ensure the AUTOVACUUM process does not run for a long time during business (high-traffic) hours.
Enable the log_autovacuum_min_duration parameter to log the timings and duration of AUTOVACUUM processes.
Importantly, ensure FILLFACTOR is configured to an optimal value on high-transaction tables and indexes.
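For the monitoring tip above, a minimal sketch of a query against pg_stat_user_tables showing when the busiest tables were last vacuumed / analyzed and how many dead rows remain:

SELECT relname,
       n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;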
Other I/O problems
Disk Sorting
Queries performing sorting are another common occurrence in real-world production databases, and most of them cannot be avoided. Queries using clauses like GROUP BY, ORDER BY and DISTINCT, and operations such as CREATE INDEX and VACUUM FULL, perform sorting, and the sorting can take place on disk. Sorting takes place in memory if the selection and sorting are done on indexed columns; this is where composite indexes play a key role, as indexes are aggressively cached in memory. Otherwise, if the data needs to be sorted on disk, performance slows down drastically.

To ensure sorting takes place in memory, the work_mem parameter can be used. It can be configured to a value such that the whole sort can be done in memory. The core advantage of this parameter is that, apart from configuring it in postgresql.conf, it can also be configured at session level, user level or database level. How big should work_mem be? How do you know which queries are performing disk sorting? How do you monitor queries performing disk sorting on a real-time production database?

The answer is to configure the log_temp_files parameter. The value is in kilobytes; a value of 0 logs all the temp files (along with their sizes) generated on disk due to disk sorting. Once the parameter is configured, you will be able to see messages like the following in the log files:

2018-06-07 22:48:02.358 IST [4219] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp4219.0", size 200425472
2018-06-07 22:48:02.358 IST [4219] STATEMENT:  create index bid_idx on pgbench_accounts(bid);
2018-06-07 22:48:02.366 IST [4219] LOG:  duration: 6421.705 ms  statement: create index bid_idx on pgbench_accounts(bid);
The above messages mean that the CREATE INDEX statement was performing disk sorting and generated a file of 200425472 bytes, which is 191+ MB. That means the work_mem parameter must be configured to 191+ MB or above for this particular query to perform the sort in memory.

Well, for application queries, the work_mem parameter can practically only be configured at the user level. Before doing so, beware of the number of connections that user makes to the database and the number of sorting queries executed by that user, because PostgreSQL can allocate up to work_mem for each sort (or hash) operation in each of those connections, which could potentially starve the memory on the database server.
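A minimal sketch of tuning work_mem at session and user level, and of checking whether a sort has moved from disk to memory (the role name and sizes are illustrative):

-- For the current session only
SET work_mem = '256MB';

-- For every new session of a given application user (takes effect on reconnect)
ALTER USER app_reporting SET work_mem = '256MB';

-- EXPLAIN (ANALYZE) reports the sort method; look for
--   "Sort Method: external merge  Disk: ..." (disk sort) versus
--   "Sort Method: quicksort  Memory: ..."   (in-memory sort)
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pgbench_accounts ORDER BY bid;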

Database file-system layout
Designing an efficient and performance-conducive database file-system layout is important from a performance and scalability perspective. Importantly, this does not depend on database size. The general perception is that only huge databases need a high-performance disk architecture, which is NOT true: even if the database size is 50 GB, you may need a good disk architecture. And this may not be possible without incurring extra costs.

Here are some TIPS for the same:

Ensure the database has multiple tablespaces, with tables and indexes grouped based on their transaction rates.
Tablespaces must be placed across multiple disk file systems for balanced I/O. This also ensures multiple CPUs come into play to perform transactions across multiple disks (see the sketch after this list).
Consider placing the pg_xlog or pg_wal directory on a separate disk for a high-transaction database.
Ensure the *_cost planner parameters are configured based on the infrastructure.
Use iostat, mpstat and other I/O monitoring tools to understand the I/O statistics across all the disks, and architect / manage the database objects accordingly.
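As an illustration (mount points and object names are hypothetical), tables and indexes can be spread over separate file systems like this:

-- The directories must already exist, be empty and be owned by the postgres OS user
CREATE TABLESPACE ts_data  LOCATION '/pgdata01/tablespaces/data';
CREATE TABLESPACE ts_index LOCATION '/pgdata02/tablespaces/index';

CREATE TABLE emp (eid int, ename text) TABLESPACE ts_data;
CREATE INDEX indx001 ON emp (eid) TABLESPACE ts_index;

The WAL directory is typically relocated at cluster creation time (initdb --waldir) or by moving pg_wal to another disk and symlinking it while the server is stopped.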
PostgreSQL on Cloud
Infrastructure is critical for good database performance. Performance engineering strategies differ based on infrastructure and environment. Special care needs to be taken for PostgreSQL databases hosted in the cloud. Performance benchmarking for databases hosted on physical barebone servers in a local data center can be entirely different from databases hosted in the public cloud.

In general, cloud instances can be a little slower, and benchmarks can differ by a considerable margin, especially in terms of I/O. Always perform I/O latency checks before choosing / building a cloud instance. To my surprise, I learnt that the performance of cloud instances can also vary depending on the region, even with the same cloud provider. To explain further, a cloud instance with the same specs built in two different regions could give you different performance results.

Bulk data load
Offline bulk data loading operations are pretty common in the database world. They can generate significant I/O load, which in turn slows down data-load performance. I have faced such challenges in my experience as a DBA. Often the data load gets terribly slow and has to be tuned. Here are some tips. Mind you, these apply to offline data loading operations only and must not be used on a live production database.

Since most data load operations are carried out during off-business hours, ensure the following parameters are configured during the data load (a sketch follows this list):
Configure checkpoint-related values big enough so that checkpoints do not cause any performance issues.
Switch off full_page_writes.
Switch off WAL archiving.
Set the synchronous_commit parameter to “off”.
Drop constraints and indexes on the tables subjected to the data load (constraints and indexes can be re-created post data load with a bigger maintenance_work_mem value).
If you are loading the data from a CSV file, a bigger maintenance_work_mem can get you good results.
Though there would be a significant performance benefit, DO NOT switch off the fsync parameter, as that could lead to data corruption.
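A minimal sketch of such a load session in psql (file path, constraint and index names are hypothetical; full_page_writes and archive_mode are server-level settings changed in postgresql.conf, not per session):

-- Session-level settings for an offline load
SET synchronous_commit = off;
SET maintenance_work_mem = '2GB';   -- speeds up the index / constraint rebuild afterwards

ALTER TABLE emp DROP CONSTRAINT emp_pkey;
DROP INDEX IF EXISTS indx001;

\copy emp FROM '/staging/emp.csv' WITH (FORMAT csv, HEADER true)

-- Re-create indexes and constraints after the load, then refresh statistics
CREATE INDEX indx001 ON emp (eid);
ALTER TABLE emp ADD CONSTRAINT emp_pkey PRIMARY KEY (eid);
ANALYZE emp;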

TIPS for cloud performance analysis
Perform thorough I/O latency tests using pgbench (see the sketch after this list). In my experience, I got pretty ordinary performance results when doing disk latency checks as part of TPS evaluation; there were issues with cache performance on some public cloud instances. This will help you choose the appropriate specs for the cloud instance chosen for the databases.
Cloud instances can perform differently from region to region. A cloud instance with certain specs in one region can give different performance results compared to a cloud instance with the same specs in another region. My pgbench tests executed on multiple cloud instances (all with the same specs, from the same cloud vendor) across different regions gave me different results on some of them. This is important, especially when you are migrating to the cloud.
Query performance on the cloud might need a different tuning approach. DBAs will need to use the *_cost parameters to ensure healthy query execution plans are generated.
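For the latency tests mentioned in the first tip, a minimal pgbench sketch (the database name, scale factor, client count and duration are illustrative):

# Initialize a test database at scale factor 100 (10 million rows in pgbench_accounts)
pgbench -i -s 100 benchdb

# Run a 5-minute test with 10 clients and 2 worker threads, reporting per-statement latencies
pgbench -c 10 -j 2 -T 300 -r benchdb

Run the same test on the candidate instances (and regions) and compare the reported TPS and latency figures.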
Tools to Monitor PostgreSQL Performance
There are various tools to monitor PostgreSQL performance. Let me highlight some of them.

pg_top is a GREAT tool for monitoring a PostgreSQL database dynamically. I would highly recommend this tool to DBAs for various reasons. It has numerous advantages; let me list them out:
pg_top uses a textual interface and is similar to the Unix “top” utility.
It clearly lists the processes and the hardware resources they utilize. What excites me about this tool is that it clearly tells you whether a particular process is currently on DISK or CPU; in my view that is excellent. DBAs can immediately pick out processes that have been running on disk for a long time.
You can check the EXPLAIN PLAN of the top SQL statements dynamically or instantly.
You can also find out which tables or indexes are being scanned, instantly.
Nagios is a popular monitoring tool for PostgreSQL which has both open-source and commercial versions. The open-source version should suffice for monitoring. Custom Perl scripts can be built and plugged into Nagios modules.

pgBadger is a popular tool which can be used to analyze PostgreSQL log files and generate performance reports. The reports can be used to analyze the performance of checkpoints, disk sorting and more (see the example after this list).
Zabbix is another popular tool used for PostgreSQL monitoring.
ClusterControl is an up-and-coming management platform for PostgreSQL. Apart from monitoring, it also has functionality to deploy replication setups with load balancers, automatic failover and backup management, among others.
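As an illustration (the log file path is hypothetical and depends on your logging configuration), pgBadger is typically run against the server log files like this:

# Parse the PostgreSQL logs with 4 parallel jobs and produce an HTML report
pgbadger -j 4 -o /tmp/pg_report.html /var/log/postgresql/postgresql-*.log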

What is The Use of Initdb in PostgreSQL?

 


initdb creates a new PostgreSQL database cluster.
In PostgreSQL, a cluster is a collection of databases that are managed by a single server instance.
initdb creates the data directory in which the database data will live, generates the shared catalog tables and creates the default databases:
* postgres: the default database used for connections.

* template0: user connections are not allowed; it is kept pristine to maintain database consistency.

* template1: when we create a database, an exact copy of template1 is taken by default.

If the data directory does not already exist, initdb will attempt to create it (the user must have permission to do so).
initdb initializes the database cluster with a default locale and character set encoding. The character set encoding, collation order (LC_COLLATE) and character set classes (LC_CTYPE, e.g., upper, lower, digit) can be set separately for a database when it is created. initdb determines those settings for the template1 database, which serves as the default for all other databases.
Before initdb is run, the data directory is empty:
[postgres@prime]$cd /u01/pgdatabase/data
[postgres@prime data]$ ls -lrth
total 0
Now we are going to execute initdb -D “path”, where -D specifies the data directory location:
postgres@prime data]$ /u01/postgresql-10.0/bin/initdb -D /u01/pgdatabase/data
Once initdb has completed, verify the data directory:
[postgres@prime data]$ ls -lrth
total 124K
-rw------- 1 postgres postgres 88 Oct 30 2019 postgresql.auto.conf
-rw------- 1 postgres postgres 3 Oct 30 2019 PG_VERSION
drwx------ 2 postgres postgres 4.0K Oct 30 2019 pg_twophase
drwx------ 2 postgres postgres 4.0K Oct 30 2019 pg_tblspc
drwx------ 2 postgres postgres 4.0K Oct 30 2019 pg_snapshots
drwx------ 2 postgres postgres 4.0K Oct 30 2019 pg_serial
drwx------ 2 postgres postgres 4.0K Oct 30 2019 pg_replslot
drwx------ 4 postgres postgres 4.0K Oct 30 2019 pg_multixact
-rw------- 1 postgres postgres 1.6K Oct 30 2019 pg_ident.conf
-rw------- 1 postgres postgres 4.5K Oct 30 2019 pg_hba.conf
drwx------ 2 postgres postgres 4.0K Oct 30 2019 pg_dynshmem
drwx------ 2 postgres postgres 4.0K Oct 30 2019 pg_commit_ts
drwx------ 2 postgres postgres 4.0K Oct 30 2019 pg_xact
drwx------ 2 postgres postgres 4.0K Oct 30 2019 pg_subtrans
-rw------- 1 postgres postgres 23K Feb 14 22:17 postgresql.conf
drwx------ 11 postgres postgres 4.0K Feb 14 22:43 base
drwx------ 2 postgres postgres 4.0K Feb 14 22:44 pg_stat
-rw------- 1 postgres postgres 769 Feb 14 22:46 logfile
drwx------ 3 postgres postgres 4.0K Mar 2 11:40 pg_wal
-rw------- 1 postgres postgres 62 Mar 7 13:38 postmaster.opts
drwx------ 2 postgres postgres 4.0K Mar 7 13:38 pg_notify
-rw------- 1 postgres postgres 78 Mar 7 13:38 postmaster.pid
drwx------ 2 postgres postgres 4.0K Mar 7 13:46 global
drwx------ 4 postgres postgres 4.0K Mar 7 15:43 pg_logical
drwx------ 2 postgres postgres 4.0K Mar 8 11:33 pg_stat_tmp
Here, the important files and directories are created automatically, i.e. postgresql.conf, pg_hba.conf, base, pg_tblspc, etc.


Summary:

initdb is used to create the data directory and the default databases. By using initdb we can create multiple clusters on the same machine, each with a different data directory path.
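A minimal sketch of creating and starting a second cluster on the same machine (the paths, port and binary location are illustrative):

[postgres@prime]$ /u01/postgresql-10.0/bin/initdb -D /u02/pgdatabase2/data
[postgres@prime]$ /u01/postgresql-10.0/bin/pg_ctl -D /u02/pgdatabase2/data -o "-p 5433" -l /u02/pgdatabase2/logfile start
[postgres@prime]$ /u01/postgresql-10.0/bin/psql -p 5433 -d postgres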

Daily checklist of a PostgreSQL DBA

 

We often get this question: what are the most important things a PostgreSQL DBA should do to guarantee optimal performance and reliability? Is there a checklist for PostgreSQL DBAs to follow daily? Since we get this question so often, we thought we would note it down as a blog post and share it with the PostgreSQL community. The only objective of this post is to share information; please do not consider it a run-book or a recommendation from MinervaDB PostgreSQL support. MinervaDB is not accountable for any negative impact on your PostgreSQL performance from running these scripts on the production database infrastructure of your business. The following is a simple daily checklist for a PostgreSQL DBA:

Task 1: Check that all the PostgreSQL instances are up and operational:
=====================================================================

pgrep -u postgres -fa -- -D
What if several instances of PostgreSQL are running?


pgrep -fa -- -D |grep postgres




Task 2: Monitoring PostgreSQL logs. Record PostgreSQL error logs: open the postgresql.conf configuration file and, under the ERROR REPORTING AND LOGGING section, use the following config parameters:
====================================

log_destination = 'stderr' 
logging_collector = on 
log_directory = 'pg_log' 
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' 
log_truncate_on_rotation = off 
log_rotation_age = 1d 
log_min_duration_statement = 0 
log_connections = on 
log_duration = on 
log_hostname = on 
log_timezone = 'UTC'
Save the postgresql.conf file and restart the PostgreSQL server:


sudo service postgresql restart






Task 3: Confirm PostgreSQL backup completed successfully

Use backup logs (possible only with PostgreSQL logical backups) to audit backup quality:

$ pg_dumpall > /backup-path/pg-backup-dump.sql 2> /var/log/postgres/pg-backup-dump.log



Task 4: Monitoring PostgreSQL Database Size:

select datname, pg_size_pretty(pg_database_size(datname)) from pg_database order by pg_database_size(datname);
Task 5: Monitor all running PostgreSQL queries (please repeat this task every 90 minutes during business / peak hours):

SELECT pid, age(clock_timestamp(), query_start), usename, query 
FROM pg_stat_activity 
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%' 
ORDER BY query_start desc;



Task 6: Inventory of indexes in PostgreSQL database:


select
    t.relname as table_name,
    i.relname as index_name,
    string_agg(a.attname, ',') as column_name
from
    pg_class t,
    pg_class i,
    pg_index ix,
    pg_attribute a
where
    t.oid = ix.indrelid
    and i.oid = ix.indexrelid
    and a.attrelid = t.oid
    and a.attnum = ANY(ix.indkey)
    and t.relkind = 'r'
    and t.relname not like 'pg_%'
group by  
    t.relname,
    i.relname
order by
    t.relname,
    i.relname;





Task 7: Finding the largest databases in your PostgreSQL cluster:

SELECT d.datname as Name,  pg_catalog.pg_get_userbyid(d.datdba) as Owner,
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
        THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
        ELSE 'No Access'
    END as Size
FROM pg_catalog.pg_database d
    order by
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
        THEN pg_catalog.pg_database_size(d.datname)
        ELSE NULL
    END desc -- nulls first
    LIMIT 20





Task 8: When you suspect a serious performance bottleneck in PostgreSQL, especially when you suspect transactions are blocking each other:


WITH RECURSIVE l AS (
  SELECT pid, locktype, mode, granted,
ROW(locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid) obj
  FROM pg_locks
), pairs AS (
  SELECT w.pid waiter, l.pid locker, l.obj, l.mode
  FROM l w
  JOIN l ON l.obj IS NOT DISTINCT FROM w.obj AND l.locktype=w.locktype AND NOT l.pid=w.pid AND l.granted
  WHERE NOT w.granted
), tree AS (
  SELECT l.locker pid, l.locker root, NULL::record obj, NULL AS mode, 0 lvl, locker::text path, array_agg(l.locker) OVER () all_pids
  FROM ( SELECT DISTINCT locker FROM pairs l WHERE NOT EXISTS (SELECT 1 FROM pairs WHERE waiter=l.locker) ) l
  UNION ALL
  SELECT w.waiter pid, tree.root, w.obj, w.mode, tree.lvl+1, tree.path||'.'||w.waiter, all_pids || array_agg(w.waiter) OVER ()
  FROM tree JOIN pairs w ON tree.pid=w.locker AND NOT w.waiter = ANY ( all_pids )
)
SELECT (clock_timestamp() - a.xact_start)::interval(3) AS ts_age,
       replace(a.state, 'idle in transaction', 'idletx') state,
       (clock_timestamp() - state_change)::interval(3) AS change_age,
       a.datname,tree.pid,a.usename,a.client_addr,lvl,
       (SELECT count(*) FROM tree p WHERE p.path ~ ('^'||tree.path) AND NOT p.path=tree.path) blocked,
       repeat(' .', lvl)||' '||left(regexp_replace(query, '\s+', ' ', 'g'),100) query
FROM tree
JOIN pg_stat_activity a USING (pid)
ORDER BY path;




Task 9:  Identify bloated Tables in PostgreSQL :


WITH constants AS (
    -- define some constants for sizes of things
    -- for reference down the query and easy maintenance
    SELECT current_setting('block_size')::numeric AS bs, 23 AS hdr, 8 AS ma
),
no_stats AS (
    -- screen out table who have attributes
    -- which dont have stats, such as JSON
    SELECT table_schema, table_name, 
        n_live_tup::numeric as est_rows,
        pg_table_size(relid)::numeric as table_size
    FROM information_schema.columns
        JOIN pg_stat_user_tables as psut
           ON table_schema = psut.schemaname
           AND table_name = psut.relname
        LEFT OUTER JOIN pg_stats
        ON table_schema = pg_stats.schemaname
            AND table_name = pg_stats.tablename
            AND column_name = attname 
    WHERE attname IS NULL
        AND table_schema NOT IN ('pg_catalog', 'information_schema')
    GROUP BY table_schema, table_name, relid, n_live_tup
),
null_headers AS (
    -- calculate null header sizes
    -- omitting tables which dont have complete stats
    -- and attributes which aren't visible
    SELECT
        hdr+1+(sum(case when null_frac <> 0 THEN 1 else 0 END)/8) as nullhdr,
        SUM((1-null_frac)*avg_width) as datawidth,
        MAX(null_frac) as maxfracsum,
        schemaname,
        tablename,
        hdr, ma, bs
    FROM pg_stats CROSS JOIN constants
        LEFT OUTER JOIN no_stats
            ON schemaname = no_stats.table_schema
            AND tablename = no_stats.table_name
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
        AND no_stats.table_name IS NULL
        AND EXISTS ( SELECT 1
            FROM information_schema.columns
                WHERE schemaname = columns.table_schema
                    AND tablename = columns.table_name )
    GROUP BY schemaname, tablename, hdr, ma, bs
),
data_headers AS (
    -- estimate header and row size
    SELECT
        ma, bs, hdr, schemaname, tablename,
        (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
        (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
    FROM null_headers
),
table_estimates AS (
    -- make estimates of how large the table should be
    -- based on row and page size
    SELECT schemaname, tablename, bs,
        reltuples::numeric as est_rows, relpages * bs as table_bytes,
    CEIL((reltuples*
            (datahdr + nullhdr2 + 4 + ma -
                (CASE WHEN datahdr%ma=0
                    THEN ma ELSE datahdr%ma END)
                )/(bs-20))) * bs AS expected_bytes,
        reltoastrelid
    FROM data_headers
        JOIN pg_class ON tablename = relname
        JOIN pg_namespace ON relnamespace = pg_namespace.oid
            AND schemaname = nspname
    WHERE pg_class.relkind = 'r'
),
estimates_with_toast AS (
    -- add in estimated TOAST table sizes
    -- estimate based on 4 toast tuples per page because we dont have 
    -- anything better.  also append the no_data tables
    SELECT schemaname, tablename, 
        TRUE as can_estimate,
        est_rows,
        table_bytes + ( coalesce(toast.relpages, 0) * bs ) as table_bytes,
        expected_bytes + ( ceil( coalesce(toast.reltuples, 0) / 4 ) * bs ) as expected_bytes
    FROM table_estimates LEFT OUTER JOIN pg_class as toast
        ON table_estimates.reltoastrelid = toast.oid
            AND toast.relkind = 't'
),
table_estimates_plus AS (
-- add some extra metadata to the table data
-- and calculations to be reused
-- including whether we cant estimate it
-- or whether we think it might be compressed
    SELECT current_database() as databasename,
            schemaname, tablename, can_estimate, 
            est_rows,
            CASE WHEN table_bytes > 0
                THEN table_bytes::NUMERIC
                ELSE NULL::NUMERIC END
                AS table_bytes,
            CASE WHEN expected_bytes > 0 
                THEN expected_bytes::NUMERIC
                ELSE NULL::NUMERIC END
                    AS expected_bytes,
            CASE WHEN expected_bytes > 0 AND table_bytes > 0
                AND expected_bytes <= table_bytes
                THEN (table_bytes - expected_bytes)::NUMERIC
                ELSE 0::NUMERIC END AS bloat_bytes
    FROM estimates_with_toast
    UNION ALL
    SELECT current_database() as databasename, 
        table_schema, table_name, FALSE, 
        est_rows, table_size,
        NULL::NUMERIC, NULL::NUMERIC
    FROM no_stats
),
bloat_data AS (
    -- do final math calculations and formatting
    select current_database() as databasename,
        schemaname, tablename, can_estimate, 
        table_bytes, round(table_bytes/(1024^2)::NUMERIC,3) as table_mb,
        expected_bytes, round(expected_bytes/(1024^2)::NUMERIC,3) as expected_mb,
        round(bloat_bytes*100/table_bytes) as pct_bloat,
        round(bloat_bytes/(1024::NUMERIC^2),2) as mb_bloat,
        table_bytes, expected_bytes, est_rows
    FROM table_estimates_plus
)
-- filter output for bloated tables
SELECT databasename, schemaname, tablename,
    can_estimate,
    est_rows,
    pct_bloat, mb_bloat,
    table_mb
FROM bloat_data
-- this where clause defines which tables actually appear
-- in the bloat chart
-- example below filters for tables which are either 50%
-- bloated and more than 20mb in size, or more than 25%
-- bloated and more than 4GB in size
WHERE ( pct_bloat >= 50 AND mb_bloat >= 10 )
    OR ( pct_bloat >= 25 AND mb_bloat >= 1000 )
ORDER BY pct_bloat DESC;








Task 10:  Identify bloated indexes in PostgreSQL :


-- btree index stats query
-- estimates bloat for btree indexes
WITH btree_index_atts AS (
    SELECT nspname, 
        indexclass.relname as index_name, 
        indexclass.reltuples, 
        indexclass.relpages, 
        indrelid, indexrelid,
        indexclass.relam,
        tableclass.relname as tablename,
        regexp_split_to_table(indkey::text, ' ')::smallint AS attnum,
        indexrelid as index_oid
    FROM pg_index
    JOIN pg_class AS indexclass ON pg_index.indexrelid = indexclass.oid
    JOIN pg_class AS tableclass ON pg_index.indrelid = tableclass.oid
    JOIN pg_namespace ON pg_namespace.oid = indexclass.relnamespace
    JOIN pg_am ON indexclass.relam = pg_am.oid
    WHERE pg_am.amname = 'btree' and indexclass.relpages > 0
         AND nspname NOT IN ('pg_catalog','information_schema')
    ),
index_item_sizes AS (
    SELECT
    ind_atts.nspname, ind_atts.index_name, 
    ind_atts.reltuples, ind_atts.relpages, ind_atts.relam,
    indrelid AS table_oid, index_oid,
    current_setting('block_size')::numeric AS bs,
    8 AS maxalign,
    24 AS pagehdr,
    CASE WHEN max(coalesce(pg_stats.null_frac,0)) = 0
        THEN 2
        ELSE 6
    END AS index_tuple_hdr,
    sum( (1-coalesce(pg_stats.null_frac, 0)) * coalesce(pg_stats.avg_width, 1024) ) AS nulldatawidth
    FROM pg_attribute
    JOIN btree_index_atts AS ind_atts ON pg_attribute.attrelid = ind_atts.indexrelid AND pg_attribute.attnum = ind_atts.attnum
    JOIN pg_stats ON pg_stats.schemaname = ind_atts.nspname
          -- stats for regular index columns
          AND ( (pg_stats.tablename = ind_atts.tablename AND pg_stats.attname = pg_catalog.pg_get_indexdef(pg_attribute.attrelid, pg_attribute.attnum, TRUE)) 
          -- stats for functional indexes
          OR   (pg_stats.tablename = ind_atts.index_name AND pg_stats.attname = pg_attribute.attname))
    WHERE pg_attribute.attnum > 0
    GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9
),
index_aligned_est AS (
    SELECT maxalign, bs, nspname, index_name, reltuples,
        relpages, relam, table_oid, index_oid,
        coalesce (
            ceil (
                reltuples * ( 6 
                    + maxalign 
                    - CASE
                        WHEN index_tuple_hdr%maxalign = 0 THEN maxalign
                        ELSE index_tuple_hdr%maxalign
                      END
                    + nulldatawidth 
                    + maxalign 
                    - CASE /* Add padding to the data to align on MAXALIGN */
                        WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign
                        ELSE nulldatawidth::integer%maxalign
                      END
                )::numeric 
              / ( bs - pagehdr::NUMERIC )
              +1 )
         , 0 )
      as expected
    FROM index_item_sizes
),
raw_bloat AS (
    SELECT current_database() as dbname, nspname, pg_class.relname AS table_name, index_name,
        bs*(index_aligned_est.relpages)::bigint AS totalbytes, expected,
        CASE
            WHEN index_aligned_est.relpages <= expected 
                THEN 0
                ELSE bs*(index_aligned_est.relpages-expected)::bigint 
            END AS wastedbytes,
        CASE
            WHEN index_aligned_est.relpages <= expected
                THEN 0 
                ELSE bs*(index_aligned_est.relpages-expected)::bigint * 100 / (bs*(index_aligned_est.relpages)::bigint) 
            END AS realbloat,
        pg_relation_size(index_aligned_est.table_oid) as table_bytes,
        stat.idx_scan as index_scans
    FROM index_aligned_est
    JOIN pg_class ON pg_class.oid=index_aligned_est.table_oid
    JOIN pg_stat_user_indexes AS stat ON index_aligned_est.index_oid = stat.indexrelid
),
format_bloat AS (
SELECT dbname as database_name, nspname as schema_name, table_name, index_name,
        round(realbloat) as bloat_pct, round(wastedbytes/(1024^2)::NUMERIC) as bloat_mb,
        round(totalbytes/(1024^2)::NUMERIC,3) as index_mb,
        round(table_bytes/(1024^2)::NUMERIC,3) as table_mb,
        index_scans
FROM raw_bloat
)
-- final query outputting the bloated indexes
-- change the where and order by to change
-- what shows up as bloated
SELECT *
FROM format_bloat
WHERE ( bloat_pct > 50 and bloat_mb > 10 )
ORDER BY bloat_mb DESC;







Task 11:  Monitor blocked and blocking activities in PostgreSQL:


 SELECT blocked_locks.pid     AS blocked_pid,
         blocked_activity.usename  AS blocked_user,
         blocking_locks.pid     AS blocking_pid,
         blocking_activity.usename AS blocking_user,
         blocked_activity.query    AS blocked_statement,
         blocking_activity.query   AS current_statement_in_blocking_process
   FROM  pg_catalog.pg_locks         blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_locks         blocking_locks 
        ON blocking_locks.locktype = blocked_locks.locktype
        AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
        AND blocking_locks.pid != blocked_locks.pid
 
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
   WHERE NOT blocked_locks.granted;






Task 12: Monitoring PostgreSQL Disk I/O performance:


-- perform a "select pg_stat_reset();" when you want to reset counter statistics
with 
all_tables as
(
SELECT  *
FROM    (
    SELECT  'all'::text as table_name, 
        sum( (coalesce(heap_blks_read,0) + coalesce(idx_blks_read,0) + coalesce(toast_blks_read,0) + coalesce(tidx_blks_read,0)) ) as from_disk, 
        sum( (coalesce(heap_blks_hit,0)  + coalesce(idx_blks_hit,0)  + coalesce(toast_blks_hit,0)  + coalesce(tidx_blks_hit,0))  ) as from_cache    
    FROM    pg_statio_all_tables  --> change to pg_statio_USER_tables if you want to check only user tables (excluding postgres's own tables)
    ) a
WHERE   (from_disk + from_cache) > 0 -- discard tables without hits
),
tables as 
(
SELECT  *
FROM    (
    SELECT  relname as table_name, 
        ( (coalesce(heap_blks_read,0) + coalesce(idx_blks_read,0) + coalesce(toast_blks_read,0) + coalesce(tidx_blks_read,0)) ) as from_disk, 
        ( (coalesce(heap_blks_hit,0)  + coalesce(idx_blks_hit,0)  + coalesce(toast_blks_hit,0)  + coalesce(tidx_blks_hit,0))  ) as from_cache    
    FROM    pg_statio_all_tables --> change to pg_statio_USER_tables if you want to check only user tables (excluding postgres's own tables)
    ) a
WHERE   (from_disk + from_cache) > 0 -- discard tables without hits
)
SELECT  table_name as "table name",
    from_disk as "disk hits",
    round((from_disk::numeric / (from_disk + from_cache)::numeric)*100.0,2) as "% disk hits",
    round((from_cache::numeric / (from_disk + from_cache)::numeric)*100.0,2) as "% cache hits",
    (from_disk + from_cache) as "total hits"
FROM    (SELECT * FROM all_tables UNION ALL SELECT * FROM tables) a
ORDER   BY (case when table_name = 'all' then 0 else 1 end), from_disk desc;




References 

https://www.postgresql.org/developer/related-projects/ 
https://www.postgresql.org/community/
https://github.com/pgexperts
