Monitoring postgresql replication is in progress but in not adequate I think. here is some scripts for checking a standby replication. In some cases replication has terminated because of some problems. this scripts give you a glance and you will see the problem about replication. But do not forget to check the log file, you can miss some problems if you do mot check the log file. As I said before postresql views is not adequate.
check overall information about replication on primary
1 | select * from pg_stat_replication; |
If you are using replication slots then replication slots information view is
1 | select * from pg_replication_slots; |
check overall information about replication on standby
1 | select * from pg_stat_wal_receiver; |
check status of replication on standby (version > 10)
1 2 3 4 5 | select pg_is_in_recovery(), pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp(); |
check status of replication on standby (version < 10)
1 2 3 4 | select pg_is_in_recovery(), pg_last_xlog_receive_location(), pg_last_xlog_replay_location(), pg_last_xact_replay_timestamp(); |
run this query for a few times and check if lsn/xlocation number ad replay timestamp is changing.
if it has not change then check the primary log file there could be an error about replication.
–check lag on standby (version >10)
1 2 3 4 | SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay; |
check lag on standby (version <10)
1 2 3 4 | SELECT CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()) END AS log_delay; |
check the primary lsn and compare with standby’s last applied lsn
get current lsn from primary
1 2 3 4 | SELECT pg_current_wal_lsn(); -[ RECORD 1 ]------+----------- pg_current_wal_lsn | 0/1ADE2F80 |
get last applied lsn from standby
1 2 3 4 5 6 7 8 9 10 11 12 | select pg_is_in_recovery(), pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp(); -[ RECORD 1 ]-----------------+------------------------------ pg_is_in_recovery | t pg_is_wal_replay_paused | f pg_last_wal_receive_lsn | 0/1005EEEF pg_last_wal_replay_lsn | 0/1005EEEF pg_last_xact_replay_timestamp | 2021-05-05 13:01:14.328936+00 |
calculate difference between two lsn from above queries
1 2 3 4 5 6 | pg_wal_lsn_diff(lsn pg_lsn, lsn pg_lsn) (version > 10 ) pg_xlog_location_diff (location pg_lsn, location pg_lsn) (version < 10) select pg_wal_lsn_diff('0/1ADE2F80','0/1005EEEF'); -[ RECORD 1 ]---+--------- pg_wal_lsn_diff | 181944465 |
previous query return result in bytes, we can get gap size in human readable like this
1 2 3 4 5 6 7 | select round(181944465/pow(1024,3.0),2) missing_lsn_GiB; -[ RECORD 1 ]---+----- missing_lsn_gib | 0.17 select round(181944465/pow(1024,2.0),2) missing_lsn_GiB; -[ RECORD 1 ]---+------- missing_lsn_gib | 173.52 |
No comments:
Post a Comment