Saturday, 17 February 2024

Auto Vacuum in PostgreSQL

 

In a large-scale datacenter, the tables in the Application Server PostgreSQL database can grow quite large. Performance can degrade significantly if stale and temporary data are not systematically removed. Vacuuming cleans up stale or temporary data in a table, and analyzing refreshes the statistics that the query planner relies on for all tables.


PostgreSQL database tables are auto-vacuumed by default when the number of dead rows (from updates and deletes) exceeds 20% of the table's rows plus 50 rows. Tables are auto-analyzed when the number of inserted, updated, or deleted rows exceeds 10% of the table's rows plus 50 rows. For example, a table with 10000 rows is not auto-vacuumed until 2050 rows are updated or deleted. That same table is auto-analyzed when 1050 rows are inserted, updated, or deleted.
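Expressed as a formula (a quick sketch using the stock parameter names; the values below are the standard PostgreSQL defaults):

-- A table qualifies for an autovacuum VACUUM when:
--     dead tuples > autovacuum_vacuum_threshold
--                   + autovacuum_vacuum_scale_factor * reltuples
--     (defaults: 50 + 0.2 * reltuples)
-- and for an autovacuum ANALYZE when:
--     changed tuples > autovacuum_analyze_threshold
--                      + autovacuum_analyze_scale_factor * reltuples
--     (defaults: 50 + 0.1 * reltuples)
SHOW autovacuum_vacuum_threshold;      -- 50
SHOW autovacuum_vacuum_scale_factor;   -- 0.2
SHOW autovacuum_analyze_threshold;     -- 50
SHOW autovacuum_analyze_scale_factor;  -- 0.1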

The default auto-vacuum analyze and vacuum settings are sufficient for a small deployment, but the percentage thresholds take longer to trigger as the tables grow larger. Performance degrades significantly before the auto-vacuum vacuuming and analyzing occurs.

What is autovacuum?

Autovacuum is one of the background utility processes that starts automatically when you start PostgreSQL. At startup, the postmaster (the parent PostgreSQL process) starts the autovacuum launcher process. For autovacuum to run, the parameter autovacuum must be set to ON. In fact, you should not set it to OFF on a production system unless you are absolutely sure about what you are doing and its implications.
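A quick way to confirm this (a minimal check; the backend_type column assumes PostgreSQL 10 or later):

SHOW autovacuum;    -- should return "on"
SELECT pid, backend_type
FROM pg_stat_activity
WHERE backend_type = 'autovacuum launcher';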


An UPDATE in PostgreSQL internally performs an insert and a delete. Hence, every record that is UPDATED has its old version marked as deleted, and a new version is inserted with the new value.
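A minimal sketch of this behavior (the scott.employee table here is an illustration; the counters in pg_stat_user_tables are updated asynchronously, so allow a moment before querying):

CREATE SCHEMA IF NOT EXISTS scott;
CREATE TABLE scott.employee (id int, name text, dept int);
INSERT INTO scott.employee SELECT i, 'avii', 1 FROM generate_series(1, 10) i;
UPDATE scott.employee SET dept = 2;   -- the 10 old row versions become dead tuples
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE schemaname = 'scott' AND relname = 'employee';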


VACUUM in PostgreSQL

As per the above statement, every such record version that has been deleted but is still taking up space is called a dead tuple. Once there is no dependency on those dead tuples from already running transactions, the dead tuples are no longer needed.

Thus, PostgreSQL runs VACUUM on such tables. VACUUM reclaims the storage occupied by these dead tuples; the space occupied by them is often referred to as bloat. VACUUM scans the pages for dead tuples and records the freed space in the free space map (FSM). Every relation apart from hash indexes has an FSM stored in a separate file called <relation_oid>_fsm.

=# SELECT oid FROM pg_class WHERE relname = 'employee';
  oid
-------
 24613
(1 row)


Upon VACUUM, this space is not returned to the filesystem, but it can be re-used by future inserts on this table. VACUUM stores the free space available on each heap (or index) page in the FSM file.

Running a plain VACUUM is a non-blocking operation: it takes only a SHARE UPDATE EXCLUSIVE lock, never an exclusive lock on the table (unlike VACUUM FULL). This means VACUUM can run on a busy transactional table in production while several transactions are writing to it.

As we discussed earlier, an UPDATE of 10 records has generated 10 dead tuples. Let us see the following log to understand what happens to those dead tuples after a VACUUM.

# VACUUM scott.employee;
VACUUM
# SELECT t_xmin, t_xmax, tuple_data_split('scott.employee'::regclass, t_data, t_infomask, t_infomask2, t_bits)
  FROM heap_page_items(get_raw_page('scott.employee', 0));
 t_xmin | t_xmax |                tuple_data_split
--------+--------+-----------------------------------------------
        |        |
        |        |
        |        |
        |        |
        |        |
        |        |
        |        |
        |        |
        |        |
        |        |
    673 |      0 | {"\\x01000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x02000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x03000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x04000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x05000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x06000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x07000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x08000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x09000000","\\x0b61766969","\\x01000000"}
    673 |      0 | {"\\x0a000000","\\x0b61766969","\\x01000000"}
(20 rows)

In the above log, you can see that the dead tuples have been removed and their space is available for re-use. However, this space is not reclaimed to the filesystem after VACUUM; only future inserts can use it.

VACUUM does an additional task: rows that were inserted and successfully committed sufficiently far in the past are marked as frozen, which indicates that they are visible to all current and future transactions. We will discuss this in detail in a future blog post, "Transaction ID Wraparound in PostgreSQL".

VACUUM does not usually return space to the filesystem unless the dead tuples are at the end of the table, beyond the high water mark, in which case the trailing pages can be truncated.


NOTE: For larger installations (that is, thousands of workloads), we recommend customizing the following settings on individual tables to ensure optimal performance for data access:

  • The scale factor should be set to zero for both vacuum and analyze auto-vacuum settings.
  • The threshold should be set to 1000 for both the vacuum and analyze threshold settings.
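Applied to a single table, that recommendation looks like this (a sketch; app.big_table is a hypothetical table name):

ALTER TABLE app.big_table SET (
    autovacuum_vacuum_scale_factor  = 0,
    autovacuum_vacuum_threshold     = 1000,
    autovacuum_analyze_scale_factor = 0,
    autovacuum_analyze_threshold    = 1000
);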

What is Postgres Autovacuum?
As you might guess by the name, autovacuum is the same thing as the normal VACUUM command described above, except that it is managed and executed automatically.

Of course, you could set up a cron job that runs VACUUM on a daily schedule; however, that would not be very efficient, and it comes with downsides such as:

  • Your database now relies on an external service to work properly.
  • The database might not need a VACUUM at all, so you waste CPU resources for no reason.
  • The database might be under heavy load with a ton of updates, and it would have to keep all of those dead rows around until your prescheduled job occurs.

The solution is to make sure that Postgres takes responsibility for cleaning up its own data whenever it is needed. This is what autovacuum is for.

Luckily for us, autovacuum is enabled by default on PostgreSQL. The default settings mean that it will clean up a table whenever the table has more than 50 dead rows and those dead rows are more than 20% of the table's total row count.

It does so by spawning autovacuum worker processes at the OS level; each worker executes the VACUUM command on one table at a time.

These default settings do not scale well for big tables: imagine a table that stores 10 GB of data; a threshold of 20% means it would accumulate 2 GB of dead rows before autovacuum is triggered. 😳 Ouch!

Tuning Autovacuum in PostgreSQL

We need to understand that these are global settings, applicable to all the databases in the instance. This means that, regardless of table size, a table becomes eligible for an autovacuum vacuum or analyze as soon as the threshold formula shown earlier (threshold + scale_factor * row count) is satisfied.

Is this a problem?

Consider a table with ten records versus a table with a million records. Even though the table with a million records may be involved in transactions far more often, the frequency at which a vacuum or an analyze runs automatically could be greater for the table with just ten records.

Consequently, PostgreSQL allows you to configure autovacuum settings at the individual table level, which override the global settings.

ALTER TABLE scott.employee SET (autovacuum_vacuum_scale_factor = 0, autovacuum_vacuum_threshold = 100);

 

SELECT n_tup_ins AS "inserts", n_tup_upd AS "updates", n_tup_del AS "deletes",
       n_live_tup AS "live_tuples", n_dead_tup AS "dead_tuples"
FROM pg_stat_user_tables
WHERE schemaname = 'scott' AND relname = 'employee';
 inserts | updates | deletes | live_tuples | dead_tuples
---------+---------+---------+-------------+-------------
      30 |      40 |       9 |          21 |          39
(1 row)

How many autovacuum processes can run at a time?

There cannot be more than autovacuum_max_workers autovacuum worker processes running at a time across the instance/cluster, which may contain more than one database. The autovacuum launcher background process starts a worker process for each table that needs a vacuum or an analyze. If there are four databases and autovacuum_max_workers is set to 3, the fourth database has to wait until one of the existing worker processes becomes free.

Before starting the next autovacuum round, the launcher waits for autovacuum_naptime; the default is 1 minute on most versions. The naptime is spread across databases: if you have three databases, the launcher wakes up every 60/3 = 20 seconds. In general, the wait before the next autovacuum attempt is autovacuum_naptime/N, where N is the total number of databases in the instance.
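You can inspect these knobs and watch the workers in action (a sketch; the backend_type filter assumes PostgreSQL 10 or later):

SHOW autovacuum_max_workers;   -- default 3
SHOW autovacuum_naptime;       -- default 1min
SELECT pid, backend_type, query
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';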

Does increasing autovacuum_max_workers alone increase the number of autovacuum processes that can run in parallel?
NO. This is explained better in the next few lines.

Is AUTO VACUUM IO intensive?

Autovacuum can be considered a cleanup operation. As discussed earlier, there is one worker process per table. Autovacuum reads the 8 KB (default block_size) pages of a table from disk and modifies/writes the pages containing dead tuples, which involves both read and write IO. This can make it an IO-intensive operation when autovacuum runs on a huge table with many dead tuples during peak transaction time. To avoid this issue, a few parameters can be tuned to minimize the impact of vacuuming on IO.

The following are the parameters used to tune autovacuum IO:

  • autovacuum_vacuum_cost_limit : the total cost limit autovacuum may accumulate, shared across all autovacuum workers. The default is -1, which means the value of vacuum_cost_limit is used. It can be set in postgresql.conf and overridden for individual tables through storage parameters.
  • autovacuum_vacuum_cost_delay : the number of milliseconds autovacuum sleeps each time the accumulated cleanup cost reaches autovacuum_vacuum_cost_limit.
  • vacuum_cost_page_hit : cost of reading a page that is already in shared buffers and doesn't need a disk read.
  • vacuum_cost_page_miss : cost of fetching a page that is not in shared buffers.
  • vacuum_cost_page_dirty : cost of writing to each page when dead tuples are found in it.
  • autovacuum_freeze_max_age (integer) : the maximum age the pg_class.relfrozenxid field of a table may reach before a VACUUM is forced to prevent transaction ID wraparound. The default is 200 million transactions. This parameter can only be set at server start.
  • autovacuum_analyze_scale_factor (floating point) : the fraction of the table size added to autovacuum_analyze_threshold when deciding whether to trigger an ANALYZE. The default is 0.1 (i.e., 10%). It can be set in postgresql.conf and overridden for individual tables through storage parameters.
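The instance-wide values can be changed without editing postgresql.conf by hand; for example (a sketch with illustrative values, assuming PostgreSQL 9.4+ for ALTER SYSTEM):

ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1000;    -- illustrative value
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '10ms';  -- illustrative value
SELECT pg_reload_conf();   -- both parameters take effect on reload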

Consider autovacuum VACUUM running on the table users.employee.

Let's imagine what can happen in 1 second (1 second = 1000 milliseconds).

In a best-case scenario where read latency is 0 milliseconds, autovacuum can wake up and go back to sleep 50 times (1000 ms / 20 ms), because the delay between wake-ups is autovacuum_vacuum_cost_delay, assumed here to be 20 milliseconds (the default up to PostgreSQL 11; since version 12 the default is 2 ms).

1 second = 1000 milliseconds = 50 * autovacuum_vacuum_cost_delay

Since the cost of reading a page that is already in shared_buffers is 1 (vacuum_cost_page_hit), and the default cost limit (vacuum_cost_limit) is 200, 200 pages can be read in every wake-up, and 50 * 200 pages in 50 wake-ups.

If all the pages with dead tuples are found in shared buffers, then with an autovacuum_vacuum_cost_delay of 20 ms, autovacuum can read ((200 / vacuum_cost_page_hit) * 8) KB in each round, after which it must sleep for autovacuum_vacuum_cost_delay.

Thus, at the most, an autovacuum can read : 50 * 200 * 8 KB = 78.13 MB per second (if blocks are already found in shared_buffers), considering the block_size as 8192 bytes.

If the blocks are not in shared buffers and need to be fetched from disk, with vacuum_cost_page_miss at its historical default of 10, an autovacuum can read: 50 * ((200 / vacuum_cost_page_miss) * 8) KB = 7.81 MB per second.

All the information we have seen above is for read IO.

Now, in order to delete dead tuples from a page/block, the cost of a write operation is : vacuum_cost_page_dirty, set to 20 by default.

At the most, an autovacuum can write/dirty : 50 * ((200 / vacuum_cost_page_dirty) * 8) KB = 3.9 MB per second.
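Putting the three cases together (using the defaults assumed above: vacuum_cost_limit = 200, autovacuum_vacuum_cost_delay = 20 ms, block_size = 8 KB):

-- rounds per second    : 1000 ms / 20 ms = 50
-- page hit   (cost 1)  : 50 * (200 / 1)  * 8 KB = 78.13 MB/s
-- page miss  (cost 10) : 50 * (200 / 10) * 8 KB =  7.81 MB/s
-- page dirty (cost 20) : 50 * (200 / 20) * 8 KB =  3.91 MB/s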

Generally, this cost budget is equally divided among all the autovacuum_max_workers autovacuum processes running in the instance. So, increasing autovacuum_max_workers alone may actually delay the autovacuum execution for the currently running autovacuum workers, while increasing autovacuum_vacuum_cost_limit may cause IO bottlenecks. An important point to note is that this behavior can be overridden by setting the storage parameters of individual tables, which then take precedence over the global settings.

postgres=# ALTER TABLE users.employee SET (autovacuum_vacuum_cost_limit = 500);
ALTER TABLE
postgres=# ALTER TABLE users.employee SET (autovacuum_vacuum_cost_delay = 10);
ALTER TABLE
postgres=# \d+ users.employee
                              Table "users.employee"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 id     | integer |           |          |         | plain   |              |
Options: autovacuum_vacuum_threshold=10000, autovacuum_vacuum_cost_limit=500, autovacuum_vacuum_cost_delay=10


Thus, on a busy OLTP database, always have a strategy to run manual VACUUMs, during a low-peak window, on the tables that are most frequently hit by DML. You can run as many parallel manual vacuum jobs as your IO can absorb, after setting the relevant vacuum cost settings. For this reason, a scheduled manual vacuum job is always recommended alongside finely tuned autovacuum settings.
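As a sketch of such a strategy (the table and database names here are hypothetical):

-- inside psql, during the low-peak window:
VACUUM (ANALYZE, VERBOSE) scott.employee;

-- or from the shell, vacuuming a whole database with parallel jobs:
-- vacuumdb --analyze --jobs=4 mydb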

