Mark works as a developer for a company that uses a database to store customer data. One day, he received a complaint from a customer that their account was slow to load and the website was unresponsive.
Mark investigated the issue and found that the customer_data
table, which stores information about all the company's customers, had grown to a massive size over time even though the real data that in this table is not that large. The table contained millions of rows, and many of them were dead rows that were not marked as free.
Mark realized that this was causing a problem called “bloat”, where the size of the table grows over time due to the accumulation of dead space. This was slowing down the database queries and causing the website to be unresponsive.
Are there any ways to help Mark solve this problem?
Fortunately, PostgreSQL offers a command called VACUUM that scans tables and frees space taken up by dead tuples. However, determining when and how often to run this command can be challenging. If the command is run too infrequently, tables may grow larger than desired, as Mark discovered. On the other hand, running the command too often can lead to other issues, such as increased overhead and table fragmentation, which can also negatively impact performance.
So what should we do then?
Luckily, the PostgreSQL developer has already considered this situation and has come up with a process that will run in the background and monitors the activity on the tables in the database. When certain thresholds are met, such as a certain percentage of dead rows or a certain amount of changes to the table, it will initiate a vacuum or analyze operation on the table. It is called AUTOVACUUM
.
What is autovacuum
In PostgreSQL, AUTOVACUUM
is a process that automatically reclaims storage space and optimizes the performance of database tables. When a row is deleted or updated in a table, the old version of the row becomes a dead row that takes up space in the table until it is reclaimed by a vacuum operation. Autovacuum periodically scans tables to remove dead rows and free up space, thereby preventing the table from becoming bloated and slowing down queries.
The AUTOVACUUM
process consists of two separate processes: AUTOVACUUM VACUUM
and AUTOVACUUM ANALYZE
.AUTOVACUUM VACUUM
is like the original vacuum, reclaiming storage space taken by dead tuples. AUTOVACUUM ANALYZE
is responsible for updating table statistics to ensure that queries are optimized for performance.
How to enable autovacuum
The autovacuum
is enabled by default, but there are some ways to check whether it is enabled or not, and if it is disabled, we can re-enable it:
Check if autovacuum
is enabled in your PostgreSQL:
SHOW autovacuum;
If autovacuum
is enabled, you will see a result like this:
autovacuum
--------------
on
Turn on autovacuum
:
If autovacuum
is not enabled, you can enable it by modifying the postgresql.conf
file. Locate the postgresql.conf
file in your PostgreSQL installation directory (e.g., /etc/postgresql/12/main/postgresql.conf) and open it in a text editor.
Find the line that starts with "autovacuum" and make sure it is set to "on". The default value is "on", so if it's set to "off", change it to "on".
autovacuum = on
track_counts = on
The track_counts is a configuration parameter that enables or disables the tracking of statistics on the number of inserts, updates, and deletes on each table in the database. Autovacuum uses that to determine when to run a vacuum operation on a certain table.
How to log autovacuum
Now after knowing how to turn on/off the autovacuum
, you might want to ask how can I print out the log information of autovacuum
?
To do that, you can go to postgresql.conf
file again, find the ‘log_autovacuum_min_duration’ parameter:
log_autovacuum_min_duration = 1000
This parameter controls the minimum duration of any SQL statement that will be logged, including autovacuum
statements. The default value is -1, which means that all statements will be logged. If you want to log only statements that take longer than a certain duration, you can set this value to the desired threshold in milliseconds.
In the example above, PostgreSQL will log the statement that takes longer than 1000 ms.
When to run autovacuum
vacuum/analyze?
Vacuuming will be required when the number of dead tuples exceeds a certain threshold, more specifically it depends on the equation below:
num_of_dead_tuple >= autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number of tuples
Let’s understand these parameters in detail:
autovacuum_vacuum_scale_factor: In PostgreSQL, autovacuum_scale_factor is a configuration parameter that determines the threshold at which the autovacuum process will trigger a table. It is a floating-point value and its default value is 0.2, which means that autovacuum will trigger when the table has had at least 20% of its rows modified or deleted.
autovacuum_vacuum_threshold: it is a configuration parameter in PostgreSQL that specifies the minimum number of dead rows that must exist in a table before the autovacuum process will initiate a VACUUM operation to reclaim the space used by those dead rows. By default, autovacuum_vacuum_threshold
is set to 50, which means that the autovacuum process will initiate a VACUUM operation if the number of dead rows in a table exceeds 50 or more rows.
Similarly, the autovacuum analysis equation is the same:
num_of_inserts/deletes/updates >= autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * number of tuples
Also, the same as above:
autovacuum_analyze_scale_factor: This parameter is a fraction of the table records. The default value of it is 0.1, which means, only after the number of modified tuples is larger than 10% of the table size, will the autovacuum analyze operation initiate.
autovacuum_analyze_threshold: It is a parameter that defines the minimum number of row inserts, updates, and deletes that must occur in a table before initiating the ANALYZE operation. The default value of it is also 50.
Thanks for reading!
No comments:
Post a Comment